MH's Direct3D 8.1 Wrapper

by mh » Tue Mar 29, 2011 1:52 pm

I suspect that most Q2 ports have by now incorporated the refresh DLL into the base engine code, but there may still be a market for such an idea in people who just want to run stock Q2 on lower-end or integrated hardware.

The biggest pain would likely be in removing all the qgl stuff, which existed for no reason other than to be able to separate 3DFX OpenGL from Default OpenGL (and for logging, but you'd use GLIntercept for that nowadays). That's just messy grunt work with nothing concrete at the end of it.

surf->polys->verts[0] can be used directly as a parameter to DrawPrimitiveUP, which is kinda neat and would hugely simplify the surface refresh. It could be moved to Vertex Buffers as a later exercise (I'd only bother if it was established that using -UP was a performance problem with stock Q2 maps though), particularly if you switch to shaders and do all the texcoord manipulation there. It's easy to convert surface vertex layouts from a poly/trifan to a tristrip, which might give better performance on some hardware.
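As a rough illustration of the trifan-to-tristrip idea, the sketch below reorders the vertex indices of a convex polygon so that consecutive triples form the strip. The function name is hypothetical and not from the Q2 source; it assumes the polygon is convex and stored in fan order 0..n-1, and that the API flips winding on odd strip triangles, as D3D does.

```c
#include <assert.h>

/* Hypothetical helper (not from the Quake II source).
   Input: a convex polygon with n vertices stored in fan order 0..n-1.
   Output: the same vertices in strip order 0, 1, n-1, 2, n-2, ...
   so that each consecutive triple is one triangle of the strip. */
void fan_to_strip_order(int n, int *out)
{
    int lo = 1, hi = n - 1, count = 0;

    assert(n >= 3);
    out[count++] = 0;

    /* alternately take the next vertex from the front and from the back */
    while (lo <= hi) {
        out[count++] = lo++;
        if (lo <= hi)
            out[count++] = hi--;
    }
}
```

For a quad this yields 0, 1, 3, 2, and the conversion only needs to run once at load time.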

Lightmap updating in Q2 is total cack; even worse than Q1. That needs a total overhaul. Instead of updating changes as they happen you need to accumulate changes and update them in bulk once only per frame. D3D is neater and cleaner than OpenGL here as you can LockRect a D3DPOOL_MANAGED texture (with D3DLOCK_NO_DIRTY_UPDATE), pass pBits + offset into R_BuildLightmap, then UnlockRect it and AddDirtyRect with the rectangle that's actually changed. So you just need to do a bunch of AddDirtyRect calls at the end of the current frame or the start of the next for any lightmap that's been modified. It does mean that updated lightmaps will lag 1 frame behind, but I'd defy anyone to notice.
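The accumulate-then-flush part could be sketched roughly as below, with the D3D calls left out so it stands alone. All names here (LM_MarkDirty, LM_FlushDirty, MAX_LIGHTMAPS, the callback type) are hypothetical. As a simplification it merges every change to a lightmap into one bounding rectangle rather than keeping each changed rectangle separately; in a real renderer the callback is where AddDirtyRect would go.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LIGHTMAPS 128   /* assumption: an arbitrary cap for this sketch */

typedef struct {
    int left, top, right, bottom;   /* right and bottom are exclusive */
    int modified;
} dirty_rect_t;

/* called once per modified lightmap at flush time */
typedef void (*lm_upload_fn)(int lightmapnum, const dirty_rect_t *rect);

dirty_rect_t lm_dirty[MAX_LIGHTMAPS];

/* Record that a region of a lightmap changed; nothing is uploaded yet. */
void LM_MarkDirty(int lightmapnum, int x, int y, int w, int h)
{
    dirty_rect_t *d;

    assert(lightmapnum >= 0 && lightmapnum < MAX_LIGHTMAPS);
    d = &lm_dirty[lightmapnum];

    if (!d->modified) {
        d->left = x;
        d->top = y;
        d->right = x + w;
        d->bottom = y + h;
        d->modified = 1;
        return;
    }

    /* grow the existing rectangle to cover the new region */
    if (x < d->left) d->left = x;
    if (y < d->top) d->top = y;
    if (x + w > d->right) d->right = x + w;
    if (y + h > d->bottom) d->bottom = y + h;
}

/* Once per frame: hand each modified lightmap to the uploader exactly once.
   Returns the number of lightmaps that were flushed. */
int LM_FlushDirty(lm_upload_fn upload)
{
    int i, flushed = 0;

    for (i = 0; i < MAX_LIGHTMAPS; i++) {
        dirty_rect_t *d = &lm_dirty[i];

        if (!d->modified)
            continue;
        if (upload != NULL)
            upload(i, d);
        d->modified = 0;
        flushed++;
    }
    return flushed;
}
```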

The MD2 renderer in Q2 is hellishly messy. I've experimented in DirectQ with just loading all vertexes into a Vertex Buffer for MDLs and using stream offset to define the two frames to interpolate between (with interpolation being done in a vertex shader). VRAM usage is typically around the 1-2 MB mark (never seen it top 5 or so), but then DirectQ does compress vertex data down to 8 bytes and remove duplicate vertexes, so MDLs in DirectQ end up maybe 10% to 25% of the size of the original data. The concept could easily transfer across, and would be cleaner in some ways as you wouldn't have to mess around with cache memory in Q2. Otherwise just have a big enough system memory array, transfer the data in and DrawIndexedPrimitiveUP it. It won't be lightning fast but it will be fast enough.
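To make the stream-offset idea concrete, here is a minimal sketch of the arithmetic involved, assuming every frame stores the same number of vertices back to back in one buffer. The function names are hypothetical, and the lerp is a CPU-side stand-in for what the vertex shader would compute; it follows the usual Q2 convention that backlerp weights the old frame.

```c
#include <assert.h>

/* Byte offset of a frame inside one big vertex buffer, assuming every
   frame has numverts vertices of `stride` bytes stored consecutively. */
unsigned int FrameStreamOffset(unsigned int frame, unsigned int numverts,
                               unsigned int stride)
{
    return frame * numverts * stride;
}

/* Blend one position component between the old and current frame.
   backlerp = 1 gives the old frame, backlerp = 0 gives the current one. */
float LerpComponent(float oldval, float curval, float backlerp)
{
    assert(backlerp >= 0.0f && backlerp <= 1.0f);
    return oldval * backlerp + curval * (1.0f - backlerp);
}
```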

2D drawing needs serious work. Just drawing all the console text as individual quads (or trifans) with one draw call per character can drop framerates down to single digits on some systems. It needs an intermediate layer to batch things up and flush batches on a state change or at the end of a frame. ID3DXSprite can do all of this automatically for you, and it's also viable for use with particles (and sprites, of course). It can be a little slow though as it AddRefs the texture, which for some reason takes far far more CPU cycles than it should (I suspect that the runtime is doing more than just incrementing a reference counter here). But so long as you're not doing too many texture changes (which you're not with the 2D stuff and particles/sprites) it's good enough.
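If you roll the intermediate layer yourself instead of using ID3DXSprite, it might look something like the sketch below. Everything in it is hypothetical (the struct layout, the names, the 1024-quad cap), and the actual draw call is replaced with a counter so that the flush logic can be seen on its own.

```c
#include <assert.h>

#define MAX_BATCH_QUADS 1024   /* assumption: arbitrary batch size */

typedef struct {
    float x, y, w, h;        /* screen rectangle */
    float s1, t1, s2, t2;    /* texture coordinates */
} quad_t;

typedef struct {
    quad_t quads[MAX_BATCH_QUADS];
    int numquads;
    int texture;             /* hypothetical texture handle */
    int drawcalls;           /* stands in for the real draw call */
} batch2d_t;

/* Submit everything accumulated so far as a single draw call. */
void Batch_Flush(batch2d_t *b)
{
    if (b->numquads == 0)
        return;

    /* a real renderer would issue one draw call for b->numquads quads here */
    b->drawcalls++;
    b->numquads = 0;
}

/* Queue a quad, flushing first if the texture changes or the batch is full. */
void Batch_AddQuad(batch2d_t *b, int texture, const quad_t *q)
{
    assert(b != 0 && q != 0);

    if (b->numquads > 0 && texture != b->texture)
        Batch_Flush(b);
    if (b->numquads == MAX_BATCH_QUADS)
        Batch_Flush(b);

    b->texture = texture;
    b->quads[b->numquads++] = *q;
}
```

With this in place a full line of console text that shares one charset texture costs one draw call rather than one per character; Batch_Flush also needs calling at the end of the frame.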

Hmmmm, ideas ideas.

by mh » Tue Mar 29, 2011 3:28 pm

Some more notes.

The easiest way to load a texture is to just allocate a buffer of width * height * 4 + 18, expand to 32 bit into &buffer[18], fill the first 18 bytes of the buffer with a TGA header, and pass into D3DXCreateTextureFromFileInMemoryEx. Alternatively build a BMP file in memory and specify a palette, although the BMP format is slightly more complex (and needs crap like row padding).
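A minimal sketch of the TGA route follows. The function name is made up, it uses malloc where the post would use a hunk, and it assumes the palette has already been arranged as 256 B,G,R,A entries. The header marks the image as top-left origin so rows can be copied in their natural order. The returned buffer and size are what would then be handed to D3DXCreateTextureFromFileInMemoryEx.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TGA_HEADER_SIZE 18

/* Hypothetical helper: expands an 8-bit paletted image to 32-bit and
   prepends an 18-byte TGA header. palette_bgra holds 256 entries of
   4 bytes each in B,G,R,A order. The caller frees the returned buffer. */
unsigned char *BuildTGAInMemory(const unsigned char *src8, int width, int height,
                                const unsigned char *palette_bgra, size_t *outsize)
{
    size_t numpixels, size, i;
    unsigned char *buffer;

    assert(width > 0 && width <= 65535 && height > 0 && height <= 65535);

    numpixels = (size_t)width * (size_t)height;
    size = numpixels * 4 + TGA_HEADER_SIZE;

    buffer = malloc(size);
    if (buffer == NULL)
        return NULL;

    memset(buffer, 0, TGA_HEADER_SIZE);
    buffer[2] = 2;                                  /* uncompressed true-colour */
    buffer[12] = (unsigned char)(width & 255);      /* width, little-endian */
    buffer[13] = (unsigned char)(width >> 8);
    buffer[14] = (unsigned char)(height & 255);     /* height, little-endian */
    buffer[15] = (unsigned char)(height >> 8);
    buffer[16] = 32;                                /* bits per pixel */
    buffer[17] = 0x28;                              /* 8 alpha bits, top-left origin */

    /* expand each palette index to a 4-byte BGRA pixel after the header */
    for (i = 0; i < numpixels; i++)
        memcpy(&buffer[TGA_HEADER_SIZE + i * 4], &palette_bgra[src8[i] * 4], 4);

    *outsize = size;
    return buffer;
}
```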

Q2 really needs a separate "utility Hunk" for memory allocations like this, so just create one. Use VirtualAlloc, specify a maximum size of 32 or 64 MB, and you'll only ever use as much memory as you actually need but have plenty of headroom nonetheless.

The palette needs to be switched around to BGRA for texture loading in D3D.

Use D3DTOP_SELECTARG1 for GL_REPLACE instead of modulating with a default colour of white.

Use D3DTOP_MODULATE, D3DTOP_MODULATE2X or D3DTOP_MODULATE4X based on the value of the intensity cvar instead of lightscaling textures.
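One way the intensity selection might be written is sketched below. The enum is a stand-in for the three D3DTOP values so the snippet compiles without the D3D headers, and the 1.5 and 3.0 thresholds are my own assumption, not something taken from the Q2 source.

```c
#include <assert.h>

/* Stand-ins for D3DTOP_MODULATE, D3DTOP_MODULATE2X and D3DTOP_MODULATE4X. */
typedef enum {
    STAGE_MODULATE,
    STAGE_MODULATE2X,
    STAGE_MODULATE4X
} stage_op_t;

/* Pick the nearest of 1x, 2x or 4x for a given intensity value.
   The cut-off points are assumed for illustration. */
stage_op_t StageOpForIntensity(float intensity)
{
    assert(intensity >= 0.0f);

    if (intensity < 1.5f)
        return STAGE_MODULATE;
    if (intensity < 3.0f)
        return STAGE_MODULATE2X;
    return STAGE_MODULATE4X;
}
```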

Use D3DSAMP_MIPMAPLODBIAS or D3DSAMP_MAXMIPLEVEL instead of scaling a texture by gl_picmip at load time.

Not quite sure what the best way to handle Draw_StretchRaw is. Probably CreateOffscreenPlainSurface and StretchRect are worth a try. Cache the surface and the dimensions that it was created at so that you only need to recreate it if the dimensions change. You could create a lockable backbuffer and write directly to it, but that might impact performance in the more general case when you're not showing a .cin file.

D3D code needs you to correct the half-pixel offset for 2D GUI rendering, otherwise things look really fuzzy and horrible.
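For pre-transformed 2D vertices the usual correction is to shift positions by half a pixel so that texel centres line up with pixel centres. A rough sketch, with a made-up vertex struct and function name, and assuming the quad is drawn as a four-vertex fan:

```c
#include <assert.h>

/* Hypothetical pre-transformed 2D vertex: screen position plus texcoords. */
typedef struct {
    float x, y, z, rhw;
    float s, t;
} vert2d_t;

/* Fill a quad in fan order (top-left, top-right, bottom-right, bottom-left)
   with every corner shifted by -0.5 pixels in x and y. */
void Fill2DQuad(vert2d_t v[4], float x, float y, float w, float h)
{
    const float ofs = -0.5f;
    int i;

    assert(w >= 0.0f && h >= 0.0f);

    v[0].x = x + ofs;     v[0].y = y + ofs;     v[0].s = 0.0f; v[0].t = 0.0f;
    v[1].x = x + w + ofs; v[1].y = y + ofs;     v[1].s = 1.0f; v[1].t = 0.0f;
    v[2].x = x + w + ofs; v[2].y = y + h + ofs; v[2].s = 1.0f; v[2].t = 1.0f;
    v[3].x = x + ofs;     v[3].y = y + h + ofs; v[3].s = 0.0f; v[3].t = 1.0f;

    for (i = 0; i < 4; i++) {
        v[i].z = 0.0f;
        v[i].rhw = 1.0f;
    }
}
```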

Don't bother with a matrix stack; it's much easier to just load matrixes directly as needed by SetTransform. Storing a D3DMATRIX in each entity_t is damn useful.

by **revelator** » Tue Mar 29, 2011 10:15 pm

by mh » Tue Mar 29, 2011 11:57 pm

Some of Quake II's OpenGL code will just not work well with D3D at all. OpenGL puts more of an abstraction layer in front of the hardware (this is neither good nor bad, it's just OpenGL philosophy), meaning that it tends to shield you from many of the messier details, whereas D3D tends to shove the ugliness of low-level stuff in your face and forces you to deal with it yourself. There are advantages and disadvantages to both approaches, but code written to suit one does not tend to work well with the other.

D3D is extremely sensitive to number of draw calls. You absolutely must keep these down as low as you can get, so right from the very start you're looking to batch things up as much as possible. That's one reason why I suggested ID3DXSprite for the 2D stuff (and particles/sprites) - it will do the batching for you, meaning that a lot of ugly code you need to write just goes away. This means that you need to rewrite a good chunk of the 2D renderer though.

The MD2 renderer as it stands won't cut it either. The models need to go into vertex buffers and they need to be indexed in order to get good performance without hurting resource allocation. That means a total rewrite of not only the renderer but also the in-memory format.

The surface renderer is just shit. It's almost the worst of the lot; dynamic light updates are totally inefficient for D3D and there is no batching done at all. Everything needs to be put into texture chains and drawn from those, with each surface in a texture chain being batched together. That's another total rewrite. Updating lightmaps as they pass is crazy; OpenGL already makes you suffer for that on some systems, and D3D will make you suffer on all systems. Another rewrite.
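The texture chain approach can be sketched as below. The structs are cut-down stand-ins rather than the real Q2 types, and the function names are invented; the drawing itself is reduced to counters so that the chain handling is all that's left.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the engine's surface and texture types. */
typedef struct surf_s {
    int texnum;
    struct surf_s *texturechain;   /* next surface using the same texture */
} surf_t;

typedef struct {
    surf_t *chain;                 /* head of this texture's chain */
} texture_t;

/* While walking the world, link each visible surface onto its texture. */
void Chain_Add(texture_t *textures, surf_t *s)
{
    assert(textures != NULL && s != NULL);
    s->texturechain = textures[s->texnum].chain;
    textures[s->texnum].chain = s;
}

/* Draw every chain as one batch per texture, then clear the chains.
   Returns the number of batches; *numsurfs receives the surfaces drawn. */
int Chain_DrawAll(texture_t *textures, int numtextures, int *numsurfs)
{
    int i, batches = 0;

    *numsurfs = 0;
    for (i = 0; i < numtextures; i++) {
        surf_t *s;

        if (textures[i].chain == NULL)
            continue;

        /* a real renderer sets the texture once here, appends every
           surface in the chain to one buffer, and draws it in one call */
        for (s = textures[i].chain; s != NULL; s = s->texturechain)
            (*numsurfs)++;

        batches++;
        textures[i].chain = NULL;
    }
    return batches;
}
```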

In the texture loader OpenGL lets you load all kinds of exciting formats that don't actually exist in hardware like GL_RGB or GL_RGBA, and it will silently convert them to BGRA at load time for you. D3D doesn't; it's BGRA all the way baby and forget about anything else (except stuff like luminance of course, but Q2 doesn't use that). More changes.
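The conversion itself is just a swap of the red and blue bytes in each pixel. A minimal sketch, assuming tightly packed 4-byte RGBA pixels and a made-up function name:

```c
#include <assert.h>
#include <stddef.h>

/* Swap bytes 0 and 2 of every 4-byte pixel in place: RGBA becomes BGRA. */
void SwapRGBAtoBGRA(unsigned char *pixels, size_t numpixels)
{
    size_t i;

    assert(pixels != NULL || numpixels == 0);

    for (i = 0; i < numpixels; i++) {
        unsigned char r = pixels[i * 4 + 0];

        pixels[i * 4 + 0] = pixels[i * 4 + 2];
        pixels[i * 4 + 2] = r;
    }
}
```

The same routine works on the 256-entry palette if that is stored as 4 bytes per entry.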

So abstracting the current Q2 renderer to support both APIs is going to result in something that looks ugly and performs badly. I worked around some of that in the D3D8 wrapper by making some attempts at batching stuff up, converting data to BGRA, etc, but the basic structure of the renderer prevented it from having full effect.

It's worth noting though that OpenGL actually does support the kind of code that D3D likes, and that this is the highest performing kind of code you can write with OpenGL. I guess it bypasses a lot of the abstraction and conversion layers in the driver and goes more directly (pun intended) to the hardware. So if you wanted to have the same renderer with a bunch of #ifdefs you would need to port your OpenGL code to this kind of code first.

by **revelator** » Wed Mar 30, 2011 12:39 am

What I thought about was more of a general wrapper which both architectures could share for common calls. It would probably end up being rather advanced though, to accommodate the downfalls you describe. So I agree that might not be what people would look for.

Could be interesting though.

by **Baker** » Mon Apr 04, 2011 8:16 pm

by mh » Mon Apr 04, 2011 8:32 pm

by **revelator** » Mon Apr 04, 2011 9:56 pm

by **Baker** » Wed Jun 22, 2011 1:01 am

by mh » Wed Jun 22, 2011 9:37 pm

I thought it might get to the stage where it does that.
