by mh » Sat Jul 14, 2012 5:38 pm
Intel have supported GL2/2.1 for a long time now, but of course with Intel "support" is relative.
Last year and earlier this year I did a lot of work with an Intel 945. The chip dates back 6 years or so and has fairly solid GL 1.4 or 1.5 support, can't remember which. GL_ARB_vertex_program and GL_ARB_fragment_program were both available, VBOs were available (although as a software T&L part they would have been emulated), and Quake-style rendering can be done with quite modern-looking code.
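For reference, confirming those is just the usual extension-string test on a GL 1.x/2.x context - a minimal sketch, with helper and function names that are mine rather than from any particular codebase:

[code]
#include <string.h>
#include <GL/gl.h>

static int GL_HasExtension (const char *name)
{
    /* valid on a GL 1.x/2.x context; note that a plain strstr can
       false-positive on extensions whose names are prefixes of longer ones */
    const char *ext = (const char *) glGetString (GL_EXTENSIONS);
    return (ext && strstr (ext, name) != NULL);
}

void GL_CheckExtensions (void)
{
    int has_vp  = GL_HasExtension ("GL_ARB_vertex_program");
    int has_fp  = GL_HasExtension ("GL_ARB_fragment_program");
    int has_vbo = GL_HasExtension ("GL_ARB_vertex_buffer_object");

    /* decide which codepaths to enable based on what's exposed */
    (void) has_vp; (void) has_fp; (void) has_vbo;
}
[/code]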
There are still advantages to using even VBOs in this case. For one thing it means that you get to avoid a lot of horrible shuffling of data around in memory before you actually get to draw stuff. The rules with software T&L are quite different to hardware T&L though - you do need to be mindful of your VBO access patterns and not just randomly jump around in the buffer (you can get away with that in hardware T&L without penalty), and the two tend to crash in different places if you do different classes of bad things (going past the end of a buffer with hardware T&L is generally fine, with software you'll crash - if you're lucky).
Using streamed rather than interleaved data can be advantageous (as can padding the position component to 4 floats for SIMD ops). For smaller maps (ID1 scale) you can get away with a glDrawArrays call per surface (store the glDrawArrays params in your msurface_t struct and it becomes really easy - see the sketch below), for bigger maps it piles up (by then you'll be bottlenecking on the server more than in the renderer though).
Not having GL_ARB_map_buffer_range means that dynamic VBOs are broken, and for any kind of dynamic data it's best to use plain vertex arrays. But couple them with vertex and fragment programs for handling water and sky, make at least some attempt to group surfaces by shader type, and you never have to touch any surface vertex data at runtime and can get quite fast performance out of it (I was hitting ~280fps in timedemo demo1 at 800x600 windowed towards the end).
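Roughly what I mean by storing the glDrawArrays params per surface - just a sketch, with field and function names that are mine rather than from any particular engine:

[code]
#include <GL/gl.h>

/* the glDrawArrays params live in the surface struct */
typedef struct msurface_s
{
    /* ... the usual Quake msurface_t fields ... */
    int firstvertex;    /* this surf's first vert in the world vertex streams */
    int numverts;       /* vert count for this surf */
} msurface_t;

/* world vertex data kept as separate streams rather than interleaved, with
   positions padded out to 4 floats (SIMD-friendlier for a software T&L
   driver); these would be filled at map load time */
static const float *world_xyzw;    /* 4 floats per vert */
static const float *world_st;      /* 2 floats per vert */
static const float *world_lm;      /* 2 floats per vert, for the second TMU */

void R_SetupWorldArrays (void)
{
    glEnableClientState (GL_VERTEX_ARRAY);
    glEnableClientState (GL_TEXTURE_COORD_ARRAY);

    /* plain pointers here; byte offsets instead if the streams live in a bound VBO */
    glVertexPointer (4, GL_FLOAT, 0, world_xyzw);
    glTexCoordPointer (2, GL_FLOAT, 0, world_st);
    /* world_lm would go on the second TMU via glClientActiveTexture */
}

void R_DrawSurface (const msurface_t *surf)
{
    /* one call per surface - fine at ID1 map scale */
    glDrawArrays (GL_TRIANGLE_FAN, surf->firstvertex, surf->numverts);
}
[/code]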
An interesting quirk was that 4x aniso was maybe 5% to 10% faster than regular trilinear. No idea why, but it was measurable and repeatable.
The software vs hardware T&L differences highlight one thing I said earlier - that's a bridge beyond which the rules totally change. The kind of code optimizations you do for one are often somewhere between useless and harmful for the other, so if you take code that's optimized for the capabilities of one and try to run it on hardware that has the other, the best you can hope for is that it won't run as well as it should. Unfortunately GL doesn't offer a means to detect if you're running a software T&L implementation (and relying on detecting extensions is useless because - as we've seen - vertex programs and VBOs can both be emulated in software without any issues). That's a burden that needs to be shifted to the player - yuck.
If you really must run on both classes of hardware, you have a coupla choices. You can write separate codepaths for each, or you can pick one and accept that the other is going to suffer.
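If you do go the separate-codepaths route, in practice it ends up looking something like this - just a sketch, the cvar name and the trimmed-down cvar_t are made up for illustration (the real Quake cvar_t has more fields), and the player or a config file has to set it since the engine can't detect the right answer itself:

[code]
/* only one of these paths gets used per run; which one is the player's call */
typedef struct cvar_s
{
    char *name;
    char *string;
    float value;
} cvar_t;

cvar_t gl_softwaretnl = {"gl_softwaretnl", "0", 0};

void R_DrawWorld_VBO (void);            /* hardware T&L path: static VBOs */
void R_DrawWorld_VertexArrays (void);   /* software T&L path: plain vertex arrays */

void R_DrawWorld (void)
{
    if (gl_softwaretnl.value)
        R_DrawWorld_VertexArrays ();
    else
        R_DrawWorld_VBO ();
}
[/code]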