Once upon a time, when > was faster than < ... Wait, what?

C, Optimization, OpenGL, CPU, GPU

C Problem Overview


I am reading an awesome OpenGL tutorial. It's really great, trust me. The topic I am currently on is the Z-buffer. Aside from explaining what it's all about, the author mentions that we can perform custom depth tests, such as GL_LESS, GL_ALWAYS, etc. He also explains that the actual meaning of depth values (which is top and which isn't) can also be customized. I understand so far. And then the author says something unbelievable:

> The range zNear can be greater than the range zFar; if it is, then the window-space values will be reversed, in terms of what constitutes closest or farthest from the viewer.
>
> Earlier, it was said that the window-space Z value of 0 is closest and 1 is farthest. However, if our clip-space Z values were negated, the depth of 1 would be closest to the view and the depth of 0 would be farthest. Yet, if we flip the direction of the depth test (GL_LESS to GL_GREATER, etc), we get the exact same result. So it's really just a convention. Indeed, flipping the sign of Z and the depth test was once a vital performance optimization for many games.

If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.

Is the author making things up, am I misunderstanding something, or is it indeed the case that once < was slower (vitally, as the author says) than >?

Thanks for clarifying this quite curious matter!

Disclaimer: I am fully aware that algorithm complexity is the primary source for optimizations. Furthermore, I suspect that nowadays it definitely wouldn't make any difference and I am not asking this to optimize anything. I am just extremely, painfully, maybe prohibitively curious.

C Solutions


Solution 1 - C

> If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.

I didn't explain that particularly well, because it wasn't important. I just felt it was an interesting bit of trivia to add. I didn't intend to go over the algorithm specifically.

However, context is key. I never said that a < comparison was faster than a > comparison. Remember: we're talking about graphics hardware depth tests, not your CPU. Not operator<.

What I was referring to was a specific old optimization: on one frame you would render with GL_LESS and a depth range of [0, 0.5]. On the next frame, you would render with GL_GREATER and a depth range of [1.0, 0.5]. You go back and forth, literally "flipping the sign of Z and the depth test" every frame.

This loses one bit of depth precision, but you didn't have to clear the depth buffer, which once upon a time was a rather slow operation. Since depth clearing is not only free these days but actually faster than this technique, people don't use the trick anymore.
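To see why alternating the range and the test removes the need for a clear, here is a minimal software model of the idea. It is only a sketch, not real OpenGL: `window_depth`, `depth_test_passes`, and `draw_fragment` are hypothetical names, and the model assumes every pixel is drawn at least once per frame (a pixel skipped for a frame would keep a value from the wrong half of the range).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical software model of the trick; these are not OpenGL calls. */

/* Map a normalized depth z (0 = nearest, 1 = farthest) into the half of the
   depth range used by this frame. Even frames mimic a depth range of
   [0, 0.5]; odd frames mimic a depth range of [1.0, 0.5]. */
double window_depth(unsigned frame, double z)
{
    assert(z >= 0.0 && z <= 1.0);
    return (frame % 2 == 0) ? 0.5 * z : 1.0 - 0.5 * z;
}

/* Depth test for this frame: "less" on even frames (like GL_LESS),
   "greater" on odd frames (like GL_GREATER). */
bool depth_test_passes(unsigned frame, double incoming, double stored)
{
    return (frame % 2 == 0) ? incoming < stored : incoming > stored;
}

/* Draw one fragment at normalized depth z into pixel i.
   Returns true if the fragment passed the test and was written. */
bool draw_fragment(double *depth_buffer, size_t i, unsigned frame, double z)
{
    double d = window_depth(frame, z);
    if (!depth_test_passes(frame, d, depth_buffer[i]))
        return false;
    depth_buffer[i] = d;
    return true;
}
```

In this model the buffer is set to 1.0 once before frame 0 and never touched again. Values left over from the previous frame sit in the other half of the range, so the first fragment drawn to each pixel in a new frame passes the flipped test, while fragments within the same frame are still ordered nearest-first.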

Solution 2 - C

The answer is almost certainly that, for whatever incarnation of chip and driver was used, Hierarchical Z only worked in one direction. This was a fairly common issue back in the day. Low-level assembly and branching have nothing to do with it: Z-buffering is done in fixed-function hardware and is pipelined, so there is no speculation and hence no branch prediction.

Solution 3 - C

It has to do with flag bits in highly tuned assembly.

x86 has both jl and jg instructions, but most RISC processors only have jl and jz (no jg).
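On a target that offers only a less-than comparison, a greater-than test is typically expressed by swapping the operands. A minimal sketch of that idea, where `less_than` stands in for the single available comparison (both function names are hypothetical):

```c
#include <stdbool.h>

/* Stand-in for the only comparison the hypothetical target provides. */
static bool less_than(int a, int b)
{
    return a < b;
}

/* "Greater than" obtained by swapping the operands: a > b  <=>  b < a. */
bool greater_than(int a, int b)
{
    return less_than(b, a);
}
```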

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Armen Tsirunyan | View Question on Stackoverflow |
| Solution 1 - C | Nicol Bolas | View Answer on Stackoverflow |
| Solution 2 - C | Crowley9 | View Answer on Stackoverflow |
| Solution 3 - C | Joshua | View Answer on Stackoverflow |