Monthly Archives: May 2007

GLSL: ATI vs NVIDIA

Today two new differences between Radeon and Geforce GLSL support.

1 – float2 / vec2
vec2 is the GLSL type to hold a 2d vector. vec2 is supported by NVIDIA and ATI. float2 is a 2d vector but for Direct3D HLSL and for Cg. The GLSL compilation for Geforce is done via the NVIDIA Cg compiler. Here is the GLSL version displayed by GPU Caps Viewer: 1.20 NVIDIA via Cg compiler. That explains why a GLSL source that contains a float2 is compilable on NVIDIA hardware. But the GLSL compiler of ATI is strict and doesn’t recognize the float2 type.

2 – the following line:

vec2 vec = texture2D( tex, gl_TexCoord[0].st );

is valid for NVIDIA compiler but produces an error with ATI compiler. One again, the ATI GLSL compiler has done a good job. By default, texture2D() returns a 4d vector. The right syntax is:

vec2 vec = texture2D( tex, gl_TexCoord[0].st ).xy;

Conclusion: always test your shaders on both ATI and NVIDIA platforms unless you target one platform only.

R600 is VTF-capable

“All of the fetch and filtering capabilities are available to each thread type, making the samplers completely agnostic about what’s using them.”

This line from Beyond3D article on R600 means that vertex, geometry and pixel shaders can access to texture samplers. So Vertex Texture Fetching is now available with Radeon 2k series :thumbup: What’s more, the R600 can handle very large texture up to 8192×8192 just like the G80.

Dynamic branching and NVIDIA Forceware Drivers

Several weeks ago, I posted on Beyond3D a thread on my dynamic branching benchmark. I wondered why dynamic branching performances on Geforce 7 were worse than ones on Geforce 6 or 8. I believe I’ve got the answer: Forceware drivers.

Here are some new results where ratio = Branching_ON / Branching_OFF :

7600GS – Fw 84.21 – Branching OFF: 496 o3Marks – Branching ON: 773 o3Marks – Ratio = 1.5
7600GS – Fw 91.31 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.36 – Branching OFF: 508 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.37 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6

7600GS – Fw 91.45 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 91.47 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 93.71 – Branching OFF: 508 o3Marks – Branching ON: 474 o3Marks – Ratio = 0.9
7600GS – Fw 97.92 – Branching OFF: 505 o3Marks – Branching ON: 478 o3Marks – Ratio = 0.9
7600GS – Fw 100.95 – Branching OFF: 508 o3Marks – Branching ON: 480 o3Marks – Ratio = 0.9

my conclusion is: dynamic branching in OpenGL works fine (read the performance are better than without dynamic branching: ratio > 1) for forceware < = 91.37. For the drivers >= 91.45, the ratio drops under 1. Dynamic branching works as expected for gf6 and gf8 but not for gf7 since forceware 91.45. So the bug explanation is a plausible answer (and it’s easily understandable: in this news we learnt that a forceware driver is made of around 20 millions of lines of code – a paradise for a small bug!!!). I’ve also done the test with the simple soft shadows demo provided with the NV SDK 9.5. The results are the same.

I’ve just done the bench with a 7950gx2 and the latest forceware 160.02 and dynamic branching is still buggy…

Quick Review – Asus Silient Square Pro

Voilà le nouveau ventirad pour le cpu que j’ai installé sur ma machine de test:

Je dois dire que j’en suis assez satisfait. Facile à installer, ce ventirad d’Asus est quasiment inaudible. Et il rempli à merveille sa fonction de dissipation car les aillettes restent froides en permanence. Evidemment, je pense que c’est mon AMD X2 3800+ qui ne chauffe pas assez mais par rapport au ventirad d’origine livré avec le cpu, la différence est nette.

Du bon matos :thumbup: