Tag Archives: nvidia

NVIDIA’s David Kirk Interview on CUDA, CPUs and GPUs

David Kirk, Nvidia’s Chief Scientist, interviewed by the guys at bit-tech.net.

Read the full interview HERE.

Here are some snippets of this 8-page interview:

“Kirk’s role within Nvidia sounds many times simpler than it actually is: he oversees technology progression and he is responsible for creating the next generation of graphics. He’s not just working on tomorrow’s technology, but he’s also working on what’s coming out the day after that, too.”
“I think that if you look at any kind of computational problem that has a lot of parallelism and a lot of data, the GPU is an architecture that is better suited than that. It’s possible that you could make the CPUs more GPU-like, but then you run the risk of them being less good at what they’re good at now”
“The reason for that is because GPUs and CPUs are very different. If you built a hybrid of the two, it would do both kinds of tasks poorly instead of doing both well,”

page 2:
“Nvidia has talked about hardware support for double precision in the past—especially when Tesla launched—but there are no shipping GPUs supporting it in hardware yet”
“our next products will support double precision.”
“David talked about expanding CUDA other hardware vendors, and the fact that this is going to require them to implement support for C.”
“current ATI hardware cannot run C code, so the question is: has Nvidia talked with competitors (like ATI) about running C on their hardware?”

page 3:
“It amazes me that people adopted Cell [the pseudo eight-core processor used in the PS3] because they needed to run things several times faster. GPUs are hundreds of times faster so really if the argument was right then, it’s really right now.”

Continue reading »


While I was releasing the Julia’s Fractal demo, I tested it on NVIDIA and ATI (as usual before releasing a demo). And as usual, a new difference appeared in the way the GLSL is supported on ATI and on NVIDIA. On NVIDIA the following is line is ok but produces an error on ATI (Catalyst 8.1):

gl_FragColor = texture1D(tex, (float)(i == max_i ? 0 : i) / 100);			

To be ATI Radeon-compliant, this instruction must be split in two:

if( i == max_i )
	gl_FragColor = texture1D(tex, 0.0);			
	gl_FragColor = texture1D(tex, i/100.0);			

I can’t immagine a world with more than two major graphics cards manufacturers. But if such a world exists, I stop 3d programming… Fortunately, NVIDIA accepts ATI GLSL syntax so there is only one code at the end. Conlusion: always check your GLSL shaders on ATI and NVIDIA before releasing a demo…


Today two new differences between Radeon and Geforce GLSL support.

1 – float2 / vec2
vec2 is the GLSL type to hold a 2d vector. vec2 is supported by NVIDIA and ATI. float2 is a 2d vector but for Direct3D HLSL and for Cg. The GLSL compilation for Geforce is done via the NVIDIA Cg compiler. Here is the GLSL version displayed by GPU Caps Viewer: 1.20 NVIDIA via Cg compiler. That explains why a GLSL source that contains a float2 is compilable on NVIDIA hardware. But the GLSL compiler of ATI is strict and doesn’t recognize the float2 type.

2 – the following line:

vec2 vec = texture2D( tex, gl_TexCoord[0].st );

is valid for NVIDIA compiler but produces an error with ATI compiler. One again, the ATI GLSL compiler has done a good job. By default, texture2D() returns a 4d vector. The right syntax is:

vec2 vec = texture2D( tex, gl_TexCoord[0].st ).xy;

Conclusion: always test your shaders on both ATI and NVIDIA platforms unless you target one platform only.

Dynamic branching and NVIDIA Forceware Drivers

Several weeks ago, I posted on Beyond3D a thread on my dynamic branching benchmark. I wondered why dynamic branching performances on Geforce 7 were worse than ones on Geforce 6 or 8. I believe I’ve got the answer: Forceware drivers.

Here are some new results where ratio = Branching_ON / Branching_OFF :

7600GS – Fw 84.21 – Branching OFF: 496 o3Marks – Branching ON: 773 o3Marks – Ratio = 1.5
7600GS – Fw 91.31 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.36 – Branching OFF: 508 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.37 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6

7600GS – Fw 91.45 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 91.47 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 93.71 – Branching OFF: 508 o3Marks – Branching ON: 474 o3Marks – Ratio = 0.9
7600GS – Fw 97.92 – Branching OFF: 505 o3Marks – Branching ON: 478 o3Marks – Ratio = 0.9
7600GS – Fw 100.95 – Branching OFF: 508 o3Marks – Branching ON: 480 o3Marks – Ratio = 0.9

my conclusion is: dynamic branching in OpenGL works fine (read the performance are better than without dynamic branching: ratio > 1) for forceware < = 91.37. For the drivers >= 91.45, the ratio drops under 1. Dynamic branching works as expected for gf6 and gf8 but not for gf7 since forceware 91.45. So the bug explanation is a plausible answer (and it’s easily understandable: in this news we learnt that a forceware driver is made of around 20 millions of lines of code – a paradise for a small bug!!!). I’ve also done the test with the simple soft shadows demo provided with the NV SDK 9.5. The results are the same.

I’ve just done the bench with a 7950gx2 and the latest forceware 160.02 and dynamic branching is still buggy…

New NVIDIA OpenGL Extensions Headers

The new OpenGL headers files contain new extensions stuff. You can download them from… just a second, I start GPU Caps Viewer and… okay I got it :thumbup: : from developer.nvidia.com/object/nvidia_opengl_specs.html.

But there are a couple of weird things:

1 – the glext.h version is 28 (#define GL_GLEXT_VERSION 28). The version I use to compile the oZone3D engine renderer is the 29. And I use this header since more than one year…

2 – the glext.h header does not compile with vc6 (yes I still use visual studio 6!) because of the GL_EXT_timer_query extension. Here is the origianl piece of code you can find in glext.h:

* Original code - does not compile with vc6.
#ifndef GL_EXT_timer_query
typedef signed long long GLint64EXT;
typedef unsigned long long GLuint64EXT;

and here is the code I updated for visual c 6:

Modified code for oZone3D engine - compile with vc6
#ifndef GL_EXT_timer_query
	#ifdef _WIN32
		typedef signed __int64 GLint64EXT;
		typedef unsigned __int64 GLuint64EXT;
		typedef signed long long GLint64EXT;
		typedef unsigned long long GLuint64EXT;

I wonder if the original glext.h compiles with vc7 or vc8. If anyone has the answer, feel free to contact me…

NVIDIA OpenGL Extension Specifications

Finally NVIDIA releases the specs of the new OpenGL extensions that come with the gf8800. Great news! :thumbup:

These specs are very important for us, poor graphics developers, in order to update our software with the latest cool features. So among these specs, there is the GL_EXT_draw_instanced that allows to do geometry instancing. Another extension is WGL_NV_gpu_affinity. This ext allows to send the gfx calls to a particular GPU in multi-gpus system. Should be cool to see how a 7950GX2 behaves. The GL_EXT_timer_query ext provides a nano-second resolution timer to determine the amount of time it takes to fully complete a set of OpenGL gfx calls. There are still so many cool extensions. As soon as I get a 8800 board, I’ll made a little tutorial to cover these cool extensions.

NVIDIA GLSL compiler

In the demo I received from satyr (see oZone3D.Net forums), there is a toon shader that uses glsl uniforms. The pixel shader looked like to:

uniform float silhouetteThreshold;

void main()
  silhouetteThreshold = 0.32;     

  //... shader code
  //... shader code
  //... shader code

This pixel shader compiles well on nVidia gc but generes an error on ati. The error is right since an uniform is a read-only variable. This is an example of the nVidia glsl compiler laxity. That’s why I code my shader on ati: if the code is good for ati, we can be sure it will be good for nvidia too (of course there are always some exceptions…)

SLI is nice!

I must confess the SLI is a really cool technology. I assembled a SLI station based on the new nVidia Geforce 7600 GS series. This is a little graphic card compared to the high-end ones like the 7900GTX or the big monster 7950 GX2 (which in passing is really quiet in spite of its two GPUs and ventirads): the price of a 7600GS is about 100 euros, the price of a 7900GTX is about 400 euros and the cost of the 7950 GX2 climbs up to 500 euros! In terms of shader pipelines, the 7600 GS has 5 vertex pipes and 12 pixel pipes, the 7900 GTX has 8 vertex pipes / 24 pixel pipes and the GX2 has 48 pixel pipes and 16 vertex pipes. And so what? A 7600 GS SLI system offers almost the same level of gpu power than a 7800/7900 for a half price. I’ve verified this fact with the latest version of the soft shadows benchmark. With one 7600 GS (single GPU mode) the score is 510 o3Marks. With the two 7600 GS in SLI mode, the score jumps to 1004 o3Marks. The 7950 GX2’s score in single GPU is about 1200 o3Marks. Ok my bench is a small bench, you’re right. What about 3dmrk06? The 3DMark06 score is 4115 for the 7600 GS SLI system. This score is close to the ones got with a non-overclocked gf7800/7900.

Little remark: my 7600 GS are not overclocked.

Conlusion: for 200 euros you can have an excellent graphic system which is also silent thanks to the 7600GS passive cooler!

Depth Map Filtering – ATI vs NVIDIA

Really ATI has some problems with OpenGL. Now I’m working on soft shadows and my tmp devstation has a Radeon X700 (not the top-notch I know but an enough powerful CG). With my X700 (Catalyst 6.6) the soft shadow edges are rendered as follows:

And on my second CG, a nVidia 6600gt (forceware 91.31), the soft shadows are as follows:

The GLSL shaders are the same, a 5×5 bluring kernel, with a shadow map (or depth map as you want) of 1024×1024 (via a FBO) with a linear filtering. Now if I set the nearest filtering mode, I get the following results for the X700:

and for the 6600gt:

It seems as if the Radeon GPU has a bug in the filtering module when the gpu has to apply a linear filter on a depth map. Very strange.
I’m not satisfied by this explanation but it’s the only I see for the moment.

This kind of problem shows how it’s important for a graphics developer to have at least 2 workstations, one with a nVidia board and the other with an ATI CG. I tell you, realtime 3D is made of blood, sweat and screams! :winkhappy:

NPOT Textures

It’s nice to come back to code!

I’m currently working on a new and simple framework for my OpenGL experimentations before implementing the algorithms in the oZone3D Engine . RaptorGL is a little bit too heavy for simple tests so for the moment I drop it. This new framework I called XPGL (eXPerimental Graphics Library), allows me to quickly test the new algos I’m working on. Every time I have to code a little but fully operational 3D demo in c++/opengl, I spend lot of time for a small result. In these moments, I say to myself that Hyperion is a very cool tool.

Okay, let’s see a weird behavour of radeon gpu. At the moment, my graphics controller is a Radeon X700. With the latest catalyst drivers (6.6), this graphics board should be an OpenGL 2.0 compliant CG. A little check to the GL_VERSION tells me the X700 is GL2 compliant. Then the X700 should handle non power of two texture since this feature is part of the OpenGL 2.0 core. But the GL_ARB_texture_non_power_of_two string is not found in the GL_EXTENSIONS. Maybe ATI does not mention the extensions that are part of the core. Anyways, I loaded a 600×445 npot-texture on a mesh plane and the X700 seems to support this texture. But with a ridiculous fps of 1… Software codepath? I think so! So I decided to load the same texture with power of two dims (512×512) and the fps is become decent again. With my gf6600gt (with the forceware 91.31) I never noticed this effect/bug because the GL2-support is better and nVidia gpus correctly support non power of two texture. You can download the demo with the npot and pot texture (the one mapped onto the mesh plane) hereafter and do the test for yourself. Feel free to drop me a feedback if you wish.

NPOT_Demo.zip (659k)

But keep in mind that graphics hardware is optimized for POT textures. Try to use POT textures in order to maximize your chances to see your demo running everywhere.