While releasing the Julia's Fractal demo, I tested it on NVIDIA and ATI hardware, as I usually do before releasing a demo. And as usual, a new difference appeared in the way GLSL is supported on ATI and on NVIDIA. On NVIDIA the following line is fine, but it produces an error on ATI (Catalyst 8.1):
I can't imagine a world with more than two major graphics card manufacturers. If such a world existed, I would stop 3D programming… Fortunately, NVIDIA accepts the ATI GLSL syntax, so there is only one version of the code in the end. Conclusion: always check your GLSL shaders on both ATI and NVIDIA before releasing a demo…
A nice little review of the latest Radeons. Why nice? Simply because it uses the Fur Benchmark in addition to the eternal 3DMark runs and video games (Crysis, Quake Wars and the like). It's the second review I've found that uses the Fur Benchmark, and they are right to do so. Big benchmarks like 3DMark or Crysis work the GPU but also the CPU, and it is sometimes not very easy to see the influence of a graphics card with this kind of test. In fact, these tests should be reserved for benchmarking a complete gaming configuration (CPU/RAM/GPU). But if you want to focus on the graphics card alone, the CPU load has to be as low as possible (though not excessively so, otherwise the GPU will start waiting for data). In that sense, the Fur Benchmark fits very well, because its CPU load is relatively low and its GPU load very high. As a result, practically any CPU/RAM system (well, maybe not a PIII, but you never know) is good enough to benchmark a 3D card with the Fur Benchmark.
According to the oZone3D.Net forums, Catalyst 7.9 seems to unleash the ATI Radeon 2900 GPU. The Surface Deformer benchmark requires a lot of vertex processing horsepower. With Catalyst versions prior to 7.9, an ATI 2900 scored around 8000 o3Marks (which was already high). Now, with Catalyst 7.9, the 2900 scores 15000 o3Marks. Incredible! Why such a big jump in OpenGL performance?
My first thought is that ATI has finally managed to use the unified architecture of the R600 GPU correctly. With a unified architecture, the workload is distributed over all shader processors, no matter the type of the shader program (vertex or pixel). So if the vertex shader needs more processing power than the pixel shader, more shader processors will be assigned to the vertex shader. My second thought: the unified architecture required new kernel code in Catalyst, and ATI has simply optimized the R600 codepath. A driver for a modern GPU like the R600 is a very complex piece of code, and optimizing such code is a huge task…
I found this bug while coding a new small soft shadows demo for GPU Caps Viewer. Soft shadows are built on shadow mapping, and my OpenGL shadow mapping code works perfectly on all GeForce 6/7/8 and Radeon 1k boards but not on Radeon 2K (2400/2600/2900). Why? Because the shadow mapping comparison function had a serious bug! In short, the comparison function is supposed to return a boolean value (0 if the fragment is in shadow, 1 otherwise), and before Catalyst 7.9 this function returned, on Radeon 2K, the depth buffer value (as if the comparison function were disabled). But this bug is now just a memory, since Catalyst 7.9 has fixed it.
I guess we can thank Quake Wars, an OpenGL game that was released a few days ago. For this game (which is really nice), ATI has fixed all the major OpenGL bugs.
Today, two new differences between Radeon and GeForce GLSL support.
1 – float2 / vec2: vec2 is the GLSL type that holds a 2D vector. vec2 is supported by both NVIDIA and ATI. float2 is also a 2D vector type, but it belongs to Direct3D HLSL and Cg. On GeForce, GLSL compilation is done via the NVIDIA Cg compiler. Here is the GLSL version displayed by GPU Caps Viewer: 1.20 NVIDIA via Cg compiler. That explains why a GLSL source that contains a float2 compiles on NVIDIA hardware. The ATI GLSL compiler, however, is strict and doesn't recognize the float2 type.
2 – the following line:
vec2 vec = texture2D( tex, gl_TexCoord[0].st );
is valid for the NVIDIA compiler but produces an error with the ATI compiler. Once again, the ATI GLSL compiler has done a good job. texture2D() returns a 4D vector (vec4), which cannot be assigned directly to a vec2. The right syntax is:
A new version of the Soft Shadows Benchmark is available, this time using uniform arrays to pass the blurring kernel to the pixel shader. On nVidia boards, there is a small speed increase (1 or 2 fps). On my X700… black screen… Houston, we've got a problem… This is with Catalyst 6.6. Okay, I try the very latest Catalyst, the 6.7. Bad idea, it's worse! Both versions (with and without uniform arrays) no longer work with C6.7. Back to C6.6. That really sucks!
But I've just received feedback telling me that the uniform arrays version works fine on an ATI X1600 Pro with C6.5.
Okay, there is certainly a problem with the X*** series and uniform arrays.
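For readers who want to picture what "passing the blurring kernel through uniform arrays" involves, here is a minimal CPU-side sketch. It is not the benchmark's actual code: the `BlurKernel` struct and the `makeBoxKernel` function are hypothetical names, and a plain box kernel with equal weights is assumed because the post does not say which weights are used.

```cpp
#include <vector>

// Hypothetical container for the data that would be sent to the pixel
// shader as two uniform arrays (names are illustrative, not from the demo).
struct BlurKernel {
    std::vector<float> offsets; // x0, y0, x1, y1, ... in texture coordinates
    std::vector<float> weights; // one weight per tap, summing to 1
};

// Builds a size x size box kernel for a square shadow map of mapSize texels.
// size is expected to be odd (e.g. 5) so that the kernel has a centre tap.
BlurKernel makeBoxKernel(int size, int mapSize) {
    BlurKernel k;
    const int half = size / 2;
    const float texel = 1.0f / static_cast<float>(mapSize);      // one texel step
    const float weight = 1.0f / static_cast<float>(size * size); // equal weights
    for (int y = -half; y <= half; ++y) {
        for (int x = -half; x <= half; ++x) {
            k.offsets.push_back(static_cast<float>(x) * texel);
            k.offsets.push_back(static_cast<float>(y) * texel);
            k.weights.push_back(weight);
        }
    }
    return k;
}
```

With a 5×5 kernel this gives 25 taps. In a real application the two vectors could then be uploaded with `glUniform2fv` and `glUniform1fv` to arrays declared in the shader; the array names on the GLSL side are up to you.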
I've just found, in ATI's excellent paper called “ATI OpenGL Programming and Optimization Guide”, that all ATI GPUs from the R300 (Radeon 9700) to the latest R580 (Radeon X1900) only support NEAREST filtering (and its mipmap variants) for depth maps. That explains the previous results. So if you want nVidia-like depth map filtering, you have to code the filtering yourself in the pixel shader. Okay, this answer suits me!
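To make "code the filtering yourself" a bit more concrete, here is a small CPU-side sketch of one common approach: fetch the four nearest depth texels, run the shadow comparison on each one, then blend the four 0/1 results bilinearly. This is only an illustration under my own assumptions, not the shader used in the demo; `DepthMap`, `compareDepth` and `shadowBilinear` are hypothetical names, and a real version would do the same arithmetic in GLSL.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Minimal CPU stand-in for a square depth map (hypothetical type).
struct DepthMap {
    int size;                 // width == height, in texels
    std::vector<float> depth; // size * size values, row-major

    // NEAREST fetch with clamp-to-edge addressing.
    float texel(int x, int y) const {
        x = std::max(0, std::min(x, size - 1));
        y = std::max(0, std::min(y, size - 1));
        return depth[y * size + x];
    }
};

// Shadow comparison: 0 if the fragment is in shadow, 1 if it is lit.
float compareDepth(const DepthMap& map, int x, int y, float fragmentDepth) {
    return fragmentDepth <= map.texel(x, y) ? 1.0f : 0.0f;
}

// Manual "linear" filtering: compare first, then blend the four results.
// u and v are texture coordinates in [0, 1].
float shadowBilinear(const DepthMap& map, float u, float v, float fragmentDepth) {
    // Move to texel space; texel centres sit at integer + 0.5.
    const float fx = u * static_cast<float>(map.size) - 0.5f;
    const float fy = v * static_cast<float>(map.size) - 0.5f;
    const int x0 = static_cast<int>(std::floor(fx));
    const int y0 = static_cast<int>(std::floor(fy));
    const float ax = fx - static_cast<float>(x0); // horizontal blend factor
    const float ay = fy - static_cast<float>(y0); // vertical blend factor

    const float s00 = compareDepth(map, x0,     y0,     fragmentDepth);
    const float s10 = compareDepth(map, x0 + 1, y0,     fragmentDepth);
    const float s01 = compareDepth(map, x0,     y0 + 1, fragmentDepth);
    const float s11 = compareDepth(map, x0 + 1, y0 + 1, fragmentDepth);

    const float top    = s00 + (s10 - s00) * ax;
    const float bottom = s01 + (s11 - s01) * ax;
    return top + (bottom - top) * ay;
}
```

The important detail is the order: the comparison is done per texel and the results are blended, rather than blending the depth values and comparing once. Blending raw depths across a shadow edge gives meaningless intermediate depths.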
ATI really has some problems with OpenGL. I'm now working on soft shadows, and my temporary dev station has a Radeon X700 (not top-notch, I know, but a powerful enough graphics card). With my X700 (Catalyst 6.6) the soft shadow edges are rendered as follows:
And on my second graphics card, an nVidia 6600GT (ForceWare 91.31), the soft shadows are as follows:
The GLSL shaders are the same: a 5×5 blurring kernel, with a 1024×1024 shadow map (or depth map, as you like) rendered via an FBO, with linear filtering. Now if I set the nearest filtering mode, I get the following results for the X700:
and for the 6600gt:
It seems as if the Radeon GPU has a bug in the filtering module when the gpu has to apply a linear filter on a depth map. Very strange.
I'm not satisfied with this explanation, but it's the only one I can see for the moment.
This kind of problem shows how important it is for a graphics developer to have at least two workstations, one with an nVidia board and the other with an ATI board. I tell you, realtime 3D is made of blood, sweat and screams!
I'm currently working on a new and simple framework for my OpenGL experiments, before implementing the algorithms in the oZone3D Engine. RaptorGL is a little too heavy for simple tests, so I'm dropping it for the moment. This new framework, which I called XPGL (eXPerimental Graphics Library), allows me to quickly test the new algorithms I'm working on. Every time I have to code a small but fully operational 3D demo in C++/OpenGL, I spend a lot of time for a small result. In these moments, I tell myself that Hyperion is a very cool tool.
Okay, let's look at a weird behaviour of the Radeon GPU. At the moment, my graphics controller is a Radeon X700. With the latest Catalyst drivers (6.6), this board should be OpenGL 2.0 compliant. A quick check of GL_VERSION tells me the X700 is GL2 compliant. The X700 should therefore handle non-power-of-two textures, since this feature is part of the OpenGL 2.0 core. But the GL_ARB_texture_non_power_of_two string is not found in GL_EXTENSIONS. Maybe ATI does not list the extensions that are part of the core. Anyway, I loaded a 600×445 NPOT texture on a mesh plane, and the X700 seems to support this texture, but with a ridiculous 1 fps… Software codepath? I think so! So I decided to load the same texture with power-of-two dimensions (512×512), and the fps became decent again. With my GeForce 6600GT (ForceWare 91.31) I never noticed this effect/bug, because the GL2 support is better and nVidia GPUs correctly support non-power-of-two textures. You can download the demo with the NPOT and POT textures (the one mapped onto the mesh plane) hereafter and run the test for yourself. Feel free to send me feedback if you wish.
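The two checks described above (looking for an extension name in the extensions string, and falling back to power-of-two dimensions) can be sketched with a few small helpers. This is just an illustration, not code from the demo: `hasExtension`, `isPowerOfTwo` and `nearestPowerOfTwo` are hypothetical names, and the extension list is passed in as a plain string so the helpers do not need an OpenGL context.

```cpp
#include <sstream>
#include <string>

// True if name appears as a whole token in a space-separated extension list,
// such as the string returned by glGetString(GL_EXTENSIONS).
// A plain substring search could wrongly match a longer extension name.
bool hasExtension(const std::string& extensionList, const std::string& name) {
    std::istringstream tokens(extensionList);
    std::string token;
    while (tokens >> token) {
        if (token == name) return true;
    }
    return false;
}

// True if n is a power of two (1, 2, 4, 8, ...).
bool isPowerOfTwo(unsigned int n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Power of two closest to n; ties go to the larger one.
// Intended for texture-sized values, far below the unsigned int limit.
unsigned int nearestPowerOfTwo(unsigned int n) {
    if (n == 0) return 1;
    unsigned int lower = 1;
    while (lower <= n / 2) lower *= 2;   // largest power of two <= n
    const unsigned int upper = lower * 2;
    return (n - lower < upper - n) ? lower : upper;
}
```

For the 600×445 texture used here, `nearestPowerOfTwo` gives 512 for both dimensions, which matches the 512×512 version I ended up loading. Keep in mind that a missing extension string does not prove the feature is absent on a GL 2.0 driver, and a present one does not guarantee a hardware path, so measuring the fps remains the real test.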
I've just received an email from a user saying that he wasn't able to run the demos of the Vertex Displacement Mapping Tutorial on his brand new Radeon X1900XTX. VTF, or Vertex Texture Fetching, is a cool feature of high-end graphics chipsets and is part of Shader Model 3.0. The X1900 series is based on the R580 GPU (R500 family), which is SM3.0 compliant. But on the OpenGL side, and especially in GLSL, VTF is not supported. The OpenGL query done with GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB always returns 0. That means that no texture units are available in the vertex shader.
ATI confirms this fact in one of the whitepapers shipped with the ATI SDK (ATI OpenGL Programming and Optimization Guide.pdf). On page 11, we can read this: “All ATI graphics HW have a few items that deserve special consideration when using GLSL. The first major item of note is the absence of vertex texture units. This means that vertex texturing is never available, and all shaders attempting to use texture functions in the vertex shader will fail to link.” I know, this is a harsh reality. The R580 GPU is really powerful, and it's a pity that ATI does not support VTF in its chipsets. I don't know how the R580 behaves on the D3D side, but I suppose the GPU has the same limitations. VTF is currently supported by GeForce chipsets from the 6600 to the 7900. Conclusion: if you wish to play with VTF, use an nVidia board.
Maybe all these problems will be solved with SM4.0. I hope so!