Category Archives: OpenGL

ForceWare 163.75 and Occlusion Query

I've just wasted two hours of debugging because of a bug in the ForceWare 163.75. One of my routines that uses the functions of the GL_ARB_occlusion_query extension crashed as soon as the number of queries got a little large (say more than 10,000 queries). I told myself it was probably a buffer filling up somewhere, or too many pending queries (the debugger crashed inside NVIDIA's OpenGL library). So I tinkered a bit and noticed that a call to glGetError() made my code work, which suggests a latency / parallelism problem in the way the ForceWare driver handles OpenGL commands.
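
To give an idea of the pattern involved, here is a minimal sketch of an occlusion-query loop with the glGetError() workaround (the query count, names and the placement of the glGetError() call are illustrative, not my exact routine):

	#define NUM_QUERIES 10000 /* illustrative; the crash appeared above ~10,000 queries */

	GLuint queries[NUM_QUERIES];
	GLuint sampleCount;

	glGenQueriesARB(NUM_QUERIES, queries);

	for (int i = 0; i < NUM_QUERIES; i++)
	{
		glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
		/* ... render the occludee (bounding box) ... */
		glEndQueryARB(GL_SAMPLES_PASSED_ARB);

		/* workaround: a glGetError() call here made my code work on ForceWare 163.75 */
		glGetError();
	}

	for (int i = 0; i < NUM_QUERIES; i++)
		glGetQueryObjectuivARB(queries[i], GL_QUERY_RESULT_ARB, &sampleCount);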

Then suddenly, seeing the availability of the new ForceWare 169.38 driver pop up in my RSS reader, I told myself: okay, let's close my fifty windows (I don't know about you, but on my machine each instance of Visual Studio 2005 takes ages to close, compared to Visual C++ 6.0 which closed almost instantly) and install this new driver. Yes! My occlusion query routines started working perfectly again. So I racked my brains for nothing over a bug in the ForceWare 163.75 – or rather over a bug that is no longer present in the ForceWare 169.38. Conclusion: update your drivers!

Quick Review – Lumina – GLSL studio

I've just played a bit with Lumina. It's a small GLSL shader development environment. The project is just getting started and there are still quite a few small glitches (try loading several projects one after another, or more simply load the test project deferred3.lum: the GUI doesn't like it much!), but the concept is good. Looking closer, it strongly resembles a graphical interface laid on top of a piece of software like Demoniak3D. There are scripts (written in an ECMA-based language) to set up the elements of the 3D scene and control them. There are also the GLSL shaders (vertex, pixel and geometry). If you analyze a project file, you discover a structure similar to a Demoniak3D demo: an XML script, <script> and <shader> nodes, etc.

Now that the guided tour is done, here is my first ultra-simple test project: display a rotating yellow torus, using a vertex and a pixel shader for the rendering. I was able to code this project quickly after a quick look at the *.lum project files.

The project is downloadable here: lumina_jegx_test_01.zip

Overall it's nice, but the usefulness of the graphical interface is debatable. In this kind of software (Lumina or Demoniak3D), either the GUI is high-level and simple to use, or you are better off without it. I will still study how Lumina works in more detail, if only to improve Demoniak3D and its successor…

In the same spirit as Lumina, there are also FX Composer (NVIDIA) and RenderMonkey (ATI).

GLSL: ATI vs NVIDIA – Part…

While I was releasing the Julia's Fractal demo, I tested it on NVIDIA and ATI (as usual before releasing a demo). And as usual, a new difference appeared in the way GLSL is supported on ATI and on NVIDIA. On NVIDIA the following line is ok but produces an error on ATI (Catalyst 8.1):

gl_FragColor = texture1D(tex, (float)(i == max_i ? 0 : i) / 100);			

To be ATI Radeon-compliant, this instruction must be split in two:

if( i == max_i )
{
	gl_FragColor = texture1D(tex, 0.0);			
}
else
{
	gl_FragColor = texture1D(tex, i/100.0);			
}

I can't imagine a world with more than two major graphics cards manufacturers. But if such a world existed, I'd stop 3D programming… Fortunately, NVIDIA accepts the ATI GLSL syntax, so there is only one code path in the end. Conclusion: always check your GLSL shaders on ATI and NVIDIA before releasing a demo…

A cumbersome bug in the Catalyst 7.12

The latest Catalyst version is 7.12 (the Cat7.12 internal number is 8.442.0.0). But exactly like the Cat7.11, these drivers have a bug in the management of dynamic lights in GLSL. This time I searched for the source of the bug, because it is a little bit cumbersome in Demoniak3D demos. And we can't fall back to a previous version, since Cat7.11+ is required to drive the Radeon 3k (HD 3870/3850). So I coded a small Demoniak3D script that shows the bug. This script displays a mesh plane lit by a single dynamic light. The SPACE key switches the GLSL shader from the bug-ridden one to the fixed one and back.

– The following image shows the plane lit with the fixed shader:

– The following image shows the plane lit with the bug-ridden shader:

Okay that's cool, but where does the bug come from? After a little time spent tweaking the shaders, my conclusion is that the bug lies in the value of the built-in uniform variable gl_LightSource[0].position. In the vertex shader, this variable should contain the light position in camera space. It's OpenGL that does this transformation; we, poor developers, just need to specify the light position in world coordinates somewhere in the C++ app. In the vertex shader, gl_LightSource[0].position helps us get the light direction used later in the pixel shader:

	lightDir = gl_LightSource[0].position.xyz - vVertex;

With the Catalyst 7.11 and 7.12, the value stored in gl_LightSource[0].position is wrong. So one workaround, until the ATI driver team fixes the bug, is to manually compute the light position in camera space, by passing the camera view matrix and the light position in world coordinates to the vertex shader:

	vec3 lightPosEye = vec3(mv * vec4(-150.0, 50.0, 0.0, 1.0));
	lightDir = lightPosEye - vVertex;

mv is the 4×4 view matrix and vec4(-150.0, 50.0, 0.0, 1.0) is the hard-coded light position in world coordinates.
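
For reference, here is a minimal vertex shader sketch of the workaround (the mv uniform is assumed to be filled by the application with the camera view matrix; this is an illustrative shader, not the exact OneDynLightShader_Fixed code):

	uniform mat4 mv; // camera view matrix, passed from the application

	varying vec3 lightDir;
	varying vec3 normal;

	void main()
	{
		// vertex position in eye space
		vec3 vVertex = vec3(gl_ModelViewMatrix * gl_Vertex);

		// light position transformed to eye space by hand,
		// instead of reading the buggy gl_LightSource[0].position
		vec3 lightPosEye = vec3(mv * vec4(-150.0, 50.0, 0.0, 1.0));
		lightDir = lightPosEye - vVertex;

		normal = gl_NormalMatrix * gl_Normal;
		gl_Position = ftransform();
	}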

In the fixed pipeline, dynamic lights are well handled as shown in the next image:

In the Demoniak3D demo, the bug-ridden GLSL shader is called OneDynLightShader and the fixed one OneDynLightShader_Fixed. The demo source code is located in the OneDynLightTest.xml file. To start the demo, unzip the archive in a directory and launch DEMO_Catalyst_Bug.exe.

The demo is downloadable here: Demoniak3D Catalyst 7.11/7.12 Bug

This bug seems to affect all Radeons, BUT under Windows XP only. It looks as if ATI is forcing people to switch to Vista. Not cool… Or maybe ATI is beginning to implement OpenGL 3.0 in the Win XP drivers. Do not forget that with OpenGL 3.0, as with DX10, the fixed functions of the 3D pipeline, like the management of dynamic lights, will be removed.

Catalyst 7.12: Still the “Bug-Inside” Flavor

The latest Catalyst drivers are version 7.12 (the Cat7.12 internal number is 8.442.0.0 – no, it's not a phone number!). But exactly like the 7.11, these drivers have a bug in the handling of dynamic lights in GLSL. This time I went looking for the bug, because it is somewhat, even very, annoying for Demoniak3D demos. So I wrote a small Demoniak3D script that highlights this bug. The script shows a mesh plane lit by a single dynamic light. Pressing the SPACE key switches the GLSL shader: from the buggy shader to the fixed one and back.

– The following image shows the plane lit with the fixed shader:

– The following image shows the plane lit with the buggy shader:

Okay, all this is fine, but where does the bug come from? After spending a little time tweaking the shaders, I came to the conclusion that the bug lies in the value held by the built-in uniform variable gl_LightSource[0].position. In the vertex shader, this variable contains the light position expressed in camera space. OpenGL performs this transformation; on our side, we only have to specify the light position in world coordinates. In the vertex shader, gl_LightSource[0].position lets us compute the light direction used later in the pixel shader:

	lightDir = gl_LightSource[0].position.xyz - vVertex;

With the Radeon HD 3870 and the Catalyst 7.11 and 7.12, the value held in gl_LightSource[0].position is wrong. So the workaround I propose, until ATI's driver team fixes the bug, is to pass the vertex shader the light position expressed in world coordinates together with the camera view matrix, and to do the transformation by hand:

	vec3 lightPosEye = vec3(mv * vec4(-150.0, 50.0, 0.0, 1.0));
	lightDir = lightPosEye - vVertex;

mv is the camera's 4×4 view matrix and vec4(-150.0, 50.0, 0.0, 1.0) is the light position in world coordinates.

In the fixed pipeline, dynamic lights are handled correctly, as shown in the following image:

In the Demoniak3D demo, the buggy GLSL shader is called OneDynLightShader and the fixed one OneDynLightShader_Fixed. The Demoniak3D demo source code is located in the OneDynLightTest.xml file. To launch the demo, unzip the archive in a directory and run DEMO_Catalyst_Bug.exe.

The demo is downloadable here: Demoniak3D Catalyst 7.11/7.12 Bug

This bug affects all Radeons, BUT under Windows XP only. It looks as if ATI is twisting our arm to move to Vista. Not very nice… Or maybe ATI is starting to implement OpenGL 3.0 in the XP drivers. Let's not forget that with OpenGL 3.0, as with DX10, the fixed functions of the 3D pipeline, like dynamic light management, are removed.

I would like to thank the WorldPCSpecs community for the tests. Thanks guys!

The New Catalyst 7.11, “Bug-Inside” Flavor

ATI has just delivered the new Catalyst 7.11 for our beautiful Radeons. But it looks like it's becoming a habit for the ATI guys to ship us buggy drivers, especially for the new cards! Remember the Catalyst 7.9, which finally fixed a big shadow-map bug that was only visible on the Radeon 2k. Well, now it's the same story with the Cat7.11: they are buggy for the Radeon 3k on the OpenGL side: impossible to use more than one dynamic light in GLSL shaders! That's quite a bug! For the moment I've only tested under WinXP, so maybe it's better under Vista.

Apart from this bug (there are surely others, but I haven't done enough tests to know), the Cat7.11 are the first drivers that support the Radeon HD 3870. The Cat7.11 internal number is 8.432.0.0.

The Cat7.11 can be downloaded here:

WinXP 32-bit: [DOWNLOAD]
Vista 32-bit: [DOWNLOAD]

Catalyst 7.9 and Radeon 2K Shadow Mapping Bug

I found this bug while I was coding a new small soft shadows demo for GPU Caps Viewer. Soft shadows are built on shadow mapping, and my OpenGL shadow mapping code works perfectly on all GeForce 6/7/8 and Radeon 1k boards, but not on the Radeon 2K (2400/2600/2900). Why? Because the shadow mapping comparison function had a serious bug! In short, the comparison function is supposed to return a boolean value (0 if the fragment is in shadow, 1 otherwise), and before Catalyst 7.9 this function returned, on the Radeon 2K, the depth buffer value (as if the comparison function were disabled). But this bug is now a memory since Catalyst 7.9 has fixed it.
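
To make it concrete, here is a minimal sketch of the comparison path involved (sampler and coordinate names are illustrative, not the actual GPU Caps Viewer shader). The depth texture is set up on the application side with GL_TEXTURE_COMPARE_MODE = GL_COMPARE_R_TO_TEXTURE, and the shader then expects a 0/1 comparison result:

	uniform sampler2DShadow shadowMap; // depth texture with COMPARE_R_TO_TEXTURE enabled

	void main()
	{
		// expected: 1.0 if the fragment is lit, 0.0 if it is in shadow.
		// On Radeon 2K before Catalyst 7.9, this returned the raw depth value instead.
		float lit = shadow2DProj(shadowMap, gl_TexCoord[1]).r;
		gl_FragColor = vec4(lit);
	}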

I guess we can say thanks to Quake Wars, which was released a few days ago and is an OpenGL game. For this game (which is really nice), ATI has fixed all the major OpenGL bugs.

GLSL: ATI vs NVIDIA

Today, two new differences between Radeon and GeForce GLSL support.

1 – float2 / vec2
vec2 is the GLSL type that holds a 2D vector. vec2 is supported by both NVIDIA and ATI. float2 is also a 2D vector, but for Direct3D HLSL and for Cg. GLSL compilation for GeForce is done via the NVIDIA Cg compiler. Here is the GLSL version displayed by GPU Caps Viewer: 1.20 NVIDIA via Cg compiler. That explains why GLSL source that contains a float2 compiles on NVIDIA hardware. But ATI's GLSL compiler is strict and doesn't recognize the float2 type.
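
To make the difference concrete (an illustrative fragment, not taken from a real shader):

	vec2   uv1 = gl_TexCoord[0].st; // GLSL type: accepted by both ATI and NVIDIA
	float2 uv2 = gl_TexCoord[0].st; // Cg/HLSL type: accepted by NVIDIA's Cg-based compiler, rejected by ATI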

2 – the following line:

vec2 vec = texture2D( tex, gl_TexCoord[0].st );

is valid for the NVIDIA compiler but produces an error with the ATI compiler. Once again, the ATI GLSL compiler has done a good job. By default, texture2D() returns a 4D vector. The right syntax is:

vec2 vec = texture2D( tex, gl_TexCoord[0].st ).xy;

Conclusion: always test your shaders on both ATI and NVIDIA platforms unless you target one platform only.

Dynamic branching and NVIDIA Forceware Drivers

Several weeks ago, I posted on Beyond3D a thread about my dynamic branching benchmark. I wondered why dynamic branching performance on the GeForce 7 was worse than on the GeForce 6 or 8. I believe I've got the answer: the ForceWare drivers.

Here are some new results, where ratio = Branching_ON / Branching_OFF:

7600GS – Fw 84.21 – Branching OFF: 496 o3Marks – Branching ON: 773 o3Marks – Ratio = 1.5
7600GS – Fw 91.31 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.36 – Branching OFF: 508 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.37 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6

7600GS – Fw 91.45 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 91.47 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 93.71 – Branching OFF: 508 o3Marks – Branching ON: 474 o3Marks – Ratio = 0.9
7600GS – Fw 97.92 – Branching OFF: 505 o3Marks – Branching ON: 478 o3Marks – Ratio = 0.9
7600GS – Fw 100.95 – Branching OFF: 508 o3Marks – Branching ON: 480 o3Marks – Ratio = 0.9

My conclusion is: dynamic branching in OpenGL works fine (read: performance is better than without dynamic branching, ratio > 1) for ForceWare <= 91.37. For drivers >= 91.45, the ratio drops below 1. Dynamic branching works as expected on the GF6 and GF8, but not on the GF7 since ForceWare 91.45. So the driver-bug explanation is plausible (and it's easily understandable: in this news we learnt that a ForceWare driver is made of around 20 million lines of code – a paradise for a small bug!!!). I've also done the test with the simple soft shadows demo provided with the NV SDK 9.5. The results are the same.
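
For reference, the kind of pattern the benchmark exercises is a pixel shader that branches on a runtime value so the expensive path can be skipped. A minimal sketch (the uniform names and the 5×5 blur loop are illustrative, not the exact o3Marks shader):

	uniform sampler2D shadowMap;
	uniform float threshold;

	void main()
	{
		float s = texture2D(shadowMap, gl_TexCoord[0].st).r;

		// dynamic branch: the costly 5x5 blur is only executed when needed
		if (s < threshold)
		{
			vec4 sum = vec4(0.0);
			for (int i = -2; i <= 2; i++)
				for (int j = -2; j <= 2; j++)
					sum += texture2D(shadowMap, gl_TexCoord[0].st + vec2(i, j) / 1024.0);
			gl_FragColor = sum / 25.0;
		}
		else
		{
			gl_FragColor = vec4(s);
		}
	}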

I've just run the benchmark with a 7950GX2 and the latest ForceWare 160.02, and dynamic branching is still buggy…

New NVIDIA OpenGL Extensions Headers

The new OpenGL header files contain the new extensions stuff. You can download them from… just a second, I start GPU Caps Viewer and… okay I got it :thumbup: : from developer.nvidia.com/object/nvidia_opengl_specs.html.

But there are a couple of weird things:

1 – the glext.h version is 28 (#define GL_GLEXT_VERSION 28). The version I use to compile the oZone3D engine renderer is 29. And I have been using this header for more than a year…

2 – the glext.h header does not compile with VC6 (yes, I still use Visual Studio 6!) because of the GL_EXT_timer_query extension. Here is the original piece of code you can find in glext.h:

/*
* Original code - does not compile with vc6.
*/
#ifndef GL_EXT_timer_query
typedef signed long long GLint64EXT;
typedef unsigned long long GLuint64EXT;
#endif

and here is the code I updated for Visual C++ 6:

/*
Modified code for oZone3D engine - compile with vc6
*/
#ifndef GL_EXT_timer_query
	#ifdef _WIN32
		typedef signed __int64 GLint64EXT;
		typedef unsigned __int64 GLuint64EXT;
	#else
		typedef signed long long GLint64EXT;
		typedef unsigned long long GLuint64EXT;
	#endif
#endif

I wonder whether the original glext.h compiles with VC7 or VC8. If anyone has the answer, feel free to contact me…

NVIDIA OpenGL Extension Specifications

Finally NVIDIA has released the specs of the new OpenGL extensions that come with the GF8800. Great news! :thumbup:

These specs are very important for us, poor graphics developers, in order to update our software with the latest cool features. Among these specs, there is GL_EXT_draw_instanced, which allows geometry instancing. Another extension is WGL_NV_gpu_affinity. This ext allows sending graphics calls to a particular GPU in a multi-GPU system. Should be cool to see how a 7950GX2 behaves. The GL_EXT_timer_query ext provides a nanosecond-resolution timer to determine the amount of time it takes to fully complete a set of OpenGL graphics calls. There are still many other cool extensions. As soon as I get an 8800 board, I'll make a little tutorial to cover these cool extensions.
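
To give an idea of how GL_EXT_timer_query is meant to be used, here is a small sketch based on the extension spec (it assumes the entry points have already been loaded by your extension loader):

	GLuint timerQuery;
	GLuint64EXT elapsed; /* result in nanoseconds */

	glGenQueries(1, &timerQuery);

	glBeginQuery(GL_TIME_ELAPSED_EXT, timerQuery);
	/* ... the set of OpenGL calls to measure ... */
	glEndQuery(GL_TIME_ELAPSED_EXT);

	/* blocks until the result is available */
	glGetQueryObjectui64vEXT(timerQuery, GL_QUERY_RESULT, &elapsed);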

NVIDIA GLSL compiler

In the demo I received from satyr (see the oZone3D.Net forums), there is a toon shader that uses GLSL uniforms. The pixel shader looked like this:

uniform float silhouetteThreshold;

void main()
{
  silhouetteThreshold = 0.32;     

  //... shader code
  //... shader code
  //... shader code
}

This pixel shader compiles fine on NVIDIA GPUs but generates an error on ATI. The error is correct, since a uniform is a read-only variable. This is an example of the NVIDIA GLSL compiler's laxity. That's why I code my shaders on ATI: if the code is good for ATI, we can be sure it will be good for NVIDIA too (of course there are always some exceptions…).
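
A conforming version keeps the threshold read-only: either the uniform is left untouched and its value is set from the application with glUniform1f(), or, if the value is really meant to be hard-coded, a const is used instead. A minimal sketch:

// option 1: set the value from the application with glUniform1f(location, 0.32)
uniform float silhouetteThreshold;

// option 2: if the value is really hard-coded, use a const instead
// const float silhouetteThreshold = 0.32;

void main()
{
  //... shader code using silhouetteThreshold
}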

Uniform Arrays in GLSL

A new version of the Soft Shadows Benchmark is available, but this time using uniform arrays to pass the blurring kernel to the pixel shader. On NVIDIA boards there is a small speed increase (1 or 2 fps). On my X700… black screen… Houston, we've got a problem… This is with Catalyst 6.6. Okay, I try the very latest Catalyst, the 6.7. Bad idea, it's worse! Both versions (with and without uniform arrays) no longer work with C6.7. Back to C6.6. That really sucks! :thumbdown:

But I've just received feedback telling me that the uniform arrays version works fine on an ATI X1600 Pro with C6.5. :thumbup:

Okay, there is certainly a problem with the X*** series and uniform arrays.
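
For reference, the uniform-array pattern in question looks like this (the array size and names are illustrative; the benchmark's real kernel is the 5×5 blur mentioned earlier):

	// filled once from the application with:
	//   glUniform2fv(glGetUniformLocation(prog, "kernelOffsets"), 25, offsets);
	uniform vec2 kernelOffsets[25];
	uniform sampler2D shadowMap;

	void main()
	{
		vec4 sum = vec4(0.0);
		for (int i = 0; i < 25; i++)
			sum += texture2D(shadowMap, gl_TexCoord[0].st + kernelOffsets[i]);
		gl_FragColor = sum / 25.0;
	}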

Soft Shadows are Great!

I've just finished implementing soft shadows in the new oZone3D Engine. And I must say that soft shadows bring a huge amount of realism and credibility to 3D scenes. See for yourself:

The oZone3D tech demo is available here: Soft Shadows Demo

You can consider this demo as a little benchmark. Just start the oZone3D_SoftShadows_Benchmark.exe and look at the FPS in the title bar. With my current devstation I got the following score:

PC1: AMD64 3500+ / 1024M DDR400 / Radeon X700Pro 128M Catalyst 6.6 / WinXpsp2: 5 FPS :thumbdown:

Soft shadows are very GPU-consuming, but they are the future of 3D! So to make the most of soft shadows, update your graphics card!

ATI and Depth Map Filtering

I've just found in ATI's great paper, called “ATI OpenGL Programming and Optimization Guide”, that all ATI GPUs from the R300 (Radeon 9700) to the latest R580 (Radeon X1900) only support NEAREST filtering (and its mipmap variants) for depth maps. That explains the previous results. So if you want NVIDIA-like depth map filtering, you have to code the filtering yourself in the pixel shader. Okay, this answer suits me!
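
A minimal sketch of doing that filtering by hand (a simple 2×2 percentage-closer filter that averages four NEAREST comparisons instead of true bilinear weights; texture size and names are illustrative):

	uniform sampler2DShadow shadowMap;
	const float texSize = 1024.0; // illustrative shadow map size

	// four NEAREST depth comparisons averaged by hand,
	// since the Radeon won't apply LINEAR filtering to a depth map
	float shadowPCF(vec3 coord)
	{
		float o = 1.0 / texSize;
		float s = 0.0;
		s += shadow2D(shadowMap, coord).r;
		s += shadow2D(shadowMap, coord + vec3(o, 0.0, 0.0)).r;
		s += shadow2D(shadowMap, coord + vec3(0.0, o, 0.0)).r;
		s += shadow2D(shadowMap, coord + vec3(o, o, 0.0)).r;
		return s * 0.25;
	}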

Depth Map Filtering – ATI vs NVIDIA

ATI really has some problems with OpenGL. I'm now working on soft shadows, and my temporary devstation has a Radeon X700 (not top-notch, I know, but a powerful enough graphics card). With my X700 (Catalyst 6.6), the soft shadow edges are rendered as follows:

And on my second graphics card, an NVIDIA 6600GT (ForceWare 91.31), the soft shadows look as follows:

The GLSL shaders are the same: a 5×5 blurring kernel, with a 1024×1024 shadow map (or depth map, as you like) rendered via an FBO, with linear filtering. Now if I set the nearest filtering mode, I get the following result on the X700:

and on the 6600GT:

It looks as if the Radeon GPU has a bug in its filtering module when it has to apply a linear filter to a depth map. Very strange. I'm not satisfied with this explanation, but it's the only one I can see for the moment.

This kind of problem shows how important it is for a graphics developer to have at least two workstations, one with an NVIDIA board and the other with an ATI graphics card. I tell you, realtime 3D is made of blood, sweat and screams! :winkhappy:

NPOT Textures

It’s nice to come back to code!

I'm currently working on a new and simple framework for my OpenGL experiments before implementing the algorithms in the oZone3D Engine. RaptorGL is a little bit too heavy for simple tests, so I'm dropping it for the moment. This new framework, which I called XPGL (eXPerimental Graphics Library), allows me to quickly test the new algorithms I'm working on. Every time I have to code a small but fully operational 3D demo in C++/OpenGL, I spend a lot of time for a small result. In those moments, I tell myself that Hyperion is a very cool tool.

Okay, let's look at a weird behaviour of the Radeon GPU. At the moment, my graphics controller is a Radeon X700. With the latest Catalyst drivers (6.6), this board should be an OpenGL 2.0 compliant card. A quick check of GL_VERSION tells me the X700 is GL2 compliant. Then the X700 should handle non-power-of-two textures, since this feature is part of the OpenGL 2.0 core. But the GL_ARB_texture_non_power_of_two string is not found in GL_EXTENSIONS. Maybe ATI does not list the extensions that are part of the core. Anyway, I loaded a 600×445 NPOT texture on a mesh plane and the X700 seems to support it. But with a ridiculous framerate of 1 fps… Software codepath? I think so! So I loaded the same texture with power-of-two dimensions (512×512) and the framerate became decent again. With my GF6600GT (ForceWare 91.31) I never noticed this effect/bug, because the GL2 support is better and NVIDIA GPUs correctly handle non-power-of-two textures. You can download the demo with the NPOT and POT texture (the one mapped onto the mesh plane) hereafter and run the test yourself. Feel free to drop me a feedback if you wish.

NPOT_Demo.zip (659k)
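
For reference, the version/extension checks mentioned above boil down to something like this (a sketch assuming <string.h> and a current GL context; real code should parse the version string more carefully):

	const char *version    = (const char *)glGetString(GL_VERSION);
	const char *extensions = (const char *)glGetString(GL_EXTENSIONS);

	/* GL 2.0+ means NPOT textures are part of the core... */
	int isGL2 = (version != NULL && version[0] >= '2');

	/* ...but the driver may or may not also advertise the extension string */
	int hasNPOT = (extensions != NULL &&
	               strstr(extensions, "GL_ARB_texture_non_power_of_two") != NULL);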

But keep in mind that graphics hardware is optimized for POT textures. Try to use POT textures in order to maximize your chances of seeing your demo run everywhere.

Ambient Occlusion Generator

I'm currently working on a new algorithm for the ambient occlusion generator. The basic idea comes from smash, the main coder of Fairlight, a famous demoscene group (thank you mate!). My old AmbOccGen was (still is) really slow: calculating the per-vertex AO term for a 40,000-poly object with 1000 samples could take many hours and even more (days!). The following image shows a 40,000-poly scene (each torus has 20,000 polys), and the new algorithm took only 5 minutes to compute the ambient occlusion with 8192 samples! Really cool, and I know I can do better…

I'll release an end-user tool when the new version of oZone3D is ready. The new version of oZone3D is now a top-priority task (and a particularly huge one…).

ATI X1900XTX and VTF

I've just received an email from a user saying that he wasn't able to run the demos of the Vertex Displacement Mapping Tutorial on his brand new Radeon X1900XTX. VTF, or Vertex Texture Fetching, is a cool feature of high-end graphics chipsets and it's part of Shader Model 3.0. The X1900 series is based on the R500 family (R580), which is an SM3.0-compliant GPU. But on the OpenGL side, and especially in GLSL, VTF is not supported: the OpenGL query done with GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB always returns 0. That means that no texture units are available in the vertex shader.
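
The query in question is simply:

	GLint maxVertexTextureUnits = 0;
	glGetIntegerv(GL_MAX_VERTEX_TEXTURE_IMAGE_UNITS_ARB, &maxVertexTextureUnits);
	/* 0 on the X1900 (R580): no texture image units available in the vertex shader */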

ATI confirms this fact in one of the whitepapers shipped with the ATI SDK (ATI OpenGL Programming and Optimization Guide.pdf). On page 11, we can read: “All ATI graphics HW have a few items that deserve special consideration when using GLSL. The first major item of note is the absence of vertex texture units. This means that vertex texturing is never available, and all shaders attempting to use texture functions in the vertex shader will fail to link.” I know, this is a harsh reality. The R580 GPU is really powerful, and it's a pity that ATI does not support VTF in its chipsets. I don't know how the R580 behaves on the D3D side, but I suppose the GPU has the same limitations. VTF is currently supported by GeForce chipsets from the 6600 to the 7900. Conclusion: if you want to play with VTF, use an NVIDIA board.

Maybe all these problems will be solved with SM4.0. I hope so! :winkhappy: