Catalyst 7.12: Still Served “Bug-Inside” Style

The latest Catalyst drivers are version 7.12 (the internal version number of Cat 7.12 is 8.442.0.0, and no, that is not a phone number!). But exactly like 7.11, these drivers have a bug in the handling of dynamic lights in GLSL. This time I went hunting for the bug, because it is somewhat, if not very, annoying for Demoniak3D demos. So I put together a small Demoniak3D script that exposes it. The script shows a planar mesh lit by a dynamic light. Pressing the SPACE key switches GLSL shaders: you go from the buggy shader to the fixed one and back.

– The following image shows the plane lit with the fixed shader:

– The following image shows the plane lit with the buggy shader:

OK, all of this is nice, but where does the bug come from? After spending some time tweaking the shaders, I came to the conclusion that the bug lies in the value held by the built-in uniform variable gl_LightSource[0].position. In the vertex shader, this variable holds the position of the light expressed in camera (eye) space. OpenGL performs this transformation itself when the light position is set, using the current modelview matrix, so with the camera's view matrix loaded at that point we only have to specify the light position in world coordinates. In the vertex shader, gl_LightSource[0].position lets us compute the light direction that is used later in the pixel shader:

	lightDir = gl_LightSource[0].position.xyz - vVertex;

With the Radeon HD 3870 and Catalyst 7.11 and 7.12, the value held in gl_LightSource[0].position is wrong.
So the workaround I suggest, until ATI's driver team fixes the bug, is to pass the vertex shader the light position expressed in world coordinates together with the camera's view matrix, and to do the transformation by hand:

	vec3 lightPosEye = vec3(mv * vec4(-150.0, 50.0, 0.0, 1.0));
	lightDir = lightPosEye - vVertex;

mv is the camera's 4×4 view matrix and vec4(-150.0, 50.0, 0.0, 1.0) is the light position in world coordinates.
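
To make the by-hand transformation more concrete, here is a minimal CPU-side sketch in C++ of what mv * vec4(...) computes. It assumes the column-major matrix layout that OpenGL uses. The names Mat4, Vec4, transform and makeTranslationView are made up for this illustration and are not part of Demoniak3D or of the demo.

```cpp
#include <array>
#include <cassert>

// Column-major 4x4 matrix, the layout OpenGL uses:
// element (row, col) is stored at index col * 4 + row.
using Mat4 = std::array<float, 16>;
using Vec4 = std::array<float, 4>;

// Multiply a column-major matrix by a column vector: result = m * v.
Vec4 transform(const Mat4& m, const Vec4& v) {
    Vec4 r{0.0f, 0.0f, 0.0f, 0.0f};
    for (int row = 0; row < 4; ++row) {
        for (int col = 0; col < 4; ++col) {
            r[row] += m[col * 4 + row] * v[col];
        }
    }
    return r;
}

// Hypothetical helper: view matrix of a camera that is only translated
// (no rotation) to (cx, cy, cz). The view matrix is the inverse of the
// camera transform, so it translates by the negated camera position.
Mat4 makeTranslationView(float cx, float cy, float cz) {
    return Mat4{1, 0, 0, 0,
                0, 1, 0, 0,
                0, 0, 1, 0,
                -cx, -cy, -cz, 1};
}
```

With a camera sitting at (0, 0, 100), transform(makeTranslationView(0, 0, 100), {-150, 50, 0, 1}) moves the light to (-150, 50, -100) in eye space, which is the value the shader then subtracts vVertex from.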

In the fixed-function pipeline, dynamic lights are handled correctly, as the following image shows:

In the Demoniak3D demo, the buggy GLSL shader is called OneDynLightShader and the fixed one OneDynLightShader_Fixed. The source code of the demo is in the file OneDynLightTest.xml. To run the demo, unzip the archive into a folder and launch DEMO_Catalyst_Bug.exe.

The demo can be downloaded here: Demoniak3D Catalyst 7.11/7.12 Bug

This bug affects all Radeons BUT only under Windows XP. It looks like ATI is twisting our arm a bit to make us move to Vista. Not very nice… Or else ATI is starting to implement OpenGL 3.0 in the XP drivers. Let’s not forget that with OpenGL 3.0, just as with DX10, the fixed functions of the 3D pipeline, such as dynamic light handling, are removed.

I would like to thank the WorldPCSpecs community for the tests. Thanks, guys!

Small Theme Change

There, I have changed the theme of this blog once again. What can I say: with the astronomical number of themes available, I could switch every day. That said, for those who are interested, this theme is available here: Paalam.

Just for information, the header image comes from a small 3D demo that will be released soon.

GPU Overclocking Competition

The site www.hardware.info is running a GPU overclocking competition and uses the Fur Rendering Benchmark as its main tool (well, that is my reading of it, since everything is written in Dutch, but I should not be far off; if anyone understands the language, thanks in advance for a little feedback on what is being said there). The fur benchmark is definitely getting talked about lately. The funny thing is that everyone insists on using version 1.0.0 even though version 1.1.0 exists…

The competition page is here: GPU Overclocking Contest

The Technology of a 3D Engine @ Beyond3D – Part 1

“This series of articles is meant for anyone willing to write, or learn about the process of writing, a modern, streaming, 3D engine, taking advantage of current programmable hardware.”

Read the article HERE

Our friends at Beyond3D have just launched a new series of articles, this time on the architecture of a modern 3D engine that knows how to exploit our ever more powerful graphics cards. After a few pages of platitudes (pages 1, 2 and 3), the fourth and last page (what, already?) goes into more detail about the different 3D rendering APIs (Direct3D and OpenGL), and the author tells us that his engine (the FlExtEngine) uses an abstraction layer for the 3D renderer. A solution of this kind is used in the oZone3D engine that powers Demoniak3D and GPU Caps Viewer.

So reading this article, and especially the following ones, will let you learn a bit more about what goes on behind the scenes of Demoniak3D. I will try to post a little feedback when the other articles come out.

The New Catalyst 7.11, Served “Bug-Inside” Style

ATI has just delivered the new Catalyst 7.11 for our lovely Radeons. But it seems to be becoming a habit for the folks at ATI to hand us buggy drivers, especially for new cards! Remember Catalyst 7.9, which finally fixed a big shadow-mapping bug that was visible only on the Radeon 2k. Well, now it is the same story with Cat 7.11: they are buggy on the Radeon 3k in OpenGL: impossible to use more than one dynamic light in GLSL shaders! That is quite a bug! For now I have only tested under WinXP, so maybe things are better under Vista.

Apart from this bug (there are surely others, but I have not run enough tests to know), Cat 7.11 are the first drivers to support the Radeon HD 3870. The internal version number of Cat 7.11 is 8.432.0.0.

The Cat 7.11 downloads are here:

WinXP 32-bit: [DOWNLOAD]
Vista 32-bit: [DOWNLOAD]

Launch of the Infamous Blog.

Here is the opening post of this new blog. I am stopping JeGX’s DevBlog for all sorts of reasons and starting this new blog, which is more general but still tech-oriented! It will also be in French only, because shouting in rage or hurling insults is much easier in French than in English.

More seriously, this blog will talk about software (utilities, video games, …), hardware (mostly graphics, of course!), code (well, yes!) and anything that seems interesting to me on the tech side.

Right, that’s done…

Catalyst 7.9, Radeon 2900 and Surface Deformer

According to the oZone3D.Net forums, Catalyst 7.9 seems to unleash the ATI Radeon 2900 GPU. The Surface Deformer benchmark requires a lot of vertex processing horsepower. With Catalyst versions prior to 7.9, the score of an ATI 2900 was around 8000 o3Marks (which was already high). Now, with Catalyst 7.9, the 2900 scores 15000 o3Marks. Incredible!!! Why such a big jump in OpenGL performance?

My first thought is that ATI has managed to make proper use of the unified architecture of the R600 GPU. With a unified architecture, the workload is distributed over all shader processors no matter the type of shader program (vertex or pixel). So if the vertex shader needs more processing power than the pixel shader, more shader processors will be used for the vertex shader. My second thought: the unified architecture required new kernel code in Catalyst, and ATI has simply optimized the R600 code path. A driver for a modern GPU like the R600 is a very complex piece of code, and optimizing such code is a huge task…

Catalyst 7.9 and Radeon 2K Shadow Mapping Bug

I found this bug while coding a small new soft shadows demo for GPU Caps Viewer. Soft shadows are built on shadow mapping, and my OpenGL shadow mapping code works perfectly on all Geforce 6/7/8 and Radeon 1k cards but not on the Radeon 2K (2400/2600/2900). Why? Because the shadow mapping comparison function had a serious bug! In short, the comparison function is supposed to return a boolean-like value (0 if the fragment is in shadow, 1 otherwise), and before Catalyst 7.9 this function returned, on the Radeon 2K, the depth buffer value (as if the comparison function were disabled). But this bug is now just a memory, since Catalyst 7.9 has fixed it.
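
As a rough illustration of the difference between the expected and the buggy behaviour, here is a tiny C++ sketch of the depth comparison. It assumes a "less or equal" compare function, which is a common choice but not something taken from my demo code; the function names are hypothetical.

```cpp
#include <cassert>

// Expected behaviour of the shadow map lookup with comparison enabled,
// assuming a "less or equal" compare function (an assumption made for
// this sketch): 1.0 means lit, 0.0 means in shadow.
float shadowCompare(float fragmentDepth, float storedDepth) {
    return (fragmentDepth <= storedDepth) ? 1.0f : 0.0f;
}

// Behaviour described in the post for the Radeon 2K before Catalyst 7.9:
// the comparison is skipped and the stored depth value comes back as-is.
float shadowCompareDisabled(float /*fragmentDepth*/, float storedDepth) {
    return storedDepth;
}
```

A shader that multiplies its lighting by this result gets a clean 0 or 1 in the first case, and an arbitrary depth value between 0 and 1 in the second, which is why the shadows looked wrong.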

I guess we can thank Quake Wars, an OpenGL game that was released a few days ago. For this game (which is really nice), ATI has fixed all the major OpenGL bugs.

Fur Rendering Benchmark

I officially released the fur rendering benchmark 4 days ago. So let’s take a quick look at the first feedback available on forums around the web.

Homepage: www.ozone3d.net/benchmarks/fur/

1 – The fur rendering benchmark is not CPU dependent, which is a very good thing for a graphics card benchmark. No matter the CPU speed, the result for a given card stays about the same:
oZone3D.Net forums
extremeoverclocking.com forums
extremeoverclocking.com forums
“yes the first propper gpu bench ive come across, i really like this…. it stops all the arguments about memory timings and cpu speeds. its a good equaliser as all our systems can provide that 10% cpu info the gpu needs…”

2 – 8800GTX vs 2900XT
ATI 2900XT seems to beat NVIDIA 8800GTX. In all forums, the 2900XT is ahead:
oZone3D.Net forums
overclockers.co.uk forums
extremeoverclocking.com forums

3 – This benchmark seems to load the graphics card heavily, which makes it a cool GPU burner and stress/stability test utility.
clubic.com forums: “On the other hand, I have never seen my graphics card get this hot: about 100° during the test :ouch:”
oZone3D.Net forums: “This thing just succeeded to shut down twice the PSU, caused by overloading of the graphics board!!”

I did a little test with my 8800GTX:
– gpu core temp at rest: 58°C
– gpu core temp at load: 83°C

Okay, that’s all for that small benchmark. :winkhappy:

GLSL: ATI vs NVIDIA

Today two new differences between Radeon and Geforce GLSL support.

1 – float2 / vec2
vec2 is the GLSL type that holds a 2D vector. vec2 is supported by both NVIDIA and ATI. float2 is also a 2D vector type, but it belongs to Direct3D HLSL and Cg. GLSL compilation on Geforce is done via the NVIDIA Cg compiler. Here is the GLSL version displayed by GPU Caps Viewer: 1.20 NVIDIA via Cg compiler. That explains why a GLSL source that contains a float2 compiles on NVIDIA hardware. But the ATI GLSL compiler is strict and doesn’t recognize the float2 type.

2 – the following line:

vec2 vec = texture2D( tex, gl_TexCoord[0].st );

is valid for the NVIDIA compiler but produces an error with the ATI compiler. Once again, the ATI GLSL compiler has done a good job. texture2D() returns a 4D vector (vec4), which cannot be assigned to a vec2 without an explicit swizzle or constructor. The right syntax is:

vec2 vec = texture2D( tex, gl_TexCoord[0].st ).xy;

Conclusion: always test your shaders on both ATI and NVIDIA platforms unless you target one platform only.

R600 is VTF-capable

“All of the fetch and filtering capabilities are available to each thread type, making the samplers completely agnostic about what’s using them.”

This line from the Beyond3D article on the R600 means that vertex, geometry and pixel shaders can all access texture samplers. So Vertex Texture Fetching is now available on the Radeon 2k series :thumbup: What’s more, the R600 can handle very large textures, up to 8192×8192, just like the G80.

Dynamic branching and NVIDIA Forceware Drivers

Several weeks ago, I posted a thread on Beyond3D about my dynamic branching benchmark. I wondered why dynamic branching performance on the Geforce 7 was worse than on the Geforce 6 or 8. I believe I’ve got the answer: the Forceware drivers.

Here are some new results where ratio = Branching_ON / Branching_OFF :

7600GS – Fw 84.21 – Branching OFF: 496 o3Marks – Branching ON: 773 o3Marks – Ratio = 1.5
7600GS – Fw 91.31 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.36 – Branching OFF: 508 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6
7600GS – Fw 91.37 – Branching OFF: 509 o3Marks – Branching ON: 850 o3Marks – Ratio = 1.6

7600GS – Fw 91.45 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 91.47 – Branching OFF: 509 o3Marks – Branching ON: 472 o3Marks – Ratio = 0.9
7600GS – Fw 93.71 – Branching OFF: 508 o3Marks – Branching ON: 474 o3Marks – Ratio = 0.9
7600GS – Fw 97.92 – Branching OFF: 505 o3Marks – Branching ON: 478 o3Marks – Ratio = 0.9
7600GS – Fw 100.95 – Branching OFF: 508 o3Marks – Branching ON: 480 o3Marks – Ratio = 0.9
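
For anyone who wants to redo the arithmetic, here is a small C++ sketch of how the ratio column can be reproduced. The function names are hypothetical, and the truncation to one decimal place is an assumption on my part: it is simply what matches the figures listed above (for example 773 / 496 = 1.558…, listed as 1.5).

```cpp
#include <cassert>
#include <cmath>

// ratio = Branching_ON / Branching_OFF, as defined above.
double branchingRatio(double scoreOn, double scoreOff) {
    assert(scoreOff > 0.0);
    return scoreOn / scoreOff;
}

// Truncate to one decimal place (assumption: this is how the listed
// ratios were written down, since it matches every row).
double truncateToOneDecimal(double x) {
    return std::floor(x * 10.0) / 10.0;
}

// Dynamic branching pays off when the score with branching is higher
// than the score without it, i.e. when the ratio is above 1.
bool branchingHelps(double scoreOn, double scoreOff) {
    return branchingRatio(scoreOn, scoreOff) > 1.0;
}
```

Feeding it the Fw 91.37 row (850 and 509) gives a ratio above 1, while the Fw 91.45 row (472 and 509) gives a ratio below 1, which is exactly the break visible in the list.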

My conclusion: dynamic branching in OpenGL works fine (that is, performance is better than without dynamic branching: ratio > 1) for Forceware <= 91.37. For drivers >= 91.45, the ratio drops below 1. Dynamic branching works as expected on gf6 and gf8 but not on gf7 since Forceware 91.45. So a driver bug is a plausible explanation (and an easily understandable one: in this news we learnt that a Forceware driver is made of around 20 million lines of code, a paradise for a small bug!!!). I’ve also run the test with the simple soft shadows demo provided with the NV SDK 9.5. The results are the same.

I’ve just done the bench with a 7950gx2 and the latest forceware 160.02 and dynamic branching is still buggy…

Quick Review – Asus Silent Square Pro

Here is the new CPU cooler I have installed in my test machine:

I must say I am quite happy with it. Easy to install, this Asus cooler is almost inaudible. And it does its heat dissipation job wonderfully, since the fins stay cold all the time. Of course, I suspect my AMD X2 3800+ simply does not get hot enough, but compared with the stock cooler shipped with the CPU, the difference is clear.

Good hardware :thumbup:

Embed Your Shader Source Code in Your C/C++ Apps

The NVIDIA developer blog shows a way to embed shader code in your Windows exe: blogs.nvidia.com/developers/2007/03/inlining_shader.html.

But this example is not fully operational. I slightly modified the code to make it work (I compiled it with VC++ 6.0):

1) Add a define to your resource.h file:
#define IDF_SHADERFILE 1000

2) Add an entry in your resource.rc file:
IDF_SHADERFILE RCDATA DISCARDABLE "myShader.glsl"

3) Use the resource in your code:
HMODULE hModule = GetModuleHandle(NULL);
HRSRC hResource = FindResource(hModule, (LPCTSTR)IDF_SHADERFILE, RT_RCDATA);
if(hResource) 
{
  DWORD dwSize = SizeofResource(hModule, hResource);
  HGLOBAL hGlobal = LoadResource(hModule, hResource);
  if(hGlobal) 
  {
    LPVOID pData = LockResource(hGlobal);
    if(pData) 
    {
      // Cast pData to a char * and you have your shader
      char *shader_code = (char *)pData;

      // Now do whatever you want with shader_code pointer.
      // Do not forget that shader_code is not a zero-terminated string!
      // Use dwSize to handle that.
    }
  }
}
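
Since the resource data is not zero-terminated, one simple way to handle it is to copy exactly dwSize bytes into a std::string. The sketch below is portable C++ with no Windows calls; makeShaderSource is a hypothetical helper name, and the data and size arguments stand for pData and dwSize from the code above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: build a properly terminated string from a raw
// buffer that has no trailing '\0'. Only `size` bytes are read.
std::string makeShaderSource(const void* data, std::size_t size) {
    if (data == nullptr || size == 0) {
        return std::string();
    }
    return std::string(static_cast<const char*>(data), size);
}
```

The c_str() of the returned string can then be handed to any API that expects a C string. Alternatively, glShaderSource() takes an optional array of lengths, so the raw pointer and dwSize can be passed directly without making a copy.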

New NVIDIA OpenGL Extensions Headers

The new OpenGL header files contain new extension definitions. You can download them from… just a second, I start GPU Caps Viewer and… okay I got it :thumbup: : from developer.nvidia.com/object/nvidia_opengl_specs.html.

But there are a couple of weird things:

1 – the glext.h version is 28 (#define GL_GLEXT_VERSION 28). The version I use to compile the oZone3D engine renderer is 29. And I have been using that header for more than a year…

2 – the glext.h header does not compile with vc6 (yes I still use visual studio 6!) because of the GL_EXT_timer_query extension: vc6 does not know the long long type. Here is the original piece of code you can find in glext.h:

/*
* Original code - does not compile with vc6.
*/
#ifndef GL_EXT_timer_query
typedef signed long long GLint64EXT;
typedef unsigned long long GLuint64EXT;
#endif

and here is the code I updated for visual c 6:

/*
Modified code for oZone3D engine - compile with vc6
*/
#ifndef GL_EXT_timer_query
	#ifdef _WIN32
		typedef signed __int64 GLint64EXT;
		typedef unsigned __int64 GLuint64EXT;
	#else
		typedef signed long long GLint64EXT;
		typedef unsigned long long GLuint64EXT;
	#endif
#endif

I wonder if the original glext.h compiles with vc7 or vc8. If anyone has the answer, feel free to contact me…

Quick Review – GPU Caps Viewer

GPU Caps Viewer is the new tool I have been working on these last few days. It’s the successor of HardwareInfos. GPU Caps Viewer is based on the v3.x branch of the oZone3D engine (while HardwareInfos is based on the oZone3D v2.x branch). In addition to the classic GPU/CPU information and capabilities, GPU Caps Viewer offers two cool features:

– an OpenGL Extensions database. You can see either the extensions supported by the current graphics card or all existing extensions, no matter which graphics board you have. You can quickly select an extension and jump directly to its webpage (SGI or NVIDIA extension specs). I must confess it’s very useful for me.

– a GPU-Burner… that was the hard part of coding GPU Caps Viewer. The GPU-Burner lets you open several 3D windows. Actually you can open as many 3D views as you want (1, 2, 4, 6, 10, 20, …). Each view renders a GLSL toon-shaded object with vsync disabled. You can set the size of each window individually (default size is 400×400). Each 3D view is rendered in its own thread… I’ll let you imagine how hard it is to debug a multithreaded gfx application :raspberry: And because I’m only human, there are always some bugs in my code. But there is a very cool tool that helped me manage the mad threads: ProcessExplorer :thumbup: You can download it here: www.majorgeeks.com/Process_Explorer_d4566.html.

Here is a screenshot of my desktop with 13 instances of the 3D view running at the same time. I will release GPU Caps Viewer very very soon. So stay tuned! :winkhappy:

NVIDIA OpenGL Extension Specifications

Finally NVIDIA has released the specs of the new OpenGL extensions that come with the gf8800. Great news! :thumbup:

These specs are very important for us poor graphics developers, so that we can update our software with the latest cool features. Among these specs there is GL_EXT_draw_instanced, which enables geometry instancing. Another extension is WGL_NV_gpu_affinity, which lets you send gfx calls to a particular GPU in a multi-GPU system. It should be cool to see how a 7950GX2 behaves. The GL_EXT_timer_query extension provides a nanosecond-resolution timer to determine the amount of time it takes to fully complete a set of OpenGL gfx calls. There are still many more cool extensions. As soon as I get an 8800 board, I’ll write a little tutorial covering these extensions.

Ageia PhysX SDK for free

Ageia has announced new licensing terms, allowing its PhysX SDK to be used and its runtime components distributed in all commercial and non-commercial PC projects for free.

This is really good news for the community and for Hyperion! I filled in the registration form and now I hope to receive the download link quickly.