Welcome!
Welcome to this infamous blog, one of the places on the web where I post the things I don't post elsewhere (e.g. Geeks3D...) and where I try out the WP themes I like.


Archive for the ‘OpenGL’ Category

Maths...



As I said in this news, the Catalyst 8.10 BETA release comes with a nice bugfix: vertex texture fetching (VTF) now works on Radeon (at least on my Radeon HD 4850). For the last 2 or 3 months, Catalyst has in principle allowed texture fetches from inside a vertex shader. You can check with GPU Caps Viewer how many texture units are exposed in a vertex shader on your Radeon:


Until now, however, vertex texture fetching in GLSL didn’t work because of a driver bug. That is now history, since VTF works well. For more details about vertex displacement mapping, you can read this rather old (2 years!) tutorial: Vertex Displacement Mapping using GLSL.

This very cool news makes me want to create a new benchmark based on VTF!

I’ve only tested the XP version of Catalyst 8.10. If someone has tested the Vista version, feel free to post a comment…

Next step for the ATI driver team: enable geometry texture fetching, i.e. texture fetching inside a geometry shader…

See you soon!


This demo uses instancing techniques (simple instancing, pseudo-instancing and geometry instancing, or GI) to render a ring made of 10,000 small spheres. The demo comes in 5 versions:

  • each sphere is made of 1,800 triangles (18 million triangles for the whole ring)
  • each sphere is made of 800 triangles (8 million triangles for the whole ring)
  • each sphere is made of 200 triangles (2 million triangles for the whole ring)
  • each sphere is made of 72 triangles (720,000 triangles for the whole ring)
  • each sphere is made of 18 triangles (180,000 triangles for the whole ring)

At the last moment I added a bonus: a 20,000-instance version, each instance made of 5,000 triangles. That gives the monstrous count of 100 million triangles (file Demo_Instancing_100MTriangles_20kInstances.exe).





DOWNLOAD

Several instancing techniques are used and you can select them with the F1 to F6 keys.

  • F1: simple instancing with camera frustum culling: there is one source for geometry (a mesh) and it’s rendered once per instance. The transformation matrix calculation is done on the CPU, as is the camera frustum test. OpenGL rendering uses the glDrawElements() function.
  • F2: simple instancing without camera frustum culling: there is one source for geometry (a mesh) and it’s rendered once per instance. The transformation matrix calculation is done on the CPU but there is no longer a camera frustum test. OpenGL rendering uses the glDrawElements() function.
  • F3: slow pseudo-instancing: there is one source for geometry (a mesh) and it’s rendered once per instance. Now the transformation matrix calculation is done on the GPU and per-instance data is passed via uniform variables. There is no camera frustum test. OpenGL rendering uses the glDrawElements() function.
  • F4: pseudo-instancing: there is one source for geometry (a mesh) and it’s rendered once per instance. The transformation matrix calculation is done on the GPU and per-instance data is passed via persistent vertex attributes (like texture coordinates or color). This technique was presented by NVIDIA in the following whitepaper: GLSL Pseudo-Instancing. There is no camera frustum test. OpenGL rendering uses the glDrawElements() function.
  • F5: geometry instancing: it’s the real hardware instancing. There is one source for geometry (a mesh) and rendering is done in batches of 400 instances per draw-call. Rendering the whole ring requires 25 draw-calls instead of 10,000. The transformation matrix calculation is done on the GPU and per-batch data is passed via uniform arrays. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. Currently, only NVIDIA GeForce 8 (and higher) cards support this function.
  • F6: geometry instancing with persistent vertex attributes: it’s hardware instancing combined with passing parameters via persistent vertex attributes. But the number of persistent vertex attributes is very limited. The best I managed was to render 4 instances per draw-call. Oddly, though, I got the best results with 2 instances per draw-call. In that case, rendering the whole ring requires 5,000 draw-calls. The transformation matrix calculation is done on the GPU and per-batch data is passed via uniform arrays. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. Currently, only NVIDIA GeForce 8 (and higher) cards support this function.
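
To make the batch arithmetic behind F5 and F6 concrete, here is a minimal C sketch. The helper names draw_calls and batch_size are hypothetical (they are not taken from the demo's source), and the real demo would issue one glDrawElementsInstancedEXT() call per batch instead of just counting them.

```c
#include <assert.h>

/* Number of draw calls needed to render `instances` objects when each
 * call submits at most `per_call` of them (ceiling division). */
unsigned draw_calls(unsigned instances, unsigned per_call)
{
    assert(per_call > 0);  /* a batch must hold at least one instance */
    return (instances + per_call - 1) / per_call;
}

/* Number of instances submitted by batch number `batch` (0-based).
 * Every batch is full except possibly the last one. */
unsigned batch_size(unsigned instances, unsigned per_call, unsigned batch)
{
    unsigned first = batch * per_call;  /* index of the batch's first instance */
    unsigned remaining;

    assert(per_call > 0 && first < instances);
    remaining = instances - first;
    return remaining < per_call ? remaining : per_call;
}
```

With 10,000 instances, draw_calls(10000, 400) gives the 25 calls of mode F5, and draw_calls(10000, 2) gives the 5,000 calls of the best F6 setting.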

OK, now let’s see some results with an NVIDIA GeForce 8800 GTX and an ATI Radeon HD 3870. Both cards were tested with an AMD 64 3800+.

18 million triangles – 1,800 tri/instance

NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32

  • F1: 223MTris/sec – 13FPS
  • F2: 223MTris/sec – 13FPS
  • F3: 223MTris/sec – 13FPS
  • F4: 223MTris/sec – 13FPS
  • F5: 223MTris/sec – 13FPS
  • F6: 171MTris/sec – 10FPS

ATI Radeon HD 3870 – Catalyst 8.2 XP32

  • F1: 429MTris/sec – 25FPS
  • F2: 463MTris/sec – 27FPS
  • F3: 446MTris/sec – 26FPS
  • F4: 274MTris/sec – 16FPS
  • F5: mode not available
  • F6: mode not available

8 million triangles – 800 tri/instance

NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32

  • F1: 190MTris/sec – 25FPS
  • F2: 190MTris/sec – 25FPS
  • F3: 205MTris/sec – 27FPS
  • F4: 213MTris/sec – 28FPS
  • F5: 205MTris/sec – 27FPS
  • F6: 152MTris/sec – 20FPS

ATI Radeon HD 3870 – Catalyst 8.2 XP32

  • F1: 251MTris/sec – 33FPS
  • F2: 236MTris/sec – 31FPS
  • F3: 297MTris/sec – 39FPS
  • F4: 251MTris/sec – 33FPS
  • F5: mode not available
  • F6: mode not available

2 million triangles – 200 tri/instance

NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32

  • F1: 47MTris/sec – 25FPS
  • F2: 47MTris/sec – 25FPS
  • F3: 57MTris/sec – 30FPS
  • F4: 131MTris/sec – 69FPS
  • F5: 167MTris/sec – 88FPS
  • F6: 148MTris/sec – 78FPS

ATI Radeon HD 3870 – Catalyst 8.2 XP32

  • F1: 47MTris/sec – 25FPS
  • F2: 59MTris/sec – 31FPS
  • F3: 74MTris/sec – 39FPS
  • F4: 112MTris/sec – 59FPS
  • F5: mode not available
  • F6: mode not available

720,000 triangles – 72 tri/instance

NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32

  • F1: 17MTris/sec – 25FPS
  • F2: 17MTris/sec – 25FPS
  • F3: 20MTris/sec – 30FPS
  • F4: 47MTris/sec – 69FPS
  • F5: 60MTris/sec – 88FPS
  • F6: 53MTris/sec – 78FPS

ATI Radeon HD 3870 – Catalyst 8.2 XP32

  • F1: 17MTris/sec – 25FPS
  • F2: 21MTris/sec – 31FPS
  • F3: 26MTris/sec – 39FPS
  • F4: 40MTris/sec – 59FPS
  • F5: mode not available
  • F6: mode not available

180,000 triangles – 18 tri/instance

NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32

  • F1: 4MTris/sec – 25FPS
  • F2: 4MTris/sec – 25FPS
  • F3: 5MTris/sec – 30FPS
  • F4: 11MTris/sec – 69FPS
  • F5: 15MTris/sec – 89FPS
  • F6: 13MTris/sec – 79FPS

ATI Radeon HD 3870 – Catalyst 8.2 XP32

  • F1: 4MTris/sec – 25FPS
  • F2: 5MTris/sec – 31FPS
  • F3: 6MTris/sec – 39FPS
  • F4: 10MTris/sec – 59FPS
  • F5: mode not available
  • F6: mode not available
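
A side note on units: the MTris/sec figures above appear to be the triangle count per frame multiplied by the FPS, divided by 2^20 (not 10^6) and truncated. That is an inference from the published numbers, not something the demo documents, and the function name below is hypothetical. A small C sketch of that assumed formula:

```c
#include <assert.h>

/* Assumed throughput formula: triangles per second expressed in units
 * of 2^20 triangles, truncated toward zero. Inferred from the results
 * listed in the post, not taken from the demo's source. */
unsigned mtris_per_sec(unsigned long triangles_per_frame, unsigned fps)
{
    unsigned long long per_sec;

    assert(triangles_per_frame > 0);  /* an empty scene has no throughput */
    per_sec = (unsigned long long)triangles_per_frame * fps;
    return (unsigned)(per_sec / (1024ULL * 1024ULL));
}
```

For example, 18,000,000 triangles at 13 FPS gives 223 and 2,000,000 triangles at 88 FPS gives 167, which matches the figures listed for the GeForce 8800 GTX.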

Quick results analysis:

  • we now understand why NVIDIA has called the technique using persistent vertex attributes “Pseudo-Instancing” (key F4). The OpenGL glDrawElements() function is extremely fast, and persistent vertex attributes require less overhead than uniforms to be passed to the vertex shader. Coupled together, they give this performance boost.
  • the benefit of real hardware geometry instancing is mostly visible with few triangles per instance.
  • when there are many triangles per instance (1,800), the hardware implementation of glDrawElements() seems to be about twice as efficient on the RV670 GPU as on the G80.

Conclusion

From these results, hardware geometry instancing isn’t the killer feature I expected. I find that very weird, since the difference between 10,000 render calls with glDrawElements() and 25 render calls with glDrawElementsInstancedEXT() is not very large. It seems the instancing management (the gl_InstanceID variable in the vertex shader) is a GPU-cycle eater! What a pity ATI hasn’t yet implemented geometry instancing in the Catalyst drivers. I’d be very curious to test hardware GI on an RV670.

I just spent two hours debugging for nothing because of a bug in ForceWare 163.75. One of my routines that uses the functions of the GL_ARB_occlusion_query extension crashed as soon as the number of queries got a bit large (say, more than 10,000 queries). I figured it had to be an overfull buffer somewhere, or too many pending queries (the debugger was stopping inside NVIDIA’s OpenGL library). So I tinkered a bit and noticed that calling glGetError() let my code work, which suggests a latency / parallelism problem with the OpenGL commands in the ForceWare driver.

Then suddenly, seeing the new ForceWare 169.38 driver show up in my RSS reader, I said to myself: come on, I’ll close my fifty windows (I don’t know about you, but on my machine each instance of Visual Studio 2005 takes ages to close, compared to Visual C++ 6.0, which closed almost instantly) and install this new driver. Yes! My occlusion query routines started working perfectly again. So I racked my brain for nothing over a bug in ForceWare 163.75, or better, over a bug that is no longer present in ForceWare 169.38. Conclusion: update your drivers!

I just had a bit of fun with Lumina. It’s a small environment for developing GLSL shaders. The project is just starting out and there are still quite a few small glitches lying around (try loading several projects one after another, or more simply load the test project deferred3.lum: the GUI doesn’t like it much!), but the concept is good. Looking more closely, it strongly resembles a GUI placed on top of a tool like Demoniak3D. There are scripts (written in an ECMA-based language) to set up and control the elements of the 3D scene. There are also the GLSL shaders (vertex, pixel and geometry). If you examine a project file, you find a structure similar to a Demoniak3D demo: an XML script, <script> and <shader> nodes, etc.

Now that the tour is done, here is my first, ultra-simple test project: display a spinning yellow torus, using a vertex shader and a pixel shader for rendering. I was able to code this project quickly after a brief look at the *.lum project files.

The project can be downloaded here: lumina_jegx_test_01.zip

Overall it’s nice, but the value of the GUI is debatable. In this kind of software (Lumina or Demoniak3D), either the GUI is high-level and simple to use, or you are better off without it. I’m still going to study how Lumina works in more detail, if only to improve Demoniak3D and its successor…

In the same spirit as Lumina, there are also FX Composer (NVIDIA) and RenderMonkey (ATI).

I found this bug while coding a new small soft shadows demo for GPU Caps Viewer. Soft shadows are built on shadow mapping, and my OpenGL shadow mapping code works perfectly on all GeForce 6/7/8 and Radeon 1K cards but not on Radeon 2K (2400/2600/2900). Why? Because the shadow mapping comparison function had a serious bug! In short, the comparison function is supposed to return a boolean value (0 if the fragment is in shadow, 1 otherwise), but before Catalyst 7.9 it returned, on Radeon 2K, the depth buffer value (as if the comparison were disabled). This bug is now just a memory, since Catalyst 7.9 has fixed it.
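
To illustrate the difference between the expected and the buggy behaviour, here is a small CPU model in C. It is only a sketch: the function names are hypothetical, it is not driver or demo code, and it assumes the comparison function is GL_LEQUAL with GL_TEXTURE_COMPARE_MODE set to GL_COMPARE_R_TO_TEXTURE, which the post does not state.

```c
#include <assert.h>

/* Expected result of a depth-texture lookup with comparison enabled
 * (assumed GL_LEQUAL): 1.0 when the fragment is lit, 0.0 when it is
 * in shadow. */
float shadow_compare_lequal(float frag_depth, float stored_depth)
{
    assert(stored_depth >= 0.0f && stored_depth <= 1.0f);
    return (frag_depth <= stored_depth) ? 1.0f : 0.0f;
}

/* Behaviour described for Radeon 2K before Catalyst 7.9: the raw depth
 * value comes back, as if the comparison were disabled. */
float shadow_lookup_without_compare(float frag_depth, float stored_depth)
{
    (void)frag_depth;  /* the fragment depth is ignored in this mode */
    return stored_depth;
}
```

A shader that multiplies its lighting by the lookup result gets a clean 0 or 1 in the first case, but an arbitrary depth value in the second, which is why the shadows looked wrong.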

I guess we can thank Quake Wars, an OpenGL game that was released a few days ago. For this game (which is really nice), ATI has fixed all major OpenGL bugs.

The new OpenGL header files contain the new extensions stuff. You can download them from… just a second, I start GPU Caps Viewer and… okay, I got it: from developer.nvidia.com/object/nvidia_opengl_specs.html.

But there are a couple of weird things:

1 – the glext.h version is 28 (#define GL_GLEXT_VERSION 28). The version I use to compile the oZone3D engine renderer is 29, and I have been using that header for more than a year…

2 – the glext.h header does not compile with vc6 (yes, I still use Visual Studio 6!) because of the GL_EXT_timer_query extension. Here is the original piece of code you can find in glext.h:

/*
* Original code - does not compile with vc6.
*/
#ifndef GL_EXT_timer_query
typedef signed long long GLint64EXT;
typedef unsigned long long GLuint64EXT;
#endif

and here is the code I updated for Visual C++ 6:

/*
* Modified code for the oZone3D engine - compiles with vc6.
*/
#ifndef GL_EXT_timer_query
	#ifdef _WIN32
		typedef signed __int64 GLint64EXT;
		typedef unsigned __int64 GLuint64EXT;
	#else
		typedef signed long long GLint64EXT;
		typedef unsigned long long GLuint64EXT;
	#endif
#endif

I wonder if the original glext.h compiles with vc7 or vc8. If anyone has the answer, feel free to contact me…
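
As a possible variation on the fix above, the sketch below keys the choice on the compiler (_MSC_VER) instead of the platform (_WIN32), and checks the resulting types. It deliberately uses hypothetical demo_ names so it does not clash with the real GLint64EXT / GLuint64EXT typedefs, and it is only an illustration, not the code used in the oZone3D engine.

```c
#include <assert.h>
#include <limits.h>

#if defined(_MSC_VER)
/* Every Microsoft compiler, including vc6, understands __int64. */
typedef signed __int64 demo_int64;
typedef unsigned __int64 demo_uint64;
#else
/* Other compilers are assumed to support long long. */
typedef signed long long demo_int64;
typedef unsigned long long demo_uint64;
#endif

/* Checks that both typedefs are 64 bits wide with the expected signedness. */
void demo_check_64_bit_types(void)
{
    assert(sizeof(demo_int64) * CHAR_BIT == 64);
    assert(sizeof(demo_uint64) * CHAR_BIT == 64);
    assert((demo_int64)-1 < 0);   /* signed type can hold negatives */
    assert((demo_uint64)-1 > 0);  /* unsigned type wraps to a large value */
}
```

Calling demo_check_64_bit_types() once at start-up is a cheap way to catch a wrong typedef on a new compiler.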

GPU Caps Viewer is the new tool I have been working on these last few days. It’s the successor of HardwareInfos. GPU Caps Viewer is based on the v3.x branch of the oZone3D engine (while HardwareInfos is a tool based on the oZone3D v2.x branch). In addition to the classic GPU/CPU information and capabilities, GPU Caps Viewer offers two cool features:

- an OpenGL Extensions database. You can see either the extensions supported by the current graphics card or all existing extensions, no matter which graphics board you have. You can quickly select an extension and jump directly to its webpage (SGI or NVIDIA extension specs). I must confess it’s very useful for me.

- a GPU-Burner… that was the hard part to code in GPU Caps Viewer. The GPU-Burner lets you open several 3D windows. Actually you can open as many 3D views as you want (1, 2, 4, 6, 10, 20, …). Each view renders a GLSL toon-shaded object with vsync disabled. You can set the size of each window individually (default size is 400×400). Each 3D view is rendered in its own thread… I’ll let you imagine how hard it is to debug a multithreaded gfx application. And because I’m only human, there are always some bugs in my code. But there is a very cool tool that helped me tame the mad threads: Process Explorer. You can download it here: www.majorgeeks.com/Process_Explorer_d4566.html.

Here is a screenshot of my desktop with 13 instances of the 3D view running at the same time. I will release GPU Caps Viewer very, very soon. So stay tuned!

NVIDIA has finally released the specs of the new OpenGL extensions that come with the GeForce 8800. Great news!

These specs are very important for us, poor graphics developers, in order to update our software with the latest cool features. Among these specs, there is GL_EXT_draw_instanced, which enables geometry instancing. Another extension is WGL_NV_gpu_affinity, which lets you send the gfx calls to a particular GPU in a multi-GPU system. It should be cool to see how a 7950GX2 behaves. The GL_EXT_timer_query extension provides a nanosecond-resolution timer to determine the amount of time it takes to fully complete a set of OpenGL gfx calls. There are still so many cool extensions. As soon as I get an 8800 board, I’ll write a little tutorial to cover these cool extensions.

I’ve just finished implementing soft shadows in the new oZone3D Engine. And I must say that soft shadows bring a huge amount of realism and credibility to 3D scenes. See for yourself:

The oZone3D tech demo is available here: Soft Shadows Demo

You can consider this demo as a little benchmark. Just start the oZone3D_SoftShadows_Benchmark.exe and look at the FPS in the title bar. With my current devstation I got the following score:

PC1: AMD64 3500+ / 1024M DDR400 / Radeon X700Pro 128M Catalyst 6.6 / WinXP SP2: 5 FPS

Soft shadows are very GPU-intensive, but they are the future of 3D! So to make the most of soft shadows, upgrade your graphics card!