Posts Tagged ‘geforce’
This demo uses instancing techniques (simple instancing, pseudo-instancing and geometry instancing, or GI) to render a ring made of 10,000 small spheres. The demo comes in 5 versions:
- each sphere is made of 1,800 triangles (18 million triangles for the whole ring)
- each sphere is made of 800 triangles (8 million triangles for the whole ring)
- each sphere is made of 200 triangles (2 million triangles for the whole ring)
- each sphere is made of 72 triangles (720,000 triangles for the whole ring)
- each sphere is made of 18 triangles (180,000 triangles for the whole ring)
At the last moment I added a bonus: a 20,000-instance version, each instance made of 5,000 triangles. That gives the monstrous count of 100 million triangles (file Demo_Instancing_100MTriangles_20kInstances.exe).



DOWNLOAD
Several instancing techniques are used and you can select them with the F1 to F6 keys.
- F1: simple instancing with camera frustum culling: there is one source for geometry (a mesh) and it’s rendered once per instance. Both the transformation matrix calculation and the camera frustum test are done on the CPU. OpenGL rendering uses the glDrawElements() function.
- F2: simple instancing without camera frustum culling: there is one source for geometry (a mesh) and it’s rendered once per instance. The transformation matrix calculation is done on the CPU but there is no longer a camera frustum test. OpenGL rendering uses the glDrawElements() function.
- F3: slow pseudo-instancing: there is one source for geometry (a mesh) and it’s rendered once per instance. Now the transformation matrix calculation is done on the GPU and per-instance data is passed via uniform variables. There is no camera frustum test. OpenGL rendering uses the glDrawElements() function.
- F4: pseudo-instancing: there is one source for geometry (a mesh) and it’s rendered once per instance. The transformation matrix calculation is done on the GPU and per-instance data is passed via persistent vertex attributes (like texture coordinates or color). This technique was presented by NVIDIA in the following whitepaper: GLSL Pseudo-Instancing. There is no camera frustum test. OpenGL rendering uses the glDrawElements() function.
- F5: geometry instancing: this is real hardware instancing. There is one source for geometry (a mesh) and rendering is done in batches of 400 instances per draw-call. Rendering the whole ring requires 25 draw-calls instead of 10,000. The transformation matrix calculation is done on the GPU and per-batch data is passed via uniform arrays. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. Currently, only the NVIDIA GeForce 8 (and higher) supports this function.
- F6: geometry instancing with persistent vertex attributes: this is hardware instancing with the parameters passed via persistent vertex attributes. But the number of persistent vertex attributes is very limited: the best I could do is render 4 instances per draw-call. Oddly, I got the best results with 2 instances per draw-call, in which case rendering the whole ring requires 5,000 draw-calls. The transformation matrix calculation is done on the GPU and per-batch data is passed via uniform arrays. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. Currently, only the NVIDIA GeForce 8 (and higher) supports this function.
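The draw-call counts quoted for F5 and F6 come from a simple ceiling division of the instance count by the batch size. Here is a minimal C++ sketch of that batching arithmetic. The names Batch, drawCallCount and makeBatches are made up for illustration and are not taken from the demo's source; in a real renderer, each batch would upload its per-instance data and then issue one instanced draw call with the batch's count as the instance count.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One draw call covering `count` consecutive instances starting at `first`.
// (Hypothetical helper type, not part of the demo.)
struct Batch {
    std::size_t first;
    std::size_t count;
};

// Number of draw calls needed to render `instanceCount` instances when each
// call can carry at most `batchSize` instances (ceiling division).
std::size_t drawCallCount(std::size_t instanceCount, std::size_t batchSize) {
    assert(batchSize > 0);
    return (instanceCount + batchSize - 1) / batchSize;
}

// Split the instances into consecutive batches. The last batch may be
// smaller when the instance count is not a multiple of the batch size.
std::vector<Batch> makeBatches(std::size_t instanceCount, std::size_t batchSize) {
    assert(batchSize > 0);
    std::vector<Batch> batches;
    batches.reserve(drawCallCount(instanceCount, batchSize));
    for (std::size_t first = 0; first < instanceCount; first += batchSize) {
        batches.push_back({first, std::min(batchSize, instanceCount - first)});
    }
    return batches;
}
```

With 10,000 instances, drawCallCount(10000, 400) gives the 25 draw-calls of F5 and drawCallCount(10000, 2) gives the 5,000 draw-calls of F6.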
OK, now let’s see some results with an NVIDIA GeForce 8800 GTX and an ATI Radeon HD 3870. Both cards were tested with an AMD 64 3800+.
18 million triangles – 1,800 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 223MTris/sec – 13FPS
- F2: 223MTris/sec – 13FPS
- F3: 223MTris/sec – 13FPS
- F4: 223MTris/sec – 13FPS
- F5: 223MTris/sec – 13FPS
- F6: 171MTris/sec – 10FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 429MTris/sec – 25FPS
- F2: 463MTris/sec – 27FPS
- F3: 446MTris/sec – 26FPS
- F4: 274MTris/sec – 16FPS
- F5: mode not available
- F6: mode not available
8 million triangles – 800 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 190MTris/sec – 25FPS
- F2: 190MTris/sec – 25FPS
- F3: 205MTris/sec – 27FPS
- F4: 213MTris/sec – 28FPS
- F5: 205MTris/sec – 27FPS
- F6: 152MTris/sec – 20FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 251MTris/sec – 33FPS
- F2: 236MTris/sec – 31FPS
- F3: 297MTris/sec – 39FPS
- F4: 251MTris/sec – 33FPS
- F5: mode not available
- F6: mode not available
2 million triangles – 200 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 47MTris/sec – 25FPS
- F2: 47MTris/sec – 25FPS
- F3: 57MTris/sec – 30FPS
- F4: 131MTris/sec – 69FPS
- F5: 167MTris/sec – 88FPS
- F6: 148MTris/sec – 78FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 47MTris/sec – 25FPS
- F2: 59MTris/sec – 31FPS
- F3: 74MTris/sec – 39FPS
- F4: 112MTris/sec – 59FPS
- F5: mode not available
- F6: mode not available
720,000 triangles – 72 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 17MTris/sec – 25FPS
- F2: 17MTris/sec – 25FPS
- F3: 20MTris/sec – 30FPS
- F4: 47MTris/sec – 69FPS
- F5: 60MTris/sec – 88FPS
- F6: 53MTris/sec – 78FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 17MTris/sec – 25FPS
- F2: 21MTris/sec – 31FPS
- F3: 26MTris/sec – 39FPS
- F4: 40MTris/sec – 59FPS
- F5: mode not available
- F6: mode not available
180,000 triangles – 18 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 4MTris/sec – 25FPS
- F2: 4MTris/sec – 25FPS
- F3: 5MTris/sec – 30FPS
- F4: 11MTris/sec – 69FPS
- F5: 15MTris/sec – 89FPS
- F6: 13MTris/sec – 79FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 4MTris/sec – 25FPS
- F2: 5MTris/sec – 31FPS
- F3: 6MTris/sec – 39FPS
- F4: 10MTris/sec – 59FPS
- F5: mode not available
- F6: mode not available
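The MTris/sec figures in these lists are roughly the number of triangles per frame multiplied by the frame rate. The following C++ sketch shows that relationship. The function names are hypothetical, and the formula is an assumption about how the demo derives its counter, since the source does not spell it out.

```cpp
#include <cstdint>

// Triangles submitted per frame: every instance draws the same mesh.
constexpr std::uint64_t trianglesPerFrame(std::uint64_t instances,
                                          std::uint64_t trianglesPerInstance) {
    return instances * trianglesPerInstance;
}

// Throughput in millions of triangles per second for a given frame rate.
constexpr double mtrisPerSecond(std::uint64_t trianglesInFrame, double fps) {
    return static_cast<double>(trianglesInFrame) * fps / 1.0e6;
}
```

For the first test, trianglesPerFrame(10000, 1800) is 18,000,000 and mtrisPerSecond at 13 FPS is 234, against the 223MTris/sec listed. The listed values are consistently a few percent below this product; the source does not say why.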
Quick results analysis:
- we now understand why NVIDIA called the technique using persistent vertex attributes “Pseudo-Instancing” (F4 key). The OpenGL glDrawElements() function is extremely fast, and persistent vertex attributes have less overhead than uniforms when passed to the vertex shader. Together, they give this performance boost.
- the benefit of real hardware geometry instancing is mostly visible when there are few triangles per instance.
- when there are many triangles per instance (1,800), the hardware implementation of glDrawElements() seems to be more efficient (twice as fast!) on the RV670 GPU than on the G80.
Conclusion
From these results, hardware geometry instancing isn’t the killer feature I expected. I find it very odd that the difference between 10,000 render calls with glDrawElements() and 25 render calls with glDrawElementsInstancedEXT() is not more significant. It seems the instancing management (the gl_InstanceID variable in the vertex shader) is a GPU-cycle eater! What a pity ATI hasn’t yet implemented geometry instancing in the Catalyst drivers. I’d be very curious to test hardware GI on an RV670.
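One way to read the low-triangle-count results is to convert the frame rates into an average time per instance. The C++ sketch below does that conversion. It assumes the whole frame time is spent submitting and rendering the 10,000 instances, which the measurements above do not verify, and the function names are hypothetical.

```cpp
// Average time per frame in microseconds for a given frame rate.
constexpr double frameTimeMicros(double fps) {
    return 1.0e6 / fps;
}

// Average share of the frame time attributed to each rendered instance,
// in microseconds (assumes the frame does nothing but draw instances).
constexpr double microsPerInstance(double fps, double instances) {
    return frameTimeMicros(fps) / instances;
}
```

On the GeForce 8800 GTX at 200 triangles per instance, F2 (25FPS) works out to 4 microseconds per instance, F4 (69FPS) to about 1.45 and F5 (88FPS) to about 1.14. Seen this way, moving from glDrawElements() with CPU-side matrices to pseudo-instancing saves far more than the further step to hardware instancing.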
ExtremeTech has reviewed Dell’s new gem, the XPS M1730:

This is a beast: with a Core 2 Extreme X9000 mobile CPU running at 2.8GHz, a pair of GeForce 8800M GTX cards in SLI (the dream for all graphics geeks!), and two 200GB 7200 RPM drives in a RAID 0 configuration, it’s no surprise that this $4500 monster is fast.
Dell’s XPS series is a really good product line (I’m the lucky owner of the previous generation, the XPS M1710, with a GeForce 7900GTX/512M), so if you can afford it, buy it!
Here is a summary of the following article: GPGPU: far more important than you think. Think 10x a CPU’s performance.

This article discusses the use of GPUs for non-graphical purposes, or GPGPU (General-Purpose computation on Graphics Processing Units). The author mostly talks about the ATI Radeon HD 3870 X2 and the FireStream 9170 Stream Processor, but don’t forget that NVIDIA has the same kind of products with the GeForce 8 and Tesla. ATI’s Radeon HD 3870 X2 can hit around 1 TFLOPS (one trillion floating-point operations per second); in contrast, a high-end quad-core CPU can push out around 60 GFLOPS, or roughly one-sixteenth of that floating-point power.
To program the GPU for non-graphical work, you can use the FireStream SDK (Software Development Kit) for ATI cards, which gives the developer low-level access to the workings of the GPU. With NVIDIA boards, you can use CUDA to perform equivalent tasks.
And the final sentence:
“So the next time you look at the Radeon HD 3870 or NVIDIA GeForce 8800 GT in your machine, remember that it’s more than just a graphics card; it’s a floating-point monster that will increasingly be used for non-graphical tasks.”
Images of the new 3DMark are starting to pop up all over the forums, such as this one: tgfc.qwd1.com.


we discover the author of this image, who used classic (but advanced!) offline techniques to render it. True, the 3D artist made this image in 2006, but if 3DMark06 renders this scene in real time, then hats off!
FudZilla has just published a news item saying that some of the images of the new 3DMark08 are fakes: Fake 3DMark08 screen shots posted. Some are real and others are fake. But which ones?
The site in question is pcpop.com. One way among others to drive traffic to a site!
I think we will have to wait a bit longer before getting a good idea of what 3DMark08 really looks like.
To be continued!
It’s nice to come back to code!
I’m currently working on a new and simple framework for my OpenGL experiments before implementing the algorithms in the oZone3D Engine. RaptorGL is a little too heavy for simple tests, so I’m dropping it for the moment. This new framework, which I called XPGL (eXPerimental Graphics Library), lets me quickly test the new algorithms I’m working on. Every time I have to code a small but fully operational 3D demo in C++/OpenGL, I spend a lot of time for a small result. In those moments, I tell myself that Hyperion is a very cool tool.
Okay, let’s look at a weird behaviour of Radeon GPUs. At the moment, my graphics controller is a Radeon X700. With the latest Catalyst drivers (6.6), this board should be an OpenGL 2.0 compliant graphics card, and a quick check of GL_VERSION confirms that the X700 is GL2 compliant. The X700 should therefore handle non-power-of-two (NPOT) textures, since this feature is part of the OpenGL 2.0 core. But the GL_ARB_texture_non_power_of_two string is not found in GL_EXTENSIONS. Maybe ATI does not list the extensions that are part of the core. Anyway, I loaded a 600×445 NPOT texture on a mesh plane and the X700 seems to support it, but at a ridiculous 1 FPS… Software codepath? I think so! So I loaded the same texture with power-of-two dimensions (512×512) and the frame rate became decent again. With my GeForce 6600 GT (Forceware 91.31) I never noticed this effect/bug because the GL2 support is better and NVIDIA GPUs correctly support non-power-of-two textures. You can download the demo with the NPOT and POT textures (the one mapped onto the mesh plane) below and do the test for yourself. Feel free to send me feedback if you wish.
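The check described here (GL_VERSION says 2.0, but the extension name is missing from GL_EXTENSIONS) can be sketched in C++ as follows. This is only an illustration: the helper names are hypothetical, and the two strings are assumed to have been obtained beforehand with glGetString(GL_VERSION) and glGetString(GL_EXTENSIONS) in a live OpenGL context. The extension list is matched token by token, because a plain substring search can match a longer extension name by accident.

```cpp
#include <sstream>
#include <string>

// Returns true if `name` appears as a whole token in the space-separated
// extension list `extensions`.
bool hasExtension(const std::string& extensions, const std::string& name) {
    if (name.empty()) return false;
    std::istringstream stream(extensions);
    std::string token;
    while (stream >> token) {
        if (token == name) return true;
    }
    return false;
}

// Extracts the major version from a GL_VERSION-style string such as
// "2.0.1234 vendor text". Returns 0 if no leading number is found.
int glMajorVersion(const std::string& version) {
    std::istringstream stream(version);
    int major = 0;
    return (stream >> major) ? major : 0;
}

// NPOT textures are advertised either through the core version (2.0 and
// up) or through the ARB extension on older versions.
bool npotAdvertised(const std::string& version, const std::string& extensions) {
    return glMajorVersion(version) >= 2 ||
           hasExtension(extensions, "GL_ARB_texture_non_power_of_two");
}
```

As the X700 test shows, a driver advertising the feature says nothing about whether it runs in hardware, so a frame-rate check with a real NPOT texture is still worth doing.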

But keep in mind that graphics hardware is optimized for POT textures. Try to use POT textures to maximize your chances of seeing your demo run everywhere.
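To pick POT dimensions for an arbitrary image, you can round each dimension up (and pad) or down (and rescale). Here is a small C++ sketch of both roundings; the function names are made up for illustration and the actual resizing of the pixels is left out.

```cpp
#include <cassert>
#include <cstdint>

// True when `n` is a power of two (1, 2, 4, 8, ...).
constexpr bool isPowerOfTwo(std::uint32_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Smallest power of two that is greater than or equal to `n`.
std::uint32_t nextPowerOfTwo(std::uint32_t n) {
    assert(n <= 0x80000000u);  // larger values have no 32-bit answer
    std::uint32_t p = 1;
    while (p < n) p <<= 1;
    return p;
}

// Largest power of two that is less than or equal to `n`.
std::uint32_t previousPowerOfTwo(std::uint32_t n) {
    assert(n > 0);
    std::uint32_t p = 1;
    while (p <= n / 2) p <<= 1;
    return p;
}
```

For the 600×445 texture above, rounding up gives 1024×512 and rounding down gives 512×256. The 512×512 version used in the test is another valid choice, since any power-of-two width and height will do.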
