Posts Tagged ‘radeon’
A graphics card with a very nice-looking GPU cooler. Unfortunately, this card will be available in Korea only…


- Source
This demo uses instancing techniques (simple instancing, pseudo-instancing and geometry instancing, or GI) to render a ring made of 10,000 small spheres. The demo comes in five versions:
- each sphere is made of 1,800 triangles (18 million triangles for the whole ring)
- each sphere is made of 800 triangles (8 million triangles for the whole ring)
- each sphere is made of 200 triangles (2 million triangles for the whole ring)
- each sphere is made of 72 triangles (720,000 triangles for the whole ring)
- each sphere is made of 18 triangles (180,000 triangles for the whole ring)
At the last minute I added a bonus: a 20,000-instance version with 5,000 triangles per instance. That gives a monstrous count of 100 million triangles (file Demo_Instancing_100MTriangles_20kInstances.exe).
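The triangle counts above are simply instances × triangles per instance. A minimal Python sketch of that arithmetic (the helper name is ours, not something from the demo):

```python
# Triangles submitted per frame = instances x triangles per instance.
# The version list mirrors the post; the helper is only an illustration.

def ring_triangles(instances: int, tris_per_instance: int) -> int:
    """Total triangles drawn per frame when every instance reuses one mesh."""
    return instances * tris_per_instance


VERSIONS = [
    (10_000, 1_800),
    (10_000, 800),
    (10_000, 200),
    (10_000, 72),
    (10_000, 18),
    (20_000, 5_000),  # bonus version
]

if __name__ == "__main__":
    for instances, tris in VERSIONS:
        total = ring_triangles(instances, tris)
        print(f"{instances:>6,} x {tris:>5,} = {total:>11,} triangles")
```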



DOWNLOAD
Several instancing techniques are used, and you can switch between them with the F1 to F6 keys.
- F1: simple instancing with camera frustum culling: there is one geometry source (a mesh), and it's rendered once per instance. The transformation matrices and the camera frustum test are computed on the CPU. OpenGL rendering uses the glDrawElements() function.
- F2: simple instancing without camera frustum culling: there is one geometry source (a mesh), and it's rendered once per instance. The transformation matrices are computed on the CPU, but there is no longer a camera frustum test. OpenGL rendering uses the glDrawElements() function.
- F3: slow pseudo-instancing: there is one geometry source (a mesh), and it's rendered once per instance. The transformation matrix calculation is now done on the GPU, and per-instance data is passed via uniform variables. There is no camera frustum test. OpenGL rendering uses the glDrawElements() function.
- F4: pseudo-instancing: there is one geometry source (a mesh), and it's rendered once per instance. The transformation matrix calculation is done on the GPU, and per-instance data is passed via persistent vertex attributes (such as texture coordinates or color). NVIDIA presented this technique in the whitepaper GLSL Pseudo-Instancing. There is no camera frustum test. OpenGL rendering uses the glDrawElements() function.
- F5: geometry instancing: this is true hardware instancing. There is one geometry source (a mesh), and rendering is done in batches of 400 instances per draw call, so the whole ring requires 25 draw calls instead of 10,000. The transformation matrix calculation is done on the GPU, and per-batch data is passed via uniform arrays. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. Currently, only NVIDIA GeForce 8 (and higher) cards support this function.
- F6: geometry instancing with persistent vertex attributes: hardware instancing combined with passing the parameters via persistent vertex attributes. But the number of persistent vertex attributes is very limited: the best I managed was 4 instances per draw call. Oddly, I got the best results with 2 instances per draw call, in which case rendering the whole ring requires 5,000 draw calls. The transformation matrix calculation is done on the GPU, and per-batch data is passed via persistent vertex attributes. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. Currently, only NVIDIA GeForce 8 (and higher) cards support this function.
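To make the draw-call arithmetic of F1 to F6 concrete, here is a minimal Python sketch. It is not the demo's code: the ring layout (`ring_positions`), the batching helper and the sphere-versus-frustum test are hypothetical illustrations of what the CPU side of F1 (culling) and F5/F6 (batching) could look like, assuming frustum planes stored as normalized (a, b, c, d) tuples with normals pointing into the frustum.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Plane = Tuple[float, float, float, float]  # (a, b, c, d), normalized, normal pointing inward


def ring_positions(count: int, radius: float) -> List[Vec3]:
    """Hypothetical layout: `count` spheres evenly spaced on a circle in the XZ plane."""
    step = 2.0 * math.pi / count
    return [(radius * math.cos(i * step), 0.0, radius * math.sin(i * step))
            for i in range(count)]


def sphere_in_frustum(center: Vec3, radius: float, planes: Sequence[Plane]) -> bool:
    """F1-style CPU culling: reject the sphere if it lies entirely behind any plane."""
    x, y, z = center
    for a, b, c, d in planes:
        if a * x + b * y + c * z + d < -radius:
            return False
    return True


def make_batches(items: Sequence, batch_size: int) -> List[Sequence]:
    """Split per-instance data into batches, one batch per instanced draw call
    (in F5 each batch would be uploaded as a uniform array)."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def draw_calls(instances: int, instances_per_call: int) -> int:
    """Draw calls needed to render `instances` objects, `instances_per_call` at a time."""
    return math.ceil(instances / instances_per_call)


if __name__ == "__main__":
    ring = ring_positions(10_000, 50.0)
    print("F1-F4:", draw_calls(len(ring), 1))    # one glDrawElements() per sphere
    print("F5:   ", draw_calls(len(ring), 400))  # 400 instances per call
    print("F6:   ", draw_calls(len(ring), 2))    # 2 instances per call
    print("batches:", len(make_batches(ring, 400)))
```

With 10,000 instances, `draw_calls` gives 10,000 calls for F1 to F4, 25 for F5 (400 per batch) and 5,000 for F6 (2 per batch), matching the numbers above.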
OK, now let's look at some results with an NVIDIA GeForce 8800 GTX and an ATI Radeon HD 3870. Both cards were tested with an AMD 64 3800+.
18 million triangles – 1,800 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 223MTris/sec – 13FPS
- F2: 223MTris/sec – 13FPS
- F3: 223MTris/sec – 13FPS
- F4: 223MTris/sec – 13FPS
- F5: 223MTris/sec – 13FPS
- F6: 171MTris/sec – 10FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 429MTris/sec – 25FPS
- F2: 463MTris/sec – 27FPS
- F3: 446MTris/sec – 26FPS
- F4: 274MTris/sec – 16FPS
- F5: mode not available
- F6: mode not available
8 million triangles – 800 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 190MTris/sec – 25FPS
- F2: 190MTris/sec – 25FPS
- F3: 205MTris/sec – 27FPS
- F4: 213MTris/sec – 28FPS
- F5: 205MTris/sec – 27FPS
- F6: 152MTris/sec – 20FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 251MTris/sec – 33FPS
- F2: 236MTris/sec – 31FPS
- F3: 297MTris/sec – 39FPS
- F4: 251MTris/sec – 33FPS
- F5: mode not available
- F6: mode not available
2 million triangles – 200 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 47MTris/sec – 25FPS
- F2: 47MTris/sec – 25FPS
- F3: 57MTris/sec – 30FPS
- F4: 131MTris/sec – 69FPS
- F5: 167MTris/sec – 88FPS
- F6: 148MTris/sec – 78FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 47MTris/sec – 25FPS
- F2: 59MTris/sec – 31FPS
- F3: 74MTris/sec – 39FPS
- F4: 112MTris/sec – 59FPS
- F5: mode not available
- F6: mode not available
720,000 triangles – 72 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 17MTris/sec – 25FPS
- F2: 17MTris/sec – 25FPS
- F3: 20MTris/sec – 30FPS
- F4: 47MTris/sec – 69FPS
- F5: 60MTris/sec – 88FPS
- F6: 53MTris/sec – 78FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 17MTris/sec – 25FPS
- F2: 21MTris/sec – 31FPS
- F3: 26MTris/sec – 39FPS
- F4: 40MTris/sec – 59FPS
- F5: mode not available
- F6: mode not available
180,000 triangles – 18 tri/instance
NVIDIA GeForce 8800 GTX – Forceware 169.38 XP32
- F1: 4MTris/sec – 25FPS
- F2: 4MTris/sec – 25FPS
- F3: 5MTris/sec – 30FPS
- F4: 11MTris/sec – 69FPS
- F5: 15MTris/sec – 89FPS
- F6: 13MTris/sec – 79FPS
ATI Radeon HD 3870 – Catalyst 8.2 XP32
- F1: 4MTris/sec – 25FPS
- F2: 5MTris/sec – 31FPS
- F3: 6MTris/sec – 39FPS
- F4: 10MTris/sec – 59FPS
- F5: mode not available
- F6: mode not available
Quick results analysis:
- We now understand why NVIDIA called the technique that uses persistent vertex attributes “Pseudo-Instancing” (key F4): the OpenGL glDrawElements() function is extremely fast, and persistent vertex attributes carry less overhead than uniforms when passing data to the vertex shader. Combined, the two give this performance boost.
- The benefit of real hardware geometry instancing is mostly visible with few triangles per instance.
- With many triangles per instance (1,800), the hardware implementation of glDrawElements() seems to be more efficient on the RV670 GPU than on the G80: almost twice as fast (429 vs. 223 MTris/sec in F1 mode).
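The comparisons above are plain ratios of the measured throughputs, and throughput itself is roughly triangles per frame × FPS. A minimal Python sketch of both calculations, using figures from the tables (the helper names are ours). Note that multiplying the nominal triangle count by the displayed FPS gives a few percent more than the listed MTris/sec (234 vs. 223 in the first table), presumably because the demo measures throughput and FPS slightly differently; the sketch only shows the relationship.

```python
def mtris_per_sec(triangles_per_frame: int, fps: float) -> float:
    """Throughput in millions of triangles per second."""
    return triangles_per_frame * fps / 1e6


def speedup(new: float, old: float) -> float:
    """How many times faster `new` is than `old` (same unit)."""
    return new / old


if __name__ == "__main__":
    # 1,800 tri/instance, F1: Radeon HD 3870 vs GeForce 8800 GTX
    print(f"RV670 vs G80, F1: {speedup(429, 223):.2f}x")
    # 200 tri/instance on the GeForce 8800 GTX: hardware instancing (F5) vs F1
    print(f"G80, F5 vs F1:   {speedup(167, 47):.2f}x")
    # Rough cross-check: nominal triangles x displayed FPS
    print(f"10,000 x 1,800 at 13 FPS: {mtris_per_sec(18_000_000, 13):.0f} MTris/sec")
```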
Conclusion
Judging from the results, hardware geometry instancing isn't the killer feature I expected. I find that very strange, since going from 10,000 draw calls with glDrawElements() to 25 draw calls with glDrawElementsInstancedEXT() brings only a modest gain. It seems that instancing management (the gl_InstanceID variable in the vertex shader) eats GPU cycles! What a pity ATI hasn't yet implemented geometry instancing in the Catalyst drivers: I'd be very curious to test hardware GI on an RV670.
Here is a summary of the article “GPGPU: far more important than you think. Think 10x a CPU’s performance.”

This article discusses the use of GPUs for non-graphical purposes, or GPGPU (General-Purpose computation on Graphics Processing Units). The author mainly talks about the ATI Radeon HD 3870 X2 and the FireStream 9170 Stream Processor, but don't forget that NVIDIA offers the same kind of products with the GeForce 8 and Tesla. ATI's Radeon HD 3870 X2 can hit around 1 TFLOPS (one trillion floating-point operations per second); in contrast, a high-end quad-core CPU can push out around 60 GFLOPS, or roughly one-sixteenth of that floating-point power.
To program the GPU for non-graphical work, you can use the FireStream SDK (Software Development Kit) for ATI cards, which gives the developer low-level access to the workings of the GPU. With NVIDIA boards, you can use CUDA to perform equivalent tasks.
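Peak figures like "around 1 TFLOPS" are usually derived as ALU count × clock × FLOPs per ALU per clock. A minimal Python sketch, assuming (our assumption, not stated in the article) that the HD 3870 X2 has two GPUs with 320 stream processors each, clocked at 825 MHz, and that each stream processor retires one multiply-add (counted as 2 FLOPs) per clock:

```python
def peak_gflops(alus: int, clock_mhz: float, flops_per_alu_per_clock: int = 2) -> float:
    """Theoretical peak = ALUs x clock x FLOPs per ALU per clock (a multiply-add counts as 2)."""
    return alus * clock_mhz * flops_per_alu_per_clock / 1000.0


if __name__ == "__main__":
    # Assumed HD 3870 X2 figures: 2 GPUs x 320 stream processors at 825 MHz.
    gpu = peak_gflops(2 * 320, 825)
    cpu = 60.0  # the article's figure for a high-end quad-core CPU
    print(f"GPU peak: {gpu:.0f} GFLOPS, {gpu / cpu:.1f}x the CPU")
```

Under those assumptions the card peaks at about 1,056 GFLOPS, roughly 17.6 times the article's 60 GFLOPS CPU figure, which is consistent with the "one-sixteenth" estimate. Real workloads rarely reach theoretical peak on either side.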
And the final sentence:
“So the next time you look at the Radeon HD 3870 or NVIDIA GeForce 8800 GT in your machine, remember that it’s more than just a graphics card; it’s a floating-point monster that will increasingly be used for non-graphical tasks.”
Sapphire’s 3870 X2 ATOMIC version will hit the US and European markets next month. This card will have 1 GB of GDDR3 memory, dual DVI/TV-out and a water cooler. Nice product.

You can see two kinds of power connectors: 2×3 and 2×4. The first one (2×3) is the regular PCI-Express power connector, and the second one (2×4) is the new PCI-Express 2.0 connector.

More pictures of the Radeon 3870 X2 ATOMIC: Sapphire_Prepares_ATOMIC_3870-X2.
The previous Catalyst drivers (7.11/7.12) had a nasty bug in dynamic light management in GLSL.
I’ve just tested the latest Catalyst 8.1 WHQL with the Demoniak3D demo I had coded, and the bug has been fixed. The release notes make no mention of this bug or its fix. Anyway, the bug is now fixed, and that's what matters.
Here's a chance to brush up on our English: the Radeon HD 3870 X2 on video…
Screenshots of the new 3DMark are starting to pop up all over the forums, such as this one: tgfc.qwd1.com.


We then find the author of this image, who used classic offline (but advanced!) techniques to render it. Granted, the 3D artist made this image in 2006, but if 3DMark08 renders this scene in real time, then hats off!
FudZilla has just published a news item saying that some of the images of the new 3DMark08 are fakes: Fake 3DMark08 screen shots posted. Some are real and others are fake. But which ones?
The site in question is pcpop.com. One way like any other to drive traffic to a site!
I think we'll have to wait a little longer before we get a good idea of what 3DMark08 really looks like.
Stay tuned!

ATI has just released the new Catalyst 7.11 for our beloved Radeons. But it looks like the folks at ATI are making a habit of churning out buggy drivers, especially for new cards! Remember Catalyst 7.9, which finally fixed a major shadow-map bug that only showed up on Radeon 2K cards. Well, now it's the same thing with Cat 7.11: they are buggy on Radeon 3K cards in OpenGL, where it's impossible to use more than one dynamic light in GLSL shaders! That's quite a bug! So far I've only tested under WinXP, so maybe things are better under Vista.
Apart from this bug (there are surely others, but I haven't done enough testing to know), Cat 7.11 is the first driver release to support the Radeon HD 3870. The internal version number of Cat 7.11 is 8.432.0.0.
Cat 7.11 can be downloaded here:
WinXP 32-bit: [DOWNLOAD]
Vista 32-bit: [DOWNLOAD]
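One way to check a driver for the multiple-dynamic-lights bug described above is to compare the shader's output against a CPU reference. A minimal Python sketch, assuming plain Lambert diffuse point lights (not necessarily the lighting model used in the Demoniak3D demo); the function names are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Light = Tuple[Vec3, Vec3]  # (position, RGB color)


def normalize(v: Vec3) -> Vec3:
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def diffuse(normal: Vec3, point: Vec3, lights: Sequence[Light]) -> Vec3:
    """Lambert diffuse term summed over every point light: sum(color * max(N.L, 0))."""
    n = normalize(normal)
    total: List[float] = [0.0, 0.0, 0.0]
    for position, color in lights:
        l = normalize((position[0] - point[0], position[1] - point[1], position[2] - point[2]))
        n_dot_l = max(n[0] * l[0] + n[1] * l[1] + n[2] * l[2], 0.0)
        for i in range(3):
            total[i] += color[i] * n_dot_l
    return (total[0], total[1], total[2])


if __name__ == "__main__":
    lights = [((0.0, 10.0, 0.0), (1.0, 0.0, 0.0)),   # red light above the surface
              ((0.0, 5.0, 0.0), (0.0, 1.0, 0.0))]    # green light above the surface
    up, origin = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)
    print("all lights:", diffuse(up, origin, lights))       # expected result
    print("first only:", diffuse(up, origin, lights[:1]))   # if only one light were applied
```

If a rendered pixel matches the "first only" value rather than the full sum, the extra lights are being dropped.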
I found this bug while coding a new small soft-shadows demo for GPU Caps Viewer. Soft shadows are built on shadow mapping, and my OpenGL shadow mapping code works perfectly on all GeForce 6/7/8 and Radeon 1K cards, but not on Radeon 2K (2400/2600/2900). Why? Because the shadow mapping comparison function had a serious bug! In short, the comparison function is supposed to return a boolean value (0 if the fragment is in shadow, 1 otherwise), but before Catalyst 7.9 it returned, on Radeon 2K, the depth buffer value instead (as if the comparison function were disabled). This bug is now history, since Catalyst 7.9 fixed it.
I guess we have Quake Wars to thank: it's an OpenGL game released a few days ago, and for this (really nice) game, ATI fixed all the major OpenGL bugs.
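The shadow-map comparison bug described above is easier to see with a CPU model of the comparison. A minimal Python sketch, assuming the usual fixed-function setup (GL_TEXTURE_COMPARE_MODE = GL_COMPARE_R_TO_TEXTURE, GL_TEXTURE_COMPARE_FUNC = GL_LEQUAL). The PCF averaging is one common way to soften shadows and is our assumption, not necessarily what the GPU Caps Viewer demo does:

```python
from typing import List


def shadow_compare(r: float, stored_depth: float) -> float:
    """GL_LEQUAL depth comparison: 1.0 (lit) when the fragment's depth r is
    <= the depth stored in the shadow map, 0.0 (in shadow) otherwise."""
    return 1.0 if r <= stored_depth else 0.0


def buggy_compare(r: float, stored_depth: float) -> float:
    """Behaviour described for Radeon 2K before Catalyst 7.9: the stored
    depth comes back, as if the comparison were disabled (r is ignored)."""
    return stored_depth


def pcf(r: float, depth_map: List[List[float]], x: int, y: int, radius: int = 1) -> float:
    """Soft shadow via percentage-closer filtering: average the 0/1 comparison
    results over a (2*radius+1)^2 texel neighbourhood, clamped at the borders."""
    height, width = len(depth_map), len(depth_map[0])
    total, count = 0.0, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            total += shadow_compare(r, depth_map[sy][sx])
            count += 1
    return total / count


if __name__ == "__main__":
    # Left column is an occluder (depth 0.3); the fragment sits at depth 0.5.
    depth_map = [[0.3, 1.0, 1.0],
                 [0.3, 1.0, 1.0],
                 [0.3, 1.0, 1.0]]
    print(pcf(0.5, depth_map, 1, 1))  # 6 of 9 samples lit
```

With the buggy behaviour, every sample returns a raw depth instead of 0 or 1, so the averaged value no longer represents the lit fraction and the soft shadows come out wrong.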
“All of the fetch and filtering capabilities are available to each thread type, making the samplers completely agnostic about what’s using them.”
This line from Beyond3D's article on the R600 means that vertex, geometry and pixel shaders can all access the texture samplers. So Vertex Texture Fetch is now available on the Radeon 2K series! What's more, the R600 can handle very large textures, up to 8192×8192, just like the G80.
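A typical use of vertex texture fetch is displacement: the vertex shader reads a height texture and moves each vertex. A minimal Python sketch of that idea on the CPU, assuming one vertex per texel with nearest-texel lookup, plus the memory cost of a maximum-size texture assuming 4 bytes per texel (RGBA8, no mipmaps). The helper names are hypothetical:

```python
from typing import List, Tuple


def texture_bytes(width: int, height: int, bytes_per_texel: int = 4) -> int:
    """Memory for one mip level, e.g. 4 bytes per texel for RGBA8."""
    return width * height * bytes_per_texel


def displace_grid(heightmap: List[List[float]], scale: float = 1.0) -> List[Tuple[float, float, float]]:
    """CPU stand-in for a vertex shader doing vertex texture fetch: each grid
    vertex (x, z) reads the height texel under it and is moved along +Y."""
    vertices = []
    for z, row in enumerate(heightmap):
        for x, height in enumerate(row):
            vertices.append((float(x), height * scale, float(z)))
    return vertices


if __name__ == "__main__":
    size = texture_bytes(8192, 8192)
    print(f"8192x8192 RGBA8: {size // 2**20} MiB")  # 256 MiB without mipmaps
    print(displace_grid([[0.0, 1.0], [0.5, 0.25]], scale=2.0))
```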
