{"id":87,"date":"2008-03-16T15:40:13","date_gmt":"2008-03-16T14:40:13","guid":{"rendered":"http:\/\/www.ozone3d.net\/blogs\/lab\/?p=87"},"modified":"2010-06-29T21:44:50","modified_gmt":"2010-06-29T20:44:50","slug":"opengl-geometry-instancing","status":"publish","type":"post","link":"https:\/\/www.ozone3d.net\/blogs\/lab\/20080316\/opengl-geometry-instancing\/","title":{"rendered":"OpenGL Geometry Instancing"},"content":{"rendered":"<hr \/>\n<p><b>This article has been updated with new demos and a new GI technique. Read the complete article here: <a href=\"http:\/\/www.geeks3d.com\/20100629\/test-opengl-geometry-instancing-geforce-gtx-480-vs-radeon-hd-5870\/\">OpenGL Geometry Instancing: GeForce GTX 480 vs Radeon HD 5870<\/a>.<\/b><\/p>\n<hr \/>\n[French]\nVoici une petite d\u00e9mo qui utilise les techniques d&#8217;instancing (instancing simple, pseudo-instancing et geometry instancing(ou GI)) pour effectuer le rendu d&#8217;un anneau compos\u00e9 de 10000 petites sph\u00e8res.<br \/>\nLa d\u00e9mo est livr\u00e9e en 5 versions:<\/p>\n<ul>\n<li>chaque sph\u00e8re est compos\u00e9e de 1800 triangles (18 millions de triangles pour l&#8217;anneau entier)<\/li>\n<li>chaque sph\u00e8re est compos\u00e9e de 800 triangles (8 millions de triangles pour l&#8217;anneau entier)<\/li>\n<li>chaque sph\u00e8re est compos\u00e9e de 200 triangles (2 millions de triangles pour l&#8217;anneau entier)<\/li>\n<li>chaque sph\u00e8re est compos\u00e9e de 72 triangles (720000 triangles pour l&#8217;anneau entier)<\/li>\n<li>chaque sph\u00e8re est compos\u00e9e de 18 triangles (180000 triangles pour l&#8217;anneau entier)<\/li>\n<\/ul>\n<p>J&#8217;ai ajout\u00e9 au dernier moment un extra: une version avec 20000 instances de 5000 triangles chacune soit 100 millions de polygones (fichier Demo_Instancing_100MTriangles_20kInstances.exe).<br \/>\n[\/French]\n[English]\nThis demo uses instancing techniques (simple instancing, pseudo-instancing and geometry instancing (or GI)) to render a ring 
made of 10,000 small spheres. The demo comes in 5 versions:<\/p>\n<ul>\n<li>each sphere is made of 1,800 triangles (18 million triangles for the whole ring)<\/li>\n<li>each sphere is made of 800 triangles (8 million triangles for the whole ring)<\/li>\n<li>each sphere is made of 200 triangles (2 million triangles for the whole ring)<\/li>\n<li>each sphere is made of 72 triangles (720,000 triangles for the whole ring)<\/li>\n<li>each sphere is made of 18 triangles (180,000 triangles for the whole ring)<\/li>\n<\/ul>\n<p>At the last moment I added a bonus: a version with 20,000 instances, each made of 5,000 triangles, for a monstrous total of 100 million triangles (file Demo_Instancing_100MTriangles_20kInstances.exe).<br \/>\n[\/English]\n<center><br \/>\n<img decoding=\"async\" src=\"public\/200803\/demo_opengl_instancing_01_400x253.jpg\" \/><br \/>\n<img decoding=\"async\" src=\"public\/200803\/demo_opengl_instancing_02_400x253.jpg\" \/><br \/>\n<img decoding=\"async\" src=\"public\/200803\/demo_opengl_instancing_03_400x253.jpg\" \/><br \/>\n<\/center><\/p>\n<h3>DOWNLOAD<\/h3>\n<ul>\n<strong><a href=\"public\/200803\/Demo_Instancing_20080316.zip\">OpenGL Instancing DemoPack &#8211; (2676k)<\/a><\/strong>\n<\/ul>\n[French]\nIl y a plusieurs techniques d&#8217;instancing qui sont utilis\u00e9es et chaque technique est accessible avec une des touches F1 \u00e0 F6.<\/p>\n<ul>\n<li>F1: instancing simple avec camera frustum culling: il y a une seule source de g\u00e9om\u00e9trie (un mesh) et elle est rendu pour chaque instance. Le calcul de la matrice de transformation est fait sur le CPU ainsi que le test de clipping avec la camera. Le rendu OpenGL utilise la fonction glDrawElements().<\/li>\n<li>F2: instancing simple SANS camera frustum culling: il y a une seule source de g\u00e9om\u00e9trie (un mesh) et elle est rendu pour chaque instance. 
Le calcul de la matrice de transformation est fait sur le CPU mais il n&#8217;y a plus de test de clipping avec la camera. Le rendu OpenGL utilise la fonction glDrawElements().<\/li>\n<li>F3: pseudo-instancing lent: il y a une seule source de g\u00e9om\u00e9trie (un mesh) et elle est rendu pour chaque instance. Le calcul de la matrice de transformation est maintenant effectu\u00e9 sur le GPU. Le passage des param\u00e8tres pour chaque instance se fait avec des variables uniformes. Il n&#8217;y a pas de test de clipping avec la camera. Le rendu OpenGL utilise la fonction glDrawElements().<\/li>\n<li>F4: pseudo-instancing rapide: il y a une seule source de g\u00e9om\u00e9trie (un mesh) et elle est rendu pour chaque instance. Le calcul de la matrice de transformation est maintenant effectu\u00e9 sur le GPU. Le passage des param\u00e8tres pour chaque instance se fait avec des attributs de vertex persistants (comme les coordonn\u00e9es de textures ou la couleur). C&#8217;est cette technique qui<br \/>\na \u00e9t\u00e9 mise en avant par NVIDIA avec son whitepaper: <a href=\"http:\/\/download.nvidia.com\/developer\/SDK\/Individual_Samples\/DEMOS\/OpenGL\/src\/glsl_pseudo_instancing\/docs\/glsl_pseudo_instancing.pdf\">GLSL Pseudo-Instancing<\/a>. Il n&#8217;y a pas de test de clipping avec la camera. Le rendu OpenGL utilise la fonction glDrawElements().<\/li>\n<li>F5: Geometry Instancing: c&#8217;est le vrai instancing hardware. Il y a une seule source de g\u00e9om\u00e9trie (un mesh) et le rendu se fait par lots (ou batchs) de 400 instances par draw call. Le rendu complet de l&#8217;anneau ne n\u00e9cessite que 25 draw-calls au lieu de 10000. Le calcul de la matrice de transformation est effectu\u00e9 sur le GPU. Le passage des param\u00e8tres pour chaque batch se fait avec des tableaux de variables uniformes. Il n&#8217;y a pas de test de clipping avec la camera. Le rendu OpenGL utilise la fonction glDrawElementsInstancedEXT(). 
Actuellement, seules les cartes NVIDIA GeForce 8 (et sup.) supportent cette fonction.<\/li>\n<li>F6: Geometry Instancing avec attributs de vertex persistants: c&#8217;est le geometry instancing hardware coupl\u00e9 avec le passage des param\u00e8tres par les attributs de vertex persistants. Mais le nombre d&#8217;attributs de vertex persistants est tr\u00e8s limit\u00e9. Au maximum j&#8217;ai reussi \u00e0 rendre 4 instances par draw-call. Mais \u00e9trangement, 2 instances par draw-call donne de meilleurs r\u00e9sultats. Dans ce cas, le rendu complet de l&#8217;anneau n\u00e9cessite que 5000 draw-calls au lieu des 10000. Le calcul de la matrice de transformation est effectu\u00e9 sur le GPU. Il n&#8217;y a pas de test de clipping avec la camera. Le rendu OpenGL utilise la fonction glDrawElementsInstancedEXT(). Actuellement, seules les cartes NVIDIA GeForce 8 (et sup.) supportent cette fonction.<\/li>\n<\/ul>\n[\/French]\n[English]\nSeveral instancing techniques are used and you can select them with the F1 to F6 keys.<\/p>\n<ul>\n<li>F1: simple instancing with camera frustum culling: there is one source for geometry (a mesh) and it&#8217;s rendered for each instance. The transformation matrix calculation is done on the CPU, as is the camera frustum test. OpenGL rendering uses the glDrawElements() function.<\/li>\n<li>F2: simple instancing without camera frustum culling: there is one source for geometry (a mesh) and it&#8217;s rendered for each instance. The transformation matrix calculation is done on the CPU but there is no longer a camera frustum test. OpenGL rendering uses the glDrawElements() function.<\/li>\n<li>F3: slow pseudo-instancing: there is one source for geometry (a mesh) and it&#8217;s rendered for each instance. Now the transformation matrix calculation is done on the GPU and per-instance data is passed via uniform variables. There is no camera frustum test.  
OpenGL rendering uses the glDrawElements() function.<\/li>\n<li>F4: pseudo-instancing: there is one source for geometry (a mesh) and it&#8217;s rendered for each instance. The transformation matrix calculation is done on the GPU and per-instance data is passed via persistent vertex attributes (like texture coordinates or color). This technique was presented by NVIDIA in the following whitepaper: <a href=\"http:\/\/download.nvidia.com\/developer\/SDK\/Individual_Samples\/DEMOS\/OpenGL\/src\/glsl_pseudo_instancing\/docs\/glsl_pseudo_instancing.pdf\">GLSL Pseudo-Instancing<\/a>. There is no camera frustum test. OpenGL rendering uses the glDrawElements() function.<\/li>\n<li>F5: geometry instancing: this is true hardware instancing. There is one source for geometry (a mesh) and rendering is done in batches of 400 instances per draw-call. Rendering the whole ring requires 25 draw-calls instead of 10,000. The transformation matrix calculation is done on the GPU and per-batch data is passed via uniform arrays. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. Currently, only NVIDIA GeForce 8 (and higher) cards support this function.<\/li>\n<li>F6: geometry instancing with persistent vertex attributes: this is hardware instancing combined with passing parameters via persistent vertex attributes. But the number of persistent vertex attributes is very limited. The most I managed to render was 4 instances per draw-call. Oddly, though, I got the best results with 2 instances per draw-call. In that case, rendering the whole ring requires 5,000 draw-calls instead of 10,000. The transformation matrix calculation is done on the GPU. There is no camera frustum test. OpenGL rendering uses the glDrawElementsInstancedEXT() function. 
Currently, only NVIDIA GeForce 8 (and higher) cards support this function.<\/li>\n<\/ul>\n<p>Now let&#8217;s look at some results with an NVIDIA GeForce 8800 GTX and an ATI Radeon HD 3870. Both cards were tested with an AMD 64 3800+.<br \/>\n[\/English]\n<h3>18 million triangles &#8211; 1800 tri\/instance<\/h3>\n<p><b>NVIDIA GeForce 8800 GTX &#8211; Forceware 169.38 XP32<\/b><\/p>\n<ul>\n<li>F1: 223MTris\/sec &#8211; 13FPS <\/li>\n<li>F2: 223MTris\/sec &#8211; 13FPS <\/li>\n<li>F3: 223MTris\/sec &#8211; 13FPS <\/li>\n<li>F4: 223MTris\/sec &#8211; 13FPS <\/li>\n<li>F5: 223MTris\/sec &#8211; 13FPS <\/li>\n<li>F6: 171MTris\/sec &#8211; 10FPS <\/li>\n<\/ul>\n<p><b>ATI Radeon HD 3870 &#8211; Catalyst 8.2 XP32<\/b><\/p>\n<ul>\n<li>F1: 429MTris\/sec &#8211; 25FPS <\/li>\n<li>F2: 463MTris\/sec &#8211; 27FPS <\/li>\n<li>F3: 446MTris\/sec &#8211; 26FPS <\/li>\n<li>F4: 274MTris\/sec &#8211; 16FPS <\/li>\n<li>F5: mode not available <\/li>\n<li>F6: mode not available <\/li>\n<\/ul>\n<h3>8 million triangles &#8211; 800 tri\/instance<\/h3>\n<p><b>NVIDIA GeForce 8800 GTX &#8211; Forceware 169.38 XP32<\/b><\/p>\n<ul>\n<li>F1: 190MTris\/sec &#8211; 25FPS<\/li>\n<li>F2: 190MTris\/sec &#8211; 25FPS<\/li>\n<li>F3: 205MTris\/sec &#8211; 27FPS<\/li>\n<li>F4: 213MTris\/sec &#8211; 28FPS<\/li>\n<li>F5: 205MTris\/sec &#8211; 27FPS<\/li>\n<li>F6: 152MTris\/sec &#8211; 20FPS<\/li>\n<\/ul>\n<p><b>ATI Radeon HD 3870 &#8211; Catalyst 8.2 XP32<\/b><\/p>\n<ul>\n<li>F1: 251MTris\/sec &#8211; 33FPS <\/li>\n<li>F2: 236MTris\/sec &#8211; 31FPS <\/li>\n<li>F3: 297MTris\/sec &#8211; 39FPS <\/li>\n<li>F4: 251MTris\/sec &#8211; 33FPS <\/li>\n<li>F5: mode not available <\/li>\n<li>F6: mode not available <\/li>\n<\/ul>\n<h3>2 million triangles &#8211; 200 tri\/instance<\/h3>\n<p><b>NVIDIA GeForce 8800 GTX &#8211; Forceware 169.38 XP32<\/b><\/p>\n<ul>\n<li>F1: 47MTris\/sec &#8211; 25FPS<\/li>\n<li>F2: 47MTris\/sec &#8211; 25FPS<\/li>\n<li>F3: 57MTris\/sec &#8211; 30FPS<\/li>\n<li>F4: 131MTris\/sec &#8211; 
69FPS<\/li>\n<li>F5: 167MTris\/sec &#8211; 88FPS<\/li>\n<li>F6: 148MTris\/sec &#8211; 78FPS<\/li>\n<\/ul>\n<p><b>ATI Radeon HD 3870 &#8211; Catalyst 8.2 XP32<\/b><\/p>\n<ul>\n<li>F1: 47MTris\/sec &#8211; 25FPS <\/li>\n<li>F2: 59MTris\/sec &#8211; 31FPS <\/li>\n<li>F3: 74MTris\/sec &#8211; 39FPS <\/li>\n<li>F4: 112MTris\/sec &#8211; 59FPS <\/li>\n<li>F5: mode not available <\/li>\n<li>F6: mode not available <\/li>\n<\/ul>\n<h3>720,000 triangles &#8211; 72 tri\/instance<\/h3>\n<p><b>NVIDIA GeForce 8800 GTX &#8211; Forceware 169.38 XP32<\/b><\/p>\n<ul>\n<li>F1: 17MTris\/sec &#8211; 25FPS<\/li>\n<li>F2: 17MTris\/sec &#8211; 25FPS<\/li>\n<li>F3: 20MTris\/sec &#8211; 30FPS<\/li>\n<li>F4: 47MTris\/sec &#8211; 69FPS<\/li>\n<li>F5: 60MTris\/sec &#8211; 88FPS<\/li>\n<li>F6: 53MTris\/sec &#8211; 78FPS<\/li>\n<\/ul>\n<p><b>ATI Radeon HD 3870 &#8211; Catalyst 8.2 XP32<\/b><\/p>\n<ul>\n<li>F1: 17MTris\/sec &#8211; 25FPS <\/li>\n<li>F2: 21MTris\/sec &#8211; 31FPS <\/li>\n<li>F3: 26MTris\/sec &#8211; 39FPS <\/li>\n<li>F4: 40MTris\/sec &#8211; 59FPS <\/li>\n<li>F5: mode not available <\/li>\n<li>F6: mode not available <\/li>\n<\/ul>\n<h3>180,000 triangles &#8211; 18 tri\/instance<\/h3>\n<p><b>NVIDIA GeForce 8800 GTX &#8211; Forceware 169.38 XP32<\/b><\/p>\n<ul>\n<li>F1: 4MTris\/sec &#8211; 25FPS<\/li>\n<li>F2: 4MTris\/sec &#8211; 25FPS<\/li>\n<li>F3: 5MTris\/sec &#8211; 30FPS<\/li>\n<li>F4: 11MTris\/sec &#8211; 69FPS<\/li>\n<li>F5: 15MTris\/sec &#8211; 89FPS<\/li>\n<li>F6: 13MTris\/sec &#8211; 79FPS<\/li>\n<\/ul>\n<p><b>ATI Radeon HD 3870 &#8211; Catalyst 8.2 XP32<\/b><\/p>\n<ul>\n<li>F1: 4MTris\/sec &#8211; 25FPS <\/li>\n<li>F2: 5MTris\/sec &#8211; 31FPS <\/li>\n<li>F3: 6MTris\/sec &#8211; 39FPS <\/li>\n<li>F4: 10MTris\/sec &#8211; 59FPS <\/li>\n<li>F5: mode not available <\/li>\n<li>F6: mode not available <\/li>\n<\/ul>\n[French]\nAnalyse rapide des resultats:<\/p>\n<ul>\n<li>nous comprenons maintenant pourquoi NVIDIA a appell\u00e9 &#8220;Pseudo-Instancing&#8221; la technique utilisant les 
attributs persistants de vertex (key F4). La fonction glDrawElements() d&#8217;OpenGL est extremement rapide et optimis\u00e9e et les attrubuts persistants de vertex n\u00e9cessitent moins de traitement que les variables uniformes pour \u00eatre pass\u00e9s au vertex shader. Les deux coupl\u00e9s ensemble donnent ce boost de performance.<\/li>\n<li>le b\u00e9n\u00e9fice du vrai hardware geometry instancing est principalement visible losqu&#8217;il y a peu de triangles par instance.<\/li>\n<li>lorsqu&#8217;il y a beacoup de triangles par instance (1800), l&#8217;imp\u00e9mentation mat\u00e9rielle de glDrawElements() semble \u00eatre plus efficace (pr\u00e8s de deux fois!) sur le GPU RV670 que sur le G80.<\/li>\n<\/ul>\n<h3>Conclusion<\/h3>\n<p>Au vu des r\u00e9sultats, le hardware geometry instancing n&#8217;est pas la kill-feature que j&#8217;attendais. Je trouve cela tr\u00e8s curieux car la diff\u00e9rence entre 10000 render-calls avec glDrawElements et 25 render-calls avec glDrawElementsInstancedEXT n&#8217;est pas tr\u00e8s importante. On dirait que la gestion de l&#8217;instancing (variable gl_InstanceID dans le vertex shader) fait perdre beaucoup de temps. Je trouve aussi dommage qu&#8217;ATI n&#8217;ait pas encore pris le temps d&#8217;impl\u00e9menter le geomtry instancing dans les pilotes Catalyst. Je serais tr\u00e8s curieux de tester le GI hardware avec un RV670.<br \/>\n[\/French]\n[English]\nQuick analysis of the results:<\/p>\n<ul>\n<li>we now understand why NVIDIA has called the technique using persistent vertex attributes &#8220;Pseudo-Instancing&#8221; (key F4). OpenGL&#8217;s glDrawElements() function is extremely fast and optimized, and persistent vertex attributes incur less overhead than uniforms when passed to the vertex shader. 
Both coupled together give this performance boost.<\/li>\n<li>the benefit of real hardware geometry instancing is mostly visible when there are few triangles per instance.<\/li>\n<li>when there are many triangles per instance (1,800), the hardware implementation of glDrawElements() seems to be more efficient (nearly twice as fast!) on the RV670 GPU than on the G80.<\/li>\n<\/ul>\n<h3>Conclusion<\/h3>\n<p>Judging from the results, hardware geometry instancing isn&#8217;t the killer feature I expected. I find that very odd, since the difference between 10,000 render calls with glDrawElements and 25 render calls with glDrawElementsInstancedEXT is not very large. It seems that instancing management (the gl_InstanceID variable in the vertex shader) is a GPU-cycle eater! It&#8217;s a pity that ATI hasn&#8217;t yet implemented geometry instancing in the Catalyst drivers. I&#8217;d be very curious to test hardware GI on an RV670.<br \/>\n[\/English]\n","protected":false},"excerpt":{"rendered":"<p>This article has been updated with new demos and a new GI technique. Read the complete article here: OpenGL Geometry Instancing: GeForce GTX 480 vs Radeon HD 5870. [French] Voici une petite d\u00e9mo qui utilise les techniques d&#8217;instancing (instancing simple, pseudo-instancing et geometry instancing(ou GI)) pour effectuer le rendu d&#8217;un anneau compos\u00e9 de 10000 petites sph\u00e8res. 
La d\u00e9mo est livr\u00e9e en 5 versions: chaque sph\u00e8re est &hellip; <a href=\"https:\/\/www.ozone3d.net\/blogs\/lab\/20080316\/opengl-geometry-instancing\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">OpenGL Geometry Instancing<\/span> <span class=\"meta-nav\">&raquo;<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[9],"tags":[84,257,258,259,638,260,23,110],"class_list":["post-87","post","type-post","status-publish","format-standard","hentry","category-opengl","tag-geforce","tag-geometry-instancing","tag-gldrawelementsinstancedext","tag-gl_instanceid","tag-opengl","tag-pseudo-instancing","tag-radeon","tag-tech-demos"],"aioseo_notices":[],"views":19282,"_links":{"self":[{"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/posts\/87","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/comments?post=87"}],"version-history":[{"count":0,"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/posts\/87\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/media?parent=87"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/categories?post=87"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.ozone3d.net\/blogs\/lab\/wp-json\/wp\/v2\/tags?post=87"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}