OpenGL Vertex Buffer Objects (VBOs): A Simple Tutorial

Recently, I have been getting a lot of similar questions about how to draw geometry in OpenGL without the use of glBegin()/glEnd(). This is mostly due to the interest in iPhone development, which uses OpenGL ES 1.1, though I have received a few desktop performance questions as well. Since I've gotten multiple questions, I thought I would post a very simple tutorial for VBOs.

When most people first learn OpenGL, they are taught using glBegin() and glEnd(). But to the shock of many people, these functions have been excluded from OpenGL ES, and there is pressure to remove them from future versions of OpenGL proper. The two main reasons for removing these functions are performance and simplicity.

In terms of performance, the glBegin()/glEnd() technique is very slow. The first problem is what is called 'immediate mode' drawing. In immediate mode drawing, the system essentially dispatches all drawing commands on demand from the main system (CPU + system RAM + system bus) to the graphics card (GPU). In modern architectures, both GPUs and CPUs are extremely powerful, but main system memory and the system bus are comparatively slow and create huge bottlenecks when sending data between them. Hence drawing in immediate mode is slow because you are waiting to send large amounts of data through the system bus to the GPU.

In addition, glBegin()/glEnd() style code creates a lot of function call overhead because you call a separate function for every vertex, color, texture coordinate, and normal. This overhead is non-trivial for objects with a lot of data.

With regard to simplicity, many ways to draw in OpenGL have been added through the years. Having so many redundant ways to draw is confusing. In addition, OpenGL driver writers have a much bigger job because they have to support many more things. Typically driver writers try to optimize everything they can, but optimizing glBegin()/glEnd() is actually tricky because there are so many different permutations available to describe objects. For example:

glBegin(GL_TRIANGLES);
    glColor();
    glNormal();
    glTexCoord();
    glVertex();

    glColor();
    glNormal();
    glTexCoord();
    glVertex();

    glColor();
    glNormal();
    glTexCoord();
    glVertex();
glEnd();

Contrast this with:

glBegin(GL_TRIANGLES);
    glColor();
    glNormal(); glTexCoord(); glVertex();
    glTexCoord(); glNormal(); glVertex();
    glTexCoord(); glVertex();
glEnd();

In the latter form, I basically say I want to apply a single color to everything. I also shuffle the order of glTexCoord() and glNormal(), which shouldn't change the drawing result. And in the last line, I omit the call to glNormal(), which I then expect to mean 'continue using the same value as the previous vertex'.

Drivers will typically want to pack the data into nice arrays with predictable orders behind the scenes, but the latter form is a nightmare to deal with.
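To make the 'nightmare' a bit more concrete, here is a rough sketch of the bookkeeping that immediate mode implies. This is not how any real driver is implemented. The ImmediateRecorder type and the rec_* functions are names I made up for illustration. The idea is only that color, normal, and texture coordinate calls update a latched 'current' state in whatever order they arrive, and each vertex call snapshots that whole state into a packed array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RECORDED_VERTICES 64

/* Hypothetical packed vertex format (an assumption for this sketch). */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
    unsigned char color[4];
} PackedVertex;

/* Hypothetical recorder: 'current' holds the last attribute values seen. */
typedef struct {
    PackedVertex current;
    PackedVertex vertices[MAX_RECORDED_VERTICES];
    size_t count;
} ImmediateRecorder;

/* Attribute calls only update the latched state; order does not matter. */
void rec_color(ImmediateRecorder *r, unsigned char red, unsigned char green,
               unsigned char blue, unsigned char alpha)
{
    assert(r != NULL);
    r->current.color[0] = red;
    r->current.color[1] = green;
    r->current.color[2] = blue;
    r->current.color[3] = alpha;
}

void rec_normal(ImmediateRecorder *r, float x, float y, float z)
{
    assert(r != NULL);
    r->current.normal[0] = x;
    r->current.normal[1] = y;
    r->current.normal[2] = z;
}

void rec_texcoord(ImmediateRecorder *r, float s, float t)
{
    assert(r != NULL);
    r->current.texcoord[0] = s;
    r->current.texcoord[1] = t;
}

/* The vertex call copies the entire current state into the packed array,
   so any attribute that was omitted is inherited from the previous vertex.
   Returns 1 on success, 0 if the array is full. */
int rec_vertex(ImmediateRecorder *r, float x, float y, float z)
{
    assert(r != NULL);
    if (r->count >= MAX_RECORDED_VERTICES) {
        return 0;
    }
    r->current.position[0] = x;
    r->current.position[1] = y;
    r->current.position[2] = z;
    r->vertices[r->count++] = r->current;
    return 1;
}
```

Even in this toy version, every vertex carries a full copy of every attribute, whether or not the caller changed it. That is the kind of predictable packing the driver has to reconstruct from an unpredictable stream of calls.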

So through the years, OpenGL has added more ways to draw. Vertex Arrays were the first thing added. They basically allow you to submit arrays containing all the vertex, color, normal, and texture coordinate data instead of calling a function for each value. While this avoids the large function call overhead and the nightmare glBegin()/glEnd() case I just laid out, it is still immediate mode drawing and subject to system bus bottlenecks.

The first answer to immediate mode drawing was the display list. Display lists allow you to specify that you want a copy of your object to reside directly on the GPU. Since the information is local to the GPU, when it is time to draw, this information does not need to be re-dispatched across the system bus. This allows for very high performance.

There are several problems with display lists. First, the data is totally static. If you need to make a minor change to the object in a display list, you must completely destroy and rebuild the display list. Second, display lists built with glBegin()/glEnd() will still suffer from the large function call overhead. Third, driver writers still need to deal with the nightmare glBegin()/glEnd() scenarios.

Skipping a lot of intermediate history, Vertex Buffer Objects (VBOs) were introduced in OpenGL 1.5 as a new solution to these problems. VBOs allow you to keep your objects on the GPU like a DisplayList. However, VBOs now allow for non-static data. And the API they developed reused and overloaded the Vertex Array family of functions so specifying your data would be efficient.


The Tutorial:

If you are familiar with OpenGL Textures, then the technique will seem somewhat familiar. You need to create a buffer. Then you bind it to deal with it. Then you need to get the data into the buffer.

Basically the procedure can be broken down into:

Initialization:

glGenBuffers(); // create a buffer object
glBindBuffer(); // use the buffer
glBufferData(); // allocate memory in the buffer

<stuff to get memory into the buffer>

glVertexPointer(); // tell OpenGL how your data is packed inside the buffer


Drawing:

glBindBuffer(); // use the buffer
glEnableClientState(); // enable the parts of the data you want to draw
glDrawArrays(); // the actual draw command


I have posted an example project at:

http://www.assembla.com/spaces/OpenGLVBOTutorial

(Grab the tar-ball in the files section or check out from the Mercurial repo.)


Since most of my questions have been iPhone related, this project starts with the default OpenGL ES template that is provided by Xcode. I am not going to focus on the other code in there, just the VBO related stuff.

I also decided to demonstrate the use of glDrawElements(). Instead of drawing vertex arrays directly, you have the option of supplying an array of indices, where each index refers to an element of your vertex array. This can be efficient if you have a list of vertices you want to reuse to draw different objects without duplicating data. In this example, I don't really take advantage of them, but it's not much more work to use them.
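To show what an index array buys you, here is a small CPU-side sketch of what indexed drawing conceptually does: it walks the index list and looks up each vertex. The Vec3 type and the resolve_indices function are hypothetical names for this illustration only. OpenGL does this lookup for you when you call glDrawElements().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex position type for this sketch. */
typedef struct {
    float x, y, z;
} Vec3;

/* Expand an index list into the sequence of vertices it refers to.
   Returns the number of vertices written, or 0 if 'out' is too small
   or an index is out of range. */
size_t resolve_indices(const Vec3 *vertices, size_t vertex_count,
                       const unsigned char *indices, size_t index_count,
                       Vec3 *out, size_t out_capacity)
{
    assert(vertices != NULL && indices != NULL && out != NULL);
    if (index_count > out_capacity) {
        return 0;
    }
    for (size_t i = 0; i < index_count; ++i) {
        if (indices[i] >= vertex_count) {
            return 0; /* index points past the end of the vertex array */
        }
        out[i] = vertices[indices[i]];
    }
    return index_count;
}
```

For example, a quad drawn as two triangles needs 6 vertices if you duplicate the shared corners, but only 4 vertices plus 6 one-byte indices if you index them. The savings grow as vertices carry more data (colors, normals, texture coordinates).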

This example will just draw a simple 4-sided cube (no top or bottom).


[Figure: OpenGLVBOCube]

To get started, we first must create a VBO and use it.

// allocate a new buffer
glGenBuffers(1, &cubeVBO);

// bind the buffer object to use
glBindBuffer(GL_ARRAY_BUFFER, cubeVBO);


I have predefined some vertices and colors to use for this example. We next need to allocate enough memory in the VBO to hold all this data. 

const GLsizeiptr vertex_size =
    NUMBER_OF_CUBE_VERTICES
    * NUMBER_OF_CUBE_COMPONENTS_PER_VERTEX
    * sizeof(GLfloat);

const GLsizeiptr color_size =
    NUMBER_OF_CUBE_COLORS
    * NUMBER_OF_CUBE_COMPONENTS_PER_COLOR
    * sizeof(GLubyte);

// allocate enough space for the VBO
// (passing 0 for the data pointer allocates without copying anything yet)
glBufferData(GL_ARRAY_BUFFER,
    vertex_size + color_size,
    0,
    GL_STATIC_DRAW);


We pick GL_STATIC_DRAW because we aren't planning to change the data. GL_DYNAMIC_DRAW and GL_STREAM_DRAW are other potential options (though note that OpenGL ES 1.1 itself only accepts GL_STATIC_DRAW and GL_DYNAMIC_DRAW). They give hints to the OpenGL system about how often you plan to change your data so the driver has a possibility of making intelligent optimizations, though depending on your OpenGL implementation, you may not see any performance difference.


Now we need to get the data into the buffer. We have two different pieces of data we want to get into the buffer, the vertices and the colors. There are different ways we can do this. glBufferSubData is one way of copying sections of data into the buffer. Another way is to use glMapBuffer. At a high level, glMapBuffer is sort of like direct memory access (DMA) for your video card. It potentially avoids an extra system copy by creating a direct map of memory.


If you search on the internet about which method is faster, you will see different answers, and the real answer is probably going to be, 'it's system dependent'. For Mac OS X, Apple has been promoting glMapBuffer as the method to use and shows performance benefits of using it. However, most of these benchmarks concern dynamic usage, and for our small, simple static example, there probably won't be a noticeable difference.


So, here is the glMapBuffer code.

GLvoid* vbo_buffer =
    glMapBufferOES(GL_ARRAY_BUFFER,
        GL_WRITE_ONLY_OES);
// glMapBufferOES returns NULL on failure;
// real code should check before copying.

// transfer the vertex data to the VBO
memcpy(vbo_buffer,
    s_cubeVertices,
    vertex_size);

// append color data to vertex data.
// To be optimal, data should probably be
// interleaved and not appended.
// Cast to char* before adding the offset:
// pointer arithmetic on a void* is not standard C.
memcpy((char*)vbo_buffer + vertex_size,
    s_cubeColors,
    color_size);

glUnmapBufferOES(GL_ARRAY_BUFFER);

First, notice that on the iPhone, we must use glMapBufferOES because it is an extension and not formally part of OpenGL ES 1.1. Second, notice that we copy the vertices into the first part of the buffer and the colors into the part that follows. This is actually not the optimal thing to do, but it was the most straightforward thing for a tutorial. Generally, OpenGL implementations prefer that your per-vertex data be interleaved. So, for example, you should have a vertex position followed immediately by its color value, then the next vertex position and its color value, and so forth. The reason is locality. When the system needs to actually draw a point, it is faster if all the data is in the same block instead of having to jump to all sorts of different addresses to fetch all the information needed (position, color, texture coordinates, normals).
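Since I don't interleave in the example project, here is a minimal sketch of what interleaving could look like, assuming 3 floats per position and 4 unsigned bytes per color as in this tutorial. The InterleavedVertex struct and the interleave function are hypothetical names, not part of the example project.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One vertex's data kept together in memory (hypothetical layout). */
typedef struct {
    float position[3];      /* x, y, z */
    unsigned char color[4]; /* r, g, b, a */
} InterleavedVertex;

/* Build an interleaved array from separate position and color arrays.
   'positions' holds 3 floats per vertex and 'colors' holds 4 bytes per
   vertex; 'out' must have room for vertex_count elements. */
void interleave(const float *positions, const unsigned char *colors,
                size_t vertex_count, InterleavedVertex *out)
{
    assert(positions != NULL && colors != NULL && out != NULL);
    for (size_t i = 0; i < vertex_count; ++i) {
        memcpy(out[i].position, positions + 3 * i, sizeof out[i].position);
        memcpy(out[i].color, colors + 4 * i, sizeof out[i].color);
    }
}
```

With a layout like this, you would upload the whole array in one piece, pass sizeof(InterleavedVertex) as the stride argument to glVertexPointer and glColorPointer, and use offsetof(InterleavedVertex, position) and offsetof(InterleavedVertex, color) as the starting offsets instead of 0 and vertex_size.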


Next we need to tell OpenGL how we packed our data. In our case, we tell OpenGL we have vertices starting from the beginning of the buffer. We have 3 components per vertex (x, y, z) and they are GL_FLOATs. We then do the same for the color data, which has 4 components of type GL_UNSIGNED_BYTE and starts in the buffer right after the vertex positions.

// Describe to OpenGL where the vertex data is in the buffer
glVertexPointer(3, GL_FLOAT, 0, (GLvoid*)((char*)NULL));

// Describe to OpenGL where the color data is in the buffer
glColorPointer(4, GL_UNSIGNED_BYTE, 0, (GLvoid*)((char*)NULL+vertex_size));

If you have texture coordinates or normals, don't forget to use glTexCoordPointer and glNormalPointer in the same way.
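If the pointer arguments seem mysterious, it may help to see the arithmetic they describe. The following is just a sketch with a made-up function name (attribute_offset). It computes where the data for a given vertex lives in the buffer, based on the starting offset, the stride, and the component count and size. As with the real gl*Pointer calls, a stride of 0 means the elements are tightly packed.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset within the buffer of attribute element number 'index'.
   start_offset:   where the first element begins (the 'pointer' argument)
   stride:         bytes from one element to the next, or 0 for tightly packed
   components:     values per element (e.g. 3 for x, y, z)
   component_size: bytes per value (e.g. sizeof(float)) */
size_t attribute_offset(size_t start_offset, size_t stride,
                        size_t components, size_t component_size,
                        size_t index)
{
    assert(components > 0 && component_size > 0);
    size_t element_size = components * component_size;
    size_t step = (stride == 0) ? element_size : stride;
    return start_offset + index * step;
}
```

For instance, if we assume a cube with 8 vertices and 4-byte floats, vertex_size would be 8 * 3 * 4 = 96 bytes. The position of vertex 2 then starts at byte 0 + 2 * 12 = 24, and its color starts at byte 96 + 2 * 4 = 104.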


Now we are done with the setup for the VBO. But I said I would also introduce an index array. So we will next create another VBO for the index array. Some people call these Index Buffer Objects (IBOs), though I think technically they are still considered VBOs.

// create index buffer
glGenBuffers(1, &cubeIBO);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, cubeIBO);

// For contrast, instead of glBufferSubData and glMapBuffer,
// we can directly supply the data in one shot
glBufferData(GL_ELEMENT_ARRAY_BUFFER, NUMBER_OF_CUBE_INDICES*sizeof(GLubyte), s_cubeIndices, GL_STATIC_DRAW);

This one is a little simpler since there is less data to deal with. First, notice we use GL_ELEMENT_ARRAY_BUFFER instead of GL_ARRAY_BUFFER in our call to glBindBuffer. This is what distinguishes the IBO from the VBO.

Second, just for contrast, I show another way of copying in the data to the buffer without glBufferSubData or glMapBuffer. Instead, the glBufferData command allows you to directly pass data in when you first allocate the memory. Since we don't need to combine separate arrays (for vertex and color), I thought I would show this technique as well.


Now that we've set up everything, actual drawing is pretty simple.

// Activate the VBOs to draw
glBindBuffer(GL_ARRAY_BUFFER, cubeVBO);
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, cubeIBO);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_COLOR_ARRAY);

// This is the actual draw command
glDrawElements(GL_TRIANGLE_STRIP, NUMBER_OF_CUBE_INDICES, GL_UNSIGNED_BYTE, (GLvoid*)((char*)NULL));

We bind the buffers. We tell the system we want to draw vertices and colors. (We don't have normals and texture coordinates in this example, but if we did we would also need to enable GL_NORMAL_ARRAY and GL_TEXTURE_COORD_ARRAY.)

Then we call glDrawElements which does the actual drawing. If you skip this command, nothing will ever actually be drawn.


Finally, in cleanup, we need to free the VBOs when we are done with them.

glDeleteBuffers(1, &cubeIBO);
glDeleteBuffers(1, &cubeVBO);


Performance:

- I already discussed glMapBuffer() above.

- I also already discussed using interleaved data instead of partitioning all vertices, normals, colors, and texture coordinates in different parts of the buffer. For real production code, you should be interleaving.

- Also as mentioned earlier, if you have a way you can reuse vertices, index buffers are your friend.

- On the iPhone, you should always use floating point for vertex positions. Fixed point will be slow and inaccurate. However, Apple does recommend smaller packing for colors and texture coordinates. This is why I used GLubyte for the colors even though I personally prefer normalized floats.

- For index buffers, you should also use the smallest packing possible. This helps keep glDrawElements as fast as possible. I use GLubyte in this example.

- GL_TRIANGLE_STRIP is usually the fastest type possible. A lot of newcomers like to use GL_QUADS, especially for cubes. But ultimately, OpenGL must decompose everything into triangles. And in fact, GL_QUADS is not even available in OpenGL ES. 
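To illustrate the triangle strip point, here is a sketch of how a strip expands into individual triangles. The function name strip_to_triangles is my own invention for this illustration. The rule it follows is the standard one for strips: triangle i uses strip entries i, i+1, and i+2, and every odd triangle swaps its first two entries so the winding order stays consistent.

```c
#include <assert.h>
#include <stddef.h>

/* Expand triangle strip indices into independent triangle indices.
   'capacity' is the number of indices 'triangles' can hold.
   Returns the number of triangles written, or 0 if the strip has fewer
   than 3 entries or the output is too small. */
size_t strip_to_triangles(const unsigned char *strip, size_t strip_count,
                          unsigned char *triangles, size_t capacity)
{
    assert(strip != NULL && triangles != NULL);
    if (strip_count < 3) {
        return 0;
    }
    size_t triangle_count = strip_count - 2;
    if (capacity < triangle_count * 3) {
        return 0;
    }
    for (size_t i = 0; i < triangle_count; ++i) {
        int odd = (i % 2) != 0;
        /* Swap the first two entries on odd triangles to keep winding. */
        triangles[3 * i + 0] = odd ? strip[i + 1] : strip[i];
        triangles[3 * i + 1] = odd ? strip[i] : strip[i + 1];
        triangles[3 * i + 2] = strip[i + 2];
    }
    return triangle_count;
}
```

As a hypothetical example (I am not claiming this is the exact index data in the project), four cube walls could be described by a strip of 10 indices such as 0,1,2,3,4,5,6,7,0,1. That strip expands to 8 triangles, which would take 24 indices if submitted as GL_TRIANGLES.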


Other Considerations:

- I do not show changing data in this example, though as mentioned above GL_DYNAMIC_DRAW and glMapBuffer are the tools you should look at. There are additional vendor specific extensions to improve performance when changing subsets of the data. For Apple, look at APPLE_flush_buffer_range.


Buy my book: Beginning iPhone Games Development

Want to learn how to do other things with OpenGL ES? Or perhaps, you want to learn OpenAL to accompany your OpenGL code? Or maybe you want to build a game for iOS? Well, please buy the book I co-authored, Beginning iPhone Games Development (published by Apress). Check out my website for the book, read the Table of Contents (it's in HTML, so it loads fast), and use the Google book preview to see if it has things you want to learn.



Update (2010/10/22): Near the end of Chapter 12 in my book, I do an OpenAL (microphone) capture example that happens to utilize these OpenGL VBO techniques to draw an oscilloscope like waveform. Since the audio data that needs to be drawn is continuously changing, this is an excellent example of using GL_DYNAMIC_DRAW.


The source code to this example is available with the book. Chapter 12's focus is primarily OpenAL and other iOS audio APIs, so I don't say a whole lot about GL_DYNAMIC_DRAW or VBOs, but this article should already explain what you need to know.


Nonetheless, I have been hoping to post a short article on that example, focusing on the VBOs, as a supplement both to this article and for readers of the book. This is, of course, time-permitting and may be subject to getting Apress's approval. Still, you might want to keep an eye on this blog. (Just use the RSS feed.)






Note 2 (2009/09/05): I switched comment systems. The old comment thread can be found here:

http://www.haloscan.com/comments/ewing/8B9BD2EC1FAD419EABDB/


Copyright © PlayControl Software, LLC / Eric Wing