WaveInterference (OpenGL Shaders at CocoaHeads)


[Image: RainbowWave (WaveInterference)]

So as I try to think of things to fill my blog, this is one of the significant ones, because it's one of the reasons I left my anonymity behind.

At CocoaHeads in Silicon Valley some months ago, I gave a short demo on using OpenGL shaders to offload processing onto the graphics card. This had been prompted a month earlier by a student who gave a presentation on a Wave Interference program he made for a science class. I noticed that the performance suggested it was using fixed-pipeline drawing on the video card to render the wave interference patterns. I suggested at the time that shaders might make the thing stupidly fast (at the cost of complexity).

Anyway, just to prove to myself that I wasn't full of it, I decided to implement my suggestion. I then gave the presentation at the next CocoaHeads. I also wanted to give out the source code, but didn't have a public server or web site, so I asked if the CocoaHeads organizers would post it. They did, but not without giving me a lot of jazz for not having a web site.

So now I'm finally putting one together. I'm still getting organized though, so I'm going to link to CocoaHeads (and plug their name again too).

CocoaHeads top link

CocoaHeads Silicon Valley link

Direct WaveInterference link


In a nutshell, the program demonstrates how to use an OpenGL Vertex Shader to move all the vertices in a mesh. Since executing a shader is massively parallel on a video card that supports it, this is a really fast operation. The program also uses a Fragment Shader to color the mesh so you can see it more clearly. The code uses OpenSceneGraph, which can be described as a wrapper around OpenGL. I leverage the osgviewerCocoa class that I wrote for an OpenSceneGraph example; it is a subclass of NSOpenGLView that integrates OSG functionality with Cocoa. And for kicks, I also utilize OpenAL for a sound 'visualization'.

So I should write a formal tutorial and also make some improvements I've been wanting to make, but there's always this time thing. One of these days, maybe. For now, below is a copy of the notes I include in the package.


Wave Interference Notes:


Synopsis:

This program renders wave interference patterns generated by two wave sources. It demonstrates how to use an OpenGL Vertex program ("shader") to do the bulk of that computation.


Motivation:

At the July 2007 CocoaHeads meeting in Silicon Valley, an attendee mentioned they had tried to write a program that would visualize wave interference between two sources for a high school physics class. They tried OpenGL, but it was complicated. They ended up using the Unity Game Engine, which quickly produced a nice, simple visualization: a 3D mesh with 2 wave sources moving points on the mesh higher or lower to fit the crests/troughs of the waves. Performance was adequate but not terrific, so I suspect things were going through the OpenGL fixed pipeline. I commented that Unity was probably a good choice because I agree OpenGL is complicated, but that if one were trying to get better performance, OpenGL shaders would be the way to go.


Curious to make sure I wasn't full of it, I decided I would actually try implementing something similar, but utilizing shaders.



About this example:

So initially this was merely an exercise in using the Vertex Shader. But it morphed a little bit into a visualization exercise and a little bit into a UI design exercise. It also turned into an Apple/OpenGL bug hunt, as I hit a surprising number of bugs for something that is supposed to be a simple example program and not really bleeding edge.


As a visualization exercise, just rendering the 3D mesh turned out to lack clarity in my opinion. I felt supplemental information was needed, which led to the introduction of Fragment (Pixel) shaders to provide color reinforcement. Also, Apple's OpenAL team has been wanting me to further test/break the OpenAL implementation and file bugs, so an audio representation of the data seemed appropriate: it would help the 'visualization', test OpenAL, and demonstrate the advantage of shifting work to the GPU so the CPU is free to do other things like audio.


As a User Interface exercise, I was trying to make something that wasn't atrocious. This is CocoaHeads, after all! I didn't want to release something to Cocoa developers that would embarrass me too badly, so I actually put effort into the UI design. I hope I don't embarrass myself further by admitting this.


Mesh Visualization:

This is my attempt to visualize wave interference in 3D. The mesh is similar to the Unity demo we were shown at CocoaHeads, but I have added color to enhance the visualization. I wanted to offer additional color systems, but this created one too many branch points in the shader for all the video cards/systems I tested, and performance dropped to software rendering, which was unacceptably slow.


You can switch the mesh between polygon, wireframe, and point rendering. Generally polygon is fastest and wireframe is slowest, though sometimes points can be fastest. Mainstream consumer hardware is generally optimized for polygons.


Overlay Display Panel (HUD):

The UI to move sources was really awkward without this. With the HUD, you can see a 2D (top-down) view of the sources and listener, and you can drag the objects to a new location. The sources also grow/shrink at the frequency rate. You can double-click the sources to enable/disable their effect. The listener may only be disabled through the menu or panel interface (because the object is removed entirely, so there would be no way to double-click it back).


Audio Representation:

This is probably not a physically correct way to use audio to represent this data. I scale the audio gain from 0.0 to 1.0, following the rises and falls of the waveforms: 1.0 represents crests and troughs while 0.0 is the midpoint. Thus at nodes of completely destructive interference, you will get no sound.
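
In code, the mapping I'm describing looks roughly like this (a sketch; the variable names and the normalization by the summed amplitudes are my shorthand, not necessarily what the real code does):

    // s1, s2: wave heights at the listener (same formulas as in the vertex shader).
    // amplitude1, amplitude2: the amplitudes of the two sources.
    float s = s1 + s2;
    // 0.0 at the midpoint, 1.0 at a full crest or trough.
    float gain = fabsf(s) / (amplitude1 + amplitude2);
    alSourcef(soundSource, AL_GAIN, gain);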


I say this is not physically correct because if you imagine just one sound playing, you should hear a tone at constant amplitude; it shouldn't oscillate the way mine currently does.


I also use positional audio because it seemed like fun. Maybe for realism I should not be doing anything with gain, and should just play the sources at the correct locations and let your real-world audio system play back the data. If the speakers were perfectly designed and placed and the environment were acoustically ideal, you would hear the constructive/destructive interference naturally. But that is very unlikely to ever happen.


I also pitch shift the audio sample as you scale the frequency. However, I do not scale the audio gain as you change the amplitude; I didn't think that would work well from a 'visualization' standpoint.


If there are any better ideas on how to use audio to "visualize" this problem, I would like to hear about them.




Requirements:

Requires a mostly up-to-date version of Mac OS X 10.4. I've tested under 10.4.9 and 10.4.10. Anything before 10.4.7 will probably be problematic with OpenAL, because OpenAL 1.1 was introduced in 10.4.7 and OpenAL 1.0 was horridly broken.


Also requires a video card that can do shaders in hardware; this is obviously required to demonstrate the advantage of offloading work to the GPU. I consider the Nvidia GeForce FX Go5200 (found in 12" Powerbooks) to be near the minimum requirement, as it actually has hardware acceleration. I do not know if the Intel GMA chipsets qualify. (I would like to know.) But modern multi-core Macs may be fast enough to compensate. (That would be something I would like to know too.)



Running:

The program should look something like this:


[Image: WaveWindow]


If your program looks dramatically different, you probably have an OpenGL driver issue.


Frame rate should be reasonable too. On a 12" Rev C Powerbook (1.3GHz, Nvidia GeForce FX Go5200), performance is about 15 to 20 frames per second (fps). (This system seems to be fill-limited.) This is reasonably responsive. If it is not responsive/fluid, you probably hit software rendering, so all the work is being done on a CPU that is not fast enough to handle it.


CPU utilization should generally not be high. On the same Powerbook mentioned above, the program's CPU utilization seems to be around 20% (ranges from 10% to 30%).



Performance:

So the point of the program was to demonstrate how pushing work onto the GPU can be advantageous. In general, video card performance has been growing faster than Moore's Law. The reason is that the problem of drawing pixels is generally "embarrassingly parallel": pixel B doesn't depend on anything from pixel A. So imagine if each pixel could get its own processor for computation. This is an analogy for what is actually happening.


Now it is not necessarily true that GPUs will outperform CPUs. Some problems will do better on the CPU due to things like caching; GPUs are terrible at branching; and sometimes you have a really fast CPU while the GPU isn't up to par. But in general, for these kinds of tasks with both a good GPU and a good CPU, the GPU will give you the best performance. And even if the CPU is faster, there is a secondary benefit to using the GPU: you free the CPU so it can be used for other tasks. In doing this, you are getting parallelism (and without threading).


For a real world extreme example:


12" Rev C Powerbook (1.3GHz, Nvidia GeForce FX Go5200) 

= 20 fps  (22% CPU utilization for program, 12% for pmTool+Activity monitor)


G4 Cube (450MHz, Nvidia GeForce 6200) 

= 52 fps  (70% CPU utilization for program, 20% for pmTool+Activity monitor)


(The 6200 is a much more powerful video card than the Go5200.)



I find this example particularly interesting. This G4 Cube is mostly a stock Cube circa 2000, but the video card has been upgraded to a specially 'modified for the Cube' Nvidia GeForce 6200. All the other Cube components are original/era-appropriate. The G4 Cube's CPU is measured in MHz, not GHz. The Powerbook is circa 2004 and enjoys a much faster CPU and likely faster buses.


But in the example, the Cube's frame rate is over twice the Powerbook's, because the GPU is doing all the hard work. The Cube's CPU utilization is much higher, but only because the old 450MHz processor finds the non-GPU tasks much harder than the Powerbook's processor does. If the Cube did not have this fast video card, software rendering would easily overwhelm the system, because it is already maxed out as it is.



Technologies/Dependencies:

This program makes use of:

OpenGL Shading Language (aka GLSL aka GLSlang)

OpenSceneGraph (used as API wrapper around fixed pipeline OpenGL)

OpenAL

Cocoa and Cocoa Bindings

Objective-C++


Building:

You need the OpenSceneGraph 2.0 frameworks. The easiest thing to do is copy all the frameworks inside the included program's .app bundle to /Library/Frameworks. Then it should just be a straightforward build in Xcode.




Code:

The code is lengthier than I originally intended (sorry), but many of the topics are unrelated, so you only need to focus on the aspects you are interested in. The code was originally supposed to demonstrate OpenGL vertex shaders, and the most important place for that is wave.vert. In fact, the core of that code can be reduced and explained right here:


// Distance (in the mesh's xy-plane) from this vertex to source 1.
r1 = sqrt((v.x - position1.x) * (v.x - position1.x) + (v.y - position1.y) * (v.y - position1.y));

// Height contribution of source 1 at this vertex at time waveTime.
// (k is the wave number; M_PI must be defined in the shader, since GLSL has no built-in M_PI.)
s1 = cos(2.0*M_PI*sourceFrequency*waveTime - k * r1) * sourceAmplitude;


s1 is the solution to the wave equation and represents the height of the mesh at this vertex. Repeat for s2 with source 2's position. Then, if both sources are active, the final height is s1+s2.

I make this height the z value of the vertex, i.e. v.z. Once you do that, provide OpenGL with the final vertex position via this common GLSL idiom:

gl_Position = gl_ModelViewProjectionMatrix * v;


The GPU will compute this for every vertex, in parallel, which is why it is fast. GLSL gives you the initial vertex values via the built-in variable gl_Vertex.


Any values you need to pass into the shader from the C/C++ code go through a variable called a "uniform". I use these to pass in values like sourceFrequency, sourceAmplitude, and waveTime.
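
Since the program goes through OpenSceneGraph, the uniforms are created and updated roughly like this (a sketch with made-up variable names like meshGeode and currentWaveTime, not the literal code from the example):

    // Attach uniforms to the mesh's StateSet. The shader declares the
    // matching names, e.g. "uniform float sourceFrequency;".
    osg::StateSet* stateSet = meshGeode->getOrCreateStateSet();
    osg::Uniform* waveTimeUniform = new osg::Uniform("waveTime", 0.0f);
    stateSet->addUniform(waveTimeUniform);
    stateSet->addUniform(new osg::Uniform("sourceFrequency", 1.0f));
    stateSet->addUniform(new osg::Uniform("sourceAmplitude", 1.0f));

    // Every frame (and whenever the user changes a value), just update
    // the uniform; OSG hands the new value to OpenGL on the next draw.
    waveTimeUniform->set((float)currentWaveTime);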



wave.frag is the fragment shader for the mesh; I use it to compute color. The functions before main() are mainly for color manipulation, and you can probably ignore them.


hudsourcepulse.vert is the vertex program for the sources in the HUD. It mainly exists to make the spheres grow and shrink. The lighting code at the bottom is there to make things look better, and I also use it to more easily provide the cyan and magenta colors you see. You can probably ignore most of this shader. If you want to learn about lighting, I recommend going to the place I took this code from. (See References.)


The rest of the code is Obj-C++. WaveView is the most important file. ViewerCocoa is mainly basic osgViewer integration with Cocoa; this code is available in the OSG source distribution under the examples directory (osgviewerCocoa) and is really a different lesson.


WaveView.m is where it's all at. The code can be thought of in several ways:

1) Mirrored variables for the different components

2) Dealing with the Mesh, the HUD, and the Audio


In (1), the thing to notice is that there are a lot of ivars in the class. This is because the variables are duplicated to work best for each implementation. There is the Obj-C ivar, best for Cocoa Bindings and working with the rest of the Cocoa interfaces. There is the OpenSceneGraph/GLSL counterpart to each of the Cocoa variables. And there is an OpenAL counterpart to each of the variables. If you look at the setters, you'll notice that we generally set the Obj-C/Cocoa ivar, then set the OSG/OpenGL variables, and then set the OpenAL state.
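
A typical setter follows this shape (a sketch with hypothetical names; the real setters handle details I'm omitting here):

    - (void) setSourceFrequency:(double)frequency
    {
        // 1) The Obj-C ivar, for KVC/Cocoa Bindings.
        sourceFrequency = frequency;

        // 2) The OSG/GLSL counterpart: update the shader uniform.
        sourceFrequencyUniform->set((float)frequency);

        // 3) The OpenAL counterpart: pitch shift relative to the
        // base frequency of the loaded sample.
        alSourcef(soundSource1, AL_PITCH, (float)(frequency / baseSampleFrequency));
    }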


In (2), we have a lot of setup for the 3 things that exist in different places, and then in the setters we make sure to update each of the 3 sections as appropriate.


In a nutshell, commonInit is where everything starts. Set up the Cocoa ivars, then set up the OSG ivars and the tree of objects for the mesh, then create the HUD, and finally set up the audio.


The Cocoa ivars are pretty obvious. Everything attempts to be KVC compliant so Cocoa Bindings can be used.


The initial mesh primitive is created using traditional OpenGL fixed-pipeline (non-shader) techniques. (FYI, OpenGL is on the verge of introducing a third kind of shader, called a "Geometry Shader", which might make this obsolete.) Much of the ugly code just tells OSG/OpenGL how to assemble the vertices and where they go in the scene graph tree.
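
The gist of that assembly, stripped down (a sketch; the grid size and names are made up):

    #include <osg/Geometry>

    osg::Geometry* geometry = new osg::Geometry;
    osg::Vec3Array* vertices = new osg::Vec3Array;
    const int kGridSize = 128;
    for(int j = 0; j < kGridSize; j++)
        for(int i = 0; i < kGridSize; i++)
            // z starts at 0; the vertex shader computes the real height.
            vertices->push_back(osg::Vec3((float)i, (float)j, 0.0f));
    geometry->setVertexArray(vertices);

    // Stitch the grid together, one triangle strip per row of quads.
    for(int j = 0; j < kGridSize-1; j++)
    {
        osg::DrawElementsUInt* strip = new osg::DrawElementsUInt(osg::PrimitiveSet::TRIANGLE_STRIP);
        for(int i = 0; i < kGridSize; i++)
        {
            strip->push_back((j+1)*kGridSize + i);
            strip->push_back(j*kGridSize + i);
        }
        geometry->addPrimitiveSet(strip);
    }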


In addition to creating the mesh vertices, the shader program must be created. Then we create a lot of "uniforms" so values can be passed to the shaders.
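
With OSG, creating and attaching the shader program looks roughly like this (a sketch; meshStateSet stands in for whatever StateSet the mesh hangs its state on):

    #include <osg/Program>
    #include <osg/Shader>

    osg::Program* program = new osg::Program;

    osg::Shader* vertexShader = new osg::Shader(osg::Shader::VERTEX);
    vertexShader->loadShaderSourceFromFile("wave.vert");
    program->addShader(vertexShader);

    osg::Shader* fragmentShader = new osg::Shader(osg::Shader::FRAGMENT);
    fragmentShader->loadShaderSourceFromFile("wave.frag");
    program->addShader(fragmentShader);

    // Enable the program for everything under this StateSet.
    meshStateSet->setAttributeAndModes(program, osg::StateAttribute::ON);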


Next, the HUD is created. HUDs are a little painful in OpenGL. Core Animation would be wonderful here, but since we don't have it yet, we build the HUD in OpenGL. First, we need clip planes (glClipPlane), which constrain drawing to an area so things don't get drawn out-of-bounds. Then we set up a projection node and a modelview node, which let us change the coordinate system we work in so it is more natural (see the sketch after this section). I pick values that make it easier to calculate mouse positions too. Then we draw the backing box for the HUD, with a thin border around it right after. Lots of tedious code, but not hard stuff. I do a few things with StateSets to control draw order, something like the Painter's Algorithm.

Next I draw shapes to represent the sources and listener. I also play with the lighting so the shapes are more distinctive. I add a vertex program to the sources; each source gets a separate instance of the vertex program even though it is the same program. I reuse all the uniforms from the mesh for these shaders so they have the same information, and I create one special uniform with distinct values so I can tell the two sources apart, since they share the same code. Finally, I create the listener object. There is no shader for this. One trick: I also add the listener shape back to the mesh, so the same object is drawn in both the HUD and the mesh display.
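
The projection/modelview setup is the standard OSG HUD idiom, roughly (a sketch; rootNode, the bin number, and the coordinate ranges are placeholders):

    #include <osg/Projection>
    #include <osg/MatrixTransform>

    osg::Projection* hudProjection = new osg::Projection;
    hudProjection->setMatrix(osg::Matrix::ortho2D(0.0, 1.0, 0.0, 1.0));

    osg::MatrixTransform* hudModelView = new osg::MatrixTransform;
    hudModelView->setMatrix(osg::Matrix::identity());
    // Don't inherit the main camera's transforms.
    hudModelView->setReferenceFrame(osg::Transform::ABSOLUTE_RF);

    hudProjection->addChild(hudModelView);
    rootNode->addChild(hudProjection);

    // Draw the HUD after the main scene (Painter's Algorithm style),
    // and don't let the mesh's depth values occlude it.
    osg::StateSet* hudStateSet = hudModelView->getOrCreateStateSet();
    hudStateSet->setRenderBinDetails(11, "RenderBin");
    hudStateSet->setMode(GL_DEPTH_TEST, osg::StateAttribute::OFF);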


Finally, I set up the OpenAL. Basically, I load a .WAV file containing a single-frequency tone and play it back in a loop. The sample must get loaded into an OpenAL "buffer". I also create two OpenAL "sources", which are the things sounds come from. Obviously, this is a good match for our wave sources. Your job is generally to move the sources if they change position and move the listener (the thing that hears the sources) if the listener changes position.
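
The skeleton of that setup looks like this (a sketch with hypothetical variable names; error checking and the actual .WAV loading are omitted):

    #include <OpenAL/al.h>  // OpenAL C API on Mac OS X

    ALuint toneBuffer;
    ALuint waveSources[2];

    alGenBuffers(1, &toneBuffer);
    // ... load the .WAV data into toneBuffer (the example uses ALUT) ...

    alGenSources(2, waveSources);
    for(int i = 0; i < 2; i++)
    {
        alSourcei(waveSources[i], AL_BUFFER, toneBuffer);
        alSourcei(waveSources[i], AL_LOOPING, AL_TRUE);
    }

    // Keep the OpenAL positions mirroring the mesh/HUD coordinates.
    alSource3f(waveSources[0], AL_POSITION, source1X, source1Y, 0.0f);
    alSource3f(waveSources[1], AL_POSITION, source2X, source2Y, 0.0f);
    alListener3f(AL_POSITION, listenerX, listenerY, 0.0f);

    alSourcePlay(waveSources[0]);
    alSourcePlay(waveSources[1]);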



Lessons Learned:

Whereas Cocoa shelters you from differences in hardware, OpenGL shaders leave you at the mercy of your hardware and drivers. To my surprise, I hit a fair number of driver bugs. I also encountered a lot of performance issues.


First, the GPUs I used really hate branches. This isn't really a surprise, because I know the hardware is designed to be a matrix math cruncher, not a branch predictor. But I was a little surprised at how easy it was to fall off the hardware path and how dramatically a one-line change could affect performance.


Branches are if-statements, loops, and, to my surprise, early return statements. The GPUs I used can take a certain number of if-statements, but if the complexity got too great, I would see the shader fall to software. (This happened with my hue calculations.)


While-loops immediately killed my GPUs. I'm not too surprised here. Perhaps the interesting thing was that while porting some C code to GLSL, I noticed there were loops used to make sure angles fit within 0 to 360 degrees (for example), adding or subtracting until the value was in range. Little casual things like this go unnoticed in CPU land, and long ago might even have been faster than a multiplication or division, but on the GPU this instantly kills performance.
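
To illustrate (written in C here for convenience, but GLSL has the same mod() and floor() functions), the ported loop and its branch-free replacement:

    #include <math.h>

    // The ported idiom: wrap by repeated subtraction/addition.
    // Harmless on a CPU; as a shader loop it knocked my GPUs into software.
    static float wrap_degrees_slow(float angle)
    {
        while(angle >= 360.0f)
            angle -= 360.0f;
        while(angle < 0.0f)
            angle += 360.0f;
        return angle;
    }

    // Branch-free equivalent: one divide, one floor, one multiply.
    static float wrap_degrees_fast(float angle)
    {
        return angle - 360.0f * floorf(angle / 360.0f);
    }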


And I discovered the hard way that putting an early return statement inside a function actually caused the shader to fall onto the software path. So something like the following would be really slow:


vec4 compute_color(float value)
{
    vec4 ret_color = vec4(1.0, 1.0, 1.0, 1.0);
    if(value == 0.0)
    {
        return ret_color;
    }
    // Could do stuff here, but in fact, it doesn't matter
    // because the bottleneck is the early return above.
    return ret_color;
}


It is actually faster to do all the extra useless computation than to try to return early to avoid it.


Second, debugging OpenGL and shaders is tricky, particularly if the bug might be a driver problem. It is extremely helpful to have at least two different video cards, one by Nvidia and the other by ATI. My past experience is that the ATI cards have significantly more bugs, but even in this example I think I managed to uncover an Nvidia-specific bug, so they are not immune. The prospect of having to test Intel GMA scares me.


Having another operating system to test on can be helpful too. Something like a Leopard seed can be handy for identifying some Apple-level driver bugs. Other operating systems like Windows and Linux can also help, because they let you really isolate Apple-specific problems. But that's easier said than done.



References:

The wave equation/explanation I used comes from here:

http://physics.nad.ru/Physics/English/int_txt.htm


Lighting code I used in my vertex shader for the source spheres comes from the GLSL tutorial at:

http://www.lighthouse3d.com/opengl/glsl/index.php?ogldir2



Machines Tested On:

12" Powerbook Rev C. 1.3Ghz (Nvidia FX Go5200)

Dual G5 2.0GHz Tower (ATI Radeon 9800 Pro)

iMac G5 Rev B. (ATI Radeon 9600)

G4 Cube 450Mhz (Nvidia 6200)



Known Bugs and Issues:


- (OpenGL) On ATI cards I tested (Radeon 9600, 9800Pro), Wireframe mode does not work. This seems to be an OpenGL ATI driver bug. If I do something evil in my shader code that forces the shader to fall off hardware and use the software renderer, the wireframe draws correctly.


- (OpenGL) I wanted to provide alternative color modes in addition to the heated object model I currently use. But adding this branch in the shader causes all the hardware I tried to fall into software mode and performance slows to a crawl. The code base reflects this, but the shader decision section is commented out and the UI is removed. You might notice some of my image captures use this disabled code.


- (OpenGL) Clip planes with shaders don't seem to work on my hardware (both ATI and Nvidia) under Tiger. The fixed-pipeline clipping does work (the "Listener" object is successfully clipped), but anything going through a shader seems to be broken. This looks like a Tiger OpenGL driver issue. The result is that the "Sources" in the Overlay Display Panel (HUD) do not get clipped correctly and draw outside the box.


- (OpenGL) Occasional crash when hiding or re-enabling the Overlay Display Panel (HUD) on an Nvidia GeForce FX Go5200 (12" Powerbook). I haven't reproduced the crash on other cards. It has so far only appeared when the audio is on, and it only crashes sometimes. I'm assuming this is yet another driver bug, but I'm not sure what triggers it. So far, the distinguishing details seem to be that the graphics for the "Listener" use only the fixed OpenGL pipeline (no shaders), and that the geometry used to draw the Listener is shared by both the mesh display and the overlay display. I have no workaround for this.


- (OpenGL) Reproducible crash when the Listener is active and the window is minimized and then restored. The crash happens in the draw (presumably the first draw since restore, though unverified) and inside the OpenGL driver. This appears to be a Tiger OpenGL bug and affects both the ATI and Nvidia cards I tested. I suspect this is a cousin of the above bug, since it also only appears with the Listener object, except this one is totally reproducible. I have a partial workaround in place that magically makes the problem go away: on minimize, I disable the listener shape, and on restore, I reactivate it. I don't really understand why this avoids the problem, because it seems equivalent to setting a variable and then unsetting it without anything else ever reading it. It seems like I should have to draw once without the listener shape before adding it back (though it could be a much bigger bug and even harder to fix). This workaround fails if you minimize, then turn on the listener while minimized, and then restore. I tried extending the fix to cover this case, but there it didn't work.



- (SimpleViewer) Drag-and-drop copy image (cmd-click and drag on view) doesn't create the transparent drag-image correctly. This behavior used to work with osgViewer::SimpleViewer, but broke when SimpleViewer was abruptly removed and replaced. I still haven't figured out how to fix this one.


- (SimpleViewer) Copy image (screen capture) via Cmd-C resets the camera position. This is another behavior that broke with the removal of SimpleViewer and that I haven't yet figured out how to fix.


- (SimpleViewer) Minimize causes a reset of the camera position. This is because the Copy Image code is used to create a minimized picture for the Dock. Fix the above, and this will be fixed too.


- (SimpleViewer) osg::Viewer's stats window doesn't move correctly with window resizes. This is a new feature of the viewer that I didn't implement. This needs fixing.


- (SimpleViewer) Mouse drag to spin seems really stiff with the 2.0 release. It used to be more fluid. I think something was broken in the osg::Viewer backend.


- (OpenAL) The audio sample at C4 ("Middle C") will not correctly pitch shift down one full octave. This may not technically be a bug, since the OpenAL spec says nothing about how far a sample must be able to shift, but Middle C seems kind of important to me, and I was surprised it couldn't go one octave, particularly since Apple's OpenAL is built on top of Core Audio. To work around this, I'm using a higher-frequency sample (440Hz, the A above Middle C).


- I'm using the deprecated (removed) ALUT API to load a .WAV file. Don't do this. I still need to learn the next easiest way (with no 3rd-party dependencies) to load an audio file and pass it to OpenAL.


- The initial values for the bindings act funny when I don't bind to a specific model (e.g. a default NSObjectController with its class set to NSMutableDictionary, or NSUserDefaultsController). On good days, the program launches with all the UI set to the values I set in my AppController's awakeFromNib. On bad days, all the UI starts with undefined/initial values. The bindings all work, but none of the UI elements necessarily reflect the correct starting values. I'm not sure why it works sometimes and not others. To work around the problem, I can either bind to NSUserDefaultsController or provide a model object class to set my NSObjectController's class to. I would appreciate insights and feedback on this issue.


- I'm a little worried about my wave equation. The long final one in the reference, starting with s = s1+s2+..., seems wrong to me; I think it should have been s = s1+s2 = ... I just use s1+s2, so my math might be wrong if their equation was correct.


- Not a bug, but an Easter Egg: the frequency slider in the main interface uses discrete steps while the frequency slider in the Panel is continuous. The philosophy was that the Panel is an advanced option allowing finer control (like typing in coordinates), while common usage doesn't care about this. Seeing the frequency change in predictable discrete steps like one octave up/down tends to be more useful, since we are generally interested in the relative change for comparison; these are not real-world units. Perhaps the better way would be a 'secret' key modifier (like Shift) that makes the slider continuous while held down.


- Error messages for typing values that are out of range are very generic. The messages are generated by the Number Formatters in IB, and I'm at the mercy of their defaults. I'm a little surprised they don't automatically state the valid range, since they have this information. The better way to deal with this is probably Key-Value Validation instead, but if I were going to do that, I would want to do it correctly and support localized strings and so forth. I expect the current system at least gives me that for free.


Eric Wing

2007-07-28



Copyright © PlayControl Software, LLC / Eric Wing