week4



Ok, so now we get to have Real Fun, because we're going to finish off our walk-through FFT, hook it up to some 3-D visualization capabilities, and then connect it to the real world (like, where we live and breathe) by using a camera to track a laser pointer against a black background.

Again, look at last week and the previous week to see what has led up to this point in the class.

But first, Luke showed off the new Curtis Roads book Microsound, filled with loads of informative information informing us about informated sound-creation. Go to barnesandnoble.com to order it, or visit the emf book page if you want to help support the good folks at The Electronic Music Foundation.



Remember that the patches (in a StuffIt archive file) and other sources of information for this class are available on the resources page.


NOW WE FINISH THE WALK-THROUGH FFT:

We wanted to find a way to control the 'scrolling' through the FFT using some kind of external interface, i.e. not a mouse or graphics tablet or other such technology. In a veritable stroke of genius, Luke hit upon the idea of using a laser pointer against a black background to determine the FFT frame value for resynthesis. The amazingly appropriate aspect is that the only handy black-background surface we could find was the shell of an old NeXT "pizza-box" computer. Hi! This is Steve Jobs! The red against the black was good because it provided an excellent color contrast for the camera-tracking software.

Looking for color is a relatively easy way to track "external" stuff, although you have to be careful because the color can shift depending on lighting conditions. The more contrast between your background and foreground (tracked) colors, the better. Also, we will be looking for a range of color in our tracker, because this allows for a bit of lighting 'fuzziness'. If you do design an application for real-world use that relies upon color tracking, be sure not to hard-code the color into the app. Leave it so it can be recalibrated in the context(s) where you will be using it.

Using our digital cameras (room 313 has a nice setup for this kind of work), we connect the "DV" output through the firewire port and use the jit.qt.grab object to grab frames from the camera. It works just like jit.qt.movie: hit it with bangs from a "metro" object and it will grab frames. The "open" message sent to it (see the patch) will start the object reading data. As before, the matrix rate in jitter will probably be different from the frame rate. Luke went into some considerations about frame rate, camera hardware, which is more efficient (analog vs. digital cameras), etc., but I forgot to write it all down. Ask Luke for advice if you get really serious about this stuff.
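If you want to play with the same frame-polling idea outside of Max, here's a rough sketch in Python using OpenCV (Python and OpenCV are just my stand-ins here; the class patch does all of this with jit.qt.grab and a metro):

    import cv2  # OpenCV's VideoCapture stands in for jit.qt.grab in this sketch

    cap = cv2.VideoCapture(0)        # like sending "open": start reading the camera
    if not cap.isOpened():
        raise RuntimeError("no camera found")

    while True:
        ok, frame = cap.read()       # each read is one "bang": grab the latest frame
        if not ok:
            break
        cv2.imshow("grab", frame)
        if cv2.waitKey(33) & 0xFF == 27:   # poll at roughly 30 fps; Esc quits
            break

    cap.release()
    cv2.destroyAllWindows()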

With image data streaming in through the jit.qt.grab object, we can use the jit.findbounds object to track the visual input we want to find (in this case the red line of the laser pointer against the black background of the NeXT case). jit.findbounds will put out the coordinates of the "bounding box" -- the smallest rectangle that contains the color we define. For example, for a red-colored streak appearing through the camera on the computer monitor, it reports the corners of the box that just encloses the streak.

jitter allows us to use a variety of ways for specifying color; one of the easiest is RGB (red-green-blue). Typically these RGB values come as triples, each component being 0-255 (8 bits per color!). We need to scale these to between 0.0 and 1.0 for each color component to pass into jit.findbounds. Also, jit.findbounds does in fact take a range of color values to allow for the 'fuzziness' in color of real-world stuff. You will need to pass it minimum and maximum values for each color component (again, see the accompanying class patch to see how Luke did this).
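In the same hypothetical Python terms as the sketch above (numpy is my assumption here), the heart of what jit.findbounds computes might look something like this:

    import numpy as np

    def find_bounds(frame, lo, hi):
        """Bounding box of the pixels whose color falls inside [lo, hi].
        frame: (height, width, 3) array of RGB values scaled to 0.0-1.0;
        lo, hi: 3-element minimum/maximum colors, like jit.findbounds' inputs."""
        mask = np.all((frame >= lo) & (frame <= hi), axis=-1)
        ys, xs = np.nonzero(mask)
        if xs.size == 0:
            return None                 # the color wasn't found in this frame
        return (xs.min(), ys.min()), (xs.max(), ys.max())  # (min x, min y), (max x, max y)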

The handy Max "vexpr" object works like "expr", except it can operate on whole lists: vexpr $i1/255 will divide all incoming elements (in this case 3 of them -- R, G, B) by 255. If you want floating-point results, use a float constant (vexpr $i1/255.) so the division doesn't get truncated to integers.
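For reference, the same per-element scaling in the Python sketch style used here:

    rgb = [255, 128, 0]                  # the 0-255 triple the camera hands you
    scaled = [c / 255.0 for c in rgb]    # -> [1.0, 0.502, 0.0], ready for find_bounds()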

However, the best approach would be for us not to define in advance the color we want to track, but to grab it from the incoming camera signal -- that way we will know how the color appears in the place where we use the application. The little "syringe" object on the Max/MSP/jitter palette does this. We can use it to grab a color from the incoming signal as displayed on the screen, and by connecting its output to the "vexpr" object we can convert from the 0-255 values to the 0.0-1.0 values for the colors. We can also use "vexpr" to throw in a range calculation, so that we don't get too narrowly focused on one color value. The class patch shows this.
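Continuing the hypothetical Python sketches, a sampled color plus a range calculation might look like this (the fuzz amount is made up; calibrate it wherever you'll actually use the tracker):

    def color_range(sampled_rgb, fuzz=0.1):
        """Turn one sampled 0-255 color into min/max 0.0-1.0 bounds,
        padded by fuzz so lighting shifts don't break the match."""
        c = [v / 255.0 for v in sampled_rgb]
        lo = [max(0.0, v - fuzz) for v in c]
        hi = [min(1.0, v + fuzz) for v in c]
        return lo, hi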

jit.findbounds puts out lists of maximum and minimum boundary values for the bounding box. The ever-useful "unpack" object will convert these to x & y coordinate values. Just for fun, Luke hooked them directly to the "qtmusic" object playing that vibrant QuickTime piano sound -- the coordinate values were mapped onto MIDI amplitude and note-number values. It was more fun than a barrel of monkeys.
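Just to make the mapping concrete, here's one hypothetical version of that coordinate-to-MIDI math in the same Python sketch style (Luke's actual scaling may well have differed):

    def box_to_midi(min_xy, max_xy, width, height):
        """Map the center of the tracked bounding box onto MIDI values."""
        cx = (min_xy[0] + max_xy[0]) / 2.0
        cy = (min_xy[1] + max_xy[1]) / 2.0
        note = int(cx / width * 127)               # left-to-right = low-to-high pitch
        velocity = int((1.0 - cy / height) * 127)  # higher in the frame = louder
        return note, velocity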

With a little more math, we also connected the output of this little system directly to an oscillator to see how much latency was in the patch -- how closely did it track the movement of the laser pointer? All agreed it was quite good. Yes, quite good. It felt good. Good.

Because we are tracking a red line against a black background, we modified our original use of jit.findbounds to just look at the red component. This is for efficiency (we don't need to track the G and B color components). jit.unpack did the trick again.

Then we used the jit.lcd object to combine our bounding-box tracking with the incoming video signal, to create a BigEye kind of system, but one we can use easily within Max/MSP/jitter. We had some minor difficulties to resolve concerning the synchronization of the lcd drawing, the frame rate of incoming frames, and the clearing of the lcd to update the image. Using the "trigger" ("t") Max object allowed us to sync all our bangs... god, this language...

In general, if you are using several "metro" objects to repeatedly trigger events (especially with tight timing), it is good to combine them into a single master "metro" clock -- it's more efficient, and your events will be synchronized.
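In the terms of the Python sketches, the single-master-clock idea amounts to something like this (a toy loop, not how Max actually schedules things):

    import time

    def master_clock(callbacks, interval=0.033):
        """One clock fires every subscriber in a fixed order on each tick,
        the way a single "metro" into a "trigger" keeps all the bangs in sync."""
        while True:
            for cb in callbacks:      # fixed ordering = deterministic sync
                cb()
            time.sleep(interval)      # one shared period instead of several drifting ones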

After putting all this stuff together, it worked! One thing we will have to keep in mind is that the bounding box coordinates won't necessarily match the number of FFT data frames we have, so they need to be rescaled. Scaling, scaling, over the bounding main... (ha ha! I just made that up!).
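That rescaling is just a linear map (Max's "scale" object does this same job); in the Python sketch style, with a made-up 320-pixel camera width:

    def rescale(value, in_lo, in_hi, out_lo, out_hi):
        """Linearly map value from [in_lo, in_hi] onto [out_lo, out_hi]."""
        t = (value - in_lo) / float(in_hi - in_lo)
        return out_lo + t * (out_hi - out_lo)

    n_frames = 512   # however many FFT frames your analysis produced
    x = 160          # tracked x coordinate from the bounding box
    frame_index = int(rescale(x, 0, 319, 0, n_frames - 1))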


Before going to the next section of class, where we beef up the sonogram display to a snazzy 3-D rendering of the FFT data, Luke told a story about buying many pairs of fluorescent-colored socks in Amsterdam. We were all flabbergasted.


The last component in the Walk-Through FFT was indeed the 3-D output display. A sonogram is fine, but it just can't compare to the visual lushness and eye-popping splendor of an elegantly-rendered 3-D mountain range of FFT data. The good news is that the jitter package has encapsulated within it a very powerful 3-D computer graphics package known as OpenGL. If you have seen virtually any Hollywood production that uses computer animation, then you have seen software that was built upon an OpenGL substrate. OpenGL is a 3-D modeling standard that was developed by Silicon Graphics, Inc. (SGI) in the grand old days when they were a viable company. OpenGL apparently will outlast SGI -- it has pretty much become the de facto standard for 3-D graphics rendering. Better still, you can even get the code for FREE now, thanks to some hard-working programmers who do nifty stuff for the greater common good.

We will probably be revisiting OpenGL later in the class. In the meantime, opengl.org is a good place to start exploring.

I haven't included any tutorial pages -- just check google, there's a bazillion out there. There are also a number of OpenGL books that you should consider if you want to get serious about 3-D coding. Check the opengl.org books page for the more popular manuals.

Anyhow, the jit.gl.render object will take OpenGL commands and render them. It can also take a window name as an argument. This can be a window created by jit.gl.window, as in: jit.gl.window foo, such that jit.gl.render foo will send all rendering commands to the new window called "foo". I think that the directive "fullscreen" will cause it to send OpenGL commands to a full-screen rendering. Be careful, though, and leave a keycode that will allow you to return to your main screen. (I leave it to your imagination to figure out why this might be a problem.) Luke did this in class, but I forgot how. Darn.
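The leave-yourself-a-keycode advice applies in any environment; here's the idea in a hypothetical Python sketch using pyglet (pyglet is my assumption, nothing to do with the class patches):

    import pyglet  # any GL windowing toolkit would do; pyglet keeps it short

    window = pyglet.window.Window(fullscreen=True)   # grab the whole screen

    @window.event
    def on_key_press(symbol, modifiers):
        if symbol == pyglet.window.key.ESCAPE:       # the escape hatch home
            window.set_fullscreen(False)

    pyglet.app.run()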

Aside: just FYI, the jit.gl.* objects actually use jitter matrices to communicate with each other. I doubt you will ever have to access this data, but it does show the flexibility of the jitter matrix setup for doing arbitrary data massaging. In fact, one of the potentially cute uses of jit.findbounds is to operate on non-graphic data. Think about it...

As with the jit.qt.movie or jit.qt.grab objects, the jit.gl.render object works by getting banged. It renders a new scene every time it gets a Max bang. Generally you will want to erase the previous scene before rendering a new one, but you can get some cool overlay effects by not doing this.

There's a lot of GL material embedded in jitter; certainly more than we can cover here. The best way to learn this is to hit the tutorials, plus scope out the sites mentioned above, etc. It is a very powerful standard.

What we plan to do is fairly simple, though. But even something as simple as our use shows the power of OpenGL -- you get a lot of really fancy-looking stuff for very little effort. We plan to use a "texture", and we will warp the surface it sits on in one dimension to reflect the peaks in the FFT data.

What does this mean? Well, a "texture" is basically a flat set of 2-D graphics data (our sonogram!) that you can put on an OpenGL object. In our case, we plan to put the sonogram "texture" onto a flat plane, BUT we plan to warp the plane (using an OpenGL technique called an 'extrusion map') along one axis to show the amplitude peaks in the FFT spectrum.
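As one last hypothetical Python sketch, here's the shape of that warp: turning FFT magnitudes into a grid of 3-D vertices that any GL plane/mesh could render, with the sonogram image applied as the texture.

    import numpy as np

    def fft_mountain(magnitudes):
        """magnitudes: (n_frames, n_bins) array of FFT amplitudes.
        Returns (n_frames, n_bins, 3) vertices: x = time, y = frequency,
        z = normalized amplitude -- the 'mountain range' heights."""
        n_frames, n_bins = magnitudes.shape
        xs, ys = np.meshgrid(np.linspace(-1.0, 1.0, n_frames),
                             np.linspace(-1.0, 1.0, n_bins), indexing="ij")
        zs = magnitudes / magnitudes.max()
        return np.stack([xs, ys, zs], axis=-1)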

Here's a quick outline of what we do: get the FFT analysis data into a jitter matrix (our sonogram), hand that matrix to the renderer as the "texture" on a flat plane, warp the plane along one axis according to the amplitude peaks, and bang jit.gl.render to redraw.

Easy as pie, huh? Then when we get this finished, we set it up to receive input from our red-tracking video input patch and we send frame numbers from there to our playfft sound-synthesis patch. Yeah, we did this. There was a bit more trickiness in terms of synchronizing various aspects, and redrawing the image correctly, etc., but it all finally worked, and it was totally cool!

Check out the class patches for the finished version.