Week 3

During this session we continued down the happy path to aural-visual bliss that Luke began last week. We successfully managed to create a "phase vocoder" using FFT data and jitter matrix manipulations.

Be sure to check the resources page for Max/MSP/Jitter resources. We've also posted a StuffIt archive containing the patches for the walk-through FFT that Luke developed in class.

THE STORY SO FAR:

We take FFT (spectral) data and store it in a jitter matrix: each column in the matrix is an FFT frame, and each row is a frequency bin. In the original captured-data matrix, called "this" (for some bizarre reason only Luke knows), magnitude and phase are stored as two separate planes. This is then 'unpacked' into two separate matrices, one for amplitude ("thus") and one for frequency ("that"). Luke claims he can come up with better names for these matrices, but I tend to doubt it.
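To make the layout concrete, here is a minimal sketch in plain Python (standard library only, with a naive DFT standing in for the FFT) of building a "this"-style matrix and unpacking it. The Hann window, the frame and hop sizes, and the way "that" gets its frequencies (from the frame-to-frame phase advance of each bin, the usual phase vocoder trick) are assumptions, not necessarily what Luke's patch does; the function names are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive DFT of a real frame; returns only bins 0..N/2 (the rest are redundant)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def analyze(signal, frame_size, hop):
    """Build a 'this'-style matrix as [magnitude_plane, phase_plane].
    Each plane is indexed [bin][frame], so each column is one FFT frame."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_size) for t in range(frame_size)]
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        chunk = [signal[start + t] * window[t] for t in range(frame_size)]
        frames.append(dft(chunk))
    n_bins = frame_size // 2 + 1
    magnitude = [[abs(f[k]) for f in frames] for k in range(n_bins)]
    phase = [[cmath.phase(f[k]) for f in frames] for k in range(n_bins)]
    return [magnitude, phase]

def unpack(this, frame_size, hop, sample_rate):
    """Split 'this' into 'thus' (amplitude) and 'that' (frequency in Hz).
    Frequency comes from the frame-to-frame phase advance of each bin."""
    magnitude, phase = this
    thus = [row[:] for row in magnitude]
    that = []
    for k, row in enumerate(phase):
        expected = 2 * math.pi * k * hop / frame_size  # advance of the bin's center frequency per hop
        prev = 0.0  # assumed phase before the first frame, so column 0 is only approximate
        freqs = []
        for p in row:
            dev = p - prev - expected
            dev = (dev + math.pi) % (2 * math.pi) - math.pi  # wrap into [-pi, pi)
            bins = k + dev * frame_size / (2 * math.pi * hop)  # true frequency, in bins
            freqs.append(bins * sample_rate / frame_size)
            prev = p
        that.append(freqs)
    return thus, that
```

For example, a 1000 Hz sine sampled at 8000 Hz with 64-sample frames and a hop of 16 sits exactly on bin 8, so every column of that[8] after the first reads 1000 Hz.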

So, when we resynthesize, we are stepping through two matrices, but they are both indexed at the same rate and at the same time. If we want to mess around with the data, we will probably be a-wantin' to address both arrays simultaneously.
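A sketch of that lockstep reading, assuming both matrices are Python lists indexed [bin][frame] (the helper name is hypothetical). The single shared read position is the point: amplitude and frequency for a frame always come out together, and changing the rate stretches or squeezes time, which is the classic phase vocoder trick.

```python
def read_frames(thus, that, rate, n_out):
    """Step through 'thus' and 'that' with ONE shared read position, so the
    amplitude and frequency for a given frame always stay paired.
    rate < 1 plays the frames back slower (time-stretch); rate > 1 plays faster."""
    n_frames = len(thus[0])
    pos = 0.0
    out = []
    for _ in range(n_out):
        col = int(pos) % n_frames  # same column index for both matrices; wraps like a looping phasor
        amplitudes = [row[col] for row in thus]
        frequencies = [row[col] for row in that]
        out.append((amplitudes, frequencies))
        pos += rate
    return out
```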

Actually, we are not doing an "oscillator bank" resynthesis with this, where we would assign the amplitude and frequency from each 'bin' to a separate oscillator. Instead, we are doing a quicker-and-dirtier thing called an 'inverse FFT'. We take the data from the "thus" (amplitude) and "that" (frequency) matrices and recompute them into a single, two-planed (magnitude and phase) matrix like the one we got from the original FFT analysis. Then we do the inverse transform and MAGICALLY OUT COMES SOUND!
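Here is a rough sketch of the two steps in plain Python, assuming matrices indexed [bin][frame] and a "that" matrix holding frequencies in Hz: accumulate each bin's frequency back into a phase, then run an inverse DFT on one magnitude/phase frame. A naive DFT stands in for the FFT, and all names are hypothetical.

```python
import cmath
import math

def dft(frame):
    """Naive forward DFT of a real frame; returns bins 0..N/2."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]

def freqs_to_phases(that, hop, sample_rate, start_phases=None):
    """Turn a frequency plane (Hz) back into a phase plane by accumulating
    each bin's phase advance over one hop, frame by frame."""
    phases = []
    for k, row in enumerate(that):
        acc = 0.0 if start_phases is None else start_phases[k]
        col = []
        for hz in row:
            acc += 2 * math.pi * hz * hop / sample_rate
            col.append(acc)
        phases.append(col)
    return phases

def inverse_frame(magnitudes, phases):
    """Inverse DFT of one frame given bins 0..N/2 as magnitude/phase lists."""
    half = [m * cmath.exp(1j * p) for m, p in zip(magnitudes, phases)]
    n = 2 * (len(half) - 1)
    # Rebuild the upper half from conjugate symmetry so the output is real.
    full = half + [half[n - k].conjugate() for k in range(len(half), n)]
    return [
        (sum(full[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]
```

A complete resynthesis would also window each rebuilt frame and overlap-add the frames hop samples apart; that bookkeeping is left out to keep the sketch short.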

Today, we're gonna figure out how to display the data. We could just peel off the numbers and assign them into a display-thingy from jitter, but there are two problems:

2. The second problem is that jitter constrains the range of the numbers, so we need to 'renormalize' them (I think that's what Luke said) so that they work with the jitter matrix decoder.

(actually, there were three problems)

3. FFT bins are spaced linearly in frequency, but we hear pitch logarithmically, because human perception of audio is weird beyond belief. So we need to stretch the data into a logarithmic display that still fits the jitter display constraints. In short, we need to FLIP it, SCALE it, and then STRETCH it ("it" being the data from the FFT matrix we want to display).

jit.op -- does any math operation on the matrix, e.g.:

jit.op @op * @val 0.05 will multiply every element of the matrix by 0.05. We can use this for scaling the matrix elements.
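In plain Python, a scalar jit.op boils down to one operator applied to every cell. This is a hypothetical stand-in, not Jitter's API, and the operator table covers only a few of the symbols jit.op accepts.

```python
import operator

# A few of jit.op's operators, keyed by the same symbols (hypothetical subset)
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

def jit_op_scalar(matrix, op, val):
    """Apply one operator with a scalar to every element, in the spirit of
    'jit.op @op * @val 0.05'. The matrix is a list of rows of floats."""
    f = OPS[op]
    return [[f(x, val) for x in row] for row in matrix]
```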

Then we pass this modified matrix to:

jit.matrix 1 char 320 240 -- this will take the floating-point 512 x whatever FFT data and convert it into a matrix of 320 x 240 elements (these dimensions were chosen using Luke's wisdom about what works well for video, especially when blown up to full screen). Each element is also now constrained to 0-255. The jit.op multiplication by 0.05 was chosen to squeeze most of the FFT data between 0 and 1, because the char conversion will "blow that up" to 0-255. This also accomplishes a pretty hefty data reduction, which is fine, because this is just the data we're going to view, and 8-bit "char" numbers are plenty for that. We won't be using this data for our resynthesis. We are keeping that segregated for them pristinely-cool digital sounds we all desire.
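The scale, resize, and 8-bit steps can be sketched like this in Python. Two assumptions to flag: the float-to-char rule (truncate x * 255, clamp to 0-255) and nearest-neighbor resizing are guesses at what the jit.matrix conversion does, and all the function names are hypothetical.

```python
def to_char(x):
    """Assumed float-to-char rule: 0.0-1.0 maps to 0-255, anything outside is clamped."""
    return max(0, min(255, int(x * 255)))

def resample_nearest(matrix, out_w, out_h):
    """Nearest-neighbor resize of a list-of-rows matrix to out_h rows by out_w columns."""
    in_h, in_w = len(matrix), len(matrix[0])
    return [
        [matrix[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

def to_display(fft_mags, scale=0.05, out_w=320, out_h=240):
    """Scale (like jit.op @op * @val 0.05), then resize and convert to 8-bit
    (like sending the result into jit.matrix 1 char 320 240)."""
    scaled = [[v * scale for v in row] for row in fft_mags]
    return [[to_char(v) for v in row] for row in resample_nearest(scaled, out_w, out_h)]
```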

The next thing Luke had coded was a "makemap" patch that did the logarithmic remapping of the data. What it did was generate a 'transfer table' (transfer function) that expanded the space given to the lower-frequency bins and squished the space taken up by the higher-frequency bins, so that each octave gets roughly the same share of the display.
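A hypothetical makemap can be sketched as a lookup table with one entry per display row, spacing the rows evenly in log-frequency. This is an assumption about how such a table could be built, not Luke's actual patch; bin 0 (DC) is skipped because log(0) is undefined.

```python
def make_log_map(n_rows, n_bins, min_bin=1):
    """Hypothetical 'makemap': for each display row, pick the FFT bin to show.
    Rows are spaced evenly in log-frequency from min_bin up to the top bin."""
    top = n_bins - 1
    return [round(min_bin * (top / min_bin) ** (y / (n_rows - 1))) for y in range(n_rows)]

def apply_map(matrix, table):
    """Table lookup: display row y is a copy of FFT row table[y]."""
    return [list(matrix[b]) for b in table]
```

With 240 rows and 512 bins, 213 rows go to the lower half of the bins and only 27 to the upper half, which is the expand-the-bottom, squish-the-top effect.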

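Finally, there was jit.invert, which was used for the flipping step (a common operation in computer graphics). Heads-up if you rebuild this: in Jitter, jit.invert inverts the cell values (a photographic negative), while flipping an axis is done with jit.dimmap and its @invert attribute.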
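"Flipping" can mean two different things on an 8-bit display matrix: reversing the row order (an axis flip, assuming row 0 is drawn at the top of the screen, so the lowest bin would otherwise end up on top) or inverting the values. A sketch of both in plain Python, with hypothetical function names:

```python
def flip_rows(matrix):
    """Axis flip: reverse the row order, so row 0 (the lowest bin) ends up at the bottom."""
    return [list(row) for row in reversed(matrix)]

def invert_values(matrix):
    """Value inversion on 8-bit data: 255 - x, i.e. a photographic negative."""
    return [[255 - x for x in row] for row in matrix]
```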

Put it all together, and VOILA! Nifty-looking sonogram!

Almost...

If we look at the output of all this with a swept oscillator, we get a really blurred-out image of the sweep. That's because we aren't really displaying the individual frequencies, just splatting up the bins, so any frequency that falls "between" two bin center frequencies gets smeared across neighboring bins. Oh well. Live with it. This is just for the fun display part anyhow.
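The blur is spectral leakage, and it is easy to reproduce. This sketch (plain Python, naive DFT, no window) compares a sine sitting exactly on bin 10 with one halfway between bins 10 and 11.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of bins 0..N/2 of a naive DFT (no window, i.e. rectangular)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def sine_at_bin(bin_position, n):
    """A sine whose frequency is bin_position bins (can be fractional)."""
    return [math.sin(2 * math.pi * bin_position * t / n) for t in range(n)]

N = 64
on_bin = dft_magnitudes(sine_at_bin(10.0, N))   # frequency sits exactly on bin 10
between = dft_magnitudes(sine_at_bin(10.5, N))  # frequency halfway between bins 10 and 11
```

on_bin has a single non-zero bin (magnitude 32, that is N/2), while between spreads its energy over bins 10 and 11 plus a tail of neighbors, which is what the blurry sweep looks like on screen.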

Luke then showed how the "makemap" squished/expanded the data by running his wonderful movie of the feet through his table-lookup thing. Again, this was just to help with the display of the data. Our eyes aren't logarithmic like our ears are. Although it would probably be pretty cool if they were, huh?

Now we needed to synchronize our audio playback with the display (in particular, a line that cruises along the display showing where we are in the FFT data we are resynthesizing). Luke drives the frame rate by a simple phasor stepping through the frames, so we need to tie that to something that will draw a line across our data.
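A sketch of keeping audio and cursor in step, assuming the phasor runs from 0 to 1 once per pass through the FFT data, one frame is played per hop of samples, and the display is 320 pixels wide (the helper names are hypothetical). Both the frame index and the line's x position are derived from the same phasor value, so they cannot drift apart.

```python
def phasor_rate(sample_rate, hop, n_frames):
    """Phasor frequency (Hz) that steps through n_frames frames at the
    original speed, assuming one frame per hop of samples."""
    return sample_rate / (hop * n_frames)

def phasor_to_frame(phase, n_frames):
    """0-1 phasor value -> FFT frame (column) index."""
    return min(int(phase * n_frames), n_frames - 1)

def phasor_to_x(phase, width=320):
    """The same phasor value -> horizontal pixel for the cursor line."""
    return min(int(phase * width), width - 1)
```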

The Max LCD object allows us to use QuickDraw directives to draw lines. Sounds like it's just what we need. Yeah. Go cat go. And in fact, the directive:

clear, linesegment $1 0 $1 300

When given to the LCD object and driven by the numbers coming from the stepping phasor, this does exactly what we want. So how do we get this into our jit.matrix display? There's a cute object called jit.lcd that does exactly this! Whee!!!!!!! jit.lcd 4 char 320 240 sets it up. Send the "clear, ..." directive to it from above, and we're happy as can be. All we have to do is overlay the jit.lcd output onto our display of the FFT data.
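To see what the overlay input looks like as data, here is a hypothetical Python stand-in for the jit.lcd output after "clear, linesegment x 0 x 300". Assumptions: cells are (alpha, red, green, blue) tuples, which is Jitter's plane order for 4-plane char video as far as I know, and the cleared background is black.

```python
def draw_cursor(x, width=320, height=240, color=(255, 255, 0, 0)):
    """Hypothetical stand-in for sending 'clear, linesegment $1 0 $1 300' to
    jit.lcd 4 char 320 240. Cells are (alpha, red, green, blue) tuples;
    the default color is an opaque red."""
    # 'clear': wipe the frame (assumed black background here)
    frame = [[(0, 0, 0, 0) for _ in range(width)] for _ in range(height)]
    # 'linesegment x 0 x 300': vertical line; the part below row height-1 is clipped
    if 0 <= x < width:
        for y in range(min(300, height - 1) + 1):
            frame[y][x] = color
    return frame
```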

The handy-dandy jit.op object can also do matrix x matrix calculations, not just a single operator on the entire matrix.

jit.op @op + when given two matrices at the two top inlets, will add each element from one matrix to the other.
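A sketch of the matrix-plus-matrix case for single-plane 8-bit data, such as a grayscale sonogram plus a cursor column. The assumption here is that char addition saturates at 255 instead of wrapping around; the function name is hypothetical.

```python
def jit_op_add(a, b):
    """Element-wise sum of two same-size 8-bit matrices, clamped at 255.
    Assumption: char '+' saturates rather than wrapping around."""
    return [[min(255, x + y) for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]
```

Because the cursor column is 255 and everything else in the cursor matrix is 0, the sum leaves the sonogram untouched except where the line is drawn.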

Oops, Wayne Siegel arrived just as we were debugging a problem that caused the little red line to scroll far too quickly across the FFT display. So now I'm sitting in my office, not sure what strange and wonderful tales Luke is telling.