Week 3
During this session we continued down the happy path to aural-visual
bliss that Luke began
last week.
We successfully managed to create a "phase vocoder" using FFT
data and jitter matrix manipulations.
Be sure to check the
resources page for Max/MSP/Jitter resources.
We've also posted a StuffIt archive file containing the patches for
the walk-through FFT that Luke developed in class.
THE STORY SO FAR:
We take FFT (spectral) data and store it in a
jitter matrix: each column in the matrix is an FFT frame,
and each row holds the data for one frequency bin. In the
original captured-data matrix, called "this" (for some bizarre
reason only Luke knows), magnitude and phase are stored
as two separate planes. This is then 'unpacked' into two separate
matrices, one for amplitude ("thus") and one for frequency
("that"). Luke claims he can come up with better names for
these matrices, but I tend to doubt it.
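A minimal sketch of that layout in Python, under our own assumptions: a naive DFT stands in for the real FFT, the matrix is a plain nested list indexed as this[plane][bin][frame], and the "frequency" in "that" is taken to be the wrapped frame-to-frame phase change of each bin (the usual phase-vocoder trick). Luke's patch may derive it differently, and the function names here are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive DFT (O(n^2)); a stand-in for a real FFT, fine for tiny frames."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def analyze(signal, frame_size, hop):
    """Build a 'this'-style matrix: plane 0 = magnitude, plane 1 = phase.
    Indexed as this[plane][bin][frame], so each column is one FFT frame
    and each row is one frequency bin."""
    frames = [signal[s:s + frame_size]
              for s in range(0, len(signal) - frame_size + 1, hop)]
    spectra = [dft(f) for f in frames]
    nbins = frame_size // 2 + 1  # non-negative frequencies only (real input)
    mag = [[abs(spec[k]) for spec in spectra] for k in range(nbins)]
    phase = [[cmath.phase(spec[k]) for spec in spectra] for k in range(nbins)]
    return [mag, phase]


def wrap(angle):
    """Wrap an angle into [-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def unpack(this):
    """Split 'this' into 'thus' (amplitude) and 'that' (frequency, here the
    per-frame phase increment; the first column keeps the starting phase)."""
    mag, phase = this
    thus = [row[:] for row in mag]
    that = [[row[0]] + [wrap(row[j] - row[j - 1]) for j in range(1, len(row))]
            for row in phase]
    return thus, that


# Usage: a cosine sitting exactly on bin 2 of an 8-point frame.
signal = [math.cos(2 * math.pi * 2 * t / 8) for t in range(32)]
this = analyze(signal, frame_size=8, hop=4)
thus, that = unpack(this)
```

Because the cosine repeats every 4 samples and the hop is 4, every frame looks the same, so the phase increments for bin 2 come out as (nearly) zero.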
So, when we resynthesize, we are stepping through two matrices,
but they are both indexed at the same rate and at the same time.
If we want to mess around with the data, we will probably be a-wantin'
to address both arrays simultaneously.
Actually, we are not doing an "oscillator bank" resynthesis with this,
where in fact we would assign the amplitude and frequency from each
'bin' to a separate oscillator. Instead, we are doing a quicker-and-dirtier
thing called an 'inverse FFT'. We take the data from the "thus" (amplitude)
and "that" (frequency) matrices and recompute them into
a single, two-planed (magnitude and phase) matrix like the one we
got from the original FFT analysis. Then we do the inverse transform
and MAGICALLY OUT COMES SOUND!
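Going the other way, here is a sketch of the repack-and-invert step, again in Python with a naive inverse DFT standing in for the real inverse FFT. It assumes a hypothetical layout (rows = bins, columns = frames, "that" holding per-frame phase increments that a running sum turns back into phases), and it inverts each frame on its own, leaving out the windowing and overlap-add a real resynthesis would need.

```python
import cmath
import math


def dft(frame):
    """Naive forward DFT, only used here to make test data."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each time-domain sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def repack(thus, that):
    """Rebuild the two-plane (magnitude, phase) matrix from 'thus' and 'that'.
    A running sum turns per-frame phase increments back into phases."""
    phase = []
    for row in that:
        acc, out = 0.0, []
        for increment in row:
            acc += increment
            out.append(acc)
        phase.append(out)
    return [thus, phase]


def resynthesize(thus, that, frame_size):
    """Inverse-transform every column (frame) back into time-domain samples."""
    mag, phase = repack(thus, that)
    nbins, nframes = len(mag), len(mag[0])
    frames = []
    for j in range(nframes):
        half = [cmath.rect(mag[k][j], phase[k][j]) for k in range(nbins)]
        # Bins above Nyquist mirror the lower ones (conjugate symmetry of a real signal).
        full = half + [half[frame_size - k].conjugate() for k in range(nbins, frame_size)]
        frames.append(idft(full))
    return frames


# Usage: analyze one 8-sample frame by hand, then resynthesize it.
frame = [math.cos(2 * math.pi * 2 * t / 8) + 0.5 for t in range(8)]
spec = dft(frame)
thus = [[abs(spec[k])] for k in range(5)]
that = [[cmath.phase(spec[k])] for k in range(5)]
rebuilt = resynthesize(thus, that, frame_size=8)[0]
```

With a single frame the running sum just hands back the original phase, so the rebuilt samples match the input frame to within rounding error.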
Today, we're gonna figure out how to display the data. We could just
peel off the numbers, and assign them into a display-thingy from jitter,
but there are two problems:
1. The numbers are all reversed because the coordinate system starts
at the upper left-hand corner of the display screen (so we have to flip the numbers).
Computers are like that.
2. The second problem is that jitter constrains the range of the numbers
(a char matrix only holds 0-255), so we need to 'renormalize' them
(I think that is what Luke said) so that they work with the jitter matrix decoder.
(actually, there were three problems)
3. FFT bins are spaced linearly in frequency, but human perception of pitch
is logarithmic (audio perception is weird beyond belief), so we need to stretch
the data onto a logarithmic display that still fits the jitter display constraints.
So, we need to FLIP it, SCALE it, and then STRETCH it ("it" being the
data from the FFT matrix we want to display).
jit.op -- does a math operation on every element of the matrix. E.g.
jit.op @op * @val 0.05
will multiply every element of the array by 0.05. We can use this for
scaling the array elements.
Then we pass this modified matrix to:
jit.matrix 1 char 320 240
This will take the floating point 512 x whatever FFT data and convert
it into a matrix that is 320 x 240 elements (these dimensions were
chosen using Luke's wisdom about what works well for video, especially
when blown up to full-screen). Each element will also now be constrained
between 0 and 255. The jit.op multiplication of 0.05 was chosen in order
to squeeze most of the FFT data between 0 and 1, because the jit.matrix
command will "blow that up" to 0-255. This also accomplishes a pretty
hefty data reduction, which is fine, because this is just the data we're
going to view -- the 8-bit numbers resulting from the "char"
conversion are fine for this. We won't be using this data for
our resynthesis. We are keeping that segregated for them pristinely-cool
digital sounds we all desire.
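Here is a rough Python model of the scale-then-convert step, so the arithmetic is easy to see. The nearest-neighbour resampling, the truncation when converting to an integer, and the clamp to 0-255 are our assumptions about how jit.matrix treats float data arriving at a char matrix of a different size; the function names are hypothetical.

```python
def scale(matrix, factor=0.05):
    """Like jit.op @op * @val 0.05: multiply every cell by a constant."""
    return [[v * factor for v in row] for row in matrix]


def to_char(matrix, out_w=320, out_h=240):
    """Rough stand-in for 'jit.matrix 1 char 320 240': nearest-neighbour
    resample to out_w x out_h, then map 0.0-1.0 onto 0-255, clamping
    anything outside that range."""
    in_h, in_w = len(matrix), len(matrix[0])
    out = []
    for y in range(out_h):
        src_row = matrix[y * in_h // out_h]
        out.append([max(0, min(255, int(src_row[x * in_w // out_w] * 255)))
                    for x in range(out_w)])
    return out


# Usage: a tiny 2 x 2 "FFT" matrix blown up to 4 x 2.
display = to_char(scale([[0.0, 10.0], [20.0, 40.0]]), out_w=4, out_h=2)
```

Any input value of 20 or more scales to 1.0 or above and saturates at 255, which is why the 0.05 factor has to be picked to keep most of the FFT data below that point.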
The next thing Luke had coded was a "makemap" patch that accomplished
the logarithmic reduction of the data. What it did was to generate
a 'transfer table' (transfer function) that basically expanded the
amount of space given to the lower frequency bins and squished
the higher frequency data, so that each octave gets about the same amount of screen.
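The internals of Luke's "makemap" patch aren't in these notes, so here is a hypothetical Python version of the same idea: a lookup table that, for each display row, names the FFT bin to show, with rows spaced logarithmically in frequency (a constant bin ratio per row). The 512-bin and 240-row sizes come from the numbers above; starting at bin 1 is our assumption, since bin 0 (DC) has no place on a log axis.

```python
def make_log_map(num_bins=512, display_rows=240, min_bin=1):
    """Hypothetical 'makemap': table[row] = FFT bin shown on that row.
    Row 0 is the lowest frequency; bins grow by a constant ratio per row."""
    max_bin = num_bins - 1
    ratio = max_bin / min_bin
    return [round(min_bin * ratio ** (row / (display_rows - 1)))
            for row in range(display_rows)]


def apply_map(column, table):
    """Remap one FFT frame (a list of bin values) onto the display rows."""
    return [column[b] for b in table]


table = make_log_map()
```

Roughly the bottom half of the rows covers only the first two dozen bins, with many rows repeating the same low bin, while near the top each row skips over a dozen or so bins: the "expand the lows, squish the highs" effect.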
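Finally, there was jit.invert, which we used for the flipping step (a common
operation for computer graphics stuff). One caveat: as far as we can tell, jit.invert
inverts the cell values (255 minus the value, for char data) rather than flipping
an axis; a true top-to-bottom flip is usually done with jit.dimmap and its @invert attribute.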
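The flip itself is just a reversal of the row order, sketched here in Python on a matrix stored as a list of rows (the helper name is hypothetical). The lowest-frequency row starts out at the top, because screen coordinates begin in the upper left-hand corner, and ends up at the bottom.

```python
def flip_vertical(matrix):
    """Reverse the row order: the top row of the input becomes the bottom row."""
    return [row[:] for row in reversed(matrix)]


# Usage: row 0 holds the lowest-frequency bin; after flipping it is drawn last (at the bottom).
bins_low_to_high = [[10, 10], [20, 20], [30, 30]]
screen = flip_vertical(bins_low_to_high)
```

Copying each row keeps the original matrix untouched, so the unflipped data stays available.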
Put it all together, and VOILA! Nifty-looking sonogram!
Almost...
If we look at the output of all this with a swept oscillator, we get a
really blurred-out image of the sweep. That's because we aren't really displaying
the individual frequencies, just splatting up the bins, so any frequency falling
"between" two bin center frequencies gets blurred across neighboring bins.
Oh well. Live with it. This is just for the fun display part anyhow.
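A quick way to see the blurring numerically, sketched in Python with a naive DFT and no window function (both our choices, not necessarily what the patch does): a sine that lands exactly on a bin center lights up one bin, while a sine halfway between two bin centers spreads across several. The helper names and the 10%-of-peak threshold are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Magnitudes of the non-negative-frequency bins of a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def bins_lit(freq_in_bins, n=64, threshold=0.1):
    """Count bins whose magnitude exceeds `threshold` times the peak
    for a sine at `freq_in_bins` cycles per n-sample frame."""
    frame = [math.sin(2 * math.pi * freq_in_bins * t / n) for t in range(n)]
    mags = magnitude_spectrum(frame)
    peak = max(mags)
    return sum(1 for m in mags if m > threshold * peak)


on_bin = bins_lit(8.0)     # sine exactly on bin 8
between = bins_lit(8.5)    # sine halfway between bins 8 and 9
```

A swept oscillator spends most of its time between bin centers, so most of its frames look like the second case.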
Luke then showed how the "makemap" squished/expanded the data by running
his wonderful movie of the feet through his table-lookup thing.
Again, this was just to help with the display of the data. Our eyes
aren't logarithmic like our ears are. Although it would probably be
pretty cool if they were, huh?
Now we needed to synchronize our audio playback with the display
(in particular, a line that cruises along the display showing where
we are in the FFT data we are resynthesizing). Luke drives the frame
rate by a simple phasor stepping through the frames, so we need
to tie that to something that will draw a line across our data.
The Max LCD object allows us to use QuickDraw directives to draw
lines. Sounds like it's just what we need. yeah. go cat go.
And in fact, the directive:
clear, linesegment $1 0 $1 300
when given to the LCD object and driven by the numbers coming from
the stepping phasor, does exactly what we want. So how do we get this
into our jit.matrix display? There's a cute object called jit.lcd
that does exactly this! Whee!!!!!!!
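Here is the phasor-to-line arithmetic spelled out as a Python sketch. The helper names are hypothetical; the assumptions are that the phasor runs from 0 to 1 once per pass through the FFT data and that the display is 320 pixels wide. The message format is the one above, with $1 replaced by the pixel column.

```python
def phasor_to_frame(phase, num_frames):
    """Map a 0..1 phasor value to the index of the FFT frame being resynthesized."""
    return min(int(phase * num_frames), num_frames - 1)


def frame_to_x(frame, num_frames, display_width=320):
    """Scale a frame index onto the pixel columns of the display."""
    return frame * display_width // num_frames


def lcd_message(x, height=300):
    """Build the 'clear, linesegment $1 0 $1 300' message with $1 filled in."""
    return f"clear, linesegment {x} 0 {x} {height}"


x = frame_to_x(phasor_to_frame(0.5, 1000), 1000)
message = lcd_message(x)
```

Leaving out frame_to_x and sending the frame index straight to the line would tie the line's speed to the number of frames instead of the display width.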
jit.lcd 4 char 320 240
sets it up. Send the "clear, ..." directive to it from above,
and we're happy as can be. All we have to do is to overlay
the jit.lcd output onto our display of the FFT data.
The handy-dandy jit.op object can also do matrix x matrix calculations,
not just a single operator on the entire matrix.
jit.op @op +
when given two matrices at the two top inlets, will add each element from
one matrix to the corresponding element of the other.
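A Python sketch of the overlay: a blank single-plane char matrix with one bright column stands in for the jit.lcd output (the real jit.lcd output has 4 planes), and the two are added cell by cell. We assume jit.op's + clamps char results at 255 rather than wrapping around; the helper names are hypothetical.

```python
def draw_vline(width, height, x, value=255):
    """Stand-in for jit.lcd after 'clear, linesegment x 0 x height':
    an all-zero char matrix with one bright column at x."""
    return [[value if col == x else 0 for col in range(width)] for _ in range(height)]


def add_clamped(a, b):
    """Cell-by-cell sum of two same-sized char matrices, clamped to 255."""
    return [[min(255, p + q) for p, q in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


sonogram = [[100, 200, 100, 40] for _ in range(3)]
overlay = draw_vline(width=4, height=3, x=1)
combined = add_clamped(sonogram, overlay)
```

Because the cleared background is all zeros, adding it leaves the sonogram untouched everywhere except the line column, which saturates to full brightness.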
Oops, Wayne Siegel arrived, just as we were debugging a problem that
caused the little red line to scroll far too quickly across the
FFT display. So now I'm sitting in my office, and
not sure what strange and wonderful tales Luke is telling.