Jazz Information Retrieval: Exploring Jazz with M.I.R.

Two articles on jazz rhythm (26 Aug 2013)

I came across two articles by Matthew Butterfield, a professor of music theory at Franklin & Marshall College, that should be interesting for us in surveying analyses of jazz, and in resorting to observations or measurements using waveforms as evidence.

One of the articles is titled “Why Do Jazz Musicians Swing Their Eighth Notes?,” and, while the question can hardly be answered definitively, it is a good one to ask in analyzing jazz rhythm. Here is the citation:

Butterfield, Matthew W. “Why Do Jazz Musicians Swing Their Eighth Notes?” Music Theory Spectrum 33, no. 1 (Spring 2011): 3–26, 107.

Butterfield makes some convincing points on a subject, jazz rhythm, that often seems to produce vague generalities and mysticism in the literature on jazz. Carefully examining duration ratios within soloists’ and drummers’ patterns of eighth notes, or of eighth notes against quarter notes (what he calls the “Beat/Upbeat Ratio,” or BUR), he goes on to show how soloists use minute variations in these ratios to embellish their phrases. That in turn suggests that they do so in distinctive ways, and that a particular artist’s expressive effects in this realm might be profiled, a possibility we should discuss further.

But my chief concern in this post is where and how Butterfield derives his evidence. He investigates audio waveforms for the nuances of timing he seeks to observe. Here is a footnote from the above article, p. 166, on his method for observing changes in the Beat/Upbeat ratio:

“BUR values in all musical examples included in this study were calculated by the author. The digital sound-editing program Audacity was used on a Macintosh computer to identify the onset of each note from a visual and aural analysis of its waveform. From these figures, IOIs between successive notes were defined and then employed to calculate the BUR values. There is inevitably some degree of uncertainty in identifying the attack point of each note, as noise and other onset ambiguities can render an exact determination impossible. By employing a consistent set of criteria to resolve ambiguities, I am confident that my figures are accurate to within ±.5 milliseconds, which translates into a BUR value accuracy of ±.05 at the tempos shown in Examples 4 and 5.”

Butterfield’s claims depend on his ability to identify onsets of given parts of the beat in a soloist’s performance. How can he be sure he is that accurate? Is it because he is looking at a single point in musical time, aided by listening to the recording itself at that point, without aggregating statistically (which we must do to characterize whole files)? What are his “criteria for resolving ambiguities”? Is his manner of determining an onset useful to us in some fashion?
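Whatever his exact criteria, the arithmetic downstream of the onset decisions is simple. Here is a minimal sketch of the BUR computation from a list of onset times, assuming (my reading of the footnote, not Butterfield’s procedure, which was manual work in Audacity) that BUR is the on-beat eighth’s inter-onset interval divided by the following off-beat eighth’s:

```python
import numpy as np

def bur_values(onsets):
    """Beat/Upbeat Ratios from eighth-note onset times (seconds).

    Assumes `onsets` alternates on-beat and off-beat eighths, starting
    on the beat: [beat1, upbeat1, beat2, upbeat2, ...]. Each BUR is the
    on-beat IOI divided by the following off-beat IOI.
    """
    iois = np.diff(np.asarray(onsets, dtype=float))
    downbeat_iois = iois[0::2]   # beat -> upbeat durations
    upbeat_iois = iois[1::2]     # upbeat -> next-beat durations
    n = min(len(downbeat_iois), len(upbeat_iois))
    return downbeat_iois[:n] / upbeat_iois[:n]

# "Perfect" triplet swing at 200 BPM (beat = 0.3 s) gives BUR = 2.0:
beat = 0.3
onsets = [i * beat + d for i in range(8) for d in (0.0, 2 * beat / 3)]
print(bur_values(onsets))   # -> [2. 2. 2. ...]
```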

I have similar questions about another, equally interesting, article by the same author. This one is open access:

Butterfield, Matthew W. “Participatory Discrepancies and the Perception of Beats in Jazz.” Music Perception 27, no. 3 (February 2010): 157–176. http://www.jstor.org/stable/10.1525/mp.2010.27.3.157

This piece tests the contribution of “participatory discrepancies,” subtle rhythmic displacements ahead of or behind the beat, to the effect of swing in jazz. The idea has weight in ethnomusicology, based on the widely known work of Charles Keil, who sees the inherent asynchrony and rhythmic discrepancy in music as an attraction to listeners, an invitation to become part of the music and to shape the interaction. (See Keil, Charles. “Participatory Discrepancies and the Power of Music.” Cultural Anthropology 2, no. 3 (1987): 275–283.) Discrepancies between instruments have also been a central concern of ours, even if our agenda has been to ask how they might affect beat-tracking software and the identification of individual artists (especially drummers), rather than delving into how they fuel the fundamental human experience of playing or listening to music.

Butterfield’s main concern is whether listeners, even ones with limited musical training, can perceive these discrepancies, and he describes some elaborate tests with real subjects. His answer seems to be “not much.” His conclusion may be valid within the terms of his own research design, but it attacks a straw man: a misguided attempt to test directly and empirically what for Keil was more a philosophical argument and musical manifesto. Average listeners may not be able to tell accurately whether a bass is behind a drummer, but the practice of playing ahead of or behind the beat is itself beyond question, in more kinds of music than jazz (and listeners may sense it even if they cannot articulate or analyze what they hear). Arguing this point more carefully would take us too far afield. (Butterfield himself makes some very valid points toward the end of the article, once he leaves testing with human subjects behind and proceeds from his own observations on the performance discrepancies themselves; those deserve another post here.)

What is most important here is that, on page 163, Butterfield once again refers to his own observations of waveforms in a rendering of a jazz recording in Audacity. In this case, he is interested in identifying lags between the accompanying bass and drums, and in determining who might be ahead, so that he can test whether listeners perceive the discrepancies.

Here Butterfield claims not only to be able to tell where an onset is, but to distinguish different instruments within a waveform:

“Cymbal strikes in particular tend to be well defined and easy to spot—they appear ‘furry.’ Bass onsets, by contrast, are characterized by a substantial burst in wave amplitude.”

See the diagram on p. 163 to visualize this “furriness” and “burst.” Butterfield then seems to veer toward, or call out for, a beat-tracking methodology in the next passage:

“[Bass onsets] are not as clear as cymbal strikes, however, and this required formulation of a consistent procedure to define them. To this end, determination of each bass onset would begin with an onset hypothesis, placing it tentatively at the peak of the first wave whose amplitude departed significantly from the prevailing shape preceding it. A careful aural analysis of the beat ensued, working backwards and forwards from that point and adjusting it in accordance with aural evidence for an earlier or later onset until it could be determined with confidence to within ±5 ms. Any beat where the bass onset could not be determined with confidence to within this interval was omitted from analysis.”

He seems to acknowledge the need to relate moment-to-moment rhythmic events to a virtual or average beat, but then proceeds to do so intuitively, or ad hoc, it seems to me.
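One way to mechanize something like his onset-hypothesis step, purely as my own sketch and not his procedure, is to flag the first frame whose amplitude departs “significantly” from the prevailing level before it, then refine by ear from there:

```python
import numpy as np

def onset_hypothesis(x, sr, frame=256, history=20, k=3.0):
    """Return a tentative onset time (seconds), or None.

    Flags the first frame whose RMS exceeds the mean of the preceding
    `history` frames by `k` standard deviations. The frame size and
    threshold are guesses of mine, not Butterfield's criteria, which
    he applied by eye and ear in Audacity.
    """
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame
    rms = np.sqrt(np.mean(
        x[:n_frames * frame].reshape(n_frames, frame) ** 2, axis=1))
    for i in range(history, n_frames):
        prev = rms[i - history:i]
        if rms[i] > prev.mean() + k * prev.std() + 1e-12:
            return i * frame / sr   # then adjust backwards/forwards by ear
    return None
```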

So there are some common points of approach between Butterfield’s work and ours. If these methods of gathering evidence are open to question, what would that say, in the end, about his otherwise compelling arguments about rhythmic dynamics in jazz? And are there quantitative terms or observations about jazz practice that could be useful to us in profiling certain artists, or in analyzing whole performances or major structural parts?

 

human-generated, heavily annotated transcription (27 Jun 2013)

Just as a point of interest, here’s an example of a human-generated, heavily annotated transcription of a jazz performance:

Sancticity, Scofield solo analysis
transcribed by Bert Ligon
http://in.music.sc.edu/ea/jazz/Transcriptions/Sancticity.all.pdf

It’s clear that in the near future we’ll be able to generate a machine transcription that more or less matches this one in terms of notation. But it’s also clear that “which note, when” is just the very tip of what it might mean to analyze a performance.

Computational Ethnomusicology (5 May 2013)

Much of our discussion lately has been about biases of various sorts in MIR tools and how to avoid or fix them. Here’s a paper that touches on many of the topics we’ve been thinking about:

Computational Ethnomusicology
George Tzanetakis, Ajay Kapur, W. Andrew Schloss, Matthew Wright

http://www.karmetik.com/sites/default/files/publications/2007_compEthno_0.pdf

Plus many papers on similar themes at:
http://karmetik.com/publications

New jazz-mir mailing list! (5 May 2013)

We are now hosting a general discussion list for people and machines interested in exploring the application of MIR (Music Information Retrieval) techniques to jazz:

https://lists.columbia.edu/mailman/listinfo/jazz-mir

 

Extracting tempo from actual drum performance patterns (6 Jan 2013)

DAn commented on my last post agreeing that it was circular to ask whether a drummer is “ahead” or “behind” the beat, when it is precisely the drummer we may be relying on to determine the beat. That led me to want to snip the loop of this circularity in this post. Let’s leave aside the idea of a drummer’s relationship to some abstract temporal frame of reference and focus on what the drums alone are doing.

Can we simply separate the drummer’s actual statement of the beat and examine that on its own? Or perhaps use this beat as the reference point for other musically meaningful events?

For a reliable, highly conventional statement of an actual tempo within many jazz performances, we would take the quarter-note part of the cymbal “ride.” The familiar figure is:

 dang dang-da dang dang-da-dang dang-da-dang

which would be notated as a quarter note on each beat, with an added swung eighth (the “da”) after beats two and four.

The quarter notes in 4/4 time are the “dang” part. It is true that the second and fourth dang in each bar do not sustain for a full quarter note, since they are paired with a “da.” But they are struck squarely on the beat, creating a steady quarter-note statement of the tempo.

To do this, we could look only at the exact onset of the cymbal sound (given that we had the ability to isolate this feature, a big, big “if,” I realize).

If these real, not abstract, quarter notes could be extracted, perhaps we could:

  • compute an average tempo from them and ask how, and by how much, the drummer deviates from his own overall practice at any given point (see the sketch after this list)
  • use the actual drum tempo as a reference point for what other instruments are doing (given that they too could be precisely measured or disaggregated).
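As a sketch of the first bullet, assuming we already had the isolated cymbal onsets we just called a big “if”:

```python
import numpy as np

def ride_tempo_profile(cymbal_onsets):
    """From onset times (seconds) of the ride cymbal's quarter-note
    strokes, return the average tempo (BPM) and each beat's deviation
    from that average, in milliseconds."""
    iois = np.diff(np.asarray(cymbal_onsets, dtype=float))  # actual beat lengths
    mean_ioi = iois.mean()
    bpm = 60.0 / mean_ioi
    deviations_ms = (iois - mean_ioi) * 1000.0
    return bpm, deviations_ms
```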

This could help profile or identify percussionists, or how artists play together in specific ensembles. In addition, perhaps these drum beats could serve as criteria for creating musically meaningful segments that could then be analyzed.

How to Listen to “Talking Drums”: Terms for Analyzing Jazz Percussion Practices (19 Dec 2012)

[Photos: Elvin Jones; Tony Williams]

After working on beat tracking for whole performances, our conversation turned to how to think about the chief rhythmic instrument in a conventional jazz group. The drummer states the tempo for the group. He also varies it in subtle but important ways, and plays contrasting figures against it. This holds for early jazz and fusion jazz and everything in between. It is part of the rhythmic expression and dynamism that is central to jazz (and many other types of music). This level of rhythmic variation is also a challenge for MIR: it may confuse the effort to establish an overall metric for performance tempos, and also generate complex patterns that could be difficult to identify reliably.

We proposed that if we could compare different drummers with different approaches to timing and tempo, we could begin to sharpen our understanding of these musical features as they relate to MIR. We would select drummers who tend to play “ahead” of or “behind” the beat. I mentioned Tony Williams and Elvin Jones, respectively, as having these contrasting approaches to timekeeping and rhythm. Orientation toward the beat is an important point of discussion among musicians negotiating how to play together. It is also a factor in our current discussion of how an individual local event or section of a performance deviates from the average tempo in our beat-tracking work. Investigating these orientations should help refine the beat-tracking tools we have. It can help profile artists or ensembles by rhythmic style. And it might even help in the more strictly musicological aim of uncovering the techniques that serve expression, and of providing demonstrable evidence for, or against, interpretative and evaluative musical terms in jazz.

We agreed, however, that we have to define more closely what we mean by a musical event or performance being ahead or behind the beat, and whether there are other phenomena involved outside of strict matters of tempo that need to be distinguished and defined separately as part of the analysis. What follows makes a tentative step toward such definitions and remarks on the difficulties in reaching them.

I assume that, as part of Brian’s work on feature extraction, we will be able to distinguish and measure what the drummer, or his cymbal in particular, is playing with respect to the tempo. Drummers play very regular patterns as part of their role of keeping time. Even if they vary their figures, they state a pulse as part of their standard “ride” pattern. I assume also that if deviations from the average beat can be measured, the deviations of a certain instrument, like the cymbal, can be as well.

Based on that groundwork, what does it mean to play something ahead of the beat, whether in a single instance or habitually, as a marker of “style”? The phrase implies positive but subjective qualities: having energy, pushing the group forward (precisely in the face of the potential chaos or entropy of group improvisation). But in conventional usage it is poorly defined. “Ahead,” yes, but of what?

I already implied that it could mean playing ahead of an average beat for the performance. But there may be issues of measurement error when the tempo itself is changing, or when it is precisely the drums whose statement of the tempo we are relying on to determine the average tempo. Ahead of the other performers? It is plausible that a percussive event could take place before some other musical event by a different instrument. But to say that at a given time one instrument plays “ahead” of another implies reference to some established tempo against which both can be measured, leading us back to the question of how the tempo is established in the first place. Our beat-trackers handle short-term variations from the average tempo fairly effectively. But do we have a way to “view” or reference that grid for the short samples we will be starting with? Perhaps we can find a valid way to establish tempo that is tailored to such a short experimental segment.
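Still, given some reference grid, however we end up establishing it, “ahead” and “behind” at least become measurable as signed offsets. A minimal sketch, assuming we already trust both the beat times and the instrument onsets:

```python
import numpy as np

def signed_offsets(beat_times, onsets):
    """For each onset, the signed distance (ms) to the nearest reference
    beat: negative = ahead of the beat, positive = behind. Both inputs
    are in seconds. Note this trusts the reference grid, which is
    exactly the circularity discussed above."""
    beat_times = np.asarray(beat_times, dtype=float)
    onsets = np.asarray(onsets, dtype=float)
    nearest = np.abs(onsets[:, None] - beat_times[None, :]).argmin(axis=1)
    return (onsets - beat_times[nearest]) * 1000.0
```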

Given that we can establish such references to an underlying beat, a reservation I (and DAn) still have about speaking of a given event or percussive stroke as being “ahead” is that it may not actually be ahead of anything. There are presumably volumes written about what an “accent” is and how it “should” be played. But it seems to me possible that accenting a note may create the impression of being ahead of some other instrument or tempo when it is not actually so, just as shading in black and white creates the illusion of color. To strike one note in a series a little harder may seem to give it more “energy” (as it does to the overall performance). It may actually have more energy, in the sense that the change is measurable in amplitude and perhaps in the frequency of overtones. Yet, on the other hand, striking something harder may actually require, or occur with, a relatively earlier attack, confusing the issue further. In my work on Lucky Thompson’s style of accents, I assumed that accenting a note gave at least the impression of its being slightly ahead of the other notes in the series, and of the overall rhythmic pulse, and then simply noted where I thought the accents were in a transcribed solo. (Shull, “When Backward Comes Out Ahead,” Annual Review of Jazz Studies, 2003.)

These issues in determining what is ahead of the beat have their mirror image in what figures or events might be “behind” the beat. Is it bad to be “behind” something? Not in jazz, or in African-American music in general. Being “laid back” implies being soulful as opposed to aggressive; it means allowing space for the soloist to develop his or her ideas; and it suggests a sense of “holding back,” a suspense that is an ingredient of storytelling.

But once again we have to ask: behind what? The question of the frame of reference holds here just as it does for figures or performances that are ahead. Can a drummer be consistently behind the beat and keep from losing the tempo or momentum altogether? Or is it possible that, like the subjective feeling of being ahead, being behind is another kind of artistic illusionism?

I say this because when I listen to drummers who at face value seem to have this quality, I do not necessarily hear that they are striking the instrument, say a cymbal, any later than would be expected from the overall tempo, the other instruments, or what other drummers do. They may strike the cymbal more or less on the beat, but they allow it to sustain longer. This may be an effect of the choice of cymbal or stick, and of many other factors. The point is that a statement of pulse that feels “relaxed” or “behind” may really be “broader” rather than actually “later.” In fact, I believe this sostenuto (sustained) effect creates the impression of ease in the work of many singers, jazz or not, who thus achieve a supple feeling without seeming to lose the beat or float entirely outside it. I certainly hear it in Elvin Jones, as opposed to Tony Williams, both of whom I have been listening to in trying to articulate these concepts.

If the timbre or sonic qualities of a timekeeping beat by these two influential drummers can create these necessary illusions, so can their more sophisticated rhythmic combinations or figures. Identifying and comparing such figures may be far in the future for us, so I will only briefly suggest how more complex patterns come into play. Williams often states quarter-note quintuplets in his accompanying “comments” on the soloist’s ideas, while keeping or implying the beat. That is not merely implying a faster rate; it is actually moving faster. In contrast, Jones tends to use quarter-note triplets, or more properly displaced groupings of eighth-note triplets, which seem to slow down the rate and energy of the steady stream of eighth notes generated by, and expected of, the soloist (at least in bebop and progressive jazz). The two drummers’ overall approaches to stating the tempo, in other words, are reinforced by their rhythmic figures and the ways they subdivide the time signature.

In addition to questions of timing and relation to an underlying tempo, we have a great deal of work ahead of us on the tonal qualities of percussion instruments in jazz. As with any instrument, we confront not just timbre, but modulation of timbre, even within a given single abstract “note.” Perhaps something as granular as a single “attack” might have to be segmented or studied at a micro level before we can proceed on these questions of timing and duration. This will be tough to define and measure, but I mention it because it might help profile different artists’ characteristic timbres or “sounds.” With regard to percussion instruments, prospects do seem good for being able to separate them from other instruments, based on Brian’s latest work.

Rather than trying to tackle the complex phenomena that might comprise a single attack, it might be simpler to identify the precise onset of a given percussion event in a certain recording, perhaps for a single part of a drum kit (for which Brian has developed codebooks, I believe) or for closely related parts of a drum set. We would also want to be able to ask what the duration of the event is after its onset. To begin, this might be done on a small segment of a single performance.
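As a placeholder until those codebooks are usable, here is a rough off-the-shelf sketch (librosa, not Brian’s models) that detects onsets in the percussive component and estimates each event’s duration as the time its envelope takes to fall 20 dB below the onset peak; the file name and threshold are hypothetical:

```python
import numpy as np
import librosa

y, sr = librosa.load("drum_excerpt.wav")            # hypothetical file
y_perc = librosa.effects.hpss(y)[1]                 # percussive component
onset_times = librosa.onset.onset_detect(y=y_perc, sr=sr, units="time")

rms = librosa.feature.rms(y=y_perc)[0]              # amplitude envelope
times = librosa.times_like(rms, sr=sr)
for t in onset_times:
    i = np.searchsorted(times, t)
    if i + 5 >= len(rms):
        continue
    peak = rms[i:i + 5].max()                       # peak just after onset
    fallen = np.nonzero(rms[i:] < 0.1 * peak)[0]    # first -20 dB frame
    if len(fallen):
        print(f"onset {t:.3f} s, approx. duration {times[i + fallen[0]] - t:.3f} s")
```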

As a next step, we could ask how these timings, or patterns of them, relate to an average beat, and compare them to other events within or across performances. Then I believe we could confidently state that event X (onset or final decay) happened at time A within a tempo continuum. Of course, we would need to define and implement the average beat or tempo in a satisfactory way in order to reliably track what individual instruments are doing in relation to it.

If these concepts were adequately defined, we might then examine one drummer or compare two of them. We could also apply what we learned to non-percussion instruments: improvising soloists, too, have characteristic orientations toward “the beat.” If these tools were available, even far in the future, we could not only profile a given drummer but get a more direct look at how drummers interact rhythmically with other musicians, or at how polyrhythmic practices create their expressive effects, with reference to hard evidence at a micro level.

Learning sparse instrument models (11 Dec 2012)

One of the first steps toward high-level analysis of audio recordings is decomposing the signal into a representation that can be easily digested by a computer. A more or less standard approach is to carve the signal up into a sequence of small frames (say, 50 ms long) and then extract some features from each frame, such as chroma/pitch distributions or timbre/Mel-frequency cepstral coefficients.

One of the things that I’ve been working on is learning audio features which are informed by commonly used instrumentation in jazz recordings. The idea here is that if we can decompose a song into its constituent instruments — even approximately — it may be easier to detect high-level patterns, such as repetitions, instrument solos, etc. Lofty goals, indeed!

As a first step in this direction, I gathered the RWC Instrument Database, and extracted recordings of all the instruments we’re likely to encounter in any given jazz recording. These instrument recordings are extremely clean: one note at a time, in a controlled environment with almost no ambient noise. So it’s not exactly representative of what you’d find in the wild, but it’s a good starting point under nearly ideal conditions.

Each recording was chopped up into short frames (~46 ms), and each frame was converted into a log-amplitude Mel spectrogram in $\mathbb{R}^{128}$.

Given this collection of instrument-labeled audio frames, my general strategy will be to learn a latent factorization of the feature space so that each frame can be explained by relatively few factors.

If we assume that the factors (the codebook) $D \in \mathbb{R}^{128 \times k}$ are already known, then an audio frame $x_t$ can be encoded via non-negative sparse coding:

\[ f(x_t \mid D, \lambda) := \operatorname*{argmin}_{\alpha \in \mathbb{R}_+^{k}} \frac{1}{2}\|x_t - D\alpha\|^2 + \lambda \|\alpha\|_1, \]

where $\lambda > 0$ is a parameter to control the amount of desired sparsity in the encoding $f(\cdot)$.

Of course, we don’t know $D$ yet, so we’ll have to learn it.  We can do this on a per-instrument level by grouping all $n$ audio frames $x_t^I$ associated with the $I$th instrument, and alternately solving the following problem for both $D^I$ and the $\alpha_t^I$:

\[ \min_{D^I, \alpha_{1,2,\dots,n}^I} \sum_{t=1}^n \|x_t^I - D^I \alpha_t^I\|^2 + \lambda\|\alpha_t^I\|_1. \]

After doing this independently for each instrument, we can collect each of the codebooks $D^I$ into one giant codebook $D$.  In my experiments, I’ve been allowing 64 basis elements for most instruments, and 128 for those with a high octave range (piano, vibraphone, etc.).  The resulting $D$ has around 2400 elements.
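For what it’s worth, scikit-learn’s dictionary learner can fit something close to the per-instrument problem above. This is a sketch with stand-in data and invented hyperparameters, not the code actually used here:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# X_I: (n_frames, 128) log-amplitude Mel frames for one instrument.
X_I = np.random.randn(500, 128)        # stand-in for real frames

learner = DictionaryLearning(
    n_components=64,                   # 128 for wide-range instruments
    alpha=1.0,                         # the sparsity penalty lambda
    positive_code=True,                # non-negative activations alpha_t
    transform_algorithm="lasso_lars",
    max_iter=50,
)
alphas = learner.fit_transform(X_I)    # sparse codes, shape (500, 64)
D_I = learner.components_              # codebook; note sklearn stores it
                                       # transposed relative to D above
```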

It can be difficult to discern much from visual inspection of thousands of codebook elements, but some interesting things happen if we plot the correlation between the learned features across instruments:

Not too surprisingly, there’s a large amount of block structure in this figure.  Let’s zoom in on a few interesting regions.  First up, the upper-left block:

From this, we can see that piano, electric piano, vibraphone, and flute might be difficult to tease apart, but both acoustic and electric guitar separate nicely.  Note that the input features here have no notion of dynamics, such as attack and sustain, which may help explain the collision of flute with piano and vibes. [Future work!]

The picture is much clearer in the middle block, where instruments seem to separate out by their range and harmonics.  Note that violin still collides with piano and vibes (not pictured).

Finally, the lower-right block includes a variety of instruments, percussion, and human voice.  With the exception of kick/toms, it’s largely an undifferentiated mess:

It seems a bit curious that cymbals show such strong correlations with almost all other instruments.  One possible explanation is that most instrument codebooks will need to include at least one component that models broad-band noise; but cymbals are almost entirely broad-band noise.  So, although the basis elements themselves appear ambiguous, it may be that the encodings derived from them are still interpretable: at least, interpretable by a clever learning algorithm.  More on this as it develops…

loudness vs duration (27 Nov 2012)

I’ve been playing with plotting various Echo Nest analysis quantities against one another. I thought that pitch vs loudness or pitch vs segment duration might turn up something interesting, but visually at least, there’s not much of interest. Then I tried loudness vs duration, and wow! Some pretty distinct distributions. Parker playing Ornithology is fairly consistent, whereas our ten versions of Autumn Leaves by ten different ensembles are more varied. Almost all of the tracks seem to have mostly short, loud segments. That might just be the nature of segments: they’re distinct events, so very quiet moments probably don’t end up as independent segments…
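For anyone wanting to reproduce this kind of plot, a minimal sketch follows. The file name is hypothetical, and I’m assuming Echo Nest-style segment records with “duration” and “loudness_max” fields:

```python
import json
import matplotlib.pyplot as plt

with open("ornithology_analysis.json") as f:   # hypothetical dump
    segments = json.load(f)["segments"]

durations = [s["duration"] for s in segments]
loudness = [s["loudness_max"] for s in segments]

plt.scatter(durations, loudness, s=5, alpha=0.4)
plt.xlabel("segment duration (s)")
plt.ylabel("segment max loudness (dB)")
plt.title("Ornithology: loudness vs duration")
plt.show()
```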

Ornithology:

Autumn Leaves:

Infinite tracks (27 Nov 2012)

I used Paul Lamere’s Infinite Jukebox app to generate some fun examples for the j-disc MIR launch event a couple weeks ago:

* Sonny Stitt: Autumn Leaves

* Bill Monroe: Roanoke

* Kenny G: Careless Whisper

 

segments vs beats (27 Nov 2012)

We’re starting to think that maybe beat tracking, as it’s usually implemented, isn’t really that useful for a lot of jazz. Not only do many jazz tracks seem to confuse beat trackers, but it’s not clear that “beats” are really that useful when asking the kinds of questions we’re interested in.

Here is a second round of graphs looking at tempo over time, but this time I’ve plotted both beat lengths and segment lengths for many versions of Autumn Leaves using the Echo Nest analysis engine. Beat detectors try to estimate a track’s tempo and then find a beat grid that maps nicely onto the events in the track. That works well for most pop music, since there is a beat grid to be found; that’s often not quite the case in jazz. Segments are simply short snippets of sound meant to represent individual audio events, regardless of tempo or beat. Generally a beat will be composed of several segments, and segments can, and often do, cross beat divisions.
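Roughly how the two curves can be drawn from a single Echo Nest-style analysis dump; the file name is hypothetical, and I’m assuming “beats” and “segments” lists whose records carry “start” and “duration” fields:

```python
import json
import matplotlib.pyplot as plt

with open("autumn_leaves_analysis.json") as f:   # hypothetical dump
    analysis = json.load(f)

for key, label in [("beats", "beat length"), ("segments", "segment length")]:
    starts = [e["start"] for e in analysis[key]]
    lengths = [e["duration"] for e in analysis[key]]
    plt.plot(starts, lengths, label=label, linewidth=0.8)

plt.ylim(0.0, 1.0)     # fixed Y axis, as in the graphs (see N.B. below)
plt.xlabel("time (s)")
plt.ylabel("event length (s)")
plt.legend()
plt.show()
```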

The graphs aren’t particularly revelatory, but some of the differences between the beat curves and the segment curves are interesting. Next we need to listen through these while watching the segment curves to see if anything intriguing pops out…

N.B.: The Y axis is now fixed at 0.0-1.0 seconds to make it easier to compare across tracks. This also tames some of the wild jumps in beat length that appeared in the previous graphs.

 
