The "Blues-o-Matic" Real-time Interactive Performance Model Brad Garton and Damon Horowitz Columbia University Music Department 703 Dodge Hall New York, NY 10027 USA brad@woof.music.columbia.edu; damon@woof.music.columbia.edu This paper describes the features of the "Blues-o-Matic" performance system, which consists of a PowerGlove controlling the evolution of a simulated blues guitar performance. The performance model is implemented on a NeXT machine, using the MusicKit to generate sound on the internal Motorola DSP chip. Gestural data coming from the PowerGlove is interpreted by the performance model, and used to influence the unfolding performance in a relatively "high-level" manner. A discussion of the choice of hardware used in this system along with an overview of the software performance model is included. Overview Newer and more sophisticated synthesis algorithms require relatively intelligent control of synthesis parameters in order to be used effectively. For the real-time use of these synthesis techniques, this difficulty can be addressed by building more complex control devices, thus enabling the human performer direct access to the generation of sound. An alternative approach is to imbed more knowledge about how to use the synthesis algorithms in a program designed to interpret relatively high-level gestures and information in the context of an on-going performance. The "Blues-o-Matic" system described in this paper takes the latter approach. The program controls a real-time performance model based on idiomatic blues-guitar playing techniques by interpreting data from a gestural interface (the interface device we currently use is a Nintendo PowerGlove). The interface design was also motivated by an interest in exploring methods for codifying some portion of the intuitive relationship between the gestural movement of a hand and musical expression. This allows a very high-level control of the music, as opposed to the literal control (such as the production of individual notes) provided by most traditional instruments. The PowerGlove user functions in a similar fashion to a conductor in shaping a performance, except with the further ability of being able to determine the actual musical material present by steering the direction of the evolving improvisation (somewhat in the fashion of the leader of a free improvisation jam session). Sound Synthesis and The Performance Model The synthesis algorithm we used in our system is a real-time implementation of Charles Sullivan's "strum" algorithm [Sullivan, 1990] developed by Rick VanderKam at Stanford University. VanderKam's version was written using the NeXT MusicKit; the synthesis is accomplished on the Motorola 56001 DSP chip included in the older NeXT hardware. The synthesis program gives fairly high-level control over a distorted electric guitar sound, including the ability to specify pitches for two independent guitar strings and modification of several timbral parameters (distortion coefficients and feedback gain). To achieve our goal of a "guided improvisation", we elected not to have the PowerGlove control the synthesis algorithm directly. This proved to be a rather serendipitous decision, as the latency of the PowerGlove in parsing gestures (described later in this paper) would have made direct control of all relevant parameters in a musically satisfying way very difficult. Instead we chose to use a separate program, or virtual performer, to act as an interpretive agent. 
This virtual performer interprets data coming from the PowerGlove as high-level musical directives. In certain cases, our system does allow for the direct specification of particular synthesis parameters, but these act in a global manner and are not confined to a single musical event. This mode of operation is generally used to set or modify the baseline default values used by the synthesis algorithm.

The virtual blues performer program was built using ideas we had explored in earlier non-real-time performance models (see [Garton, 1992] for a more complete description of this approach to modelling musical performance). The model uses several interconnected layers. The physical layer constrains the synthesis according to "real world" principles -- it takes a finite amount of time for a human performer to move from one note to another on a guitar fretboard, certain combinations of notes are impossible on a real guitar, etc. Idiomatic playing techniques are coded in the inflection layer of the model -- pitch-bending methods, the "blues" scale itself, particular picking techniques, etc. The gestural layer contains information about specific musical patterns which occur within the performance idiom being modelled. These gestures are the building blocks used by the program to construct longer musical passages.

Our original intention was to have the final layer of the performance model (the shape layer -- controlling the context for the choice of gestures) dictated entirely by data from the PowerGlove. In actual practice, however, this approach proved to be far too difficult. Attempting to control the harmonic context, the pacing of the gestures, the "energy level" of the music, and other unfolding musical factors simultaneously overloaded the capabilities of the PowerGlove (and the PowerGlove user as well!). We decided to give the virtual performer many of the decision-making capabilities included in our earlier performance models, and opted to let the PowerGlove influence the direction of the musical decisions rather than make the decisions directly. We did leave open the option to specify a particular gesture or musical choice exactly, however, thinking that it would be wise to have this capability in the hands of the human user.

Use and Design of the PowerGlove Interface

A description of the PowerGlove apparatus is appropriate here, for our approach to the project was significantly affected by its structure and limitations. The glove on the user's hand emits a signal to a receiver which is mounted on the computer monitor. The data output from this hardware is captured by a driver using the NeXT's DSP. The information provided by the glove is as follows: the glove's position in three-dimensional space with respect to the receiver, its degree of clockwise rotation about the z-axis (the axis between the performer and the monitor), and the amount of flex (three degrees of freedom) of each of the fingers. This drastically restricts the type of physical movement which is meaningfully detectable with the apparatus; there is no facility for determining rotation of the glove about the other two axes, rotation which is present in most "natural" hand gestures. Furthermore, there is a large error factor in the transmitted data: the position reading on any given axis will fluctuate by up to one-tenth of the total range (i.e. for a range of 250 "units", an error of up to 25 in any reading is expected), and the degree of flex in the fingers can only reliably distinguish a fully flexed position from a fully open hand position. In addition, there is a latency of approximately .5 seconds from the movement of the glove to the receipt of the data by our program.
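Given these characteristics, any program reading the glove must treat the raw data conservatively. The sketch below shows one plausible conditioning step, consistent with the figures quoted above but not drawn from our actual driver code: position readings are ignored until they move outside a dead band of one-tenth of the axis range, and per-finger flex is collapsed into a single open/closed decision. The class and method names are hypothetical.

    AXIS_RANGE = 250                 # approximate range of a position axis, in glove "units"
    DEAD_BAND = AXIS_RANGE // 10     # readings fluctuate by up to ~1/10 of the range

    class GloveConditioner:
        """Smooths raw PowerGlove readings before they reach the virtual performer."""

        def __init__(self):
            self.position = {"x": 0, "y": 0, "z": 0}

        def update_position(self, axis, raw):
            # ignore jitter smaller than the glove's expected error
            if abs(raw - self.position[axis]) > DEAD_BAND:
                self.position[axis] = raw
            return self.position[axis]

        @staticmethod
        def hand_closed(flex_values, threshold=0.8):
            # per-finger flex is only trustworthy at its extremes, so reduce
            # the whole hand to a single open/closed decision
            return all(f > threshold for f in flex_values)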
Initially, we wanted to use the glove as a real-time continuous controller producing an immediate musical response to a movement. Such an interface provides the intuitively pleasing feel of "sculpting" the music, because the sound responds as though it were literally being touched by the glove. The latency of the glove eliminated the possibility of this approach. Instead, we implemented the project in two separate modes: an interactive improvisation steering mode, along the lines of our original conception of the project but modified to accommodate the limitations of the glove; and a gesture parsing mode to take advantage of the possibilities of figuratively painting gestures with the glove, an approach which is not precluded by the limitations discussed above. These two modes are explained in detail below.

Parsing Gestures

Mapping physical gestures into musical gestures is a difficult task, for neither physical gestures nor musical gestures are well-defined. On the musical side, our approach to the problem was to collect a range of typical blues "riffs" (these are coded in the gestural layer of the performance model) and use these as a set of hard-coded target gestures (with the exception of some randomness between different versions of a given riff). To give us some higher-level control over these riffs from the PowerGlove, they were classified according to a few descriptive parameters -- pitch level, energy, sharpness, direction, etc. These parameters were then addressed from the PowerGlove by a type of motion which bears some literal relation to the parameter. For example, pitch level is determined by position on the y-axis, and a change can be directly signaled by a notch-like movement at a given level; energy is similarly reported by a "zigzag" motion. Alternatively, the values for each of the parameters are used as atoms in a simple gestural language, which can parse any given gesture based upon its score on each of the parameter tests.

In performance, this mode begins by marching through a harmonic template of a standard blues form with a default pattern. Once global values have been directly set for the parameters described above, gestures can be entered into the template by a checkmark motion with the glove. This tells the virtual performer to insert a gesture with the current parameters at the indicated point in the template. In addition, any glove gesture can be parsed based upon our descriptive parameters and sent to the virtual performer as a specific musical gesture to be entered into the template.

Once the template is replete with gestures to be played at specific points, the glove can be used in a "transformation" mode. The space the glove moves through is here sectioned into a grid, with pitch level values along the y-axis and energy values along the x-axis. When the glove moves through this space, its current position is sent to the performer, which transforms whatever gesture exists at the current position of the template by changing the gesture's pitch level and energy values to correspond to the glove's position (this mode can be toggled to affect all gestures, or only those which were not specified directly when entered into the template).
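As a concrete (and greatly simplified) illustration of what "parsing" a physical gesture can mean here, the sketch below scores a recorded glove trajectory on three of the descriptive parameters named above. The scoring heuristics are hypothetical stand-ins for the tests in our gestural language, not the tests themselves.

    def parse_gesture(trajectory, y_range=250.0):
        """Score a recorded glove trajectory (a list of (x, y) samples) on a few
        of the descriptive parameters used to classify blues riffs (sketch only)."""
        xs = [p[0] for p in trajectory]
        ys = [p[1] for p in trajectory]

        # pitch level: where the gesture sits vertically, normalized to 0..1
        pitch_level = (sum(ys) / len(ys)) / y_range

        # energy: how much the gesture zigzags -- total vertical movement
        # relative to the number of samples
        energy = sum(abs(b - a) for a, b in zip(ys, ys[1:])) / len(ys)

        # direction: net horizontal motion, left (-1) or right (+1)
        direction = 1 if xs[-1] > xs[0] else -1

        return {"pitch_level": pitch_level, "energy": energy, "direction": direction}

    # a shallow left-to-right zigzag parses as a mid-pitch, moderate-energy gesture
    riff_params = parse_gesture([(0, 120), (20, 140), (40, 110), (60, 135), (80, 115)])

A parsed parameter set of this kind is what gets attached to a point in the harmonic template, or used to select and transform a riff.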
This technique of parsing gestures and then steering transformations of them provides a workable method for establishing a form (by entering gestures into the template) and then controlling its evolution (transformation) while still maintaining the original structure. However, there is currently no provision for creating new musical material (outside of the set of available gestures) or for altering the harmonic structure of the entire template. These additions would help to create a more meaningful and flexible musical tool.

Interactive Improvisation

The goal in interactively steering the computer's improvisation was to provide an interface which would allow real-time entry of parameters to control the evolution of the music. This objective is related to that of the "transformation" mode above, which directs the course of repeated passes through the harmonic template of gestures, except that here the harmonic template is abandoned in favor of a free improvisation in which the musical material itself is created by the performer. The low-level musical material available in this type of interaction is provided by the virtual performance model, which defines the manner in which sequences of notes can be produced in a blues style given a general direction and certain restrictions.

In the interactive improvisation mode, the performer's gestures indicate ranges within which the improvisation can unfold. The parameters include those used in gesture-parsing mode, but are extended to offer more control in steering the actual musical material. The size of the pitch window restricting the music, the pitch level of this window, the direction of gestures operating within this window, and the energy, speed, and rhythmic constituents of the improvisation are all controllable by the performer through simple gestures with the PowerGlove. Each of these parameters can be sent individually, or any given physical gesture can be parsed and given values for each parameter (which are then all sent together). This latter form of control is essentially a mapping between a gesture and a specific type of improvisation -- a type defined by the ranges of the parameters.

Through these forms of control, this mode effectively provides the performer with a means of steering the evolving improvisation based upon high-level changes in general parameters. The virtual performer handles the low-level specifics (of note choices and inflections) required to realize the intentions signaled by the performer. However, the latency present in our glove apparatus reduces the intuitive satisfaction of this mode by limiting the glove's potential as a continuous controller. It also requires that the performer's control remain at more of a "meta" level than would perhaps be desirable. Ideally, the present level of control would be maintained while more direct musical control is made available; this would allow the performer to achieve specific rhythmic or melodic lines within the context of the evolving improvisation.
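To make the division of labor concrete, the sketch below shows one way a virtual performer could turn a set of steering parameters into note choices confined to a pitch window, with energy governing density and direction biasing the melodic contour. The note-selection heuristic and parameter names are ours for illustration only; the actual model works through the gestural and inflection layers described earlier rather than a single function like this.

    import random

    BLUES_SCALE = [0, 3, 5, 6, 7, 10]      # blues-scale degrees, in semitones above the root

    def improvise_bar(params, root=45, beats=8):
        """Sketch: pick a bar of MIDI-style note numbers from steering parameters
        ('pitch_level' 0..1, 'window' in semitones, 'direction' -1/+1, 'energy' 0..1)."""
        # place a pitch window according to the steering parameters
        center = root + int(params["pitch_level"] * 24)
        low, high = center - params["window"] // 2, center + params["window"] // 2
        candidates = [root + 12 * octv + deg
                      for octv in range(4) for deg in BLUES_SCALE
                      if low <= root + 12 * octv + deg <= high]
        if not candidates:                 # degenerate window: fall back to its center
            candidates = [center]
        notes, current = [], min(candidates)
        for _ in range(beats):
            # higher energy means more notes per beat; direction biases the contour
            if random.random() < 0.3 + 0.7 * params["energy"]:
                idx = candidates.index(current) + params["direction"] * random.choice([1, 2])
                current = candidates[max(0, min(len(candidates) - 1, idx))]
                notes.append(current)
            else:
                notes.append(None)         # rest
        return notes

    bar = improvise_bar({"pitch_level": 0.5, "window": 12, "direction": 1, "energy": 0.6})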
Comments

One of the interesting aspects of this project was that the technology used was relatively modest, yet the system produced some complex and intriguing musical output. We believe that this resulted from the inclusion in the interface of an interpretive agent with some embedded "knowledge" about music. The richness of the interaction which developed (despite the limitations of the PowerGlove) was constrained mainly by the sophistication of the performance model, and not by the precision or complexity of the controlling hardware. Too often, designers of musical interfaces (and possibly of VR interface technologies in general) place a premium on the design of the interface hardware while neglecting the system or processes to which the hardware is intended to be connected. Taking "us humans" as a model again: our hands have only five fingers, but the world presents an infinite number of ways to use them.

References

Garton, B. 1992. "Virtual Performance Modelling." Proceedings of the 1992 International Computer Music Conference. San Francisco: International Computer Music Association.

Sullivan, C. 1990. "Extending the Karplus-Strong Algorithm to Synthesize Electric Guitar Timbres with Distortion and Feedback." Computer Music Journal 14(3): 26-37.