Music and Connectionism
Peter Todd and D. Gareth Loy, editors

reviewed by Brad Garton, Columbia University Music Department

Before jumping too far into this review, it is probably important that I
"come clean" and describe the particular perspective I used when reading
this book.  I am not a researcher investigating aspects of human
intelligence with a computer, nor am I one of the growing number of
cognitive philosophers who use the machine as a springboard for
philosophical speculation.  I am a composer who happens to use computers to
realize my musical goals.  As such, I am somewhat of a dilettante in a
variety of computer-related research areas, including the
artificially-intelligent use of computers to model human behavior.  My
primary concern, however, is not with the research itself, but with
discovering tools I can adapt to aid in my pursuit of musical art.  Most of
my criticisms of this book spring from this "what's in it for me" perspective.

This perspective is not meant to be pejorative towards pure research into
musical behavior.  Indeed, music can serve as an excellent area of inquiry
into how humans operate.  Because of the special problems posed by the
practice of music, researchers such as Terry Winograd [Winograd, 1968]
and Marvin Minsky [Minsky, 1981] have seen fit to use music as a vehicle
for explorations in cognitive science.  It is important to realize, though,
that music has an extremely broad definition in contemporary society and
that research using music as a toy domain (however rich that domain may be)
must necessarily make some large and narrowing assumptions about what music is.

Thus, my first overarching criticism of this book is that the authors seem
too often unaware that using music as a toy domain generally means working
with toy music.  Although the scaling problem is discussed in several of the
papers, it is presented chiefly as a problem of quantitative complexity.
The musical scaling problem -- especially when talking about music in the
broad sense -- will require some significant qualitative changes, however.
The path from the low-level musical results presented in this book to a
high-level musical cognition/creation system is most probably non-linear and
possibly quite discontinuous.  Some of the papers in this collection imply
that these qualitative changes will perhaps occur as an emergent phenomenon
of hierarchically organized or more broadly structured neural nets.  After
reading through this book, I still remain unconvinced.

The second general criticism relates to the format and scope of the book.
The collected papers represent a scatter-shot approach to the moving target
of music perception and creation.  The editors acknowledge this in the
preface, stating that "the authors do not always concur or present a unified
world view but rather demonstrate the disagreement and diversity of opinion
characteristic of a dynamic young field." (p. x, Preface)  Gareth Loy (one
of the editors of the book) said that one of the primary objectives in
publishing the book was to get the research out to people in a variety of
disciplines and hopefully stimulate work in the area.  While this is
certainly a worthy goal, it would have been nice to see a bit more
"connective tissue" in the book explicating some of the assumptions and
points of contact within and between the papers.  To be sure, this is a tall
order for an emerging field of research, and too much editorial influence
might have run counter to Loy and Todd's stated mission.  I don't think that
a slightly more polemical stance by the editors would have harmed the book's
impact, and it might have made some of the underlying assumptions about
"what music is" more apparent to the reader.


The book is organized into four sections.  The first part of the book deals
with the background of neural net research in music.  Mark Dolson gives a
lucid explanation of some of the basic principles behind neural net
operation, complete with an example of a simple network designed to evaluate
and classify rudimentary rhythmic patterns.  Dolson does a marvelous job of
describing the problems he encountered when implementing his example
network.  He does not discuss in much detail how he arrived at the
particular network configuration he used for the rhythm classifier, however.
I get the impression that settling upon a network topology for a given task
is still an intuitive process.

In an addendum to his original paper (all papers in this book had been
previously published in several issues of the *Computer Music Journal*),
Dolson discusses the potential use of network models to synthesize sound.
He correctly concludes that neural nets are probably not well-suited for the
direct creation of a sound waveform, but that they do show some promise as an
interface mechanism to low-level synthesis algorithms.  I think this
might prove to be an exciting area for connectionist applications,
especially given the performance complexity (but highly realistic sound!) of
some of the new physical model synthesis algorithms being developed in the
computer music research community.

Gareth Loy closes the introductory section of the book with an historical
overview of efforts to create musically intelligent systems.  Using this
survey, Loy builds some strong arguments for a connectionist approach in
musical research.  Talking about "the problem of formal specification of
music", Loy claims that:

   The position of the composer... is really in the cracks between categories
   of formal description.  Describing the compositional process within the
   framework of a strictly formal representation will necessarily miss this
   dimension of the composer's art.  And yet a computable approach is perforce
   necessarily strictly formal.  This is the dilemma shared by all computer
   models of music. (p. 30)

"Traditional", formal AI techniques (rule-based systems, probabilistic and
algorithmic methods, etc.) fall into this trap when modelling musical
activity.  Loy feels that connectionism, being a "very pragmatic theory"
using the neuronal basis of the brain as a model, holds promise for
circumventing this difficulty (p. 32).  Connectionist systems appear able to
generalize and extract rules in complex contexts where formal descriptions
may be at best difficult.  Connectionism might prove to be the best
methodology for building models of music cognition.  Real (human) composers
and listeners alike are often quite unaware of the general rules or strategies
(if any) used in their musical experience.

Loy concludes his paper with some interesting philosophical speculations
about the artistic nature of automatic music systems.  Loy finishes 
by saying that his comments "only focus on the philosophical level
of machine models of human artistic expression." (p. 34)  He further states
that there are many other perspectives from which to view music (psychology,
computer science, musicology, etc.), each with separate sets of questions and
objectives.  Unfortunately, Loy did not elaborate these different
perspectives.  I certainly would have enjoyed more of this discussion.
Loy's "philosophical level" could have helped situate some of the musical
attitudes implicit in this work within a wider context.  I also wish that 
some of these philosophical questions had played a larger role in the rest 
of the book, and that there were more interpenetration of the different
perspectives.


Modelling "Low-Level" Perception

A case in point is a paper by Hajime Sano and B. Keith Jenkins which opens
the next section of the book (the "Perception and Cognition" section).  Sano
and Jenkins have devised a network model of pitch perception which they take
great pains to ground in the physiology of human hearing.  Their network
model is able to extract pitch from the harmonic information of complex
tones, even multiple complex tones using an extension to the basic model
described in an addendum by Jenkins.  This is no small feat, and certainly
very useful for a variety of computer music applications.  However,
statements such as "the majority of chordal emotional affect is related to
the relative positions of notes within an octave, not to which octaves they
are in" (p. 47) and "modelling the emotional effect of chords would most
likely require several pitch perception networks tied together feeding into
a chord classification, heteroassociative neural network" (p. 49) suggest
that the authors have an overly-simplified view of pitch perception (and
certainly perception in general).  Their scheme for reducing frequency data
into equal-tempered note data, despite the qualification that moving from
JND discriminations

[FOOTNOTE:  "JND" is an abbreviation for "Just Noticeable Difference".  In
this case a JND discrimination is the amount of pitch-shift necessary
for people to notice that the pitch is different.  This is much less than
the amount of pitch change between notes in a 12-tone equal tempered (the
piano keyboard) scale.]

to 12-tone pitch classes "is culturally dependent" (p. 45),
assumes *a priori* that pitch perception is a straightforward act
of mapping stimuli onto fixed categories.  This questionable premise leads
to the somewhat bizarre situation where the network model natively exhibits
absolute pitch perception rather than relative pitch perception.  The
underlying assumption that music fundamentally consists of notes has biased
the entire design of the model -- not a healthy situation if the project
is intended to model aspects of biological perception.
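
The consequence of reducing frequency data to equal-tempered note data can
be seen in a few lines of Python.  This is only an illustrative sketch, not
Sano and Jenkins's network: the function name pitch_class, the A4 = 440 Hz
reference and the note-name table are assumptions introduced here.  Because
every frequency is snapped to a category measured from a fixed reference,
the result is an absolute pitch label, and octave information is discarded.

```python
import math

# Hypothetical note-name table; index 0 is C, as in MIDI pitch classes.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Snap a frequency to the nearest 12-tone equal-tempered pitch class.

    a4_hz is the assumed tuning reference; changing it moves every
    category boundary, which is one sense in which the mapping is
    culturally dependent.
    """
    # Distance from A4 in semitones, rounded to the nearest whole step.
    semitones = round(12 * math.log2(freq_hz / a4_hz))
    midi_note = 69 + semitones          # 69 is the MIDI number of A4
    return NOTE_NAMES[midi_note % 12]   # the octave is thrown away here

print(pitch_class(440.0))               # A
print(pitch_class(220.0))               # A (an octave lower, same class)
print(pitch_class(445.0))               # A (well above a JND, still "A")
print(pitch_class(452.0, a4_hz=415.0))  # A# under a lower reference pitch
```

A relative-pitch representation would instead store the interval between
successive notes, which stays the same when the reference changes.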

In "Connectionist Models for Tonal Analysis", Don Scarborough, Ben Miller
and Jacqueline Jones attack the problem of the induction of tonality: how do
we determine the key in a piece of tonal music?  Their model presupposes the
existence of a lower-level system for parsing incoming acoustic stimuli into
pitch categories (such as the Sano and Jenkins model).  From the vantage
point of this higher-level process, the authors are able to make some cogent
observations about music perception in general.  Although their simple
linear network model does a "more than creditable job" in extracting
tonality from simple monophonic or polyphonic music (p. 56), Scarborough
*et al.* harbor no illusions about modelling actual human perception:
"It is not clear how well this network simulates human performance because
we know very little about how people identify tonality." (p. 56)  The authors
also seem very aware of the particular cultural filters they have adopted in
designing their model; this again is probably a consequence of the
higher-level phenomenon they are investigating.

In the addendum to the original paper, Scarborough *et al.* discuss
some of the advantages and disadvantages of their simple linear network
approach.  They note that their network model is based upon the concept of
pitch classes, "an example of the cognitive assumption that the mind codes
experiences in an abstract symbolic code." (p. 63)  They then speculate that
(at least in musical perception) this assumption may be wrong.

Bernice Laden and Douglas Keefe confront this issue directly in their paper
comparing several different methods for representing pitch in a neural
network model.  This is done in the context of a network intended to
classify chords as major, minor or diminished triads.  Laden and Keefe
contrast "cognitive" approaches relying upon the explicit symbolic
representation of notes with "psychoacoustic" approaches using harmonic
[Goldstein, 1973] or subharmonic [Terhardt, 1974] spectral complexes.  They
conclude that a spectral representation of pitch is preferable in network
models because it is tied more closely to the actual acoustic content of
musical sound, and it preserves much of this cognitively important
information.

Because I read the book from front to back, probably the way most books are
read, I wish that the editors had placed Laden and Keefe's paper closer to
the beginning of this section of the book.  Laden and Keefe investigate
several network architectures and give empirical results for various numbers
of hidden units and learning epochs used.  This `hands-on' information and
the ensuing discussion was very useful for my learning how network models
are constructed, especially when coupled with Mark Dolson's paper at the
beginning of the book.  In addition, I would have preferred to read Sano and
Jenkins's paper on pitch extraction after the discussion of different
approaches given by Laden and Keefe, or at least have had some way of 
connecting the two papers more directly.

"Higher-Level" Musical Cognition

I also wish that Jamshed Bharucha's paper "Pitch, Harmony, and Neural Nets:
A Psychological Perspective" had opened the "Perception and Cognition"
section.  Although I disagree with Bharucha's criticisms of Laden and Keefe
(especially his decoupling of pitch perception from a spectral representation
of pitch), his discussion of the issues involved in modelling pitch and
harmony is quite enlightening.  Bharucha has done some seminal work in
musical connectionism, and his admonishments to consider known constraints,
both theoretical and empirical, when modelling human perception should inform
nearly all of the work being done in this field.

In this paper, Bharucha presents a model extending his MUSACT system
[Bharucha, 1987] for recognizing keys and chords from tones.  Bharucha
focuses on how this capability is learned, certainly a primary concern when
attempting to model human performance.  The strength of relying upon
`real-world' psychological models (in this case, Bharucha's insistence that
learning should occur through passive exposure to music, an hypothesis
which may or may not be true) is revealed by the development of a robust
model of pitch cognition, exhibiting human characteristics such as 
transposition invariance (a melody sounds more or less the same no matter 
what key is used) and key-distance effects (essential for the development 
of tonal relationships).

Marc Leman tackles the question of tonal relationship development using a
neural network self-organization technique known as the Kohonen Feature Map
[Kohonen, 1984].  While I don't completely buy into Leman's assertion that
the notion of a "cognitive map" is "essential for the explanation of
cognitive processes" (p. 103), I certainly endorse his stated intention of
adopting what he calls a *subsymbolic* approach:

   This implies, among other things, that the system should exhibit what we
   call "responselike" behavior to stimuli in the environment.  This criterion
   embodies the idea that a system develops tonal semantics only in virtue of
   the response of the system to the environment.  Stated differently, the
   tones encountered acquire meaning solely because they are relevant for the
   action of the organism in the environment. (p. 103)

While I'm certainly not a hard core (nor necessarily even a soft-core)
behaviorist, I like this approach because of the relative lack of
assumptions about "how music should go" imbedded in a model based on this
design criterion.

Unfortunately, Leman is forced to abandon his "ultimate goal... to start
from the raw acoustic data" in favor of a "more modest approach" to pitch
representation "over which strict control could be more easily exercised."
(p. 106)  Leman settled upon an input pitch coding scheme based upon
Terhardt's subharmonic spectral complex theory [Terhardt, 1974] (an
approach grounded in the actual acoustic signal rather than some
*a priori* abstract representation scheme), for similar
reasons to those outlined in Laden and Keefe's paper.  The self-organized
KFMs resulting from exposure to major, minor and dominant seventh chords
(the most common chords in Western tonal music) show some remarkable
emergent features which can be correlated with Western listeners' experience
of tonality (i.e. chords and keys closely related through the "circle of
fifths" lie close together on the resultant KFM).  My fear is that this may
be more a consequence of the particular input coding adopted by Leman.
Leman reduces much of the spectral information from the Terhardt
representation into a single equal-tempered octave, a move which certainly
has some tonal implications.  It would be terrific if Leman could operate on
raw acoustic data.  If such a model were constructed, then it would be
fascinating to see the KFMs resulting from non-Western musics using
instruments with a large number of non-harmonic partials.

Bharucha and Peter Todd take a more time-oriented approach to the problem of
tonal structure development.  The authors present a sequential memory
network designed to work in conjunction with Bharucha's MUSACT model
[Bharucha, 1987].  Bharucha and Todd use this network memory to learn
schematic and veridical expectancies for sequences of chords.  Schematic
expectancies are musical commonalities existing within a cultural tradition,
and veridical expectancies are built from an individual's knowledge of
specific pieces of music.  The network ultimately learns a set of heavily
contextualized probabilities for a given chord occurring in an on-going
musical passage.  The interesting feature of this model is the authors'
consideration of how these expectancies are learned, especially given a
particular cultural environment.  As in Bharucha's earlier work, this model
learns through passive exposure, simply by "hearing" the music.  The
assumption behind this approach is that most people learn listening
strategies through the passive immersion in a musical culture.

The biggest difficulty I have with this model of human perception is the
implicit assertion that the violation or fulfillment of harmonic expectancies
is a major (if not *the* major) component in our hearing of music.
Taken to the extreme, this view suggests that we are either shocked or bored
when listening to music.  I believe that this aesthetic stance, generally
attributed to Leonard Meyer [Meyer, 1956], is more an artifact of the
compartmentalization of musical parameters and narrow focus upon the pitch
parameter which has developed in the Western musical tradition.  What does
the expectancy/violation theory tell us about other musical features?
Timbre is a fundamental part of my own experience of music -- what is a
timbral expectancy, and how is it violated?  I would argue for a more
holistic approach to musical perception, involving timbre, sonic density,
rhythm, time, etc. not as separate musical parameters but instead as
essential and interconnected parts of a unified perceptual entity.

Coding of Musical Features and Patterns

Robert Gjerdingen attempts to address some of these problems in the coding
of input to his adaptive resonance theory [Grossberg, 1976; Carpenter and
Grossberg, 1987] network model.  Gjerdingen's ART model (which he calls *L'ART
pour l'art*) learns to recognize abstract musical patterns from
relatively complex music (early Mozart).  *L'ART pour l'art* constructs
musical memories by representing a set of 34 input features as activations
which decay or become reinforced as new inputs are introduced.  Gjerdingen's
input features include items such as the melodic scale degree, the bass and
melody contour (up or down), melodic "inflections", etc.  Ostensibly the
learned feature vectors can be used to parse new music, in a somewhat more
complex version of Bharucha and Todd's harmonic expectancy model.

Although Gjerdingen included many other features of music besides simple
pitch and harmony in his input coding scheme, these features are all tied
almost exclusively to pitch.  Gjerdingen also grounds his work firmly in the
harmonic expectancy/violation theory of Meyer -- my criticisms of Bharucha
and Todd's model apply equally here.  By using Gjerdingen's selected
features for input, I suspect that *L'ART pour l'art* may be learning
more about what Gjerdingen thinks is important in music than general musical

With his decaying feature activations, however, Gjerdingen did include a
more explicit concept of the flow of musical time in his model; musical time
being more-or-less represented in previous models as a shifting "context".
Peter Desain and Henkjan Honing focus exclusively on time problems in their
connectionist approach to recognizing musical rhythms.  From a purely
utilitarian standpoint, this paper is noteworthy.  The parsing of music into
rhythms is badly needed by musicians working with real-time interactive
computer music systems.  It would also be extremely useful in the automatic
transcription of human-performed music by computer.

The model works by using "interaction cells" to bias "basic cells" towards
small integer relationships.  The basic cells contain the inter-onset
intervals measured between incoming musical events.  A refinement of the
model introduces "sum cells" to assist the interaction cells.  The sum cells
account for complex rhythms in which short durations are intermingled with
long durations (i.e. eighth or sixteenth notes with half or whole notes).
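
The basic idea of nudging neighboring inter-onset intervals towards small
integer ratios can be sketched as follows.  This is a deliberately
simplified illustration and not Desain and Honing's published model (their
LISP listing is the authority): the update rule, the rate parameter and the
function names are assumptions made here, and the sum cells are left out.
Each pass plays the role of the interaction cells by moving every adjacent
pair a fraction of the way towards the nearest integer ratio while keeping
the pair's combined duration unchanged.

```python
def quantize_step(intervals, rate=0.5):
    """One relaxation pass over adjacent pairs of inter-onset intervals.

    intervals: positive durations (the "basic cells").
    rate: assumed step size, between 0 and 1.
    """
    out = list(intervals)
    for i in range(len(out) - 1):
        a, b = out[i], out[i + 1]
        total = a + b
        longer, shorter = max(a, b), min(a, b)
        # Nearest integer ratio between the two durations (at least 1:1).
        n = max(1, round(longer / shorter))
        target_short = total / (n + 1)
        target_long = total - target_short
        if a >= b:
            target_a, target_b = target_long, target_short
        else:
            target_a, target_b = target_short, target_long
        # Move part of the way towards the target; a + b is preserved.
        out[i] = a + rate * (target_a - a)
        out[i + 1] = b + rate * (target_b - b)
    return out

def quantize(intervals, steps=50, rate=0.5):
    """Repeat the relaxation pass a fixed number of times."""
    for _ in range(steps):
        intervals = quantize_step(intervals, rate)
    return intervals

# A slightly uneven long-short pair settles onto a 2:1 ratio.
performed = [1.05, 0.48]
settled = quantize(performed)
print(settled, settled[0] / settled[1])
```

With more than two intervals the pairs pull against one another, which gives
a small taste of why the interactions of even a simple topology can become
complex.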

Even though the basic topology of Desain and Honing's model is fairly
simple, the interactions of the cells can become quite complex.  Desain and
Honing describe several techniques for analyzing network behavior -- the
clamping of all cell states except one to study the functioning of the
single cell, and the use of "state space" graphs (discussed in the addendum)
to observe global network activity.  Again from a purely utilitarian
standpoint, these discussions are quite useful, as is the listing of LISP
code implementing the model which is included with the article.


Melodic Composition

I was disappointed in the next major section of the book, "Applications".
From my "what's in it for me" perspective, I was looking forward to seeing
how neural network models could be used as compositional tools.  With one
exception, all of the papers dealt almost exclusively with the automatic
composition of melodies, and only one of these papers generated polyphonic
music (melodies with an explicit harmonic accompaniment).  Because I am more of a
timbre-oriented composer, this work was not particularly useful to me
personally.  This heavy emphasis on the production of melody-generating
systems is symptomatic of the narrow view of music as consisting mainly of a
sequence of pitches, an idea which pervades most of the research presented
in this book.  

As I described earlier, I am also allergic to the
"parameterization" of music that goes hand-in-hand with this approach to
musical modelling.  What is the pitch sequence of the famous motto of 
Beethoven's 5th Symphony without the characteristic rhythm and accent
patterns?  More to the point, what is this simple theme without the 
incredible intellectual and emotional context built by Beethoven?  Is it
even truly possible for contemporary listeners to hear this theme without
the 'Beethoven' context? I don't believe that the various parameters of 
music can be so cleanly separated from each other and investigated as 
independent entities.  I'm not even convinced of the primacy of pitch 
perception, at least for my own experience of music.  If music were nothing
more than a sequence of pitches conjoined with some rhythmic templates,
overlaid by a set of timbres, then listening would be a dreary experience 
indeed.  It could be argued that the intellectual and emotional excitement 
of music comes at higher levels of processing, but I don't subscribe to a
strongly-ordered, hierarchical model of mental processing.  My vote goes 
instead for a more integrated, a more *connected* approach.  My worry is 
that by working with a restricted, toy-domain music these integrated
connections are severed and the phenomena under investigation become
simplified right out of the model.

With this jeremiad aside, however,
Peter Todd presents a good overview of many issues involved in melodic
composition in the opening paper of this section, "A Connectionist Approach
to Algorithmic Composition".  Todd describes various systems using network
models, his intention being to "present alternative approaches and
tangential ideas [which] are included throughout as points of departure for
further efforts." (p. 173)  Todd then discusses a model which can learn from
sequences of pitches and rhythms given one or more simple melodies as input.
The network accomplishes this through the construction of contextualized
"plan vectors", or schemes for putting together melodic sequences.  Todd can
then manipulate the plans in various ways to produce new melodies.

One problem with this approach derives from the local view of pitch
transition built into the model.  This limits the size of the pitch
sequences which can be manipulated by the model.  I also suspect that longer
sequences would tend to "wander" without a clear musical direction because
the network cannot readily learn higher level knowledge of large-scale
musical structure.  In an addendum to his paper, Todd discusses the
hierarchical organization of several sequential network models to overcome
these problems.  How hierarchical knowledge might be learned by this
super-network is an unresolved question.

Michael Mozer concentrates solely on the construction of pitch sequences
with his CONCERT network.  No representation of rhythm or any other
parameters is included in the model.  Mozer is careful to work within
psychophysical constraints (such as judgements of "closeness" of
one pitch to another made by human observers, or relative amounts
of consonance and dissonance between different pitches)
in the representation of pitch in his model.  His feeling is that
the creation of "melodies people perceive as pleasant" must
be tied to a "psychologically-motivated representation of pitch."  (p. 202)
Mozer uses a pitch coding scheme which, rooted in psychophysics or not,
emphasizes diatonic relations developed in the Western tonal tradition.  As
presumptuous as this assumption is, I find more disturbing Mozer's assertion
that "a complete model of music composition should describe each note by a
variety of properties -- pitch, duration, phrasing, accent -- along with
more global properties such as tempo and dynamics." (p. 195)  This statement
says much about what Mozer means by "composition".  If his intention is to
model human creativity, then his conception of that activity is extremely
constricted.  I realize that it is necessary to begin modelling a complex
activity with some basic set of assumptions about that activity, but I think
that Mozer's view of what composition entails is overly constrained.  I
don't believe that a robust model of general composition can be `scaled-up'
by simply adding note-properties.

Mozer does show how network models can capture dimensions of context in ways
which are not possible with "traditional" algorithmic compositional
paradigms.  Mozer also raises an interesting question concerning how to
judge the success of a network model:

   One potential pitfall in the research area of connectionist music
   composition is the uncritical acceptance of a network's performance.  It is
   absolutely essential that a network be evaluated according to some objective
   criterion.  One cannot judge the enterprise to be a success simply because
   the network is creating novel output. (p. 195)

This statement stands in direct contrast to Peter Todd's remark that the
melodies generated by his network model, "while incorporating important
elements of the training set, remain more or less unpredictable and
therefore musically interesting." (p. 188)  My own view is something like
the adage "the proof of the pudding is in the eating."  The problem with a 
musical Turing test is that the sonic pudding may taste radically different 
to different people.  Of course, if arguments about the success of a 
compositional model become arguments about musical taste and style, then the 
model has probably succeeded.

Musical Judgement and Style

J. P. Lewis actually tries to imbed some notion of musical taste in his
compositional model.  Lewis uses a technique he calls "creation by
refinement", in which "a standard supervised gradient descent learning
algorithm trains a network to be a 'music critic' (preferentially judging
musical examples according to various criteria)." (p. 212)  This acquired
critical knowledge is then used to refine a haphazardly created composition,
until the network decides it is "good".  Lewis also recognizes the
difficulties encountered by relatively low-level models when attempting to
capture a larger-scale musical structure.  His solution is similar to Todd's
in employing a hierarchical design strategy.  The hierarchy uses a scheme of
grammar rewriting rules, such as sequence *ABC* being expanded to
*AxByC*.  Lewis makes no claim that this *really* captures any
musical deep structure, his primary motivation being to make longer musical
passages computationally manageable.

Teuvo Kohonen, Pauli Laine, Kalev Tiits, and Kari Torkkola describe a
non-neural network algorithm for capturing compositional "style".  The
relation of this work to the other models presented in this book is through
its treatment of an unfolding musical context.  The algorithm uses Kohonen's
dynamically expanding context (DEC) algorithm [Kohonen, 1985] to specify the
succession of notes as loosely as possible.  The grammar learned through
this technique becomes specific only when a controversy or conflict is found
in the parsed data.  As with the other compositional models in this book,
Kohonen *et al.* apply the DEC learning algorithm to several pieces and
then use the acquired grammar to generate new pieces in the same style.  The
refreshing aspect of this paper is that the authors applied the technique to
polyphonic music instead of simply generating rather abstract pitch
sequences.  However, this approach suffers from an inability to capture any 
deep musical structure, as the examples in the paper humorously demonstrate.
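
The notion of a grammar that becomes specific only when a conflict is found
can be illustrated with a toy rule learner.  This is a loose sketch of the
general idea rather than Kohonen's DEC algorithm as published: the function
name learn_rules, the max_context limit and the letter-name melody are
assumptions introduced here.  For each note, the shortest preceding context
that always leads to the same successor in the training data is kept as a
rule; the context is lengthened only when a shorter one is ambiguous.

```python
def learn_rules(seq, max_context=8):
    """Map the shortest unambiguous preceding context to its successor."""
    rules = {}
    for i in range(1, len(seq)):
        for k in range(1, min(i, max_context) + 1):
            context = tuple(seq[i - k:i])
            # Every symbol that follows this context anywhere in the data.
            successors = {
                seq[j]
                for j in range(k, len(seq))
                if tuple(seq[j - k:j]) == context
            }
            if len(successors) == 1:
                # No conflict at this depth, so the rule stays this loose.
                rules[context] = seq[i]
                break
            # Otherwise there is a conflict: expand the context and retry.
    return rules

melody = list("CDECDG")  # hypothetical training melody
for context, successor in learn_rules(melody).items():
    print(context, "->", successor)
```

In this example D is followed by both E and G, so the one-note and two-note
contexts are ambiguous and only the three-note context E, C, D yields a rule
(for G).  Nothing in such rules refers to anything larger than the local
context, which is consistent with the weakness noted above.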

The final paper in the "Applications" section is actually one of the best
examples of an `application' of connectionism in music.  Samir Sayegh uses a
network implementing Viterbi's algorithm [Viterbi, 1967] to find optimal
paths for guitar fingering.  Sayegh uses observed solutions of expert
guitarists to construct cost functions which are learned by the network.
These are then used to compute fingerings for other guitar pieces.  It would
have been nice if Sayegh had included some evaluations of the generated
fingerings by practicing guitarists, but the paper seems more focussed upon
the application of the algorithm as a computer science problem rather than a
music problem.
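
For readers unfamiliar with Viterbi-style optimal path search, the following
sketch shows the dynamic programming idea on a fingering-like problem.  It
is not Sayegh's network: his cost functions are learned from expert
fingerings, whereas the cost used here (the distance in frets between
successive positions) is a hand-written stand-in, and the function names and
the (string, fret) encoding are assumptions made for illustration.

```python
def best_path(candidates, transition_cost):
    """Return the minimum-cost choice of one state per step, and its cost.

    candidates: list of steps, each a list of possible states.
    transition_cost: function (previous_state, state) -> number.
    """
    costs = [0.0] * len(candidates[0])
    back_pointers = []
    for previous, current in zip(candidates, candidates[1:]):
        new_costs, pointers = [], []
        for state in current:
            options = [
                costs[i] + transition_cost(p, state)
                for i, p in enumerate(previous)
            ]
            best = min(range(len(options)), key=options.__getitem__)
            new_costs.append(options[best])
            pointers.append(best)
        costs = new_costs
        back_pointers.append(pointers)
    # Trace back from the cheapest final state.
    index = min(range(len(costs)), key=costs.__getitem__)
    total = costs[index]
    path = [index]
    for pointers in reversed(back_pointers):
        index = pointers[index]
        path.append(index)
    path.reverse()
    return [candidates[t][k] for t, k in enumerate(path)], total

def fret_distance(previous, state):
    """Stand-in cost: how far the hand moves along the neck."""
    return abs(previous[1] - state[1])

# Three notes, each playable at two hypothetical (string, fret) positions.
notes = [
    [(1, 0), (2, 5)],
    [(1, 2), (2, 7)],
    [(2, 9), (3, 13)],
]
fingering, cost = best_path(notes, fret_distance)
print(fingering, cost)
```

With this stand-in cost the search keeps all three notes on the second
string for a total movement of four frets.  A learned cost function would
slot into the same place as fret_distance.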


The book ends with a very short "Conclusions" section containing a Letter to
the Editor of the *Computer Music Journal* by Otto Laske commenting upon
connectionist composition, responses to the letter by Loy and Todd, and a
brief paper outlining some possible directions for future research by Todd.
Laske criticizes connectionist musical systems as representing
*model-based composition*, this being opposed to *rule-based
composition*.  Laske states that "connectionist models of composition seem
to come attached with an aesthetics that is more suited to pedagogy and
musicology in the orthodox sense than to compositional thinking and
composition theory." (p. 260)  Laske points to the lack of knowledge about
the deep structure of music in network models as a symptom of this
regressive approach.  While I don't subscribe to Laske's notion that we
composers must operate from within a "compositional theory", I do endorse
the related idea that computer music algorithms should have appropriate
handles for compositional manipulation.  I do think, however, that many of
the systems described in this book -- especially some of the pitch
perception and rhythm quantizing models -- would be excellent tools "to be
added to the composer's toolbox to further the creative effort" (Todd's
words, p. 261).  In this light, I can appreciate Loy's intention "to bring
these techniques to the attention of composers so that they may be validated
in practice." (p. 262)

Todd concludes his response to Laske by saying:

   Finally, it is ridiculous to speak of "a primitive notion of composition" as
   if there were an established, universal aesthetic hierarchy of means, let
   alone ends.  The fact that the connectionist approach shows "a lack of
   notions of composition theory" is one of its virtues, freeing the composer
   as it does from remembered compositional theories of the past -- if not
   "remembered musics of the past." (p. 261)

I don't agree that connectionist approaches show a lack of notions of
composition theory.  I believe that many of the connectionist systems
discussed in this book have very particular notions of composition theory,
and that this compromises their claims to higher-level generality.  When I
read about systems which represent music as a set of discrete and virtually
independent parameters, then I realize that a very strong concept of
compositional theory has been implicitly and irrevocably imbedded in the
model.  When I read about attempts to capture the deep structure of music
through the hierarchical organization of low-level models, then I realize
that the authors have a relatively clear concept of how to construct music.
This bothers me, because the deep structure of music is itself not a very
clear concept.  In fact, there is considerable disagreement among us humans
as to what makes a piece of music cohesive or coherent, or even whether it
needs to be.  As I said earlier in this review, the "musical scaling problem"
is not a simple matter of computational scaling.  It is more a matter of
modelling intelligence in general.  This is probably what makes music such
an attractive area for AI research.

My big caveat is that there are some fundamental pitfalls which must be
recognized when Science intersects with Art.  The goals of the researcher
can be diametrically opposed to the goals of the artist.  Blanket,
cross-boundary pronouncements about either pursuit should be viewed with a
healthy degree of skepticism.

It is easy for a reviewer to wax critical of an endeavor as young as this.
Many of my criticisms are a bit "nit-picky".  Probably the fundamental
task for the reviewer, however, is to recommend the purchase (or non-purchase)
of the book under review.  In this case, my answer is easy:  I have already
recommended to a number of students that they get this book.  Even though
the collection is somewhat scattered -- after all, it is covering a broad
range of topics in an emerging field -- the book does give a good overview
of the initial research being done in connectionist music modelling.  If the
editors' intentions were indeed to stimulate and intrigue a potential
audience of composers and music researchers, then they have succeeded.


Bharucha, J. 1987.  "MUSACT: A Connectionist Model of Musical Harmony."
   Proceedings of the Ninth Annual Conference of the Cognitive Science Society.
   Hillsdale, NJ: Erlbaum Associates, pp. 508-517.

Carpenter, G. A., and S. Grossberg. 1987.  "ART 2: Self-organization of
   Stable Category Recognition Codes for Analog Input Patterns."
   Applied Optics 26:4919-30.

Goldstein, J. L. 1973.  "An Optimum Processor Theory for the Central
   Formation of the Pitch of Complex Tones."  Journal of the Acoustical
   Society of America 54:1496-1516.

Grossberg, S. 1976.  "Adaptive Pattern Classification and Universal
   Recoding, II: Feedback, Expectation, Olfaction, and Illusions."
   Biological Cybernetics 23:187-202.

Kohonen, T. 1984.  Self-organization and Associative Memory.  Berlin:

Kohonen, T. 1985.  "Dynamically Expanding Context."  Report TKK-F-A592.
   Helsinki, Finland: Helsinki University of Technology.

Meyer, L. 1956.  Emotion and Meaning in Music.  Chicago: University of
   Chicago Press.

Minsky, M. 1981.  "Music, Mind, and Meaning."  Computer Music Journal

Terhardt, E. 1974.  "Pitch, Consonance, and Harmony."  Journal of the
   Acoustical Society of America 55:1061-1069.

Viterbi, A. J. 1967.  "Error Bounds for Convolutional Codes and an
   Asymptotically Optimum Decoding Algorithm."  IEEE Transactions on
   Information Theory IT-13(2):260-269.

Winograd, T. 1968.  "Linguistics and the Computer Analysis of Tonal
   Harmony."  Journal of Music Theory 12:2-49.