Music and Shape

Studies in Musical Performance as Creative Practice

Series Editor John Rink

Volume 1
Musicians in the Making: Pathways to Creative Performance
Edited by John Rink, Helena Gaunt and Aaron Williamon
Volume 2
Distributed Creativity: Collaboration and Improvisation in Contemporary Music
Edited by Eric F. Clarke and Mark Doffman
Volume 3
Music and Shape
Edited by Daniel Leech-Wilkinson and Helen M. Prior
Volume 4
Global Perspectives on Orchestras: Collective Creativity and Social Agency
Edited by Tina K. Ramnarine
Volume 5
Music as Creative Practice
Nicholas Cook

About the series

Until recently, the notion of musical creativity was tied to composers and the works
they produced, which later generations were taught to revere and to reproduce in
performance. But the last few decades have witnessed a fundamental reassessment
of the assumptions and values underlying musical and musicological thought and
practice, thanks in part to the rise of musical performance studies. The five volumes in
the series Studies in Musical Performance as Creative Practice embrace and expand
the new understanding that has emerged. Internationally prominent researchers,
performers, composers, music teachers and others explore a broad spectrum of
topics including the creativity embodied in and projected through performance,
how performances take shape over time, and how the understanding of musical
performance as a creative practice varies across different global contexts, idioms
and performance conditions. The series celebrates the diversity of musical perfor-
mance studies, which has led to a rich and increasingly important literature while
also providing the potential for further engagement and exploration in the future.
These books have their origins in the work of the AHRC Research Centre
for Musical Performance as Creative Practice, which conducted
an ambitious research programme from 2009 to 2014 focused on live
musical performance and creative music-making. The Centre’s close interactions
with musicians across a range of traditions and at varying levels of
expertise ensured the musical vitality and viability of its activities and outputs.
Studies in Musical Performance as Creative Practice was itself broadly con-
ceived, and the five volumes encompass a wealth of highly topical material.
Musicians in the Making explores the creative development of musicians in
formal and informal learning contexts, and it argues that creative learning is
a complex, lifelong process. Distributed Creativity explores the ways in which
collaboration and improvisation enable and constrain creative processes in
contemporary music, focusing on the activities of composers, performers and
improvisers. Music and Shape reveals why a spatial, gestural construct is so
invaluable to work in sound, helping musicians in many genres to rehearse,
teach and think about what they do. Global Perspectives on Orchestras consid-
ers large orchestral ensembles in diverse historical, intercultural and postcolo-
nial contexts; in doing so, it generates enhanced appreciation of their creative,
political and social dimensions. Finally, Music as Creative Practice describes
music as a culture of the imagination and a real-time practice, and it reveals the
critical insights that music affords into contemporary thinking about creativity.
Music and Shape
Edited by
Daniel Leech-Wilkinson
Helen M. Prior

Oxford University Press is a department of the University of Oxford. It furthers
the University’s objective of excellence in research, scholarship, and education
by publishing worldwide. Oxford is a registered trade mark of Oxford University
Press in the UK and certain other countries.

Published in the United States of America by Oxford University Press
198 Madison Avenue, New York, NY 10016, United States of America.

© Oxford University Press 2017

All rights reserved. No part of this publication may be reproduced, stored in
a retrieval system, or transmitted, in any form or by any means, without the
prior permission in writing of Oxford University Press, or as expressly permitted
by law, by license, or under terms agreed with the appropriate reproduction
rights organization. Inquiries concerning reproduction outside the scope of the
above should be sent to the Rights Department, Oxford University Press, at the
address above.

You must not circulate this work in any other form
and you must impose this same condition on any acquirer.

Library of Congress Cataloging-in-Publication Data

Names: Leech-Wilkinson, Daniel. | Prior, Helen M.
Title: Music and shape / edited by Daniel Leech-Wilkinson, Helen M. Prior.
Description: New York, NY: Oxford University Press, [2017] |
Series: Studies in musical performance as creative practice; 3 |
Includes bibliographical references and index.
Identifiers: LCCN 2016042331 | ISBN 9780199351411 (hardcover) | ISBN 9780199351442 (oso)
Subjects: LCSH: Music—Psychological aspects. | Music—Performance—Psychological aspects.
Classification: LCC ML3838.M94947 2017 | DDC 781.1/7—dc23
LC record available at

9 8 7 6 5 4 3 2 1
Printed by Sheridan Books, Inc., United States of America

Contents

List of contributors
List of illustrations
About the Companion Website
Preface

PART 1  Shapes mapped

Reflection  Evelyn Glennie
1 Key-postures, trajectories and sonic shapes
Reflection  Lucia D’Errico
2 Shape, drawing and gesture: empirical studies of cross-modality
Reflection  Anna Meredith
3 Cross-modal correspondences and affect in a Schubert song

PART 2  Shapes composed

Reflection  George Benjamin
4 Affective shapes and shapings of affect in Bach’s Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001)
Reflection  Steven Isserlis
5 Shape in music notation: exploring the cross-modal representation of sound in the visual domain using zygonic theory
Reflection  Alice Eldridge
6 The shape of musical improvisation

PART 3  Shapes performed

Reflection  Max Baillie
7 Shape as understood by performing musicians
Reflection  Simon Desbruslais
Reflection  Malcolm Bilson
8 Shaping popular music
Reflection  Steven Savage

PART 4  Shapes seen

Reflection  Mark Applebaum
Reflection  I-Uen Wang Hwang
9 Music and shape in synaesthesia
Reflection  Timothy B. Layden
Reflection  Stephen Hough
Reflection  Alex Reuben
10 Intersecting shapes in music and in dance
Reflection  Richard G. Mitchell

PART 5  Shapes felt

Reflection  Julia Holter
11 Musical shape and feeling
Reflection  David Amram
Reflection  Antony Pitts

Notes
Index

List of contributors

Mordechai Adler graduated from Tel Aviv University in 2014 with a PhD in
musicology. His dissertation, ‘Cross-modal correspondence and musical
representation’, combines empirical studies of cross-modal perception with musical
analyses. Adler is currently developing a music education method using
cross-modal correspondences.
David Amram is one of the most prolific and performed composers of his gen-
eration, and has left a unique mark on the world of music. He became the first
composer-​in-​residence with the New York Philharmonic in 1966 at the request
of Leonard Bernstein. At eighty-​six Amram continues to work as a classical
composer, multi-​instrumentalist, band leader, lecturer and guest conductor,
constantly composing as he tours the world.
Mark Applebaum is Associate Professor of Composition at Stanford
University. His solo, chamber, choral, orchestral, operatic and electroacoustic
work has been performed widely and includes notable commissions from the
Merce Cunningham Dance Company, the Fromm Foundation and the Vienna
Modern Festival. Many of his pieces challenge the conventional boundaries of
musical ontology. Applebaum is also an accomplished jazz pianist and builds
electroacoustic sound-​sculptures out of junk, hardware and found objects.
Max Baillie is a leading instrumentalist of his generation, equally at home on
both violin and viola. As a performer he has appeared on stages from Carnegie
Hall to Glastonbury and from Mali to Moscow in a diverse spectrum of styles
including classical, pop, folk and electronic music, alongside leading artists
from around the world. He plays principal viola in the Aurora Orchestra and is
part of a series of unique creative projects which go beyond the concert stage.
Philip Barnard worked for the Medical Research Council’s Cognition and
Brain Sciences Unit (CBSU) in Cambridge from 1972 to 2011, where he car-
ried out research on how memory, attention, language, body states and emo-
tion work together. He is now retired but remains a visiting researcher with the
CBSU. Since 2003, he has been collaborating with Wayne McGregor | Random
Dance to develop productive synergies between choreographic processes and
our knowledge of cognitive neuroscience.
George Benjamin was born in 1960 and began composing at the age of seven.
After studying with Messiaen he worked with Alexander Goehr at King’s
College, Cambridge. His Ringed by the Flat Horizon was performed at the BBC
Proms when he was just twenty. Written on Skin has been scheduled by numerous
international opera houses since its 2012 premiere in Aix. He regularly
conducts some of the world’s leading orchestras and since 2001 has been the
Henry Purcell Professor of Composition at King’s College London.
Malcolm Bilson has been a key contributor to the restoration of the fortepiano
to the concert stage and to fresh recordings of the ‘mainstream’ repertory. He
has recorded the Mozart piano concertos with John Eliot Gardiner and the
English Baroque Soloists, and the complete Mozart and Schubert solo sona-
tas. Bilson gives concerts, masterclasses and lectures around the world. He is a
member of the American Academy of Arts and Sciences and has an honorary
doctorate from Bard College.
Lucia D’Errico is an artist devoted to experimental music, performing on
plucked string instruments. As a performer and improviser, she collaborates
with contemporary music groups and with theatre, dance and visual art com-
panies. An artistic researcher at Orpheus Institute Ghent, she is part of the
ME21 research project. Her doctoral research (on the docARTES programme)
focuses on recomposing baroque music. She is also active as a freelance graphic
Scott deLahunta has worked as writer, researcher and organizer on a range of
international projects bringing performing arts with a focus on choreography
into conjunction with other disciplines and practices. He is currently Senior
Research Fellow at Deakin University (Australia) in partnership with Coventry
University (UK), R-​Research Director (on sabbatical) at Wayne McGregor |
Random Dance, and Director of Motion Bank/​The Forsythe Company.
Simon Desbruslais has an international reputation as a trumpet soloist, spe-
cializing in the performance of baroque and contemporary music. His solo
disc Contemporary British Trumpet Concertos on Signum Classics includes
new works written for him by John McCabe, Deborah Pritchard and Robert
Saxton. He is a lecturer in music at the University of Hull, and has taught at the
universities of Oxford, Bristol, Nottingham and Surrey. He is writing a mono-
graph on the music and music theory of Paul Hindemith, based on his doctoral
dissertation from Christ Church, Oxford.
Zohar Eitan is a professor of music theory and music cognition at the
Buchman-​Mehta School of Music, Tel Aviv University. His recent research
was published in Cognition, Journal of Experimental Psychology: Human
Perception and Performance, Experimental Psychology, Music Perception,
Psychology of Music, Musicae Scientiae, Empirical Musicology Review and
Alice Eldridge is a researcher, lecturer and cellist with interdisciplinary interests
in biological systems and sound. She leads the Music Informatics Degree
at the University of Sussex, where she works across the creative arts, technol-
ogy and science. As a cellist she embraces collaboration and has performed
with a diverse array of personalities, including Steve Beresford, Russell Brand,
Icarus, Shih-​Yang Lee, Vagina Dentata Organ and Evan Parker. She is a mem-
ber of the London Improvisers’ Orchestra and a regular at John Russell’s Fete
Qua Qua.
Eugene Feygelson took a performance-​based PhD at King’s College London,
focusing on modes of nonverbal communication used in improvisatory con-
texts. His other interests include classical and contemporary improvisation,
music cognition, and relationships between music and language, as well as
music’s role in human evolution. Feygelson’s postgraduate research, including
his Master’s from the University of Cambridge, was supported by the Jack
Kent Cooke Foundation.
Evelyn Glennie was the first musician to create and sustain a career as a solo
percussionist. She has played around two thousand five hundred concerts in
more than fifty countries, recorded thirty albums and won three GRAMMY
awards. In 1990 she released her autobiography, Good Vibrations, and since
then has written several essays, has contributed to a variety of publications and
often publishes printed music. She continues to invest in realizing her vision: to
Teach the World to Listen.
Rolf Inge Godøy is Professor of Music Theory at the Department of Musicology,
University of Oslo. His main interest is in phenomenological approaches to
music theory, taking our subjective experiences of music as the point of depar-
ture for music theory. This work has been expanded to include research on
music-​related body motion in performance and listening, using various con-
ceptual and technological tools to explore the relationships between sound and
body motion in the experience of music.
Alinka E. Greasley is Lecturer in Music Psychology at the University of Leeds,
where she teaches music psychology at all levels and leads the MA Applied
Psychology of Music programme. Her research lies mainly within the field of
social psychology of music, and her interests focus on people’s experiences with
and uses of music in everyday life, including musical preferences, categoriza-
tion of musical genres, functions of music, listening behaviour, electronic dance
music culture and DJ performance practice.
Julia Holter is a musician from Los Angeles interested in songwriting, perform-
ing and various methods of recording. Her most recent recording was the stu-
dio album Loud City Song (2013) on Domino Records. Since the release of her
previous two albums, Tragedy (2011) and Ekstasis (2012), she has performed
at venues and festivals throughout Europe, North America, Lebanon and
Australia. She has had pieces commissioned by and/or has performed with
ensembles such as the Los Angeles Philharmonic and Stargaze. She frequently
collaborates in group projects with artist and musician friends including Rick
Bahto, Ramona Gonzalez, Yelena Zhelezov, Laurel Halo, Mark So, Cat Lamb
and Laura Steenberge.
Stephen Hough is an English pianist with a catalogue of more than fifty CDs.
His iPad app ‘The Liszt Sonata’ was released by Touch Press in 2013. As a com-
poser, he has been commissioned by the Wigmore Hall, the Musée du Louvre,
the National Gallery (London), musicians of the Berliner Philharmoniker,
BBC Symphony Orchestra, Westminster Abbey and Westminster Cathedral.
He is on the faculty of the Juilliard School in New York and is a visiting profes-
sor at the Royal Academy of Music in London and the Royal Northern College
of Music in Manchester.
I-​Uen Wang Hwang moved from Tainan, Taiwan to the USA and earned her
PhD in music composition from the University of Pennsylvania (1998). Since
she is both a painter and a musician, a link between her music and art naturally
developed. She often paints to amplify her creativity while composing. The
Taiwan National Symphony Orchestra has commissioned three of her sympho-
nies, including Diptych of Taiwan (2010), which was included in the CD that
won Taiwan’s Golden Melody Award (2014) for best art music album.
Steven Isserlis is a British cellist who is acclaimed worldwide for his technique
and musicianship. He enjoys a distinguished career as a soloist, chamber musi-
cian, educator and author. While his extensive performing and recording career
takes up the majority of his time, he has also written two books for children
about the lives of the great composers, and he gives frequent masterclasses all
around the world. For the past seventeen years he has been Artistic Director of
the International Musicians’ Seminar at Prussia Cove in Cornwall.
Mats Küssner is Research Associate in the Department of Musicology and
Media Studies at Humboldt University, Berlin. In 2014–15, he was Peter
Sowerby Research Associate in Performance Science at the Royal College of
Music. Küssner completed his PhD within the AHRC Research Centre for
Musical Performance as Creative Practice at King’s College London, investi-
gating embodied cross-​modal mappings of sound and music. In 2013, he and
Daniel Leech-Wilkinson co-edited a special issue of Empirical Musicology
Review on ‘Music and Shape’.
Timothy B. Layden was born in the USA but is currently living and working in
the UK. He studied fine art at the University of the Americas (Mexico), before
receiving a doctorate in fine art from the University of Barcelona in 2005. He
is an interdisciplinary artist working primarily with sound, image and text. He
has been involved in diverse art and educational projects around the world.
Much of his work is inspired by his own experience of synaesthesia.
Daniel Leech-​Wilkinson studied at the Royal College of Music, King’s College
London and Clare College, Cambridge, becoming first a medievalist and then,
since c. 2000, specializing in the implications of early recordings, especially
in relation to music psychology and performance creativity. He led a project
on ‘Expressivity in Schubert song performance’ within the AHRC Research
Centre for the History and Analysis of Recorded Music (CHARM), followed
by ‘Shaping music in performance’ as part of the AHRC Research Centre for
Musical Performance as Creative Practice. Books include The Modern Invention
of Medieval Music (2002) and The Changing Sound of Music (2009).
Anna Meredith is a composer and performer of both acoustic and electronic
music. She has been Composer in Residence with the BBC Scottish Symphony
Orchestra, RPS/​PRS Composer in the House with Sinfonia ViVA, the classical
music representative for the 2009 South Bank Show Breakthrough Award and
winner of the 2010 Paul Hamlyn Award for Composers. HandsFree (2012), a
PRS/​RPS 20x12 Commission for the National Youth Orchestra, was performed
at the BBC Proms, Barbican Centre and Symphony Hall as well as numer-
ous flashmob performances around the UK. Her debut EP Black Prince Fury
was released on Moshi Moshi records to critical acclaim including Drowned in
Sound’s ‘Single of the Year’.
Milton Mermikides is Lecturer in Music and Head of Composition at the
University of Surrey, Professor of Jazz Guitar at the Royal College of Music,
and deputy director of the International Guitar Research Centre. He is a com-
poser, guitarist and sound artist with a keen interest in a range of disciplines
including jazz, popular, electronic and ‘world’ music, improvisation, digital
technologies in analysis and creative practice, music perception, art/​science col-
laboration, and data sonification.
Richard G. Mitchell is a film composer. He graduated from Central Saint
Martins in fine art and film, where he composed for students at Saint Martins,
Royal College of Art and National Film School. His best-​known works are
A Good Woman (Scarlett Johansson, Helen Hunt), To Kill a King (Tim Roth,
Rupert Everett), and Grand Theft Parsons (Johnny Knoxville), and he has an
Ivor Novello Award for Trial by Fire, a Royal Television Society Award for the
BBC The Tenant of Wildfell Hall, and a Polish Academy Award for Günter
Grass’s The Call of the Toad.
Adam Ockelford is Director of the Applied Music Research Centre at the
University of Roehampton, UK. His research interests are in music psychology,
education, theory and aesthetics (particularly special educational needs
and the development of exceptional abilities); learning, memory and creativity;
the cognition of musical structure; and the construction of musical meaning.
Recent books include Applied Musicology: Using Zygonic Theory to Inform
Music Education, Therapy and Psychology Research, and Music, Language and
Autism: Exceptional Strategies for Exceptional Minds.
Antony Pitts is a composer, conductor, producer, and winner of the Prix Italia,
Cannes Classical and Radio Academy BT Awards. From Hampton Court
Chapel Royal treble, New College Oxford Academic Scholar and Honorary
Senior Scholar, TONUS PEREGRINUS founder-director, Royal Academy of
Music Senior Lecturer and BBC Radio 3 Senior Producer to Artistic Director
of The Song Company, he has made music at London’s Wigmore Hall and
Westminster Cathedral, Amsterdam’s Concertgebouw, Berlin’s Philharmonie
Kammermusiksaal, and the Sydney Opera House.
Helen M. Prior is a lecturer at the University of Hull. Her work on music
and shape began when she was a postdoctoral researcher within the AHRC
Research Centre for Musical Performance as Creative Practice at King’s
College London. She has interests in musical performance, music and emotion,
and music perception and familiarity.
Alex Reuben makes movies characterized by dance, music and environment. His
films are exhibited by Picturehouse and Curzon Cinemas. Routes—​Dancing to
New Orleans was selected in the ‘Top 20 Movies of the Decade’ (Geoff Andrew,
BFI/​Time Out). Reuben is a lecturer at Camberwell and Central Saint Martins.
He has been commissioned by Sadler’s Wells, Channel 4 TV, DanceDigital
and the BBC, with awards from The British Council and Jerwood Charitable
Foundation, and he is director of Cinderella (RockaFela), a movie about cog-
nition and movement, for the Wellcome Trust and Arts Council England.
Steve Savage is an active producer and recording engineer. He has been the
primary engineer on seven records that received GRAMMY nominations,
including CDs for Robert Cray, John Hammond, The Gospel Hummingbirds
and Elvin Bishop. Savage holds a PhD in music and teaches musicology at
San Francisco State University. He has several books that frame his work as
a researcher and as a practitioner, including his most recent work Mixing and
Mastering in the Box (2014).
Michael Spitzer is Professor of Music at the University of Liverpool, having
previously taught for many years at Durham University. Author of Metaphor
and Musical Thought (2004) and Music and Philosophy: Adorno and Beethoven’s
Late Style (2006), his research explores the interfaces between music theory,
aesthetics and psychology. He inaugurated the series of international confer-
ences on music and emotion at Durham in 2009, and is presently writing a
history of music and emotion.
Renee Timmers is Reader in Psychology of Music at the University of Sheffield,
where she directs the research centre Music, Mind, Machine in Sheffield. She
was trained in the Netherlands in musicology and psychology. She carried out
postdoctoral research at King’s College London, Northwestern University and
Radboud University Nijmegen, among others. Her main areas of research
include expressive timing in music performance, perception and expression of
emotion in music, and multimodal experiences of music.
Jamie Ward is Professor of Cognitive Neuroscience at the University of Sussex.
He has an MA in Natural Sciences from the University of Cambridge and a
PhD in Psychology from the University of Birmingham. He previously held a
faculty position at University College London. He is Co-Director of Sussex
Neuroscience and was Founding Editor of the journal Cognitive Neuroscience.
He has a particular research interest in synaesthesia and, more generally, in the
question of how information is integrated between the senses.


List of illustrations

1.1 A pianola representation of the first eight bars of J. S. Bach’s Fugue in C major, Well-Tempered Clavier Book I
1.2 The spectrogram of a sustained deep C double bass tone (top) and the spectrogram of the same tone passed through a time-varying wahwah filter (bottom)
1.3 The spectrogram of a distortion guitar sound with a downward glissando followed by a slow upward expansion (top), and so-called sound-tracings of this sound by nine listeners (bottom)
1.4 The score of the first two bars of the last movement of Beethoven’s Piano Concerto No. 1 (top), and graphs showing the position, velocity and acceleration of the vertical motion of the right-hand knuckles, wrist (RWRA) and elbow (RELB) in the performance of these two bars
1.5 The top part shows motiongrams (i.e. video-based summary images of motion trajectories; see Jensenius 2013 for details) of three different successive dance performances by the same dancer to a twenty-second excerpt from Lento from György Ligeti’s Ten Pieces for Wind Quintet (Ligeti 1998), and the bottom part shows for the purpose of reference three repetitions of the spectrograms of this excerpt.
R.1 Schematization of bodily music-shape forces
3.1 Mean weighted pitch (black line) and mean absolute pitch interval (grey line) per two-bar phrase
3.2 Mean intensity (left) and maximum intensity (right) per two-bar phrase for three performers. Intensity was measured from commercial recordings combining the piano and the vocal line.
3.3 Average rhythmic durations (black line) of the vocal line and standard deviation of rhythmic durations (grey line) within successive two-bar phrases
3.4 Median spectral centroid (Hz) per stanza for three performances of Schubert’s ‘Die Stadt’
3.5 Normalized phrase duration of successive two-bar phrases in the performance by DFD, IB and TQ
R.2 Berg, Wozzeck, Act 3, bars 3–7
R.3 Berg, Wozzeck, Act 3, bars 69–71
R.4 Berg, Wozzeck, Act 3, bar 114
R.5 Berg, Wozzeck, Act 3, bar 220
R.6 Berg, Wozzeck, Act 3, bars 318–21
R.7 Berg, Wozzeck, Act 3, bars 370–71
R.8 Berg, Wozzeck, Act 3, bar 392
R.9 Berg, Wozzeck, Act 1, bar 717
R.10 Berg, Wozzeck, Act 2, bars 810–12
R.11 Berg, Wozzeck, Act 3: harmonic connections
4.1 Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Adagio, bars 1–13
4.2 Vivaldi, Violin Concerto Op. 3 No. 6, Largo, bars 1–6
4.3 Inflections of the fifth cycle
4.4 Tempo and dynamic map of Luca, bars 1–13
4.5 Tempo and dynamic map of Perlman, bars 1–13
4.6 Tempo and dynamic map of Kremer, bars 1–13
4.7 Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Fuga, bars 1–4
4.8 Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Siciliana, bars 1–2
4.9 Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Presto, bars 1–11
4.10 (a) Hypermetrical reduction of Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Presto, bars 1–6; (b) metrical reduction of bars 6–8, revealing syncopation
5.1 Oboe and cor anglais duet from the third movement of Vaughan Williams’ Fifth Symphony
5.2 Representation of primary interperspective relationships
5.3 Primary and secondary zygonic relationships
5.4 The image of a small black dot
5.5 Two small black dots
5.6 A primary interperspective relationship of location, whose value is shown using Cartesian coordinates
5.7 A secondary zygonic relationship of location reflects the fact that the difference in location between dots B and C is deemed to exist in imitation of the difference between A and B.
5.8 Imitation of location at the tertiary zygonic level
5.9 The perceived orderliness inherent in a straight line modelled in zygonic terms
5.10 One shape deemed to exist in imitation of another
5.11 Single interperspective values of difference cannot be imitated between domains; therefore, systematic mapping and iconic representation in Peircean terms are not possible.
5.12 Domains whose perspective values are capable of conveying a sense of size can bear cross-modal imitation of ratios at the secondary level and therefore have the capacity for iconic representation.
5.13 Iconic representation of pitch in terms of location through tertiary-level imitation
5.14 Example of the derivation of pitch through imitation of a ratio between differences in location from a constellation in Stockhausen’s Sternklang (1971)
5.15 Indirect connection between graphic and sound
5.16 An arbitrary shape is given meaning by convention.
5.17 The meaning of an arbitrary shape learned through imitation
5.18 Cross-modal relationship engendered by pitch-colour synaesthesia
5.19 Taxonomy of the possible types of relationship between musical sounds and visual images
5.20 A child’s transcription and performance of a rhythm
5.21 Regular cross-modal mapping between sound and score, and score and sound
5.22 A congenitally blind child’s representation of pitch glides on German film
5.23 Cross-modal imitation at the tertiary level assumed to underlie the representation of a pitch glide as a straight diagonal line
5.24 Western staff notation embeds arbitrary symbols within a semi-regular framework of pitch and time.
5.25 Music Time in braille music notation (represented in print form), with explanations of the signs
5.26 The fingering for the opening four chords of Music Time presented using guitar chord symbols
5.27 The three semiotic processes at work as a guitarist performs from a chord symbol
5.28 Fragment of Jamie Roberts’ synaesthetically derived score of Jean-Michel Jarre’s Oxygène, track 4
5.29 Types of semiosis functioning in a fragment of Jamie Roberts’ synaesthetic score of Oxygène
R.12 Opening of the Prelude of Bach’s Cello Suite No. 5 in scordatura notation
6.1 An illustration of a complex chains-of-thought improvisation methodology
6.2 An illustration of musical refractions. In the course of an

improvisation, a phrase is manipulated by the selection of one of
many transformational processes (1–​8 present a few of countless
possibilities). 176
6.3 Coexisting interpretations of Phrase α 178
6.4 Improvised continuations of Phrase α 180
6.5 An illustration of how the fixing and variation of musical topics may
forge improvisational continuations from Phrase α 181
6.6 Coltrane’s cube: some possible phrases of Coltrane’s
Acknowledgement plotted in the three-​dimensional musical space
of metric placement, rhythmic separation and chromatic
transposition, with a few coordinates illustrated with standard
notation 183
6.7 Phrase α existing at the centre of a three-​dimensional musical space
with variously proximate neighbouring phrases 185
6.8 An impression of M-​Space: phrase α sits at the centre of many
simultaneous dimensions of musical transformation.  186
6.9 A multi-​level depiction of Smith’s solo on The Sermon 188
6.10 Five improvisational structures: 1) ‘Nuclear’. 2) ‘Field Series’.
3) ‘Pivot’. 4) ‘Merged’. 5) ‘Unbounded’.  190
6.11 Opening section of Léonard’s cadenza (L1/​L2) and corresponding
sections from Beethoven’s Violin Concerto Op. 61 (B1/​B2)  196
6.12 Second section of Léonard’s cadenza (L3/​L4A/​L4B) and
corresponding sections from Beethoven’s Violin Concerto Op. 61
(B3/​B4)  197
6.13 Final section of Léonard’s cadenza (L5/​L6A/​L6B) and
corresponding sections from Beethoven’s Violin Concerto Op. 61
(B5/​B6)  199
6.14 Graphic representation of Léonard’s cadenza illustrating the
relationship of musical proximity to Beethoven’s original score  200
R.13 The opening of the Allemande from J. S. Bach’s Partita in D minor
for solo violin: (R.13a) as usual and (R.13b) upside down  209
R.14 Allemande, bars 1–​8, with a harmonic analysis of tonal centres and
harmonic rhythm  210
R.15 The passage in R.14 represented as a physical journey through space
between related tonal orbits  211
R.16 Allegro assai, bars 1–​8, from J. S. Bach’s Sonata in C major for
unaccompanied violin  212
R.17 The passage in R.16 showing the harmonic rhythm  213
R.18 The Allegro assai, bars 13–​16, showing melodic rhythm  214
7.1 Model of musical shaping. In the online version, each component is
numbered, and numbered examples of each component are presented
in linked tables.  222
R.19 Bach, B minor Mass, Gloria II, bars 57–​61: a) Gesellschaft edition,
followed by b) a notated interpretation  244
R.20 Bach, B minor Mass, Cum sancto spiritu, bars 111–​17  245
R.21 J. S. Bach, Complete Trumpet Repertoire, Vol. III with my
annotations 245
R.22 Tchaikovsky, Swan Lake Suite Op. 20a, ‘Intrada’, rehearsal
mark 13 246
R.23 Pritchard, Skyspace (2012), third movement, notated for piccolo
trumpet in A, bars 1–​8  246
R.24 Beethoven, Piano Sonata in F minor, Op. 2 No. 1, first movement,
bars 1–9  249
R.25 Three-​dimensional mixing metaphor  279
R.26 The Metaphysics of Notation, panel 4  285
R.27 The Metaphysics of Notation, panel 4 close-​up: descending
‘shields’ 286
R.28 The Metaphysics of Notation, panel 4 close-​up: sinusoidal curve  287
R.29 The Metaphysics of Notation, panel 5  288
R.30 The Metaphysics of Notation, panel 5 close-​up: materialization of
rectilinear forms  289
R.31 The Metaphysics of Notation, panel 5 close-​up: contrasting materials,
‘heart guitar’ and canonic dots  291
R.32 The Metaphysics of Notation, panels 3, 4, 5, 6 and 7 in stacked
arrangement 293
R.33 The Metaphysics of Notation, close-​up: circle and oval pair inverted
across panels 3 and 4  294
R.34 The Metaphysics of Notation, close-​up: ‘scroll’ with number five
inverted across panels 3 and 4  295
R.35 The Metaphysics of Notation, close-​up: panels 4 and 5 inverted
shields, connection to the ‘heart guitar’  296
R.36 The Metaphysics of Notation, close-​up: panels 5, 6 and 7 dangling
angles, chain of circles, dot clock  297
R.37 The Metaphysics of Notation, panels 9 & 10 in stacked
arrangement 299
R.38 Watercolour paintings: (a) Red and White and (b) Fireworks   304
9.1 RP’s synaesthetic experience to overtone singing by Wolfgang
Saus 315
R.39 Timothy B. Layden, Dark Glistening. 322
10.1 Selected illustrations for the productions of (a) ATOMOS,
(b) ENTITY and (c) UNDANCE for Wayne McGregor | Random
Dance 330
10.2 (a) Still from video annotating form and flow in Forsythe’s
One Flat Thing; (b) Difference forms in movement viewed from
above 331
10.3 Relationships and representations that bridge sources of inspiration
and a finished work in contemporary dance  334
10.4 Examples of deep patterning in multimodal fusion  340
10.5 A core mammalian mental architecture with four subsystems, each
with three components (image, memory and processes)  341
10.6 Interacting cognitive subsystems: a nine-​subsystem architecture for
the human mind  343
10.7 Extracts from the Mind and Movement educational resource that
illustrate (a) the development of imagery based upon musical stimuli
and (b) the translation of that imagery into innovative movement
material 346


0.1 Historical examples of the use of shape  xxvi

3.1 Original text and English translation of ‘Am fernen Horizonte’  62
3.2 ‘Die Stadt’, stanza 1 versus 2: contrasting and parallel
dimensions  63
3.3 ‘Die Stadt’, stanza 1 versus 3: contrasting and parallel
dimensions  63
3.4 Recorded performances of ‘Die Stadt’ by Fischer-Dieskau, Bostridge
and Quasthoff  75
3.5 Partial correlations between pitch and intensity after correction for
correlations with dynamic indications in the score (N = 12)  76
3.6 Correlations of duration with phrase intensity and forte indication
(N = 12)  78
7.1 Participants in the interview study  218
7.2 Participants who discussed each musical level  223
7.3 Participants who discussed each trigger  225
7.4 Participants who discussed each technical modification  227
7.5 Participants who discussed each heuristic  229
7.6 Differences between Elsie’s two versions of the extract  232
8.1 Number of popular musicians in Prior (2012b) who played each
genre of music   255
8.2 Layers of shaping in popular music performances  271
9.1 The ‘tone shapes’ reported by Zigler (1930)  309
11.1 Some of the synonyms for ‘shape’ collected for Prior (2010)  361
11.2 Associations reported in Eitan and Granot (2006)  362
11.3 Highest-scoring results from Eitan and Timmers (2010; Table 1), with
a proposed environmental cause for the participants’ preference  374

Oxford University Press has created a password-protected website to accompany
Music and Shape, which contains additional illustrations, including all
the book’s colour illustrations, sound files, videos and excerpts from interviews.
Examples available online are indicated in the text with Oxford’s symbol .
Anna Meredith, I-​Uen Wang Hwang and Timothy B. Layden have all pro-
vided artwork and corresponding sound files for the compositions they dis-
cuss in their ‘Reflections’ on shape. Adam Ockelford, Lucia D’Errico and Max
Baillie have used colour to clarify some of their illustrations. Zohar Eitan,
Renee Timmers and Mordechai Adler provide a score of Schubert’s ‘Die
Stadt’, discussed in their chapter. Helen M. Prior provides numerous additional
figures, together with extracts (in the Tables) from her interviews with musi-
cians that illustrate each component of the model of musical shaping she finds
that they use. Malcolm Bilson provides, over and above his Reflection, a video
of a full-​length lecture he gave at the Liszt Academy, Budapest, on the topics
discussed here.
All these items enrich a reading of Music and Shape.​us/​musicandshape

Daniel Leech-​Wilkinson and Helen M. Prior

There can be no doubt that concepts of shape are ubiquitous in musical
discourse and music cognition: we use innumerable shape-related
metaphors for most (if not all) features of music such as dynamics,
timbre, harmony, pitch, contour, rhythm, texture, tempo, timing,
expressivity and affective qualities. Also, we encounter shapes in
various music-​related images such as in graphical scores, composers’
sketches, music analysis illustrations, as well as in more directly signal-​
based shape images as waveforms and spectrograms, and last but not
least, as shape images of music-​related body motion. We could thus
speak of widespread and deep-​rooted shape cognition in music.
—Godøy (2013: 223)

‘Music and what?!’ people have tended to ask us. But, as Godøy’s remarks sug-
gest, that puzzlement is not shared by musicians: they always seem to know
what we’re referring to.1 In a sense it’s that discrepancy that inspired this book
and the research project from which it has emerged. For although the connec-
tion between shape and sound may seem mystifying to others, Prior (2012)
finds that professional musicians use ‘shape’ to talk and think about how to
perform notes, phrases, melodic lines, melodic patterns, harmonic features,
harmonic patterns, rhythms, movements, compositions, changes in loudness,
tempo and expression; and this applies in classical music, jazz, folk, pop, rock,
urban, world musics and crossover, and for people who originated from thirty-​
one countries, 43 per cent of them fluent in a language other than English.
Moreover, for speakers of languages which do not use a simple equivalent to
‘shape’ in discussion of music, the concept was nevertheless immediately recog-
nized from their own musical discourse. The use of the term is also not merely
a current ‘fashion’: there is evidence to show its use by composers, performers
and critics throughout the twentieth century and to some degree earlier (see
Table 0.1).2 Evidently, shape is a concept that is flexible, ubiquitous and very
useful when thinking and speaking about performance and composition.
With so many and such varied uses, it cannot just be visual or tactile shape
that we are dealing with. Shape must be doing much broader metaphorical
work, transferring into different, less tangible domains including time, quantity,
intensity, complexity, speed and emotional response, at least.3 One way of
looking at this is to say that shape means so many things in relation to music
xxvi Preface

TABLE 0.1   Historical examples of the use of shape

Shape as form or structure

W. R. Anderson (critic): ‘Caractacus, the composer’s Op. 35 (Leeds, 1898:
dedicated to Queen Victoria), immediately precedes the Variations, the Sea
Pictures, and Gerontius, and looks strongly onward from the earlier cantatas,
both in shape and idiom.’ (Anderson 1934: 396)

Benjamin Britten (composer/performer): ‘I never, never, start a work without
having a very, very, clear conception of what that work is going to be. Err…
When I say conception, I don’t mean, necessarily tunes, or specific rhythms, or
harmonies, or … old fashioned things like that, but I mean the actual … shape
of the music, the kind of music it’s going to be, rather than the actual notes.’
‘I know that the first drafts for The Turn of the Screw were in what one would
call then the normal three-act form … and … even I think, the libretto was
written in that shape.’ (Britten and Mitchell 1969)

Fryderyk Chopin (composer/performer): ‘Chopin … is at the piano and does not
observe that we are listening to him. He improvises as if haphazardly. He stops.
“What’s this, what’s this?” exclaims Delacroix, “you haven’t finished it!”
“It hasn’t begun. Nothing’s coming to me. … Nothing but reflections, shadows,
shapes that won’t settle. I’m looking for the colour, but I can’t even find the
outline.” ’ (George Sand, in Eigeldinger 1997: 240)

Lang Lang (performer): ‘When you’re talking about Mr Barenboim, he can really
bring the knowledge, the structure, how to put every element into one big
shape.’ (Barenboim

Claus-Steffen Mahnkopf, Frank Cox and Wolfram Schurig (composers/musicologists):
‘While in common-practice music concepts such as theme or motive, phenomena
such as line or melody and the systems of syntax and rhythm are generally taken
to be self-explanatory, the question concerning corresponding means in
post-traditional music (i.e. new music since 1945) – that is, sufficient to
shaping the musical surface in a potentially meaningful manner – is rarely
reflected upon and more commonly suppressed.’ (Mahnkopf, Cox and Schurig 2004: 7)

Anthony Marwood (performer): ‘I didn’t have any influence over the structure or
shape of the piece, but I know that he [Thomas Adès] had my playing in mind
when he wrote it.’ (Anonymous 2008: 15)

Michael Quinn (critic): ‘The Delmé find the shape, structure and even the
nobility of Haydn’s Emperor but the detail seems sadly lacking.’ (Quinn 1999: 72)

Stephen Plaistow (critic): ‘… his feeling for the shape of a Bach fugue, and
for part playing and the character and brilliance of Bachian figuration, is full
of finesse, quite the equal of any Bach specialist’s.’ (Plaistow 1965: 114)

Shape in relation to musical expression

Nalen Anthoni (critic): ‘The music breathes a life of its own as he ardently
inflects its phrases to shape the tension of his line.’ (Anthoni 2008: 65)

Dietrich Fischer-Dieskau (performer): ‘Shape the endings of the long phrases in
the recitative in a way that the conductor can easily follow you.’ (Dietrich
Fischer-Dieskau in Monsaingeon 1992)

Trevor Harvey (critic): ‘The orchestral playing is not just good, it is really
outstanding: the conductor knows how to give us flexible and shapely phrases as
well as tightly rhythmic music.’ (Harvey 1954: 59)

Rachel Podger (performer): ‘With Vivaldi there are so many possibilities to
shape the music.’ (Podger 2003: 15)

Stephen Plaistow (critic): ‘Richter doesn’t shape the actual subjects in the
fugues very much, preferring to state them flatly and to let the counterpoint
achieve its own expressiveness.’ (Plaistow 1965: 114)

Alec Robertson (critic): ‘It is a pity this artist has so little feeling for the
shape of a phrase.’ (Robertson 1947: 165)

Stanley Sadie (critic/musicologist): ‘Another thing Podger is specially good at
is the shaping of those numerous passages of Vivaldian sequences, which can be
drearily predictable, but aren’t so here because she knows just how to control
the rhythmic tension and time the climax and resolution with logic and force.’
(Sadie 2003: 51)

Shape in relation to movement or gesture

Aaron Cassidy (composer/musicologist): ‘… the notion that the primary
morphological unit – not only in my music but also in music in general – is not
merely the aural gesture, but far more importantly, the physical gesture. I
would assert that the shapes and local forms that we hear and process as
listeners are at their core the byproducts of physical, visceral activities and
energies, and, further, that the physical motion required to create a particular
sound or set of sounds is the most important component of a gesture’s
morphological identity.’ (Cassidy 2004: 34)

that it in effect means nothing at all. But that kind of throwing up of hands in
despair doesn’t lead to very penetrating scholarship; and in any case, its very
imprecision may prove to be its raison d’être. Better, then, to approach shape as
a concept with some unusual and intriguing properties, and to try to find out
what those might be and what they might suggest about its place in the brain’s
responses to music.
It was with this aim as an ideal, albeit one we could not hope to realize,
that we planned and carried through a three-​year research project (2009–​12) on
music and shape, funded by the UK’s Arts and Humanities Research Council
within its Research Centre for Musical Performance as Creative Practice.4 In the
event we managed to continue for a further two years, since there was so much
to do and King’s College London continued to provide support. This book
contains some of the results of that project work (the chapters by Küssner,
by Leech-Wilkinson and by Prior). But mainly it consists of contributions by
scholars not involved with the project whose work seemed to us to be dealing
with topics in which, our research suggested, shape might be implicated. This
then is not a conference proceedings. We did hold a highly interesting and fruit-
ful conference on ‘Music & Shape’ in London in July 2013, the result of an
open call for papers, and the studies we attracted are published in three special
issues (forming volume 8)  of the journal Empirical Musicology Review. The
chapters in this book, however, were commissioned separately, the choice of
authors reflecting the research areas that seemed most crucial to those engaged
in the music-​and-​shape project. Authors’ home disciplines include music psy-
chology, music analysis, music therapy, musicology, performance (jazz, clas-
sical and DJ), synaesthesia, and dance (scholarship and performance). That
every author, although none had discussed shape before, found it quite easy to
see how their work might contribute, only confirms the flexibility and ubiquity
of shape as a concept that does some useful work for those who try to under-
stand music and musical practice.
As well as commissioning the eleven chapters, we followed the example set
by the Cambridge Companion to Recorded Music (one of the publications from
our predecessor project, the Centre for the History and Analysis of Recorded
Music).5 There we had included ‘personal takes’ by a wide variety of artists in
different areas linked to recordings. For this volume, we commissioned ‘shape
reflections’ from a similarly wide spread of music practitioners. We approached
a range of performers (wind, strings, keyboards, percussion, guitar) and com-
posers (classical, film, graphic, jazz, popular)—​two of whom were also notable
conductors—​as well as a record producer, a music painter and a synaesthete.
Their instruction was simply to tell us how they used ‘shape’ in their own musi-
cal work and thinking, or to reject it as a useful concept if in fact they didn’t use
it (none took up that last option).
Performers’ thought about music-​making has not always been well under-
stood by musicology, despite being at the heart of much ethnomusicology,
and more recently music sociology and psychology. At their best (for example,
Berliner’s 1994 ethnography of jazz practice), studies of performers’ experience
of what they do can illuminate a whole world of music-making. In
a previous study of violinists and harpsichordists (Leech-​Wilkinson and
Prior 2014)—​developed here in Chapter 8 on DJs’ practices by Greasley and
Prior—​we argued that the way musicians talk about performing details in
scores, however approximate it may look to music theory, reveals a highly
efficient means of enabling the body to generate expressive performances in
real time. The Reflections here offer similar evidence over a wider field. Of
the heuristics we identified, shape proves to be one of the most powerful, for
it summarizes, with a generality that allows it to be implemented and enacted
in a great many ways, the essential characteristics of a ‘musical’ performance.
As Eitan, Timmers and Adler conclude in their Chapter 3, shape is a concept
that is sufficiently flexible to map between domains on any hierarchical level
from a single note to a whole piece of music; it can apply to scores, perfor-
mances and listening experiences, and within those to such varied features as
narrative structure, form, loudness, brightness, tempo, speed, density, register,
intensity, harmonic or interval patterning, pitch direction, sound spectrum,
distance and timbre. As such, it acts as a highly efficient synthesizing tool for
musicians to use in order to negotiate the vast array of musical choices avail-
able to them in performance.
Shape’s flexibility and usefulness are just as clear from the range of other views
that this book offers. In Adam Ockelford’s Chapter 5, shape is seen as a core
property of music that links together its notation, its audible features and our
cognition of musical structure. In Michael Spitzer’s Chapter 4, a sonata by Bach
is compared ‘both to the shape of particular emotional behaviours and to the
expressive shapings of a formal model’ as well as ‘performance styles of “expres-
siveness” ’. For Milton Mermikides and Eugene Feygelson, writing about impro-
visation in Chapter 6, shaping processes are conceived of as strategies through
which material is selected and transformed within musical space. For Philip
Barnard and Scott deLahunta, in Chapter 10, ‘shape’ is a useful concept for
dancers and choreographers not just to describe bodily configuration and move-
ment but also ‘to index the more ineffable meanings and relationships that are
intuited to “make sense” in an artistic context’. For Rolf Inge Godøy (Chapter 1),
‘shape-​cognition in music is opening up new areas of musicological, aesthetic
and affective psychological research, as well as providing practical tools in artis-
tic creation, for example in the domains of sonic design and various kinds of
multimedia art’. For the synaesthetes with whom Jamie Ward works, shapes not
only are a means of conceptualizing complex interactions of musical features and
the feelings they seem to trigger, but are experienced ‘at multiple levels in music:
from single notes through to whole compositions and performances’ (Chapter 9)
as sensations automatically generated by hearing music. Among the various
aspects of shape our practitioners discuss in their Reflections, George Benjamin
mentions shape especially in relation to form, Malcolm Bilson to performance
style, Stephen Hough to composition style, Timothy B. Layden to visual impres-
sions, Lucia D’Errico to bodily sensation, Alex Reuben to body movement,
Alice Eldridge to both gesture and visual representations, Richard G. Mitchell
to emotional change, Evelyn Glennie to dynamics (in the fluid sense), David
Amram to musical character, I-​Uen Wang Hwang to rhythm and metre, Max
Baillie to harmony, Simon Desbruslais to timbre, Steven Isserlis to narrative,
Steve Savage to sonic landscape, Antony Pitts to initial inspiration, and Julia
Holter to closure. It goes without saying that all of these factors could be written
about separately and in much greater depth, and indeed they have been. But the
point is not that ‘shape’ could always be replaced by a more precise term—​one
which varies according to the context in which shape is being used. Rather, what
we need to ask is why shape is so useful in the sample of contexts discussed here,
and by implication in so many others; and why it is so much more useful than
the more precise term that might pin it down in each case.
Concepts very like the ‘synthesizing’ notion we discuss here have been
invoked in the past. Mine Doğantan-​Dack has summarized this interestingly in
her essay in the Empirical Musicology Review volume mentioned above:
Christian von Ehrenfels, who is best-​known today for his article titled ‘Über
Gestaltqualitäten’, i.e. ‘On Gestalt Qualities’, … published in 1890 …
argued that each experience we have of a Gestalt or form in any sensory
modality is cognized as structurally analogous to the experience of a spa-
tial shape. In other words, spatial Gestalten serve in his view as references
for our comprehension of forms in other modalities. An immediate impli-
cation of this idea is that concepts related to the perception of spatial
shapes can be applied to shapes extended in time—​for instance, tonal
patterns. Indeed, the idea that there are similarities of form between dif-
ferent fields of experience is one of the most important conclusions of
Ehrenfels’ article. (Doğantan-​Dack 2013: 213–​14)

Jin Hyun Kim, in her article in the same collection, notes that:

Delineating the causal relation between bodily aroused states and vocal-
izations, [Friedrich von] Hausegger discusses dynamic forms of sound,
which are experienced as an expression of mental states, in his seminal
monograph Music as Expression (Die Musik als Ausdruck) (1887). …
Hausegger contends that shaped vocal sounds are not only experienced
as expressions of others’ aroused states, but also give rise to the ‘co-​
sense (Mitempfindung)’ of arousal (p. 42). He also considers this kind of
phenomenon in the context of non-​sentient phenomena such as music
and dance.
In the monograph Shaping and Movement in Music (Gestaltung und
Bewegung in der Musik), [Alexander] Truslit (1938) tackles the coupled
relationship between the shaping of music and musical experience. The
shaping of music is regarded as fundamental to the musical experience,
which takes place during both music-​making and music perception; the
latter is characterized by the listeners’ ‘co-​shaping (mitgestalten)’ of music
(Truslit 1938, p. 20) through their inward experience of movement (p. 27).
Basing the shaping of a sound on its duration and intensity, Truslit con-
ceives of movement as the primordial element being shaped. Movement in
music is shaped by dynamics—​gradations of sound intensity changing the
volume of sound as perceived—​in conjunction with agogics—​temporal
changes of sound causing its deceleration or acceleration within the given
overall temporal structure—​resulting together in spatio-​temporal con-
tours of music. According to Truslit, dynamics and agogics act as funda-
mentals of the process of musical shaping. (Kim 2013: 164–​5)

Yet neither of these studies was followed up at the time, probably because they
had no points of contact with contemporary musicology, whose concern above
all was to present music as a subject for historical and textual study. Closest
in the intervening years, as Doğantan-​Dack points out, was Susanne Langer
(1942), whose interest in how music feels brings some of her work into the same
orbit. And indeed, shape’s re-​emergence recently can be understood as part of
a growing interest in those musical practices and responses that draw on feeling
more than on thinking; this is a result of the increasing focus of music stud-
ies on emotion, enabled by the development of music psychology and neuro-
science. In this context, Kim, Doğantan-​Dack and Leech-​Wilkinson (this last
in our volume) have all (independently) pointed to child-​psychiatrist Daniel
Stern’s work on vitality affects (2010), which in a sense (though unknown to
Stern) extends Truslit’s work, as a valuable theoretical base for understanding
musical shape. Interrelations with other work are suggested, too, by Godøy in
the continuation of the quotation that begins this Introduction:
We could thus speak of widespread and deep-​rooted shape cognition in
music, as well as in human reasoning in general, as suggested by some
directions in the cognitive sciences, foremost by so-​called morphodynami-
cal theory and so-​called cognitive linguistics. (Godøy 2013: 223)

Much relevant work has been done by researchers studying music and gesture,
outstandingly Godøy himself and Marc Leman (Leman 2007; Godøy and
Leman 2010). Gesture clearly implies shape: it is often considered as includ-
ing performers’ executive and expressive movements—that is, how they move
while they play—but it has also been used extensively to talk about habits in
the forming or performing of short sequences of notes (Gritten and King 2006,
2011). Yet while gesture is closely tied to indicative human movement, shape
seems more abstract and thus more flexible in its application to musical and
other kinds of action.
Another difficulty is hinted at in Leech-​Wilkinson’s chapter, where the pos-
sibility is raised that a sense of ‘shape’ arises from a submodal feature common
to all the sense modalities. This extends beyond the cross-​domain mapping
that several chapters (Ockelford’s; Eitan, Timmers and Adler’s; and Spitzer’s
in particular) see as crucial to shape’s multiple applications. A submodal role
for shape might explain why our understanding of what shape refers to in
music is at once so multifaceted and so hazy, and why it may always remain
so. For submodal features, as pointed out by Marks (1978), are necessarily
beyond conscious perception: they are components in our sensory experience
but not accessible to consciousness directly through the senses. Alex Reuben’s
impressionistic Reflection on his work as a filmmaker may well be pointing
towards this aspect of shape: in using shape to link feelings in different senses
and art forms, he is not being merely touchy-​feely but may be drawing on the
submodal qualities of shape (operating in the recently discovered domain of
multisensory perception) as an aspect of the dynamics of all sensory experience.
What previously seemed fanciful now is beginning to seem simply
correct: feelings aroused by one sense can be linked by the brain to feelings
aroused by others, so that input in one mode can be read in terms of the
impressions arising from others—and not just for synaesthetes, in fact partic-
ularly not for synaesthetes since for them the effect is fixed whereas for others
it varies with context. Synaesthetes, nevertheless, offer particularly interesting
insights into musical shape. For, as Jamie Ward has shown, their experiences,
though remarkably varied, still make better sense to nonsynaesthetes than
artificial alternatives.
It looks, then, as if the kind of work that ‘shape’ does for musicians draws
on some quite fundamental aspects of perception, while at the same time offer-
ing us a host of ways of thinking about the experience and practice of music
on many other levels. The chapters and Reflections are interleaved and ordered
so as to emphasize interconnections. While they are grouped thematically into
five sections—​shapes mapped, composed, performed, seen and felt—​there is
also a gradual shift of theme so that the borders between sections are fuzzy. To
read from cover to cover, then, should be to take a journey through views of
music and shape. Most contributions speak of multiple facets of this complex
relationship, however: other orderings are possible, and dipping in and out will
often make further connections apparent.

Anderson, W. R., 1934: record review of HMV DB 2142, The Gramophone 11 (no. 130): 396.
Anonymous, 2008: interview with Anthony Marwood, Gramophone 85 (no. 1029): 15.
Anthoni, N., 2008: review of Arkiv Production 477 7371–​2, Gramophone 86 (no. 1035): 65.
Barenboim, D., 2005: ‘Barenboim on Beethoven: masterclasses’ (EMI DVD 68993).
Berliner, P. F., 1994: Thinking in Jazz: The Infinite Art of Improvisation (Chicago and
London: University of Chicago Press).
Britten, B. and D. Mitchell, 1969: ‘Benjamin Britten in conversation with Donald Mitchell’,
CD booklet accompanying BBC Legends: Britten Mozart Requiem (BBCL 4119–​2).
Cameron, L., 2010:  ‘What is metaphor and why does it matter?’, in L. Cameron and
R. Maslen, eds., Metaphor Analysis:  Research Practice in Applied Linguistics, Social
Sciences and the Humanities (London: Equinox), pp. 3–​25.
Cassidy, A., 2004:  ‘Performative physicality and choreography as morphological deter-
minants’, in C.-​ S. Mahnkopf, F. Cox and W. Schurig, eds., Musical Morphology
(Hofheim: Wolke Verlag), pp. 34–​51.
Doğantan-​Dack, M., 2013: ‘Tonality: the shape of affect’, Empirical Musicology Review 8/​
3–​4: 208–​18.
Ehrenfels, C. von, 1890: ‘Über “Gestaltqualitäten” ’, Vierteljahrsschrift für wissenschaftliche
Philosophie 14: 242–​92.
Eigeldinger, J.-​J., 1997: ‘Chopin and “La note bleue”: an interpretation of the Prelude Op.
45’, Music & Letters 78/​2: 233–​53.
Gibbs, R. W., 2008: ‘Metaphor and thought: the state of the art’, in R. W. Gibbs, ed., The
Cambridge Handbook of Metaphor and Thought (Cambridge:  Cambridge University
Press), pp. 3–​13.
Godøy, R. I., 2013: ‘Shape cognition and temporal, instrumental and cognitive constraints
on tonality. Public peer review of “Tonality: the shape of affect” by Mine Doğantan-​
Dack’, Empirical Musicology Review 8/​3–​4: 223–​6.
Godøy, R. I. and M. Leman, 2010:  Musical Gestures:  Sound, Movement, and Meaning
(New York: Routledge).
Gritten, A. and E. King, eds., 2006: Music and Gesture (Aldershot: Ashgate).
Gritten, A. and E. King, eds., 2011:  New Perspectives on Music and Gesture (Aldershot:
Harvey, T., 1954: record review of Decca LW 5114, The Gramophone 32 (no. 374): 59.
Kim, J. H., 2013:  ‘Shaping and co-​shaping forms of vitality in music:  beyond cognitiv-
ist and emotivist approaches to musical expressiveness’, Empirical Musicology Review
8/​3–​4: 162–​73.
Langer, S., 1942: Philosophy in a New Key (Cambridge, MA: Harvard University Press).
Leech-​Wilkinson, D. and H. M. Prior, 2014:  ‘Heuristics for expressive perfor-
mance’, in D. Fabian, R. Timmers and E. Schubert, eds., Expressiveness in Music
Performance:  Empirical Approaches across Styles and Cultures (Oxford:  Oxford
University Press), pp. 34–​57.
Leman, M., 2007:  Embodied Music Cognition and Mediation Technology (Cambridge,
MA: MIT Press).
Mahnkopf, C.-​S., F. Cox and W. Schurig, 2004:  Musical Morphology (Hofheim:  Wolke
Marks, L. E., 1978:  The Unity of the Senses:  Interrelations among the Modalities
(New York: Academic Press).
Monsaingeon, B., 1992: The Mastersinger—​Lesson III (EMI DVB 3101949).
Plaistow, S., 1965: review of Deutsche Grammophon (S)LPM18950, The Gramophone 43
(no. 507): 114.
Podger, R., 2003:  ‘A question to … Rachel Podger, Baroque violinist’, Gramophone 80
(no. 966): 15.
Prior, H. M., 2012: ‘Shaping music in performance: report for questionnaire participants
(revised August 2012)’, http://​​wp-​content/​uploads/​2015/​09/​Prior_​
Report.pdf (accessed 9 April 2017).
Quinn, M., 1999: review of Droffig National Trust NTCC014, Gramophone 76 (no. 914): 72.
Robertson, A., 1947: record review of Decca M 602, The Gramophone 24 (no. 287): 165.
Rothfarb, L., 2001: ‘Energetics’, in T. Christensen, ed., The Cambridge History of Western
Music Theory (Cambridge: Cambridge University Press), pp. 927–​55.
Sadie, S., 2003: review of Channel Classics CCS15958, Gramophone 80 (no. 966): 51.
Stern, D., 2010: Forms of Vitality: Exploring Dynamic Experience in Psychology, the Arts,
Psychotherapy, and Development (Oxford: Oxford University Press).
Truslit, A., 1938: Gestaltung und Bewegung in der Musik (Berlin: Chr. Friedrich Vieweg).

Shapes mapped
Evelyn Glennie, percussionist

The shape of music is constantly fluid because nothing resonates the same
twice. Every sound and shape is born and reborn. When music is printed on the
page it takes shape in my imagination with the eye leading the way.
As a performer, the environment is my instrument and percussion instru-
ments are my tools to deliver the sound. I can provide all the musical ingredi-
ents for the environment I am immersed in. The acoustic will mould the sound
meal which is thus delivered to the audience. The members of the audience will
have differing perspectives on the sound and shape according to where they are
situated and their emotional state at the time.
Listening is ever-​present, recognizing that the body is a huge ear that allows
us to experience the sensation of the sound journey, reaching far beyond the
capacity of the ear alone. That in turn creates the fluid shapes in music.


Key-​postures, trajectories and sonic shapes

Rolf Inge Godøy

It seems that we come across expressions of shape everywhere in music-related
contexts. When talking about music, people, with or without musical training,
often tend to use shape metaphors such as ‘thin’, ‘fat’, ‘smooth’,
‘rough’, ‘curved’, ‘flat’ etc., or when listening to music, people often tend to
trace shapes with their hands or other body parts, shapes that reflect sonic fea-
tures of the music. And needless to say, the body motion of musicians and
dancers in performance can be perceived as shapes, as can music notation
and graphical scores (see Ockelford, Chapter  5 below), in more recent times
extended to signal-​based graphical representations of musical sound as wave-
forms and spectrograms (see Greasley and Prior, Chapter 8 below).
The ubiquity of shape expressions in music-​related contexts seems to be
spontaneous and robust, as well as quite practical, when we talk about music.
But on reflection, the widespread use of shape metaphors and other shape
­representations is also enigmatic for the simple reason that audible sound is basi-
cally invisible (unless we use some technology for sound visualization), whereas
‘shape’ is primarily something in the visual domain. ‘Shape’ is defined in the
New Oxford American Dictionary as ‘the external form, contours, or outline of
someone or something’ and as ‘a geometric figure such as a square, triangle, or
rectangle’, yet it can also have more indirect or conceptual visual-​geometric sig-
nifications such as ‘the specified condition of someone or something’. Although
this and similar definitions of ‘shape’ may include such general and nonvisual
applications of the term, the question remains as to why and how we so readily
link sonic features with visual shape representations in musical experience.
This is even more enigmatic considering that music unfolds in time, whereas
shape by definition is something that we overview or ‘have in the field of vision’
as an ‘all-​at-​once’ experience, and hence is something ‘instantaneous’ in our
minds (at least subjectively, although there is a time-​dependent scanning and
mental processing going on in the perception and cognition of visual images).
How this ‘temporal-to-atemporal’ transformation in our minds works still
seems not to be well understood in the relevant cognitive sciences, but from our
own and others’ research, we believe the linking of sonic features and visual
shape images has much to do with experiences of music-​related body motion.
In what can be broadly called a motor theory perspective on music perception,
it seems that body postures at salient moments in sound production (both
instrumental and vocal), what we call key-​postures, and body motion trajecto-
ries between these key-​postures, relate to subjectively perceived sonic shapes, as
suggested by the title of this chapter.
The basic tenet of this chapter is therefore that shape in music-​related con-
texts is closely related to experiences of something that we do or mentally simu-
late that we do; so after an introductory presentation of some main notions
of shape in musical experience and music-​related research this chapter goes
on to develop some ideas of motor cognition in music. Relevant elements of
research on music-​related body motion are reviewed, including various kinds
of the sound-​producing body motion of musicians and sound-​accompanying
body motion that we can observe in music listening. A  central issue in this
connection is an assessment of the correspondences between body motion fea-
tures and sonic features: rhythm and pitch contours are often seen to be clearly
reflected in body motion, but other features such as texture, timbre, dynamics
and a number of so-​called expressive features may all be related variably to
body motion and thus also to shape images.
One crucial issue in such a listing of links between body motion and sonic
features is that of timescales: in listening, either to a short tune or to a more
extended work of music, we need to segment sound and associated body motion
into meaningful chunks that enable more specific determinations of shape.
Various instrumental, biomechanical, cognitive and musical-aesthetic constraints
seem to converge in suggesting that we experience fragments of music
at what we call the meso timescale, very approximately in durations ranging
from 0.5 to 5 seconds, as particularly salient with regard to both body motion
and sonic features. After a presentation of relevant research on this phenom­
enon of chunking and motion-​sound shapes at the meso timescale, the chapter
concludes with some ideas on how principles of key-​postures, trajectories and
sonic shapes may be put to use in music-​related research and practical contexts.

Shape representations

Needless to say, music and shape is a very extensive topic, with ramifications to
most areas of music and music-​related research. Yet out of all this material, it
could be useful to take a brief look at some aspects of western music notation
and more recent instances of shape representations in musical research, to bet-
ter situate our motor theory perspective on shape in music.

FIGURE 1.1   A pianola representation of the first eight bars of J. S. Bach’s Fugue in C major, Well-​
Tempered Clavier Book I. This representation highlights the gradually expanding pitch space,
fanning out to several octaves from the initial middle C. The shape of this pitch space expansion is
one of the main architectural elements here (as well as in the rest of J. S. Bach’s works and much other
music for that matter); however, the timescale of this kind of shape is rather slow, i.e. is on what we
call the macro timescale (see p. 14 below).

For one thing, western common practice notation, as well as recent exten-
sions such as MIDI pianola representation (Figure 1.1), can partly be regarded
as a kind of choreographic script, a system for denoting sound-​producing body
motion to be realized by performers. Trained score readers may readily see corre-
spondences between the graphical shapes in the score, the required motion shapes
of the performers and the emergent sonic shapes, in particular as pitch contours
and rhythmical-textural shapes. In other cases, there may be less clear relationships
between visible shapes in the score and subjectively perceived sonic shapes;
e.g. a tamtam strike may be indicated in the score as a single onset point in time,
perhaps with some dynamic marking and indication of the type of mallet to be
used, yet the result in performance is a protracted and extremely complex sound.
Evidently, timbral features are in general not well represented in western
notation because of its focus on pitch and duration. And as we know, this focus
has tended to leave expressive features of pitch, dynamics and timing outside
the mainstream conceptual apparatus, relegating these to the domain of perfor-
mance practice, a focus that has led to problems when attempting to represent
music of other cultures by western music notation transcriptions. But within
this pitch-​and duration-​focused western musical culture, we have also seen
some further abstractions from perceptual features, such as at times disregard-
ing octave placement, equating for instance an octave-​compressed chord with
a widely spaced chord.1 Similar distortions of perceptually salient pitch shape,
and also of rhythm–​shape relationships, are found in twentieth-​century serial
and integral serial music, as well as in so-​called pitch-​class set theory, effectively
resulting in what could be called a ‘spatiotemporal collapse’ of salient percep-
tual features (see Godøy 1997 for a discussion of this).
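The octave abstraction described above can be made concrete with a minimal sketch. The two chord voicings and the helper names `pitch_classes` and `pitch_span` are hypothetical, chosen only for illustration; pitches are given as MIDI note numbers, assuming the common convention that middle C is 60. The sketch is not drawn from the sources cited in this chapter.

```python
# Hypothetical illustration: two voicings of a C major triad as MIDI note numbers.
close_voicing = [60, 64, 67]   # C4, E4, G4 within one octave
wide_voicing = [36, 67, 88]    # C2, G4, E6 spread over more than four octaves


def pitch_classes(notes):
    """Reduce MIDI note numbers to pitch classes (0-11), discarding octave."""
    return sorted({n % 12 for n in notes})


def pitch_span(notes):
    """Distance in semitones between the lowest and the highest note."""
    return max(notes) - min(notes)


# Both voicings collapse to the same pitch-class set ...
print(pitch_classes(close_voicing), pitch_classes(wide_voicing))
# ... although their registral shapes differ greatly.
print(pitch_span(close_voicing), pitch_span(wide_voicing))
```

The reduction keeps the chord's identity as a set of pitch classes but discards the registral spread, which is the perceptually salient pitch shape at issue here.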
In the twentieth century, however, we have also seen attempts to develop
more graphical and shape-​reflecting representations, such as the Schillinger
system (Sethares 2007) or various kinds of graphical scores, such as those of
Cage, Ligeti, Bussotti, Logothetis and others (Schäffer 1976). One of the most
important music-​and-​shape efforts of the twentieth century is in the work
of Xenakis, for example in his development of connections between musical
and architectural shapes such as in his well-​known composition Metastaseis
and later design of the Philips Pavilion at the Brussels World’s Fair in 1958
(Xenakis 1992).
Since the advent of sound-​analysis technologies, we have had the means for
signal-​based representations of musical sound as shapes. An early and remark-
able effort in this direction of visualizing the shapes of actual sonic unfold-
ing of music was the work of Cogan (1984), and in the ensuing decades we
have seen a great expansion of signal-​based representations of music in the
domain of so-​called Music Information Retrieval (MIR). MIR is actually a
matter of going in the opposite direction from western music notation: instead
of making continuous sound from discrete symbols, it tries to extract the dis-
crete pitches and durations from continuous, complex and, we could say, often
messy signals. Confronted with continuous musical sound, we soon realize that
the great difficulty in MIR in making computer-​based transcription of music
(in particular of polyphonic music) is that human listening, including shape
perception in music, although seemingly versatile and robust, is dependent on a
number of perceptual cues in combination with extensive prior knowledge and
mental schemas, hence on something that has yet to be implemented in MIR.
As the universe of continuous sound has been opened up to explorations
by signal processing technologies, in principle giving us access to the above-­
mentioned timbral and expressive features, we also need to develop a concep-
tual apparatus for handling these features (see e.g. Peeters et al. 2011). One
pioneering research effort based on continuous sound was that of Pierre
Schaeffer and co-​workers (Schaeffer 1966, [1967] 1998; Chion 1983; Godøy
1997, 2006). Schaeffer and co-workers took the subjectively perceived overall
pitch, dynamic and timbre-related shapes of sound fragments, so-called sonic
objects, as their point of departure, and then successively differentiated more
and more subfeatures as shapes, only at a later stage trying to correlate these
subjectively perceived shape features with physical features of the acoustic signal.
After this pioneering work of Schaeffer and co-​workers, there have been
some related projects of exploring musical sound by way of subjective shape
metaphors, for example the Unités Sémiotiques Temporelles (UST) project
(Delalande et  al. 1996), which is more oriented towards affective features of
sonic objects. The common point of departure for Schaeffer and the UST
project was the idea that although western musical culture has been good at
conceptualizing features that can be ordered into more abstract symbolic sys-
tems such as those of pitch and duration, it has not been well suited to con-
ceptualizing more continuous, composite and multidimensional features. In
assessing the work of Schaeffer and followers, we find the idea of using various
shape images as a nonsymbolic means for feature representation to have been
an attractive solution, something that we now see has an affinity with body
motion (Godøy 2006).

Shape ontologies

Shape in musical contexts is a multimodal phenomenon because it involves
sound and vision and—​our main concern here—​also the sense of motion.
Multimodality has in recent years received a lot of attention in the cognitive
sciences, and ‘classical’ notions of the separation of the senses have been chal-
lenged. There is now mounting evidence that the sense modalities work together
and complement one another, sometimes even with one sense modality over-
riding another, resulting in what may be judged as illusions, as in the ‘McGurk
effect’ where visual impressions of a speaker’s mouth motion can change the
subjective interpretation of the sound heard (McGurk and MacDonald 1976).
The sense of motion is now regarded as composite, including kinematic (visible
motion), effort (dynamic, not directly visible), proprioceptive (self-monitoring)
and haptic (sense of touch) components. Additionally, musical sound is
obviously highly composite and multidimensional, with many features in paral-
lel. This means that we need to be sufficiently precise about which features we
have in mind when we discuss shape in musical contexts so that we do not make
so-​called category mistakes, mixing incommensurable features. We thus need to
consider shape ontologies, carefully analysing what features of musical sound
and/​or body motion we are referring to, and also whether some instances of
shape can be considered amodal, i.e. more independent of a specific modality
and applicable across modalities.
Considering shape ontologies also means trying to distinguish what is in the
signal (auditory, visual, haptic, etc.) and what is in our minds, regarding men-
tal shape images as just as salient as more physical shapes, provided that these
mental shape images are shared by people. This should mean in turn that we
treat illusions on an equal footing with the ‘real’, as long as they are subjectively
experienced as relevant for experiences of shape in music, as in the well-​known
illusions of endless ascending or descending sounds by Jean-​Claude Risset,
similar to M. C. Escher’s optical illusions of endless ascending or descending
staircases. The dividing line is to be placed between subjectively comparable
and incomparable features, meaning that there should be a perceivable similarity
between two domains, as is the case with this endless descent or ascent in
Risset and Escher, making auditory and visual shape sensations ontologically
commensurable. On the other hand, abstractions based on western music notation
may lead to category mistakes, for example by transferring numerical features
from one domain to another without reflecting perceivable similarities.
The risk of making such category mistakes is also present in technology-​
based shape applications, in so-​called sonifications of data from nonauditory
sources, converting a visual domain image to sound. We could use the term
‘mapping’, well known in music technology contexts (Hunt, Wanderley and
Paradis 2003), for keeping track of shape ontologies. Basically, ‘mapping’ means
taking data from one domain and assigning it to features in another domain.
For instance, stock exchange data could be used to control pitch on a musical
instrument so that we could listen to the development of the stock market as
a melodic curve. Or we could use a stream of video or data from other sensors
in mapping body motion to sound generation in various ways, and so listen
to body motion (Jensenius and Godøy 2013). Or we could take a picture of a
cat and use this picture as a spectrogram for generating a sound. The extent
to which the resultant sound would have any ‘cat’-​like perceptual features is
doubtful: we could probably call this cat sonification a case of category mistake
in shape mapping, in principle similar to the ontological mismatching of shape,
mentioned above, that we may find in music using western common practice notation.
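As a minimal sketch of what such a mapping involves, the following takes a data series from a nonauditory domain and rescales it linearly onto a pitch range. The function name `map_to_pitch`, the MIDI note range and the price values are all assumptions made for illustration (the prices are invented, not real market data); the sketch does not reproduce any particular system from the literature cited above.

```python
def map_to_pitch(values, low_note=48, high_note=84):
    """Linearly rescale a data series to MIDI note numbers (assumed range)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat series carries no contour; place it in the middle of the range.
        return [(low_note + high_note) // 2 for _ in values]
    scale = (high_note - low_note) / (hi - lo)
    return [round(low_note + (v - lo) * scale) for v in values]


# Invented closing prices, used only to show the contour transfer.
prices = [101.0, 103.5, 102.0, 107.0, 110.0, 104.5]
melody = map_to_pitch(prices)
print(melody)
```

The code guarantees only that the rise and fall of the data reappears as a rise and fall of pitch. Whether the resulting melodic curve is perceived as a meaningful shape is an empirical matter that no mapping function can settle by itself.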
Mapping is at the core of all electronic instrument development, and given
the fact that any mapping between input data and sound output is possible
with electronic instruments, the crucial question concerns what kinds of map-
pings make sense to, or could be called ‘intuitive’ by, musicians and audience.
This is a question that can be studied empirically, as has been done in some
recent research projects (see Jensenius 2007 and Nymoen 2013 for overviews).
From this research as well as numerous informal observations over decades of
development in the field of new electronic instruments, the prime candidate
for shape transfer from one domain to another is our sense of body motion,
meaning the mapping of motion along axes in three-​dimensional space to vari-
ous perceptually salient sonic feature dimensions, typically pitch, loudness and
spectral centroid.

Shape cognition

Findings in a number of domains seem to converge in suggesting that notions
of shape are fundamental to much (and perhaps most) human cognition and
behaviour. This means that we should also consider some principles of gen-
eral, amodal shape cognition, as these may be useful when we migrate across
modalities and features as we do here in the context of music and shape.
Providing an ‘all-​at-​once’ overview image of whatever we perceive or think
about is both the prime attribute of shape cognition and its prime advantage,
as well as its challenge, in our context: if we do not somehow have such over-
views of lived experience and are just submerged in a continuous stream of
sensations we will not be able to make sense of the world in general or of music
in particular, as was pointed out by Edmund Husserl more than a century ago
(Husserl 1991). To Husserl, it was obvious that we need to interrupt the con-
tinuous stream of sensations from time to time, and make overview images of
whatever is being perceived, by a series of intermittent ‘now-​points’ (Godøy
2010b). Shape cognition could then be defined as our capacity to capture and
handle the ephemeral and temporally distributed features of music, as well as
other lived experience. And with presently available methods and technologies
for recording and processing both sound and body motion, we have the pos-
sibility of ‘freezing’ transient sound and motion and examining them at leisure
as shapes.
Historically, one of the first and most extensive projects on shape cognition
originated in music with gestalt theory in the last decades of the nineteenth cen-
tury (Smith 1988), with, among other things, a focus on how shapes emerge and
are conserved across different instances, such as melodies across various instru-
mental or vocal guises. Gestalt theory was later extended to other domains,
and is now often primarily associated with the visual. The remarkable insights
of early gestalt theory concerning coherence criteria in shape cognition still
have validity today, both in auditory perception (Bregman 1990) and in human
motor control (Klapp and Jagacinski 2011).
But one of the most extensive recent research efforts on shape cognition is
no doubt that of so-​called morphodynamical theory (Thom 1983; Petitot 1985,
1990; Godøy 1997). The gist of morphodynamical theory is that human per-
ception, understanding and reasoning are based on ordering sensory input as
shapes, or in the words of René Thom, the leading figure of this theory, ‘the
first objective is to characterise a phenomenon as shape, as a ‘spatial’ shape. To
understand means first of all to geometrise’ (1983: 6).2
Of interest here is the morphodynamical distinction between the ‘control
space’ and the ‘morphology space’, meaning a distinction between the input
and the perceived results of any generative model (Petitot 1990), be that in
physics, biology, behavioural sciences or other domains such as musical sound.
It is always the perceived shapes—​the features of the morphology space—​that
are of interest for us here in musical contexts, and the distinction between con-
trol and morphology spaces helps us to determine what are ontologically com-
parable features and avoid various mapping mismatches or category mistakes
as mentioned above.
The distinction between control and morphology spaces is particularly use-
ful for exploring categorical thresholds between shapes. This means making
systematic explorations of perceived shapes by generating incrementally differ-
ent variants through what is often called analysis-​by-​synthesis (Risset 1991).
A  simple but important example of this is the distinguishing of ‘percussive’
and ‘bowed’ sounds by the steepness of the attack segment at the beginning
of the sounds:  with a very short attack we get the subjective sensation of a
percussive sound, and when gradually increasing the duration of the attack,
we sooner or later get a ‘bowed’ sound sensation. In other words: we explore
the thresholds between these two sound categories (features in the morphology
space) by incrementally varying the duration of the attack segment (a value in
the control space).
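The attack example can be sketched as a small generative model in which a single control value, the attack duration, is varied in equal steps. The sample rate, tone frequency, linear envelope and the particular attack values are assumptions made for brevity, and the function name `tone_with_attack` is hypothetical. The code produces only the variants; the categorical threshold itself lies in the morphology space and can be found only by listening.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed, kept low for brevity)


def tone_with_attack(attack_s, duration_s=0.5, freq_hz=220.0):
    """Sine tone whose amplitude ramps linearly from 0 to 1 over attack_s."""
    n_samples = int(duration_s * SAMPLE_RATE)
    samples = []
    for i in range(n_samples):
        t = i / SAMPLE_RATE
        envelope = min(1.0, t / attack_s) if attack_s > 0 else 1.0
        samples.append(envelope * math.sin(2 * math.pi * freq_hz * t))
    return samples


# Control space: attack durations increasing in equal steps (values assumed).
attack_times = [0.005 * k for k in range(1, 41, 4)]  # 5 ms up to 185 ms
variants = {a: tone_with_attack(a) for a in attack_times}
```

Each entry in `variants` is a list of samples between -1 and 1 that could, for instance, be written to an audio file with Python's standard `wave` module and presented to listeners in order of increasing attack duration.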
The analysis-by-synthesis approach enables exploration of perceptually
salient features by comparing incrementally different variants along several
feature axes, for example combining the incremental attack dimension (‘sharp-
ness’) with feature dimensions for spectral centroid (a measure of ‘brightness’
in timbre perception) in a two-​dimensional analysis-​by-​synthesis exploration.
The analysis-​by-​synthesis approach is actually what people practise in music
production contexts, tweaking the buttons for equalizing, reverberation or
other kinds of effects processing in the mixing studio, or adjusting drum mem-
branes, instrument and microphone placement, and so on in the recording
room, until the ‘right’ sound is found. When individual musicians or conduc-
tors repeatedly try out versions of singular sounds or phrases until they find the
sonic expression they are searching for, they too are practising an analysis-​by-​
synthesis approach.
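The spectral centroid mentioned above can be sketched as the amplitude-weighted mean frequency of a spectrum. This is a common formulation, but definitions vary in detail (for example in whether magnitude or power is used as the weight), so the version below should be read as one assumption among several. The harmonic spectrum with a power-law rolloff, the helper names and all parameter values are likewise hypothetical.

```python
def spectral_centroid(freqs_hz, amplitudes):
    """Amplitude-weighted mean frequency of a set of partials."""
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum is silent; centroid undefined")
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total


def harmonic_spectrum(f0_hz, n_partials, rolloff):
    """Partials at integer multiples of f0 with amplitudes 1 / k**rolloff."""
    freqs = [f0_hz * k for k in range(1, n_partials + 1)]
    amps = [1.0 / k ** rolloff for k in range(1, n_partials + 1)]
    return freqs, amps


# Control space: the rolloff exponent (values assumed). A smaller exponent
# leaves more energy in the upper partials, which raises the centroid.
for rolloff in (3.0, 2.0, 1.0, 0.5):
    freqs, amps = harmonic_spectrum(220.0, 10, rolloff)
    print(rolloff, round(spectral_centroid(freqs, amps), 1))
```

Varying the rolloff exponent together with an attack duration would give the kind of two-dimensional exploration described in the text, with ‘brightness’ along one axis and ‘sharpness’ along the other.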
In summary, the analysis-​by-​synthesis approach is holistic in the sense of
allowing us to evaluate perceptual features of a whole chunk of sound, and in a
way it also bridges the symbolic-​to-​subsymbolic divide, which in the terminol-
ogy of Schaeffer is called the abstract–​concrete divide: singular values along an
axis (or a scale) are abstract, whereas sonic objects with multiple features that
are holistically perceived as shapes are concrete (Schaeffer 1966; Chion 1983).
Related to analysis-​by-​synthesis is the idea of blending two shapes, an idea
that has become popularly known as the ‘morphing’ of visual images (such as
human faces) or of sounds, the latter case also being known as cross-​synthesis.
There are various signal processing models for this, but it is also possible to
generate a series of incremental variants, say between sound A and sound B,
and explore the categorical threshold between the two.
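A series of incremental variants between two shapes can be sketched, in its simplest form, as pointwise linear interpolation. This is a deliberately crude stand-in for the signal processing models referred to above, not an implementation of cross-synthesis; the two amplitude envelopes are invented, and the function name `blend` is hypothetical.

```python
def blend(shape_a, shape_b, weight):
    """Pointwise linear interpolation: weight 0 gives A, weight 1 gives B."""
    if len(shape_a) != len(shape_b):
        raise ValueError("shapes must have the same number of points")
    return [(1 - weight) * a + weight * b for a, b in zip(shape_a, shape_b)]


# Two invented amplitude envelopes sampled at the same eight time points.
percussive = [1.0, 0.6, 0.35, 0.2, 0.1, 0.05, 0.02, 0.0]
swelling = [0.0, 0.1, 0.3, 0.6, 0.9, 1.0, 0.7, 0.0]

steps = 5
series = [blend(percussive, swelling, i / (steps - 1)) for i in range(steps)]
for variant in series:
    print([round(x, 2) for x in variant])
```

The first and last members of the series are the two original shapes. Where along the series the percept changes from one category to the other is again a question for listeners rather than for the code.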
The inherent challenge with such variant shape methods is that interesting
shapes are multidimensional:  they usually cannot be characterized by only
one value axis so choices have to be made as to what aspect(s) are focused on.
The same goes for similarity ratings of differing shapes: Which part of the
shapes are we comparing, or are we going for a more global or cumulative
similarity judgement? In their pioneering work on categorization, Eleanor
Rosch and colleagues suggested that categories may be strongly linked with
motor schemas (Rosch et al. 1976): thus the category ‘chair’ may be difficult
to define from construction features alone (there are too many variants of
design, for example from rococo to modern) but is easier to categorize as
something to sit on.
In sum, we can see that shape cognition in music, as well as in general, has to
do with features and categorical thresholds, and that the shape of body motion
can be an important part of understanding categories and shapes in music. It
follows that motor theory should be a part of shape cognition in music.

Motor theory

One leading idea in several domains of the cognitive sciences during the last
three decades has been to regard human cognition as rooted in bodily experi-
ence, as what has been broadly called embodied cognition. An essential fea-
ture of embodied cognition is that perception, thinking and understanding are
all related to mental simulation of body motion, meaning that we mentally
imitate the actions that we believe are the cause of what we perceive or that
actively trace one or more features of what we perceive. As suggested by Alain
Berthoz, with reference to Cézanne, seeing is a matter not just of passively tak-
ing in visual information with our eyes, but also of mentally tracing the outline
of what we are seeing, as if we are ‘touching’ whatever we see with our gaze
(Berthoz 1997).
In the case of spoken language, this means mentally simulating articula-
tory motions of the vocal apparatus when we listen to speech, and in the case
of music, mentally simulating the sound-​producing body motions we believe
musicians are making:  hearing ferocious drumming, we might imagine ener-
getic hand motions, while hearing soft string music, we might imagine slow and
protracted bowing motion. Such triggering of sound-​producing images in lis-
tening means that we associate the shape of the sound-​producing body motion
with the shape of the sound that we hear.
This theory of associations of sounds that we hear (or merely imagine) with
some kind of sound-​producing body motion is known as the motor theory
of perception, sometimes referred to in the plural—​‘motor theories’—​because
several versions have been proposed. Originating in the 1960s in linguistics
(Liberman and Mattingly 1985), this can now be regarded as a more general
theory, also including other areas of human cognition (Galantucci, Fowler
and Turvey 2006). The gist of motor theory is that perception is production in
reverse, meaning that when we listen, we project motor images onto what we
are hearing and use these motor images as mental schemas to make sense out
of what we are hearing.
Motor theory concerns learning and expertise: if we are familiar with a lan-
guage or type of music, we probably know in more detail what body motion
goes into producing the sound; yet we may also have sketchier or vaguer motor
images of sounds we are not so familiar with. Although I myself speak nei-
ther Korean nor Polish, I believe I can distinguish these two languages by what
I perceive as their respective required phonological gestures. Having some
approximate image of sound production, which we have called motormimetic
sketching, is better than having no image at all, and we believe this applies to
music as we have found in our observation studies of so-​called air instrument
performance (Godøy, Haga and Jensenius 2006). In these studies, we found that
most listeners, including those with no musical training, seemed able to repro-
duce sound-​producing body motion that fitted the music they heard, reflecting
in their air performances overall pitch and rhythmic features and, more vari-
ably so, details of musical textures and articulations. We can see manifestations
of the motor theory in other cases of imitative behaviour, such as in scat sing-
ing and in beatboxing, with some people demonstrating a truly astonishing
capacity to imitate nonvocal sounds with their vocal apparatus.
One crucial feature of motor theory as applied to music is that all sonic
events are included in some kind of body-motion trajectory, trajectories that
will typically start before the onset of the sound(s), encompass the sound(s) and
often continue after the sound(s), for example moving the hand/​mallet towards
the drum, making an impact on the drum membrane, moving the hand/​mallet
back to the initial position. In the motor theory perspective, this drumming
body-​motion shape will contribute to our shape images of drum sound: in the
words of Berthoz (1997), ‘Perception is simulated action’. Although we now see
increasing support for the motor theory perspective on perception from brain
observation studies as well as from behavioural studies, we still have several
challenges in finding out more about the links between sound and body motion
in perception of the various musical features, something that we believe also
necessitates an exploration of the timescales at work in musical experience.

Shape timescales

The basic tenet of this chapter is that most features of music, ranging from
low-​level acoustic and body-motion features to high-​level affective and aes-
thetic features, are time-​dependent, yet can also be thought of as shapes. Shape
images are in a sense ‘outside time’, to use the expression of Xenakis (1992):
they are ‘snapshots’ of what has unfolded or is about to unfold in time. This
raises issues of continuity versus discontinuity in musical experience (and other
time-​related experiences for that matter), issues much focused on by philoso-
phers and psychologists in the nineteenth and twentieth centuries, in particu-
lar by Husserl as mentioned above (see Godøy 2008, 2010b, 2011, 2013). One
approach to this enigma of the temporal versus the atemporal may be to look
at constraints at work in our perception of sound and motion, in particular to
try to single out qualitative differences at the various timescales involved here.
As we know, human hearing is situated in the region of approximately 20
to 20,000 Hz (for healthy young people), with a threshold at around 20 Hz
for fused versus distinct features. This means that the timescale above 20 Hz
14 Music and Shape

is mostly concerned with shapes of frequency relationships, meaning spectral

or formantic shapes (such as vowels and other stationary tone colour compo-
nents) and pitch relationships (intervals), whereas the timescale below 20 Hz is
concerned with all the other shape features of music and music-​related body
motion. But there are also some qualitative timescale thresholds at work in
the region below 20 Hz. In our own research we have found it useful to discern
three main timescales that apply to both sonic and body-motion features:
• Micro timescale features: basically stationary or continuous features
of sound and motion: stationary pitch, loudness and timbre, and the
corresponding stationary postures, and continuous, smooth body
motions as in sustained bowing or blowing. This will include what is
often referred to as ‘sound’ in popular music research, meaning the
overall subjective impression, readily recognized in even very short
fragments of music (Gjerdingen and Perrott 2008).
• Meso timescale features: sound and motion features typically
unfolding in approximately the 0.5 to 5 seconds duration-range and
holistically perceived as motion-sound chunks, in the same duration-range
as the so-called ‘sonic objects’ of Schaeffer and co-workers.
The meso timescale is usually sufficient for perceiving most salient
sonic features such as rhythmical and textural patterns, melodic,
harmonic and modal/tonal features, as well as expressive, style,
genre, overall aesthetic and affective features, and the corresponding
body-motion features. A number of research findings converge on
the meso timescale as the most important in several areas of human
cognition, in particular that motion duration, attention spans,
short-term memory and meaning formation all seem to be attuned to this
timescale (see Godøy 2013 for an overview).
• Macro timescale features: the timescale of sections, movements,
whole works and various other long-lasting music-related events.
The perceptual workings of the macro timescale seem not to be
well researched, but we would hypothesize that it also concerns the
overlap and/or lingering memories of successive chunks from the
meso timescale.

Although these three timescales coexist in musical experience, it is possible
to zoom in and out of various timescales, intentionally shifting our atten-
tion, something designated by Schaeffer as the ‘context-​contexture’ perspec-
tives (Schaeffer 1966; Chion 1983). This means that any sonic object may be
included in some larger-scale context yet also have its own internal context
called ‘contexture’. The essential principle for Schaeffer and for us here is that
at all these timescales we can conceptualize features as shapes.
However, of these three main timescales, the meso timescale is clearly the
most important when it comes to the experience of various salient musical
features, as mentioned above. Furthermore, the most important attribute of
the meso timescale here is that motion-​sound chunks are holistically perceived,
that they are somehow kept in consciousness as whole units; because of this,
the prime sources of shape cognition in music are ‘instantaneous’ overview
images of both sound and body motion.

Sonic features

Seeing evidence from various research fields converging on the importance of
the meso timescale, we might find it useful to take a closer look at sonic features
at this level. Inspired by the work of Schaeffer and co-​workers on sonic objects,
we adopt a subjective-​perceptual top–​down approach of listening and differen-
tiating sonic features at this meso timescale. This method originated in the early
days of musique concrète, when, for practical reasons, composers used looped
sound fragments on phonograph discs, called sillon fermé (‘closed groove’), in
the mixing of sounds when composing electroacoustic music. With repeated
listening to these looped sound fragments, Schaeffer and co-​workers noticed
that their attention shifted from the everyday significations of the sound frag-
ments to the more subjectively perceived overall shapes of the sounds. This led
to developing a scheme for classifying the sonic objects, called the typology of
sonic objects, by their overall dynamic shapes and their overall pitch-​related
shapes. The three main dynamic shapes are as follows:
• Sustained: a protracted sound, such as in bowing and blowing
• Impulsive: a short sound with a sharp attack as in percussive and
plucked sounds
• Iterative: a sound with rapid fluctuations such as in a tremolo

These three main types have clear correlates in body motion: the sustained
sonic objects imply a continuous transfer of energy from the body, hence a
continuous effort such as bowing or blowing; the impulsive implies an abrupt,
discontinuous type of body motion, so-​called ballistic motion, as in hitting
or kicking; and the iterative implies a rapid back-and-forth or shaking body
motion.
Furthermore, there are categorical thresholds in this typology, and we can
explore these thresholds by producing incremental variants as presented ear-
lier. If a sustained sound is shortened below a certain duration threshold, it
will be perceived as an impulsive sound, and conversely, if an impulsive sound
is extended beyond a certain duration threshold, it will be perceived as a sus-
tained sound. Likewise, if an iterative sound is slowed down to a certain rate,
it will turn into a series of distinct impulsive sounds, and conversely, if a series
of distinct impulsive sounds is accelerated beyond a certain rate, it will change
into an iterative sound. As we shall see later, these category changes are related
to so-called phase-transitions in body motion: changes in the morphology space
resulting from incremental changes in the control space, as we would say in
morphodynamical theory.
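The categorical thresholds described above can be illustrated with a minimal sketch that maps two control parameters (the duration of a single sound and the rate of repeated onsets) onto the three dynamic shapes. The numerical thresholds and the function name `classify_dynamic_shape` are assumptions made purely for illustration; no threshold values are given here, and real perceptual boundaries are likely to depend on context.

```python
# Hypothetical thresholds, chosen only for illustration (not taken from the text).
IMPULSIVE_MAX_DURATION_S = 0.2  # assumed: shorter single sounds are heard as impulsive
ITERATIVE_MIN_RATE_HZ = 10.0    # assumed: faster repetitions fuse into an iterative sound


def classify_dynamic_shape(duration_s: float, onset_rate_hz: float = 0.0) -> str:
    """Return 'sustained', 'impulsive' or 'iterative' for a sonic object.

    duration_s: duration of one sound event, in seconds.
    onset_rate_hz: repeated onsets per second (0 for a single event).
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if onset_rate_hz >= ITERATIVE_MIN_RATE_HZ:
        # Rapid repetition: the individual onsets fuse into one iterative sound.
        return "iterative"
    if duration_s < IMPULSIVE_MAX_DURATION_S:
        # Short events below the repetition threshold remain distinct impulses.
        return "impulsive"
    return "sustained"
```

Sweeping `duration_s` or `onset_rate_hz` in small steps across the assumed thresholds reproduces, in a very schematic way, the category changes produced by incremental variants.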
In the sonic object typology, there is furthermore an analogous coarse clas-
sification of the overall pitch-​related features of a sonic object:
• Definite pitch: more or less stationary throughout the sonic object
• Complex pitch: inharmonic or various noise band sounds
• Variable pitch: pitch changing in the course of the sonic object, for
example by glissando

These two typological classifications (dynamic-related and pitch-related) were
combined into a 3 x 3 matrix, and could be applied as a first and coarse, yet very
useful, classification of overall sonic features as shapes. Other criteria were also
added to this rudimentary typology, and zooming into the micro features of the
sound, we could then elaborate a classificatory scheme called the ‘morphology
of sonic objects’.
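As a small illustration, the 3 x 3 combination of the two classifications can be written out as a grid. The sketch below is only one possible encoding: the short labels, the choice of dynamic shapes as rows and pitch types as columns, and the function names are assumptions, not Schaeffer's own layout.

```python
# Labels follow the two three-way classifications in the text; the ordering is an assumption.
DYNAMIC_SHAPES = ("sustained", "impulsive", "iterative")
PITCH_SHAPES = ("definite", "complex", "variable")


def typology_matrix():
    """Return the 3 x 3 grid of (dynamic shape, pitch shape) pairs, row by row."""
    return [[(dynamic, pitch) for pitch in PITCH_SHAPES] for dynamic in DYNAMIC_SHAPES]


def typology_cell(dynamic: str, pitch: str):
    """Return the (row, column) position of a sonic object in the grid.

    Raises ValueError if either label is not one of the categories above.
    """
    return DYNAMIC_SHAPES.index(dynamic), PITCH_SHAPES.index(pitch)
```

A plucked glissando, for example, would be placed with `typology_cell("impulsive", "variable")`.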
In the morphology of sonic objects there is a similar top–​down shape-​related
classification of sonic features, including perceptually salient spectral features,
both quasi-​stationary and more fluctuating, as well as profiles, rate, ampli-
tude and patterns of these fluctuations. Prominent morphological features are
found in the so-​called grain and gait (‘allure’ in French, sometimes rendered in
English as ‘motion’, but also as ‘allure’) categories, where grain denotes vari-
ous fast fluctuations in the sound (of pitch, dynamics, timbre) and gait slower
fluctuations. As an example of this, consider the burring of a deep double bass
sound, readily evoking the metaphor of a kind of grain surface, and a slower
gait such as in the opening and closing of a wahwah mute as we can see in the
spectrogram representations in Figure 1.2.
The various morphology features in turn have further qualifications, denot-
ing the amplitude, rate, regularity and so on of grain- or gait-type fluctua-
tions, all the time with clearly shape-​related labels.3 Importantly, what we see in
Schaeffer’s classificatory scheme is an attempt to single out and give names to
previously unnamed, yet perceptually salient, features of musical sound, some-
thing that is still an important challenge for psychoacoustic research (Peeters
et al. 2011).
Furthermore, advances in signal-​based music research of the last couple
of decades have enabled research on various expressive features, both at the
subnote level and at the supranote level, in turn enabling research on musical
performance with two shape-related sonic features (see Goebl et al. 2006 for an
overview):
1. Timing/​groove, including tempo curves as shapes
2. Expressivity, representing various minute inflections as shapes

[Spectrograms, top and bottom panels: frequency (Hz) against time (s), 0 to 4.026 s]
FIGURE 1.2   The spectrogram of a sustained deep C double bass tone (top) and the spectrogram of
the same tone passed through a time-​varying wahwah filter (bottom). The double bass tone has a
distinct burring sound, what could be referred to as a grain morphology feature in the terminology of
Schaeffer (1966), and the wahwah filtered version of this double bass tone has additionally a slower
open-​close-​open-​close etc., a gait (or allure) morphology feature in the terminology of Schaeffer
(1966). At two different timescales, both grain and gait are clearly body-motion shape-​related features,
i.e. grain making a fast shaking motion and gait making a slower opening and closing motion (cf. the
onomatopoetic associations of opening/​closing the mouth in pronouncing ‘wahwah’).

Needless to say, we also often find uses of shape expressions designating more
traditional western music-​theory-​related sonic features in innumerable writings
on musical analysis, such as:
• Melodic features, such as contours, various kinds of patterns
• Harmonic features, both single chords and composite chord
• Modality, not as abstract pitch space (or scales) but as shapes of
interval constellations, referred to as ‘physiognomy’ by Lutosławski
(Norwald 1969)
• Rhythmical patterns and textures as shapes

In summary, we could say that most sonic features of musical experience could
be represented as a shape, bearing in mind the idea presented earlier that shape
is a fundamental cognitive strategy for making sense of the world. Yet there are
also a number of sonic features that are so close to body-​motion features, as is
the case for rhythm and texture, that we need to have a look at what is sound
and what is body motion here.

Body-motion features

We can observe a great variety of music-related body motion in dance, concert
and everyday listening situations, and in the course of several years of inter-
national research collaboration in this area we have come to suggest a basic
classification scheme for music-​related body motion (Godøy and Leman 2010):
• Sound-​producing body motion, related to all the sonic features
mentioned above, but more specifically excitatory, meaning energy
transfer from the body to the instrument (including the vocal
apparatus) such as in bowing, blowing, hitting and stroking; and
modulatory, meaning changing the effects of the energy transfer, such
as left-​hand finger motion on string instruments and mute opening
or closing on brass instruments. There are also various types of
ancillary body motion here, to avoid fatigue or strain injury, to help in
articulation and expressivity, to communicate with other musicians,
or to make theatrical impressions on the audience. Although not
strictly sound-​producing, conducting could also be included here
because of its role in guiding the musicians by beating time signatures,
and by postures, facial expressions and various motion trajectories,
expressing sonic features as shapes. Eminently shape-​related is also
sol-​fa and chironomy (in Jewish and Christian sacred music), and
other kinds of gestural visualization of musical features used in
various improvisational contexts.
• Sound-accompanying body motion includes all kinds of body motion
that listeners make to music, such as in dancing, walking, nodding
and gesticulating. Common to all sound-​accompanying body motion
is that it is somehow related to one or more perceived sonic features
such as the predominant beat or melodic contour of the music.
Although we may see differing sound-​accompanying body motions
made to the same music, so that the music has multiple gestural
affordances (Godøy 2010a), there is often a clear reflection of the
overall subjectively perceived energy of the music in the body motion,
as we can see in Figure 1.5.

There will be overlaps in many (perhaps most) cases between these catego-
ries, meaning that music-​related body motion will also often be multifunc-
tional: some motion by a musician may, for instance, be both sound-​producing
and communicative, such as an upward hand motion to prepare a fortissimo
chord on the keyboard, at the same time serving as an upbeat signal to the
other musicians, in addition to demonstrating a high level of energy to the
audience.
Besides observing body motion in performance that is not strictly sound-producing,
we may also readily observe body motion imitating sound-producing motion
when people listen to music, something that we have seen in so-called sound-tracing,
when listeners spontaneously draw (on a digital tablet or in the air) the
shape of sounds that they hear, an example of which can be seen in Figure 1.3.
More extensive study of sound-​tracings, including statistical processing of cor-
relations between tracings and sound features, suggests that pitch contours are
quite robustly perceived as shapes, but also that dynamic and timbral features
may likewise be spontaneously traced as shapes as long as there is not too much
competition between the features (Nymoen 2013).

[Spectrogram (top): frequency (Hz) against time (s), 0 to 4.841 s; sound-tracings by nine listeners (bottom)]
FIGURE 1.3   The spectrogram of a distortion guitar sound with a downward glissando followed by a
slow upward expansion (top), and so-called sound-tracings of this sound by nine listeners (bottom).
The sound-tracings were made on a digital tablet by the listeners immediately after hearing the sound
for the first time, and should reflect something of how they spontaneously perceived the overall shape
of this sound.

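To give a sense of what correlating tracings with sound features can mean in practice, the following sketch computes a plain Pearson correlation between a pitch contour and the vertical position of a tracing, both sampled at the same time points. This is an illustrative assumption, not the statistical procedure of the studies cited; the function name `pearson_r` and the idea of comparing only vertical position with pitch are ours.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long series of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)
```

A value near 1 would indicate that the traced height follows the pitch contour closely; a value near 0 would suggest that the listener traced some other feature of the sound.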
The point here is that listening to or imagining music activates mental
images of some kind of music-​related body-motion, and that these images are
one of the main sources for shape concepts in musical experience. Taking the
consequences of such close links between sonic and body-motion features, the
question arises as to the true nature of musical features such as rhythmical and
textural patterns: Are they sonic or body-motion patterns? For instance, is a
dance pattern (waltz, tango, samba) a sonic or body-motion pattern? Similarly,
is chunking in music based on sonic cues (sometimes referred to as qualitative
discontinuities in the sound) or on body-motion patterns? Our understanding
is that music includes both sonic and body-motion features, and that these fea-
tures are united in multimodal shape images although they actually emerge
from various constraints at work in the production of musical sound.

Constraint-​based shapes

The fact that before the advent of electronic music technology music tradition-
ally was made by body motion in interaction with physical instruments or the
human vocal apparatus means that, in addition to body-motion constraints,
various instrument constraints imposed by physics are reflected in the resultant
sonic shapes. Observing that musical expression is ‘on top of’ instrumental and
body-motion constraints by no means diminishes the endless volitional expres-
sive capacities of music, but it should remind us to take various constraints on
sound ​production into account when we talk about shape in music.
To begin with, musical instruments have constraints, both in the mode of
excitation and in the subsequent energy dissipation: hitting a metal plate with
a hammer is an impulsive type of body motion, resulting in a sound with short
attack followed by a long decay. The perceived sonic shape is constrained here
by the size, shape and material of the metal plate and the hammer, and by the
force and duration of the impact. Instrumental and vocal sounds typically
have such overall envelope shapes, but may also have various internal textural
features as a direct physical response to excitations, for example the rough or
grainy sound of a deep double bass (bearing in mind the presentation of grain
earlier), or the hollow smooth sound of a high harmonic (flageolet) tone on
a violin.
In our music and shape context, it is interesting to consider so-​called physi-
cal model sound synthesis as a way of thinking that takes physical constraints
into account, such as in a mathematical model that simulates the physical exci-
tation and resonance features of ‘real’ instruments or the human voice where
the resultant sonic shapes are constrained by the physical parameters of the
model. The point is that the behaviour of the physical model results in ‘real
world’ emergent sonic shapes, fitting with our ecological schemas of how sound
unfolds, in contrast to an abstract synthesis model such as additive synthesis,
where in principle any number of sinusoid components, with any frequency,
duration, fluctuations and so on, may be combined, and where there is really
no connection to the outside world except via those images we might project
onto the sound from previous experiences of similar features, by what is called
‘anthropomorphic projection’.
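The contrast between the two ways of thinking can be sketched in a few lines. In the first function below, any set of sinusoids may be summed, with no constraint on the result; in the second, a deliberately crude physical model (a single damped mass-spring resonator struck once) produces a sharp attack and gradual decay as an emergent consequence of its parameters. Both functions, their names, and all parameter values are illustrative assumptions and are far simpler than the physical models used in actual sound synthesis.

```python
import math


def additive_synthesis(partials, duration_s, sample_rate_hz=8000):
    """Sum freely chosen sinusoids; `partials` is a list of (frequency_hz, amplitude)."""
    n_samples = round(duration_s * sample_rate_hz)
    return [
        sum(amp * math.sin(2 * math.pi * freq * n / sample_rate_hz) for freq, amp in partials)
        for n in range(n_samples)
    ]


def struck_resonator(frequency_hz, damping_per_s, strike_velocity, duration_s,
                     sample_rate_hz=8000):
    """Simulate one damped mass-spring resonator set in motion by an impulsive strike.

    The decaying envelope is not specified anywhere; it emerges from the damping.
    """
    dt = 1.0 / sample_rate_hz
    omega = 2 * math.pi * frequency_hz
    position, velocity = 0.0, strike_velocity  # the strike imparts an initial velocity
    samples = []
    for _ in range(round(duration_s * sample_rate_hz)):
        acceleration = -omega ** 2 * position - 2 * damping_per_s * velocity
        velocity += acceleration * dt   # semi-implicit Euler step
        position += velocity * dt
        samples.append(position)
    return samples
```

Changing `damping_per_s` in the second function changes the decay of the whole sound, much as changing the material of a metal plate would, whereas in the first function each component must be shaped separately.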
Instrumental or vocal performances in turn have their sets of constraints,
not just those we typically associate with different instruments—​their idioms
or clichés (the things that are easy to play and sound well on an instrument)—​
but also more general body-motion constraints that we believe contribute to
the shape of musical sound. Body-motion constraints, both biomechanical and
more neurocognitive (sometimes difficult to tell apart), effectively limit possible
body-motion range, speed and duration, and also necessitate rests and shifts
in posture to avoid fatigue and/​or strain injury. Also, the fact that all human
body motion takes time, because it is not possible to move instantly from one
position to another, means that there always will be transition time between
positions. This in turn means that music-​related body motion is continuous
(although it may at times appear as abrupt) and hence may result in fusion or
contextual smearing of otherwise singular sound onsets, apparent as so-​called
phase-​transitions and coarticulations.
Phase-​transition designates changes in behaviour due to changes in some
parameter such as the speed and/​or amplitude of body motion (Haken, Kelso
and Bunz 1985). In our context this means that otherwise singular motion-​
units may fuse into a superordinate unit if the speed is increased, and con-
versely, a rapid motion may become split into distinct units if the speed is
decreased, as would be the case of a 3/​4-​time waltz pattern going from three
beats per measure to one beat per measure with increasing tempo, and con-
versely, from one beat per measure to three beats per measure with decreasing
tempo, similar to the transitions between sustained, impulsive and iterative
sounds mentioned above.
Coarticulation means that there is a fusion and contextual smearing of body
motion so that otherwise singular actions fuse into more superordinate trajec-
tories; in other words, body motion creates a context where the present state
of an effector (finger, hand, vocal tract) is determined by what was just done
as well as what is to be done next (Rosenbaum 1991). This means that there
are so-called carryover and anticipatory effects at work in sound production,
something that has been quite extensively studied in linguistics (Hardcastle
and Hewlett 1999) but less so in music (Godøy, Jensenius and Nymoen 2010;
Godøy 2014). This coarticulatory fusion also has consequences for the sound
produced, contributing to a similar contextual smearing of sound and of
motion, resulting in continuous trajectories that in turn are one of the sources
of shape experience in music.
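A schematic way to picture this contextual smearing is to smooth a stepwise series of target positions with a centred moving average, so that each output value depends on targets both before and after it. This is only a toy illustration under our own assumptions (one effector, one dimension, a hypothetical function name `coarticulate` and an arbitrary window length); it is not a model taken from the coarticulation literature.

```python
def coarticulate(targets, window=5):
    """Smooth a sequence of target positions with a centred moving average.

    Each output sample averages past and upcoming targets, giving a crude
    picture of carryover (past) and anticipatory (future) effects.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(targets)):
        lo = max(0, i - half)                 # clip the window at the start
        hi = min(len(targets), i + half + 1)  # and at the end of the sequence
        smoothed.append(sum(targets[lo:hi]) / (hi - lo))
    return smoothed
```

With targets that jump from 0 to 1, the smoothed effector starts to move before the jump (anticipation) and has not yet fully arrived just after it (carryover), so that two singular positions are joined by one continuous trajectory.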
Another element of motor control is that body motion seems to be organized
hierarchically by a series of goals (Grafton and Hamilton 2007). Following
findings from motor control research, this can be understood as a series of
postures, with continuous motion between them (Rosenbaum et al. 2007). For
convenience, we have chosen to use the terms key-​postures and trajectories in
our publications, ‘key-​postures’ denoting the shape and position of the sound-​
producing effectors (fingers, hands, arms, tongue, lips, vocal tract and so on)
and ‘trajectories’ denoting the continuous motion of the effectors between
these key-​postures.
One aspect here is that of continuous versus intermittent motor control, a
much-​debated topic for more than a century (Elliott, Helsen and Chua 2001).
Classical control theory, be that in human motion or machines, stipulates two
basic control schemes, closed loop with continuous feedback adjustment (as
in a thermostat) and open loop with only intermittent control, typically lim-
ited to initiating the motion, as in hitting a golf ball. Closed loop seems plau-
sible enough from everyday experience, in that we adjust our body motion in
response to the effects of our body motion, as in balancing, singing a tone,
bowing on a string instrument and so on. Yet the difficulty here is that all such
adjustment takes time. To avoid delays, there must be some kind of anticipa-
tory cognition at work: we somehow have to have an ‘all-​at-​once’ image of the
ensuing motion trajectory, which is a feature of open loop control, as in hitting
the golf ball. There is mounting evidence for this kind of anticipatory cognition
at work in human motor control, leading to the idea of action gestalts, where
human motion is seen as a series of pre-​programmed motion shapes (Klapp
and Jagacinski 2011).
One result of going deeper into shape cognition is the realization that atten-
tion and effort are unequally distributed, that there is an intermittency of both
attention/​control and effort/​energy influx in body motion. Intermittency in
human motor control is now gaining support from a number of observations
such as the work on action gestalts (Klapp and Jagacinski 2011) mentioned
already and more general human motion control theory (Loram et al. 2011),
and it supplements the evidence for key-​posture-​based action planning and
control (Rosenbaum et al. 2007).
Adapted to our context, we believe music can also be understood as cen-
tred on certain salient moments in time in the form of downbeats and other
accents—​on what we call ‘goal-​points’ in music—​and that the key-​postures
are situated at these goal-​points. These key-​postures and goal-​points in the
music are intermittent, and so there is a fundamental discontinuity at work
in music, albeit a discontinuity that may be forgotten in the face of the con-
tinuous motion trajectories and sound between these goal-​points, as well as
through continuous series of often overlapping chunks in succession, as we
hypothesize is the case at macro timescales in music. Furthermore, we hypoth-
esize that the shape of these postures and trajectories also forms the basis for
sonic shapes. We can see a short example of this in Figure 1.4, where we have
the key-postures at the goal-points of the downbeats and continuous motion
trajectories between these key-postures.

FIGURE 1.4   The score of the first two bars of the last movement of Beethoven’s Piano Concerto
No. 1 (top), and graphs showing the position, velocity and acceleration of the vertical motion of the
right-hand knuckles, wrist (RWRA) and elbow (RELB) in the performance of these two bars. We
clearly see the up–down motion at the downbeats, i.e. at what we call the goal-points, as well as the
relatively high velocity at these points, typical of so-called ballistic motion.
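Velocity and acceleration curves of the kind shown alongside position data are typically derived from the recorded positions by numerical differentiation. The sketch below shows one common way of doing this with finite differences; it is an assumption about a generic workflow, not a description of how the graphs in Figure 1.4 were computed, and in practice motion-capture data would usually be smoothed first.

```python
def finite_difference(samples, sample_rate_hz):
    """Estimate the time derivative of evenly sampled data.

    Uses central differences in the interior and one-sided differences
    at the two ends; returns a list of the same length as `samples`.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / sample_rate_hz
    last = len(samples) - 1
    derivative = []
    for i in range(len(samples)):
        if i == 0:
            d = (samples[1] - samples[0]) / dt
        elif i == last:
            d = (samples[last] - samples[last - 1]) / dt
        else:
            d = (samples[i + 1] - samples[i - 1]) / (2 * dt)
        derivative.append(d)
    return derivative
```

Applying the function once to vertical position gives velocity, and applying it again to the velocity gives acceleration.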

Motion-​sound chunks

On the basis of our own and others’ research, then, we believe that there are sev-
eral elements of musical instruments, body motion and human cognition that
converge in singling out meso timescale motion-​sound chunks as primordial for
the experience of shape in music, elements that may be summarized as follows:
• A number of findings in research on human motor control, memory
and attention point to the meso timescale as special in terms of
meaning in both perception and action.
• More specifically in music, the meso timescale is also sufficient for
perceiving a number of musically salient features such as rhythm,
texture, dynamics, timbre, melodic, harmonic and modal features,
style and genre, and sense of motion and affect.

In the context of music and shape, the meso timescale motion-​sound chunks
are clearly carriers of salient shape experiences in music:
• All sounds are included in some action trajectory, with various
principles of human motion such as phase-​transition and
coarticulation contributing to emergent effects of fused body-motion
and sonic shapes; thus, there is a contextual smearing of otherwise
singular motion and sound elements within the fused chunk.
• This contextual fusion is evident in most musical features, but
in particular in tightly welded units such as various ornaments
(Pralltriller, mordent, turn, etc.) and other figures (all kinds of
rhythmical patterns such as waltz, tango, samba and so on) where
the speed and density of motion and sonic events typically are so
high that anticipatory cognition is required, so that these figures are
conceived and performed as singular, holistic body-motion shapes.

In a phenomenological perspective, motion-sound chunks may be understood
in this way:
• The ‘all-​at-​once’ and the ‘now-​points’ were basically epistemological
arguments of Husserl (and several of his contemporaries) but now can
be understood as grounded in intermittent, serial ballistic, anticipatory
cognition (Husserl 1991; Godøy 2008, 2010b, 2011, 2013).
• Singling out the fusion features of the meso timescale, and
assessing the available evidence here, we hypothesize that musical
experience combines discontinuity with continuity by concatenating
meso-​timescale chunks into macro-​timescale experiences of
continuous music.

Motion-​sound scripts

Although there is converging evidence that the meso timescale is crucial for per-
ceiving very many musical features, we also clearly experience music at longer
timescales: people go to performances of symphonies and operas, participate
in various long-​lasting music-​related events and rituals, or report long-​endur-
ing trance-​like experiences of music. Yet the perception of large-​scale forms in
music seems not to be a well-​researched topic. What we have is a substantial
number of western music analysis texts that assume the efficacy of large-​scale
forms, but the little perceptual-​empirical material that we have come across
suggests that we should be rather sceptical of such claims until further notice
(see for example Eitan and Granot 2008).
Lacking more systematic research in this area, we could assume from our
motor theory perspective that general principles of goal-​directed motor cogni-
tion apply here, so that we understand long sequences as a series of key-​postures
with intervening continuous motion trajectories and may also mentally quickly
run through a long stretch of music, just as we mentally run through a long walk
or a whole journey by a series of landmarks or junctions. This would essentially
amount to understanding large-​scale musical works as extended motion-​sound
scripts, as a series of concatenated and/​or overlapping motion-​sound chunks,
creating a sense of long-​range continuity in musical experience. In addition to
the features of meso-​timescale chunks, the macro timescale may often, by its
longer extension, have new dramaturgical and/​or narrative features. We could
also speculate that such macro-​level motion-​sound scripts in turn could be
envisaged as having shapes, shapes that we could glimpse in an instant, just
as we could envisage a long walk or journey; in other words, the same prin-
ciple of ‘all-​at-​once’ overview images applies here too, as a kind of compressed
‘trailer’ or ‘story board’ for the whole work, as in the famous statement by Paul
Hindemith that ‘If we cannot, in the flash of a single moment, see a composi-
tion in its absolute entirety, with every pertinent detail in its proper place, we
are not genuine creators’ (2000: 61).
What we do know from our research on music-​related body motion is that
we can see some salient global features over longer stretches, such as quan-
tity of motion (essentially a physical measure based on total displacement of
the body or parts of the body within a unit of time), recurrent patterns of
body motion, as well as amplitude, velocity and degree of calmness or agitation
by extracting measures of ‘jerkiness’ in the recorded body-motion shapes
(Hogan and Sternad 2007). Such global features of body motion can in turn
be correlated with various other qualitative observations of affect, style and
genre, providing us with important shape insights also at the macro timescale.
In Figure 1.5 we see such an example of three variant versions of a twenty-second
dance sequence to music by Ligeti, each variant having similar overall
motion qualities, although local details vary.

[Motiongrams (top) and spectrograms (bottom): frequency (Hz) against time (s), 0 to 62.84 s]
FIGURE 1.5   The top part shows motiongrams (i.e. video-based summary images of motion
trajectories; see Jensenius 2013 for details) of three different successive dance performances by
the same dancer to a twenty-second excerpt from Lento from György Ligeti’s Ten Pieces for Wind
Quintet (Ligeti 1998), and the bottom part shows for the purpose of reference three repetitions of the
spectrograms of this excerpt. From this macro timescale view of the dancer’s body motion, we can
clearly see the overall shape (curve out from initial position and back) and mode of motion (mostly
calm but with a few abrupt elements).

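The two global measures mentioned above can be sketched for a single marker moving along one axis. The definitions below are simplified assumptions for illustration: quantity of motion is taken as total displacement per second, and ‘jerkiness’ as the mean squared third derivative of position estimated by finite differences. The function names are ours, and published jerk measures are usually normalized in ways not shown here.

```python
def quantity_of_motion(positions, sample_rate_hz):
    """Total displacement per second along one axis."""
    if len(positions) < 2:
        raise ValueError("need at least two samples")
    duration_s = (len(positions) - 1) / sample_rate_hz
    total_displacement = sum(abs(b - a) for a, b in zip(positions, positions[1:]))
    return total_displacement / duration_s


def mean_squared_jerk(positions, sample_rate_hz):
    """Mean squared third derivative of position, by third-order finite differences."""
    if len(positions) < 4:
        raise ValueError("need at least four samples")
    dt = 1.0 / sample_rate_hz
    jerks = [
        (positions[i + 3] - 3 * positions[i + 2] + 3 * positions[i + 1] - positions[i]) / dt ** 3
        for i in range(len(positions) - 3)
    ]
    return sum(j * j for j in jerks) / len(jerks)
```

A steady glide such as `[0.0, 1.0, 2.0, 3.0, 4.0]` and a back-and-forth motion such as `[0.0, 1.0, 0.0, 1.0, 0.0]` have the same quantity of motion at the same sampling rate, yet only the second has non-zero jerk, which is the kind of difference between calm and agitated motion that such measures are meant to capture.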
While much remains to be done in this area, the idea of shape cognition
seems to be both applicable and useful at the macro timescale, provided that at
this timescale we also succeed in making ‘all-​at-​once’ or ‘instantaneous’ over-
view images of body motion and sonic features.

Thinking shapes in music

The observation that shape metaphors and graphical shape representations
are ubiquitous in music-​related contexts should by itself suggest that there is
a close relationship between shape cognition and sound in musical experience.
But given presently available methods and technologies for recording, analysing
and correlating all kinds of sonic and motion feature data, it should be possible
to make much more systematic explorations of shape cognition in music (see
also Küssner, Chapter 2 below). We now have possibilities for bypassing the
restrictions of western music notation in music research and working directly
with shapes as holistic, nonsymbolic entities in music.
Yet in the face of such optimism, we still face many challenges, first of all
to develop less obtrusive and ecologically valid observation settings for music-​
related body motion, and also to develop better means of data processing,
both of input signals and for exploring various patterns and correlations.
Additionally, there are the enigmas, already mentioned, of how our minds are
able somehow to extract information from a continuous stream of sensations,
to break out of the continuous flux of time and generate more or less stable
overview images, and also to integrate sense modalities—​enigmas that we hope
the cognitive sciences can shed light on in the coming decades.
Despite such challenges of method as well as basic cognitive issues, the great
advantage of shape-​cognition in music is in opening up new areas of musi-
cological, aesthetic and affective psychological research, as well as providing
practical tools in artistic creation, for example in the domains of sonic design
and various kinds of multimedia art. In this connection, thinking and actively
working with shapes in music as was practised several decades ago by Schaeffer
and co-​workers, by what we have called motormimetic sketching of sonic fea-
tures, means embarking on what is essentially a hermeneutical circle of drawing
(mentally, on paper, digitally), listening, drawing, listening, each time creating
a greater awareness of sound and body-motion features as shapes, and in this
process enhancing our understanding of music and other multimedia arts.

References


Berthoz, A., 1997: Le sens du mouvement (Paris: Odile Jacob).
Bregman, A., 1990: Auditory Scene Analysis (Cambridge, MA, and London: MIT Press).
Chion, M., 1983: Guide des objets sonores (Paris: INA/​GRM Buchet/​Chastel).
Cogan, R., 1984: New Images of Musical Sound (Cambridge, MA, and London: Harvard
University Press).
Delalande, F., M. Formosa, M. Frémiot, P. Gobin, P. Malbosc, J. Mandelbrojt and
E. Pedler, 1996: Les Unités Sémiotiques Temporelles: Éléments nouveaux d’analyse musi-
cale (Marseille: Éditions MIM –​Documents Musurgia).
Eitan, Z. and R. Y. Granot, 2008: ‘Growing oranges on Mozart’s apple tree: “inner form”
and aesthetic judgment’, Music Perception 25/​5: 397–​417.
Elliott, D., W. Helsen and R. Chua, 2001:  ‘A century later:  Woodworth’s (1899) two-​
component model of goal-​directed aiming’, Psychological Bulletin 127/​3: 342–​57.
Galantucci, B., C. A. Fowler and M. T. Turvey, 2006: ‘The motor theory of speech percep-
tion reviewed’, Psychonomic Bulletin & Review 13/​3: 361–​77.
Gjerdingen, R. and D. Perrott, 2008:  ‘Scanning the dial:  the rapid recognition of music
genres’, Journal of New Music Research 37/​2: 93–​100.
Godøy, R. I., 1997: Formalization and Epistemology (Oslo: Scandinavian University Press).
Godøy, R. I., 2006: ‘Gestural-​sonorous objects: embodied extensions of Schaeffer’s con-
ceptual apparatus’, Organised Sound 11/​2: 149–​57.
Godøy, R. I., 2008:  ‘Reflections on chunking in music’, in A. Schneider, ed., Systematic
and Comparative Musicology:  Concepts, Methods, Findings (Frankfurt:  Peter Lang),
pp. 117–​32.
Godøy, R. I., 2010a:  ‘Gestural affordances of musical sound’, in R. I. Godøy and M.
Leman, eds., Musical Gestures: Sound, Movement, and Meaning (New York: Routledge),
pp. 103–​25.
Godøy, R. I., 2010b: ‘Thinking now-​points in music-​related movement’, in R. Bader,
C. Neuhaus and U. Morgenstern, eds., Concepts, Experiments, and Fieldwork: Studies
in Systematic Musicology and Ethnomusicology (Frankfurt am Main: Peter Lang),
pp. 245–​60.
Godøy, R. I., 2011: ‘Sound-​action awareness in music’, in D. Clarke and E. Clarke, eds.,
Music and Consciousness (Oxford: Oxford University Press), pp. 231–​43.
Godøy, R. I., 2013: ‘Quantal elements in musical experience’, in R. Bader, ed., Sound—
Perception—​Performance (Berlin: Springer), pp. 113–​28.
Godøy, R. I., 2014: ‘Understanding coarticulation in musical experience’, in Sound,
Music, and Motion: 10th International Symposium, CMMR 2013, Marseille,
France, 15–18 October 2013, Revised Selected Papers (Berlin: Springer),
pp. 535–​47.
Godøy, R. I. and M. Leman, 2010:  Musical Gestures:  Sound, Movement, and Meaning
(New York: Routledge).
Godøy, R. I., E. Haga and A. Jensenius, 2006: ‘Playing “air instruments”: mimicry of
sound-​producing gestures by novices and experts’, in S. Gibet, N. Courty and J.-​F.
Kamp, eds., Gesture in Human-​Computer Interaction and Simulation: 6th International
Gesture Workshop, Lecture Notes in Artificial Intelligence 3881 (Berlin: Springer), pp.
Godøy, R. I., A. R. Jensenius and K. Nymoen, 2010: ‘Chunking in music by coarticulation’,
Acta Acustica united with Acustica 96/​4: 690–​700.
Goebl, W., S. Dixon, G. De Poli, A. Friberg, R. Bresin and G. Widmer, 2006: ‘ “Sense” in
expressive music performance: data acquisition, computational studies, and models’, in
P. Polotti and D. Rocchesso, eds., Sound to Sense, Sense to Sound: A State of the Art in
Sound and Music Computing (Berlin: Logos Verlag), pp. 195–​242.
Grafton, S. T. and A. F. Hamilton, 2007: ‘Evidence for a distributed hierarchy of action
representation in the brain’, Human Movement Science 26: 590–​616.
Haken, H., J. Kelso and H. Bunz, 1985: ‘A theoretical model of phase transitions in human
hand movements’, Biological Cybernetics 51/​5: 347–​56.
Hardcastle, W. J. and N. Hewlett, eds., 1999: Coarticulation: Theory, Data and Techniques
(Cambridge: Cambridge University Press).
Hindemith, P., 2000: A Composer’s World: Horizons and Limitations (Mainz: Schott).
Hogan, N. and D. Sternad, 2007:  ‘On rhythmic and discrete movements:  reflections,
definitions and implications for motor control’, Experimental Brain Research 181/​1:
Hunt, A., M. Wanderley and M. Paradis, 2003: ‘The importance of parameter
mapping in electronic instrument design’, Journal of New Music Research 32/4: 429–40.
Husserl, E., 1991: On the Phenomenology of the Consciousness of Internal Time, 1893–​1917,
trans. J. B. Brough (Dordrecht: Kluwer Academic).
Jensenius, A. R., 2007: ‘Action–​sound: developing methods and tools to study music-​related
body movement’ (PhD dissertation, University of Oslo).
Jensenius, A. R., 2013: ‘Some video abstraction techniques for displaying body movement
in analysis and performance’, Leonardo: Journal of the International Society for the Arts,
Sciences and Technology 46/​1: 53–​60.
Jensenius, A. R. and R. I. Godøy, 2013: ‘Sonifying the shape of human body motion using
motiongrams’, Empirical Musicology Review 8/​2: 73–​83.
Klapp, S. T. and R. J. Jagacinski, 2011: ‘Gestalt principles in the control of motor action’,
Psychological Bulletin 137/​3: 443–​62.
Liberman, A. M. and I. G. Mattingly, 1985:  ‘The motor theory of speech perception
revised’, Cognition 21: 1–​36.
Ligeti, G., 1998: Ten Pieces for Wind Quintet, on London Winds, György Ligeti Edition,
Vol. 7: Chamber Music (Sony SK 62309).
Loram, I. D., H. Gollee, M. Lakie and P. J. Gawthrop, 2011: ‘Human control of an inverted
pendulum: is continuous control necessary? Is intermittent control effective? Is intermit-
tent control physiological?’, The Journal of Physiology 589/​2: 307–​24.
McGurk, H. and J. MacDonald, 1976: ‘Hearing lips and seeing voices’, Nature 264: 746–​8.
Norwald, O., 1969: Lutosławski (Stockholm: Norstedt).
Nymoen, K., 2013: Methods and technologies for analysing links between musical sound and
body motion (PhD dissertation, University of Oslo).
Peeters, G., B. L. Giordano, P. Susini, N. Misdariis and S. McAdams, 2011: ‘The timbre
toolbox: extracting audio descriptors from musical signals’, Journal of the Acoustical
Society of America 130/​5: 2902–​16.
Petitot, J., 1985: Morphogenèse du Sens I (Paris: Presses Universitaires de France).
Petitot, J., 1990: ‘Forme’, in Encyclopædia Universalis (Paris: Encyclopædia Universalis).
Risset, J.-​C., 1991: ‘Timbre analysis by synthesis: representations, imitations and variants
for musical composition’, in G. De Poli, A. Piccialli and C. Roads, eds., Representations
of Musical Signals (Cambridge, MA, and London: MIT Press), pp. 7–​43.
Rosch, E., C. B. Mervis, W. D. Gray, D. M. Johnson and P. Boyes-​Braem, 1976:  ‘Basic
objects in natural categories’, Cognitive Psychology 8: 382–​436.
Rosenbaum, D., 1991: Human Motor Control (San Diego, CA: Academic Press).
Rosenbaum, D., R. G. Cohen, S. A. Jax, D. J. Weiss and R. van der Wel, 2007: ‘The problem
of serial order in behavior: Lashley’s legacy’, Human Movement Science 26/​4: 525–​54.
Schaeffer, P., 1966: Traité des objets musicaux (Paris: Éditions du Seuil).
Schaeffer, P. (with sound examples by G. Reibel and B. Ferreyra), [1967] 1998: Solfège de
l’objet sonore (Paris: INA/​GRM).
Schäffer, B., 1976: Introduction to Composition (Warsaw: PWM Edition).
Sethares, W. A., 2007: Rhythm and Transforms (Berlin: Springer).
Smith, B., ed., 1988: Foundations of Gestalt Theory (Munich and Vienna: Philosophia
Thom, R., 1983: Paraboles et catastrophes (Paris: Flammarion).
Xenakis, I., 1992: Formalized Music, rev. edn (Stuyvesant, NY: Pendragon Press).
Lucia D’Errico, guitarist and graphic designer

There is no optical space in my experience of music. If I leave aside a spontaneous
association of pitches with fields of colour (so flat and vibrant, though,
that they acquire almost a haptic quality), the role of sight is relegated to the
preliminary and purely intellectual moment of musical notation. The shape
that delineates itself when listening to or making music is rather the blind den-
sity of my own body. It is a body subjected to forces of different magnitude that
act from both inside and outside itself.
This shape is kept in dynamic tension by four force lines:  the first (dis-
charge) anchors it to the ground, the second (charge) keeps it upright, the third
(advance) propels it forwards, and the fourth (recoil) backwards. Synchronic
musical elements organize themselves around these lines in a way that is sche-
matized in Figure R.1. Thus, whereas the bass has the role of a hidden region
where both balance and drive are located, melody is the recognizable and com-
municating part, as are the face and the hands. Harmony connects and reg-
ulates these regions like an organ system, and rhythmical elements fulfil the
motoric function. These tensions/​elements can amalgamate, as well as inter-
change functions, as in a harmoniously working human body; but a musician
can also choose to dissociate or to omit some of them. It is in one such case that
we experience the harrowing beauty of the aria ‘Aus Liebe will mein Heiland
sterben’, from Johann Sebastian Bach’s St Matthew Passion. The accompani-
ment of the voice is restricted to high-​sounding instruments only: our breath is
reduced to the length of our air tube, and whatever stands beneath is paralysed,
What is a more wonderful example of this body-​like musical shape than the
song ‘Das ist ein Flöten und Geigen’, from Robert Schumann’s Dichterliebe,
based on a text by Heinrich Heine? A wedding feast is taking place, but that
of the poet’s beloved; he is, so to speak, peeping in through the window. On a
gauche waltz rhythm (the dance of sexual liberation at the time) that reproduces
the musical frenzy of the party, the right hand of the piano weaves a suspended,
almost religious obbligato: ‘Dazwischen schluchzen und stöhnen/Die liebliche
Engelein’ (‘in between, sobbing and groaning,/the lovely little angels’). One
single sonic sensation contains the bodily giddiness of the happy couple and
the dejected inertia of the onlooker.

FIGURE R.1 Schematization of bodily music-shape forces (in colour at )

These four force lines need not be intended as vectors that cause a move-
ment throughout time, but rather as internal potentialities. Advance and recoil
are not forces that establish a chronological order; they interact with it, gen-
erating micro variations and perturbations inside a steady sequential grid.
Advance is not acceleration, but longing. Recoil is not ritardando, but lingering.
Additionally, this bodily shape, so complex and changing in itself, moves inside
another shape, which I would call architectural: the diachronic dimension of
music. Again, it is not an architecture one can see, but rather a space to cross
with blind eyes. This, depending on the levels of complexity, might resemble a
palace, a hut, or even a garden or a desert; it might have varying temperature
and light (but no optical shapes!). As a listener, I am led to move in unexplored
spaces. As a performer, it is I who is trying to lead someone else through an
architecture I know well. As a composer, I conceive this architecture first and
then try to inhabit it until I am ready to distinguish and remember all of its
Strange as it may sound, something very similar happens in my work as
a graphic designer. There are no optical shapes beforehand:  there are forces,
which organize themselves on the empty canvas. The result is not predeter-
mined, but issues from the coagulation of these physical drives into visual ele-
ments. It is not a question of reproducing the visible, but of making visible
(Paul Klee). I do not know in advance the subject I want to design, since it is dictated afterwards
by the arrangement of vectors I perceive somatically. For the same reason, the
habit of organizing music in an optical way as a timeline is as serviceable as it
is misleading. A musical experience is not the sonic rendering of a linear score.
On the contrary, a score should be nothing but the code, the deciphering of
which might recreate a planned spatial and haptic experience in the listener
through sound.

Shape, drawing and gesture

Mats B. Küssner

Human processing of sound and music as a multimodal phenomenon

Music—​as pertaining to the very act of shaping sounds over time during a
performance—​engages most of our senses. As audience members in a concert,
we hear the musical sounds, we see the musicians on stage, and we feel the
rhythmic beat, only to realize that we have been tapping our finger to it, and
perhaps we taste a moment of sweetness during an intensely emotional passage.
Although the latter seems metaphorical, it also seems an apt description, sug-
gesting an underlying mapping from sound experience to taste (Knöferle and
Spence 2012). Even sitting at home and listening to a record in solitude with
eyes closed necessarily entails a multimodal experience as we map features of
the musical sound onto other domains, particularly the spatial and visual. That
is the central argument of the chapter. We feel the melodic line ascending and
descending; we feel we are moving or being moved forward, gently at times or
with sudden force; we sense the brightness or gloominess of some passages; or
perhaps we conjure up internal images that the music invoked in us and that
now become an integral part of our listening experience. How do we map music
onto other domains and why do we do it so readily? In this chapter, I address
the former question in some depth by reviewing studies on individuals’ draw-
ings and gestures in response to musical sounds. I introduce these multifaceted
shapes of sound and music as a way of studying music perception and cogni-
tion empirically, and outline methodological issues and challenges. To begin
with, however, I take a very brief look at some potential explanations for why
these cross-​modal mappings may exist in the first place.

It is possible that our brains have evolved to be equipped with an innate
capacity for auditory-​visual correspondences (Walker et al. 2010), though such
a view is currently contested (Lewkowicz and Minar 2014). What appears to be
undisputed, however, is that learning plays a crucial role in shaping cross-​modal
correspondences (Spence and Deroy 2012). From an evolutionary perspective,
by far the most common mode of music listening is the experience of musical
sound emerging from social contexts. In communal activities—​perhaps origi-
nally serving the purpose of group cohesion and bonding (Roederer 1984)—​
we see and hear sounds being produced by our conspecifics who use various
gestures, postures and possibly instruments to create and/​or accompany musi-
cal sounds. Indeed, the earliest couplings of visuo-​spatial and auditory cues
are likely to happen in parent–​infant interactions such as mothers singing to
their child (Trehub and Trainor 1998), displaying a wide range of (exaggerated)
expressive behaviour (e.g. facial expressions, gestures, etc.). And while we form
cross-​modal associations by observing others making sounds and music, we
are also sensitive to cross-modal mappings of music in our own bodies, for
instance when we sing. Through proprioceptive feedback, we are able to feel
the rise of our larynx when producing a high-​pitched sound with our voice
(Parkinson et al. 2012), or we might notice the raising of our eyebrows (Huron,
Dahl and Johnson 2009; Huron and Shanahan 2013). Through repeated expo-
sure to such couplings of perception and action, we form stable associations
between the actions performed and the sounds being heard, to the point where
both form a common representation (Prinz 1990).
Many of the effects found in cross-​modal perception of music, and indeed
perception in general, have their origin in speech perception and cognitive lin-
guistics. The idea that the perception of speech is not merely the processing of
physical properties of the sound but largely based on an internal simulation of
the actions that produced the sound—​formalized in the motor theory of speech
perception (Galantucci, Fowler and Turvey 2006; Liberman and Mattingly
1985)—​has been highly influential in the cognitive sciences. But, of course,
apart from the biological mechanisms, the impact of culture—​e.g. through
language—is evident and manifested in cross-cultural differences in mappings
of pitch, for instance (Dolscheid et al. 2013; Eitan and Timmers 2010).
Influentially, Lakoff and Johnson (1980) argued that conceptual metaphors are
based on our experiences and interactions within a cultural environment, shap-
ing the way we think and perceive the world. That is, we may use our experi-
ence of MORE IS UP, LESS IS DOWN—​originally referring to the numerous
instances in the physical world (i.e. the so-​called source domain)—​and map it
onto an abstract domain (i.e. the target domain) such as the pitch space where
MORE refers to higher pitches (see Zbikowski 2002).
All of these accounts have in common cross-​modal experiences shaped by
bodily experiences within a particular cultural environment. In terms of cross-​
modal mappings of music, my account is in line with scholars arguing that the
interaction of modalities is the primordial mode of music listening (Godøy
2003) and that music perception is a multimodal phenomenon rooted in our
bodies as a natural mediator between the physical world and musical experi-
ence (Leman 2007). If the body plays a central role in making sense of music,
then studying music perception through overt bodily responses such as draw-
ings and gestures should tell us something about this meaning-​making process.

Traditional experimental paradigms of cross-modal correspondences

How sound and music are mapped onto the visual and visuo-​spatial domains—​
with paradigms other than drawing or gesturing—​has been reviewed at length
elsewhere (Eitan 2013; Spence 2011) and is not discussed here. However, it is
important to review the experimental paradigms underlying the vast majority
of empirical findings to date to be able to put drawing and gesturing approaches
into context.
To a large extent, increasing knowledge of cross-​modal correspondences is
based on reaction-​time paradigms that were developed by Garner in the 1960s
around the same time that the cognitive revolution gained momentum, with the
underlying metaphor of the human mind as a computer processing incoming
information.1 According to this view, sensory input from different modalities
is integrated at various levels of processing ranging from early sensory/​percep-
tual levels to late semantic levels (for a review see Marks 2004). The speed with
which this processing occurs can be measured in behavioural experiments in
which participants respond to features of a dimension of a modality by press-
ing buttons which have been assigned certain feature values. In the simplest
case, there is only one modality involved, and features are varied only along
one dimension. For instance, participants may be asked to indicate as quickly
as possible whether the pitch (i.e. the relevant dimension) of a sound is high
or low, while the loudness (i.e. the irrelevant dimension) is kept constant. This
task—​which has been termed ‘speeded identification’—​often serves as a base-
line condition, involving two possible stimuli and two possible responses. If
the irrelevant dimension is varied as well (e.g. loudness: soft and loud), we get
four possible stimuli (high/​soft, high/​loud, low/​soft, low/​loud) while the num-
ber of possible responses is still two. In the latter scenario—​‘speeded classifica-
tion’—​participants’ task is to ignore the variation in the irrelevant dimension
(i.e. loudness) and indicate the feature value (high versus low) of the relevant
dimension (i.e. pitch). While these examples concern a single modality, there is
extensive research combining dimensions from several modalities (for a review
see Spence 2011). Whenever reaction times are longer in comparison to
a baseline condition due to the variation of features in an irrelevant dimen-
sion or stimulus, this is referred to as ‘Garner interference’. On the other hand,
whenever features from two dimensions—whether within a single modality or
across modalities—​are aligned congruently (e.g. high pitch, high elevation)
such that the pairing gives rise to shorter reaction times in comparison to
incongruently aligned features from the same two dimensions (e.g. high pitch,
low elevation), this is referred to as a ‘congruence effect’.
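
Both effects reduce to differences between mean reaction times. The short Python sketch below is an illustration only: the reaction times are invented placeholder values, not data from any study cited here, and the variable and function names are hypothetical. It assumes pitch as the relevant dimension and visual elevation as the irrelevant one.

```python
from statistics import mean

# Invented reaction times in milliseconds, for illustration only.
# Baseline (speeded identification): the irrelevant dimension is held constant.
baseline_rts = [412, 398, 405, 420]

# Speeded classification: the irrelevant dimension varies as well.
# Each trial is (pitch, elevation, reaction time).
classification_trials = [
    ("high", "high", 430),
    ("high", "low", 470),
    ("low", "low", 435),
    ("low", "high", 465),
]


def garner_interference(baseline, trials):
    """Mean RT with a varying irrelevant dimension minus mean baseline RT."""
    return mean(rt for _, _, rt in trials) - mean(baseline)


def congruence_effect(trials):
    """Mean RT for incongruent pairings minus mean RT for congruent pairings."""
    congruent = [rt for pitch, elevation, rt in trials if pitch == elevation]
    incongruent = [rt for pitch, elevation, rt in trials if pitch != elevation]
    return mean(incongruent) - mean(congruent)


print(garner_interference(baseline_rts, classification_trials))
print(congruence_effect(classification_trials))
```

A positive first value would indicate Garner interference, and a positive second value a congruence effect. A real analysis would of course test such differences statistically across participants.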
In such reaction-​time experiments it is important either to balance the
position of the response buttons across participants or to manipulate it
deliberately as a further independent variable due to the well-​studied effects
of stimulus–​response compatibility (Fitts and Seeger 1953). These repre-
sent another classic paradigm within which one may study cross-​modal cor-
respondences. Crucially, the role of the participants’ actions, in the form
of button presses, becomes an integral part of the cross-​modal mapping.
For instance, in an experimental setting where the two response buttons for
high and low pitch are arranged vertically, a high pitch is classified more quickly as
‘high’ when the corresponding button is the upper rather than the lower one
(Rusconi et al. 2006).
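
Balancing button positions across participants can be sketched as follows. This is a minimal Python illustration under stated assumptions: a vertical two-button layout, hypothetical participant identifiers, and a simple alternating scheme. It is not the procedure of any particular study cited here.

```python
from itertools import cycle

# The two possible assignments of response labels to a vertical button pair.
MAPPINGS = (
    {"upper": "high", "lower": "low"},  # compatible with the pitch-height mapping
    {"upper": "low", "lower": "high"},  # incompatible with it
)


def counterbalance(participant_ids):
    """Alternate the two mappings so each is used equally often (for an even n)."""
    return dict(zip(participant_ids, cycle(MAPPINGS)))


# Hypothetical participant identifiers.
assignment = counterbalance(["P01", "P02", "P03", "P04"])
```

Treating the mapping as a further independent variable would instead mean giving every participant both layouts, in separate blocks, and comparing reaction times between them.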
Besides the development and refinement of tasks involving speeded
responses, there is an even older type of paradigm concerned with unspeeded
responses. In fact, most of the early cross-​modal mapping experiments con-
sisted of unspeeded tasks, asking participants to locate sounds with different
discrete pitches in space (e.g. Pratt 1930; Trimble 1934).
Another commonly observed unspeeded task is forced-​choice matching.
When employing such a paradigm, individuals are asked to choose from a lim-
ited set of responses—​there may be several but in some cases as few as two—​
the one they think fits best with a stimulus presented. In a series of experiments,
Walker (1987) asked people to match pure tones which varied in frequency,
amplitude, waveform and duration with abstract visual figures which varied in
vertical and horizontal arrangement, size, pattern and shape. But ‘real’ musi-
cal excerpts and prints of paintings have also been used in one of the earliest
empirical studies in which participants were asked to match musical sound to
pictorial representations (Cowles 1935).
All paradigms described thus far have in common that participants’
responses are fairly restricted. While this allows researchers to investigate cross-​
modal mappings rigorously by refining their paradigms and manipulations fur-
ther and adding to an ever-​increasing body of evidence, the rigour comes at
the cost of richer, qualitative data which provide another fruitful angle on the
object of study: this is why researchers have applied paradigms involving open-​
ended responses. Studying cross-​modal mappings of sound and music with
free drawings and other bodily gestures opens up new pathways for enquiry. In
the following two sections, I provide an in-​depth summary of studies applying
drawing and gesturing paradigms in order to investigate the perceived shape of
sound and music.

Drawings of sound and music

Children’s drawings of sound and music have been studied extensively, creat-
ing a large body of empirical evidence and proving influential for studies with
adults. They are thus reviewed here in some depth before moving on to adults’
drawings of sound and music.



Children’s drawings have played an important role in psychology as it has
been argued that they form a window onto a child’s cognitive development
(Hargreaves 1978; Olson 1970; Piaget and Inhelder 1973; Werner 1980). In a
musical context, drawings of simple sound stimuli and musical excerpts might
thus be seen as insights into music cognition and the development of musi-
cal thinking (Davidson and Scripp 1988). Even though it is a moot question
exactly what these drawings represent—​windows onto, or rather reflections of,
musical thinking (Barrett 2000)—​they have been studied extensively since the
end of the 1970s, owing to two broadly shared assumptions among researchers
(Barrett 2005: 125): first, young children may not yet have developed the
language to express their musical thinking adequately, and second, some musical
experiences may defy linguistic descriptions and would be better and more
revealingly described nonverbally.
In a series of seminal experiments investigating visual representations of
simple rhythmic fragments, Bamberger (1980, 1982) paved the way for numer-
ous studies investigating children’s, as well as adults’, invented notations
of  music. On the basis of shapes produced in her experiments, in which she
asked children aged four to twelve years first to clap a simple rhythm and then
to draw it, Bamberger (1982) proposed a developmental trajectory from ‘rhyth-
mic scribbles’ mimicking the clapping action with the pen, through figural
representations capturing perceptual groupings of the sounds, to metric rep-
resentations displaying the awareness of an underlying metric pulse by assign-
ing each symbol a particular duration. However, this Piagetian view, in which
each stage is replaced by the next, has been challenged by evidence showing
that children acquire a ‘database of strategies’ (Barrett 2005: 130), using one
or several approaches that seem most appropriate given the nature of the task
and the stimuli (Reybrouck, Verschaffel and Lauwerier 2009). For example,
Upitis (1987), among others who extended the work on visual representations
of rhythmic sequences (Davidson and Colley 1987; Davidson and Scripp 1988;
Smith, Cuddy and Upitis 1994), found that, regardless of musical training,
children aged seven to twelve years are all able to make sense of rhythm by
using figural or metrical representations or a combination of both types. Upitis
used various active and passive rhythm tasks—including clapping rhythms,
drawing (and recognizing drawn) rhythms, verbal interpretations and tapping
along—and found that children draw on a large pool of representational strat-
egies. Importantly, she also emphasized the role of context, and was able to
show in subsequent studies that children are much less likely to represent the
rhythmic structure if it is embedded within an unknown melody (Upitis 1992).
Only when the pitch structure is fairly simple (e.g. an ascending scale) and the
rhythmic structure more complex do children show a more elaborate visual
representation of the rhythmic structure (Upitis 1990).
These findings are echoed by Davidson and Scripp (1988: 222), who call for
‘increasingly divergent paths of rhythm and pitch in representational devel-
opment’, seeing ‘rhythm and pitch in a figure-​ground relationship, that is, the
rhythmic “figure” in isolation becomes “ground” when pitch is introduced into
the context of the phrase’ (ibid.: 226). In a musical culture based largely on
pitch, it is perhaps not surprising that children prefer, and find it easier, to draw
the pitch rather than the rhythm of a melody. Recent findings support this ten-
dency: Verschaffel et al. (2010) found that stimuli whose salient feature is the
pitch or the melody give rise to more differentiated visualizations than stimuli
whose salient feature is related to either rhythm or dynamics.
These are all examples of individual musical parameters studied either in
isolation or within the context of simple musical fragments. The question of
whether findings from such studies can and should be generalized to ‘real’
musical excerpts is currently debated (Elkoshi 2002; Reybrouck et al. 2009;
Verschaffel et al. 2010). Asking more than one hundred children aged seven to
eight-and-a-half years to draw rhythmic sequences that were either produced
(in isolation) by the children or part of a musical excerpt they listened to,
Elkoshi (2002) found no correlation between the visualizations of short sound
fragments and the musical excerpt, arguing that this gap cannot be closed and
that one may not infer from one to the other (see also Reybrouck et al. 2009).
On the other hand, more recent evidence suggests that such a correlation may
well exist (Verschaffel et al. 2010). Testing a comparably large group of
eight-to-nine- and eleven-to-twelve-year-olds, with and without musical training,
revealed that the quantity of differentiated visualizations2 in response to short
simple sound stimuli, each of which had been designed to highlight one spe-
cific musical parameter (pitch, duration and loudness), correlated positively
with the quantity of differentiated visualizations in response to real musical
excerpts, chosen to highlight three corresponding musical features (melody,
rhythm and dynamics).
Since proponents of Gestalt approaches may well have a point, it is important
to look at some of the evidence from real/​complex musical excerpts. Gromko
(1994) asked sixty children aged four to eight years to sing or play a short
folk song provided by the author and then to ‘write the way the song sounds’
(ibid.: 139). Moreover, the children’s perceptual discrimination was tested in
a standardized rhythm and tonal task. Results revealed a positive correlation
between the musical understanding rating, computed on the basis of the per-
formance in the singing/​playing and the perceptual discrimination task, and the
depiction of rhythmic and tonal elements in their invented notations, suggest-
ing that representation—​alongside the more traditional measures of produc-
tion and perception—​may indeed reflect the development of children’s musical
understanding. Comparing invented notations of familiar and unfamiliar
melodies of fifty children aged six to nine years with no formal musical train-
ing outside school, Upitis (1990: 94) found that the most commonly produced
shapes and symbols are ‘(a) icons, (b) words, (c) discrete marks for pitches and/​
or durations, and (d) continuous lines for pitch and/​or mood’. While there was
no apparent effect of age, an effect of familiarity showed that words and pic-
tures were more common for familiar songs—​according to the children, that is
enough to recognize the tune—whereas discrete symbols for pitch were more
common for unfamiliar songs. Using the same familiar song as Upitis, ‘Twinkle,
Twinkle, Little Star’, and testing twenty Suzuki-​trained3 children aged five to
ten years (duration of training varying from seven months to four years), Hair
(1993) found that apart from the youngest children who used pictures only,
the choice of pictures, icons, music symbols, and abstract lines and shapes was
similarly distributed across levels of age and musical training.
I have shown already that there is no clear developmental trajectory of the
strategies for representing music, but evidence pertaining to the influence of
musical training is contradictory: some researchers have found that increased
levels of musical training in children lead to more differentiated visualizations
of sound and music (Reybrouck et al. 2009; Verschaffel et al. 2010), while oth-
ers have found no effect of training (Hair 1993; Upitis 1987), suggesting that a
great deal depends on the nature of the task and the stimuli.
In one case, musical training even appeared to be detrimental to the accuracy
of the visual representations (Davidson, Scripp and Welsh 1988). The authors
asked more than four hundred musically trained and untrained children, ado-
lescents and adults to notate the two songs ‘Row, Row, Row Your Boat’ and
‘Happy Birthday’. More than 90 per cent of the trained participants aged
twelve to eighteen years were unable to produce a correct conventional nota-
tion for the pitch of ‘Happy Birthday’, while their invented notations showed
fewer errors. Making what the authors called ‘concept-driven errors’, many
trained participants assumed that the first and last notes of ‘Happy Birthday’
had to be the same and erroneously ‘corrected’ their invented notations too.
However, a group of trained participants who had focused exclusively on learn-
ing to sight-​read songs relied more on their perceptual abilities and made no
conceptual errors, providing evidence that the kind of musical training children
receive significantly affects their musical understanding.
Finally, a study investigating both visual and kinaesthetic responses to
music is particularly pertinent here (Kerchner 2000). Twelve musically
trained and untrained children aged seven to eight years and ten to eleven years
listened to the first movement of Bach’s Brandenburg Concerto No. 2 and
described their listening experience verbally, visually (by creating a ‘listening
map’) and kinaesthetically (by moving their body). The most commonly
addressed ‘perceptual topics’ included ‘instrument, register, continuous motion,
formal sections, repetition, dynamics, tempo, contour, and pattern’ (ibid.: 36–7).
The type of visualization was dependent on age: the younger group created less
differentiated mappings—​drawing pictures, the contour or the instruments—​
whereas the older group used words and combinations of shapes to represent
both extramusical properties (e.g. mood) and musical parameters such as the beat.
Regarding the kinaesthetic responses, both groups depicted a broad variety of
musical parameters such as ‘beat, subdivided beat, articulation, melodic rhythm,
embellishment, duration, style, phrase, subphrases and motivic fragments, con-
tour, form, and pattern’ (ibid.: 42). Perhaps expectedly, both the visual and the
kinaesthetic responses were more differentiated than the verbal responses.
If the assumption that some musical experiences defy linguistic descriptions
is correct, the same should hold for adults. Indeed, some of the studies aimed
at uncovering aspects of children’s musical understanding through visual rep-
resentations have included adult participants as well. Davidson et al. (1988)
reported that invented notations of ‘Happy Birthday’ by seven-​year-​olds are
comparable to those of ten-​year-​olds and untrained adults. Moreover, it was
revealed that children older than nine years, as well as musically untrained
adults, show very stable figural representations, while only participants able to
read music display fully developed metric representations (Bamberger 1982).
Smith et al. (1994) found similar drawings of rhythmic sequences across groups
of musically untrained children and trained and untrained adults. In the next
section I focus on adults’ drawings of sound and music in more detail.



Compared to the amount of evidence accumulated from children’s drawings,
that available from adults is considerably smaller, although what evidence does
exist is motivated by a greater variety of research questions. As in studies with
children, there exists a distinction between simple sound stimuli and more com-
plex musical excerpts. Regarding the former type, few studies have been carried
out thus far, approaching the subject from a number of angles.
Influenced by the theorizing of composer and pioneer of musique concrète,
Pierre Schaeffer (1966), work from the fourMs research group4 is pertinent here
(Godøy et al. 2006; Haga 2008). Schaeffer proposed that through repeated expo-
sure to sound segments listeners should disengage from the sound source and
focus entirely on the sonic event, something referred to as acousmatic listening.
By drawing a sonic event on paper over and over again (or simply imagining
the shape of it in one’s mind), otherwise hidden or inaccessible features of the
musical object are supposed to be revealed. Godøy and colleagues (Godøy, Haga
and Jensenius 2006; Godøy 2010) and Haga (2008) tested this in an exploratory
study in which they asked nine participants with varying degrees of musical
training to represent short sound fragments (2–​6 seconds long) with a pen on
an electronic graphics tablet. The sound stimuli—​produced with traditional and
electronic instruments, as well as taken from the environment—​were categorized
according to a typology proposed by Schaeffer (1966), and comprised impulsive,
continuous and iterative sounds, whereby both pitch and timbre were classi-
fied into ‘stable’, ‘unstable/​changing’ and ‘undefined’. Although this study was
more concerned with the hand gestures and participants were unable to see the
trace they were creating, the analysis was based on the resultant drawings. It
was revealed that, regardless of their level of expertise, individuals are fairly
consistent, for example in representing pitch with height and the decay of a
percussive sound with a descending line, but differ in respect of sound segments
with multiple features such as a constant pitch and changing timbre, which some
participants represented with a horizontal line while others drew curved shapes.
Küssner and Leech-​Wilkinson (2014) were particularly interested in the
influence of musical training on such ‘sound-​tracings’, and they asked forty-​
one musically trained and thirty untrained individuals to represent visually a
set of pure tones varied in pitch, loudness and tempo. Because this study was
concerned particularly with the visual representations of the sound stimuli,
participants could see their drawings (also made with a pen on an electronic
graphics tablet) on a screen in front of them. Unlike the experimental procedure by Godøy and
colleagues in which participants drew after they heard the sounds, in this study
individuals were asked to draw along with the sound as it was played. Küssner
and Leech-​Wilkinson found that, overall, pitch is represented on the vertical
axis and loudness with the size. However, representational strategies chosen
by untrained participants were more varied than those of trained participants.
On the other hand, a comparison of the subgroups of trained and untrained
participants who explicitly stated that they used height for pitch and size for
loudness revealed that musically trained participants are more accurate than
untrained ones, possibly because trained participants’ perception–​action cou-
plings have been shaped more extensively.
Another study worth noting here focused on a cross-​cultural comparison
of visual representations of sound between the UK, Japan and Papua New
Guinea. Using simple sound stimuli varying in pitch contour and asking par-
ticipants to create marks on a sheet of paper so that other community members
could associate them with the sound heard, Athanasopoulos and Moran (2013)
found that UK participants and Japanese participants familiar with western
notation used the y axis for pitch and the x axis for time, proceeding from left
to right. Participants from a traditional Japanese music background depicted
time vertically, starting at the top and moving down, which probably relates to
traditional Japanese writing. While both UK and Japanese participants used
symbolic representations, Papua New Guineans showed iconic representations,
depicting aspects not deliberately manipulated by the authors such as timbre
(e.g. flute sound) or loudness.
These three studies already give an idea of how drawing paradigms can be
applied in a variety of contexts to address important questions related to music
perception and cognition. But, of course, it is vital to consider drawings of
‘real’ musical excerpts too.
Possibly the earliest study examining listeners’ drawings of music is that
of Hooper and Powell (1970), who sought to shed light on the influence of
the type of music (absolute versus programme), the activity during listening
(accompanying with rhythm instruments versus sitting still and listening care-
fully or for pleasure) and the presentation mode (live versus recorded). Their
results revealed that participants’ drawings were more elaborate when the music
was ‘absolute’, the participants rhythmically engaged and the presentation live.
Discussing their findings in terms of music education, the authors suggest that
especially liveness and participation may give rise to increased visual imagery.
Gromko (1995) investigated the extent to which drawing responses of adults
without formal musical education reflect their musical understanding. To that
end, she presented her 127 participants with various excerpts of classical music
and asked them to ‘create an iconographic representation of the musical sound,
using lines, shapes, or graphics’ (ibid.: 34) and to provide verbal descriptions
of the excerpts. Results revealed that fewer than 50 per cent indicated musical
properties, such as melodic lines, rhythmic groupings or dynamics, in their
drawings, and fewer than 25 per cent did so in their verbal descriptions. Of those
who did, fewer than 5 per cent represented anything beyond rhythmic elements
(‘enactive scribbles’). Given that 60 per cent reported some involvement in
musical performance activities in high school, Gromko concluded that visual
representations reveal little about individuals’ musical understanding.
According to the work of Tan and Kelly (2004), however, there is a clear
difference between musically trained and untrained participants. Unlike other
researchers, the authors presented individuals with short but complete musical
compositions and took great care to suggest as little as possible in their instruc-
tion, since mentioning ‘shapes’, ‘lines’ etc. might have influenced individuals’
choices. It was revealed that trained participants by and large show abstract
representations, focusing on musical properties such as melodic themes, repeti-
tion or timbre. On the other hand, musically untrained participants chose to
depict extramusical ideas such as associative pictures including narratives and
emotions. Their drawings also often included the listener as an agent or nar-
rator. This trend was confirmed in the study by Küssner and Leech-​Wilkinson
(2014) in which two short musical excerpts led musically trained participants
to represent their pitch contour, while some untrained participants changed
the strategy they had applied for the pure tones and chose to create pictorial
representations based on associative ideas.
Finally, a drawing study within a clinical setting is worth mentioning here.
De Bruyn and colleagues (2012) asked a group of participants with Autism
Spectrum Disorder (ASD) and a group of controls to draw along with vari-
ous musical excerpts on an electronic graphics tablet, focusing on either the
rhythmic structure or the melodic contour. Results revealed that both groups
performed equally well in the rhythm condition, but participants with ASD
performed slightly better in the melody condition. Overall, the results are inter-
preted as evidence that patients with ASD have no difficulty imitating struc-
tural aspects of the music.5
All of the drawing studies reviewed here can be regarded as involving special
(two-​dimensional) types of gestures as well—​gestures with the side effect of
creating a visible trace on paper or screen. Next, I turn to empirical evidence of
‘proper’ three-​dimensional gestures in response to sound and music.

Gestural representations of sound and music

Empirical research on gestural representations of sound and music is still in
its infancy. To clarify, by ‘gestural representation’ I refer here to experimen-
tal paradigms in which individuals were specifically asked to represent or
depict aspects of (musical) sound with hand gestures.6 Similar to drawing
approaches—​though far more sparse—​the first studies on gestural representa-
tions of music were carried out with children in a music educational context to
explore new methods of music listening and to shed light on the development
of musical understanding (Espeland 1987; Kerchner 2000). In a more recent
study, Kohn and Eitan (2009) investigated five- and eight-year-old children’s
gestural responses to sound stimuli varied in pitch, loudness and tempo. Their
analysis was based on a procedure in which independent observers trained in
Laban Movement Analysis7 were asked to rate the children’s movements along
the three spatial axes, as well as their muscular energy and the speed. Results
revealed that pitch was associated with the vertical axis, loudness with the ver-
tical axis and muscular energy, and tempo with speed and muscular energy.
More specifically, increase in loudness gave rise to upward movements and
heightened muscular energy, while decrease in loudness resulted in down-
ward movements and lowered muscular energy. Pitch was represented with an
upward–​downward movement when the pitch contour was increasing–​decreas-
ing. However, decreasing–​increasing pitch contour did not lead to consistent
downward–​upward movements, a result which is interpreted in light of a com-
monly observed bias for increasing–​decreasing contours (Eitan and Granot
2006; Küssner et al. 2014).
Küssner et al. (2014) ran a similar experiment with adult musically trained
and untrained participants who were presented with a series of pure tones con-
currently varying in pitch, loudness and tempo. The authors found that—​just
as with drawing approaches—​musically trained participants show more accu-
rate pitch–​height mappings. It was also revealed that the bias for increasing–​
decreasing contours does not hold for musically trained participants. There
were multiple strategies for representing the loudness, depending on the com-
plexity of the stimulus: if only loudness was changed, participants associated
the loudness with both the y axis and the z axis; in more complex stimuli, loud-
ness was associated with muscular energy, operationalized as fast shaking-​
hand movements. Tempo was associated with the speed of hand movement
and elapsed time with the x axis (see also Küssner 2014). Moreover, this study
revealed interaction effects between the concurrently manipulated auditory
features—​such as pitch and loudness affecting the association between tempo
and speed of hand movement—​suggesting that gestural mappings of isolated
musical parameters should not automatically be generalized to more complex
auditory stimuli such as music.
Another study investigating isolated musical parameters was carried
out by Nymoen et al. (2011), who asked participants to move a rod in
response to pitched and nonpitched sounds. It was revealed that pitch was most
strongly associated with the vertical axis and loudness with speed and hori-
zontal movements. Using a more restricted instruction, Kozak, Nymoen and
Godøy (2012) asked their participants to carry out either smooth or discon-
tinuous circular hand movements in response to sound stimuli manipulated in
rhythmic complexity, attack envelope, pitch, loudness and brightness. Focusing
on individuals’ ability to synchronize with the sound stimuli, they found that
discontinuous movement patterns resulted in better synchronization, with
musically trained participants performing more accurately in some trials only,
but never worse than untrained participants. Moreover, smooth attack enve-
lopes resulted in more motion, regardless of musical training.
Investigating gestural responses to everyday sounds (more specifically,
action- and nonaction-related sounds), Caramiaux et al. (2014) tested
the hypothesis that action-​related sounds would give rise to sound-​producing
gestures, whereas nonaction-​related sounds would entail the representation
of their sonic shape. Confirming their hypothesis, the authors discovered that
speed profiles of participants’ movements were more similar for nonaction-​
than for action-​related sounds. It was suggested that the identification of the
source of an action-​related sound (e.g. pouring cereal into a bowl) leads to
more idiosyncratic hand gestures than tracing the sonic shapes of nonaction-​
related sounds.
There are very few studies investigating how adults represent music—​that is,
‘real’ musical excerpts as opposed to a set of musical features (pitch, loudness,
timbre, etc.)—​with three-​dimensional hand gestures. Haga (2008) asked three
trained dancers and three untrained individuals to respond with spontaneous
gestures to various musical excerpts including pieces by Vivaldi and Ligeti and
one electronic piece of music composed for the purposes of the study. The
results of this observational study showed that there was broad consensus
among trained and untrained participants. The more detailed and complex the
musical excerpt, the more variation was observed in the gestures. Interestingly,
the dancers were often seen adding their own interpretative gestures to fill parts
in the musical excerpts in which a pulse was missing (see also Küssner 2013).
Moreover, it was observed that dancers developed their gestures on repeated
presentation of a musical excerpt, remembering what they had done previously
and exploring further gestural shapings of the music.
In a study using more restricted hand gestures, (western) participants
were asked to move a joystick in response to three pieces of traditional Chinese
guqin music (Leman et al. 2009). Participants repeatedly listened and gestured
along with the music over four sessions in which each piece was presented
twice consecutively. The findings revealed that the relative number of consistent
responses—​i.e. similar velocity patterns—​grew over the course of the experi-
ment, especially for the two more melodic pieces of the experimental stimuli.
These two melodic pieces also led to progressively similar movement responses
across participants, while the third piece—​described by the authors as having ‘a
more narrative character with less fluent melodic line’ (ibid.: 264)—​gave rise to
increasingly idiosyncratic movement responses. Besides recording participants’
movements, Leman and colleagues also recorded the movements of the musi-
cian and correlated them with the listeners’ movements. It was found that the
correlation between the musician’s shoulder movement and the participants’
arm movement strengthened over the course of the experiment for the two
melodic pieces, suggesting that the movement velocity patterns are shared not
only between listeners but also to some extent between musician and listener.
In a more recent study by the same group (Maes et al. 2014), the relationship
between music and movement was investigated by comparing listeners’ free
movement responses to music with their linguistic descriptions of the expres-
sive qualities of the music. The musical excerpt used in this study was the begin-
ning of the first movement of Brahms’ First Piano Concerto. The participants
were told: ‘[t]‌ranslate your experience of the music into free full-​body move-
ment. Try to become absorbed in the music that is presented and express your
feelings into body movement. There is no good or wrong way of doing it. Just
perform what comes up in you’ (ibid.: 71). While the participants moved during
the whole length of the excerpt, the authors identified three respective ‘heroic’
and ‘lyric’ passages, each thirty seconds long, for the purpose of their analyses.
On the basis of Laban’s Effort–​Shape model, participants rated the expressive
qualities of the excerpts on a bipolar scale consisting of twenty-​four adjectives,
sixteen pertaining to effort and eight to shape. Using a motion-capture system,
the researchers extracted seven movement features and matched them to the
effort and shape categories. Results revealed that all the movement features
clearly differentiated between the two types of excerpt. For instance, if the
average value for ‘acceleration’ was high for the heroic passages, it was low for
the lyric passages. Moreover, there was an effect of musical training, show-
ing that trained participants achieved higher values for the movement features
‘size’ and ‘height’. This suggests that they moved more and filled more space
with their gestures during the experiment, possibly because they were more
familiar (and comfortable) with the music. Regarding the analysis of the lin-
guistic expressions, there was much agreement among the participants as to
how well a particular adjective described the expressive qualities of the music.
Furthermore, it was found that the extremes of the movement features corre-
lated with the extremes of the adjective scales such that an excerpt which was
rated, for instance, as conveying the expressive qualities ‘big’, ‘broad’, ‘thick’
and ‘exalting’ also gave rise to a high value for the movement feature ‘size’. The
authors interpret their findings as evidence for the sharing of expressive quali-
ties of music in linguistic expressions and body movements.
Having reviewed both drawing and gesture studies and shown the diversity
of contexts in which they were carried out, I now focus on some methodologi-
cal issues and how they can be addressed in future studies.

Methodological issues

When cross-​modal mappings of auditory stimuli are studied, the outcome will
depend to a large extent on the specifics of the experiment such as the choice
of stimuli, the experimental setting and the instruction given to participants.
By discussing some of the issues involved I hope to provide a helpful, if by no
means exhaustive, overview for researchers who wish to carry out experiments
on cross-​modal mappings of sound and music.


The dichotomy between isolated sound features and ‘real’ music is not specific to
the study of cross-modal mappings but can be found in any other field in which
researchers have to face the problem of the
whole versus its parts. Unlike psycho-​acousticians who exclusively work with
highly controlled, synthesized sound stimuli, music researchers are particularly
concerned with the unravelling of cross-​modal mappings of real music, and a
broadly accepted way to study these is to investigate its constituent parts such
as pitch, loudness or timbre. As I have shown above, the problem with study-
ing characteristics of musical sounds in isolation (e.g. change in pitch) is the
creation of an ontological gap: we cannot be sure that findings from studies
using synthesized pure tones in order to investigate cross-​modal mappings of
pitch apply equally to situations in which we listen to the changing pitches of
a musical performance. There are too many other factors involved in the latter
that render generalizations problematic. On the other hand, the choice of ‘real’
musical excerpts as experimental stimuli gives rise to a number of confounding
variables since it is unavoidable that other musical qualities such as dynamics
or articulation—​or at least timbral qualities—​will be co-​varied with pitch. This
makes it difficult, if not impossible, to study causal links. I therefore suggest
that researchers should, whenever possible, include both types of stimuli in
their experiments (e.g. Eitan and Timmers 2010; Küssner and Leech-​Wilkinson
2014) in order to get a better idea of the extent to which findings from highly
controlled psycho-​acoustical stimuli hold true for musical excerpts, and also of
the extent to which findings from studies using musical excerpts can be repli-
cated by manipulating the musical sound feature of interest in isolation.


A further option, which might be seen as an attempt to bridge that ontological
gap, may be to synthesize auditory stimuli that resemble ‘real’ musical
sounds, as can be achieved through the use of MIDI. For instance, Eitan
and Granot (2006) used synthesized piano sounds to study cross-​modal map-
pings of various musical features such as pitch, dynamics and speed. While
such an approach has the advantage of presenting participants with more
‘natural’ stimuli in comparison with pure tones, it leaves open the question of
whether the same results would have been obtained with, say, guitar or trom-
bone sounds. Any step towards a more ecological musical stimulus comes at the
cost of introducing new variables that need to be controlled for in an experi-
ment aiming to uncover causal relationships. And while advances in music syn-
thesizing software allow features such as ‘expression’ or rubato to be switched
on, the gap between this and human musical performance—​though gradually
shrinking—​is still very audible. When designing an experiment, researchers
thus need to consider carefully the advantages and disadvantages of employing
MIDI-​based sound stimuli.


Although arguably simplistic compared to real musical excerpts, pure tones can
be synthesized with varying degrees of complexity. However, most studies so
far—​at least those concerned with music cognition—​have included pure tones
whose features were manipulated in isolation. There is scope for many more
studies using controlled pure tones (or more natural-sounding ones, such as
MIDI sounds) whose features are concurrently varied in a systematic manner
(for recent examples see Eitan and Granot 2011; Küssner et al. 2014). As men-
tioned above, in most cases music consists of the dynamic co-​variation of sev-
eral musical parameters. These co-​variations may, to some extent, be recreated
in the synthesis of pure tones, achieving more ecologically valid stimuli while
keeping possible confounding variables at a minimum.
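
To make the idea of concurrent variation concrete, the following is a minimal sketch in Python (standard library only) of a pure tone whose pitch and loudness change together. The function name, the parameter values and the linear amplitude ramp are illustrative assumptions; they are not taken from any of the studies cited above, and amplitude is only a rough proxy for perceived loudness.

```python
import math


def synth_tone(duration_s=2.0, sample_rate=8000,
               f_start=220.0, f_end=440.0,
               amp_start=0.1, amp_end=0.8):
    """Return samples of a sine tone whose pitch and amplitude change together.

    All defaults are hypothetical values chosen for illustration only.
    """
    n = int(duration_s * sample_rate)
    samples = []
    phase = 0.0
    for i in range(n):
        t = i / (n - 1) if n > 1 else 0.0  # normalized time from 0 to 1
        # Interpolate frequency on a log scale so the glide is linear in pitch.
        freq = f_start * (f_end / f_start) ** t
        # Linear amplitude ramp as a crude stand-in for a loudness change.
        amp = amp_start + (amp_end - amp_start) * t
        samples.append(amp * math.sin(phase))
        # Accumulate phase so that the changing frequency causes no clicks.
        phase += 2.0 * math.pi * freq / sample_rate
    return samples


# One octave upward glide combined with a crescendo.
tone = synth_tone()
```

Because each parameter has its own start and end value, a stimulus set in which features vary alone or together could be generated from the same function, which is one way of keeping confounding variables under control.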


In almost all studies investigating cross-modal mappings of music, researchers
have used relatively short excerpts from longer musical compositions.
One notable exception is the study by Tan and Kelly (2004) in which musi-
cally trained and untrained participants were asked to depict graphically whole
musical compositions. The authors raised the important issue that short musi-
cal excerpts, when taken out of context within a piece, may lead to varying
visualizations and cross-modal mappings. I agree that the context plays an
important role, perhaps not so much for basic mappings of sound features such
as pitch and loudness but for more elaborate (visual) representations of music
that take into account instrumentation, texture, harmony, repetition and so on.
Even though it is rarely feasible to include recordings of whole
symphonies in an experiment, there is scope for studying the effects of shorter,
yet complete musical compositions on people’s visual representations, and for
comparing them with responses to shorter, out-​of-​context, musical excerpts.


The ‘liveness’ aspect of a musical performance has recently attracted increased
attention, relating to topics such as audience engagement (Sloboda 2013),
performer–audience interaction (Whitney 2013) and emotional responses in
the listener (Egermann et al. 2013). Being physically present at a concert might
indeed give rise to quite different visualizations and representations of music
from those engendered when listening to a recording in a laboratory setting.
While one pioneering study (Hooper and Powell 1970) revealed that a live
context led to more elaborate pictorial representations of music, there
is scope for more research of that kind. It should make intuitive sense that the
visual presence of musicians, their body movements and instruments, as well
as the presence of other audience members, may lead one to associate different
shapes from those generated during solitary listening.


Apart from the ‘liveness’ aspect, there is evidence that individuals’ motor activ-
ity during listening affects their cross-​modal mappings of music. For instance,
it has been suggested that the motor behaviour during listening influences chil-
dren’s visual representations of musical excerpts (Fung and Gromko 2001). A
group of children allowed to move with props or in sand while listening to the
music produced visualizations that included more detailed representations
of rhythm, beat and groupings of notes compared to a group of children who
were asked to sit still. Hooper and Powell (1970) reported similar results for
adults who were accompanying musical excerpts rhythmically: they showed
more elaborate visual representations than groups of adults who were told
either to listen carefully or to listen for enjoyment. It is therefore plausible that
the overt engagement in motor activities shifts our attention to rhythmic prop-
erties, which—​during suppression of motor activity—​might not have reached
the threshold of consciousness.


The nature of the task, including experimental stimuli but also the exact word-
ing of the instruction and participants’ interpretation of it, determines what
is being assessed during an experiment. As Rusconi and colleagues (2006)
pointed out in a critique of some classic psychophysical experiments investi-
gating pitch–​height mappings, there is a crucial difference between spontane-
ous and mandatory mappings. Spontaneous cross-​modal mappings are seen as
occurring automatically, independent of the context and possibly without our
being aware of it, whereas mandatory mappings require our full consciousness
and deliberate action. At best, the latter are used to refine some finding well
supported by empirical evidence; at worst, they introduce highly artificial cat-
egories to an experiment, leading to meaningless responses.
Besides mandatory cross-​modal mappings, which are restrained by a lim-
ited choice of response categories, there are also what might be called elabo-
rate responses. Whether spontaneous or not,8 they constitute free, unrestricted
responses to some stimulus, for instance by drawing a sound or a piece of
music. While such paradigms provide richer data than, for instance, reaction
time measures, they often also require some unstandardized analysis proce-
dure, complicating the comparisons between studies. Whichever paradigm
researchers apply after weighing advantages and disadvantages, it is important
that they are aware of the kind(s) of cross-​modal mappings they are measuring.


Thanks to the availability of adequate experimental tools such as electronic
graphics tablets (Küssner et al. 2011), researchers investigating visualizations
of sound and music have been able to study ‘sound-​tracings’ (Godøy et al.
2006) or the process of visualizing sounds (Küssner and Leech-​Wilkinson
2014). This approach might offer an additional angle different from the focus
in most previous studies, i.e. the final product of the sound/​music visualization.
Asking participants to draw or gesture along with the sound enables researchers
to study not only how they map sound features cross-modally but also the degree to
which they are in synchrony with various sound/musical features. Particularly
for researchers regarding perception as an active process based on action–
perception cycles, paying attention to the action that creates a certain sound
visualization appears to be overdue.
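
As a minimal sketch of how mapping and synchrony might be quantified in such real-time data, the Python example below correlates a pitch contour with the vertical pen position and searches for the delay at which the two agree best. The data are invented, the function names are hypothetical, and Pearson correlation with a simple lag search is used purely for illustration; it is not presented as the analysis used in the studies cited here.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def best_lag(feature, trace, max_lag):
    """Return (lag, r): the trace delay in samples that maximizes correlation."""
    best = (0, pearson(feature, trace))
    for lag in range(1, max_lag + 1):
        # Compare the feature with the trace shifted back by `lag` samples.
        r = pearson(feature[:-lag], trace[lag:])
        if r > best[1]:
            best = (lag, r)
    return best


# Invented example: a rising-falling pitch contour and a pen height that
# follows it perfectly, but three samples late.
pitch = [20 - abs(i - 20) for i in range(40)]
trace = [pitch[max(i - 3, 0)] for i in range(40)]

lag, r = best_lag(pitch, trace, max_lag=10)
```

In this toy case the correlation at the best lag describes how closely height follows pitch, while the lag itself gives a rough indication of how far the drawing trails the sound.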

Future directions

Cross-​modal mappings of sound and music have been studied for a long time,
dating back to the work of Carl Stumpf (1883), who investigated metaphori-
cal mappings of pitch, for instance. Matching tasks and reaction-​time para-
digms dominated psychological studies on cross-​modal correspondences in the
twentieth century and have given rise to an impressive amount of empirical
evidence (Spence 2011), with ample opportunity for future studies. However,
the advent of the embodied cognition research programme (Shapiro 2007) has
led to a rethinking of cognition and put considerable emphasis on the role of
the body and its interaction with the physical environment in cognitive pro-
cesses. Consequently, epistemologies are changing and new paradigms are
being developed that consider more carefully the role of the body in psycho-
logical experiments. In musicology, the formalization of an embodied music
cognition theory (Leman 2007) has given a new impulse to studying music cog-
nition with the direct involvement of the body. Leman’s ‘graphical attuning’
and Godøy’s (2006) ‘sound-​tracing’ represent new ways of studying sound and
music cross-​modally.
This progress would not have been possible, of course, without the devel-
opment of new technologies. Electronic graphics tablets and motion-capture
systems allow researchers to measure participants’ responses to sound and
music with unprecedented precision. Importantly, and in line with the notion
of embodied, goal-​directed actions in real time, they provide insights into
the shaping of cross-​modal correspondences rather than its final products—​the
shapes. However, such approaches come with issues pertaining to both the equip-
ment and the analysis techniques. For instance, some motion-capture systems
are still very expensive or require custom-​made software (Küssner et al. 2011).
Another problem is that participants have to wear markers (Maes et al. 2014),
move large joysticks (Leman et al. 2009) or hold a remote controller in their
hand (Küssner et al. 2014) in order to indicate the shapes of auditory stimuli
with their bodies. Thus, one of the challenges for (music) researchers will be to
develop tools that are both less intrusive and less costly.
Crucially, techniques for analysing tracings of sound and music need to be
developed further. First attempts have been made for the analysis of drawings
(Noyce, Küssner and Sollich 2013), as well as for free three-​dimensional gestures
(Nymoen et al. 2013), identifying techniques such as Gaussian Processes, non-
parametric and canonical correlations, and pattern recognition classifiers. Due
to music’s unfolding nature over time, the issue of analysing time-​dependent
data is a well-known problem in music psychology and has been discussed
before by scholars concerned with emotional responses to music (Levitin et al.
2007; Schubert and Dunsmuir 1999). Only joint efforts by researchers from
various disciplines such as musicology, psychology, mathematics and computer
science make such endeavours possible nowadays, and it will be pivotal to (con-
tinue to) share testing software and detailed insight into analysis techniques
between researchers in the future.


I have provided an overview of studying the perceived shapes of sound and
music from various methodological and epistemological angles. Traditional
reaction-​time paradigms, among other approaches, have revealed how people
map auditory features onto the visual or visuo-​spatial domain. Recent stud-
ies involving people’s overt bodily representations of sound and music are an
opportunity to develop a fresh perspective on a familiar subject. The exten-
sive empirical work carried out in the realm of developmental psychology has
shown that children’s drawings can reveal a great deal about how they make
sense of sound and music. And there is no reason to believe that there should
be any less revelatory potential for adults’ drawings. Indeed, first empirical
investigations with adults have proven useful in illuminating the role of musi-
cal training in music cognition, the effect of literacy in cross-​cultural compar-
isons of sound shapes and the role of cognitive skills in a clinical setting, to
name but a few contexts. Even more so, free bodily gestures provide scope for
studies investigating cross-​modal mappings of sound and music. Although
gestures are ubiquitous in everyday life and often observed in response
to music (from finger tapping to dancing), only recent developments in
motion-​capture technologies have turned them into a serious alternative for
studying sound and music cross-​modally. Leman’s (2007) ‘second-​person
descriptions’—​subjective responses to music articulated through verbal or
nonverbal descriptions of bodily phenomena—​provide a theoretical basis on
which more research into cross-​modal perception and cognition by means of
drawings and gestures can be carried out. Godøy and colleagues realized the
potential of carrying out ‘systematic and large-​scale studies of sound-​gesture
relationships’ at least a decade ago, if not before (Godøy 1997, 2006). First
attempts have been made—​as shown in this chapter—​but there is still a long
way to go. In conclusion, perhaps only one thing is clear: if multimodal per-
ception of music is indeed essentially based on our bodies interacting with
the environment, using appropriate body-​centred experimental paradigms
and analysis techniques to investigate cross-​modal mappings of music will be
a necessary step on our mission to capture the full breadth of human musical

This work was supported by King’s College London and by the AHRC
Research Centre for Musical Performance as Creative Practice (grant number


Athanasopoulos, G. and N. Moran, 2013: ‘Cross-cultural representations of musical
shape’, Empirical Musicology Review 8/3–4: 185–99.
Bamberger, J., 1980: ‘Cognitive structuring in the apprehension and description of simple
rhythms’, Archives de psychologie 48: 171–​99.
Bamberger, J., 1982:  ‘Revisiting children’s drawings of simple rhythms:  a function for
reflection-​in-​action’, in S. Strauss, ed., U-​shaped Behavioral Growth (New York: Academic
Press), pp. 191–​226.
Barrett, M. S., 2000: ‘Windows, mirrors and reflections: a case study of adult constructions
of children’s musical thinking’, Bulletin of the Council for Research in Music Education
145: 43–​61.
Barrett, M. S., 2005: ‘Representation, cognition, and communication: invented notation in
children’s musical communication’, in D. Miell, R. MacDonald and D. Hargreaves, eds.,
Musical Communication (Oxford: Oxford University Press), pp. 117‒42.
Burger, B., 2013: ‘Move the way you feel: effects of musical features, perceived emo-
tions, and personality on music-​induced movement’ (PhD dissertation, University of
Caramiaux, B., F. Bevilacqua, T. Bianco, N. Schnell, O. Houix and P. Susini, 2014: ‘The role
of sound source perception in gestural sound description’, ACM Transactions on Applied
Perception 11/​1: Article 1.
Claxton, G., 1980: ‘Cognitive psychology: a suitable case for what sort of treatment?’, in
G. Claxton, ed., Cognitive Psychology: New Directions (London: Routledge & Kegan
Paul), pp. 1‒25.
Cowles, J. T., 1935: ‘An experimental study of the pairing of certain auditory and visual
stimuli’, Journal of Experimental Psychology 18/​4: 461–​9.
Davidson, L. and B. Colley, 1987: ‘Children’s rhythmic development from age 5 to 7: per-
formance, notation, and reading of rhythmic patterns’, in J. C. Peery, I. W. Peery and T.
W. Draper, eds., Music and Child Development (New York: Springer), pp. 107–​36.
Davidson, L. and L. Scripp, 1988: ‘Young children’s musical representations: windows on
music cognition’, in J. A. Sloboda, ed., Generative Processes in Music: The Psychology
of Performance, Improvisation, and Composition (New York: Oxford University Press),
pp. 195–​230.
Davidson, L., L. Scripp and P. Welsh, 1988: ‘ “Happy Birthday”: evidence for conflicts of
perceptual knowledge and conceptual understanding’, Journal of Aesthetic Education
22/​1: 65–​74.
Davies, E., 2006:  Beyond Dance:  Laban’s Legacy of Movement Analysis (New  York:
De Bruyn, L., D. Moelants and M. Leman, 2012: ‘An embodied approach to testing musi-
cal empathy in participants with an autism spectrum disorder’, Music and Medicine
4/​1: 28–​36.
Dolscheid, S., S. Shayan, A. Majid and D. Casasanto, 2013: ‘The thickness of musical pitch:
psychophysical evidence for linguistic relativity’, Psychological Science 24/​5: 613–​21.
Egermann, H., M. Pearce, G. Wiggins and S. McAdams, 2013:  ‘Probabilistic models of
expectation violation predict psychophysiological emotional responses to live concert
music’, Cognitive, Affective, & Behavioral Neuroscience 13/​3: 533–​53.
Eitan, Z., 2013: ‘How pitch and loudness shape musical space and motion: new findings
and persisting questions’, in S.-​L. Tan, A. Cohen, S. Lipscomb and R. Kendall, eds.,
The Psychology of Music in Multimedia (Oxford: Oxford University Press), pp. 161–​87.
Eitan, Z. and R. Y. Granot, 2006: ‘How music moves: musical parameters and listeners’
images of motion’, Music Perception 23/​3: 221–​48.
Eitan, Z. and R. Y. Granot, 2011: ‘Listeners’ images of motion and the interaction of
musical parameters’, paper presented at the 10th Conference of the Society for Music
Perception and Cognition (SMPC), Rochester, NY, USA, 11–​14 August 2011.
Eitan, Z. and R. Timmers, 2010: ‘Beethoven’s last piano sonata and those who follow croc-
odiles: cross-​domain mappings of auditory pitch in a musical context’, Cognition 114/​3:
Elkoshi, R., 2002:  ‘An investigation into children’s responses through drawing, to short
musical fragments and complete compositions’, Music Education Research 4/​2: 199–​211.
Espeland, M., 1987: ‘Music in use: responsive music listening in the primary school’, British
Journal of Music Education 4/​3: 283–​97.
Fitts, P. M. and C. M. Seeger, 1953: ‘SR compatibility: spatial characteristics of stimulus
and response codes’, Journal of Experimental Psychology 46/​3: 199–​210.
Fung, C. V. and J. E. Gromko, 2001: ‘Effects of active versus passive listening on the qual-
ity of children’s invented notations and preferences for two pieces from an unfamiliar
culture’, Psychology of Music 29/​2: 128–​38.
Galantucci, B., C. A. Fowler and M. T. Turvey, 2006: ‘The motor theory of speech percep-
tion reviewed’, Psychonomic Bulletin & Review 13/​3: 361–​77.
Godøy, R. I., 1997: ‘Knowledge in music theory by shapes of musical objects and sound-​
producing actions’, in M. Leman, ed., Music, Gestalt, and Computing: Studies in
Cognitive and Systematic Musicology (Berlin: Springer), pp. 89–​102.
Godøy, R. I., 2003: ‘Motor-​mimetic music cognition’, Leonardo 36/​4: 317–​19.
Godøy, R. I., 2006: ‘Gestural-​sonorous objects: embodied extensions of Schaeffer’s con-
ceptual apparatus’, Organised Sound 11/​2: 149–​57.
Godøy, R. I., 2010: ‘Gestural affordances of musical sound’, in R. I. Godøy and M. Leman,
eds., Musical Gestures: Sound, Movement, and Meaning (New York: Routledge).
Godøy, R. I., E. Haga and A. R. Jensenius, 2006: ‘Exploring music-​related gestures by
sound-​tracing: a preliminary study’, paper presented at the 2nd ConGAS International
Symposium on Gesture Interfaces for Multimedia Systems, Leeds, UK, 9–10 May 2006.
Gromko, J. E., 1994: ‘Children’s invented notations as measures of musical understanding’,
Psychology of Music 22/​2: 136–​47.
Gromko, J. E., 1995: ‘Invented iconographic and verbal representations of musical sound:
their information content and usefulness in retrieval tasks’, The Quarterly Journal of
Music Teaching and Learning 6: 32–​43.
Haga, E., 2008: ‘Correspondences between music and body movement’ (PhD dissertation,
University of Oslo).
Hair, H. I., 1993: ‘Children’s descriptions and representations of music’, Bulletin of the Council
for Research in Music Education 119: 41–​8.
Hargreaves, D. J., 1978: ‘Psychological studies of children’s drawing’, Educational Review
30/​3: 247–​54.
Hooper, P. P. and E. R. Powell, 1970: ‘Influences of musical variables on pictorial connota-
tions’, Journal of Psychology 76/​1: 125–​8.
Huron, D. and D. Shanahan, 2013: ‘Eyebrow movements and vocal pitch height: evidence
consistent with an ethological signal’, The Journal of the Acoustical Society of America
133/​5: 2947–​52.
Huron, D., S. Dahl and R. Johnson, 2009: ‘Facial expression and vocal pitch height: evi-
dence of an intermodal association’, Empirical Musicology Review 4/​3: 93–​100.
Kerchner, J. L., 2000: ‘Children’s verbal, visual, and kinesthetic responses: insight into their music
listening experience’, Bulletin of the Council for Research in Music Education 146: 31–​50.
Knöferle, K. and C. Spence, 2012:  ‘Crossmodal correspondences between sounds and
tastes’, Psychonomic Bulletin & Review 19/​6: 992–​1006.
Kohn, D. and Z. Eitan, 2009: ‘Musical parameters and children’s movement responses’, in
J. Louhivuori, T. Eerola, S. Saarikallio, T. Himberg and P. S. Eerola, eds., 7th Triennial
Conference of the European Society for the Cognitive Sciences of Music (Jyväskylä: ESCOM).
Kozak, M., K. Nymoen and R. I. Godøy, 2012:  ‘Effects of spectral features of sound
on gesture type and timing’, in E. Efthimiou, G. Kouroupetroglou and S.-​E. Fotinea,
eds., Gesture and Sign Language in Human–Computer Interaction and Embodied
Communication (Berlin: Springer), pp. 69–​80.
Küssner, M. B., 2013: ‘Music and shape’, Literary and Linguistic Computing 28/​3: 472–​9.
Küssner, M. B., 2014: ‘Shape, drawing and gesture: cross-​modal mappings of sound and
music’ (PhD dissertation, King’s College London).
Küssner, M. B. and D. Leech-​Wilkinson, 2014:  ‘Investigating the influence of musical
training on cross-​modal correspondences and sensorimotor skills in a real-​time drawing
paradigm’, Psychology of Music 42/​3: 448–​69.
Küssner, M. B., N. Gold, D. Tidhar, H. M. Prior and D. Leech-Wilkinson, 2011:
‘Synaesthetic traces: digital acquisition of musical shapes’, presented at the Supporting
Digital Humanities Conference: Answering the unaskable, Copenhagen, Denmark, 17–​
18 November 2011.
Küssner, M. B., D. Tidhar, H. M. Prior and D. Leech-​Wilkinson, 2014: ‘Musicians are
more consistent: gestural cross-​modal mappings of pitch, loudness and tempo in real-​
time’, Frontiers in Psychology 5/​789, (accessed
9 April 2017).
Lakoff, G. and M. Johnson, 1980: Metaphors We Live By (Chicago: University of Chicago
Leman, M., 2007:  Embodied Music Cognition and Mediation Technology (Cambridge,
MA: MIT Press).
Leman, M., F. Desmet, F. Styns, L. Van Noorden and D. Moelants, 2009: ‘Sharing musical
expression through embodied listening: a case study based on Chinese Guqin music’,
Music Perception 26/​3: 263–​78.
Levitin, D. J., R. L. Nuzzo, B. W. Vines and J. O. Ramsay, 2007: ‘Introduction to functional
data analysis’, Canadian Psychology/​Psychologie canadienne 48/​3: 135–​55.
Lewkowicz, D. J. and N. J. Minar, 2014: ‘Infants are not sensitive to synesthetic cross-modality
correspondences: a comment on Walker et al. (2010)’, Psychological Science 25/3: 832–4.
Liberman, A. M. and I. G. Mattingly, 1985:  ‘The motor theory of speech perception
revised’, Cognition 21/​1: 1–​36.
Maes, P.-​J. and M. Leman, 2013: ‘The influence of body movements on children’s percep-
tion of music with an ambiguous expressive character’, PloS ONE 8/​1: e54682.
Maes, P.-​J., E. Van Dyck, M. Lesaffre, M. Leman and P. M. Kroonenberg, 2014:  ‘The
coupling of action and perception in musical meaning formation’, Music Perception
32/​1: 67–​84.
Marks, L. E., 2004: ‘Cross-​modal interactions in speeded classification’, in G. A. Calvert, C.
Spence and B. E. Stein, eds., Handbook of Multisensory Processes (Cambridge, MA: MIT
Press), pp. 85–​105.
Noyce, G. L., M. B. Küssner and P. Sollich, 2013: ‘Quantifying shapes: mathematical tech-
niques for analysing visual representations of sound and music’, Empirical Musicology
Review 8/​2: 128–​54.
Nymoen, K., B. Caramiaux, M. Kozak and J. Torresen, 2011: ‘Analyzing sound tracings:
a multimodal approach to music information retrieval’, paper presented at the 1st
International ACM Workshop on Music Information Retrieval with User-​Centered and
Multimodal Strategies (MIRUM), Scottsdale, AZ, USA, 28 November–1 December 2011.
Nymoen, K., R. I. Godøy, A. R. Jensenius and J. Torresen, 2013:  ‘Analyzing corre-
spondence between sound objects and body motion’, ACM Transactions on Applied
Perception 10/​2: Article 9.
Olson, D. R., 1970:  Cognitive Development:  The Child’s Acquisition of Diagonality
(New York: Academic Press).
Parkinson, C., P. J. Kohler, B. Sievers and T. Wheatley, 2012: ‘Associations between audi-
tory pitch and visual elevation do not depend on language: evidence from a remote
population’, Perception 41/​7: 854–​61.
Piaget, J. and B. Inhelder, 1973: Memory and Intelligence (London: Routledge & Kegan Paul).
Pratt, C. C., 1930: ‘The spatial character of high and low tones’, Journal of Experimental
Psychology 13/​3: 278–​85.
Prinz, W., 1990: ‘A common coding approach to perception and action’, in O. Neumann and
W. Prinz, eds., Relationships between Perception and Action (Berlin: Springer), pp. 167–​201.
Reybrouck, M., L. Verschaffel and S. Lauwerier, 2009:  ‘Children’s graphical notations
as representational tools for musical sense-​making in a music-​listening task’, British
Journal of Music Education 26/​2: 189–​211.
Roederer, J. G., 1984: ‘The search for a survival value of music’, Music Perception 1/​3: 350–​6.
Rusconi, E., B. Kwan, B. L. Giordano, C. Umiltà and B. Butterworth, 2006: ‘Spatial repre-
sentation of pitch height: the SMARC effect’, Cognition 99/​2: 113–​29.
Schaeffer, P., 1966: Traité des objets musicaux (Paris: Editions du Seuil).
Schubert, E. and W. Dunsmuir, 1999: ‘Regression modelling continuous data in music psychol-
ogy’, in S. W. Yi, ed., Music, Mind, and Science (Seoul: National University Press), pp. 298–​352.
Shapiro, L., 2007:  ‘The embodied cognition research programme’, Philosophy Compass
2/​2: 338–​46.
Sloboda, J. A., 2013: ‘How does it strike you? Obtaining artist-​directed feedback from the
audience at a site-​specific performance of a Monteverdi opera’, paper presented at the
Performance Studies Network Second International Conference, Cambridge, UK, 4–7
April 2013.
Smith, K. C., L. L. Cuddy and R. Upitis, 1994:  ‘Figural and metric understanding of
rhythm’, Psychology of Music 22/​2: 117–​35.
Spence, C., 2011: ‘Crossmodal correspondences: a tutorial review’, Attention, Perception, &
Psychophysics 73/​4: 971–​95.
Spence, C. and O. Deroy, 2012: ‘Crossmodal correspondences: innate or learned?’, i-​Perception
3/​5: 316–​18.
Stumpf, C., 1883: Tonpsychologie (Leipzig: S. Hirzel).
Suzuki, S., E. Mills and T. C. Murphy, 1973:  The Suzuki Concept:  An Introduction to a
Successful Method for Early Music Education (Berkeley, CA: Diablo Press).
Tan, S.-​L. and M. E. Kelly, 2004: ‘Graphic representations of short musical compositions’,
Psychology of Music 32/​2: 191–​212.
Thompson, M., 2012: ‘The application of motion capture to embodied music cognition
research’ (PhD dissertation, University of Jyväskylä).
Trehub, S. E. and L. Trainor, 1998: ‘Singing to infants: lullabies and play songs’, in C. Rovee-​
Collier, L. P. Lipsitt and H. Hayne, eds., Advances in Infancy Research, Vol. 12 (Stamford,
CT: Ablex), pp. 43–​78.
Trimble, O. C., 1934: ‘Localization of sound in the anterior-​posterior and vertical dimen-
sions of “auditory” space’, British Journal of Psychology: General Section 24/​3: 320–​34.
Upitis, R., 1987: ‘Children’s understanding of rhythm: the relationship between develop-
ment and music training’, Psychomusicology: Music, Mind & Brain 7/​1: 41–​60.
Upitis, R., 1990:  ‘Children’s invented notations of familiar and unfamiliar melodies’,
Psychomusicology: A Journal of Research in Music Cognition 9/​1: 89–​106.
Upitis, R., 1992: Can I Play You My Song? The Compositions and Invented Notations of
Children (Portsmouth, NH: Heinemann).
Van Dyck, E., 2013: ‘The influence of music and emotion on dance movement’ (PhD dis-
sertation, Ghent University).
Verschaffel, L., M. Reybrouck, M. Janssens and W. Van Dooren, 2010: ‘Using graphical
notations to assess children’s experiencing of simple and complex musical fragments’,
Psychology of Music 38/​3: 259–​84.
Walker, P., J. G. Bremner, U. Mason, J. Spring, K. Mattock, A. Slater and S. P. Johnson, 2010:
‘Preverbal infants’ sensitivity to synaesthetic cross-​modality correspondences’, Psychological
Science 21/​1: 21–​5.
Walker, R., 1987: ‘The effects of culture, environment, age, and musical training on choices
of visual metaphors for sound’, Perception & Psychophysics 42/​5: 491–​502.
Werner, H., 1980:  Comparative Psychology of Mental Development, 3rd edn (New  York:
International Universities Press).
Whitney, K., 2013: ‘Singing in duet with the listener’s voice: a dynamic model of the joint shap-
ing of musical content in live concert performance’, paper presented at the Performance
Studies Network Second International Conference, Cambridge, UK, 4–​7 April 2013.
Zbikowski, L. M., 2002: Conceptualizing Music: Cognitive Structure, Theory, and Analysis
(New York: Oxford University Press).
Anna Meredith, composer

Shape is both the most important aspect of my composing and the hardest
thing to describe. Before I write any piece, whether a piece for orchestra or an
electronic track, I draw a sketch of its contour along a timeline; so my drawers
are stuffed with pages of jaggy lines, builds and cuts which help me control my
pacing—​one of the most important things to me in my music. One of these
sketches and its associated composition can be accessed at .
As to what the lines mean, that’s harder to pin down. At the risk of sounding
flaky, I think the best description might be that they are tracing the energy of
a piece. So a big diagonal build on my sketch might not necessarily mean ‘get
louder’ or ‘get faster’ but could suggest a way of controlling the musical energy
of an idea as my way of showing the trajectory of a line or fragment I’ve come
up with.
When I’m writing a piece, it feels like this drawing/sketching process is my
way of auditioning my ideas: so if I’ve got something, no matter how little,
I then imagine it going through the dramatic shapes I need for the piece to
see if the material will be appropriate. This involves keeping half an eye on a
stopwatch while striding round my studio tunelessly singing bits of the mate-
rial and muttering things like ‘idea breaks apart and glitches here’ or ‘melodic
line builds until it takes over whole ensemble’, to see if I think it’ll work. Once
I’ve got the right ideas, and am confident that they’ll stand up to the drama I’ve
got planned for them, my next step becomes more of a zooming in, looking at
part of the shapes, working out exactly how I get from A to B and filling in the


Cross-modal correspondences and affect in a Schubert song
Zohar Eitan, Renee Timmers and Mordechai Adler

Western music is imbued with conventional mappings of musical features onto
aspects of the human and natural world. Some such correspondences have
become well-​established musical symbols. Melodic fall and rise, for instance,
have represented both physical and metaphorical descents and ascents at least
since the ninth century C.E., as settings of ‘descendit de caelis’ and ‘ascendit in
caelum’ in the Credo of the Latin mass attest. Experimental work in psycho-
physics, perception and cognition, however, suggests that such mappings are
not mere conventions of musical style, since mappings of sound dimensions
like pitch and loudness onto nonauditory features (e.g. visual brightness, object
size or height) consistently and automatically occur outside musical contexts.
There is abundant evidence that cross-​modal associations involving auditory
features may be activated automatically and implicitly, in particular in the con-
text of simultaneous stimulation. If visual and auditory stimuli, for instance,
co-​occur, and participants are asked to respond to one dimension only, the
second dimension nevertheless influences processing of the first dimension. In
particular, presentations of visual stimuli that are congruent or incongruent
with auditory stimuli (e.g. spatial height and pitch ‘height’, visual brightness
and auditory loudness) facilitate or interfere with the processing of the audi-
tory stimuli and vice versa (for reviews of relevant empirical research see Eitan
2013; Eitan and Timmers 2010; Marks 2004; Spence 2011). Furthermore, while
cross-​modal mappings of sound are widely reflected in language (e.g. ‘high’
and ‘low’ pitch, ‘bright’ and ‘dark’ sound) and in other conventional symbolic
idioms, such as western music notation, ample research suggests that they may
originate from sources other than language or culturally ingrained convention.
For instance, some audiovisual mappings may be discerned in preverbal infants
and even in nonhuman species. These include correspondences of pitch and
spatial height (Walker et al. 2010; Wagner et al. 1981; see Lewkowicz and Minar
2014 for a critique), pitch and visual shape (e.g. round versus sharp; Walker et
al. 2010), pitch and luminance (Ludwig, Adachi and Matsuzawa 2011), pitch
and physical size (Morton 1994; see also Tsur 2006), and loudness and lumi-
nance (Lewkowicz and Turkewitz 1980). As can be seen from these examples,
mappings of auditory features onto visual-​spatial dimensions in particular are
frequent, highlighting a possible central role of notions related to shape.
While experimental studies can suggest the kind of mappings expected to
play a role in music listening, the actual manifestation of cross-​modal interac-
tion in music may be confounded by the diversity of mappings that might be acti-
vated simultaneously, and by contextual factors that influence the connotations
activated. We aim to demonstrate that, multiplicity and context-​dependency
notwithstanding, an analysis of cross-​domain mappings in music, informed by
experimental findings in cross-​modal research, can elucidate important aspects
of musical meaning and reference. In particular, we examine the interrelation-
ship between two central pillars of musical meaning:  cross-​modal and emo-
tional mappings of musical features. Furthermore, we aim to demonstrate how
multiplicity of cross-​modal interaction is instrumental in generating complex,
multilayered musical meanings, which in combination may often be most easily
and efficiently summarized by the metaphor of shape. Investigating a musical
setting of a text permeated with references to nonauditory sensory domains
may serve as a useful point of departure for such an endeavour. We chose to
concentrate on Schubert’s well-known (and oft-discussed) setting of Heine’s ‘Die
Stadt’ (from Schwanengesang D. 957), examining both score-​based (composi-
tional) and performance-​based features, the latter grounded on quantitative
analysis of recorded music.


An important and under-investigated issue concerning cross-modal correspondences
of musical features is the one-to-many relationships these correspondences
often present: a feature of musical sound or structure may correspond
with diverse nonauditory dimensions. Higher pitch, for instance, corresponds
perceptually with smaller object size, higher spatial location, lighter colour
or sharper (pointed) shape, among other features (Eitan and Timmers 2010).
Likewise, different attributes of music or musical sound may conspire in rep-
resenting a single nonauditory feature. For instance, lower pitch, increased
loudness and longer duration all correspond with larger, heavier objects (for
research reviews see Eitan 2013; Eitan and Timmers 2010; Marks 2004; Spence
2011). Only a few studies, however, have investigated how concurrent variations
in multiple auditory dimensions (ubiquitous in music) affect cross-​modal cor-
respondences (but see Adler 2014 for relevant experimental work).
Importantly, in musical contexts, auditory features that map onto features
of other sensory domains (such as vision, touch or motion) also associate regu-
larly with dimensions of emotion, like valence and activity, and with specific
basic emotions. For instance, low pitch, corresponding perceptually with dark
colour and dim light (Ludwig et al. 2011; Marks 1989, 2004; Melara 1989;
Spence 2011), also suggests negative emotional valence, particularly sadness
(Collier and Hubbard 2001, 2004; Eitan and Timmers 2010; Hevner 1937).
These are nonarbitrary mappings, as indicated by implicit cross-​modal effects
of emotional associations. For example, musical features associated with emo-
tion influence the emotional processing of visual scenes (Cohen 2001; Boltz
2004). Correspondingly, emotions associated with visual (Boltz 2013; Timmers
and Crook 2014) or verbal stimuli (Weger et al. 2007) influence the perception
of music presented concurrently, such that, for instance, positive or negative
valence associated with words or images may enhance the perception of high
and low pitches respectively.
Furthermore, dimensions of nonauditory modalities that often map onto
features of sound (e.g. luminance or height) may themselves be associated
with emotion. For instance, the dichotomies dim-​bright and dark-​light, which
map onto pitch and loudness, also associate with emotional valence, such that
brighter light and lighter colour correlate with positive valence. This is evident
both in language (e.g. ‘dark’ and ‘bright’ moods) and in nonverbal measures
of emotion, often expressed implicitly. For instance, positively valenced words
were processed faster when printed in white, rather than black; the opposite was
true for negative words (Meier, Robinson and Clore 2004). Correspondingly,
evaluation words positively or negatively affected brightness perception (Meier
and Robinson 2005; Meier et al. 2007), suggesting that the origin of the valence
attribution may be related to the evaluation of day versus night. Similarly,
spatial height and spatial rise correlate with positive emotion, as suggested
by both language metaphors (feeling high or low, high-​spirited) and implicit
nonverbal measures. For instance, the valence of words presented to partici-
pants affects spatial–​visual attention, such that positive words shift attention
upwards, and negative words shift attention downwards (Meier and Robinson
2004). Analogously, moving objects up or down enhances recall of positive and
negative episodic memories respectively (Casasanto and Dijkstra 2010; see also
Freddi, Cretenet and Dru 2013, and Meier and Robinson 2005).
In interpreting cross-domain mappings in music, then, an interrelated
triad of mappings should be considered:  between sound and other sensory
modalities (in particular visual–​spatial), sound and emotion, and nonauditory
modalities and emotion (e.g. low pitch–​dark colour, low pitch–​sadness, dark
colour–​sadness); each of these three types of correspondence itself suggests
multiple mappings (e.g. low pitch may be dark, but also large and spatially
low). Furthermore, some cross-​modal mappings of music may be mediated by
shared emotional associations. For instance, Palmer et al. (2013) show, in a
cross-cultural study, that listeners’ colour and emotional associations of musical
pieces are strongly correlated. Thus, for instance, slower music in the minor
mode may be perceived as ‘darker’ since musical features such as minor mode
or slow tempo and visual features (e.g. dark colour) are both associated with
a similar (e.g. ‘sad’) emotional quality. Both composers and performers
contribute to and modulate this triadic mapping between emotion, sound and
visual-spatial metaphors, which unfolds dynamically over time.
As an example that has a rich tradition of performance and commentary,
we explore the interrelationships between cross-​modal and affective mappings
of musical features through both score-​based and performance analyses of
Schubert’s ‘Die Stadt’, his setting of Heine’s ‘Am fernen Horizonte’. We inves-
tigate how Schubert’s text-​setting employs both types of mappings and how
different performances of this lied modulate such mappings. We analyse points
in the music where cross-​modal and affective features are aligned, and where
they seem noncongruent (e.g. a sunrise revealing a ‘dark’ emotional state). In
exploring points where different musical features may suggest contradictory
mappings, we investigate both the composer’s and the performers’ strategies in
dealing with such complexities.
We begin with a brief commentary on Heine’s text, examining interrelation-
ships, congruencies and incongruences between its visual, kinaesthetic and
emotional features. We show how a set of contrasts and parallelisms between
the poem’s three stanzas are suggested by these interrelations. These observa-
tions lead to an analysis of the musical structure of Schubert’s setting, reflect-
ing the structures observed in the text analysis. In particular, the musical
analysis emphasizes cross-​modal mappings and their relationships with emo-
tion and affect. We compare (using quantitative analysis of acoustic data) three
recorded performances of the lied, by Dietrich Fischer-​Dieskau (henceforth
DFD), Ian Bostridge (IB) and Thomas Quasthoff (TQ), examining how the
performed interpretations reflect or modify the interrelationships of cross-​
modal and affective mappings in Schubert’s Heine setting. Finally, we discuss
the contribution of the notion of shape in the analysis and its relevance for
coherence of perception within an ever-​varied multidimensional context.

Heine’s ‘Am fernen Horizonte’: perception, emotion and narrative structure

A retrospective précis of Heine’s ‘Am fernen Horizonte’ (renamed ‘Die Stadt’ in
Schubert’s setting) may communicate the following: a broken-hearted narrator
travels by boat from dusk to sunrise, gazing at the city in which he has lost his
beloved. As the sun rises, the city is radiantly revealed and with it the narrator’s
glowing heartbreak.

TABLE 3.1   Original text and English translation of ‘Am fernen Horizonte’

Original text                      English translation

Am fernen Horizonte                On the far horizon
Erscheint, wie ein Nebelbild,      Appears, like a misty vision,
Die Stadt mit ihren Türmen,        The town, with its turrets
In Abenddämmrung gehüllt.          Shrouded in dusk.

Ein feuchter Windzug kräuselt      A damp wind ruffles
Die graue Wasserbahn;              The course of the grey water;
Mit traurigem Takte rudert         With mournful strokes
Der Schiffer in meinem Kahn.       The boatman rows my boat.

Die Sonne hebt sich noch einmal    The sun rises once more,
Leuchtend vom Boden empor          Glowing upwards from the earth
Und zeigt mir jene Stelle,         And shows me that place
Wo ich das Liebste verlor.         Where I lost my beloved.

Translation by Richard Wigmore

In the poem itself, however (Table 3.1), this narrative is revealed only at
the very end. The first stanza describes a city seen from afar at dusk. We are
given no explicit information about the narrator’s identity, emotions, actions or
whereabouts (indeed, a naïve reading of this stanza could ascribe it to a third-​
person narrator, gazing impartially at a remote view). We know nothing of a
water-​trip or of the grief of lost love. We don’t know who gazes at the town or
what (if anything) it means to him.
What we do obtain is considerable visual information. We know that the
town is seen from afar, at the horizon (Am fernen Horizonte). We know that
its outlines are veiled as a foggy image (wie ein Nebelbild) and rather dark,
shrouded in dusk (In Abenddämmrung gehüllt). We know quite a bit about
space and light, but little (at least explicitly) about anything else that matters.
While the first stanza presents a gaze at a remote and static object (the town),
the second stanza is a close-​up shot of the narrator’s immediate surroundings
(water, boat, boatman’s rowing), involving both motion (Windzug kräuselt, rud-
ert) and emotion (traurigem). Furthermore, at the end of this stanza it is clear
that the poem is narrated by its own protagonist in a first-​person narration (in
meinem Kahn) and that this protagonist is neither objective nor impartial. Even
the oar strokes are described as ‘mournful’ (traurigem Takte), though we can,
at this stage, only guess what the mourning is about.
In perceptual terms, then, the second stanza contrasts with the first with
regard to distance (far–near) and motion (static–dynamic; Table 3.2). Note
that in addition to dimensions of motion and colour (graue Wasserbahn), this
stanza also involves the tactile modality (feuchter Windzug), consistent with the
close-​by perspective it presents. These changes in the depiction of perceptual
realms are in line with the changes in narrative perspective, stressing first-​per-
son narrative and strong (though still subdued) emotions.
It is only in the final (third) stanza—​indeed, only in its last line—​that the
crux of the poem is revealed: we now know what the town means to the nar-
rator and why he keeps gazing at it from dusk to sunrise.1 Appropriately, the
agent of this narrative and its emotional revelation is the very source of clarity:
the rising sun itself.

TABLE 3.2   ‘Die Stadt’, stanza 1 versus 2: contrasting and parallel dimensions

Dimension            1st Stanza        2nd Stanza

Distance             Far               Near
Motion               Static/passive    Dynamic (oar strokes, wind)
Light                Dark, misty       Grey
Sensory modalities   Visual            Kinaesthetic, tactile, visual
Emotion              Implicit          Explicit (‘mournful’)
Narration mode       Yet unknown       First person

TABLE 3.3   ‘Die Stadt’, stanza 1 versus 3: contrasting and parallel dimensions

Dimension            1st Stanza        3rd Stanza

Object described     Town              Town
Distance             Far               Apparently nearer
Motion               Static            Dynamic, upward (sunrise)
Luminosity           Dark              Bright
Lucidity             Misty             Clear
Sensory modalities   Visual            Visual
Emotion              Implicit          Explicit (‘loss of beloved’)
Narration mode       Yet unknown       First person

The third stanza both parallels and contrasts with the first (see Table 3.3);
no less importantly, it complements it. Both stanzas involve viewing the same
object—​the town—​and both emphasize the perceptual dimension of visual
brightness. However, the two stanzas contrast with regard to the view itself,
as well as its emotional underpinnings. Visually, the scene is now bright and
painfully clear, highlighted by the glowing, rising sun, and thus contrasted with
the darker, dim view of the opening stanza. Moreover, the last stanza involves
motion and change (particularly upward motion, associated with positive and
active emotions), rather than stasis: the sun is ‘rising from the earth’ (vom
Boden empor).2
These ‘perceptual’ contrasts between the stanzas are accompanied by
emotional and narrative correlates, as the previously veiled connotations
of the distant town now become painfully clear to both reader and pro-
tagonist. However, from another perspective, perceptual metaphor and
emotional import are strikingly incongruous here. As mentioned, visual
brightness (luminosity) and lightness widely serve as metaphors for emo-
tional valence, such that brighter light and lighter colour correlate with
positive valence. This association, evident in verbal metaphor (Stimmung
hellt sich auf—​literally, mood brightens up),3 also affects behaviour and
cognition implicitly and automatically, as evidenced in diverse empirical
work (Meier and Robinson 2005; Meier et al. 2007). Similarly, spatial rise
correlates with active and positive emotion, as suggested by both language
metaphors (e.g. Die Stimmung steigt—​the mood rises) and nonverbal exper-
imental measures (Casasanto and Dijkstra 2010; Freddi et al. 2013; Meier
and Robinson 2004).
In particular, the rising sun serves as a metaphor for ‘elevated’—​hopeful,
cheerful and active—​emotions:
• … he who kisses the joy as it flies /​Lives in eternity’s sun rise (Blake,
• But soft! What light through yonder window breaks? It is the east,
and Juliet is the sun! (Shakespeare, Romeo and Juliet, II/​ii)

Such hopeful or joyful, ‘sunny’ emotions are often associated with renewal or
• The sunrise is a glorious birth (Wordsworth, ‘Intimations of
• Was it light that spake from the darkness /​Or music that shone from
the word /​When the night was enkindled with sound /​of the sun or
the first-​born bird? (Swinburne, ‘Music: An Ode’)
• Himmelhoch jauchzend, zum Tode betrübt (heavenly rejoicing, then
deathly sorrowing; Goethe, Egmont)

In ‘Die Stadt’, in apparent incongruity with such metaphors and with a host
of empirical studies of the association of light and mood mentioned above, the
rising sun (which, importantly, is the subject and active agent of this stanza: it
‘shows’ the protagonist the town) evokes a ‘dark’ memory and a mood of
mournful, hopeless despair. In the next sections, we investigate how both
Schubert and some of his most prominent present-​day performers encounter
this seeming contradiction, as well as other aspects of Heine’s imagery.

Schubert’s reframing of Heine’s narrative

Schubert’s setting of Heine’s evocative text (see Appendix 3.1) has been
extensively analysed from diverse perspectives (e.g. Clark 2002; Hascher
2008; Kerman 1962; R. Kramer 1994; L. Kramer 2003, 2004; Morgan 1976;
Schwarz 1986; Youens 2007). As noted, our main goal in the present analysis is
elucidating how Schubert’s musical setting encounters cross-​modal mappings
and their interactions with emotional expression, as suggested by the text. To
lay the ground for that analysis, however, we first present some observations
concerning the structure of Schubert’s song as it relates to Heine’s text.
The vocal sections of ‘Die Stadt’ present an ABA’ design, consistent with
the text’s structure, as described above. The third stanza repeats and comple-
ments the first, while the second stanza contrasts with both. The outer stanzas
(bars 6–​14, 27–​35), harmonically and melodically closed, are similar to each
other in their harmonic and melodic structures. They also present a similar
rhythmic structure (including the conspicuous dotted rhythms in the piano
accompaniment) and piano texture. The third stanza, however, contrasts with
the first in several conspicuous ‘surface’ features, particularly dynamics (f to ff,
contrasting with the overall pp of the first two stanzas) and register (the piano
accompaniment rises an octave and the bass is doubled). The vocal line also
rises higher than in the opening stanza (to g2, the highpoint of the entire song,
on ‘Liebste’, bar 34) and is more disjunct and angular, presenting the largest
melodic intervals in the song (fourths, fifths, minor sixth, octave; bars 29–​31,
33–​35). In contrast, the vocal line of both stanzas 1 and 2 presents only seconds
and thirds. Stanzas 1 and 3, then, while structurally almost identical,4 contrast
in conspicuous expressive aspects (dynamics, register, vocal contour, interval
size), the last stanza achieving a more dramatic, decisive closure, complement-
ing the tonally stable yet muted opening.
The musical setting of stanza 2 (bars 14–​27), like its text, strikingly contrasts
with those of both outer stanzas. While the settings of both stanzas 1 and 3
depict a harmonically closed structure, a homophonic texture and an arched,
mostly ascending melodic contour, stanza 2 introduces a static yet dissonant
harmony throughout (the ambiguous diminished-​seventh chord C–​E♭–​F♯–​A,
which also shapes the melodic line), a florid, arpeggiated accompaniment fig-
ure and a continuously falling vocal contour (descending from the previously
established high point, e♭2, to c1). Together, these features embody a paradoxi-
cal combination of several metaphorical movements: rapid, repetitive surface
motion (the piano figuration), which is yet static (unchanging harmony) and
aimless overall, going nowhere (diminished-​seventh chord, harmony devoid of
any clear tonal ‘direction’). This notwithstanding, a constant, steep fall under-
lies the entire stanza (the vocal contour).
Figures 3.1–​3.3 quantitatively plot some of the relationships among the
three vocal stanzas, as described above. Figure 3.1 depicts the contour of the
vocal line (top, black line) expressed in terms of the weighted average pitch
per two-​bar phrase (weighted according to note duration). Additionally, it
shows the mean absolute melodic interval per two-​bar phrase (bottom, grey
line), which indicates how much the pitch of the vocal line varies in successive
two-​bar phrases. Figure 3.2 plots the mean intensity (left) and the maximum
intensity (right) per two-​bar phrase for three performances of ‘Die Stadt’ (to be
discussed separately later).5 Figure 3.3 plots the mean and standard deviation
of the rhythmic durations present in the vocal line per two-bar phrase.
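
The per-phrase measures described above can be illustrated with a minimal sketch. It assumes each note of a phrase is given as a (MIDI pitch, duration in crotchets) pair; the function names, the example phrase and the choice of the population standard deviation are illustrative assumptions, not details taken from the analysis itself.

```python
from statistics import mean, pstdev

def weighted_mean_pitch(notes):
    """Mean pitch of a phrase, each note weighted by its duration."""
    total_duration = sum(duration for _, duration in notes)
    return sum(pitch * duration for pitch, duration in notes) / total_duration

def mean_absolute_interval(notes):
    """Mean size (in semitones) of the melodic steps between successive notes."""
    pitches = [pitch for pitch, _ in notes]
    steps = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    return mean(steps) if steps else 0.0

def duration_stats(notes):
    """Mean and (population) standard deviation of note durations."""
    durations = [duration for _, duration in notes]
    return mean(durations), pstdev(durations)

# Hypothetical two-bar phrase, NOT transcribed from the score:
# (MIDI pitch, duration in crotchets)
phrase = [(60, 1.0), (62, 1.0), (67, 2.0)]

print(weighted_mean_pitch(phrase))     # long notes pull the average towards their pitch
print(mean_absolute_interval(phrase))  # larger values indicate a more disjunct line
print(duration_stats(phrase))
```

Weighting by duration means that a long, high note raises the phrase average more than a short passing note at the same pitch, which is why the measure tracks the perceived register of a phrase rather than a simple count of its notes.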
FIGURE 3.1   Mean weighted pitch (black line) and mean absolute pitch interval (grey line) per
two-bar phrase. [Plot not reproduced; axes: mean weighted pitch and mean absolute interval
against two-bar phrase.]

The figures suggest a complex web of similarities and contrasts between the
three stanzas. Stanzas 1 and 3 are similar in melodic contour, both presenting
an ascending contour (with stanza 3 rising higher), which contrasts with the
descending contour of stanza 2 (Figure 3.1, top). With regard to melodic inter-
vals, however, it is stanza 3, presenting larger intervals, which contrasts with
both stanzas 1 and 2. This pairing is also depicted by intensity (Figure 3.2):
stanza 3 presents (in all three performances) considerably higher intensity than
both earlier stanzas (intensity contours, which differ for all three stanzas, also
vary with performance, which will be discussed later). The rhythm of the vocal
line, on the other hand, shows a process of change (Figure 3.3), in which the
stanzas become more rhythmically diverse, in particular through the presence
of longer durations.
These complex interrelationships notwithstanding, the three vocal stan-
zas could present a fairly conventional narrative structure, in which the outer,
stable stanzas frame a central unstable one, with the last stanza intensifying
and dramatizing the concluding tonal closure through louder dynamics, higher
register and larger melodic intervals. Yet Schubert turns this ‘reasonable’ form
upside-​down (or rather, inside-​out): he frames the vocal sections with introduc-
tory and concluding sections, both identical to the piano part of the central
second stanza, with its harmonically ambiguous diminished-​seventh harmony
and florid arpeggiations.
FIGURE 3.2   Mean intensity (left) and maximum intensity (right) per two-bar phrase for three performers. Intensity was measured from commercial
recordings combining the piano and the vocal line. Interruptions in lines indicate bars that are separated by piano accompaniment intermezzi.
[Plots not reproduced; both panels plot intensity (dB) against two-bar phrase for DFD, IB and TQ.]

FIGURE 3.3   Average rhythmic durations (black line) of the vocal line and standard deviation of
rhythmic durations (grey line) within successive two-bar phrases. [Plot not reproduced; axes:
duration (crotchets) against two-bar phrase.]

The expressive and structural implications of this framing have been frequently
observed and debated in the critical and analytical literature (e.g. Clark
2002; Kerman 1962; Kramer 2003, 2004; Morgan 1976; Schwarz 1986; Youens
2007), and we do not address them at length. Two related outcomes of this
gambit should be pointed out, however. Structurally, it turns the song from
what could have been a tonally and narratively closed entity (as described
above) to an open one—​perhaps (as Morgan 1976 and others suggest) as a
link to other songs in Schwanengesang.6 Narratively, rather than an intermedi-
ate stage connecting dusk (first stanza) and sunrise (third stanza), the nightly
rowing scene of the second stanza is also a frame for the entire song, sup-
plying the material for its opening and closing piano figuration. Due to this
framing, Schubert’s song now takes place in a constant, perhaps eternal limbo,
accompanied by Charon’s constant rowing, leading nowhere; thus, Schubert’s
foggy framing perhaps suggests who the boatman is and what ancient tale—​the
Orphean tale of love lost—​the narrator (and Heine) is trying to retell.

Structure and cross-​domain mappings in Schubert’s ‘Die Stadt’

Having discussed general characteristics of the structure of ‘Die Stadt’, we now
turn to some of the main cross-domain mappings that play a role in connecting
sound and images evoked by the text and by Schubert’s music, discussing in
particular associations with light, distance, motion and emotion.


As noted above, ample experimental research in perception and psychophysics
(for reviews see Eitan and Timmers 2010; Marks 2000, 2004; Spence 2011) suggests
that visual brightness corresponds with auditory loudness (louder/brighter),
pitch height (higher/brighter) and pitch direction (rising pitch/brighter).
Visual brightness is also associated with aspects of the sound spectrum,
particularly spectral centroid (higher/brighter). Research also suggests
associations of colour lightness or brightness with modality (major is lighter
and brighter than minor; Bresin 2005, Palmer et al. 2013), tempo (faster music
associates with lighter colours; Palmer et al. 2013), and interval size (larger
melodic intervals associated with more extreme degrees of brightness or dark-
ness; Hubbard 1996).
Schubert’s ‘Die Stadt’ uses the most conspicuous of these correspondences
unequivocally. Thus, the dimensions contrasting the first and second stanzas,
set in dusk, and the third stanza, depicting sunrise, are those most widely and
conspicuously associated with brightness: sound intensity (which has been
associated with visual brightness even in newborns; Lewkowicz and Turkewitz
1980) and pitch height (associated with colour lightness and brightness both in
humans and in other primates; Ludwig et al. 2011).
Sound intensity also affects the spectral structure of the musical sound
(both piano and vocal), such that louder sound emphasizes higher, ‘brighter’
spectral components; hence, loudness contrasts between the third stanza and
the preceding stanzas entail corresponding differences in spectral ‘brightness’
associated with visual brightness (Griscom and Palmer 2012). To examine
whether the analogy of visual brightness and spectral structure is expressed in
performances of ‘Die Stadt’, we calculated the median spectral centroid for the
three stanzas (piano solo sections excluded) for each performance (Figure 3.4).
Spectral centroid was measured using the Libxtract plugin available in Sonic
Visualiser.7 As the figure shows, the increase in intensity in the third stanza is
indeed accompanied by a rise in spectral centroid (compared to the second stanza)
for all performances, and for two of the three performances (TQ and DFD)
median spectral centroid in the third stanza is also considerably higher than in
the first stanza.

FIGURE 3.4   Median spectral centroid (Hz) per stanza for three performances of Schubert’s ‘Die Stadt’.
Spectral centroid was measured from commercial recordings combining piano and vocal line.
[Plot not reproduced; axes: median spectral centroid (Hz) against stanza (1 to 3).]
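
To make the measure concrete, the following minimal sketch computes a spectral centroid as the magnitude-weighted mean frequency of a short frame, and then takes the median over several frames. It is an illustration only and does not reproduce the Libxtract implementation: the naive DFT, the frame length, the sample rate and the synthetic two-partial tones are all assumptions chosen to keep the example self-contained.

```python
import cmath
import math
from statistics import median

def magnitude_spectrum(frame, sample_rate):
    """Naive DFT of one frame; returns (frequencies, magnitudes) up to Nyquist."""
    n = len(frame)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        freqs.append(k * sample_rate / n)
        mags.append(abs(acc))
    return freqs, mags

def spectral_centroid(frame, sample_rate):
    """Magnitude-weighted mean frequency (Hz) of one frame."""
    freqs, mags = magnitude_spectrum(frame, sample_rate)
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(f * m for f, m in zip(freqs, mags)) / total

def median_centroid(frames, sample_rate):
    """Median of the frame-wise centroids, e.g. over all frames of one stanza."""
    return median([spectral_centroid(frame, sample_rate) for frame in frames])

def tone(partials, sample_rate=8000, n=256):
    """Synthetic test signal; partials is a list of (frequency in Hz, amplitude)."""
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate) for f, a in partials)
            for i in range(n)]

# Hypothetical 'soft' and 'loud' tones: the louder one has a stronger upper partial.
soft = tone([(500, 1.0), (2000, 0.1)])
loud = tone([(500, 1.0), (2000, 0.5)])
print(spectral_centroid(soft, 8000), spectral_centroid(loud, 8000))
print(median_centroid([soft, soft, loud], 8000))
```

In this toy case the tone with the stronger upper partial yields the higher centroid, mirroring the point that louder singing or playing, which boosts the upper partials, shifts the centroid upwards.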
Additionally, larger melodic intervals emphasize cross-modal mappings
of pitch and brightness, producing more extreme (bright or dark) mappings
(Hubbard 1996). Hence, the concentration of the largest melodic intervals in
the setting of ‘Leuchtend vom Boden empor’ (literally, ‘glowing upwards from
the earth’) is telling.
A different type of allusion to light quality (yet unaccounted for by cross-​
modal empirical research) involves the diminished-​seventh sonority, which
frames the song and underlies its central stanza. Due to its symmetrical struc-
ture, the diminished-​seventh chord is the most ambiguous sonority in the tonal
harmonic palette, and may be associated (in enharmonic interpretations) with
virtually every tonal centre. Though in its present context this ambiguity is
not exploited, the chord may serve as an apt symbol of the foggy visual (as
well as emotional) quality shrouding the song. Whether this high-​level sym-
bolic association (grounded in tonal syntax, rather than basic perceptual cor-
respondences) also affects listeners’ perception is an intriguing question, which
remains to be empirically explored.


Acoustically, sound intensity (the main physical determinant of perceived
loudness) is the strongest correlate of physical distance, decreasing by approximately
6 dB with the doubling of the distance from a sound source. Another
acoustical cue for distance is spectral filtering in the upper spectral regions,
which increases with distance (Blauert 1997). In other words, softer sounds, as
well as duller sounds (possessing lesser energy in the higher spectral regions)
are both associated with greater physical distance. As noted, for both vocal
and most instrumental musical sound, higher spectral components tend to be
emphasized with increases in sound intensity and in its main perceptual correlate,
loudness (Sundberg 1999; see also Melara and Marks 1990 regarding interactions
of timbre, loudness and pitch).
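
The figure of roughly 6 dB per doubling of distance follows from the inverse-square law. The short sketch below works through that arithmetic; it assumes an idealized point source in a free field (no reflections or air absorption), and the function name is illustrative.

```python
import math

def level_change_db(reference_distance, new_distance):
    """Change in sound level (dB) when the listener moves from
    reference_distance to new_distance from the source.

    Assumes a point source in a free field (inverse-square law)."""
    if reference_distance <= 0 or new_distance <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(reference_distance / new_distance)

# Each doubling of distance lowers the level by about 6 dB.
for factor in (1, 2, 4, 8):
    print(f"{factor} x distance: {level_change_db(1.0, factor):+.1f} dB")
```

In a real room or on a commercial recording, reverberation and microphone placement would alter this relationship, so the sketch shows only the underlying acoustical cue.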
Pitch direction and distance change are also associated. An association
between pitch rise and looming (rapidly approaching) motion was found
even for nonhuman primates (rhesus monkeys; Ghazanfar and Meier 2009).
Such association in humans is suggested by a tendency to hear rising pitch
with unchanging intensity as increasing in loudness (and thus as approach-
ing: Neuhoff, McBeath and Wanzie 1999). There also seems to be an acoustical
basis for perceptual correspondences of pitch height or pitch direction with
spatial distance: the Doppler effect, in which frequency is shifted down for a
receding source. Thus, a lower or descending pitch would be associated with
greater or increasing distance. Note, however, that pitch–distance relationships
are not unequivocal: an association between pitch rise and increasing distance
was found in music-​related imagery tasks (Eitan and Granot 2006).
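
The size of the Doppler shift mentioned above can be estimated with a minimal sketch. It assumes a stationary listener, a source moving directly towards or away from that listener, and a speed of sound of 343 m/s; the function names and the example values are illustrative assumptions.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius

def doppler_frequency(source_frequency, source_speed, c=SPEED_OF_SOUND):
    """Frequency heard by a stationary listener.

    source_speed > 0: source recedes from the listener (pitch falls);
    source_speed < 0: source approaches the listener (pitch rises)."""
    if abs(source_speed) >= c:
        raise ValueError("source speed must be below the speed of sound")
    return source_frequency * c / (c + source_speed)

def shift_in_semitones(source_frequency, source_speed, c=SPEED_OF_SOUND):
    """Doppler shift expressed as a musical interval in semitones."""
    heard = doppler_frequency(source_frequency, source_speed, c)
    return 12.0 * math.log2(heard / source_frequency)

# Hypothetical example: a 440 Hz source moving at 10 m/s.
print(doppler_frequency(440.0, 10.0), shift_in_semitones(440.0, 10.0))    # receding
print(doppler_frequency(440.0, -10.0), shift_in_semitones(440.0, -10.0))  # approaching
```

At everyday speeds the shift is well under a semitone, so any contribution of this effect to musical pitch–distance associations is presumably a matter of learned association rather than of literal pitch change.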
Additionally, temporal and spatial distances are strongly associated in per-
ception and cognition, such that shorter duration is congruent with shorter
spatial distance (Merritt, Casasanto and Brannon 2010; Walsh 2003). This sug-
gests that manipulation of tempo or inter-​onset intervals (IOI) may also serve
to suggest changes from far (larger distance–​slow tempo or longer IOI) to near
(shorter distance–​faster tempo or shorter IOI).
For a song beginning with a gaze at the distant horizon (Am fernen
Horizonte), manipulation of such distance-​related attributes is of particular
interest. Comparing the setting of the first and third stanzas suggests a far–
near dichotomy through loudness contrasts (pp versus f /​ ff; see mean and max-
imum intensities in performances of the song, Figure 3.2). This also generates
differences in spectral energy, although the two do not need to be in a direct
linear relationship in performances of the score (compare Figures 3.2 and 3.4).
The low, muted pitch register of the piano accompaniment in the first stanza
emphasizes the sense of vast distance; the higher register in the third stanza is
thus another correlate of the protagonist’s approach to the town and, meta-
phorically, to his own anguish and pain, distant and veiled in the first stanza.
This approach is further indicated by the ascending vocal line, which suggests
a ‘close up’ on the nearing town at the song’s highpoint (‘Liebste’, bar 34), a
highpoint in pitch, loudness and emotional intensity, and the point where the
narrative is finally revealed.
While loudness is the most conspicuous attribute generating the far–near
contrast between the first and the third stanzas, in the second stanza (and in the
framing piano introduction and conclusion, which allude to it) Schubert rather
uses timbre and temporal density (i.e. IOI) to convey the dimension of spatial
distance. As noted above, the text of the second stanza suggests a ‘close up’ on
the narrator’s immediate surroundings (water, boat), thus contrasting with the
gaze at the distant town in the first stanza. This far–near contrast, however, is
not expressed through loudness contrast, as a pianissimo indication prevails
throughout this stanza (in actual performance loudness even tends to decrease,
as Figure 3.2 indicates). Rather, emphasis of the upper partials of the piano
sound may convey physical proximity, as implied by the score and brought out
in performance. The registration and doubling of the repeated diminished-​sev-
enth chord, underlying the entire stanza, coincide with the higher overtones
from the bass: c2 (the 8th partial), f♯2 (the 11th partial) and a2 (the 13th partial)
and emphasize proximity rather than distance of the bass. The shorter IOIs
(demisemiquavers) used in the piano arpeggiation of the second stanza further
articulate a shorter distance and ensuing smaller physical dimension (e.g. water
ripples) associated with the text.
Importantly, it is the piano accompaniment, rather than the voice, which
activates the far–near contrasts between the first and second stanzas. The
voice remains remote—​soft and low, as well as decreasing in pitch and loud-
ness throughout the stanza (Figures 3.1 and 3.2). The change of distance from
the first stanza to the second involves the external, physical image. It is only
in the third stanza (where distance-​related features, particularly loudness and
pitch contour, are also associated with the voice part) that emotional distance
changes too: the approach to the town and to the emotional content associated
with it now becomes personal.


Textually, the three stanzas are clearly distinguished from each other in the
qualities of motion they suggest. The first stanza does not allude to motion
in any direct way. The second, in contrast, is full of motion, and suggests two
simultaneous types of movement: the erratic wind, creating ruffles in the water,
and the measured, ‘mournful’ oar strokes (Takte—​also musical beats). The
third stanza is underlined by a single majestic motion: the rise of the sun ‘from
the earth’.
Schubert applies two types of mapping to suggest these motion qualities.
One is the direct analogy between temporal aspects of physical motion (e.g.
pace, regularity) and aspects of rhythm (IOI, metric accent). The other analogy
maps pitch space onto physical space, and thus pitch change (e.g. rise or fall,
steps or leaps) onto physical motion (for reviews of relevant cognitive research
see Eitan and Granot 2006; Eitan 2013). In the second stanza (bars 16–​26),
both mappings are applied. The arc of rapid arpeggiation on first beats sug-
gests, through both rhythm and pitch contour, the wind and the water ripples
that the wind generates. The boatman’s Takte (the repeated, steady lowering of
the oars into the water) are alluded to by accented As, repetitively descending
two octaves (second and third beats). In the third stanza it is the pitch/​space
analogy, applied in the vocal line, which suggests motion (the rising sun): the
rising vocal contour and the concentration of upward leaps (fifth, fourth) on
‘vom Boden empor’ (bar 30).


As noted above, cross-​modal and emotional mappings are often highly cor-
related, in music and elsewhere. Low pitch, for instance, is associated with
darker (low lightness) or dimmer (low brightness) visual stimuli, as well as with
negative, low-​intensity emotion (e.g. sadness). Correspondingly, negative emo-
tional states—​‘dark’ emotions—​are themselves associated with darker or dim-
mer visual stimuli. Similarly, high or ascending visual stimuli, high or rising
auditory pitch, and positive, ‘uplifting’ moods are also cognitively associated.
A comparable relationship (as discussed above) may be discerned for slower
tempo (or longer IOI), darker colour and sad emotion.
In the third stanza of ‘Die Stadt’, Heine’s text itself challenges these estab-
lished analogies, relating brightness and spatial rise—​both associated with pos-
itive emotion—​with the poem’s painful conclusion. Schubert’s setting includes
some robust auditory correlates of increased visual brightness: higher intensity
(and the ensuing spectral ‘brightness’) and pitch register, as well as spatial rise
(overall vocal contour, large accented rising intervals). These, however, are com-
bined with the hallmarks of negative emotional valence in music: minor mode,
and slow tempo and pace. Both features are particularly underscored here: the
minor modality by the use of the lower second degree (bar 32), the slow pace
due to the contrast with the faster motion in the preceding and following piano
arpeggiations. Furthermore, at the vocal line climaxes (bars 29–​31, 33–​34) the
voice doubles the bass line, rather than the upper line of the piano texture as it
did in the first stanza. Hence, the rise to the song’s highpoint is now paradoxi-
cally associated with the deepest, lowest register.
The result of this combination of features—​a loud, brighter sound joined
with intensified minor modality and slow pace—​is not a ‘compromise’ between
the apparently conflicting features of increasing brightness and deep sorrow.
Rather, it provides an analogue of the state of tragic revelation suggested by
Heine’s text, a state of extreme ‘darkness visible’ where pain veiled (and muted)
by night, fog and distance is now exposed by the cruel sunlight, which reveals to the
protagonist those ‘regions of sorrow … where peace and rest can never dwell,’
and where ‘hope never comes’ (Milton, Paradise Lost, Book 1).
This painful clarity of negative emotion in the third stanza is differentiated
from a fuzzier negativity in the second stanza, with its uncertain dissonant har-
mony. We have argued above that the diminished-​seventh chords may associate
with lack of visual clarity. Additionally, uncertainty and dissonance emphasize
negative valence, as does the descent in melodic contour in the vocal line of this
stanza (Timmers and Philippou 2010; Collier and Hubbard 2001). This leaves
the first stanza relatively unaffected emotionally (although within the general
mood conveyed by the minor mode and slow tempo). Indeed, it is the stanza
with the most stable melodic and rhythmic characteristics, containing relatively
small pitch changes and stable rhythmic patterning.

Cross-​domain mappings in three performances of ‘Die Stadt’

We now examine in more detail three performances of the score and how these
modify or add to the observed cross-​modal mappings and affective associa-
tions. We focus here on performers’ local variations in intensity and tempo.
Both types of variations map in various ways onto cross-​modal and affective
dimensions. As discussed above, intensity is closely associated with the distance
of a sound source. Physically, it is also associated with energy, quantity and
size of a sound source: a larger number of sound sources, or sources which are
physically larger, generally create a louder sound. Possibly related to this physi-
cal mapping between intensity and energy is the contribution of variations in
intensity to perceived changes in emotional arousal in music (Schubert 2004;
Coutinho and Cangelosi 2011): intensity, spectral centroid and tempo are all
strong predictors of emotional arousal (Coutinho and Cangelosi 2009, 2011;
Gingras, Marin and Fitch 2014). Not surprisingly, growth in intensity and
energy are associated with increases in physical motion, possibly because larger
and faster motion produces greater impact in sound (e.g. Dahl, Grossbach
and Altenmüller 2011). Moreover, as discussed earlier, intensity tends to be
mapped onto luminosity, with louder sounds perceived as brighter (Lewkowicz
and Turkewitz 1980; Marks 1989). Since visual brightness is itself associated
with emotional valence (Meier et al. 2004; see above), this mapping implies a
contribution of sound intensity to the perception of emotional valence. A sys-
tematic relationship between timbral brightness (sharpness) and valence has
indeed been found (Coutinho and Cangelosi 2009, 2011), while a relationship
between intensity and valence may also be present depending on the context
(Timmers 2007).
Variations in tempo may map onto many of the same dimensions as inten-
sity. Duration is closely associated with physical distance or length (Merritt
et al. 2010). Higher tempo compared to lower is associated with faster motion
and higher energy. These associations may play a role in the contribution
of tempo to the perception of emotional arousal in music (Coutinho and
Cangelosi 2009, 2011). According to Walker, Walker and Francis (2012), tempo
is one of the dimensions that may be linked through ‘crosstalk’ with various
other dimensions of connotative meaning. Crosstalk refers to mappings that
are activated through other mappings that regularly occur. For example, if
higher pitch is associated with brightness and smaller size, then smaller size
and brighter objects may also be associated, which is indeed what Walker et al.
(2012) demonstrated. In a similar way, faster tempo is linked to smaller, lighter
and sharper objects, which are brighter and spatially higher. Notably, these
analogies also imply an association between faster tempo and positive valence,
assuming other parameters remain equal. Some evidence for the association
of tempo and valence has been found (e.g. Lake, LaBar and Meck 2014), but
further empirical verification is needed within and outside the context of music.
Table 3.4 shows the details of the three recordings used for the quantita-
tive analysis. These performances, by three well-​known singers, were deemed
substantially different from each other (based on our subjective listening),
representing possible interpretations of the song and its diverse cross-​domain

TABLE 3.4   Recorded performances of ‘Die Stadt’ by Fischer-Dieskau, Bostridge and Quasthoff

Singer                     Pianist           Index   Record details (release on CD)
Dietrich Fischer-Dieskau   Gerald Moore      DFD     Deutsche Grammophon (2005)
Ian Bostridge              Antonio Pappano   IB      EMI Classics (2009)
Thomas Quasthoff           Justus Zeyen      TQ      Deutsche Grammophon (2001)

The measurements of intensity and tempo were done in PRAAT, freely
available audio analysis software which includes facilities to annotate audio
files and automatically analyse audio characteristics, including intensity
measures in dB. First, the onsets of two-bar phrases were manually indicated, which
provided a segmentation of the audio file. Onsets were defined to coincide with
the sung initiation of the phrase. Appendix 3.1 indicates the location of
phrase onsets using numerals (phrase 1, phrase 2, etc.). This was done only for
those phrases that included a vocal line, thus excluding the piano introduction,
interludes and coda. For phrases at the end of a stanza, phrase endings were
also determined, coinciding with the onset of the piano interlude. The phrase
onset and offset data were used to segment the music and to calculate phrase
durations. Using this segmentation of the audio file, we extracted the average
and maximum intensities of each two-​bar phrase.
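
The phrase-level extraction described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure run in PRAAT: the function name `phrase_intensity`, the frame values and the phrase boundaries are all hypothetical, and taking the arithmetic mean of the dB values is an assumption, since the averaging method is not specified here.

```python
def phrase_intensity(frames, phrases):
    """Return (mean dB, max dB) for each phrase.

    frames  -- list of (time in seconds, intensity in dB) pairs
    phrases -- list of (onset, offset) pairs in seconds
    """
    results = []
    for onset, offset in phrases:
        # Keep only the frames that fall inside this phrase.
        levels = [db for t, db in frames if onset <= t < offset]
        if not levels:
            raise ValueError(f"no intensity frames between {onset} and {offset}")
        # Assumption: plain arithmetic mean of the dB values.
        results.append((sum(levels) / len(levels), max(levels)))
    return results


# Hypothetical intensity contour and two hypothetical phrase segments.
frames = [(0.0, 60.0), (0.5, 62.0), (1.0, 70.0), (1.5, 64.0)]
phrases = [(0.0, 1.0), (1.0, 2.0)]
print(phrase_intensity(frames, phrases))
```

Passing explicit onset and offset pairs, rather than onsets alone, mirrors the distinction made above between phrases that run into the next sung phrase and stanza-final phrases that end where the piano interlude begins.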


Figure 3.2 shows the intensity values per two-​bar phrase and their variation
across the three performances. Comparison of the two panels shows a strong
overlap between the profiles in mean and maximum intensity (left and right
panels, respectively), except that the maximum intensity is on average 5 dB
higher than the average intensity (see differences in scale). The intensity profiles
are also very highly correlated for the three singers and clearly separate the
third stanza from the first two, following the forte indication of the score and
the crescendo towards fortissimo in the final phrase, as discussed above.
Focusing on changes in maximum intensity per two-​bar phrase within each
stanza, we find that intensity seems to correlate in particular with pitch con-
tour (compare Figures 3.1 and 3.2). Using partial correlations, we can corre-
late measured intensity with the weighted pitch per phrase, after correction for
correlations with a forte indication in the score. This means that the jump in
intensity is accounted for by the forte indication, and the remaining variation is
correlated with pitch height. Table 3.5 shows the resulting partial correlations,
which are strong for IB in particular, followed by TQ and DFD with lower but
still significant partial correlations.
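
The logic of a partial correlation can be illustrated with the standard first-order formula, which removes from the intensity and pitch correlation whatever both variables share with a third variable. The sketch below assumes Pearson correlations and a binary coding of the forte indication (0 = absent, 1 = present). Neither this coding nor the data values are taken from the analysis reported here; they are hypothetical and serve only to show the computation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical values for 12 two-bar phrases (not the measured data).
intensity = [62.0, 63.5, 65.0, 66.0, 64.0, 62.5, 61.0, 60.0, 72.0, 73.5, 75.0, 76.5]
pitch = [60.1, 61.0, 62.3, 63.0, 62.0, 60.5, 59.8, 59.0, 62.5, 63.2, 64.8, 66.0]
forte = [0] * 8 + [1] * 4  # assumed coding: forte applies to the third stanza

print(round(pearson(intensity, pitch), 3))
print(round(partial_corr(intensity, pitch, forte), 3))
```

Comparing the two printed values shows how much of the raw correlation is carried by the stanza-level jump in dynamics and how much remains within stanzas.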

TABLE 3.5   Partial correlations between pitch and intensity after correction for correlations with dynamic indications in the score (N = 12)

Singer                 DFD     IB       TQ
Mean weighted pitch    .591*   .813**   .649*

* p < .05; ** p < .01.

Intensity and pitch height are both associated with visual brightness, a
prominent feature of the text. Intensity reinforces associations related to pitch
of the vocal line. In the first stanza, intensity and pitch rise, which may relate
to the appearance (erscheint) of the city at dusk. In the second stanza, intensity
and pitch descend and dissolve into the rowing motion of the accompaniment,
which itself comes to a standstill in the fermata before the third stanza. The
softness of the low pitch disambiguates the low voice as being depleted of energy
rather than ‘full’ or ‘big’, which low voices can also be (Eitan and Timmers
2010). Additionally, it emphasizes the emotional distance of the protagonist.
While the second stanza involves a physical close-​up, the protagonist is psycho-
logically distanced and isolated. In the third stanza, intensity is (as discussed)
a main parameter in the change in psychological distance:  from remote and
passive to emotionally involved, and from a darkened, veiled mood to pain-
ful clarity. The increase in intensity and pitch within the stanza sustains this
process until the final tones of the singer, in which the source of the emotional
experience is revealed.
Within this general trend of matching intensity to pitch contour, the singers
deviate to varying degrees from a perfect correlation, which may be a way to
communicate the ambiguity of the pitch ‘rises’. DFD deviates most strongly
from the correlation with pitch height. In the first stanza, this is the case for the
last phrase, where he decreases in intensity rather than increasing, which can be
seen as a depiction of the dusk (Abenddämmrung) and the looming darkness.
In the third stanza, intensity is high from the start. It builds up to a degree but
diminishes within the final phrase, where the loss of the beloved is acknowledged,
and the return to the dark and subdued mood that follows is anticipated.
Both of these deviations from a matching of intensity with pitch contour seem
to qualify the pitch rises as dark in mood and depressed in emotion. The peaks
in pitch are moderated, emphasizing a distance and darkness of mood.


As explained before, the duration of sung two-​bar phrases was calculated from
the measured phrase onsets and offsets. All two-​bar phrases in the first two
stanzas consisted of six crotchets. However, in the final stanza, phrases vary
in score duration:  alternating phrases are slightly shorter or longer than six
beats, the longer phrases containing relatively long notes (minims for Boden
and Liebste). To make the phrase durations comparable, the measured dura-
tions were normalized by dividing performed duration by score duration and
multiplying this by 6, to show the duration of all phrases as if they consisted of
six crotchets. The normalized phrase durations can consequently be used as an
indication of relative tempo of the performed music: short (normalized) phrase
durations indicate a relatively fast tempo.

FIGURE 3.5   Normalized phrase duration of successive two-bar phrases in the performance by DFD, IB
and TQ. Interruptions in lines indicate bars that are separated by piano accompaniment intermezzi.
[Axes: two-bar phrase (horizontal) against normalized phrase duration (vertical).]

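
As a worked illustration of the normalization just described, the following sketch rescales each performed phrase duration to the length it would have if the phrase were six crotchets long, and also reports the average tempo the phrase implies. The function names and the example durations are hypothetical; only the rule itself (performed duration divided by score duration, multiplied by 6) is taken from the text.

```python
REFERENCE_CROTCHETS = 6  # notated length of a two-bar phrase in the first two stanzas


def normalized_duration(performed_seconds, score_crotchets):
    """Rescale a performed phrase duration to a six-crotchet phrase."""
    if score_crotchets <= 0:
        raise ValueError("score duration must be positive")
    return performed_seconds / score_crotchets * REFERENCE_CROTCHETS


def crotchets_per_minute(performed_seconds, score_crotchets):
    """Average tempo implied by a phrase duration."""
    return 60.0 * score_crotchets / performed_seconds


# Hypothetical phrases: (performed duration in seconds, notated length in crotchets).
phrases = [(9.0, 6), (8.4, 7), (7.5, 5)]
for seconds, crotchets in phrases:
    print(round(normalized_duration(seconds, crotchets), 2),
          round(crotchets_per_minute(seconds, crotchets), 1))
```

A six-crotchet phrase is left unchanged by the rescaling, while longer and shorter phrases are brought onto the same scale, so that a smaller normalized value always corresponds to a faster average tempo.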
Figure 3.5 shows the variations in normalized durations of the phrases in
the three performances. DFD and TQ show very similar profiles of duration
variations (r = .886) shared to a degree by IB as well (r = .797 and r = .715).
All three performances speed up in tempo across the three stanzas. Within this
general trend, the performance of DFD is relatively slow and the performance
of IB relatively fast. The performance of IB is different in its local profile in the
second and third stanzas: IB does not slow down toward the end of the second
stanza, and he does not speed up in the second and third phrases of the third stanza.
Pairwise correlations indicate that there are no consistent associations between
the variations in normalized phrase duration and any of the measured score features
(rhythmic duration, rhythmic variability, pitch height), with all correlations
being nonsignificant (|r| < .375, p > .05). The variations in normalized phrase
duration negatively correlate with variations in intensity for DFD and TQ. This
was not the case for IB (Table 3.6). This negative association between dynamics
and duration for DFD and TQ is related to a parallel in global trend rather than
in local trends: in particular in the final stanza, intensity and tempo are high,
compared to the other two stanzas. If this global trend is corrected for (using
partial correlations), the correlation between intensity and duration becomes
nonsignificant for all three performances (|r| < .309, p > .05). Correlations between
the forte indication and phrase durations are reliable for DFD and TQ, however,
and close to significant for IB (Table 3.6). This highlights that tempo is used to
support the increase in proximity, activation and emotional arousal in the third
stanza (particularly in comparison to the first) and possibly (through crosstalk)
also the increase in brightness.

TABLE 3.6   Correlations of duration with phrase intensity and forte indication (N = 12)

Singer             DFD      IB       TQ
Phrase intensity   -.572*   -.304    -.702**
Forte indication   -.589*   -.534x   -.691*

x p = .074; * p < .05; ** p < .01.

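
To show how a correlation based on only twelve phrases relates to a significance level, the sketch below converts a correlation coefficient into a t value and a two-tailed p value. It assumes a conventional two-tailed test of a Pearson correlation with N - 2 degrees of freedom, which is not stated explicitly here, and it uses a simple numerical integration in place of a statistics library. The function names are hypothetical; the three coefficients are those reported for the forte indication in Table 3.6.

```python
import math


def t_statistic(r, n):
    """t value for a Pearson correlation r based on n paired observations."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1.0 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed p value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = total * h / 3.0  # probability mass between 0 and |t|
    return 1.0 - 2.0 * central_area


N = 12
forte_correlations = {"DFD": -0.589, "IB": -0.534, "TQ": -0.691}
for singer, r in forte_correlations.items():
    t = t_statistic(r, N)
    print(singer, round(t, 2), round(two_tailed_p(t, N - 2), 3))
```

With so few observations, a coefficient has to be fairly large before it passes the conventional .05 threshold, which is why a correlation above .5 in absolute value can still fall just short of significance.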
If we look at the local variations in tempo within stanzas, it seems that
tempo is used at times to intensify the effects of other elements of the music
and at times to moderate them. In the first stanza, for example, all three perfor-
mances gradually slow down towards the end of the stanza (Abenddämmrung).
In this stanza, the upward motion in pitch is accompanied by a slowing down
rather than a speeding up of tempo, perhaps associating with the stillness of
the evening and the static quality of the scene.
In the second stanza, tempo again decreases gradually for DFD and TQ,
although less strongly than in the first stanza. This time the slowing down
accompanies a decrescendo and a fall in pitch. The decrease in tempo intensi-
fies the associations related to the descent in pitch and intensity of the vocal
line, increasing a sense of the isolation and depression of the protagonist. In
contrast, IB does not slow down in the second stanza but keeps a steady tempo.
This choice may instead emphasize the steady motion of the oars, highlighting
external scenery rather than psychological process.
In the third stanza, the tempo is faster from the start of the stanza for IB,
while DFD and TQ speed up after a slower first phrase. This growth in motion
coincides with an increase in intensity and underlines the emotional intensity
and turmoil of the stanza. The changes in speed, increasing rhythmic irregu-
larity, further contribute to the emotional intensity of this stanza. IB extends
the phrases containing the longer notes (minims for Boden and Liebste), while
DFD and TQ lengthen in particular the final phrase and thus dramatically
accentuate the memory of the beloved (Liebste verlor).


Of the three performances, IB’s can be seen as providing the most literal read-
ing of the score, being relatively steady in tempo at a local level, and showing a
very strong correlation between intensity and pitch. He emphasizes the global
changes in the poem and the music:  there is stronger motion in the second
stanza than in the first, and intense emotional arousal in the third. The dra-
matic climaxes in the third stanza are underlined through intensity and rubato,
extending the moments of intense emotion. His performance is seemingly the
brightest and most active, employing a relatively fast tempo and rising intensity with
rising pitch.
DFD shows the strongest modification of affective and cross-​modal map-
pings. His performance matches the global intensification of the poem and the
music with a global rise in the tempo across stanzas and contrasting dynamics,
in particular in the third stanza. However, each overall increase in tempo is
counter​balanced by a considerable decrease in local tempo. His performance
of the first two stanzas can be heard as the darkest, most mournful of the
three: any surges in brightness through rising pitch are darkened by using softer
dynamics and slower tempos. The emotional instability and intensity of the
third stanza is emphasized through sudden and strong tempo changes, while
the temporary nature of the vivid memory of the beloved is highlighted by an
early return to the subdued character of the start.
In TQ’s performance, the difference in character between the first two stan-
zas is less apparent than in the other performances. The second stanza is not
faster than the first, although there is a contrast in speed between the end of
the first and the start of the second. The two stanzas are also very similar in
overall intensity. At a local level, TQ shows similar uses of tempo and dynamics
to DFD and (in some cases) IB. TQ tapers off the increase in pitch in the first
stanza by limiting the growth in intensity. He also limits the decrease in intensity
in the pitch fall of the second stanza, and slows down only slightly in this stanza,
providing a relatively constant affective character. In contrast, emotional inten-
sity is very strong in his performance of the third stanza, with a sudden rise in
dynamics, spectral brightness (indicated by spectral centroid) and fluctuation in
tempo. The emotional climax is sustained until the final notes of the vocal line.

Discussion and conclusion


Our ostensible purpose in this chapter has been to demonstrate through an
analysis of a specific piece, Schubert’s ‘Die Stadt’, how basic cross-modal
mappings, often investigated by cognitive science in simple experimental set-
tings, may serve to elucidate complex musical meanings, particularly in the
context of musical text-​setting. These meanings are defined and modified over
time through a complex interaction of dimensions that come together in the
form of a constantly changing shape, as further discussed below. Specifically,
we explored the relationships between cross-​modal, affective and sonic char-
acteristics in Schubert’s song, and the role of music performance in expressing
(and sometimes modifying) these relationships.
Underlying our analysis is an attempt to demonstrate how two central pillars
of musical meaning, the cross-modal and the emotional mappings of musical
features, interrelate. In musical discourse these two approaches to musical
connotation have had fundamental roles, sometimes complementarily, some-
times competitively. Throughout the history of western music, conventional-
ized musical figures have been used to allude both to aspects of the natural
world and to human passions (particularly, though not solely, in the context
of setting words to music). However, arguments against the propriety of ‘tone
painting’ of the natural world (as contrasted with the depiction of emotion)
have recurred in musical and aesthetic discourse, particularly in the eighteenth
and nineteenth centuries (see Wilson, Buelow and Hoyt 2001 for a historical
survey, and Walden 2013 for discussions of representation in nineteenth-​and
twentieth-​century music). We have tried to demonstrate how a close musical
reading, informed by empirical psychophysical and cognitive research, may
suggest ways in which emotional and ‘pictorial’ cross-​modal connotations of
music can interact.
While reinterpreting and reassociating traditions of musical representation
through the lens of empirical psychological research, the analysis also high-
lights the need for empirical studies of music and emotion to investigate fur-
ther the contribution of cross-​modal mappings, including shape. It is striking
how closely and directly many cross-​modal mappings of sound also map onto
the valence and arousal dimensions of emotion. Such ‘triadic’ connections of
sound, nonauditory dimensions and emotion include dimensions discussed in
this chapter, such as brightness, clarity, activation, power or motion, as well
as other perceptual dimensions, such as tactile roughness, sharpness (sharp–
blunt) or heat (Eitan and Timmers 2010; Spence 2011). Here we have demon-
strated how mappings of light, distance and motion onto sound may reveal
the interaction of descriptive (cross-​modal) and affective meanings. Thus, for
instance, the role of sound intensity in modulating a sense of distance also has
strong affective implications, alluding to a psychological or affective ‘distance’
between subject and object of desire. Slow tempo can similarly induce the idea
of distance, which adds to its expressive power to modulate emotional arousal
and valence.


Methodologically, our analysis attempts to demonstrate how quantitative
analysis (such as measuring intensity and tempo in performance) may be
integrated with ‘softer’ analysis of text, musical features and their relation-
ships. Measurements of pitch, tempo, intensity and duration allowed the
quantification of the relationships among these dimensions, which itself sup-
ported the interpretation of the multidimensional expression of concepts cen-
tral to Heine’s text. However, in the context of an exploratory (rather than
experimental) study, qualitative interpretation of the quantitative analysis is
essential. It is through consideration of the wider musical context that the variations
in measurement dimensions become meaningful. In other words (and
rather obviously), quantitative analysis quantifies the strength of relationships,
while qualitative analysis provides meaning.
There is a strong empirical base for most of the mappings discussed here.
Nevertheless, interpretations of acoustic and compositional characteristics in
terms of affective or cross-​modal associations may not necessarily align with
listeners’ conscious experience of the music. Some such mappings may be
latently present: they may become apparent when prompted, as through con-
textual reading tasks (Dolscheid et  al. 2013). On the other hand, mappings
may influence listeners’ perceptions whether or not listeners intend them to. For example,
participants can be instructed to focus attention on one dimension, and may believe
that they do so, yet their responses are nevertheless influenced by the secondary
dimension (ibid.).
While we have focused on mappings of low-​level psychoacoustical param-
eters such as pitch height, tempo and loudness, it is not our intention to reduce
music to such basic properties. Rather, we aim to show the richness of connotations
implied even by basic sonic aspects. Nor is it our intention to suggest
that mappings are always linear and simple. Even apparently straightforward
mappings may heavily depend on context. For instance, slow tempo can suggest
relaxation, compared to a faster tempo that precedes it, and thus correspond
with positive valence; yet it can also be perceived as depressed in energy, thus
suggesting a negative connotation (as found, for example, in Timmers 2007).


The multiplicity of possible mappings (both cross-modal and emotional) for
each musical feature, and the multiplicity of musical features that may associate
with each nonauditory dimension, particularly emphasize the role of musical
and textual context in interpreting such cross-​domain mappings. Yet, as we
have shown, context may sometimes leave the contest of mappings unresolved
and even stress an embedded contradiction (as in the case of the painful bright-
ness concluding ‘Die Stadt’).
The power of the performer to contribute to the meaning of music is evident
from the modulation of two dimensions (intensity and tempo) that
have strong affective and cross-​modal implications. Moreover, the performer
plays a crucial role in interpreting, clarifying and choosing among possible
mappings: for example, the choice of particular tempo and intensity levels may
modify associations with pitch (Eitan and Granot 2006). The faster tempo and
the correlation between pitch and intensity in IB’s performance may indeed
make this particular performance brighter and more positive in connota-
tion, while we have argued that the decreases in tempo and intensity inserted
in DFD’s performance at the end of stanzas emphasize the music’s darkness
and the protagonist’s isolation. This interpretation of performance extends our
understanding of the ways in which performers contribute to the expressiveness
of music (for an overview, see Fabian, Timmers and Schubert 2014), highlight-
ing how correlations with musical structure and deviations from such correla-
tions can be meaningful, but also emphasizing that performance aspects have
meaning irrespective of correlations with musical structure.
Importantly, modulation of meaning and emotion occurs in real time with
musicians making split-​second choices and decisions throughout performance.
The multiplicity of the options for modulating the various dimensions of
sound to produce a variety of possible mappings, interacting in complex ways,
requires an efficient means of managing the overall dynamic profile, and its
affective associations, from moment to moment through a performance of a
score. What is required is a concept that maps easily between domains, on any
hierarchical level, and that can apply equally to scores, performances and expe-
riences, and within them to such aspects as narrative structure, form, loudness,
brightness, tempo, speed, density, register, intensity, harmonic or interval pat-
terning, pitch direction, sound spectrum, distance and timbre—​all the dimen-
sions of score, sound and performance that we have discussed here. This is
what shape achieves. In that sense, shape can be seen as a synthesizing tool
that allows musicians to manage the otherwise bewilderingly complex field of
possibilities for action and meaning which cross-​domain mapping presents to
performers and listeners from one moment to the next.


This analysis of ‘Die Stadt’ suggested ways through which cross-​modal map-
pings contribute to the emotional and interpretative meaning of the song. We
emphasized how basic features of the music, including pitch, intensity and
tempo, are carefully employed to provide a multisensory experience of Heine’s
text. The textual context brought forward metaphors related to light, distance
and motion. Our analysis highlighted musical parallels to these metaphors and
affective connotations that come into play through particular treatment in the
composition and its performances, suggesting that the deceptively simple cross-​
modal correspondences examined by experimental psychology may combine
to generate a highly complex, multivalenced web of musico-​poetic meanings.
Finally, we argued that the process of managing such a complex array of
possibilities can be handled via the notion of musical shape.
Our analysis suggests that cross-​modal mappings should be more centrally
included in models of the perception of emotion in music. We see such map-
pings as closely connected to processes captured under the mechanism of ‘emo-
tion contagion’ (Egermann and McAdams 2013) and attributed to relationships
with vocal expression of emotion. Including cross-domain mappings as a
source for affective associations helps to explain a wider variety of emotional
expression beyond the delimited conceptual framework of basic emotions.

References


Adler, M., 2014: ‘Cross-​modal interactions and musical representation’, in Hebrew (PhD
dissertation, Tel Aviv University).
Blauert, J., 1997: Spatial Hearing: The Psychophysics of Human Sound Localization (Cambridge,
MA: MIT Press).
Boltz, M. G., 2004: ‘The cognitive processing of film and musical soundtracks’, Memory
and Cognition 32: 1194–​205.
Boltz, M., 2013: ‘Music videos and visual influences on music perception and appreciation:
should you want your MTV?’, in S.-L. Tan, A. Cohen, S. Lipscomb and R. Kendall, eds.,
The Psychology of Music in Multimedia (Oxford: Oxford University Press), pp. 217–​34.
Bresin, R., 2005: ‘What is the color of that music performance?’, in Proceedings of the
International Computer Music Conference 2005 (Barcelona: ICMC), pp. 367–​70.
Casasanto, D. and K. Dijkstra, 2010:  ‘Motor action and emotional memory’, Cognition
115: 179–​85.
Clark, S., 2002: ‘Schubert, theory and analysis’, Music Analysis 21: 209–​43.
Cohen, A. J., 2001: ‘Music as a source of emotion in film’, in P. N. Juslin and J. A. Sloboda, eds.,
Music and Emotion: Theory and Research (Oxford: Oxford University Press), pp. 249–​72.
Collier, W. G. and T. L. Hubbard, 2001: ‘Judgments of happiness, brightness, speed and tempo
change of auditory stimuli varying in pitch and tempo’, Psychomusicology 17: 36–​55.
Collier, W. G. and T. L. Hubbard, 2004: ‘Musical scales and brightness evaluations: effects
of pitch, direction and scale mode’, Musicae Scientiae 8: 151–​73.
Coutinho, E. and A. Cangelosi, 2009: ‘The use of spatio-​temporal connectionist models in
psychological studies of musical emotions’, Music Perception 27: 1–​15.
Coutinho, E. and A. Cangelosi, 2011:  ‘Musical emotions:  predicting second-​by-​second
subjective feelings of emotion from low-​level psychoacoustic features and physiological
measurements’, Emotion 11: 921–​37.
Dahl, S., M. Grossbach and E. Altenmüller, 2011: ‘Effect of dynamic level in drumming:
measurement of striking velocity, force, and sound level’, in Proceedings of Forum
Acusticum, June 27–​July 1, 2011 (Aalborg, Denmark: Danish Acoustical Society), CD-​
ROM, pp. 621–​24.
Dolscheid, S., S. Shayan, A. Majid and D. Casasanto, 2013:  ‘The thickness of musical
pitch: psychophysical evidence for linguistic relativity’, Psychological Science 24: 613–​21.
Egermann, H. and S. McAdams, 2013:  ‘Empathy and emotional contagion as a link
between recognized and felt emotions in music listening’, Music Perception 31: 139–​56.
Eitan, Z., 2013: ‘How pitch and loudness shape musical space and motion’, in S.-L. Tan,
A. Cohen, S. Lipscomb and R. Kendall, eds., The Psychology of Music in Multimedia
(Oxford: Oxford University Press), pp. 165–91.
Eitan, Z. and R. Y. Granot, 2006: ‘How music moves: musical parameters and images of
motion’, Music Perception 23: 221–​47.
Eitan, Z. and R. Timmers, 2010: ‘Beethoven’s last piano sonata and those who follow croc-
odiles: cross-​domain mappings of auditory pitch in a musical context’, Cognition 114:
Fabian, D., R. Timmers and E. Schubert, eds., 2014: Expressiveness in Music Performance:
Empirical Approaches Across Styles and Cultures (Oxford: Oxford University Press).
Freddi, S., J. Cretenet and V. Dru, 2013: ‘Vertical metaphor with motion and judgment: a
valenced congruency effect with fluency’, Psychological Research 78/5: 736–48. Available
at doi: 10.1007/​s00426-​013-​0516-​6 (accessed 9 April 2017).
Ghazanfar, A. A. and J. X. Maier, 2009: ‘Monkeys hear rising frequency sounds as loom-
ing’, Behavioral Neuroscience 123: 822‒7.
Ghazanfar, A. A., J. G. Neuhoff and N. K. Logothetis, 2009: ‘Auditory looming perception
in rhesus monkeys’, Proceedings of the National Academy of Sciences 99: 15755–​7.
Gingras, B., M. M. Marin and W. T. Fitch, 2014: ‘Beyond intensity: spectral features effec-
tively predict music-​induced subjective arousal’, The Quarterly Journal of Experimental
Psychology 67: 1428–​46.
Griscom, W. S. and S. E. Palmer, 2012: ‘The color of musical sounds: color associates of
harmony and timbre in non-​synesthetes’, Journal of Vision 12: abstract 74.
Hascher, X., 2008: ‘ “In dunklen Träumen”: Schubert’s Heine-​Lieder through the psycho-
analytical prism’, Nineteenth-​Century Music Review 5: 43–​70.
Hevner, K., 1937: ‘The affective value of pitch and tempo in music’, The American Journal
of Psychology 49: 621–​30.
Hubbard, T. L., 1996: ‘Synaesthesia-​like mappings of lightness, pitch, and melodic inter-
val’, The American Journal of Psychology 109: 219–​38.
Kerman, J., 1962: ‘A romantic detail in Schubert’s Schwanengesang’, The Musical Quarterly
48: 36–​49.
Kramer, L., 2003: Franz Schubert: Sexuality, Subjectivity, Song (Cambridge: Cambridge
University Press).
Kramer, L., 2004:  ‘Odradek analysis:  reflections on musical ontology’, Music Analysis
23: 287–​309.
Kramer, R., 1994: Distant Cycles: Schubert and the Conceiving of Song (Chicago: University
of Chicago Press).
Lake, J. I., K. S. LaBar and W. H. Meck, 2014: ‘Hear it playing low and slow: how pitch
level differentially influences time perception’, Acta Psychologica 149: 169–​77.
Lewkowicz, D. J. and N. J. Minar, 2014:  ‘Infants are not sensitive to synesthetic cross-​
modality correspondences: a comment on Walker et al. (2010)’, Psychological Science
25: 832–​4.
Lewkowicz, D. J. and G. Turkewitz, 1980: ‘Cross-modal equivalence in early
infancy: auditory-visual intensity matching’, Developmental Psychology 6: 597–607.
Litterick, L., 1996: ‘Recycling Schubert: on reading Richard Kramer’s Distant Cycles:
Schubert and the Conceiving of Song’, Nineteenth-​Century Music 20: 77–​95.
Ludwig, V. U., I. Adachi and T. Matsuzawa, 2011: ‘Visuoauditory mappings between high
luminance and high pitch are shared by chimpanzees (Pan troglodytes) and humans’,
Proceedings of the National Academy of Sciences 108: 20661‒5.
Marks, L. E., 1989:  ‘On cross-​modal similarity:  the perceptual structure of pitch, loud-
ness, and brightness’, Journal of Experimental Psychology:  Human Perception and
Performance 15: 583–​602.
Marks, L. E., 2000: ‘Synesthesia’, in E. Cardeña, S. J. Lynn and S. C. Krippner, eds.,
Varieties of Anomalous Experience:  Examining the Scientific Evidence (Washington,
DC: American Psychological Association), pp. 121–​49.
Marks, L. E., 2004: ‘Cross-​modal interactions in speeded classification’, in G. A. Calvert,
C. Spence and B. E. Stein, eds., The Handbook of Multisensory Processes (Cambridge,
MA: MIT Press), pp. 85–​105.
Meier, B. P. and M. D. Robinson, 2004: ‘Why the sunny side is up: associations between
affect and vertical position’, Psychological Science 15: 243–​7.
Meier, B. P. and M. D. Robinson, 2005:  ‘The metaphorical representation of affect’,
Metaphor and Symbol 20: 239–​57.
Meier, B. P., M. D. Robinson and G. L. Clore, 2004: ‘Why good guys wear white: automatic
inferences about stimulus valence based on brightness’, Psychological Science 15: 82–​7.
Meier, B. P., M. D. Robinson, L. E. Crawford and W. J. Ahlvers, 2007: ‘When “light” and
“dark” thoughts become light and dark responses: affect biases brightness judgments’,
Emotion 7: 366–​76.
Melara, R. D., 1989: ‘Similarity relations among synesthetic stimuli and their attributes’,
Journal of Experimental Psychology: Human Perception and Performance 15: 212–​31.
Melara, R. D. and L. E. Marks, 1990:  ‘Interaction among auditory dimensions:  timbre,
pitch, and loudness’, Perception and Psychophysics 48: 169–​78.
Merritt, D. J., D. Casasanto and E. M. Brannon, 2010: ‘Do monkeys think in metaphors?
Representations of space and time in monkeys and humans’, Cognition 117: 191–​202.
Morgan, R. P., 1976: ‘Dissonant prolongation: theoretical and compositional precedents’,
Journal of Music Theory 20: 49–​91.
Morton, E., 1994: ‘Sound symbolism and its role in non-​human vertebrate communication’,
in L. Hinton, J. Nichols and J. Ohala, eds., Sound Symbolism (Cambridge: Cambridge
University Press), pp. 348–​65.
Neuhoff, J. G., M. K. McBeath and W. C. Wanzie, 1999:  ‘Dynamic frequency change
influences loudness perception:  a central, analytic process’, Journal of Experimental
Psychology: Human Perception and Performance 25: 1050–​9.
Palmer, S. E., K. B. Schloss, Z. Xu and L. R. Prado-​León, 2013: ‘Music–​color associations
are mediated by emotion’, Proceedings of the National Academy of Sciences 110: 8836–​41.
Reed, J., 1997: The Schubert Song Companion (Manchester: Manchester University Press).
Schubert, E., 2004: ‘Modeling perceived emotion with continuous musical features’, Music
Perception 21: 561–​85.
Schwarz, D., 1986: ‘The ascent and arpeggiation in ‘Die Stadt,’ ‘Der Doppelgaenger,’ and
‘Der Atlas’ by Franz Schubert’, Indiana Theory Review 7: 39–​45.
Spence, C., 2011:  ‘Crossmodal correspondences:  a tutorial review’, Attention, Perception
and Psychophysics 73: 971–​95.
Sundberg, J., 1999:  ‘The perception of singing’, in D. Deutsch, ed., The Psychology of
Music (New York: Academic Press), pp. 171–​214.
Timmers, R., 2007:  ‘Vocal expression in recorded performances of Schubert songs’,
Musicae Scientiae 11: 237–​68.
Timmers, R. and H. Crook, 2014:  ‘Affective priming in music listening:  emotions as a
source of musical expectation’, Music Perception 31: 470–​84.
Timmers, R. and M. Philippou, 2010: ‘Influences of musical certainty on perceived emo-
tions and, vice versa, influences of musical emotions on certainty in decision-​making’,
86 Music and Shape

in S. M. Demorest, S. J. Morrison and P. S. Campbell, eds., Proceedings of the 11th

International Conference for Music Perception and Cognition (ICMPC11) (Seattle:
University of Washington), pp. 812–3.
Tsur, R., 2006: ‘Size–​sound symbolism revisited’, Journal of Pragmatics 38: 905–​24.
Wagner, S., E. Winner, D. Cicchetti and H. Gardner, 1981: ‘ “Metaphorical” mapping in
human infants’, Child Development 52: 728–​31.
Walden, J. S., ed., 2013:  Representation in Western Music (Cambridge:  Cambridge
University Press).
Walker, P., J. G. Bremner, U. Mason, J. Spring, K. Mattock, A. Slater and S. P. Johnson,
2010:  ‘Preverbal infants’ sensitivity to synaesthetic cross-​modality correspondences’,
Psychological Science 21: 21–​5.
Walker, L., P. Walker and B. Francis, 2012: ‘A common scheme for cross-​sensory correspon-
dences across stimulus domains’, Perception 41: 1186–​92.
Walsh, V., 2003: ‘A theory of magnitude: common cortical metrics of time, space and quan-
tity’, Trends in Cognitive Sciences 7: 483–​8.
Weger, U. W., B. P. Meier, M. D. Robinson and A. W. Inhoff, 2007: ‘Things are sounding
up: affective influences on auditory tone perception’, Psychonomic Bulletin and Review
14: 517–​21.
Wilson, B., G. J. Buelow and P. A. Hoyt, 2001: ‘Rhetoric and music’, Grove Music Online,
http://​​subscriber/​article/​grove/​music/​43166 (accessed
9 April 2017).
Youens, S., 2007: Heinrich Heine and the Lied (Cambridge: Cambridge University Press).

Shapes composed
George Benjamin, composer and conductor

Musical shape on the largest scale

A few works in the repertoire have a formal contour so simple that it can be
recalled in toto after a single hearing. Some, like Borodin’s Steppes of Central
Asia, Debussy’s Sunken Cathedral or the first of Berg’s Three Orchestral Pieces,
approach from the distance, reach an apogee and then recede. Others merely
build inexorably from virtual silence to a cataclysm—​Grieg’s In the Hall of the
Mountain King, the passacaglia interlude from Ligeti’s Le grand macabre or,
most famously, Ravel’s Bolero.
In most of these, all surface resources of music—​register, instrumental
density, velocity and above all volume—​are exploited to create the most rudi-
mentary formal outline: an arch or a wedge. Some—​Debussy and particularly
Berg—​are marked by a much larger degree of internal diversity and intricacy,
though the fundamental structural mould still holds.
Other basic shapes rely less on incremental sonic display and instead employ
different, and more versatile, essential tools in structural definition, namely symmetry
and repetition. Using these typically calls for subtler resources, involving
above all thematic material and—​at least until the early twentieth century—​
key. Da capo arias and classical minuets are obviously moulded along these
lines, as are, to a lesser or greater extent, variation and rondo forms. Add the
arts of expanded contrast, transition and evolutionary development, and the
same forces also underpin sonata form. When this architectonic blueprint was
combined with the narrative thrust of the novel—​or of opera—​more dynamic
and unpredictable forms were the result, from Beethoven to Berlioz, Wagner,
Mahler, Debussy, Berg, Carter and beyond.
A complex musical work has many diverse—​and simultaneous—​shapes.
On the largest imaginable scale, the placing of grand orchestral perorations in
Wagner’s Götterdämmerung has a specific and precisely judged contour, as does
the placing of silences, across the three acts. Similarly the large-​scale rhythm of
thematic recall, the alternations between varying types of texture, the shadings
and pacing of the highly diverse harmonic palette, the labyrinthine tonal design
and the flirtation with cadence, all of these were supremely interlaced by this
master of dramatic architecture and proportion. Even the contrasts in tempo,
metre and phrase structure; the use of restricted registers and specific timbres
(piccolo, stopped horns, multiple harps, tam-​tam and suspended cymbal); the
varying types of word-​setting—​all of these have a macro form, many of them
intersecting from time to time, some at the very surface of the music, others
more deeply buried within its construction.
The perception of large-​scale form requires much guess-​work from the
attentive ear during performance; particularly in modern music, the full shape
can be comprehended only as the very last note falls into silence. At first it may
be almost impossible to discern the type of formal play involved in a work,
or its manner of unfolding and its scale. This is one of the challenges—​and
delights—​for the listener, and the ear searches for exceptions as much as sym-
metries in order to orientate itself along the path of an unfamiliar work.
The arrival of the chorus in the fifth movement of Mahler’s Second, the
first notes of the harp at the very end of the first act of Tristan—​these pianis-
simo entries, an hour into the structure, are decisive structural incidents, just as
potent and memorable as the most energetic prestissimo or extreme tutti climax.
The trombones in the Pastoral Symphony, the large bells in the Symphonie fan-
tastique or the gongs in Le marteau sans maître—​these timbral signals, mark-
ing the later stages of each work, all share a similar function. In an opera of
such syntactical volatility and compression as Verdi’s Falstaff, the arrival of
symmetrical phrase patterns and continuous set forms in the third act has a
decisive influence on the recalled form, as does the lapping lullaby motion and
unambiguous tonal security in the recognition scene in Elektra.
Of all the factors involved in creating structures on this scale, perhaps the
least easy to grasp and to recall is the harmonic thread; but such works will
live or die according to the success in handling this most intangible of musical
phenomena. This is all the more challenging for a composer working outside
the predefined pathways and goalposts that the tonal system provides.
No other composer has exceeded Berg as a master of large-​scale design, and
the plotting of structure across Wozzeck—​on and below the surface—​makes
most other music seem like child’s play. In particular, the third act has an irre-
sistible momentum and dramatic impulse, yet it also seems underpinned by
the deepest architectural foundations. The first two acts exploit a sequence of
older musical forms as scaffolding to support the frequently jagged, expression-
istic surface of the music—​ranging from sonata form to passacaglia, scherzo
to fugue. However, beyond the opening scene—​a highly idiosyncratic set of
variations—​Act 3 is virtually free of conventional formal background, each
successive scene inventing a sui generis prototype of astounding originality.
But beneath its tumultuous moment-to-moment flow, there is a simple, large-scale
pattern (in effect, a sequence of widely spaced harmonic knots) which,
though not easy to discern, might have been of great use to Berg while compos-
ing this final act at a red-​hot intensity in 1922.
The sequence of imitative instrumental lines at the beginning of the first
scene gradually assembles a bitonal hexachord (combining the triads of E♭
major and D major), the act’s first harmonic sonority (Figure R.2).
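As a rough illustration (not part of Berg's or the author's own working), the pitch content of this sonority can be written out as pitch classes, counting semitones above C. The Python sketch below simply takes the union of the two triads named above. The helper names `transpose` and `invert` are hypothetical, added only to show the two operations by which a collection of this kind can be moved to other pitch levels.

```python
# Pitch classes numbered in semitones above C (C = 0, C#/Db = 1, ... B = 11).
NOTE_NAMES = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]

E_FLAT_MAJOR = {3, 7, 10}   # Eb, G, Bb
D_MAJOR = {2, 6, 9}         # D, F#, A


def transpose(pcs, n):
    """Shift every pitch class up by n semitones (mod 12)."""
    return {(pc + n) % 12 for pc in pcs}


def invert(pcs, axis_sum=0):
    """Mirror every pitch class so that pc + image == axis_sum (mod 12)."""
    return {(axis_sum - pc) % 12 for pc in pcs}


def names(pcs):
    """Return note names in ascending pitch-class order."""
    return [NOTE_NAMES[pc] for pc in sorted(pcs)]


# The bitonal hexachord: the two triads share no pitch class.
hexachord = E_FLAT_MAJOR | D_MAJOR
print(names(hexachord))
print(names(transpose(hexachord, 1)))
print(names(invert(hexachord)))
```

Because the two triads have no pitch in common, their combination yields six distinct pitch classes: D, E♭, F♯, G, A and B♭.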
The complex and dense parallel harmonies that conclude the interlude at the
end of this first scene coalesce onto the identical combination of major triads,
superimposed over an ominous low B (Figure R.3).
The note B instigates and dominates the whole of the ensuing murder scene,
which concludes, famously, on two gigantic crescendos on this pitch, the first of
these culminating in a sfffz hexachord (Figure R.4).
After the tavern scene, the fourth scene commences with exactly the same
six-​note collection, in a different registration (Figure R.5).
Remarkably, almost every note in the fourth scene is derived from this sin-
gle hexachord and its multiple inversions and transpositions, and it concludes
with a return to this ‘mother’ chord at its original pitch level, sustained for
almost twenty bars (though in a more sombre tessitura than at the opening).
Eventually this chord cadences onto a D minor triad with added supertonic,
from which the final interlude ensues (Figure R.6).
After an enormous, cathartic climax this interlude concludes on the same D
minor sonority, spread pp across most of the orchestra (Figure R.7).
The opera ends with a luminous, floating carillon—​four flutes and celesta
oscillating between two consonant tetrachords—​underpinned by a hollow per-
fect fifth in the strings and harp (Figure R.8).

FIGURE R.2  Berg, Wozzeck, Act 3, bars 3–​7

FIGURE R.3  Berg, Wozzeck, Act 3, bars 69–​71

FIGURE R.4  Berg, Wozzeck, Act 3, bar 114

FIGURE R.5  Berg, Wozzeck, Act 3, bar 220

FIGURE R.6  Berg, Wozzeck, Act 3, bars 318–​21

FIGURE R.7  Berg, Wozzeck, Act 3, bars 370–​71

FIGURE R.8  Berg, Wozzeck, Act 3, bar 392

FIGURE R.9  Berg, Wozzeck, Act 1, bar 717

FIGURE R.10  Berg, Wozzeck, Act 2, bars 810–​12

This same oscillation over a perfect fifth is used at the end of the first act,
though in a darker and fuller register (Figure R.9).
The second act also concludes with the same harmonies, presented however
in a fragmented and gruff way, so deep in register that they are barely percep-
tible (Figure R.10).
Nevertheless, the conclusions of all three acts of Wozzeck ‘rhyme’
harmonically (as is the case, incidentally, in Berg’s other operatic masterpiece,
Lulu), and the complete third act is straddled by a daisy-chain of harmonic
connections below the surface, each recapitulating harmonic sonorities as a
means of closure, and each giving Berg a firm telos at which to aim his extraor-
dinarily diverse and subtle invention (Figure R.11).

FIGURE R.11  Berg, Wozzeck, Act 3: harmonic connections

A great question facing a modern composer at the start of a work is, Where
is the harmony going? Many allow the pitch material to expand without appar-
ent destination or predetermined direction, forging the form by fantasy and
focused improvisation. Others create a goal—​going full circle by returning to
an opening harmony or by inventing a pseudo-​tonic sonority at which to aim.
Equally it’s possible to envisage a complete path in advance—​particularly if
the harmonic vocabulary isn’t too complex—​which might be modified radically
as the composition evolves. Some harmonic plans preclude change, with the
resultant piece remaining locked into the identical, static blueprint from begin-
ning to end. Yet others allow the music to pursue a self-​generating—​or even
arbitrary—​harmonic mechanism below the surface, abruptly cutting the music
off when there is no more to say.
While conceiving a new work, a composer today may well envisage a circle,
a series of blocks, a sequence of loops, a perpetually descending spiral—​or a
combination of some or all of these—​as a metaphor for its harmonic trajec-
tory. These simple, imagined shapes are integral to the compositional process,
not mere pretence or poetic analogy. Such a path may be planned in advance,
though usually, I suspect, it evolves during the act of composition. Regardless
of its provenance, a large-​scale work—​whatever the era or idiom—​needs a firm
and decisive sense of closure at its end. The shape Berg proposes for the conclu-
sion of Wozzeck maintains more than a degree of relevance today.

Affective shapes and shapings of affect in Bach’s Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001)
Michael Spitzer

Emotions have shapes, and musical emotions mirror those shapes. This is a sim-
ple enough claim. But the multitude of assumptions packed into this statement
could fill a library. Indeed, it drives the industry of emotion studies, which has
overtaken music psychology and aesthetics (but not yet musicology) since the
humanities’ affective turn a decade or so ago.1 Do emotions have shapes, or is it
the behaviours, intonations and intentions associated with them? Why and how
is music emotional, and is the emotion expressed, induced or perceived? Does
emotion even exist? What is a ‘shape’, and how can a musical shape be captured
analytically? Is it the preserve of the composer or the performer? And so on.
In walking through this jungle, I  put together six arguments, all prefabri-
cated: the theory lies in the assemblage rather than in the constituent ideas.
1. I speak not of ‘emotion’ in the round but of ‘emotions’ in the plural,
many comprising discrete basic categories such as sadness, happiness,
fear, tenderness and anger.
2. These categories are ethological, originating in adaptive animal behaviour.
3. Emotional behaviour is expressed through goals, or ‘action tendencies’.
4. In music, emotional categories are associated with acoustic features
which are readily identifiable.
5. Goal-​directed emotional behaviour in music is conceivable when we
think of music as a virtual person, according to ‘persona theory’.
6. I discriminate between emotional expression and induction, on the
basis that a listener discerns a musical process as being ‘expressive
of emotion’ (rather than transitively expressing a composer or
performer’s affective intentionality).
This assemblage may have seemed counter-intuitive to previous writers. For
instance, although the notion of ‘action tendency’ (e.g. the tendency to flee,
to fight, to love) has existed for a long time (Frijda 1986), music psychologists
have hitherto limited its application to behaviours that musical emotion is liable
to provoke: sad music may make us weep, happy music cause us to dance, and
so on (Sloboda and Juslin 2001: 87–​9). By contrast, music analysis, with its
expert conception of musical form as a landscape of goal-​directed tonal forces
(Nussbaum 2007), makes available fresh applications: ‘action tendency’ can
now suggest the actions, gestures and aspirations of a musical subject navigat-
ing tonal forces within a work.
Another assemblage is historical, bridging the epochal distances from the
evolutionary origin of emotion, through modern social habits, and then across
the aesthetic realm, finally reaching the peculiar domain of musical emotion.
We may question the relevance of a primitive adaptive emotion such as fear,
evolved millions of years ago, to our present-​day aesthetic experience of Bach,
for example. This chapter is about affective shapes, and the shaping of affect,
exemplified in Bach’s Sonata for Unaccompanied Violin No. 1 in G minor
(BWV 1001). Before this analysis can get under way, we need a little more
Emotion theorists since Darwin’s The Expression of the Emotions in Man and
Animals like to model the evolutionary development of the human brain on the
archaeological metaphor of the city, with the oldest layers being the deepest.
According to Freud,
... suppose that Rome is not a human habitation but a psychical entity
with a similarly long and copious past—​an entity, that is to say, in which
nothing that has once come into existence will have passed away and all
the earlier phases of development continue to exist alongside the latest
one. (Freud, Civilization and its Discontents, quoted from Oatley 2004: 63)

Emotions proper are held to have emerged not in the very oldest and innermost
layer of the brain—​the ‘corpus striatum’—​which controls basic animal rou-
tines such as walking, patrolling, foraging and mating, but in the central lim-
bic system. Often called the ‘emotion brain’, the limbic system arose with the
peculiarly social world of mammals unavailable to reptiles. It involves sociable
behaviours such as mother/​infant care-​giving, vocal signalling and play, and
is the site of the basic emotions: happiness, anger, fear, desire and sadness.
A crucial feature highlighted by the ‘city metaphor’ is that the more intellec-
tual neocortex—​the third and newest layer, which developed over the six mil-
lion years in our evolution from the apes—​does not supersede these two older
layers. On the contrary, ‘as with the organization of cities, earlier forms and
developments have continued, and provided for subsequent developments and
elaborations’ (ibid.: 67). In particular, the neocortex elaborates the sociality of
the limbic emotions. This notion of earlier layers persisting through later ones
helps us understand the permeability of music’s emotional spaces. The notorious
bang in the slow movement of Haydn’s ‘Surprise’ Symphony is a construct
of abstract syntactic patterning, and hence of neocortical sophistication. At
the same time, we flinch because of our ancient brain​stem reflex, a primitive
reaction towards sudden noises. The primeval shock is not superseded or cov-
ered over but co-​opted as a template on which to hear the modern surprise.
Oatley charts the shift from ancient to modern emotions essentially as a
change from emotion as goal-​driven (shared by animals) to emotion as social
(evinced by some animals, but quintessentially human). Thus happiness was
originally associated with goal fulfilment, and later became a symptom of
social cooperation. Sadness, once an emotion of goal loss, is now linked to loss
of relationship. Likewise, the primitive fear of danger was transformed into a
fear of social rejection. Importantly, the social emotions are structured accord-
ing to the template of the primitive emotions. Using the terms of Lakoff and
Johnson’s (1980) theory of metaphor, which traces a trajectory from the physi-
cal and embodied to the abstract and conceptual, I propose that there may be
a ‘metaphorical mapping’ from the primitive to the social in our experience of
emotion in musical structure (Spitzer 2004).
A pinnacle of socially refined emotion is the Trio from Mozart’s Symphony
No. 40 in G minor. Its loving tenderness is sonically expressed by a cluster of
acoustic features: diatonic and triadic sweetness, smoothness of line and rhythm,
soft dynamics and gentle tempo. It projects loving ‘shapes’ at the phrase level in
terms of tender-​sounding gestures which put one in mind of a human dialogue:
dialogues of periodicity within instrumental groups, and between strings and
wind. The overlay of horns in thirds at the moment of recapitulation—​instru-
ments conspicuously withheld until then—​clinches the Trio’s analogy with an
operatic love duet. The full, sociable, manifestation of the emotional category,
love, is consummated at an architectonic level, the reprise.
Mozart’s Trio was the subject of Leonard Meyer’s longest analytical study,
a text where he floated his intriguing, and never fulfilled, concept of emotional
‘ethos’ (Meyer [1976] 2000; Spitzer 2009). This was an important and future-​
facing departure for Meyer, because his seminal work on emotion actually
avoided talk of specific or ‘discrete’ emotions in the plural. The Trio is also sug-
gestive because it epitomizes the extreme conventionalization of musical style,
opening up an important space between the historical and cultural relativity
of stylistic codes, and the evolutionary pedigree of emotions themselves. Even
within modernity, the category of tenderness is recognizable across an aston-
ishing variety of historical styles, from Schubert lullabies to the opening of
Mahler’s Ninth Symphony; or, going backwards, from the Siciliana of Bach’s
G minor violin sonata to the medieval topos of sweetness (Wegman 2003).
Affective shapes interact with ‘display rules’ (Ekman 1984: 320–​1), enabling us
to recognize a common emotion in Mahler and Bach.
My portmanteau argument and analysis is disposed in two parts, and it
explores various notions of musical shape, emotional shape and the process of
‘shaping’. In the first part of this chapter, which focuses on Bach’s first move-
ment, the Adagio, I propose a three-​tier model of musical shape: emotion is
projected, respectively, through the relatively instantaneous level of acoustic
cues, midlevel phrasing and large-​scale form. In short, I discover the same
three-​tier model of Mozart’s Trio in Bach’s violin sonata. I show that emotions
themselves have shapes, which can be analysed through the interaction of three-​
tiered musical shape with the display rules of the baroque language, including
specific formal models such as ritornello patterns. Furthermore, I understand
‘shaping’ as the process of expressing emotions by inflecting or transform-
ing structural models. Emotional shaping is thereby kindred with the shaping
enacted through performance, and I bring my analysis into dialogue with three
recordings of Bach’s first movement, including data captured in tempo and
dynamic maps. The second part of this study looks at the other movements,
first individually and then at the overall shape of the entire cycle. I argue that
emotional shape is also borne out through the interactions of the four move-
ments with each other. My starting point, however, is Naomi Cumming’s phe-
nomenological study of Bach’s Adagio, based on Peirce’s semiotic categories.
While Cumming’s analysis is rich, it is a useful measure of how much more can
be achieved through more recent work in emotion theory.

Part I: The Adagio

Naomi Cumming’s The Sonic Self (2000) is a magisterial essay on musical
meaning. Conversant with analytic philosophy and music psychology, at its
heart the book is a theory of musical semiotics from a stringently Peircean per-
spective. Whereas readers of Jean-​Jacques Nattiez and Raymond Monelle will
have come to the book with a reasonable familiarity with Peirce’s concepts (pri-
marily, his sign typology of icon, index and symbol; his phenomenological cat-
egories of firstness, secondness and thirdness), Cumming presented arguably
the first systematically persuasive application of Peircean semiotics to music.
Her argument is an elegant dance between the poles of music’s subjectivity
and signification, the former bespeaking music’s immediacy and uniqueness,
the latter reckoning with how musical expressivity is mediated through form
and convention. The book climaxes with a case ​study of Bach’s Adagio from
his Unaccompanied Violin Sonata in G minor, illuminated with Cumming’s
insight as a practising violinist. Cumming’s analysis suggests that the issue for
her was not ‘shape’ so much as ‘shaping’, as is implicit in her understanding of
musical ‘gesture’.
As an example of gesture, Cumming identifies the series of three descending
thirds (B♭–G, G–E♭, E♭–C♯) at bar 5 (2000: 225; see Figure 4.1). Part of
what makes these figures so ‘gestural’ is that they interrupt the directed tonal
motion to the cadence, a ‘wilful’ goal-​orientation that Cumming identifies with
thirdness in music. Yet there is no need to be so specific, because by Cumming’s
lights gestures are pervasive as the common units of musical currency, being
coterminous with any midrange musical event. This is because of the ‘propen-
sity of listeners to hear’ in terms of ‘short, directed motions’ (165). The pri-
macy of gestures is striking, given that Cumming identifies them not with the
semiotic primacy of firstness and icons, but with the secondness of indexes.
Musical listening, then, seems to begin ‘in the middle’: not with sound (= first-
ness; iconicity) per se, but with the shaping of sound into indexical musical gestures.
Cumming’s view is that with gesture, general sound qualities are embodied in a
specific musical reality (the ‘secondness’ of Peirce’s ‘indexicality’), and in such a
way that ‘the synthesis of its structural elements, when they are heard as embodying
aspects of [human] movement (in directionality, force, etc.), … suggest expressive
agency’ (149). In short, a gesture is a ‘melodic shaping’ (230), the embodiment
of a sonic quality as a particular musical event. Cumming’s notion of shaping is
enriched by the ontological differences of sign types; I part company with her, however,
when she tries to extend the Peircean method to theories of emotion.

FIGURE 4.1  Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Adagio, bars 1–13.

Following Peirce, Cumming associates the categories of firstness and icon
with vocality, and those of secondness and index with embodied physical
motion. In turn, thirdness and symbol bring with them more elusive qualities of
‘will’, ‘desire’ and the goal-​orientation of long-​range musical argument, as with
the cadential drive momentarily arrested by the descending-​third ‘gestures’ at
bar 5. If the musical ‘voice’ is characterized by immediacy, and musical gestures
by their singularity, then wilfulness in music is ‘rule-​bound’, the logic of its rules
emerging only through the gradual unfolding of the formal process. There is
a suggestive alignment in Cumming’s system between the spectrum of voice–gesture–wilfulness
and that of timbre–melodic figure–formal process, as well as
with the rising proportions from local, through medium-​range, to global. In prac-
tice, these dimensions are all imbricated within each other: tonal and gestural
properties are implicit within the musical detail, just as it is counter-​intuitive to
ignore the role of texture and gesture in the shaping of large-​scale form.
Equally suggestive is Cumming’s notion of the musical work as a ‘complex
synthesis’ of sound, gesture and process, each involving a different ontology
(of voice, body and will). Given that a musical gesture is relatively ‘blinkered’
(230), the ‘bringing together of gestural events into the aural perspective of a
tonal purpose is an act of “synthesis” between different kinds of signs’ (232).
For Cumming, this large-​scale synthesis constitutes ‘shape’ at the highest level.
Complex synthesis, then, is as rich and vital as subjectivity itself, which is why
it leads Cumming into a theory of musical persona. Just as the human subject
uses voice, gesture and will to express emotion, so does the virtual persona pro-
jected by the musical work. The problem, however, is that Cumming’s theory
of emotion is much less supple than her Peircean underpinnings, looking rather
outdated from the vantage point of contemporary emotion theory. She articu-
lated her thoughts on musical emotion in debates with Langer, Levinson, Kivy
and Davies, but completed The Sonic Self before the publication of seminal
works by Juslin, Robinson and Nussbaum.
Cumming’s idea of emotion seems to be locked into rigid, and linguisti-
cally defined, epithets—​i.e. words. She hears bars 1–​2 as a blend of ‘pathos and
reflectiveness, spontaneity and containment’ (221), which makes it easier for
her to set up the more cognitive approach of Karl and Robinson (1995) and
others as straw-​man arguments to be easily knocked down. By contrast, emo-
tion theory’s recent psychological turn—​particularly the ‘appraisal theories’ of
Juslin and Robinson—​makes such binaries insupportable. From the perspec-
tive of more recent theories, it is possible to construct a scenario of ‘sadness’
which is both complex and unitary, while also corresponding to everyday-​life
expressions of this emotion.


Oatley’s ethological model of sadness as loss of goal or attachment has many
entailments. At a designative level, where the facial, vocal, gestural or attitudinal
expression of grief is evolved to elicit emotional support from others, the musi-
cal persona mimics the dejected face, drooping posture and plaints of a sad
person. These are the familiar ‘acoustic features’ of sadness codified by many
psychologists (Juslin 1997; Huron 2008; Gabrielsson and Lindström 2010): slow
tempo, minor-​mode key, narrow intervals, legato articulation, variability of tex-
ture, preponderance of descending melodic contours and a high level of dis-
sonance, especially involving the semitone appoggiaturas of the pianto topic
(Monelle 2000). Descending lines suggest loss of physical and mental energy;
narrow intervals and legato articulation imitate low-​energy mumbling.
A fresher perspective is afforded by Huron’s connection between sadness
in music and the ‘detail-oriented thinking’ of ‘depressive realism’ (Huron
2011: 48), following the work of Alloy and Abramson (1979), who consider the
impact of emotion on cognition and perception. From this angle, reflection and
self-​reflection are seen as behavioural aspects of sadness, an emotion which is
an adaptive opportunity for a wounded organism to recover by taking stock of
the situation. How would this be illustrated analytically, given that psycholo-
gists of emotion have shied away from looking at the ‘structural features’ of
emotion beyond the parametric level? I suggest that ‘detail-​oriented thinking’ is
borne out by thematic atomism and formal fragmentation, the way the Adagio
lurches rhapsodically from one contrast to another. Its lurching vicissitudes are
indeed another side of sadness’s lack of goal, just as its aimlessness is worked
out by spurts of spontaneous melismas and maggiore episodes. (For more on
such signalling see Spitzer 2009.) Huron identifies such major-​key interludes
in minor-​key works with ‘nostalgia’, which he thinks is a flavour of sadness.
We find such episodes at bars 2–​3 (a lurch from G minor to B♭ major and
back again) and, more dramatically, at bars 11–​13 (shifting from C minor to E♭
major and back to C). The pathos of these nostalgic moments is heightened by
their very interruption, or ‘containment’, to invoke Cumming’s term—​part of
sadness’s relentless denial of goal-​orientation.
Sharpening the focus on the opening phrase, ‘atomism’ is evinced in the
sheer density of the Adagio’s texture, a ‘thickness’ which demands ‘detail-​
oriented’ reflection from the listener. Hence we see the reciprocal relationship
between sadness as a disposition of musical material (dense and fragmentary),
and sadness as a mode of hearing (acute and detail-​oriented). Otherwise put,
we don’t just hear sadness, we also hear in a sad way. Density is heard in the
ways the opening phrase both invokes and resists formal and contrapuntal
schemata. Gjerdingen hears it as instantiating a 1–​7 … 4–​3 ‘Meyer schema’,2
even though this breaks his own rule (2007: 112) that 4–​3s shouldn’t overlap
1–​7s. At the very least, the schema is deformed. Better to hear it, I suggest, as
a mutual interference of two schemata: a 1–​7 … 7–​1 (complementary pianti
weeping gestures, with the second F♯ displaced up an octave), and a 5–​4  …
4–​3, a descending line which will emerge in the fugue subject, but introduced
here with the opening D elided. The tritone leap from C to F♯ leaves the C high
and dry, seeming to foreshadow the subdominant bias of the movement (as in
the C minor ritornello of bar 13). There is a similar tension between the pro-
pensity of analysts to read the Adagio ‘top-​down’ as a Schenkerian 8-​descent
(Cumming 2000: 233), or ‘bottom-​up’ as a descending, rule-​of-​the-​octave bass
pattern (Lester 1999: 34). This either-​or binary detracts from the messy, poly-
phonic richness of the Adagio’s texture, for instance the quasi-​canonic counter-
point in bars 2–​3, where the melody’s E♭–​D step is mirrored a little later in the
‘bass’. This quasi-​canon—​a mensurally distorted canon at the octave (recall-
ing the G major/​minor canons in the Goldberg Variations)—​is missed by both
Lester and Cumming, but is suggestive of the Adagio’s very self-​reflection.
Thus Cumming’s four affective epithets—​ pathos, reflection, spontaneity
and containment—​really hang together as a package of entailments of a single
emotional category, sadness, considered as a type of adaptive behaviour. Pathos
and reflection are, respectively, outward-​and inward-​facing behaviours: gestur-
ing to observers, reflecting on loss. Spontaneity and containment are comple-
mentary symptoms of goal loss: energy breaks out, breaks down or is blocked.


A behavioural approach to musical emotion begs the question of agency: What
is it that ‘behaves’? The philosophical theory of musical persona, developed by
Levinson (1990), Davies (1994), Cumming (2000), Nussbaum (2007) and oth-
ers, proposes the answer at the highest level. What ‘behaves’ (read: speaks, ges-
tures, moves, wills, weeps, fights, flees, dances, etc.) is a virtual subject projected
through the interplay of tonal forces across an imaginary musical landscape.
Analytically, it is easiest to show this by considering expressivity in music as
transformation, or inflection, of a model, akin to Arthur Danto’s account of
individual (stylistic) ‘manner’ as ‘adverbial’ (see Ross 2003: 234), focusing on
the ‘how’ (the inflection of pattern) rather than the ‘what’ (the origin and status
of the pattern itself). Much, if not all, western music is composed by elabo-
rating a stylistic, formal or contrapuntal model. A plausible model for Bach’s
Adagio—​indeed, for a great deal of his music—​is Wilhelm Fischer’s conceptu-
alization of a three-​part ritornello scheme of Vordersatz–Fortspinnung–​Epilog,
as recuperated and developed by Laurence Dreyfus (1998: 61). A possible source
for the Adagio is the Vivaldian staple exemplified by the Largo of his Violin
Concerto Op. 3 No. 6 (see Figure 4.2). An opening I–​V–​I gambit (Vordersatz)
leads to a more dynamic (Fortspinnung) central module typified by a circle-​of-​
fifths progression underpinned by a descending bass (akin to rule-​of-​the-​octave
bass descents). The model is rounded out by a V–​I closing gambit (Epilog).
Gauged against Vivaldi’s prototype, the relative complexity and density of the
Adagio’s opening ritornello, bars 1–​4, snaps much more vividly into view. See
in particular the central Fortspinnung module (end of bar 2 to middle of bar 4).
A fifth cycle is discernible, but heavily disguised. The B♭, the first note of the
cycle, is buried as an inner voice beneath the top G on the third beat of bar 2.

FIGURE 4.2   Vivaldi, Violin Concerto Op. 3 No. 6, Largo, bars 1–​6

B♭ is formally dislocated from the next note of the cycle because it is projected as
an ending of the first phrase (Vordersatz): the note is relatively long (a quaver),
resolves the preceding tonal tension (dominant-seventh harmony) with a tonic
and is articulated by the following demisemiquaver rest. If the B♭ is an ending,
then the lurch up to the E♭ sounds like a new beginning, metrically stronger than
the first beat of the next bar. The A natural, the third note of the cycle, is
metrically weakened by being displaced by a quaver in bar 3; and it is disconnected
from the E♭ because that note had fallen back down to a B♭. D, the fourth note of
the cycle, is reached only two quavers after A: each of the four steps of the cycle is
differentiated by a distinct textural shape and metrical placement. This makes the
‘skeleton’ of the phrase, its grammatical deep structure, quite challenging to hear.
Moreover, this sense of discontinuity is compounded by the abrupt tonal shifts
across bars 2–​3 from G minor to B♭ major and back to G minor.
Hence pinpointing bars 2–​3 shows how the Adagio’s fifth cycle is highly
deformed, effectively into a series of isolated structural notes, suggesting a
‘detail-​oriented’ listening in line with the ‘depressive realism’ of sadness.
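
The interval arithmetic behind this fifth cycle can be sketched in a few lines of Python. The fragment below is only an illustration: the pitch-class numbering (C = 0) and the helper name `ascending_intervals` are assumptions introduced here, not part of the analysis. It checks that the four structural notes named above (B♭, E♭, A, D) are linked by the steps of a diatonic circle of fifths.

```python
# Pitch classes with C = 0 (a common convention, assumed for this sketch).
PITCH_CLASS = {"C": 0, "D": 2, "Eb": 3, "F#": 6, "G": 7, "A": 9, "Bb": 10}

# The four notes of the Fortspinnung cycle as named in the text.
CYCLE = ["Bb", "Eb", "A", "D"]

def ascending_intervals(notes):
    """Semitone distance from each note up to the next, modulo the octave."""
    pcs = [PITCH_CLASS[n] for n in notes]
    return [(b - a) % 12 for a, b in zip(pcs, pcs[1:])]

steps = ascending_intervals(CYCLE)
print(steps)  # [5, 6, 5]
```

A step of 5 semitones upward is a perfect fourth, the inversion of a falling perfect fifth; the single step of 6 is the diminished fifth E♭–A that keeps the cycle within the key.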

There are two diametrically opposite accounts of ‘shape’ that can be drawn
from this example. In the first, the Adagio’s sadness, with its detail-​oriented
depressive realism, is ‘adverbial’, being a transformation of a formal model. All
four movements of Bach’s sonata begin with the same Vordersatz–​Fortspinnung–​
Epilog ritornello model (see Figure 4.3). Focusing on the central cycle-​of-​fifths
module spotlights the successive transformations, each one of which produces

FIGURE 4.3   Inflections of the fifth cycle

a different emotion—​more on this in Part II. The key point here is that eliciting
contrasting expressive character by transforming a framework is the essence of
variation form. See also, in the history of theory, Heinichen’s (1711) or Niedt’s
([1706] 1721) lessons to budding composers on how to adapt expressive figura-
tions to libretti in order to project differing emotions. This adverbial account
highlights ‘shaping’ rather than ‘shape’.
The opposite account discovers ‘shape’ in the pattern rather than in its
inflection. By ‘pattern’, I mean the shape of the music’s ‘behaviour’. Elsewhere,
I termed such dynamic emotional shapes ‘affective trajectories’ (Spitzer 2013).
Although Oatley and others characterize sad behaviour with loss of goal, the
emotion is not without directionality. Sadness is a strongly aversive emotion;
for an emotion whose essence is loss of goal, the only goal for sadness, para-
doxically, is to stop being sad. This affordance is strangely ignored by psy-
chologists of musical expectancy. Margulis (2005), for instance, theorizes (after
Huron) three classes of expectation:  surprise, denial and expectation proper.
But the sadness of the Adagio’s opening phrase, as an aversive emotion, is
surely implicative of an escape from this sadness. The lurch into the tender/​
happy B♭ major episode in bars 2–​3 is admittedly a ‘surprise’ in its abruptness.
Yet isn’t this flight to the major, to a positive valence, implied by the aversive
quality of the opening? (Conversely, in the many works where such flight to the
major is denied, isn’t this minor-​mode standstill registered as a form of ‘con-
tainment’ or even repression?)
It is important to pin this trajectory to fundamentals in order to undergird
more complex, Lacan-​tinged, explanations (Spitzer 2013). For instance, if
sad music takes separation anxiety as an axiom, then its trajectory seeks to
recreate (recuperate, memorialize, return to) the severed social bonds, typically
in the form of a maggiore ‘dream image’. The dream image at the centre of
the ritornello’s central module, bars 2–​3, for all its brevity, is more animated
(the scalar uprush to E♭) and intervallically more expansive (fourths, tritones,
sixths), and it momentarily even trips into a dance lilt. And then this episode
is just as suddenly snuffed out by the F♯, returning the music to G minor. The
‘shape’, then, is an implicative drive away from materials associated with sad-
ness, towards those expressive of tenderness and happiness, accompanied by a
sudden ‘opening out’ or expansiveness, suggestive of feelings discharged from
within, or liberated from a constraint; and then a sudden return to the ini-
tial state. Importantly, the middle tender/​happy state is not separable from this
process, but part of sadness’s trajectory. It is helpful that Huron characterizes
tender/​happy music contextualized within sad music as ‘nostalgia’ (although
maggiore episodes are surely not all backwards-​facing: see the discourse gen-
erated by Levinson’s identification of the second group of Mendelssohn’s
Hebrides Overture with ‘hope’; Levinson 1990, Karl and Robinson 1995).
The value of such a broad conception of ‘shape’ is that it doesn’t commit
us either to a formal model (such as Fischer’s and Dreyfus’s ritornello model)
or to specific pitches, rhythms or even contours. This is an important consideration
to bear in mind, given the complex politics of the score–performance
relationship: from this standpoint, what is being performed is not just a score,
but also a performance shape inscribed within the score. Since performances,
qua performance activities, rehearse emotional shapes in their own right, their
relationship to the shapes in the score is thus more akin to ‘mirroring’ than to
mechanical reproduction. I say more about this later.
Thus I hear the B♭ major episode at bars 2–​3 as having the same shape
as the turn from C minor to E♭ major at bars 11–​13. Contextually, they are
analogous: the middle module of the opening ritornello, the central climax of
the piece (bar 11 is exactly midway). Tonally, the patterns are similar: g–​B♭–​g,
c–​E♭–​c. Motivically, thematically and formally, however, their materials are
completely different. The ‘scalar uprush’ at bar 2 is possibly discernible in the
seventh ascent, B♭–​A♭, at bar 11; but this ascent actually begins in C minor
at the start of the bar, and on a B♮, with the B♭–​A♭ uprush really elaborat-
ing a middle​ground voice-​leading progression from D to E♭. The commonal-
ity of shape, rather, is heard at the level of shared affective trajectory. The
key difference is one of scale: the affective trajectory is massively amplified.
Everything now is bigger and more clearly pronounced. Its sadness is sadder:
the interlocking suspensions and chains of major sevenths at bar 11 consti-
tute the Adagio’s most excruciating moment. Its dream image is more ecstatic
and extended: the hint of dance at bars 2–​3 is now really confirmed; the
remarkable opening up of its register to two octaves, climaxing with the bold
leaps between A♭s, suggests an uprush of emotion, feeling erupting from the
depths of the music. These leaps elaborate perhaps the emblematic gesture of
Bach’s violin music—​the rising arpeggiation across multiple-​stopped strings.
This rise, together with the straining resistance of the strings, lends itself par-
ticularly well to a feeling of emotional discharge. Finally, the collapse back
to the minor is far more dramatic than earlier at bar 3: after a build-​up to
a cadence in E♭ major across bars 11–​12, the cadence is dramatically inter-
rupted by a diminished-seventh chord at bar 13, which returns the music to a
minor key. The interrupted cadence at bar 13, underscored by a pause, is the
Adagio’s salient event, and it ushers in the subdominant reprise (the ritor-
nello in C minor), a structural deformation constituting a dissonance at an
architectonic level.
The Adagio’s emotional shape, then, is rendered at successively higher struc-
tural levels: first, ‘vocally’ implicit in the acoustic features of the opening into-
nations; second, ‘gesturally’ explicit at the level of the phrase (bars 2–​3); third,
formally fulfilled at the level of architecture (bars 11–​13). Mozart’s Trio also
does that, and it is plausible that many works in the western repertoire project
emotional shape at rising levels. Cumming’s vocality–gesture–will progression
points in this direction, although her Peircean lens arguably occludes more
than it illuminates.

Before we turn to the rest of Bach’s sonata to explore ‘shape’ and ‘shaping’ at
the level of the cycle, we need to complete—​even ‘consummate’—​this dialectic
in the reality of musical performance. Isn’t ‘shaping’ what a violinist does with
Bach’s materials? On the other hand, can one speak of performance ‘shapes’
across an interpretation? A  common experimental protocol in emotion psy-
chology research is to get a performer to interpret the same phrase in different
ways so as to project varying affective states. Is it thereby legitimate to view
‘adverbial’ compositional processes, such as variations, as ‘performative’ in this
respect, the composer shaping a musical model into a distinct affect just as a
performer shapes the music? If so, then a notion of emotional shape/​shaping
may shed new light on the interaction of scores and performances.
In a market saturated with recordings of Bach’s music for unaccompanied
violin, I have selected distinguished versions by Itzhak Perlman, Sergiu Luca
and Gidon Kremer. Although Cumming doesn’t engage with specific perfor-
mances of the Adagio, her Peircean triad voice–​gesture–​will suggests generic
differences between these three violinists’ approaches. Perlman’s classic 1988
recording epitomizes mainstream late twentieth-​century interpretative practice,
playing the piece with large-​scale, often symmetrical phrasing. Perlman brings
out the broad formal unfolding of the Adagio, the ‘will’ of the tones. Luca’s
1992 ‘historically informed’ (HIP) recording is focused much more sharply
on the intricate gestures of the Adagio’s rhetorical delivery. It is tempting to
style HIP ‘gestural’, after Cumming, although its rhetorical quality reminds
us that it is difficult to conceive of musical gesture apart from vocality. That
said, the portamento ‘sobs’ prevalent in early twentieth-​century practice, as in
Fritz Kreisler’s 1926 recording, may sound even more vocal than HIP. My third
example, Kremer’s 1981 version, is interesting for combining modern tech-
niques with intricate phrasing, yet the latter expressing not HIP sensibilities
so much as rhapsodic waywardness. Taking the Kremer last, I begin with a
point-​by-​point comparison of the Perlman and Luca versions, concentrating
on the ‘emotional shape’ of the opening ritornello and its ‘architectural’ expan-
sion across bars 11–​13, in the light of tempo and dynamic maps of the perfor-
mances (Figures 4.4 and 4.5).3
Luca’s rendering of Bach’s opening projects the wave-​ like spectral and
dynamic shapes highly characteristic of the period bow (Fabian 2005: 95). The
short baroque bow is conducive to the ‘ “period” stroke’: soft onset and rapid
decay. A spectrogram easily reveals that the higher frequencies crest and fall
across Luca’s bow strokes on the strong beats of bars 1–​2, and that the steep
oscillations are matched by the dynamic swells and ebbs. Conversely, spec-
trograms of Perlman’s performance, on a modern bow and instrument, show
his solidly sustained tone and dynamics. Luca’s spectral/​dynamic wave shape
is also mirrored in the oscillations of the tempo maps, but not in synchrony

FIGURE 4.4   Tempo and dynamic map of Luca, bars 1–13 (axes: tempo, 0 to 40 BPM, and energy, –80 to 0 dB, plotted against bar.beat position, 1.1 to 13.1)

FIGURE 4.5   Tempo and dynamic map of Perlman, bars 1–13 (axes: tempo, 0 to 40 BPM, and energy, –80 to 0 dB, plotted against bar.beat position, 1.1 to 13.1)

with the note swells, and differently between the two players. It is interesting
that both Luca and Perlman begin at similar tempos (21 bpm), and accelerate
to a peak at beat 3 of the first bar (Luca 25.2 bpm; Perlman 23.9 bpm), before
slowing down. Both players also decelerate towards the end of bar 2 (Luca
23.5 bpm; Perlman 16.7 bpm), against the grain of an older performance tradi-
tion (perhaps beginning with Joachim’s 1903 recording) of taking the ‘uprush’
scale at beats 3–​4 somewhat faster. In both recordings, then, the ritornello’s
Vordersatz is shaped by a nearly identical tempo wave (Perlman: 21–​23.9–​16.7
bpm; Luca: 21–​25.2–​23.5 bpm), helping to project it as a self-​contained unit, a
sort of sonic pillar.
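
As a rough check on the claim of a shared tempo wave, the following Python sketch reduces each player's three quoted readings to a contour (rise or fall) and a peak-to-trough range. The function names `contour` and `swing` are hypothetical helpers introduced for illustration; the only data used are the bpm values given above.

```python
# Tempo readings (bpm) quoted in the text: opening, bar 1 beat 3, end of bar 2.
VORDERSATZ_TEMPO = {
    "Perlman": [21.0, 23.9, 16.7],
    "Luca": [21.0, 25.2, 23.5],
}

def contour(values):
    """Direction of each successive change: +1 rise, -1 fall, 0 level."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]

def swing(values):
    """Peak-to-trough range of the readings, in bpm."""
    return round(max(values) - min(values), 1)

contours = {name: contour(v) for name, v in VORDERSATZ_TEMPO.items()}
swings = {name: swing(v) for name, v in VORDERSATZ_TEMPO.items()}
print(contours)  # {'Perlman': [1, -1], 'Luca': [1, -1]}
print(swings)    # {'Perlman': 7.2, 'Luca': 4.2}
```

On these three readings the two players share the same rise-then-fall contour, while the depth of the fall is greater in Perlman's recording.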
Luca and Perlman drift further apart in how they treat the remainder of the
ritornello and the music immediately after it. A lot of my analysis pivots on
the boundary between the Fortspinnung and Epilog of Bach’s ritornello, marked
by the B♭–F♯ gesture at bar 3 and the wrench it effects back from major to
minor. The three performances interpret this boundary in different ways.
Luca articulates the four semiquavers at the beginning of bar 3 very care-
fully, with a hint of dotted rhythm on the first of each pair, and a diminuendo
towards the quaver D (from -​26 to -​34 dB) thereby rendering the louder B♭–​F♯
tonal interruption more dramatic (from -​34 to -​21 dB). On the one hand, this
cuts off the fifth-​cycle Fortspinnung module from the Epilog. On the other, there
is surprisingly little deceleration into the long cadenza-​like melisma at bar 3
(from 31.4 bpm at bar 3.2 to 26.6 bpm at bar 3.4), despite the performance tradition
in non-HIP recordings of taking the melisma considerably slower. Yet
both aspects bespeak the same tendency of HIP readings to focus on small-​unit
articulation and play down broader contrasts. In this respect, Perlman’s read-
ing is markedly different, epitomizing the mainstream tradition’s preference for
seamless legato, uniformity of tone, long-​range or block-​like contrasts, and the
projection of large-​scale structure.
Where Luca separates modules 2 and 3, Perlman’s powerful sense of line
drives through them in a fine art of transition. He maintains a high dynamic
level across the four semiquavers (-​25 dB), all articulated evenly and with equal
intensity, swelling successively through the B♭–​F♯ gesture to the G resolution
at bar 3.3 (-​27 dB), climaxing with the melisma (-​31 to -​35 dB). The arrival
of this gesture, then, is smoothly mediated and subsumed into the swell into
the melisma: it is part of a wave, rather than a brusque shock. Perlman’s art
of transition is underscored by tempo changes: the B♭ ​major dream image at
the start of bar 3 is fastest (accelerating from 20.2 bpm at bar 3.1 to 24.1 bpm
at bar 3.2), and his Adagio subsequently decelerates through the B♭–​F♯ ges-
ture to the luxuriously paced melisma (21 bpm at bar 3.3 to 15.1 at bar 3.4).
Compared to Luca, Perlman widens the tempo differential between dream
image and melisma: a slight difference of 4.8 bpm with Luca (from 31.4 to 26.6
bpm), nearly double that with Perlman (from 24.1 to 15.1 bpm). Otherwise put,
in Perlman’s recording, the duration of quaver beats from the D through the
B♭–F♯ gesture to the climactic G lengthens in increments of 0.4 seconds, from
1.3" to 1.7" to 2.1", bespeaking an extraordinarily precise grasp of rhythmic
modulation. Moreover, the constituent notes of Perlman’s melisma are pro-
jected as individual entities (similarly to the melismas of bar 1), rather than
passed through quickly as subordinate diminutions, as they are in Luca’s
performance. By contrast, the post-​cadential material, from bar 4 beat 3, is
remarkable for its lack of rhythmic flexibility: Perlman now plays with absolute
regularity of tempo, regaining and sustaining a fast 21 bpm. The faster tempo
and rhythmic regularity serve to place the preceding melisma into deeper profile
(although Perlman eases out of it gently, not on the V6/4 of bar 4 but on
the tonic chord two beats later). This broad contrast between bars 3 and 4—​a
contrast carefully mediated in wave-​like transitions—​reveals Perlman as the
longer-​range formal thinker.
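
The arithmetic in this comparison can be laid out explicitly. The Python sketch below uses only the figures quoted above; the helper name `differences` and the variable names are illustrative assumptions.

```python
def differences(values):
    """Successive differences, rounded to one decimal place."""
    return [round(b - a, 1) for a, b in zip(values, values[1:])]

# Dream image (bar 3.2) to melisma (bar 3.4), in bpm, as quoted in the text.
luca_drop = round(31.4 - 26.6, 1)
perlman_drop = round(24.1 - 15.1, 1)

# Perlman's quaver-beat durations in seconds:
# the D, the B-flat to F-sharp gesture, the climactic G.
perlman_durations = [1.3, 1.7, 2.1]

print(luca_drop, perlman_drop)              # 4.8 9.0
print(round(perlman_drop / luca_drop, 1))   # 1.9
print(differences(perlman_durations))       # [0.4, 0.4]
```

Perlman's drop is a little under twice Luca's, and the two equal 0.4-second steps in his beat durations are what make the deceleration sound so evenly graded.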
Another instance of Perlman’s projection of large-​scale patterns is borne
out in a striking relationship between the ritornello and the ‘architectural’ cli-
max at bars 11–​13. Now, Perlman and Luca concur in reserving the clearest
instantiation of ‘wave tempo’ to this point: in both recordings, the sudden turn
to E♭ major on the third beat of bar 11 is the point where up/​down tempo flux
becomes synchronized to the beat. From this point, both Perlman and Luca
slow down and speed up from one beat to the next, climaxing at the third beat
of bar 12 with a slope down to the interrupted cadence and fermata. This regu-
lar tempo wave never appears elsewhere in Luca’s performance, but it does in
Perlman’s: in the Fortspinnung and Epilog of the ritornello at bars 2–3, at twice
the period, oscillating every two beats rather than every single beat. The
climax at bars 11–​13, then, performs the tempo wave twice as fast as at bars
2–​3, the model for its shape—​on the crotchet rather than on the minim. The
original performance shape is thereby accelerated and intensified, in elegant
‘contrary ​motion’ to the material’s greater expansiveness at bars 11–​13.
The differential between the peak and trough of Perlman’s tempo through
the Fortspinnung and Epilog is 9 bpm across two beats (24 bpm at bar 3.2; 15
bpm at bar 3.4), from the climax of the ‘dream image’ to the depressive nadir
of the melisma. The tempo shape mirrors the emotional shape, as might seem
natural (Luca does not do this). The differential at the architectural climax,
between the peak of bar 11.3 (23.1 bpm) and the trough of bar 13.2 (14.9 bpm),
is almost exactly the same, 8.2 bpm, but now spread out much more expan-
sively across seven beats. The vertiginous beat-​to-​beat differential (averaging
4.4 bpm) within this two-​bar stretch further heightens the excitement, but does
not muddle the impression that the passage, in Perlman’s performance, has the
same tempo/​affect shape as in the ritornello.
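
To make the comparison of the two tempo descents concrete, here is a minimal Python sketch using the values quoted above. The function name `net_slope` is a hypothetical helper, and the per-beat figure it returns is a simple average over the span, an assumption of this illustration rather than a measure used in the analysis.

```python
def net_slope(peak_bpm, trough_bpm, beats):
    """Return (total drop in bpm, average drop per beat)."""
    drop = peak_bpm - trough_bpm
    return round(drop, 1), round(drop / beats, 2)

# Ritornello: bar 3.2 to bar 3.4, two beats.
ritornello = net_slope(24.0, 15.0, 2)

# Architectural climax: bar 11.3 to bar 13.2, seven beats.
climax = net_slope(23.1, 14.9, 7)

print(ritornello)  # (9.0, 4.5)
print(climax)      # (8.2, 1.17)
```

The total descent is nearly the same in both places, while its average gradient at the climax is roughly a quarter as steep. The 4.4 bpm beat-to-beat figure reported in the text presumably measures the local oscillation riding on that descent, not this net slope.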
Kremer, a player noted for idiosyncrasy, combines aspects of modern and
HIP idioms: a somewhat fractured rendering of individual detail with con-
temporary bowing practice (Figure 4.6). Compared to Luca and Perlman,
Kremer is expressively ‘deviant’, insofar as deviation from a norm is a standard

FIGURE 4.6   Tempo and dynamic map of Kremer, bars 1–13 (axes: tempo, 0 to 40 BPM, and energy, –80 to 0 dB, plotted against bar.beat position, 1.1 to 13.1)

technique of creating an expressive effect, involving, in Eric Clarke’s words,
‘deliberate departures from the indications of the written score’ (2003). Despite
his modern bow, Kremer plays the Vordersatz with dramatic hairpin diminuen-
dos (unlike Perlman’s solidly sustained tone), recalling Luca’s shapes but with
the contrast vertiginously amplified. Kremer begins at 18.6 bpm, slower than
Perlman and Luca’s 21 bpm. Where the others’ tempos rise and fall through bar 1,
Kremer gets even slower, to 15.7 bpm at beat 4, lurching to a faster tempo at
bar 2, whose four beats then slide down successively towards the uprush (20
bpm, 18.9 bpm, 16.7 bpm, 14.6 bpm). Kremer, like the others, begins bar 3
at a faster tempo (21.7 bpm): where Luca decelerates and Perlman gets even
faster, Kremer actually keeps a steady pace before suddenly accelerating at the
climactic G of bar 3 (from 21.3 at the B♭–​F♯ gesture to 25.6 bpm), thus taking
the boundary half a beat later than Perlman, not at the B♭–​F♯ gesture but at
its note of resolution. Regarding the architectural climax at bar 11, although
Kremer, like the others, performs a turning point on the third beat (fast, fol-
lowed by a slope down to the interruption and fermata), he doesn’t project
Perlman’s or Luca’s ‘synchronized tempo wave’. In fact, Kremer’s entire read-
ing conspicuously disdains any regular tempo waves, a marker, perhaps, of his
mannerist irregularity. This is also manifest in the lack of synchrony between
his tempo and dynamic curves. In both Luca’s and Perlman’s performances,
tempo and dynamics generally shadow each other, especially at the opening
(i.e. they get faster and louder, slower and softer, at the same time). In Kremer’s
performance, tempo and dynamics are more independent from each other.4
Lest one dismiss Kremer’s reading as wantonly obscure, there are aspects to
his ritornello that are strikingly revealing. Like Luca, Kremer articulates the
initial four semiquavers at bar 3 irregularly, yet in reverse: not dotted (Luca) but
iambic, like Scotch snaps. He thereby brings out the notes of the fifth cycle
(A and D) from under their appoggiaturas (B♭ and E♭): from the standpoint
of projecting the skeleton of the Fortspinnung, Kremer is thus clearer than
Perlman or Luca. Another telling detail is that Kremer, unlike practically every
other front-​rank exponent of this Adagio, plays the F♯ at bar 4 without the trill.
Yet waiving the trill pays huge expressive dividends for Kremer: it throws the
emphasis on the G♮ on the strong beat of bar 4, and highlights the gesture as an
expansion (semiquavers into quavers) of the B♭–​A and E♭–​D pianti at the start
of the bar. Indeed, this is where the semiquaver pianti’s iambic shaping becomes
strategic: setting up the ‘deviant’ pianti to be straightened out and resolved, as
a climactic G–​F♯ trochee. The tensions of Kremer’s phrasing discharge into the
bar 4 G–​F♯ climax as in no other performance. This is how Kremer conspires to
combine intricate attention to local detail with broad formal thinking. (Taking
an opposite path to a similar end, Perlman had downplayed the semiquavers at
bar 4 not via rhythmic displacement but by powering through them.)
Perlman, Luca and Kremer’s performance styles are all expressive in
their own ways. Following Schubert and Fabian’s appeal for a typology of
‘expressiveness’ in musical performance, Perlman and Luca’s idioms are characterized,
respectively, as ‘mainstream expressive’ (‘long-range fluctuations of
dynamics, tempo rubato and shaping of singing melodic lines’; Schubert and
Fabian: 575) and ‘baroque-​appropriate stylish’ (ibid.: 581). According to Schubert
and Fabian, listeners evaluate the ‘stylishness’ of the latter by its perceived fit
within an historical (baroque) grammar of expressiveness. I would argue that
Kremer’s performance is expressive in a third way, as ‘deviant’: Clarke’s ‘deliber-
ate departure’ from a set of norms. Indeed, what heightens Kremer’s deviance
is that he seems to play the first two performance options—​HIP intricacy and
‘mainstream’ cantabile—​against each other into a sort of interference pattern.
How, then, do these three distinct styles of ‘expressiveness’ relate to the emo-
tional shapes of sadness? A bland reply would be that performance styles sit
next to compositional styles as just another variety of ‘display rules’, elaborat-
ing emotional categories in terms of their various grammars. A more interest-
ing solution, as hinted earlier, is to see them as ‘mirroring’ emotional shapes
in performative terms. HIP, mainstream and deviant styles each generalize a
particular aspect of the package of entailments that constitutes sadness. HIP
fits with sadness’s orientation to detail; mainstream performance with the
legato smoothness associated with sadness’s ‘mumbled articulation’ (Huron
2011:  149), in contrast to HIP’s drier articulation; deviant performance with
sadness’s goal-evasion and sudden contrast. No performance style can monopolize
an emotional shape. The three styles we have looked at elaborate particular
aspects of that shape. The situation is quite complex; from a different
standpoint, for instance, Perlman and Kremer’s interpretations could actu-
ally be said to be more detail-​oriented than Luca’s, because they project the
ornaments—​especially the melismas at bars 1 and 3—​thematically, rather than
subsuming them hierarchically. Making a meal of these little notes is unhistori-
cal, therefore expressively deviant, and thus more pathetic. Much depends on
one’s point of view.

Part II: The Cycle

The remaining three movements of Bach’s sonata (Fuga, Siciliana and Presto)
project contrasting emotional categories. Even if we recognize that
the emotions are distinct from each other, identifying what these emotions are
and ascribing linguistic emotion terms to them are very different matters. The
tenderness of the Siciliana is nearly as patent as the sadness of the Adagio
(ostensibly, as I  recount below, because we learn tenderness from the cradle,
and it is foundational for later relationships; conversely, its loss—​inducing sep-
aration anxiety and depression—​is equally prevalent). By contrast, whether the
Fuga and Presto are fearful or angry (or deliberative or impassioned) is very
much open to question, and indeed subject to performance interpretation. The
crux, however, is the relativity of this openness: the emotions of the fugue and
finale are more ambiguous than those of the first and third movements. In a
parallel study, based on the audibility of Bach’s emotions for two sets of listen-
ers of differing expertise (Spitzer and Coutinho 2014), one of the ‘take-​home
messages’ was that nearly everybody identified sadness and tenderness, respec-
tively, in the Adagio and Siciliana, whereas opinion was much more divided for
the other two movements.5
Faced with the ineffability of some musical emotions—​a consequence, per-
haps, of a loss of historical sensibility—​one approach would be to hypothesize
these emotions on a purely theoretical basis, audacious, even outlandish, as
that might sound. In short, one could speculate that these emotions really are
intrinsic to the musical shapes inscribed within the compositional trace, even
if nobody today has the historical ears to hear them. That is the approach
I have attempted with the Fuga and Presto. As with the Adagio, my theoretical
analysis spotlights the ritornello’s central Fortspinnung module for clarity of


Like sadness, the other basic emotions can be defined in terms of goals and
social relationship. Anger is typically triggered by the frustration of an ‘active
goal’, leading to aggressive behaviour such as fighting. Tenderness, or love, is
associated with ‘physical and mental closeness’ and with nurturing behaviour.
Fear is stimulated by an appraisal of ‘danger or goal conflict’, the subject react-
ing with withdrawal (e.g. fleeing) or freezing (e.g. trembling) behaviours (Oatley
2004: 81–​2).
A cursory overview of the sonata’s remaining three movements does suggest
that their material unfolds these respective emotional behaviours. If the second
movement’s fugal opening expresses a kind of tetchy, repressed or even ‘cold’
anger, then the music ‘lashes out’ in the successive eruptions of semiquaver pas-
sagework. These ‘eruptions’ recall James Russell’s ‘script’ for anger (1991: 39),
a more elaborated version of Oatley’s schema: after an offence, a person glares
and scowls, will feel internal tension and agitation and a desire for retribu-
tion, and finally will lose control and strike out. The Siciliana is generically and
topically a lullaby, its tenderness mirroring the intimate and nurturing social
closeness of a dialogue between a mother and child (described by Colwyn
Trevarthen as the ‘primary intersubjectivity’ enacted in their rhythmic turns
of cross-​modal dialogue; 1999–​2000: 177). The literature on lullabies is exten-
sive, often referring to their cross-​cultural features of simplicity, smoothness,
descending contours, relative slowness and short phrasing (Unyk et al. 1992;
Trainor and Hannon 2013). Daniel Leech-​Wilkinson has linked the preponder-
ance of falling pitch contours in art-​music lullabies to the descending motions
of Infant-Directed Speech (IDS) or ‘motherese’ (Leech-Wilkinson 2006).
Importantly, the yearning, increasingly chromatic, quality of the Siciliana is
equally expressive of erotic adult love, in line with the finding of psychologists
that infant-​directed love is a template for older experiences. Despite mature
lovers’ increasing ability to integrate closeness and independence as they grow
older, people’s love schemata ‘are shaped by children’s early experiences and are
thus relatively permanent’ (Hatfield and Rapson 2004: 656). Finally, the relentless
semiquaver runs of the Presto finale, cashing in the metaphor of musical
‘motion’ as physical motion across a landscape, suggest panic-stricken flight
in response to threat. Fear in music is one of the most variegated of emotions,
since it can be associated with several aspects of threat: the threat itself; its
foreboding qualities (musical ‘danger signals’ typically being soft, low sounds);
a trembling before this threat (typified by tremolando); freezing on the spot
(musical stasis, pedals, hiatus); or, as in the present case, physical flight, often
with no clear direction. The seemingly aimless vicissitudes of the Presto suggest
fleeing in the face of an unknown threat—​perhaps flight from the preceding
three movements themselves. As with the Adagio, the broad formal ‘behav-
iours’ of the Fuga, Siciliana and Presto unfold patterns encapsulated in the
opening ritornello, specifically within its central Fortspinnung module.
Of the four movements, the Fuga presents the Fortspinnung’s fifth cycle
most transparently (Figure 4.7). Its four notes are articulated plainly (undec-
orated) and with rhythmic and metrical regularity (equally spaced quavers).
Kremer brings out these pitches by performing them staccato—​not marked
in Bach’s score, but arguably implicit in historical performance style (Luca’s
historical interpretation does the same). If staccato articulation may be expres-
sive of fear as much as anger, then the latter emotion comes to the fore with the
dense triple-​stopping at bar 3, which Kremer plays particularly aggressively.
The Amazon review commends Kremer for his aggression: ‘Kremer accenting
the repeated notes in the fugue’s subject harshly and fiercely. … [The fugue]
explodes with a palpable fury from the instrument’. However, such fury is an
outlier in Fuga interpretations: Luca and Perlman are much more subdued (perhaps
they choose to emphasize the first half of Russell’s anger schema, glaring
and internal tension, rather than the aggressive second half). Nevertheless,
Kremer’s outlying performance is arguably in line with the aggression implicit
in Bach’s deformed contrapuntal treatment, epitomized by his rare use of a
subdominant answer. A normative tonal answer in bar 2 would have remained
on the G, instead of descending to F, and would have resolved to F a little later
as part of a D minor (dominant) harmony at bar 3. Yet Bach supplies an anomalous
subdominant answer instead, so as to cadence on C minor at bar 3 (in the
baroque repertoire, the other great exception to this rule is the subdominant
answer of the fugue in the Toccata and Fugue in D minor attributed to Bach).
The subdominant answer creates a powerful clash at bar 3 between the E♭ and
the D, so that the contrapuntal voices ‘fight’ with each other aggressively.

FIGURE 4.7 Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Fuga, bars 1–4

In the Siciliana, the first three notes of the cycle are clear and are rendered
with the lullaby’s typical long–​short rhythmic lilt (Figure 4.8). However, F,
the fourth note, is displaced by one quaver (the listener expects it two qua-
vers after the C), stretching the length of the C, thereby stretching the lilt
like elastic. This rhythmic flexibility is congruent with the smoothness and
avoidance of sharp contrast noted of lullabies, and is markedly distinct from
the fugue subject’s (rhythmic) regularity. But there are other lullaby aspects
implicit in the voicing of the Siciliana’s fifths cycle. The pitches of the cycle
are distributed between two contrapuntal voices, whereas in the Adagio and
Fuga the cycle kept to a single voice. The impression of there being two
voices is heightened by the registral gap between the lower and upper notes,
which is much greater than in the previous movements; for instance, the
D and G are an octave-and-a-half apart. This registral separation encour-
ages the listener to ‘stream’ the pitches as distinct voices, metaphorically
suggesting a dialogue between two musical personas: one could even inter-
pret the voice-​crossing of the two parts on the final F of the cycle (taken
by the lower voice, rather than, as expected, by the upper) as symbolic of
their harmonious interaction. In the Fuga, the cycle at bar 2 ‘fights’ against
the fugal answer beneath it. In the Siciliana, the dialogue is not conflictual
but harmonious, because the pitches of the cycle are shared between the two
voices, and indeed cross over.

FIGURE 4.8 Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Siciliana, bars 1–6

FIGURE 4.9 Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Presto, bars 1–11

The rhythmic and textural uniformity of the Presto—​its continuous succes-
sion of semiquavers—​makes it initially difficult to pick out the fifth cycle from the
background figuration (Figure 4.9). Interestingly, the cycle is slightly extended
by a further fifth progression: A–​D is followed by G–​C at bar 11. This is the only
movement where this happens. It is as if the forward-​moving harmonic drive of
the music is so great that the cycle’s seemingly endless implicative potential to
rotate around the circle of fifths (B♭–​E♭–​A–​D–​G–​C–​F–​B♭ etc.) can hardly be
contained. This harmonic drive compounds the Presto’s rhythmic speed. As well
as panic, the movement expresses another corollary of fear: shock. The Presto
unfolds a series of shocks by subverting its metrical pattern; indeed, this pat-
tern is constantly shifting in unpredictable ways, cognitively ‘wrong-​footing’ the
listener (as it symbolically wrong-​foots the fleeing subject, as it were). For
instance, the very start of the cycle, the B♭ at bar 6, subverts a pattern of two-​bar
phrases established at the opening (Figure 4.10a and b). That is, the fast music
suggests a slower metrical grouping, whereby bars 1–​2 constitute one ‘beat’ of a
‘hyper-bar’, bars 3–4 a second beat, and bar 5 the onset of a third beat. It is this
implicit three-​beat hyper-​bar that is interrupted by the B♭; it introduces a ‘hyper-
metrical’ disruption. Moreover, a ‘metrical reduction’ of the cycle at bars 6–​8
(leaving out the semiquavers between its notes) reveals that, by accenting the sec-
ond beat of each group (the crotchets E♭, D and C), it encapsulates the preced-
ing hypermetrical disruption in miniature (Figure 4.10b). Hence not only does
the cycle arrive as a metrical shock to bars 1–​5, but it is itself a series of metrical
shocks. And there is another, broader, level at which the Presto expresses fear:
the sheer speed of the music makes it difficult to follow. This literally overwhelm-
ing quality evokes the classic formula of the sublime, which is fear at its most
philosophically elevated level. The Presto evokes sublime fear both as cognitive
overload and as the behavioural reaction to fear, which is to flee.
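
The rotation around the circle of fifths spelled out earlier in this paragraph (B♭–E♭–A–D–G–C–F–B♭) can be reproduced by stepping through the two-flat diatonic collection. The short Python sketch below is purely illustrative: the scale spelling, the function name `fifths_cycle` and the shortcut of moving three scale steps up for each falling fifth are assumptions introduced here, not the author's analytical method.

```python
# Diatonic collection with two flats (the key signature of G minor),
# listed upwards from B flat. This spelling is an assumption of the sketch.
SCALE = ["B♭", "C", "D", "E♭", "F", "G", "A"]


def fifths_cycle(start, steps):
    """Return steps + 1 pitch names, each a diatonic fifth below the previous one."""
    i = SCALE.index(start)
    # Falling a fifth reaches the same pitch name as rising a fourth,
    # which is three scale steps up within the seven-note collection.
    return [SCALE[(i + 3 * k) % len(SCALE)] for k in range(steps + 1)]


# Seven steps bring the cycle back to its starting pitch name.
print("–".join(fifths_cycle("B♭", 7)))  # B♭–E♭–A–D–G–C–F–B♭

# The Presto's extension: A–D followed by G–C.
print("–".join(fifths_cycle("A", 3)))  # A–D–G–C
```

Because the collection is diatonic, the step from E♭ to A is a diminished rather than a perfect fifth, which is why the rotation closes after only seven moves.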

FIGURE 4.10 (a) Hypermetrical reduction of Bach, Sonata for Unaccompanied Violin No. 1 in G minor (BWV 1001), Presto, bars 1–6; (b) metrical reduction of bars 6–8, revealing syncopation

Bach’s cycle, then, projects four distinct emotional behaviours. The central
module of the ritornello stereotype serves as a bellwether for each behaviour.
Its atomization in the Adagio suggests the lack of goal, the lethargy,
connected with sadness or depression. Its conflictual disposition in the Fuga
enacts the aggressive conflict often linked to anger, when goals are blocked.
Its fluid and flexible disposition in the Siciliana suggests the tender dialogue
between mother and child in a lullaby, mirroring social closeness. And its
animation—​and overflow into an extended fifth cycle—​in the Presto evokes
a subject’s physical flight, in extreme fear or panic. All of these emotional
behaviours constitute ‘shapes’, as previously defined, shaping the ritornello

A third dimension—​in addition to ‘shape’ and ‘shaping’—​is the relational one,
through which the various behaviours define and articulate themselves against
each other. This dimension commutes shape/​shaping into a kind of transforma-
tional ‘vector’, nudging it from the domain of emotion proper to that of ‘affect’.
Although the terms ‘emotion’ and ‘affect’ tend to be used interchangeably, in this
instance I follow thinkers such as Brian Massumi (2002), who represents a con-
stellation of ideas drawn chiefly from Deleuze, Bergson and Spinoza. Massumi
theorizes affect as an energetics of indeterminate bodily intensity, casting feeling
as a fluid process anterior to emotion proper. Emotional signification—​as in the
meaning of discrete emotional categories, such as sadness, tenderness, anger,
etc.—​marks a stage where the fluid vectors of affect are stalled, frozen and ren-
dered determinate. But suggestive as the affect–​emotion distinction may be, I
don’t believe it fits cleanly either with the ontology of music (which is intrinsi-
cally fluid and vectorial anyway) or with the new paradigm in emotion research
(which postdates Massumi’s seminal work). For instance, the concept of emo-
tion as behaviour is a dynamic one, particularly in music. Even so, the notion
of affect can usefully tilt the discussion towards a more processual standpoint.
Emotion dissolves into time: at the broadest level, it becomes apparent that the
very discrimination and experience of these emotions is not a fixed absolute, but
something which develops with age and expertise over a lifetime’s immersion in
these works (Spitzer and Coutinho 2014).
Emotional characteristics are not absolute but relational: with sadness,
what is at issue is not slow tempo per se, but slower or slowest; with anger, it is
not conflicted texture, but more and most conflicted. As David Huron (2011)
reminds us, low pitch does not in itself connote sadness; if so, men would
always sound sadder than women. It is low pitch relative to a corpus or group.
For Bach, that corpus is the baroque style; for Mozart’s Trio, it is the classical
style. When listeners discern tenderness in both the Siciliana and the Trio, they
refer each movement to its respective stylistic context, and read off the emotion
by translating different display rules or ‘languages’: the disjunctions of Bach’s
Fuga are tame next to, say, those in Le sacre du printemps, but extreme in rela-
tion to the sonata at hand and baroque music in general.
Compared to the circle of style, Bach’s sonata as a whole constitutes a more
focused and tightly circumscribed cycle. I tend to hear the sonata like a Calder
mobile, with ‘vectorial’ transformations happening not just between contigu-
ous movements but combinatorially across all movements. The contiguous vec-
tors are the most direct because they unfold in time. The shift from Adagio to
Fuga takes us from stasis to cadential action; from ambiguity to conflict; from
self-​reflection to orientation towards another subject. Fuga to Siciliana moves
from interpersonal conflict to harmonious dialogue. Siciliana to Presto takes us
from lyrical freezing (the lullaby is a lyric standstill) to a flight response. And
were we to return full circle to the beginning, the difference between Presto and
Adagio would be that between the individual in the world and individual self-​
absorption: in fear, the E♭ blocks the Presto’s drive; in sadness, the E♭ affords
energy to the Adagio’s lethargy.
Connections also cut across the movements noncontiguously, so that the
sonata—​like style in general—​is as much a free mobile as a cycle moving in
one direction. The Adagio’s sadness is an aversive emotion, pushing us away
from dissonance; the yearning of the Siciliana pulls us towards harmony. The
Presto’s fear is passive: we are in the grip of a stereotype; the Fuga’s anger is
deliberative, deploying the stereotype with intent.
Viewing Bach’s sonata as a combinatoire of warring passions jibes with how
emotions functioned in Shakespeare and Homer, according to the emotion his-
torian Philip Fisher. In Fisher’s words, the passions, or ‘vehement states’, define
each other by fighting each other:
When used to define and express the substance of the self, the vehement
states—​anger, wonder, ambition, jealousy, shame, pity, or fear—​draw
on an essentially Greek and especially Homeric theory of substance and
struggle, or, as the Greeks called it, agon. Substances mutually make
each other known, not only because of their differences but because of
moments of conflict. It is at the meeting point where combat takes place
and mutual destruction is possible that each becomes for the first time
visible as what, in itself, it is. A large rock is one substance, the water
of the sea another. At the shoreline where the sea pounds against the
rock, the rock registers in its shape nothing but the consequences of thou-
sands of years of waves cutting into it, even as each individual wave was,
in turn, stopped and broken by the rock’s resistance… The shattering
wave, the pounded rock make visible on each side the nature of sea and
rock, but they do so at the very moment that each of the two is situation-
ally flooded from without by the differences that occur as each limits the
other. (2003: 51)

The mutual definition of the passions is enshrined in their narrative pairings.
Anger is a common outcome of sadness; the rage of Achilles is consequent on
Achilles’ long depression, his grief on the death of Patroclus. Fisher describes
how the turn from one passion to the other involves a redirection of the will from
inward-​facing mourning to outward-​facing vengeance (ibid.: 64). Modern psy-
chologists are also aware that ‘sadness occurs in several dynamically significant
patterns’, chief of which is the sadness–​anger pattern that characterizes some
low moods, such as depression (Izard and Ackerman 2004: 259). One may reasonably
speculate, then, that Bach’s Siciliana lullaby mourns the mother of his
children, his first wife, Maria Barbara. Formally, its lyrical standstill separates
the rage of the Fuga from the panic of the Presto. Ending a cycle with panicked
flight may seem odd (especially given that so many minor-mode finales in
western music end this way; one thinks, for instance, of Chopin’s second piano
sonata, just after its funeral march) until one recalls Aristotle on drama. Tragic
catharsis is compounded from pity and fear. It is not that the musical persona/​lis-
tening subject is fleeing from anything in particular. Rather, the emotion of terror
is expressed through its behavioural correlative, which is flight. Fear—​the most
vectored and temporal of the emotions—​endows Bach’s cycle with the shape of
things to come.

Alloy, L. and L. Abramson, 1979: ‘Judgment of contingency in depressed and nondepressed
students: sadder but wiser?’, Journal of Experimental Psychology: General, 108: 441–85.
Clarke, E., 2003: ‘Introduction’ to Psychology of Music. §IV: Performance, Section 1,
Grove Music Online, http://​​subscriber/​article/​grove/​music/​
42574pg4 (accessed 9 April 2017).
Cumming, N., 2000: The Sonic Self (Bloomington: Indiana University Press).
Davies, S., 1994: Musical Meaning and Expression (Ithaca, NY: Cornell University Press).
Dreyfus, L., 1998: Bach and the Patterns of Invention (Cambridge, MA: Harvard University
Press).
Ekman, P., 1984: ‘Expression and the nature of emotion’, in K. Scherer and P. Ekman, eds.,
Approaches to Emotion (Hillsdale, NJ: Erlbaum), pp. 319–​44.
Fabian, D., 2005:  ‘Towards a performance history of Bach’s Sonatas and Partitas for
Solo Violin: preliminary investigations’, in L. Vikarius, ed., Essays in Honor of László
Somfai: Studies in the Sources and the Interpretation of Music (Lanham, MD: Scarecrow
Press), pp. 87–​108.
Fisher, P., 2003: Wonder, the Rainbow, and the Aesthetics of Rare Experiences (Cambridge,
MA: Harvard University Press).
Frijda, N., 1986: The Emotions (Cambridge: Cambridge University Press).
Gabrielsson, A. and E. Lindström, 2010: ‘The role of structure in the musical expression of
emotions’, in P. Juslin and J. Sloboda, eds., Handbook of Music and Emotion: Theory,
Research, Applications (New York: Oxford University Press), pp. 367–​400.
Gjerdingen, R. O., 2007: Music in the Galant Style (New York: Oxford University Press).
Hatfield, E. and R. L. Rapson, 2004: ‘Love and attachment processes’, in M. Lewis and
J. M. Haviland-Jones, eds., Handbook of Emotions, 2nd edn (London: Guilford Press),
pp. 663–​76.
Heinichen, J. D., 1711: Neu erfundene und gründliche Anweisung des General-​Bass (Hamburg:
Benjamin Schillern).
Huron, D., 2006: Sweet Anticipation: Music and the Psychology of Expectation (Cambridge,
MA: MIT Press).
Huron, D., 2008: ‘A comparison of average pitch height and interval size in major-​and
minor-​key themes: evidence consistent with affect-​related pitch prosody’, Empirical
Musicology Review 3/​2: 59–​63.
Huron, D., 2011: ‘Why is sad music pleasurable? A possible role for prolactin’, Musicae
Scientiae 15/​2: 146–​58.
Izard, C. and B. Ackerman, 2004: ‘Motivational, organizational, and regulatory functions
of discrete emotions’, in M. Lewis and J. M. Haviland-Jones, eds., Handbook of
Emotions, 2nd edn (London: Guilford Press), pp. 253–​64.
Juslin, P., 1997: ‘Emotional communication in music performance: a functionalist perspec-
tive and some data’, Music Perception 14/​4: 383–​418.
Juslin, P. and J. Sloboda, eds., 2010: Handbook of Music and Emotion: Theory, Research,
Applications (New York: Oxford University Press).
Karl, G. and J. Robinson, 1995: ‘Levinson on hope in the Hebrides’, Journal of Aesthetics
and Art Criticism 53: 195–​259.
Kivy, P., 1989: Sound Sentiment: An Essay on the Musical Emotions (Philadelphia: Temple
University Press).
Lakoff, G. and M. Johnson, 1980: Metaphors We Live By (Chicago: University of Chicago Press).
Leech-​Wilkinson, D., 2006: ‘Portamento and musical meaning’, Journal of Musicological
Research 25: 233–​61.
Leech-​Wilkinson, D., 2013: ‘The emotional power of musical performance’, in T. Cochrane
and B. Fantini, eds., The Emotional Power of Music (New  York:  Oxford University
Press), pp. 41–​54.
Lester, J., 1999:  Bach’s Works for Solo Violin:  Style, Structure, Performance (New  York:
Oxford University Press).
Levinson, J., 1990: ‘Hope in the Hebrides’, in idem, Music, Art, and Metaphysics (Ithaca,
NY: Cornell University Press), pp. 336–​75.
Margulis, E., 2005: ‘A model of melodic expectation’, Music Perception 22/​4: 663–​714.
Massumi, B., 2002:  Parables for the Virtual:  Movement, Affect, Sensation (London:
Duke University Press).
Meyer, L. B., [1976] 2000: ‘Grammatical simplicity and relational richness: the trio of Mozart’s
G-​minor symphony’, in The Spheres of Music: A Gathering of Essays (Chicago:
University of Chicago Press), pp. 55–​125.
Monelle, R., 2000: The Sense of Music: Semiotic Essays (Princeton: Princeton University
Press).
Niedt, F. E., [1706] 1721: Musicalische Handleitung, Part 2, 2nd edn (Hamburg: Benjamin
Schillers Wittwe & Joh. Christoph Kißner).
Nussbaum, C., 2007: The Musical Representation: Meaning, Ontology, and Emotion (Cambridge,
MA: MIT Press).
Oatley, K., 2004: Emotions: A Brief History (Oxford: Blackwell).
Robinson, J., 2005: Deeper than Reason: Emotion and Its Role in Literature, Music, and Art
(Oxford: Clarendon Press).
Ross, S., 2003: ‘Style in art’, in Jerrold Levinson, ed., The Oxford Handbook of Aesthetics
(New York: Oxford University Press), pp. 228–​44.
Russell, J. A., 1991: ‘In defense of a prototype approach to emotion concepts’, Journal of
Personality and Social Psychology 60/1: 37–​47.
Schubert, E. and D. Fabian, 2006: ‘The dimensions of baroque music performance: a
semantic differential study’, Psychology of Music 34/4: 573–​87.
Sloboda, J. and P. Juslin, 2001: ‘Psychological perspectives on music and emotion’, in
P. Juslin and J. Sloboda, eds., Music and Emotion: Theory and Research (New York:
Oxford University Press), pp. 71–​104.
Spitzer, M., 2004: Metaphor and Musical Thought (Chicago: University of Chicago Press).
Spitzer, M., 2009: ‘Emotions and meaning in music’, Musica Humana 1/​2: 153–​94.
Spitzer, M., 2013:  ‘Sad flowers:  affective trajectory in Schubert’s Trockne Blumen’, in T.
Cochrane and B. Fantini, eds., The Emotional Power of Music (New  York:  Oxford
University Press), pp. 7–​21.
Spitzer, M. and E. Coutinho, 2014: ‘The effects of expert musical training on the percep-
tion of emotions in Bach’s Sonata for Unaccompanied Violin No. 1 in G minor (BWV
1001)’, Psychomusicology: Music, Mind, and Brain 24/1: 35–57.
Trainor, L. and E. Hannon, 2013: ‘Musical development’, in D. Deutsch, ed., The Psychology
of Music, 3rd edn (New York: Academic Press), pp. 423–​98.
Trevarthen, C., 1999–2000: ‘Musicality and the intrinsic motive pulse: evidence from
human psychology and infant communication’, in special issue on ‘Rhythm, narrative,
and origins of human communication’, Musicae Scientiae 3, supplement: 155–​99.
Unyk, A., S. Trehub, L. Trainor and G. Schellenberg, 1992:  ‘Lullabies and simplicity:  a
cross-​cultural perspective’, Psychology of Music 20: 15–​28.
Wegman, R., 2003: ‘Johannes Tinctoris and the “New Art”’, Music and Letters
84/2: 171–88.


Joachim, J., [1903] 2003: The Great Violinists: Recordings from 1900–​1913. (Testament
SBT2 1323).
Kremer, G., 1981: Johann Sebastian Bach: Sonatas and Partitas for Unaccompanied Violin
(Philips 6769 053; CD reissue: ECM New Series 1926–​27).
Luca, S., 1992: Johann Sebastian Bach: Sonatas and Partitas for Unaccompanied Violin
(Nonesuch HC-​73030; CD reissue: 73030).
Perlman, I., 1988: Johann Sebastian Bach: Sonatas and Partitas for Unaccompanied Violin
(EMI Classical CDS 7 49483 2; reissue: 0 85281 2).
Steven Isserlis, cellist

To perform a piece of music is essentially to tell a story. The task of an interpreter
is that of narrator and actor; he or she must relate the tale woven by
the composer, not merely portraying, but fully identifying with the characters
and their fates. Music, like fiction, needs form and shape in order to be believ-
able or moving. Needless to say, musical forms can be infinitely varied—​and
perhaps the word ‘story’ is confining it too closely, when so much music might
as easily be perceived as a poem, a fantasy, a reverie; but whatever its nature, a
composition needs the discipline of a preordained structure in order to attain
the inevitability of satisfying art.
As a performer, there is no way that I can take the listener on a musical jour-
ney unless I understand (or at least attempt to understand!) the various aspects
of the work being performed. This involves several levels of comprehension.
To begin, an overall knowledge of the score is essential: just as an actor cannot
give a convincing account of a role without knowing what happens to all the
characters in the play (not just to the actor’s own character), so a musician must
be familiar with all the voices in a piece of music. This should go without say-
ing; but, strangely, it doesn’t. I have even heard of teachers discouraging their
students from delving too deeply into the score, lest it make their interpretation
less individual! Ahem.
Secondly, one needs a strong overview of the general shape of the work.
All musical forms present their own challenges. If a composer has chosen to
write in sonata form, for instance, there is no way that an interpreter can give a
proper account of the work without understanding the contrasts and similarities
between the (usually two) main subjects, any more than one could understand
a novel without knowing who the main characters are. But of course the
demands on a performer go far beyond that basic grasp of the facts. Not only
does one have to delve into the inner fabric of those main subjects, with their infinite
variety of light and shade, of strength and gentleness, of rhythmic alertness
and languor, and so on: one has also to be familiar with their fates, with the
interplay between them, with their transformation over the course of the work.
This knowledge informs every aspect of a performance—​tone colours, tempo
relationships, dynamic contrasts, etc. Again, this may sound obvious, but all
too often musicians fail to come to terms with these basic elements. The result is
boring performances—​and alas, there are far too many of those! I would liken
these haphazard musicians to travellers walking through a forest, lurching from
tree to tree, appreciating the beauty of each tree, perhaps, but with no idea how
to get to the other side of the forest. Conversely, a performer who understands
the structure of a work will be blessed with the freedom of a bird flying above
that forest, perceiving each detail in all its exquisite clarity, but able at all times
to make out the overall direction of the path. Foreknowledge of the form—​the
story—​must inform the interpretation from the outset. The actor analogy again
seems apt: at the end of a great performance of Hamlet, the audience should
somehow feel that, despite the many unpredictable twists and turns of the play,
there has been an inevitable trajectory to the hero’s fate.
Within this overall view of the work, one must also, of course, grasp the
microstructure of each phrase. In music, as in speech, every clause, every part
of the phrase, has a centre, with notes leading to and away from it. Just as in
speaking one highlights the most important or unexpected word in a sentence,
so an equivalent event in music will require some sort of emphasis. This can
be achieved with dynamic stress, of course; it can also be done with time—​the
so-​called agogic accent, in which one lingers on one note, making up the time
on less important notes (‘rubato in tempo’); with colour; or in countless other
ways. If the performance is to sound truly alive, no two consecutive notes should
have exactly the same weight. And then, again as in speech, there is the question
of punctuation: music is full of commas, full stops, semicolons, full colons and
so on. For string ​players, the bow must be ready to leave the string at all times;
for pianists, hands and feet must similarly be employed in order to allow the
phrase to breathe; and so on through all instruments and voices. If this very basic
aspect of phrasing is overlooked, the music becomes as incomprehensible as the
soliloquy of the unfortunate actor—​very far from the great Hamlet described
above—​who, in a panic, stumbles to the front of the stage and gabbles out:
‘Tobeornottobethatisthequestionwhethertisnobler’, etc. Not a consummation
devoutly to be wished.
In short, the musician’s challenge is to convey the sentences, the paragraphs
and the overall narrative, the personalities, utterances and destinies of the
musical characters, with as much clarity as possible. In even shorter: the aim
is to communicate the meaning of the music—​what a surprise! But it requires
thought as well as feeling, study as well as spontaneity. It also requires the
technical mastery that enables the performer to shape each phrase as the music
requires. It’s not that straightforward, in fact—​not as easy as it should sound.

Shape in music notation

Adam Ockelford

This chapter uses an extension of ‘zygonic’ theory to investigate how structure
is perceived in the auditory and visual domains, and how incoming data from
the two sensory modalities may be connected in the mind through different
forms of cross-​modal mapping. I argue that such mappings enable patterns in
sound to be depicted coherently and consistently as two-​dimensional images.
Four types of relationship that potentially exist between sound and shape are
identified, namely ‘regular’, ‘indirect’, ‘arbitrary’ and ‘synaesthetic’. The first
three of these are shown to be analogous in varying degree to the tripartite
typology of signs—​icon, index and symbol—​developed by the American phi-
losopher Charles Peirce, a founder of semiotics. The new, zygonic model is then
used to interrogate the nature and function of picture scores created by young
children, tactile representations of pitch created by their blind peers, western
notation based on the staff, music transcribed into braille, guitar chord symbols
and a synaesthete’s visualization of a track from Jean-​Michel Jarre’s Oxygène.

Zygonic theory

Zygonic theory seeks to answer the question of how it is that music makes
sense:  how, in the absence of semantic content, it is structured and forms
abstract narratives in sound that convey meaning over time. The theory is
‘psychomusicological’ in nature, in that it advances a musicological hypothesis
underpinned by psychological principles. Hence it is an epistemological hybrid,
in which the idiographic intuitions characteristic of music theory and analysis
are informed by the nomothetic findings proper to cognitive psychology (Cross
1998; Gjerdingen 1999; Ockelford 2009).
Zygonic theory takes our apprehension and understanding of music to be
built up from cognitive data acquired from a number of perceptual domains
pertaining to sound, such as pitch, loudness and timbre. Each domain can
exist in different states, thereby functioning as a perceived sonic variable. In
philosophical terms such states are conceptualized as ‘qualia’, that is, units of
experience (see, for example, Chalmers 1996; Kanai and Tsuchiya 2012). Some
domains, such as duration, have a single axis of variability, while others, like
timbre, are multidimensional in nature; some gauge qualities such as loudness,
while others detail a sound’s perceived location in time or space; and some,
like pitch, pertain to individual notes, while others, including tonality, are
characteristic of a group (Ockelford 1991b). The perceived state of a domain
at any given point in time is termed, in zygonic theory, a ‘perspective¹ value’
(Ockelford 2005: 10–​12). In western music theory, such values are assigned
labels, which serve to define them more or less specifically, permitting iden-
tification and facilitating their replication over time. The degree of specificity
with which perspective values are defined is contingent both on the fidelity with
which they can be perceived, remembered and reproduced, and on the accuracy
demanded by a particular musical context. For example, values of pitch, which
play a prominent role in most musical design and can be discerned with some
precision (Stevens 1975), are typically labelled exactly (using designations such
as ‘a4’ and ‘f♯3’), whereas values of loudness, which generally fulfil a second-
ary structural function (Boulez [1963] 1971: 37; Ockelford 1999: 4), and whose
perception is contextually bound, are classified more broadly (using terms such
as p and f).
Perspective values can be compared in the mind, and the resulting mental
constructs—​which, in the American cognitive linguist George Lakoff’s terms,
constitute forms of ‘link schema’ (1987: 283) that inhabit the mental space per-
taining to music processing (Fauconnier [1985] 1994)—​can, again, be conceptu-
alized in music-​theoretical terms. For example, differences in pitch are referred
to as ‘intervals’. In the nomenclature of zygonic theory, link schemata such as
these are termed ‘interperspective relationships’ (Ockelford 1991b: 133). Like
the perspects to which they pertain, these are variable in nature, potentially
existing across a range of ‘interperspective values’.
Interperspective values reflect the nature of the perspects to which they per-
tain and are of different types. For example, the interval between the onsets of
two notes gauges the perceived difference in time that separates them. Without
(necessarily) being aware of it, musicians invoke interperspective values of
this type whenever they use a phrase such as ‘the cor anglais comes in five
beats after the oboe’ (given that they have a sense of how long each beat is);
see Figure 5.1. The perspect ‘duration’ (or note-​length) incurs interperspec-
tive relationships that may be heard and understood as differences or ratios

FIGURE 5.1   Oboe and cor anglais duet from the third movement of Vaughan Williams’ Fifth

by musicians adopting a consciously conceptual mode of listening (forms
of comparison that are implicit in standard western notation). For instance,
the oboist performing the passage from Vaughan Williams’ Fifth Symphony
shown in Figure 5.1 may mentally calculate that the first note is a semiqua-
ver longer than a crotchet would have been (a difference) or that the dotted
crotchet in bar 2 of the excerpt is three times as long as the quaver that follows
(a ratio). Other perspects yet, such as timbre, bear values that are irreducible
to solitary coefficients (see Risset and Wessel 1999), whose interperspective
relationships are therefore typically complex too (though cf. Slawson 1985).
For example, the performers of the passage in Figure 5.1 may regard the cor
anglais as having a ‘darker’ sound than the oboe. Given this diversity, it is
perhaps inevitable that, while for a given sound the relationships between per-
spective values are bound together in a common phenomenological experience
(see Roskies 1999), all perspects have come to serve distinct functions in music,
as we shall see.
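The two comparisons of duration described above can be made concrete with a little arithmetic. The following is a minimal sketch for illustration only, not part of the formal apparatus of zygonic theory: it assumes that durations are measured in crotchet beats, and the constant and function names are hypothetical.

```python
from fractions import Fraction

# Note values expressed in crotchet beats (an assumption made for illustration).
SEMIQUAVER = Fraction(1, 4)
QUAVER = Fraction(1, 2)
CROTCHET = Fraction(1)
DOTTED_CROTCHET = Fraction(3, 2)


def primary_difference(first, second):
    """Interperspective value understood as a difference (second minus first)."""
    return second - first


def primary_ratio(first, second):
    """Interperspective value understood as a ratio (second divided by first)."""
    return second / first


# The oboe's first note: a crotchet tied to a semiquaver.
first_note = CROTCHET + SEMIQUAVER

# A difference: the first note is a semiquaver longer than a crotchet.
print(primary_difference(CROTCHET, first_note) == SEMIQUAVER)  # True

# A ratio: the dotted crotchet is three times as long as the quaver.
print(primary_ratio(QUAVER, DOTTED_CROTCHET))  # 3
```

Exact fractions are used rather than floating-point numbers so that tied and dotted values compare without rounding error.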
Interperspective relationships between perspective values (metaphorically,
those that are closest to the perceptual ‘surface’) are termed ‘primary’.
Relationships between primary relationships operate at a deeper level and are
said to be ‘secondary’. In some musical contexts, ‘tertiary’ relationships fulfil
an important cognitive function too (Ockelford 2002). In most circumstances
this represents the maximum level of cognitive abstraction that the perceived
relationships between musical objects attain (irrespective of style).
Interperspective relationships may be illustrated graphically using the letter I
with an arrow superimposed (see Figure 5.2). Where such relationships connect
perspective values that are extended in time—​as pitch, loudness and timbre usu-
ally are—​arrowheads are filled; relationships linking singular features such as
duration and onset time are open (see Ockelford 1999). To avoid ambiguity, the
perspect concerned is indicated through a superscript—​a single letter such as
P for pitch, O for onset and T for timbre may suffice—​and the level of the rela-
tionship is shown through an appropriate subscript. For example, in Figure 5.2,

FIGURE 5.2   Representation of primary interperspective relationships

the primary interperspective relationship of pitch indicates the descending
major second between the d2 and c2 in the oboe part; the relationship of timbre
between the oboe and the cor anglais conveys the ‘darker’ sound of the latter;
and the secondary relationship of onset reflects the fact that the cor anglais’s
opening motive is a beat shorter than the oboe’s.
Zygonic theory holds that the cognition of musical structure is a function of
a particular class of interperspective relationships, through which one value is
heard as deriving from (or, conversely, generating) another. This occurs when
one value is thought to imitate another or to be the model that another is heard
to replicate. This is because structure equates to organization, or control, and if
one value is deemed to imitate another, then, given that a range of values would
otherwise have been possible, the second will be constrained metaphorically by
the first. The interperspective relationships through which imitative order is
perceived are of a special type that I term ‘zygonic’² (Ockelford 1991b: 140ff.).
A  ‘primary zygonic relationship’ or ‘zygon’ may be represented as shown in
Figure 5.3, which illustrates values of pitch, duration and loudness derived
through imitation from the opening of the oboe’s phrase. Note the use of com-
plete arrowheads to indicate a relationship between values that are perceived
to be the same. ‘Imperfect’ zygonic relationships, in which the value generated
differs slightly from the one it imitates, are indicated using half arrowheads,
which, as we have seen in the case of non-​zygonic interperspective relation-
ships, are indicative of change (Ockelford 2006:  87). Like half arrowheads,

FIGURE 5.3   Primary and secondary zygonic relationships

complete arrowheads may be filled (in the case of values extended in time) or
open (in the case of singularities). An example in the domain of duration is
shown in Figure 5.3, where a crotchet tied to a semiquaver in the oboe part
is imitated a beat later by a crotchet tied to a triplet quaver in the cor anglais.
Secondary zygons may be deemed to link primary interperspective relation-
ships, where one is thought to imitate another. In Figure 5.3, examples pertain-
ing to pitch and onset are shown. Once more, superscripts and subscripts are
used to indicate the perspective domain in which the relationship exists and its
level. Finally, tertiary zygons may connect secondary relationships in the mind
of the listener. These occur in the domain of perceived time, for instance, where
there is a regular accelerando or ritardando; see also Figure 5.8.
It is believed that zygonic relationships such as those depicted in Figure 5.3
offer a highly simplified representation of certain cognitive events that we may
reasonably suppose take place (typically nonconsciously) during meaningful
participation in musical activity—​whether listening, performing or creating
music anew through improvisation or composition. Moreover, the single con-
cept of a zygon bequeaths a vast perceptual legacy, with many possible mani-
festations: potentially involving any perceived aspect of sound, existing over
different periods of perceived time, and operating within the same and between
different pieces, performances and hearings. Zygons may function in a num-
ber of ways: reactively, for example, in assessing the relationship between two
extant values, or proactively, in ideating a value as an orderly continuation from
one presented. They may operate between anticipated or remembered values,
or even those that are wholly imagined, only ever existing in the mind. (There
is, of course, no suggestion that the one concept is cognitively equivalent in all
these manifestations, only that it is logically so.) Even a short passage of music
comprises a large number of perspective values, potentially linked through a
vast network of relationships, whose effect would be perceptually overwhelm-
ing were it not for the fact that the mind seeks (and is able to find) groups of
relationships that give the impression of acting together in coordinated fash-
ion. This issue is considered at length elsewhere (for example, Ockelford 2005).

Extending zygonic theory to art and related forms of visual representation

In one of the first main expositions of zygonic theory (Ockelford 1999), I hinted
that the notion of the creation and cognition of structure through imitation
need not be limited to music (and so to the perceptual domains pertaining to
sound)—​potentially having application to painting, sculpture and ballet, for
example. In relation to art, I observed that ‘pictures normally display an inner
coherence that ultimately derives from the repetition of one or more of its per-
ceived aspects, such as colour, size, shape or texture. An abstract drawing that
lacked duplication of any feature—that showed no evidence of symmetry, uniformity
or regular change—would be the visual equivalent of random bursts
of white noise’ (ibid.: 114).
Some of the wider implications of this idea are currently being worked
through as part of the ‘Sounds of Intent’ project, an investigation led by the
Applied Music Research Centre at the University of Roehampton into the
musical and wider artistic development of children with learning difficulties
(see, for example, Ockelford 2012a). In this chapter, in the context of diverse
forms of music notation, we explore how zygonic theory may have relevance to
the creation and cognition of shapes, perceived visually. As most music scores
are monotone—​black images on a white, two-​dimensional background—​it is
on this scenario that we focus our attention.
Consider first the simplest possible image: the smallest apprehensible black
dot on an otherwise blank page (Pomerantz and Portillo 2011: 1336). This can
be completely defined in terms of its (relative) location (see Figure 5.4). Now
imagine a second dot, placed a short distance from the first (Figure 5.5). We
can model the perceived relationship between the two as an interperspective
relationship of location. Like intervals in the domain of pitch, this relationship
can be defined as a vector, as it has the two components ‘magnitude’ and ‘direc-
tion’, although here the latter is infinitely variable (and not just ‘up’ or ‘down’,
as is the case with pitch); see Figure 5.6.

FIGURE 5.4   The image of a small black dot

FIGURE 5.5   Two small black dots


FIGURE 5.6   A primary interperspective relationship of location, whose value is shown using Cartesian coordinates

In music, the directionality of interperspective relationships tends to be
determined by the temporal order of the events to which they pertain: that is,
the value heard second is usually compared to that apprehended first, although
in retrospect, the relative salience of the values in the musical narrative may
reverse things (Cone 1987:  249ff.; Ockelford 1999:  80). But with static visual
images, how is directionality likely to be determined, given that the eye is free
to glance from either dot to the other at will? With images that convey semantic
information in sequential form (such as texts and scores, in which the continu-
ous temporal unfolding of a verbal or musical narrative is represented a chunk
at a time on the page), the direction in which time is represented and in which,
therefore, a viewer’s gaze typically shifts will be culturally determined (the com-
plex saccadic pattern of eye movements that underpin reading notwithstanding;
see, for example, Kowler 2011). In the West, this means that interperspective
relationships typically function principally from left to right, and, other things
being equal in the horizontal dimension, from top to bottom of the page.
As we saw above (Ockelford 1999: 114), it is my contention that the under-
lying principle of zygonic theory—​that the imitation of a variable in sound
equates to musical structure (since one perspective or interperspective value is
perceived as being constrained by another)—​can be extended to the visual arts.
In the case of monotone dots on a page, at least three are required, since two
dots with the same location are indistinguishable from each other.³
Exact imitation of magnitude and direction of an interperspective value of
location yields three dots in a straight line (see Figure 5.7). The second can
be considered to emulate the first through a secondary zygonic relationship.
Partial imitation is also possible if magnitude or direction alone is implicated.
Constant change in a series of interperspective differences of location may be
detected and regarded as imitation at the tertiary zygonic level (Figure 5.8).
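The three levels of relationship between dots can be sketched computationally. What follows is an illustration under stated assumptions rather than a definitive model: the function names are hypothetical, dots are given as Cartesian coordinates, and ‘constant change’ is read here as a constant ratio between successive differences of location. That reading is suggested by the values (3,1) and (6.75,2.25) that label Figure 5.8; the intermediate step of (4.5,1.5) used below is assumed.

```python
from fractions import Fraction


def primary_differences(dots):
    """Primary relationships: the (x, y) difference between successive dots."""
    return [(x2 - x1, y2 - y1) for (x1, y1), (x2, y2) in zip(dots, dots[1:])]


def secondary_ratios(dots):
    """Secondary relationships: the factor by which each primary difference
    scales the one before it, or None where the two are not proportional."""
    diffs = primary_differences(dots)
    ratios = []
    for (dx1, dy1), (dx2, dy2) in zip(diffs, diffs[1:]):
        if dx1 * dy2 != dx2 * dy1 or (dx1, dy1) == (0, 0):
            ratios.append(None)  # not a simple scaling of the previous step
        elif dx1 != 0:
            ratios.append(Fraction(dx2) / Fraction(dx1))
        else:
            ratios.append(Fraction(dy2) / Fraction(dy1))
    return ratios


def is_exact_imitation(dots):
    """Secondary zygonic case: every primary difference repeats the first."""
    diffs = primary_differences(dots)
    return len(diffs) >= 2 and len(set(diffs)) == 1


def is_constant_change(dots):
    """Tertiary zygonic case: every secondary ratio repeats the first."""
    ratios = secondary_ratios(dots)
    return len(ratios) >= 2 and None not in ratios and len(set(ratios)) == 1


# Three dots in a straight line: both primary differences are (3, 1).
straight = [(0, 0), (3, 1), (6, 2)]

# Four dots whose successive differences grow by a factor of 1.5 (assumed):
# (3, 1), then (4.5, 1.5), then (6.75, 2.25).
widening = [
    (0, 0),
    (3, 1),
    (Fraction(15, 2), Fraction(5, 2)),
    (Fraction(57, 4), Fraction(19, 4)),
]

print(is_exact_imitation(straight))  # True
print(is_constant_change(widening))  # True
print(is_exact_imitation(widening))  # False
```

The sketch makes plain why a minimum number of dots is needed at each level: two primary differences require three dots, and two secondary ratios require four.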



FIGURE 5.7   A secondary zygonic relationship of location reflects the fact that the difference in
location between dots B and C is deemed to exist in imitation of the difference between A and B.
[Diagram labels: two primary relationships of location, each with the value (3,1).]

FIGURE 5.8   Imitation of location at the tertiary zygonic level
[Diagram labels: primary values of location including (3,1) and (6.75,2.25), linked at the secondary and tertiary levels.]

FIGURE 5.9   The perceived orderliness inherent in a straight line modelled in zygonic terms
[Diagram labels: +3x and +y.]

All images can ultimately be considered as being made up of dots like those
shown in Figures  5.4–​5.8. One way of modelling the orderliness of a straight
line is to regard it as describing consistent change in the horizontal and vertical
dimensions, as shown in Figure 5.9. The filled arrowheads symbolize a theoreti-
cally infinite number of relationships that are the same. The connecting (dashed)
lines indicate relationships that work in parallel. A cluster of dots perceived as
forming a single gestalt may be considered to be related to others that are the same
or similar through the compound perspect ‘shape’. Where imitation is thought
to be present, the interperspective relationships between shapes may be deemed
to be zygonic (Figure 5.10). Finally, consider that humanly created images may
be regarded as imitating the visual qualities of objects in ‘real life’ (just as music
may incorporate environmental sounds such as bird-​song; see Ockelford 2012b).


FIGURE 5.10   One shape deemed to exist in imitation of another

Analysing shape in music notation using zygonic theory

Having considered how, according to zygonic theory, structure may be created
and cognized in two-​dimensional visual images, we now turn our attention to
the potential application of this thinking to music notation.
The principal function of notation is to serve as a link between composer
and performer (who may be the same person):
• To communicate musical ideas more or less precisely in visual form,
thereby enabling sight-​reading, for example, and permitting varying
degrees of interpretation and improvisation, so the performer has a
role beyond mere replication;
• As an aide memoire in performance; and
• To enable instructions of how to perform a piece of music to be
stored permanently.

Observe that there is no logical demarcation between what can reasonably be
regarded as ‘scores’, comprising musical notation, and other visual stimuli,
created by composers such as Cornelius Cardew, which are intended to serve
as a stimulus for performers to determine more or less for themselves which
musical sounds to play. Accepting that there is a certain conceptual fuzziness
in its definition, our focus here is on the former category (scores), with the
acknowledgement that, within a given community of musicians who use notations,
there are conventions of performance that are not written down, but
to which players and singers are nonetheless expected to conform: practices
that are passed on, consciously or nonconsciously, from one generation of
performers to the next, including the use of rubato, dynamics and vibrato.
And there are other spheres of musical activity where the nature of the score
gives performers the licence to improvise within predetermined constraints
(as in the figured basses used in continuo playing, for example, or lead sheets
in jazz).

How is it, then, that sounds can be represented through visual images, and
what forms do such analogues take? Scores work on the principle that system-
atic relationships are possible between shape and sound: a shape is taken to have
a meaning such that, in the mind of the performer, the visual image dictates, to
a greater or lesser extent, which sound is to be made, and when. The zygonic
conjecture provides a theoretical framework for modelling how this process
works, and evinces three possible mechanisms through which the visual repre-
sentation of musical sounds can occur. As we shall see, these have a somewhat
convoluted relationship with the threefold Peircean typology of signs:  icon,
index and symbol (Peirce [1867–71] 1984: 57; [1893–1913] 1998: 461).
The first form of representation is equivalent to Peirce’s notion of ‘icon’,
whereby a sign denotes its object by virtue of a shared quality. Zygonically
speaking, this is a ‘regular’ mapping, in which the relationship between per-
spective values in different domains involves a function that is not specific to
the perceptual domains concerned—​rather, pertaining to a feature or quality
that is abstracted from the perceptual surface and is common to both. To see
this principle in action, consider that, because differences are domain-specific (an
interval of pitch and a distance on the page are measured on unrelated scales),
single values of difference cannot be mapped systematically between domains
(see Figure 5.11).⁴ In contrast, ratios, being abstract, are not bound by the context
in which they occur. Hence they permit regular mapping between domains.
See, for example, Figure 5.12. Ratios may occur between differences that are
expressed through secondary relationships. In Figure 5.13 the regular inter-
domain relationships are at the tertiary level. In Peircean terms, the forms of
representation set out in Figures 5.12 and 5.13 are forms of icon that he called
‘diagrams’, ‘which represent the relations, mainly dyadic, or so regarded, of
the parts of one thing by analogous relations in their own parts’ (Peirce [1893–​
1913] 1998: 274).
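The distinction between differences and ratios can be sketched in code. This is an illustration only, with hypothetical function names and invented values: durations are taken to be in beats, lengths and heights in millimetres, and pitches in semitones above an arbitrary reference. Because a ratio is a pure number, it can be copied from one domain into another once a starting value in the target domain has been chosen.

```python
from fractions import Fraction


def ratio(first, second):
    """A ratio between two values in one domain; the units cancel out."""
    return Fraction(second) / Fraction(first)


def imitate_ratio(source_pair, target_first):
    """Secondary level (cf. Figure 5.12): find the target value whose ratio
    to target_first copies the ratio between the two source values."""
    return target_first * ratio(*source_pair)


def imitate_ratio_of_differences(source_triple, target_first, target_second):
    """Tertiary level (cf. Figure 5.13): place a third target value so that
    the ratio between the two target differences copies the ratio between
    the two source differences. The first two source values must differ."""
    a, b, c = source_triple
    r = ratio(b - a, c - b)
    return target_second + (target_second - target_first) * r


# Two durations in beats, the second three times as long as the first (invented).
durations = (1, 3)
first_length_mm = 10  # hypothetical length chosen for the first note
second_length_mm = imitate_ratio(durations, first_length_mm)
print(second_length_mm)  # 30

# Three pitches rising by 2 and then 4 semitones (invented), mapped onto
# heights on the page, with the first two dots placed at 0 mm and 5 mm.
pitches = (0, 2, 6)
third_height_mm = imitate_ratio_of_differences(pitches, 0, 5)
print(third_height_mm)  # 15
```

In both functions the target domain contributes its own starting values; only the dimensionless ratio crosses between domains, which is the sense in which the mapping is regular.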


FIGURE 5.11   Single interperspective values of difference cannot be imitated between domains;
therefore, systematic mapping and iconic representation in Peircean terms are not possible.
[Diagram labels: sound sources at 440 Hz and 494 Hz, and the question of what imitation of pitch by location would mean.]

FIGURE 5.12   Domains whose perspective values are capable of conveying a sense of size can bear
cross-modal imitation of ratios at the secondary level and therefore have the capacity for iconic
[Diagram label: imitation of the ratio between two durations by the ratio between two lengths is possible.]

FIGURE 5.13   Iconic representation of pitch in terms of location through tertiary-level imitation
[Diagram labels: 494 Hz; P/Loc; tertiary zygonic imitation of ratio.]

In psychophysical terms, the notion that regular mapping between domains
requires cognitive abstraction at the level of secondary or tertiary relation-
ships (implying the appearance of at least two values in each domain) may
initially appear to run counter to Stanley Smith Stevens’ assertion that any
two stimuli can be equated through the procedure of cross-​modality match-
ing, provided they have at least one aspect or attribute that varies in degree
(1975:  132). This would suggest that a single pitch, for example, could be
mapped consistently onto a given location (within a given two-dimensional
spatial universe).
But how could this be? We will explore Stevens’ contention by analysing
the work of psychologists Samuel Mudd (1963) and Michael Thorpe (2015)
which, at first blush, seems to support the notion that a value in one domain
may be felt to have some equivalence in its own right to a value in another.
Mudd investigated what he called the ‘natural potential’ of certain aspects of
sonic stimuli to evoke perceptual–​motor responses by having participants place
a peg in a square matrix of holes in the position that they felt best represented
the frequency, intensity, duration or direction of a sound to which they were
exposed. In Thorpe’s procedure, which examined the connection between only
pitch (perceived frequency) and location, the peg was replaced with a dot on a
computer screen that could be manipulated using a mouse. Both experiments
produced similar results, suggesting that there is indeed a tendency to map pitch
systematically onto vertical and, to a lesser extent, horizontal axes (although
there was considerable variation between participants).
However, in both experiments, a reference pitch was linked to a predeter-
mined location that was presented before each judgement of distance was made.
In Mudd’s case, one of six different pitches then followed, three of which were
above the reference pitch and three below. In total, Mudd’s participants under-
took eighteen trials (three for each pitch). In Thorpe’s test, one of ten different
pitches was heard following the reference tone, with a total of ninety attempts
(nine for each pitch). Hence the visual mapping was to pitches that were heard
in a relative (intervallic) sense rather than just as absolute values. Even so,
this would imply that participants were attempting to equate cross-​modal dif-
ferences rather than ratios, which zygonic theory suggests could not be done
consistently. Consider, though, that the experiments involved repeated presen-
tations of the reference tone in relation to a number of different pitch stimuli.
Therefore, following the first trial, listeners would be able to compare intervals
(and their visual correlates) through secondary interperspective relationships of
ratio. If this were the case, then one would expect participants’ initial attempts
to map a pitch onto a particular location to be far more variable than their
subsequent efforts, which could be gauged against previously heard intervals.
And, indeed, analysis of Thorpe’s (2015) ‘practice’ data (preliminary trials in
which participants familiarized themselves with the task) shows that this is pre-
cisely what occurred. In every case, participants’ first trials differed significantly
from the mean of subsequent mappings of the same pitch [F(9,333) = 10.30,
p < .001], between which no significant differences were found.
Moreover, it is conceivable that listeners were additionally gauging intervals
in relation to putative extremes of pitch (and, rather more precisely, of the
pegboard or computer screen). As Thorpe says in relation to his experiment:
the highest and lowest frequencies of presented comparison tones … were
octaves of the reference tone. An octave is the most recognisable of pitch
intervals and it is reasonable to assume that all participants recognised
these as the extreme values of the variable each time they heard them.
The remaining comparison tones could then be represented by objects
positioned in order of their perceived pitch height in between these ‘book
ends’ with equal spacing. (2015: 132–​3)

This potentially explains Stevens’ assertion concerning the cross-modal mapping
of individual values. Any perspective value from a familiar and (perceptually)
finite domain can be gauged in relation to an imagined and limited
continuum of other values. Hence an isolated pitch can be deemed to be ‘high’,
for example, and mentally equated with a dot towards the top of a sheet of
paper: neither the pitch nor the dot can escape the perceptual legacy of relativity
born of previous experience that exists in the mind of every listener.
An important consideration in terms of the zygonic theory is whether or not
cross-​modal relationships such as these can be deemed to be imitative: that is,
whether, in a given set of circumstances, a sense of derivation can be deemed
to cross the modal divide (in the theoretical situations shown in Figures 5.12
and 5.13 it was assumed that imitation was present). A key factor in determin-
ing the presence and, potentially, the directionality of zygonic relationships is
the context in which the notation is used, and two contrasting scenarios are
explored here.
First, let us take the case of a composer intent on using a form of nota-
tion comparable to the form of representation of pitch shown in Figure 5.13
or, indeed, Mudd’s and Thorpe’s participants as they strive to complete the
experimental assignment they have been set. In either case, the task is to create
a visual analogue of a series of pitches heard or imagined. The pitches them-
selves could not be imitated, of course, nor the differences between them, but,
as we have seen, the ratios between these could be said to generate the propor-
tions between the distances separating the dots on the page (or their coun-
terparts on the pegboard or computer screen). Nonetheless, the pitches and
the marks on paper (or their equivalents) are necessary to reify the cognitively
more abstract ratios, and since the qualia are perceptually more salient than the
relationships between them—​both to the eye and to the ear—​it may seem to the
composer or research participant that the sense of derivation is vested in the
stimuli themselves; that is, each pitch seems to generate a dot.
Second, we consider the position of performers, reading a score such as
Stockhausen’s Sternklang (1971), in which representations of twenty-​eight con-
stellations provide visual imagery for them to interpret (see Figure 5.14). Here,
the sense of derivation works the other way round, in that the relative positions of
dots on the page are used to determine the relationships between intervals and so,
ultimately, pitches. Two horizontal guide lines, each deemed to represent a pitch
from a given set based on the overtone series, provide the necessary points of refer-
ence to enable every dot to stand for a further pitch in its own right (Stockhausen

FIGURE 5.14   Example of the derivation of pitch through imitation of a ratio between differences in
location from a constellation in Stockhausen’s Sternklang (1971) (image of der Adler © Stockhausen-Stiftung
für Musik, Kürten, Germany, 1971; used with

1977:  28), since in the vertical dimension two interperspective differences exist
between each dot and the lines, between which a ratio can be gauged and trans-
ferred cross-​modally through a tertiary relationship. To the extent that this rela-
tionship can be regarded as imitative, so it can be classed as zygonic.
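The performer's reading of a dot against the two guide lines can be sketched as follows. This is a simplified illustration, not a rendering of Stockhausen's instructions: the function name and all values are hypothetical, pitch is assumed to be measured in semitones (MIDI note numbers), the vertical coordinate is assumed to increase upwards, and the restriction of pitches to a prescribed set is ignored.

```python
def pitch_from_dot(dot_y, line_ys, line_pitches):
    """Derive a pitch from a dot's height relative to two guide lines.

    The dot divides the space between the lines into two differences of
    location. Requiring the two corresponding differences of pitch to stand
    in the same ratio is equivalent to the linear interpolation used here.
    """
    lower_y, upper_y = line_ys
    lower_pitch, upper_pitch = line_pitches
    # Position of the dot between the lines: 0 on the lower, 1 on the upper.
    proportion = (dot_y - lower_y) / (upper_y - lower_y)
    return lower_pitch + proportion * (upper_pitch - lower_pitch)


# Hypothetical layout: guide lines at 0 mm and 40 mm standing for MIDI
# notes 57 and 69; a dot a quarter of the way up lies 10 mm above the
# lower line and 30 mm below the upper one (a ratio of 1:3).
print(pitch_from_dot(10, (0, 40), (57, 69)))  # 60.0
```

The derived pitch lies 3 semitones above the lower reference and 9 below the upper, preserving the 1:3 ratio of the distances on the page. A dot outside the guide lines yields a pitch outside the two reference pitches by the same rule.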
Consider that the constellations provide temporal performance information
too: according to Stockhausen ‘The points should be played/​sung in a short
terse manner, corresponding rhythmically to their graphic layout’ (ibid.: 28).
Although the tempo is not prescribed, this implies tertiary imitation of onsets
as shown in Figure 5.14. And, finally, note that the size of dots in Stockhausen’s
score is linked to intensity: ‘There are points of 5 different thicknesses to which
5 degrees of loudness should correspond. The thickest point is played/​sung at ff.
The smallest point corresponds to the respective tutti volume, and all other
points are relatively louder’ (ibid.: 29; emphasis in original). Because both size
and intensity can be compared through primary interperspective ratios, so
cross-​modal relationships are possible at the secondary level. Again, these may
be deemed to be zygonic if the changes in musical dynamics are taken to exist
through imitation of their visual representations.
So much for ‘regular’ cross-​modal relationships, which offer a coherent
way of linking visual images to musical sounds, and which may convey a
sense of derivation. There are other, ‘irregular’, possibilities too, which may
be ‘indirect’ or ‘arbitrary’. Examples of the former are to be found in the
charts that were part of some scores in the second half of the twentieth
century and beyond, which show wind players the required disposition of
their fingers in order to produce certain combinations of pitches simultane-
ously. For instance, an image such as that shown in Figure 5.15 may appear

FIGURE 5.15   Indirect connection between graphic and sound (Photographic image © Felicity
Ockelford, 2014)

in a piece of oboe music as an instruction to the player to produce f  2 and c3
together (Stone 1980: 194).
The relationship between shape and sound is indirect since it has two dis-
crete components. First, there is the connection between the chart and the
physical positioning of the player’s fingers on the instrument. In terms of
Peircean semiotics, the relationship between the two is iconic, since the loca-
tion of the fingers is represented by analogy. But there is a second, perhaps
less obvious, stage, in which the physicality of a player’s fingering serves as
an index for the sound that is subsequently (and consequently) made (Figure
5.15). Peirce describes an index as denoting something by virtue of an actual
connection between them, physically or causally, such as ‘the hand of a clock,
and the veering of a weathercock’ that can be observed or inferred ([1893–​
1913] 1998:  274). Here there is a physical link between the position of the
fingers and the musical sound that is produced, and to the observer the former
can denote the latter.
With regard to zygonic theory, indirect relationships between notation and
musical sound like this one are only partly imitative—​in the iconic element,
where, as far as the performer is concerned, fingering is copied from a graphical
representation, and vice versa in the case of the composer. Indices, in contrast,
are contingent or causal, rather than being generative through imitation.
‘Arbitrary’ cross-​modal relationships in music notation work on the prin-
ciple of the Peircean ‘symbol’ (ibid.: 274), in which the connection between
a sign and the object it represents (here, a shape and a sound) is determined
purely by convention. That is to say, any sound can be symbolized by any shape
(Figure 5.16).
How are such relationships formed? According to Peirce (ibid.: 274), a sym-
bol only becomes so ‘in the fact that a habit, or acquired law, will cause replicas
of it to be interpreted as meaning [x, y or z]’. But whence does the habit or law
derive? And what is the mechanism for its replication? (Is it imitative?)

FIGURE 5.16   An arbitrary shape is given meaning by convention.

It seems reasonable to assume that all arbitrary musical symbols were
originally conceived by an individual or individuals wishing to find a way
of conveying instructions to performers effectively yet efficiently. The mean-
ings of the symbols would then have been replicated through written or oral
instruction—​‘meta-​symbols’. In practice, teachers quite often supplement
explanation with demonstration; indeed, in teaching children for whom spo-
ken language is a challenge, such as some of those on the autism spectrum,
demonstration may even replace explanation (Ockelford 2013). A pupil may
then practise realizing the association between shape and sound by replicat-
ing it in various musical contexts. Hence, in zygonic terms, imitation may
play a part in arbitrary musical symbols becoming known and embedded in
cognition (see Figure 5.17).
For some people, a fourth type of connection is possible between visual
images and sound, occurring through ‘synaesthesia’:  a neurological phenom-
enon in which the stimulation of one sense leads to involuntary experiences in
another (Baron-​Cohen and Harrison 1996; Harrison 2001; Cytowic, Eagleman
and Nabokov 2011; see also Ward, Chapter 9 below). A number of well-​known
musicians working in a variety of cultures, eras and genres, including Duke
Ellington, Billy Joel, Nikolai Rimsky-​Korsakov and Olivier Messiaen, have

FIGURE 5.17   The meaning of an arbitrary shape learned through imitation

reported seeing certain colours in response to particular pitches, harmonies,
tonalities or timbres. However, synaesthesia is not confined to elite compos-
ers and performers, with prevalence estimates among the general population
varying from around 1 in 20 to 1 in 2,000 (see, for example, Baron-​Cohen et al.
1996; Sagiv and Ward 2006). Nor is the response to sound confined to colours.
Ockelford and Matawa (2009: 52), for example, report the case of a nine-​year-​
old boy, Joshua, who has retinopathy of prematurity and, as a consequence,
is registered blind, with no sight in his right eye and only a small amount of
peripheral vision in his left. Joshua, who has ‘absolute pitch’ (the rare capac-
ity to identify and reproduce pitches in the absence of a reference tone; see, for
example, Deutsch 2012), describes how minor keys produce the sensation of a
‘bluey-​grey tunnel’, which he is ‘rushing down’, while major keys are perceived
as an ‘orangey-​red room’ in which there are ‘darker, shallow holes on the floor’.
Individual notes conjure up powerful associations too. The note B♭ elicits the
image of a light blue room with large windows in the distance, for example,
whereas B♮ is simply green. This implies a cross-​modal relationship of the type
in Figure 5.18.

FIGURE 5.18   Cross-modal relationship engendered by pitch-colour synaesthesia
[Diagram labels: neural connections between the visual and auditory processing centres in the brain; visual correlate; auditory correlate (494 Hz); visual image/auditory image]

Relationships like this are not imitative in nature, since they stem directly
from the idiosyncratic wiring of an individual’s neural circuitry; nor do they
readily fit within the threefold Peircean typology of signs. Once externalized,
however, imitation would of course be possible, and the colour, shape or other
image could function as a symbol in Peircean terms.
In anticipation of our discussion below concerning forms of music
notation that are accessible to touch readers, we now consider the extent
to which the principles of perceived structure in two-​dimensional visual
images may transfer to the tactile figures used by blind people from the sec-
ond half of the twentieth century, when two main ways were developed to
convert lines and shapes into tactile form. The first, known as ‘thermoform-
ing’, constitutes a vacuum-​moulding process using thin sheets of plastic,
which permit the shape and texture of small objects to be copied and stored
in a relatively easily manageable form. The second employs ‘swell-​paper’,
which, when heated, produces raised lines and shapes in response to black
images (Edman 1992; Ockelford 1996a). Unfortunately, both approaches
are labour-​intensive and time-​consuming. However, refreshable haptic dis-
plays, akin to video screens for touch, are increasingly making analogues
of visual materials more readily accessible to blind people (Rastogi and
Pawluk 2013).
To the extent that the salient data from visual images can be detected when
they are reproduced in tactile form, so the kinds of relationships described and
illustrated above pertaining to dots, lines and shapes may be perceived in the
domain of touch (Révész 1950; Lechelt, Eliuk and Tanne 1976; Heller 1991;
Lederman and Klatzky 2009). It is my contention that these may be imitative
and therefore function zygonically. The perceptual and cognitive challenges of
assimilating information by touch alone should not be underestimated, however:
at any given moment, readers of tactile scores can perceive only what lies
beneath their moving fingertips, which makes distances and angles hard
to judge. Moreover, two-​dimensional images larger than a square centimetre or
so have to be mentally reconstructed from series of sensations gleaned through
a painstaking process of digital scanning, making considerable demands on
memory. Nonetheless, complex musical information can be encoded and
decoded in tactile form, as we shall see.
To summarize: in this section, using zygonic theory, we have identified
four ways in which qualities of musical sounds and shapes (or their tactile
equivalents) may be related systematically in cognition. Such relationships
inform the design of musical scores and have enabled them to function in
various ways to represent sounds and to instruct performers how and when
to produce them. The fourfold taxonomy of sound–​shape relationships and
its connections with Peirce’s tripartite classification of signs is shown in
Figure 5.19.

FIGURE 5.19   Taxonomy of the possible types of relationship between musical sounds and visual
[Diagram: sound–shape relationships divide into regular (iconic), irregular and synaesthetic; irregular divides into indirect (indexical and iconic) and arbitrary (symbolic)]

From theory to analysis: six examples of sound–shape relationships in musical scores

In this section I explore how sound–shape relationships work in several types
of musical score using the theoretical assumptions set out above. I begin by
analysing a child’s ‘picture score’, produced in response to a rhythmic stimulus,
and intended to represent it. This is followed by a discussion of the tactile rep-
resentation of changes in pitch created by a young person who has had no sight
from birth. The semiotic properties of western staff-​based notation are con-
sidered next, using a short passage of music that is subsequently transcribed
into braille and guitar chord symbols, for the purposes of comparative analy-
sis. Finally, a synaesthete’s visualization of a track from Jean-​Michel Jarre’s
Oxygène is subjected to scrutiny using zygonic theory.


Figure 5.20 shows seven-year-old Jessica’s visual representation of a rhythm
played by one of her classmates, Henry, as observed by Jeanne Bamberger
(1995). Jessica and her friends had been set the task of putting down on paper
whatever they thought would help them remember the rhythm the following
day, or to help someone else to play it (Bamberger 2013: 10). And that is indeed
what happened: the next day, the children were able to reproduce the rhythm
assisted, at least in part, by their invented notation.

FIGURE 5.20   A child’s transcription and performance of a rhythm

What cognitive processes can we assume are in play here? Bamberger herself
describes Jessica’s approach to rhythmic representation as ‘formal’, in that she
attempts to show the relative distances in time between the clapped events by
matching them with circles of two sizes (ibid.: 12). This interpretation implies
a consistent and coherent connection between inter-onset interval and diameter,
which, in terms of zygonic theory, equates to regular cross-modal mapping
(Figure 5.21). In Peirce’s nomenclature, the circles function iconically.
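
The kind of regular mapping described here can be illustrated with a minimal sketch. It assumes a hypothetical rule in which the shortest inter-onset interval is drawn as a small circle and any longer interval as a large one; the function name, the two diameters and the example rhythm are invented for illustration and are not taken from Bamberger's data.

```python
def circles_for_rhythm(iois, small=1.0, large=2.0):
    """Return one circle diameter per inter-onset interval (IOI).

    Hypothetical rule: the shortest IOI in the rhythm is drawn small and
    every longer IOI is drawn large, so equal intervals always look alike.
    """
    shortest = min(iois)
    return [small if ioi == shortest else large for ioi in iois]


# Invented rhythm (seconds between claps); not Henry's actual rhythm.
example_iois = [0.5, 0.5, 0.25, 0.25, 0.5]
print(circles_for_rhythm(example_iois))
```

Because the same interval always yields the same diameter, a reader could in principle run the mapping in reverse, from score back to sound, which is the sense in which the relationship is regular.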



FIGURE 5.21   Regular cross-modal mapping between sound and score, and score and sound

Welch (1991), following Walker (1981, 1985, 1987), investigated the mental
images that congenitally blind children produced in response to auditory
stimuli by having them depict the variation in pitch of quasi-musical sounds
on a thin plastic membrane known as ‘German film’, on which it is possible
to produce raised lines using a stylus. As Walker had done before him, Welch
found that blind children systematically associated changes in pitch with the
vertical position on the page (Welch 1991: 220), in just the same way as their
sighted peers do. Whether this striking similarity of mental imagery, irrespective
of vision, resulted from a central cognitive processing mechanism that
works across a number of perceptual modalities, or was merely a consequence
of a common musical metalanguage used in the education of both blind and
sighted children (in which notes are said to be ‘high’ or ‘low’, for example),
remained a moot point. Evidence for the former view is to be found in a range
of psychological work: for example, that of Pratt (1930) and of Roffler and
Butler (1968), whose research (including, in the latter case, participants who
were blind) led them to conclude that every tone has an intrinsic spatial char-
acter, a finding supported by the investigation of Rusconi et al. (2006), which
also showed that the internal representation of pitch is spatial in nature; the
experiments of Mudd (1963) and Thorpe (2015), discussed above, in which a
common pattern of pitch-​position mapping was found in most participants;
and the empirical enquiry by Küssner and Leech-​Wilkinson (2014), who
found that the majority of their research participants (particularly trained
musicians) represented pitch with height when using a real-time drawing
paradigm. Other research, however, in the field of ethnomusicology—pointing to
the fact that, in certain cultures, pitch is not conceived of as high and low,
but ‘small’ and ‘large’ in Bali and Java, for example (Brinner 2008), ‘young’
and ‘old’ in the Amazonian basin (Seeger 2004), and ‘thick’ and ‘thin’ among
Farsi, Turkish and Zapotec speakers (Shayan, Ozturk and Sicoli 2011)—provides
support for the notion that pitch/space metaphors are not universal, but
language-​based (see Zbikowski 2002: 66–​76). However, the position is not
clear-​cut: Walker’s comparative study (1987) involving children from indig-
enous Canadian ethnic groups including the Inuit, Haida, Secwepemc and
Tsimshian found that, overall, participants displayed a proclivity for asso-
ciating pitch with vertical placement (rather than pattern, shape or horizon-
tal length). This apparent contradiction may be explained by the finding of
Eitan and Timmers (2010), that diverse cross-​domain mappings for pitch
exist latently in western participants in addition to the verticality meta-
phor—​a conclusion supported by the work of Antovic (2009) with Serbian
and Romani children, and Dolscheid et al. (2013). Dolscheid’s research
team, based in Nijmegen, found that Dutch speakers’ tendency to describe
pitches as ‘hoog’ (high) and ‘laag’ (low) could be overridden through training,
whereby they could learn to conceive of pitch in the way that, as we noted
above, Farsi speakers do—​as ‘naazok’ (thin) and ‘koloft’ (thick). Dolscheid
et al. took this to support the Whorfian hypothesis (Whorf [1956] 2012) that
language affects, in a fundamental way, the nature of perception and cogni-
tion. The impact of culture on perceptual salience is important too, as the
work of Athanasopoulos and Moran (2013) shows: here, a nonliterate Papua
New Guinean tribe, the BenaBena, produced iconic responses to short musi-
cal stimuli, which focused on hue and loudness rather than the variation in
pitch that proved to be most significant for western and Japanese musicians.
Clearly, the visual metaphors we intuitively use in conceptualizing music con-
stitute an area ripe for further cross-​cultural psychological studies.
A typical example of the responses given by Welch’s (western-​encultur-
ated) participants is shown in Figure 5.22. Here the stimulus, which had been
produced by Walker for his experiments (see 1987: 495), comprised two pitch
glides, produced by a linear sweep in frequency from 571 Hz to 800 Hz over
a period of two seconds and its reversal, separated by a second’s silence. The
sounds had been generated using a Roland CS15 synthesizer and were as pure
as it was practicable to make them, with 95 per cent of the total spectral energy
lying at the fundamental frequency. Lines produced on German film are inevi-
tably somewhat jagged due to the way in which the stylus presses into the
plastic, though inspection of the children’s efforts as a whole suggests that the
lines in Figure 5.22 were intended to be completely straight. Zygonic analysis
indicates that there are potentially two forms of cross-modal imitation
functioning here at the tertiary level, whereby horizontal distance on the page
equates to perceived time, and height corresponds to pitch (see Figure 5.23).
Semiotically speaking, the child’s depiction of pitch is iconic.

FIGURE 5.22   A congenitally blind child’s representation of pitch glides on German film (after Welch 1991)
[Figure labels: pitch glides 571 Hz, 800 Hz, 571 Hz; time axis marked 0 to 5; raised lines on German film]

FIGURE 5.23   Cross-modal imitation at the tertiary level assumed to underlie the representation of a
pitch glide as a straight diagonal line
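
The stimulus and the two mappings described above can be sketched in code. The glide parameters (571 Hz to 800 Hz over two seconds, a second's silence, then the reversal) come from the text; the page dimensions, the function names and the choice to make height proportional to frequency in hertz are assumptions made purely for illustration.

```python
def glide_frequency(t, f_start=571.0, f_end=800.0, glide=2.0, gap=1.0):
    """Frequency in Hz at time t (seconds); None during the silence."""
    if 0 <= t <= glide:
        # Upward linear sweep.
        return f_start + (f_end - f_start) * t / glide
    if glide < t < glide + gap:
        # Silence between the two glides.
        return None
    if glide + gap <= t <= 2 * glide + gap:
        # Reversed (downward) linear sweep.
        return f_end - (f_end - f_start) * (t - glide - gap) / glide
    raise ValueError("t lies outside the five-second stimulus")


def to_page(t, width=10.0, height=4.0, total=5.0, f_lo=571.0, f_hi=800.0):
    """Map time to horizontal distance and pitch to height (hypothetical units)."""
    f = glide_frequency(t)
    if f is None:
        return None  # nothing is drawn during the silence
    x = width * t / total
    y = height * (f - f_lo) / (f_hi - f_lo)
    return (x, y)


for t in (0, 1, 2, 2.5, 3, 4, 5):
    print(t, to_page(t))
```

Under these assumptions each glide becomes a straight diagonal line, rising and then falling, with a blank stretch between them.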


Conventional western music notation, using staves of five lines, clefs, time sig-
natures and a range of arbitrary signs to indicate the duration of notes, rests,
articulation, phrasing, dynamics and other aspects of performance, presents,
in Peircean terms, an intriguing mix of iconic and symbolic representation.
Pitch height is mapped approximately in the vertical dimension and time in
the horizontal, forming a visual framework in which the symbols for notes and
instructions pertaining to them are placed. The use of this scheme over several
centuries and its near universal take-up in musically literate communities
across the world in modern times is testament to its effectiveness (the potential
impact of western cultural hegemony notwithstanding), though becoming a
fluent reader demands a commitment of hundreds, if not thousands, of hours.
See Figure 5.24.

FIGURE 5.24   Western staff notation embeds arbitrary symbols within a semi-regular framework of
pitch and time


Until the twentieth century, blind musicians learned new pieces by ear or by
having someone read out the information contained in a score—​approaches
that continue to this day in many cultures and in aural traditions. However,
having to rely on others to render musical information that is freely available to
their sighted peers represents an unwelcome loss of autonomy for blind peo-
ple wishing to access music in notated form. It was Louis Braille himself—​the
inventor of the literary braille code—​who devised the first workable system
of reading and writing music in tactile form. His initial ideas were published
in 1829 (Lorimer 1996), though it took almost a century for braille music to
become established as the main method of making staff notation available for
blind people (Kersten 1997). A braille transcription of the excerpt from Figure
5.24 is shown in Figure 5.25.
Braille comprises cells of six potential dots, yielding 2^6 or sixty-four possible
combinations (including the blank cell). These are arranged from left
to right in horizontal lines in the same way as print. Cells are read, one at a
time, as the index finger traces over them. Reading music through braille is not
entirely equivalent to using a print score, however, for a number of reasons.
First, the way that cells are laid out on the page means that they have to be read
in series: the two-​dimensional nature of printed music scores, over which the
eye scans horizontally and vertically, is necessarily compressed into single lines.
Second, the limit of sixty-​four dot-​and-​blank combinations makes context-​
dependent meanings inevitable, with all the information required to define a
single note often necessarily being conveyed in more than one cell. Together,
these characteristics mean that the iconic elements of print notation—​the
portrayal of pitch and time in the vertical and horizontal dimensions of the
page through imperfect tertiary zygonic imitation (see again Figure 5.24)—​are
absent, and the visual representation of the ‘gist’ of the music is missing. All
that remains are arbitrary signs that function symbolically in Peircean terms;
this is one reason music in braille is more difficult to learn and use than its print
equivalent (Ockelford 1991a, 1996b).
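
The count of sixty-four can be checked by enumerating every on/off pattern of the six dot positions. The sketch below does this and, as an illustrative extra, renders each pattern using the Unicode braille block, in which dots 1 to 6 correspond to the six lowest bits above U+2800. The function names are hypothetical, and no musical meanings are assigned to the cells.

```python
from itertools import product


def braille_cells():
    """All on/off patterns for dots 1-6, including the blank cell."""
    return list(product((0, 1), repeat=6))


def cell_to_char(cell):
    """Render a six-dot pattern as a Unicode braille character."""
    # Dot n (1-6) corresponds to bit n-1 above the base code point U+2800.
    offset = sum(bit << i for i, bit in enumerate(cell))
    return chr(0x2800 + offset)


cells = braille_cells()
print(len(cells))
print("".join(cell_to_char(c) for c in cells))
```

With so few distinct cells available, any richer vocabulary has to come from context and from sequences of cells, as the paragraph above explains.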
FIGURE 5.25   Music Time in braille music notation (represented in print form), with explanations of the signs


Matrices of potential dots, ostensibly similar to those used in braille, also char-
acterize guitar chord symbols. The way that these function as signs is very dif-
ferent, however. The four chords used in the excerpt shown in Figure 5.24 may
be visually communicated to guitarists as indicated in Figure 5.26. There are
three semiotic processes at work here. First, there is an iconic link between each
graphic and the positioning of the guitarist’s fingers (Figure 5.15). Second,
each hand position on the strings functions as an index for the chord that is
produced. Third, the distance between the ‘nut’ (represented by the thick black
horizontal line at the top of each graphic) and each of the points where the
fingers press on the strings (shown by the black ellipses) is analogous with the
imaginary interval between the pitch to which the string is tuned and the note
that actually sounds (see Figure 5.27).
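
The third process, in which distance from the nut stands for the interval above the open string, can be made concrete with a short sketch. It assumes standard guitar tuning and equal-tempered semitones per fret, and it uses a common open C major shape as its example; these are illustrative assumptions, since the four chords of Music Time are not specified here, and the function and variable names are hypothetical.

```python
# Open strings, low to high, as MIDI note numbers: E2 A2 D3 G3 B3 E4
# (standard tuning, assumed for illustration).
OPEN_STRINGS_MIDI = [40, 45, 50, 55, 59, 64]
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def sounding_notes(frets):
    """Return the note names produced by a chord shape.

    frets holds one entry per string, low to high: None means the string
    is not played, 0 means open, n means stopped at fret n, which raises
    the open-string pitch by n semitones.
    """
    notes = []
    for open_pitch, fret in zip(OPEN_STRINGS_MIDI, frets):
        if fret is None:
            continue
        notes.append(NOTE_NAMES[(open_pitch + fret) % 12])
    return notes


# Open C major shape (illustrative example, not taken from the excerpt).
c_major = [None, 3, 2, 0, 1, 0]
print(sounding_notes(c_major))
```

The chord graphic itself never names a pitch: the notes emerge only once the finger positions are combined with the tuning of the strings, which is why the hand position functions as an index for the chord.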



My final example is of a score produced by a synaesthete, Jamie Roberts, in
response to Jarre’s Oxygène, track 4. The representation of the musical segment
32‒40 seconds from the beginning is shown in Figure 5.28. Originally, each of
the two motives shown largely filled an A4 sheet of paper (297 mm x 210 mm).
At the time of creating his depiction of the piece, Jamie was seventeen and
had had chronic fatigue syndrome for more than four years, which had greatly
hindered his education. He had never had formal tuition on an instrument,
though he had enjoyed his weekly class music sessions at the selective second-
ary school he had attended, and he had started to teach himself the keyboard.
Jamie reports that when he listens to music he visualizes it as a graph, seeing
lines in two-​dimensional space that he conceives as converting the energy of
sound into the energy of sight. He finds it hard to imagine that other people do
not see the images that he does. As Figure 5.28 shows, Jamie’s representation
picks out the two most salient strands in the texture, the motoric ostinato and
the main melodic motive. His visualization captures the rhythmic pulses of the
music with jagged lines, whose contour appears not to be related to pitch. The
contrasts in timbre are shown in colours (see Figure 5.28). Because of
the repetition in the music, which is matched visually in Jamie’s rendition,
elements could be interpreted symbolically by listeners following the score; see
Figure 5.29. Hence, although we can presume that the source of the images was
different in neurological terms from that of the circles in the score produced by
the child in Bamberger’s study (see again Figure 5.20), their semiotic status is
the same.

FIGURE 5.26   The fingering for the opening four chords of Music Time presented using guitar chord

FIGURE 5.27   The three semiotic processes at work as a guitarist performs from a chord symbol
(photographic image © Felicity Ockelford, 2014)

FIGURE 5.28   Fragment of Jamie Roberts’ synaesthetically derived score of Jean-Michel Jarre’s
Oxygène, track 4

FIGURE 5.29   Types of semiosis functioning in a fragment of Jamie Roberts’ synaesthetic score of


This chapter explored the function of shape in music notation and set out a
model, using the principles of zygonic theory, that aimed to show how forms of
cross-​domain mapping between musical sounds and visual images may logically
occur in cognition. Four types of relationship between the perceptual domains
pertaining to hearing and vision were identified: regular, irregular (which may
be indirect or arbitrary) and synaesthetic. The connections between this think-
ing and Peirce’s threefold typology of signs—​icon, index and symbol—​were
investigated in the context of children’s picture scores, tactile representations
of pitch, staff notation, braille music, guitar chord symbols and a synaesthete’s
representation of track 4 of Oxygène.
In summary, the findings were as follows. First, the evidence from chil-
dren’s untutored representations of music suggests that sophisticated cross-​
modal mapping between sound and shape, using tertiary interperspective
relationships (which incur three steps of abstraction from the perceptual
surface), occurs early and intuitively. This view is reinforced by blind chil-
dren’s representations of changing pitch, which are equally sophisticated and
exist in the absence of any visual model to guide them, although it seems
likely that, their absence of vision notwithstanding, such children may have
been influenced by the common musical metalanguage they share with their
sighted peers, with its implied conceptualization of qualities of pitch such
as ‘high’ and ‘low’. Despite these general similarities, though, psychological
studies have shown that the precise nature of cross-​modal mapping may vary
from individual to individual and according to cultural convention. However,
the individual perceptual differences that exist do not appear to interfere with
the capacity of musicians to adopt one of the forms of standard notation
that have evolved which use (albeit approximately) the horizontal dimension
to represent time and vertical position on the page as an analogue of pitch.
While most of the symbols used in music notation are arbitrary products of
custom and practice, for a relatively small number of people certain cross-modal
mappings may be hard-wired through synaesthesia. These vary from one person
to another, although synaesthetically derived representations can be learned
and appreciated by others.
Beyond the thinking set out in this chapter, potential next steps include a
comprehensive exploration of different forms of symbolic visual representa-
tion of sound to verify the broad applicability of the model set out here or to
suggest certain modifications to it. More research is required into how cross-​
modal mappings are learned and into auditory-​visual synaesthesia. Finally, it
may be possible to use the model to generate new forms of audiovisual instal-
lation that have an intuitive cross-​modal appeal.


Antovic, M., 2009:  ‘Musical metaphors in Serbian and Romani children:  an empirical
study’, Metaphor and Symbol 24/​3: 184–​202.
Athanasopoulos, G. and N. Moran, 2013: ‘Cross-cultural representations of musical
shape’, Empirical Musicology Review 8/3–4: 185–99.
Bamberger, J., 1995:  The Mind behind the Musical Ear:  How Children Develop Musical
Intelligence (Cambridge, MA: Harvard University Press).
Bamberger, J., 2013:  Discovering the Musical Mind:  A  View of Creativity as Learning
(Oxford: Oxford University Press).
Baron-​Cohen, S. and J. Harrison, eds., 1996:  Synaesthesia:  Classic and Contemporary
Readings (Hoboken, NJ: Wiley-​Blackwell).
Baron-​Cohen, S., L. Burt, F. Smith-​Laittan, J. Harrison and P. Bolton, 1996: ‘Synaesthesia:
prevalence and familiarity’, Perception 25/​9: 1073–​9.
Boulez, P., [1963] 1971: Boulez on Music Today, trans. S. Bradshaw and R. R. Bennett
(London: Faber and Faber).
Brinner, B., 2008: Music in Central Java: Experiencing Music, Expressing Culture (New York:
Oxford University Press).
Chalmers, D., 1996: The Conscious Mind: In Search of a Fundamental Theory (New York:
Oxford University Press).
Cone, E., 1987: ‘On derivation: syntax and rhetoric’, Music Analysis 6/​3: 237–​56.
Cross, I., 1998: ‘Music analysis and music perception’, Music Analysis 17/​1: 3–​20.
Cytowic, R., D. Eagleman and D. Nabokov, 2011: Wednesday Is Indigo Blue: Discovering
the Brain of Synesthesia (Cambridge, MA: MIT Press).
Deutsch, D., 2012: The Psychology of Music, 3rd edn (Waltham, MA: Academic Press).
Dolscheid, S., S. Shayan, A. Majid and D. Casasanto, 2013: ‘The thickness of musical pitch:
psychophysical evidence for linguistic relativity’, Psychological Science 24/​5: 613–​21.
Edman, P., 1992: Tactile Graphics (New York: AFB Press).
Eitan, Z. and R. Timmers, 2010: ‘Beethoven’s last piano sonata and those who follow croc-
odiles: cross-​domain mappings of auditory pitch in a musical context’, Cognition 114/​
3: 405–​22.
Fauconnier, G., [1985] 1994: Mental Spaces: Aspects of Meaning Construction in Natural
Language (Cambridge: Cambridge University Press).
Gjerdingen, R., 1999: ‘An experimental music theory?’, in N. Cook and M. Everist, eds.,
Rethinking Music (Oxford: Oxford University Press), pp. 161–​70.
Harrison, J., 2001: Synaesthesia: The Strangest Thing (Oxford: Oxford University Press).
Heller, M., 1991: ‘Haptic perception in blind people’, in M. Heller and W. Schiff, eds., The
Psychology of Touch (Hillsdale, NJ: Erlbaum), pp. 129–​60.
Kanai, R. and N. Tsuchiya, 2012: ‘Qualia’, Current Biology 22/​10: R392–​6.
Kersten, F., 1997:  ‘The history and development of braille music methodology’, The
Bulletin of Historical Research in Music Education 18/​2: 106–​25.
Kowler, E., 2011: ‘Eye movements: the past 25 years’, Vision Research 51/​13: 1457–​83.
Küssner, M. and D. Leech-​Wilkinson, 2014: ‘Investigating the influence of musical training
on cross-​modal correspondences and sensorimotor skills in a real-​time drawing para-
digm’, Psychology of Music 42/​3: 448–​69.
Lakoff, G., 1987: Women, Fire, and Dangerous Things: What Categories Reveal about the
Mind (Chicago: University of Chicago Press).
Lechelt, E., J. Eliuk and G. Tanne, 1976: ‘Perceptual orientational asymmetries: a comparison
of visual and haptic space’, Perception and Psychophysics 20/6: 463–9.
Lederman, S. and R. Klatzky, 2009: ‘Haptic perception: a tutorial’, Attention, Perception,
& Psychophysics 71/​7: 1439–​59.
Lorimer, P., 1996: ‘A critical evaluation of the historical development of the tactile modes
of reading and an analysis and evaluation of researches carried out in endeavours to
make the Braille code easier to read and to write’ (PhD dissertation, University of
Mudd, S., 1963: ‘Spatial stereotypes of four dimensions of pure tone’, Journal of Experimental
Psychology 66/4: 347–52.
Ockelford, A., 1991a: Music and Visually Impaired Children: Some Notes for the Guidance
of Teachers (London: Royal National Institute for the Blind).
Ockelford, A., 1991b: ‘The role of repetition in perceived musical structures’, in P. Howell,
R. West and I. Cross, eds., Representing Musical Structure (London: Academic Press),
pp. 129–​60.
Ockelford, A., 1996a: Music Matters: Factors in the Music Education of Children and Young
People Who Are Visually Impaired (London: Royal National Institute of the Blind).
Ockelford, A., 1996b: Points of Contact: A Braille Approach to Alphabetic Music Notation
(London: Braille Authority of the United Kingdom).
Ockelford, A., 1999:  The Cognition of Order in Music:  A  Metacognitive Study (London:
Roehampton Institute).
Ockelford, A., 2002: ‘The magical number two, plus or minus one: some limits on our
capacity for processing musical information’, Musicae Scientiae 6/​2: 185–​219.
Ockelford, A., 2005:  Repetition in Music:  Theoretical and Metatheoretical Perspectives
(Aldershot: Ashgate).
Ockelford, A., 2006: ‘Implication and expectation in music: a zygonic model’, Psychology
of Music 34/​1: 81–​142.
Ockelford, A., 2009:  ‘Zygonic theory:  introduction, scope, prospects’, Zeitschrift der
Gesellschaft für Musiktheorie 6/​1: 91–​172.
Ockelford, A., 2012a:  Applied Musicology:  Using Zygonic Theory to Inform Music
Psychology, Education and Therapy Research (New York: Oxford University Press).
Ockelford, A., 2012b: ‘What makes music “music”? Theoretical explanations using zygonic
theory’, in J.-​L. Leroy, ed., Actualités des universaux musicaux (Topicality of Musical
Universals) (Paris: Editions des Archives Contemporaines), pp. 123–​48.
Ockelford, A., 2013: Music, Language and Autism: Exceptional Strategies for Exceptional
Minds (London: Jessica Kingsley).
Ockelford, A. and C. Matawa, 2009:  Focus on Music 2:  Exploring the Musical Interests
and Abilities of Blind and Partially-​Sighted Children with Retinopathy of Prematurity
(London: Institute of Education).
Peirce, C., [1867–71] 1984: Writings of Charles S. Peirce: A Chronological Edition, vol. 2
(Bloomington: Indiana University Press).
Peirce, C., [1893–1913] 1998: The Essential Peirce: Selected Philosophical Writings, vol. 2
(Bloomington: Indiana University Press).
Pomerantz, J. and M. Portillo, 2011: ‘Grouping and emergent features in vision: toward a
theory of basic Gestalts’, Journal of Experimental Psychology: Human Perception and
Performance 37/​5: 1331–​49.
Pratt, C., 1930:  ‘The spatial character of high and low tones’, Journal of Experimental
Psychology 13/​3: 278–​85.
Rastogi, R. and D. Pawluk, 2013: ‘Dynamic tactile diagram simplification on refreshable
displays’, Assistive Technology 25/​1: 31–​8.
Révész, G., 1950: Psychology and Art of the Blind, trans. H. Wolff (London: Longmans,
Risset, J.-​C. and D. Wessel, 1999:  ‘Exploration of timbre by analysis and synthesis’, in
D. Deutsch, ed., The Psychology of Music, 2nd edn (New York: Academic Press),
pp. 113–​69.
Roffler, S. and R. Butler, 1968:  ‘Localization of tonal stimuli in the vertical plane’, The
Journal of the Acoustical Society of America 43/​6: 1260–​6.
Roskies, A., 1999: ‘The binding problem’, Neuron 24/​1: 7–​9.
Rusconi, E., B. Kwan, B. Giordano, C. Umilta and B. Butterworth, 2006: ‘Spatial represen-
tation of pitch height: the SMARC effect’, Cognition 99/​2: 113–​29.
Sagiv, N. and J. Ward, 2006: ‘Cross-​modal interactions: lessons from synesthesia’, in S.
Martinez-​Conde, S. Macknik, L. Martinez, J.-​M. Alonso and P. Tse, eds., ‘Visual per-
ception—​fundamentals of awareness: multi-​sensory integration and high-​order percep-
tion’, Progress in Brain Research 155: 263–​75.
Seeger, A., 2004: Why Suyá Sing: A Musical Anthropology of an Amazonian People
(Champaign, IL: University of Illinois Press).
Shayan, S., O. Ozturk and M. Sicoli, 2011: ‘The thickness of pitch: crossmodal metaphors
in Farsi, Turkish, and Zapotec’, The Senses and Society 6/​1: 96–​105.
Slawson, W., 1985: Sound Color (Berkeley: University of California Press).
Stevens, S. S., 1975:  Psychophysics:  Introduction to Its Perceptual, Neural, and Social
Prospects (New Brunswick, NJ: Transaction).
Stockhausen, K., 1977: Sternklang: Park-Music für 5 Gruppen (Kürten: Stockhausen-​Verlag).
Stone, K., 1980:  Music Notation in the Twentieth Century:  A  Practical Guidebook
(New York: Norton).
Thorpe, M., 2015: ‘The cognition of pitch patterns and cross-​modal spatial structure’ (PhD
dissertation, University of Roehampton).
Walker, R., 1981:  ‘The presence of internalised images of musical sounds and their rel-
evance to music education’, Bulletin of the Council for Research in Music Education
66/​67: 107–​12.
Walker, R., 1985: ‘Mental imagery and musical concepts: some evidence from the congeni-
tally blind’, Bulletin of the Council for Research in Music Education 85: 229–​38.
Walker, R., 1987: ‘The effects of culture, environment, age, and musical training on choices
of visual metaphors for sound’, Perception and Psychophysics 42/​5: 491–​502.
Welch, G., 1991: ‘Visual metaphors for sound: a study of mental imagery, language and
pitch perception in the congenitally blind’, Canadian Journal of Research in Music
Education 33 (Special ISME Research Edition): 215–​22.
Whorf, B., [1956] 2012: Language, Thought, and Reality, 2nd edn, ed. J. Carroll, S. Levinson
and P. Lee (Cambridge, MA: MIT Press).
Zbikowski, L., 2002:  Conceptualizing Music:  Cognitive Structure, Theory, and Analysis
(New York: Oxford University Press).
Alice Eldridge, cellist and coder

Inside and outside shape

I see music as a very human means of creating, exploring and communicating
abstract ideas and emotions. I believe this is made possible through the capacity
of organized sound to recruit and coordinate dynamic patterns of interaction
across a network of diverse objects and processes distributed across the brains,
bodies and worldly objects of musicians and listeners. Reflecting my personal
practice as an improvising cellist and my academic interest in digital music, I
offer a particular account of some of the roles shape plays in framing and sup-
porting these processes in both acoustic and digital music-making. My own
experiences are accompanied by those of other improvisers1 to illustrate the
idea that shape provides a lingua franca to conceptualize and talk about rela-
tions between the otherwise divergent array of objects and activities tied up
in musical creation, performance and listening. In particular, I consider how
the role of shape differs during what I am calling ‘offline’ (learning, practis-
ing, composing) versus ‘online’ musicking (performance and improvisation in

Acoustic practice: integrating shapes in ears, fingers and eyes

Musicians’ daily practice is structured by patterns. On the one hand, we drill
scales into our fingers until they become automatic; on the other, we dream
up awkward bowing, breathing, fingering, timing or interval patterns to retain
focus and prevent ourselves going onto automatic pilot. By seeking alternative
representations which create more easily memorizable shapes, the deliberate
choice of patterns can support memory as well as help focus attention.
When practising and learning music I definitely think of shape. In some
situations it can make the learning easier. For example practising scales
for improvisation—​practising them with some sort of rhythmical pattern.
That also makes it more interesting and in that way keeps the concentra-
tion better. (Julie Kjær, saxophonist)

These patterns we concoct for practice can be seen as mental images that allow
us to integrate representations in motor, visual and sonic processes—​shapes
in our bodies, eyes and ears. Initially, integration of these mappings requires
conscious effort. As a child I learned to read music by first learning Curwen’s
solfège handsigns (see Beach 1914) and songs about colourful insect characters:
‘C is for Clarence Caterpillar, D is for Dora Dragon Fly’, etc., forging links
between note names and their position in physical, pitch and visual space. As
you learn an instrument, another set of mappings is established, from the notes
on the staff to the fingerings necessary to produce the designated pitches. These
visual–​motor mappings rapidly take precedence, a phenomenon neatly illus-
trated by scordatura notation. Bach’s Cello Suite No. 5 is written for scordatura
tuning as shown in Figure R.12 (top right). When I see the top interval of the
chord in bar 2 of the Prelude, I ‘know’ and hear a minor third but quite happily
read and finger a perfect fourth, suggesting that I am not reading ‘B♭’ at all, but
‘first finger in half position on the A string’.
The integral role of muscle memory in musicianship is nothing new: research
into musical implications of motor theories of perception abounds (e.g.
Godøy 2010). Anecdotes from expert musicians deftly illustrate the point. Improviser
Steve Beresford, speaking before a recent concert of the London Improvisers’
Orchestra, remarked that despite having moved from trumpet to piano decades
ago, he still automatically thinks about melody lines in terms of trumpet fin-
gerings. The sight of musicians (saxophonists and keyboard players espe-
cially) air-​fingering in gigs to work out the thrust of a solo they’re hearing is
not uncommon. These are not just ‘air instrument’ performances, mimicking
sound-​producing gestures (Godøy, Haga and Jensenius 2006): these habits of
expert musicians suggest that actual or imagined activation of motor schemata
is an integral part of conscious musical comprehension. In expert
instrumentalists, it seems, sensory-motor contingencies are so developed that fingering
patterns and sound-producing gestures are not only automatic, but an integral
part of conscious musical cognition.

FIGURE R.12   Opening of the Prelude of Bach’s Cello Suite No. 5 in scordatura notation

Representations of pattern in music software languages

If acoustic instrumentalists offload music cognition onto shapes in their fingers
and instruments, a growing community of musician–programmers is developing
software languages which provide a similar cognitive scaffold. Live Coders
embrace the unique potential of software as a dynamic instrument that can be
rewritten in real time, improvising with software algorithms on the fly. Writ
large in their manifesto is a commitment to work directly with algorithms as
thoughts and the performer’s mind as instrument.2
Some live coding languages explore 2D visual environments such as David
Griffiths’ Scheme bricks (2008), in which pieces of code, representing musical
methods and parameters, can be plugged into each other much like nested build-
ing blocks. Others combine text and graphic elements, such as Alex McLean’s
Texture (McLean and Wiggins 2011), where spatial locations of the elements
have syntactic relevance, affording the live composition of musical ideas in 2D
space, or Thor Magnusson’s ixi lang (2011a), which combines graphic and tex-
tual elements, giving space and style functional roles in improvisation. More
recently, McLean’s Tidal is presented as a ‘mini-​language embedded in Haskell,
for the live coding of pattern’ (McLean 2014), moving away from explicit spa-
tial metaphor and offering terse and powerful expressions for the creation and
manipulation of generative music.
In these situations, the software language itself acts to externalize and sup-
port musical cognition, much like a musical score (Magnusson 2011b): each
in its own way actively foregrounds the representation and manipulation of
patterns. Just as acoustic instruments are recognizable by their timbral charac-
teristics, so music-​software languages are evolving to afford particular musical
structures: ‘Connoisseurs report that they can identify certain musical environ-
ments, not only by how they sound, but also by which musical patterning or
form they afford’ (Magnusson 2011c: 1).

Inside shape in improvisation

Guitarist John Russell talks about free improvisation as the closest he gets
to ‘what music actually is’.3 I might go further and suggest that improvising
with others distils many of the joys of being human, capturing the best bits of
conversation, cooking, dancing and dressing up, meeting people and testing
and exchanging new ideas: instantaneously making something. At such times,
the deliberate planning, monitoring and manipulation of activities give way to
more intuitive processes.
In one improviser’s comments, the lack of conscious engagement is almost
the hallmark of ‘good’ improvising, active consideration of shape coming into
play only when musical spontaneity ebbs:
With some musicians I  have a really good connection, the music flows
and I  don’t tend to think a lot. I  don’t think much about shape (and
sometimes not at all) but it feels like I am—​and the other musicians are—​
working with it on a more unconscious level. I close my eyes, using my
ears, feeling of the soundwaves in my body, the colours I get when my
eyes are closed and the mood it all puts me in to play and reflect on the
music. With other musicians it can be harder to get this flow and I tend to
start thinking more and also be more conscious about the shape and how
to do it. (Julie Kjær, saxophonist)

A similar distinction is seen through fMRI studies. Scans of professional
pianists showed a significant deactivation of the areas of the brain associated with
self-​conscious monitoring while freely improvising compared to playing learned
music (Limb and Braun 2008). Other improvisers stress a fundamentally bodily
engagement in performance:
My involvement in an improvisation is kind of visceral. But in that sense
it has shape. I’m very much aware of gesture, but felt as a kind of move-
ment of energy, or line. A kind of fluidity that morphs according to what’s
going on around it. That’s probably as well as I can explain it. It’s a bit
like the feeling I get when I watch British sign language (which I under-
stand a little) or dance, or even things moving around me—​that move-
ment translates into a body feeling that feels like music. I even find myself
making internal (or external if I’m not careful!) sounds when watching
things move. (Rachel Musson, saxophonist)

If shaped representations support the conscious planning, monitoring and
manipulation of musical processes in offline musicking, in improvisation
such distinct, deliberate musical strategies are less often in play. Metaphors
of shape become a means to articulate an otherwise quite ineffably integrated
When I listen I’m outside the shape looking at it. When I’m playing I’m
inside it, traveling, with no overall sense of its size or layout. I’ve worked
with African musicians who, when we’ve been working out arrangements,
use the phrase ‘you can come inside’ when it’s your turn to play. (Stephen
Hiscock, percussionist and composer)


Beach, C. B., ed., 1914: ‘Curwen, John’, The New Student’s Reference Work (Chicago:
Godøy, R. I., 2010:  ‘Gestural affordances of musical sound’, in R. I. Godøy and M.
Leman, eds., Musical Gestures: Sound, Movement, and Meaning (London: Routledge),
pp. 103–​25.
Godøy, R. I., E. Haga and A. Jensenius, 2006: ‘Playing “air instruments”: mimicry of
sound-​producing gestures by novices and experts’, in S. Gibet, N. Courty and J.-​F.
Kamp, eds., Gesture in Human-​Computer Interaction and Simulation: 6th International
Gesture Workshop, Lecture Notes in Artificial Intelligence 3881 (Berlin: Springer), pp.
Griffiths, D., 2008:  ‘Scheme bricks’. Software available at https://​​scheme-​bricks/​
(accessed 9 April 2017).
Limb, C. J. and A. R. Braun, 2008:  ‘Neural substrates of spontaneous musical perfor-
mance: an fMRI study of jazz improvisation’, PLoS One 3/​2: e1679.
Magnusson, T., 2011a: ‘ixi lang: a SuperCollider parasite for live coding’, in Proceedings
of the International Computer Music Conference, 31 July–​5 August 2011, University of
Huddersfield (conference document), pp. 503–6. Software available at
https://github.com/thormagnusson/ixilang (accessed 9 April 2017).
Magnusson, T., 2011b: ‘Algorithms as scores: coding live music’, Leonardo Music Journal
21: 19–​23.
Magnusson, T., 2011c: ‘Confessions of a live coder’, in Proceedings of the International
Computer Music Conference, 31 July–​5 August 2011, University of Huddersfield (confer-
ence document), pp. 609–16.
McLean, A., 2014: ‘Making programming languages to dance to: live coding with Tidal’, forth-
coming in Proceedings of the 2nd ACM SIGPLAN International Workshop on Functional
Art, Music, Modelling and Design, Gothenburg, Sweden, 1–3 September 2014. Software
available at https://​​tidalcycles/​Tidal (accessed 9 April 2017).
McLean, A. and G. Wiggins, 2011: ‘Texture: visual notation for the live coding of pat-
tern’, in Proceedings of the International Computer Music Conference, 31 July–​5 August
2011, University of Huddersfield (conference document), pp. 612–​28. Software available
at https://​​yaxu/​texture (accessed 9 April 2017).

The shape of musical improvisation

Milton Mermikides and Eugene Feygelson

Contemporary enquiries into the art of musical improvisation cross a range of
disciplines from cultural studies, pedagogy, psychology, neuroscience, mathematical
and computer modelling, and quantitative and qualitative analyses to
ethnography. Even a succinct survey of the state of current research would fill a volume,
as is meticulously demonstrated in Berkowitz’s The Improvising Mind (2010) and,
in the context of jazz, Berliner’s seminal Thinking in Jazz (1994). This chapter
addresses the concept of shape in musical improvisation, and offers insights into
how it might be described, identified or created. Jazz pedagogy is used to pres-
ent key improvisational mechanisms and as a foundation on which a model of
improvisation is constructed. The concept of shape in improvisation is addressed
within this model. Taking examples from jazz repertoire, we offer an approach
to categorizing improvisational shape. Finally, these concepts are adopted in the
detailed analysis of a classical cadenza, demonstrating the wide applicability (and
stylistic neutrality) of this approach to improvisational analysis.
One must acknowledge the complex continua that exist between composi-
tion, performance and improvisation (Benson 2003), as well as the role of intu-
ition, spontaneity and ‘polished improvisation’ in composition. Nonetheless,
certain musical practices rely on—​indeed, stylistically require—​a significant
degree of spontaneity, or at any rate a purposeful avoidance of premeditation
or prescription. Performance-​without-​complete-​blueprint—​as might be found
in a jazz solo, a Hindustani raga, a classical cadenza or ‘free’ improvisation—​
may itself involve significant preparation and hard-​earned skill, but there
is a relatively higher number of musical choices made ‘in the moment’ than
are found in the typical performance of a score or memorized piece. In these
contexts, the audience and (until the last moment) the performer(s) cannot
be fully aware of what music will emerge. With this definition it might be
argued that there are no completely improvised musics and, if performed by
humans, no music that avoids spontaneity, or at least the possibility of small
variation, in performance. A fully notated score is never just that, and, even if
notes and relative pitch durations are prescribed meticulously, in the moment
of performance spontaneous expressive (or unavoidable) alterations of tempo,
dynamic, timbre and inflection will emerge. There is a spectrum of allowable
spontaneity in performances of all kinds, from subtle nuance to significant
invention (although never without some type of constraint). This chapter deals
with musical shape that might emerge at the inventive edge of the spectrum,
taking examples from the conventional jazz improvisation repertoire (solo-
ing or melodic interpretation within fixed structural and harmonic outlines)
and the improvised classical cadenza (a stylistically constrained exploration of
thematic materials). Furthermore the structural components that are addressed
are distinct from (or additional to) the superficial structural contexts in which
the improvisation might take place (such as the jazz standard lead sheet with
its given form and harmonic framework, or the rhythmic cycle, tala, of a
Hindustani raga). We are in essence considering the shape that might emerge
from spontaneously selected musical elements.
The formal shaping of an improvisation is addressed widely in improvi-
sational pedagogical literature, for example in Hal Crook’s How to Improvise
(1991) or Berkowitz’s survey of nineteenth-​century improvisational treatises
(2010: 15–​80). Conversely, research into (usually jazz) improvisation has aimed
to reveal musical strategies and structures—​however implicit or intuitive—​that
occur in improvisational repertoire. These include—​citing just three of many
possible examples—​Markov chains in Coltrane’s improvisations on Giant Steps
(Franz 1998), phrasing schemata in Charlie Parker’s blues soloing (Love 2012)
and patterns in the transformation of those musical features less amenable to
standard notational analysis (see for example Benadon’s microtiming analysis
of early jazz improvisation in Time Warps in Early Jazz, 2009).
Despite a foundation of jazz practice, this chapter takes an abstracted model
of improvisation (proposed by Mermikides 2010, which shares important fea-
tures of Pressing’s 1988 model) through which improvisational shapes might
be identified. In the model presented here, a musical object is seen as exist-
ing at a point in multidimensional musical space (M-​Space) and possessing an
array of properties available for modification. Improvisation is represented as
the artful motion through this space, and characteristics of this motion may
form larger-​scale musical structures. This view of improvisation offers practical
applications for performance (and composition) in a range of styles, as well as a
framework within which to analyse and appreciate the repertoire and practice.
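
The idea of a musical object as a point in M-Space can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the four dimensions, their names and their units are invented for the example and are not taken from Mermikides (2010) or Pressing (1988). It shows an object holding several properties, a pathway made by modifying some of them, and a report of which dimensions changed at each step.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class MusicalObject:
    """One point in a toy M-Space; every field is an assumed dimension."""
    pitch_centre: int       # MIDI note number of the first pitch (assumption)
    duration_beats: float   # total length of the object in beats
    dynamic: int            # loudness on a 0-127 scale
    onset_beat: float       # metric placement within the bar


def changed_dimensions(a, b):
    """Return the names of the dimensions along which b differs from a."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# A hypothetical seed object and a short pathway through the space.
seed = MusicalObject(pitch_centre=60, duration_beats=2.0, dynamic=64, onset_beat=1.0)
path = [
    seed,
    replace(seed, pitch_centre=63),                              # move in pitch only
    replace(seed, pitch_centre=63, dynamic=80, onset_beat=2.5),  # then dynamic and placement
]

# Describe the motion: which properties were modified between neighbours?
steps = [changed_dimensions(a, b) for a, b in zip(path, path[1:])]
print(steps)
```

Keeping the object immutable means each step of the pathway remains available for later comparison, which is what a retrospective analysis of this kind would need.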

Chains of thought

One conceptualization of, and approach to, improvisation is as a musical
object (such as a phrase, a melody or an idea) undergoing appreciable
transformations over time (see for example Damian and Feist 2001: 12–20;
Crook 1995: 8–​31; and Berliner 1994: 146–​69). More specifically, this object
might be perceived as containing a set of musical properties, with each property
open to modification in future phrases. New material is thus created by alter-
ing a selected parameter, or set of parameters, from previous objects. In other
words, improvisation involves the construction of a musical train of thought
where every subsequent object relates to a preceding one in terms of a changing
set of variable parameters. This concept is best fleshed out with a real-​world
example. In a section of John Coltrane’s solo in Acknowledgement (Coltrane
1965: 2:06–​2:32), to pick one of countless examples from the repertoire, a
simple motive (C, E♭ and F typically in a quaver, quaver, crotchet—​or more
generally short, short, long—​rhythm) is manipulated in terms of chromatic
transposition, metric placement and rhythmic subdivision (more specifically,
the duration between note onsets, also known as the inter-onset interval or IOI).
Even a first listen to the extract will allow a clear identification of the central
role of this musical ‘seed’ in the ensuing improvisation, with these three pri-
mary degrees of transformation. Occasionally, objects merge—​for example, the
last note of one phrase becomes the first of the next—​but on the whole this
passage provides an idealized, clear (and readily notated) example of this con-
cept of improvisation: the expressive variation of an established musical object.
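
The three kinds of transformation named above can be sketched in a few lines of Python. This is a minimal illustration, not a transcription of the Coltrane solo: the octave of the motive, the representation of a note as an (onset, MIDI pitch) pair and the function names are all assumptions made for the example.

```python
# The motive C, E-flat, F in a quaver, quaver, crotchet rhythm.
# Each note is (onset in crotchet beats, MIDI pitch); the octave is an assumption.
MOTIVE = [(0.0, 60), (0.5, 63), (1.0, 65)]


def transpose(phrase, semitones):
    """Chromatic transposition: shift every pitch by the same interval."""
    return [(onset, pitch + semitones) for onset, pitch in phrase]


def displace(phrase, beats):
    """Metric placement: shift every onset by the same number of beats."""
    return [(onset + beats, pitch) for onset, pitch in phrase]


def scale_iois(phrase, factor):
    """Rhythmic subdivision: stretch or compress the inter-onset intervals."""
    start = phrase[0][0]
    return [(start + (onset - start) * factor, pitch) for onset, pitch in phrase]


def iois(phrase):
    """Inter-onset intervals between successive notes."""
    onsets = [onset for onset, _ in phrase]
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


# One possible variant: up a tone, a beat and a half later, at half speed.
variant = scale_iois(displace(transpose(MOTIVE, 2), 1.5), 2.0)
print(variant)        # onsets and pitches of the transformed motive
print(iois(variant))  # the short-short pattern is preserved, but doubled
```

Because each function changes one property and leaves the others untouched, the intervallic shape of the motive survives every transformation, which is what keeps the ‘seed’ recognizable.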
Many improvisational passages contain patterns that are less immediately
recognizable than the linear chains-​of-​thought model that Coltrane provides in
the last example. Figure 6.1 shows how an opening phrase might be developed
through a number of complex transformational pathways.
In this analysis, phrases are generally identified as being related to a previ-
ously occurring phrase, and may themselves combine into larger phrases or
break off into smaller ones. They are labelled accordingly. The types of rela-
tionships between phrases are described by sets of transformational processes
in boxed text.
This improvisational methodology thereby involves the selecting of a par-
ticular subset of musical properties of a phrase, which is then either fixed or
modified by varying amounts in the subsequent phrase, to form a series of
interlinking chains. Even in this short extract, the themes of variation,
transposition and recombination (as identified by Berkowitz in the context of
classical improvisational pedagogy; 2010: 39–80), the creation of logical expectations
and their potential subversion (‘rational deception’; Bach [1753, 1762] 1949:
434) and analogies with syntax (Patel 2003) are revealed.

FIGURE 6.1   An illustration of a complex chains-of-thought improvisation methodology (Mermikides 2010)

Some further points for consideration:
• There may be many valid analyses of an improvisation, and the
performer’s conception and the listener’s interpretation of the solo
may differ.
• A single phrase may also form the impetus for any number of
subsequent phrases, along any number of transformational processes.
• Phrases may be hierarchical (e.g. B.1 contains phrases A.1.2, A.1.3
and A.1.4) and one of several reasonable analyses is presented here.
Small- or larger-scale phrase structures are all open to modification.
• Sufficient transformation may result in the formation of a new phrase
unit for further modification.
• This analysis purposefully omits the complex interactions between
performers in an ensemble, whereby musical material created by
one player may influence the improvisations of any number of
other players. The interaction between players follows a similar
methodology whereby musical material is shared within a common
ideas pool and modified between members of the ensemble (see
Sawyer 1992; Monson 1991, 1996; and Berliner 1994: 647–​51
for theoretical and transcription analyses of ensemble motivic
• A particular solo, performer’s identity or musical style may be
described not just by its melodic, harmonic and rhythmic vocabulary,
but also by the types of transformational processes and extent of
variations employed.
• Phrases G.1 and G.2 in Figure 6.1 provide a glimpse of how technology
may be employed as part of the improvisational process, and how it
may offer otherwise unavailable transformational dimensions.
• The precise demarcation of musical material into units for
transformation, which are called ‘phrases’ here, is a subjective exercise.
Furthermore, the definition of a phrase unit may change in relation
to the transformational process employed. For example, a set of
five notes might be considered a complete phrase for a sequencing
process, but a timbral modulation may also be applied through those
five notes, implying a smaller conceptual subdivision. It is tempting
(and sometimes useful) to describe musical fragments as cells, which
are combined into hierarchical phrases. But it soon becomes clear in
analysis that a cell’s autonomy is temporary, and that there are no
uniformly indivisible musical units; even a single note can be subject
to all manner of transformation and recombination, and parameters
such as timbre do not allow for such a convenient atomistic
perspective. As an illustration of the difficulty (perhaps futility) of
declaring a lower limit to the object, Curtis Roads’s Microsound (2004)
offers an exploration of ‘sound particles’—​lasting less than 100 ms—​
and their role in the creation of line, pulse and texture.
• The example in Figure 6.1 mainly presents a traditional improvisation
where phrases occur in a strict series. Contrapuntal mechanisms,
ensemble interactions and the smearing of a phrase (through
electronics) allow phrases to coexist and their relationships to be
parallel, rather than strictly linear.
• There is a differing amount of variation from one phrase to the next,
so a solo may be characterized by the extent of relatedness between
phrases. This concept of musical proximity underpins the idea of
improvisational shape discussed later in the chapter.

The chains-of-thought model of improvisation is employed here as a form
of retrospective analysis, but it may also be used in the context of real-time
improvisational choices. For example, Figure 6.2 shows how a phrase offers the
performer a set of options for the continuation of an improvisation based on
various transformational processes.
Since there is a nontrivial number of precise transformational processes (let
alone combinations thereof), Figure 6.2 shows but the briefest glimpse into the
refracting pathway of an improvisation, which hints at the wealth of musical pos-
sibilities available. Phrase-​based improvisation may be seen as the artful carving
of a pathway through this mesh of musical possibilities. Indeed, a particular solo
or style of playing may be described in terms of which transformational pro-
cesses are selected. Acknowledgement favours chromatic transposition, rhythmic
displacement, and augmentation or diminution. Other improvisational passages
may use the manipulation of other musical parameters (such as diatonic transpo-
sition, inversion, retrograde, dynamics, timbre, phrase length, etc.) as expressive
mechanisms. Whatever the parameters employed, the premise of the chains-​of-​
thought improvisational model is that of a series of musical objects which gener-
ally hold some appreciable relationship to a previously established object or set
of objects. In the examples presented, these prior objects have emerged in the
course of the unfolding improvisation, but they might also include objects from
other ensemble members, melodic material (say, from the ‘head’ of a jazz stan-
dard), or elements from a stylistically or personally established, wider vocabulary
or set of schemata (as in Pressing’s concepts of ‘referents’; 1984: 346).
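
The branching of options described above can be sketched as follows. The Python is illustrative only: the phrase, the four processes chosen and their names are assumptions, and a real improviser has vastly more options at each step. The sketch shows a phrase ‘refracting’ into one candidate per process, and a pathway formed by choosing one candidate at each step.

```python
# A phrase as (duration in beats, MIDI pitch) pairs; the content is hypothetical.
PHRASE = [(0.5, 60), (0.5, 63), (1.0, 65)]


def transpose_up_tone(phrase):
    """Chromatic transposition up two semitones."""
    return [(dur, pitch + 2) for dur, pitch in phrase]


def invert(phrase):
    """Melodic inversion around the first pitch."""
    axis = phrase[0][1]
    return [(dur, 2 * axis - pitch) for dur, pitch in phrase]


def retrograde(phrase):
    """Reverse the order of the notes."""
    return list(reversed(phrase))


def augment(phrase):
    """Rhythmic augmentation: double every duration."""
    return [(dur * 2, pitch) for dur, pitch in phrase]


PROCESSES = {
    "transpose_up_tone": transpose_up_tone,
    "invert": invert,
    "retrograde": retrograde,
    "augment": augment,
}


def refract(phrase):
    """Return every continuation reachable by applying one process."""
    return {name: process(phrase) for name, process in PROCESSES.items()}


# Carve one pathway through the options by naming a choice at each step.
pathway = [PHRASE]
for choice in ["retrograde", "transpose_up_tone"]:
    pathway.append(refract(pathway[-1])[choice])

options = refract(PHRASE)
two_step_pathways = len(PROCESSES) ** 2  # options multiply at every step
print(pathway[-1], two_step_pathways)
```

Even with four processes there are sixteen two-step pathways, which gives some sense of why the full mesh of possibilities can only be glimpsed in a figure.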
This chains-​of-​thought perspective runs the danger of narrowing the con-
cept of improvisation into exclusively linear and causal relationships between
objects, with little acknowledgement of the important role of spontaneous,
novel inspiration. However, as this model is developed through this chapter,
it is shown that these ‘pristine’ moments may in fact be accommodated into a
theoretical model with the introduction of the concepts of multidimensional
musical space, ensemble interaction, proximity, surprise and shape.

FIGURE 6.2   An illustration of musical refractions. In the course of an improvisation, a phrase is manipulated by the selection of one of
many transformational processes (1–8 present a few of countless possibilities). The resulting phrase is in turn open to further modifications.
Improvisation is seen as the realization of a pathway through the multitude of refracting musical possibilities (Mermikides 2010).

Limitation and variation of musical topics

Here the discussion turns to jazz improvisational pedagogy and how it might
relate to and inform the concept of shape in improvisation generally. The jazz
pedagogical material that started to emerge around the late 1980s from such
educator/practitioners as Hal Crook, Jerry Bergonzi and Mick Goodrick
encouraged a more direct engagement with improvisation beyond the typical
‘learn your scales, transcribe, and good luck’ approach. For example,
Goodrick’s and Crook’s material (Goodrick 1987; Crook 1991) set particular
challenges for the student, which might be described as ‘guided improvisation’
or ‘improvisation within limits’. Typical exercises include:
• Improvise through the tune using only chord-​tones of the harmonic
• Improvise a short phrase. Rest. Improvise a short phrase. Rest.
Improvise a long phrase. Rest and repeat.
• Improvise a phrase that starts with the concluding material of the
previous phrase. Rest. Repeat.
• Improvise a solo with a prescribed dynamic, registral or ‘excitement’
• Improvise a series of phrases with a particular intervallic structure,
adjusting accidentals to negotiate the harmony.
• Improvise using only a prescribed range or area of the guitar
fretboard, or the saxophone register, etc.

The point of such ‘limiting’ exercises is to force new ideas and avenues of
exploration, liberating the improviser from the overwhelming number of
possibilities of the ‘blank canvas’, focusing on specific musical parameters
that might otherwise be overlooked, and avoiding the habitualized patterns
that arise in the absence of guidelines. In Crook’s words, ‘There is no freedom
without structure’ (1991: 55). One might think of this type of approach as the
training of a particular type of skill: the independent and artful modification,
or maintenance, of coexisting musical parameters. This type of pedagogical
literature aims to develop proficiency (conscious and intuitive) in this area to
create authentically chosen material rather than pat phrases at the moment of
improvisation or clichéd responses to a sequence of chords. Improvisational
skill and strategy are thereby developed through the fixing—​or variation—​of
specific musical parameters.
Alongside jazz pedagogical material, further support for this ‘limit and
vary’ approach, this time from an academic theoretical standpoint, comes
from Pressing’s (1988) ‘Improvisation: methods and models’, which offers a
model of improvisation as variegated attention paid to selected parameters and
transformational processes. Values of various musical parameters and types
of transformation of a phrase are defined. The amount of attention paid to
each is described with the currency of cognitive strength. This weighting is unlikely
to be more than conjecture, and it is difficult to imagine how it could be measured.
Personal and anecdotal accounts and neuroscientific reports on improvising,
tentative as they may be, seem to suggest that at any particular moment the
creative improviser is thinking actively about one or two musical goals at most
(Werner 1996; Nachmanovitch 1990; Solstad 1991; Limb and Braun 2008).
Pressing’s model taken alone is not immediately stylistically relevant to jazz
improvisation research, nor authoritative for it. Nonetheless, by defining this
multilevel environment within which the improviser may navigate, Pressing pro-
vides a very powerful conceptual vocabulary. Despite the difficulty with the
proposed distribution of cognitive strengths, the independent defining of atten-
tion to, and values of, particular musical parameters and processes is illuminat-
ing. The staggering developments in music technology now allow the adoption
within practice of similar models for composition and real-​time performance,
as well as computer-​based generative improvisation systems (see for example
Bäckman and Dahlstedt 2008).
As a practical demonstration of these theoretical concepts we offer an exam-
ple of the process. Here, a simple three-​note phrase (to the left of Figure 6.3)
is shown to afford various simple interpretations, which may then be used as
‘topics’ from which an improvisation might develop. Despite its simplicity, the
phrase in Figure 6.3 could be described in innumerable ways and levels of detail
including: (1) a phrase starting on beat ‘4 and’, (2) a three-​note rhythmic pat-
tern with no particular rhythmic placement, (3) a melodic gesture, (4) a broken
chord implying part of a Cmin9 or E♭maj7 chord, (5) a phrase with a particular
harmonic altitude relative to the harmonic context or (6) a particular pattern
of timbral characteristics and envelope represented as amplitude over time.

FIGURE 6.3   Coexisting interpretations of Phrase α (Mermikides 2010)

In this way, the phrase can be conceived as possessing many sets and subsets
of properties, variably simple or complex, or, to adopt Pressing’s language, an
object existing in a particular point in multidimensional conceptual space. It
becomes clear how an improvisation might develop with this concept in mind.
Any number of the subsets of musical characteristics may be used as a reference
point for ensuing phrases. For example, the starting beat may be fixed for a new
phrase, the rhythmic pattern preserved with new notes or the melodic contour
maintained but transposed, and so on. Not only can the concept of isoryth-
mos (fixed rhythmic structure) be explored but also isomelos (fixed sequence
of melodic pitches; Persichetti 1961). In fact, one might coin a number of
words prefixed iso-​and dis-​to refer to the pertinent fixing or varying of vari-
ous musical parameters respectively, such as isotimbre (fixed timbre), isopaesi
(fixed intensity), isomodos (fixed scale implication), isokinetos (fixed gesture)
and isologos (a fixed concept or pattern applicable across multiple parameters).
From a jazz practitioner’s perspective, by limiting certain parameters one is
more able to explore ‘otherwhere’ (Pate, cited in Berliner 1994: 385). A language
naturally evolves from here to describe a musical relationship defined by
the significant variation of a particular parameter (displacement, distimbre,
etc.). Figure 6.4 offers some possible improvisations emerging from phrase α,
based on this ‘limit and vary’ approach.
Each of these continuations can be seen as the limiting and varying of some
inherent sets of properties in phrase α. For example, phrase 1 takes the rhyth-
mic structure and general melodic shape of α as a constant, and uses diatonic
transposition, rhythmic displacement and microtiming as transformational
processes, whereas phrase 2 uses the strict intervallic structure of α and employs
chromatic transposition and rhythmic displacement to create an angular, dis-
sonant line. Phrases 3–​6 employ the fixing and variation of sets of properties
including melodic structure, articulation, metric placement, dynamics and
scale implication. Phrase 7 might be considered an interruption where many
of the parameters are varied or abandoned, effectively restarting the continuation
process. Figure 6.5 illustrates how the limitation and variation of musical
parameters may forge new material in an improvisation.
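
The ‘limit and vary’ logic of these continuations can be sketched in Python. What follows is a hedged illustration rather than a reconstruction of the figures: the property names, the pitches and the numerical values given to phrase α are assumptions, and only the general shape of the second continuation (fixed intervallic structure, chromatic transposition, rhythmic displacement) follows the description above.

```python
# A hypothetical description of phrase alpha as a set of named properties.
ALPHA = {
    "rhythm": (0.5, 0.5, 1.0),  # durations in beats (assumed)
    "intervals": (3, 2),        # semitone steps between notes (assumed)
    "start_pitch": 60,          # MIDI note number (assumed)
    "start_beat": 4.5,          # beat '4 and'
}


def limit_and_vary(phrase, vary):
    """Build a continuation: properties named in `vary` are transformed,
    and every other property is held fixed."""
    unknown = set(vary) - set(phrase)
    if unknown:
        raise KeyError(f"unknown properties: {sorted(unknown)}")
    return {
        name: vary[name](value) if name in vary else value
        for name, value in phrase.items()
    }


def relation(a, b):
    """Label each property of b as fixed or varied relative to a."""
    return {name: "fixed" if a[name] == b[name] else "varied" for name in a}


# In the manner of phrase 2: keep the strict intervallic structure and rhythm,
# vary pitch level (chromatic transposition) and metric placement.
phrase_2 = limit_and_vary(ALPHA, {
    "start_pitch": lambda pitch: pitch + 1,
    "start_beat": lambda beat: beat - 1.0,
})

print(relation(ALPHA, phrase_2))
```

The `relation` report is a plain-language stand-in for the iso- and dis- vocabulary: a ‘fixed’ rhythm corresponds to an isorhythmic relationship, a ‘varied’ placement to a displacement.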
The importance of considering a host of available topics for modifica-
tion is reflected in the ever-​growing jazz pedagogical material of the last two
decades. Some pedagogical texts give an overview of many transformational
topics within one book (see Crook 1991, 1995; and Damian and Feist 2001 for
examples of improvisational meta-​views), while other writers choose to cre-
ate a series of volumes addressing each topic separately, of which Bergonzi’s
Inside Improvisation series is a clear example. To date, Bergonzi has written
seven volumes, each focusing on a topic (Bergonzi 1992, 1994, 1996, 1998,
2000, 2002 and 2004).

FIGURE 6.4   Improvised continuations of Phrase α. Instances of Phrase α, and its close relations, are labelled with solid and dashed outlines respectively
(Mermikides 2010).
FIGURE 6.5   An illustration of how the fixing and variation of musical topics may forge
improvisational continuations from Phrase α (Mermikides 2010)

In fact, these can be seen as studies in the fixing of these
featured topics (e.g. a particular melodic cell) and thereby exploring deeply
other variables (permutation, harmonic altitude, segmentation, etc.). A con-
temporary bibliography of jazz pedagogical material has started to resemble
a library of chess books with stacks of general-​principle texts alongside titles
dedicated to every conceivable opening, variation of opening and style of end
game. The study of these differentiated skills, in both chess and jazz improvi-
sation, aims to offer the player informed options and intuition at the moment
of performance.
This section has introduced, and given some examples of, the concept of
improvisation as a mutation of preceding phrases, through the selective fixing
and variation of musical parameters. In this way, a phrase is modified
along chosen parameters, and wanders from its origin
while maintaining a comprehensible narrative for the listener. The next section
will look more deeply into this idea of trajectories through this wealth
of possibilities, and how this might enhance an understanding of shape in

Modelling and navigating musical space

The improviser, given a starting phrase, is presented with a range of choices
for continuation depending on the differential manipulation of a host of
musical subsets. Beyond the theoretical interest, this concept may be applied
pedagogically and in performance to guide improvisational practice. To return
to the extract from Coltrane’s Acknowledgement, a simple melodic fragment
is transformed in terms of three parameters: chromatic transposition, metric
placement and note separation. Although it might be possible to appreciate
these three values for each object in the context of standard notation, we can
observe them more directly in an alternative representation. Figure 6.6 shows a
three-​dimensional space that represents all the possible variations of the phrase
in terms of the three specific parameters of chromatic transposition, metric
placement and rhythmic subdivision. Twenty numbered phrases in the solo are
plotted in reference to example phrases in the space, but the space represents
all possible permutations of values within those ranges. With the caveat that
important articulation, dynamic and timbral elements are set aside for now,
Coltrane’s solo may be seen as constituting a small subsection of this clearly
demarcated musical space.
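
The idea of a solo occupying a small subsection of a bounded space can be sketched as follows. This is an illustration under stated assumptions: the axis ranges and the ‘played’ coordinates are hypothetical values chosen for the example, not data read from Figure 6.6.

```python
from itertools import product

# Assumed, illustrative ranges for the three axes (not taken from Figure 6.6).
TRANSPOSITIONS = range(12)    # chromatic transposition, in semitones
PLACEMENTS = range(8)         # metric placement, in quavers within a 4/4 bar
SEPARATIONS = (1, 2, 3, 4)    # note separation, in quavers

# Every coordinate in the space is one possible variant of the phrase.
space = set(product(TRANSPOSITIONS, PLACEMENTS, SEPARATIONS))

# Hypothetical coordinates standing in for the phrases actually played.
played = {(0, 0, 1), (5, 0, 1), (7, 4, 2), (2, 6, 1)}

# The solo is a subset of the space; coverage is the fraction it occupies.
is_subset = played <= space
coverage = len(played) / len(space)
```

With these assumed ranges the space holds 384 possible variants, of which the hypothetical solo visits four.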
Incidentally, a parallel may be drawn here with the language of evolutionary
biology. Raup’s cube (cited in Dawkins 1997: 192) illustrates an analogous
three-​dimensional space of possible shell types, where three genes contribut-
ing to the shape of a shell (spire, flare and verm) are laid out in three dimen-
sions. Every possible expression of these genes is laid out in multidimensional
space, and a subset of these that have been found in nature as a product of
natural selection can be indicated. Similarly, the phrases in Coltrane’s solo rep-
resent the ‘naturally occurring’ subset of all possible phrases within a defined
musical space.
Figure 6.6 uses a simply arranged set of transformed phrases; however, the
exact layout of phrases within that musical space is debatable. One could make
a good case that potential phrases existing along a particular dimension may
not always have an easily described continuum of proximity. A semiquaver dis-
placement may actually be a more radical mutation than a minim displacement;
an octave transposition is perhaps less extreme than a semitone, and a semitone
less extreme than a quarter-​tone for that matter. This nonlinear nature of musi-
cal proximity is noted in Wishart’s On Sonic Art with reference to the relation-
ship between frequency and consonance (1996: 71–​3).
This problem of defining proximity may be approached tentatively. For
example, phrase α may be imagined as existing at a particular point on an axis
of rhythmic placement within the concept of a cyclical bar. Because of the pat-
tern of strong and weak beats, a minim displacement is to be considered the
‘nearest’ displacement (despite it being the furthest in terms of beat placement).
FIGURE 6.6   Coltrane’s cube: some possible phrases of Coltrane’s Acknowledgement plotted in the three-​dimensional musical space of metric placement, rhythmic
separation and chromatic transposition, with a few coordinates illustrated with standard notation (Mermikides 2010)
A crotchet displacement is more distant; the D for example would now fall on
beats 2 or 4, rather than 1 or 3, a more significant change in character. Any qua-
ver displacement alters the phrases yet more extremely, removing the upbeat
and interfering with any swinging of quavers that may be going on. Semiquaver
shifts, and yet finer rational subdivisions (if appreciable), alter the phrase still
more radically. To complicate matters further, the layout of these phrases is not
static: once a rhythmic displacement has been made, phrases are reordered in
terms of proximity. For example, if phrase α is displaced by a crotchet, its ‘near-
est’ neighbour is now a minim away. Incidentally, this representation can more
readily adopt micro-​timing features which ‘fall between the cracks’ of standard
notation. With the concept of musical proximity in mind, more dimensions
may be added and a new musical space may be constructed for exploration
(Figure 6.7).
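
The ordering of displacements described above can be sketched as a small function. This is a minimal sketch assuming a 4/4 bar and a simple rank per metrical level; the ranks, the constant names and the function name are illustrative assumptions rather than values given in the text.

```python
from fractions import Fraction

BAR = Fraction(4)  # bar length in crotchets; 4/4 is assumed

# Metrical levels in crotchets, coarsest first:
# minim, crotchet, quaver, semiquaver.
LEVELS = [Fraction(2), Fraction(1), Fraction(1, 2), Fraction(1, 4)]


def metric_distance(a, b):
    """Rank-based distance between two positions in a cyclical bar.

    The distance depends only on the shift between the two positions:
    0 means the same position, 1 a minim shift, 2 a crotchet shift,
    3 a quaver shift, 4 a semiquaver shift, and 5 anything finer.
    """
    shift = (Fraction(b) - Fraction(a)) % BAR
    if shift == 0:
        return 0
    for rank, unit in enumerate(LEVELS, start=1):
        if shift % unit == 0:
            return rank
    return len(LEVELS) + 1  # finer than a semiquaver, e.g. micro-timing
```

Because the distance is computed from the shift between two positions rather than from a fixed origin, a phrase already displaced by a crotchet (position 1) finds its nearest neighbour a minim away (position 3), as the text notes.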
Orthogonal to this rhythmic placement axis, a note separation axis may
also be postulated, representing the progressive elongation and contraction of
phrase α, with wider note separation in one direction and shorter in the other.
This axis might be arranged with the emphatic top D used as a rhythmic anchor
about which the outer two notes are stretched or compressed. The individual
notes may compress until they form a chord and then extend beyond that point
to form a retrograde transformation of phrase α. In addition to rhythmic place-
ment and note separation, another axis may be added that represents all possi-
ble diatonic transpositions of a phrase (diatonic to the key of C Dorian), higher
in one direction and lower in the other. Chromatic transposition within a tonal
harmony creates a nonlinear pattern of musical distance. However, within a
modal setting, from which phrase α is derived, the hierarchical nature of scale
degrees is less clear. The subjective decision has therefore been made to arrange
the proximity in terms of diatonic transposition very simply, so that proxim-
ity in this dimension is equivalent to similarity of melodic register. Given the
definition of these parameters, variations of phrase α exist in three dimensions,
with potential mutations of the phrase existing side by side in conceptual space.
A sense of musical proximity within these constraints may also be perceived.
Now that the concept of proximity has been established, one might also
imagine additional transformational dimensions emerging from phrase α, such
as a chromatic transposition dimension, axes of various timbral characteristics
(including those achievable only through electronic manipulations), points of
symmetry, intonation, segmentations and so on. An impression of how a musi-
cal phrase exists in multiple simultaneous dimensions of transformation, here
termed M-​Space, is shown in Figure 6.8.
FIGURE 6.7  Phrase α existing at the centre of a three-dimensional musical space with variously
proximate neighbouring phrases. Phrase α is indicated at the origin of the axes and the musical
distance between it and various close neighbours is shown. The boundary of the grey sphere describes
a boundary of equal proximity and contains phrases within this musical distance. The lower part
of the diagram shows an impression of Phrase α existing at a point within this musical space
(Mermikides 2010).
FIGURE 6.8   An impression of M-Space: phrase α sits at the centre of many simultaneous dimensions of musical transformation. Twelve of these are represented in four
three-dimensional subsets (some of which are continuous rather than discrete values) with some proximate phrases indicated. A phrase may move along any number
of such transformational axes during the course of improvisation. In the top right of the diagram, a phrase shows the result of a small move in all of these subsets
simultaneously (the modification is marked as a grey disc in each transformational subset) (Mermikides 2010).

The coexistence of these multiple dimensions is possible to conceive, but
difficult to illustrate precisely in one diagram. A conceptual model, whose
precise demarcation can be delegated to a computer, might serve better than
two-dimensional illustrations. Regardless, a logical visualization of a phrase
existing within a radiated sphere of closely related musical material may readily
be adopted and, as will be demonstrated, a concept of relative musical proximity
is intuitively accessible, despite the mathematical complexity on which it
is constructed. In fact, many musicological terms such as repetition, motivic
development, sequencing, antiphony, call and response, ternary form, exposi-
tion, etc. rely on a shared intuition of musical similarity or difference (i.e. musi-
cal proximity).
As more axes of transformation are added, an idea emerges of a particular
musical object living at a particular point in a conceptual space of all its pos-
sible variations, from which the improviser may explore in any direction, or to
use saxophonist Evan Parker’s description of improvisation, ‘take a note for a
walk’ (cited in Borgo 2005: 36).
Thus, proximity is simply distance in M-Space. These fields, grouping
related musical objects together, are shown later as cloud-​like structures, but
the actual shape is harder to grasp. Since we are conceiving this boundary as a
multidimensional object of a particular radius, its shape cannot be conceived
as a simple three-​dimensional sphere. Multidimensional musical proximity
would imply that the exact repetition of a phrase, with a major alteration in
a single dimension, such as a substantial timbral modification, may be equally
proximate as a repetition of the phrase with many slight alterations. Electronic
dance music could be said to illustrate this idea clearly; a continuously repeated
phrase may be subjected to extraordinary timbral manipulations while retain-
ing an intelligible relationship to the original phrase. (For one of numerous
examples of this, listen to the extreme, and tolerated if not relished, timbral
manipulations of the bass ostinato in Phuture’s Acid Tracks from 1987.)
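
The claim that one large alteration may be as proximate as many slight ones can be checked numerically. The sketch below assumes a Euclidean metric and arbitrary axis units; the text does not specify how distance in M-Space is measured, so both are assumptions made for illustration.

```python
import math


def m_space_distance(a, b):
    """Euclidean distance between two points in an n-dimensional space."""
    return math.dist(a, b)


# A nine-dimensional toy space with the original phrase at the origin.
origin = [0.0] * 9
one_big_change = [3.0] + [0.0] * 8   # large move on a single axis (e.g. timbre)
many_small_changes = [1.0] * 9       # slight move on every axis

d_big = m_space_distance(origin, one_big_change)
d_small = m_space_distance(origin, many_small_changes)
```

Under this metric both variants lie on the same boundary of equal proximity, since three units on one axis and one unit on each of nine axes give the same distance.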
If an improvisation is considered a series of jumps or flights in musical space
from one object to the next, this raises a question of the existence of an opti-
mal ‘musical’ distance or flight path. It seems that the skill-​set of the proficient
improviser includes the ability to control musical proximity for expressive effect,
balancing predictable, similar objects with novelty. In The Sound of Surprise
(1959), Balliett’s posited ‘aural elixir’ is the result of perfectly selected surprises;
and, as Borgo notes, effective improvisation is not a random stumble through
musical space, nor is it always a dainty, careful and predictable movement
through it: ‘Randomness does not produce a sense of surprise, but rather confu-
sion, dismay, or disinterest. And small departures from an orderly progression,
if insufficiently interesting or dramatic, will pass without much notice. Surprises
are by definition unexpected, and yet those that most capture our interest or
delight have a feeling of sureness about them once experienced’ (Borgo 2005: 1).
Some approaches to finding this ideal middle ground between predictabil-
ity and randomness (a ‘rational deception’) are addressed later in this section.
Regardless, the concept of proximity allows an awareness of a rarely identified
expressive medium. Irrespective of the specific vocabulary, a series of phrases
with only subtle changes creates a different musical effect than does a series of
wildly disparate phrases. In other words, M-Space distance and velocity are, in
themselves, avenues of expression, as is acceleration, the rate at which proximity
between phrases alters.
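
Distance, velocity and acceleration along an improvised trajectory can be sketched as successive differences. This is an illustrative reading only: it assumes a Euclidean metric, treats each phrase as one time step, and uses a hypothetical two-dimensional path in place of real analytical data.

```python
import math


def step_distances(path):
    """Distance travelled between each pair of successive phrases."""
    return [math.dist(a, b) for a, b in zip(path, path[1:])]


def step_changes(values):
    """Change between each pair of successive values."""
    return [b - a for a, b in zip(values, values[1:])]


# Hypothetical positions of four successive phrases in a 2-D slice of M-Space.
path = [(0, 0), (1, 0), (1, 1), (4, 5)]

velocity = step_distances(path)        # distance covered per phrase
acceleration = step_changes(velocity)  # how that distance itself changes
```

In this toy path two small, equal steps are followed by a leap, so the velocity is steady and then jumps, and the acceleration registers the moment at which proximity between phrases alters.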
Improvisation need not be a meandering drunken walk through this M-Space:
large-scale improvisational strategies are possible and occur often in the
hands of skilled practitioners. Listening to Jimmy Smith’s solo on The Sermon
(1958: 1:58–2:50) with the M-Space model in mind, it is easy to hear the separation
of phrase fields (Figure 6.9).
Even a first listen sorts these phrases into five main fields (A–​E) with two
to five phrases in each (A1–3, B1–4, etc.). There are common features within each
group, which form a strong gravitational force between the phrases. This prox-
imity means that they can tolerate and indeed draw attention to any subtle
transformations, including editing of notes and detailed variations of micro
timing and inflection. The creation of proximate phrases fixes groups of
musical dimensions and thereby frees up other musical dimensions for effective
expression. This multilevel hierarchical structure of phrases and fields in
M-Space is illustrated in Figure 6.9. Fields A–E co-exist as part of the same
solo, but their relative proximity is also due to registral, timbral and temporal
considerations. Not shown in Figure 6.9 is the yet more complex interaction of
fields and phrases between other performers and musical objects.

FIGURE 6.9   A multi-level depiction of Smith’s solo on The Sermon. Improvisation is seen as a
configuration of fields at varying distances and trajectories in M-Space, with each field containing a
constellation of phrases. Phrases, in turn, may be broken down into a nexus of smaller phrase units as
is shown in reference to E2. Phrase E2 has been placed closer to Fields A and C than B and D, to reflect
features of E2.1. Fields themselves are linked together in terms of timbral, registral and temporal
components and may coexist in a yet greater nexus of relationships with other performers or musical
objects (Mermikides 2010).

In essence, this solo extract might be described as a ‘field series’ in that a
motive is explored in a number of objects (phrases) before jumping to another
area of M-​Space for expressive exploration. Although Figure 6.9 is illustrated
in two dimensions, one must be reminded that the relative distances between
fields, and between their constituent phrases, are the cumulative result of their
relative positions in multidimensional space (through variations of many
coexisting musical parameters). Once the concept of M-​Space structures and
their relative positions has been grasped, the listening and analytical process
becomes far clearer. From the straight-​ahead to the most avant-​garde contexts,
it becomes possible to classify improvisations in terms of which parameters are
fixed and varied, and what types of trajectories through M-​Space occur.
Other improvisational structures (or strategies) might also exist. One can
hear, for example, in Wes Montgomery’s solo on No Blues (1965: 1:32–​2:02),
one very narrow phrase field being used repeatedly as a pivot to other fields in a
‘call-​and-​response’ manner. A sharply defined F (in octaves) is used as a motive
central to various other phrases (which might belong to several identifiable
fields). This interjected figure is so clearly defined that the other ensemble mem-
bers are compelled to mark it with their own musical material. In other words,
the soloist’s M-​Space structures have infiltrated those of the accompanists, as
should occur in any responsive ensemble environment. One might name this
type of improvisational structure a ‘pivot’, where one narrow area of M-​Space
is used as a frequent and significant launching point for other satellite objects.
Alternatively, Coltrane’s Acknowledgement (discussed earlier) displays the
strategy of identifying a narrow field and then furtively exploring that space for
extended periods: one might refer to this improvisational shape as ‘nuclear’. Pat
Metheny’s approach on Unquity Road (1976) is less clearly delineated: phrase
fields exist, but the transitions between them are often blurred, and motion
through M-​Space relatively slow, so that the result is a ‘merged’ improvisational
structure. Finally, an improvisation may be characterized by repeated distant
jumps in musical space, an ‘unbounded’ improvisational shape, so that there
is little appreciable relationship between successive musical objects (Sheffield
Phantoms (Bailey 1975) may—​to many listeners—​provide an example).
As analyses of improvised solos are revealed, it becomes possible to sort con-
stituent passages into these kinds of broad category. Note that these are grouped
in terms of the relationships between the phrases rather than according to the
vocabulary. A pictorial comparison of five improvisational strategies is presented
in Figure 6.10 (nuclear, field series, pivot, merged and unbounded), of
which Acknowledgement (Coltrane 1965), The Sermon (Smith 1958), No Blues
(Montgomery 1965), Unquity Road (Metheny 1976) and Sheffield Phantoms
(Bailey 1975) are respective examples. The categorization of these strategies
involves some subjectivity (one listener’s nuclear may be another listener’s
unbounded improvisation) and there may be borderline cases, but the scheme
offers a clear terminology and framework in which to analyse, compare and
contrast a range of improvisations regardless of style.

FIGURE 6.10   Five improvisational structures: 1) ‘Nuclear’: phrases, with only occasional small anomalies, fall within one close field with only minor variances. 2) ‘Field Series’:
close phrases are played a few times with variances before repeating the process at a different point in M-Space. 3) ‘Pivot’: one particular narrow field is played often, acting as
a springboard to various satellite fields. 4) ‘Merged’: fields are merged by the use of a transitional phrase of otherwise distinct phrase fields. 5) ‘Unbounded’: a series of phrases
with little proximity of one phrase to any other (Mermikides 2010).
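
As a rough illustration of how such categories might be told apart once each phrase has been assigned to a field, the toy function below labels a sequence of field names. The thresholds and the function name are hypothetical, the assignment of phrases to fields is taken as given (and is itself subjective), and the ‘merged’ category is omitted because blurred transitions cannot be recovered from discrete labels.

```python
from itertools import groupby


def classify(fields):
    """Rough label for a sequence of per-phrase field names.

    A toy heuristic for illustration, not an analytical method.
    """
    runs = [name for name, _ in groupby(fields)]  # collapse immediate repeats
    distinct = set(fields)
    if len(distinct) == 1:
        return "nuclear"          # every phrase falls within one field
    if len(distinct) == len(fields):
        return "unbounded"        # no field is ever revisited
    # A field that keeps returning between visits elsewhere acts as a pivot.
    if any(runs.count(name) >= 3 for name in distinct):
        return "pivot"
    return "field series"         # each field is explored, then left


labels = {
    "AAAA": classify("AAAA"),
    "AAABBBBCC": classify("AAABBBBCC"),
    "PXPYPZ": classify("PXPYPZ"),
    "ABCDE": classify("ABCDE"),
}
```

The input strings are invented sequences, one letter per phrase; they are not transcriptions of the recordings named above.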
Identifying improvisational strategies such as these informs a practical
approach to improvisational performance and ‘guided’ score instructions, as
well as creating a broad analytical foundation. Furthermore, an appreciation
of M-​Space structures may act readily as a supporting mechanism to com-
positional practice and employment of electronics. The analyses so far have
focused on the transformation of material over linear time, one object moving
to another particular location in space. However, jazz (and classical practice, as
is demonstrated in the next section) also involves a transformation of synchro-
nous (albeit unheard) material. That is to say, the (often adventurous) interpre-
tation of a melody in a jazz ‘head’ involves the spontaneous transformation of
the (usually quite skeletal) melody. Rhythmic adjustments, interpolated phrases
and playful transpositions of the written (but rarely exactly rendered) melody
are all potential avenues of expression for the skilled performer—​and informed
listener—​and this can be seen as the playful dancing around the composed
musical object in musical space.
Through a pedagogical exploration of jazz improvisational method, this sec-
tion has crossed paths with an elaboration of Pressing’s event-​cluster model
(1988). This meta-​view of improvisation is made more powerful with the sup-
port of a practical stylistic understanding of the jazz idiom, which provides
an appreciation of time-​feel, harmonic altitude and the extrapolation of the
concepts of proximity and velocity.
The view of improvisation as transformations in multidimensional musical
space is so broad that it connects the mechanics and pedagogy of jazz practice
with a diverse range of compositional and analytical research. These include
(among others):
• Xenakis and Kanach’s formal modelling (Xenakis and Kanach
2001) which, with its consideration of stochastic functions over
multiple (albeit discretely valued) musical parameters, has parallels
with the concept of proximity and improvisational strategies in
M-Space;
• Wishart’s ‘gestures’ in electronic music (1996), a taxonomy of
continuous sonic modifications, which may be considered analogous
to expressive contours with respect to timbre;
• The multidimensionality of Pressing’s improvisation model (1998)
which is, as discussed earlier, related directly to M-Space;
• Moles and Schaeffer’s prescient graphical representations of l’objet
sonore (Holmes 2008: 45–48), conceived readily in respect to three
synchronous expressive contours, or three dimensions in M-Space;
• Methods within Schillinger’s compositional systems (1978) which may
be described as employing isokinetos, isorhythmos and isologos; and
• Dreyfus’ detailed studies (1996) of motivic transformation in the
music of J. S. Bach, readily adopted in terms of a chains-of-thought
model.

A cadenza in musical space

These same concepts of transformation and proximity in a musical space
(defined by formal expectations set up by Beethoven’s ordering of his material)
may be used to examine a written-out classical cadenza from the later
nineteenth century. Although for the most part making cadenzas was the prerogative
of performers, there are some exceptions. For example (focusing on
violin music), Mozart includes original cadenzas for his Sinfonia Concertante
for violin and viola K364/​320d and the Concertone for two violins K190/​
186e. Even though a practice of publishing ornamentation was established
well before (for example, in the northern European editions of Arcangelo
Corelli’s Op. 5 Sonatas for Violin and Basso Continuo, circa 1710), composing
and publishing cadenzas was uncommon until well into the romantic period.
From then on, cadenzas of highly regarded performers were printed (prob-
ably to sell to the growing market for amateur music-​making), and these may
or may not have reflected the stylistic interests of the concerto’s composer.
At the time this was no more surprising than in modern jazz practice, where
improvisation on a standard such as Ain’t Misbehavin’ (1929) is neither predicated
on nor held back by the specific practice of its composers, Fats Waller
and Harry Brooks. While a sense of what distinguishes that particular era of
jazz from bebop or the avant-​garde is clear to most jazz professionals and expe-
rienced listeners, when the standard is reworked in an improvisation performers
might, but need not, incorporate ‘appropriate’ or ‘historically accurate’ performance
practices.
A written cadenza is in and of itself problematic, even oxymoronic: experi-
enced improvisers in various genres will relate that over time their retention of
material within improvisation increases. Some report that they can eventually
write down all or most of their improvisation immediately after performance
(Nooshin 2003: 246). Some, contrastingly, report a type of short-​term memory
loss, including the loss of the entirety of their improvisation just after performance
(Berkowitz 2010: 121–4). In either case, the modern idea that a written
text based on an improvisation is categorically different from a performed
improvisation is likely based on a dichotomization of improvisation and com-
position that makes little sense before modern times.
With these provisos, and using a closely related approach, we can now turn
to an extant cadenza for Beethoven’s Violin Concerto Op. 61 by the Belgian
violinist Hubert Léonard (1819–90), first published in the 1880s (Léonard
c.  1883). Léonard was an established violinist who had studied with Henri
Vieuxtemps and François Habeneck at the Paris Conservatoire. Beginning
his tours of Europe in 1844, he succeeded Charles Auguste de Bériot as principal
professor of violin at the Brussels Conservatoire in 1847, becoming the
tutor of celebrated violinists of the next generation including Martin Pierre
Marsick, Henry Schradieck and Henri Marteau (Stowell 1992: 65). Léonard
is possibly best known today as an inspiration for Gabriel Fauré’s Violin
Sonata. In 1875, on a long visit to Sainte-​Adresse, near Le Havre, Léonard
advised the young Fauré on how to make his composition ‘more playable
and effective’ (Nectoux 2004: 23). Léonard, then, was well respected in his
field, though not on a par with extraordinary or esoteric violinists such as Eugène
Ysaÿe (in the following generation) or Niccolò Paganini (of the previous
generation), and better remembered, in the long run, as a pedagogue. It was
in this capacity that he provided an unusual second-​violin accompaniment
for the Beethoven concerto, exclusively for teaching purposes, reissued in a
1909 arrangement for violin and piano edited by his student Henri Marteau
(Beethoven 1909).
To fill out some of the context in which Léonard’s cadenza was fashioned,
we should note briefly the types of values embodied in classical-​era cadenza
improvisation/composition as exemplified by Daniel Gottlob Türk ([1789]
1982: 301). Like the ‘limit and vary’ exercises illustrated above, Türk describes
some basic starting-​points for cadenza creation, although his presentation is
rather more prescriptive than the examples offered from Goodrick (1987) and
Crook (1991).
• [T]‌he cadenza … should particularly reinforce the impression the
composition has made in a most lively way and present the most
important parts of the whole composition in the form of a brief
summary or in an extremely concise arrangement…
• The cadenza … must consist not so much of intentionally added
difficulties as of such thoughts which are scrupulously suited to the
main character of the composition…
• Cadenzas should not be too long…
• [M]‌odulations into other keys … either do not take place at all …
or they must be used with much insight … only in passing…
[O]riginally the harmony of the six-​four chord and in any case the
triad that follows it were the basis of the cadenza, but in our time
these harmonic confines are probably too narrow. One can modulate;
only one should not remain in neighbouring keys so long that the
feeling for the main key is extinguished.
• Just as unity is required for a well-ordered whole, so also is variety
• No thought should be often repeated in the same key or in another…
• Every dissonance which has been included … must be properly
• A cadenza does not have to be erudite, but novelty, wit, an
abundance of ideas and the like are so much more its indispensable
• The same tempo and metre should not be maintained throughout the
cadenza; its individual fragments … must be skillfully joined to one
• A cadenza which perhaps has been learned by memory with great
effort or has been written out before should be performed as if it were
merely invented on the spur of the moment.

Keeping Türk’s guidelines in mind, together with the basic principles outlined
in our discussion of jazz improvisation, we can now look at the relationship
of Léonard’s cadenza to the text of Beethoven’s Op. 61 following a chains-​of-​
thought model. The cadenza is rich in the standard techniques of cadenzas of
the period, principally motivic and melodic development and transposition.
Yet, looking at his traversal of musical space, we see Léonard playing with the
structural relationships expected from Beethoven’s composition, creating musi-
cal leaps in terms of formal proximity, as illustrated below.
An analogy is to be found in the concept of transformational grammar, as
drawn on already above. As Chomsky (1988) theorized, the ‘transformational’
properties of language arise from the language user accessing a body of knowledge
(‘lexicon’, which is the linguistic analogue to Pressing’s 1984 ‘knowledge
base’) to generate novel sentences, or sequences of words or phrases. The cadenza,
similarly, is a kind of real-​time interaction between the source material and the
improviser/​composer’s musical abilities. To use terms from Pressing (1984), the
‘referent’ is clear: the thematic, motivic, rhythmic shapes or any musical element
from the written text of the concerto, with a clear emphasis on the main thematic
material. The ‘knowledge base’ is derived from the virtuosity of the performer/​
composer and thus dictates the execution—​or ‘generation’ in Chomskian terms—​
of the cadenza, a novel set of phrases and sequences based on the referent.
Thus, Léonard created a novel reconception of Beethoven’s given material,
passing elements of the original text through the filter of his individual musi-
cal personality and creative interests. He uses generative methods of musical
creation common to the period—​variation, transposition, recombination and
‘rational deception’—​to reenvision Beethoven’s material (Berkowitz 2010). He
also inserts shapes (conceived here as motives or characteristic passages of
material in the process of change) that seem to come directly from his knowledge
base, perhaps inherited from his composition/improvisation education,
perhaps quoted from his own composed/improvised work, perhaps newly synthesized
with material from Beethoven, although it may be hard to discern an
exact origin. In some ways, perhaps this last category is the most significant
outcome of the cadenza creation process: the fusion of the composer’s and
the performer’s musical identities through which shapes survive although their
obvious identifying features have changed.
Figure 6.11 represents the first section of Léonard’s cadenza, divided into
L1 and L2 according to their relationship to material from Beethoven’s original
text (the corresponding passages are B1 and B2). L1 illustrates the opening of
Léonard’s cadenza, which extends the first seventeen bars of Beethoven’s text
(B1) by one bar (circled). Structurally Léonard keeps the identical thematic
order, employing isologos (to continue with our earlier terminology). The types
of transformation in this section are harmonic, rhythmic and ornamental: the
first statement of the main theme (bars 1–9 of B1) is harmonized with two-,
three- and four-note chords, while the first half of the second section (bars 10–13
of B1) is extended by one bar (bar 14 of L1) and embellished by arpeggiated
flourishes in A major and Amaj7 (bars 11 and 13). The statement is thus intensi-
fied (e.g. the opening forte dynamic marks an example of dispaesi), departing
from the sweet character of the original melody but keeping the original line
(isomelos). The descending melodic figure in bars 14–​15 of B1 is broken into
chords and rhythmically varied while the final two bars, 16–​17, are repeated
nearly identically to Beethoven’s text with the minor addition of a ‘g’ pick-​up
for supporting harmony (bars 17–​18 of L1).
L2 shows us a slightly more adventurous departure from Beethoven’s
motives. A reinvention of bars 32–​34 of B2 starts the passage, keeping the
rhythmic gestures of the original section (isorhythmos). However, melodically
the phrase peaks, not troughs (bar 22 of L2), as Léonard reconfigures expecta-
tions (examples of dismelos and diskinetos). Further, he borrows the rhythmic
detail of what follows in B2 (bars 35–​41) creating a transitional harmonic sec-
tion that leads into a very unexpected revision of B3, departing from the related
material in B1 and B2.
Illustrated in Figure 6.12, L3 begins with a rather unexpected harmonic and
ornamental reshaping of the melodic and harmonic material of B3, one of
the most ‘expressive’ moments of the concerto as reported widely by theorists
and embodied in performance across many recordings (Stowell 1998; Fabian
2006). This section acquires greater significance later in the cadenza. Léonard
reuses the highly recognizable opening suspensions, g² to f♯² and c³ to b♭², in the
original harmonic progression (B3, bars 332–35), but transforms the texture
and setting entirely with supporting d² ostinato trills. Here Léonard is extracting
part of a musical shape (isokinetos) but shifting the conceptual outcome
dramatically and in terms of original affect (dislogos). The trills are then used
to modulate further towards a section that is so highly generic as transitional
material (bars 37–​42) that it is hard to call it Beethovenian, or even Léonardian.
FIGURE 6.11   Opening section of Léonard’s cadenza (L1/​L2) and corresponding sections from Beethoven’s Violin Concerto Op. 61 (B1/​B2). L1 is subdivided into two
phrases, with their respective transformations connected by lines in the figure. The motivic elements of B2 are also then connected to their reshaped versions in L2.
FIGURE 6.12   Second section of Léonard’s cadenza (L3/​L4A/​L4B) and corresponding sections from Beethoven’s Violin Concerto Op. 61 (B3/​B4). A harmonic relationship is
indicated by showing the use of the C–​B♭ and G–​F♯ suspensions in B3 as a motivic shape in L3. The modulating sequence of L3 is then separated by a different box, lower
in the figure. The B4 melody is boxed in the middle of the figure, with corresponding melodic embellishments indicated above and below in L4A and L4B.

L4A brings a virtuosic reworking of the secondary theme (B4) that is highly
evocative of chromatic contrapuntal late eighteenth- or early nineteenth-century
virtuoso violin technique (embodied variously in the études of Pierre
Gaviniès or the caprices of Niccolò Paganini). Fluctuating, like the original
section, from major to minor, this passage offers nothing inherently unusual.
What follows, however, is quite unexpected. As illustrated in Figure 6.13, L5
is an arpeggiated section that denies the expectation that the melodic material
in B4, bars 57–​60, will be continued (dismelos). Instead, we hear a harmonic
progression with a pronounced bass melody line, indicated by accent marks in
the score. The melodic line seems familiar yet is difficult to recognize, especially
given the denial of the expected continuation of B4’s melodic material. This
is, in our opinion, the most profound moment of musical surprise or ‘rational
deception’, as theorized by C. P. E. Bach ([1753, 1762] 1949).
Possible sources for this passage can be found among Beethoven’s harmonic
suspensions in the orchestral parts: the inner string writing of bars 51–​63
(Beethoven 1968, not depicted here); the bassoon and horn parts in relation to
the principal violin part, bars 304–​29 (not depicted here); and, most convinc-
ingly, bars 351–​57 (shown in Figure 6.13: B5), where we touch on the A♭​–B♭
and C tonalities evident in Léonard’s cadenza. Note particularly the melodic
suspensions in the violin part and the almost Khachaturian-​like semitone har-
monic modulations in bars 348–​69.1
Yet even this suggestion of derivation is dubious. More likely, Léonard’s
material is the generative outcome of synthesized information, the improviser/​
composer mixing the referent and his own knowledge base so thoroughly that
the listener can no longer be sure about the origin of the material. Therefore,
instead of an etymological pursuit through analysis, it seems wiser to propose
that our auditory perception is rationally deceived, both through the harmonic
language and in supposing a specific source for the material: it is impossible
to tell whether we are listening to Beethoven or Léonard, and perhaps that is
the point. What matters is not the amount of influence that is present, nor the
success of Léonard’s transition—​which might even be seen as sloppy, depend-
ing on one’s sense of compositional values. The important point is that we
have reached something mystical, inexplicable, the joining of two minds who
lacked the opportunity to meet in person but who meet—​in a metaphysical
sense—​through sound and transfer of text. The synthesis of these two streams
of consciousness, which crystallizes throughout the cadenza, is made particu-
larly evident in this transitional passage.
Returning to Figure 6.12, let us note that the section succeeding this transi-
tion, shown as L4B, revisits the melodic content of B4 in an even more robust
virtuosic texture, now in the original key areas of D major and D minor.
Léonard provides a resolute confirmation of our expectations in the continuation
of the melody previously denied (isomelos), embellishing the melodic content
in a homogeneous fashion through to the melodic material of bars 57–60
of B4, subsequently compressing it into a generic harmonic transition to the
next section of the cadenza.

FIGURE 6.13   Final section of Léonard’s cadenza (L5/L6A/L6B) and corresponding sections from Beethoven’s Violin Concerto Op. 61 (B5/B6). Both L5 and B5 indicate progressions that
embody pronounced melodic properties, using secondary dominant and common-tone modulations, semitone movement in the bass, and consistent fluctuations between major and
minor harmonies. The relationship is considered tenuous. The shaped material from B6 is indicated: the opening section corresponds to L6A, while the last sequence of notes in B6,
circling around ‘a’, corresponds to an extended retransition sequence in L6B.
Illustrated in Figure 6.13, L6A and L6B are extensions of the original opening
material of the principal violin’s entrance into the concerto, B6, which follows
the orchestral exposition. Rhythmically compressing Beethoven’s material,
Léonard quickens it (disrhythmos), heightening the virtuosity further
and provoking performers to rhythmically play the descent and rise in L6A
as quickly or flexibly as they dare. The melodic, gestural and dynamic con-
tent remains the same (isomelos, isokinetos and isopaesi). Gradually, Léonard
brings us closer to the expected Amaj7 leading us to the D major orchestral
playout of the first movement, extending the dominant harmony with the con-
trapuntal texture of L6B.
Figure 6.14 provides a graphic representation of several relationships of
Léonard’s cadenza to Beethoven’s concerto. The x axis represents bars in the
Beethoven, illuminating the fact that Léonard borrows materials from different
sections of the concerto, not always in the original order. The y axis represents
a sense of musical proximity to the original material, where, for example, L1 is
closer and L3 further away in its transformation of Beethoven’s text. Rounded
boxes connecting respective ‘L’ and ‘B’ sections indicate specific types of
transformations employed by Léonard in each corresponding cadenza section.
Relative sectional length is indicated by the width of each square box. The
relationship of L5 and B5 is tenuous, as mentioned above: a vague relationship
(indicated by dotted lines in the diagram) at the greatest distance in terms of
musical proximity between cadenza and concerto material. Finally, if we follow
the arrows connecting the ‘L’ sections at the top of Figure 6.14, we can see
a pattern indicating one of the five improvisational strategies shown earlier in
Figure 6.10, a field series model, underlining the utility of the M-Space model
in musical analysis regardless of genre (Mermikides 2010).

[Figure 6.14 diagram. Legend: L1–6, sections of Léonard’s cadenza; B1–6, corresponding sections of Beethoven’s composition. Vertical axis: musical proximity to referenced material. Horizontal axis: section bar length and relative position of referenced Beethoven material, with sections ordered B1, B2, B4, B6, B3, B5 and bar positions marked 1, 17, 28, 43, 64, 89, 96, 331 and 357.]
FIGURE 6.14   Graphic representation of Léonard’s cadenza illustrating the relationship of musical
proximity to Beethoven’s original score. The closer the vertical distance between each corresponding
B and L section, the greater the musical proximity of corresponding material. Bar lengths for each
section correspond to horizontal width. A ‘field series’ organization is indicated by the arrows
connecting each L section, while transformations are briefly indicated in the rounded boxes
connecting each B and L section.


This chapter has approached the concept of shape in improvisation by way of
four conceptual stepping-stones, one in each of its four sections.
1. Using jazz improvisation (and its pedagogy) as a reference point,
an improvisation might be seen as a chain (or series of chains)
whereby newly created musical objects hold an appreciable musical
relationship (a general motivic similarity) to a previously established
musical object (or set of objects). The use of the word ‘chain’
suggests a strict linear set of relationships, but this model allows an
intricate nonlinear set of connections, whereby relationships might be
apparent among a wide set of parameters, and relationships between
objects might ‘leap-​frog’ over interpolated objects.
2. The links in this chain were more keenly examined, and it was
suggested (with a basis in jazz improvisational pedagogy) that they
can be usefully described in terms of which musical parameters are
pertinently fixed and which are varied. Terms like distimbral, dismelos
and isoplacement emerge quite naturally alongside well-​established
concepts such as isorhythm and displacement. This ‘limit and vary’
concept of improvisation allows a way to identify improvisational
mechanisms beyond the ‘surface’ vocabulary employed, as well as
having direct practical applications.
3. Since a series of musical objects in an improvisation might be
identified as having varying degrees of similarity, one might
imagine a multidimensional space of musical proximity (M-​Space)
surrounding any musical object with closer objects being more
recognizably similar and further objects being more musically distant
until they have little or no recognizable similarity. Since this musical
space occupies many musical dimensions, it follows that musical
distance may be manifested asymmetrically over various parameters,
and a moderate change over several parameters may feel as distant as
extreme changes over a few parameters. With the concept established
of a musical improvisation as a journey through musical space,
the concepts of velocity, acceleration and shape through musical
space can be applied. Five general (and subjective) improvisational
shapes were suggested (with real-​world examples provided) which
might describe an improvisation or section of an improvisation.
These include (a) ‘nuclear’, an exploration of proximal musical
space; (b) ‘field series’, the serial exploration of several M-​Space
areas (‘fields’); (c) ‘pivot’, an improvisation where one narrowly
defined area of M-​Space is used as a frequent landing point
between other satellite areas; (d) ‘merged’, where there is a slow drift
through M-​Space making clear delineation of fields difficult; and
(e) ‘unbounded’, an improvisation where there is little appreciable
relationship between objects.

Although these categories of shape relationships have emerged from improvisational
research, a similar approach might be used to describe musical structure
generally including a range of compositional forms.
4. The concepts presented above (alongside historical, lexicological and
analytical frameworks) were employed in the detailed analysis of a
classical cadenza, where the concepts of chains-​of-​thought, ‘limit
and vary’ and M-Space structure were revealed in an example that is stylistically
distant from jazz. Finally, the cadenza discussion led
to an additional concept of shape as encapsulating and transmitting
common characteristics through a process of transformation in
which nothing easily identifiable survives intact and yet resultant
material remains perceptually related to its starting point.

These concepts of improvisational shape, while born of jazz pedagogy and
abstract mathematical modelling, may be useful across diverse styles, not only
as analytical tools and ways to appreciate the craft, but also, potentially, as
a mechanism to develop improvisational (and indeed compositional) practice.
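
The ‘limit and vary’ labels and the idea of distance in M-Space summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the chapter treats musical proximity as subjective, so the numerical coordinates, the tolerance value and the function names below are hypothetical, and only the parameter names (melos, rhythmos, kinetos, paesi) are taken from the chapter's vocabulary.

```python
import math

# Parameter names follow the chapter's iso-/dis- vocabulary. The numerical
# coordinates (values between 0 and 1) are an assumption for illustration only.
PARAMETERS = ("melos", "rhythmos", "kinetos", "paesi")


def transformation_labels(source, variant, tolerance=0.05):
    """Label each parameter 'iso...' if it is (nearly) fixed, 'dis...' if varied."""
    labels = []
    for name in PARAMETERS:
        fixed = abs(source[name] - variant[name]) <= tolerance
        labels.append(("iso" if fixed else "dis") + name)
    return labels


def m_space_distance(a, b):
    """Euclidean distance between two musical objects (one possible metric)."""
    return math.sqrt(sum((a[name] - b[name]) ** 2 for name in PARAMETERS))


def step_sizes(path):
    """Distance covered at each step of an improvisation's path through the space."""
    return [m_space_distance(x, y) for x, y in zip(path, path[1:])]


# A reference object, one variant with a single extreme change, and one
# variant with moderate changes in every parameter.
reference = dict.fromkeys(PARAMETERS, 0.5)
one_extreme = {**reference, "melos": 0.9}
several_moderate = {name: 0.7 for name in PARAMETERS}

print(transformation_labels(reference, one_extreme))
print(m_space_distance(reference, one_extreme))
print(m_space_distance(reference, several_moderate))
```

With these invented values the two variants lie at approximately the same distance from the reference (about 0.4), which mirrors the observation that a moderate change over several parameters may feel as distant as an extreme change over a few. Choosing a Euclidean metric is itself an assumption; any other measure of proximity could be substituted in `m_space_distance`.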


Bach, C. P. E., [1753, 1762] 1949: Essay on the True Art of Playing Keyboard Instruments,
trans. W. Mitchell (New York: Norton).
Bäckman, K. and P. Dahlstedt, 2008: ‘A generative representation for the evolution of jazz
solos’, Lecture Notes in Computer Science 4974: 371–​80.
Balliett, W., 1959: The Sound of Surprise (New York: Dutton).
Beethoven, L. v, 1968: Violin Concerto in D major Op. 61 (New York: Kalmus).
Beethoven, L. v, 1909: Violin Concerto in D major Op. 61 (Leipzig: Steingräber).
Benadon, F., 2009: ‘Time warps in early jazz’, Music Theory Spectrum 31/​1: 1–​25.
Benson, B. E., 2003: The Improvisation of Musical Dialogue: A Phenomenology of Music
(Cambridge: Cambridge University Press).
Bergonzi, J., 1992: Inside Improvisation, Vol. 1: Melodic Structures ([Rottenburg:] Advance Music).
Bergonzi, J., 1994: Inside Improvisation, Vol. 2: Pentatonics (Rottenburg: Advance Music).
Bergonzi, J., 1996: Inside Improvisation, Vol. 3: Jazz Line (Rottenburg: Advance Music).
Bergonzi, J., 1998: Inside Improvisation, Vol. 4: Melodic Rhythms (Rottenburg: Advance Music).
Bergonzi, J., 2000:  Inside Improvisation, Vol. 5:  Thesaurus of Intervallic Melodies
(Rottenburg: Advance Music).
Bergonzi, J., 2002: Inside Improvisation, Vol. 6: Developing a Jazz Language (Rottenburg:
Advance Music).
Bergonzi, J., 2004: Inside Improvisation, Vol. 7: Hexatonics (Rottenburg: Advance Music).
Berkowitz, A., 2010: The Improvising Mind: Cognition and Creativity in the Musical Moment
(Oxford: Oxford University Press).
Berliner, P. F., 1994: Thinking in Jazz (Chicago: University of Chicago Press).
Borgo, D., 2005:  Sync or Swarm:  Improvising Music in a Complex Age (London and
New York: Continuum).
Chomsky, N., 1988: Aspects of the Theory of Syntax (Cambridge, MA: MIT Press).
Crook, H., 1991: How to Improvise: An Approach to Practicing Improvisation (Rottenburg:
Advance Music).
Crook, H., 1995: How to Comp: A Study in Jazz Accompaniment (Rottenburg: Advance Music).
Damian, J. and J. Feist, eds., 2001:  The Guitarist’s Guide to Composing and Improvising
(Boston: Berklee Press).
Dawkins, R., 1997: Climbing Mount Improbable (London: Penguin).
Fabian, D., 2006: ‘The recordings of Joachim, Ysaÿe and Sarasate in light of their recep-
tion by nineteenth-​century British critics’, International Review of the Aesthetics and
Sociology of Music 37/​2: 189–​211.
Franz, D., 1998: ‘Markov chains as tools for jazz improvisation analysis’ (MSc thesis,
Virginia Polytechnic Institute and State University).
Goodrick, M., 1987:  The Advancing Guitarist:  Applying Guitar Concepts & Techniques
(Milwaukee, WI: Hal Leonard).
Holmes, T., 2008: Electronic and Experimental Music: Technology, Music, and Culture, 3rd
edn (New York: Routledge).
Léonard, H., c. 1883: Cadenza pour le Concerto de Violon de Beethoven (Mainz: Schott).
Limb, C. J. and A. R. Braun, 2008:  ‘Neural substrates of spontaneous musical perfor-
mance: an fMRI study of jazz improvisation’, PLoS ONE 3/​2: e1679.
Love, S. C., 2012: ‘ “Possible paths”: schemata of phrasing and melody in Charlie
Parker’s Blues’, Music Theory Online 18/​3, http://​​issues/​mto.12.18.3/​ (accessed 9 April 2017).
Mermikides, M., 2010: Changes Over Time: Theory, http://​​phd (accessed 9
April 2017).
Monson, I. T., 1991: ‘Musical interaction in modern jazz: an ethnomusicological perspective’
(PhD dissertation, New York University).
Monson, I. T., 1996:  Saying Something:  Jazz Improvisation and Interaction (Chicago:
University of Chicago Press).
Nachmanovitch, S., 1990: Free Play: Improvisation in Life and Art (New York: Tarcher/
Nectoux, J., 2004: Gabriel Fauré: A Musical Life (Cambridge: Cambridge University Press).
Nooshin, L., 2003: ‘Improvisation as other: creativity, knowledge and power—​the case of
Iranian classical music’, Journal of the Royal Musical Association 128/​2: 242–​96.
Patel, A. D., 2003: ‘Language, music, syntax and the brain’, Nature Neuroscience 6: 674–​81.
Persichetti, V., 1961: Twentieth-​Century Harmony: Creative Aspects and Practice (New
York and London: W. W. Norton).
Pressing, J., 1984: ‘Cognitive processes in improvisation’, in W. R. Crozier and A. J. Chapman,
eds., Cognitive Processes in the Perception of Art (Amsterdam: Elsevier), pp. 345–​67.
Pressing, J., 1988: ‘Improvisation: methods and models’, in J. A. Sloboda, ed., Generative
Processes in Music:  The Psychology of Performance, Improvisation, and Composition
(Oxford: Oxford University Press), pp. 129–​78.
Roads, C., 2004: Microsound (Cambridge, MA: MIT Press).
Sawyer, K., 1992: ‘Improvisational creativity: an analysis of jazz performance’, Creativity
Research Journal 5/​3: 253–​63.
Schillinger, J., 1978: The Schillinger System of Musical Composition (New York: Da Capo Press).
Solstad, S. H., 1991: ‘Jazz improvisation as information processing’ (MPhil thesis,
University of Trondheim).
Stowell, R., 1992: The Cambridge Companion to the Violin (Cambridge: Cambridge University Press).
Stowell, R., 1998: Beethoven: Violin Concerto (Cambridge: Cambridge University Press).
Türk, D. G., [1789] 1982: School of Clavier Playing or Instructions in Playing the Clavier
for Teachers and Students, trans. R. H. Haggh (Lincoln: University of Nebraska Press).
Werner, K., 1996: Effortless Mastery: Liberating the Master Musician Within (New Albany,
IN: Jamey Aebersold Jazz, Inc.).
Wishart, T., 1996: On Sonic Art, ed. S. Emmerson (Amsterdam: Harwood Academic).
Xenakis, I. and S. Kanach, 2001:  Formalized Music:  Thought and Mathematics in
Composition (New York: Pendragon).


Bailey, D., 1975: The Advocate (Album, Tzadik TZ 7618).
Coltrane, J., 1965: A Love Supreme (Album, Impulse! A-77).
Metheny, P., 1976: Bright Sized Life (Album, ECM L1073).
Montgomery, W., 1965: Smokin’ At The Half-​Note (Album, Universal V6-8633).
Phuture, 1987: Acid Tracks (Single, Trax records TX-​142).
Smith, J., 1958: The Sermon! (Album, Blue Note BLP 4011).

Shapes performed
Max Baillie, violinist

3D Bach . . . and the harmonic comet

I tend to translate the abstract stuff of imaginatively conceived sound into
other media: pictures, shapes or structures. As long as these translations are
also imagined, they remain in a sense abstract, but they give me a bearing, a
way to spatially conceive the real-​time experience of music and even the mem-
ory of it. Whether I am in it, or travelling with it, or through it, is determined by
the qualities of a given piece or genre, and the sense of my surroundings in this
imagined space varies as much as the soundworlds themselves and the contexts
in which I hear them. Whichever the journey, there’s no doubt that my role as a
performer influences my sense of music and shape.
I want to deal here with one specific relationship, the crucial role of musician
as translator of shapes on the written page into the abstract realm of sound. It
seems one of the unfortunate realities of the western tradition in music that a
visual medium usually intervenes between the physical act of playing music and
its sounding. The page provides a distraction for the senses, robbing the ears
and body of more complete dedication to sound and the physicality of making
it. Of course, we are grateful to ink and paper as our only inheritance of what
went on in, for example, Beethoven’s mind. But unlike a sculptor or a painter
whose work is visible and exists with a physical constancy we can rely on, he left
us with but a tantalizing potential for something that is spun into existence only
at the ever-​elusive present. As performers we are responsible for the spinning,
and must find a way to connect the imagination of the creator, captured in its
two-​dimensional code, back into its full sonic glory!
But doing this successfully relies on more than the ability to play our instru-
ments both accurately and expressively. Starting from the page, we need to make
sense of what is in one way a whole world of information and in another a quite
minimal set of instructions as to how notes should be sounded according to the
original inspiration that birthed them in a particular combination. Although
what we look at is abundant with shapes, it is in many ways an ungenerous and
often misleading code. Here I want to lead the reader through some of the basic
steps I take towards translating it convincingly and imaginatively. I’ll use Bach’s
solo violin music as an example because I have a strongly spatial sense of it. It
strikes me as essentially harmonic music realized in a (mostly) melodic form, and
it is the combination and balance of these two elements which gives it its shape
and provides the basis for the journey on which the ear travels while hearing it.
To explain what I mean, I first describe my visualization of harmony as a
traveller. I then briefly go through some of the steps I take to translate what’s
on the page into a form which plays out the spaces in this travel and the way
the melodic content interacts with it. In this way I trace the creation of a three-​
dimensional imaginative journey from its beginnings on the page.

The harmonic comet

Picture an anchor embedded into the ground. A comet ignites and takes off
from the anchor up into the air and in its wake leaves an expansive arc which it
completes into a circle when the comet returns to its starting point. In continu-
ous motion, it travels up and into a second revolution. On a third rise from the
anchor the comet seamlessly curves out into a new trajectory creating a second
circle suspended in the air. As you watch, the comet builds an entire structure of
suspended rings in the sky, exploring the unknown space and also returning to
familiar orbits. Eventually, its travels take it back to the anchor where it began.
This image is a metaphor for the journey the ear might travel while hearing
a piece of music—​specifically, music that both is tonal and modulates from
one key to another. The comet is its real-​time flow, the anchor is the home key,
and the circle that emanates from it describes a short harmonic journey from
stasis towards tension and back. The points at which the comet breaks into a
new orbit are the pivot chords, and if a feeling of movement comes from the
changes between chords within one key then it is the pivots that create the
sense of travel: rather than occupying the same space through a modulation,
I imagine the music as travelling from one space to another. The character of
this travel (which may be anything from a sublime cruise to a frantic search or
even a joyous ramble depending on the piece) is down to everything else in the
score: the metre, tempo, voice-​leading, bowing, and so on, but the movement
itself comes from travel between harmonic orbits.
But where do we start in our flat forest of notes (Figure R.13a)? Unlike, for
example, a classical portrait whose structure is clearly assumed, the overwhelm-
ing visual impression in the figures here is of something uniform. At a glance
the image flipped over looks more or less the same (Figure R.13b). Where are
the structural shapes?

FIGURE R.13   The opening of the Allemande from J. S. Bach’s Partita in D minor for solo violin: (R.13a)
as usual and (R.13b) upside down

Harmonic cartography

For both performer and teacher, playing Bach requires some detective work.
I sometimes find it helpful to think of the solo repertoire as a distillation (as
distinct from a reduction). Clearly, it’s not that there’s anything missing, but
there is an invisible hierarchy; the notes don’t all occupy the same function in
the harmonic scaffold. If we dig this structure out (Figure R.14), the shape of
the harmonic journey begins to emerge from a more or less uniform trail of
notes on the staff.
FIGURE R.14   Allemande, bars 1–​8, with a harmonic analysis of tonal centres and harmonic rhythm

FIGURE R.15   The passage in R.14 represented as a physical journey through space between related
tonal orbits

This kind of simple analysis can be made intuitively or more technically.
(I tend towards the intuitive but occasionally sit down with my violin and
strum chords while singing the written line to inform what my inner ear tells
me.) The crucial thing that emerges is a rhythm independent of the note values
themselves, and this rhythm dictates the spaces in our imaginative journey, the
distances covered, the passing places, the explorations and the return home.
Figure R.15 presents the same passage as the depiction of a harmonic traveller.
As performers, we are bound to respect certain constraints of discipline
delineated by the score, but I doubt Bach or any other great composer past or
present believes that the code we read is anything more than a practical aid (at
best) or a distraction (at worst) to real music-​making. Think, for example, of
that most severe and unmusical—​and yet necessary—​of features, the barline,
with its ruthless and rigid carving-​up of the phrase; or the militaristic group-
ings of notes held together by beams, often completely at odds with the way a
phrase groups notes together. What the harmonic map allows us to do is to gain
a sense of the underlying structure of the music, of its inherent shape indepen-
dent of its presentation. We can then phrase it accordingly, which along with
knowing where to give emphasis, informs us also where not to give it.
FIGURE R.16   Allegro assai, bars 1–8, from J. S. Bach’s Sonata in C major for unaccompanied violin

Figure R.16 offers an example where awareness of the harmonic rhythm
results in a shape at odds with the figuration and the general visual impression
of the music. In this opening, bars 5 and 6 are visually distinct from each other
not only in figuration but in Bach’s original phrase markings, as are bars 7 and 8.
From the beginning the harmonic rhythm swings boisterously from tonic to
dominant in each bar, and yet despite the visible differentiation between bars 5
and 6 we are going up a gear on our rustic C major ride: the harmonic rhythm
halves. Bars 5 and 6 belong very much together by virtue of sailing across one
harmony, and the same is true of bars 7 and 8.
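
For readers who like to see the idea mechanically, the change of harmonic rhythm described here can be sketched as a run-length count over a bar-by-bar list of harmonies. The list below is a hypothetical reading, not a transcription of the score: it assumes one harmony label per bar, with tonic and dominant alternating in bars 1 to 4, and it uses the placeholder labels "X" and "Y" for the unnamed harmonies that span bars 5–6 and 7–8.

```python
from itertools import groupby


def harmonic_rhythm(harmony_per_bar):
    """Collapse a bar-by-bar list of harmonies into (harmony, bars_held) runs."""
    return [(harmony, len(list(run))) for harmony, run in groupby(harmony_per_bar)]


# Hypothetical reading of bars 1-8 (an assumption, not an analysis of the score):
# tonic and dominant alternate bar by bar, then each harmony is held for two bars.
bars_1_to_8 = ["I", "V", "I", "V", "X", "X", "Y", "Y"]

runs = harmonic_rhythm(bars_1_to_8)
print([bars_held for _, bars_held in runs])  # prints [1, 1, 1, 1, 2, 2]
```

The printed durations double from bar 5 onwards, which is the sense in which the harmonic rhythm halves even though the surface semiquavers continue at the same rate.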
The shape the listener receives, if the performer emphasizes this harmonic
rhythm (Figure R.17), is totally different than if either the figurations or the
visual impression of each bar as a separate entity guides the performer. It’s
unquestionably more convincing to my ears: it’s as though the music has gone
up a gear; the trajectory is the same but the arc is bigger. The listener hears
the lower timescale expand while the upper, the flow of semiquavers, remains
constant, and it’s as though two timescales of music are bound together: magic!
But there are other layers too. To use another metaphor, if the harmony is
the skeleton, what of the flesh and blood? The next stage in building up our 3D
image is to look at how the harmonic layer interacts with the melodic rhythm,
and I’ve found this equally illuminating.

Multilayered shaping: harmonic versus melodic rhythm

The passagework in Figure R.18 suggests a three-to-a-bar feel, and as such
seems on first hearing quite natural: the notes that pop out are the ones that
don’t belong to the middle register’s noodling accompaniment, the first of
each group of four. But the harmonic rhythm dictates just two beats, the first
and the third, which is infinitely more groovy in giving momentum towards
the harmonic exploration that follows this excerpt. Thus the harmonic rhythm
𝅗𝅥 𝅘𝅥 𝅗𝅥 𝅘𝅥 underpins an upper melodic rhythm 𝅘𝅥 𝅘𝅥 𝅘𝅥 𝅘𝅥 𝅘𝅥 𝅘𝅥. Having dug down into the har-
monic groundwork, the performer who plays with a sense of the underlying
harmonic rhythm allows the listener to experience a dialogue between comple-
mentary layers which interact.


FIGURE R.17   The passage in R.16 showing the harmonic rhythm
FIGURE R.18   The Allegro assai, bars 13–16, showing melodic rhythm

These are examples of what is in the music but not spelled out by the
score: shapes embedded in the text but not immediately visible. To my ears they
play an essential role in making sense of what Bach intended: we must remem-
ber that, although an accomplished violinist, he wrote from the keyboard, with
all its richness of harmony, counterpoint and voice-​leading expressed through
the medium of an essentially melodic instrument in these works for solo violin.
As a listener I feel I want to be led on the deeper path, the harmonic journey,
while enjoying these layers in dialogue; and that is also the way I aspire to bring
them to life as a violinist. The result is often that the phrasing becomes clearer
and also simpler:  the performer makes longer lines where passages belong
together harmonically, and the music gains its multilayered quality where
melodic and harmonic rhythms interplay. Then there is also the whole world of
melodic contrapuntal writing (as opposed to counterpoint between harmonic
and melodic layers) embedded in Bach’s solo violin music; here the challenge
(and joy) as a performer is in the sense of spinning these as dialogue while also
playing one line of music with a single coherence.
Being sensitive to, and inquisitive of, the depth of the text should be a nat-
ural ingredient in a loving realization of this music. Playing with compelling
sound and presence is not enough: if we don’t dig below the surface, and if
we don’t detach ourselves imaginatively from the staff, our performance will
ultimately be one-​dimensional because it will miss out the embedded propor-
tions in the music. The idea of a harmonic comet is one possible metaphor that
enables a player to assign an imagined shape to these proportions and embraces
the idea of the ear as a harmonic traveller. It suggests music as an agent with
a will to explore and with a physical form independent of its undifferentiated
representation on the page. As a way spatially to conceive harmonic patterns
it is an imaginative tool, and when combined with the melodic and rhythmic
layers around this harmonic framework it allows us to bring Bach’s music to
life in 3D.

Shape as understood by performing musicians

Helen M. Prior

This chapter presents findings from a study of performing musicians and focuses
on some of their practices and beliefs related to musical shaping. Musical per-
formance has been studied in myriad ways, and with a wide range of aims
in mind (for a useful overview, see Gabrielsson 2003). Preparation for perfor-
mance (especially memorized performance) has been examined in considerable
detail, with researchers finding expert practice to be a highly structured activity
in which performers focus on three dimensions of a composition:  the basic
dimension, which includes all aspects of the music requiring attention simply
to play the notes of the piece, and which therefore includes technical decisions;
the interpretative dimension, involving decisions about phrasing, dynamics
and tempo; and the performance dimension, which involves every aspect of
the piece that requires attention during performance, including basic, interpre-
tative and expressive performance cues (Chaffin, Imreh and Crawford 2002;
Chaffin et al. 2010). Experts often work on small sections of a piece of music,
determined by the musical structure, before joining these chunks together to
create larger sections as the piece becomes more familiar (Chaffin et al. 2002).
Decisions involved in musical performance preparation have also been exam-
ined, with three main types of performance decision being identified: intuitive,
deliberate and procedural, procedural being previously deliberate decisions
that have become intuitive over time (Bangert, Fabian et al. 2014). Bangert,
Schubert and Fabian (2014) propose a spiral model of musical decision-​
making, in which a musician’s decisions switch from being intuitive to deliber-
ate and from there become procedural:  the proportion of intuitive decisions
thus increases with expertise. As a performer focuses on new musical features,
the cycle is repeated.
Many of the decisions made by performers concern expressive performance,
the teaching and nature of which has been examined extensively (Brenner and
Strand 2013; Davis 2009; Fabian, Timmers and Schubert 2014; Juslin 2003;
Juslin, Friberg and Bresin 2002; Juslin, Friberg and Schoonderwaldt 2004;
Juslin and Madison 1999; Karlsson and Juslin 2008). Particularly useful is the
GERMS model (Juslin 2003), which identifies five essential components of
musical expression. These are: Generative rules, which serve to clarify the musi-
cal structure through timing, dynamics and articulation; Emotional expression,
in which a range of parameters is used by a performer to convey an intended
emotional expression; Random variations, which are unavoidable and essential
for a performance to sound as though it is produced by a human being; Motion
principles, which incorporate the representation of intended and non-​intended
biological motion in sound; and Stylistic unexpectedness, which involves the
creation of tension through the violation of expectations. Though these com-
ponents may not all be considered consciously by performers in their decision-​
making, this division does provide some understanding of what performers are
doing in order to create an expressive performance.
Some studies examine the use of particular types of language, such as
metaphors, in relation to music performance preparation (Barten 1998;
Woody 2002), but few studies examine the use and meaning of only one word.
Usually, such an exercise would be rather futile, as much of the terminology
employed by musicians has a reasonably well-​established definition. Shape,
or shaping, however, appears to have resisted formal definition in relation to
music, and yet seems to be a useful term for performers, as well as for other
musicians. A recent questionnaire study (Prior 2012c) revealed that perform-
ers use the notion of shaping when practising, in rehearsals, when teaching
and when playing music from a wide range of genres. The term was used in
relation to several ideas, from musical structure to musical expression, emo-
tion and tension; and in relation to specific musical features such as phrasing,
melodic line and dynamics. Overall, shape was found to be highly versatile
and multifaceted. This was an interesting finding in itself, but there was no
way in which an in-​depth understanding of shaping could be gained through
these data, gathered as they were in an online questionnaire. A subsequent
interview study allowed greater interaction with a small number of partici-
pants and allowed the development of a model of the ways in which musical
shaping may be used by performers and understood by those studying them.
This study, and the model arising from those data, are presented and refined
within this chapter.

Aim and method

The aim of the interview study was to understand how performing musicians
use the idea of musical shape or shaping.


Ten professional musicians were interviewed; five were violinists and the other
five harpsichordists. The choice of instruments was carefully considered, in
terms of both the researcher’s background knowledge and experience as a
musician and the potential this gave for insight into the techniques discussed by
the musicians, and also in terms of the instruments’ different capabilities, which
seemed likely to prompt interesting variations in the musicians’ conceptions
of musical shaping. Specifically, the differences between the instruments’ abil-
ity to sustain a sound, to produce sounds with a varied dynamic range and to
play chords were noted. A further difference between the instrumentalists was
the violinists’ close knowledge of their own instrument, in contrast with the
harpsichordists’ unfamiliarity with the harpsichord used in the study, a double
manual by Michael Johnson.
Details of the participants can be seen in Table 7.1. They ranged from eighteen
to fifty-four years of age, and their experience playing their instrument
ranged from less than ten years to forty years. They were all resident in the
UK, though some participants were originally from Australia, South America,
Ireland and Japan. Many of them had studied performance at universities and
conservatoires, often to postgraduate level. They were all established
professional performers, the majority of their earnings coming from performance,
though some of them also taught or had research interests.

TABLE 7.1   Participants in the interview study

Name               Instrument    Age Group   Years Playing   Birthplace   Place(s) of Study
Tina*              Violin        25–34       11–20           UK           Manchester University (undergraduate); RNCM (postgraduate); Sheffield University (PG)
Bridget*           Violin        25–34       21–30           UK           TCM (UG)
Elsie*             Violin        25–34       21–30           Australia    Sydney Conservatorium of Music (UG); RCM (PG)
Victor*            Violin        35–44       31–40           Uruguay      Privately (UG); RAM (PG)
Darragh Morgan     Violin        35–44       21–30           Ireland      GSMD (UG); Hong Kong Academy of Performing Arts (PG)
Yoshi*             Harpsichord   25–34       11–20           Japan        Queensland Conservatorium (UG, PG); University of York
Katharine May      Harpsichord   45–54       21–30           UK           RCM (UG, PG)
Jane Chapman       Harpsichord   45–54       31–40           UK           RCM; Sweelinck
Julian Perkins     Harpsichord   25–34       21–30           UK           University of Cambridge (UG); RAM (PG); Schola Cantorum, Basel (PG)
Nathaniel Mander   Harpsichord   18–24       10              UK           RAM

Note: Names with an asterisk are pseudonyms; other participants wished to be named.


The personal experiences and attributes of the researcher are acknowledged to
have an influence on all stages of the research process. For the sake of
transparency, specific details are provided here, similar in nature to those provided
above for the participants. The interviewer was female and (in terms of the clas-
sifications used with the participants) in the twenty-​five to thirty-​four age group
and the twenty-​one to thirty years’ experience group. She had studied music at
a university as an undergraduate before studying music psychology as a post-
graduate, but had maintained involvement in practical music-​making in vari-
ous spheres throughout this time. The researcher did not disclose her musical
experiences explicitly to the participants unless they asked specific questions.


The participants were asked to attend an interview at King’s College London,
and to bring some music with them that they knew well or had been working
on recently. Violinists were asked to bring their instrument. All participants
were given a consent form and a brief demographic questionnaire to complete
before the main interview began. The interview schedule had been developed
using the findings of a previous questionnaire study (Prior 2012c) but was also
designed to incorporate practical music-making. At the beginning of the inter-
views the participants were asked to play a brief musical extract selected for
its potential for musical shaping and its probable unfamiliarity.1 Participants
were asked to play the extract as they would normally approach a new piece
of music, and then to describe what they were thinking about as they were
playing, as they might to a student. After this discussion, they were told that
the study was about musical shape or shaping, and they were asked to play the
extract again, while thinking about the shape, or their shaping, of the music.
They were then asked to describe their thoughts once more. Some participants
were also asked to play an extract without musical shaping and to describe
their thoughts again. Although this procedure could not be expected to allow
direct access to participants’ thought processes (Ericsson 2006; Ericsson and
Simon 1993), it did elicit helpful, descriptive responses that had some degree of
ecological validity.
This task was used as a prompt for further discussion. Participants were
asked how this compared to their usual experiences of shaping music, what they
meant when they referred to musical shape, and about shaping pieces they had
brought with them or knew well. The schedule contents and order were flexible
to ensure that the interviews felt natural and comfortable for the participants.
At the end of the interview, participants signed the consent form and were
compensated for their time. The interviews were recorded using a Panasonic
SD700 HD Camcorder and a Sony ICD-​UX200 Digital Voice Recorder.


The interviews generated verbal, musical and gestural data, all of which were
analysed to some extent (Prior 2012a). This chapter focuses mainly on the ver-
bal data, with reference to some of the musical data. The verbal data were ana-
lysed with Interpretative Phenomenological Analysis (IPA). This approach has
been widely used in health psychology, but has also been found to work well in
research in music psychology (McPherson, Davidson and Faulkner 2012: 92) as
it allows participants’ thoughts and experiences to be examined idiographi-
cally and in detail. In particular, IPA is appropriate for situations in which
researchers are conducting exploratory studies investigating how individuals
are making sense of their personal and social world and the processes within
that world (Smith and Osborn 2003). The use of IPA was particularly appro-
priate here because of the complex and potentially idiosyncratic ways in which
expert musicians perceive and understand their work, as well as the potential
for emotional involvement in their practices. What constitutes a ‘good’ musical
performance is, in part, socially constructed, determined not only by techni-
cal expertise but by the tastes of both the individual and the period in which
they are performing (Leech-​Wilkinson 2009). It therefore seems appropriate to
examine the processes of musical shaping with a method such as IPA that was
developed within the framework of social constructionism.
Data analysis proceeded according to the guidelines for IPA provided by its
pioneers (Smith, Flowers and Larkin 2009; Smith and Osborn 2003). Following
each interview, the recording was listened to in its entirety and initial notes
were made. The data were then transcribed verbatim, but the recording was
used alongside the text throughout the coding process. Initial coding focused
on a phenomenological approach to the data, identifying the main concerns of
each participant and the meaning these concerns had for them. A second stage
of coding followed with an interpretative approach which attempted to iden-
tify how and why the participant had those concerns and to link the phenom-
enological codes to more abstract ideas. The coding was validated by another
member of the research team. Themes were generated from the coded data,
and a summary was written for each participant in relation to each theme. A
summary diagram was also created for each participant. Each interview was
analysed completely before moving on to the next participant’s data.
During the interviews, participants frequently demonstrated their thoughts
about musical shaping on their instrument or by singing. These data were seen
as of equal importance to their verbal descriptions; indeed, some musicians
seemed to feel that they could communicate more effectively through musical
demonstrations than through verbal description. Moreover, these musical
demonstrations can be seen to circumvent, to some extent, the limitations of
the representational validity of language (Willig 2001). Where these musical
examples were seen to shed light on a particular discussion, they were analysed
using Sonic Visualiser (Cannam, Landone and Sandler 2010). Musical demon-
strations were never considered in isolation; rather, multiple examples from one
or more participants were compared in conjunction with the verbal descrip-
tions participants provided. In this way, the musical demonstrations informed
the analysis of the verbal data and the verbal data provided explanations for
particular features of musical shaping that were demonstrated.
During their verbal explanations and musical demonstrations, participants
often used gestures. These gestures are currently the subject of further analysis
and are not considered fully here.

Results and a proposed model

Although the data gathered provided scope for the consideration of musi-
cal shaping in considerable detail, within this chapter a broad view is taken,
with the aim of creating a data-​led model of the use of musical shaping by
musicians. The model (which also acts as a summary of the data) is shown
in Figure 7.1 and is available as well on the companion website, complete
with tables showing examples of each component. On the far left of the model
is the concept or idea of a musical level that can be controlled (or for some
participants, ‘shaped’). Next to this is a column of musical triggers for shap-
ing: features of the music that participants identified as influencing their
shaping decisions. On the far right of the model is the change in sound that
results from the musical levels being controlled or shaped in performance.
These three columns are arranged in approximate size order, with the largest
features at the top and the smallest at the bottom. One of the remaining
two columns in the model outlines the technical modifications that are used
to create this changed or shaped sound on the two instruments studied. The
separation of the two instruments within this column allows for the fact that
each instrument has limitations that restrict a performer’s ability to control the
changes in sound represented in the final column. Although these technical
approaches could be the participants’ main focus of attention, many partici-
pants appeared to ‘skip over’ these detailed decision-​making processes, using
more or less metaphorical ideas like shape heuristically to help them to create
a musically expressive performance, a notion that is represented by the central
column in the model. Because many of the heuristics seemed to be applicable
at multiple levels, they are arranged not in size order, but alphabetically. Each
column and component of the model may be active independently or in combination
with any of the other components at any point in time; components
are not tied to other components of the model, be they similar or different in
nature or scale. Each component of the model is explored here before quotations
are used to highlight the ways in which multiple dimensions of the model
may interact.

FIGURE 7.1   Model of musical shaping. In the online version, each component is numbered, and
numbered examples of each component are presented in linked tables. See the companion website.

Columns of the model, from left to right:
Musical level: Concert; Whole piece; Movement; Section; Phrase; Note
Trigger: View of the score; Musical structure; Words on the score; Harmony; Polyphony; Melodic contour; Rhythm; Patterns; Dynamic markings; Articulation or phrase markings
Heuristic: Audience; Breathing; Composer; Direction; Emotions; Gesture; Imagery; Importance; Instrument; Line; Natural; Shape; Singing; Style
Technical (violin): Bow pressure, speed, angle, contact; LH contact, position, movement
Technical (harpsichord): Registration; Attack speed/weight; Attack/release synchrony (spreading/over-holding); Release speed
Change in sound: Instrumentation; Programme; Tempo; Ornamentation; Timbre; Timing; Dynamics; Vibrato; Articulation


The first column of the model focuses on the musical levels discussed by par-
ticipants in relation to musical shaping. It became apparent through the inter-
views that shape was a very flexible term in many ways, not least in the scale at
which it could be applied. Table 7.2 shows the participants who discussed using
shaping at each level. Examples from all participants may be found on the com-
panion website, and some of these are discussed later in the chapter; here, the
focus remains on a few specific quotations from participants discussing shape
at multiple scales.

TABLE 7.2   Participants who discussed each musical level (see Table 7.1 for their names, represented here by initials)

Musical Level   Violinists (B, D, E, T, V)   Harpsichordists (Ja, Ju, K, N, Y)   Grand Total
Concert         0                            3                                   3
Whole piece     3                            4                                   7
Movement        4                            4                                   8
Section         4                            4                                   8
Phrase          5                            5                                   10
Note            3                            5                                   8

Several participants discussed the use of the term ‘shape’ at more than one
level. Elsie commented that ‘Every note should have some kind of shape. And
every phrase needs to have a shape’,2 and she also explained that her understanding
of the large-scale shape of the music affected the ways in which she shaped at
smaller levels.3 Victor, too, saw ‘shaping’ as a flexible term that could apply to
several levels of the music. When asked to define shape, Victor used metaphors
of language and narrative:

RESEARCHER: So if I asked you to sum up ‘shaping’, what is it, in a

VICTOR: I think it’s making sense, saying a sentence that makes sense.
And it has a starting point, and a development and a climax, and a
resolution, and a stop.
RESEARCHER: OK. And is that on a large scale, or a small scale, or both?
VICTOR: I think both.4

In contrast, Julian used more technical language to describe the slight variation
in meaning that he felt occurred with the use of shape in different contexts:
JULIAN: I suppose if someone said to me . . . ‘What shape does the
music have to you?’ I’d think instinctively they were talking about the
structure. . . So structure and shape sort of overlap in that capacity. If
you’re talking about a phrase, and you said the shape, I’d be thinking
about, as a player, the sort of technical way you might play it, in
terms of grouping of notes, and the articulations . . . what degrees of
staccato or legato do we want . . . in a particular given phrase. But . . .
with baroque music, one note can have shape, a messa di voce, so you
can just, you know, if you were talking to a violinist or particularly
a singer, and you said ‘What shape does that note have?’ you might
immediately think of the swelling and diminuendo of one note.5

For Darragh, however, the term ‘shape’ applied specifically to the phrasing
level, with other words being more appropriate for larger or smaller levels of
shaping that were discussed by other participants:

RESEARCHER: Some people sometimes use shape and structure
interchangeably; would you agree with that, or do you think shape
is different?
DARRAGH: I think shape is to do with, again, this thing of tessitura, of
line, of actual line, whereas the structure, yeah of course, you know,
a bigger question, a bigger picture. . . Point A to point B, to point C,
whatever you want to call it . . . but it’s from there, to there, to there,
to there. And that’s the piece of music.
RESEARCHER: OK. So your shaping is on a relatively small scale?
DARRAGH: Exactly . . . it’s more under a micro-​magnifying glass,
whereas the other is looking at the big picture, isn’t it, probably.
RESEARCHER: OK. . . Are you thinking about shape when you’re playing
a single note, on its own?
DARRAGH: N-​no. You’re thinking possibly about colour, about quality
of sound, about length, because of the bow.6

These ideas could be seen to operate on a spectrum of specificity and scale, with
Elsie and Victor at one extreme, using the term ‘shaping’ flexibly at all levels,
Julian in a more central position acknowledging the slight variation of mean-
ing in the word between small and large scales, and Darragh at the opposite
extreme, reserving the idea for the phrasing level and using other terminology
for variations in sound at other levels. Specific examples of shaping at each level
are discussed later in the chapter.


The second column within the model shows the score-​based triggers identified
by participants as influencing their shape-​related decision-​making. Table 7.3
shows the participants who reported using each idea. In the model and in the
table, the triggers are shown in the order that relates approximately to musi-
cal scale, with large-​scale ideas at the top and small-​scale ideas at the bottom.
Some of the titles of these ideas may seem self-​explanatory; however, others
are more complicated, and therefore the categories are discussed briefly and in
order, with a few examples. Full examples are provided online.

View of the score
This particular musical trigger was usually an overarching philosophical stance
adopted by the musicians relating to how they felt they should use the informa-
tion provided on the score by the composer or the editor. Participants discussed
the idea of ‘shaping as being anything that you’re doing to get the music off the
page, and to the listener’,7 with the score providing clues as to how this might be
achieved.8 Other participants discussed the score as their only tangible connec-
tion with a composer and that composer’s intentions, with Elsie commenting,
‘it’s just you and the composer again’,9 and Victor describing the score as a code
that he has to interpret.10

TABLE 7.3   Participants who discussed each trigger

Musical Trigger                   Violinists (B, D, E, T, V)   Harpsichordists (Ja, Ju, K, N, Y)   Grand Total
View of the score                 4                            2                                   6
Musical structure                 4                            3                                   7
Words on the score                2                            1                                   3
Harmony                           5                            5                                   10
Polyphony                         3                            5                                   8
Melodic contour                   5                            5                                   10
Rhythm                            3                            5                                   8
Patterns                          1                            2                                   3
Dynamic markings                  3                            0                                   3
Articulation or phrase markings   4                            3                                   7

Musical structure
Some of the interviewees discussed musical structure as something that had an
influence on their musical shaping, with Elsie commenting that she is always
aware of her position within the musical structure as she plays11 and confirm-
ing that this influences her shaping on a smaller scale.12 Others described the
ways they would highlight structural boundaries13 or create a sense of structure
through their playing.14 Darragh discussed structure as something his fellow per-
formers frequently liked to be aware of before making interpretative decisions.15

Words on the score
Performance directions,16 words provided by the composer to convey a pro-
gramme or appropriate imagery for a piece,17 and the lyrics of a vocal piece18
were all reported to have a direct bearing on the musical shaping used by the
participants.

Harmony
Harmony was one of only two triggers to be discussed by all participants,
though not all of them felt comfortable in using this trigger themselves, with
Bridget suggesting that she found other methods more intuitive.19 All four of
the other violinists, however, described how harmony could influence their
shaping decisions, with Victor arguing that much of the expressiveness of a
performance can be lost if a performer is unaware of the harmony underlying
the melody they are playing.20 Elsie felt that harmony was central to her
shaping of the music,21 and she, Tina, Victor and Darragh provided numer-
ous examples of how this could occur. The harpsichordists, too, were focused
on harmony for many of their shaping and expressive decisions, with Julian
describing it as one of the first things he looked at in the score;22 and all five
harpsichordists discussed harmonic features of the music in relation to their
shaping decisions.23

Polyphony
Participants discussed musical parts played by others in ensembles influencing
their musical shaping,24 as well as their awareness of ‘voices’ within their own
parts, and of the shaping decisions they made to try to highlight those voices
for their listeners.25

Melodic contour
Like harmony, melodic contour was discussed by all participants. Many dis-
cussed mirroring melodic contours with dynamics26 or highlighting the top of a
phrase through timing.27 Others discussed descending melodic lines or tessitura
more generally.28

Rhythm
Participants discussed the ‘shape of the rhythm’,29 the hierarchical relationships
between beats in a bar,30 and the link between those relationships and bowing
patterns.31 Others discussed the appropriate grouping of particular rhythmic pat-
terns32 and how the shaping of a phrase related to its rhythmic (and other) constit-
uents.33 Tina discussed decisions relating to the musical shaping of syncopation.34

Patterns
Victor, Jane and Katharine all discussed patterns (such as harmonic sequences)
in the music that influenced their shaping decisions.35

Dynamic markings
None of the harpsichordists discussed dynamic markings, probably because
there were none present in their scores. Bridget, Tina and Victor discussed
dynamic markings as a trigger for their musical shaping, though Victor sug-
gested that he did not feel he needed to think consciously about applying them.
Rather, he suggested, ‘I think dynamics fall into place’.36

Articulation or phrase markings

Four violinists and two harpsichordists discussed articulation (or bowing)
markings and how these influenced their musical shaping decisions. Nathaniel
discussed the influence of phrase markings on his musical shaping in some
detail, specifically noting the phrasing indicated by the composer and what this
meant for him as a performer.37

Summary of musical triggers

Overall, a range of musical triggers seemed to prompt the participants in their
shaping decisions. These were rarely used in isolation; instead, they were used
in combination with others in specific ways that were appropriate for the piece
of music being discussed and the instrument on which it was to be performed.
Participants appeared to have preferences for particular triggers, with some
particularly favouring harmonic features and others focusing more on details
such as dynamic or articulation markings or rhythmic features. The next two
sections focus on the means by which participants used these triggers to modify
the sound they produced, namely through technical modifications and through
heuristics for musical expression. These are deliberately discussed in the oppo-
site order to that in which they are presented in the model, so that the most
tangible and concrete concepts are raised before less specific ideas that may, on
occasion, replace them in conscious thought.


As shown in Table 7.4, participants discussed a range of instrument-specific
technical modifications that they could apply in relation to musical shaping.
The categories shown in the table reflect the comments made by partici-
pants in relation to the intertwined nature of the technical modifications they
were able to make. When a participant discussed a change in bow pressure,
for example, they frequently mentioned other changes, such as a modification
in bow speed. Although they also would couple these with changes in the left
hand, such as movements required for clean shifting or for vibrato, these were
sometimes considered separately from bowing considerations, and even if not,
can be separated conceptually simply because of the physical independence of
the two hands. All violinists offered features within these two categories.38

TABLE 7.4   Participants who discussed each technical modification

Violinists (B, D, E, T, V)
Technical modification                              Total
Bow pressure, speed, angle, contact                 5
Left-hand contact, position, movement               5

Harpsichordists (Ja, Ju, K, N, Y)
Technical modification                              Total
Registration                                        3
Attack speed/weight                                 5
Attack/release synchrony (spreading/over-holding)   5
Release speed                                       2

The harpsichordists were slightly more varied in their discussions, with only
some of them mentioning registration. There is no doubt that this is something
carefully considered by all harpsichordists in relation to a performance, but in
the interviews with Julian, Katharine and Nathaniel, it was raised in relation
to musical shaping, and clear indications were made that the registration had
an effect on both the overall shaping of a suite or other set of pieces39 and that
the registration had implications for the shaping of phrases and notes.40 All the
harpsichordists talked about the ways in which they varied the attack speed and
weight of a note, and the synchrony of attack and release (i.e. the spreading
of chords).41 Only two participants discussed the speed at which they would
release the keys.42
Specific examples of technical modifications described and executed by par-
ticipants are discussed later in the chapter.


Although the technical approaches discussed above could be the participants’
main focus of attention, many participants appeared to ‘skip over’ these
detailed decision-​making processes, using more or less metaphorical ideas,
like shape or shaping, to help them create a musically expressive performance.
Participants would sometimes find it hard to discuss some of the technical
approaches mentioned above,43 and would prefer to use terms that were less
specific but perhaps more useful. It seemed that participants were using words
like ‘shape’ or ‘direction’, or ideas about ‘where the music was going’ as heu-
ristics, or ‘short-​cuts based on experience that solve problems too complex to
resolve quickly enough using analytical thought’ (Leech-​Wilkinson and Prior
2014: 36). These heuristics were often metaphorical, and participants seemed
to use them to consider ways of playing the music expressively without having
to focus on specific technical aspects of their playing. Yoshi commented, ‘I
think there’s a lot of things I do, sort of, naturally, that I don’t . . . consciously
think about.’44 A particularly helpful example of this was provided by Victor:

RESEARCHER: I noticed as well, here, . . . because you were ‘heading for
the top’, you were moving up towards the heel of the bow, and
lengthening your bow stroke. Is that a conscious thing that you’re
thinking about, or is it more that you’re just thinking that you’re
heading to the top, and that’s . . . shorthand for all the technical
things that are going on?
VICTOR: Absolutely. No, I didn’t think about that at all. Um, it probably
just happened because it’s integrated. Now maybe subconsciously.45

Some of the metaphorical ideas concerning the music and the appropriate
musical shaping seemed to be expressed through gesture, exposing participants’
multimodal understanding of musical shaping. Participants often discussed
ideas of direction, movement and gesture when talking about their musical
shaping; and, while they did so, they often used gestures in conjunction with
their descriptions or demonstrations of the music. Participants used height to
represent pitch, and vertical gestures to indicate rhythmic features. Arch-​shapes
were used to indicate the shape of a phrase, and larger arches, wave patterns or
circular gestures to indicate the shape of an overall piece (Prior 2012a, 2012b).
Further analysis is intended to investigate whether or not there are specific differences
between the gestures used by violinists and harpsichordists, as well
as correspondences between the gestures used by participants, their verbal
descriptions and their musical demonstrations.
Specific examples of heuristics used by participants are discussed in more
detail later in the chapter; however, Table 7.5 shows the use of a range of heuristic
terms in the interviews and their distribution among participants. Because of
their holistic and nonspecific nature, the terms are not listed in order of size, as
other components of the model have been. Instead, they are listed alphabetically.


TABLE 7.5   Participants who discussed each heuristic

Heuristic    Violinists         Total   Harpsichordists      Total   Grand Total
             (B, D, E, T, V)            (Ja, Ju, K, N, Y)

Audience     ✓ ✓ ✓ ✓ ✓          5       ✓ ✓ ✓                3       8
Breathing    ✓ ✓ ✓              3       ✓ ✓ ✓                3       6
Composer     ✓ ✓                2       ✓ ✓ ✓                3       5
Direction    ✓ ✓ ✓ ✓ ✓          5       ✓ ✓ ✓ ✓ ✓            5       10
Emotions     ✓ ✓ ✓              3       ✓ ✓ ✓ ✓              4       7
Gesture      ✓ ✓ ✓              3       ✓ ✓ ✓ ✓              4       7
Imagery      ✓ ✓                1       ✓ ✓ ✓                3       4
Importance   ✓ ✓ ✓              3       ✓ ✓ ✓ ✓              4       7
Instrument   ✓                  1       ✓ ✓ ✓ ✓ ✓            5       6
Line         ✓ ✓ ✓              3       ✓ ✓ ✓                3       5
Natural      ✓ ✓ ✓ ✓            4       ✓ ✓ ✓ ✓              4       8
Shape        ✓ ✓ ✓ ✓ ✓          5       ✓ ✓ ✓ ✓ ✓            5       10
Singing      ✓ ✓                2       ✓ ✓ ✓                3       5
Style        ✓ ✓ ✓ ✓ ✓          5       ✓ ✓ ✓                3       8

Participants discussed a range of changes in sound that correspond to those
used in expressive performance, namely vibrato, dynamics, timing fluctuations,
timbre, ornamentation and tempo. In addition, participants also discussed the
programme of a concert and instrumentation. Examples of these are highlighted
later in the chapter. Before this, selected overall situational factors that
may influence the working of the model are briefly discussed.


Each column of this model is affected by some overall situational factors, such
as whether the performance decisions are made in private practice, in rehearsal
or in performance; or whether the music involves other performers who influence
the shaping decisions made. Several participants noted the value of performance
for generating ideas about musical shaping, an idea supported by
some existing research (Doğantan-​Dack 2013). Yoshi suggested that she was
‘more alert’ during performance than when practising, which enabled her to
notice new features of the music and to have new ideas concerning the shaping
of those features. If Yoshi considers those ideas to be ‘too risky’ she ‘saves them
for later’, but there are times when she tries new ideas during a performance.46
Both Tina and Katharine valued spontaneity in their ensemble performances,
and noted how other performers would influence their own shaping during a
concert. Tina discussed the ‘communal’ and ‘spontaneous’ shaping of a Haydn
string quartet, stating that ‘the ideal is that at any point, really, one person
might help to guide it in a particular way, so that . . . the contour of the piece
changes’.47 Similarly, Katharine reported that she particularly enjoyed performing
with ‘someone who takes a few risks, and does something spontaneously
that you can then react to: . . . that’s really nice music-making’.48 Hence, it is
anticipated that the situation in which musical shaping is considered will influence
the extent to which each aspect of the model is used.


The value of the model described above lies not only in its ability to outline
various aspects of shaping as discussed by the participants in the study, but also
its potential for showing the combinations of factors used by participants in
specific situations. With this in mind, specific quotations and musical examples
from the interviews are discussed alongside presentations of the model that
highlight which components are active in each situation. Examples showing
shaping at each musical level are offered.

Shaping at the concert level

Katharine considered the varying of instrumentation and programme for an
audience at a concert:
KATHARINE: I’ve just been doing some concerts over the
weekend . . . basically accompanying, you know, a small chamber
group, but on Sunday I actually played the suite, as a break in
the programme from having violin sound or singer. Just so that
there’s . . . variety within the . . . listening experience, I suppose.
You’re not listening to string players the whole time, or whatever, a
little bit of time for something a bit different.49

This could be seen as shaping a programme and the instrumentation of that
programme to suit a listener. It can be represented by the model as shown in
Figure 7.2, available online. The shaping level is designated as the level of
the whole concert, and the programme and instrumentation change as a result
of the consideration of the audience. All these components of the model are
shown in black, whereas the rest of the model is shown in grey.

Shaping at the level of a whole piece or movement

Bridget commented on shaping at the levels of phrasing, movement and the
whole piece. She was using ideas related to the ‘direction’ heuristic (‘where’s this
movement going’) and also the ‘importance’ heuristic (‘where’s the . . . highlight
of this’). The comment does not encompass the effect of these ideas on
the musical sound produced, and therefore this uncertainty is represented by
a question mark over this area of the model in Figure 7.3, available online.
BRIDGET: the main thing that I come across, talking about shape,
whether it’s just on my own, or in rehearsals, in an ensemble, is about
phrasing, like, the small phrases, . . . one line, or the bigger shape of
the whole piece, or the whole movement, so, instead of where’s the,
where’s this phrase going, where’s this movement going, where’s the,
you know, the highlight of this, where’s the whole, all these phrases,
where are they going to?50

Shaping at the movement level

Elsie discussed shaping at the movement level in some detail, describing how
her awareness of the larger-scale structure of a piece of music would affect her
shaping of the piece:
ELSIE: For example, if this was a slow movement, in between two outer
movements, . . . I’d be very, very careful to create a mood in which the
music could just sing. And it would also give the audience a chance
to relax, in between the two outer allegro movements, or presto, or
whatever it is. You have to time pieces throughout the whole thing,
and you have to know when to back off, and when to really, you
know, go for it, I suppose.51

She continued, playing the music in two ways, and describing her thoughts
about what she was doing:
ELSIE: I’ll play it in two different ways. If . . . I was playing this, within
the context of a larger piece of music, and I played it sort of, um
[plays]. That might sound, sort of OK, but I’m still so involved with
it, do you know what I mean? Um, why not just let it go? [sighs] and
give the audience a chance to go, ‘Oh, that’s really nice’ you know, in
between having been gripped for the first thing, you know, so I could
just [plays] and just [plays]. It could give something, just completely
different. And it’s all to do with where the music lies within the whole

These sound examples are available on the companion website, and it is possible
to examine them for differences between the two versions. Using Sonic
Visualiser (Cannam et al. 2010), we identified the main beats of the excerpts
and exported the data for statistical analysis (Table 7.6). The two versions
differed in tempo: the ‘involved’ version had a shorter mean beat length
and was therefore faster than the ‘letting go’ version. In an interview situation,
the significance of this is difficult to assess; however, the variance of
the beat length also differed, with the ‘involved’ version having a significantly
larger variance than the ‘letting go’ version (Levene’s test of homogeneity of
variance: F(1, 26) = 13.9, p = 0.001). The two versions also differed in Elsie’s
use of dynamics. Although the mean power of each excerpt cannot be judged
reliably from this interview source, the variance of the power showed a considerable
difference between the two versions, with the ‘involved’ version having
significantly greater variance than the ‘letting go’ version (Levene’s test
of homogeneity of variance: F(1, 1739) = 15.9, p < 0.001). Some of this was
achieved by using less bow pressure, though Elsie did not specify any other
technical modifications. When listening, one can hear a slight difference in
the vibrato used in each version, with the ‘involved’ version seeming to have
a slightly faster vibrato that begins more promptly after the start of the note
than the ‘letting go’ version.
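
The beat-length comparison described above can be sketched computationally. What follows is a minimal illustration under stated assumptions, not the analysis actually run for this chapter: it assumes that beat onset times have been exported from Sonic Visualiser as lists of seconds, and it computes the mean-centred form of Levene's statistic by hand. The chapter does not say which form of the test or which software was used, and the onset values and function names below are hypothetical, invented purely for illustration.

```python
from statistics import mean, variance


def beat_lengths(onsets):
    """Convert successive beat onset times (seconds) into inter-beat intervals."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


def tempo_bpm(mean_beat_length):
    """Beats per minute implied by a mean beat length given in seconds."""
    return 60.0 / mean_beat_length


def levene_statistic(*groups):
    """Return (W, df_between, df_within) for the mean-centred Levene test.

    W is the one-way ANOVA F ratio computed on the absolute deviations of
    each value from the mean of its own group.
    """
    deviations = []
    for group in groups:
        centre = mean(group)
        deviations.append([abs(x - centre) for x in group])
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    grand_mean = mean([z for d in deviations for z in d])
    between = sum(len(d) * (mean(d) - grand_mean) ** 2 for d in deviations)
    within = sum((z - mean(d)) ** 2 for d in deviations for z in d)
    df_between, df_within = k - 1, n_total - k
    return (between / df_between) / (within / df_within), df_between, df_within


# Hypothetical onset times in seconds (NOT data from the study).
involved_onsets = [0.00, 0.45, 1.10, 1.58, 2.30, 2.78, 3.40]
letting_go_onsets = [0.00, 0.62, 1.25, 1.86, 2.50, 3.12, 3.75]

involved = beat_lengths(involved_onsets)
letting_go = beat_lengths(letting_go_onsets)

w, df1, df2 = levene_statistic(involved, letting_go)
print(f"mean beat length: {mean(involved):.3f} vs {mean(letting_go):.3f}")
print(f"variance of beat length: {variance(involved):.4f} vs {variance(letting_go):.4f}")
print(f"Levene W({df1}, {df2}) = {w:.2f}")
```

If the mean beat lengths reported in Table 7.6 are in seconds (an assumption; the unit is not stated), `tempo_bpm(0.562)` and `tempo_bpm(0.625)` give roughly 107 and 96 beats per minute respectively, which agrees with the observation that the ‘involved’ version was the faster of the two.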
TABLE 7.6   Differences between Elsie’s two versions of the extract [00:11:00]

                          Involved   Letting Go
Mean beat length          0.562      0.625
Variance of beat length   0.010      0.003
Mean power                −15.8      −15.5
Variance of power         48.5       31.6

When representing the whole of this quotation with the model, we can see
that Elsie is considering the shaping of a piece and movement as a whole, and
that she is considering the musical structure of the whole work as a trigger for
her musical shaping. She is using the heuristics of ‘audience’, ‘emotions’ and
‘singing’, and employing technical modifications using both hands to modify
the overall tempo, the timbre, the timing fluctuations, the dynamics and the
Interestingly, Julian discussed a similar idea to Elsie’s ‘letting go’:
JULIAN: sometimes if you make things, if you emote things too much,
it can become a bit wearisome to listen to. Sometimes just sort of,
stating simplicity is beautiful in itself. . . And I think that might
apply here. So I mean, I was sort of suggesting a lot of things, and
I think, if I was performing it in a concert, I might sort of throw a
lot of those, not throw them away, but, just make them a secondary
consideration, just for the, just going for a simple reading . . . So it’s
not too . . . convoluted.53

His performance preparation has involved ideas of shaping, but he sometimes
approaches a performance with the desire to give a simpler performance.
Perhaps, as Elsie’s data suggest, a ‘less shaped’ performance can sometimes be

Shaping at the section level

Nathaniel discussed the shaping of phrases over longer sections of the music,
noting the harmony, melodic contour and patterns such as repetition, but also
using heuristics of the composer, suggested by his use of ‘he’ (the composer)
rather than ‘the music’ or ‘it’; direction, indicated by terms such as ‘goes’,
‘going’, ‘that way’ and ‘all the way to here’; emotions, suggested by the words
‘amazing’, ‘miraculous’, ‘incredible’ and ‘defeated’; and style, indicated by his
comment about the significance of repetition in baroque music. He does not
discuss the technical means by which he modifies the sound, but he mentions
changes in timing. This quotation is represented on the model in Figure 7.5,
available online.
NATHANIEL: So at the end of that, um let’s just see [plays] then you
start again [plays] and then, this time, he goes up [plays]—​isn’t that
amazing?—​and back down, before he goes that way. Then he goes
on, B♭ minor [plays] and on, C minor [plays] and then he keeps
going with an extended phrase, all the way to here and here, all the
way to the dominant, and then when he gets here, it’s miraculous,
[plays] and then we get [plays] that chord [plays] which is just
incredible, isn’t it? [plays]. So we get [plays] and away, and
there it is again [plays] which, I suppose, in the baroque, repetition,
it’s, it’s all about something more. So actually [plays] this time,
I think it’s even more defeated [plays] so I get slightly slower

Shaping at the phrasing level

Participants gave many examples of shaping at the phrasing level, and so several
examples are considered here. Jane discussed shaping phrases using vivid

JANE: But I’m thinking of . . . not just the shape that’s up and down, . . .
I was thinking of shapes that swell. Again, it’s my three-​dimensional
thing, something that swells out, like a kind of serpent with
swellings in its body! [laughs] . . . So it not just a slippery snake that
goes like that, it’s something that kind of opens out and expands. . .
RESEARCHER: Can you tell me where?
JANE: Where it is, I suppose again, it would come to the harmonic thing
[plays]. That’s a sort of [plays], that’s a ‘here I am’ [plays], a sort
of visible [plays]. That to me is where he’s swelling out . . . puffing
himself up, but still he’s got energy to carry on [plays]. Now that
could be either [plays]; that could be just going away to nothing so,
I suppose the shape of that, thin shape, fat bulbous shape, starting
fairly bulbously, getting thinner, more bulbous as it comes down
again, and then going off to, just disappearing off. . . Which is . . . the
way the harpsichord works; you could do it completely the opposite
on the piano, because of . . . the dynamics, so in a way, the lack of
dynamics, . . . means that you have to follow, what the instrument’s
telling you. . . While on the piano, I could play that [sings] at the end,
but on the harpsichord I can do the [plays]. Some holding, but, could
do that I suppose [plays]. . . And the fact that he’s put er, lines over
each one, shows a kind of gestural [plays], gestural shape [plays].
Slightly rounded at the end there.55

When represented on the model, this quotation highlights the phrase level,
triggers of harmony and melodic contour, heuristics of gesture, imagery,
instrument and shape, technical modifications relating to over-​holding, and
timing as a change in sound (see Figure 7.6, available online).
Victor discussed shaping a phrase in slightly more prosaic terms, though
he too was frequently emotionally invested in the music he was discussing
and playing. He noted the musical triggers of harmony, melodic contour
and rhythm, using the heuristic of the audience (listener) and the metaphorical
imagery of communication to convey his ideas. The following quote is
represented in Figure 7.7, available online:
VICTOR: it’s about how the listener will receive something that makes
sense. So how the melody, how the phrase is made up, is completely
unique, and it’s made up of technical considerations of rhythm,
pitch, harmony, of where the top point is, where it’s going, how
fast it’s getting there, . . . how slowly or fast it unravels, how it does.
I think phrasing’s about being able to see that, from this [indicates
Tina discussed wide-​ranging parts of the model when talking about shaping
a phrase. She discussed musical triggers of melodic contour, harmony and
dynamic markings, and heuristics of audience, direction, imagery and line, as
well as changes in sound relating to timing and dynamics. Her quote is represented
in Figure 7.8, available online:
TINA: Yes, I suppose the shape of a phrase, whether it goes up
or . . . down, for example, the first line, thinking of it generally,
growing up to the top, and down again . . . 
RESEARCHER: So is it the pitch you’re thinking about, in terms of the
shape, or—​
TINA: Pitch, and, well, the dynamic, which is written in anyway. And
direction, so . . . some sort of forward movement towards the higher
point of it, so sort of trying to reach the top of it and then perhaps
away, and relaxing on the way back down again.
RESEARCHER: Do you mean forward movement in terms of tempo, or a
combination of things, or—​
TINA: Um, not exactly tempo, not an accelerando, but a sense of it.
Someone I know describes things as, you play them either in the
present tense, or the future, or the past, so I s’pose, if you play
something in the future, you’re sort of looking forwards . . . um,
which doesn’t exactly mean you play . . . faster, . . . it means you’re
sort of on the front edge of maybe, of what you think the tempo is,
rather than the back edge.
RESEARCHER: Yeah, OK. . . Would you describe that as rubato, or is it
not quite as much as that?
TINA: It’s not, no, not as much as that, just a general sense, I suppose
a sense of ‘line’ through . . . some kind of thread that, your, sort
of, intention, that comes across. . . I suppose if you’re speaking, if
you’re reading something out loud, you make sure the words within a
sentence carry on, even though you have to articulate each word and
things, but you don’t [pauses] pause [pauses] until you get to the end
of the sentence, you make sure you’ve got there, I suppose.57
Darragh discussed a technically and perceptually complex passage from
Bach’s E minor Partita which contains implied polyphony. He noted how for
this particular passage, little conscious shaping was required, an approach
which is supported by recent research (Davis 2009), whereas at the end of the
passage he would begin shaping the music once more. It was apparent from
his playing that he was referring not only to the timing fluctuations within
his performance, but also the dynamic range, the timbre or tone colour, and
DARRAGH: So, you know, in a sense, you know, here we are talking
about music and interpretation, and shape, and phrasing, and all
that, but you know, music’s always been there, because it’s Bach,
and it’s genius, and it’s probably the best composer ever, but, if you
just kind of can play well, in tune, for that particular passage, and,
you know, technically be in control of what you’re doing, already
a certain amount of the battle is won, isn’t it? . . . he’s written the
genius into it, hasn’t he? . . . all I’m saying is you don’t, there’s not
anything, in one sense, you need to add to this particular passage, for
instance, right from [plays]. You know, if I was just really struggling
with that, [plays] you wouldn’t have the right flow. . . But because I’m
lucky enough, and I gradually worked it out, uh, where my bow’s
meant to be, uh, [plays] then at the end you start, yeah, shaping
again, or, taking time for the music to breathe again.58

When represented on the model (see Figure 7.9, online), this quotation highlights
shaping at the phrase level, with musical triggers of melodic contour and
(implied) polyphony, heuristics of breathing, composer and shape, technical
modifications in both hands (the left hand has complex fingering patterns and
shifts, and the bow moves in changing patterns relating to string crossings),
affecting the timbre, the timing and probably the dynamic variation and vibrato.

Shaping at the note level

Elsie gave a very clear example of shaping a single note. Her quote features the
musical triggers of harmony and rhythm; heuristics of the audience, emotion,
instrument (specifically, the baroque bow) and shape; technical modifications
undertaken by both hands; and a change in sound in timbre, timing, dynamics
and vibrato (heard in her demonstrations). The following quote is represented
on the model in Figure 7.10, available online:
ELSIE: Well what I would say, is, um, look at the shape of the bow, and,
and how long you have to do that note, you know [plays] it’s about
[plays] two seconds’ worth, I suppose, if we’re going to be really
analytical about it, and . . . the note needs to have a shape, so where’s
the . . . middle of that note going to be? . . . And um, [plays] basically,
it’s the bar lines, ’cos you know [plays] to get the maximum emotional
impact, I suppose you have to time the middle of that note to
coincide with this [plays] the clash [plays]. Then you’re really gonna
get the audience going, ‘Oh wow!’ you know? ’cos dissonances are
much, much more interesting sometimes than consonances.59

Yoshi, too, described shaping a single note, providing considerable detail about
the physical interaction between her arms and fingers and the keyboard of the
harpsichord, and about the resulting differences in the sound produced:
YOSHI: I think sometimes, it’s the way you drop. [plays] If you just let
the weight of your fingers drop, or if you do it a little bit more [plays]
instant, not force, but just a little bit of ping on your finger, and then
you get more of a clear start to the sound. And if you, you can use
the flat bit of your finger, then it’s a little bit [plays] um, milder, a
little bit more sort of gentle, sort of plucking. . . I think the weight,
the speed, and also the angle . . . of the fingers will sort of, I think,
[plays] I guess you have more control [plays] when it’s flatter . . .
[plays] rather than that, but then, and then you sort of, sometimes,
just give it a little kick, and that’s a little bit . . . uh, it’s a little bit
more clear at the beginning, and somehow louder as well. [plays]60

When represented on the model, the heuristics of gesture, imagery and instrument
are highlighted, reflecting Yoshi’s discussion of the movements she is
making, the metaphorical ideas surrounding those movements (‘ping’, ‘little
kick’, etc.), and the technicalities of the instrument she is playing. She is discussing
the attack of a single note, and therefore this is highlighted in the
musical level and technical modifications areas of the model. The change in
sound discussed concerns the timbre and dynamic of the note produced. This
can be seen in Figure 7.11, available online.


It is clear from the examples shown that multiple components of the model are
frequently used by participants at once. Each broad category can be thought
about in isolation or considered in relation to another. Often, technical modifications
may not be thought about on a conscious level, with performers thinking
instead of heuristics to achieve their desired change in sound. Nor are musical
triggers always thought about consciously. Different participants seemed
to favour particular components, suggesting that, over time, performers may
develop their own preferred means of thinking about musical shaping that are
represented in numerous areas of the model. It is worth bearing in mind, however,
that the model was built from data gathered in one interview with each participant,
and is unlikely to represent the full scope of the shaping experience. It
does, however, provide a picture of some of the ways in which these performing
musicians conceptualize and use the notion of musical shaping.
The model provides a new perspective on performance preparation, partly
because it is focused on musical shaping, rather than on performance preparation
in general. Some components seem to correspond with aspects of existing
research findings. In relation to research in expert practice (Chaffin et al.
2002; Chaffin et al. 2010), many of the musical triggers, some of the heuristics,
and many of the technical modifications may be involved in the formation
of interpretative performance cues. Some of the heuristics also seem likely to
be involved in the formation of expressive cues. Many of the participants discussed
the musical structure (and harmonic goals within the music, which are
necessarily related to musical structure), around which expert practice is often
organized. Parts of the model may also be considered in the light of research
in musical decision-making discussed earlier (Bangert, Fabian et al. 2014;
Bangert, Schubert and Fabian 2014). While musical triggers appear to prompt
all types of decisions concerning musical shaping, deliberate decision-making
seems likely to involve technical modifications, while more intuitive decisions
may involve heuristics.
It may be possible to relate particular components of the model to Juslin’s
(2003) GERMS model of musical expression. Many of the musical triggers are
likely to be generative features, and therefore are highlighted with some of the
changes in sound in the right-hand column. The heuristics of audience and emotions
are likely to aid the performer in generating an intended emotional expression.
While random variations are not intended by the performer, this component
might perhaps be related to technical modifications. Motion principles may be
created with the aid of heuristics such as breathing, direction, gesture, line, natural
and shape. Finally, stylistic unexpectedness may be aided with the heuristic
of style. Future research could use the research methods commonly employed in
performance-preparation studies to ascertain whether or not the above speculations
hold true, or whether there may be ways of combining this model of musical
shaping with other aspects of performance preparation to create an overarching
model. It might also be possible to develop ways of representing individuals’ personal
preferences in their understanding of musical shaping, or the particular
components used within one practice session or rehearsal. These could be used in
studies of performance preparation to examine participants’ shaping focus and
how this changes over time. Such representations could also be used in a study
of ensembles to investigate the dynamics of musical shaping in a group setting.
Does the shaping within a rehearsal switch between the preferred modes of the
members, or does it remain more constant? Are the dominant shaping modes
related to the music performed or the members of the group, or both?
Although the model represents the findings of the ten interviews discussed,
it may have the potential to be applied to other performers, and this would
be desirable in the search for an overarching conceptual model of shape or
shaping for musicians. When assessing the model’s generalizing potential, we
need to take several considerations into account. The two sample instruments
are technically very different in how they generate sounds, in the techniques
required of performers, and in the sounds themselves. There were, however, few
(if any) systematic differences between the responses of violinists and harpsichordists
in terms of the levels at which musical shaping could be applied or discussed,
musical triggers for shaping, or the heuristics for performance. Rather,
it seemed as though participants had individual preferences for these features of
the model. These similarities suggest that another sample of classical musicians
would discuss shaping in the ways suggested here. A future study might look at
wind or brass players, or singers. Another interesting group might be players
of untuned percussion instruments: we could hypothesize that they might be
focused on rhythm, but to what extent do they shape what they play according
to the melodic and harmonic features of other parts?
Further studies might also establish whether or not the model has the potential
to be generalized to western performers who are less reliant on a score, such
as musicians within the broad popular genre or jazz musicians. Within Chapter
8 of this book, Greasley and Prior argue that the performers of popular music
share responsibility for the shaping of the final sounds of the songs with others,
such as sound engineers, and indeed, classical musicians in recording settings
and certain live performance situations may also recognize this idea. The model
might therefore need to be extended to encompass the performers’ awareness
of and interaction with these other contributors; this is something that necessitates
further empirical study.
In its current form, this model offers an understanding of musical shaping
from the perspective of classical performing musicians. While the terms ‘shape’
and ‘shaping’ are commonly used by performers, their meanings have not previously
been defined in relation to music. This model confirms the flexibility of
the term, highlighting its ability to be used in relation to all levels of the musical
structure; the influence of an array of musical triggers on performers’ shaping
decisions; the use of shape as one of a number of heuristics for expressive
performance; the technical modifications required to shape a note, phrase, section,
etc.; and the change in sound that results. At the very least, the data and
the resulting model have allowed some understanding of the commonly used
phrase ‘That was a beautifully shaped performance’, and that understanding
may perhaps help others to achieve that elusive goal.


This work was supported by the AHRC Research Centre for Musical
Performance as Creative Practice (grant number RC/AH/D502527/1). The
author is most grateful to David Mackin of Greengate Publishing Services for
producing the figures for this chapter.


Bangert, D., D. Fabian, E. Schubert and D. Yeadon, 2014: ‘Performing solo Bach: a case
study of musical decision-​making’, Musicae Scientiae 18/​1: 35–​52.
Bangert, D., E. Schubert and D. Fabian, 2014: ‘A spiral model of musical decision-making’,
Frontiers in Psychology 5/320 (accessed 9 April 2017).
Barten, S. S., 1998:  ‘Speaking of music:  the use of motor-​affective metaphors in music
instruction’, Journal of Aesthetic Education 32/​2: 89–​97.
Brenner, B. and K. Strand, 2013: ‘A case study of teaching musical expression to young
performers’, Journal of Research in Music Education 61/​1: 80–​96.
Cannam, C., C. Landone and M. Sandler, 2010: ‘Sonic visualiser: an open source application
for viewing, analysing, and annotating music audio files’, paper presented at the
ACM Multimedia 2010 International Conference, Firenze, Italy, 25–29 October 2010.
Chaffin, R., G. Imreh and M. Crawford, 2002: Practicing Perfection (Mahwah, NJ: Erlbaum).
Chaffin, R., T. Lisboa, T. Logan and K. T. Begosh, 2010: ‘Preparing for memorized cello
performance: the role of performance cues’, Psychology of Music 38/​1: 3–​30.
Davis, S., 2009: ‘Bring out the counterpoint: exploring the relationship between implied
polyphony and rubato in Bach’s solo violin music’, Psychology of Music 37/​3: 301–​24.
Doğantan-Dack, M., 2013: ‘Familiarity and musical performance’, in E. King and
H. M. Prior, eds., Music and Familiarity: Listening, Musicology and Performance
(Aldershot: Ashgate), pp. 271–88.
Ericsson, K. A., 2006:  ‘Protocol analysis and expert thought:  concurrent verbalizations
of thinking during experts’ performance on representative tasks’, in K. A. Ericsson, N.
Charness, P. J. Feltovich and R. R. Hoffman, eds., The Cambridge Handbook of Expertise
and Expert Performance (Cambridge: Cambridge University Press), pp. 223–​41.
Ericsson, K. A. and H. A. Simon, 1993:  Protocol Analysis:  Verbal Reports as Data
(Cambridge, MA: MIT Press).
Fabian, D., R. Timmers and E. Schubert, eds., 2014: Expressiveness in Music Performance:
Empirical Approaches across Styles and Cultures (Oxford: Oxford University Press).
Gabrielsson, A., 2003:  ‘Music performance research at the millennium’, Psychology of
Music 31/​3: 221–​72.
Juslin, P. N., 2003: ‘Five facets of musical expression: a psychologist’s perspective on music
performance’, Psychology of Music 31/​3: 273–​302.
Juslin, P. N. and G. Madison, 1999: ‘The role of timing patterns in recognition of emotional
expression from musical performance’, Music Perception 17/​2: 197–​221.
Juslin, P. N., A. Friberg and R. Bresin, 2002: ‘Toward a computational model of expression
in music performance: the GERM model’, Musicae Scientiae (Special Issue
2001–2): 63–122.
Juslin, P. N., A. Friberg and E. Schoonderwaldt, 2004: ‘Feedback learning of musical expressivity’,
in A. Williamon, ed., Musical Excellence: Strategies and Techniques to Enhance
Performance (Oxford: Oxford University Press), pp. 247–70.
Karlsson, J. and P. N. Juslin, 2008: ‘Musical expression: an observational study of instrumental
teaching’, Psychology of Music 36/3: 309–34.
Leech-​Wilkinson, D., 2009: The Changing Sound of Music: Approaches to the Study of
Recorded Musical Performances, http://​​studies/​chapters/​intro.html
(accessed 9 April 2017).
Leech-Wilkinson, D. and H. M. Prior, 2014: ‘Heuristics for expressive perfor-
mance’, in D. Fabian, R. Timmers and E. Schubert, eds., Expressiveness in Music
Performance: Empirical Approaches across Styles and Cultures (Oxford: Oxford
University Press), pp. 34–57.
McPherson, G. E., J. W. Davidson and R. Faulkner, 2012: Music in Our Lives: Rethinking
Musical Ability, Development and Identity (New York: Oxford University Press).
Prior, H. M., 2012a: ‘Methods for exploring interview data in a study of musical shap-
ing’, paper presented at the 12th International Conference on Music Perception and
Cognition (ICMPC) and 8th Triennial Conference of the European Society for the
Cognitive Sciences of Music (ESCOM), Thessaloniki, Greece, 23–28 July 2012.
Prior, H. M., 2012b: ‘Multi-​modal understandings of musical shape: a comparison of
violinists and harpsichordists’, paper presented at the SEMPRE 40th Anniversary
Conference, Institute of Education, London, UK, 14–​15 September 2012.
Prior, H. M., 2012c: ‘Shaping music in performance: report for questionnaire participants
(revised August 2012)’, http://​​wp-​content/​uploads/​2015/​09/​Prior_​
Report.pdf (accessed 9 April 2017).
Smith, J. A. and M. Osborn, 2003: ‘Interpretative phenomenological analysis’, in J. A. Smith,
ed., Qualitative Psychology:  A  Practical Guide to Research Methods (London:  Sage),
pp. 51–​80.
Smith, J. A., P. Flowers and M. Larkin, 2009:  Interpretative Phenomenological Analysis
(London: Sage).
Willig, C., 2001: Introducing Qualitative Research in Psychology: Adventures in Theory and
Method (Maidenhead: Open University Press).
Woody, R. H., 2002: ‘Emotion, imagery and metaphor in the acquisition of musical perfor-
mance skill’, Music Education Research 4/​2: 213–​24.
Simon Desbruslais, trumpeter

Expressive freedoms in trumpet performance

During a rehearsal of Johann Sebastian Bach’s Cantata BWV 51 in Oxford in
2008, a colleague, whom I had invited to listen and to observe, advised quite
simply: more shapes. The aim, I believe, was to lift the notes further from the
page to create a more nuanced and stylish performance. This suited both the
contrapuntal edifice of Bach’s music and the period instruments that we were
using. On this occasion I  was leading the ensemble and therefore in posses-
sion of greater authority than usual. I have nonetheless had similar subsequent
experiences of this piece, and of similar repertoire, where I have possessed artis-
tic licence to create a microcosm of musical shapes not found in the notated
score. Indeed, I have found that such practice continues to be strongly encour-
aged within this genre.
The emancipation of expression in performance has been a focus of my
early career, significantly contrasting with my formative musical training in
symphony orchestras where such freedom was at a greater premium. Certain
trumpet repertoires, roles and trumpet types invite more creative expression
than others, often as a consequence of notational detail, style and function.
Drawing on my experience, in this Reflection I introduce a selection of reper-
toires from baroque to contemporary music to examine these notions.

Trumpet sound and character

John Wilbraham (1944–98), a prominent British trumpeter, once remarked that
‘the trumpet is an inanimate object. It will only make a sound if you drop
it’. These words ring true for every trumpet (and brass) player, who is typi-
cally required to spend many years developing and maintaining an efficient
embouchure to ‘manufacture’ sound. The physical instrument merely acts as
a device for resonance, and while trumpets do have individual sound quali-
ties (particularly dependent on their alloy) the final sound product is primarily
determined by the physical and technical attributes of the performer. There
are limitless possibilities: the shape of the mouth and tongue, embouchure
strength, type of instrument and size, and mouthpiece depth and density can
have a direct impact on the tone quality.
Many trumpeters aspire to play in an orchestra. Perhaps this is due to the
empowering position a trumpeter commands at the helm of a large group of
instrumentalists, or, more circumstantially, to a lack of solo repertoire (George
Enescu’s Légende of 1906 was the first solo composition by a prominent com-
poser since Johann Hummel’s Trumpet Concerto in 1803). For a trumpeter
wishing to perform professionally as a solo or chamber musician there are very
few available opportunities; these are generally self-​made, resulting in orches-
tral performance being, for many, the only realistic option. In an orchestra the
ability to play in a ‘section’ is paramount, which requires an awareness of other
performers and, most importantly, an ability to mirror the style of the princi-
pal player. The sound of a trumpet section should be homogeneous, meaning
that many forms of artistic creativity are sidelined. Particularly when playing
second trumpet, the role is to blend in: one must not ‘stick out’, but follow the
intonation, style and sound of the rest of the section as closely as possible,
often to the point of mirroring instrument and mouthpiece types.
However, while homogeneity is expected within a section, it is not an inher-
ent characteristic of the instrument. Different trumpeters produce different sounds.
When the role of a section is not required—​such as a prominent orchestral
solo, concerto, recital or chamber music performance—​the expressive oppor-
tunity to shape and colour the sound is presented. Accuracy (while important)
is no longer the overarching concern; originality and distinctiveness are para­
mount, tailored by vibrato, tone, phrase shape, dynamics, articulation and
rubato. The following examples represent creative independence in a variety
of repertoire to illustrate how certain types and roles engender greater freedom
than others.

Expressive freedoms

Bars 57–61 from Gloria II of Bach’s B minor Mass illustrate an approach
to shape that has been influenced by the physicality of the baroque trumpet
and the HIP (historically informed performance) movement. The scarcity of
expressive markings, save the slurs in bars 57–​58, encourages expressive free-
dom. I approach this passage in five ways that conflict with the approach of
many trumpeters using modern instruments: (1) ‘phrasing-off’, (2) diminuendo
towards high notes, (3) upper-note trill, (4) no vibrato and (5) slurred semi-
quaver couplets. Figure R.19a recreates the original markings from the Bach
Gesellschaft edition, followed by a transcription of one possible interpretation
(Figure R.19b). Notably, both second and third trumpets also have creative
roles in this style of writing.

FIGURE R.19 Bach, B minor Mass, Gloria II, bars 57–61: a) Gesellschaft edition, followed by b) a
notated interpretation

This interpretation could be related to the physical baroque instrument.
‘Phrasing-off’ slurred couplets (emphasizing the first note) helps stamina,
accords with extant treatises (such as Quantz [1752] 2001) and creates a
nuanced and layered musical character. The upper (clarino) register is quiet; we
know this both from experimentation with surviving instruments and from the
orchestration of works such as Bach’s Second Brandenburg Concerto, where
trumpet must balance with the concertino of violin, oboe and recorder. Upper-​
note lip trills, it may be argued, are easier to play. Particularly when using a
large baroque mouthpiece, vibrato is very hard to create and unnecessarily
exhausting. Finally, semiquavers are easier to perform when slurred in couplets.
While HIP informs performances (the approach described here is represen-
tative of many period trumpeters) the performer is still freer to choose than in
a detailed contemporary score. And although I have reached something of a
norm in my performance of this extract, I remain free to change; the scarcity of
dynamic markings encourages a creative approach to phrase shape.
The next passage (Figure R.20) is something that I have heard hammered out
in modern groups, where the triads are interpreted as loud articulations against
Bach’s complex counterpoint. However, this approach, perhaps influenced by
the sound of the piccolo trumpet, misses the opportunity to shape this phrase.
A diminuendo towards the final top C (a sounding D) provides an elegant alter-
native, emphasizing instead bar 113 as the dynamic climax.

FIGURE R.20 Bach, B minor Mass, Cum sancto spiritu, bars 111–17

Though it creates an attractive musical shape, this approach is harder to
perform. Some shapes, however, can make a trumpeter’s life easier. I have heard
many times (and I  am sure this is true for other instrumentalists) that one
should make a long note ‘travel’ or ‘go somewhere’. This can be a psychological
tactic to encourage the performer to breathe or bow ‘through’ a note—​to work
harder as the note progresses—​in order to sustain a long note where the effect
would otherwise be static. However, this can also form dynamic and colour
shapes. The example in Figure R.21 is taken from the third volume of Güttler’s
collection (1970) of Bach’s trumpet music, the definitive text for professional
performers. It is my personal copy, which I  have used for many live perfor-
mances on the natural trumpet. I find that the imaginary slurs help to remind
me of the overall shape and direction of the phrase (it is no coincidence that
I am supportive of Schenkerian analysis). I want to see how the pitches relate
to each other: rather than symphonic technique, where pitches are accurately
punctuated with a uniform character, I want to understand, and remind myself
in performance of, the larger musical line.
The extended melodies of the nineteenth century, enabled by the invention of
the chromatic, valved trumpet, encouraged and lent themselves to more extensive
colouration. This period introduced the modern orchestral solo, which became
a platform for performers to show both technical assurance and the individual-
ity of sound colour. However, it was originally the cornet that was assigned the
freedom to perform expressive solos, while the baroque trumpet was demoted
to a less creative role with a function mainly to articulate (Lawson and Stowell
1999: 130–2). This was due in part to the loss of high baroque trumpet technique.
Nineteenth-century orchestral solos often have a sense of expressive freedom, of
which the famous cornet solo in Pyotr Tchaikovsky’s Swan Lake Suite is exem-
plary (Figure R.22). These moments highlight the individuality of the principal
trumpeter’s sound, line and musical shapes. Generally, tied notes encourage both
a small dynamic change and a change in vibrato and colour.

FIGURE R.21 J. S. Bach, Complete Trumpet Repertoire, Vol. III with my annotations (used by kind
permission of Breitkopf & Härtel, Wiesbaden).

FIGURE R.22 Tchaikovsky, Swan Lake Suite Op. 20a, ‘Intrada’, rehearsal mark 13

FIGURE R.23 Pritchard, Skyspace (2012), third movement, notated for piccolo trumpet in A, bars 1–8
(used with permission).

The often complex styles of contemporary music have encouraged a
‘straighter’ approach which mirrors that of much baroque performance.
Playing with vibrato is more exhausting, requiring greater stamina, and the
high demands of contemporary music are more easily met with a straighter
sound. This is not to say that vibrato is completely avoided, but it is not the
primary colour. Similar physical strength is required to play the clarino trum-
pet and contemporary repertoire.
Individual nuances in musical phrase, however, are as important in many
contemporary works as they are in the baroque. While Deborah Pritchard’s
piccolo trumpet concerto, Skyspace (Figure R.23), contains meticulous atten-
tion to notational detail, I add my own character (indeed, I do not think that it
is possible to notate every expressive detail in a score). In the third movement,
although this is not notated, I ‘lift’ the second of each quaver-​pair in a manner
related to my experience of baroque music.

Concluding remarks

Owing to the nature of trumpet sound production, shape is of crucial impor-
tance to the way in which its performers think; indeed, I believe that shape is one
of the most important concerns of trumpet performance, after basic technical
competency has been secured. Furthermore, the infinite array of characters
and styles of trumpet playing means that the freedom to shape an ‘individual’
sound is highly valued. Such individuality is encouraged by function, genre and
notational practice, and is most prominent in baroque trumpet repertoire (par-
ticularly Bach and Handel), orchestral trumpet solos and contemporary music.
When the role of the trumpet is to articulate orchestral textures, or to conform
to a homogeneous section, individual shapes are subordinate to accuracy.
While vibrato is found in the performance of all trumpet contexts, it is note-
worthy that straight players—​those whose base sound does not use vibrato, yet
who add it on occasion as a colour—​tend towards baroque and contemporary
repertoires. It is no coincidence that several trumpet soloists have specialized in
period performance and contemporary music, such as Reinhold Friedrich and
Gabriele Cassone. I too aspire towards this approach. This is due not only to
the lack of solo repertoire in the nineteenth century, but also to the compat-
ibility of baroque and contemporary styles. The physicality of the trumpet also
makes expressive shapes particularly important and can reduce performance
anxiety; a focus on character helps to direct the mind away from accuracy alone.
The various roles of the trumpet strongly influence freedoms to create and
to express shapes. The variety of distinctive characters and sounds of trumpet-
ers should be valued rather than homogenized. Though there is less freedom
for the second trumpet in a symphony orchestra than for a solo performer in a
trumpet and piano recital, conductors would do well to value this freedom to
shape. Doing so would also encourage and nurture national styles of orchestral perfor-
mance, which are becoming increasingly standardized. The trumpet itself makes
no sound, but the physiques and personalities of trumpeters make for infinite
shaping possibilities.


Bach, J. S., 1970: Complete Trumpet Repertoire, ed. L. Güttler, 3 vols (Wiesbaden: Breitkopf
& Härtel).
Lawson, C. and R. Stowell, 1999: The Historical Performance of Music: An Introduction
(Cambridge: Cambridge University Press).
Quantz, J. J., [1752] 2001: On Playing the Flute [Versuch einer Anweisung, die Flöte traver-
siere zu spielen], trans. E. R. Reilly (London: Faber & Faber).
Malcolm Bilson, fortepianist

Defining musical shape

In a casual conversation in 2001 a very famous pianist asked me, ‘Why is there
no dot on the upbeat to the first movement of the Beethoven Piano Sonata in
F minor, Opus 2 No. 1?’ (Figure R.24). I was taken aback that anyone could
ask such a question, as every eighteenth-​century source clearly states that all
upbeats are short and light unless otherwise marked. One doesn’t put an expres-
sive marking on notes that are akin to articles in speech (the, an, of, by, etc.).
I then listened to some twelve recordings by world-​famous artists of the last
decades and was astonished to find that almost none played the note short or
light. Most played a heavy, long, even slurred upbeat (clearly not indicated by
the composer). Astonishing though it may seem, very little instruction if any is
given in conservatories around the world concerning the most basic expressive
devices used by composers: How long is a crotchet to be held that has no mark
of any kind (a slur, a tenuto)? What is the meaning of a slur? (Mozart never
wrote a sketch without indicating the slurs—​they are the real soul of the music,
realized through the notes.) I made a video on these subjects called Knowing the
Score, which was released in 2006.1
In addition to the more basic questions of notation, one of the topics touched
on in Knowing the Score was performance information that can be gleaned from
composers who have recorded their own works (Bartók, Prokofiev, Elgar and
others). But rather than listening to the recording to see how the composer inter-
prets the score, we can assume that the score represents what these composers
heard, hence: How did they write it down? One of the recorded examples featured
in Knowing the Score was Sergei Prokofiev playing his little Gavotte, Op. 32 No.
3, in what is generally considered a personal, highly idiosyncratic manner. I made
the claim that if we know what a Gavotte is, and follow Prokofiev’s markings
carefully, his rendition will be clearly revealed in his notation. (See Video 1 at .)
FIGURE R.24   Beethoven, Piano Sonata in F minor, Op. 2 No. 1, first movement, bars 1–9

Proper realization of rhythmic conventions is essential for revealing the char-
acter of any genre of music. It is obvious that no notation will be completely
up to the job, and good rendition will be impossible without prior acquaintance
with the particular idiom. For this the advent of recording represented a major
revolution (for example, Bartók recording folk music in Romania, New Orleans
jazz reaching millions in the United States and Europe, and so on). François
Couperin’s notes inégales are meticulously described by the composer, but imag-
ine trying to describe the jazz rhythms of the early twentieth century rather
than hearing their transmission by recording. One cannot help but wonder if
Couperin would recognize many of today’s realizations of his descriptions.
Yet at the same time Prokofiev, as we have shown, gives a remarkable amount
of quite detailed performance information, and the sources in the late eigh-
teenth and early nineteenth centuries convey far more precise details on proper
execution of the expressive markings of the period than is normally realized.
Since rhythmic notation is limited (there is no notational possibility between a
crotchet and a quaver, for instance) it is generally assumed, erroneously in my
opinion, that flexible inflection can at best be only implied.
The first task of any serious musician is to determine the character and par-
ticular idiom of the work in question. Carl Philipp Emanuel Bach tells us, in
his chapter on Vortrag (Performance), ‘What comprises good performance? In
nothing other than the ability through playing or singing to make the ear sen-
sible of the true content and affect of musical thoughts. By altering these one
can change the ear to such an extent that one will hardly recognize it as the
same thought.’ Bach continues: ‘The subject matter of performance is the loud-
ness and softness of tones, touch, the snap, legato and staccato execution, the
vibrato, arpeggiation, the holding of tones, the retard and accelerando. Lack
of these elements or inept use of them makes a poor performance’ (Bach [1753,
1762] 1949: 148). And Daniel Gottlob Türk, in his Klavierschule, divides his
chapter on Vortrag into two sections: Ausdruck (Expression) and Ausführung
(Execution; Türk [1789] 1967: 347–​65). The meaning of any passage of music,
therefore, is revealed through its execution, and the tutors of Bach, Türk, Leopold
Mozart and others of the time instruct us how to realize the performance indi-
cations in the scores of the time in order to play in the kind of inflected manner
we observed in the Prokofiev example. There is no music anywhere in the world,
from the simplest folk tunes to the most sophisticated art music, that is played
in an even, uninflected manner, yet such a manner of playing is often accepted
and even cultivated today, as evinced by many recordings of important artists.
I was on the jury of the Leeds International Pianoforte Competition in
2000. What I heard was a phenomenal level of piano playing, and I was often
moved by very beautiful and insightful playing in a variety of repertoires. But
there was no Mozart or Haydn that came even close to the beauties I associate
with that music; it was all smooth and even, virtually uninflected. One work we
heard five or six times was the Haydn Sonata in C major, Hob. XVI:50. In Video 2
you can hear the first few bars performed by the fine Hungarian pianist Dezső
Ránki. I chose his performance as emblematic of what I heard several times in
Leeds. His performance is by no means unmusical, and represents beautifully
a typical rendition of this score. But in the video I look at the detailed perfor-
mance indications in the score to demonstrate that Haydn’s expressive mark-
ings are at least as important for revealing the musical thoughts as the notes,
yet in Ránki’s, as in most performances today, they are simply glided over in
a smooth, uninflected manner. Musical shape—​inflected, passionate and flex-
ible—​is inherent in this music, and it is my belief that it can be regained by a
clear understanding of the expressive marks ubiquitous in music of the late
eighteenth and early nineteenth centuries that are today misunderstood or sim-
ply neglected. No one applauds more than I the wonderful new scholarly edi-
tions appearing, giving us access to every little aspect of Mozart’s or Schubert’s
notation. There are now at least seven so-​called Urtext editions of the Mozart
Piano Sonatas; but are any of his clear articulations, the essence of his musical
language, being taught in those music schools and conservatories that insist on
their use?
Musical shape is defined by properly inflected realizations of rhythmic
motives as we have shown in this short Haydn example. But two further aspects
of time in musical performance are equally important: tempo fluctuation and
tempo rubato. I am often told that in Beethoven’s music no tempo fluctuation
is allowed, that one must keep a strict beat. Not only is there no basis for this
widely held assumption, but we know that Beethoven changed tempo a great
deal, and indeed much in his music virtually demands it. And tempo rubato,
prized by Caccini in the seventeenth century, Mozart in the eighteenth and
Chopin in the nineteenth, involves independence between a steady accompani-
ment and a freely flowing upper voice, be it violin, the voice or the right hand
at a keyboard. This feature as well is generally discouraged in today’s conser-
vatories and music schools, yet is often the very soul of moving performances
heard on earlier recordings.2

This Reflection derives from a lecture given at the Liszt Academy, Budapest, on
4 March 2014. A video of the lecture is available from the companion website


Bach, C. P. E., [1753, 1762] 1949: Essay on the True Art of Playing Keyboard Instruments, trans.
W. Mitchell (New York: Norton).
Türk, D. G., [1789] 1967: Klavierschule, facsimile reprint (Kassel: Bärenreiter), pp. 347–​65.

Shaping popular music

Alinka E. Greasley and Helen M. Prior

Much of the research concerning musical shaping in performance has focused
on the traditions of western classical music. Although this has increased our
understanding of musical shaping, it is questionable whether all the findings
may be directly applicable to western popular music or whether this broad
genre may engender other conceptions of the notion of shape or musical shap-
ing in performance. There are fundamental similarities between the musical
practices of popular and classical western musicians, such as musical materials,
instruments, and processes of collaboration and collective creativity, but there
are also many differences, one of which is the greater prominence of electronic
technology in popular music (Théberge 1997). This chapter investigates notions
of musical shaping from the perspectives of popular musicians performing
with a variety of purposes in mind. First, we discuss performers’ perspectives
on musical shape in live performance, drawing on evidence from popular musi-
cians who responded to a questionnaire study on musical shaping (Prior 2010,
2012b) and on work in the popular music field. Second, we examine the roles of
performer, producer and technology in shaping music in the recording studio,
drawing on existing literature in the field, which includes accounts provided by
professional popular musicians and music producers (Bayley 2010; Blake 2009;
Frith and Zagorski-​Thomas 2012; Negus 1992; Théberge 2001; Toynbee 2000).
This includes an investigation of how popular music recordings are shaped by
recording techniques and technological practices more broadly, drawing on
the work of authors such as Katz (2004), Théberge (1989, 2001) and Warner
(2003), among others. We then discuss the ways in which popular music record-
ings are used in performance, with a focus on the perspectives of DJs (disc-​
jockeys) using the idea of musical shaping in their work (Greasley and Prior
2013). A final section summarizes the varied notions of musical shaping that
arise from these different perspectives and explores their implications, as well as
the limitations of examining within a single chapter a flexible and widely applic­
able metaphor such as shape in a genre as diverse as popular music.


The term ‘popular music’ has been so widely used and defined that it is essen-
tial to begin with a brief discussion of the scope of the term as it pertains to
our work. In this chapter we are referring to popular music in contemporary
Britain, Europe and North America, mainly because most of the research to
date has been carried out in these contexts. In distinguishing between folk, art
and popular music within western culture, Philip Tagg (1982) observes that
popular music tends to be produced and transmitted primarily by professional
musicians; is mass distributed mainly through recorded sound;1 is a commod-
ity in an industrialized society; and tends to name composers or authors. Tagg
also notes the general lack of written theory and aesthetics, though this has
since developed (Bennett, Shank and Toynbee 2006; Brabazon 2012; Frith and
Goodwin 1990; Moore 2001; Negus 1996; Scott 2009). A useful definition of
popular music, and one that we will be adopting in the current chapter, is pro-
vided by Shuker (2013: 6):
In sum, only the most general definition can be offered under the general
umbrella category of ‘popular music’. Essentially, it consists of a hybrid
of musical traditions, styles and influences, with the only common ele-
ment being that the music is characterised by a strong rhythmical compo-
nent and generally, but not exclusively, relies on electronic amplification.
Indeed, a purely musical definition is insufficient, since a central charac-
teristic of popular music is a socioeconomic one: its mass production for
a mass, still predominantly youth-​oriented market. At the same time, of
course, it is an economic product that is invested with ideological signifi-
cance by many of its consumers.

There is a common historical tendency to snub popular music (Middleton
1995), which may explain why popular musicians have responded with writings
with titles beginning with ‘The Art of . . .’, encompassing topics such as record
or music production (Burgess 2001; Frith and Zagorski-​Thomas 2012; Gibson
2005; Moylan 2007), sound engineering (Horning 2004; Zak 2009) and DJing
(Broughton and Brewster 2002; Katz 2012). Shuker (2013) argues that there is
an ideological tension between the essential creativity of the process of making
popular music and its commercial nature, but most commentators agree that
considerable skill is required by all contributing parties for commercial success.
What is highlighted by these titles is not only the array of technology used in
the production of popular music (Théberge 1997), but also the number of peo-
ple and variety of skills required for success in popular music (McIntyre 2012).
These two features are highlighted throughout in relation to their implications
for shaping popular music.
It may be pertinent at this point to describe the lines of enquiry that we do
not pursue in this chapter. First, there is the question of the extent to which
music videos shape the listener or viewer’s perception of the music. Music
videos may consist of a mixture of small-scale film-like narrative, videoed musi-
cal performance and dance, while functioning both as music advertising and as
a product (Brabazon 2012). Cohen (2009) notes the strong effects of music
on various aspects of the perception and cognition of film (see also Reuben’s
and Mitchell’s Reflections later in this volume); however, fewer studies explore
the effects of watching a film on perception, cognition and other responses to
music (Boltz 2013). Various studies in musical performance have highlighted
the importance of visual information for the judgement of performance qual-
ity (Davidson 2006; Griffiths 2010; Tsay 2013) and for emotional responses to
music (Krahé, Hahn and Whitney 2015). Research on dance has concentrated
mainly on perceived congruency between dance and music, focusing on per-
formance art for the stage, rather than on a social dance situation or dance
within music videos (Cohen 2009). This combination of a scarcity of directly
related research and more abundant results from tangential areas suggests that
the study of the perception of popular music videos is potentially fruit-
ful for future research, as is the study of reenactments of music video
imagery in live performance (with either a prerecorded soundtrack or a live
musical performance; Kooijman 2006; Burns 2006) and of the perception of
visual turntablism (Brabazon 2012). However, there is not sufficient scope to
explore music videos fully here. While we consider the role of body movement
in performance briefly below, we do not consider how visual aspects of popular
music (in live performance or music videos) may contribute to understandings
of the notion of musical shaping in performance; for this the reader is referred
to Tan et al. (2013), in particular the chapter by Boltz exploring music videos
and visual influences on music perception and appreciation.
Secondly, within this chapter, we make generalizations about the genre of
popular music, despite the fact that research has shown that people are able to
categorize music at a very fine-​grained level. In Greasley, Lamont and Sloboda’s
(2013) study, as few as twenty-​three participants discussed more than 220
genres when talking about their musical preferences (for example ‘Rock’, as a
subgenre of popular music, was described as ‘Rock’, ‘Rock ‘n’ Roll’, ‘American
Rock’, ‘Christian Rock’, ‘Classic 60s Rock’, ‘Classic Rock’, ‘Funky Rock’,
‘Heavy Rock’, ‘Punk Rock’ and so on). In contrast, Prior’s (2010, 2012b) survey
research (described below) was carried out with popular musicians who per-
formed in around twenty popular music genres (e.g. ‘Rock’, ‘Jazz/​blues’, ‘Pop’;
see Table 8.1). The conclusions we draw from their accounts are not genre-​
specific (i.e. they are grouped under the broader umbrella of popular music),
partly because most of the musicians reported that they performed within more
than one genre. Greasley et al. (2013) also highlighted difficulties in definition
because of crossover in musical styles (e.g. ‘Folk Rock’, ‘Country Rock’, ‘Jazz
Rock’), which is why Shuker’s (2013) broad definition of popular music as con-
sisting of a hybrid of musical traditions, styles and influences is useful here.
Readers are invited to draw conclusions from the arguments we present that are
appropriate for the genres and subgenres in which they specialize.

TABLE 8.1 Number of popular musicians in Prior (2012b) who played each genre of music

Genre                                                  Number of participants
Musical theatre                                        3
Jazz/blues                                             13
Pop                                                    9
Rock/metal                                             5
Country/folk/gospel                                    3
Urban (hip hop, soul, RnB, etc.), dance/electronic     3
  (house, techno, electronica, etc.)
Contemporary/experimental                              4
World                                                  4
Crossover                                              3

We also make generalizations about the roles played by a number of contrib-
utors (e.g. performer, producer) in shaping popular music. Frith and Zagorski-​
Thomas (2012: 5–​6) note that ‘deciding who is responsible for what in the studio
is still a matter of record-​by-​record investigation (much of which remains to be
done) rather than, for example, genre generalization’. This can also be applied
to live music-​making in popular music, with its frequent use of technology and
concomitant expert personnel. We aim here to present the potential for each
contributor to shape the music, rather than analysing existing recordings or
presenting any kind of blueprint for musical success.

The performer’s role in shaping music in live performance

In this section, we explore the performer’s role in shaping music in live per-
formances, drawing on evidence from popular musicians who responded to a
questionnaire study on musical shaping (Prior 2010, 2012b) and on work in the
popular music field.
Prior’s (2010, 2012b) questionnaire study provides insights into the use of
musical shape or shaping by musicians from a relatively broad range of back-
grounds. More than two hundred participants completed a mixed-​response
questionnaire, which sought to establish some of the meanings and contexts
in which the idea of musical shape is used by musical performers. Participants
were asked about their musical background (e.g. main instrument, music
categories in which they performed),2 questions concerning whether and how
they used shape in thinking or talking about how to perform music, and about
any links between music and shape that participants could describe. They were
also asked to rate their agreement with a series of fifty statements that had
been developed through reference to existing written quotations from musi-
cians using the idea of shape (see Prior 2010).
In order to extract popular musicians’ responses for the purposes of this
chapter, we collapsed the music categories to form two main (albeit simplis-
tic) categories of ‘classical’ (orchestral, choral and chamber music, as well as
opera) and ‘nonclassical’ (music theatre, jazz, popular, world, folk and cross-
over) music. On the basis of this divide, around 60 per cent of the sample
performed music within the classical genre exclusively, 10 per cent performed
within the popular music genre exclusively, and around 30 per cent performed
music from both broad genres. General trends in the data from the fifty state-
ments suggested that musicians who played nonclassical music (both exclu-
sively and as well as classical music) gave slightly less positive responses to the
idea of using the notion of shape in rehearsals with others or in informal dis-
cussions with other musicians. They also gave slightly less positive responses to
the idea of musical shape following the melodic line of the music, and to the
idea of musical shape moving from left to right, something that might reflect
the lower dependence on the musical score in nonclassical traditions (Prior
2012b). Nonetheless, some of the responses to the more open-​ended questions
merit further attention and provide insight into the use of the notion of shape
by popular musicians.
Twenty-​five respondents provided specific examples of using shape when
thinking or talking about popular music, and their responses are explored
below using a conventional qualitative content analysis (Hsieh and Shannon
2005). A brief examination of some of the musical and demographic data of
these twenty-​five participants provides some context for their responses. The
most commonly named main instrument was the voice (N = 5), followed by
guitar (N = 4) and piano (N = 4), and double bass (N = 2), trombone (N = 2)
and violin (N = 2). Other participants played the clarinet, euphonium, per-
cussion, saxophone or turntables, or conducted. Twenty-​one had more than
ten years’ experience of playing their instrument (though only four partici-
pants had more than forty years’ experience), and nineteen described them-
selves as performers of a professional standard. Table 8.1 shows the numbers
of participants who reported playing each genre of music. This is intended
to indicate the breadth of experience of the sample group rather than provid-
ing the means to identify trends within subgroups of participants, especially
as participants commonly selected more than one genre. Most commonly,
participants reported playing jazz or blues, pop, or rock or metal music.
Fourteen participants were from the UK and four from other English-​speak-
ing countries. Six participants were from other European countries and one
was from Malaysia.
There seemed to be three main ways in which the idea of musical shape was
used by these musicians. First, a few performers discussed their use of shapes
and images to overcome technical difficulties. Several singers described how
shape was helpful for themselves and their students in achieving the correct
pitch, tone colour and expression:

Teaching someone with pitching issues to create a sense of how to find
pitch and sing through the notes. The student has used images of shapes
in order to overcome his problems pitching—​it has proved very successful
for him. (Professional singer and teacher)

It was taught to me that when visualising your voice you should think of
it as a shepherd’s hook—​that it runs from your diaphragm, up through
your body, into your mind, resonating behind the nose, and out through
your mouth. (Professional-​standard singer)

I always explain the use of the voice to my students with . . . pictures of
shapes. For a beginner it is usually difficult to sing high notes. What often
helps them is imagining the tone as an arch that is streaming out of the top
of their heads, like a rainbow. I also explain the process of breathing and
singing as a circle that should not be disrupted. . . When I sing I always
produce pictures in my mind to achieve a certain sound, tone quality or
emotion. . . Low, warm tones, the ones that are used in Jazz Ballads I
always see as dark blue bubbles or circles . . . high very powerful tones that
are used often in Pop music, but also funk, often look like bright yellow or
red triangles or just lines. (Professional singer and teacher)

An amateur guitarist also discussed the ‘big, round shape’ of the ‘expansive’
timbre he was trying to achieve. A similar technical approach was described by
a guitarist and a pianist. The guitarist described how he would visualize chords
as ‘shapes on the fretboard’ (professional-​standard guitarist). The pianist took
this idea further:

While improvising over the tune, I  imagine the chords not as abstract
notions, but like architectures—​and my movement from one to another
involves drawing different shapes which I select as I go along. (Professional-​
standard pianist)

This idea of shapes being formed over time by changing musical features such
as chords or melodic patterns leads to the second way in which the idea of
musical shape was used, that is, in reference to a musical structure or trajec-
tory. This was a more common idea, cited by ten of the participants and often
discussed in relation to composition or improvisation:

Shape would have been used to talk about a large scale structural/​expres-
sive trajectory which can help to guide an improvisation. (Professional-​
standard pianist)

Concentrating on the placement of one or more musical ideas and using
space/duration to create contrast between them. A piece that has ‘shape’
could be said to arise through this process. (Professional pianist)

(1) thought about the shape of my solo, started with short phrases with
repetitive rhythm then extended the phrases, and (2) thought about [the]
form of the piece (AABC), where to place solos, how many solos, whether
to have a ‘rhythm only’ chorus, how to end the piece. (Amateur saxophonist)

The idea was also discussed on a larger scale, in relation to the choice of music
over a whole performance:

So each piece had a different shape, to give variety to the performance.
Not just tutti, then individual jazz breaks, then chorus. We tried to vary
the structure and shape of the programme. (Professional double bassist)

We were discussing which track to open our set with, given the style of
the DJ who would be playing before us. We wanted to find an opening
track that would work well after the previous DJ and be significantly
different but also energetic enough not to clear the dance floor. We dis-
cussed it in terms of energy level, often using contour metaphors, which
are absolutely central to DJing (peaks and troughs, building it up, taking
it down). (Amateur DJ)

The way in which a set is compiled can create a very powerful performance, as
exemplified by Fast’s (2006) description of Queen’s performance at Live Aid,
in which the group abbreviated many of their most popular hits to create a
fifteen-​minute act with a powerful emotional trajectory designed to enthuse the
audience and thereby raise as much money as possible.
The latter comments made by the questionnaire respondents form a useful
introduction to the third main way in which musical shaping was discussed,
that is, in relation to musical expression. This was discussed on a variety of
scales, in relation to both the whole piece and the shaping of individual phrases.
Sometimes these ideas were discussed in specific technical terms, with refer-
ence to phrasing and breathing, dynamics and tempo fluctuations, all of which
might vary according to the acoustic of the performance space:

How to craft phrases, the beginnings and ends of phrases, the swell of
dynamics, minute tempo changes bar to bar. (Professional-standard
euphonium player)

Tried to shape the line as I heard the song, giving breaks for breath at
what felt like natural points in the line and continuing through places that
needed a sense of continuation and flow. (Professional trombone player)

Shaping the music, rather like a sentence in poetry. Use of dynam-
ics to highlight the phrase. Reacting to new and unfamiliar acoustics.
(Professional-standard percussionist, in this instance conducting a choir)

Other participants would discuss this expressive shaping in a more metaphor-
ical way, in terms of contrast and shade, energy or climaxes within the music:

My thinking about ‘shape’ was used more in an overall way, i.e. how do I
represent contrast and shade within a song that contains light and dark
images, as well as scenes (e.g. ‘roaring traffic boom’, ‘silence of a lonely
room’). (Amateur singer)

The idea was to a high energy beginning, dropping down pretty
quickly, then slowly building to climax. Sort of a backwards N shape.
(Professional-​standard pianist)

These three ideas—technical shapes, shape as a formal or structural trajectory,
and shaping as expression—formed the majority of participants’ responses,
though one other participant mentioned gesture as a component of musical
shape. These are all ideas that were also discussed by classical musicians in
relation to musical shaping, both in this questionnaire study (Prior 2012b) and
in later interview studies (Prior 2012a). Indeed, there appears to be little dif-
ference between the ideas of these popular musicians and those of classical
musicians completing the same questionnaire, as revealed by the qualitative
responses. Although some of the quantitative responses to the questionnaire
may have indicated a different emphasis in popular musicians’ understanding
of musical shape, their qualitative responses are not remarkably unlike those of
the classical musicians. The interview study of classical musicians (Prior 2012a)
allowed a more in-​depth study of musical shaping in a practical context and,
as a result of this, revealed some more sophisticated ideas surrounding musical
shaping. Not only were technical and expressive ideas discussed, but it became
apparent from participants’ responses that they were using shape-​related ideas
heuristically, that is, using nonspecific (often metaphorical) terms as short-​cuts
for complex technical ideas (Leech-​Wilkinson and Prior 2014). It also became
evident that participants had a multimodal understanding of musical shape
that could be expressed verbally, through musical sound or through gesture,
and that it was possible for participants to feel that their musical shaping was
closely intertwined with their identity. The short quotations above may not be
of sufficient length and depth to provide conclusive evidence of these ideas,
but the metaphorical, non-​technically specific nature of some of the comments
hints at heuristic thinking; the mention of gesture by one participant may stem
from a multimodal understanding of musical shaping; and the mention of per-
sonality in relation to musical shaping might relate to links between musical
shaping and identity. At the very least, there is evidence that further study of
popular musicians may reveal similar and equally interesting understandings
of musical shaping compared to those used by classical musicians.
The role of technology in shaping popular music practices has been well doc-
umented (Cook et al. 2009; Frith, Straw and Street 2001; Frith and Zagorski-​
Thomas 2012; Gracyk 1996; Katz 2004; Théberge 1997, 2001; Toynbee 2000;
Warner 2003), yet only a small number of respondents referred to their use
of technology in relation to musical shaping. Participants mentioned simple
techniques such as ‘reverb’ and ‘fading out’, as well as the use of previously
recorded performances of improvisations to aid the creation of new improvisa-
tions. The latter is best understood through one participant’s own words. He
described the situation as ‘rehearsing in a duo with a saxophonist I regularly
work with’, and the use of shape as follows:

Listening back to previous recordings to give an idea of where the key
ideas lie, the piece being rehearsed will begin from the key idea and prog-
ress to an end-​point. The duration of these pieces are usually short and the
‘shape’ of such pieces that have been devised from this method are usually
more focused than ones that last longer in duration. (Professional pianist)

Several participants with interests in record production discussed technolog-
ically based shape-related ideas when asked about ‘other links between music
and shape’, as evidenced by these comments from a singer and a guitarist:

Sound has a shape (electronic and digital, i.e. sound-waves) and music is
the combination of sounds and silences (among other things), this can
extend to timbre (e.g. shrill, screeching), dynamic variation, phrasing
(legato, staccato, etc. / technique) and even possible mental images cre-
ated from a sound / song / words. (Amateur singer)

Shape for me is simply a handy way to visual[ize] what I hear. Use of
‘shape’ now extends more broadly to the use of software programs such
as Protools where visualizing a recorded performance will not only allow
rapid editing, amongst many other things, but also gives a differing
insight into things like song structure and arrangements—​it also gives a
deep insight into feel or groove. (Professional-​standard guitarist)

The above comments suggest that the availability of computer programs that
display music and sound as waveforms has added a visual element to these
participants’ understandings of musical shaping (see the Reflections by Savage
and by Reuben later in this volume), rather as notation seems to have done for
classical musicians’ conceptualizations of shaping (see Küssner, Chapter 2 of
this volume). Other technology has also had an influence. Two participants spe-
cifically mentioned mixing and equalization:

From a mixing point of view—EQ-ing tracks to blend together better
is very often a visual thing, i.e. different instruments’ contour shaped so
they don’t all compete for the same frequency ranges. (Amateur electric

When DJing you manipulate the equalization (EQ) of tracks in order to
make them blend as well as possible, which means thinking in terms of
frequency space, often on an up/down or left/right scale. You also think
in terms of acoustic space—​tracks are produced for different spaces (‘big
room’ or ‘small room’ tracks) and also create different impressions of
acoustic space, or spaces, within themselves. (Amateur DJ)

The limited number of participants mentioning technology in their descrip-
tions of shaping may have been due to the nature of the methodology: ques-
tionnaires generally elicit shorter and less detailed answers than interviews.
Equally, the wording of the questionnaire, with its (albeit deliberate) focus on
performers and performing, may have influenced the types of responses given
by musicians. Had the questionnaire been directed explicitly towards producers
and recording engineers, it is likely that more technologically based conceptions
of musical shaping would have been found. There may also have been other fac-
tors influencing these participants’ responses. Although some described their
use of shape within recording sessions, they perhaps did not mention technol-
ogy because they were focused mostly on their own performance while some-
one else (i.e. the sound engineer) was (usually) dealing with that side of things.
It was surprising, for example, that the singers in Prior’s (2012b) study did
not describe the ways in which they use microphones to achieve certain vocal
effects. Several texts have explored the role of the microphone in popular musi-
cians’ practices, highlighting the extent to which it has influenced vocal style
(Campbell, Greated and Myers 2004; Frith 2001; Greig 2009; Horning 2004;
Théberge 2001). Musicians have built up knowledge of the types of micro-
phones available and how to employ these to help them to achieve particular
expressive goals. For example, pianissimo can be produced not only by singing
more quietly, adding more breath than tone and making less use of the
vocal tract, but also by moving the microphone away from the mouth (Greig
2009). Regulation of this distance (holding the microphone away for high, loud
notes; holding closer for quieter, low-​register notes) can lend warmth and grain
to a vocal performance (Barthes 1990; Frith 1981; Théberge 2001), but it also
reveals the intricacies of a vocal performance and thus can highlight flaws as
much as it can highlight richness of tone (Lees 1987; Théberge 2001).
Microphones offer just one example of how popular musicians use tech-
nology in their live performances. Amplification has also changed popular
musicians’ practices (Frith 2001; Théberge 2001). Popular styles such as rock
and heavy metal have adopted extended amplification techniques (e.g. distor-
tion, feedback) which provide musical outputs distinctive to those styles (Poss
1998; Théberge 2001; Walser 1993). An illustration of the use of technologies
for expressive effect in live performance—​from instrumental and vocal tech-
niques to extended amplification—​can be found in Hughes’ (2006) analysis of
Nirvana’s live performance at the University of Washington in 1990. None of
the participants in Prior’s (2010, 2012b) questionnaire study mentioned their
use of amplification techniques.
Other aspects of live performance not mentioned by the popular musicians
were body movement and the audience. Research in the field of music psy-
chology shows that body movement plays a crucial role in the production and
perception of music (Davidson and Malloch 2009), and that performers move
their bodies in identifiably different ways according to expressive intentions
(Davidson 1993). The more highly expressive the piece, the larger and more pro-
nounced the movements (Davidson 1994). Descriptions of live performances
of popular music frequently highlight audience participation such as moving or
singing along to the music, often as an indicator of an audience’s engagement
with and enjoyment of the listening experience (e.g. see Inglis 2006). Some
performers actively encourage this participation through their gestures to the
audience. Fast (2006) suggests that an audience’s physical engagement with a
performance enables the audience to participate in, and thereby feel invested in,
the creation of the performance.
Above, we have discussed popular musicians’ perspectives on musical shap-
ing, and recent literature in the field of popular music, highlighting how per-
formers may use the notion of shape in relation to instrumental techniques and
the technology available to them. There were a number of similarities in the
classical (see Prior, Chapter 7 of this volume) and popular musicians’ ideas of
musical shaping, including the conceptualization of shape as relating to struc-
ture and musical expression, and as a means of working through technical dif-
ficulties. The next section explores musical practices in the recording studio.
Arguably, the powerful and intractable element of ‘liveness’ that is present in
live performance (see Fast 2006) is lost there; however, in popular music, in
particular, recording processes have the potential to generate a finished prod-
uct that surpasses the possibilities of live performance through the combined
creative input of the performer, sound engineer and producer and their use of
technology in the studio.

The roles of performer, producer and technology in shaping music in the recording studio

The demands of the recording studio with its concomitant customized environ-
ment and lack of audience require of performers a different understanding of
musical performance compared to their usual live performance situation (Blake
2009; Gander 2011; Horning 2012; Pras and Guastavino 2011; Williams 2012;
Zak 2009). While studies in the popular field typically present producers as hav-
ing most (or in some cases all) of the control over the finished musical product,
performers have responded creatively to both the technical restrictions (most of
which are now historical) and the opportunities afforded by the studio environ-
ment, and have generated new performance techniques as a result (Doğantan-​
Dack 2008; Cook et al. 2009; Frith and Zagorski-​Thomas 2012). Many of the
means of musical shaping at the disposal of musicians in live performance (if
we assume similar perspectives on ‘shaping’ as those in Prior’s Chapter 7, such
as the use of instrumental technique to bring out a particular emotional tone)
are of course available to them in a studio setting. There is, however, further
potential for performers to modify the sounds they are producing (in conjunc-
tion with engineers and producers) to create an experience for the listener that
goes beyond the possibilities afforded by live performance (Kania 2008).
The studio environment, with its more or less controlled acoustics, has the
potential to influence decisions made by performing musicians. For example, in
the same way that (classical) music performers in a live setting will adjust their
instrumental techniques to reflect the acoustics of the room in which they are
playing (see H9.13 in Table 7.H9, available online), for popular musicians
too, sound quality and corresponding perceptions that influence their musical
decisions will vary with the recording space (Gander 2011). Williams’ (2012)
ethnographic research into recording studio practices (carried out from the per-
spective of a musician, engineer and producer) demonstrates that even the use
of technology as seemingly straightforward as headphones can have far-​reach-
ing consequences for the social and musical interactions taking place between
musicians performing together and between those musicians and engineers;
this in turn influences the creative process and aspects of the final recording.
In the studio environment, performers relinquish some of their control to
other personnel in the studio (e.g. sound engineer, producer). Music producers
aim to work with performers to overcome possible reluctance or insecurity,
challenge them artistically and steer them to reach for an imaginary world of
sound where technology emotionally enhances the original artist’s vision and
performance, instead of compromising it (Frank Duchene, personal commu-
nication). While this is a somewhat idealized view of the producer’s role, it
acknowledges the considerable interpersonal skills the producer needs in order
to work effectively with artists, as recent studies have shown (Bielmeier 2013;
Davis and Parker 2013). In many cases, performers and producers work col-
laboratively, with discussions leading to the modification of sounds and the
selection of performances for the final recording. However, sometimes produc-
ers conceal their decisions—​such as modifying the mix sent to the musicians’
headphones without their knowledge (see Gander 2011: 149–​53)—​and there
are often tensions about relative contribution (Blake 2009). The work of musi-
cian Miles Davis and producer Teo Macero is a good example of the latter.
Davis described some of the thinking behind his studio recordings in his auto-
biography (Davis with Troupe 1989, in Brackett 2009). According to Davis, the
complexities of the arrangements of the improvisations on the album Bitches
Brew were determined by him with the musicians, without the influence of the
producer. This suggests a dominant role of the performers in the musical out-
come. However, the producer’s contribution is undocumented and simplified
by Davis. Other accounts (e.g. Blake 2009; Szwed 2002) suggest the process
between Macero and Davis had been a great deal more collaborative—​that the
pair had listened to the many hours of takes together in order to make deci-
sions about which sections to edit, splice and cut in the production of the final
Technological advances drive creative practices in the studio (Théberge 1989,
2001; Warner 2003): the microphone offers a crucial example (Horning 2002,
2004). Horning (2004) maintains that ‘the art of microphoning’ (see Canby 1956)
is a skill which evolved as a natural progression from the recording engineer’s
placing of performers before the acoustical recording horn, and one which is
acquired tacitly—​by recording engineers and performers alike—​through experi-
ence. It is a skill that can be used to achieve unique musical outcomes (Moorefield
2005), and that in some hands has been likened to ‘a painter mixing colours from
a palette’ (Horning 2002: 710). The increased role and responsibility of the engi-
neer for achieving musical balance through careful placement of microphones
led to the development of the multitrack studio, which was instrumental in the
development of popular and rock styles through the potential it offers for the
control and layering of sounds (Théberge 1989, 2001). Multitrack recording
was first used in popular music in the 1950s and is characterized by the separate
recording of multiple sound sources to a number of audio channels to create a
recording. This allows engineers to examine intricate details of timing and tuning
(Blake 2009; Frith and Zagorski-​Thomas 2012) as well as the broader perspective
of the musical sound (Zak 2009), an approach which has again led to produc-
ers likening their role to that of an artist painting (Phil Harding, in Frith and
Zagorski-​Thomas 2012). Such technology has increased the control of producers
over the recorded sounds, not only because of the detailed level at which they are
able to work, but also because of the necessary separation ‘of the artists from
each other, separation of their performances, and further a separation of the
artists from their song and even their performance’ (Gander 2011: 132). Gander
argues that this empowers the producer to make musical decisions.
Other technological advances have been seen in sampling and computer-​
based sequencing, including signal processing, Musical Instrument Digital
Interface (MIDI) sequencing, and sound synthesis (Blake 2009; Katz 2004;
Théberge 2001; Warner 2003). Signal processing enables producers to add spe-
cial effects (e.g. reverb, delay, chorus, flange, compression) to tracks; MIDI
sequencers facilitate enhanced control over layering of sounds; and digital sam-
pling enables the manipulation of sound in a variety of ways down to the finest
detail without any discernible loss of sound quality (Goodwin 1990). The level
of control of the sound that these technologies afford provides many creative
opportunities: one only needs to think of the ‘Amen’ break—​a four-​bar drum
solo performed by Gregory Coleman in the 1960s song Amen, Brother which
has been used extensively in a range of electronic music styles such as break-
beat, hip-​hop, hardcore, jungle, and drum and bass (Butler 2006)—​to realize
the potential for the use of samples in creating new records. Some authors,
however, have noted the lack of expressive shaping in MIDI-sequenced music
and the effect this can have on co-​performers, who try to imitate the precision
of the sequenced sound at the expense of expressive gesture (Warner 2003).
Nonetheless, technological advances have afforded performers, producers and
engineers greater control over the sounds they are producing.
A common theme throughout the literature is the conceptualization of the
‘studio as instrument’ or ‘studio as creative tool’ (Blake 2009; Hennion 1989;
Horning 2012; Zak 2009), with accounts of the day-to-day activities of produc-
ers, recording artists and musical directors generating a strong sense of explor-
ation and experimentation (Blake 2009; Hennion 1989; Thompson 2010; Zak
2009). In some cases this seems to lead to new performance practices in live per-
formance (Blake 2009). What constitutes a ‘recording studio’ has changed over
time (in response to technological advances) such that there is now a blurred
distinction between specific professional or commercial studio locations and
the home studio environment (Théberge 2012). Advances in digital technology
mean that records can now be compiled using different studios in remote loca-
tions (sometimes on the other side of the world), reducing the social nature of
musical practices (e.g. social interaction, personal exchanges, communication)
between musicians and producers (see Negus 1992). The increasing prevalence
of long-distance online collaboration and the growth of PC-based music produc-
tion have encouraged the development of remix sites through which musicians
share multitrack files and edit and discuss one another’s mixes (Théberge 2012).
This, Théberge notes, means that the artist’s mix becomes just one version of
the music, and that ‘fragments of music flow through a series of multivalent
exchanges only coming to completion when, and if, the participants decide to
bring the process to an end’ (ibid.: 87). Drawing on examples from a number
of popular genres, Moorefield (2010) outlines how multitrack recording tech-
niques facilitate remixing and ‘mash-​up’ practices wherein elements of tracks
are reordered, rebalanced and recontextualized. Goodwin (1990: 271) argues
that digital samplers have played a key role in remixes because the differing
length of sounds that can be stored can ‘be used to manipulate, extend, and/​or
condense the structure of a song, as well as its texture, arrangement and tim-
bre’. MARRS’s ‘Pump up the Volume’ (constructed using thirty samples from
other records) and The Avalanches’ ‘Frontier Psychiatrist’ (almost entirely con-
structed of samples from other records) are good examples of this.
The roles of the performer, producer and technology in the recording studio
are not easily separable:3 the construction of a popular music recording is a
collaborative process between artists, engineers and producers, and the tech-
nological equipment they use. As Phil Ramone (quoted in Massey 2000: 50)
asserts, ‘you, as the engineer, have to share in the painting with the artist’. There
is agreement that producers need significant interpersonal and leadership skills
to manage the production process (Jarrett 2012; Mike Howlett, in Frith and
Zagorski-Thomas 2012), and tensions that may arise can be found in accounts
given by musicians and producers throughout the literature. Moreover, the balance
of performer, producer and technology will change with every recording,
not just as a result of the particular set of personalities involved. Björk, for
example, is said to have described her albums Post and Debut as ‘collections
of duets with the producers who had inspired her: Nellee Hooper, 808 State’s
Graham Massey, Tricky, Howie B.’ (Jonathan Van Meter, in Brackett 2009:
522). In contrast, she states that her later album Homogenic ‘is more like one
flavour. Me in one state of mind. One period of obsessions. That’s why I called
it Homogenic’ (ibid.).
The discussion above of record producers’ shaping of popular music clearly
simplifies the myriad influences on the producers themselves. Zagorski-Thomas
(2012) highlights the complex social, commercial and economic factors
contributing to both the availability and the use of recording spaces and
technologies by record producers that can be perceived in the sounds of the
records they produced. The separation of shaping by performer(s) and producer
is somewhat artificial: often, the collaboration between the two is sufficiently
close that separation of the decision-making process is impossible. This situation
is compounded when the performer becomes a sound engineer or producer