
8th International Conference 

on New Interfaces for Musical Expression

Casa Paganini – InfoMus Lab

Genova, Italy
June 4 – 8, 2008


Antonio Camurri, Stefania Serafin, and Gualtiero Volpe


Printed by: BETAGRAFICA scrl

In collaboration with
Facoltà di Lettere e Filosofia, Università degli Studi di Genova
Conservatorio di Musica “Niccolò Paganini”
Museo d’Arte Contemporanea Villa Croce
Ufficio Paganini, Comune di Genova
Polo del Mediterraneo per l’Arte, la Musica e lo Spettacolo
Accademia Ligustica di Belle Arti
Casa della Musica
GOG − Giovine Orchestra Genovese
AIMI (Associazione di Informatica Musicale Italiana)
Centro Italiano Studi Skrjabiniani
Goethe−Institut Genua
Fondazione Bogliasco
Fondazione Spinola
Associazione Amici di Paganini
Festival della Scienza
Radio Babboleo

With the support of

Regione Liguria
Comune di Genova
Provincia di Genova

EU ICT Project SAME
EU Culture 2007 Project CoMeDiA

Sipra SpA
Sugar Srl

NIME 08 Committees

Conference Chairs
Antonio Camurri
Gualtiero Volpe

Program Chair
Stefania Serafin

Performance Chair
Roberto Doati

Installation and Demo Chair
Corrado Canepa

Club NIME Chair
Donald Glowinski

NIME Steering Committee
Frédéric Bevilacqua, Tina Blaine, Michael Lyons, Sile O'Modhrain, Yoichi Nagashima,
Joe Paradiso, Carol Parkinson, Norbert Schnell, Eric Singer, Atau Tanaka

Paper and Poster Committee

Meta-reviewers
Kia Ng, Sile O'Modhrain, Stefania Serafin, Bill Verplank, Marcelo Wanderley, Ge Wang

Reviewers
Anders-Petter Andersson, Alvaro Barbosa, Marc Battier, Frauke Behrendt, Kirsty Beilharz,
Edgar Berdahl, Tina Blaine, Sinan Bokesoy, Niels Bottcher, Eoin Brazil, Roberto Bresin,
Andrew Brouse, Nick Bryan-Kinns, Claude Cadoz, Arthur Clay, Ted Coffey,
David Cournapeau, Langdon Crawford, Smilen Dimitrov, Gerhard Eckel, Georg Essl,
Stuart Favilla, Sidney Fels, Mikael Fernstrom, Federico Fontana, Alexandre R.J. Francois,
Jason Freeman, Ichiro Fujinaga, Lalya Gaye, Steven Gelineck, David Gerhard, Jeff Gray,
Michael Gurevich, Keith Hamel, Kjetil Falkenberg Hansen, David Hindman, Andy Hunt,
Robert Huott, Alex Jaimes, Jordi Janer, Alexander Refsum Jensenius, Sergi Jorda,
Martin Kaltenbrunner, Spencer Kiser, Benjamin Knapp, Juraj Kojs, Eric Lee,
Jonathan F. Lee, Paul Lehrman, Michael Lew, Eric Lyon, Michael Lyons, Thor Magnusson,
Joseph Malloc, Eduardo Miranda, Thomas Moeslund, Katherine Moriwaki, Teresa M. Nakra,
Kazushi Nishimoto, Rolf Nordahl, Dan Overholt, Garth Paine, Jyri Pakarinen,
Sandra Pauletto, Cornelius Pöpel, Robert Rowe, Joran Rudi, Margaret Schedel,
Greg Schiemer, Andrew Schloss, Hugo Solis, Christa Sommerer, Hans-Christoph Steiner,
Matthew Suttor, George Tzanetakis, Carr Wilkerson, Matthew Wright, Tomoko Yonezawa,
Diana Young, Michael F. Zbyszynski

Performance Committee
Miguel Azguime, Andreas Breitscheid, Pascal Decroupet, Michael Edwards, Neil Leonard,
Michelangelo Lupone, Pietro Polotti, Curtis Roads, Jøran Rudi, Rodrigo Sigal,
Alvise Vidolin, Daniel Weissberg, Iannis Zannos

Installation Committee
Jamie Allen, Philippe Baudelot, Nicola Bernardini, Riccardo Dapelo, Scott deLahunta,
Nicola Ferrari, Sergi Jorda, Lauro Magnani, Pedro Rebelo, Franco Sborgi, Eric Singer,
Pavel Smetana, Sandra Solimano, Atau Tanaka

Demo Committee
Alain Crevoisier, Sofia Dahl, Amalia De Gotzen, Emmanuel Flety, Matija Marolt,
Barbara Mazzarino, Douglas Irving Repetto, Kenji Suzuki, Andrea Valle, Giovanna Varni

Club NIME Committee
Frédéric Bevilacqua, Nicolas Boillot, Marco Canepa, Jaime Del Val, Davide Ferrari,
Jean Jeltsch, Eric Lapie, Claudio Lugo, Leïla Olivesi, Kéa Ostovany, Guillaume Pellerin,
Nicolas Rasamimanana, Christophe Rosenberg, Laura Santini, Olivier Villon

Organizing Committee
Corrado Canepa, Francesca Cavallero, Roberto Doati, Nicola Ferrari, Roberta Fraguglia,
Donald Glowinski, Lauro Magnani, Andrea Masotti, Barbara Mazzarino, Valentina Perasso,
Roberto Sagoleo, Marzia Simonetti, Francesca Sivori, Sandra Solimano

NIME Secretariat
Roberta Fraguglia, Francesca Sivori

Press
Michele Coralli

Cover design
Studiofluo


We are proud to present the 8th edition of the International Conference on New
Interfaces for Musical Expression (NIME08), hosted by Casa Paganini - InfoMus Lab,
Università degli Studi di Genova.

Since 2005, InfoMus Lab has been housed in the recently restored monumental
building of S. Maria delle Grazie La Nuova - Casa Paganini. The International Centre of
Excellence Casa Paganini – InfoMus Lab aims at cross-fertilizing scientific and
technological research with humanistic and artistic research. Our research explores the
relationships between music, science, and emerging technologies: a mission that recalls
Niccolò Paganini's spirit of experimentation.
New perspectives in contemporary music, multimedia, and digital lutherie are among
the main pursuits of the Centre. Casa Paganini - InfoMus Lab studies new directions in
scientific and technological research to improve quality of life (e.g., therapy and
rehabilitation, leisure, sport, edutainment), to develop novel industrial applications and
services (e.g., innovative interfaces and multimedia applications), and to contribute to
culture (e.g., museography, supporting cultural heritage through new technologies).

In this framework, the NIME Conference is a unique occasion for Casa Paganini to
present, on the one hand, its research outcomes and activities to the scientific community
and, on the other, to gather inspiration and feedback for future work. Further, our
efforts have been directed at involving the most important institutions and the
whole city of Genova in NIME. For example, besides the monumental site of Casa Paganini,
which hosts the welcome concert and the scientific sessions, concerts will be held at the
Music Conservatory “Niccolò Paganini”, demos at Casa della Musica, installations at the
Museum of Contemporary Art “Villa Croce” and at the Faculty of Arts and Philosophy of
the University of Genova, posters in the ancient convent of Santa Maria di Castello, and club
NIME performances at four different cafés and clubs in Genova (010, Banano Tsunami,
Cafè Garibaldi, Mentelocale).

The scientific program of NIME08 includes 2 keynote lectures, 34 oral presentations,
and 40 poster presentations, selected by the program committee out of 105
submissions. We are honored to welcome as guest speakers Xavier Serra, head of the
music technology group at Pompeu Fabra University in Barcelona, and Andrew
Gerzso, director of the pedagogical department at IRCAM. Moreover, 2 panel
discussions will address relevant issues in current research in sound and music
computing: networked music performances and active listening and embodied music
cognition. The program also includes 22 demos, organized in 3 demo sessions.

The artistic program encompasses a welcome concert, 3 NIME concerts, 4 Club NIME
performances, and 7 installations. The NIME concerts and the Club NIME performances
include 23 music pieces, selected by the program committee out of 63 submissions.
The welcome concert on June 4 evening, offered by Casa Paganini – InfoMus Lab in
collaboration with major music institutions in Genova, will present 4 novel music pieces
by young composers using EyesWeb XMI: one of the pieces has been commissioned to
tackle some open problems on networked performance faced in the EU Culture 2007
Project CoMeDiA; another piece has been commissioned to exploit a paradigm of
"active music listening" which is part of the EU FP7 ICT Project SAME.

Four workshops will precede and follow the official NIME program on June 4 and 8: a
workshop on technology enhanced music education, a tablet workshop for performers
and teachers, one on Jamoma, and one on techniques for gesture measurement in
musical performance.

Moreover, this year the 4th Sound and Music Computing (SMC) Summer School is held
at Casa Paganini in connection with NIME08, on June 9 - 11, 2008. The program of the
school includes plenary lectures, poster sessions, and hands-on activities. The school
will address the following topics: Gesture and Music - Embodied Music Cognition,
Mobile Music Systems, and Active Music Listening.

Organizing the NIME Conference is a huge effort, which is feasible only with the help
of many people. We would like to thank the members of the NIME Steering Committee
for their precious and wise suggestions, the demo and installation chair Corrado Canepa,
the performance chair Roberto Doati, the club performance chair Donald Glowinski, and
the members of our program committees who helped in the final selection of papers,
posters, demos, installations, and performances.

We wish to thank the Rector of the University of Genova Professor Gaetano Bignardi,
the Culture Councilor of Regione Liguria Fabio Morchio, and the Culture Councilor of
Provincia di Genova Giorgio Devoto, whose support has been of vital importance for the
creation and maturation of the Casa Paganini project.
We wish to thank Professor Gianni Vernazza, Head of the Faculty of Engineering,
Professor Riccardo Minciardi, Director of DIST - University of Genova, our colleagues
Lauro Magnani and Franco Sborgi, Professors at the University of Genova; Patrizia
Conti - Director of the Music Conservatory “Niccolò Paganini”; Sandra Solimano -
Director of the Museum of Contemporary Art “Villa Croce”; Teresa Sardanelli - Head of
the Direzione Cultura e Promozione della Città of Comune di Genova and Anna Rita
Certo - Head of the Ufficio Paganini of Comune di Genova; Pietro Borgonovo - Artistic
Director of GOG - Giovine Orchestra Genovese; Enrico Bonanni and Maria Franca
Floris of the Dipartimento Ricerca, Innovazione, Istruzione, Formazione, Politiche
Giovanili, Cultura e Turismo of Regione Liguria; Roberta Canu - Director of Goethe-
Institut Genua; Vittorio Bo and Manuela Arata - Directors of Festival della Scienza;
Francesca Sivori - Vice-President of the Centro Italiano Studi Skrjabiniani; Andrea
Masotti and Edoardo Lattes - Casa della Musica; Giorgio De Martino – Artistic Director
of Fondazione Spinola; Laura Santini of Mentelocale.

Finally, we thank the whole staff of InfoMus Lab – Casa Paganini for their precious help
and hard work in organizing the conference.

Enjoy NIME 08!

Antonio Camurri and Gualtiero Volpe

NIME 08 Conference Chairs

Stefania Serafin
NIME 08 Program Chair

Genova, May 8, 2008

Table of Contents 


Thursday, June 5, 2008

Session 1: Networked music performance 1

David Kim-Boyle
Network Musics - Play, Engagement and the Democratization of Performance ....... 3
Álvaro Barbosa
Ten-Hand Piano: A Networked Music Installation..................................................... 9
Mike Wozniewski, Nicolas Bouillot, Zack Settel, Jeremy R. Cooperstock
Large-Scale Mobile Audio Environments for Collaborative Musical Interaction ........ 13

Session 2: Networked music performance 2

Angelo Fraietta
Open Sound Control: Constraints and Limitation...................................................... 19
Matteo Bozzolan, Giovanni Cospito
SMuSIM: a Prototype of Multichannel Spatialization System with
Multimodal Interaction Interface................................................................................ 24

Session 3: Analysis of performers' gesture and gestural control of musical instruments
Chris Nash, Alan Blackwell 
Realtime Representation and Gestural Control of Musical Polytempi ...................... 28
Mikael Laurson, Mika Kuuskankare 
Towards Idiomatic and Flexible Score-based Gestural Control
with a Scripting Language ........................................................................................ 34
Alexandre Bouënard, Sylvie Gibet, Marcelo M. Wanderley
Enhancing the visualization of percussion gestures
by virtual character animation ................................................................................... 38
Diana Young
Classification of Common Violin Bowing Techniques
Using Gesture Data from a Playable Measurement System..................................... 44

Friday, June 6, 2008

Session 4: Instruments 1
Jyri Pakarinen, Vesa Välimäki, Tapio Puputti
Slide guitar synthesizer with gestural control ............................................................ 49

Otso Lähdeoja
An Approach to Instrument Augmentation: the Electric Guitar.................................. 53
Juhani Räisänen
Sormina - a new virtual and tangible instrument ....................................................... 57
Edgar Berdahl, Hans-Christoph Steiner, Collin Oldham
Practical Hardware and Algorithms for Creating Haptic Musical Instruments ........... 61
Amit Zoran, Pattie Maes
Considering Virtual & Physical Aspects in Acoustic Guitar Design ........................... 67

Session 5: Instruments 2
Dylan Menzies
Virtual Intimacy : Phya as an Instrument .................................................................. 71
Jennifer Butler
Creating Pedagogical Etudes for Interactive Instruments ......................................... 77

Session 6: Evaluation and HCI methodologies

Dan Stowell, Mark D. Plumbley, Nick Bryan-Kinns
Discourse analysis evaluation method for expressive musical interfaces ................. 81
Chris Kiefer, Nick Collins, Geraldine Fitzpatrick
HCI Methodology For Evaluating Musical Controllers: A Case Study ....................... 87
Olivier Bau, Atau Tanaka, Wendy Mackay
The A20: Musical Metaphors for Interface Design .................................................... 91

Session 7: Sensing systems and measurement technologies

Tobias Grosshauser
Low Force Pressure Measurement: Pressure Sensor Matrices
for Gesture Analysis, Stiffness Recognition and Augmented Instruments ................ 97
Giuseppe Torre, Javier Torres, Mikael Fernstrom
The development of motion tracking algorithms
for low cost inertial measurement units - POINTING-AT - ........................................ 103
Adrian Freed
Application of new Fiber and Malleable Materials
for Agile Development of Augmented Instruments and Controllers .......................... 107
Alain Crevoisier, Greg Kellum
Transforming Ordinary Surfaces Into Multi-touch Controllers ................................... 113
Nicholas Ward, Kedzie Penfield, Sile OʼModhrain, R. Benjamin Knapp
A Study of Two Thereminists:
Towards Movement Informed Instrument Design ..................................................... 117

Saturday, June 7, 2008

Session 8: Active listening to sound and music content

Vassilios-Fivos A. Maniatakos, Christian Jacquemin
Towards an affective gesture interface for expressive music performance .............. 122
Anna Källblad, Anders Friberg, Karl Svensson, Elisabet Sjöstedt Edelholm
Hoppsa Universum – An interactive dance installation for children .......................... 128
Antonio Camurri, Corrado Canepa, Paolo Coletta,
Barbara Mazzarino, Gualtiero Volpe
Mappe per Affetti Erranti: a Multimodal System
for Social Active Listening and Expressive Performance .......................................... 134

Session 9: Agent-based systems

Sergio Canazza, Antonina Dattolo
New data structure for old musical open works ........................................................ 140
Arne Eigenfeldt, Ajay Kapur
An Agent-based System for Robotic Musical Performance ...................................... 144

Session 10: Sensing systems and measurement technologies

Maurizio Goina, Pietro Polotti
Elementary Gestalts for Gesture Sonification ........................................................... 150
Stefano Delle Monache, Pietro Polotti, Stefano Papetti, Davide Rocchesso
Sonic Augmented Found Objects ............................................................................. 154
Jean-Marc Pelletier
Sonified Motion Flow Fields as a Means of Musical Expression............................... 158
Josh Dubrau, Mark Havryliv
P[a]ra[pra]xis: Poetry in Motion................................................................................. 164
Jan C. Schacher
davos soundscape, a location based interactive composition .................................. 168


POSTERS

Thursday, June 5, 2008 - Session 1

Andy Schmeder, Adrian Freed
uOSC: The Open Sound Control Reference Platform for Embedded Devices ......... 175
Timothy Place, Trond Lossius, Alexander Refsum Jensenius
Addressing Classes by Differentiating Values and Properties in OSC ..................... 181
Ananya Misra, Georg Essl, Michael Rohs
Microphone as Sensor in Mobile Phone Performance .............................................. 185

Nicolas Bouillot, Mike Wozniewski, Zack Settel, Jeremy R. Cooperstock
A Mobile Wireless Augmented Guitar ....................................................................... 189
Robert Jacobs, Mark Feldmeier, Joseph A. Paradiso
A Mobile Music Environment Using a PD Compiler and Wireless Sensors .............. 193
Ross Bencina, Danielle Wilde, Somaya Langley
Gesture ≈ Sound Experiments: Process and Mappings ........................................... 197
Miha Ciglar
“3rd. Pole” - a Composition Performed via Gestural Cues ........................................ 203
Kjetil Falkenberg Hansen, Marcos Alonso
More DJ techniques on the reactable ....................................................................... 207
Smilen Dimitrov, Marcos Alonso, Stefania Serafin
Developing block-movement, physical-model based objects for the Reactable ....... 211
Jean-Baptiste Thiebaut, Samer Abdallah, Andrew Robertson
Real Time Gesture Learning and Recognition: Towards Automatic Categorization . 215
Mari Kimura
Making of VITESSIMO for Augmented Violin:
Compositional Process and Performance ................................................................ 219
Joern Loviscach
Programming a Music Synthesizer through Data Mining .......................................... 221
Kia Ng, Paolo Nesi
i-Maestro: Technology-Enhanced Learning and Teaching for Music ........................ 225

Friday, June 6, 2008 - Session 2

Bart Kuyken, Wouter Verstichel, Frederick Bossuyt, Jan Vanfleteren,
Michiel Demey, Marc Leman
The HOP sensor: Wireless Motion Sensor ............................................................... 229
Niall Coghlan, R. Benjamin Knapp
Sensory Chairs: a System for Biosignal Research and Performance ....................... 233
Andrew B. Godbehere, Nathan J. Ward
Wearable Interfaces for Cyberphysical Musical Expression ..................................... 237
Kouki Hayafuchi, Kenji Suzuki
MusicGlove: A Wearable Musical Controller for Massive Media Library ................... 241
Michael Zbyszynski
An Elementary Method for Tablet ............................................................................. 245
Gerard Roma, Anna Xambó
A tabletop waveform editor for live performance ...................................................... 249
Andrea Valle
Integrated Algorithmic Composition. Fluid systems
for including notation in music composition cycle ..................................................... 253
Andrea Valle
GeoGraphy: a real-time, graph-based composition environment ............................. 257

Ioannis Zannos, Jean-Pierre Hébert
Multi-Platform Development of Audiovisual and Kinetic Installations........................ 261
Greg Corness
Performer model: Towards a Framework for Interactive Performance
Based on Perceived Intention ................................................................................... 265
Paulo Cesar Teles, Aidan Boyle
Developing an “Antigenous” Art Installation
Based on A Touchless Endo-system Interface ......................................................... 269
Silvia Lanzalone
The ‘suspended clarinet’ with the ‘uncaused sound’.
Description of a renewed musical instrument ........................................................... 273
Mitsuyo Hashida, Yosuke Ito, Haruhiro Katayose
A Directable Performance Rendering System: Itopul ............................................... 277
William R. Hazlewood, Ian Knopke
Designing Ambient Musical Information Systems ..................................................... 281

Saturday, June 7, 2008 - Session 3

Aristotelis Hadjakos, Erwin Aitenbichler, Max Mühlhäuser
The Elbow Piano: Sonification of Piano Playing Movements .................................... 285
Yoshinari Takegawa, Tsutomu Terada, Masahiko Tsukamoto
UnitKeyboard: an Easy Configurable Compact Clavier ........................................... 289
Cléo Palacio-Quintin
Eight Years of Practice on the Hyper-Flute:
Technological and Musical Perspectives .................................................................. 293
Edgar Berdahl, Julius O. Smith III
A Tangible Virtual Vibrating String ............................................................................ 299
Christian Geiger, Holger Reckter, David Paschke, Florian Schutz, Cornelius Pöpel
Towards Participatory Design and Evaluation
of Theremin-based Musical Interfaces ...................................................................... 303
Tomás Henriques
META-EVI: Innovative Performance Paths with a Wind Controller ........................... 307
Robin Price, Pedro Rebelo
Database and mapping design for audiovisual prepared radio set installation ......... 311
Kazuhiro Jo, Norihisa Nagano
Monalisa: "see the sound, hear the image" .............................................................. 315
Andrew Robertson, Mark D. Plumbley, Nick Bryan-Kinns
A Turing Test for B-Keeper: Evaluating an Interactive Real-Time Beat-Tracker....... 319
Gabriel Gatzsche, Markus Mehnert, Christian Stöcklmeier
Interaction with tonal pitch spaces ............................................................................ 325
Parag Chordia, Alex Rae
real-time Raag Recognition for Interactive Music ..................................................... 331

Anders Vinjar
Bending Common Music with Physical Models ........................................................ 335
Margaret Schedel, Alison Rootberg, Elizabeth de Martelly
Scoring an Interactive, Multimedia Performance Work ............................................. 339

DEMOS¹ 343

Ayaka Endo, Yasuo Kuhara

Rhythmic Instruments Ensemble Simulator
Generating Animation Movies Using Bluetooth Game Controller ............................. 345
Keith A. McMillen
Stage-Worthy Sensor Bows for Stringed Instruments .............................................. 347
Lesley Flanigan, Andrew Doro
Plink Jet .................................................................................................................... 349
Yusuke Kamiyama, Mai Tanaka, Hiroya Tanaka
Oto-Shigure: An Umbrella-Shaped Sound Generator for Musical Expression .......... 352
Sean Follmer, Chris Warren, Adnan Marquez-Borbon
The Pond: Interactive Multimedia Installation ........................................................... 354
Ethan Hartman, Jeff Cooper, Kyle Spratt
Swing Set: Musical Controllers with Inherent Physical Dynamics............................. 356
Paul Modler, Tony Myatt
Video Based Recognition of Hand Gestures by Neural Networks
for The Control of Sound and Music ......................................................................... 358
Kenji Suzuki, Miho Kyoya, Takahiro Kamatani, Toshiaki Uchiyama
beacon: Embodied Sound Media Environment for Socio-Musical Interaction .......... 360
Eva Sjuve
Prototype GO: Wireless Controller for Pure Data ..................................................... 362
Robert Macrae, Simon Dixon
From toy to tutor: Note-Scroller is a game to teach music ......................................... 364
Stuart Favilla, Joanne Cannon, Tony Hicks, Dale Chant, Paris Favilla
Gluisax: Bent Leather Band’s Augmented Saxophone Project ................................. 366
Staas De Jong
The Cyclotactor : Towards a Tactile Platform for Musical Interaction ....................... 370
Michiel Demey, Marc Leman, Frederick Bossuyt, Jan Vanfleteren
The Musical Synchrotron: using wireless motion sensors
to study how social interaction affects synchronization with musical tempo ............. 372

¹ These are the contributions accepted as demos. The demo program also includes nine further demos
associated with papers and posters.


CONCERTS

Opening concert ....................................................................................................... 377

Roberto Girolin
Lo specchio confuso dall’ombra ............................................................................... 378
Nicola Ferrari
The Bow is bent and drawn ...................................................................................... 379
Giorgio Klauer
Tre aspetti del tempo per iperviolino e computer ...................................................... 380
Alessandro Sartini
Aurora Polare ........................................................................................................... 381
Pascal Baltazar
Pyrogenesis .............................................................................................................. 382
Chikashi Miyama
Keo Improvisation for sensor instrument Qgo........................................................... 383
Keith Hamel, François Houle, Aleksandra Dulic
Intersecting Lines ..................................................................................................... 384
Ernesto Romero and Esthel Vogrig
Vistas ........................................................................................................................ 385
Martin Messier, Jacques Poulin-Denis
The Pencil Project .................................................................................................... 386
Stuart Favilla, Joanne Cannon, Tony Hicks
Heretic’s Brew .......................................................................................................... 387
Mark Alexander Bokowiec, Julie Wilson-Bokowiec
The Suicided Voice ................................................................................................... 388
Mark Alexander Bokowiec, Julie Wilson-Bokowiec
Etch .......................................................................................................................... 389
Thomas Ciufo
Silent Movies: an improvisational sound/image performance ................................... 390
Alison Rootberg, Margaret Schedel
The Color of Waiting ................................................................................................. 391
Ge Wang, Georg Essl, Henri Penttinen
MoPhO - A Suite for Mobile Phone Orchestra .......................................................... 392



CLUB NIME

Jane Rigler
Traces/Huellas (for flute and electronics) ................................................................. 395

Renaud Chabrier, Antonio Caporilli
Drawing / Dance ....................................................................................................... 396
Joshua Fried
Radio Wonderland .................................................................................................... 397
Silvia Lanzalone
Il suono incausato, improvise-action
for suspended clarinet, clarinettist and electronics (2005) ........................................ 398
Luka Dekleva, Luka Prinčič, Miha Ciglar
FeedForward Cinema ............................................................................................... 399
Greg Corcoran, Hannah Drayson, Miguel Ortiz Perez, Koray Tahiroglu
The Control Group .................................................................................................... 400
Nicolas d'Alessandro
Cent Voies ................................................................................................................ 401
Cléo Palacio-Quintin, Sylvain Pohu
Improvisation for hyper-flute, electric guitar and real-time processing ...................... 402
Nicolas d'Alessandro, Sylvain Pohu
Improvisation for Guitar/Laptop and HandSketch ..................................................... 403
Ajay Kapur
Anjuna's Digital Raga ............................................................................................... 404
Jonathan Pak
Redshift .................................................................................................................... 405


INSTALLATIONS

Olly Farshi
Habitat ...................................................................................................................... 409
Jeff Talman
Mirror of the moon .................................................................................................... 410
Joo Youn Paek
Fold Loud.................................................................................................................. 411
Kenneth Newby, Aleksandra Dulic, Martin Gotfrit
in a thousand drops... refracted glances .................................................................. 412
Jared Lamenzo, Mohit Santram, Kuan Huan, Maia Marinelli
Soundscaper ............................................................................................................ 413
Pasquale Napolitano, Stefano Perna, Pier Giuseppe Mariconda
SoundBarrier_ .......................................................................................................... 414
Art Clay, Dennis Majoe
China Gates.............................................................................................................. 415


WORKSHOPS

Kia Ng
4th i-Maestro Workshop on Technology-Enhanced Music Education ....................... 419
Michael Zbyszyński
Tablet Workshop for Performers and Teachers ........................................................ 421
R. Benjamin Knapp, Marcelo Wanderley, Gualtiero Volpe
Techniques for Gesture Measurement in Musical Performance ............................... 423
Alexander Refsum Jensenius, Timothy Place, Trond Lossius,
Pascal Baltazar, Dave Watson
Jamoma Workshop ................................................................................................... 425




Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

Network Musics - Play, Engagement and the Democratization of Performance

David Kim-Boyle
University of Maryland, Baltimore County
Department of Music
1000 Hilltop Circle, Baltimore, MD
1-410-455-8190
kimboyle@umb
ABSTRACT
The rapid development of network communication technologies has allowed composers to
create new ways in which to directly engage participants in the exploration of new
musical environments. A number of distinctive aesthetic approaches to the musical
application of networks will be outlined in this paper, each of which is mediated and
conditioned by the technical and aesthetic foundations of the network technologies
themselves. Recent work in the field by artists such as Atau Tanaka and Metraform will
be examined, as will some of the earlier pioneering work in the genre by Max Neuhaus.
While recognizing the historical context of collaborative work, the author will examine
how the strategies employed in the work of these artists have helped redefine a new
aesthetics of engagement in which play, spatial and temporal dislocation are amongst
the genre's defining characteristics.

Keywords
Networks, collaborative, open-form, play, interface.

1. INTRODUCTION
The development of high-speed network communication protocols and other wireless and
telecommunications technology has allowed the creation of musical environments which
directly engage participants in the realization of new forms of musical expression.
These environments resituate the role of the composer to that of designer and transform
the nature of performance to that of play. While the development of the genre has been
informed by aesthetic concerns shared by all collaborative art, the spatial and, to some
extent, temporal dislocation of participants conditions and mediates the nature of play
itself to an unprecedented extent [1].

By actively engaging its audience, network-based musical environments recall the
collaborative work of an earlier generation of composers such as Brown [3],
Haubenstock-Ramati [10], Brün [4], Wolff [28], Pousseur [8], and Stockhausen [21], as
well as the improvisatory work of groups such as AMM, Musica Elettronica Viva and
artists associated with the Fluxus School, who directly situated the audience in
performative roles. Much as this earlier generation created unique opportunities for
musical expression, composers working with networks create environments which are
musically expressed through playful exploration. The musical forms that emerge from
these explorations and the relationships that develop between participants should be
considered, however, in the context of the social goals that propelled the work of this
earlier generation of composers.

Given the central aesthetic role the exploration of network-based musical environments
plays, the extent to which the network's topology conditions the play of participants
requires consideration [16]. While interactions between participants can occur over
spatially distributed or localized environments, and the interactions and explorations
themselves can be synchronous or asynchronous, the design of the interface through which
these explorative behaviors are mediated is of equal importance. Informed by an
understanding of the principles of game design theory, it will be argued that meaningful
interaction and truly democratized performance spaces can only emerge from carefully
considered system and interface design [19].

2. MUSICAL APPROACHES
While a number of studies have been published outlining different ways in which agents
can collaborate with each other through a network infrastructure [1, 18, 26, 27],
significantly less attention has been given to the different aesthetic approaches that
these topologies facilitate. While the classification of network structures is helpful,
the ways in which such structures condition the behavior of participants is equally
significant. Some of the ways in which network topologies mediate musical expression
will be explored in the remainder of this paper. Central to this discussion are the
musical effects of spatial and temporal dislocation and the role of interface design.

A number of distinctive approaches to the musical application of networks can be seen
to have emerged since the earliest experiments in the genre in the 1960s. These include the
are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee. The term collaborative work will be used throughout in reference to
NIME08, June 4-8, 2008, Genova, Italy any work in which performers or the audience are given creative
Copyright remains with the author(s). responsibility for determining the order of musical events or, in some
cases, for interpreting general musical processes. Open form and
mobile form works are two examples of traditional collaborative work.

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

creation of network instruments, generative works, the integration of musical play within social networks and the creation of immersive environments.

2.1 Network Instruments
Amongst some of the earliest work to utilize telecommunication networks for artistic purposes are Max Neuhaus's radio projects. Between 1966 and 1977, Neuhaus produced a series of works, which he termed "Broadcast Works", in which the musical outcome is dependent upon the active responses of the audience. In the earliest of these works, Public Supply I (1966), radio listeners were asked to call in to a radio station and produce any sounds they wanted. Neuhaus then mixed the incoming signals to produce the musical results. Neuhaus has written of these works: "[It] seems that what these works are really about is proposing to reinstate a kind of music which we have forgotten about and which is perhaps the original impulse for music in man: not making a musical product to be listened to, but forming a dialogue, a dialogue without language, a sound dialogue." [16] The intention is strikingly similar to that expressed by Tanaka: "In classical art making, we have been mostly concerned with making objects - nodes - is (sic.) time to think about the process of creating connections?" [22] Like much of Tanaka's network-based projects, Neuhaus's work exists as an environment which promotes the agency of its participants through the initiation and development of musical dialogs. In Public Supply I, however, Neuhaus mediates those relationships through the mixing process, reinforcing musically interesting dialogs while downplaying those of less appeal.

In a later realization of Neuhaus's project, listeners from across the Eastern United States were asked to call in and whistle a single pitch for as long as they were able. The work, entitled Radio Net, was produced in cooperation with National Public Radio. Unlike Public Supply I, in this work Neuhaus did not mix the responses live but rather devised an automated mixing system in which the output switched between various input signals based on the pitch of the input sounds. The input whistles were also subject to electronic transformation as the sounds looped from one broadcast station to another. While Radio Net's realization was perhaps of more interest to its participants than to a passive audience, and despite the fact that some thousands of listeners participated in the realization of the work, the result as realized in its only 1977 performance was coherent, subtle, and at times quite beautiful [14].

To the extent that Radio Net was developed as an environment within which musical dialogs could be formed and developed, the work does present a number of themes which we will see taken up in various forms in most subsequent network-based music. These include the role of the agency of others in conditioning one's own play, the degree to which dialogs are mediated by the mechanisms of the network, the public vs. private space of performance, the degree to which the dialogs enabled represent truly unique ways of communicating, and the new role of the composer as a designer of a musical environment rather than a creator of a self-contained musical work. Rather than attempt to address the extent to which all these themes are addressed in Neuhaus's Broadcast Works, let us for now comment on the question of agency. The network infrastructure of Radio Net and the transformational processes employed would greatly inhibit the ability of participants to distinguish their own musical contribution, much less to engage in meaningful dialog with others. Nevertheless, through their participation, listeners were able to build a community brought together by the exploration of the network infrastructure. This would suggest that the goals of Radio Net were not so much participation in dialog but rather the playful exploration of a network environment.

Like these earlier works, one of Neuhaus's most recent projects, Auracle (2004) [15], adopts a similar network infrastructure, although in this case the network no longer exists over radio transmissions but rather over the internet. In Auracle, participants form ensembles and collectively modify an audio stream broadcast by a server through the use of their voices. In a similar manner to the Broadcast Works, the resultant sounds of Auracle are affected by the proficiency of the participants but also by network latency. Network latency, a manifestation of temporal dislocation, is often considered a technical handicap for performers who wish to collaborate over the internet, but it is a key aesthetic consideration in the work of many composers who exploit it in the creation of unique musical environments. While latency is minimized in Auracle by the system architecture employed, it nevertheless clearly distinguishes the relationships participants form with the audio stream, and through that with other ensemble members, from those traditional relationships that exist between performers and their instruments.

Just as in the Broadcast Works, Neuhaus regards Auracle not as a self-contained musical work in itself but as a collective instrument or musical architecture through which participants develop relationships through musical dialog. As implied above, those dialogs are necessarily mediated by the design of the instrument itself. The algorithm used to extract control features from the sonic input is not made explicit, and the ability of participants to shape the audio stream with any degree of nuance is quite limited. Further, there is little direct indication as to how particular gestures modify the audio stream. While this would seem to inhibit the ability of participants to engage in meaningful dialog with other participants, it does reinforce the fact that, like any instrument, Auracle has its own idiosyncrasies.

In comparison with the Broadcast Works, the use of an interface also represents an important distinction. Existing as the window through which the environment is explored and dialogs with ensemble members are developed, of immediate note is its simplicity. With the screen divided into discrete sections representing the geographical locations of participants, the musical contributions of ensemble members are graphically represented by simple lines. Basic control functions allow participants to record brief audio samples which transform the audio stream. While the control functions are simple, they are a necessary consequence of the work's open environment. The interface design also enables ensemble members to more clearly distinguish their own musical contributions from those of other members.

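Radio Net's automated mixer is described above only in outline. Purely as an illustration of the idea of pitch-driven switching between callers, the sketch below selects whichever active phone line is whistling closest to a current target pitch; the `select_output` helper, its channel representation, and the nearest-pitch rule are all hypothetical simplifications, not a reconstruction of Neuhaus's actual system.

```python
# Illustrative sketch of pitch-driven output switching, in the
# spirit of Radio Net's automated mixer. All names are hypothetical.

def select_output(channel_pitches, target_hz):
    """Return the index of the input channel whose estimated
    whistle pitch (Hz) lies closest to the current target,
    or None if every line is silent."""
    best, best_dist = None, float("inf")
    for idx, pitch in enumerate(channel_pitches):
        if pitch is None:          # silent line: ignore
            continue
        dist = abs(pitch - target_hz)
        if dist < best_dist:
            best, best_dist = idx, dist
    return best

# Four phone lines, three of them active:
lines = [440.0, None, 1250.0, 880.0]
print(select_output(lines, 900.0))   # channel 3 (880 Hz) is closest
```

In a realization closer to Neuhaus's, the target itself would drift and the selected signal would be further transformed as it looped between broadcast stations.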

Figure 1. Interface for Max Neuhaus's Auracle.

Much of Atau Tanaka's work has employed networks to directly explore issues of collaboration and community building. In his Global String (1998), a project produced with Kasper Toeplitz, a network simulates an invisible resonant string whose nodes are anchored in different gallery installations. Tanaka writes of the project: "The installation consists of a real physical string connected to a virtual string on the network. The real string (12 mm diameter, 15 m length) stretches from the floor diagonally up to the ceiling of the installation space. On the floor is one end point - the ear. Up above is the connection to the network, connecting it to another end point at a remote location. Sensors translate the vibrations into data to be transmitted to the other end via network protocol. The server is the 'bridge' of the instrument - the reflecting point. It runs software that is a physical model of a string. Data is streamed back to each site as sound and as data - causing the string to become audible, and to be actuated to vibrate, on the other end." [24] Players of the string are able to collaborate with other players located in different installations in a topology similar to that of Weinberg's bridge model [27].

Figure 2. Installation setup for Tanaka's Global String.

Just like Neuhaus's Auracle, Global String is not a self-contained musical work but rather a network-based instrument that facilitates connections between participants across distributed space. In Global String, these connections are mediated by the latency of the network, which Tanaka considers analogous to instrumental resonance [22], an idea also explored in the work of artists such as Carsten Nicolai and Ulrike Gabriel [2]. In Global String, network traffic between nodes is used to drive the parameters of the audio synthesis engine [2007, pers. comm. April], directly correlating temporal dislocation to musical expression. It thus makes explicit the ways in which the network mediates communication between participants, making prominent the question of information transparency. Global String is also one of the few network instruments to incorporate haptic feedback within its infrastructure. This is an especially important design characteristic as, unlike software-based instruments, it more directly rewards performance skill and in doing so increases the likelihood of more meaningful play emerging [11].

2.2 Generative Works
The work of composer Jason Freeman, a collaborator with Neuhaus on Auracle, often addresses ways in which an audience can be engaged in the creation of unique musical forms. The design of carefully considered interfaces is crucial to this endeavor. Graph Theory is a recent web-based work in which participants do not directly interact with each other but rather help realize an open-form musical work by navigating pathways through a range of musical possibilities. In the work, basic melodic cells are repeatedly performed by a violinist. The user is able to choose which cell will follow the currently performed cell by choosing between up to four subsequent cells (see Fig. 3, top). There are a total of sixty-one cells. While the order of cells is chosen by the participant, the range of possibilities is predetermined and displayed with a graphic representation of interconnected nodes. A novel aspect of this work is that a score can be generated for performance in which the order of the loops is determined by the popularity of the choices made by users. While the content of the work is defined by the composer, the ability of a collective to determine its order is a unique feature and an extension of classic open-form works.

While the pathways chosen through the score are not overtly determined by the composer, they are certainly influenced by how the composer has decided to distribute musical phrases amongst nodal points. One of Freeman's pre-compositional rules was that adjacent cells could have only one change between their respective pitch sets. This decision introduced melodic continuity and helped keep decision making for the participants relatively simple. There were no such rules applied to rhythmic properties. The graphical representations employed were also considered in determining navigational pathways [2007, pers. comm. 2 January]. As participants navigating Graph Theory's structure do not interact with each other, questions of spatial distribution and temporal latency are not pertinent. The interface that Freeman has designed, however, does condition the play of those who interact with the materials broadcast by the server. A map of all possible pathways through the work's 61 nodes is presented in the bottom left quadrant of the interface. These pathways have a tri-partite structure which encourages local exploration of neighboring nodes and implies greater musical contrast for larger cross-sectional explorations. A participant's movement through the nodes of the work is also facilitated through the use of simple bar graphs for the display of rhythmic structure and pitch contour. This choice of display clearly renders the work more suitable to participants unable to read common practice notation.

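The navigation and scoring mechanism described above for Graph Theory can be sketched in a few lines. The toy cell graph, the choice log, and the `popular_path` helper below are hypothetical stand-ins for Freeman's actual implementation: participants walk an adjacency graph of melodic cells, every choice is tallied, and a score order is then derived by always following the most popular outgoing edge.

```python
from collections import Counter

# Hypothetical fragment of the cell graph: node -> up to four successors.
graph = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}

# Log of (current_cell, chosen_next_cell) pairs from many participants.
choices = [(1, 2), (1, 2), (1, 3), (2, 4), (2, 4), (4, 3), (4, 3)]
tally = Counter(choices)

def popular_path(start, length):
    """Derive a score order by repeatedly following the most
    frequently chosen successor of the current cell."""
    path, current = [start], start
    for _ in range(length - 1):
        current = max(graph[current],
                      key=lambda nxt: tally[(current, nxt)])
        path.append(current)
    return path

print(popular_path(1, 4))   # [1, 2, 4, 3]
```

Freeman's one-pitch-change rule between adjacent cells would constrain how `graph` is built in the first place; the popularity index shown to users is essentially a normalized view of `tally`.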

Through immediate visual and aural feedback, participants in Graph Theory are clearly able to discern their actions and evaluate them in the context of previous and future decisions. They are also able to compare their choices with those of others through a simple "popularity" index which rates the frequency with which subsequent cells are chosen. The choices made are given a further complexity in that they contribute to a more global index used to create a score for live performance. Participants thus contribute to two distinct levels of performance: the private space performance that takes place within their own immediate interaction with the network, and the public space performance which results from the collective play of many participants.

Figure 3. User interface for Graph Theory with an excerpt from the generated score.

2.3 Social Networks
In recent work, Atau Tanaka has utilized mobile network technology to build communities in which the members collaborate in shared musical experiences. His "Malleable Mobile Music" (2004) is a good example of this more recent direction. Using specially modified mobile communication devices equipped with physical sensors that measure both the pressure with which the device is held and its movement in space, participants are able to collectively remix a popular song chosen by the members of the network [23]. Various audio transformations such as time stretching and sampling can be applied, and rhythmic patterns and sequences can be generated from the original source material through various built-in software modules. Just as in open-form works, these transformations can be applied in any order, and the various contributions of each group member become an individual track in the master mix. The physical proximity of the participants, which is determined through a GPS system, is also used to affect the dynamic balance of the resultant mix, directly correlating social proximity with musical presence. The results of the remixing and transformations are broadcast to all participants. More overtly than Neuhaus's Auracle, Tanaka's instrument creates immediate collaborative relationships and communities through the virtual environment of the network technology employed. The "Malleable Mobile Music" project has recently been employed in a new interactive work, Net_Dérive, for mobile, wearable communication devices. This latter project was produced in collaboration with Petra Gemeinboeck and received its premiere in Paris in 2006 [25].

Figure 4. Specially modified PDAs for Malleable Mobile Music.

2.4 Immersive Works
A different type of musical collaboration is explored in immersive works [7]. In Ecstasis by the Australian ensemble Metraform, four participants engage in exploring and decoding a virtual environment through the use of head-mounted displays equipped with motion tracking devices. The images seen through the displays are also projected on four screens surrounding the participants. In Ecstasis, the participants, each graphically represented by an avatar, determine the nature and scope of the work through their interactions. Metraform has written of Ecstasis: "The relationship between the avatars modulates the space, colour, transparency and sound of the environment. The collective interaction results in a dynamic interplay with and within a continuously modifiable environment. This engagement transgresses from a preoccupation of 'doing' something in an environment to 'being' present to one's self and others." [13]

In Ecstasis and other recent work by the ensemble, sound is employed as a means of environmental understanding. The soundscape of the work was produced by composer Lawrence Harvey. The sounds heard and the sound transformations applied are determined by the virtual location of each of the four participants as well as from information derived from the


motion of the head-mounted displays. Consisting of sixteen channels of spatialized sound, the sounds complement the images generated and provide easily discerned sonic cues that help establish cooperative relationships between the participants. Ecstasis defines clear goals for its participants and rewards their explorations with a greater understanding of their environment. Through the environmental space that the work presents, Ecstasis becomes a catalyst for collective individuation [12]. As its participants decode their environment and come to a greater collective awareness, it is clear that the disjunctions between interface and environment, and between public and private performance spaces, are no longer sustainable.

Figure 5. A screenshot from Ecstasis.

3. AESTHETIC THEMES
While each network project examined posits its own aesthetic questions, they all share a number of common concerns. These range from questions regarding the democratized performance space which network-based work promotes, through to questions provoked by the technology through which these works are sustained. Some of these questions include consideration of how the spatial and temporal aesthetics of network technologies mediate collaborative relationships [11], while others make overt the influence of interface design in the promotion of democratized performance environments.

Given the creative role participants play in exploring their musical environments, the role of the composer has largely been transformed to that of designer, while the traditional role of the performer has been subsumed by that of player. To a certain extent this situation is paralleled in traditional open-form works, in which composers design open musical environments which serve to facilitate an awareness of process and collective becoming. All network-based musical works posit environments within which relationships between participants are facilitated and developed. The directives which determine the extent to which these environments can be explored and relationships developed differ from composer to composer and from project to project. While artists such as Tanaka and Neuhaus encourage collaborative relationships and dialogs to be openly explored within the boundaries of their environments, other artists such as Metraform and Freeman adopt a less open approach and predefine particular social goals through and for their work. In Metraform's Ecstasis, as we have seen, this took the form of an improved environmental understanding, while the creation of a performable work was an explicit goal of Freeman's Graph Theory. Given the responsibility assumed of participants, the composer or designer of that environment must also assume some responsibility for the quality of those relationships that emerge. Dobrian goes further and states that in a collective performance it is up to the composer to develop an environment within which compelling work can take place [9], while Tanaka has stated that interesting results can only be achieved by developing interesting processes [2007, pers. comm. April]. Bryan-Kinns and Healey have even shown that the effect of decay within a collective instrument significantly affects how participants engage with that environment [5]. As we have seen in the work of Neuhaus and Freeman, interface design is of critical importance in conditioning the ways in which processes, environments and relationships are able to be explored, while in Tanaka's Global String, haptic feedback is a critical component in the development of meaningful play. Indeed, as has become evident, democratized performance spaces can only be realized through carefully considered interface design.

Transparent interface design also facilitates the ability of participants to surrender to their environment rather than have to decode the means through which it is presented. How that environment responds to their own agency is of especial importance. As noted by Phillips and Rabinowitz,

...when the audience expects instant response, asks the piece for self-affirmation or affirmation of a learned behavior, the effect closes down what the piece means to open up. Collaborative art asks for something as complex as inspired surrender and must elicit recognition, building from reflection. That moment of self-regard should then develop into more complicated correspondences. Otherwise, the piece can veer toward superficiality and rely on what we call a "supermarket door process of interactivity": I walked up to it and it opened; I have power [17].

While technology has not fundamentally changed the defining characteristics of collaborative art forms, it has certainly mediated them in distinctive ways. In some environments, such as in Metraform's Ecstasis, this has brought about unique modes of engagement, while in other projects network latency has produced collective instruments whose aesthetics are founded on immediacy and extended reflection [24]. Of defining character, of course, are the spatial and temporal properties of the network infrastructure or topology. While these are able to be exploited to musical effect, it is perhaps counterintuitive that spatial disjunction and temporal dislocation can also serve to facilitate a greater awareness of agency and collective becoming.

4. SUMMARY
The democratized performance spaces that network-based musical environments support are a natural response to the musical and social ideals that motivated the work of an earlier generation of composers for whom such technology did not exist. These technologies have brought about new modes of awareness of individual agency and of the creative relationships that can emerge with others through the playful exploration of the architectures that sustain musical collaboration. The aesthetic features unique to the genre emphasize the challenges of fully engaging participants in collaborative processes and moving participants beyond the easy solution of falling back on what Cage has referred to as superficial habits [6]. These challenges are amply rewarded,


however, by the exciting potential of network music to create unique forms of musical expression and new modes of musical agency and engagement, and in doing so to transcend the network architectures that make such dialogs and relationships possible.

5. ACKNOWLEDGMENTS
I am grateful to Jason Freeman, Lawrence Harvey and Atau Tanaka for providing further information on their work. I would also like to thank John Dack for generously providing a copy of his article on the Scambi project.

6. REFERENCES
[1] Barbosa, A. Displaced soundscapes: a survey of network systems for music and sonic art creation. Leonardo Music Journal, vol. 13, 2003, 53-59.
[2] Broeckmann, A. Reseau/Resonance - connective processes and artistic practice. Artmedia VIII, 2002, Viewed March 2007, <>.
[3] Brown, E. Form in new music. Darmstädter Beiträge, vol. 10, 1965, 57-69.
[4] Brün, H. When Music Resists Meaning: The Major Writings of Herbert Brün. Ed. A. Chandra, Wesleyan University Press, Middletown, CT, 2004.
[5] Bryan-Kinns, N., and Healey, P.G.T. Decay in collaborative music making. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, 2006, 114-117.
[6] Cage, J. Soundings: investigation into the nature of modern music. Neuberger.
[7] Chew, E., Kyriakakis, C., Papadopoulos, C., Sawchuk, A. A., and Zimmermann, R. From remote media immersion to distributed immersive performance. In Proceedings of the 2003 ACM SIGMM Workshop on Experiential Telepresence, Berkeley, CA, 2003, 110-120.
[8] Dack, J. "Open" forms and the computer. In Musiques, Arts, Technologies: Towards a Critical Approach. L'Harmattan, Paris, 2004, 401-412.
[9] Dobrian, C. Aesthetic considerations in the use of "virtual" music instruments. In Proceedings of the 2003 International Conference on New Interfaces for Musical Expression (NIME03), Montreal, 2003, 161-163.
[10] Haubenstock-Ramati, R. Notation - material and form. Perspectives of New Music, vol. 4, no. 1, 1965, 39-44.
[11] Leman, M. Embodied Music Cognition and Mediation Technology. MIT Press, Cambridge, MA, 2007.
[12] Massumi, B. The political economy of belonging. In Parables for the Virtual: Movement, Affect, Sensation. Duke University Press, Durham, NC, 2002.
[13] Metraform. Ecstasis. 2004, Viewed December 2006, <>.
[14] Neuhaus, M. Radio Net. 1977. Available at
[15] Neuhaus, M., Freeman, J., Ramakrishnan, C., Varnick, K., Burk, P., and Birchfield, D. Auracle. 2006, Viewed June 2006, <>.
[16] Neuhaus, M. The broadcast works and Audium. 2007, Viewed January 2007, <>.
[17] Phillips, L., and Rabinowitz, P. On collaborating with an audience. Collaborative Journal, 2006, Viewed January 2006, <>.
[18] Rebelo, P. Network performance: strategies and applications. Presentation at the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, 2006, Viewed March 2007, <>.
[19] Salen, K., and Zimmerman, E. Rules of Play: Game Design Fundamentals. MIT Press, Cambridge, MA, 2004.
[20] Souza e Silva, A. Art by telephone: from static to mobile interfaces. Leonardo Electronic Almanac, vol. 12, no. 10, 2004.
[21] Stockhausen, K. ...how time passes.... Trans. C. Cardew, Die Reihe, 3 (1959), Bryn Mawr, PA, 10-40.
[22] Tanaka, A. Seeking interaction, changing space. In Proceedings of the 6th International Art + Communication Festival 2003, Riga, Latvia, 2003, Viewed July 2006, <>.
[23] Tanaka, A. Mobile music making. In Proceedings of the 2004 Conference on New Interfaces for Musical Expression (NIME04), Hamamatsu, 2004, 154-156.
[24] Tanaka, A. Global String. 2005, Viewed July 2006, <>.
[25] Tanaka, A., Gemeinboeck, P., and Momeni, A. Net_Dérive, a participative artwork for mobile media. In press, 2007.
[26] Viewed January 2006, <>.
[27] Weinberg, G. Interconnected musical networks: toward a theoretical framework. Computer Music Journal, vol. 29, no. 2, 2005, 23-39.
[28] Wolff, C. Open to whom and to what. Interface, vol. 16, no. 3, 1987, 133-141.


Ten-Hand Piano: A Networked Music Installation

Álvaro Barbosa
Research Center for Science and Technology of the Arts (CITAR)
Portuguese Catholic University – School of the Arts
Rua Diogo Botelho 1327, 4169-005 Porto, Portugal
+351 22 616 62 91

ABSTRACT
This paper presents the latest developments of the Public Sound Objects (PSOs) system, an experimental framework to implement and test new concepts for Networked Music. The project of a public interactive installation using the PSOs system was commissioned in 2007 by Casa da Musica, the main concert hall space in Porto. It resulted in a distributed musical structure with up to ten interactive performance terminals distributed along the Casa da Musica's hallways, collectively controlling a shared acoustic piano. The installation allows the visitors to collaborate remotely with each other, within the building, using a software interface custom developed to facilitate collaborative music practices and with no requirements in terms of previous knowledge of musical performance.

Keywords
Network Music Instruments; Real-Time Collaborative Performance; Electronic Music Instruments; Behavioral Driven Interfaces; Algorithmic Composition; Public Music; Sound Objects;

1. INTRODUCTION
The Public Sound Objects (PSOs) project consists of the development of a networked musical system, which is an experimental framework to implement and test new concepts for on-line music communication. It not only serves a musical purpose, but it also facilitates a straightforward analysis of collective creation and the implications of remote communication in this process.

The project was initiated in 2000 [1] [2] at the Music Technology Group (MTG) of the Pompeu Fabra University in Barcelona, and most developments since 2006 have been undertaken by the Research Center for Science and Technology of the Arts (CITAR) at the Portuguese Catholic University in Porto.

The PSOs system approaches the idea of collaborative musical performances over a computer network as a Shared Sonic Environment, aiming to go beyond the concept of simply using computer networks as a channel to connect performing spaces. It can run entirely over the WWW and its underlying communication protocol (Hypertext Transfer Protocol, HTTP), in order to perform over a regular Internet connection and achieve the sense of a Public Acoustic Space where anonymous users can meet and be found performing in collective Sonic Art pieces.

The system itself is an interface-decoupled Musical Instrument, in which a remote user interface and a sound processing engine reside on different hosts, given that it is possible to accommodate an extreme scenario where a user can access the synthesizer from any place in the world using a web browser.

Specific software features were implemented in order to reduce the disruptive effects of network latency [3], such as dynamic adaptation of the musical tempo and dynamics to communication latency measured in real time.

In particular, the recent developments presented in this paper result from a commission in 2007 of an Interactive Sonic Art Installation from Casa da Musica, the main concert hall space in Porto. The resulting setup is a distributed musical structure with up to ten interactive performance terminals distributed along the Casa da Musica's hallways, collectively controlling a shared acoustic piano.

It includes:
- The adaptation of the original synthesizer (a Pure Data [4] sound engine) to a Yamaha Disklavier piano [5]
- Redesign of the interactive sound paradigm in order to constructively articulate multiple instances of experimental users to an ongoing musical piece in real time
- Introduction of an Ethersound [6] acoustic broadcast system for the clients' musical feedback
- Design of a physical infrastructure, coherent with the Casa da Musica architecture, to support the client and server terminals

Permission to make digital or hard copies of all or part of this work for 2.1 Sound Objects
personal or classroom use is granted without fee provided that copies are Community-driven creation, results in a holistic process, i.e., its
not made or distributed for profit or commercial advantage and that properties cannot be determined or explained by the sum of its
copies bear this notice and the full citation on the first page. To copy components alone [7]. A community of users involved in a
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

creation process, through a Shared Sonic Environment, definitely this topic, even though they were scattered over different panels,
constitutes a Whole in Holistic sense. instead of one distinct session.
According to Jan Smuts (1870-1950), the father of Holism Since then the term Networked Music has become increasingly
Theory, the concept of a Whole implies its individual parts to be consensual in defining the area, and according to Jason Freeman’s
flexible and adjustable. It must be possible for the part to be definition [12]: it is about music practice situations where
different in the whole from what it is outside the whole. In traditional aural and visual connections between participants are
different wholes a part must be different in each case from what it augmented, mediated or replaced by electronically-controlled
is in its separate state. connections.
Furthermore, the whole must itself be an active factor or influence In order to have a broad view over the scientific dissemination of
among individual parts, otherwise it is impossible to understand Networked Music research I present some of the most significant
how the unity of a new pattern arises from its elements. Whole Landmarks in the field over the last decade:
and parts mutually and reciprocally influence and modify each
other. 2.2.1 Summits and Workshops
Similarly, when questioning object’s behaviors in Physics it is The ANET Summit (August 20-24, 2004)
often by looking for simple rules that it is possible to find the The summit was organized by Stanford University’s Center for
answers. Once found, these rules can often be scaled to describe Computer Research in Music and Acoustics (CCRMA) and held
and simulate the behavior of large systems in the Real World. at the Banff Center in Canada, was the first Workshop event
This notion applies to the Acoustic Domains through the addressing the topic of High quality Audio over Computer
definition of Sound Objects as a relevant element of the music Networks. The guest lecturers were Chris Chafe, Jeremy
creation process by Pierre Schaeffer in the 1960’s. According to Cooperstock, Theresa Leonard, Bob Moses and Wieslaw
Schaeffer, a Sound Object is defined as: Woszczyk. A New edition of the ANET Summit is planed for
April 2008
“Any sound phenomenon or event perceived as a coherent whole
(…) regardless of its source or meaning” (Schaeffer, P., 1966). The Networked Music Workshop at ICMC (September 4, 2005).

Sound Object (I’object sonore), refers to an acoustical object for This Workshop was held in Barcelona and resulted from
human perception and not a mathematical or electroacoustical experience in previous ICMCs, which called for the need to
object for synthesis. One can consider a sound object the smallest realize such an event. Guest Lecturers were: Álvaro Barbosa
self-contained particle of a Soundscape [8]. Defining a universe of (Pompeu Fabra University, MTG), Scot Gresham-Lancaster
sound events by subsets of Sound Objects is a promising approach (Cogswell College Sunnyvale, CA), Jason Freeman (Georgia
for content-processing and transmission of audio [9], and from a Institute of Technology), Ross Bencina (Pompeu Fabra
psychoacoustic and perceptual point of view it provides a very University, MTG).
powerful paradigm to sculpt the symbolic value conveyed by a
Soundscape. 2.2.2 PhD Dissertations
These are some relevant dissertations published on the topic:
In an artistic context the scope for the user’s personal
interpretation is wider. Therefore such Sound Objects can have a 2002 Golo Föllmer “Making Music on the Net, social and
much deeper symbolic value and represent more complex aesthetic structures in participative music” [13]; 2002 Nathan
metaphors. Often there is no symbolic value in a sound, but once Schuett “The Effects of Latency on Ensemble Performance” [14];
there is a variation in one of its fundamental parameters it might 2003 Jörg Stelkens “Network Synthesizer” [15]; 2003 Gil
then convey a symbolic value. Weinberg “Interconnected Musical Networks: Bringing
Expression and Thoughtfulness to Collaborative Music” [16];
All these ideas about Sound Objects and the Holistic nature of
2006 Álvaro Barbosa “Displaced Soundscapes” [17]
community music are the basis for the main concept behind the
Public Sound Objects System. In fact, in PSOs raw material
provided for each user, to create his contribution to a shared 2.2.3 Journal Articles
musical piece, is a simple Sound Object. These Sound Objects, There is a number of Survey and partial overview articles on the
individually controlled, become part of a complex collective topic of Networked Music [18], [19], [20] [21] and [22] however
system in which several users can improvise simultaneously and a special issue of the journal Organised Sound from 2005 [23],
concurrently. edited by Leigh Landy, specifically focused on the topic of
Networked Music and includes many of the relevant references in
In the system a server-side real-time sound synthesis engine (a this area.
Disklavier Piano in the case of the Casa da Musica installation)
provides an interface to transform various parameters of a Sound
Object, which enables users to add symbolic meaning to their 3. THE PSOs INSTALLATION
performance. Casa da Musica is the main concert venue in the city of Porto,
and it has a strong activity in what concerns contemporary and
2.2 About Networked Music experimental forms of Music. The commission for the Public
In his Keynote Speech from ICMC 2003 Roger Dannenberg Sound Objects Installation had the underlying idea of bringing
mentioned “Networked Music” as one of the promising research music to the hallways of the house of music, so that the visitors
topics and at least four papers [2], [10] and [11] were centered on could actually interact with it.

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

3.1 The User Interface
The graphical user interface is based on a bi-dimensional graphical metaphor of an ever-going bouncing ball enclosed in a square box. Each time the ball hits one of the walls, a piano key is triggered by the server, with the pitch defined by the value of a stylized fader that frames the box (each fader determines the pitch of the sound triggered at its adjacent wall).

Fig. 1 Casa da Musica Building¹

The final implementation consists of a Disklavier piano controlled via MIDI by a server that can simultaneously be used as a terminal, located at the main foyer of Casa da Musica. This server accepts incoming control data generated by 10 client computers located at diverse points along a scenic route through the building's hallways. Incoming data is transmitted over the building's IP network using Open Sound Control [24].

Fig. 2 The PSOs server connected to the Disklavier piano and two of the clients which remotely control the same piano

The sound generated at the server's site conveys the overall performance of every user and is streamed back to each client using an EtherSound [6] system, which produces latencies under 100 ms on the building's LAN.

Fig. 3 A PSOs client with the EtherSound hub, speakers and keyboard concealed in the structure

All the computer hardware for the server and clients has been cloaked by a metal structure created in coherence with the building's unique architecture (a project by Rem Koolhaas), so that the users only access a one-key mouse and a screen, or in the case of the server a touch screen.

Fig. 4 PSOs client interface showing the representation of 5 users

Each of the clients active at a given moment is visually represented in real time by a grey ball, while the user himself controls a distinctive orange ball. The user can also add a trail to his ball, producing an arpeggio sound (or a chord if the trail extension is zero), given that the scale of notes each client can produce was chosen in advance to create a harmonic soundscape when different sounds overlap in time.

The PSOs system integrates several features to overcome network latency issues, already published in [3]. Nonetheless, in this version a new latency-tolerance feature was implemented to improve the perceptual correlation between an impact and a triggered sound, using a simple sound panorama adjustment at the sound server, consequently adding sound panning consistent with the object's behavior at the graphical user interface.

Fig. 5 Representation of impacts vs. triggered sounds with sound panorama adjustment in the presence of latency (Δt)

¹ Image source: "House of Music Opening Day", Wikimedia Commons, under the GFDL (GNU Free Documentation License)
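The wall-dependent panorama adjustment described above can be stated compactly. The sketch below is an illustration, not the published implementation; the gain values and function names are assumptions. Each wall of the client's box maps to a pair of left/right channel gains applied to the triggered sound object in the streamed stereo mix.

```python
def wall_to_pan(wall: str) -> tuple:
    """Return (left, right) channel gains for the sound object
    triggered by a ball hitting the given wall."""
    if wall == "left":
        return (1.0, 0.0)   # left channel only
    if wall == "right":
        return (0.0, 1.0)   # right channel only
    return (1.0, 1.0)       # top or bottom wall: both channels (L+R)

def pan_frame(sample: float, wall: str) -> tuple:
    """Apply the wall-derived gains to one mono sample of the sound
    object, yielding a stereo frame for the streamed mix."""
    left, right = wall_to_pan(wall)
    return (sample * left, sample * right)
```

Even with a latency Δt between impact and audible onset, this extra spatial cue lets a listener associate each delayed sound with the wall impact that produced it, as illustrated in Fig. 5.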

The basic idea consists of transmitting a sound object only through the right channel of the streamed soundscape stereo mix when a ball hits the right wall, only through the left channel when a ball hits the left wall, and through both channels (L+R) if the ball hits the top or bottom wall.

Sound panorama adjustment adds an extra perceptual cue to the temporal order of the triggered Sound Objects and their correlation to ball impacts.

4. CONCLUSIONS AND FUTURE WORK
The PSOs installation at Casa da Musica allows a piano to be controlled by 10 instances simultaneously (ten hands!) in a coherent and constructive manner, which would hardly be possible in a traditional way.

Even though the interface is radically different from the normal control paradigm of a piano, it is based on the same fundamental musical facets (rhythm, pitch, timbre and dynamics), and therefore it is an engaging experience, since the users recognize a familiar result achieved through a totally different means.

The interface is simple enough to achieve a musical soundscape with zero learning time and without any previous musical practice experience, which made the system very accessible and popular with the average 500 daily visitors of Casa da Musica. Controlling a popular acoustic instrument brings the users closer to the musical experience, and in this sense we would like to further develop this system by adding a pool of instruments alongside the piano, such as wind, string and percussion instruments controlled by robotics.

5. ACKNOWLEDGMENTS
The author would like to thank the people who collaborated in this project: Jorge Cardoso (UCP), Jorge Abade (UCP) and Paulo Maria Rodrigues (Casa da Musica).

6. REFERENCES
[1] Barbosa, A. and Kaltenbrunner, M. Public Sound Objects: A Shared Musical Space on the Web. 2002. Proceedings of the International Conference on Web Delivering of Music (WEDELMUSIC 2002), Darmstadt, Germany. IEEE Computer Society Press.
[2] Barbosa, A., Kaltenbrunner, M. and Geiger, G. Interface Decoupled Applications for Geographically Displaced Collaboration in Music. 2003. Proceedings of the International Computer Music Conference (ICMC2003).
[3] Barbosa, A., Cardoso, J. and Geiger, G. Network Latency Adaptive Tempo in the Public Sound Objects System. 2005. Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 2005), Vancouver, Canada.
[4] Puckette, M. Pure Data. 1996. Proceedings of the International Computer Music Conference (ICMC96), San Francisco, pp. 269-272. International Computer Music Association.
[5] Yamaha Disklavier Piano: .html (consulted 2008/01/30)
[6] ETHERSOUND: (consulted 2008/01/30)
[7] Smuts, J. Holism and Evolution. 1926. Macmillan, London, UK.
[8] Schaeffer, P. Traité des Objets Musicaux. 1966. Le Seuil, Paris.
[9] Amatriain, X. and Herrera, P. Transmitting Audio Content as Sound Objects. 2002. Proceedings of the AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio.
[10] Stelkens, J. peerSynth: A P2P Multi-User Software with New Techniques for Integrating Latency in Real Time Collaboration. 2003. Proceedings of the International Computer Music Conference.
[11] Obu, Y., Kato, T. and Yonekura, T. M.A.S.: A Protocol for a Musical Session in a Sound Field Where Synchronization between Musical Notes is Not Guaranteed. 2003. Proceedings of the International Computer Music Conference (ICMC2003), Singapore. International Computer Music Association.
[12] Freeman, J. The Networked Music Workshop at ICMC 2005, Barcelona (September 4, 2005).
[13] Föllmer, G. 2002. Making Music on the Net: Social and Aesthetic Structures in Participative Music. Ph.D. Thesis, Martin Luther Universität Halle-Wittenberg, Germany.
[14] Schuett, N. 2002. The Effects of Latency on Ensemble Performance. Ph.D. Thesis, Stanford University, California, USA.
[15] Stelkens, J. 2003. Network Synthesizer. Ph.D. Thesis, Ludwig Maximilians Universität, München, Germany.
[16] Weinberg, G. 2003. Interconnected Musical Networks: Bringing Expression and Thoughtfulness to Collaborative Music Making. Ph.D. Thesis, Massachusetts Institute of Technology, Massachusetts, USA.
[17] Barbosa, A. 2006. Displaced Soundscapes: Computer Supported Cooperative Work for Music Applications. Ph.D. Thesis, Pompeu Fabra University, Barcelona, Spain.
[18] Jordà, S. 1999. Faust Music On Line (FMOL): An Approach to Real-Time Collective Composition on the Internet. Leonardo Music Journal, Volume 9, pp. 5-12.
[19] Tanzi, D. 2001. Observations about Music and Decentralized Environments. Leonardo, Volume 34, Issue 5, pp. 431-436.
[20] Barbosa, A. 2003. Displaced Soundscapes: A Survey of Network Systems for Music and Sonic Art Creation. Leonardo Music Journal, Volume 13, Issue 1, pp. 53-59.
[21] Weinberg, G. 2005. Interconnected Musical Networks: Toward a Theoretical Framework. Computer Music Journal, Vol. 29, Issue 2, pp. 23-29.
[22] Traub, P. 2005. Sounding the Net: Recent Sonic Works for the Internet and Computer Networks. Contemporary Music Review, Vol. 24, No. 6, December 2005, pp. 459-481.
[23] Landy, L. (ed.) 2005. Organised Sound 10 (Issue 3). Cambridge University Press, UK. (ISSN: 1355-7718)
[24] Wright, M. and Freed, A. 1997. Open Sound Control: A New Protocol for Communicating with Sound Synthesizers. Proceedings of the International Computer Music Conference.
[25] Nella, M. J. Constraint Satisfaction and Debugging for Interactive User Interfaces. Ph.D. Thesis, University of Washington, Seattle, WA, 1994.

Large-Scale Mobile Audio Environments
for Collaborative Musical Interaction

Mike Wozniewski, Nicolas Bouillot (Centre for Intelligent Machines, McGill University, Montréal, Québec, Canada)
Zack Settel (Université de Montréal, Montréal, Québec, Canada)
Jeremy R. Cooperstock (Centre for Intelligent Machines, McGill University, Montréal, Québec, Canada)

ABSTRACT
New application spaces and artistic forms can emerge when users are freed from constraints. In the general case of human-computer interfaces, users are often confined to a fixed location, severely limiting mobility. To overcome this constraint in the context of musical interaction, we present a system to manage large-scale collaborative mobile audio environments, driven by user movement. Multiple participants navigate through physical space while sharing overlaid virtual elements. Each user is equipped with a mobile computing device, GPS receiver, orientation sensor, microphone, headphones, or various combinations of these technologies. We investigate methods of location tracking, wireless audio streaming, and state management between mobile devices and centralized servers. The result is a system that allows mobile users, with subjective 3-D audio rendering, to share virtual scenes. The audio elements of these scenes can be organized into large-scale spatial audio interfaces, thus allowing for immersive mobile performance, locative audio installations, and many new forms of collaborative sonic activity.

Keywords
sonic navigation, mobile music, spatial interaction, wireless audio streaming, locative media, collaborative interfaces

1. INTRODUCTION
With the design of new interfaces for musical expression, it is often argued that control paradigms should capitalize on natural human skills and activities. As a result, a wide range of tracking solutions and sensing platforms have been explored, which translate human action into signals that can be used for the control of music and other forms of media. The physical organization of interface components plays an important role in the usability of the system, since user motion naturally provides kinesthetic feedback, allowing a user to better remember the style of interaction and the gestures required to trigger certain events. Also, as digital devices become increasingly mobile and ubiquitous, we expect interactive applications to become more distributed and integrated within our physical environment. This prospect yields a new domain for musical interaction employing augmented-reality interfaces and large multi-user environments.

We present a system where multiple participants can navigate about a university campus, several city blocks, or an even larger space. Equipped with position-tracking and orientation-sensing technology, their locations are relayed to other participants and to any servers that are managing the current state. With a mobile device for communication, users are able to interact with an overlaid virtual audio environment containing a number of processing elements. The physical space thus becomes a collaborative augmented-reality environment where immersive musical interfaces can be explored. Musicians can input audio at their locations, while virtual effects processors can be scattered through the scene to transform those signals. All users, performers and audience alike, receive subjectively rendered spatial audio corresponding to their particular locations, allowing for unique experiences that are not possible in traditional music performance venues.

Figure 1: A mobile performer

1.1 Background
In earlier work, we have spent significant time exploring how virtual worlds can be used as musical interfaces. The result of this investigation has led to the development of the Audioscape engine [30]¹, which allows for the spatial organization of sound processing, and provides an audiovisual rendering of the scene for feedback. Audio elements can be arranged in a 3-D world and precise control over the directivity of propagating audio is provided to the user. For example, an audio signal emitted by a sound generator may be steered toward a sound processor that exists at some 3-D location. The processed signal may again be steered

¹ Available at

towards a virtual microphone that captures and busses the sound to a loudspeaker where it is heard. The result is a technique of spatial signal bussing, which lends itself particularly well to many common mixing operations. Gain control, for instance, is accomplished by adjusting the distance between two nodes in the scene, while filter parameters can be controlled by changing orientations.

The paradigm of organizing sound processing in three-dimensional space has been explored in some of our previous publications [27, 28, 26]. We have seen that users easily understand how to interact with these scenes, especially when actions are related to everyday activity. For instance, it is instantly understood that increasing the distance between two virtual sound elements will decrease the intensity of the transmitted signal, and that pointing a sound source in a particular direction will result in a stronger signal at the target location. We have designed and prototyped several applications using these types of interaction techniques, including 3-D mixing, active listening, and using virtual effects racks [27, 29]. Furthermore, we began to share virtual scenes between multiple participants, each with subjective audio rendering and steerable audio input, allowing for the creation of virtual performance venues and support for virtual-reality video conferencing [31].

While performers appreciated the functionality of these earlier systems, they were nevertheless hampered by constraints on physical mobility. These applications operated mainly with game-like techniques, where users stood in front of screens and navigated through the scene using controllers such as joysticks or gamepads. The fact that the gestures for moving and steering sound were abstracted through these intermediate devices resulted in a lack of immersive feeling and made the interfaces more complicated to learn.

We thus decided to incorporate more physical movement, for example, sensing the user's head movement with an orientation sensor attached to headphones, and applying this to affect changes to the apparent spatial audio rendering. To further extend this degree of physical involvement we began to add real-world location awareness to the system, allowing users to move around the space physically instead of virtually. For example, our 4Dmix3 installation [4] tracked up to six users in an 80 m² gallery space. The motion of each user controlled the position of a recording buffer, which could travel among a number of virtual sound generators in the scene. The result was a type of remixing application, where users controlled the mix by moving through space.

In the remainder of this paper, we explore the use of larger-scale position tracking, such as that of a Global Positioning System (GPS), and the resulting challenges and opportunities that such technology presents. We evolve our framework to support a more distributed and mobile-capable architecture, which results in the need for wireless audio streaming and the distribution of information about the mobile participants. Sections 2 and 3 describe the additional technical elements that need to be introduced into the system to support wireless and mobile applications, while Section 4 demonstrates a prototypical musical application using this new architecture. Musicians in the Mobile Audioscape are able to navigate through an outdoor environment containing a superimposed set of virtual audio elements. Real physical gestures can be used to steer and move sound through the space, providing an easily understood paradigm of interaction in what can now be thought of as a mobile music venue.

1.2 Mobile Music Venues
By freeing users from the confines of computer terminals and interfaces that severely limit mobility, application spaces emerge that can operate in a potentially unbounded physical space. These offer many novel possibilities that can lead to new artistic approaches; or they can re-contextualize existing concepts that can then be revisited and expanded upon. An excellent example is parade music, where sound emission is spatially dynamic or mobile; a passive listener remains in one place while different music is coming and going. One hundred years ago, Charles Ives integrated this concept into symphonic works, where different musical material flowed through the score, extending our notions of counterpoint to include those based on proximity of musical material. The example of parade-music listening expands to include two other cases: a mobile listener can walk with or against the parade, yielding additional relationships to the music. Our work also integrates the concept of active listening; material may be organized topographically in space, produced by mobile performers and encountered non-linearly by mobile listeners. From this approach come several rich musical forms, which, like sculpture, integrate point of view; listeners/observers create their own unique rendering. Thus, artists may create works that explore the spatial dynamics of musical experience, where flowing music content is put in counterpoint by navigation. Musical scores begin to resemble maps, and listeners play a larger role in authoring their experiences.

1.3 Related Work
With respect to collaborative musical interfaces, Blaine and Fels provide an overview of many systems, classifying them according to attributes such as scale, type of media, amount of directed interaction, learning curve, and level of physicality, among others [7]. However, most of these systems rely on users being in a relatively fixed location in front of a computer. The move to augmented- or mixed-reality spaces seems like a natural evolution, offering users a greater level of immersion in the collaboration, while their respective locations can be used for additional control.

In terms of locative media, some projects have considered the task of tagging geographical locations with sound. The [murmur] project [2] is one simple example, where users tag interesting locations with phone numbers. Others can call the numbers using their mobile phones and listen to audio recordings related to the locations. Similarly, the Hear&There project [20] allows recording audio at a given GPS coordinate, while providing a spatial rendering of other recordings as users walk around. Unfortunately, this is limited to a single-person experience, where the state of the augmented-reality scene is only available on one computer. Tanaka proposed an ad-hoc (peer-to-peer) wireless networking strategy to allow multiple musicians to share sound simultaneously using hand-held computers [22]. Later work by Tanaka and Gemeinboeck [23] capitalized on location-based services available on 3G cellular networks to acquire coarse locations of mobile devices. They proposed the creation of locative media instruments, where geographic localization is used as a musical interface.

Large areas can also be used for musical interaction in other ways. Sonic City [16] proposed mobility, rather than location alone, for interaction. As a user walks around a city, urban sounds are processed in real time as a result of readings from devices such as accelerometers, light sensors, temperature sensors, and metal detectors. Similarly, the Sound Mapping [19] project included gyroscopes along with GPS sensors in a suitcase that users could push around a small area. Both position changes and subtle movements could be used to manipulate the sound that was transmitted between multiple cases in the area via radio signal.

Orientation or heading can also provide useful feedback,

since spatial sound conveys a great deal of information about the directions of objects and the acoustics of an environment. Projects including GpsTunes [21] and Melodious Walkabout [15] use this type of information to provide audio cues that guide individuals in specific directions.

We take inspiration from the projects mentioned above, and incorporate many of these ideas into our work. However, real-time high-fidelity audio support for multiple individuals has not been well addressed. Tanaka's work [22], as well as some of our past experiences [8], demonstrates how we can deal with the latencies associated with distributed audio performance, but minimizing latency remains a major focus of our work. The ability to create virtual audio scenes will be supported with some additions to our existing Audioscape engine. To address the need for distributed mobile interaction, we are adding large-scale location sensing and the ability to distribute state, signals, and computation among mobile clients effectively. These challenges are addressed in the following sections.

2. LOCATIVE TECHNOLOGY
In order to support interaction in large-scale spaces, we require methods of tracking users and communicating between them. A variety of mobile devices are available for this purpose, potentially equipped with powerful processors, wireless transmission, and sensing technologies. For our initial prototypes, we chose to develop on Gumstix (verdex XM4-bt) processors with expansion boards for audio I/O, GPS, storage, and WiFi communication [17]. These devices have the benefit of being full-function miniature computers (FFMC) with a large development community, and as a result, most libraries and drivers can be supported easily.

2.1 Wireless Standards
Given that the most generally available wireless technologies on mobile devices are Bluetooth and WiFi, we consider the benefits and drawbacks of each of these standards. For transmission of data between sensors located on the body and the main processing device, Bluetooth is a viable solution. However, even with Bluetooth 2.0, a practical transfer rate is typically limited to approximately 2.1 Mbps. If we want to send or receive audio (16-bit samples at 44 kHz), approximately 700 kbps of bandwidth is needed for each stream. In theory, this allows for interaction between up to three individuals, where each user sends one stream and receives two. Given the need to support a greater number of participants, we are forced to use WiFi.² Furthermore, the range of Bluetooth is limiting, whereas WiFi can relay signals through access points. Furthermore, we can make

tions that transmit error corrections over radio frequencies. The idea is that mobile GPS units in the area will have similar positional drift, and correcting this can yield accuracies of under 1 m. Another technique, known as assisted GPS (AGPS), takes advantage of common wireless networks (cellular, Bluetooth, WiFi) in urban environments to access reference stations with a clear view of the sky (e.g., on the roofs of buildings). Although accuracy is still in the order of 15 m, the interesting benefit of this system is that localization can be attained indoors (with an accuracy of approximately 50 m) [6].

2.3 Orientation & Localization
While GPS devices provide location information, it is also important to capture a listener's head orientation so that spatial cues can be provided, the resulting sound appearing to propagate from a particular direction. Most automotive GPS receivers report heading information by tracking the vehicle trajectory over time. This is a viable strategy for inferring the orientation of a vehicle, but a listener's head can change orientation independently of body motion. Moreover, the types of applications we are targeting will likely involve periods of time where a user does not change position, but stays in one place and orients his or her body in various directions. Therefore, additional orientation sensing seems to be a requirement.

In human psychoacoustic perception, the accuracy and responsiveness of orientation information are important, since a listener's ability to localize sound is highly dependent on changes in phase, amplitude, and spectral content with respect to head motion. Responsiveness, in particular, is a significant challenge, considering the wireless nature of the system. Listeners will be moving their heads continuously to help localize sounds, and a delay of more than 70 ms in spatial cues can hinder this process [10]. Furthermore, it has been demonstrated that head-tracker latency is most noticeable in augmented-reality applications, as a listener can compare virtual sounds to reference sounds in the real environment. In these cases, latencies as low as 25 ms can be detected, and begin to impair performance in localization tasks at slightly greater values [11]. It is therefore suggested that latency be maintained below 30 ms.

To track head orientation, we attach an inertial measurement unit (IMU) to the headphones of each participant, capable of sensing instantaneous 3-D orientation with an error of less than 1 degree. It should be mentioned that not all applications will require this degree of precision, and some deployments could potentially make use of trajectory-based orientation information. For instance, the Melodious Walkabout [15] uses aggregated GPS data to determine the
use of higher-level protocols such as Optimized Link State direction of travel, and provides auditory cues to guide in-
Routing protocol (OLSR) [18], which computes optimal for- dividuals in specific directions. Users hear music to their
warding paths for ad-hoc nodes. This is a viable way to left if they are meant to take a left turn, whereas a low-pass
reconfigure wireless networks if individuals are moving. filtered version of their audio is heard if they are traveling
in the wrong direction. We can conceive of other types of
2.2 GPS applications, where instantaneous head orientation is not
GPS has seen widespread integration into a variety of needed, and users could adjust to the paradigm of hear-
commodity hardware such as cell phones and PDAs. These ing audio spatialization according to trajectory rather than
provide position tracking in outdoor environments, typically line of sight. Of particular interest, are high-velocity appli-
associated with the 3-D geospatial coordinates of users. cations such as skiing or cycling, where users are generally
However, accuracy in consumer-grade devices is quite poor, looking forward, in the direction of travel. Such constraints
ranging between approximately 5m in the best case (high- can help with predictions of possible orientations, while the
quality receiver with open skies) [25] to 100 metres or more faster speed helps to overcome the coarse resolution of cur-
[6]. Several methods exist to reduce error, for example, rent GPS technology.
differential GPS (DGPS) uses carefully calibrated base sta-
We note viable alternatives on the horizon, such as the
newly announced SOUNDabout Lossless codec, which al- The move to mobile technology presents significant de-
lows even smaller audio streams to be sent over Bluetooth. sign challenges in the domain of audio transmission, largely

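The bandwidth budget of Section 2.1 can be checked with a short script. The helper and constant names below are ours, not from the paper; the figures (16-bit samples at 44kHz, a practical Bluetooth 2.0 rate of roughly 2.1 Mbps) come from the text:

```python
# Sketch of the Section 2.1 bandwidth argument; helper and constant
# names are illustrative, figures are taken from the text above.

def stream_kbps(sample_rate_hz: int, bits_per_sample: int) -> float:
    """Raw bandwidth of one uncompressed mono PCM stream, in kbps."""
    return sample_rate_hz * bits_per_sample / 1000.0

BLUETOOTH_2_0_PRACTICAL_KBPS = 2100  # ~2.1 Mbps practical transfer rate

per_stream = stream_kbps(44_000, 16)  # 704.0 kbps, the paper's "approximately 700"

# Each of three users sends one stream and receives two, i.e. three
# streams per link, which just fits the practical Bluetooth rate:
print(per_stream, 3 * 700 <= BLUETOOTH_2_0_PRACTICAL_KBPS)  # 704.0 True
```

A fourth participant would add another ~700 kbps per link, exceeding the practical rate, which is why the paper falls back to WiFi for larger groups.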
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

related to scalability and the effects of latency on user experience. More precisely, a certain level of quality needs to be maintained to ensure that mobile performers and audience members experience audio fidelity that is comparable to traditional venues. The design of effective solutions should take into account that WiFi networks provide variable performance depending on the environment, and that small, lightweight mobile devices are, at present, limited in terms of computational capability.

3.1 Scalability
Reliance on unicast communication between users in a group suffers a potential n² effect of audio interactions between them, leading in turn to bandwidth explosion. We have investigated a number of solutions to this problem.

Multicast technology, for instance, allows devices to send UDP packets to an IP multicast address that virtualizes a group of receivers. Interested clients are able to subscribe to the streams of relevance, drastically reducing the overall required bandwidth. However, IP multicast over IEEE 802.11 wireless LAN is known to exhibit unacceptable performance [14] due to unsupported collision avoidance and acknowledgement at the MAC layer. Our benchmark tests confirm that multicast transmission experienced higher jitter than unicast, mandating a larger receiver buffer to maintain quality. Furthermore, packet loss for the multicast tests was on the order of 10-15%, resulting in a distorted audio stream, while unicast had almost negligible losses of 0.3%. Based on these results, we decided to rely for now on a point-to-point streaming methodology while experimenting with emerging non-standard multicast protocols, in anticipation of future improvements.

3.2 Low Latency Streaming
Mobile applications tend to rely on compression algorithms to respect bandwidth constraints. As a result, they often incur signal delays that challenge musical interaction and performer synchronization. Acceptable latency tolerance depends on the style of music, with figures as low as 10ms [12] for fast pieces. More typically, musicians have difficulty synchronizing with latencies above 50ms [13]. Most audio codecs require more than this amount of encoding time.³ Due in part to the limited computational resources available on our mobile devices, we instead transmit uncompressed audio, thus fully avoiding codec delays in the system.

Other sources of latency include packetization delay, corresponding to the time required to fill a packet with data samples for transmission, and network delay, which varies according to network load and results in jitter at the receiver. Soundcard latencies also play a role, but we consider these to be outside of our control. The most effective method for managing the remaining delays may be to minimize the size of transmitted packets. By sending a smaller number of audio samples in each network packet, we also decrease the amount of time that we must wait for those samples to arrive from the soundcard.

In this context, we have developed a dynamically reconfigurable transmission protocol for low-latency, high-fidelity audio streaming. Our protocol, nstream, supports dynamic adjustment of sender throughput and receiver buffer size. This is accomplished by switching between different levels of PCM quantization (8, 16 and 32 bit), packet size, and receiver buffer size. The protocol is developed as an external for Pure Data [3], and can be deployed on both a central server and a mobile device.

In benchmark tests, we have successfully transmitted uncompressed streams with an outgoing packet size of 64 samples. The receiver buffer holds two packets in the queue before decoding, meaning that a delay of three packets is encountered before the result can be heard. With a sampling rate of 44.1kHz, this translates to a packetization and receiving latency of 3 × (64/44.1) = 4.35ms. In addition, the network delay can be as low as 2ms, provided that the users are relatively close to each other, and typically does not exceed 10ms for standard wireless applications. The sum of these latencies is on the order of 7-15ms.

Practical performance will, of course, depend on the wireless network being used and the number of streams transmitted. Our experiments show that a high packet rate results in network instability and high jitter. In such situations it is necessary to increase packet size to help maintain an acceptable packet rate. This motivates us, as future work, to investigate algorithms for autonomous adaptation of low-latency protocols that deal with both quality and scalability.

4. MOBILE AUDIOSCAPE
Our initial prototyping devices, Gumstix, were chosen to provide: 1) wireless networking for bidirectional high-quality, low-latency audio and data streams; 2) local audio processing; 3) on-board device hosting for navigation and other types of USB or Bluetooth sensors; 4) minimal size/weight; and 5) Linux support. A more detailed explanation of our hardware infrastructure, in particular the method of Bluetooth communication between Gumstix and sensors, can be found in another publication [9].

To develop on these devices, a cross-compilation toolchain was needed that could produce binaries for the ARM-based 400MHz Gumstix processors (Marvell's PXA270). The first library that we needed to build was a version of Pure Data (Pd), which is used extensively for audio processing and control signal management by our Audioscape engine. In particular, we used Pure Data anywhere (PDa), a special fixed-point version of Pd for use with the processors typically found on mobile devices [5]. Several externals needed to be built for PDa, including a customized version of the Open Sound Control (OSC) objects, to which multicast support was added, and the nstream object mentioned in Section 3.2. The latter was also specially designed to support both regular Pd and PDa, using sample conversion for interoperability between an Apple laptop, PC and Gumstix units.

We also supplied each user with an HP iPAQ, loaded with a customized application that could graphically represent their location on a map. This program was authored with HP Mediascape software [1], which supports the playback of audio, video, and even Flash content based on user position. The most useful aspect of this software was that we could use Flash XML Sockets to receive the GPS locations of other participants and update the display accordingly. Although we used Compact Flash GPS receivers with the iPAQs for sending GPS data, the interface between the Mediascape software and the Flash program running within it only allowed updates at 2Hz, corresponding to a latency of at least 500ms before position-based audio changes were heard. The use of the GPSstix receiver, directly attached to the Gumstix processor, is highly recommended to anyone attempting to reproduce this work.

The resulting architecture is illustrated in Figure 2. Input audio streams are sent as mono signals to an Audioscape

³Possible exceptions are the Fraunhofer Ultra-Low Delay Codec (offering a 6ms algorithmic delay) [24] and the SOUNDabout Lossless codec (claiming under 10ms).
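The latency figures quoted in Section 3.2 (and the 256-sample deployment discussed in Section 4) all follow from one small formula. The helper name below is ours; every number comes from the text:

```python
# Packetization-plus-buffering delay: the receiver queues two packets
# before decoding, so three packet periods elapse before a sample is
# heard. Helper name is illustrative; figures are the paper's.

def packet_latency_ms(samples_per_packet: int, packets_queued: int = 3,
                      rate_khz: float = 44.1) -> float:
    return packets_queued * samples_per_packet / rate_khz

print(round(packet_latency_ms(64), 2))   # 4.35 ms, the benchmark configuration
print(round(packet_latency_ms(256), 1))  # 17.4 ms, the deployed configuration

# Because audio passes through the central server before being heard,
# packetization and the ~2 ms network delay are each encountered twice:
print(round(2 * (packet_latency_ms(256) + 2)))  # 39 ms, i.e. "approximately 40ms"
```

This also makes the trade-off explicit: larger packets keep the packet rate (and thus jitter) down at the cost of a proportionally longer buffering delay.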


Figure 2: Mobile Audioscape architecture. Solid lines indicate audio streaming while dotted lines show transmission of control signals.

server on a nearby laptop. The server also receives all control data via OSC from the iPAQ devices and stores location information for each user. A spatialized rendering is computed, and stereo audio signals are sent back to the users. For all streams, we send audio with a sampling rate of 44.1 kHz and 16-bit samples.

In terms of network topology, wireless ad-hoc connections are used, allowing users to venture far away from buildings with access points (provided that the laptop server is moved as well). Due to the number of streams being transmitted, audio is sent with 256 samples per packet, which ensures an acceptable packet rate and reduces jitter on the network. The result is a latency of 3 × (256/44.1) = 17.4ms for packetization and a minimal network delay of about 2ms. However, since audio is sent to a central server for processing before being heard, these delays are actually encountered twice, for a total latency of approximately 40ms. This is well within the acceptable limit for typical musical performance, and was not noticed by users of the system.

The artistic application we designed allows users to navigate through an overlaid virtual audio scene. Various sound loops exist at fixed locations, where users may congregate and jam with accompanying material. Several virtual volumetric regions are also located in the environment, allowing some users to escape within a sonically isolated area of the scene. Furthermore, each of these enclosed regions serves as a resonator, providing musical audio processing (e.g., delay, harmonization or reverb) to signals played within. As soon as players enter such a space, their sounds are modified, and a new musical experience is encountered. Figure 3 shows two such performers, who have chosen to jam in a harmonized echo chamber. They are equipped with Gumstix and iPAQs, both carried unobtrusively in their pockets.

Figure 3: Two participants jamming in a virtual echo chamber, which has been arbitrarily placed on the balcony of a building at the Banff Centre.

5. DISCUSSION
Approaching mobile music applications from the perspective of virtual overlaid environments allows novel paradigms of artistic practice to be realized. The virtualization of performer and audience movement allows for interaction with sound and audio processing in a spatial fashion that leads to new types of interfaces and thus, new musical experiences.

We have presented the challenges associated with supporting multiple participants in such a system, including the need for accurate sensing technologies and network architectures that can support low-latency communication in a scalable fashion. The prototype application that we developed was well received by those who experimented with it, but many improvements still need to be made. The coarseness of resolution available in consumer-grade GPS technology is such that an application must span a wide area for it to be of any value. This is problematic, since the range of a WiFi network is much smaller, mandating redirection of signals through additional access points or OLSR peers. If all signals must first travel to a server for processing, then distant nodes will suffer from very large latency.

One solution is to distribute the state of the virtual scene to all client machines, and perform rendering locally on the mobile devices. For the prototype application that we developed, this would cut latency in half, since audio signals would only need to travel from one device to another, without the need to return from a central processing server. Furthermore, this strategy would allow users to be completely free in terms of mobility, rather than needing to remain in contact with the server for basic functionality. However, for scenes of any moderate complexity, this demands much more processing power and memory than is currently available in consumer devices, and of course, the number of users will still be limited by the available network bandwidth required for peer-to-peer streaming.

A full investigation into distributing audio streams, state and computational load will be presented in future work, but for the moment we have provided a first step into the exploration of large-scale mobile audio environments. The multi-user nature of the system, coupled with high-fidelity audio distribution, provides a new domain for musical practice. We have already designed outdoor spaces for sonic investigation, and hope to perform and create novel musical interfaces in this new mobile context.

6. ACKNOWLEDGMENTS
The authors wish to acknowledge the generous support of NSERC and the Canada Council for the Arts, which have funded the research and artistic development described in this paper through their New Media Initiative. The prototype application described in Section 4 was produced in co-production with The Banff New Media Institute (Alberta, Canada). The authors would like to thank the participants


of the locative media residency for facilitating the work, and in particular, Duncan Speakman, who provided valuable assistance with the HP Mediascape software.

7. REFERENCES
[1] HP Mediascape website.
[2] The [murmur] project.
[3] Pure Data.
[4] Webpage: 4Dmix3.
[5] PDa: Real time signal processing and sound generation on handheld devices. In International Computer Music Conference (ICMC), 2003.
[6] R. Bajaj, S. L. Ranaweera, and D. P. Agrawal. GPS: Location-tracking technology. Computer, 35(4):92–94, 2002.
[7] T. Blaine and S. Fels. Contexts of collaborative musical experiences. In Proceedings of the Conference on New Interfaces for Musical Expression (NIME), pages 129–134, Montreal, 2003.
[8] N. Bouillot. nJam user experiments: Enabling remote musical interaction from milliseconds to seconds. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), pages 142–147, New York, NY, USA, 2007. ACM.
[9] N. Bouillot, M. Wozniewski, Z. Settel, and J. R. Cooperstock. A mobile wireless platform for augmented instruments. In International Conference on New Interfaces for Musical Expression, Genova, Italy, 2008.
[10] D. Brungart, B. Simpson, R. McKinley, A. Kordik, R. Dallman, and D. Ovenshire. The interaction between head-tracker latency, source duration, and response time in the localization of virtual sounds. In Proceedings of the International Conference on Auditory Display (ICAD), 2004.
[11] D. S. Brungart and A. J. Kordik. The detectability of headtracker latency in virtual audio displays. In Proceedings of the International Conference on Auditory Display (ICAD), pages 37–42, 2005.
[12] E. Chew, A. A. Sawchuk, R. Zimmerman, V. Stoyanova, I. Tosheff, C. Kyriakakis, C. Papadopoulos, A. R. J. François, and A. Volk. Distributed immersive performance. In Proceedings of the Annual National Association of the Schools of Music (NASM), San Diego, CA, 2004.
[13] E. Chew, R. Zimmermann, A. A. Sawchuk, C. Papadopoulos, C. Kyriakakis, C. Tanoue, D. Desai, M. Pawar, R. Sinha, and W. Meyer. A second report on the user experiments in the distributed immersive performance project. In Proceedings of the 5th Open Workshop of MUSICNETWORK: Integration of Music in Multimedia Applications, 2005.
[14] D. Dujovne and T. Turletti. Multicast in 802.11 WLANs: an experimental study. In MSWiM '06: Proceedings of the 9th ACM International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 130–138, New York, NY, USA, 2006. ACM.
[15] R. Etter. Implicit navigation with contextualized personal audio contents. In Adjunct Proceedings of the Third International Conference on Pervasive Computing, pages 43–49, 2005.
[16] L. Gaye, R. Mazé, and L. E. Holmquist. Sonic city: the urban environment as a musical interface. In Proceedings of the Conference on New Interfaces for Musical Expression (NIME), pages 109–115, Singapore, 2003.
[17] Gumstix.
[18] T. Clausen and P. Jacquet (Project Hipercom). Optimized Link State Routing protocol (OLSR). RFC 3626, 2003.
[19] I. Mott and J. Sosnin. Sound mapping: an assertion of place. In Proceedings of Interface, 1997.
[20] J. Rozier, K. Karahalios, and J. Donath. Hear & There: An augmented reality system of linked audio. In Proceedings of the International Conference on Auditory Display (ICAD), 2000.
[21] S. Strachan, P. Eslambolchilar, R. Murray-Smith, S. Hughes, and S. O'Modhrain. GpsTunes: Controlling navigation via audio feedback. In International Conference on Human Computer Interaction with Mobile Devices & Services (MobileHCI), pages 275–278, New York, 2005. ACM.
[22] A. Tanaka. Mobile music making. In Proceedings of New Interfaces for Musical Expression (NIME), 2004.
[23] A. Tanaka and P. Gemeinboeck. A framework for spatial interaction in locative media. In Proceedings of New Interfaces for Musical Expression (NIME), pages 26–30, Paris, France, 2006. IRCAM.
[24] S. Wabnik, G. Schuller, J. Hirschfeld, and U. Krämer. Reduced bit rate ultra low delay audio coding. In Proceedings of the 120th AES Convention, May 2006.
[25] M. Wing, A. Eklund, and L. Kellogg. Consumer-grade global positioning system (GPS) accuracy and reliability. Journal of Forestry, 103(4):169–173, 2005.
[26] M. Wozniewski. A framework for interactive three-dimensional sound and spatial audio processing in a virtual environment. Master's thesis, McGill University, 2006.
[27] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A framework for immersive spatial audio performance. In New Interfaces for Musical Expression (NIME), pages 144–149, Paris, 2006.
[28] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A paradigm for physical interaction with sound in 3-D audio space. In Proceedings of the International Computer Music Conference (ICMC), 2006.
[29] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A spatial interface for audio and music production. In Digital Audio Effects (DAFx), 2006.
[30] M. Wozniewski, Z. Settel, and J. R. Cooperstock. Audioscape: A Pure Data library for management of virtual environments and spatial audio. In Pure Data Convention, Montreal, 2007.
[31] M. Wozniewski, Z. Settel, and J. R. Cooperstock. User-specific audio rendering and steerable sound for distributed virtual environments. In International Conference on Auditory Display (ICAD), 2007.


Open Sound Control: Constraints and Limitations

Angelo Fraietta
Smart Controller Pty Ltd
PO Box 859
Hamilton 2303, Australia

ABSTRACT
Open Sound Control (OSC) is being used successfully as a messaging protocol among many computers, gestural controllers and multimedia systems. Although OSC has addressed some of the shortcomings of MIDI, OSC cannot deliver on its promises as a real-time communication protocol for constrained embedded systems. This paper will examine some of the advantages but also dispel some of the myths concerning OSC. The paper will also describe how some of the best features of OSC can be used to develop a lightweight protocol that is microcontroller friendly.

Keywords
MIDI, Open Sound Control, Data Transmission Protocols, Gestural Controllers.

1. INTRODUCTION
Open Sound Control (OSC) has been implemented as a communications protocol in more than a few hardware and software projects. The general impression appears to be that "MIDI is a simple and cheap way to communicate between a controller and computer, but it is limited in terms of bandwidth and precision and on the way out, OpenSound Control [sic] being a better alternative" [1]. In some cases, developers felt that they had to implement OSC in new instruments to maintain any sort of credibility in the NIME community [4]. It appears that the general consensus in computer music communities is that OSC is computer music's new 'royal robe', replacing the outdated, slow, 'tattered and torn' MIDI and its "well-documented flaws" [18]. This perception could be implied from the lack of papers critical of OSC.

OSC has provided some very useful and powerful features that were not previously available in MIDI, including an intuitive addressing scheme, the ability to schedule future events, and variable data types. Although more and more composers are developing and composing for low-power, small-footprint, wireless instruments and human interfaces [3, 13, 14], a move toward OSC in these applications is not always possible, nor desirable. Although OSC has addressed some of the limitations of MIDI, OSC does not provide "everything needed for real-time control of sound" [17] and is unsuitable as an end-to-end protocol for most constrained embedded systems.

This paper will first describe some of the powerful features provided by OSC before dispelling some of the myths regarding it. Finally, some strategies will be proposed that could be used to develop a protocol to meet the needs of constrained systems.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. OSC FEATURES
2.1 OSC Addressing Scheme
The OSC address scheme provides three main features: the ability to give the mapped address an intuitive name, the ability to increase the maximum number of namespaces, and the ability to define a range of addresses within a single message.

2.1.1 Intuitive Names
OSC is similar to MIDI in that it defines mapped points and values to be assigned to those points. For example, if a gestural controller had the left finger position mapped to 'MIDI controller 12 on Channel 1', setting a value of '127' would be accomplished by sending the bytes '0xB0 0x0C 0x7F' (the prefix 0x signifies a hexadecimal value). The point being mapped is defined by the first two bytes, while the value of the point is defined by the last byte. In OSC, setting a point to a value could be done with the following message: '/minicv/forefinger 127', the address being '/minicv/forefinger'. The ability to provide an intuitive name for a parameter is a function of composition rather than a function of performance. It is much easier for a composer to map a musical event to a meaningful name, such as '/lefthand', than to some esoteric set of numbers such as '0xB0 0x0C'.

2.1.2 Increased Namespace
The addressing feature of OSC enables users to increase the possible number of mapped points. In MIDI, for example, after continuous controllers 0 to 127 on channels 1 to 16 have all been assigned, the namespace for continuous controllers has been exhausted. In OSC, however, if two performers required the namespace 'lefthand', the address space could be expanded "through a hierarchical namespace similar to URL notation" [18]. For example, the two performers could use '/performer1/lefthand' and '/performer2/lefthand'. Each OSC client will receive these messages and, due to the packet paradigm of OSC [18], the client that does not require a message will discard it.
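The discard behaviour described in Section 2.1.2 amounts to a prefix test on the hierarchical address. The helper below is an illustrative sketch, not part of the OSC specification:

```python
# Each client receives every packet and keeps only the addresses that
# fall inside its own hierarchical namespace (Section 2.1.2).
# Function and variable names here are ours, for illustration only.

def in_namespace(prefix: str, address: str) -> bool:
    """True if an OSC-style address lies under a client's subtree."""
    return address == prefix or address.startswith(prefix + "/")

incoming = [("/performer1/lefthand", 127), ("/performer2/lefthand", 64)]

# A client handling performer 1 silently discards performer 2's message:
kept = [m for m in incoming if in_namespace("/performer1", m[0])]
print(kept)  # [('/performer1/lefthand', 127)]
```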


2.1.3 Address Ranges
The namespace feature of OSC is extremely powerful in that it enables a significantly large number of namespaces and the ability to define a range of points in a single message. For example, the OSC message '/minicv/left* 127' would set the value '127' on everything from '/minicv/leftThumb' right through to '/minicv/leftPinkie'.

2.2 OSC Data Types
One of the brilliant features of OSC is the ability to define different data types that can be transmitted in a message. Although it is possible to send any data type of any resolution using MIDI system exclusive messages, OSC has provided a standard for software and hardware developers from different vendors.

2.3 Time Tags
OSC contains a feature whereby future events can be sent in advance, allowing "receiving synthesizers to eliminate jitter introduced during packet transport" [18] and providing "sub-nanosecond accuracy over a range of over 100 years" [19].

3. MYTHS
When one considers that OSC has been used in some very impressive installations and performances, such as "multichannel audio and video streams and robot control sequences at Disneyland" [OSC Newsgroup], it is not too difficult to understand why one may be reluctant to write a critical paper while OSC is gaining a 'legendary' reputation. If one is to consider using OSC on a constrained system, however, one should separate fact from fable, using maths to dispel the myths. The two myths this paper will dispel are that OSC is fast and that OSC is efficient.

3.1 OSC is Fast
There is a belief in the NIME community that OSC is a fast communications protocol; for example, "The choice for OSC … was for its high speed, powerful protocol, and driver/OS-independency" [5]. Statements such as these are normally based on a comparison of the data transmission rates of OSC and MIDI in their typical applications [OSC Newsgroup].² It is, however, misleading to compare the speed of OSC to that of MIDI based on the data transmission rate, because OSC does not have a data transmission rate.

The Open Systems Interconnect (OSI) model [7] defines a communication model where applications communicate through a layered stack: the transmitted message passes from the highest layer of the stack to its lowest layer on the transmitting end, and from the lowest layer to the highest layer on the receiver. OSC does not define anything below the presentation layer, but rather assumes the transport layer will have a bandwidth of greater than 10 megabits per second [19]. MIDI, however, can be defined using the OSI model [7], from its Application Layer defining the message type right down to the Physical Layer that defines the connector type and current loop [4]. The speed comparison between OSC and MIDI is always made at the 31.25 kilobits per second Data Layer in MIDI,³ and so Wright and Freed state that MIDI is "roughly 300 times slower" [18] than OSC. Speed, by definition, is a function of time, in the same way that weight is not just a function of mass but also a function of gravity. Comparing the speed of MIDI with that of OSC is akin to comparing the weight of a 2Kg ball on earth with a 600Kg ball in outer space, where the gravity is zero. A more accurate speed comparison between OSC and MIDI would be made by comparing the two protocols at identical layers of the OSI stack: comparing the time taken for the target data to be encoded and then decoded at identical layers of the stack using identical processors. If one were to compare the number of machine instructions required to parse a typical MIDI message with that required for a typical OSC message, MIDI would win hands down.

3.2 OSC is Efficient
"Open SoundControl [sic] is … efficient… and readily implementable on constrained, embedded systems" [18]. Efficiency is generally the ability to accomplish a particular task with the minimum amount of wastage or expenditure. In the context of a gestural controller, it would be the ability to provide the same or similar functionality with the minimum amount of processor speed, memory, power, and bandwidth. Efficiency is a relative term: what is deemed efficient today may be deemed inefficient tomorrow when newer technologies or algorithms are developed. In order to evaluate whether OSC is efficient, one does not necessarily need to compare it in its entirety to a pre-existing system, but rather to demonstrate how the resources are being wasted.

In a real-time system, such as a music performance, the ability to meet timing constraints is of primary importance [15]. The system "must respond to external events in a timely fashion, which means that for all practical purposes, a late computation is just as bad as an outright wrong computation" [8]. Many newer mobile musical interfaces communicate wirelessly; for example, the "Pocket Gamelan" uses mobile telephones that communicate amongst themselves [14]. Although the speed of processors in wireless devices is increasing, this "increase in the processor speed is accompanied by increased power consumption" [13]. An increase in power consumption means a decrease in the period for which a battery-powered controller can be used in a performance. Furthermore, power usage contributes to the carbon footprint of the instrument. Efficiency, therefore, is also an environmental issue.

The developers of OSC state that "our design is not preoccupied with squeezing musical information into the minimum number of bytes. We encode numeric data in 32-bit or 64-bit quantities, provide symbolic addressing, time-tag messages, and in general are much more liberal about using bandwidth for important features" [18]. A major aspect of this "liberal use of bandwidth"

²Personal communications on the developer's list for the OpenSound Control [sic] (OSC) Protocol will be referred to as [OSC Newsgroup].
³The MIDI Manufacturers Association (MMA) has approved a standard for MIDI over IEEE-1394 (FireWire) [8] and it is already being used by instrument manufacturers including Yamaha, Presonus, Roland, and M-Audio [Personal communications]. Furthermore, other implementations of MIDI over other protocols exist, so the speed limitation of MIDI is no longer technically correct.

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

is the address, which defines the mapped point being (ARP) [9] used on local Ethernet networks. Without going into
referenced. As stated previously in this paper, the advantages the exact details, a brief explanation of how each mechanism
given by the addressing scheme are the intuitive names, the operates is presented, showing how similar paradigms to the
increased namespace, and addressing a range of points and will OSC address space are efficiently implemented.
be addressed later in the paper.

3.2.1 Communications Bandwidth Internet Mapping

An example was given previously for a mapped point—one The intuitive naming strategy used in OSC is similar to domain
names on the internet. When addressing a computer on the
using MIDI, ‘0xB0 0x0C 0x7F’; the other using OSC
internet, one does not normally type in the Internet Protocol
‘/minicv/forefinger 127’. The first example uses only three
(IP)[11] address; rather, they type in the domain name. This
bytes while the second uses over twenty bytes. A significant
problem with the increased message sizes for wireless systems makes it very easy for a human to remember how to locate and
communicate with a particular computer on the internet. The
is that “the more data that is transmitted the greater the chance
calling computer, however, does not send a request to every
that part of the message will need to be retransmitted due to
noise - increasing latency and jitter” [OSC Newsgroup4]. Some computer on the internet. Instead, the domain name is mapped
to an IP address through an Internet Name Server[10]. For
developers of embedded and wireless instruments that have
example, if one was to ‘ping’ a domain from the command line,
been using OSC have resorted to developing pseudo device
the computer will obtain the IP address from the name server
drivers, whereby OSC is converted to a lightweight protocol
before being transmitted [3], reporting a five hundred percent and then send ping messages to the IP address. For example:
increase in throughput and efficiency [OSC Newsgroup5]. $> ping
Although this is an efficient alternative to transmitting the Pinging []
with 32 bytes of data:
whole OSC packet over the serial port, it effectively means that Reply from bytes=32 time=16ms
OSC is not the complete end-to-end, server to client protocol. TTL=56

3.2.2 Processing Bandwidth This activity is done behind the scenes and is abstracted away
Although the transmission rate is taken into consideration— from the user. Although obtaining the IP before sending a
hence Wright and Freed’s assumption “that Open SoundControl message is effectively a two step procedure, these two steps
[sic] will be transmitted on a system with a bandwidth in the make it much more efficient than sending the domain name to
10+ megabit/sec range” [18]—many seem to forget that after every web server.
transmission and reception, the packet also needs to be parsed
by the target synthesiser. Furthermore, it is not just the target Local Network Ethernet Mapping
synthesiser that needs to parse the data, but all synthesisers that On local networks, the abstraction is done through the Media
are not the intended recipients are required to stop what they are Access Control (MAC) address through ARP [9]. If, for
doing and parse a significant number of bytes before rejecting example, a computer whose IP address was ‘’ on a
the message. This in turn affects the minimum processing local network wanted to send a message to the computer
requirements of each and every component in entire system. addressed ‘’, it does not send a message to all the
Although many microcontrollers are being developed with computers on the local network expecting all but ‘’
higher processing speeds, the “increase in the processor speed is to reject it. If this was the case, every time a network card
... accompanied by increased power consumption” [13]. received a message, it would be required to interrupt the
3.2.3 Processing Efficiency computer, impacting on the performance of the rejecting
computer. Rather, the ARP layer maps the IP addresses of the
Although the string based OSC namespace is more efficient for
computers on the network to MAC addresses. This MAC
a human to evaluate, a numerical value is much more efficient
for the computer because computers are arithmetic devices. address is used to address the network card. The other network
cards on the local network ignore the message and do not
Apart from the number of bytes that need to be parsed, the OSC
interrupt the computer. This mapping can be viewed on a
implementation requires that the namespace be parsed through
computer by typing ‘arp –a’ from the command prompt.
some sort of string library, requiring additional computation
and the memory space to contain the library. In a performance $> arp -a
where a mapped point is changed one hundred times a second, Interface: --- 0x2
Internet Address Physical Address Type
the human would not be expected to read that value for every 00-04-ed-0d-f2-da dynamic
message sent; the computer, however, is. Hence, the message is 00-13-ce-f4-63-b6 dynamic
optimized for the entity that requires it least during
performance. This problem with the OSC addressing model is Although these steps are complicated, this is the sort of thing
that the coupling between the human cognition of the computers are good at and it makes communication on complex
namespace and transmission mechanism to the target computer networks very efficient. A similar approach to these could be
is too tight [6]—the naming, which is effectively the human used as an underlying layer to OSC. Implementation of such a
interface, should be abstracted away from the implementation mechanism for OSC is well beyond the scope of this paper; this
using a mapping strategy. Two such strategies that uses this does show that such processes are being used by other
type of mapping are the Internet Name Server [10] for technologies for improved efficiency and should probably be
addressing domain names, and the Address Resolution Protocol used in OSC.

3.2.4 Address Pattern Matching

Christopher Graham posting on 23 January 2008. The method of mapping multiple points to a single message,
ibid. for example, the OSC namespace ‘/minicv/left* 127’ is based


on the UNIX name matching and expansion [18, 19]. Once again we see a tight coupling between the human interface and the computer implementation. Although the developers of OSC claim that "with modern transport technologies and careful programming, this addressing scheme incurs no significant performance penalty ... in message processing" [18], using two numbers to define a range would require significantly less processing than decoding a character string with wildcards. For example, in the address range '/minicv/left*', every character would need to be parsed and tested to see if it was one of the defined wildcard characters. Next, one would have to factor in the string comparison that would be required for every mapped address on the client computer.

Protocols such as MODBUS [] and DNP [] are used by telemetry units to control pump stations in real time [2]. These protocols can use a message type that sets a range of mapped points using a single message. When a range is defined using two numbers, it is a simple matter to test whether a mapped point is within the range. For example, for a message from a protocol that defined the two mapped point ranges 'UPPER_RANGE' and 'LOWER_RANGE', the test algorithm would be as follows:

IF MAPPED_POINT <= UPPER_RANGE
AND MAPPED_POINT >= LOWER_RANGE THEN
ProcessValue

As with the intuitive names, this requires an additional layer of mapping and abstraction, which in turn means work for the developer. Software engineering has a similar paradigm, where some languages are scripted and some are compiled. Scripted languages require the server to compile human readable code each time it is executed, while compiled languages use a tool to convert human readable code to something that is more efficient for the computer. The first type is more efficient for the programmer because he or she does not need to compile the code after each modification; however, there is a definite performance hit. Compiled languages require an extra step—compiling the human readable code to machine code—but there is an enhancement in performance. In terms of communications protocols, OSC is like a scripted language: extremely powerful, but requiring significantly more computing power than is available to most embedded technologies today.

3.2.5 Message Padding
Another possible inefficiency is the padding of all message parameters to four byte boundaries. For example, a parameter that is only one byte in length is padded to four bytes. The reasoning behind this is that the OSC data structure is optimised for thirty-two bit architectures [OSC Newsgroup]. There have not yet been any conclusive tests to determine whether the gains obtained from this optimisation exceed the additional overhead created by inserting and later filtering these additional padded bytes [OSC Newsgroup]; however, these results should be forthcoming in the near future. It does, however, mean that there would be a decrease in efficiency for eight, sixteen, and sixty-four bit architectures.

4. FAULT TOLERANCE
OSC is a packet driven protocol that does not accommodate failure in the underlying OSI layers. UDP [12] is a protocol that is used by many implementations of OSC [19]. UDP does not guarantee that a packet will be received if transmitted; moreover, it does not guarantee that the target will receive packets in the order they were sent. OSC is based on the same paradigm as UDP in that it is packet driven. "This leads to a protocol that is as stateless as possible: rather than assuming that the receiver holds some state from previous communications" [18]. The problem with this paradigm is that it is no longer event driven, and it assumes all the relevant data is transmitted at once. If a gestural controller sends an OSC message that was supposed to change a robot motor direction, immediately followed by a message to start the motor, the OSC receiver may receive those in the opposite order, which may be worse than not receiving the information at all. For example, if a server was to send the following messages using UDP:

/lefthand/motor/direction 1
/lefthand/motor/start

the client could receive them as follows:

/lefthand/motor/start
/lefthand/motor/direction 1

This means that the composer will need to address the possibility of messages arriving in the wrong order without any notice from the protocol. Although one could use TCP "in situations where guaranteed delivery is more important than low latency" [19], lower latency has been one of the OSC evangelists' greatest catch cries.

5. STRATEGIES FOR IMPROVEMENT
The first strategy for improvement is the intelligent mapping of namespaces to numbers. OSC must move away from the stateless protocol paradigm and begin to embrace techniques such as caching [16], which has been used for many years now to improve the performance of networks, hard drives, and memory access on CPUs. MIDI's use of running status is an example of how caching can improve performance by nearly thirty-three percent. Caching will be the key to the efficient mapping of address patterns to simple numbers without significantly impacting upon performance.

OSC must move toward an event delegation model, where clients register whether to receive OSC messages within a particular namespace. Needlessly receiving and parsing large irrelevant messages from OSC servers is a waste of valuable processing power.

The developers of OSC must change their attitude towards MIDI. OSC has been anti-MIDI for a while, with OSC developers often ridiculing MIDI developers [personal correspondence]. Some OSC developers have made token gestures towards MIDI by providing a namespace which is "an OSC representation for all of the important MIDI messages" [19]. This completely defeats the innovative address pattern provided by OSC. Instead, an underlying network layer should convert an intuitively mapped name, such as '/performer1/lefthand', to a MIDI message and then transport it via MIDI, or vice versa. The MIDI controller number should be completely abstracted away from the application layer in order to reduce the coupling between the two. The OSC server should not need to know at the application layer that the motor that controls the robot's left finger is MIDI controller 13. Likewise, the motor that is being controlled by controller 13 should not need to know that the OSC server is really addressing '/performer1/lefthand'. Although these sorts of strategies have been employed in dynamic routing schemes in some OSC projects [19], this should be a function of the network layer, not the application layer. When one considers that the longest domain names on the internet can be addressed with only four bytes, it is not unreasonable to expect that even the most complex OSC namespaces could be translated into simple MIDI messages if required.

There needs to be a greater number of message types—currently there are only two. OSC needs to move towards an object oriented paradigm in the communications protocol [4]. Currently, all the network, data link, and transport layers of transmission have been delegated to the application layer. This is above the presentation layer, which is where OSC exists—completely upside down when compared with the OSI model. OSC needs to develop an underlying OSI stack where the protocol between the client and server is abstracted away from the user. The underlying mapping should direct the message from the source to the destination.

6. CONCLUSION
Although OSC has provided a standard "protocol for communication among computers, sound synthesizers, and other multimedia devices" [19], and was supposed to overcome "MIDI's well-documented flaws", its "liberal [use] … of bandwidth" [18] may be its Achilles heel, preventing it from ever being the standard end-to-end protocol for communication for low power and wireless microcontroller interfaces. If OSC is to have any hope of servicing this significant and important area of the NIME community, an OSI stack needs to be developed that has efficiency and performance at the forefront while, at the same time, implementing proven design patterns [6]. This, however, would be a significant research project in itself.

7. ACKNOWLEDGMENTS
I would like to thank Adrian Freed from the Center for New Music and Audio at Univ. California, Berkeley for answering the many questions I asked about OSC. I would also like to thank all the members of the Developer's list for the OpenSound Control [sic] (OSC) Protocol for their input.

8. REFERENCES
[1] Doornbusch, P. Instruments from now into the future: the disembodied voice. Sounds Australian, 2003(62): p. 18.
[2] Entus, M. Running lift stations via telemetry. Water Engineering & Management, 1989. 136(11): p. 41-43.
[3] Fraietta, A. Mini CV Controller - Conference Poster. In Generate and Test: the Australasian Computer Music Conference. 2005. Queensland University of Technology, Brisbane: Australasian Computer Music Association.
[4] Fraietta, A. The Smart Controller: an integrated electronic instrument for real-time performance using programmable logic control. School of Contemporary Arts. 2006, University of Western Sydney.
[5] Kartadinata, S. The gluion: advantages of an FPGA-based sensor interface. In International Conference on New Interfaces for Musical Expression (NIME). 2006. IRCAM - Centre Pompidou, Paris, France.
[6] Larman, C. Applying UML and patterns: an introduction to object-oriented analysis and design and the unified process. 2nd ed. 2002, Upper Saddle River, NJ: Prentice Hall PTR. xxi, 627.
[7] Lemieux, J. The OSEK/VDX Standard: Operating System and Communication. Embedded Systems Programming, 2000. 13(3): p. 90-108.
[8] Pawlicki, J. Formalization of embedded system development: history and present. In Quality Congress. ASQ's ... Annual Quality Congress Proceedings. 2003: PROQUEST Online.
[9] Plummer, D.C. RFC 826 - Ethernet Address Resolution Protocol: Or converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware. < > accessed 28 January 2008.
[10] Postel, J. IEN-89 - Internet Name Server. < ftp://ftp.rfc- > accessed 28 January 2008.
[11] Postel, J. RFC 760 - DoD standard Internet Protocol. < > accessed 28 January 2008.
[12] Postel, J. RFC 768 - User Datagram Protocol. < > accessed 21 January 2008.
[13] Schiemer, G. and M. Havryliv. Wearable firmware: the Singing Jacket. In Ghost in the Machine: the Australasian Computer Music Conference. 2004. University of Victoria.
[14] Schiemer, G. and M. Havryliv. Pocket Gamelan: a Pure Data interface for java phones. In International Conference on New Musical Interfaces for Music Expression (NIME-2005). 2005. University of British Columbia, Vancouver.
[15] Son, S.H. Advances in real-time systems. 1995, Englewood Cliffs, N.J.: Prentice Hall. xix, 537.
[16] Vitter, J.S. External memory algorithms and data structures: dealing with massive data. ACM Comput. Surv., 2001. 33(2): p. 209-271.
[17] Wright, M. Introduction to OSC. < > accessed 21 January 2008.
[18] Wright, M. and A. Freed. Open SoundControl: A New Protocol for Communicating with Sound Synthesizers. In International Computer Music Conference. 1997. Thessaloniki, Hellas: International Computer Music Association.
[19] Wright, M. and A. Freed. OpenSound Control: State of the Art 2003. In International Conference on New Interfaces for Musical Expression (NIME-03). 2003. Montreal, Quebec, Canada.


SMuSIM: a Prototype of Multichannel Spatialization System with Multimodal Interaction Interface

Matteo Bozzolan
Department of Electronic Music
Conservatory of Music G.Verdi
Como, Italy

Giovanni Cospito
Department of Electronic Music
Conservatory of Music G.Verdi
Como, Italy

ABSTRACT
The continuous evolution in the field of human-computer interfaces has allowed the development of control devices that enable an increasingly intuitive, gestural and non-invasive interaction.
Such devices find a natural employment also in music informatics, and in particular in electronic music, which is always searching for new expressive means.
This paper presents a prototype of a system for the real-time control of sound spatialization in a multichannel configuration with a multimodal interaction interface. The spatializer, called SMuSIM, employs interaction devices that range from the simple and well-established mouse and keyboard to a classical gaming joystick (gamepad), finally exploiting more advanced and innovative typologies based on image analysis (a webcam).

Keywords
Sound spatialization, multimodal interaction, interaction interfaces, EyesWeb, Pure Data.

1. INTRODUCTION
Technology and music have always had a particular relationship and affinity. In particular, research and experimentation in the fields of electricity first, and then of electronics and informatics, have allowed, in the last two centuries, the birth of a series of instruments for a new musical expressivity.
Besides, thanks to increasingly available computational power, associated with the development of new techniques and technologies for the acquisition and analysis of human gesture, new ways have opened in the field of human-computer interaction, allowing the birth of a new generation of interfaces that find a natural employment also in music applications.
As reported in [11], the most widespread interaction devices currently used are (in increasing order of complexity): PC keyboard, mouse, joystick, MIDI keyboard, video camera, touchpad, touchscreen, 3D input devices (data gloves, electromagnetic trackers) or haptic devices.
This paper shows the results of the experimentation of some of these interfaces for the realization of a system for the multichannel spatialization of sound sources. In particular, the devices explored in this work are: mouse and keyboard (very simple and primitive), a gamepad (a classical gaming joystick) and a webcam (a low cost USB camera that allows, through image analysis techniques, a totally non-invasive and free-hand interaction).
With respect to the sound spatialization, the proposed prototype provides a quadraphonic sound diffusion and allows the control of up to four independent sound sources. The spatialization technique implemented is the well-known Amplitude Panning extended to the multichannel case. This choice of simplicity finds its motivation in the fact that the primary aim of this work is the investigation of interaction interfaces rather than the implementation of advanced spatialization algorithms. The sound projection space can be artificially altered by controlling the direct to reverberated signal ratio.

2. RELATED WORKS
Although the use of spatial sound has been present since the origins of music, and it appears many times in classical western music, it became a fundamental practice and a key aesthetical element mainly from the second half of the past century (first thanks to the development of electrical sound diffusion devices and then because of the revolution of electronic and digital sound systems). For brevity, this section presents only some of the most recent works in the field of real-time digital sound spatialization systems.
A first example is MidiSpace [6], a system for the spatialization of MIDI sound files in virtual environments realized at the end of the '90s at the Sony Computer Science Lab in Paris. It is one of the earliest sound spatialization experiments in 3D worlds, and it gives the user two distinct graphic interfaces to control the application: the first one (bidimensional) allows the displacement of the various sound sources (identified by a set of musical instruments) in the projection space, while the second one (three-dimensional and realized in VRML) controls the movements of an avatar in the virtual world. The spatialization technique is the two-channel Amplitude Panning and the interaction devices are mouse and keyboard.
A more recent work is represented by ViMiC [3], a real-time system for the gestural control of spatialization for a small ensemble of players. It belongs to the wider project Gesture Control of Spatialization, started in 2005 at the McGill University IDMIL Lab (Montreal, Canada). It is very interesting because it allows the user to control the displacement of the sound sources simply by moving his hands in the air (thanks to a complex apparatus for movement interpretation and codification called the Gesture Description Interchange Format). A set of 8 sensors (connected to an electromagnetic tracking system) is applied to the two

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).


hands of the player.
Zirkonium [9] is a software implemented to control the spatialization within the Klangdom system at the ZKM (Germany). The Klangdom is formed by 39 speakers and can be controlled by Zirkonium through mouse and joystick. It implements various spatialization algorithms (Wave Field Synthesis, Ambisonics, Vector Base Amplitude Panning and Sound Surface Panning) and it allows the user to define an arbitrary number of resources1 to spatialize in the concert hall. The system is controlled through a simple graphic interface.

1 A resource is a set of one or more audio sources coming from an audio file, a network stream or any audio device.

Challenging Bodies [4] is a complex multidisciplinary project for live performances of disabled people realized at the Informatics and Music Department of the Regina University (Canada). Within this wide project, the RITZ system, through various techniques, allows the frontal spatialization of up to 10 input signals coming from musical instruments over 7 loudspeakers placed in front of the players. Its control interface is made up of two windows: the first one, implemented in GEM2, supplies a graphical feedback of the loudspeaker configuration and allows the modification of the position of the sound sources in the space, while the second one, the main control patch implemented in Pure Data, gives the user the possibility to set the relative and absolute sound levels. The system is strongly oriented to scalability and usability.
The last example is the work recently proposed by Schacher [10] at the ICMST of the Zurich university (Switzerland). It consists of a design methodology and a set of tools for the gestural control of sound sources in surround environments. The spatialization is made through a structured and formalized analysis that allows the mapping of the player's gestures onto the movements of the sources by applying various typologies of geometric transformations. From the point of view of the input devices, the system does not have a consolidated structure, but the interfaces used up to now span from data gloves equipped with multiple sensors (pressure, position, bending) to haptic arms and graphic and multi-touch tablets. The spatialization algorithms used are Ambisonics and Vector Based Amplitude Panning.

SMuSIM is a multichannel sound spatialization system with a multiple and multimodal interaction interface. It is designed for real-time applications in musical expressive contexts (electronic music spatialization, distributed and collaborative network performances).

Figure 1: The system's architecture.

In this first implementation, the speakers are supposed to be arranged in the spatialization room in a typical quadraphonic configuration, with the 4 loudspeakers placed at the corners of the room. The projection space can be artificially extended and modified by controlling the direct to reverberated signal ratio (for the creation of illusory acoustic spaces).
The spatializer allows the player to control up to four simultaneous sound sources, and a graphical feedback gives the instantaneous state of the system.
The system offers a set of functionalities that allows a complete and efficient control of the spatialization, in particular: a punctual and precise placement of the sound sources in the space, the control of relative and absolute volume levels, the automation of the movements, a non-linear interpolation of the position of the sources in time, and the possibility to load pre-recorded sound files or to acquire signals coming from a microphone or any audio device.
As shown in Figure 1, the system has been implemented in Pure Data3 (and its graphical interface GrIPD4) and EyesWeb5 (with the creation of ad hoc additional blocks) communicating through the OSC6 protocol, making SMuSIM a native network distributed application (with one or more instances on several machines, allowing multiple distributed configurations).

4 ˜jsarlo/gripd/

3.1 Interaction interfaces
The prototype offers three different typologies of human-computer interaction devices for the control of the spatialization. Keyboard and mouse are the simplest and most widespread ones. The user controls the diffusion of the sound sources in the space through a combination of actions and commands coming from the PC keyboard and the mouse. In this case the system provides (in addition to the visual feedback window) a bidimensional graphic environment where the player can place and move graphic objects representing the different sound sources.

Figure 2: Input devices used for SMuSIM.

The second device is a gamepad, a classical gaming controller with two axes and ten freely configurable buttons. Its very compact dimensions and ergonomics make the device very usable and allow a great playability.
The last interface is a standard low-cost USB webcam that acquires the movements of a set of colored objects. Each physical object (through a color-based tracking algorithm) is associated to a sound object in the sound projection space.
The player can use one or more devices at the same time (allowing a collaborative and multi-user performance). The proposed interfaces are deliberately simple, cheap and


widely available on the market, in order to make the system easily usable and accessible to any user level.

3.2 Software components structure
As shown in Figure 3, the application is composed of functional units that perform the various needed tasks. Data coming from the input devices are acquired, formatted and analyzed by the Device controller unit, which is constituted by four sub-units, one for each input device.

In particular, Mouse/Keyboard controller supplies a graphic window (the interaction environment) where the user can displace the four objects representing the sound sources with the mouse. A set of keyboard key combinations allows the user to perform a set of predefined actions (shifting single sources or groups of sources, maintaining or not their topological configuration, loading/saving default configurations, etc.).

Figure 3: Diagram of the software functionalities implemented in SMuSIM.

Joystick controller allows the spatializer to be controlled with a standard 2-axis, 12-button gamepad. The interface between the gamepad and the spatializer is managed through GrIPD, which provides all the needed functionality. Buttons are used to select the sources to be controlled, while the two analog mini-sticks determine the changes in their position and volume. With this device it is easy to control more than one source at the same time (thanks to the compact size and ergonomics of the gamepad, which allow more than one button to be pressed simultaneously). Webcam controller manages data coming from the video acquisition device. The interaction paradigm in this case is the following: the webcam films a plain, neutrally colored surface on which the objects to be tracked are placed; the webcam's field of view corresponds to the diffusion space, and the position of the colored objects determines the displacement of the sound sources in the sound projection space. The unit provides a set of tools for the real-time selection of the desired color to track (simply by picking it out on a window showing the webcam video stream) and for the extraction of centroids and bounding boxes of the color blobs. Bounding boxes are used to set the volume levels of each source (a vertical position stands for maximum volume, a horizontal one for mute). The MIDI controller block processes data coming from an optional MIDI device (both hardware and software).

The source movements can be automated thanks to the Automatization unit, while position changes of the sound sources are made non-instantaneous by the Interpolator, which generates a smoothed and decelerated motion through a non-linear interpolation of subsequent positional data.

Spatializer is the unit that performs the computation of the attenuation levels to apply to the audio signals on each channel. The spatialization technique is Amplitude Panning extended to the multichannel case. On the basis of the positional data of the virtual sources coming from the input devices, a monophonic signal (considering a single source) is applied to the various channels with a gain factor as follows:

x_i(t) = g_i x(t),   i = 1, ..., N

where x_i(t) is the signal to apply to loudspeaker i, g_i the gain factor of the corresponding channel, N the number of loudspeakers and t the time. The gain factor g_i has a non-linear dependence on the position (x, y) of a single sound source in the space. To overcome the 6 dB attenuation at the center of the projection space, a quadratic sinusoidal compensation curve is applied along the two dimensions. By considering all the sound sources involved, the resulting signal X(t) can finally be defined as:

X(t) = Σ_{k=1}^{K} x_k(t)

where the sum runs over the contributions x_k(t) of the individual sources and K is the maximum number of sound sources involved in the spatialization (K = 4 in the specific case of SMuSIM).

The graphic and audio feedback production is managed respectively by the Graphics display and Sound production units. The latter prepares the audio stream to send to the loudspeakers. It essentially manages the reverberation algorithm, applying it to the resulting signal coming from the combination of the original audio stream (furnished by the Audio streaming/playback unit) and the spatialization data, allowing in this way the creation of illusory acoustic spaces. By controlling the balance between the direct and reverberated signal independently for each channel, it is possible, besides increasing the overall perception of distance, to deform the sound projection environment (by acting along one or more dimensions of the room). Currently the functionalities of this unit are rather limited, in view of a future integration of a sound synthesis engine for the real-time generation of sounds.

4. FUTURE WORK
The system developed is still in a prototypal phase and has some limitations that can be easily improved. First, it could be interesting to test other interaction interfaces (to enlarge the multimodality of the system), such as more performant cameras (higher frame rate, infrared lighting) or other technologies for the exploitation of gestural control of the instrument (electro-magnetic or ultrasound tracking systems, data gloves). A study is currently active on the exploration of touch-sensitive interfaces (graphic tablets, multitouch and painterly interfaces).

A second improvement refers to the spatialization technique, given that in this first phase of the project it has not been the crucial aspect of the work. The simple Amplitude Panning technique can be replaced by more complex and efficient algorithms such as Vector Base Amplitude Panning, Ambisonics extended to a multichannel configuration, and Wave Field Synthesis.

Another key issue is represented by the performance of the system, which is the main requisite of the application in contexts of real-time musical performance. In fact, there are currently some latency problems in the configuration with the webcam, particularly when running on low-performance machines or notebooks. This could be resolved by improving and optimizing both the tracking algorithm and the visual feedback production (possibly abandoning the EyesWeb and Pd platforms and realizing an integrated, stand-alone and dedicated software application).
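The Spatializer's per-channel gain computation (Section 3.2) can be sketched in a few lines. This is an illustrative reconstruction, not SMuSIM's actual code; the use of a sine/cosine equal-power law as the compensation curve, and the corner placement of the four loudspeakers, are our assumptions based on the description in the text.

```python
import math

def pan_gains(x, y):
    """Gain factors g_i for a source at (x, y) in the unit square,
    with four loudspeakers assumed at the corners (quadraphonic layout).

    A sinusoidal curve on each axis keeps the summed power constant,
    compensating the 6 dB attenuation a plain linear crossfade would
    produce at the centre of the projection space."""
    gx = (math.cos(x * math.pi / 2), math.sin(x * math.pi / 2))  # left, right
    gy = (math.cos(y * math.pi / 2), math.sin(y * math.pi / 2))  # front, rear
    # Channel order: front-left, front-right, rear-left, rear-right.
    return [gx[0] * gy[0], gx[1] * gy[0], gx[0] * gy[1], gx[1] * gy[1]]

def spatialize(sources):
    """Mix K mono sources into four channels: each channel i receives
    the sum over sources k of g_i(x_k, y_k) * x_k(t)."""
    n = len(sources[0][0])
    out = [[0.0] * n for _ in range(4)]
    for samples, (x, y) in sources:
        g = pan_gains(x, y)
        for ch in range(4):
            for t in range(n):
                out[ch][t] += g[ch] * samples[t]
    return out
```

With this law a centred source receives a gain of 0.5 on every channel, and the summed power Σ g_i² remains 1 for any source position, since the gains are the outer product of two unit-norm vectors.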

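The smoothed, decelerated motion produced by the Interpolator unit can be approximated with a simple one-pole recursion. This is a minimal sketch assuming an exponential-style smoother, not SMuSIM's actual interpolation scheme.

```python
class Interpolator:
    """Moves a sound source towards its target position non-instantaneously.

    Each control tick covers a fixed fraction `a` of the remaining
    distance, so steps shrink as the source approaches the target,
    giving a smoothed, decelerating motion."""

    def __init__(self, x=0.0, y=0.0, a=0.1):
        self.x, self.y, self.a = x, y, a

    def step(self, target_x, target_y):
        """Advance one control tick towards (target_x, target_y)."""
        self.x += self.a * (target_x - self.x)
        self.y += self.a * (target_y - self.y)
        return self.x, self.y
```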
From the point of view of the automatization, the system does not provide any way of interacting with the player; it is an autonomous and isolated modality. It could be interesting to implement rules for pattern learning and reproduction, in order to make the system able to imitate and continue a performance initially guided by a human user. Other possible developments could refer to the diffusion system (increasing the number of loudspeakers and varying their configuration) and to the integration of a sound synthesis engine within the application.

During this first phase of the work there was not enough space for an intensive and structured test session on a large and heterogeneous set of users. However, a hypothetical evaluation experiment has been prepared for a future test session. The experiment has a total duration of about 45 minutes and is composed of six sections:
1) free trial of the instrument (10 min) without any explanation of the working principles of the system (the user has previously read a short user manual);
2) supervised test (10 min) in which the user has to execute some tasks evaluated by the operators;
3) explanation of the working principles (5 min) by an operator, in order to increase the user's awareness of control of the spatialization instrument and to accelerate the learning process;
4) repetition of the test (10 min) after the explanations of the operator;
5) evaluation questionnaire (5 min) compiled by the user;
6) interview (5 min) in which the operators deepen some aspects that appeared during the test.

The two proposed tests contain a list of 21 tasks (for each test) that the user has to execute. Each task receives a mark according to a five-point Likert scale (1: not executed, 5: executed at the first trial). The tasks are sorted by increasing level of difficulty and are intended to test most of the functionalities of the instrument and its expressive possibilities. The questionnaire presents 22 questions divided into 5 categories: usability of the system (8), learnability (3), audio feedback (3), visual feedback (4) and overall opinion (4). Also in the questionnaire the players have to give a mark according to a five-point Likert scale (1: bad, 5: very good).

5. CONCLUSIONS
A real-time sound source spatialization system with a multimodal interaction interface has been developed. The interaction interfaces have been realized with very simple and inexpensive technologies and devices, which have nevertheless shown satisfactory expressive and interaction possibilities. In particular, the best results came, as expected, with the gamepad and the webcam, devices that allow more freedom of movement and a more intuitive and natural interaction. Moreover, the webcam lets the user move each sound source independently (an action impossible with both the mouse and the gamepad). On the other hand, performance is a key concern with this last kind of device, because the computational load of the image analysis techniques makes the real-time requirement a crucial aspect of the application.

In general, even the graphic rendering operations for the creation of the visual feedback are particularly onerous for the overall performance of the system. Under this consideration, the graphic feedback proposed to the user is quite simple and spare, but it proves very efficient and keeps the actual state of the sound sources in the diffusion space always under control.

From the point of view of sound spatialization, the Amplitude Panning technique produces the expected results. It is very efficient, does not have problems of computational complexity, and is easily configurable to the various performance and technical contexts (customization of the panning curves and of the number of diffusion channels). Even if an intensive and large-scale test session has still to be conducted, SMuSIM has shown good results in terms of learnability, intuitiveness and expressiveness. There are various possible developments of this work, referring both to software and hardware issues (input devices, diffusion system) and to applicative and musical aspects.

6. REFERENCES
[1] A. Camurri et al. Toward real-time multimodal processing: EyesWeb 4. In Proceedings of the Convention on Motion, Emotion and Cognition (AISB04), Leeds, UK, 2004.
[2] J. M. Chowning. The simulation of moving sound sources. Journal of the Audio Engineering Society, 19:2–6, 1971.
[3] M. Marshall, M. Wanderley, et al. On the development of a system for the gesture control of spatialization. In Proceedings of the 2006 International Computer Music Conference (ICMC06), pages 360–366, New Orleans, USA, 2006.
[4] J. Nixdorf and D. Gerhard. Real-time sound source spatialization as used in Challenging Bodies: implementation and performance. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 318–321, Paris, France, 2006.
[5] N. Orio, N. Schnell, and M. M. Wanderley. Input devices for musical expression: borrowing tools from HCI. In Proceedings of the 2001 International Conference on New Interfaces for Musical Expression (NIME01), 2001.
[6] F. Pachet and O. Delerue. A mixed 2D/3D interface for music spatialization. In Proceedings of the First International Conference on Virtual Worlds, pages 298–307, Paris, France, 1998.
[7] M. Puckette. Pure Data: another integrated computer music environment. In Proceedings of the 1996 International Computer Music Conference (ICMC96), pages 269–272, Hong Kong, China, 1996.
[8] V. Pulkki. Spatial sound generation and perception by amplitude panning techniques. Graduation thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, 2001.
[9] C. Ramakrishnan, J. Gossmann, and L. Brummer. The ZKM Klangdom. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 140–143, Paris, France, 2006.
[10] J. C. Schacher. Gesture control of sounds in 3D space. In Proceedings of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 358–361, New York, USA, 2007.
[11] L. Schomaker, A. Camurri, et al. A taxonomy of multimodal interaction in the human information processing system. Technical report, Nijmegen University, 1995.

Realtime Representation and Gestural Control of Musical Polytempi

Chris Nash and Alan Blackwell
Rainbow (Interaction & Graphics) Research Group, University of Cambridge
Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
+44 (0)1223 334678
{christopher.nash, alan.blackwell}
ABSTRACT
Over the last century, composers have made increasingly ambitious experiments with musical time, but have been impeded in expressing more temporally-complex musical processes by the limitations of both music notations and human performers. In this paper, we describe a computer-based notation and gestural control system for independently manipulating the tempi of musical parts within a piece, at performance time. We describe how the problem was approached, drawing upon feedback and suggestions from consultations across multiple disciplines, seeking analogous problems in other fields. Throughout, our approach is guided and, ultimately, assessed by an established professional composer, who was able to interact with a working prototype of the system.

Keywords
Tempo, polytempi, performance, composition, realtime, gesture

1. INTRODUCTION
Although intricate and complex musical processes involving rhythm, melody and harmony are to be found in most musical genres, the use of and conventions relating to tempo are less adventurous [16]. It has only been the last century or so that has seen composers, such as Steve Reich and Conlon Nancarrow, experiment with simultaneous musical parts bearing differing tempi [9]. As regards experiments in musical time, the notion of polytempi is crucially different from the relatively more common concepts of polyrhythm and polymetre, which both rely on simple integer divisions of the bars or beats in the piece. In contrast, the multiple simultaneous tempi of polytempo music lead to situations where the bar lines and beats of each part in the piece are themselves incongruent. The timing relationships between the events in each part can no longer be thought of, or expressed in, simple integer fractions (e.g. 3 in the time of 2, or 3/2 vs. 6/4), but instead become irrational.

A number of explanations can be volunteered for the paucity of polytempo use in the modern musical repertoire. For the average listener: the barrage of incongruous notes resulting from multiple, misaligned passages of music could be argued to hold limited aesthetic appeal. For the performer: such perceptions of anarchy, chaos and randomness may seem ironic, as the musician attempts to follow the composer's explicit, highly-ordered and inflexible timing directions, unable to rely on implicit or explicit timing cues from other musicians or a conductor [19]; unable to rely on a universal, steady pulse of bar or beat. And, finally: if the performer struggles to manage an individual part of the piece, then the composer's task of developing and imagining the complete, combined performance becomes an almost impossibly hard mental operation.

In conventional music, an audience is invariably attuned to global tempo variations within a piece. When introducing a simultaneous part with a differing tempo, an extra dimension is added to the audience's perception of the performance – the explicit interplay of parts in respect of time.

Figure 1. Perceived synchronisation in phase music.

Due to the periodic nature of much of the world's rhythms [3], there are various points where disjoint parts can appear more or less temporally-aligned, so that the perceived effect is determined not only by the absolute musical offset, but also by relative factors. For example, in Figure 1, the parts start in sync and gradually diverge because of differing tempi. Initially, the divergence is small enough that the listener can still collapse their perception of the musical events onto a single time scale, dismissing the offset as they might a digital chorus effect, acoustic echo or performance prosody [9]. After time, the offset increases and the two parts are more easily separated, becoming harder to align perceptually. Yet, by Bar 4, the absolute offset is approximately one beat, and thus the music can be aligned about the beat. Continued, such alignment occurs relative to other points in the bar, as well as divisions of the beat, and inevitably aligns relative to the bar itself.

The varying incongruity of notes can be seen to form a temporal harmony, where perceived aligned and misaligned episodes correspond to consonance and dissonance respectively. For centuries, these concepts have been powerful tools levered by composers in their engagement with tonal harmony; in the typical case, dissonance giving way to consonance, to provide resolution [14].

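The gradual divergence shown in Figure 1 is easy to quantify: with constant tempi, the absolute offset between two parts grows linearly with time. A small sketch follows; the 120/126 BPM tempi and 4/4 metre are invented for illustration.

```python
def offset_in_beats(t_seconds, bpm_a, bpm_b):
    """Absolute musical offset, in beats, between two parts that
    start together and proceed at constant (but different) tempi."""
    return abs(bpm_a - bpm_b) * t_seconds / 60.0

# At 120 vs 126 BPM the parts drift apart by 0.1 beats per second:
# one beat apart after 10 s (alignment about the beat), and a full
# 4/4 bar apart after 40 s, when their bar lines coincide again.
```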
As such, and like dissonant harmonies, the average listener's aversion to apparently cacophonous, misaligned music serves only to reinforce the potential for temporal resolutions.

Harmony and pitch have been studied, codified and notated in a variety of ways to enable performance by musicians and experimentation by composers, yet few notations have arisen to explicitly express temporal relations [16]. In our research, we apply computer technology and interactive notations to tackle these remaining problems. Whereas the problem of directing and coordinating performers is unquestionably also a matter of notation (be it paper, digital, aural, static or interactive), this paper principally focuses on the earlier, pre-requisite stage in the process – the composer's creation of the music. For our purposes, the "super-human virtuosity" currently required of polytempi performers can be provided by the computer [4].

2. BACKGROUND
The simple concept described in the Introduction and Figure 1 underpins the phase music of Steve Reich [15]. "Piano Phase" (1967) contains two musical parts with the same melodic and rhythmic content, but slightly differing (yet constant) tempi. The parts start together, then gradually diverge, or 'phase', in musical time, producing moments of dissonance and consonance, as the parts become more or less aligned. Reich's interface to this process began with a tape machine, playing two looped tapes of the phrase at different speeds. Subsequently, and owing to the relative simplicity and repetitive nature of the musical content, he was able to carry the idea to the piano, whereupon two exceptionally disciplined and practiced performers can play the music live. With the exception of the tape speed settings, general performance directions and the looped phrase itself, however, the piece is not fully-scored, but is instead an example of the generative or procedural specification of music. Notably, it is difficult to inspect or manipulate specific, individual notes or events in the performance.

Conlon Nancarrow, a contemporary of Reich, took a different approach to the problems of notation and performance, replacing the human pianist with a pianola (or player piano), notating his music on the paper rolls used by the machine [9]. Unlike score notation, the rolls represent time linearly, and the piano's mechanism eventually afforded the opportunity to dynamically vary tempo within a part. Unlike Reich, Nancarrow's pieces tended not to rely on the phasing of musical events in repetitive parts, but on a grander plan of having a single, climactic point of synchrony.

Alejandro Viñao [1], an established modern-day composer, has been much inspired by Nancarrow's efforts, and now brings a more personal perspective and motivation to our research, joining us as Composer in Residence at the Computer Laboratory. For more than 30 years and in a variety of areas and centres of research (including IRCAM and MIT's Media Lab), he has sought technologies to help him express his musical ideas. Yet, his appropriation of technologies and methods in conventional music practice forces him to an unsatisfying compromise when it comes to exploring polytempi. Using scored music, Alejandro divides the bar into the finest performable resolution (e.g. 1/32nd notes), and uses varying note accents and stresses to give the impression of multiple tempi. Even though he admits such methods do not produce true polytempi, Alejandro manages to create pieces that are nonetheless able to present impressions of temporal harmony, with temporally dissonant passages resolving to consonance. Furthermore, his reliance on more established working practices affords him greater flexibility in instrumentation, arrangement and performance.

Both Nancarrow and Reich effectively used technology to address problems with the use of polytempi, but were both forced to pre-calculate and prescribe the tempo variations long in advance of performance; waiting hours, days or weeks to hear the result of their writing. In all three cases, the composers are forced to limit their creativity in some way, be it temporal freedom, dynamism, note-to-note control or instrumentation.

Approaches to managing complex musical timings tend to focus on performance requirements. Ghent [7] is one of the earlier attempts to use audio cues (e.g. multiple metronomes) for individual musicians. Ligeti [12] uses a similarly audio-based method. Such techniques isolate the musician from the ensemble and, more importantly, the part from the piece, which is not only incompatible with the composer's requirement of a macroscopic view of the music, but also inhibits performer interaction, an important component of the music, socially and aesthetically [19].

Other explicit considerations of polytempo music are sparse, and the paucity of published research in this area is marked by the writings of lamenting composers desperate to explore more advanced musical timings, such as the late Stockhausen [17]. A useful website, run by artist John Greschak [8], contains more information and unpublished articles about polytempo, as well as an annotated list of polytempo music. To our knowledge, there has been no previously published work in the area of music interaction or interface design that has significantly addressed musical tempo as the focus of control, nor explicitly considered the composer and composition as target user or task.

3. A SYSTEM FOR POLYTEMPI
There are two principal requirements of a system allowing composers to interact with tempo and polytempi: a representation of the polytempi, including the temporal relations between parts (the notation); and a method of manipulating and managing the tempo of such parts (the system). In this latter case, interaction should occur in realtime, in order to quickly allow the auditioning of alternative material and the making of expressive refinements.

However, before further considering issues of system design and implementation, we must tackle one of the fundamental goals of our research: the design of a notation for polytempi, upon which the system will be based.

3.1 Notation
The design of our system was arrived at by drawing on our prior research into notations for performance and composition in music and other expressive arts [2][5]. The lack of previous work on this specific problem encouraged us to look for analogies in other disciplines and fields where it is necessary to handle parallel streams, signals and processes – such as physics, data communication, computer security, graphics, and engineering fields.

In facilitated cross-disciplinary meetings with 10 different specialist research groups (see Section 8 for a full list), the concept of phase and synchronization was highlighted in a number of non-musical activities, possibly the closest cousin of which is physical sound itself.

A periodic waveform, such as a sine wave, at any moment has a phase, frequency and wavelength that might be adapted to music, in the forms of musical position, tempo and bar length, respectively.¹ Considering a musical part as a periodic signal, the challenge moves to representing multiple signals so that the relationship between them is evident. In many fields, phase can be plotted or graphed as a function of other properties of a given system, such as time (e.g. phase plot) or frequency (e.g. Bode plot). In this manner, it would be possible to plot musical position on a vertical axis against absolute time on the horizontal, but this would only be useful in plotting absolute synchronization and absolute time offsets – tempo would be implicitly presented as line gradient, and relative alignments would also be difficult to identify. Instead, we propose a plot of the phase of one signal against the phase of another, as in Figure 2(a). In music, this is the musical position within the bar of one part, against that in another part, as shown in Figure 2(b).

Figure 2. Plots of phase against phase.
(a) A general case. (b) An adaptation for musical purposes (4/4).

Although the plot no longer allows the reader to deduce the individual tempos of each part, the relationship between them is clear – a diagonal line (45 degrees) implies matched tempi; steeper or shallower, and one part is faster or slower than the other. More importantly, the bar-level phase difference is also displayed, allowing the reader to easily deduce points of relative alignment, as shown by the guidelines in Figure 2(b). From the diagram it is possible to see the salient factors of the polytempi process – the relative phases and synchronization of two parts – and extrapolate how changes in each tempo, which affect the gradient of the plot, will affect the degree of synchrony over time. Figure 3 gives an illustration of a musical example.

Figure 3. An idealised example of using a musical phase plot to manage polytempi.
(a) Part Y is progressing faster through the bar. (b) The part is slowed to the tempo of its counterpart, leaving them offset by 1 musical beat. (c) The part is again slowed so that, by (d), the parts are back in sync.

To further illustrate how the plot functions, consider how Reich's "Piano Phase" would be represented: With two parts featuring close yet differing constant tempos, the line would be drawn with a gradient slightly off-diagonal. One part would reach the end of the bar sooner than the other, prompting the line to 'wrap around' using the dashed lines, as in Figure 2. The wrap-around line illustrates the relation between the two parts' bar lines (the other part's formed by the axes of the graph), and gradually creeps across the grid, as the bar lines diverge, eventually converging on the opposite extreme of the bar, whereupon the process concludes, having regained synchrony, albeit a bar adrift.

3.2 Interface
The examples above demonstrate how the plot can be used to inspect temporal aspects of a piece but, in order to be of use to composers, a system must allow the viewer to affect the tempi – to draw the line themselves – and react to what they see and hear. It would be possible to expose the relative synchronisation as a control parameter, but this would require the composer to first select a reference part to which the synchronisation would be relative, effectively restricting tempo variation to a single part at any given time. Instead, we elected to simply control the tempi of both parts independently.

In addition to these two fundamental variables, we envisaged additional control parameters. Notably, the composer will, at different times, wish to affect tempo variations of varying scale. With Reich and Nancarrow, the tempo changes were gradual and finely-controlled, but other composers, such as Alejandro Viñao, desire the expressive freedom to make both fine and more abrupt, coarser variations. Thus, a third variable of control range (or resolution) is required. Finally, observing that temporal harmony involves varying between two extremes (temporal consonance and dissonance), and that most pieces revolve around the journeys between them, we introduce a fourth factor in the interaction: a "gravitational" element that draws the two parts into consonant temporal congruity, to a varying degree. Altogether, this requires an interface offering at least 4 degrees of freedom, corresponding to: tempo of first part, tempo of second part, tempo control resolution and influence of gravity.

Our interface could simply be formed from common input widgets (sliders, rotary knobs, etc.). However, in designing our prototype, we turned to human gesture, where the body affords a large variety of motions to which our scales might be effectively mapped, and where their interrelationships and dependencies might be implicitly reflected. Gesture is often seen as a 'natural' interaction mode for computer-based musical applications, owing to the physical and tangible nature of interaction in traditional music making [13]. In this vein, we elected to use gestures, motions and actions that would not appear out of character with those established in live musical performance.

¹ Amplitude, the remaining fundamental characteristic of audio signals, constitutes an instantaneous property, and might be seen as the counterpart to similar musical properties such as dynamics, pitch, instrumentation, etc.
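The musical phase plot of Figure 2(b) can be generated directly from two parts' positions, taken modulo the bar length. A sketch follows, with constant example tempi of our own choosing.

```python
def phase_plot(bpm_x, bpm_y, beats_per_bar=4, seconds=8.0, dt=0.05):
    """Sample the trajectory of part Y's within-bar position against
    part X's. Matched tempi trace the 45-degree diagonal; differing
    tempi change the gradient, and each wrap-around (a phase jumping
    back to 0) corresponds to a dashed line in the plot."""
    points, t = [], 0.0
    while t <= seconds:
        px = (bpm_x / 60.0 * t) % beats_per_bar
        py = (bpm_y / 60.0 * t) % beats_per_bar
        points.append((px, py))
        t += dt
    return points
```

With `bpm_x == bpm_y` every sampled point satisfies px == py, i.e. the diagonal line of matched tempi; a slightly faster part Y yields a gradient just above 1, as in the "Piano Phase" example.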

Figure 4. A Vicon™ Motion Capture-based system designed for controlling and representing polytempi.

performance. Specifically, the similarity of expressive roles within a confined space. The raw camera data is processed by
between our user, the composer and that of a conductor was a the Vicon™ system into a realtime stream of 3D coordinates,
significant influence in our selection. The intended result was a which can be combined into groups representing different
method of interacting that would make a user more comfortable bodies and limbs.
in their manipulation of the system, where musical-like physical
For our system, we used two gloves and a belt to allow us to
actions prompted clear musical results and users did not feel
determine the position of the hands, relative to the body, and
inhibited or self-conscious by having to make overt, overly-
the position of the body relative to the space. The data was
exuberant and uncharacteristic gestures.
piped over a TCP/IP network to a PC running Cycling 74’s
Projecting the phase schematic onto a wall-mounted screen, a Max/MSP and Jitter. Using C++, we developed a Max/MSP
Vicon™ Motion Capture system [18] was used to capture body external that converted the data packets into usable Max
motion. A similar system has been used to control synthesizers variables. Variables corresponding to the positions, velocities
and sound generation [6], but we could find no published and orientation of the waist and each hand were connected to
account of an attempt to use such a system and gesture to allow higher-level, realtime control of musical composition and expression.

Our system (see Figures 4 and 5) was designed so that the height of each hand would set the tempo of each respective part. Walking forwards or backwards set the tempo range addressed by the hands, literally allowing more “up-close” adjustments or broader handling “from a distance”. Appropriately, the effect of gravity could be controlled by bringing the hands closer together laterally, so that clasped hands (vertical and horizontal proximity) would ultimately bring about synchronisation of both tempo and relative position. Additional gestures were added to start and stop playback (a quick clap), and to allow the user to lock the tempo of each part (turning the respective palm up) so that they could focus on the other.

4. TECHNICAL DETAILS
The Vicon™ system works by using multiple cameras that can detect infra-red light reflected off small reflective balls attached to the subject. Belts, hats, gloves and suits adorned with these balls can be worn to allow untethered movement to be recorded. These gestures were mapped to the respective control variables of a MIDI playback engine (playing pre-recorded piano or percussive parts), so that they could be appropriately manipulated. In turn, the control variables, together with the status variables of the engine, were then passed to a Jitter patch that constructed a graphical representation of the musical phase plot, to be fed back to the user.

Despite a diverse collection of protocols, the different technologies integrated well, and a basic system was up and running quickly, allowing us time to iteratively refine the interaction. The system outlined in Section 4 was designed to encapsulate the relative properties of the synchronisation between parts and, in doing so, would provide only limited insight into the more absolute characteristics of the performance – notably, absolute tempo or absolute part position. In Figures 4 and 5, the screen shows the musical phase plot in a 3D perspective, whereby a different plot is presented for each bar of a single part, flying forward in an abstract 3D space, appearing at a distance from the upper right (allowing bars to be read left-to-right), at a speed matching the part’s tempo. The user is thus given the impression that they are progressing through the piece, and at what rate.
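The wrap-around, bar-relative quantity that such a musical phase plot tracks for each part can be sketched as follows. This is a minimal Python formulation of our own — the paper does not specify the Jitter patch at this level, and the function name and bar length are assumptions:

```python
# Minimal sketch (ours, not the authors' code) of the bar-relative
# "musical phase" of a part playing at its own independent tempo.
def bar_phase(elapsed_s, tempo_bpm, beats_per_bar=4):
    """Fraction of the current bar completed after elapsed_s seconds."""
    beats = elapsed_s * tempo_bpm / 60.0
    return (beats % beats_per_bar) / beats_per_bar

# Two parts at different tempi drift apart and wrap around the bar line:
print(bar_phase(10.0, 120.0), bar_phase(10.0, 90.0))  # → 0.0 0.75
```

Plotting the two phases against each other is what makes synchronisation (clasped hands pulling both tempo and position together) directly visible as the two trajectories converge.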

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

Figure 5. Alejandro Viñao (foreground) using the prototype.

5. DISCUSSION
Following development of the basic architecture, implementation of the prototype followed an iterative design process, based on feedback from our own interaction with the system, and three trials by Alejandro Viñao, which produced positive and useful feedback.

The difference between our interactions and those of a practising composer was revealing. To test the system, we used a variety of movements to ensure a robust and varied interaction. As demonstrated by the video (see Section 6), Alejandro’s gestures were significantly more subtle, focusing on fine control – reiterating the utility of the resolution control.

For Alejandro, the strength of the design concept was already evident in the early version of the system we had available for his first visit. The forwards-backwards tempo control resolution feature was introduced for his second visit and refined in the third to allow a level of temporal control at which he could comfortably and confidently effect temporal manipulations in the music. Experimenting with temporality in a small selection of pre-prepared pieces, Alejandro mentioned that he was already eager to try the prototype with music of his own creation.

As with many musical instruments, mastering the interaction might have required more than the short exposure afforded Alejandro. In this respect, the “gravity” feature demonstrated its potential as a helper device, assisting actions that would otherwise require fine control and hand-eye coordination – allowing Alejandro to more easily target and achieve alignment and temporal consonance. The prototypes were only implemented with a basic gravity effect that brought the two parts closer together in absolute musical position. A more flexible feature, whereby the effect might gradually match tempos or align position relative to either the bar, beat or sub-beat, would further improve the usability and creative flexibility of the system.

Alejandro observed that the system was well-suited to inspecting, manipulating and adapting to polytempo processes operating at the level of the bar – either within a bar, or relative to the bar line. However, he noted that it was difficult to orientate oneself to the macroscopic aspects of the piece. For example, the system’s difference in feedback between the two parts when offset by half-a-bar and when offset by one-and-a-half bars was minimal, yet could potentially hold important musical implications. The display of absolute positions and tempos was limited to the status readout on the right of the display, together with the appropriate labelling of the axes (indicating the current bar of each part). Similarly, although the extrusion of the plots into 3D succeeded in providing a sense of progress and time passing, the wrap-around lines now leapt from plot to plot, making it harder to observe bar-to-bar trends, and we were led to conclude that the original, static 2D design might be more suitable in most musical applications.

Furthermore, a better macroscopic impression would be afforded by adjusting a 2D musical phase plot (drawn relative to bar lengths) to one where the axes simply represent a continuing, absolute musical position within each part. The viewport would then pan over the current musical position appropriately. This would obviate the need for the wrap-around line, which might reduce the visibility of bar-to-bar relationships, but which could be replaced by appropriate annotations to make bar transitions and relationships more explicit.

6. CONCLUSIONS
This paper identified an area of musical expression that has received relatively little attention from technologists and music researchers. In an effort to tackle the barriers between composers and polytempi, we have proposed and tested both a notation for representing multiple tempi in music and a gestural system for interacting with them in realtime.

Alejandro Viñao’s assessment demonstrated the aptness of our underlying design concept, while identifying a number of minor interaction issues that would be relatively easy to address. Our system, however, is but one possible solution to the problem, based on but one suggestion for a notation supporting polytempi. It is yet to be established how well our system scales to pieces with more than two differing tempi (i.e. using multiple plots or multiple axes). Furthermore, a major challenge will be the integration of polytempi notation with both live performance and other elements of music (melody, harmony, dynamics, rhythm, etc.), both of which would afford the composer or conductor greater possible creative freedom for realising music.

A computer animation demonstrating the system, including the supported gestures, as well as a video of Alejandro Viñao using the system, is available online at:

8. ACKNOWLEDGMENTS
Special thanks to Alejandro Viñao, who presented us with the challenge and gave us invaluable insights into modern musical practice; and additionally to Tristram Bracey and Joe Osborne, who worked on developing the prototype. For a wide range of other insights, thanks to the various groups who kindly met with us: the Digital Technology, NETOS, OPERA, Rainbow and Theory & Semantics research groups here in the Computer Laboratory; the Inference group at the Cavendish Laboratory; the Signal Processing group in the Engineering Department;


and the Socio-Digital Systems team at the Microsoft Research Centre. For additional input thanks also to Ian Cross, of the Centre for Music & Science. Lastly, many thanks to the Leverhulme Trust and the Engineering and Physical Sciences Research Council (EPSRC), without whose financial support this project would not have been possible.

Towards Idiomatic and Flexible Score-based Gestural Control with a Scripting Language

Mikael Laurson, Mika Kuuskankare
Sibelius Academy, Helsinki, Finland


In this paper we present our recent enhancements in score-based control schemes for model-based instruments. A novel scripting syntax is presented that adds auxiliary note information fragments at user-specified positions in the score. These mini-textures can successfully mimic several well-known playing techniques and gestures - such as ornaments, tremolos and arpeggios - that would otherwise be tedious or even impossible to notate precisely in a traditional way. In this article we focus on several ‘real-life‘ examples from the existing repertoire, drawn from different periods and styles. These detailed examples explain how specific playing styles can be realized using our scripting language.

Keywords
synthesis control, expressive timing, playing styles

1. INTRODUCTION
The simulation of existing acoustical musical instruments - such as the classical guitar in this study [4] - provides a good starting point when one wants to evaluate the quality of a synthesis algorithm and a control system. In this paper we aim to present our recent research efforts dealing with our score-based control scheme [8]. Various aspects of our score-based control system have already been presented in different papers, for instance time modification [5], playing technique realizations [9], and the more recent article dealing with macro-notes [6]. In the following we aim to combine these features and show how realistic playing simulations can be realized in an economical way. We will discuss three larger case studies from the existing guitar repertoire and explain how the system is able to reach convincing simulations. The realizations of these examples can be found as MP3 files on our home page:

Musical scores in our system are situated within a larger environment called PWGL [7]. PWGL is a visual programming language based on Lisp, CLOS and OpenGL. Scores are of primary importance in our system and they can be used in many compositional and analytical applications, such as producing musical material for instrumental music [3].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

Our score-based control scheme has several unique features. First, the input process is interactive. After listening to the result the user can modify the score and recalculate it until satisfied with the outcome. The user can select and edit any range from the score, polish it, and hear the refinements in real-time, without re-synthesizing the whole piece. The ability to work with only a small amount of musical material at a time has proven to be very useful. This is especially important when working with musical pieces of considerable length. Second, our system allows the use of performance rules that generate timing information and dynamics automatically, in a similar fashion to [1]. The user can, however, also work by hand using the graphical front-end of the notation package. In this case special expression markings can be inserted directly in the score. We have found that this kind of mixed approach - using automated rules and hand-given timing information - is very practical and allows time modifications to be defined more flexibly than with automatic rules only. Third, the system supports both local and global time modifications. The importance of this kind of approach has also been discussed in [2]. Local modifications involve only one note or chord (such as an expression that changes the time interval between notes). A global modification, in turn, handles a group of notes or chords (a typical example of this is a tempo function).

In this section we focus on an important component of our control system called the macro-note. The macro-note implementation has been revised and it is now compatible with our scripting language syntax. This syntax in turn has been used in demanding analytical and compositional tasks. The scripting syntax has a pattern-matching header that extracts complex score information, thus making it straightforward to produce side-effects in a score.

Macro-notes allow the use of notational short-hands which are translated by the control system to short musical textures. In the simplest case this scheme makes it possible to mimic ornaments, such as trills and arpeggios. The reason for introducing the macro-note scheme in our system comes from our previous experiences using musical scores to generate control information. To realize an ornament - say a baroque trill in a dance movement - just by using metrical notation without any abstraction mechanism can be an awkward and frustrating experience. What is worse, the result is typically ruined if the user changes the tempo. Thus, in order to capture the free-flowing accelerandi/ritardandi gestures typically associated with these kinds of ornaments, we need better abstraction mechanisms: the system should respond gracefully to tempo changes or to changes in note
duration; the system should know about the current musical context such as dynamics, harmony, and the number of notes in a chord; and the system should have knowledge about the current instrument and how it should react to various playing techniques.

Figure 1: Two macro-note realizations that are labelled with ”trr”. The auxiliary notes are displayed after the main note as note-heads without stems.

4. MACRO-NOTE SYNTAX
Next we go over the main features of the macro-note syntax. As was already stated above, a macro-note expression uses our scripting syntax and has three main parts: (1) a pattern-matching part (PM-part), (2) a Lisp-code part, and (3) a documentation string. In the following code example we give a simple macro-note script that adds auxiliary notes to the main note, simulating a repetition gesture (see also Figure 1):

(* ?1 (e ?1 "trr") ; (1) PM-part
  (?if (add-macro-note ?1 ; (2) Lisp-code part
         :dur (synth-dur ?1)
         :dtimes '(.13 30* .12)
         :midis (m ?1)
         :indices 1
         :artic 50
         :time-modif
           (mk-bpf '(0 50 100) '(90 130 100))
         :update-function 'prepare-guitar-mn-data))
  "repetition") ; (3) Documentation

In the PM-part (1) we first state, with a wild-card, ’*’, and a variable, ’?1’, that this script is run for each note in the score (thus ’?1’ will be bound to the current note). Furthermore we check whether the note contains an expression with the label ”trr”. If this is the case we run the Lisp-code part (2). Here we call the Lisp function ’add-macro-note’ that generates a sequence of notes according to its keyword parameters. The arguments are normally numbers, symbols, lists or break-point functions. Internally these arguments are converted to circular lists. In our example we first specify the duration of the sequence (’:dur’). Next we give a list of durations (’:dtimes’). After this we define the ’pitch-field’ of our macro-note, ’:midis’, which is in our case the midi-value of the current note, ’(m ?1)’. A closely related argument, ’:indices’, follows, which specifies how the pitch-field will be read. Here the pitch-field consists of only one pitch, and using the index 1 we get a sequence of repetitions. Two time-related parameters follow: the first one, ’:artic’, defines an articulation value (which is in our case 50 percent, meaning ’half-staccato’); the second, ’:time-modif’, is a tempo function, defined as a break-point function, where x-values are relative to the duration of the note (from 0 to 100), and the y-values specify tempo changes as percentage values (100 percent means ’a tempo’). Thus in this gesture we start slower with 90 percent, make an accelerando up to 130 percent, and come back to the ’a tempo’ state with 100 percent. Finally, the ’:update-function’ performs some instrument-specific calibration of the generated macro-note sequence. Figure 1 shows two applications of the macro-note script.

5. CASE STUDIES
In this section we discuss three case studies. The first one is a tremolo study realization (the original piece was composed by Francisco Tarrega). The result is given in Figure 2. Although this example is more complex, it follows a similar scheme to the previous one. The following script was used to realize this example. Here the PM-part (1) accesses all chords in a score and runs the Lisp-code part (2) if the chord contains the expression with the label ’trmch’ (the variable ’?1’ will be bound to the current chord). The pitch-field now consists of all sorted midi-values that are contained in the chord. The most complex part of the code deals with the generation of a plucking pattern for the tremolo gesture (see the large ’case’ expression). This result defines the ’:indices’ parameter. Here different patterns are used depending on the note value of the chord. For instance, if the note value is a quarter note, 1/4, then the pattern will be ’(2 3)’, which will be expanded by the ’add-items’ function to ’(2 1 1 1 3 1 1 1)’. This means that we will use a typical tremolo pluck pattern where we pluck once the second note and then three times the first note in the pitch-field, then the third note and three times the first note, and so on. We also use here an extra keyword called ’:len-function’ that guarantees that the sequence is finished after the pattern has reached a given length.

A break-point function controls the overall amplitude contour, ’:amp’, of the resulting gesture. Note that this contour is added on top of the current velocity value.

Finally, we use two parameters that affect the timing of the result. The ’:artic’ parameter is now a floating-point value that is interpreted by our system as an absolute time value in seconds, here 5.0s (by contrast, in the previous example we used integers that in turn were interpreted as percentage values). This controls the crucial overlap effect of the tremolo gesture. 5.0s is used here as a short-hand to say: ’keep all sounds ringing’. The calculation of the final durations is, however, much more complicated (for instance the low bass notes will ring longer than the upper ones), but this will be handled automatically by the update-function. The ’:time-modif’ parameter is similar to the one in the previous example: we make an accelerando/ritardando gesture during the tremolo event.

(* ?1 :chord (e ?1 "trmch") ; (1) PM-part
  (?if ; (2) Lisp-code part
    (when (m ?1 :complete? T)
      (let* ((ms (sort> (m ?1)))
             inds len-function)
        (case (note-value ?1)
          (3/4
           (setq inds (add-items '(4 3 2 3 2 3) 3 1)
                 len-function '(= (mod len 24) 0)))
          (1/4
           (setq inds (add-items '(2 3) 3 1)
                 len-function '(= (mod len 8) 0)))
          (1/2
           (setq inds (add-items '(4 3 2 3) 3 1)
                 len-function '(= (mod len 16) 0))))
        (add-macro-note ?1
          :dur (synth-dur ?1)
          :dtimes '(.13 30* .12)
          :midis (mapcar 'list ms ms)
          :indices inds
          :len-function len-function
          :amp (mk-bpf
                 '(0.0 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                 (g+ '(40 20 0 30 10 50 20 40) (vel ?1)))
          :artic 5.0
          :time-modif (mk-bpf '(0 50 100) '(90 130 100))
          :update-function 'prepare-guitar-mn-data))))
  "tremolo chords")

Our next example is a realization of an arpeggio study by Heitor Villa-Lobos (Figure 3) and the script is quite similar

to the previous one. The main difference is that the pitch-field is sorted according to string number and not according to midi-value, as was the case in the tremolo study example. The ’:indices’ parameter is also different: now it is static, reflecting the idea of the piece, where the rapid plucking gesture is repeated over and over again.

We combine here two notions of timing control: a global one and a local one. A global tempo function (see the break-point function above the staff that is labelled ”/time”) makes a slow accelerando gesture lasting for 5 measures. This global timing control is reflected in our script, where the local ’:dur’ parameter gets gradually shorter and shorter.

(* ?1 :chord (e ?1 "vlarp") ; (1) PM-part
  (?if (when (m ?1 :complete? t) ; (2) Lisp-code part
         (let* ((ms (mapcar #'midi (sort (m ?1 :object T) #'<
                      :key #'(lambda (n)
                               (first (read-key n :fingering)))))))
           (add-macro-note ?1
             :dur (synth-dur ?1)
             :dtimes '(.14 20* .12)
             :midis (mapcar 'list ms ms)
             :indices '(6 4 5 3 4 2 3 1 2 1 3 2 4 3 5 4)
             :artic 1.0
             :amp (mk-bpf
                    '(0.0 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                    (g+ (vel ?1) '(50 30 10 40 20 60 30 50)))
             :len-function '(= len 32)
             :update-function 'prepare-guitar-mn-data))))
  "Villa-Lobos arp")

Our final example, an excerpt from J. S. Bach’s Sarabande, is the most complex one, and it is probably also the most delicate one, due to its slow basic tempo. The piece is ornamented with rich improvised textures, such as portamento glides, trills and arpeggios (see Figure 4). In the following we discuss the arpeggio script, which is applied three times (see the chords with expressions having the label ”carp”). The arpeggio script is similar to the tremolo example in that we have a database of plucking patterns. These are organized here, however, according to the number of notes in the pitch-field. Furthermore, the script can choose randomly (using the ’pick-rnd’ function) from several alternatives. This results in arpeggio gesture realizations that are not static but can vary each time the score is recalculated, similar to baroque performance practice, where a player is expected to improvise ornaments.

(* ?1 :chord (e ?1 "carp")
  (?if (when (m ?1 :complete? t)
         (let* ((ms (sort> (m ?1)))
                (ind (case (length ms)
                       (6 (pick-rnd
                           '(6 5 4 3 2 1 2 3 4 5)
                           '(1 2 1 3 4 3 5 6 5 6 5 4 3 2 1)
                           '(1 2 3 4 5 6 5 4 3 2 1)))
                       (5 (pick-rnd
                           '(5 4 3 2 1 2 3 4 5)
                           '(1 2 1 3 4 3 5 5 4 3 2 1)
                           '(1 2 3 4 5 4 3 2 1)))
                       (4 (pick-rnd
                           '(4 3 2 1 2 3 4)
                           '(1 2 1 3 4 3 4 3 2 1)
                           '(1 2 3 4 4 3 2 1)))
                       (3 (pick-rnd
                           '(3 2 1 2 3)
                           '(1 2 1 3 3 2 1))))))
           (add-macro-note ?1
             :dur (* 0.95 (synth-dur ?1))
             :dtimes '(.15 30* .13)
             :midis (mapcar 'list ms ms)
             :indices ind
             :artic 5.0
             :amp (mk-bpf
                    '(0.0 0.25 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                    (g+ (vel ?1) '(50 0 30 10 40 20 60 30 0)))
             :time-modif (mk-bpf '(0 50 100) '(60 150 90))
             :update-function 'prepare-guitar-mn-data))))
  "Bach arp")

6. CONCLUSIONS
This paper presents our recent developments dealing with a score-based control system that makes it possible to fill a musical score with ornamental textures such as trills and arpeggios. After presenting the main syntax features, we discussed three larger case studies that aim to show how the macro-note scheme can be used in a musical context.

These examples have been subjectively evaluated by the authors (the first author is a professional guitarist), and we consider that the macro-note scheme clearly improves the musical output of our model-based instrument simulations. While this paper concentrates on the simulation of existing musical instruments, it is obvious that our control scheme could potentially also be used to control new virtual instruments.

7. ACKNOWLEDGMENTS
The work of Mikael Laurson and Mika Kuuskankare has been supported by the Academy of Finland (SA 105557 and SA 114116).

8. REFERENCES
[1] A. Friberg. Generative rules for music performance: A formal description of a rule system. Computer Music Journal, 15(2):56–71, 1991.
[2] H. Honing. From time to time: The representation of timing and tempo. Computer Music Journal, 25(3):50–61, 2001.
[3] M. Kuuskankare and M. Laurson. Expressive Notation Package. Computer Music Journal, 30(4):67–79, 2006.
[4] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare. Methods for Modeling Realistic Playing in Acoustic Guitar Synthesis. Computer Music Journal, 25(3):38–49, Fall 2001.
[5] M. Laurson and M. Kuuskankare. Aspects on Time Modification in Score-based Performance Control. In Proceedings of SMAC 03, pages 545–548, Stockholm, Sweden, 2003.
[6] M. Laurson and M. Kuuskankare. Micro Textures with Macro-notes. In Proceedings of the International Computer Music Conference, pages 717–720, Barcelona, Spain.
[7] M. Laurson and M. Kuuskankare. Recent Trends in PWGL. In International Computer Music Conference, pages 258–261, New Orleans, USA, 2006.
[8] M. Laurson, V. Norilo, and M. Kuuskankare. PWGLSynth: A Visual Synthesis Language for Virtual Instrument Design and Control. Computer Music Journal, 29(3):29–41, Fall 2005.
[9] M. Laurson, V. Välimäki, and C. Erkut. Production of Virtual Acoustic Guitar Music. In AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pages 249–255, Espoo, Finland, 2002.

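For readers without a PWGL installation, the two plucking-pattern helpers that the case-study scripts lean on can be approximated in Python. This is a sketch reconstructed from the prose descriptions only — the names mirror ’add-items’ and ’pick-rnd’, but the actual PWGL implementations are not shown in the paper:

```python
import random

# Sketch (ours) of the pattern helpers described in the case studies:
# add_items interleaves filler plucks; pick_rnd varies each realization.
def add_items(pattern, n, item):
    """Follow each element of pattern with n copies of item, so that
    add_items([2, 3], 3, 1) yields the tremolo pattern described in the text."""
    out = []
    for index in pattern:
        out.append(index)
        out.extend([item] * n)
    return out

def pick_rnd(*alternatives):
    """Choose one pattern at random, so each score recalculation can yield
    a different ornament realization, as in baroque improvised practice."""
    return random.choice(alternatives)

print(add_items([2, 3], 3, 1))  # → [2, 1, 1, 1, 3, 1, 1, 1]
ind = pick_rnd([3, 2, 1, 2, 3], [1, 2, 1, 3, 3, 2, 1])
```

Note how the ’:len-function’ lengths in the tremolo script fall out of this expansion: a six-element pattern with three fillers per pluck gives 24 indices per cycle, a two-element pattern gives 8.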

Figure 2: Realization of the opening measures of the tremolo study ”Recuerdos de la Alhambra” by Francisco Tarrega.

Figure 3: Arpeggio study by Heitor Villa-Lobos. This example is challenging as we use macro-notes mixed
with ordinary guitar notation.

Figure 4: Johann Sebastian Bach: Sarabande. This example contains macro-note arpeggios and trills,
vibrato expressions, a tempo function and a portamento expression.


Enhancing the visualization of percussion gestures by virtual character animation
Alexandre Bouënard∗ (Univ. Européenne Bretagne, Vannes, France), Sylvie Gibet∗ (Univ. Européenne Bretagne, Rennes, France), Marcelo M. Wanderley (McGill University, Montreal, Qc, Canada)

ABSTRACT
A new interface for visualizing and analyzing percussion gestures is presented, proposing enhancements of existing motion capture analysis tools. This is achieved by offering a percussion gesture analysis protocol using motion capture. A virtual character dynamic model is then designed in order to take advantage of gesture characteristics, helping to improve gesture analysis with visualization and interaction cues of different types.

Keywords
Gesture and sound, interface, percussion gesture, virtual character, interaction.

1. INTRODUCTION
Designing new musical interfaces is one of the most important trends of the past decades. Efforts have constantly been made to elaborate more and more efficient devices in order to capture instrumental gestures. These technical advances have given rise to novel interaction opportunities between digital instruments and performers, and the creation of new sound, image or tactile synthesis processes. Our main guideline aims at providing a set of pedagogical tools for helping the study of percussion gestures. Among these, rendering real instrumental situations (interaction between performers and instruments) and exploring the gestural space (and its corresponding visual, gestural and sounding effects) are of great interest. Ultimately, our final goal is to build new virtual instrumental situations, especially with gesture-sound interactions controlled by virtual characters. This paper offers a new tool for visualizing percussion gestures, which exploits both the analysis and synthesis of percussion gestures. The analysis process is achieved by capturing the movements of performers, while a physical model of a virtual character is designed for the synthesis. The visualization is composed of different views of both the virtual character and the instrument. It is finally enhanced with interactions between graphics modeling, physics synthesis of gesture and sound replay.

The paper is organized as follows. In section 2, previous work and motivations are discussed. The analysis process of percussion (timpani) gestures is detailed in section 3. Visualization and interaction concerns are discussed in section 4. Finally, we conclude with further perspectives.

∗Also with Samsara / VALORIA, Université Européenne de Bretagne (UEB), Vannes, France

2. RELATED WORK
Previous work concerns both percussion-related models and interfaces, and work combining virtual character animation and music.

Most of the work on percussion gesture and sound deals with the design of new electronic percussion devices, thus creating either new interfaces (controllers) and/or new sound synthesis models and algorithms.

On the one hand, new interfaces are based on increasingly efficient devices that are able to track gestures. Electronic percussions such as Radio Baton [1], Buchla Lightning [3], Korg Wavedrum [20] and ETabla [14] are digital musical instruments that improve on or emulate acoustic phenomena by taking into account gesture cues such as position, touch and pressure. More recent work takes advantage of various techniques, such as magnetic gesture tracking [17], computer vision [16] or the physical modeling of the drum skin [13].

On the other hand, it is also achieved by designing sound synthesis models and algorithms, ranging from purely signal-based to physically-based methods [9]. These works rarely include the study of the instrumental gesture as a whole, especially regarding its dynamic aspects or its playing techniques, even if some take into account real measurements [2] and the mapping of physical parameters to percussion gestures [5]. Playing techniques can be qualitatively observed and used ([14] [12] [8]) for a better understanding of percussive gestures.

They can also be quantified thanks to capture techniques [24], among which the most used is motion capture by camera tracking. But whichever method is used to reproduce the quality of the instrumental gesture, it generally fails to convey its dynamic aspect. That is why we explore in this paper the possibility of physically animating a virtual character performing percussive gestures, so that its intrinsic features are available to our interface.

As for previous work combining virtual character animation and music, very few studies are available, especially


in terms of taking advantage of virtual character animation to help the visualization of gestures. The DIVA project¹ used virtual character animation for audiovisual performance output driven by MIDI events [11]. Hints about the influence of motion capture characteristics on the quality of re-synthesis of the movement [15] have been proposed. The influence of music performance on a virtual character's behavior [23] has also been emphasized. Some work aims at extracting expressive parameters from video data [4] for enhancing video analysis. Finally, one solution consists in directly animating virtual models from the design of sound². These studies nevertheless fall outside the scope of using virtual character animation as a gestural controller for enhancing the visualization and the analysis of instrumental situations.

Figure 2: Left: French (top) and German (bottom) grips; Right: Impact locations on the drumhead.

3. TIMPANI PERFORMANCE
There are many classifications of percussion instruments; one of the most established typologies is based on the physical characteristics of instruments and the way by which they produce sound. According to this classification, timpani are considered as membranophones, "producing sound when the membrane or head is put into motion" [6].

3.1 Timpani Basics
Timpani-related equipment is mainly composed of a bowl, a head and drumsticks (Figure 1). In general, timpanists have to cope with several timpani (usually four) with bowls varying in size [19]. As for timpani drumsticks, they consist of a shaft and a head. They are designed in a wide range of lengths, weights, thicknesses and materials [6], and their choice is of great importance [18].

Figure 1: Timpani player's toolbox: bowl, head and drumsticks.

Timpani playing is characterized by a wide range of playing techniques. First, there are two main strategies for holding the drumsticks (Figure 2, left side): the "French" grip (also called "thumbs-up") and the "German" grip (or "matched" grip). Players commonly use three distinct locations of impact (Figure 2, right side). The most used is definitely the one-third location, while the rim appears rather rarely.
A database of timpani gestures has been created; it is composed of five gestures: legato, tenuto, accent, vertical accent and staccato. Each gesture is presented in Figure 3, showing the space occupation (Y-Z projection) of each drumstick's trajectory and highlighting the richness of timpani playing pattern variations.

Figure 3: Timpani playing variations - Tip of the drumstick trajectories (Y-Z projection). Legato is the standard up-and-down timpani gesture. Tenuto and accent timpani variations show an increase in velocity and a decrease in space occupation (in the Y direction). Vertical accent and staccato timpani variations also show an increase in velocity, and are characterized by an increase of space occupation (in the Y direction) for a more powerful attack and loudness.

Taking into account these various features, timpani gestures are thus characterized by a wide variability. The next section will concern the quantitative capture of these variations.

¹ DIVA project:
² Animusic:
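The "space occupation" used in Figure 3 to separate the playing variations can be quantified directly from a captured tip trajectory as the bounding box of its Y-Z projection. The following is a minimal sketch of such a computation; the function name and the (x, y, z) tuple convention are ours, not from the paper:

```python
def space_occupation_yz(trajectory):
    """Extent of a drumstick-tip trajectory projected on the Y-Z plane.

    trajectory: sequence of (x, y, z) marker positions.
    Returns (y_extent, z_extent), the sides of the 2D bounding box.
    """
    ys = [p[1] for p in trajectory]
    zs = [p[2] for p in trajectory]
    return (max(ys) - min(ys), max(zs) - min(zs))
```

Following Figure 3, a tenuto stroke would then show a smaller Y extent than a legato stroke, and a vertical accent a larger one.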


3.2 Motion capture protocol and database

We propose to quantitatively characterize timpani gestures by capturing the motion of several timpani performers. We use a camera-tracking Vicon 460 system³ and a standard DV camera that allow the retrieval of both gesture and sound.
The main difficulty in using such hardware solutions is then the choice of the sampling frequency for the analysis of percussive gestures (because of the short duration of the beat impact [7]). For our experiments, cameras were set at 250 Hz. With a higher sampling frequency (500 Hz or 1000 Hz), we could expect to retrieve beat attacks more accurately, but the spatial capture range is significantly reduced, so that it is impossible to capture the whole body.
In order to retrieve beat impacts, markers have also been placed on the drumsticks. The smaller timpani (23") has been used to emphasize stick rebounds.

Figure 4: A subject performing the capturing protocol. The number of markers and their positions follow Vicon's Plug-in Gait indications.

Three performers (cf. Figure 4) were asked to perform our timpani-dedicated capturing protocol, yielding our timpani gestures database. Table 1 proposes a summary of the playing characteristics of each subject that performed our capturing protocol. The differences between performers namely lie in their degree of expertise (Professor or Master student), the grip strategy that is used (French or German), their dominant (Left or Right) hand, and their gender.

Table 1: Timpani gestures data.

Subject  Expertise     Grip  Handedness  Gender
S1       Professor     F     Right       M
S2       Master stud.  G     Left        M
S3       Master stud.  G     Right       F

Each performer has been asked to perform a single stroke roll of each gesture variation (legato, tenuto, accent, vertical accent and staccato) presented in Section 3.1. And for each of these gestures, the performer has been asked to change the location of the beat impact according to Figure 2 (right side). Finally, our database is composed of fifteen examples of timpani playing variations for each subject, and to each example correspond five beats per hand. This database will be used when studying in detail the variations of the timpani gesture.
The use of the widespread analysis tools integrated in the Vicon software allows for the representation of temporal sequences as Cartesian or angular trajectories (position, velocity, acceleration), but one can easily observe that such a representation isn't sufficient to finely represent the subtlety of gesture dynamics, and cannot be easily interpreted by performers. In the instrumental gesture context, we are mainly interested in also displaying characteristics such as contact forces, vibration patterns, and a higher-level interpretation of captured data (space occupation, 3D trajectories, orientation of segments).

Our visualization framework proposes the design of a virtual instrumental scene, involving the physical modeling and animation of both virtual characters and instruments. Timpani gestures are taken from the database and physically synthesized, making available both kinematic and dynamic cues about the original motion.

4.1 Virtual instrumental scene

A virtual instrumental scene is designed using both graphics and physics layers. The OpenGL graphics API is used for rendering the virtual character, the timpani model, and the motion cues of these entities. It also allows users to explore the virtual instrumental space and to visualize the scene from different points of view.
The ODE physics API [22] is used for the physical simulation of the virtual character and of collisions.

Figure 5: Real-time visualization of segments' orientations.

These graphics and physics layers build the primary visualization framework. It is possible to enrich this visualization with both meaningful kinematic and dynamic motion cues since the overall gesture is available.

³ Vicon:
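With markers on the drumsticks and cameras at 250 Hz, beat impacts can be approximated from the tip trajectory alone, for instance as below-threshold local minima of the tip height. This is only an illustrative sketch of such a detector, not the processing actually performed by the Vicon software:

```python
def impact_frames(z, threshold):
    """Indices of frames where the stick-tip height z reaches a local
    minimum below 'threshold' - a rough proxy for membrane contact.

    z: one height sample per frame (250 Hz capture rate assumed).
    """
    hits = []
    for i in range(1, len(z) - 1):
        if z[i] < threshold and z[i] <= z[i - 1] and z[i] < z[i + 1]:
            hits.append(i)
    return hits
```

At 250 Hz, frame index i corresponds to time i / 250.0 s; this 4 ms resolution is why higher sampling rates would locate attacks more precisely, at the cost of capture range.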

4.2 Kinematic cues

Kinematic motion cues can be of different types. Firstly, positions and orientations of any joint and segment composing the virtual character can be visualized in real time (Figure 5) by the rendering of coordinate references.
Temporal trajectories describing the motion can also be traced (Figure 6). These include position, velocity, acceleration and curvature trajectories, as well as plane projections, and position/velocity and velocity/acceleration phase plots of segments and joints.

Figure 6: Example of kinematic trajectory plot. Tip of the drumstick: position/velocity phase along the Z axis.

Figure 6 shows an example of such plots; the trajectory represents the position/velocity phase (projected on the Z axis) of the drumstick.
Although temporal trajectories (Figure 6) convey helpful information about the motion, they cannot for the moment be visualized at the same time as our virtual instrumental scene rendering. We propose the real-time visualization of 3D trajectories and their corresponding bounding boxes (Figure 7). This helps in identifying the gestural space actually used during the performance.

Figure 7: Real-time rendering of 3D trajectory and bounding box - drumstick tip trajectories help in identifying the gesture space that is actually used.

In addition to these kinematic cues, we offer the visualization of dynamic characteristics of percussion gestures by physically modeling, simulating and controlling a virtual character.

4.3 Dynamic cues

The aim of visualizing a gesture's dynamic profile is to facilitate the visualization of the interaction between the virtual character and the percussion model. Interaction information is available thanks to the physical modeling and simulation of instrumental gestures.

4.3.1 Virtual character modeling and simulation

The dynamic simulation of instrumental gestures has been achieved by firstly proposing a dynamic model of a virtual character, and secondly by putting this physical model into motion through a simulation framework.
The virtual character is modeled both by its anthropometry and by its physical representation. The anthropometry directly comes from motion capture. The physical representation of the virtual character is composed of segments (members) articulated by joints, each represented by its physical parameters (mass, volume, degrees of freedom).
The simulation framework is composed of two modules. The first one is the simulation of the motion equations. Equations 1 and 2 describe the evolution of a solid S of mass m. The acceleration of a point M of the solid S is aM, and FM is the resulting force applied on S at point M. The inertia matrix of S expressed at the point M is IM, while ΩS represents the angular velocity of S. Finally, τM is the resulting torque applied on S at the point M.

m aM = FM    (1)

IM Ω̇S + ΩS × (IM ΩS) = τM    (2)

Once the joints and members of the virtual character can be simulated by the emulation of the motion equations, we offer a way to physically control the virtual character with motion capture data thanks to a Proportional-Integral-Derivative (PID) process (Figure 8).
The PID process translates the motion capture trajectories into forces and torques. Knowing the angular targets from motion capture, ΘT and Θ̇T, and the angular state of the virtual character, ΘS and Θ̇S, the PID computes the torque τ to be applied. Kp, Ki and Kd are coefficients to be tuned. This process ends the simulation framework and makes the virtual character able to dynamically replay instrumental timpani sessions.
The interactions between the virtual character, the percussion model and the sound are then discussed. They are achieved by taking advantage of the dynamic characteristics that are available thanks to our virtual character dynamic model.
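For a single hinge joint, Equations 1-2 and the PID scheme of Figure 8 reduce to a torque law plus an integration step. The sketch below uses semi-implicit Euler integration and scalar gains of our own choosing; it illustrates the scheme, and is not the authors' implementation:

```python
def pid_torque(theta_t, dtheta_t, theta_s, dtheta_s, integ, kp, ki, kd, dt):
    """Torque driving the joint state (theta_s, dtheta_s) toward the
    motion-capture target (theta_t, dtheta_t); 'integ' carries the
    accumulated integral term between calls."""
    err = theta_t - theta_s
    integ += err * dt
    tau = kp * err + ki * integ + kd * (dtheta_t - dtheta_s)
    return tau, integ

def step(theta, dtheta, tau, inertia, dt):
    """One semi-implicit Euler step of the joint's motion equation,
    the one-degree-of-freedom analogue of Equations 1 and 2:
    inertia * ddtheta = tau."""
    dtheta += (tau / inertia) * dt
    theta += dtheta * dt
    return theta, dtheta
```

Looping these two calls over the motion-capture frames makes the simulated joint track the recorded angular targets, which is the role of the PID module in the framework.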

Figure 8: PID process. From motion capture data targets (angles ΘT and angular velocities Θ̇T), the joints' current state (angles ΘS and angular velocities Θ̇S) and coefficients (Kp, Ki and Kd) to be tuned, torques τ are processed to physically control the virtual character.

4.3.2 Interaction
In order to account for the interaction between the virtual character's sticks and the timpani model, we suggest rendering a propagating wave on the membrane of the timpani when a beat impact occurs. Although the rendering of such a wave isn't the theoretical solution of the wave equation, this model can take into account the biomechanical properties of the limbs and the properties of the sticks. Once the collision system detects an impact, kinematic and dynamic features - such as the velocity and the impact force - can be extracted. These features instantiate the attributes of the propagation of the wave, making possible the visualization of the position and the intensity of the impact (Figure 9).

Figure 9: Dynamic cues about beat impact: visualization of the location and magnitude of the attack by the propagation of a wave.

Once the kinematic and dynamic features of motion and physical interactions are obtained, we can set up strategies of sound production. In this paper, we limit ourselves to the triggering of pre-recorded sounds available from the motion capture sessions. These sounds are played when the impacts of the virtual character's sticks are detected on the membrane of the timpani model.
One can notice that the time when the sound is played doesn't depend on motion capture data, but on the physical simulation and interaction between the virtual performer and the percussion model. This provides an extensive way of designing new gesture-sound interactions based on both kinematic and dynamic gesture features.

We have presented in this paper a new interface for visualizing instrumental gestures, based on the animation of a virtual expressive humanoid. This interface facilitates the 3D rendering of virtual instrumental scenes, composed of a virtual character interacting with instruments, as well as the visualization of both kinematic and dynamic cues of the gesture. Our approach is based on the use of motion capture data to control a dynamic character, thus making possible a detailed analysis of the gesture, and the control of the dynamic interaction between the entities of the scene. It therefore becomes possible to enhance the visualization of the hitting gesture by showing the effects of the attack force on the membrane. Furthermore, the simulation of movement, including preparatory and interaction movement, provides a means of creating new instrumental gestures, associated with an adapted sound-production process.
In the near future, we expect to enrich the analysis of gesture by extracting relevant features from the captured motion, such as invariant patterns. We will also introduce an expressive control of the virtual character from a reduced specification of the percussion gestures. Finally, we are currently implementing the connection of our simulation framework to well-known physical-modeling sound-synthesis tools such as IRCAM's Modalys [10] to enrich the interaction possibilities of this framework. A similar strategy to existing frameworks, such as DIMPLE [21], using Open Sound Control [25] messages generated by the simulation engine, is being considered.

6. ACKNOWLEDGMENTS
The authors would like to thank the people who have contributed to this work, including Prof. Fabrice Marandola (McGill), Nicolas Courty (VALORIA), Erwin Schoonderwaldt (KTH), Steve Sinclair (IDMIL), as well as the timpani performers. This work is partially funded by the Natural Sciences and Engineering Research Council of Canada (Discovery and Special Research Opportunity grants), and the Pôle de Compétitivité Bretagne Images & Réseaux.

7. REFERENCES
[1] R. Boie, M. Mathews, and A. Schloss. The Radio Drum as a Synthesizer Controller. In Proc. of the 1989 International Computer Music Conference (ICMC89), pages 42–45, 1989.
[2] R. Bresin and S. Dahl. Experiments on gesture: walking, running and hitting. In Rocchesso & Fontana (Eds.): The Sounding Object, pages 111–136, 2003.
[3] D. Buchla. Lightning II MIDI Controller. Buchla and Associates' Homepage.
[4] A. Camurri, B. Mazzarino, M. Ricchetti, R. Timmers, and G. Volpe. Multimodal analysis of expressive gesture in music and dance performances. In A. Camurri, G. Volpe (Eds.): Gesture-Based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag, pages 20–39, 2004.
[5] K. Chuchacz, S. O'Modhrain, and R. Woods. Physical Models and Musical Controllers: Designing a Novel Electronic Percussion Instrument. In Proc. of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 37–40, 2007.
[6] G. Cook. Teaching Percussion. Schirmer Books, 1997. Second edition.
[7] S. Dahl. Spectral Changes in the Tom-Tom Related to the Striking Force. Speech, Music and Hearing Quarterly Progress and Status Report, KTH, Dept. of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden, 1997.
[8] S. Dahl. Playing the Accent: Comparing Striking Velocity and Timing in Ostinato Rhythm Performed by Four Drummers. Acta Acustica united with Acustica, 90(4):762–776, 2004.
[9] C. Dodge and T. A. Jerse. Computer Music: Synthesis, Composition and Performance. Schirmer - Thomson Learning, 1997. Second edition.
[10] N. Ellis, J. Bensoam, and R. Causse. Modalys Demonstration. In Proc. of the 2005 International Computer Music Conference (ICMC05), pages 101–102, 2005.
[11] R. Hanninen, L. Savioja, and T. Takala. Virtual concert performance - synthetic animated musicians playing in an acoustically simulated room. In Proc. of the 1996 International Computer Music Conference (ICMC96), pages 402–404, 1996.
[12] K. Havel and M. Desainte-Catherine. Modeling an Air Percussion for Composition and Performance. In Proc. of the 2004 International Conference on New Interfaces for Musical Expression (NIME04), pages 31–34, 2004.
[13] R. Jones and A. Schloss. Controlling a physical model with a 2D Force Matrix. In Proc. of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 27–30, 2007.
[14] A. Kapur, G. Essl, P. Davidson, and P. Cook. The Electronic Tabla Controller. Journal of New Music Research, 32(4):351–360, 2003.
[15] M. Peinado, B. Heberlin, M. M. Wanderley, B. Le Callennec, R. Boulic, and D. Thalmann. Towards Configurable Motion Capture with Prioritized Inverse Kinematics. In Proc. of the Third International Workshop on Virtual Rehabilitation, pages 85–96.
[16] T. Mäki-Patola, P. Hämäläinen, and A. Kanerva. The Augmented Djembe Drum - Sculpting Rhythms. In Proc. of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 364–369, 2006.
[17] M. Marshall, M. Rath, and B. Moynihan. The Virtual Bodhran - The Vodhran. In Proc. of the 2002 International Conference on New Interfaces for Musical Expression (NIME02), pages 153–159, 2002.
[18] F. W. Noak. Timpani Sticks. Percussion Anthology. The Instrumentalist, 1984. Third edition.
[19] G. B. Peters. Un-contestable Advice for Timpani and Marimba Players. Percussion Anthology. The Instrumentalist, 1984. Third edition.
[20] G. Rule. Keyboard Reports: Korg Wavedrum. Keyboard, 21(3):72–78, 1995.
[21] S. Sinclair and M. M. Wanderley. Extending DIMPLE: A Rigid Body Simulator for Interactive Control of Sound. In Proc. of the ENACTIVE'07 Conference, pages 263–266, 2007.
[22] R. Smith. Open Dynamics Engine.
[23] R. Taylor, D. Torres, and P. Boulanger. Using Music to Interact with a Virtual Character. In Proc. of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), pages 220–223, 2005.
[24] A. Tindale, A. Kapur, G. Tzanetakis, P. Driessen, and A. Schloss. A Comparison of Sensor Strategies for Capturing Percussive Gestures. In Proc. of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), pages 200–203, 2005.
[25] M. Wright, A. Freed, and A. Momeni. Open Sound Control: The State of the Art. In Proc. of the 2003 International Conference on New Interfaces for Musical Expression (NIME03), pages 153–159, 2003.


Classification of Common Violin Bowing Techniques Using Gesture Data from a Playable Measurement System

Diana Young
MIT Media Laboratory
20 Ames Street
Cambridge, MA, USA

ABSTRACT
This paper presents the results of a recent study of common violin bowing techniques using a newly designed measurement system. This measurement system comprises force, inertial, and position sensors installed on a carbon fiber violin bow and electric violin, and enables recording of real player bowing gesture under normal playing conditions. Using this system, performances of six different common bowing techniques (accented détaché, détaché lancé, louré, martelé, staccato, and spiccato) by each of eight violinists were recorded. Using a subset of the gesture data collected, the task of classifying these data by bowing technique was undertaken. Toward this goal, singular value decomposition (SVD) was used to compute the principal components of the data set, and then a k-nearest-neighbor (k-NN) classifier was employed, using the principal components as inputs. The results of this analysis are presented below.

Keywords
bowing, gesture, playing technique, principal component analysis, classification

1. INTRODUCTION
Physical bowing technique is a topic of keen interest in research communities, due to the complexity of the bow-string interaction and the expressive potential of bowing gesture. Areas of interest include virtual instrument development [18], interactive performance [17, 2, 13, 8], and pedagogy [7]. For many applications, reliable recognition of the individual bowing techniques that comprise right-hand bowing technique would be a great benefit.
Prior art on the classification of violin bowing technique in particular includes the CyberViolin project [9]. In this work, features are extracted from position data produced by an electromagnetic motion tracking system. A decision tree takes these features as inputs in order to classify up to seven different bowing techniques in realtime.
Recently, the task of classifying individual violin bowing techniques was undertaken using gesture data from the Augmented Violin, another playable sensing system [11]. In this work, three bowing techniques (détaché, martelé, and spiccato) were classified using minimum and maximum bow acceleration in one dimension as inputs to a k-nearest-neighbor (k-NN) algorithm.
In the study presented in this paper, a similar approach was taken to classify violin bowing techniques. However, here, the analysis incorporated a greater diversity of gesture data, i.e., more data channels, to classify six different bowing techniques. Also, although a k-NN classifier was again used, in contrast to the research described above, the inputs to this classifier were determined by a dimensionality reduction technique using all of the gesture data. That is, the data reduction technique itself determines the most salient features of the data.
The data for this experiment were captured using a new measurement system for violin bowing [16]. Based on the earlier Hyperbow designs [15], this system includes force (downward and lateral bow force), inertial (3D acceleration and 3D angular velocity), and position sensors installed on a carbon fiber violin bow and electric violin, and enables recording of real player bowing gesture under normal playing conditions.

The primary goal of the bowing technique study was to investigate the potential of using the new bowing measurement system described above to capture the distinctions between common bowing techniques. In this study, the gesture and audio data generated by eight violinists performing six different bowing techniques on each of the four violin strings were recorded for later analysis. The details of the study protocol, experimental setup, and participants are discussed below.
2.1 Study Protocol
In this study each of the eight participants was asked to
perform repetitions of a specific bowing technique originat-
ing from the Western “classical” music tradition. To help
communicate the kind of bowstroke desired, a musical excerpt (from a work of the standard violin repertoire) featuring each bowing technique was provided from [1]. In addition, an audio example of the bowing technique for each of the four requested pitches was provided to the player. The bowing technique was notated clearly on a score, specifying the pitch and string, tempo, as well as any relevant articulation markings, for each set of the recordings.

Two different tempi were taken for each of the bowing
techniques (on each pitch). First, trials were conducted us-
ing a characteristic tempo for each individual bowing tech-
nique. Immediately following these, trials were conducted
using one common tempo. Though the target trials were
actually those that were conducted with the same tempo
across all of the bowing techniques, it was found early on
that requesting performances using the characteristic tempo
first enabled the players to perform at the common tempo
with greater ease.
Both tempi required for each bowing technique were pro-
vided by a metronome. In some cases, a dynamics marking
was written in the musical example, but the participants
were instructed to perform all of the bowstrokes at a dy-
namic level of mezzo forte. Participants were instructed to
take as much time as they required to either play through
the musical example and/or practice the technique before
the start of the recordings to ensure that the performances would be as consistent as possible.
Three performances of each bowing technique, comprising one trial, were requested on each of the four pitches (one on each string). During the first preliminary set of recording sessions, which were conducted in order to refine the experimental procedure, participants were asked to perform these bowing techniques on the open strings. The rationale for this instruction was that the current measurement system does not capture any information concerning the left-hand gestures. It was observed, however, that players do not play as comfortably and naturally on open strings as when they finger pitches with the left hand. Therefore, in the subsequent recording sessions that comprise the actual technique study, the participants were asked to perform the bowing techniques on the fingered fourth interval above the open string pitch, with no vibrato.
The bowing techniques that comprised this study were accented détaché, détaché lancé, louré, martelé, staccato, and spiccato. Brief descriptions of these techniques may be found in the Appendix.

Figure 1: This figure describes the experimental setup used in the recording sessions for the bowing technique study. The top half of the figure shows the interface for the Pd recording patch, and the lower half shows the individual elements of the setup. From left to right, they are the custom violin bowing measurement system installed on a Yamaha SV-200 Silent Violin and a CodaBow® Conservatory violin bow; headphones; M-Audio Fast Track USB audio interface; and an Apple MacBook with a 2 GHz Intel Core Duo processor (OS X).

2.2 Experimental Setup
In each trial of the bowing technique study, the physical gesture data were recorded simultaneously with the audio data produced in the performances of each technique. The experimental setup, depicted in Figure 1, was simple: the custom violin bowing measurement system installed on a CodaBow® Conservatory violin bow [3] and the Yamaha SV-200 Silent Violin [14]; headphones (through which the participants heard all pre-recorded test stimuli and the real-time sound of the test violin); an M-Audio Fast Track USB audio interface [4]; and an Apple MacBook with a 2 GHz Intel Core Duo processor (OS X) running PureData (Pd) version 0.40.0-test08 [10].
The audio and the gesture data were recorded to file by means of a Pd patch (shown in Figure 1), which encoded the gesture data as multi-channel audio in order to properly "sync" all of the data together. Each file was recorded with a trial number, repetition number, and time and date stamp. The Pure Data (Pd) patch also allowed for easy playback of recorded files used as test stimuli.
The recordings took place in the Centre for Interdisciplinary Research in Music Media and Technology (CIRMMT) of McGill University. Care was taken to create as quiet and natural a playing environment as possible.
The participants for the bowing technique study included eight violin students from the Schulich School of Music of McGill University, five of whom had taken part in the preliminary testing sessions and who therefore already had experience with the measurement system and the test recording setup. The participants were recruited by means of an email invitation and "word of mouth", and they were each compensated $15 CAD to take part in the study. All of the players were violin performance majors and had at least one year of conservatory-level training. They were also of the same approximate age.¹

¹These studies received approval from the MIT Committee on the Use of Humans as Experimental Subjects (COUHES) [5].

3. TECHNIQUE STUDY EVALUATION
The main goal of the technique study was to determine whether the gesture data provided by the measurement system would be sufficient to recognize the six different bowing techniques (accented détaché, détaché lancé, louré, martelé, staccato, and spiccato) played by the eight violinists. To begin these classification explorations, only a subset of the gesture data provided by the measurement system was considered for the evaluations. Included in the analyses were data from the eight bow gesture sensors only: the downward and lateral forces; x, y, z acceleration; and angular velocity about the x, y, and z axes.
In order to answer these questions, a simple supervised classification algorithm was used. The k-nearest-neighbor


(k-NN) algorithm was chosen because it is simple and robust for well-conditioned data. Because each data point in the time series was included, the dimensionality of the gesture data vector, 9152 (1144 samples in each time series x 8 gesture channels), was very high. Therefore, the dimensionality of the gesture data set was first reduced before being input to the classifier.

3.1 Computing the Principal Components
Principal component analysis (PCA) is a common technique used to reduce the dimensionality of data [12]. PCA is a linear transform that maps the data set into a new coordinate system such that the variance of the data vectors is maximized along the first coordinate dimension (known as the first principal component). That is, most of the variance is represented, or "explained", by this dimension. Similarly, the second greatest variance is along the second coordinate dimension (the second principal component), the third greatest variance is along the third coordinate dimension (the third principal component), et cetera. Because the variance of the data decreases with increasing coordinate dimension, higher components may be disregarded for similar data vectors, thus resulting in decreased dimensionality of the data set.

Figure 2: Scatter plot of all six bowing techniques for player 1 (of 8). Accented détaché (square), détaché lancé (triangle), louré (pentagon), martelé (circle), staccato (star), spiccato (diamond). The axes correspond to the first three principal components.
of the data set.
In order to reduce the dimensionality of the bowing gesture data in this study, the data were assembled into a matrix and the principal components were computed using the efficient singular value decomposition (SVD) algorithm. For this bowing technique study, there were 576 (8 players x 6 techniques x 4 strings x 3 performances of each) recorded examples produced by the participants, and for each example, 8 channels of the bow gesture data were used. These data were used to form a 576 x 9152 matrix M, which was input to the SVD in order to enable the following analyses.

Before continuing with the classification step, it was informative to illustrate the separability of bowing techniques produced by the individual players. From the matrix M, a smaller matrix composed of those 72 rows corresponding to each violinist (6 techniques x 4 strings x 3 performances of each) was taken and then decomposed using the SVD algorithm to produce the principal components of each individual player’s bowing data. A scatter plot was then produced for each player’s data, showing the first three principal components corresponding to each bowing technique. Two of these plots are shown in Figures 2 and 3. As can be seen in these examples, clear separability of bowing techniques for individual players was demonstrated using only three dimensions.

Figure 3: Scatter plot of all six bowing techniques for player 5 (of 8). Accented détaché (square), détaché lancé (triangle), louré (pentagon), martelé (circle), staccato (star), spiccato (diamond). The axes correspond to the first three principal components.
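The PCA-via-SVD step described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the authors' code: the matrix below is a small random stand-in for the 576 x 9152 gesture matrix M, with a reduced column count so the sketch runs quickly.

```python
import numpy as np

# Stand-in for the 576 x 9152 gesture matrix M
# (rows = recorded examples, columns = flattened time series).
rng = np.random.default_rng(0)
M = rng.normal(size=(576, 40))  # small column count for the sketch

# Center the data, then take the SVD: the rows of Vt are the
# principal components, ordered by explained variance.
Mc = M - M.mean(axis=0)
U, s, Vt = np.linalg.svd(Mc, full_matrices=False)

k = 3  # keep the first three components, as in Figures 2 and 3
scores = Mc @ Vt[:k].T          # low-dimensional coordinates of each example
explained = (s ** 2) / np.sum(s ** 2)  # fraction of variance per component
```

Projecting held-out data into the same eigenspace then amounts to centering it with the training mean and multiplying by the same `Vt[:k].T`.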

3.2 k-NN Classification
After computing the principal components produced by the SVD method, the challenge of classifying the data was undertaken using the full data matrix (including all players’ data together). Toward this goal, a k-nearest-neighbor classifier was used. Specifically, Nabney’s Matlab implementation [6] was employed. In this case, a subset of the data contributed by all of the players was used to train the k-NN algorithm in order to classify the remaining data from all of the players by technique.

In each case, the principal components of the training data set were first computed using the SVD method. The remaining data (to be classified) were then projected into the eigenspace determined by this exercise. Some number of the principal components corresponding to the training data were input to the k-NN algorithm, enabling the remaining data to be classified according to technique. For each case, a three-fold cross-validation procedure was followed: the process was repeated as the training data (and the data to be classified) were rotated. The final classification rate estimates were taken as the mean and standard deviation of the classification rates of the cross-validation trials.

The effect of the number of principal components on the overall classification rate is clearly illustrated by Figure 4. As seen in Table 1, using 7 principal components enables correct classification of the 6 bowing techniques for 95.3 ± 2.6% of the remaining data.
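A minimal version of this classify-and-rotate procedure might look like the following. This is a plain-NumPy sketch rather than Nabney's Netlab implementation, and the function names and the synthetic data are illustrative only.

```python
import numpy as np

def knn_predict(train_x, train_y, test_x, k=5):
    # Classify each test point by majority vote among its k nearest
    # training points (Euclidean distance).
    d = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(v).argmax() for v in votes])

def cross_validate(x, y, folds=3, k=5):
    # Three-fold cross-validation: rotate which third is held out,
    # then report mean and standard deviation of the fold accuracies.
    idx = np.arange(len(x))
    rates = []
    for f in range(folds):
        test = idx % folds == f
        pred = knn_predict(x[~test], y[~test], x[test], k)
        rates.append(np.mean(pred == y[test]))
    return np.mean(rates), np.std(rates)

# Two well-separated synthetic classes stand in for the gesture data.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.1, (30, 3)), rng.normal(3.0, 0.1, (30, 3))])
y = np.array([0] * 30 + [1] * 30)
mean_rate, std_rate = cross_validate(x, y)
```

In the study itself, `x` would hold the projections onto the leading principal components rather than raw coordinates.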


actual \ class.   acc. det.   det. lancé   louré    martelé   staccato   spiccato
acc. det.           0.938       0.010      0.010     0.042      0.000      0.000
det. lancé          0.000       0.917      0.000     0.010      0.021      0.052
louré               0.000       0.000      0.979     0.000      0.021      0.000
martelé             0.042       0.021      0.000     0.938      0.000      0.000
staccato            0.000       0.010      0.010     0.000      0.979      0.000
spiccato            0.000       0.031      0.000     0.000      0.000      0.969

Table 1: Training on two-thirds of the data from each of the eight players, predicting the remaining third of each player’s data (with overall prediction of 95.3 ± 2.6%) with seven principal components. Rows give the actual technique; columns give the classified technique.
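A row-normalized confusion matrix of this kind is straightforward to compute. The sketch below is a hypothetical helper (not from the paper); it assumes every class occurs at least once among the actual labels, so each row can be normalized to sum to one.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes):
    # Rows = actual class, columns = predicted class; each row is
    # normalized so it sums to one, as in Table 1.
    cm = np.zeros((n_classes, n_classes))
    for a, p in zip(actual, predicted):
        cm[a, p] += 1
    return cm / cm.sum(axis=1, keepdims=True)

# Tiny illustration with two classes.
cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], 2)
```

The diagonal then holds the per-technique classification rates, and off-diagonal entries expose systematic confusions such as accented détaché vs. martelé.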


Figure 4: Mean prediction rates produced by k-NN using two-thirds of the data from each of the eight players to predict the remaining one-third of all player data, with the number of principal components increasing from one to ten.

Figure 5: Scatter plot of all six bowing techniques for player 6. Accented détaché (square), détaché lancé (triangle), louré (pentagon), martelé (circle), staccato (star), spiccato (diamond). The axes correspond to the first three principal components. As can be seen here, the détaché lancé and spiccato techniques are not separable in three dimensions.

4. DISCUSSION
The results of this bowing technique study are encouraging. Using a relatively small number of principal components, the k-NN classification yielded over 95% average classification of the six bowing techniques produced by the eight participants. Some of the error of this result can be understood from Table 1. This confusion matrix shows that accented détaché is most often mis-classified as martelé, which is not surprising as these two techniques are somewhat similar in execution. Interestingly, there was considerable error from mis-classifying détaché lancé as spiccato. Although these two techniques are quite different from each other, Figure 5 implies they were confused by one of the participants. This discrepancy alone explains much of the error in classifying these two techniques.

Of course, there is much to be done to build on the work begun here. The analysis described here involved the classification of six different bowing techniques in which each trial was actually comprised of a repetition of one of these techniques. An immediate next step is to analyze the same data set using individual bowstrokes. Also, only a subset of the gesture channels captured by the bowing measurement system was used for this study. For future studies that may include more techniques and players, the benefit of the remaining channels should be explored.

The SVD and k-NN algorithms were chosen for this experiment partly for ease of implementation. Other techniques, however, should be evaluated in pursuit of robustness and higher classification rates.

Finally, more rigorous classification of bowing techniques should include qualitative listening evaluations of the bowing audio to complement the quantitative evaluation of the bowing gesture data.

5. ACKNOWLEDGMENTS
Special thanks to the violinists of the Schulich School of Music of McGill University for their time and participation; to André Roy, Ichiro Fujinaga, and Stephen McAdams for their help in organizing this study; to Roberto Aimi for his Pd expertise; and to Joseph Paradiso for discussion; and much gratitude to the violinists from the Royal Academy of Music who participated in the early pilot studies for this research.

6. APPENDIX
Descriptions, taken from [1], of the six bowing techniques featured in this study are included below.

• accented détaché A percussive attack, produced by great initial bow speed and pressure, characterizes this stroke. In contrast to the martelé, the accented détaché is basically a non-staccato articulation and can be performed at greater speeds than the martelé.

• détaché Comprises a family of bowstrokes, played on-the-string, which share in common a change of bowing direction with the articulation of each note. Détaché strokes may be sharply accentuated or unaccentuated, legato (only in the sense that no rest occurs between strokes), or very slightly staccato, with small rests separating strokes.

• détaché lancé “Darting” détaché. Characteristically, a short unaccented détaché bowstroke with some staccato separation of strokes.


• legato Bound together (literally, “tied”). Without interruption between the notes; smoothly connected, whether in one or several bowstrokes.

• louré A short series of gently pulsed, slurred, legato notes. Varying degrees of articulation may be employed. The legato connection between notes may not be disrupted at all, but minimal separation may be

• martelé Hammered; a sharply accentuated, staccato bowing. To produce the attack, pressure is applied an instant before bow motion begins. Martelé differs from accented détaché in that the latter has primarily no staccato separation between strokes and can be performed at faster speeds.

• staccato Used as a generic term, staccato means a non-legato martelé type of short bowstroke played with a stop. The effect is to shorten the written note value with an unwritten rest.

• spiccato A slow to moderate speed bouncing stroke. Every degree of crispness is possible in the spiccato, ranging from gently brushed to percussively dry.

7. REFERENCES
[1] J. Berman, B. G. Jackson, and K. Sarch. Dictionary of Bowing and Pizzicato Terms. Tichenor Publishing, Bloomington, Indiana, 4th edition, 1999.
[2] F. Bevilacqua, N. H. Rasamimanana, E. Fléty, S. Lemouton, and F. Baschet. The augmented violin project: research, composition and performance report. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME-06), Paris, 2006.
[3] CodaBow. Conservatory Violin Bow.
[4] M-Audio. Fast Track USB.
[5] MIT Committee on the Use of Humans as Experimental Subjects (COUHES).
[6] I. T. Nabney. Netlab neural network software.
[7] K. Ng, B. Ong, O. Larkin, and T. Koerselman. Technology-enhanced music learning and teaching: i-maestro framework and gesture support for the violin family. In Association for Technology in Music Instruction (ATMI) 2007 Conference, Salt Lake City, 2007.
[8] J. Paradiso and N. Gershenfeld. Musical applications of electric field sensing. Computer Music Journal, 21(3):69–89, 1997.
[9] C. Peiper, D. Warden, and G. Garnett. An interface for real-time classification of articulations produced by violin bowing. In Proceedings of the 2003 Conference on New Interfaces for Musical Expression (NIME-03), Montreal, 2003.
[10] M. Puckette. Pure Data (Pd).
[11] N. Rasamimanana, E. Fléty, and F. Bevilacqua. Gesture analysis of violin bow strokes. Lecture Notes in Computer Science, pages 145–155, 2006.
[12] G. Strang. Linear Algebra and Its Applications. Brooks Cole, Stamford, CT, 4th edition, 2005.
[13] D. Trueman and P. R. Cook. BoSSA: The deconstructed violin reconstructed. In Proceedings of the International Computer Music Conference, Beijing, 1999.
[14] Yamaha. SV-200 Silent Violin.
[15] D. Young. Wireless sensor system for measurement of violin bowing parameters. In Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, August 2003.
[16] D. Young. A Methodology for Investigation of Bowed String Performance Through Measurement of Violin Bowing Technique. PhD thesis, M.I.T., 2007.
[17] D. Young, P. Nunn, and A. Vassiliev. Composing for Hyperbow: A collaboration between MIT and the Royal Academy of Music. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME-06), Paris, 2006.
[18] D. Young and S. Serafin. Investigating the performance of a violin physical model: Recent real player studies. In Proceedings of the International Computer Music Conference, Copenhagen, 2007.


Slide guitar synthesizer with gestural control

Jyri Pakarinen, Vesa Välimäki, and Tapio Puputti
Department of Signal Processing and Acoustics
Helsinki University of Technology
P.O. Box 3000, FI-02015 TKK, Finland

ABSTRACT
This article discusses a virtual slide guitar instrument, recently introduced in [7]. The instrument consists of a novel physics-based synthesis model and a gestural user interface. The synthesis engine uses energy-compensated time-varying digital waveguides. The string algorithm also contains a parametric model for synthesizing the tube-string contact sounds. The real-time virtual slide guitar user interface employs optical gesture recognition, so that the user can play this virtual instrument simply by making slide guitar playing gestures in front of a camera.

Figure 1: The signal flow diagram of the slide guitar string synthesizer. The energy compensation block compensates for the artificial energy losses due to the time-varying delays. The contact sound generator (see Figure 2) simulates the handling noise due to the sliding tube-string contact.

Keywords
Sound synthesis, slide guitar, gesture control, physical modeling

1. INTRODUCTION
The term slide or bottleneck guitar refers to a specific traditional playing technique on a steel-string acoustic or electric guitar. When playing the slide guitar, the musician wears a slide tube on the fretting hand. Instead of pressing the strings against the fretboard, she or he glides the tube on the strings while the picking hand plucks the strings in a regular fashion. This produces a unique, voice-like tone with stepless pitch control. Although the tube is usually slid along all six strings, single-note melodies can be played by plucking just one string and damping the others with the picking hand. The slide tube, usually made of glass or metal, also generates a squeaking sound while moving along on the wound metal strings. In most cases, the slide guitar is tuned into an open tuning (for example the open G tuning: D2, G2, D3, G3, B3, and D4 starting from the thickest string). This allows the user to play simple chords just by sliding the tube into different positions on the guitar neck. The player usually wears the slide tube on the pinky or ring finger, and the other fingers are free to fret the strings normally.

A virtual slide guitar (VSG) [7, 4] is described in this paper. The VSG consists of an infra-red (IR) camera, an IR-reflecting slide tube and a ring, a computer running a physics-based string algorithm, and a loudspeaker. The VSG is played by wearing the slide tube on one hand and the ring on the other, and by making guitar-playing gestures in front of the camera. The user’s gestures are mapped into synthesis control parameters, and the resulting sound is played back through the loudspeaker in real time. More information on gestural control of music synthesis can be found e.g. in [8] and [16].

From the control point of view, the VSG can be seen as a successor of the virtual air guitar (VAG) [1] developed at Helsinki University of Technology a few years ago. The major difference between these gesture-controlled guitar synthesizers is that, like the real slide guitar, the VSG allows continuous control over the pitch, and also sonifies the contact sounds emanating from the sliding contact between the slide tube and the imaginary string.

The VSG uses digital waveguides [11, 12] for synthesizing the strings. A model-based contact sound generator is added for simulating the friction-based sounds created by the sliding tube-string contact. More information on physics-based sound synthesis methods can be found in [14].

2. SYNTHESIS ENGINE
A single-delay loop (SDL) digital waveguide (DWG) model [2] with time-varying pitch forms the basis of the slide guitar synthesis engine, as illustrated in Fig. 1. The string model consists of a feedback delay loop with an additional loop filter, an energy scaling coefficient, and a contact sound generator block. The fractional delay filter in Fig. 1 allows for a smooth transition between pitches, and also enables the correct tuning of the string. There are several techniques for implementing fractional delay filters, a thorough tutorial being found in [3]. For the purpose of this work, a fifth-order Lagrange interpolator was found to work sufficiently well. It must be noted that both the integer delay line length and the fractional delay filter are time-varying, i.e. the user controls the total loop delay value and thus also the pitch during run time.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008 Copyright remains with the author(s).
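As an illustration of the single-delay-loop idea, the sketch below implements a bare-bones plucked-string waveguide with an integer delay line and a one-pole lowpass loop filter. The fractional-delay Lagrange interpolator, the time-varying pitch, and the energy compensation of the actual VSG are omitted, and all parameter values here are illustrative, not taken from the paper.

```python
import numpy as np

def pluck_string(f0=196.0, fs=44100, dur=1.0, loss=0.996, a=0.5):
    # Integer loop delay in samples sets the (approximate) pitch.
    L = int(fs / f0)
    rng = np.random.default_rng(0)
    delay = rng.normal(size=L)    # noise burst excitation fills the loop
    out = np.empty(int(fs * dur))
    prev = 0.0
    for n in range(len(out)):
        x = delay[n % L]          # read the sample that left the loop L ago
        # One-pole lowpass loop filter simulating vibrational losses.
        prev = (1 - a) * x + a * prev
        delay[n % L] = loss * prev  # write the filtered sample back
        out[n] = x
    return out

y = pluck_string()
```

Sliding the pitch would require shortening or lengthening the loop delay at run time, which is exactly where the fractional delay filter and the energy compensation described above become necessary.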


Figure 2: The contact sound generator block. The sliding velocity controlled by the user commands the synthetic contact noise characteristics. The sub-blocks are (a) the noise pulse generator, (b) a resonator creating the first harmonic of the time-varying noise structure, (c) a static nonlinearity generating the upper time-varying harmonics, and (d) an IIR filter simulating the general spectral characteristics of the noise.

The loop filter is a one-pole lowpass filter that simulates the vibrational losses of the string. Different filter parameters are used depending on the length and type of the string, as suggested in [15]. Also, when changing the length of a DWG string during run time, the signal energy is varied [5]. In practice, this can be heard as an unnaturally quick decay of the string sound. A time-varying scaling technique, introduced in [5], was used as a compensation. This results in an additional scaling operation inside the waveguide loop, as illustrated in Fig. 1.

2.1 Contact Sound Synthesis
The handling sounds created by the sliding tube-string contact are very similar to the handling sounds of a sliding finger-string contact. A recent study [6] revealed that these squeaky sounds consist mainly of lowpass-type noise with both static and time-varying harmonic components. The lowpass cutoff frequency, the frequencies of the time-varying harmonics, and the overall magnitude of the contact noise are controlled by the sliding velocity.

For synthesizing the handling sounds, we chose a noise pulse train as the excitation signal. This is based on the assumption that when the tube slides over a single winding, it generates a short, exponentially decaying noise burst. The time interval between the noise pulses is controlled by the sliding velocity; a fast slide results in a temporally dense pulse train, while a slow slide makes the pulses appear further apart. In fact, the contact sound synthesizer can be seen as a periodic impact sound synthesis model rather than a friction model.

The general structure of the contact noise generator block is illustrated in Fig. 2. The input variable L(n) denotes the relative string length, controlled by the distance between the user’s hands. Variable n is the time index. Since the contact noise depends on the sliding velocity, a time difference is taken from the input signal. If the control rate of the signal L(n) is different from the sound synthesis sampling rate, as is often the case, a separate smoothing block is required after the differentiator. The smoothing block changes the sampling rate of L(n) to be equal to the sound synthesis sampling rate and uses polynomial interpolation to smooth the control signal. Furthermore, since the contact noise is independent of the direction of the slide (up / down on the string), the absolute value of the control signal is taken. The scaling coefficient nw denotes the number of windings on the string. The signal fc after this scaling can therefore be seen as the noise pulse firing rate.

The basis of the synthetic contact sound for wound strings is produced in the noise pulse train generator (Fig. 2, block (a)). It outputs exponentially decaying noise pulses at the given firing rate. In addition, the type of the string determines the decay time and duration of an individual pulse. For enhancing the harmonic structure of the contact noise on wound strings, the lowest time-varying harmonic is emphasized by filtering the noise pulse train with a second-order resonator (block (b)), where the firing rate controls the resonator’s center frequency. The higher harmonics are produced by distorting the resonator’s output with a suitable nonlinear waveshaper (block (c)). A scaled hyperbolic tangent function is used for this. Hence, the number of higher harmonics can be controlled by changing the scaling of this nonlinear function.

A 4th-order IIR filter (block (d)) is used for simulating the static longitudinal string modes and the general spectral shape of the contact noise. As the noise characteristics depend on the tube material and string type, different filter parameters are used for different slide tube and string configurations. In Fig. 2, the scaling coefficient gbal controls the ratio between the time-varying and static contact sound components. Finally, the total amplitude of the synthetic contact noise is controlled by the slide velocity fc(n), via a scaling coefficient gTV. Parameter guser allows the user to control the overall volume of the contact sound. For plain, i.e. unwound, strings, the contact sound synthesis block is simplified by replacing the noise burst generator (block (a) in Fig. 2) with a white noise generator, and by omitting blocks (b), (c), and (d).

3. USER INTERFACE
Since the user controls the pitch of the VSG in a continuous manner, it is important that there is not a large latency between the user’s action and the resulting sound. Thus, a high frame rate (120 fps) infra-red (IR) camera is used for detecting the user’s hand locations. The camera operates by lighting the target with IR LEDs and sensing the reflected IR light. A real slide tube coated with IR-reflecting fabric is used for detecting the user’s fretting hand. For recognition of the picking hand, a small ring of IR-reflecting fabric is worn on the index finger.

3.1 Technical Details
The implementation works on a 2.66 GHz Intel Pentium 4 CPU with 1 GB of RAM and a SoundMax Integrated Digital Audio soundcard. Both the sound synthesis part and the camera interface operate in the Windows XP environment. The sound synthesis uses PD (Pure Data) [9] version 0.38.4-extended-RC8. The sampling frequency for the synthesis algorithm is 44.1 kHz, except for the string waveguide loop, which runs at 22.05 kHz, as suggested in [13]. A Naturalpoint TrackIR4:PRO USB IR camera is used for gesture recognition. Its output is a 355 x 290 binary matrix, where the reflected areas are seen as blobs. As a side note, a recent article describing a PD patch for multichannel guitar effects processing can be found in [10].

3.2 Camera API
For the camera API (Application Programming Interface), Naturalpoint’s OptiTrack SDK version 1.0.030 was used. The API was modified in the Visual Studio environment to include gesture-recognition features. The added features consist of the distinction between the two blobs (i.e. slide and plucking hand), calculation of the distance between them, recognition of the plucking and pull-off gestures, and transmission of the control data to PD as OSC (Open Sound Control) messages. Also, an algorithm was


added to keep track of the virtual string location, i.e. an imaginary line representing the virtual string. This is very similar to the work presented in [1]. The line is drawn through the tube and the averaged location of the plucking hand, so that the virtual string slowly follows the player’s movements. This prevents the user from drifting away from the virtual string. The API detects the direction of the plucking hand movement, and when the virtual string is crossed, a pluck event and a direction parameter are sent. Also, a minimum velocity limit is defined for the plucking gesture in order to avoid false plucks.

3.3 PD Implementation
When the PD implementation receives an OSC message containing a pluck event, an excitation signal is inserted into each waveguide string. The excitation signal is a short noise burst simulating a string pluck. There is also a slight delay (20 ms) between different string excitations for creating a more realistic strumming feel. The order in which the strings are plucked depends on the plucking direction. Figure 3 illustrates the structure and signaling of the PD patch.

Figure 3: Structure and signaling of the PD patch.

The camera software can be set to show the blob positions on screen in real time. This is not required for playing, but it helps the user to stay in the camera’s view. The camera API uses roughly 10% of CPU power without the display and 20-40% with the display turned on. Since PD uses up to 80% of CPU power when playing all six strings, the current VSG implementation can run all six strings in real time without a noticeable drop in performance, provided that the blob tracking display is turned off. By selecting fewer strings, switching the contact sound synthesis off, or dropping the API frame rate to half, the display can be viewed while playing.

3.4 Virtual Slide Guitar
The virtual slide guitar system is illustrated in Fig. 4. The camera API recognizes the playing gestures and sends the plucking and pull-off events, as well as the distance between the hands, to the synthesis control block in PD. The synthesis block consists of the DWG models illustrated in Fig. 1. At its simplest, the VSG is easy to play and needs no calibration. The user simply puts the slide tube and reflecting ring on and starts to play. For more demanding users, the VSG provides extra options, such as altering the tuning of the instrument, selecting the slide tube material, setting the contact sound volume and balance between static and dynamic components, or selecting an output effect (a reverb or a guitar amplifier plugin).

Figure 4: The complete components of the virtual slide guitar.

The tube-string contact sound gives the user direct feedback of the slide tube movement, while the pitch of the string serves as a cue for the tube position. Thus, visual feedback is not needed in order to know where the slide tube is situated on the imaginary guitar neck.

4. CONCLUSIONS
This paper discussed a real-time virtual slide guitar synthesizer with camera-based gestural control. Time-varying digital waveguides with energy compensation are used for simulating the string vibration. The contact noise between the strings and the slide tube is generated with a parametric model. The contact sound synthesizer consists of a noise pulse generator, whose output is fed into a time-varying resonator and a distorting nonlinearity. By controlling the noise pulse firing rate, the resonator’s center frequency, and the overall dynamics with the sliding velocity, a realistic time-varying harmonic structure is obtained in the resulting synthetic noise. The overall spectral shape of the contact noise is set with a 4th-order IIR filter.

The slide guitar synthesizer is operated using an optical gesture recognition user interface, similar to that suggested in [1]. However, instead of a web camera, a high-speed infrared video camera is used for attaining a lower latency between the user’s gesture and the resulting sound. This IR-based camera system could also be used for gestural control of other latency-critical real-time applications. The real-time virtual slide guitar model has been realized in PD. A video file showing the virtual slide guitar in action can be found on the Internet:

5. ACKNOWLEDGMENTS
This work has been supported by the GETA graduate school, the Cost287-ConGAS action, the EU FP7 SAME project, and the Emil Aaltonen Foundation.

6. REFERENCES
[1] M. Karjalainen, T. Mäki-Patola, A. Kanerva, and A. Huovilainen. Virtual air guitar. J. Audio Eng. Soc., 54(10):964–980, Oct. 2006.
[2] M. Karjalainen, V. Välimäki, and T. Tolonen. Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond. Computer Music J., 22(3):17–32, 1998.
[3] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Proc. Mag., 13(1):30–60, 1996.
[4] J. Pakarinen. Modeling of Nonlinear and Time-Varying Phenomena in the Guitar. PhD thesis, Helsinki University of Technology, 2008. Available on-line (checked Apr. 14, 2008).
[5] J. Pakarinen, M. Karjalainen, V. Välimäki, and S. Bilbao. Energy behavior in time-varying fractional delay filters for physical modeling of musical instruments. In Proc. Intl. Conf. on Acoustics, Speech, and Signal Proc., volume 3, pages 1–4, Philadelphia, PA, USA, Mar. 19-23 2005.
[6] J. Pakarinen, H. Penttinen, and B. Bank. Analysis of handling noises on wound string. J. Acoust. Soc. Am., 122(6):EL197–EL202, Dec. 2007.
[7] J. Pakarinen, T. Puputti, and V. Välimäki. Virtual slide guitar. Computer Music J., 32(3), 2008. Accepted for publication.
[8] J. Paradiso and N. Gershenfeld. Musical applications of electric field sensing. Computer Music J., 21(2), 1997.
[9] M. Puckette. Pure Data. In Proc. Intl. Computer Music Conf., pages 269–272, 1996.
[10] M. Puckette. Patch for guitar. In Proc. PureData Convention 07, Aug. 21-26 2007. Available on-line at (checked Apr. 9, 2008).
[11] J. O. Smith. Physical modeling using digital waveguides. Computer Music J., 16(4):74–87, Winter 1992.
[12] J. O. Smith. Physical Audio Signal Processing. Aug. 2004 draft,
[13] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Janosy. Physical modeling of plucked string instruments with application to real-time sound synthesis. J. Audio Eng. Soc., 44(5):331–353, 1996.
[14] V. Välimäki, J. Pakarinen, C. Erkut, and M. Karjalainen. Discrete-time modelling of musical instruments. Reports on Progress in Physics, 69(1):1–78, Jan. 2006.
[15] V. Välimäki and T. Tolonen. Development and calibration of a guitar synthesizer. J. Audio Eng. Soc., 46(9):766–778, 1998.
[16] M. Wanderley and P. Depalle. Gestural control of sound synthesis. Proc. IEEE, 92(4):632–644, 2004.
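Returning to the contact-sound generator of Section 2.1: the mapping from the relative string length L(n) to a noise-pulse firing rate, and the resulting train of exponentially decaying bursts, can be sketched as below. This is an illustrative reconstruction, not the authors' PD implementation; the winding count, burst length, and decay values are made up, and the control-rate smoothing block is omitted (L is assumed to arrive at audio rate).

```python
import numpy as np

def contact_noise(L, n_w=600.0, burst_len=64, decay=0.9):
    # |dL/dn| approximates the sliding velocity per sample; scaling by
    # the number of windings n_w gives the pulse firing rate f_c(n).
    v = np.abs(np.diff(L, prepend=L[0]))
    fc = n_w * v
    rng = np.random.default_rng(0)
    envelope = decay ** np.arange(burst_len)  # exponentially decaying pulse
    out = np.zeros(len(L) + burst_len)
    phase = 0.0
    for n in range(len(L)):
        phase += fc[n]
        if phase >= 1.0:              # a winding was crossed: fire a burst,
            phase -= 1.0              # scaled by the current slide velocity
            out[n:n + burst_len] += fc[n] * rng.normal(size=burst_len) * envelope
    return out[:len(L)]
```

A stationary tube (constant L) produces silence, while a faster slide fires bursts both more densely and more loudly; in the full model this excitation would then pass through the resonator, tanh waveshaper, and IIR shaping filter of Fig. 2.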


An Approach to Instrument Augmentation: the Electric Guitar

Otso Lähdeoja
CICM MSH Paris Nord, Paris 8 University
4, rue de la croix faron 93210 St Denis, France
(+33) 01 49 40 66 12

ABSTRACT

In this paper we describe ongoing research on augmented instruments, based on the specific case study of the electric guitar. The key question of the relationship between gesture, instrument and sound is approached via an analysis of the electric guitar's design, playing technique and interface characteristics. The study points out some inherent shortcomings in the guitar's current forms of acoustic–electric hybridisation, as well as new perspectives for a better integration of the relationship between instrumental gesture and signal processing. These considerations motivate an augmented guitar project at the CICM, in which a gestural approach to augmentation is developed, emphasising the role of the instrumentalist's repertoire of body movements as a source for new gesture–sound « contact points » in the guitar playing technique.

Keywords

Augmented instrument, electric guitar, gesture–sound relationship

1. INTRODUCTION
The current research field on augmented instruments is motivated by the assumption that the combination of traditional acoustic instruments with today's sound technology yields a high potential for the development of tomorrow's musical instruments. Integrating the tactile and expressive qualities of traditional instruments with the sonic possibilities of today's digital audio techniques creates a promising perspective for instrument design. A substantial research effort has already been conducted in the field of instrument augmentation. Some projects, like the MIT hyperinstruments group [7] [5], the augmented violin of IRCAM [3] [9], and research work at STEIM [8], have attained emblematic status, establishing technological and methodological models in the research field of augmentation, such as the use of sensor technology and signal analysis techniques to « tap into » the instrumental gesture.

1.1. Electric guitar – precursor of augmentation
A short survey of the aforementioned works on instrument augmentation shows that there has been a general tendency to work with acoustic instruments from the classical orchestra instrumentarium. In the research project presented here, we are working on a form of augmentation of the electric guitar, which distinguishes itself as already being an acoustic–electric hybrid instrument. Initially developed as an augmented instrument of its time, the electric guitar is intrinsically connected with technology. Over the decades, it has undergone extensive experimentation and development following the technological shifts from analogue electronics to MIDI, and on to digital audio. The electric guitar incorporates key issues of live electronic music in itself, such as signal processing, amplification, interface and control. Moreover, this « live electronic » praxis is, and has been, widely shared, tested, and discussed by a worldwide community of users, in a wide variety of musical styles and expressions. With all its effects, pedals, amplifiers, and more recently computers, the electric guitar stands out as a pioneer instrument in the area of acoustic–electronic hybridisation.

Nevertheless, as we will try to demonstrate in this article, the solutions adopted by the electric guitar fall far from an ideal augmented instrument. In its current state, it offers a complex and often clumsy working environment, and much too often a stereotyped, reductive approach to the musical possibilities of signal processing and synthesis. For us, the actual point of interest lies in understanding the causes of the rather poor integration between the playing technique, the guitar and the electronics found in current electric guitar set-ups. This leads us to question, on a fundamental level, the design of the gesture–sound relationship in an augmented instrument.

2. TECHNOLOGICAL EVOLUTION OF THE ELECTRIC GUITAR
The first electric guitar (the Rickenbacker « frying pan », patent filed in 1934) was an amplified acoustic guitar, motivated by popular music's need for louder volume levels. Its timbral qualities were poor compared to acoustic instruments, and for this reason it was initially disregarded [11]. From the 1950s onward, technological progress and the rise of new popular music styles promoting the values of individuality and originality opened a demand for large-scale experimentation with the sound possibilities offered by the new electric instrument. Starting with the development of guitars, pickups and amplifiers, the experimentation went on to signal processing with analogue « effects », guitar-driven analogue synthesisers (Roland GR-500, 1977), bridges between audio and the MIDI protocol (the Roland GR-700 guitar synthesiser, 1984), and the adoption of digital audio processing in the early 80's [10].

Currently, the electric guitar is continuing its development with the integration of ever more powerful microprocessors, whether on board, as in the « many-guitars-in-one » modelling system proposed by Line6 (Variax), or on a PC with a « plug & play » environment like Native Instruments' « Guitar Rig ». Other approaches are being explored (and


commercialised), like the Gibson HD Digital Guitar, featuring onboard analog-to-digital conversion and an Ethernet connection which outputs an individual digital audio stream for each string.

3. SOUND AND GESTURE RELATIONSHIP IN THE ELECTRIC GUITAR
3.1. The gesture – string – signal continuum
The basis of the electric guitar is a transduction of the vibrating strings' mechanical energy into electricity by a pickup. The electromagnetic pickup converts the vibration of the string directly into voltage, thus creating an immediate causal relationship between the instrumental gesture providing the initial energy, the string, and the electric signal produced. The basis of the electric guitar thus preserves the fundamental characteristic of acoustic instruments, the connection between gesture and sound through direct energy transduction, as described by Claude Cadoz [4]. This intimacy ensures a high-quality instrumental relationship between the player and the guitar, a fact that has certainly contributed to the success of the electric guitar among other experimental instruments. Players experience an immediate response, multimodal (haptic, aural and, to a lesser degree, visual) feedback, and a sense of « connectedness » to the instrument.

3.2. A cumulative model of augmentation
While the basis of the electric guitar is a genuine « electrified » acoustic instrument, its hybrid quality becomes more abstruse with the addition of the various sound-shaping modules or « effects » that are essential in creating the instrument's tone. These analogue or digital extensions are powered by electricity and have no direct energy connection to the initial playing gesture; therefore alternative strategies for their control must be conceived. This makes up a second « level » of the hybrid instrument, where the gesture–sound relationship has to be designed solely by means of creating correspondences between acquired gesture data and sound processing parameters. The design of this « electric » level of the hybrid instrument is a central question of instrument augmentation, all the more challenging as the electric « implant » should integrate with and enhance the instrument without hindering the acoustic level's sonic possibilities and playing technique.

In the case of the electric guitar, the question of coexistence between the acoustic and electric levels of the instrument has been addressed with a cumulative model of augmentation. In this process, the electric level is conceived as an extension of the initial acoustic instrument, leaving the latter relatively intact. Thus the core of the electric guitar does not vary much from the acoustic one: both hands are involved in the initial sound production, working on the mechanical properties of the strings. The augmented part of the instrument is grafted « on top » of this core by adding various sound processing modules and their individual control interfaces. The consequences of this cumulative process of augmentation are twofold:

1) The playing environment becomes more complex as interfaces are added, each new module requiring a separate means of control. Moreover, as both hands work mainly on the initial sound production, the control of the augmented level needs to be relegated to the periphery of the playing technique, using the little free « space » that can be found between the existing hand gestures, or other parts of the body such as the feet. Due to this marginal position, the control of signal processing seems very limited and qualitatively poor in regard to the sonic possibilities offered by the technologies used.

2) The instrument undergoes spatial extension, going from a single object to a collection of interconnected modules. A common electric guitar playing environment comprises a guitar, a set of « effect » pedals and an amplifier, adding up to form an environment which may easily expand beyond a single person's physical capacities of simultaneous control.

It appears to us that the cumulative approach to « electrification » and augmentation adopted by the electric guitar carries inherent problems for signal processing control which lead to a downgrading of its sonic and expressive possibilities. Nevertheless, the established modular set-up of the electric guitar is currently undergoing a profound transformation with the advent of digital audio computing within the guitar itself or in PC plug-and-play environments. This development could offer an opportunity to redesign the electric guitar by efficiently integrating the signal processing with the player's gestures, and connecting the electronic graft to the instrument and to its playing technique on a fundamental level.

4. « CONTACT POINTS » : AN ANALYSIS OF THE GESTURE-SOUND RELATIONSHIP
The augmentation project we have undertaken has its basis in the observation that a musical instrument is not simply an object, but a meeting point between a gesture and an object, the result of this encounter being the sound which is produced. For us, a musical instrument loses its essence when taken out of its context, i.e. the relationship with the human body. In this « gestural » approach to the instrument, the central question is to find ways of understanding the link between the body, the object and the sound produced. The nature of the continuum between gesture and sound, mediated by the instrument, is a key factor for the expressive and musical qualities of an instrument. A highly functional continuum enables the player to gradually embody the instrument in a process where the musician's proprioception extends to the instrument, resulting in an experience of enveloping the instrument and playing directly with the sound [2]. Through observation of how the musician connects to the instrument, it appears that the body manipulates the instrument with a repertory of very precisely defined movements. Each part of the body connecting to the instrument has its own « vocabulary » of gestures adapted to its task and to the constraints of the object. This repertory forms the « instrumental technique », where constituents of the corporal « vocabulary » are combined in real time to form an instrumental discourse. Each movement and combination of movements has its characteristic sonic result. We use the term « contact points » to signify these convergences between gesture and object which result in the production or modification of a sound. It allows us to think in terms of a continuum between these three elements and to establish a « map » of their relationships in the playing environment.

For instance, mapping « contact points » on the electric guitar results in a precise repertory of the gestural vocabulary comprised in the playing technique, in relationship with each gesture's corresponding interface and sonic result. We can thus establish a typology of initial sound-producing « contact points » (left- and right-hand techniques on the strings) and of the gesture–interface couples which control the instrument's electric level (potentiometers, switches, pedals etc. and their corresponding gestures). This allows for a comprehensive


articulation of the instrumental environment with the aim of establishing strategies for further and/or alternative augmentations.

Figure 1. Mapping « contact points » on the electric guitar (gesture-specific detail not included here).

In the perspective of instrument augmentation, there is a dual interest in the mapping of « contact points ». On the one side, breaking down the complexities of instrumental playing into a set of « meta-gestures » and their corresponding sonorities allows us to focus on strategies of « tapping into » the gestures of the standard instrumental technique, motivated by an intimate knowledge of gesture and medium. The gesture data acquisition can thus be adapted to the instrument according to its technical and playing specificities, using both direct and indirect acquisition techniques [13]. On the other side, a map of contact points allows for the articulation of the instrument according to a typology of « active zones » participating in the sound production, and of « silent zones »: convergences of gestures and localisations which have no role in the production of sound. From this « map » of « active » and « passive » regions of the instrumental environment, we may go on to find ways of « activating » the silent zones, creating new contact points and new gestures.

5. THE AUGMENTED GUITAR PROJECT
We are currently developing an augmented guitar at the CICM, motivated by the considerations exposed in this article. The project is based on simultaneous and crossover use of direct and indirect gesture data acquisition (i.e. sensors and signal analysis) [13], as well as both existing and new « contact points ». The technological platform is made up of a standard Fender Stratocaster electric guitar equipped with an additional piezoelectric pickup and a selection of sensors (tilt, touch, pressure). The 2-channel audio and multichannel MIDI sensor data output is routed to a PC performing a series of signal analysis operations: perceptive feature extraction from the audio signal (attacks, amplitude, spectrum-related data) [5], and gesture recognition on the sensor data. The resulting data is mapped to the audio engine, providing information for dynamic control of signal processing. The project is developed in the Max/MSP environment.

Figure 2. The CICM augmented electric guitar set-up

In our augmentation project we have adopted a gesture-based methodology which proceeds by an initial mapping of the « contact points » comprised in the guitar's basic playing technique. The augmentation potential of each gesture is evaluated in relationship with the available acquisition and sound processing/synthesis techniques. In parallel, we study the musician's body in the playing context, looking for potential « ancillary » [12] gestures not included in the conventional playing technique. We then look for ways of tapping into these gestures with an adapted gesture acquisition system (sensors or signal analysis), thus activating a new « contact point ». Following is a selection of augmentations we are working on. Audio and video material of the augmented guitar and its related M A A music project can be found at: www.

5.1. « Tilt – Sustain » augmentation
This augmentation is motivated by a double observation: 1) the upper body movements that characterise the performance of many guitarists remain disconnected from the actual sound; they carry an untapped expressive potential. 2) the sound of the guitar has a very limited duration, which keeps it from employing long, sustained sounds. The development of the guitar can be seen as a long search for this sustained quality [6]. The electric guitar with distortion and feedback effectively attains it, but only with a very distinct « overdriven » tone and high volume levels. The idea of our augmentation is to create a sustainer controlled by the tilt of the guitar and of the player's torso: the more vertical the guitar, the more sustain. The augmentation is developed with a 2-axis tilt sensor attached to the guitar, mapped to a realtime granular synthesis engine which records the guitar sound and recycles it into a synthesised sustain. The tilt–sustain augmentation activates a new « contact point » in the electric guitar playing technique, incorporating torso movements into sound creation.

5.2. « Golpe » : the percussive electric guitar
The acoustic guitar allows for the possibility of using percussive techniques played on the instrument's body. Due to its pickup design, the electric guitar has lost this ability. The percussive augmentation we are working on aims to restore a percussive dimension to the electric guitar, thus reactivating a traditional « contact point » which remains unused. In order to tap into the sounds of the guitar's body, we have proceeded with the installation of a piezo microphone, detecting the percussive attacks and then analysing the signal's spectral content. When hit, different parts of the instrument resonate with specific spectra, thus allowing us to build up a set of localisation–sound couples. The analysed signal drives a sampler where the piezo output is convolved with prerecorded percussive sounds, inspired by Roberto Aimi's approach in his work on augmented percussion [1].

5.3. « Bend » : an integrated « wah-wah »
The left-hand fingers operating on the fretboard have the essential role of producing intonations, with horizontal and vertical movements which range from a minute vibrato to extended four-semitone « bends ». This technique is widely used on the electric guitar, allowing the player to work in the domain of continuous pitch variations as opposed to the semitone divisions of the fretboard. The « bend » technique is often used to enhance the expressiveness of the playing, giving the guitar a « vocal » quality. The motive of this augmentation is to extend the inflexion gesture's effect on the sound from a


variation of the pitch to a double variation of both pitch and timbre. In our system, we use attack detection and pitch following to track the note's evolution relative to its initial pitch. The resulting pitch variation data is mapped to a filter section, emulating the behaviour of the classic « wah-wah » effect. We find that controlling the filter through an expressive playing gesture incorporates the effect into the musical discourse in a subtler manner than the expression pedal traditionally used for this type of effect.

5.4. « Palm muting » : an augmented effect switch
A popular playing technique on the electric guitar consists of muting the strings with the picking hand's palm, thus producing a characteristic, short, muffled sound. Our augmentation is based on the detection of the muting gesture by an analysis of the spectral content of the guitar's signal: a loss of energy in the upper zones of the spectrum, regardless of which string(s) is (are) being played. Our system tests the incoming signal against a « model » spectrum, interpreting closely matching signals as the result of a muted attack. The acquired « muting on/off » data is used in our guitar as a haptic augmentation of an effect pedal's on/off switch, allowing the player to add a desired timbre quality (« effect ») to the sound simply by playing in muted mode.

6. CONCLUSION AND FUTURE WORK
The augmented guitar project is currently evolving at a steady pace, exploring new augmentations and sound–gesture relationships. Two different directions seem to emerge from this work: one is refining the traditional electric guitar working environment by finding ways of replacing the poorly integrated effect modules with signal processing control systems more closely connected to the guitar's playing technique. The other direction points towards more radical augmentations of the guitar's soundscape, associated with the will to expand the guitar's melodically and harmonically oriented musical environment towards novel possibilities of working with timbre and sound texture. A central factor in this research is the establishment of an interactive working relationship between technological innovation and music. Live playing experience provides high-quality feedback on our augmentations, and it bears a central role in (in)validating our work. As the augmentations stabilise and become more refined, we look forward to conducting a series of user evaluations which could provide useful insight for further development of the augmented guitar.

7. REFERENCES
[1] Aimi, R. M. « Hybrid Percussion: Extending Physical Instruments Using Sampled Acoustics ». PhD thesis, Massachusetts Institute of Technology, 2007, p. 41.
[2] Berthoz, A. La décision. Odile Jacob, Paris, 2003, pp. 153-155.
[3] Bevilacqua, F. « Interfaces gestuelles, captation du mouvement et création artistique ». L'inouï #2, Léo Scheer, Paris, 2006.
[4] Cadoz, C. « Musique, geste, technologie ». Les nouveaux gestes de la musique. Parenthèses, Marseille, 1999, pp. 49-53.
[5] Jehan, T. « Perceptual Synthesis Engine: An Audio-Driven Timbre Generator ». Master's thesis, Massachusetts Institute of Technology, 2001.
[6] Laliberté, M. « Facettes de l'instrument de musique et musiques arabes ». De la théorie à l'art de l'improvisation. Delatour, Paris, 2005, pp. 270-281.
[7] Machover, T. « Hyperinstruments homepage ».
[8] Overholt, D. « The Overtone Violin ». Proceedings of NIME 2005, Vancouver, 2005.
[9] Rasamimanana, N. H. « Gesture Analysis of Bow Strokes Using an Augmented Violin ». Master's thesis, Paris VI University, 2004.
[10] « Roland database ». http://www.geocities.com/SiliconValley/9111/roland.htm
[11] Smithsonian Institution. « The Invention of the Electric Guitar ». http://invention.smithsonian.org/centerpieces/electricguitar
[12] Verfaille, V. « Sonification of musicians' ancillary gestures ». Proceedings of ICAD 2006, London, 2006.
[13] Wanderley, M. « Interaction musicien-instrument : application au contrôle gestuel de la synthèse sonore ». PhD thesis, Paris VI University, 2001, pp. 40-44.


Sormina – a new virtual and tangible instrument

Juhani Räisänen
University of Arts and Design
Helsinki, Media Lab
Voudinkuja 3 B 8
02780 Espoo, Finland
+358 40 5227204

ABSTRACT

This paper describes the Sormina, a new virtual and tangible instrument, which has its origins in both virtual technology and the heritage of traditional instrument design. The motivation behind the project is presented, as well as the hardware and software design. Insights gained through collaboration with acoustic musicians are presented, as well as a comparison to historical instrument design.

Keywords
Gestural controller, digital musical instrument, usability, music history, design.

1. INTRODUCTION
Sormina is a new musical instrument that has been created as part of a research project at the University of Arts and Design Helsinki, Media Lab. Sormina uses sensors and wireless technology to play music. Its design is guided by traditional instrument building.

With new wireless technology, the instrument loses part of its traditional character. The physical connection between the sounding material and the fingers (or lips) is lost. The material does not guide the design, which puts the designer in a totally new situation with new questions. This study tries to answer these questions by exploring the design of a new instrument that is intended for use in the context of a live symphony orchestra. The research has started from the concept of the interface, which traditionally is held in the hands or put in the mouth. The playing posture of the musician, the delicate controllability of the instrument and the ability to create nuances are considered the key phenomena of the new design. Visual aesthetics and usability are of equal importance.

Sormina aims to take the musician on a tour to the ancient world, where tools were built to fit the fingers of human beings, and where technology was there to serve humanity. The technological tools have changed over the centuries, but the idea of music making stays the same. Using the most modern technology for music making does not have to result in underrating our common heritage.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. MOTIVATION
The motivation for this innovation is the desire to create totally new musical instruments in the context of classical music by using computers and sensors. We are interested in designing digital instruments that could be accepted as part of the standard symphony orchestra. We believe that classical music can benefit from the current developments in digital technology. The symphony orchestra has been quite stable during the last century, although there have been some experiments using electronics. Sormina aims to encourage the symphony orchestra to develop further to meet the challenges of the digital era. A handheld computer interface is operated very close to the body, which makes the user experience quite intimate. By offering new modes of sensory engagement and intimate interaction, the Sormina contributes to a change in the digital world, from disembodied, formless, and placeless interaction to materiality and intimacy.

This project participates in a long tradition of similar innovations, starting from the Theremin, which is a rare example of a musical innovation that became part of classical music practice. In addition to the Theremin, one of the most influential works for the current research has been Rubine and McAvinney's article in Computer Music Journal in 1990, where they presented their VideoHarp controller and discussed issues related to its construction [1]. Michel Waisvisz and his Hands have also been a great inspiration [2]. Recently, Malloch and Wanderley have proposed the T-Stick [3]. Important questions concerning parameter mapping have been discussed by Hunt, Wanderley and Paradis [4].

3.1 Hardware, sensors
Structurally, the Sormina is built using a Wi-microDig analog-to-digital encoder, a circuit board for the wiring, and 8 potentiometer sensors with custom-made wooden knobs. The Wi-microDig is a thumb-sized, easily configurable hardware device that encodes up to 8 analog sensor signals into multimedia-industry-compatible messages with high resolution and then transmits these messages wirelessly to a computer in real time for analysis and/or control purposes [8]. The custom-made circuit board takes care of the wiring. The potentiometers are mounted on the circuit board in an upright position, and the encoder unit is also attached to the circuit board. The knobs of the potentiometers are arranged in a straight line on top of the instrument.

The manufacturer of the Wi-microDig promises that the 8 inputs of 10 bits resolution each can sample at up to 1500 Hz with only


milliseconds of latency [8]. The wireless transmission complies with the Bluetooth v2.0 standard, which is claimed to be a reliable protocol and, at 115 kbps, much faster than MIDI. The wireless range is guaranteed up to 100 meters without obstructions, since it is a Bluetooth class 1 device. With the prototype there were considerable problems with the connection range. The encoder in question was, however, an older model than the Wi-microDig.

The construction of the controller is open: it is not put in a box or cover. With the help of this arrangement, the visual design appears light and spacious. However, the decision to use no cover is subject to change in forthcoming prototypes, as the openness makes the construction vulnerable to dust and moisture.

The Sormina makes use of 8 potentiometer sensors, which is the maximum number of sensors that can be connected to the encoder. The choice between sensors was made on the basis of three main arguments: stability, precision and tangibility. The Wi-microDig encoder comes with only one potentiometer, which did not fit the standards set for the instrument design. Suitable potentiometers were purchased separately.

The first argument for the selection of the sensor type was stability. In order to attain a stable instrument, the sensors also have to provide this characteristic. Stability in this context means a sensor that preserves its state when not touched. Most of the available sensors are built according to a convention that does not support this demand. A potentiometer sensor changes its state only by intentional action. Stability is also required of an instrument in the sense of durability and robustness. Potentiometers proved to be stable in this sense as well.

Figure 1. Sormina is a virtual instrument with wooden knobs

3.2 Software
The software for the Sormina has been programmed using Max/MSP and Reaktor. It consists of three parts: one handles the communication with the encoder through Bluetooth, the second takes care of the user interface, and the third produces the sound. In addition, external software, Sibelius, was used for the notation.

The Wi-microDig comes with its own software, which is not actually used in this project. This software is meant to take care of the Bluetooth connection and let the user decide the interpretation of the sensor data, which is then sent forward as MIDI information. In addition to this rather laborious software, the company also offers on its web site, for the same purpose, a Max/MSP patch, which proved to be handier for the purposes of the project. The Wi-microDig patch for Max/MSP appeared to handle the communication with the encoder through Bluetooth quite reliably. The Max/MSP programming environment was also favored for its usefulness in other parts of the project.

The Wi-microDig patch outputs the sensor data as 7-bit information, which was found to be sufficient for the purposes of the project. According to the tests made, it was not possible to produce any larger resolution with finger movements using the small potentiometer knobs of the Sormina.

A visual user interface was programmed using Max/MSP, which also handles the connection to the encoder. One purpose of the interface is to give the musician visual cues in controlling the instrument. This proved to be beneficial especially in the learning phase. In addition to the feel in the fingertip, it was helpful to see the state of all the sensors at one glimpse on the screen.

The visual interface comprises sliders, number boxes, and basic notations for the sensor input. At the same time the interface is capable of recording a control sequence, which was found useful for learning to play the instrument. While the recorded sequence is playing back, the visual information about the state of the sensors is shown on the interface.

Figure 2. Part of the visual interface

The sound is created using a sound synthesis patch created for the Reaktor software. The patch allows the control of several features of sound synthesis. The mapping of the sensors to the sound synthesis software appeared to be of crucial importance.

Mainly due to the capabilities of the encoder, it was decided that there should be 8 sensors. Nevertheless, this was found to be a very useful restriction. It was assumed that a human being cannot handle too many controls at the same time. Too many


options could result in indeterminacy. Also, with 8 sensors, 4. IN PERFORMANCE

nearly all of the fingers could still be utilized for controlling Much of the development of the Sormina has been conducted
purposes. through collaborations with other musicians. The sound
The Reaktor software was chosen as the sound engine for the synthesis software and especially the mapping of the parameters
project, although the use of two different pieces of software has been open to change, so the insights of other performers has
instead of only one has its drawbacks. Reaktor was found to be been welcome. Still, for the purpose of creating a stable
more amenable than Max/MSP for the purposes of this project. instrument, it would have been preferable to fix the mapping at
a very early stage of development. This conflict has been one of
The sound synthesis patch in Reaktor comprises a 96-voiced the most challenging features of the project.
noise generator with filters and reverb. The patch has 26
controls for mapping but because of the restrictions of the The sound created by the Sormina seems to fit quite well with
hardware, only some of them were possible to choose. One string instruments, especially the cello. The reason for this fact
solution for the mapping problem could have been to use one was considered to be the use of noise generators as the main
sensor for several controls on the sound software but it was sound source. The sound of acoustic instruments has many
found that this would be unwise on a large scale, although some characteristics of white noise. Singing voices showed a similar
sensors are connect to two parameters. resemblance to the Sormina sound, also.
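The mapping constraints described above (eight sensors driving a larger set of synthesis controls, with a few sensors deliberately connected to two parameters each) can be sketched as a simple lookup table. This is an illustrative sketch only; the sensor and parameter names are hypothetical and are not taken from the actual Sormina patch.

```python
# Hypothetical sketch of a Sormina-style mapping layer: 8 normalized
# sensor values (0.0-1.0) drive synthesis controls, and a couple of
# sensors are deliberately connected to two parameters each.
SENSOR_MAP = {
    "sensor1": ["noise_density"],
    "sensor2": ["filter_cutoff"],
    "sensor3": ["filter_resonance"],
    "sensor4": ["reverb_mix"],
    "sensor5": ["amplitude"],
    "sensor6": ["pan"],
    "sensor7": ["grain_rate", "grain_size"],       # one sensor, two controls
    "sensor8": ["vibrato_rate", "vibrato_depth"],  # one sensor, two controls
}

def map_sensors(sensor_values):
    """Translate raw sensor readings into synthesis-parameter updates.
    Missing sensors default to 0.0."""
    params = {}
    for sensor, targets in SENSOR_MAP.items():
        value = sensor_values.get(sensor, 0.0)
        for target in targets:
            params[target] = value
    return params

updates = map_sensors({"sensor7": 0.5})
```

Keeping the table explicit makes it easy to audition alternative mappings, which fits the account of the mapping remaining open to change during collaborations with other performers.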

3.3 Notation

One important part of the new instrument design was the attempt to notate the music created with the Sormina. It was challenging to set up a link between Max/MSP and notation software for notating eight parameters in the same score.

The Sibelius software was chosen for this purpose. The note heads were changed to triangles in order to distinguish them from normal pitched notes. A number was added near each note head for more precision.

4.1 Concerts

There have been several public concerts during the first year of the instrument's existence. In addition, the Sormina has been presented to researchers and students, and in seminars. The first concert, in November 2006, was given with the cellist Juho Laitinen and the soprano Tuuli Lindeberg. In November 2007 the Sormina was played with the chamber choir Kampin Laulu. The last performances of 2007 were in December in Los Angeles, where the instrument was presented to the art students of the California Institute of the Arts (CalArts). Two concerts were also given in art galleries and jazz cafes in the

Figure 4. The author playing the Sormina in a concert

Figure 3. Notation of the parameters of the Sormina
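As a rough sketch of the notation idea shown in Figure 3 (a triangle notehead plus a printed number for each continuous parameter), one might quantize a normalized parameter value to the integer written near the notehead. The ten-level scale below is an assumption for illustration, not taken from the Sormina score.

```python
def notation_number(value, levels=10):
    """Quantize a normalized parameter (0.0-1.0) to the integer printed
    near a triangle notehead. The number of levels is an assumption."""
    value = min(max(value, 0.0), 1.0)        # clamp out-of-range readings
    return min(int(value * levels), levels - 1)
```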


The aim of the Sormina project was to explore the main principles of the instruments in classical music, from the musician's point of view, and with these findings to create a new, stable electronic music instrument that could be accepted in a symphony orchestra. The results suggested the importance of three layers in the design of new instruments. The first layer is the sound synthesis that defines the audible response. The second is the mapping of the gestures to the sound parameters, which constitutes the instrument in a conceptual manner to the musician. The third layer, often overlooked in the creation of new digital music instruments (DMIs), is the materiality and usability layer of the controller.

Much weight in the research has been put on the human hand and its capabilities. The author has followed Curt Sachs' findings about the hands and feet being the first instruments [5], and Malcolm McCullough, who praises our hands as the best source of personal knowledge [6]. A remarkable source for understanding the importance of music playing has been Tellef Kvifte, who formulates a classification of instruments using playing techniques, not based on the way the musical sound is produced [7].

The Sormina research suggests that the touch and feel of the interface are important to take into account when designing new instruments. The musician uses subtle, almost intuitive and unconscious movements of her body. The fingers, for example, have developed through evolution to take care of the most sophisticated and precise actions. Therefore it is reasonable to use the fingers for playing music. In the culture of the human being, the fingers have been crucial for survival. Even today, they are used extensively to express our thoughts, by writing with a pen or a computer.

In the course of history, traditional instruments have matured to be well adapted to the human body. Their long evolution has given them the power to survive even in the era of computers. Through careful examination of their principles, it is possible to learn from their patterns and use the results in the design of totally new electronic instruments. In the present research, the role of the physical interface has been found to be fundamental for such a design. It appears that attention should be paid to the physical appearance of instruments in order to build stable instruments.

The Sormina aims to be more than a controller. As Rubine and McAvinney formulate, a musical instrument may be thought of as a device that maps gestural parameters to sound control parameters and then maps the sound control parameters to sound [1]. By binding together a fixed set of sensors with a stable sound source, we have developed the Sormina into an instrument, not a controller.

The Sormina attempts to be engaging for new musicians, but also rewarding for professionals. Based on the current evidence, these goals have been reached to a large extent.

The Sormina has been played in concert situations, both solo and with acoustic musicians. Playing with an acoustic cello has been rewarding, but an a cappella choir also made a good combination with the electronic sounds of the Sormina.

The experience of concerts with acoustic instruments and singers points out that the sound quality and playing techniques of the Sormina are well adaptable to a classical music orchestra. The possibility to notate the playing brings another useful characteristic for use with a symphony orchestra.

6. FUTURE DIRECTION

The current research has used the observation of traditional musical instruments and their user experience for the design of a new electronic music instrument. Still, the scope of the exploration has been narrow, concentrating primarily on the author's experience of acoustic instruments. In the future, a more systematic inquiry will be accomplished, where professional musicians will be observed and interviewed about their playing habits. Also, observing the learning process in the study of classical music instruments can reveal qualities that could then assist in new instrument design.

One direction in the development of the instrument is to combine the sound output with a live visual output. This is especially attractive because of the readiness of Max/MSP/Jitter to process and produce video and other moving images. Using the same parameters in video processing brings up interesting questions about the connection between the auditory and visual sensory systems.

To enhance the usability of the instrument, its robustness needs more attention. Also, in order to compete with traditional instruments, the Sormina should be developed more in the direction of a consumer product.

7. ACKNOWLEDGMENTS

The author would like to acknowledge the important contributions of many people to this project, including Martijn Zwartjes, Risto Linnakoski, and Matti Kinnunen. The author received research funds from the Wihuri Foundation and the Runar Bäckström Foundation. The University of Art and Design Helsinki has also given grants for the research.

8. REFERENCES

[1] Rubine, D. and McAvinney, P. Programmable Fingertracking Instrument Controllers. In Computer Music Journal, Vol. 14, No. 1, Spring 1990, 26-40.
[2] Waisvisz, M. The Hands, A Set of Remote MIDI Controllers. In Proceedings of the 1985 International Computer Music Conference. Computer Music Association, San Francisco, 1985.
[3] Malloch, J. and Wanderley, M. The T-Stick: From Musical Interface to Musical Instrument. In Proc. of the 2007 Conf. on New Interfaces for Musical Expression (NIME-07), 2007, 66-69.
[4] Hunt, A., Wanderley, M. and Paradis, M. The importance of parameter mapping in electronic instrument design. In Proc. of the 2002 Conf. on New Interfaces for Musical Expression (NIME-02), 2002, 149-154.
[5] Sachs, C. The History of Musical Instruments. Norton, New York, 1940, 25-26.
[6] McCullough, M. Abstracting Craft: The Practiced Digital Hand. The MIT Press, Cambridge, Massachusetts, 1996, 1-15.
[7] Kvifte, T. Instruments and the Electronic Age: Toward a Terminology for a Unified Description of Playing Technique. Solum förlag, Oslo, 1988, 1.
[8] Wi-Microdig v6.00/6.1. <catalog/info_pages.php?pages_id=153>


Practical Hardware and Algorithms for Creating Haptic Musical Instruments

Edgar Berdahl, CCRMA/Stanford University
Hans-Christoph Steiner, ITP/NYU
Collin Oldham, CCRMA/Stanford University

ABSTRACT

The music community has long had a strong interest in haptic technology. Recently, more effort has been put into making it more and more accessible to instrument designers. This paper covers some of these technologies with the aim of helping instrument designers add haptic feedback to their instruments. We begin by giving a brief overview of practical actuators. Next, we compare and contrast using embedded microcontrollers versus general purpose computers as controllers. Along the way, we mention some common software environments for implementing control algorithms. Then we discuss the fundamental haptic control algorithms as well as some more complex ones. Finally, we present two practical and effective haptic musical instruments: the haptic drum and the Cellomobo.

Keywords

haptic, actuator, practical, immersion, embedded, sampling rate, woofer, haptic drum, Cellomobo

A haptic musical instrument consists of actuators that exert forces on the musician, sensors that detect the gestures of the musician, an algorithm that determines what forces to exert on the musician, and a controller that runs the algorithm and interfaces with the sensors and actuators. The instrument often synthesizes sound as well. Figure 1 illustrates how the musician is included in the haptic feedback loop.

Figure 1: A musician interacting with a haptic musical instrument (labels: actuator, sensor, sound signals, gesture signals)

There has been a wide array of research into haptics over the past decades, the vast majority taking place in specialized research labs with elaborate and custom equipment. Haptic feedback plays a key role in playing traditional instruments, and with the push to further develop electronic instruments, musicians have begun integrating haptic feedback into electronic instruments.

Recently, a number of developments have opened up haptic exploration to projects with smaller budgets and more common facilities. Additionally, as it becomes easier to access haptics equipment, it becomes possible to create haptics platforms oriented to musical instrument designers. This is especially interesting to designers looking to create their own instruments, since it means that they can design and employ useful haptic feedback in their own instruments.

2. ACTUATORS

Actuators form the core of any haptic musical instrument. The ideal actuator is linear and time invariant (LTI), has infinite bandwidth, can render arbitrarily large forces, and is accompanied by an LTI sensor with infinite resolution. In practice, the actuator usually limits the performance of haptic feedback in a haptic musical instrument. One effective design approach is to choose the actuator so that it directly complements the metaphor of the target haptic musical instrument. For instance, for a haptic drum, use a woofer to mimic a vibrating drum membrane.
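The four components named above (sensor, algorithm, actuator, controller) can be sketched as a minimal feedback-loop structure. Everything here, including the virtual-spring placeholder algorithm, is illustrative and not code from any of the systems discussed in the paper.

```python
from typing import Callable

# Minimal sketch of the feedback loop of Figure 1: the sensor reports
# the musician's gesture, the algorithm maps it to a force, and the
# actuator pushes back on the musician.
class HapticInstrument:
    def __init__(self,
                 sensor: Callable[[], float],
                 algorithm: Callable[[float], float],
                 actuator: Callable[[float], None]):
        self.sensor = sensor
        self.algorithm = algorithm
        self.actuator = actuator

    def step(self) -> float:
        """One pass around the loop: read gesture, compute force, act."""
        x = self.sensor()            # sensed displacement
        force = self.algorithm(x)    # e.g. a virtual spring, F = -k*x
        self.actuator(force)
        return force

# Example wiring with a virtual spring (k = 2.0) and stub hardware.
forces = []
instrument = HapticInstrument(sensor=lambda: 0.25,
                              algorithm=lambda x: -2.0 * x,
                              actuator=forces.append)
instrument.step()
```

Separating the algorithm from the sensor and actuator callables mirrors the paper's decomposition: the same control law can be reused whether the controller is a microcontroller or a general purpose computer.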

2.1 Vibrotactile Actuators

Marshall and Wanderley [23] and Hayward and MacLean [20] provide good overviews of some actuators, so here we cover only the most effective and practical actuators for musical instrument designers.

Permission to make digital or hard copies of all or part of this work for any purpose are granted under a Creative Commons Attribution 3.0 license: NIME08, Genova, Italy. Copyright 2008: Copyright remains with the author(s).


2.1.1 Vibrating Motors

Vibrating motors are the most common haptic actuators. They are widely used in mobile phones and other communications devices. They are built using a motor with an unbalanced weight attached to the spindle. They are almost always used to generate a fixed frequency vibration, but some variation is possible. They are cheap, simple, and easy to obtain, but they have a slow ramp-up time, which limits their application.

Table 1: Approximate Actuator Costs in U.S. $

  Device                    Price
  Vibrating motor           $1-$20
  Tactor                    $5-$200
  Alps motorized fader      $30
  Woofer/shaker             $40
  Servomotor with encoder   $400
  Novint Falcon             $200
  SensAble Omni             $1000

2.1.2 Tactors

Tactors are specialized motors that produce vibrations in a frequency range appropriate for sensing by the skin. They are included in devices like the iFeel mice. Immersion builds their tactors using "Inertial Harmonic Drive", which basically means a motor with a very small gear ratio whose spindle is attached to a surface by a somewhat flexible nylon linkage. The motor yanks on the linkage to generate a pulse. Another type of tactor is made using a piezoelectric element to actuate a plate under tension. It is also possible to build low-cost tactors using vibrating motors [17].

2.2 Force Feedback Actuators

In order to provide force feedback in practice, it is necessary to measure the behavior of the haptic device in the same dimension as it is actuated, making force feedback setups more complex.

2.2.1 Motorized Faders

Alps Electric Co. and other manufacturers make motorized faders designed for use in digital control surfaces. These faders consist of a belt motor drive attached to a linear slider potentiometer. The potentiometer can serve as the position sensor for the haptic feedback loop controlling the motor. Since the motor is relatively small, these faders cannot exert large forces, but they are cheap, pre-assembled and relatively easy to procure.

2.2.2 Servomotors with Optical Encoders

To produce relatively large forces, we have been using servomotors with built-in optical encoders that sense position [27]. We use the Reliance Electric ES364 servomotor with a peak-torque specification of 6.5 kg-cm and encoder resolution of 1000 pulses/rev (4000 counts).1 An arm attached to the motor shaft makes it possible to interface the motor effectively with the hand. A force-sensitive resistor placed at the end of the shaft provides an additional sensed quantity useful in further fine-tuning the force feedback.

2.2.3 Woofers and Shakers

In contrast with rotational servomotors, woofers and shakers are linear actuators. As a consequence, the maximum displacements they provide are typically limited to a couple centimeters or less. Nevertheless, these actuators can be easily obtained at low cost. Shakers are similar to woofers, but they have no cone for pushing air. Instead they mount to and shake a piece of furniture so that a listener can feel bass and infrasonic frequencies in music and movie soundtracks.

The challenge in applying woofers and shakers effectively typically lies in integrating a sensor with the actuator.

2.2.4 Multi-DOF Haptic Devices

Commercial robotic arms like the 6DOF2 SensAble Phantom have been available for a number of years now. They are typically designed to be held in the hand like a pen. They have traditionally been expensive and relatively rare; however, advancements in teleoperation and minimally-invasive surgery in particular have driven production costs per unit down significantly, so that the Phantom Omni can be obtained for $1000.

The Novint Falcon is a more limited 3DOF haptic device that is designed for gaming. While it does not provide the flexibility or fidelity of the cheapest Phantom, it is available for less than $200.

3. CONTROLLERS

To provide force feedback, a control loop is usually called every 1/fS seconds, where fS is the sampling rate. This control loop reads inputs from the sensors, computes appropriate outputs to the actuators, and then immediately sends the outputs to the actuators. In order to have a responsive haptic musical instrument, the controller must be quick. In other words, the system delay (also known as input-output delay) should be short, and the sampling rate should be high. For most operating systems, these requirements are mutually exclusive, so in the following sections, we consider common control hardware implementations.

The sampling rate is an important factor. Typical haptics applications do not require sampling rates as high as audio. For instance, the CHAI 3D haptics framework does not support sampling rates above 1kHz for most devices [8]. However, some haptic musical instruments send audio signals through the feedback loop. The human range of hearing spans roughly 20Hz to 20kHz. According to the Nyquist-Shannon sampling theorem, the sampling rate must be at least 40kHz so the whole bandwidth that humans hear can be sampled and reconstructed within the feedback loop.

Haptic musical instruments taking full advantage of feeding aurally-relevant acoustic signals back through the haptic device must run at much higher sampling rates on the order of 40kHz. It is true that these higher frequencies are very poorly sensed by the human tactile system, but in a bowed string experiment, users reported that the system nevertheless felt much more real when the haptic sampling rate was 44kHz instead of 3kHz. They made comments regarding the "strong presence of the string in the hand," "the string in the fingers," and "the string is really here" [22].

1 While the ES364 is now out of production, Applied Motion sells the comparable VL23-030D with an optical encoder for $400. It provides a maximum peak torque of 5.9 kg-cm. This type of motor can be obtained surplus for prices as low as $7 each.
2 Six degrees of freedom.
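The read-compute-write cycle at rate fS described in Section 3 can be sketched as follows. The 1 kHz rate, the spring-law control algorithm, and the sensor/actuator callables are placeholder assumptions; a real haptic controller would rely on far stricter scheduling than time.sleep provides, which is precisely the timing problem the paper goes on to discuss.

```python
import time

SAMPLE_RATE = 1000           # fS in Hz; 1 kHz is a typical haptic rate
PERIOD = 1.0 / SAMPLE_RATE   # the control loop runs every 1/fS seconds

def run_control_loop(read_sensor, algorithm, write_actuator, steps):
    """Read inputs, compute outputs, and send them immediately,
    once per sampling period."""
    for _ in range(steps):
        t0 = time.monotonic()
        write_actuator(algorithm(read_sensor()))
        # Wait out the rest of the period. An ordinary OS scheduler
        # makes this timing jittery -- the motivation for the embedded
        # microcontrollers and RTAI-based solutions discussed later.
        remaining = PERIOD - (time.monotonic() - t0)
        if remaining > 0:
            time.sleep(remaining)

# Stub hardware: constant displacement in, recorded forces out.
outputs = []
run_control_loop(lambda: 0.5, lambda x: -10.0 * x, outputs.append, steps=5)
```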



Embedded microcontrollers can be run without any operating system or extraneous processes, which might interfere with the control loop timing. In addition, they are small, allowing them to be easily embedded within musical instruments, and they can be configured to interface with a wide variety of sensors and actuators. Atmel processor-based microcontrollers such as the AVR [5] and especially the Arduino [7] have recently become popular in computer music. Note that these microcontrollers do not natively support floating-point calculations.3 This is generally of no concern for simple algorithms, but more complex algorithms become much more difficult to implement without loss of fidelity.

4.1 Generic Programming Tools

Sometimes it is most convenient to program the control loop directly using generic tools. Microcontroller libraries such as AVR-lib make reading data from the sensors and writing data to the actuators straightforward [5]. For teaching purposes, we use a combination of the AVRMini Atmel-based microcontroller board, the spyglass unit for producing debugging output, and the AVR motor controller board [3].

4.2 Immersion and USB PID

Immersion, Inc. sells a number of tools to make designing haptic feedback easier. Immersion devices use "effects", which are built upon wavetables and envelopes and are handled by embedded microcontrollers. These effects can either be linked directly to button and position data using the microcontroller, or they can be controlled by the host computer via USB. The latency and jitter of USB are too high to handle the feedback loop,4 so the microcontroller maintains the feedback loop. In Immersion-compliant devices, the feedback loop controlling the motors probably runs at 1kHz or faster.5 The data sent over USB is used purely for configuring and triggering the microcontroller.

While a number of Immersion's haptic devices are not easily procured, such as the tools for the medical and automotive industries, the tools aimed at video game development are practical for creating haptic feedback in musical applications. Joysticks and steering wheels provide kinesthetic and vibrotactile feedback using motors and position sensors; mice and gamepads provide vibrotactile feedback using tactors and vibrating motors.

4.2.1 USB Physical Interface Devices

Immersion, Inc. has worked to get their protocol into the USB Human Interface Devices (HID) [11] standard in a new subsection called PID (Physical Interface Devices) [12]. To program USB PID devices, each operating system has its own API: Apple has the HID Manager and ForceFeedback APIs, Microsoft has the DDK HID and Immersion APIs, and the Linux kernel has the iforce module and the libff API.

Immersion Studio, proprietary software only for Windows, is required to create and edit Immersion effects. The available effects are classified by Immersion thusly: Vibrational (Periodic), Positional (Texture, Enclosure, Ellipse, Spring, Grid), Directional (Constant, Ramp), and Resistive (Damper, Friction, Inertia) [1]. Immersion Studio allows designers to experiment with the set of effects and build them into an object that can be integrated into and triggered within a program. It is possible to create more elaborate compound effects by combining effects with waveforms and envelopes into an object that can be triggered as a single unit.

Immersion also produces more specialized versions of its Studio program for other markets, including medical and automotive applications. Additionally, they have the VibeTonz SDK for controlling the vibrating motor in some mobile devices and the VirtualHand SDK for controlling their glove systems. In general, this special software and equipment is targeted at specific markets and is not easy to obtain.

5. GENERAL PURPOSE COMPUTERS

In contrast with the aforementioned microcontrollers, general purpose computers are much faster and support native floating-point calculations. However, general purpose computers face a considerable drawback when controlling haptic feedback: the operating system schedulers, the bus systems, and device interface protocols can interfere with the ideally deterministic timing of the control loop. Using an RS232 serial port directly can help, but the maximum sampling rate will still be limited by the scheduler.

5.1 DIMPLE

Allowing musical instrument designers to incorporate a wide range of haptic behaviors into an instrument [25], DIMPLE takes full advantage of the CHAI 3D [8] and the Open Dynamics Engine (ODE) [10] libraries. ODE models the state of the virtual world, and CHAI 3D renders visual feedback and mediates the link between the virtual and the haptic worlds. The CHAI 3D library is compatible with Windows and GNU/Linux, and it supports a wide variety of haptic interfaces including the SensAble devices and the Novint Falcon. With the SensAble Phantom Omni, the maximum sampling rate is 1kHz, which limits haptic interaction at audio frequencies. The most recent release of DIMPLE incorporates a method for sending downsampled audio-frequency data to the actuators, but the delay, which is probably longer than 5ms,6 prevents practical implementation of high-bandwidth feedback control.

5.2 TFCS

In contrast, the open-source Toolbox for the Feedback Control of Sound (TFCS) facilitates the implementation of haptic algorithms with large feedback bandwidths when using general purpose computers [14]. Virtual musical instrument models are provided via the Synthesis Toolkit (STK). Since they are implemented efficiently using digital waveguide technology, they can operate in synchrony with the haptic device at sampling rates as high as 40kHz with less than one sample of delay. The TFCS ensures that the control loop is called regularly by using the Real-Time Application Interface (RTAI) for Linux [6] and the Linux Control and Measurement Device Interface (Comedi) [9]. In multiprocessor machines, the control loop runs isolated on one processor, while all other code is executed on the remaining processors.

3 They emulate floating-point calculations using integer arithmetic, which is too slow to be useful in most haptic algorithms.
4 USB HID devices usually communicate with the host computer every 8-10ms; some devices can communicate faster, up to 1ms intervals.
5 The manufacturers do not publish these rates.
6 This theoretical lower limit has been derived during personal communication with Stephen Sinclair and still needs to be measured.

Table 2: Control Hardware

  Control       Maximum         Approximate      Native
  hardware      sampling rate   minimum delay    floating point
  ATMEL-based   ≈ 20kHz         ≈ 50μs           N
  DIMPLE        1kHz            < 1ms            Y
  TFCS          40kHz           ≈ 20μs           Y
  ASP           96kHz typ.      10ms typ.        Y

5.3 Audio Signal Processing (ASP) Environments

Most general purpose computers also come equipped with sound interfaces, so designers should consider whether a sound interface can be used for implementing the control loop. However, sound interfaces are not designed for very low-latency applications. Besides employing block-based processing, sound interfaces use sigma delta modulator converters that add considerable system delay [13]. The smallest system delay we were able to achieve on a 4.4GHz dual core AMD-based machine7 was 4ms, where fS = 96kHz. Nevertheless, this hardware/software solution is acceptable for some kinds of instruments. For example, haptic instruments that respond slowly to the environment can be implemented without problems.

7 The machine was running the Planet CCRMA distribution of Fedora Core, which has a patched kernel allowing low-latency audio.

6. ALGORITHMS

6.1 Standard Haptic Algorithms

6.1.1 Spring

An actuator induces a force F on the haptic device. Most haptic devices measure the movement of the device in response as a displacement x. Hence, the most fundamental (i.e. memoryless) haptic algorithm for these devices implements a virtual spring with spring constant k.

F = −kx (1)

The virtual spring in combination with the physical mass and damping of the haptic device forms a damped harmonic oscillator, which can be plucked or bowed. By obtaining estimates of the haptic device's velocity or acceleration, the device's damping and mass can be controlled analogously.

6.1.2 Wall

An algorithm similar to the spring implements a wall at x = 0:

F = −kx · (x > 0) (2)

Whenever the haptic device is pushed inside the virtual wall (i.e. x > 0), a spring force acts to push the device back out of the wall. So that the wall feels stiff, k should be chosen large. The maximum stiffness that a haptic device can render is governed by a fundamental limit, which is chiefly a function of the system delay, the sampling rate, and the internal physical damping of the device [19]. In general, more expensive haptic devices are required for rendering especially stiff virtual springs and walls.

6.1.3 Detents And Textures

Detents can help the musician orient himself or herself within the playing space of the instrument. Detents can be created even with 1DOF haptic devices. Figure 2 illustrates how to implement a simple detent. Near the origin, the force profile looks like that of a spring, while the force goes to zero when the position x moves further from the detent [27]. A simple potential energy argument implies that the force profile F(x) is proportional to the derivative of the terrain height z(x) (see Figure 2), allowing arbitrary terrains and textures to be created.

Figure 2: Force profile F(x) (above) and terrain height profile z(x) (below) for a simple detent.

6.1.4 Event-Based Haptics

Another effective algorithm uses the sensors to detect certain events. When an event occurs, a stored waveform is sent to the actuators. A common example in gaming is sending a recoil force waveform to the actuators when the user fires a weapon. Since virtual walls cannot be made infinitely stiff, some musical instrument designers may consider sending ticks or pulses to the haptic interface whenever the interface enters a virtual wall. This type of event-based feedback is known to improve the perception of hardness [21].

6.2 Algorithms Requiring High Sampling Rates

6.2.1 Virtual Instruments

Extensive studies on the physical modeling of acoustic musical instruments have led to the development of many different acoustic musical instrument models. One simple way to create a haptic musical instrument is to interface a haptic device with a virtual instrument according to the laws of physics [18]. For efficiency reasons, it is often convenient to run the haptic control loop at a standard haptic sampling rate, while the musical instrument model runs at a higher sampling rate to provide high-quality audio. For example, the Association pour la Création et la Recherche sur les Outils d'Expression (ACROE) often employs a haptic sampling rate of about 3kHz, while audio output is often synthesized at standard audio sampling rates, such as 44kHz. However,


ACROE sometimes employs their ERGOS device with dedicated DSP hardware to run both the haptic and audio loops at 44kHz in real-time [22].

6.2.2 Actively Controlled Acoustic Instruments

An actively controlled acoustic musical instrument is an acoustic musical instrument that is augmented with sensors, actuators, and a controller. These instruments can be considered a special case of haptic musical instruments where the interface is the entire acoustic instrument itself. For example, a monochord string can be plucked and bowed at various positions as usual, while its acoustic behavior is governed by the control hardware. Simple and appropriate control algorithms emulate passive networks of masses, springs, and dampers or implement self-sustaining oscillators [15].

7. EXAMPLES

7.1 Haptic Drum

The haptic drum is a haptic musical instrument that can be constructed out of components found in practically any computer music laboratory [2]. It employs an event-based haptics algorithm that is implemented using a woofer actuator, a general purpose computer, and an ASP environment.

Figure 3: Haptic drum

The woofer actuator conforms to the metaphor of a vibrating drum membrane. A sunglass lens is attached rigidly to the cone but held away from the sensitive surround part by way of a toilet roll (see Figure 3). Whenever a drumstick strikes the sunglass lens, it makes a loud "crack" sound. A nearby microphone (not shown) provides an input signal to a sound interface. A Pure Data patch detects drumstick collisions by thresholding the envelope of the microphone signal. Whenever a collision is detected, an exponentially-decaying pulse is sent to the woofer, which effectively modifies the coefficient of restitution of the collision. The haptic drum can be configured to make it easier to play (one-handed) drum rolls. It also facilitates playing various "galloping" and "backwards" drum rolls, which are otherwise nearly impossible to play using one hand [16]. If instead a ping pong ball is placed on the lens, and if the lens is driven sinusoidally, various period-doubling and apparently chaotic effects may be observed.

7.2 Cellomobo

The Cellomobo is an instrument allowing the musician to bow a virtual string using a haptic interface [4]. The length of the string is adjusted by a resistive ribbon controller (see Figure 4, left). The vibrating string element consists of a piezoelectric disc pickup (see Figure 4, bottom left), which is mounted upon a shaker (see Figure 4, bottom). The haptic feedback and sound synthesis algorithms run at the audio rate in Pure Data.

Figure 4: Cellomobo front (left) and back (right)

Figure 5 shows a diagram of the Cellomobo's combined haptic feedback/sound synthesis engine. The dotted box encloses the digital waveguide model of a lightly damped vibrating string. N/fS is the period of the note being played. The internal feedback loop gain g is between 0.9 and 0.999 and is controlled by a knob. HLP(z) is a lowpass filter causing the higher partials to decay more quickly [26]. The outer feedback loop is closed around the shaker and piezoelectric pickup, which provides the excitation input to the instrument. H2LP(z) is a second-order lowpass filter to remove upper partials from the feedback loop. The cut-off frequency of the filter is controlled by left-hand finger pressure, to give the musician control of tone color. Before the output signal reaches the actuator, a hard clipping nonlinearity clips off the tops of the waveform. This gives the haptic signal more of a square shape, causing the bow to release from the bowing surface more easily.

Figure 5: The Cellomobo block diagram (labels: bowing surface with piezo pickup; delays z^-A, z^-N, z^-B; gain g; filters HLP(z), H2LP(z); power amplifier; shaker)

The novel addition of the inner feedback loop is nonphysical, but it allows the instrument to be less sensitive to the dynamics of the sensor and actuator. This structure enhances the playability of the instrument and differentiates the Cellomobo from previous research efforts [22]. In fact, the behavior is so robust that the instrument functions despite the large ASP system delay (A + B)/fS ≈ 20ms. Note that this delay is an order of magnitude longer than the period of the highest note that can be played on the instrument, which is about 1ms.

[9] The Linux Control and Measurement Device Interface (Comedi).
[10] The Open Dynamics Engine (ODE).
[11] USB HID.
[12] USB PID. devclass_docs/pid1_01.pdf.
[13] M. Antila. Contemporary electronics solutions for active noise control. In Proc. Int. Symposium on Active Noise and Vibration Control, September 2004.
[14] E. Berdahl, N. Lee, G. Niemeyer, and J. O. Smith. Practical implementation of low-latency DSP for feedback control of sound in research contexts. In Proc. of the Acoustics '08 Conference, June 2008.
[15] E. Berdahl, G. Niemeyer, and J. O. Smith. Applications of passivity theory to the active control of acoustic musical instruments. In Proc. of the Acoustics '08 Conference, June 2008.
[16] E. Berdahl, B. Verplank, J. O. Smith, and G. Niemeyer. A physically-intuitive haptic drumstick. In Proc. Internat'l Computer Music Conf., volume 1, pages 363-366, August 2007.
[17] A. Bloomfield and N. I. Badler. A low cost tactor suit for vibrotactile feedback. Technical Report 66, Center for Human Modeling and Simulation, University of
Pennsylvania, 2003.
[18] N. Castagne and C. Cadoz. Creating music by means
Making haptic musical instruments is not so difficult given of ’physical thinking’: The musician oriented genesis
some forethought and knowledge about the field! Incorpo- environment. In Proc. of the Fifth Int. Conf. on
rating haptic feedback is also often worth the effort—haptic Digital Audio Effects, pages 169–174, 2002.
feedback has been shown to improve the user’s impression of
[19] N. Diolaiti, G. Niemeyer, F. Barbagli, J. K. Salisbury,
playing a haptic musical instrument [22]. Haptic feedback
and C. Melchiorri. The effect of quantization and
has been informally found to make it easier for users to play
coulomb friction on the stability of haptic rendering.
various types of drum rolls [16]. Finally, haptic feedback
In Proc. of the First Joint Eurohaptics Conf. and
has been further shown to improve the accuracy of musi-
Symp. on Haptic Interfaces, pages 237–246, 2005.
cians playing a haptic musical instrument [24].
[20] V. Hayward and K. MacLean. Do it yourself haptics,
In this paper, we presented ideas on how to practically
part 1. IEEE Robotics and Automation Magazine,
implement such instruments given today’s technology. We
14(4):88–108, December 2007.
hope our efforts will help make haptic technologies more
accessible to designers and musicians. We expect more su- [21] K. Kuchenbecker, J. Fiene, and G. Niemeyer.
perior haptic technologies to become even more accessible Improving contact realism through event-based haptic
as other fields drive haptic device development. feedback. IEEE Transactions on Visualization and
Computer Graphics, 12(2):219–230, March/April 2006.
[22] A. Luciani, J.-L. Florens, D. Couroussé, and C. Cadoz.
9. ACKNOWLEDGEMENTS Ergotic sounds. In Proc. of the 4th Int. Conf. on
We would like to thank all of the people at or from CCRMA Enactive Interfaces, pages 373–376, November 2007.
who have helped us and inspired us to study haptics: Bill [23] M. Marshall and M. Wanderley. Vibrotactile feedback
Verplank, Julius O. Smith III, Günter Niemeyer, Chris Chafe, in digital musical instruments. In Proceedings of the
Sile O’Modhrain, Brent Gillespie, and Charles Nichols. 2006 International Conference on New Interfaces for
Musical Expression (NIME06), Paris, France, 2006.
10. REFERENCES [24] S. O’Modhrain and C. Chafe. Incorporating haptic
[1] Immersion fundamentals. feedback into interfaces for music applications. In Proc. of ISORA, World Automation Conference, 2000.
ImmFundamentals/HTML/ImmFundamentals.htm. [25] S. Sinclair and M. Wanderley. Extending dimple: a
[2] rigid body haptic simulator for interactive control of
HapticDrum. sound. In Proc. of 4th Int. Conf. on Enactive
Interfaces, November 2007.
[26] J. O. Smith. Physical Audio Signal Processing: For
Virtual Musical Instruments and Audio Effects.
cellomobo.html.˜jos/pasp/, 2007.
[27] B. Verplank. Haptic music exercises. In Proc. of the
[6] Int. Conf. on New Interfaces for Musical Expression,
[7] pages 256–257, 2005.


Considering Virtual & Physical Aspects

in Acoustic Guitar Design

Amit Zoran, MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139
Pattie Maes, MIT Media Laboratory, 20 Ames Street, Cambridge, MA 02139

ABSTRACT
This paper presents a new approach for designing acoustic guitars, making use of the virtual environment. The physical connection between users and their instruments is preserved, while offering innovative sound design. This paper discusses two projects: reAcoustic eGuitar, the concept of a digitally fabricated instrument to design acoustic sounds, and A Physical Resonator For a Virtual Guitar, a vision in which the guitar can also preserve the unique tune of an instrument made from wood.

Keywords
Virtual, acoustic, uniqueness of tune, expressivity, sound processing, rapid prototype, 3D printing, resonator.

1. BACKGROUND
Each acoustic instrument made of wood is unique. Each piece of wood is different, leading to uniqueness of tune in the acoustic sound that is created. Uniqueness and expressivity are the most important characteristics of the acoustic instrument. Digital instruments lack this uniqueness but usually allow more sound flexibility [1], by offering digital sound processing or synthesis [2].

Digital keyboard instruments have been significantly more successful than bowed or plucked ones, which suffered from lack of expressivity and uniqueness of tune. On the one hand, the digital instrument can add new interfaces, controllers and sound abilities to the musical experience. On the other hand, there is a significant cost for modeling the captured information into a pre-defined digital structure. Besides the processing problem, it usually leads to decreasing or canceling the uniqueness of tune and expressivity of the instrument.

The main approach to dealing with the expressivity problem lies in the field of sound processing, instead of synthesis. One option is to capture an expressive signal and modify some parameters while preserving the expressive behavior [3].

We suggest a different approach. We believe that significant work can be done by combining benefits from both worlds (digital and physical): preserving the values of acoustic instruments while applying digital control to their structures.

1.1 Acoustic, Electric and Virtual Guitar
The design of a guitar is influenced by its cultural context. For thousands of years lutes and, later, guitars evolved: starting with ancient instruments that were made out of natural chambers (turtle shells, gourds), through fine handmade wooden chambers [4], to electrically amplified guitars. Carfoot [7] presents and analyzes the huge changes the guitar underwent in the 20th century; electric guitars, which use electricity to amplify instead of chambers, evolved at mid-century and were part of the musical revolution of Rock & Roll and its distortion sound.

The guitar has been influenced by electrical technologies, and it is to be expected that digital technologies will now take a significant part in the guitar’s evolution. While sound design has conventionally been done using digital software, expressive digital instruments are starting to appear as well. The Line 6 Variax [5] guitar gives a variety of preset sounds, from classic acoustic and electric tones to sitar and banjo. It allows the player to plug into a computer and customize a chosen tone. Expressive playing and sound flexibility are enhanced with the digital guitar. Another example is Fender’s VG Stratocaster [6], a hybrid electric and digital guitar.

Carfoot uses the term virtual instead of digital. If digital defines the type of process being done, virtual refers better to an experience’s context. Like virtual reality, the virtual sound created in a digital environment imitates a real life experience. This experience feels natural to our senses, but it was created with a computer model of that real life experience. In sections 2 and 3 we present our approach of using the virtual sound experience in order to create a new physical guitar (a conceptual work). In section 4 we present a different vision in which the guitar can also preserve the unique tune of a material (a work in progress).

2. COMBINING VIRTUAL AND PHYSICAL IN GUITAR DESIGN
3D design, sound design and digital music software are becoming common and easier to use. Their combination is leading to the possibility of designing, simulating and printing objects according to pre-required acoustic behavior.
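As a toy illustration of “pre-required acoustic behavior” (not from the paper), the main air resonance of a printed chamber can be estimated with the textbook Helmholtz formula before printing; the dimensions below are hypothetical:

```python
import math

def helmholtz_hz(volume_m3, port_area_m2, port_len_m, c=343.0):
    """Classic Helmholtz resonance: f = (c / 2*pi) * sqrt(S / (V * L_eff)).

    L_eff adds an end correction of ~1.7 * port radius to the physical
    neck length (a common textbook approximation)."""
    r = math.sqrt(port_area_m2 / math.pi)
    l_eff = port_len_m + 1.7 * r
    return (c / (2.0 * math.pi)) * math.sqrt(port_area_m2 / (volume_m3 * l_eff))

# A guitar-sized body (~0.012 m^3) with a 100 mm soundhole in a 4 mm plate
f = helmholtz_hz(volume_m3=0.012, port_area_m2=math.pi * 0.05**2, port_len_m=0.004)
```

Halving the chamber volume raises the resonance, which is one simple way a designer could trade chamber size against the target pitch of a sound cell.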


Gershenfeld [8] presents a future realm in which personal 3D printers become as common as color printers. RedEye RPM [9] is a rapid prototyping company that creates guitars using digital manufacturing technology. Synthetic materials, such as carbon-fiber epoxy composites, could be used instead of wood in guitar soundboards [10]. Blackbird Guitars created the Blackbird Rider Acoustic [11], a commercial guitar digitally designed and made from composite materials. This kind of new material enables a significant decrease of the chamber’s size while preserving the instrument’s loudness.

3. REACOUSTIC EGUITAR
Three perspectives are fundamental to the sound experience created by a musical instrument: the listener, the performer and the instrument constructor [12].

The vision of reAcoustic eGuitar invites players to become creators of their acoustic instruments and their sounds, with endless possibilities for the sounds to be re-shaped. Players will customize their own sounds by assembling different small chambers instead of using a single large one. Each string has its own bridge; each bridge is connected to a different chamber. Changing a chamber’s size, material or shape will change the guitar’s sound.

Designing sounds digitally allows the player to share the experience of the constructor. This might lead to a change in the relationship between players and their instruments. Today rapid prototype materials have a broad range of qualities. Players can now take part in designing their own acoustic sounds, by modifying the physical structure of their instruments, revealing the characteristics of new materials (see Figure 1).
We created a simple chamber in a rapid prototype process. This chamber adds significant amplification to a single string (see Figure 2)1, even without optimizing acoustical parameters such as membrane thickness and sound box size.
In the reAcoustic eGuitar vision, digital technology will be used to design the acoustic guitar structure (see Figure 3 for a design suggestion). It presents a novel sound design experience between users, their objects and the digital environment.

Re-designing the guitar according to the characteristics of rapid prototyping materials could lead to sound innovations. Open source and shared file environments could create a reality in which a player downloads or designs his own sound cells, and plugs them into his instrument (see Figure 4).

Starting from virtual sound, getting the desired virtual shape and then printing it, the reAcoustic eGuitar offers a new user experience for the guitar player.

The main disadvantage of the reAcoustic eGuitar concept lies in the rapid prototype process itself. The process is expensive and doesn’t preserve uniqueness of tune as wood does. Perhaps in a few years, 3D printers will become less expensive and more accessible, so this idea can be reconsidered.

Figure 1: Constructing principles: searching, downloading, modifying, printing and assembling the chambers.

Figure 2: The 3D printed chamber connected to a single string on a wood structure vs. a string on a wood structure without a chamber.

1 The 3D printed chamber from 3D Systems InVision HR 3-D is presented in, January 27, 2008.


unique tool that also enables the player to design the required
sound with the computer.
The uniqueness of a musical instrument influences more than just
its sound. By differing itself from other instruments, it assumes an
individual economic value and stabilizes a unique relationship
with its owner. The structure of the wood is the main reason for
the acoustic instrument’s unique behavior. The grain of the
soundboard [13], the wood’s humidity, the exact thickness and
more influence how it transfers different frequencies. Luthiers
[14,15] used their experience in order to tune the instrument by
making modification to the wood until it gave the required results.
A Physical Resonator For A Virtual Guitar focuses on the
influences of the chamber on the sound of the acoustic guitar. The
chamber’s main parameters are the shape and material [14,15].
The structure and shape can be virtually designed on a computer
and be used as a virtual chamber. The material will not be
synthesized or modulated. In this way we will get a hybrid
chamber – part of it is physical (the guitar’s resonator) and part of
it is virtual (see Figure 5).
A replaceable slice of the material (the guitar resonator) will be
connected to the guitar bridge using mechanism that enables easy
replacement. Piezo sensors will capture the frequencies being
developed on the guitar’s resonator. The signal will be transferred
to a digital signal-processing unit (DSP). The DSP will modify the
sound by simulating different chambers shapes and sizes,
thickness and surface smoothness.
Figure 3: reAcoustic eGuitar, a design suggestion.

Figure 5: Physical resonator in virtual shape.

By combining the virtual with the physical, we believe we can

preserve both worlds’ values. More than that, the new approach of
the physical resonator can play an important role in continuing the
traditional relationship between players and their unique
Figure 4: Examples of different chambers. instruments. The digital part can be replaced and updated; the
resonators can be collected and saved. A player could take one
guitar body with many resonators, instead of a lot of guitars.
The use of a physical resonator is not limited to wood. The
4. A PHYSICAL RESONATOR FOR A resonator can also be created in a rapid prototype process; similar
VIRTUAL GUITAR to the concept presented in section 3.
The former project led to a new vision, A Physical Resonator For
A Virtual Guitar. It is a concept of combining the values of the
virtual guitar with the uniqueness of the wooden acoustic guitar’s
tune. By doing so we can achieve expressive playability in a


5. CONCLUSION AND FUTURE WORK
We believe that the future of the guitar lies in the connection between digital sound design and acoustic experience. Digital processing can create new options for sound design, where the acoustic part of the instrument will give the expressivity and uniqueness of tune. The reAcoustic eGuitar concept is based on rapid prototype techniques and 3D printers. This process is expensive and not accessible to the majority of guitar players. There is not enough knowledge and experience of using rapid prototyping for creating acoustic instruments. However, we believe that this may become more feasible in the future.

A Physical Resonator For A Virtual Guitar is a work in progress. We believe that by creating a chamber that is part virtual and part physical, we will preserve expressivity and uniqueness of tune in digital sound design innovations. We intend to develop a working model for A Physical Resonator For A Virtual Guitar. This process will be divided into different parts, from a mechanical solution for the replaceable resonator through development of a piezo sensor system that will be able to capture the resonator vibration in different locations. We also intend to develop a DSP unit that will implement the digital modeling of the structure.

6. ACKNOWLEDGMENTS
The authors want to thank the MIT Media Laboratory, Marco Coppiardi, Cati Vaucelle, Nan-Wei Gong and Tamar Rucham for their help and support.

7. REFERENCES
[1] Magnusson T., Mendieta E. H. The Acoustic, the Digital and the Body: A Survey on Musical Instruments. NIME 07, June 6-10, 2007. New York, New York, USA.
[2] Poepel C., Overholt D. Recent Developments in Violin-related Digital Musical Instruments: Where Are We and Where Are We Going? NIME 06, June 4-8, 2006. Paris, France.
[3] Merrill D., Raffle H. The Sound of Touch. CHI 2007, April 28 - May 3, 2007, San Jose, California, USA.
[4] Jahnel F. (1962). Manual of Guitar Technology: The History and Technology of Plucked String Instruments. English version of Die Gitarre und ihr Bau by Harvey J. C. The Bold Strummer Ltd, First English Edition, 1981.
[5] Line 6, Variax®. Product website. Last accessed: January 27, 2008.
[6] Fender, VG Stratocaster®. Product website. Last accessed: January 30, 2008.
[7] Carfoot G. Acoustic, Electric and Virtual Noise: The Cultural Identity of the Guitar. Leonardo Music Journal, Vol. 16, pp. 35-39, 2006.
[8] Gershenfeld N. FAB: The Coming Revolution on Your Desktop - From Personal Computers to Personal Fabrication, pp. 3-27. Basic Books, April 12, 2005.
[9] RedEye RPM. Guitar with digital manufacturing technology. Company website. Last accessed: January 27, 2008.
[10] Jonathan H. Carbon Fiber vs. Wood as an Acoustic Guitar Soundboard. PHYS 207 term paper.
[11] Blackbird Guitars. Blackbird Rider Acoustic. Company website. Last accessed: January 27, 2008.
[12] Kvifte T., Jensenius A. R. Towards a Coherent Terminology and Model of Instrument Description and Design. NIME 06, June 4-8, 2006. Paris, France.
[13] Buksnowitz C., Teischinger A., Muller U., Pahler A., Evans R. Resonance wood [Picea abies (L.) Karst.] - evaluation and prediction of violin makers’ quality-grading. J. Acoustical Society of America 121, 2007.
[14] Kinkead J. Build Your Own Acoustic Guitar: Complete Instructions and Full-Size Plans. Published 2004 by Hal Leonard.
[15] Cumpiano W. R., Natelson J. D. Guitarmaking: Tradition and Technology: A Complete Reference for the Design & Construction of the Steel-String Folk Guitar & the Classical Guitar (Guitar Reference). Published 1998 by Chronicle Books.


Virtual Intimacy: Phya as an Instrument

Dylan Menzies
Dept. Computer Science and Engineering
De Montfort University
Leicester, UK

ABSTRACT
Phya is an open source C++ library originally designed for adding physically modeled contact sounds into computer game environments equipped with physics engines. We review some aspects of this system, and also consider it from the purely aesthetic perspective of musical expression.

Keywords
NIME, musical expression, virtual reality, physical modeling, audio synthesis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
The use of impact sounds coupled to a modeled environment was introduced in [5]. Refinements of impact sound models have since been made [1]. The first working models for sustained contact sounds integrated with a physical environment were made in [13], greatly expanding the overall realism of the simulation by relating audio and visual elements continuously. Frictional models have been created for musical instruments, and have also been applied to surfaces in simulated environments [2]. Further models have been proposed for other environmental sounds including fluids [12]. In [7] a framework was presented for a physical audio system, designed to operate closely with a physics engine providing rigid body dynamics. The emphasis was on using robust techniques that could be scaled up easily, and accommodate an environment that was rapidly changing. This work developed into the Phya physical audio library discussed here.

The principal goal for Phya has been to generate sounds that arise from dynamical interactions that are either clearly visually apparent, or directly affected by user control. This is because when audio can be closely causally correlated to other percepts, the overall perceptual effect and sense of immersion is magnified considerably. A wide selection of sounds falls into this category, including collisions between discrete solid and deformable objects. The complex dynamics of these objects is captured well by the many physics engines that have been developed. The audio generated is a modulation of the audio rate dynamics of excitation and resonance, by the relatively slow large scale dynamics of objects. Simple audio synthesis processes can lead to convincing results, but only if coupled carefully with the large scale dynamics.

While physical modeling provides a powerful way to generate strong percepts, a balance must be struck on the level of detail, so that the output is not overly constrained. In practice this leads to the development of semi-physical-perceptual models that provide some freedom for the sound designer to more easily mould a virtual world.

It was apparent from early on that Phya offers an inherently musical experience, even from the limited control environment of a desktop. The richness of dynamic behavior and multi-modal feedback are characteristic of musical performance. A later section explores this further. Use of coupled musical-visual performances has become common; however, performances within a physical audio-visual world are still apparently scarce, as are physical audio-visual worlds in computer games. This state of affairs has prompted this article.

2. TECHNOLOGICAL REVIEW
Below we briefly review the components of Phya, and the overall structure used to accommodate them.

2.1 Impacts

2.1.1 Simple spring
The simplest impacts consist of a single excitation pulse, which then drives the resonant properties of the colliding objects. The spectral brightness of the pulse depends on the combined hardness of the two surfaces. Using a spring model, the combined spring constant, which determines the duration and so the spectral profile of a hit, is k = (k1^-1 + k2^-1)^-1, where k1 and k2 are the spring constants of the individual surfaces. A model which just takes k to be the lesser value is also adequate. The duration is π√(m/k), where m is the effective mass (m1^-1 + m2^-1)^-1. The effective mass can be approximated by the lesser mass. If one object is fixed like a wall, the effective mass is the free object’s mass.

The impact displacement amplitude in this model is A = v√(m/k), where v is the relative normal contact speed. To give the sound designer more freedom over the relation between collision parameters and the impact amplitude, a linear breakpoint scheme is used, with an upper limit also providing a primary stage of audio level limiting. Note that the masses used for impact generation do not have to be in exact proportion to the dynamics engine masses.

Audio sensitivity to surface hardness and object mass helps to paint a clearer picture of the environment. From a musical perspective it adds variation to the sound that can be generated, in an intuitive way.
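The spring-model formulas above can be collected into a few lines (an illustrative Python sketch; Phya itself is C++ and this is not its API):

```python
import math

def impact_params(k1, k2, m1, m2, v):
    """Impact pulse parameters from the lumped spring model:

    k = (k1^-1 + k2^-1)^-1   combined spring constant
    m = (m1^-1 + m2^-1)^-1   effective mass
    duration = pi * sqrt(m / k)
    A = v * sqrt(m / k)      with v the relative normal contact speed
    """
    k = 1.0 / (1.0 / k1 + 1.0 / k2)
    m = 1.0 / (1.0 / m1 + 1.0 / m2)
    dur = math.pi * math.sqrt(m / k)
    amp = v * math.sqrt(m / k)
    return k, m, dur, amp
```

Note how harder surfaces (larger k1, k2) shorten the pulse and hence brighten its spectrum, which is the behavior the stiffness discussion below builds on.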
2.1.2 Stiffness


Figure 1: Displacements from three impacts, one of which is stiff. (Plot labels: constant k/m pulses; pulse shorter because k increases above threshold.)

Figure 2: A grazing impact. (Diagram labels: contact layer; contact surface.)

Impact stiffness is important for providing cues to the listener about impact dynamics, because it causes spectral changes in the sound depending on impact strength, whereas impact strength judged from the amplitude level of an impact received by a listener is ambiguous because of the attenuating effect of distance. Stiffness can be modeled by making the spring constant increase with impact displacement. This causes an overall decrease in impact duration for an increase in impact amplitude, and makes it spectrally brighter, as illustrated in Figure 1. The variation in stiffness with impulse is a property of the surface and can be modeled reasonably well with a simple breakpoint scheme that can be tuned by the sound designer directly. Increasing brightness with note loudness is an important attribute of many musical instruments, acoustic and electronic, and is rooted in our everyday physical experience. It might even be called a universal element of expression. Phya incorporates this behavior naturally.

2.1.3 Multiple hits and grazing
Sometimes several hits can occur in rapid succession. A given physics engine would be capable of generating this impact information down to a certain time scale. The effect can be simulated by generating secondary impulses according to a simple Poisson-like stochastic process, so that for a larger impact the chance of secondary impacts increases. Also common are grazing hits, in which an impact is associated with a short period of rolling and sliding. This is because the surfaces are uneven, and the main impulse causing the rebound occurs during a period of less repulsive contact. Such fine dynamics cannot be captured by a typical physics engine. However, good results can be achieved by combining audio impulse generation with continuous contact generation, according to the speed of collision and angle of incidence, see Figure 2. The component of velocity parallel to the surface is used for the surface contact speed.

2.2 Continuous contacts

2.2.1 Basic model
Continuous contact generation is a more complex process. The first method introduced, [13], was to mimic a needle following the groove on a record. This corresponds to a contact point on one surface sliding over another surface, and is implemented by reading or generating a surface profile at the contact point to generate an audio excitation.

Rolling is similar to sliding, except there is no relative movement at the contact point, resulting in a spectrally less bright version of the sliding excitation. This can be modeled by appending a lowpass filter that can be varied according to the slip speed at the contact, creating a strong cue for the dynamics there. See Figure 3. A second order filter is useful to shape the spectrum better. The contact excitation is also amplified by the normal force, in the same way impacts are modified by collision energy. More subtle are modifications to spectral brightness according to the m/k ratio that determines the brightness of an impact. Low m/k corresponds to a light needle reading the surface at full brightness. Heavier objects result in slower response, which can be modeled again by controlling the lowpass filter. Although simple, this efficient model is effective because it takes in the full dynamic information of the contact and uses it to shape the audio, which we then correlate with the visual portrayal of the dynamics. It is also easily customized to fit the sound designer’s requirements. When flat surfaces are in contact over a wide area, this can be treated as several spaced out contact points, which can often be supplied directly by the dynamics-collision system.

Figure 3: Surface excitation from rolling and sliding. (Signal chain: contact speed/position drives a surface profile generator; a lowpass filter whose frequency is set by slip speed, and a gain set by normal force, shape the excitation.)

2.2.2 Contact jumps
Even for a surface that is completely solid and smooth, the excitations do not necessarily correspond very well with the surface profile. A contact may jump, creating a small micro-impact, due to the blunt nature of the contact surfaces, see Figure 4. The sound resulting from this is significant and cannot be produced by reading the surface profile directly. Again, the detailed modeling of the surface interactions is beyond the capabilities available from dynamics and collision engines, which are not designed for this level of detail. Good results can instead be achieved by adding the jumps, pre-processed, into the profile, Figure 5. Downsampling a jump results in a bump, unless it is sampled with sufficient initial resolution, which may be impractical. A useful variation is therefore to downsample jumps to jumps, by not interpolating. This retains the ’jumpiness’ and avoids the record-slowing-down effect.

2.2.3 Programmatic and stochastic surfaces
Stored profiles can be mapped over surface areas to create varying surface conditions. This can be acceptable for sparse jump-like surfaces that can be encoded at reduced sample rates, but in general the memory requirements can be unreasonable. An alternative is to describe surfaces programmatically, either in a deterministic or fully stochastic way. The advantage of a largely deterministic process is that repetitions of a surface correlate closely, for instance when something is rolling back and forth, providing consistency cues to the dynamic behavior even without visuals. Indexable random number generators provide a way to deterministically generate random surfaces. Others include


repeating functions to generate pattern based surfaces such as grids.

Figure 4: Micro-impact occurring due to contact geometry.

Figure 5: Preprocessing a surface profile.

A useful range of surfaces can be generated by stochastically generating pulses of different widths, with control over the statistical parameters. A change of contact speed is then achieved by simply varying the parameters.

Secondary excitations can also be generated stochastically, for instance to simulate the disturbance of gravel on a surface, in a similar manner to the physically informed footsteps in [3]. In this scheme the collision parameters are used to determine the activity rate of a Poisson-like process, which then generates impulses mimicking the collisions of gravel particles (Figure 6). A low-frequency lowpass filter is used to simulate the duration of the particle spray following an impact. The impulses have randomly selected amplitudes, and are shaped or filtered to reflect increased particle collision brightness with increased contact force and speed, before exciting a particle resonance. The model simplifies matters by ignoring the fact that at high system collision energies there will still be particle collisions occurring at low energy. It also assumes all particles have the same resonance. The model does, however, have sufficient dynamic temporal and spectral behavior to be interesting. Three levels of dynamics can be distinguished here: the gross object dynamics, the simulated gravel dynamics, and the audio resonance. The detail that can be encoded in surface excitations is critical from the musical point of view: it provides the foundation from which the full sound evolves.

Figure 6: Modeling loose surface particle sound (tangent speed and normal force drive a lowpass filter, a Poisson event generator, random-amplitude pulses, a further lowpass filter, and a particle resonance).

2.2.4 Friction
Friction stick and slip processes are important in string instruments. In virtual environments they are a much less common source of sound than the interactions considered so far. A good example is door creaking, which is visually linked to the movement of the door. Stick and slip for discrete solid objects is simulated well by the generation of pulses at regular linear or angular intervals, with the amplitude and spectral profile of the pulses modified as the contact force and speed change. As contact force increases, the interval between each pulse normally increases, due to the increased static friction limit, with a more or less constant lateral impact spring constant.

2.2.5 Buzzing
Buzzing and rattling at a contact are common phenomena, caused by objects in light contact that have been set vibrating. Like impact stiffness, buzzing provides a distant cue of dynamic state, which in this case is the amplitude of vibration. Objects that are at first very quiet can become loud when they begin to buzz, due to the nonlinear transfer of low-frequency energy up to higher frequencies that are radiated better. Precise modeling of this with a dynamics-collision engine would be infeasible. However, the process can be modeled well by clipping the signal from the main vibrating object, as shown in Figure 7, and feeding it to the resonant objects that are buzzing against each other. The process could be made more elaborate by calculating the mutual excitation due to two surfaces moving against each other.

Figure 7: Clipping of resonator output to provide buzz excitation.

2.3 Resonators
2.3.1 Modal resonators, calibration, location dependence
There are many types of resonator structure that have been used to simulate sounding objects. For virtual environments we require a minimal set of resonators that can be easily adapted to a wide variety of sounds, and can be run efficiently in numbers. The earliest forms of resonator used for this purpose were modal resonators [5, 13], which consist of parallel banks of second-order resonant filters, each with individual coupling constants and damping. These are particularly suited to objects with mainly sharp resonances, such as solid objects made from glass, stone and metal. It is possible to identify spectral peaks in the recording of such an object, and also the damping, by tracking how quickly each peak decays [11]. A command line tool is included with Phya for automating this process. The resultant data is many times smaller than even a single collision sample. Refinements to this process include sampling over a range of impact points, and using spatial sound reconstruction; the associated complexities were not considered a priority in Phya. Hitting an object in different places produces different sounds, but even hitting an object in the same place repeatedly produces different sounds each time, due to the changing state of the resonant filters. It is part of the attraction of physical modeling that such subtleties are manifested. If needed, a collision object can be broken up into


several different collision objects, and different Phya sound objects associated with these.

2.3.2 Diffuse resonance
For a large enough object of a given material the modes become very numerous and merge into a diffuse continuum. This coincides with the emergence of time-domain structure at scales of interest to us, so that, for instance, a large plate of metal can be used to create echoes and reverberation. For less dense, more damped material such as wood, pronounced diffuse resonance occurs at modest sizes, for instance in chairs and doors. Such objects are very common in virtual environments, yet a modal resonator cannot efficiently model diffuse resonance, or be matched to a recording. Waveguide methods have been employed to model diffuse resonance, either using abstract networks, including banded waveguides [4] and feedback delay networks [9], or more explicit structures such as waveguide meshes [14, 15]. An alternative approach, introduced in [6], is to mimic a diffuse resonator by dividing the excitation into frequency bands, and feeding the power in each into a multi-band noise generator, via a filter that generates the time decay for each band, see Figure 8. This perceptual resonator provides a diffuse response that follows the input spectrum. When combined with modal modeling for lower frequencies it can efficiently simulate wood resonance, and can be easily manipulated by the sound designer. A similar approach had been used in [10] to simulate the diffuse resonance of sound boards to hammer strikes; the difference here is that the resonator follows the spectral profile of a general excitation.

Figure 8: Outline of a perceptual resonator (each band comprises a bandpass filter, an envelope follower, a lowpass decay filter and a gain, driving a bandpass noise generator).

2.3.3 Surface damping
A common feature of resonant objects is that their damping factors are increased by contact with other objects. For instance, a cup placed on a table sounds less resonant when struck. This behavior has a strong visual-dynamic coupling, and provides information about the surfaces. It can be simulated by accumulating a damping factor for each resonator as a sum of damping factors associated with the surfaces that are in contact.

2.3.4 Nonlinear resonance
Many objects enter non-linear regimes when vibrating strongly, sometimes causing a progressive shift of energy to higher frequencies. For a modal system this can be modeled by exciting higher modes from lower modes via nonlinear couplings. In waveguide systems the non-linearities can be built into the network.

2.3.5 Deformable objects
There is a common class of objects that are not completely rigid, but still resonate clearly, for example a thin sheet of metal. Such objects have variable resonance characteristics depending on their shape. While explicit modeling of the resonance parameters according to shape is prohibitive, an excellent qualitative effect that correlates well with the visual dynamics is to vary the resonator parameters about a calibrated set, according to variations of shape from the nominal. This can be quantified in a physical model of a deformable body by using stress parameters or linear expansion factors. The large-scale oscillation of such a body modulates the audio frequencies, providing an excellent example of audiovisual dynamic coupling.

2.4 Phya overall structure and engine
Phya is built in the C++ language, and is based around a core set of general object types that can be specialized and extended. Sounding objects are represented by a containing object called a Body, which refers to an associated Surface and Resonator object, see Figure 9. Specializations of these include SegmentSurface for recorded surface profiles, RandSurface for deterministically generated stochastic surfaces, and GridSurface for patterns. The resonator subtypes are ModalResonator and PerceptualResonator. Bodies can share the same surface and resonator if required, in order to handle groups of objects more efficiently. Collision states are represented using Impact and Contact objects that are dynamically created and released as collisions occur between physical objects. These objects take care of updating the state of any associated surface interactions.

Figure 9: Main objects in Phya (each Body refers to a Surface and a Resonator; pairs of bodies are linked through impact and contact generators to Impact and Contact objects).

2.4.1 System view
The top level system view is shown in Figure 10. The collision system in the environment simulator must generate trigger updates in Phya's collision update section, for example using a callback system. This in turn reads dynamic information from the dynamics engine and updates parameters that are used by the Phya audio thread to generate audio samples. The most awkward part of the process is finding a way for Phya to keep track of continuous contacts.

Figure 10: Phya system overview (the physics engine's dynamics and collision stages feed Phya's collision update and DSP/audio stages).
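The object layout of Sec. 2.4 and the modal bank of Sec. 2.3.1 can be sketched together as follows. This is a minimal illustration, not the actual Phya API: the class names Body, Surface, ModalResonator and Contact follow the text, but their members, and the two-pole filter design used per mode, are assumptions made here for clarity.

```cpp
#include <cassert>
#include <cmath>
#include <memory>
#include <vector>

// Parallel bank of second-order resonant filters, one per mode (Sec. 2.3.1).
// Each mode has its own frequency, damping and coupling gain. The two-pole
// design below is an illustrative choice, not Phya's actual implementation.
struct ModalResonator {
    struct Mode { float a1, a2, gain, y1, y2; };
    std::vector<Mode> modes;

    static Mode makeMode(float freqHz, float damping, float gain, float sr) {
        float r = std::exp(-damping / sr);          // pole radius: decay rate
        float w = 2.0f * 3.14159265f * freqHz / sr; // pole angle: frequency
        return {2.0f * r * std::cos(w), -r * r, gain, 0.0f, 0.0f};
    }

    float tick(float excitation) {                  // one audio sample
        float out = 0.0f;
        for (Mode &m : modes) {
            float y = m.gain * excitation + m.a1 * m.y1 + m.a2 * m.y2;
            m.y2 = m.y1;
            m.y1 = y;
            out += y;
        }
        return out;
    }
};

// A Surface turns contact speed and force into a continuous excitation; the
// linear product used here stands in for the surface models of Sec. 2.2.
struct Surface {
    virtual float excite(float speed, float force) { return speed * force; }
    virtual ~Surface() = default;
};

// A Body refers to a Surface and a Resonator, which may be shared between
// bodies (Sec. 2.4). A Contact pairs two bodies while they stay in contact.
struct Body {
    std::shared_ptr<Surface> surface;
    std::shared_ptr<ModalResonator> resonator;
};

struct Contact {
    Body *body1, *body2;
    float speed = 0.0f, force = 0.0f; // refreshed from the dynamics engine
    float tick() {
        float ex = body1->surface->excite(speed, force);
        return body1->resonator->tick(ex) + body2->resonator->tick(ex);
    }
};
```

Calibration data from the command line tool mentioned above would populate `modes`; struck with a single impulse, a mode rings up and then decays under its damping factor.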


2.4.2 Tracking contacts
Most collision engines do not use persistent contacts, meaning they forget information about contacts from one collision frame to another. Phya, on the other hand, needs to remember contacts, because it has audio processes that generate excitations continuously during a contact. The problem can be attacked either by modifying the collision engine, which may be hard or impossible, or by searching contact lists. In the simplest case, the physics engine provides a list of non-persistent physical contacts at each collision step, and no other information. For each physical contact, the associated Phya bodies can be found and compared with a list of current Phya contact pairs. If no pair matches, a new Phya contact is formed. If a pair is found, it is associated with the current physical contact. For any pairs left unmatched, the associated Phya contact is released. See Figure 11. This works on the mostly true assumption that if a physical contact exists between two bodies in two successive frames, then it is a single continuous contact evolving. If two bodies are in contact in more than one place then some confusion can occur, but this is offset by the fact that the sound is correspondingly more complex. Engines that keep persistent contacts are easier to handle, and the ability to generate callbacks when contacts are created and destroyed helps even more.

Figure 11: Finding a Phya contact from a physical contact (the bodies of each physical contact are looked up in the list of current Phya contacts).

2.4.3 Smooth surfaces
Another problem of continuous contacts arises from the collision detection of curved surfaces. For example, the collision of a cylinder can be detected using a dedicated algorithm, or a more general one applied to a collision net that approximates a cylinder. From a visual-dynamic point of view the general approach may appear satisfactory. However, the dynamic information produced may lead to audio that is clearly consistent with an object that has corners and is not smooth. A way to improve this situation is to smooth the dynamic information, using linear filters, when the surface is intended to be smooth. This requires Phya to check the tags on the physical objects associated with a new contact to see if smoothing is intended.

2.4.4 Limiters
The unpredictable nature of physical environmental sound requires automated level control, both to ensure it is sufficiently audible and also not so loud as to dominate other audio sources or to clip the audio range. This has already been partly addressed at the stage of excitation generation; however, because of the unpredictability of the whole system, it is also necessary to apply limiters to the final mix. This is best achieved with a short look-ahead brick-wall limiter, which can guarantee a limit while also reducing the annoying artifacts that would occur without any look-ahead. Too much look-ahead would compromise interactivity; however, the duration of a single audio system processing vector, typically 128 samples, is found to be sufficient.

3. A VIRTUAL MUSICAL INSTRUMENT
While Phya was designed for general purpose virtual worlds, the variety and detail of sonic interactions on offer lend themselves to the creation of musical virtual instruments. From a more abstract view, the layered, multi-scale dynamics within Phya capture the layered dynamics present in real acoustic instruments. It is sometimes claimed that this structure is particularly relevant to musical performance [8]. Electronic performance systems often fail to embody the full range of dynamic scales, even within physically modeled instruments, which sometimes lack physical control interfaces with appropriate embedded dynamics.

Although grounded in physical behavior, and therefore naturally appealing to human psychology, the intimate interactions can be tailored to more unusual simulations that would be difficult or impossible in the real world. For instance, very deep resonances can easily be created that would otherwise require very heavy objects, and unusual resonances can be created. Likewise, the parameters of surfaces can be composed to ensure the desired musical effect. The physical behavior of objects can be matched to any desired scale of distance, time or gravity. Because the graphical world is virtual, it too can be composed artistically with more freedom than the real world.

The graphical output not only provides additional feedback to the performer, but adds the kind of intimate visual association that is present in traditional musical performance but lacking in much live electronic music, especially that focused around keyboard and mouse control. Phya provides the audience with an alternative to the performer as a visual focus. The mouse interface is readily extended to a more haptically and visually appropriate controller using a device such as a Nintendo Wii remote. This has the effect of making the control path correspond directly to the object path, improving the sense of immersion for the performer. In a CAVE-like environment the performer can maneuver within a spatial audio environment, although without an audience. In a full headset virtual reality environment, the performer can interact directly with objects through virtual limbs, with virtual co-performers and a virtual audience.

While Phya has not yet been used to produce an extended musical work, we discuss musical aspects of some demonstrations. Figures 12 and 13 show simple examples of sonic toys constructed with Phya. In the first, nested spheres form a kind of virtual rattle, with the lowest resonance associated with the biggest sphere. The user interacts by dragging the middle sphere around by invisible elastic. The second shows a deformable teapot with a range of resonances. The deformation parameters are used to modify the resonant frequencies on the fly. The effect is at once familiar and surreal. Further examples demonstrate the stacking of many different resonant blocks. Configuring groups of blocks becomes a musical, zen-like process.

Figure 12: Nested sonic spheres.

4. COPING WITH NETWORK LATENCY
There has been considerable interest in collaborative interactive musical performance over networks. One aspect of such systems is the delay, or latency, required to transmit information around the network, which can be musically significant for long distance collaborations. In the case of


performance with acoustic instruments, it is impossible to make each side hear the same total performance while also playing their instruments normally. Virtual instruments of the kind described here offer another possibility, due to the fact that the dynamics of the virtual world is strictly separated from the control in the outer world. Figure 14 shows a collaboration between two performers across a network. Adding local delays to match the network latency keeps the two virtual worlds synchronized. In each world the audio and graphical elements are of course synchronized. Performance gestures are delayed, but this is not such a severe handicap, because the visual feedback remains synchronized, and it is a price worth paying to maintain overall synchronization over the network. If control is by force rather than position, the gesture delay is even less intrusive. To eliminate drift between the virtual worlds, and to handle many performers efficiently, a central virtual world can be used, as shown in Figure 15. This adds return latency delays.

Figure 13: Deformable sonic teapot.

Figure 14: Two performers with local virtual worlds (each performer's controls pass through a local delay D into their own virtual world, matching the network latency D between the worlds).

Figure 15: Many performers with a central virtual world (performers connect to it with their individual latencies).

5. BACK TO REALITY
The aesthetics of Phya partly inspired a tangible musical performance piece, which we mention briefly because it provides an interesting example of how the boundary between virtual and real can become blurred. Ceramic Bowl1 centers around a bowl with 4 contact microphones attached around the base, where there is a hole. Objects are launched manually into the bowl, where they roll, slide and collide in orbit until they exit. The captured sound is computer processed under realtime control and diffused onto an 8-speaker rig. The microphone arrangement allows the spatial sound events to be magnified over a large listening area.

1 First performed at the Electroacoustic Music Studies conference, Leicester, 14 June 2007. Broadcast on BBC Radio 3, Hear and Now, 25 August 2007. Details at

6. CONCLUSION
The original goal was to create a system that could capture the sonic nuance and variety of collisions, and that was easy to configure and use within a virtual reality context. This required the consideration of a variety of inter-dependent factors. The result is a system that is not only useful from the point of view of virtual reality, but has natural aesthetic interest and application in musical performance. The integrated graphical output is part of a fused perceptual aesthetic. Phya is now an open source project.2

7. REFERENCES
[1] F. Avanzini, M. Rath, and D. Rocchesso. Physically-based audio rendering of contact. In Proc. IEEE Int. Conf. on Multimedia and Expo (ICME 2002), Lausanne, volume 2, pages 445-448, 2002.
[2] F. Avanzini, S. Serafin, and D. Rocchesso. Interactive simulation of rigid body interaction with friction-induced sound generation. IEEE Trans. Speech and Audio Processing, 13(5):1073-1081, 2005.
[3] P. Cook. Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21(3), 1997.
[4] G. Essl, S. Serafin, P. Cook, and J. Smith. Theory of banded waveguides. Computer Music Journal, spring 2004.
[5] J. K. Hahn, H. Fouad, L. Gritz, and J. W. Lee. Integrating sounds and motions in virtual environments. In Sound for Animation and Virtual Reality, SIGGRAPH 95, 1995.
[6] D. Menzies. Perceptual resonators for interactive worlds. In Proceedings of the AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, 2002.
[7] D. Menzies. Scene management for modelled audio objects in interactive worlds. In International Conference on Auditory Display, 2002.
[8] D. Menzies. Composing instrument control dynamics. Organised Sound, 7(3), April 2003.
[9] D. Rocchesso and J. O. Smith. Circulant and elliptic feedback delay networks for artificial reverberation. IEEE Trans. Speech and Audio Processing, 5(1), 1997.
[10] J. O. Smith and S. A. Van Duyne. Developments for the commuted piano. In Proceedings of the International Computer Music Conference, Banff, Canada, 1995.
[11] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, 1998.
[12] K. van den Doel. Physically-based models for liquid sounds. ACM Transactions on Applied Perception, 2:534-546, 2005.
[13] K. van den Doel, P. G. Kry, and D. K. Pai. FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In Computer Graphics (ACM SIGGRAPH 01 Conference Proceedings), 2001.
[14] S. A. Van Duyne and J. O. Smith. Physical modeling with the 2-D digital waveguide mesh. In Proc. International Computer Music Conference, Tokyo, 1993.
[15] S. A. Van Duyne and J. O. Smith. The 3D tetrahedral digital waveguide mesh with musical applications. In Proceedings of the International Computer Music Conference, 2001.
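The local-delay scheme of Sec. 4 (Figure 14) can be sketched as follows. This is an illustrative sketch, not code from Phya: it assumes a known, fixed one-way latency D measured in simulation steps, and holds back each performer's own control events by D so that local and remote events reach each virtual world with the same total delay.

```cpp
#include <cassert>
#include <deque>
#include <utility>

// Local delay line for control events (Figure 14). Each performer's own
// events are delayed by delaySteps = D, matching the network latency, so the
// two virtual worlds process every event at the same simulation step.
struct LocalControlDelay {
    int delaySteps;                           // D, matched to network latency
    std::deque<std::pair<int, float>> queue;  // (due step, control value)

    void push(int nowStep, float value) {
        queue.emplace_back(nowStep + delaySteps, value);
    }
    // Pops the oldest value into 'value' once its due step has arrived.
    bool pop(int nowStep, float &value) {
        if (queue.empty() || queue.front().first > nowStep) return false;
        value = queue.front().second;
        queue.pop_front();
        return true;
    }
};
```

An event pushed at step 0 with D = 3 only reaches the local world at step 3, which is when the copy sent over the network reaches the remote world.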


Creating Pedagogical Etudes for Interactive Instruments

Jennifer Butler
University Of British Columbia
Vancouver, B.C. Canada

ABSTRACT
In this paper I discuss the importance of and need for pedagogical materials to support the development of new interfaces and new instruments for electronic music. I describe my method for creating a graduated series of pedagogical etudes composed using Max/MSP. The etudes will help performers and instrument designers learn the most commonly used basic skills necessary to perform with interactive electronic music instruments. My intention is that the final series will guide a beginner from these initial steps through a graduated method, eventually incorporating some of the more advanced techniques regularly used by electronic music performers. I describe the order of the series, and discuss the benefits (both to performers and to composers) of having a logical sequence of skill-based etudes. I also connect the significance of skilled performers to the development of two essential areas that I perceive are still just emerging in this field: the creation of a composed repertoire and an increase in musical expression during performance.

Keywords
Pedagogy, musical controllers, Max/MSP, etudes, composition, repertoire, musical expression

1. INTRODUCTION
The inspiration for developing a series of concert etudes for interactive musical instruments grew from my experiences creating and performing music with a P5 glove (see Figure 1). Like most composers working in this field, I was not only designing the music, but also learning how to perform on this new instrument. Predictably, I found myself limited by my lack of technical skill. I observe this to be a common problem among composers and instrument designers in this field, with performances featuring interactive electronics often sounding more like demonstrations or experiments than musical performances.

As a musician with numerous years of training, I was not surprised that I needed to put in significant time to become proficient on this instrument. However, it is not only time that is needed to learn an instrument, but also a method. Currently, there are no existing methods for learning how to play a P5 glove, or any other interactive electronic instrument. The creation of such a method will, I believe, help to guide both composers and instrument builders in the development of a composed repertoire for interactive instruments, and an increase in the expressive capabilities of both the performers and the instruments they use.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

Figure 1. The P5 glove

2. THE ETUDES
2.1 Providing a Musical Context
Since the eighteenth century, it has been common practice for composers and performers to write etudes for the development of technique on virtually every established instrument. All instrumentalists who have achieved some level of virtuosity on their instruments have done so through diligent practice of technical exercises such as scales, arpeggios, tone practice, and composed etudes.

Wanderley and Orio [6] describe another important purpose of etudes: the evaluation of different instruments. They describe a method used to compare interactive music systems that uses short, repetitive "musical tasks." With traditional musical instruments, they explain, "this task is facilitated thanks to the vast music literature available. This is not the case [for] interactive music instruments that have a limited, or even nonexistent, literature."

Etudes fulfill an important role in learning an instrument by providing an ingredient that short repetitive exercises cannot: a musical context for the techniques they are teaching [3]. As a composer, I propose that instead of compensating for the lack of repertoire, we start composing a literature for interactive electronic music instruments.

2.2 Virtuosity
Historically, one important role of the etude has been to build virtuosity. For the purposes of this paper, I am using the


definition of virtuosity put forward by Dobrian and Koppelman [1]: "the ability to call upon all the capabilities of an instrument at will with relative ease." As the authors point out, when working with computers it does not make sense to judge virtuosity only by the factor of speed, because computers can unquestionably play faster than humans.

When a performer has achieved virtuosity on an instrument, many levels of control and technique have become subconscious, and "when control of the instrument has been mastered to the point where it is mostly subconscious, the mind has more freedom to concentrate consciously on listening and expression." [1]

Virtuosic performers are highly valuable to composers and instrument designers. Without virtuosic performers, and instruments capable of adequate expression, composers cannot hear their music fully realized. In many cases, instrument designers and programmers have to rely on their own, often limited, performing skills when first testing a new piece or instrument.

Etudes help to develop virtuosity, and therefore play a crucial role in further developing a repertoire for an instrument. Without etudes, players of acoustic instruments would not be able to handle the technique needed to perform musical works, and composers would not have performers to play the music they imagine. As the New Grove Dictionary says, "the true virtuoso has always been prized not only for his rarity but also for his ability to widen the technical and expressive boundaries of his art." [4]

3. STARTING AT THE BEGINNING
3.1 Ordering the Series
My initial series of etudes includes ten graduated studies that introduce the basic skills needed to manipulate different elements of musical sound. The series is designed for a beginner or novice performer on interactive instruments, and the etudes are designed to create a non-intimidating experience for a musician with little or no previous experience with electronic music.

In choosing which musical elements and types of controls to include in the etudes, and in which order they will appear, I have also created a priority list. Undoubtedly, my etudes focus on the skills and musical elements most likely to be needed for my own compositions. However, I have tried to make the etudes stylistically diverse. By the end of the series the performer will have experience with triggers, toggles, and more fluid or constant parameters.

3.2 The Etudes
Each etude contains four elements: 1 – a basic description of the purpose and intent of the etude, including a simulation performance of the etude; 2 – a graphically notated score; 3 – the Max/MSP etude patch; and 4 – a Max/MSP patch that will be used to connect the interactive instrument to the etude patch.

Etude 1 introduces the performer to different approaches to rhythm and synchronization. At times rhythmic freedom is encouraged, and at times strict rhythm is required. Etude 2 focuses on pitch control, while Etude 3 focuses on dynamic, or volume, control. Etude 4 combines the elements of rhythm, pitch, and volume control. Etude 5 focuses on spatialization and localization, and Etude 6 on timbre and envelope manipulation. Etude 7 combines the elements used in the first six etudes. Etude 8 introduces different methods of synthesis (for example, granular and FM), and Etude 9 is a study in changing tempos. Etude 10, the final etude in the series, brings together all the skills learned in the earlier etudes.

Each of these introductory etudes is notated along a timeline that the performer must follow, using a clock that has been placed in the etude patch (see Figure 2). The main goal is for the performer to become fluent enough on the instrument in these basic control parameters so that when further complexity is added the performer will be ready.

Figure 2. Example of Notation

Complexity is increased gradually throughout the series. It is understood that the level of complexity might depend on the characteristics of each interactive instrument. The main method of adding complexity is to increase the number of different control elements (for example, the number of triggers or different layers of sounds to be controlled) or to increase the speed at which these elements need to be controlled. The first three etudes use only one dimension, layer, or direction of moveable data (a constant flow between 0 and 127). Etudes 5 and 6 will involve two such layers. For example, one stream of data could control volume, and the other spatialization. With some instruments or mappings, the gestures that control this data may be completely separate (such as with a keyboard, or different pedals), and with others they may be more connected (such as with a Wii remote, glove, or mouse). The final etudes will be the most complex, including many control parameters and requiring more intricate synchronization.

However, it is important to keep in mind that for now this is a series of beginner etudes, designed to prepare a beginning performer for future compositions that may require a much higher level of complexity and technique.

4. COMPATIBILITY
4.1 A Universal Interface
One of the most important features of these etudes is their adaptability to many different controllers. Each etude is designed so it is playable by any device that can produce the required types of data. The interface for each etude lists the data needed and provides the necessary links into the etude. For example, Etude 1 requires an instrument that can produce eight separate triggers for sample playback (see Figure 3).

Different mappings and interpretations can easily be tried with each etude. This flexibility will allow performers to practice different movements for different musical parameters, helping them to assess which of the movements will work best. Performers can gain a deeper understanding of the particular strengths and weaknesses of their instrument.

The etudes do not require specific movements, so the performer can choreograph all the gestures. For example, depending on the instrument being used, different actions can activate each trigger; different parameters (position, amplitude, pitch) can produce the same types of continuous numbers – yet the


resulting sounds will always be the same. Similar gestures, listening skills, and types of coordination are used by a large number of interactive instruments. Therefore, the skills a performer develops while learning this series of etudes on one controller will very likely be transferable to other controllers. Traditional etudes are also typically practised using a variety of approaches that challenge players in a variety of ways (for example, with different articulations or dynamic levels).

Figure 3. Etude 1 Interface

4.2 The Etude Patches

Each etude will have two Max/MSP components. The primary component is the etude patch (see figures 3 and 5). This patch contains all the programming needed for each etude, and should not be edited. Each etude patch includes an On/Off switch, reset button, simulation button, and clock. The patch shows all the needed information for performing the specific etude.

Each etude will also come with an optional User Interface (see figures 4 and 6). This interface will include all the “send objects” needed to communicate with the etude patch, as well as information about the type of data that the etude patch is programmed to receive. Performers will need to edit this patch or create a new patch that sends the necessary information from their interactive instrument into the etude patch.

4.3 A Shared Repertoire

Having a notated repertoire that can be performed by different musicians, as well as different instruments, is important to the development of any musical genre. Currently there is no such repertoire for interactive electronic instruments, and consequently no way to make musical comparisons between performers or instruments.

There is also extensive historical precedent for sharing repertoire across instruments, especially when the repertoire for one instrument is lacking. For example, several sonatas in the violin canon (Franck, Mozart, and Prokofiev) are commonly also played on the flute, and the Bach Sonatas for solo cello are performed on many instruments, including trombone and marimba. The various strengths and weaknesses of each instrument become quickly apparent when repertoire is shared. Also, many composers, notably John Cage, have written pieces for open instrumentation. Performances of these works can vary widely depending on the instruments chosen.

Figure 4. Etude 1 User Interface

Figure 5. Etude 2 Interface

4.4 Point of Reference

One significant role these etudes will fill is providing a reliable point of reference when making comparisons between performers, performances, different instruments (level of subtlety and expressiveness achievable; ease of learning; performer reactions), and different mappings. Each etude will also focus on different musical or control elements, allowing a user to quickly determine the controller’s effectiveness and ability in each aspect of music.
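The controller-independence described in section 4.2 — any instrument can drive an etude, so long as its patch sends the expected triggers and scaled continuous values — can be illustrated with a small sketch. This is a hypothetical Python analogue of the Max/MSP “send object” convention; all names here are invented for illustration:

```python
def to_etude_message(kind, value=0.0):
    """Normalise an instrument event into the two message types a
    hypothetical etude patch expects: a bang-style trigger, or a
    continuous control value clamped to the 0.0-1.0 range."""
    if kind == "trigger":
        return ("trigger", 1)
    if kind == "continuous":
        return ("continuous", max(0.0, min(1.0, float(value))))
    raise ValueError(f"unknown message kind: {kind}")

# Two imagined instruments mapping different gestures
# (a sensor position vs. an input amplitude) to the same message:
glove_position = to_etude_message("continuous", 0.73)
mic_amplitude = to_etude_message("continuous", 0.73)
assert glove_position == mic_amplitude  # the etude patch cannot tell them apart
```

Whatever gesture produces the number, the etude patch receives the same message type — which is what makes the etudes transferable across controllers.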


The etudes may also be a good test of which type of controller might be best suited for a certain piece of music. This could be especially useful while the piece is still being composed. A more skilled performer could easily learn these basic etudes on several different controllers and quickly evaluate their effectiveness on many musical levels. As Wanderley and Orio state, “Musical tasks are already part of the evaluation process of acoustic musical instruments, because musicians and composers seldom choose an instrument without extensive testing to [see] how specific musical gestures can be performed.” [6]

These etudes can strengthen such collaborations by providing a foundation for evaluation of both the instrument and the performer. This basis for evaluation is an essential ingredient in building a lasting repertoire for interactive instruments.

Figure 6. Etude 2 User Interface

5. CONCLUSIONS

My primary goals in writing these etudes are to:

1. Create a learning environment in which beginners can experience a non-intimidating introduction to interactive performance.

2. Encourage other composers and performers to create their own etudes and pieces that can be exchanged to broaden the level of shared knowledge, and help to define the skills needed for performing on interactive electronic instruments.

3. Create a tool that will guide performers and instrument builders towards higher levels of control and musical expression.

Interactive electronic music is an emerging field that has yet to solidly establish a repertoire or performance practice. I believe one of the most important steps in developing both of these fundamental parts of a musical genre is to create a method for learning performance technique. In the near future I hope to see strong performances of well-written pieces replacing the demonstrations and experiments that currently occupy many concert spots. For this to occur I believe composers, instrument designers and performers must work together.

6. ACKNOWLEDGMENTS

I am very grateful to Dr. Bob Pritchard and Dr. Keith Hamel for their support of this project, programming help, and generous feedback on my work. Thank you also to my husband Michael Begg for his invaluable editing skills. This project is supported in part through the Social Science and Humanities Research Council of Canada, grant 848-2003-0147, and by the University of British Columbia Media And Graphics Interdisciplinary Centre (MAGIC), the UBC Institute for Computing, Information and Cognitive Science (ICICS), and the School of Music.

7. REFERENCES

[1] Dobrian, C., and Koppelman, D. “The ‘E’ in NIME: Musical Expression with New Computer Interfaces”. Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME06), Paris, France, 2006.

[2] Fels, S., Gadd, A., and Mulder, A. “Mapping transparency through metaphor: towards more expressive musical instruments”. Organised Sound 7:2, 109-126. Cambridge University Press, 2002.

[3] Ferguson, H., and Hamilton, K. L. “Study”. Grove Music Online. L. Macy, ed.

[4] Jander, O. “Virtuoso”. Grove Music Online. L. Macy, ed.

[5] Iazzetta, F. “Meaning in Musical Gesture”. Trends in Gestural Control of Music, M. M. Wanderley and M. Battier, eds. Paris, France: IRCAM - Centre Georges Pompidou, 2000.

[6] Wanderley, M. M., and Orio, N. “Evaluation of Input Devices for Musical Expression: Borrowing Tools from HCI”. Computer Music Journal 26:3, 62-76. MIT Press, 2002.


Discourse analysis evaluation method for expressive musical interfaces

Dan Stowell, Mark D. Plumbley, Nick Bryan-Kinns

Centre for Digital Music
Queen Mary, University of London
London, UK

ABSTRACT

The expressive and creative affordances of an interface are difficult to evaluate, particularly with quantitative methods. However, rigorous qualitative methods do exist and can be used to investigate such topics. We present a methodology based around user studies involving Discourse Analysis of speech. We also present an example of the methodology in use: we evaluate a musical interface which utilises vocal timbre, with a user group of beatboxers.

Keywords

Evaluation, qualitative methods, discourse analysis, voice, timbre, beatboxing

1. INTRODUCTION

One of the motives for founding the NIME conference was to foster dialogue on the evaluation of musical interfaces [11]. Yet a scan of NIME conference proceedings finds only a few papers devoted to the development or application of rigorous evaluation methods. Many published papers do not include evaluation, or include only informal evaluation (e.g. quotes from, or general summaries of, user interviews). This may of course be fine, depending on the paper’s purpose and context, and the stage of development of the research. But the further development of well-founded evaluation methods can only be of benefit to the field.

In a very useful discussion, Wanderley and Orio [18] look to the wider field of Human-Computer Interaction (HCI) for applicable methodologies, and suggest specific approaches for evaluating musical interfaces. Much of HCI focuses on interfaces which can be evaluated using goal-based tasks, where measurements can be made of (for example) how long a task takes, or how often users fail to achieve the goal. Wanderley and Orio’s framework follows this route, recommending that experimenters evaluate users’ precision in reproducing musical units such as glissandi or arpeggios. Later work uses Wanderley and Orio’s framework [9, 10].

Precision is important for accurate reproduction. But for composers, sound designers, and performers of expressive or improvised music, it is not enough: interfaces should (among other things) be in some sense intuitive and offer sufficient freedom of expression [11, 8]. “Control = expression” [4].

Using precision-of-reproduction as a basis for evaluation also becomes problematic for musical systems which are not purely deterministic. “Randomness” would seem to be the antithesis of precision, and therefore undesirable according to some perspectives, yet there are many musical systems in which stochastic or chaotic elements are deliberately introduced.

The question arises of how to evaluate interfaces more broadly than precision-of-reproduction. It is difficult to design an experiment that can reliably and validly measure qualities such as expressiveness and aesthetics.

Poepel [10] operationalises “expressivity” into a number of categories for stringed-instrument playing, and investigates these numerically using tasks followed by Likert-scale questionnaires. This limits users’ responses to predefined categories, although a well-designed questionnaire can yield useful results. Unfortunately Poepel analyses the data using mean and ANOVA, which are inappropriate for Likert-scale (ordinal) data [6]. The questionnaire approach also largely reduces “expressivity” down to “precision” since in this case, the tasks presented concern the reproduction of musical units such as vibrato and dynamical changes.

Paine et al [9] use a qualitative analysis of semi-structured interviews with musicians, to derive “concept maps” of factors involved in expressive performance (for specific instruments). These are not used for evaluation, rather to guide design. In the evaluation of their instrument, the authors turn to a quantitative approach, analysing how closely users can match the control data used to generate audio examples.

We propose that qualitative methods approaches may prove to be useful tools for the evaluation of musical interfaces. This paper aims to be a contribution in that area, applying a rigorous qualitative method to study the use and affordances of a new musical interface.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1.1 Discourse Analysis

Interviews and free-text comments are sometimes reported in studies on musical interfaces. However, often they are conducted in a relatively informal context, and only quotes or summaries are reported rather than any structured analysis, therefore providing little analytic reliability. Good qualitative methods penetrate deeper than simple summaries, offering insight into text data [1]. Discourse Analysis (DA) is one such approach, developed and used in disciplines such as linguistics, psychology, and social sciences [14, chapter 6]. Essentially, DA’s strength comes from using a structured method which can take apart the language used in discourses (e.g. interviews, written works) and elucidate the connections and implications contained within, while remaining faithful to the content of the original text [1]. DA is designed to go beyond the specific sequence of phrases used in a conversation, and produce a structured analysis of the conversational resources used, the relations between entities, and the “work” that the discourse is doing.
Uszkoreit [17] summarises the aim of DA very compactly:

    The problems addressed in discourse research aim to answer two general kinds of questions: (1) what information is contained in extended sequences of utterances that goes beyond the meaning of the individual utterances themselves? (2) how does the context in which an utterance is used affect the meaning of the individual utterances, or parts of them?
We should point out that DA is not usually regarded as one
single method – rather, it’s an approach to analysing texts.
Someone looking for the single recipe to perform a DA of
a text will be disappointed. However, specific DA methods
do exist in the literature. Our DA method is elaborated in
section 3.3.
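As an aside to the introduction’s point that mean and ANOVA are inappropriate for Likert-scale responses: ordinal data calls for rank-based tests. A sketch with invented ratings (not data from this study), using SciPy’s Mann-Whitney U test:

```python
from scipy.stats import mannwhitneyu

# Hypothetical 1-5 Likert ratings of two interfaces; ordinal data,
# so we compare distributions by rank rather than by mean.
ratings_a = [4, 5, 4, 3, 5, 4, 4]
ratings_b = [2, 3, 3, 2, 4, 3, 2]

stat, p = mannwhitneyu(ratings_a, ratings_b, alternative="two-sided")
```

For more than two conditions, the Kruskal-Wallis test plays the analogous rank-based role where ANOVA would otherwise have been used.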
In this paper we use DA to analyse interview data, in
the context of a project to develop voice-based interfaces
for controlling musical systems. First we give an overview
of the interface we wish to evaluate.
Figure 1: Constructing a timbre space for timbre remapping
2. TIMBRE REMAPPING

With recent improvements in timbre analysis and in computer power, the potential arises to analyse the timbre of a signal in real-time, and to use this analysis as a controller for synthesis or for other processes – in particular, the potential to “translate” the timbral variation of one source into the timbral variation of another source. This is the process which we refer to as timbre remapping [16]. De Poli and Prandoni [3] made an early attempt at such control, more recently investigated by Puckette [13].

One of the main issues is the construction of a useful timbre space for the purpose of timbre remapping. Timbre is often very loosely defined, and often taken to refer to all aspects of a sound beyond its pitch and loudness [7]. There are many options as to which acoustic features to derive, and how to transform them, so as to provide a continuous space that provides useable control to the performer. Some features exhibit interactions with pitch, and the variation of some features may depend strongly upon the type of source.

In the present work we derive a heterogeneous set of timbral features, mostly spectral but some time-domain. We then apply a Principal Components Analysis (PCA) to decorrelate the features and reduce dimensionality. Finally we apply a piecewise linear warping (using the range, mean, and standard deviation statistics) to shape the distribution of data points; we will come back to the reasons for this shortly. The construction of the timbre space is summarised in figure 1.

Thus far we have a procedure for creating a timbre space based on any input signal. We might want to analyse two different classes of signal in this way, and then map the timbral trajectory of one system onto another: for example, use the timbral trajectory of a voice to control the settings of a synthesiser, and produce the corresponding timbral trajectory. To do this, we take a point in the voice’s timbre space, and find its nearest neighbour in the synthesiser’s timbre space. If we can retrieve the synthesiser parameters which created this timbre, we can send those parameters to the synthesiser, thus “remapping” from the vocal timbre to the synthesiser timbre. This approach has the advantage of being independent of the exact relation of the target system’s control space to its timbre space: it works even if the target system’s controls have a highly nonlinear and obscure relation to the timbres produced.

Such a mapping from one timbre space to another depends on being able to find a suitable “nearest neighbour” in the target space. This is facilitated if the spaces are covered by a similar distribution of data points, ensuring that the resolution of a timbral trajectory can be adequately reflected in the target timbre space. This is why we perform a warping during the construction of the timbre space: it ensures that the timbre dimensions are covered in a certain way (guaranteeing various aspects of the distribution such as that it is centred and its standard deviation lies within a certain range).

One aspect of our timbre remapping system is that we typically wish to remove pitch-dependencies from the timbral data. Many acoustic measures such as MFCCs or spectral percentile statistics can exhibit interactions with pitch. Our current approach to mitigating this is to include a pitch analysis as one of the features passed to the PCA and therefore used in constructing the space. We then identify the PCA component with the largest contribution from pitch, and discard that, on the assumption that it is essentially composed of “pitch plus the pitch-dependent components of other features”. This approach makes simplifying assumptions such as the linearity of pitch and timbre dimensions and their interaction, but it leads to usable results in our experience.

2.1 Real-time operation

We wish to develop a timbre remapping system that can operate efficiently in real-time, so the relative speed and efficiency of the processes used is paramount. In fact this is the strongest motivation behind using PCA for the decorrelation, dimension reduction, and pitch-removal. PCA is a straightforward process and computationally very simple to apply. More sophisticated methods, including non-linear methods, exist, and may be capable of improved results (such as better pitch-removal), but imply a significant cost in terms of the processing power required.

Efficiency is also important in the process which retrieves a nearest-neighbour data point from the target system’s timbre space. We use a kd-tree data structure [12, chapter 2] for fast multidimensional search.

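The pipeline just described — heterogeneous features, PCA decorrelation, discarding the most pitch-dominated component, distribution warping, then nearest-neighbour lookup in the target space — can be sketched roughly as follows. This is a simplified illustration with random stand-in data, not the authors’ implementation; in particular the warp here is plain standardisation rather than the piecewise linear warp described above:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_timbre_space(features, pitch_col, n_dims=3):
    """Toy timbre-space construction: centre, PCA via SVD, discard the
    component loading most heavily on the pitch feature, then (as a
    simplification of the piecewise linear warp) standardise each
    remaining dimension."""
    X = features - features.mean(axis=0)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)  # rows of Vt = components
    scores = U * s                                    # per-frame coordinates
    pitch_comp = int(np.argmax(np.abs(Vt[:, pitch_col])))
    keep = [i for i in range(Vt.shape[0]) if i != pitch_comp][:n_dims]
    Y = scores[:, keep]
    return (Y - Y.mean(axis=0)) / (Y.std(axis=0) + 1e-9)

rng = np.random.default_rng(0)
voice_feats = rng.normal(size=(200, 6))  # stand-in per-frame analyses;
synth_feats = rng.normal(size=(300, 6))  # last column plays the pitch feature
PITCH_COL = 5

voice_space = build_timbre_space(voice_feats, PITCH_COL)
synth_space = build_timbre_space(synth_feats, PITCH_COL)

# synth_params[i]: the (integer-valued) control settings that produced
# synth frame i, so a matched frame can be "played back" on the synth.
synth_params = rng.integers(0, 16, size=(300, 4))

tree = cKDTree(synth_space)              # k-d tree for fast NN search
_, nearest = tree.query(voice_space[0])  # nearest synth frame to one voice frame
params_to_send = synth_params[nearest]   # remapped control settings
```

Because both spaces are shaped to a similar distribution, the nearest-neighbour query stays meaningful even when the two sources cover very different raw feature ranges.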

3. METHOD

In evaluating a musical interface such as the above, we wish to develop a qualitative method which can explore issues such as expressivity and affordances for users. Longitudinal studies may be useful, but imply a high cost in time and resources. Therefore our design aims to provide users with a brief but useful period of exploration of a new musical interface, including interviews and discussion which we can then analyse.

In any evaluation of a musical interface one must decide the context of the evaluation. Is the interface being evaluated as a successor or alternative to some other interface (e.g. an electric cello vs an acoustic cello)? Who is expected to use the interface (e.g. virtuosi, amateurs, children)? Such factors will affect not only the recruitment of participants but also some aspects of the experimental design.

Our method is designed either to trial a single interface with no explicit comparison system, or to compare two similar systems (as is done below in our case study). The method consists of two types of user session (solo sessions followed by group session(s)), plus the Discourse Analysis of data collected.

3.1 Solo sessions

In order to explore individuals’ personal responses to the interface(s), we first perform solo sessions in which a participant is invited to try out the interface(s) for the first time. If there is more than one interface to be used, the order of presentation is randomised in each session.

The solo session consists of three phases for each interface:

Free exploration The participant is encouraged to try out the interface for a while and explore it in their own way.

Guided exploration The participant is presented with audio examples of recordings created using the interface, and encouraged to create recordings inspired by those examples. This is not a precision-of-reproduction task; precision-of-reproduction is explicitly not evaluated, and participants are told that they need not replicate the examples.

Semi-structured interview The interview’s main aim is to encourage the participant to discuss their experiences of using the interface in the free and guided exploration phases, both in relation to prior experience and to the other interfaces presented if applicable. Both the free and guided phases are video recorded, and the interviewer may play back segments of the recording and ask the participant about them, in order to stimulate discussion.

The raw data to be analysed is the interview transcript. Our aim is for the participant to construct their own descriptions and categories, which means it is very important that the interviewer is experienced in neutral interview technique, and can avoid (as far as possible) introducing labels and concepts that do not come from the participant’s own language patterns.

3.2 Group session

To complement the solo sessions we also conduct a group session: peer group discussion can produce more and different discussion around a topic, and can demonstrate the group negotiation of categories, labels, comparisons, etc. The focus-group tradition provides a well-studied approach to such group discussion [15]. Our group session has a lot in common with a typical focus group in terms of the facilitation and semi-structured group discussion format. In addition we make available the interface(s) under consideration and encourage the participants to experiment with them during the session.

As in the solo sessions, the transcribed conversation is the data to be analysed, which means that a neutral facilitation technique is important – to encourage all participants to speak, to allow opposing points of view to emerge in a non-threatening environment, and to allow the group to negotiate the use of language with minimal interference.

3.3 Data analysis

Our DA approach to analysing the data is based on that of [2, p. 95–102], adapted to our study context. The DA of text is a relatively intensive and time-consuming method. It can be automated to some extent, but not completely, because of the close linguistic attention required. Our approach consists of the following five steps:

(a) Transcription

The speech data is transcribed, using a standard style of notation which includes all speech events (including repetitions, speech fragments, pauses). This is to ensure that the analysis can remain close to what is actually said, and avoid adding a gloss which can add some distortion to the data. For purposes of analytical transparency, the transcripts (suitably anonymised) should be published alongside the analysis results.

(b) Free association

Having transcribed the speech data, the analyst reads it through and notes down surface impressions and free associations. These can later be compared against the output from the later stages.

(c) Itemisation of transcribed data

The transcript is then broken down by itemising every single object in the discourse (i.e. all the entities referred to). Pronouns such as “it” or “he” are resolved, using the participant’s own terminology as far as possible, and for every object an accompanying description is extracted, of the object as it is in that instance – again using the participant’s own language, essentially by rewriting the sentence/phrase in which the instance is found.

The list of objects is scanned to determine if different ways of speaking can be identified at this point. Also, those objects which are also “actors” (or “subjects”) are identified – i.e. those which act with agency in the speech instance; they need not be human.

It is helpful at this point to identify the most commonly-occurring objects and actors in the discourse.

(d) Reconstruction of the described world

Starting with the list of most commonly-occurring objects and actors in the discourse, the analyst reconstructs the depictions of the world that they produce. This could for example be achieved using concept maps to depict the interrelations between the actors and objects. If different ways of speaking have been identified, there will typically be one reconstructed “world” per way of speaking. Overlaps and contrasts between these worlds can be identified.

The “worlds” we produce are very strongly tied to the participant’s own discourse. The actors, objects, descriptions, relationships, and relative importances, are all derived from a close reading of the text. These worlds are essentially just a methodically reorganised version of the participant’s own language.

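The bookkeeping in steps (c)-(d) can be illustrated with a toy example — invented utterances, not data from this study — showing how itemised objects and actors might be represented and the most common ones identified:

```python
from collections import Counter

# Invented itemisation of a few utterances: each entry records an object
# (pronouns already resolved to the speaker's own term), the rewritten
# description, and whether the object acts with agency ("actor").
itemised = [
    {"object": "the machine", "description": "the machine makes noises back", "actor": True},
    {"object": "the machine", "description": "the machine is a bit random", "actor": True},
    {"object": "the mic", "description": "I hum into the mic", "actor": False},
    {"object": "the recording", "description": "the recording sounded funky", "actor": False},
]

object_counts = Counter(item["object"] for item in itemised)
actor_counts = Counter(item["object"] for item in itemised if item["actor"])

# The most commonly-occurring object is the starting point for
# reconstructing the described world in step (d).
most_common_object, count = object_counts.most_common(1)[0]
```

In practice this tabulation is done by hand over the full transcript; the point here is only the shape of the data that steps (c)-(d) work with.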

In our particular context, we may be interested in the user’s conceptualisation of musical interfaces. It is particularly interesting to look at how these are situated in the described world, and particularly important to avoid preconceptions about how users may describe an interface: for example, a given interface could be: an instrument; an extension of a computer; two or more separate items (e.g. a box and a screen); an extension of the individual self; or it could be absent from the discourse.

(e) Examining context

The relevant context of the discourse typically depends on the field of study, for example whether it is political or psychological. Here we have created an explicit context of other participants. After running the previous steps of DA on each individual transcript, we compare and contrast the described worlds produced from each transcript, first comparing those in the same experimental condition (i.e. same order of presentation, if relevant), then across all participants. We also compare the DA of the focus group session(s) against that of the solo sessions.

4. THE METHOD IN ACTION: EVALUATING VOICE TIMBRE REMAPPING

In our study we wished to evaluate the timbre remapping system with beatboxers (vocal percussion musicians), for two reasons: they are one target audience for the technology in development; and they have a familiarity and level of comfort with manipulation of vocal timbre that should facilitate the study sessions.

We recruited by advertising online (a beatboxing website) and around London for amateur or professional beatboxers. Participants were paid £10 per session plus travel expenses to attend sessions in our (acoustically-isolated) studio. We recruited five participants from the small community, all male and aged 18–21. One took part in a solo session; one in the group session; and three took part in both. Their beatboxing experience ranged from a few months to four years. Their use of technology for music ranged from minimal to a keen use of recording and effects technology (e.g. Cubase).

In our study we wished to investigate any effect of providing the timbre remapping feature. To this end we presented two similar interfaces: both tracked the pitch and volume of the microphone input, and used these to control a synthesiser, but one also used the timbre remapping procedure to control the synthesiser’s timbral settings. The synthesiser used was an emulated General Instruments AY-3-8910 [5], which was selected because of its wide timbral range (from pure tone to pure noise) with a well-defined control space of a few integer-valued variables. We used the method as described in section 3. Analysis of the interview transcripts took approximately 10 hours per participant (around 2000 words each).

We do not report a detailed analysis of the group session transcript here: the group session generated information which is useful in the development of our system, but little which bears directly upon the presence or absence of timbral control. We discuss this outcome further in section 5.

In the following, we describe the main findings from analysis of the solo sessions, taking each user one by one before drawing comparisons and contrasts. We emphasise that although the discussion here is a narrative supported by quotes, it reflects the structures elucidated by the DA process – the full transcripts and Discourse Analysis tables are available online1. In the study, condition “Q” was used to refer to the system with timbre remapping active, “X” for the system with timbre remapping inactive.

1 2008/Stowell08nime-data/

4.1 Reconstruction of the described world

User 1

User 1 expressed positive sentiments about both Q and X, but preferred Q in terms of sound quality, ease of use and being “more controllable”. In both cases the system was construed as a reactive system, making noises in response to noises made into the microphone; there was no conceptual difference between Q and X – for example in terms of affordances or relation to other objects.

The “guided exploration” tasks were treated as reproduction tasks. User 1 described the task as difficult for X, and easier for Q, and situated this as being due to a difference in “randomness” (of X) vs. “controllable” (of Q).

User 2

User 2 found the system (in both modes) “didn’t sound very pleasing to the ear”. His discussion conveyed a pervasive structured approach to the guided exploration tasks, in trying to infer what “the original person” had done to create the examples and to reproduce that. In both Q and X the approach and experience was the same.

Again, User 2 expressed preference for Q over X, both in terms of sound quality and in terms of control. Q was described as more fun and “slightly more funky”. Interestingly, the issues that might bear upon such preferences are arranged differently: issues of unpredictability were raised for Q (but not X), and the guided exploration task for Q was felt to be more difficult, in part because it was harder to infer what “the original person” had done to create the examples.

User 3

User 3’s discourse placed the system in a different context compared to others. It was construed as an “effect plugin” rather than a reactive system, which implies different affordances: for example, as with audio effects it could be applied to a recorded sound, not just used in real-time; and the description of what produced the audio examples is cast in terms of an original sound recording rather than some other person. This user had the most computer music experience of the group, using recording software and effects plugins more than the others, which may explain this difference in contextualisation.

User 3 found no difference in sound or sound quality between Q and X, but found the guided exploration of X more difficult, which he attributed to the input sounds being more varied.

User 4

User 4 situated the interface as a reactive system, similar to Users 1 and 2. However, the sounds produced seemed to be segregated into two streams rather than a single sound – a “synth machine” which follows the user’s humming, plus “voice-activated sound effects”. No other users used such separation in their discourse.

“Randomness” was an issue for User 4 as it was for some others. Both Q and X exhibited randomness, although X was much more random. This randomness meant that User 4 found Q easier to control. The pitch-following sound was


felt to be accurate in both cases; the other (sound effects / percussive) stream was the source of the randomness.
In terms of the output sound, User 4 suggested some small differences but found it difficult to pin down any particular difference, though felt that Q sounded better.

4.2 Examining context

Effect of order-of-presentation
Users 1 and 2 were presented with the conditions in the order XQ; Users 3 and 4 in the order QX. Order-of-presentation may have some small influence on the outcomes: Users 3 and 4 identified little or no difference in the output sound between the conditions (User 4 preferred Q but found the difference relatively subtle), while Users 1 and 2 felt more strongly that they were different and preferred the sound of Q. It would require a larger study to be confident that this difference really was being affected by order-of-presentation.
In our study we are not directly concerned with which condition sounds better (both use the same synthesiser in the same basic configuration), but this is an interesting aspect to come from the study. We might speculate that differences in perceived sound quality are caused by the different way the timbral changes of the synthesiser are used. However, participants made no conscious connection between sound quality and issues such as controllability or randomness.

Considerations across all participants
Taking the four participant interviews together, no strong systematic differences between Q and X are seen. All participants situate Q and X similarly, albeit with some nuanced differences between the two. Activating/deactivating the timbre remapping facet of the system does not make a strong enough difference to force a reinterpretation of the system.
A notable aspect of the four participants’ analyses is the differing ways the system is situated (both Q and X). As designers of the system we may have one view of what the system “is”, perhaps strongly connected with technical aspects of its implementation, but the analyses presented here illustrate the interesting way that users situate a new technology alongside existing technologies and processes. The four participants situated the interface in differing ways: either as an audio effects plugin, or a reactive system; as a single output stream or as two. We emphasise that none of these is the “correct” way to conceptualise the interface. These different approaches highlight different facets of the interface and its affordances.
During the analyses we noted that all participants maintained a conceptual distance between themselves and the system, and analogously between their voice and the output sound. There was very little use of the “cyborg” discourse in which the user and system are treated as a single unit, a discourse which hints at mastery or “unconscious competence”. This fact is certainly understandable given that the participants each had less than an hour’s experience with the interface. It demonstrates that even for beatboxers with strong experience in manipulation of vocal timbre, controlling the vocal interface requires learning – an observation confirmed by the participant interviews.
The issue of “randomness” arose quite commonly among the participants. However, randomness emerges as a nuanced phenomenon: although two of the participants described X as being more random than Q, and placed randomness in opposition to controllability (as well as preference), User 2 was happy to describe Q as being more random and also more controllable (and preferable).
A uniform outcome from all participants was the conscious interpretation of the guided exploration tasks as precision-of-reproduction tasks. This was evident during the study sessions as well as from the discourse around the tasks. As one participant put it, “If you’re not going to replicate the examples, what are you gonna do?”
A notable absence from the discourses, given our research context, was discussion which might bear on expressivity, for example the expressive range of the interfaces. Towards the end of each interview we asked explicitly whether either of the interfaces was more expressive, and responses were generally non-committal. We propose that this was because our tasks had failed to engage the participants in creative or expressive activities: the (understandable) reduction of the guided exploration task to a precision-of-reproduction task must have contributed to this. We also noticed that our study design failed to encourage much iterative use of record-and-playback to develop ideas. In section 5 we suggest some possible future directions to address these issues.

5. DISCUSSION
The analysis of the solo sessions provides useful information on the user experience of a voice-controlled music system and the integration of timbre remapping into such a system. Here we wish to focus on methodological issues arising from the study.
Above we raised the issue that our “guided exploration” task, in which participants were asked to record a sound sample on the basis of an audio example, was interpreted as a precision-of-reproduction task. Possibilities to avoid this in future may include: using audio examples which are clearly not originally produced using the interface (e.g. string sections, pop songs), or even non-audio prompts such as pictures; or forcing a creative element by providing two examples and asking participants to create a new recording which combines elements of both.
Other approaches which encourage creative work with an interface could involve tasks in which participants are asked to create compositions, or iteratively develop live performance. We would expect that the use of more creative tasks should produce more participant discussion of creative/expressive aspects of an interface.
Such tasks could also be used to provide more structure during the group sessions: one reason the group session produced less relevant data than the solo sessions is (we believe) the lack of activities, which could have provided a more structured exploration of the interfaces.

6. CONCLUSIONS
We have applied a detailed qualitative analysis to user studies involving a voice-driven musical interface with and without the use of timbre remapping. It has raised some interesting issues in the development of the interface, including the unproblematic integration of the timbral aspect, and the nuanced interaction of issues such as control and randomness.
However, the primary aim of this paper has been to investigate the use of Discourse Analysis to provide a robust qualitative approach to evaluating the affordances and user experience of a musical interface. Results from our DA-based user study indicate that with some modification of the user tasks, the method can derive detailed information about how musicians interact with a new musical interface and accommodate it in their existing conceptual repertoire.
We have presented one specific method for evaluating a musical interface, but of course there may be other appropriate methods. As discussed in the introduction, the state of the art in evaluating musical interfaces is relatively underdeveloped, and we would hope to encourage others to explore reliable methods for evaluating new musical interfaces in authentic contexts.



HCI Methodology For Evaluating Musical Controllers: A Case Study

Chris Kiefer, Department of Informatics, University of Sussex, Brighton, UK
Nick Collins, Department of Informatics, University of Sussex, Brighton, UK
Geraldine Fitzpatrick, Interact Lab, Department of Informatics, University of Sussex, Brighton, UK

ABSTRACT
There is a small but useful body of research concerning the evaluation of musical interfaces with HCI techniques. In this paper, we present a case study in implementing these techniques; we describe a usability experiment which evaluated the Nintendo Wiimote as a musical controller, and reflect on the effectiveness of our choice of HCI methodologies in this context. The study offered some valuable results, but our picture of the Wiimote was incomplete as we lacked data concerning the participants’ instantaneous musical experience. Recent trends in HCI are leading researchers to tackle this problem of evaluating user experience; we review some of their work and suggest that with some adaptation it could provide useful new tools and methodologies for computer musicians.

Keywords
HCI Methodology, Wiimote, Evaluating Musical Interaction

1. INTRODUCTION
A deep understanding of a musical interface is a desirable thing to have. It can provide feedback which leads to an improved design and therefore a better creative system; it can show whether a design functions as it was designed to, and whether it functions in ways which may have been unexpected. The field of Human Computer Interaction [4] provides tools and methodologies for evaluating computer interfaces, but applying these to the specific area of computer music can be problematic. HCI methodology has evolved around a task-based paradigm and the stimulus-response interaction model of WIMP systems, as opposed to the richer and more complex interactions that occur between musicians and machines. Höök [5], discussing the relationship between HCI and installation art, suggests that art and HCI are not easily combined, and this may also be true in the multi-disciplinary field of computer music.
Wanderley and Orio’s article [11] from 2002 built a bridge between HCI usability evaluation methodology and computer music, reviewing current HCI research and suggesting ways in which it could be applied specifically to the evaluation of musical controllers. Since then, research in this area has been relatively sparse and the adoption of these evaluation techniques by the community seems relatively low. A review of the 2007 NIME proceedings, for example, showed that 37% of papers presenting new instruments described some sort of formal usability testing, though often not referenced to the wider HCI literature. One possible reason for the slow uptake of HCI methods is that the practicalities of carrying out a usability study are something of a black box as, understandably, papers tend to focus on results rather than methodology. It is clear that there is a lot more that could be done to draw on HCI; this paper is a modest response to this challenge by explicitly articulating the processes and lessons learnt in applying HCI methodology within the music field.
To date there is only limited HCI literature which focusses specifically on computer music. Höök [5] examines the use of HCI in interactive art, an area which shares common ground with computer music. She describes her methodology for evaluating interaction in an installation, and examines the issue of assessing usability when artists might want to build systems for unique rather than ‘normal’ users; music shares similar characteristics with art. Poepel [10] presents a method for evaluating instruments through the measurement of musical expressivity. This technique is based on psychology research on cues for musical expression; it evaluates players’ estimations of a controller’s capability for creating these cues. Wanderley and Orio [11] have conducted the most comprehensive review of HCI usability methodologies which can be applied to the evaluation of musical systems. They discuss the importance of testing within well defined contexts or metaphors, and suggest some that are commonly found in computer music. They propose the use of simplistic musical tasks for evaluation, and highlight features of controllers which are most relevant in usability testing: learnability, explorability, feature controllability and timing controllability. Their research fitted best with our objectives for evaluating the Wiimote, and had the largest influence on the methodology used.
We present a case study on the musical usability of the Nintendo Wiimote. This practical example will help to ground this research and provide a talking point for the employment of HCI evaluation for interactive systems. We will go on to review more recent developments in HCI, from the so-called ‘third paradigm’, and discuss how they might be applied in our field in the future.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008 Copyright remains with the author(s).

2. A CASE STUDY
The semi-ubiquitous Nintendo Wiimote is becoming popular with musicians, as can be seen from the multiplicity of demo videos on YouTube.


This motivated us to carry out a formal evaluation, asking the broad question: how useful is the Wiimote as a musical controller? Answering this question presents a number of challenges. What should be evaluated to give an overall picture of the device? How can the capabilities of the controller be judged with a minimum of influence on the results from software design, and an acknowledgement of the potentially differing musical and gaming skill levels of the participants?

2.1 Experimental Design
The Wiimote is essentially a wireless 3-axis accelerometer with some buttons and an IR camera. Due to dependence on the force of gravity, only rotation around the roll and tilt axes is effective for accurate measurement. Following Wanderley and Orio’s guidelines, we decided to test the core musical capabilities of the accelerometer using simplistic musical tasks; an evaluation of the basic functions could be extrapolated from this to help assess the Wiimote’s use in more complex musical situations. The core functions that were tested were triggering (with drumming-like motions), continuous control using the roll and tilt axes, and gestural control using shape recognition. Continuous control was divided into two categories: precise and expressive.
Practical constraints had to be considered. The participants in the study were volunteers, so a balance had to be struck between the length of the experiment and the degree to which this might dissuade potential participants. A length of 30 minutes was selected.
Whenever possible, in order to give the participants a baseline for comparison of the Wiimote functions, a controller which represented a typical way of performing the musical tasks was provided. The Roland HPD-15 HandSonic was selected for this purpose, as it has a drum pad for comparison of triggering and knobs for comparison of continuous control tasks. The data from this controller would also provide a basis for statistical comparisons.

[Figure 1: The Nintendo Wiimote and the Roland HandSonic]

The triggering task involved participants drumming a set of simple patterns along with a metronome, to obtain timing data. The precise control task required participants to co-ordinate discrete changes of pitch of a sawtooth wave to the beats of a metronome, and was repeated once for each Wiimote axis as well as for turning a knob on the HandSonic. The expressive control task involved modifying the filter and grain density parameters of a synthesiser patch simultaneously. Finally, the gestural recognition task was to control five tracks of percussion, by muting and un-muting layers through casting shapes with the Wiimote. Before each task the participants were given a period of practice time; after each task they would be interviewed while the experience was still fresh, and asked about their preferred controller. We considered using the ‘think aloud’ method of gathering data during the tasks, but decided that this would be incompatible with a musical study as it would distract the participants’ attention. All data was recorded for later analysis.
A script was written which described the events in the experiment and the wording of the interview questions, in order to help the experimenter keep these constant for each participant. Participants were asked up front about factors which might affect the experiment, such as musical experience and experience of using the Wiimote. After each task, questions probed their experience in using the controllers. To reduce learning effect, the order of use of the HandSonic and the Wiimote was alternated between participants.
A call for participants was sent out to university mailing lists and local musicians, with 21 people volunteering in total. The study commenced as a rolling pilot, with experimental parameters being checked and adjusted until a stable setup had been reached. This was important in particular to adjust the difficulty of the tasks and to assess what could be fitted into the 30 minute runtime. The first four sessions ended up as pilot sessions, so the final results were taken from the remaining 17 participants.
During the study participants were videoed, to observe their gestures while using the Wiimote and also to record the interviews which occurred throughout the experiment. The SuperCollider audio software [9] was used to construct the experiment. This software allowed us to record a log file of the participants’ actions which would be analysed later for quantitative results.

2.2 Post-Experiment Analysis
The initial data analysis fell into two main areas: the analysis of the qualitative interview data, and of the quantitative log file data. Results of the analysis were stored in a MySQL database to facilitate flexible analysis later.
The analysis of the interview data happened over several stages. We used a process of reduction from the raw videos to a final document containing statements summarising the participants’ answers to each question, along with any interesting quotes. Key parts of answers from the interviews were transcribed from the video data and stored in the database. These quotes were then coded according to an emerging set of categories, and then re-coded until the categorisation was stable. The categorised set of quotes was summarised to produce the final document of results.
For the quantitative data, the log files were processed in SuperCollider to extract specific data such as timing information from the triggering task. This data was exported to MATLAB for statistical analysis using ANOVA and other tests.
Because we wish to concentrate on methodology, we only have space to give highlights of the results of this study. In interview, several people commented on the lack of physical feedback in the triggering task, saying that this made it difficult to determine the triggering point. The pitch task revealed some insights into the ergonomics of the device; some participants described how going past certain points of rotation felt unnatural. Some perceived it as less accurate than the HandSonic. Participants commented on the Wiimote’s intuitive nature when used for expressive control. They described it as ‘embodied’, and some felt that it widened the scope of editing possibilities. In general, many participants commented on the fun aspect of using the Wiimote, even when they may have preferred the HandSonic.
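The continuous control tasks above rely on recovering roll and tilt from the Wiimote’s 3-axis accelerometer, which is possible only because gravity supplies a fixed reference vector (as noted in section 2.1, the third rotation axis cannot be measured this way). A minimal sketch of that computation, in Python purely for illustration; the axis conventions, scaling and quantisation are our assumptions, not the study’s actual SuperCollider implementation:

```python
import math

def roll_tilt(ax, ay, az):
    """Estimate roll and tilt angles (degrees) from a 3-axis
    accelerometer reading, using gravity as the reference.
    Only valid while the controller is roughly still: extra
    acceleration (e.g. a drumming motion) corrupts the estimate."""
    roll = math.degrees(math.atan2(ay, az))                    # rotation about the long axis
    tilt = math.degrees(math.atan2(-ax, math.hypot(ay, az)))   # nose up/down
    return roll, tilt

def to_pitch(angle, lo=-90.0, hi=90.0, steps=12):
    """Quantise an angle into one of `steps` discrete pitch values,
    as a precise-control task (discrete pitch changes) might do."""
    clamped = max(lo, min(hi, angle))
    return int((clamped - lo) / (hi - lo) * (steps - 1) + 0.5)
```

For example, a controller lying flat (`roll_tilt(0.0, 0.0, 1.0)`) maps to the centre of the pitch range, and rolling it towards ±90° sweeps through the twelve steps.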

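The statistical step described in section 2.2 (timing errors exported to MATLAB for ANOVA) comes down to comparing between-group and within-group variance. A self-contained sketch of the one-way ANOVA F statistic in plain Python, with invented timing-error numbers; the study’s real analysis was done in MATLAB, and the p-value lookup against the F distribution is omitted here:

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA: the ratio of between-group
    variance to within-group variance across the given samples."""
    all_vals = [x for g in groups for x in g]
    grand = sum(all_vals) / len(all_vals)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each sample around its own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_vals) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical per-trial timing errors (ms) for the two controllers:
wiimote = [12.1, 9.4, 15.3, 11.0, 13.7, 10.2]
handsonic = [10.8, 11.5, 9.9, 12.6, 10.4, 11.1]
f_stat = one_way_anova_f(wiimote, handsonic)
```

A small F (relative to the critical value for these degrees of freedom) is what “the timing errors showed no significant variance” corresponds to.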
An overall criticism was of the device’s lack of absolute positioning capability. The statistics revealed little significant difference between the two controllers; participants displayed no overall preference and the timing errors showed no significant variance.

2.3 Reflections
What we’ve done by presenting this case study is to make explicit the practical issues of conducting a usability experiment. This is often mundane detail that gets omitted from experimental reports, but may be the type of essential detail that will make it easier for others to try out HCI methods. As such, it is worth noting a few key points about how the study was implemented. Firstly, the importance of a pilot study is easy to under-estimate. The best way to expose flaws in a script is to put it into practice; for valid results the experimental parameters need to stay constant throughout the study, so flaws need to be removed at this early stage. In retrospect, the difficulty of some tasks in the Wiimote study could still have been better optimised at pilot stage to suit the range of participants’ skill levels.
Secondly, an issue of particular importance in a musical usability study is allotted practice time. There’s a lower limit on the time participants need to spend becoming accustomed to the features of an instrument; getting this amount wrong can result in unrepresentative attempts at a task, concealing the true results. Again, this is something which can be assessed during the pilot study.
Thirdly, the gathering of empirical data presents some challenges. In order for the data to be valid, the participants needed to perform the tasks in the same way, although getting people to perform a precise task can be difficult, especially when you have creative people performing a creative task. There needs to be some built-in flexibility in the tasks which allows for this.
Finally, the time and effort in transcribing interviews cannot be under-estimated. Even supposing voice-recognition software of sufficient accuracy were available to help avoid the hard slog of manual annotation, it might be at the cost of the researcher not engaging so deeply with the data by parsing it themselves. An alternative approach is transcribing just the ‘interesting’ sections, which can save time, though this selection process entails some subjectivity. Tagging log file data for analysis was also a long process, as the correct data had to be found manually by comparison to the video; this could have been improved if the logging had been automatically synchronised to the video data.

3. DISCUSSION
The previous section discussed the details of applying the specific methodology we used in this study. It is also useful to reflect more generally on the structuring of the case study and the efficacy of the HCI evaluation. Was it useful to carry out the Wiimote usability study with the methods we chose? Where were the gaps in the results, and how could the methodology be improved to narrow these gaps?
The most ‘interesting’ results came from analysis of the interview data. The interviews confirmed some expected results about the controller, but more usefully brought up some unexpected issues that some people found with certain tasks, and some surprising suggestions about how the controller could be used. This is the kind of data that shows the benefits of conducting a usability study: the kind of data that is difficult to determine purely by intuition alone and that is best collected from the observations of a larger group of people. From the remaining results, the quantitative results provided objective backup to certain elements of the interview results, some useful data about the functional side of the controller, and insight into global trends of the participants. However, the conclusions reached from these results alone seemed to be a limited measure of the device compared to the subtlety of the participants’ observations.
Did the study result in a complete answer in relation to the research question, how useful is the Wiimote as a musical controller? It’s difficult to answer this objectively, but it can be observed that the results showed a detailed and intimate understanding of the controller in a musical context. One important thing the results do lack is any measure of the participants’ experience while using the controller. The more interesting results came from post-task interviews, but there is no data about their experience in the moment while they were using the device, something that would seem important for a musical evaluation. This gap in the results is partly due to a lack of technology and partly due to a lack of methodology. How can musicians self-report their experience while they are using a musical controller without disrupting the experience itself? Are there post-task evaluation techniques that can give a more accurate and objective analysis of a musical experience than an interview? More recent research in HCI is starting to address similar issues and can point to possibilities.

3.1 The ‘Third Paradigm’
Kaye et al. [7], in 2007, described a growing trend in HCI research towards experience-focused rather than task-focused HCI. With this trend comes the requirement for new evaluation techniques to respond to the new kinds of data being gathered. This trend is a response to the evolving ways in which technology is utilised as computing becomes increasingly embedded in daily life, a shift in focus away from productivity environments [8], and from evaluation of efficiency to evaluation of affective qualities [3]. As HCI is increasingly involved in other ‘highly interactive’ fields of computing such as gaming and virtual reality, the requirement for evaluating user experience becomes stronger. This new trend is known as the ‘third paradigm’, and researchers have started to tackle some of the challenges presented by this approach.
The Sensual Evaluation Instrument (SEI), designed by Isbister [6], is a means of self-reporting affect while interacting with a computer system. Users utilise a set of biomorphic sculptured shapes to provide feedback in real-time. Intuitive and emotional interaction occur cognitively on a sub-symbolic level, so the system uses non-verbal communication in order to more directly represent this. With its sub-verbal reporting method, the SEI is a step in the right direction for evaluation of musical interfaces; however, as the reporting technique already involves some interaction itself, it could only be used effectively in less interactive contexts such as evaluating some desktop software. The most dynamic example of its use is from the designers’ tests with a computer game, and they acknowledge in their results that it’s not ideal for time-critical interfaces or tasks that require fine-grained data.
For more interactive tasks such as playing a musical controller, a non-interactive data gathering mechanism is essential, so the measurement of physiological data may yield realtime readings without interrupting the users’ attention. Some studies concentrate on this area of evaluation. Chateau and Mersiol’s AMUSE system [1] is designed to collect and synchronise multiple sources of physiological data to measure a user’s instantaneous reaction while they interact with a computer system. This data might include eye gaze, speech, gestures and physiological readings such as EMG, ECG, EEG, skin conductance and pulse.


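Synchronising multi-rate streams of the kind AMUSE collects (and the log-to-video alignment this study needed) comes down to resampling each stream onto a shared clock. A pure-Python sketch using linear interpolation, illustrative only and not AMUSE’s actual mechanism; the stream names and numbers are invented:

```python
def resample(times, values, grid):
    """Linearly interpolate an irregularly-sampled stream
    (parallel `times`/`values` lists, times ascending) onto a
    common time `grid`, holding the edge values outside the range."""
    out = []
    for t in grid:
        if t <= times[0]:
            out.append(values[0])
        elif t >= times[-1]:
            out.append(values[-1])
        else:
            # Find the surrounding pair of samples and interpolate between them.
            i = next(j for j in range(len(times) - 1) if times[j] <= t < times[j + 1])
            frac = (t - times[i]) / (times[i + 1] - times[i])
            out.append(values[i] + frac * (values[i + 1] - values[i]))
    return out

# Hypothetical skin-conductance stream with its own clock, aligned to a 1 Hz grid:
gsr = resample([0.0, 0.7, 1.9, 3.1], [1.0, 1.2, 1.6, 1.4], [0.0, 1.0, 2.0, 3.0])
```

Once every stream shares the grid, per-timestamp correlation with a controller log or video timeline becomes a straightforward join.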
using these physiological measures; how to calibrate the sen- [4] Alan Dix, Janet Finlay, Gregory D. Abowd, and
sor readings and how to correlate multi-point sensor data Russel Beale. Human-Computer Interaction. Prentice
streams with single point subjective data. Both studies Hall, 3rd edition, 2004.
acknowledge that physiological readings are more valuable [5] Kristina Höök, Phoebe Sengers, and Gerd Andersson.
when combined with qualitative data. The challenge here Sense and sensibility: evaluation and interactive art.
is to interpret the data effectively and research needs to be In CHI ’03, pages 241–248, New York, NY, USA,
done into how to calibrate this data for musical experiments. 2003. ACM.
Fallman and Waterworth [3] describe how the Repertory [6] Katherine Isbister, Kia Hook, Jarmo Laaksolahti, and
Grid Technique (RGT) can be used for affective evaluation Michael Sharp. The sensual evaluation instrument:
of user experience. RGT is a post-task evaluation technique Developing a trans-cultural self-report measure of
based on Kelly’s Personal Construct Theory, and it involves affect. International Journal of Human-Computer
eliciting qualitative constructs from a user which are then Studies, 65:315–328, April 2007.
rated quantitatively. It sits on the border between qualita- [7] Joseph ’Jofish’ Kaye, Kirsten Boehner, Jarmo
tive and quantitative methods, allowing empirical analysis Laaksolahti, and Anna Staahl. Evaluating
of qualitative data. RGT isn’t ideal in a musical context experience-focused hci. In CHI ’07: CHI ’07 extended
as the data isn’t collected in the moment of the experience abstracts on Human factors in computing systems,
it evaluates; however, it could be an improvement on inter- pages 2117–2120, New York, NY, USA, 2007. ACM.
views, and has the the practical advantage that the data [8] Regan Lee Mandryk. Evaluating affective computing
analysis is less time-consuming. environments using physiological measures. In CHI’05
A number of experience evaluation techniques attempt Workshop on Evaluating Affective Interfaces -
to gather data from multiple data sources in order to at- Innovative Approaces, 2005.
tempt to triangulate an overall result. This way of working
[9] James McCartney. Rethinking the computer music
brings the challenge of synchronising and re-integrating the
language: SuperCollider. Computer Music Journal,
data sources, and some researchers are creating tools to deal
26(4):61–8, 2002.
with this [2]. These kind of tools would have been of great
[10] Cornelius Poepel. On interface expressivity: a
value to the data analysis in the Wiimote study, especially
player-based study. In NIME ’05: Proceedings of the
because of the need for log file to video synchronisation.
Developments in new HCI research are encouraging, but 2005 conference on New interfaces for musical
how useful are they in a computer music context? All these expression, pages 228–231, Singapore, Singapore,
methodologies need to be assessed specifically in terms of 2004. National University of Singapore.
evaluation of musical experience as well as user experience. [11] Marcelo Mortensen Wanderley and Nicola Orio.
Evaluation of input devices for musical expression:
Borrowing tools from hci. Comput. Music J.,
4. CONCLUSION 26(3):62–76, 2002.
We have examined current intersections between HCI eval-
uation methodology and computer music, presented a case
study of an evaluation based on this methodology, and looked
at some of the new research in HCI which is relevant to our
field. The evaluation of the Wiimote produced some valu-
able insights into its use as a musical controller, but it lacked
real-time data concerning the participants’ experience of us-
ing the device. The third wave of HCI holds promising po-
tential for computer music; the two fields share the common
goal of evaluating experience and affect between technology
and its users. The analysis of musical interfaces can be con-
sidered as a very specialised area of experience evaluation,
though techniques for new HCI research are not necessarily
immediately applicable to music technology. New research
is needed to adapt and test these methodologies in musical
contexts, and perhaps these techniques might inspire new
research which is directly useful to musicians.

5. REFERENCES
[1] Noel Chateau and Marc Mersiol. Amuse: A tool for evaluating affective interfaces. In CHI '05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[2] Andy Crabtree, Steve Benford, Chris Greenhalgh, Paul Tennent, Matthew Chalmers, and Barry Brown. Supporting ethnographic studies of ubiquitous computing in the wild. In DIS '06: Proceedings of the 6th Conference on Designing Interactive Systems, pages 60–69, New York, NY, USA, 2006. ACM.
[3] Daniel Fallman and John Waterworth. Dealing with user experience and affective evaluation in HCI design: A repertory grid approach. In CHI '05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
pages 2117–2120, New York, NY, USA, 2007. ACM.
[8] Regan Lee Mandryk. Evaluating affective computing environments using physiological measures. In CHI '05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[9] James McCartney. Rethinking the computer music language: SuperCollider. Computer Music Journal, 26(4):61–68, 2002.
[10] Cornelius Poepel. On interface expressivity: a player-based study. In NIME '05: Proceedings of the 2005 Conference on New Interfaces for Musical Expression, pages 228–231, Vancouver, Canada, 2005.
[11] Marcelo Mortensen Wanderley and Nicola Orio. Evaluation of input devices for musical expression: Borrowing tools from HCI. Computer Music Journal, 26(3):62–76, 2002.

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

The A20: Musical Metaphors for Interface Design

Olivier Bau
in|situ| lab, INRIA & LRI, Bât 490 Université Paris-Sud 11, 91405 Orsay Cedex, France

Atau Tanaka
Culture Lab, Newcastle University, NE1 7RU, United Kingdom

Wendy E. Mackay
in|situ| lab, INRIA & LRI, Bât 490 Université Paris-Sud 11, 91405 Orsay Cedex, France

ABSTRACT

We combine two concepts, the musical instrument as metaphor and technology probes, to explore how tangible interfaces can exploit the semantic richness of sound. Using participatory design methods from Human-Computer Interaction (HCI), we designed and tested the A20, a polyhedron-shaped, multi-channel audio input/output device. The software maps sound around the edges and responds to the user's gestural input, allowing both aural and haptic modes of interaction as well as direct manipulation of media content. The software is designed to be very flexible and can be adapted to a wide range of shapes. Our tests of the A20's perceptual and interaction properties showed that users can successfully detect sound placement, movement and haptic effects on this device. Our participatory design workshops explored the possibilities of the A20 as a generative tool for the design of an extended, collaborative personal music player. The A20 helped users to enact scenarios of everyday mobile music player use and to generate new design ideas.

Keywords

Generative design tools, Instrument building, Multi-faceted audio, Personal music devices, Tangible user interfaces, Technology probes

1. INTRODUCTION

We are interested in creating tangible user interfaces that exploit the semantic richness of sound. Our research draws from two disciplines: Human-Computer Interaction (HCI) and NIME instrument design. The former offers a number of examples of the use of sound in graphical interfaces, including Buxton et al.'s [2] early work, Gaver's auditory icons [5] and Beaudouin-Lafon and Gaver's [1] ENO system. These systems focused primarily on sound as a feedback mechanism, with an emphasis on graphical rather than tangible user interfaces.

We draw upon HCI design methods, particularly participatory design [7][12], that emphasize the generation of ideas in collaboration with users. In particular, technology probes [9] engage users as well as designers to create novel design concepts, inspired by the use of the technology in situ. This generative design approach challenges both users and designers to explicitly question traditional ways of thinking and open up novel design directions. Our goal was to create a technology probe that focuses on the sonic aspects of tangible interfaces, using participatory design to create and explore the possibilities of a working prototype.

We also draw on the instrument building approach from NIME, which offers a similar notion of generative design. Musical instruments are developed as open-ended systems that allow the creation of novel compositions and interpretations, while idiomatic composition recognizes that limitations are imposed by the characteristics of the system or instruments. We use this instrument building metaphor as one of the foundations for our generative design approach: the limitations of the instrument serve to both define and constrain the design space, with respect to the given research problem.

Figure 1. The A20 is a working prototype of a technology probe for exploring music and sound in a tangible interface.

This paper describes the design and development of the A20 (Figure 1), a polyhedron-shaped, multi-channel audio device that allows direct manipulation of media content through touch and movement, with various forms of aural and haptic feedback. During a series of participatory design sessions, both users and designers used the A20 to generate and explore novel interface designs. The easily modifiable software architecture allowed us to create various mappings between gestural and pressure inputs, producing specific sounds and haptic output. Meanwhile the flexibility of the A20 as an interface allowed users a range of interpretations for any given mapping. The A20 was never intended as a prototype of a specific future system. Instead, we sought to use it as a design tool to explore the potential of music and sound in tangible interfaces. Our participatory design workshops served to both evaluate the A20 itself and to explore novel interface designs, including social interaction through portable music players.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).


2. RELATED WORK

Since our goal was to maximize the user's ability to explore new forms of interaction, we needed a generic shape that would maximize the user's freedom of expression and could be easily adapted to a variety of types of interaction, preferably through direct manipulation of a topological display. The D20 [13], co-designed by one of the authors, is a design concept for a visual interface embodied as an icosahedron. An icosahedron is a nearly spherical polyhedron with 20 discrete triangular facets. Figure 2 shows how this shape permits a variety of display options and modes of rotation-based interaction, such as around the equator, as slices of a pie, or simply as binary choices.

Figure 2. The D20 interaction modes that emerge from three facet patterns: equator, pie and binary.

The D20 was created as a design concept, using a computer simulation that emphasized the visual display properties of the icosahedron. We decided to adopt the same form but this time as a functional prototype, focusing on its audio and haptic possibilities.

Several other researchers have created omni-directional spherical sound output devices. For example, Warusfel [17] controls radiation patterns from a single sound source across a spherical matrix. Freed et al. extended this to a multi-channel approach that simulates acoustical instrument propagation [4]. SenSAs [16] add sensors to create a form of electronic chamber music performance. The primary focus of these projects was to recreate multi-directional sound radiation patterns that approach those of acoustic instruments: they create non-frontal forms of amplified sound reinforcement so as to better situate electronic sounds in context with acoustic sources. However, none have used a spherical form factor to play multiple sound sources in the context of end-user music devices such as MP3 players.

The other relevant research relates to generative design methods. For example, cultural probes [6] provide people with unusual artefacts in various settings, with the goal of inspiring novel ideas. The idea is to move from the more classical HCI approach, in which users are viewed as a source of data, and to engage in activities in which users become a source of inspiration. Technology probes [9] also focus on design inspiration with users, but in an explicitly participatory design context. Technology probes are founded on the principle of triangulation [11], which fulfills three "interdisciplinary goals: the social science goal of understanding the needs and desires of users in a real-world setting, the engineering goal of field-testing the technology, and the design goal of inspiring users and researchers to think about new technologies".

Technology probes were originally designed to study human-to-human communication and were tested in the homes of remote family members. Most focused on the exchange of visual information, such as the VideoProbe [9], which snaps a picture from a webcam in the living room – but only if the person does not move for three seconds – and shares it with a VideoProbe in the living room of a remote family member. Another device, TokiTok [10], explored communication via simple sounds: users could transmit 'knocks at a distance', which conveyed simple information such as 'I'm home' but also allowed participants to establish more elaborate codes to stay in touch. However, sound and music have not been the focus of technology probe research thus far.

The A20 project seeks to leverage the complementary aspects of music research and user interface design methods. We use the notion of technology probes to understand users and draw inspiration, but in simulated settings in design workshops rather than in the real world. We also take advantage of techniques from NIME, with the inherently expressive properties of musical instruments, to explore this design space.

Musical instruments are built to be vehicles of expressive communication. An instrument is generic in the sense that many kinds of music can be composed for the same instrument. At the same time, an instrument is idiosyncratic in that it is capable of specific modes of articulation, limited melodic range and harmonic combinations. An instrument is not necessarily designed to have a set of limitations, but a successful musical work must take these characteristics into account. A musical composition that respects and plays upon the idiosyncratic nature and limits of an instrument is considered an example of idiomatic writing [15]. This approach to creative musical use of acoustical properties and limitations, applied to digital interaction properties, is one of the core research areas of NIME.

In the field of HCI, various design methodologies exist to create useable or efficient user interfaces. This can include performance optimization in the technical sense, or taking into account the end-user's needs in the design process, as in the case of User-Centered Design. A technique similar to that of idiomatic writing in music exists in HCI, whereby limitations of a technological system are used as part of the design process. This is called seamful design [3]. Chalmers argues that accepting all of a system's "physical and computational characteristics [whether they are] weaknesses or strengths" not only offers more robust system design, but may also inspire novel interface ideas.

Composing idiomatic music for an instrument can be considered an act of seamful design: we can make a link between making a composition that takes into account an instrument's limitations, and creating an interface that takes advantage of a system's characteristics. In the user-interface design process, seamfulness helps define the creation of a design space, while open-endedness helps in interpreting the design space. Here we apply the duality of seamful composition and open-ended instrument to create a tool for generative user interface design.



In the development of the A20, we sought an application-neutral approach that would yield a flexible interface. The design of the A20 is not a direct response to specific interface design questions. Instead, a metaphor-based conceptual development allowed us to pursue an open-ended process to explore the design space of audio interfaces. We called upon three metaphors from the musical tradition: instrument building, composition, and expressivity of interpretation.

When building digital musical instruments, unlike acoustic instruments, we must define the mappings between input and output [8]. For a given system specification, we can conceive of many mappings to create a variety of input and output connections. This range of mappings turns the system into a potential family of instruments, or corpus of articulations for a given instrument. This contrasts with most user interface design, in which the goal is to find the single optimal mapping of input and output that will create the desired interaction for a specific design problem.

We also draw from the metaphor of musical composition, which emphasizes expressivity and interpretation. A composition exists as a musical structure that can be executed and re-interpreted in the context of a musical performance. These two metaphors, musical instruments and composition, encourage us to re-examine the traditional user interface design concept of a scenario and redefine it as a compositional abstraction that can be executed on that tool/instrument. In a participatory design process, scenario creation and scenario enacting can be seen as composition and interpretation. These metaphors serve to situate and enrich our interaction scenarios, while also guiding the design specification of the system.

The metaphors of instrument, composition and interpretation correspond to two levels of abstraction of the A20. At the lower level of abstraction, the instrument is defined by the hardware specification (form factor, sensors, audio output) and software specification (mapping between input and output). As a design tool, the A20's hardware establishes the first set of constraints for the design space, including gestural and pressure input on the one hand and multidirectional, multi-channel output capabilities on the other. The software defines the 'elements of interaction' that turn the A20 into an instrument. For example, the user can create a sound that moves around the device and then make it stop by shaking the device.

The upper level of abstraction comprises composition and interpretation, which allow the user to play the device in the context of a specific design scenario. Different interpretations can be seen as different instantiations of an open-ended interaction mapping. For example, shaking the device could be interpreted as a gesture to validate playlist creation, to send a song to a friend's device, or as an action in a collaborative music game. The expressivity of the resulting instrument allows a wide range of interpretations and instantiations for different design questions. The software that defines the A20's interaction is highly flexible, which enables us as user interface designers to invent and invite users to 'play' a diverse set of instruments and understand both the problems and potential of each.

The first version of the A20 was a simple cube, which helped us to develop the software for integrating sound-processing and sensor data. The second version was an icosahedron, which we used in our studies with users.

4.1 HARDWARE

Figure 3: The A20 frame (left) consists of 20 triangles, 16 of which hold flat speakers. Transducer and Force Sensing Resistors (right) fit under each speaker.

Figure 3 shows the A20's frame on the left, built with rapid prototyping stereo-lithography. An audio interface and sensors were housed within the structure (Figure 3, right). The icosahedron had 14 cm edges, resulting in a diameter of approximately 22 cm. We attached commercially available lightweight flat-panel loudspeakers along the outside of the frame, with each panel cut to a triangular shape. The assembled, working version can be seen in Figure 1.

The sixteen flat speakers are driven independently with two USB 8-channel 16-bit sound cards with a 44.1 kHz sampling rate. Thus, only 16 of the faces are able to display independent sound. Sensors include a Bluetooth Six-Degrees-Of-Freedom (6DoF) inertial sensor pack with a triaxial accelerometer and a triaxial gyroscope for rotation [18]. Force Sensing Resistors (FSR) are integrated under each speaker transducer, with a 10-bit analog-to-digital conversion processor on a separate micro-controller-based acquisition board. The micro-controller acquires the pressure sensor data with 12-bit resolution, which is then sent over a standard serial port. Figure 4 (lower box) illustrates the hardware architecture.

Figure 4. A20 hardware and software architecture

4.2 SOFTWARE
The software architecture is based on a client/server model and consists of sensor acquisition and interaction mapping modules and an audio engine, shown in Figure 4 (upper box). The data collected from the A20's sensors is broadcast to a control module on the computer, which integrates the sensor data and



defines the interaction mappings. A second module is in charge of audio processing and sends data back to the A20. Both modules communicate via Open Sound Control. We chose the UDP protocol for its efficiency in time-sensitive applications.

The A20 interaction mappings are implemented in C++ as a server process that aggregates data from the accelerometer, gyroscope and pressure sensors. We used the OpenGL graphics library to program a visual representation of the physical prototype for debugging interaction mappings and to accelerate matrix operations during real-time sound mapping on the device. We vectorized sound location across the surface of the icosahedron in a way similar to Pulkki's work on vector-based sound positioning [14]. Vector-based audio panning extends the principle of stereo panning with an additional dimension, making it useful when the listener is not fixed at a sweet spot or in cases where sound distribution includes a vertical dimension.

In the control software, 3D vectors represent sound sources. The origin of the 3D coordinate system is the center of the object, in this case the center of the A20. Each face and corresponding speaker is represented by a vector from that origin to its center. The control software outputs, in real time, a vector angle for each sound source. The audio engine can then calculate amplitude levels given the angular distance between the vectors representing the sound sources and those representing the speakers. The control software dynamically calculates the source vectors, resulting in sounds moving across a series of faces. After audio processing, this results in a gradual multidimensional panning between adjacent faces, giving the impression of sound moving across the surface of the object.

This software can be adapted to a range of different shapes. The vectors representing the faces are computed according to the number of speakers and their placement. The audio engine is then configured with the proper number of speakers and data relative to their output capabilities, such as physical size and amplitude range. Thus the same software works for the original cube-shaped prototype and for the 20-sided icosahedron.

The audio engine is written in Max/MSP and is divided into two parts. The main control program is the master of two slave patches, each controlling a sound card. The audio engine manages multiple sound streams that can be placed at different positions on the device according to location attributes sent by the control software. This software allows us to use synthesized sounds as well as samples of recorded music in MP3 format. Post-treatment algorithms are applied to achieve acoustical effects from the real world. For example, Doppler shift changes the sound pitch as it moves closer or further, and filtering effects change the sound timbre as the sound moves behind obscuring objects, thus enhancing the effect of sound movement around the device.

5. EVALUATION
In order to evaluate the A20, we invited non-technical users to the third in a series of participatory design workshops. The first two sessions, not reported here, focused on an interview-based exploration of existing personal music player usage, and structured brainstorming on communicating entertainment devices, respectively. Evaluation of the A20 comprised two activities. The first focused on its perceptual characteristics as a multi-faceted, multi-channel audio device. The second used the A20 as a technology probe and an instrument, to inspire and explore different forms of interaction with a tangible audio device.

5.1 Multi-faceted Audio Perception
The purpose of the first set of tests was to assess the users' ability to perceive different modes of audio display on the A20, including their ability to perceive sound position, motion around the device, and haptic patterns. We also wanted to familiarize them with the A20 so they could participate in the second set of participatory design exercises.

Figure 5. Testing how a user perceives the A20

We asked 16 participants to perform a set of tests, in individual sessions lasting approximately 10 minutes each. Each participant was given the A20 (Figure 5) and asked to perform the following tasks:

Test 1: Localizing Sound
Impulse sounds were played randomly on one of the facets and the participant was asked to identify the source facet without touching the A20. (Repeated five times.)

Test 2: Detecting Direction of Movement
An impulse train was panned around the equator of the device to simulate a moving sound source with a circular pattern. The participant was asked to identify whether the direction of movement was clockwise or counter-clockwise, without touching the A20.

Test 3: Distinguishing Static from Dynamic
We combined the first two tests to determine whether the participant could distinguish a moving sound from a static sound. The participant was presented with four conditions: two static sounds (derived from Test 1) and two moving sounds (clockwise and counter-clockwise), in a counterbalanced presentation sequence.

Test 4: Distinguishing Haptic Stimuli
We combined the auditory and haptic channels to create various combinations – some where the two modes were synchronous, reinforcing perception of a single source, and others that presented two distinct sources, one in each modality. The haptic channels were presented on the lateral faces under the participant's hands, whereas the auditory channel (a musical excerpt from a well-known pop song) was presented on the 'pie' zone at the top of the A20. In some combinations, the haptic channel corresponded to the music being heard, while in others the haptic and audio stimuli were independent. The participant was asked to indicate whether or not the haptic and audio signals were the same. In cases where the haptic signal was derived from the music, several variations were made to bring more or less of the music into the haptic range. This included generating the haptic signal from the amplitude envelope of the music, or low-pass filtering the music before generating the corresponding haptic stimulus.
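The envelope-based variant mentioned above can be illustrated with a simple signal-processing sketch: rectify the audio, then smooth it with a one-pole low-pass filter so that only slow amplitude variations, suitable for a vibrotactile transducer, remain. This is our own minimal illustration of the general technique, not the A20's actual implementation; the 30 Hz cutoff is an assumption.

```python
import math

def envelope_follower(samples, sample_rate=44100.0, cutoff_hz=30.0):
    """Derive a low-frequency haptic envelope from an audio signal."""
    # One-pole low-pass coefficient for the chosen cutoff frequency.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    env, state = [], 0.0
    for x in samples:
        state += alpha * (abs(x) - state)  # rectify, then smooth
        env.append(state)
    return env

# One second of a full-scale 440 Hz tone: the envelope settles near the
# mean rectified value of a sine wave (2/pi, about 0.64).
tone = [math.sin(2 * math.pi * 440 * n / 44100.0) for n in range(44100)]
env = envelope_follower(tone)
```

The resulting envelope could then drive the face transducers directly, or modulate a fixed low-frequency carrier, depending on the transducer's response.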


Test 5: Distinguishing Haptic Stimuli

Participants were asked to hold the A20. We generated two
different haptic stimuli, one under each hand. These were low
frequency vibration patterns that were not in the audible range
(using pulse width and frequency modulation). The participant
was asked whether or not the two patterns were the same. For
each task, trial order was counterbalanced across participants.
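The static and moving sounds in Tests 1–3 are positioned with the vector-based amplitude panning described in Section 4.2: each speaker's gain is derived from the angle between the source vector and that face's centre vector. The sketch below is a simplified illustration of the idea, not the authors' Max/MSP implementation; the cosine gain law and the four-speaker equatorial layout are our own assumptions.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def face_gains(source, faces):
    """Amplitude gain per speaker from angular proximity to the source.

    Each face is a vector from the device centre to the face centre.
    Gains fall off with the angle to the source vector and are
    normalized so the total output power stays constant.
    """
    s = normalize(source)
    raw = []
    for f in faces:
        # Cosine of the angle between source and face; only faces on
        # the source's side of the device contribute.
        raw.append(max(0.0, sum(a * b for a, b in zip(s, normalize(f)))))
    power = math.sqrt(sum(g * g for g in raw))
    if power == 0.0:  # source points away from every face
        return [0.0] * len(raw)
    return [g / power for g in raw]

# Four speakers around the 'equator' of a device.
faces = [(1, 0, 0), (0, 1, 0), (-1, 0, 0), (0, -1, 0)]

# A source directly in front of face 0 plays only on that face;
# a source halfway between faces 0 and 1 splits its power equally.
g = face_gains((1, 1, 0), faces)
```

Sweeping the source vector around the equator then yields the kind of gradual face-to-face panning used for the moving stimuli.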

5.2 Results
Figure 6 shows the results of each of the five tests. Participants were reliably able to locate the position of a sound on the device (Test 1, 85% accuracy), to detect the direction of motion (Test 2, 77% accuracy) and to perceive whether the sound was moving or not (Test 3, 79% accuracy). However, users had greater difficulty determining whether the haptic stimulus under their hands was a filtered version of the music being heard (Test 4, 69% accuracy). Participants were particularly successful in distinguishing among haptic stimuli (Test 5, 91% accuracy).

Figure 6. Results of simple perception tests

5.3 Participatory Design Workshop
We organized the workshop into four major design activities. The first asked participants to create personal scenarios that address the theme of mobile social interaction through music. The second and third activities were conducted in parallel. In the second activity, small groups collaborated on creating a scenario that combined and deepened the individual scenarios from activity one. During this time, we invited individuals to test the A20, as described in the previous section. When all the members of a group had completed the individual perception tests, we used the A20 as a design tool to help each group imagine novel interaction scenarios. We implemented three interaction mappings that allowed participants to play with three different forms of gesture-based interaction:
1. Flick the A20 left or right to change the current music track playing on the top of the device.
2. Press on a facet to make a sound rotate around the equator, starting from the pressed speaker and then fading away.
3. As the user physically turns in a circle, compensate by panning the A20 so that the music stays fixed relative to the surrounding space.
The fourth activity (Figure 7) asked pairs of participants to create a meta-scenario that incorporated their newfound interpretations of the A20 interaction mappings and to design a user interface that exploited its sound properties. The resulting scenarios were sketched out on storyboards, acted out, and videotaped.

Figure 7. Working with cardboard mockups and drawing storyboards to illustrate shared scenarios.

5.4 Results
One of our constraints was that we had only one working prototype of the A20, which meant that participants played with it at different points in their design exercises. However, this enabled us to observe how the A20 affected their designs, and to compare designs from those who experienced it early or late in their design processes.

As one would expect, people had various interpretations of the A20 and incorporated its features differently into their designs. Some were directly influenced by the interaction elements that they experienced in the perceptual tests. For example, one group's concept emerged from the first interaction mapping: they extended the idea of flicking the A20 to navigate through sounds and created a collaborative music game. One user would perform a succession of back-and-forth flicks to create a sound sequence. The remote player would then execute the same sound sequence, adding one new sound at the end. As they play, the sequence becomes successively more difficult to master, until one player cannot reproduce it.

Another group imagined a file browsing technique that involved manipulating the sound source directly. This exploited the whole-object interaction and audio-only nature of the A20. One participant applied this functionality to common MP3 players by adding gestural input and spatialized sound. This modified the concept of the playlist so that it was no longer a textual representation of the music, but the music itself, sequentially laid out across the faces of the A20.

The second interaction mapping allows users to send a sound around the equator of the A20, so that the sound moves from the pressed face to its opposite face. Although presented only as an abstract interaction element, several participants seized upon the idea of generating sonic feedback when sending a music file to someone else. One participant imagined a scenario that combined the second and third interaction mappings. He would turn physically in space with the A20 so as to orient himself with respect to his distant correspondent, effectively associating a physical person in real space with the topology of the A20. He would then select a piece of music from a particular face to share with the other person.

The third interaction mapping inspired another group to propose a device that acts like a sound memory compass: "The A20 can be a recorder for use while traveling, to capture impressions from different places. Each face saves a sonic snapshot from a place I visit." They attached sounds to virtual objects in the environment and proposed navigating through this collection of objects by pointing the A20 in different directions in the space.

Other users imagined scenarios that exploited the A20's form factor. For example, one group proposed throwing the A20 "like a die onto the floor", which would turn on shuffle mode and "fill the living room with sound". Another group proposed using groups of A20s like stackable bricks, to create a variety of different sound or music effects. These examples illustrate some


of the richness and innovation of the ideas generated by non-technical users, which go far beyond the creativity we saw in previous workshops, when they had no specific instrument on which to play and explore ideas.

6. CONCLUSION AND FUTURE WORK
Our goal has been to use the expressivity and open-endedness typical of musical instruments to create generative design tools, encouraging both users and designers to imagine new interfaces using the evocative richness of sound. In workshops, users experienced, tested and explored design ideas, immersed in the context provided by the workshop theme and the A20's specific sound characteristics. We feel that the A20 successfully acted as an expansive platform for generating and exploring new sound interaction ideas.

The icosahedron form served as a generic interface that could be reinterpreted in different ways. The A20 constrained the design space to gestural input and multi-directional sound output, and its idiosyncratic form factor influenced some participants' scenario interpretations. However, since the sound control software can be easily adapted to work on other form factors, different shapes could be used depending upon the design questions to be treated, allowing us to transpose the design space. This could be achieved by creating a wider range of simple forms or even by using Lego-like building blocks to create a shape around the multidirectional sound source.

In our future work, we plan to extend the output and networking capabilities of the A20. We found the preliminary perception tests with haptic patterns interesting, and we plan to explore audio-haptic correlation and audio-to-haptic information transitions and to add these features to another instrument interface. This would allow user interface designers to take the haptic capabilities of audio displays into account and to further explore the multimodal potential across sound and touch together. We hope to develop a fully wireless, lightweight version of the A20 and would also like to add networking features so that multiple A20s can communicate with each other and encourage diverse forms of musical collaboration among their users.

7. ACKNOWLEDGMENTS
This project was developed at Sony Computer Science Laboratory Paris. Our thanks to the project interns, Emmanuel

8. REFERENCES
[4] Freed, A., Avizienis, R., Wessel, M. and Kassakian, P. (2006). A Compact 120 Independent Element Spherical Loudspeaker Array with Programmable Radiation Patterns. In Proc. of AES'06, paper 6783.
[5] Gaver, W. (1989). The Sonic Finder: An Interface that Uses Auditory Icons. Human-Computer Interaction, 4(1), pp. 67-94.
[6] Gaver, W.W. and Dunne, A. (1999). Projected Realities: Conceptual Design for Cultural Effect. In Proc. of CHI'99, pp. 600-608.
[7] Greenbaum, J. and Kyng, M., Eds. (1992). Design at Work: Cooperative Design of Computer Systems. Lawrence Erlbaum Associates, Inc.
[8] Hunt, A., Wanderley, M.M. and Paradis, M. (2002). The importance of parameter mapping in electronic instrument design. In Proc. of NIME'02, pp. 149-154.
[9] Hutchinson, H., Mackay, W.E., Westerlund, B., Bederson, B., Druin, A., Plaisant, C., Beaudouin-Lafon, M., Conversy, S., Evans, E., Hansen, H., Roussel, N., Eiderbäck, B., Lindquist, S. and Sundblad, Y. (2003). Technology Probes: Inspiring Design for and with Families. In Proc. of CHI'03, pp. 17-24.
[10] Lindquist, S., Westerlund, B., Sundblad, Y., Tobiasson, H., Beaudouin-Lafon, M. and Mackay, W. (2007). Co-designing Technology with and for Families - Methods, Experiences, Results and Impact. In Streitz, N., Kameas, A. and Mavrommati, I. (Eds), The Disappearing Computer, LNCS 4500, Springer Verlag, pp. 99-119.
[11] Mackay, W.E. and Fayard, A-L. (1997). HCI, Natural Science and Design: A Framework for Triangulation Across Disciplines. In Proc. of DIS'97, ACM, pp. 223-234.
[12] Muller, M.J. and Kuhn, S. (Eds.) (1993). Communications of the ACM, Special Issue on Participatory Design, 36(6), pp. 24-28.
[13] Poupyrev, I., Newton-Dunn, H. and Bau, O. (2006). D20: Interaction with Multifaceted Display Devices. In CHI'06 Extended Abstracts, ACM, pp. 1241-1246.
[14] Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude panning. In J. Audio Eng. Soc. 45
(6). pp.456-466.
Geoffray from IRCAM and Sonia Nagala from Stanford
University and to Nicolas Gaudron for the icosahedron [15] Tanaka, A. (2006). Interaction, Agency, Experience, and
structure. the Future of Music. In Brown, B., O’Hara, K. (Eds.)
Consuming Music Together: Social and Collaborative
Aspects of Music Consumption Technologies. Computer
8. REFERENCES Supported Cooperative Work (CSCW) Vol. 35. Springer,
[1] Beaudouin-Lafon, M. and Gaver, W. (1994). ENO: Dordrecht. pp. 267-288.
synthesizing structured sound spaces. In Proc. of UIST’94.
ACM. pp. 49-57. [16] Trueman, D., Bahn, C. and Cook, P. (2000). Alternative
Voices For Electronic Sound, Spherical Speakers and
[2] Buxton, W., Gaver, W. and Bly, S. (1994). Auditory Sensor-Speaker Arrays (SenSAs). In Proc. of ICMC’00.
Interfaces: The Use of Non-Speech Audio at the Interface. [17] Warusfel, O. and Misdariis, N. (2001). Directivity
Synthesis With A 3d Array Of Loudspeakers Application
[3] Chalmers, M. and Galani, A. (2004). Seamful For Stage Performance. In Proc. of DAFx’01.
interweaving: heterogeneity in the theory and design of
interactive systems. In Proc. of DIS’04. ACM. pp. 243- [18] Williamson, J., Murray-Smith, R., and Hughes, S. (2007).
252. Shoogle: excitatory multimodal interaction on mobile
devices. In Proc.of CHI '07. ACM. pp. 121-124.

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy

Low Force Pressure Measurement: Pressure Sensor Matrices for Gesture Analysis, Stiffness Recognition and Augmented Instruments

Tobias Grosshauser
1, Place Igor Stravinsky, Paris
ReactiveS Lab, Munich
+49 176 242 99 241

ABSTRACT
The described project is a new approach using highly sensitive low force pressure sensor matrices for detecting malposition, cramping and tension of hands and fingers, for gesture and keystroke analysis, and for new musical expression. In the latter case, sensors are used as additional touch sensitive switches and keys. For pedagogical issues, new ways of technology enhanced teaching, self teaching and exercising are described. The sensors used are custom made in collaboration with the ReactiveS Sensorlab.

Keywords
Pressure Measurement, Force, Sensor, Finger, Violin, Strings, Piano, Left Hand, Right Hand, Time Line, Cramping, Gesture and Posture Analysis.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

1. INTRODUCTION
Many audio and gesture parameters have already been explored and described for exercising, teaching and performing on musical instruments. The method suggested in this paper extends these approved practices. The basic technology is a highly sensitive pressure sensor. Lining up several of these extremely lightweight sensors in arrays allows a broad field of applications. A combination in matrices allows a 3-dimensional representation of the linearised data with position and pressure visualisation. The position, the pressure/force and the data representation, for instance with time line alignment, show the change of the overall energy and are visualised graphically. Along the time axis, the change of the applied force, respectively the pressure, can be observed. Many different visualisation, recording, sonification and feedback tools are programmed in PD and MaxMSP or similar software environments and can be applied to the generated data.

All in all, the target group ranges from beginners up to professional musicians in the areas of teaching, performance, composition and posture and gesture analysis.

In music and art, sensors can be an alternative or an enhancement to traditional interfaces like computer keyboard, monitor, mouse and camera in man-machine interaction. Position, pressure or force sensing is a possibility to translate the haptic reality into the digital world. There is already a great choice of high performance motion and position tracking systems, but techniques for pressure recording are still under-represented. Besides the expense, this is due to the complicated measurement technology needed for the mostly high capacitance of industrial sensors and the complicated and damageable mechanical setup. After the first development period of the pressure sensors, the main goals, high sensitivity and low weight, were achieved. Later, the following additional requirements were met as well:

- cheap and stable, "live-performance-proof"
- easy to use and to install
- no distraction of gestures or movements
- usable in performances with and without computer ("stand-alone system")
- every sensor can be detected autonomously
- high resolution AD-conversion, but also
- compatibility with standards like MIDI

2. GESTURE, PRESSURE AND POSITION
Poepel gives a summary of extended violins: playing with ASDSS sounds, playing with expanded existing instruments and playing with new gestures [1]. Askenfelt already measures bow motion and force with custom electronic devices [2]: a thin resistor wire among the bow hairs provides position data and bow-bridge distance with electrified strings. Paradiso uses the first wireless measurement system, two oscillators on the bow and an antenna combined with a receiver [3]; pressure is also measured at the forefinger and between the hair and the wood. Young received pressure data from a foil strain gauge placed in the middle of the bow [4]. Demoucron attaches accelerometers to the bow and measures


the complete pressure of the bow with sensors connected to the bow hair [5].

Maestre presents a gesture tracking system based on a commercial EMF device [6]. One sensor is glued to the bottom of the violin near the neck, a second one to the bow. Position data, pressure derived from the deformation of the bow, and data related to this capture can be calculated. A lot more systems exist, but mostly combined with a camera, which does not seem to be stable and reliable enough for performances and everyday use.

A different approach is developed at IRCAM by Bevilacqua [7]. The sensing capabilities are added to the bow and measure the bow acceleration in realtime. A software based recognition system detects different bowing styles.

Guaus measures the overall bow pressure [8] and not each finger that causes the pressure on the bow. Sensors are fixed on the hairs of the bow at the tip and the frog. This means additional weight on the tip, which could influence professional violin playing because of the leverage effect.

The recent paper by Young [9] describes a database of bow strokes with many kinds of sensor data, like 3D acceleration, 2D bow force and electric field position sensing, again with an overall bow force measurement.

The measuring system presented here is easy to install: the less than 1 mm thick, flexible sensor is simply stuck onto the bow or finger and connected to the converter box. As every single finger itself is measured, besides pressure and force allocation and changes between the fingers in different playing techniques, muscle cramps and wrong finger positions can be detected.

3. BASIC SETUP
Each sensor is connected directly to a converter box. If less data is required, fewer sensors can be plugged into the converter box. Standard stereo jacks are used as plugs; each sensor/plug has its own control channel. This allows individual and minimized setups and a better overview if fewer channels are used, especially with younger students. Wireless transmission is partly possible, but not always practicable. The connector box can be worn on a belt. Data transmission is possible either to a computer or directly to synthesizers or other modules via MIDI. The connector box provides a power supply for each sensor and a direct MIDI out.

The basic sensor is 5 x 5 x 2 mm; larger dimensions are possible. It weighs only a few grammes, depending on the dimensions of the surface area. The sensors are usually combined in 2 to 4 rows, each consisting of 4 to 8 sensors, stuck onto a flexible foil.

The basic setup consists of at least one 16-channel programmable converter and connector box and sensor matrices for shoulder rest, chin rest and bow, plus a computer with MaxMSP, Matlab or common music software to process, record or display the data.

4. PRESSURE AND POSITION

4.1 Strings
The basic measurements on the violin (exemplary for strings) are:

4.1.1 Pressure and Position of each Finger of the Right Hand
Figure 1. Change of force during one bow stroke

The integrated sensors show when the position and/or pressure changes during the movement of the bow (see figure 1) or crosses a certain force limit. This limit can be adjusted individually, and visual feedback or just data recording is possible. This allows ex post analysis of the performed music piece, or simply information for beginners. The sensors of the middle and the ring finger can also be used for steering or switching peripherals on/off or for continuous sound manipulation.

Figure 2 shows the integration of the sensors into the bow and the finger posture of the right hand. Every sensor is installed at the right place, individually adapted to the ergonomically and technically correct position of the musician's fingers. For beginners, especially those over 15 years of age, this is a simple check of the correctness of the posture when they exercise alone at home, and it detects wrong exposure or stiffness of the hand and fingers, for example too much or wrongly directed pressure on the forefinger.
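As a concrete illustration of the signal path in the basic setup (each sensor on its own converter channel, with direct MIDI out), the scaling from a raw converter reading to a MIDI control value might look as follows. This is only a hypothetical sketch: the ADC range, the controller numbers and the framing are assumptions for illustration, not the specification of the converter box.

```python
# Hypothetical sketch: mapping multi-channel pressure-sensor readings to MIDI
# control-change messages, as the converter box described above does in
# hardware. ADC range and CC numbering are assumptions, not the device specs.

def adc_to_midi_cc(raw, adc_max=1023):
    """Scale a raw ADC reading (0..adc_max) to a 7-bit MIDI CC value (0..127)."""
    raw = max(0, min(raw, adc_max))
    return round(raw * 127 / adc_max)

def frame_to_cc_messages(frame, base_cc=20):
    """One CC message per sensor channel; channel i maps to controller base_cc+i."""
    return [(base_cc + i, adc_to_midi_cc(v)) for i, v in enumerate(frame)]

frame = [0, 512, 1023, 256]          # one reading per plugged-in sensor
print(frame_to_cc_messages(frame))   # -> [(20, 0), (21, 64), (22, 127), (23, 32)]
```

With per-channel controller numbers, unused channels can simply be omitted from the frame, matching the minimized setups described above.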


Figure 2. Pressure sensors on the bow

4.1.2 Pressure of the left Hand
This is not an everyday solution, because the flexible sensors are stuck onto each finger. But for several methodical and technical issues, useful information is generated: not the maximum pressure itself, but how the pressure changes, for example in different types of trills or vibrato, and how the pressure is divided between the fingers when double stops are played. In combination with the bow position sensor, left-right hand coordination can be explored very well.

4.1.3 Areal Distribution of Pressure and Position of the Chin on the Chin Rest
First measurements of chin rest pressure in violin playing were done by M. Okner [10]. Five dependent variables were evaluated: peak pressure, maximum force, pressure/time integral, force/time integral, and total contact area. Similar variables are studied in our measurements and shown in figure 3.

Figure 3. Pressure Allocation Chin Rest

Figure 3 shows the chin rest pressure and force sensor matrix measurement data in a 3-axis coordinate system. This optical representation of the force and pressure data seems to be a practical way of showing the measurement results.

The sensor matrix enables the detection of the pressure distribution over the whole area, compared to a pressure measurement at one point only. Both position changes and muscle cramping are detectable, which could prevent back and neck pain in beginners as well as malposition, especially in long-lasting exercising situations or with general inattentiveness.

Figure 4. Shoulder Rest Pressure Allocation, SR1 and SR2 Sensor Array

Similar to the above-mentioned chin rest solution, the shoulder rest matrix (see figure 4) detects malposition and false posture, often caused by disposition of the shoulder. In figure 5, SR2 shows an incorrect pressure allocation on the shoulder rest, compared to SR2 in figure 4. Correct violin position is a basic precondition for learning the further playing techniques. This issue mainly concerns beginners; for advanced musicians it is possible to use, for example, a defined pressure raise of the chin or shoulder for switching and steering interactions. (Besides avoiding a badly positioned shoulder rest, backache can be prevented by good shoulder and neck posture.)

Figure 5. Shoulder Rest Malposition, SR1 and SR2 Sensorposition

4.1.4 Comparison of Shoulder and Chin Rest Pressure and Position
Figure 6 shows the pressure allocation of chin and shoulder in one coordinate system. The upper area is the chin pressure, the lower area the shoulder rest pressure allocation area. This optical representation is important for simple everyday use. In this case, wrong posture or malposition would appear as a brighter colour or an inhomogeneous shape of the two areas.

Position changes of the violin itself can also be detected while playing. This enables the musician or teacher to analyse, besides pressure and force changes, changes of the violin position. Evaluation and

studies about expressive gestures during a live performance or practising situations are possible.

Figure 6. Comparison Shoulder-Chin Rest Pressure and Position

4.1.5 Stiffness Recognition
A common problem in teaching, especially in beginners' lessons, is too much tension or stiffness in the bow hand. Besides wrong posture of the hand, elbow and fingers, force is often applied to the wrong fingers, sometimes because of the impression that the bow could fall down. The wrongly applied force can be detected by the sensors. Most of the time, too much pressure is applied to fingers where usually nearly no force is needed; in this example on the violin bow, the middle and the ring finger. A visualisation tool (figure 7) was programmed in PD and MaxMSP. A blue ball reacts to the pressure of the ring finger. If there is too much pressure, it moves to the right (figure 8) and its colour and shape change. Besides that, a sonification is implemented: the more force, the louder and more distorted the generated sound. When the opposite occurs, the ball moves in the other direction.

Basically, all sensor data can be visualised and sonified with this tool, but every student reacts individually to technology in teaching situations. First practical experience shows quite good acceptance and feedback, even from the younger pupils.

With professional violinists, interesting tests were made. After long exercising or job-related playing periods, physical fatigue can be measured through posture or pressure changes. In exercising situations, visual feedback could inform the student and suggest a short recreation phase.

Figure 7. Visualisation of 3rd Finger, "Stiffness Control"

Figure 8. Visualisation of 3rd Finger, "Stiffness Control", too much force/pressure

4.1.6 Pedagogical Issues
The core application right now is posture and tension control for students older than 15 years. Usually they like this feedback tool and are used to working with computers. A playful approach is always important, and technical experimentation often ends up in long exercising periods. This kind of motivation does not always work, but the other positive results, like remembering the posture learned in the last lesson and being able to record data besides music, are interesting.

The possibilities of providing a simple visualisation tool to get objective data for self studies or practising scenarios are manifold. Some useful scenarios are:

• Early detection of too much muscle tension, caused by wrong finger position or fatigue
• Easy feedback for beginners on whether the posture of the right hand is OK
• Avoiding lasting and time-consuming postural corrections

4.2 Keyboard Instruments
Several basic measurements are applied in piano playing. The main goals are detecting malposition of the hands or fingers and too much muscle tension or cramping of the fingers, hands and elbow. Furthermore, the force of single keystrokes and attacks was measured and visualised for analysis. With these sensor arrays, augmented pianos are possible and new ways of playing techniques and expressions could be found.
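The feedback logic of the stiffness-control patch described in 4.1.5 can be modelled roughly as follows. The actual tool is a graphical PD/MaxMSP patch, so this Python sketch is only illustrative, and the reference force and tolerance band are invented values.

```python
# Illustrative model (not the author's patch) of the stiffness-control
# feedback: a reference force with a tolerance band drives the ball position,
# and the excess force drives the sonification loudness. All values invented.

def stiffness_feedback(force, reference=0.2, band=0.1):
    """Map a normalised finger force (0..1) to display and sound parameters."""
    excess = force - reference
    if abs(excess) <= band:
        return {"ball_x": 0.0, "alert": False, "gain": 0.0}
    # Outside the band: the ball moves in the sign of the error
    # (right = too much force, left = too little), the sound gets louder.
    overshoot = abs(excess) - band
    return {
        "ball_x": max(-1.0, min(1.0, excess * 2.0)),
        "alert": True,
        "gain": min(1.0, overshoot * 4.0),
    }

print(stiffness_feedback(0.25))  # inside the tolerance band: ball centred
print(stiffness_feedback(0.8))   # far too much force: ball right, loud alert
```

The same mapping could be fed by any of the sensor channels, matching the remark that basically all sensor data can be visualised and sonified with the tool.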


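The keystroke measurements for keyboard instruments, detecting the maximum force and the force gradient of each finger, can be sketched as follows. This is an illustrative sketch, not the author's implementation; the sampling interval and the sample values are invented.

```python
# Illustrative sketch (not the author's code): extracting the maximum force
# and the steepest force gradient from one sampled keystroke, the two
# quantities the low cost system detects per finger. Data are invented.

def keystroke_features(samples, dt=0.001):
    """samples: force readings taken every dt seconds.

    Returns (peak force, maximum force gradient per second)."""
    peak = max(samples)
    gradients = [(b - a) / dt for a, b in zip(samples, samples[1:])]
    return peak, round(max(gradients), 6)

stroke = [0.0, 0.1, 0.4, 0.9, 1.2, 1.1, 0.6, 0.2]   # one keystroke at 1 kHz
peak, grad = keystroke_features(stroke)
print(peak)   # 1.2
print(grad)   # 500.0
```

A steep gradient with a high peak would correspond to a hard attack, while a slow release tail (the samples after the peak) could expose the non-releasing of keys mentioned below.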
4.2.1 Pressure/Keystroke and Fingertip Position of each Finger of the left and right Hand
There are two possibilities for keystroke and fingertip position recognition: first, sticking the sensor onto the keys (difficult to play); second, fixing one sensor on each fingertip. Only the results of the second variant were useful.

Keystroke recognition was explored by R. Möller [11] with "High-speed-camera Recording of Pulp Deformation while Playing Piano or Clavichord". Different sensor types are discussed, but under laboratory conditions with precise but usually unaffordable equipment and only one playing mode. The low cost system described here is fast enough to detect the maximum forces and the gradient of each finger in different playing modes, like trills or fast scales.

Common keystroke recognition works with two measuring points and the delay between them at every stroke (see Figure 9). Guido Van den Berghe describes this system in [12] and mentions the unsatisfying possibilities of creating differentiated keystrokes on electric pianos. He also developed a force measurement system in combination with a high speed camera, but not an easy-to-use, external tool where no connections in or on the piano and no sensors or fixed parts are needed. Even if better systems exist now, there is no "force-detection" interface, except sometimes the MIDI-attack recognition of pianos or keyboards, to connect to a fine grained and high resolution force and pressure measurement.

Figure 9. Key velocity measurement system in an electric piano [12]

4.2.2 Pedagogical Issues, Stiffness Recognition
In Figure 10, the changes of the force of slow keystrokes with different attacks are shown.

Figure 10. Piano Key Strokes, from "piano" to "forte", Sensors stuck on the finger tips

Explanations about keystrokes could be assisted with this tool. Self studies at home could be compared to the data recorded in the music lesson or from other musicians. Several basic pedagogical problems were explored. The non-releasing of the keys, a common beginner's fault, can be detected and visualised. Fatigue and cramping are other reasons for this behaviour, even in advanced pupils or professionals.

Many examples could be given; one more is the problem of rhythmical and dynamical irregularity. Time line, score and audio alignment of the data with adjustable time resolutions can show the problem clearly, which might be difficult to hear. In these cases, basic visualisations can simplify long-winded explanations.

4.2.3 Further Analysis
First experiments with further combined measurements were made, for example foot pedal usage and pressure measurement. Two aspects of this measurement are useful: seeing how and when, for example, a professional player uses the pedals. This could be recorded in commonly used audio software and analysed with the sensor data aligned to an audio recording. Self recording helps to explore one's own usage, or the change of pedal usage during a longer exercising period.

4.3 Extended Music, Enhanced Scores and Augmented Instruments
Figure 11 shows a score with additional fingerings for the thumb. The zero before the conventional fingering is the left hand thumb on the violin; it is used like an additional finger. The second number next to the "0" gives the position of the thumb, if the sensor allows more than one active area. In this case three sensitive areas are used, and in each the pressure of the thumb is detected. This allows not only switching effects on and off but, even more, adjusting the amount of data or sound manipulation. Extended playing techniques, data or sound manipulation and more voices are feasible.

The goal was the integration of sensors into common playing modes and gestures: on the one hand, thumb-steered real-time interaction with electronic peripherals like computers and synthesizers; on the other hand, switching, manipulating and mixing of sound effects, accompaniment and 3D-spatial sound position with integrated sensors beside the fingerboard, in the range of the thumb.
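The thumb-sensor mapping described above, three sensitive areas that each both gate an effect on/off and scale its amount, could be sketched as follows. The effect names and the threshold are invented for illustration and are not taken from the piece or the actual setup.

```python
# Hypothetical sketch of the three-area thumb mapping: each area's pressure
# switches one effect on/off and adjusts its amount. Effect names and the
# gating threshold are assumptions, not the author's configuration.

EFFECTS = ("reverb", "delay", "spatial_position")   # one effect per thumb area
ON_THRESHOLD = 0.05                                 # gate below this pressure

def thumb_mapping(pressures):
    """pressures: one normalised value (0..1) per sensitive area."""
    state = {}
    for name, p in zip(EFFECTS, pressures):
        active = p >= ON_THRESHOLD
        state[name] = {"on": active, "amount": round(p, 2) if active else 0.0}
    return state

# Area 1 untouched, light pressure on area 2, strong pressure on area 3:
print(thumb_mapping([0.0, 0.33, 0.9]))
```

This reflects the remark that the sensors allow not only switching effects on and off but also continuously adjusting the amount of sound manipulation.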


Similar methods could be applied to the right hand sensors, but there it is quite difficult to change the pressure without influencing the sound too much.

Figure 11. Extended Score

This extended score is a part of the piece "concertare" by Tobias Grosshauser [13].

For keyboard instruments, for instance, modulation is possible just by changing the finger pressure after the key has already been struck. This new playing technique allows new ways of articulation, even when the key is already pressed, and sound effects like vibrato on the piano or keyboard instruments.

4.4 Further Scenarios and Research
Further research will look at finger pressure measurements on wind instruments and drums. In combination with position recognition and acceleration sensors, the most important parameters are detected.

These pressure and force sensors provide more and more possibilities for new music compositions in combination with extended scores and simplified real time interaction within electronic environments.

Concerning pedagogic issues, the systems and methods will become more and more accurate and user friendly for a wider range of usage and a larger target audience.

The combination of traditional instruments, computers and high tech tools like new sensors could motivate a new generation of young musicians to learn with new methods that they like and are more and more used to. Cheap and easy to use sensor systems would support this development. Also, if teaching and pedagogy were more of an adventure and a search for new possibilities and "unknown terrain", making music, learning and playing a musical instrument and practising could be more fascinating.

5. REFERENCES
[1] C. Poepel, D. Overholt. Recent Developments in Violin-related Digital Musical Instruments: Where Are We and Where Are We Going? NIME06, 6th International Conference on New Interfaces for Musical Expression, 2006.
[2] A. Askenfelt. Measurement of Bow Motion and Bow Force in Violin Playing. Journal of the Acoustical Society of America, 80.
[3] J. A. Paradiso, N. A. Gershenfeld. Musical Applications of Electric Field Sensing. Computer Music Journal, 21(2), pp. 69-89, MIT Press, Cambridge, Massachusetts, 1997.
[4] D. S. Young. Wireless Sensor System for Measurement of Violin Bowing Parameters. Stockholm Music Acoustics Conference.
[5] M. Demoucron, R. Caussé. Sound Synthesis of Bowed String Instruments Using a Gesture Based Control of a Physical Model. International Conference on Noise & Vibration Engineering, 2007.
[6] E. Maestre, J. Janer, A. R. Jensenius, J. Malloch. Extending GDIF for Instrumental Gestures: The Case of Violin Performance. International Computer Music Conference, submitted, 2007.
[7] F. Bevilacqua, N. Rasamimanana, E. Flety, S. Lemouton, F. Baschet. The Augmented Violin Project: Research, Composition and Performance Report. NIME06, 6th International Conference on New Interfaces for Musical Expression, 2006.
[8] E. Guaus, J. Bonada, A. Perez, E. Maestre, M. Blaauw. Measuring the Bow Pressure Force in a Real Violin Performance. International Conference on Noise & Vibration Engineering, 2007.
[9] D. Young, A. Deshmane. Bowstroke Database: A Web-Accessible Archive of Violin Bowing Data. NIME07, 7th International Conference on New Interfaces for Musical Expression, 2007.
[10] M. Okner, T. Kernozek. Chinrest Pressure in Violin Playing: Type of Music, Chin Rest, and Shoulder Pad as Possible Mediators. Clin Biomech (Bristol, Avon), 12(3):S12-S13.
[11] R. Möller. High-speed-camera Recording of Pulp Deformation while Playing Piano or Clavichord. Musikphysiologie und Musikermedizin, 2004, 11. Jg., Nr. 4.
[12] G. Van den Berghe, B. De Moor, W. Minten. Modeling a Grand Piano Key Action. Computer Music Journal, Vol. 19, No. 2 (Summer 1995), pp. 15-22, The MIT Press.
[13] T. Grosshauser. Concertare.


The development of motion tracking algorithms for low cost inertial measurement units

Giuseppe Torre, Interaction Design Centre, University of Limerick, Limerick, Ireland
Javier Torres, Tyndall National Institute, Cork University, Cork, Ireland
Mikael Fernstrom, Interaction Design Centre, University of Limerick, Limerick, Ireland

ABSTRACT
In this paper, we describe an algorithm for the numerical evaluation of the orientation of an object to which a cluster of accelerometers, gyroscopes and magnetometers has been attached. The algorithm is implemented through a set of new Max/MSP and pd externals. Through the successful implementation of the algorithm, we introduce Pointing-at, a new gesture device for the control of sound in a 3D environment. This work has been at the core of the Celeritas Project, an interdisciplinary research project on motion tracking technology and multimedia live performances between the Tyndall Institute of Cork and the Interaction Design Centre of Limerick.

Keywords
Tracking Orientation, Pitch Yaw and Roll, Quaternion, Euler, Orientation Matrix, Max/MSP, pd, Wireless Inertial Measurement Unit (WIMU) Sensors, Micro-Electro-Mechanical Systems (MEMS), Gyroscopes, Accelerometers, Magnetometers

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008. Copyright remains with the author(s).

1. INTRODUCTION
Motion tracking technology has interested the multimedia art community for two or more decades. Most of these systems have tried to offer a valid alternative to camera-based systems such as VNS [2] and EyesWeb [14]. Among them are DIEM [1], Troika Ranch [15], Shape Wrap, Pair and Wisear [19], Eco [17], Sensemble [13], The Hands [8] and Celeritas [20, 16] from the authors.

In this paper we describe the algorithm used to numerically solve the orientation of each single mote in our Celeritas system. We also aim to give an introduction to the topic for people who want to develop their own tracking device (using Arduino, for example). Although a full Max/MSP and pd library has been developed and made available at [10], we have listed in the references of this paper other Max/MSP developers [11, 12, 18] whose work has been freely released, though their work focuses only on the conversion between different numerical representations and does not interact with the specific device described here.

On the basis of the results achieved, we introduce in the last section Pointing-at, a new gesture device for the control of sound in a 3D space or any surround system. The device can be used both in studio and in live performances.

Figure 1: Mote and its cluster of sensors with battery pack. Dimensions are 25 x 25 x 50 mm

Our Celeritas system is built around Tyndall's 25 mm WIMU, which is an array of sensors combined with a 12-bit ADC [6, 4, 5, 7]. The sensor array is made up of three single axis gyroscopes, two dual axis accelerometers and two dual axis magnetometers. The accelerometers measure the acceleration on the three orthogonal axes (U, V and W as shown in Figure 2). The gyroscopes measure the angular rate around the three orthogonal axes. The magnetometers measure the earth's magnetic field on the three orthogonal axes.

Before going into the description of the algorithm, we would like to introduce the reader to some of the most common terms in use, to make the understanding of the following sections easier. A good explanation of these terms and of the 3D math can also be found at [9].

System of Reference. We will discuss two systems of reference: the Earth-fixed one (x, y, z), which has the x axis pointing at the North Pole, the y axis pointing west and the z axis pointing at the Earth's core, and the IMU-fixed frame (u, v, w), with three orthogonal axes parallel to the sensor's sensitive axes.

Quaternions form a 4-dimensional normed division algebra over the real numbers. A rotation quaternion is a unit quaternion, satisfying qw^2 + qx^2 + qy^2 + qz^2 = 1. Quaternions are used to represent the rotation of an object in a 3D space. They are very common in programming, as they don't suffer from problems with singularities at 90 degrees.

Euler Angles. The Euler angles are usually given in aeronautical terms as Pitch, Roll and Yaw, as shown in Figure


2, where Pitch is the rotation around the lateral (V) axis, Roll around the longitudinal (U) axis and Yaw around the perpendicular (W) one. The calculation involves the usage of non-commutative matrix multiplication.

Orientation Matrix. The orientation matrix mathematically represents a basis change in a 3-dimensional space; thus, we can translate the sensor's output coordinates, given with respect to the IMU-fixed frame, into the reference Earth frame using the orientation matrix.

Angle x, y, z describes a rotation using a unit vector indicating the direction of the axis and an angle indicating the magnitude of the rotation about the considered axis.

Figure 2: sensor and IMU reference system.

With our cluster of sensors we calculate the orientation of the sensor with respect to the Earth-fixed frame of reference. The orientation is retrieved using two sources of estimation: the output of the gyroscopes on the one hand, and the combination of accelerometers and magnetometers on the other. The reason for doing this is that gyroscopes are not self-sufficient for long-term precision because of a drift associated with their reading. Accelerometers and magnetometers, on the other hand, are good for long-term stability but, once again, not good for short-term accuracy due to occasional inaccuracy caused by linear and rotational acceleration. Thus, the values arriving in the Max/MSP environment are raw ADC values in the range of 0 and 4096 (as our microcontroller has 12-bit resolution). After having calculated the offset by averaging the first thousand incoming values, leaving the sensor in a steady position, we are able to read the values related to the movement of the sensor by subtracting them from the calculated offset. Then we need to convert the ADC values from each sensor into the proper units of measurement, as shown in Table 1. This can be done by multiplying the subtracted ADC value by the rate resolution value of each sensor. The rate resolution value can be found in the specification sheet of each sensor or determined by empirical methods.

Table 1: Unit of measurement

Sensor           from   to
Gyroscopes       ADC    degrees/second
Accelerometers   ADC    m/s^2
Magnetometers    ADC    gauss

3.2 Orientation using Gyroscopes
Sampling at a fixed rate enables us at each Δt to know α, φ and θ by applying the following formulas:

Δθ(k + 1) = Δt * Δθ̇(k + 1)
Δφ(k + 1) = Δt * Δφ̇(k + 1)
Δα(k + 1) = Δt * Δα̇(k + 1)

where Δθ, Δφ, Δα represent the incremental angles around W, V and U respectively. Next, the algorithm constructs the rotation matrix around each particular axis and multiplies them together:

R(w, θ, k+1) = [ cos Δθ(k+1)   -sin Δθ(k+1)   0
                 sin Δθ(k+1)    cos Δθ(k+1)   0
                 0              0             1 ]

R(v, φ, k+1) = [ cos Δφ(k+1)    0   sin Δφ(k+1)
                 0              1   0
                -sin Δφ(k+1)    0   cos Δφ(k+1) ]

R(u, α, k+1) = [ 1   0              0
                 0   cos Δα(k+1)   -sin Δα(k+1)
                 0   sin Δα(k+1)    cos Δα(k+1) ]

which can be generally written as:

Rotation(k+1) = R(w, θ, k+1) * R(v, φ, k+1) * R(u, α, k+1)

Therefore we define our orientation, in matrix format, to be:

Orientation(k + 1) = Rotation(k + 1) * Orientation(k)

Figure 3: Orientation Algorithm.

From these results, the algorithm converts the resulting
our algorithm combines the short-term precision of the gy- matrix into quaternion and angle,x,y,z format which facili-
roscopes with the long-term precision of accelerometers and tate ease of use in graphical oriented programming language
magnetometers. such as Max/Msp and pd.

3.1 Reading the values from the sensor 3.3 Orientation using Accelerometers and Mag-
As the data from the motes are sent wirelessly to a base netometers
station connected to the host computer via serial port, we So far we considered the 3 x 3 Orientation Matrix as the
designed a C driver to handle this stream. Ultimately, we matrix describing the orientation of the IMU-fixed frame in
compiled a new external (mote) to import this stream in relation to the Earth-fixed frame. Conversely, the Inverse
Max/Msp or Pd. Values appearing in our host application Rotation Matrix describes the orientation of the Earth-fixed


3.3 Orientation using Accelerometers and Magnetometers
So far we have considered the 3 × 3 Orientation Matrix as the matrix
describing the orientation of the IMU-fixed frame in relation to the
Earth-fixed frame. Conversely, the Inverse Orientation Matrix describes the
orientation of the Earth-fixed frame in relation to the IMU-fixed frame and
can be written as:

                 | a11  a12  a13 |
Orientation⁻¹ =  | a21  a22  a23 |
                 | a31  a32  a33 |

The two dual-axis magnetometers enable the reading of the Earth's magnetic
field on the three orthogonal axes. These values are used to calculate the
first column of the Inverse Orientation Matrix (a11, a21, a31) using the
following set of formulas:

| a11 |         | Hu |
| a21 | = 1/H · | Hv |
| a31 |         | Hw |

where H is the Earth's magnetic field magnitude and Hu, Hv and Hw are the
components of the magnetic field vector measured by the sensor along U, V
and W respectively.

To calculate the third column (a13, a23, a33) of the Inverse Orientation
Matrix, we use the values read from the two dual-axis accelerometers. The
formula used is:

| a13 |         | Gu |
| a23 | = 1/g · | Gv |
| a33 |         | Gw |

where g is the Earth's gravitational acceleration magnitude and Gu, Gv and
Gw are the components of the acceleration vector measured by the IMU along
U, V and W.

Finally, the second column (a12, a22, a32) is calculated from the cross
product between the first and third columns, as described below:

| a12 |   | (a21 · a33) − (a31 · a23) |
| a22 | = | (a31 · a13) − (a11 · a33) |
| a32 |   | (a11 · a23) − (a21 · a13) |

4. ORIENTATION LIBRARY AND MAPPING
For the purpose of the numerical evaluation of the sensor's orientation, an
ad hoc set of Max/MSP and Pd externals was developed. The most important are
listed below.

Orientation Calculates the Orientation of the IMU. The inlet receives a
list made up of the following elements: pitch, yaw, roll, n_pack, sampling
time, Alpha rate resolution, Phi rate resolution, Theta rate resolution.
Each of the 9 outlets is an element of the 3 × 3 matrix representing the
IMU's orientation.
matrix2quat Converts the 3 × 3 matrix to quaternion format.
quat2axis Converts the quaternion format to the angle, x, y, z format.
azi_ele Converts the input to azimuth and elevation values, making the
format readable by Vector Base Amplitude Panning (VBAP) or other
multi-channel libraries.

A schematic of the Max patch is shown in Figure 4.

Figure 4: Pseudo Max patch.

5. APPLICATION DEVELOPMENT
On the basis of the above algorithm, we developed several applications for
multidisciplinary live performances, such as Vitruvian for live dance
performance and DjMote. In this paper we introduce Pointing-at, a new
gestural device for the control of sounds in a 3-D surround environment.

5.1 Pointing-at
Pointing-at is a new wearable wireless glove that uses the results of our
Celeritas project and the reliability of the Tyndall 25 mm WIMU to control
sounds in a 3-D or any other surround system. Its design is focused on the
analysis of the methodologies concerning the gestural mapping of sounds in a
fully three-dimensional environment. As the most natural movement related to
directionality is the simple pointing of the hand in a given direction, we
decided to use the orientation of the hand/arm as the indicator of this
direction. Thus, we fitted the WIMU into a glove, which has a protective
pocket on top of the hand's dorsum, as shown in Figure 5.

Figure 5: Pointing-at out of the shell.

5.2 Gesture Mapping
The orientation data retrieved from the WIMU are translated into azimuth and
elevation coordinates, making the data compatible with libraries such as
VBAP [3]. Azimuth and elevation are calculated taking the z-axis as the main
axis of reference. The third variable characteristic of surround sound
editing systems is source distance. The gestural mapping of this parameter
has been solved in the following way: a 90-degree roll movement enables the
azimuth to be read as a distance value in the range between 0 and 90, where
0 indicates the farthest distance and 90 the closest (Figure 6).

6. CONCLUSION AND FUTURE WORK
In this paper we described an algorithm used to retrieve the orientation of
an object with an attached cluster of sensors made up of accelerometers,
gyroscopes and magnetometers. We also introduced a new wearable wireless
glove, Pointing-at, for the gestural control of sounds in a 3-D surround
space. The device was tested in our lab and proved to be a reliable tool for
live performances. At the moment our team is working on the implementation
of a bending sensor (see the red strip in Figure 5) to enable grabbing and
releasing of sound on the fly. In future work we aim to reduce the size of
the WIMU to 10 mm to improve wearability.


Figure 6: Gesture to control distance parameter.

We would like to thank the Tyndall National Institute for allowing us access
under the Science Foundation Ireland sponsored National Access Programme.
Many thanks also to the staff at the Interaction Design Centre of the
University of Limerick for their input into the project and the realization
of Pointing-at. Finally, thanks to all the researchers and students at
Tyndall and UL who gave important input to this project.

Additional Author: Brendan O'Flynn, Tyndall National
Institute - Cork University, email:

[3] ville/.
[5] ,,764 801 adxrs150,00.html.
[7] card.asp?part id=2018.
[11] release.html.
[12] library=111.
[13] R. Aylward, S. D. Lovell, and J. Paradiso. Sensemble: a wireless,
compact, multi-user sensor system for interactive dance. Proceedings of the
2006 International Conference on New Interfaces for Musical Expression
(NIME'06), pages 134-139, June 2006.
[14] A. Camurri, S. Hashimoto, M. Ricchetti, A. Ricci, K. Suzuki,
R. Trocca, and G. Volpe. EyesWeb: toward gesture and affect recognition in
interactive dance and music systems. Computer Music Journal, 24(1):57-69,
2000.
[15] M. Coniglio.
[16] B. O'Flynn, G. Torre, M. Fernstrom, T. Winkler, A. Lynch, J. Barton,
P. Angove, and C. O'Mathuna. Celeritas - a wearable wireless sensor system
for interactive digital dance theater. 4th International Workshop on
Wearable and Implantable Body Sensor Networks, 2007.
[17] C. Park and P. H. Chou. Eco: ultra-wearable and expandable wireless
sensor platform. Proceedings of the International Workshop on Wearable and
Implantable Body Sensor Networks (BSN'06), pages 162-165, April 2006.
[18] D. Sachs. A forearm controller and tactile display,
[19] D. Topper and P. Swendensen. Wireless dance control: PAIR and WISEAR.
Proceedings of the 2005 International Conference on New Interfaces for
Musical Expression (NIME'05), pages 76-79, May 2005, Canada.
[20] G. Torre, M. Fernstrom, B. O'Flynn, and P. Angove. Celeritas wearable
wireless system. Proceedings of the 2007 International Conference on New
Interfaces for Musical Expression (NIME'07), pages 205-208, 2007, New York.


Application of new Fiber and Malleable Materials for

Agile Development of Augmented Instruments and Controllers
Adrian Freed
CNMAT (Center for New Music and Audio Technologies)
Dept of Music UC Berkeley,
1750 Arch Street, Berkeley CA 94709
+1 510 455 4335
ABSTRACT
The paper introduces new fiber and malleable materials, including
piezoresistive fabric and conductive heat-shrink tubing, and shows
techniques and examples of how they may be used for rapid prototyping and
agile development of musical instrument controllers. New implementations of
well-known designs are covered as well as enhancements of existing
controllers. Finally, two new controllers are introduced that are made
possible by these recently available materials and construction techniques.

Keywords
Agile Development, Rapid Prototyping, Conductive fabric, Piezoresistive
fabric, conductive heatshrink tubing, augmented instruments.

1. INTRODUCTION
Many human activities have a variant form optimized to deliver results in
the shortest time. The idea is the same although the names vary: short story
writing, hacking, sketching, composing esquisses, short-order cooking,
improv., fast-turn, live coding, rapid prototyping etc. Rapid and agile
development of augmented instruments and controller prototypes is valuable
because many of the important insights that guide design refinements are
unavailable until performers have experienced and used a device. The best
predictor of the effectiveness of a new controller design is usually the
number of design iterations available.

Controller projects usually involve co-design of musical mapping software,
electronics for the sensor data acquisition, and the physical interface.
Providing an approximate physical interface early speeds development of the
other components of the system by providing both a real data source and time
for performance skills to be acquired.

community - notably: high channel count, high resolution data acquisition
[1, 5, 12]; OSC wrapping, mapping, scaling and calibrating [17]; and visual
programming dataflow languages [15] tuned for media and arts applications,
e.g., Max/MSP, Pd, SuperCollider, Processing, etc.

The paper is structured as follows. Section 2 introduces new materials that
facilitate rapid prototyping by describing a series of variations on a
single theme: the humble footswitch. Section 3 shows how existing
controllers can be rapidly improved using those new materials. Section 4
describes novel controllers made possible by the new materials. A conclusion
considers the challenges of teaching the design and construction techniques
that these new materials demand.

2. Variants of the footswitch
A simple controller perhaps, but the humble footswitch finds wide use in
technology-based musical performance contexts because the performer's hands
are usually actively engaged playing an instrument. The traditional approach
is to mount a heavy-duty mechanical switch into a solid metal box. These
"stomp boxes" are standard tools of the electric guitarist. As a vehicle for
exploration of new fiber and malleable materials we will improve on the
traditional stomp box by adding the requirement that switch operation should
be silent.

The basic design pattern for a switch is to combine an interruptible
electrical conduction path (contact system) with a device that returns the
contacts to a stable rest equilibrium when the actuating force is removed.
The additional challenge we face is that the motions created by these forces
have to be dampened to minimize the sounds they make.

2.1 Floor Protector + FSR
Figure 1 shows a solution that can be assembled in a few