
8th International Conference
on New Interfaces for Musical Expression
NIME08
Casa Paganini – InfoMus Lab

Genova, Italy
June 4 – 8, 2008
Proceedings
Antonio Camurri, Stefania Serafin, and Gualtiero Volpe

Editors
http://nime08.casapaganini.org
http://www.casapaganini.org
http://www.infomus.org

Printed by: BETAGRAFICA scrl
ISBN-13: 978-88-901344-6-3

In collaboration with
Facoltà di Lettere e Filosofia, Università degli Studi di Genova
Conservatorio di Musica “Niccolò Paganini”
Museo d’Arte Contemporanea Villa Croce
Ufficio Paganini, Comune di Genova
Polo del Mediterraneo per l’Arte, la Musica e lo Spettacolo
Accademia Ligustica di Belle Arti
Casa della Musica
GOG − Giovine Orchestra Genovese
AIMI (Associazione di Informatica Musicale Italiana)
Centro Italiano Studi Skrjabiniani
Goethe−Institut Genua
Fondazione Bogliasco
Fondazione Spinola
Associazione Amici di Paganini
Festival della Scienza
Radio Babboleo
With the support of

Regione Liguria
Comune di Genova
Provincia di Genova

EU ICT Project SAME (www.sameproject.eu)

EU Culture 2007 Project CoMeDiA (www.comedia.eu.org)
Partners
Sipra SpA
Sugar Srl

NIME 08 Committees
Conference Chairs
Antonio Camurri
Gualtiero Volpe

Program Chair
Stefania Serafin

Performance Chair
Roberto Doati

Installation and Demo Chair
Corrado Canepa

Club NIME Chair
Donald Glowinski

NIME Steering Committee
Frédéric Bevilacqua
Tina Blaine
Michael Lyons
Sile O'Modhrain
Yoichi Nagashima
Joe Paradiso
Carol Parkinson
Norbert Schnell
Eric Singer
Atau Tanaka

Paper and Poster Committee

Meta-reviewers
Kia Ng
Sile O'Modhrain
Stefania Serafin
Bill Verplank
Marcelo Wanderley
Ge Wang

Reviewers
Anders-Petter Andersson
Alvaro Barbosa
Marc Battier
Frauke Behrendt
Kirsty Beilharz
Edgar Berdahl
Tina Blaine
Sinan Bokesoy
Niels Bottcher
Eoin Brazil
Roberto Bresin
Andrew Brouse
Nick Bryan-Kinns
Claude Cadoz
Arthur Clay
Ted Coffey
David Cournapeau
Langdon Crawford
Smilen Dimitrov
Gerhard Eckel
Georg Essl
Stuart Favilla
Sidney Fels
Mikael Fernstrom
Federico Fontana
Alexandre R.J. Francois
Jason Freeman
Ichiro Fujinaga
Lalya Gaye
Steven Gelineck
David Gerhard
Jeff Gray
Michael Gurevich
Keith Hamel
Kjetil Falkenberg Hansen
David Hindman
Andy Hunt
Robert Huott
Alex Jaimes
Jordi Janer
Alexander Refsum Jensenius
Sergi Jorda
Martin Kaltenbrunner
Spencer Kiser
Benjamin Knapp
Juraj Kojs
Eric Lee
Jonathan F. Lee
Paul Lehrman
Michael Lew
Eric Lyon
Michael Lyons
Thor Magnusson
Joseph Malloc
Eduardo Miranda
Thomas Moeslund
Katherine Moriwaki
Teresa M. Nakra
Kazushi Nishimoto
Rolf Nordahl
Dan Overholt
Garth Paine
Jyri Pakarinen
Sandra Pauletto
Cornelius Pöpel
Robert Rowe
Joran Rudi
Margaret Schedel
Greg Schiemer
Andrew Schloss
Hugo Solis
Christa Sommerer
Hans-Christoph Steiner
Matthew Suttor
George Tzanetakis
Carr Wilkerson
Matthew Wright
Tomoko Yonezawa
Diana Young
Michael F. Zbyszynski

Performance Committee
Miguel Azguime
Andreas Breitscheid
Pascal Decroupet
Michael Edwards
Neil Leonard
Michelangelo Lupone
Pietro Polotti
Curtis Roads
Jøran Rudi
Rodrigo Sigal
Alvise Vidolin
Daniel Weissberg
Iannis Zannos

Installation Committee
Jamie Allen
Philippe Baudelot
Nicola Bernardini
Riccardo Dapelo
Scott deLahunta
Nicola Ferrari
Sergi Jorda
Lauro Magnani
Pedro Rebelo
Franco Sborgi
Eric Singer
Pavel Smetana
Sandra Solimano
Atau Tanaka

Demo Committee
Alain Crevoisier
Sofia Dahl
Amalia De Gotzen
Emmanuel Flety
Matija Marolt
Barbara Mazzarino
Douglas Irving Repetto
Kenji Suzuki
Andrea Valle
Giovanna Varni

Club NIME Committee
Frédéric Bevilacqua
Nicolas Boillot
Marco Canepa
Jaime Del Val
Davide Ferrari
Jean Jeltsch
Eric Lapie
Claudio Lugo
Leïla Olivesi
Kéa Ostovany
Guillaume Pellerin
Nicolas Rasamimanana
Christophe Rosenberg
Laura Santini
Olivier Villon

Organizing Committee
Corrado Canepa
Francesca Cavallero
Roberto Doati
Nicola Ferrari
Roberta Fraguglia
Donald Glowinski
Lauro Magnani
Andrea Masotti
Barbara Mazzarino
Valentina Perasso
Roberto Sagoleo
Marzia Simonetti
Francesca Sivori
Sandra Solimano

NIME Secretariat
Roberta Fraguglia
Francesca Sivori

Press
Michele Coralli

Cover design
Studiofluo

Preface
We are proud to present the 8th edition of the International Conference on New
Interfaces for Musical Expression (NIME08), hosted by Casa Paganini - InfoMus Lab,
Università degli Studi di Genova.
Since 2005, InfoMus Lab has had its new premises in the recently restored monumental
building of S. Maria delle Grazie La Nuova - Casa Paganini. The International Centre of
Excellence Casa Paganini – InfoMus Lab aims at cross-fertilizing scientific and
technological research with humanistic and artistic research. Our research explores the
relationships between music, science and emerging technologies: a mission that recalls
Niccolò Paganini's spirit of experimentation.
New perspectives in contemporary music, multimedia, and digital luthery are among
the main goals of the Centre. Casa Paganini - InfoMus Lab studies new directions in
scientific and technological research to improve quality of life (e.g., therapy and
rehabilitation, leisure, sport, edutainment), to develop novel industrial applications and
services (e.g., innovative interfaces and multimedia applications), and to contribute to
culture (e.g., museography, support for cultural heritage through new technologies).
In this framework, the NIME Conference is a unique occasion for Casa Paganini, on the
one hand, to present its research outcomes and activities to the scientific community
and, on the other hand, to get inspiration and feedback for future work. Further, we
have made an effort to involve the most important institutions and the whole city of
Genova in NIME. For example, besides the monumental site of Casa Paganini,
which hosts the welcome concert and the scientific sessions, concerts will be held at the
Music Conservatory “Niccolò Paganini”, demos at Casa della Musica, installations at the
Museum of Contemporary Art “Villa Croce” and at the Faculty of Arts and Philosophy of
the University of Genova, posters in the ancient convent of Santa Maria di Castello, club
NIME performances at four different cafés and clubs in Genova (010, Banano Tsunami,
Cafè Garibaldi, Mentelocale).
The scientific program of NIME08 includes 2 keynote lectures, 34 oral presentations,
and 40 poster presentations, selected by the program committee out of 105
submissions. We are honored to welcome as guest speakers Xavier Serra, head of the
music technology group at the University Pompeu Fabra in Barcelona, and Andrew
Gerzso, director of the pedagogical department at IRCAM. Moreover, 2 panel
discussions will address relevant issues in current research in sound and music
computing: networked music performances and active listening and embodied music
cognition. The program also includes 22 demos, organized in 3 demo sessions.
The artistic program encompasses a welcome concert, 3 NIME concerts, 4 Club NIME
performances, and 7 installations. The NIME concerts and the Club NIME performances
include 23 music pieces, selected by the program committee out of 63 submissions.
The welcome concert on the evening of June 4, offered by Casa Paganini – InfoMus Lab in
collaboration with major music institutions in Genova, will present 4 novel music pieces
by young composers using EyesWeb XMI: one of the pieces has been commissioned to
tackle some open problems in networked performance faced in the EU Culture 2007
Project CoMeDiA; another piece has been commissioned to exploit a paradigm of
"active music listening" which is part of the EU FP7 ICT Project SAME.

Four workshops will precede and follow the official NIME program on June 4 and 8: a
workshop on technology enhanced music education, a tablet workshop for performers
and teachers, one on Jamoma, and one on techniques for gesture measurement in
musical performance.
Moreover, this year the 4th Sound and Music Computing (SMC) Summer School is held
at Casa Paganini in connection with NIME08, on June 9 - 11, 2008. The program of the
school includes plenary lectures, poster sessions, and hands-on activities. The school
will address the following topics: Gesture and Music - Embodied Music Cognition,
Mobile Music Systems, and Active Music Listening.
Organizing the NIME Conference is a huge effort, which is possible only with the help
of many people. We would like to thank the members of the NIME Steering Committee
for their valuable and wise suggestions, the demo and installation chair Corrado Canepa,
the performance chair Roberto Doati, the club performance chair Donald Glowinski, and
the members of our program committees who helped in the final selection of papers,
posters, demos, installations, and performances.
We wish to thank the Rector of the University of Genova Professor Gaetano Bignardi,
the Culture Councilor of Regione Liguria Fabio Morchio, and the Culture Councilor of
Provincia di Genova Giorgio Devoto, whose support has been of vital importance for the
creation and maturation of the Casa Paganini project.
We wish to thank Professor Gianni Vernazza, Head of the Faculty of Engineering,
Professor Riccardo Minciardi, Director of the DIST-University of Genova, the colleagues
Lauro Magnani and Franco Sborgi, Professors at the University of Genova; Patrizia
Conti - Director of the Music Conservatory “Niccolò Paganini”; Sandra Solimano -
Director of the Museum of Contemporary Art “Villa Croce”; Teresa Sardanelli - Head of
the Direzione Cultura e Promozione della Città of Comune di Genova and Anna Rita
Certo - Head of the Ufficio Paganini of Comune di Genova; Pietro Borgonovo - Artistic
Director of GOG - Giovine Orchestra Genovese; Enrico Bonanni and Maria Franca
Floris of the Dipartimento Ricerca, Innovazione, Istruzione, Formazione, Politiche
Giovanili, Cultura e Turismo of Regione Liguria; Roberta Canu - Director of Goethe-
Institut Genua; Vittorio Bo and Manuela Arata - Directors of Festival della Scienza;
Francesca Sivori - Vice-President of the Centro Italiano Studi Skrjabiniani; Andrea
Masotti and Edoardo Lattes - Casa della Musica; Giorgio De Martino – Artistic Director
of Fondazione Spinola; Laura Santini of Mentelocale.
Finally, we thank the whole staff of InfoMus Lab – Casa Paganini for their valuable help
and hard work in organizing the conference.
Enjoy NIME 08!
Antonio Camurri and Gualtiero Volpe

NIME 08 Conference Chairs
Stefania Serafin
NIME 08 Program Chair
Genova, May 8, 2008

Table of Contents
PAPERS 1
_____________________________________________________________________
Thursday, June 5, 2008
Session 1: Networked music performance 1

David Kim-Boyle
Network Musics - Play, Engagement and the Democratization of Performance ....... 3
Álvaro Barbosa
Ten-Hand Piano: A Networked Music Installation..................................................... 9
Mike Wozniewski, Nicolas Bouillot, Zack Settel, Jeremy R. Cooperstock
Large-Scale Mobile Audio Environments for Collaborative Musical Interaction ........ 13
Session 2: Networked music performance 2

Angelo Fraietta
Open Sound Control: Constraints and Limitations..................................................... 19
Matteo Bozzolan, Giovanni Cospito
SMuSIM: a Prototype of Multichannel Spatialization System with
Multimodal Interaction Interface................................................................................ 24
Session 3: Analysis of performers' gestures and gestural control of musical instruments
Chris Nash, Alan Blackwell
Realtime Representation and Gestural Control of Musical Polytempi ...................... 28
Mikael Laurson, Mika Kuuskankare
Towards Idiomatic and Flexible Score-based Gestural Control
with a Scripting Language ........................................................................................ 34
Alexandre Bouënard, Sylvie Gibet, Marcelo M. Wanderley
Enhancing the visualization of percussion gestures
by virtual character animation ................................................................................... 38
Diana Young
Classification of Common Violin Bowing Techniques
Using Gesture Data from a Playable Measurement System..................................... 44
Friday, June 6, 2008
Session 4: Instruments 1
Jyri Pakarinen, Vesa Välimäki, Tapio Puputti
Slide guitar synthesizer with gestural control ............................................................ 49

Otso Lähdeoja
An Approach to Instrument Augmentation: the Electric Guitar.................................. 53
Juhani Räisänen
Sormina - a new virtual and tangible instrument ....................................................... 57
Edgar Berdahl, Hans-Christoph Steiner, Collin Oldham
Practical Hardware and Algorithms for Creating Haptic Musical Instruments ........... 61
Amit Zoran, Pattie Maes
Considering Virtual & Physical Aspects in Acoustic Guitar Design ........................... 67
Session 5: Instruments 2
Dylan Menzies
Virtual Intimacy: Phya as an Instrument ................................................................... 71
Jennifer Butler
Creating Pedagogical Etudes for Interactive Instruments ......................................... 77
Session 6: Evaluation and HCI methodologies

Dan Stowell, Mark D. Plumbley, Nick Bryan-Kinns
Discourse analysis evaluation method for expressive musical interfaces ................. 81
Chris Kiefer, Nick Collins, Geraldine Fitzpatrick
HCI Methodology For Evaluating Musical Controllers: A Case Study ....................... 87
Olivier Bau, Atau Tanaka, Wendy Mackay
The A20: Musical Metaphors for Interface Design .................................................... 91
Session 7: Sensing systems and measurement technologies

Tobias Grosshauser
Low Force Pressure Measurement: Pressure Sensor Matrices
for Gesture Analysis, Stiffness Recognition and Augmented Instruments ................ 97
Giuseppe Torre, Javier Torres, Mikael Fernstrom
The development of motion tracking algorithms
for low cost inertial measurement units - POINTING-AT - ........................................ 103
Adrian Freed
Application of new Fiber and Malleable Materials
for Agile Development of Augmented Instruments and Controllers .......................... 107
Alain Crevoisier, Greg Kellum
Transforming Ordinary Surfaces Into Multi-touch Controllers ................................... 113
Nicholas Ward, Kedzie Penfield, Sile O'Modhrain, R. Benjamin Knapp
A Study of Two Thereminists:
Towards Movement Informed Instrument Design ..................................................... 117

Saturday, June 7, 2008
Session 8: Active listening to sound and music content

Vassilios-Fivos A. Maniatakos, Christian Jacquemin
Towards an affective gesture interface for expressive music performance .............. 122
Anna Källblad, Anders Friberg, Karl Svensson, Elisabet Sjöstedt Edelholm
Hoppsa Universum – An interactive dance installation for children .......................... 128
Antonio Camurri, Corrado Canepa, Paolo Coletta,
Barbara Mazzarino, Gualtiero Volpe
Mappe per Affetti Erranti: a Multimodal System
for Social Active Listening and Expressive Performance .......................................... 134
Session 9: Agent-based systems

Sergio Canazza, Antonina Dattolo
New data structure for old musical open works ........................................................ 140
Arne Eigenfeldt, Ajay Kapur
An Agent-based System for Robotic Musical Performance ...................................... 144
Session 10: Sensing systems and measurement technologies

Maurizio Goina, Pietro Polotti
Elementary Gestalts for Gesture Sonification ........................................................... 150
Stefano Delle Monache, Pietro Polotti, Stefano Papetti, Davide Rocchesso
Sonic Augmented Found Objects ............................................................................. 154
Jean-Marc Pelletier
Sonified Motion Flow Fields as a Means of Musical Expression............................... 158
Josh Dubrau, Mark Havryliv
P[a]ra[pra]xis: Poetry in Motion................................................................................. 164
Jan C. Schacher
davos soundscape, a location based interactive composition .................................. 168
POSTERS 173
_____________________________________________________________________
Thursday, June 5, 2008 - Session 1

Andy Schmeder, Adrian Freed
uOSC: The Open Sound Control Reference Platform for Embedded Devices ......... 175
Timothy Place, Trond Lossius, Alexander Refsum Jensenius
Addressing Classes by Differentiating Values and Properties in OSC ..................... 181
Ananya Misra, Georg Essl, Michael Rohs
Microphone as Sensor in Mobile Phone Performance .............................................. 185

Nicolas Bouillot, Mike Wozniewski, Zack Settel, Jeremy R. Cooperstock
A Mobile Wireless Augmented Guitar ....................................................................... 189
Robert Jacobs, Mark Feldmeier, Joseph A. Paradiso
A Mobile Music Environment Using a PD Compiler and Wireless Sensors .............. 193
Ross Bencina, Danielle Wilde, Somaya Langley
Gesture ≈ Sound Experiments: Process and Mappings ........................................... 197
Miha Ciglar
“3rd. Pole” - a Composition Performed via Gestural Cues ........................................ 203
Kjetil Falkenberg Hansen, Marcos Alonso
More DJ techniques on the reactable ....................................................................... 207
Smilen Dimitrov, Marcos Alonso, Stefania Serafin
Developing block-movement, physical-model based objects for the Reactable ....... 211
Jean-Baptiste Thiebaut, Samer Abdallah, Andrew Robertson
Real Time Gesture Learning and Recognition: Towards Automatic Categorization . 215
Mari Kimura
Making of VITESSIMO for Augmented Violin:
Compositional Process and Performance ................................................................ 219
Joern Loviscach
Programming a Music Synthesizer through Data Mining .......................................... 221
Kia Ng, Paolo Nesi
i-Maestro: Technology-Enhanced Learning and Teaching for Music ........................ 225
Friday, June 6, 2008 - Session 2

Bart Kuyken, Wouter Verstichel, Frederick Bossuyt, Jan Vanfleteren,
Michiel Demey, Marc Leman
The HOP sensor: Wireless Motion Sensor ............................................................... 229
Niall Coghlan, R. Benjamin Knapp
Sensory Chairs: a System for Biosignal Research and Performance ....................... 233
Andrew B. Godbehere, Nathan J. Ward
Wearable Interfaces for Cyberphysical Musical Expression ..................................... 237
Kouki Hayafuchi, Kenji Suzuki
MusicGlove: A Wearable Musical Controller for Massive Media Library ................... 241
Michael Zbyszynski
An Elementary Method for Tablet ............................................................................. 245
Gerard Roma, Anna Xambó
A tabletop waveform editor for live performance ...................................................... 249
Andrea Valle
Integrated Algorithmic Composition. Fluid systems
for including notation in music composition cycle ..................................................... 253
Andrea Valle
GeoGraphy: a real-time, graph-based composition environment ............................. 257

Ioannis Zannos, Jean-Pierre Hébert
Multi-Platform Development of Audiovisual and Kinetic Installations........................ 261
Greg Corness
Performer model: Towards a Framework for Interactive Performance
Based on Perceived Intention ................................................................................... 265
Paulo Cesar Teles, Aidan Boyle
Developing an “Antigenous” Art Installation
Based on A Touchless Endo-system Interface ......................................................... 269
Silvia Lanzalone
The ‘suspended clarinet’ with the ‘uncaused sound’.
Description of a renewed musical instrument ........................................................... 273
Mitsuyo Hashida, Yosuke Ito, Haruhiro Katayose
A Directable Performance Rendering System: Itopul ............................................... 277
William R. Hazlewood, Ian Knopke
Designing Ambient Musical Information Systems ..................................................... 281
Saturday, June 7, 2008 - Session 3

Aristotelis Hadjakos, Erwin Aitenbichler, Max Mühlhäuser
The Elbow Piano: Sonification of Piano Playing Movements .................................... 285
Yoshinari Takegawa, Tsutomu Terada, Masahiko Tsukamoto
UnitKeyboard: an Easy Configurable Compact Clavier ........................................... 289
Cléo Palacio-Quintin
Eight Years of Practice on the Hyper-Flute:
Technological and Musical Perspectives .................................................................. 293
Edgar Berdahl, Julius O. Smith III
A Tangible Virtual Vibrating String ............................................................................ 299
Christian Geiger, Holger Reckter, David Paschke, Florian Schutz, Cornelius Pöpel
Towards Participatory Design and Evaluation
of Theremin-based Musical Interfaces ...................................................................... 303
Tomás Henriques
META-EVI: Innovative Performance Paths with a Wind Controller ........................... 307
Robin Price, Pedro Rebelo
Database and mapping design for audiovisual prepared radio set installation ......... 311
Kazuhiro Jo, Norihisa Nagano
Monalisa: "see the sound, hear the image" .............................................................. 315
Andrew Robertson, Mark D. Plumbley, Nick Bryan-Kinns
A Turing Test for B-Keeper: Evaluating an Interactive Real-Time Beat-Tracker....... 319
Gabriel Gatzsche, Markus Mehnert, Christian Stöcklmeier
Interaction with tonal pitch spaces ............................................................................ 325
Parag Chordia, Alex Rae
real-time Raag Recognition for Interactive Music ..................................................... 331

Anders Vinjar
Bending Common Music with Physical Models ........................................................ 335
Margaret Schedel, Alison Rootberg, Elizabeth de Martelly
Scoring an Interactive, Multimedia Performance Work ............................................. 339
DEMOS¹ 343
_____________________________________________________________________
Ayaka Endo, Yasuo Kuhara

Rhythmic Instruments Ensemble Simulator
Generating Animation Movies Using Bluetooth Game Controller ............................. 345
Keith A. McMillen
Stage-Worthy Sensor Bows for Stringed Instruments .............................................. 347
Lesley Flanigan, Andrew Doro
Plink Jet .................................................................................................................... 349
Yusuke Kamiyama, Mai Tanaka, Hiroya Tanaka
Oto-Shigure: An Umbrella-Shaped Sound Generator for Musical Expression .......... 352
Sean Follmer, Chris Warren, Adnan Marquez-Borbon
The Pond: Interactive Multimedia Installation ........................................................... 354
Ethan Hartman, Jeff Cooper, Kyle Spratt
Swing Set: Musical Controllers with Inherent Physical Dynamics............................. 356
Paul Modler, Tony Myatt
Video Based Recognition of Hand Gestures by Neural Networks
for The Control of Sound and Music ......................................................................... 358
Kenji Suzuki, Miho Kyoya, Takahiro Kamatani, Toshiaki Uchiyama
beacon: Embodied Sound Media Environment for Socio-Musical Interaction .......... 360
Eva Sjuve
Prototype GO: Wireless Controller for Pure Data ..................................................... 362
Robert Macrae, Simon Dixon
From toy to tutor: Note-Scroller is a game to teach music ........................................ 364
Stuart Favilla, Joanne Cannon, Tony Hicks, Dale Chant, Paris Favilla
Gluisax: Bent Leather Band’s Augmented Saxophone Project ................................. 366
Staas De Jong
The Cyclotactor: Towards a Tactile Platform for Musical Interaction ........................ 370
Michiel Demey, Marc Leman, Frederick Bossuyt, Jan Vanfleteren
The Musical Synchrotron: using wireless motion sensors
to study how social interaction affects synchronization with musical tempo ............. 372

¹ These are the contributions accepted as demos. The demo program also includes nine further demos associated
with papers and posters.

PERFORMANCES 375
_____________________________________________________________________
Opening concert ....................................................................................................... 377

Roberto Girolin
Lo specchio confuso dall’ombra ............................................................................... 378
Nicola Ferrari
The Bow is bent and drawn ...................................................................................... 379
Giorgio Klauer
Tre aspetti del tempo per iperviolino e computer ...................................................... 380
Alessandro Sartini
Aurora Polare ........................................................................................................... 381
Pascal Baltazar
Pyrogenesis .............................................................................................................. 382
Chikashi Miyama
Keo Improvisation for sensor instrument Qgo........................................................... 383
Keith Hamel, François Houle, Aleksandra Dulic
Intersecting Lines ..................................................................................................... 384
Ernesto Romero, Esthel Vogrig
Vistas ........................................................................................................................ 385
Martin Messier, Jacques Poulin-Denis
The Pencil Project .................................................................................................... 386
Stuart Favilla, Joanne Cannon, Tony Hicks
Heretic’s Brew .......................................................................................................... 387
Mark Alexander Bokowiec, Julie Wilson-Bokowiec
The Suicided Voice ................................................................................................... 388
Mark Alexander Bokowiec, Julie Wilson-Bokowiec
Etch .......................................................................................................................... 389
Thomas Ciufo
Silent Movies: an improvisational sound/image performance ................................... 390
Alison Rootberg, Margaret Schedel
The Color of Waiting ................................................................................................. 391
Ge Wang, Georg Essl, Henri Penttinen
MoPhO - A Suite for Mobile Phone Orchestra .......................................................... 392
CLUB PERFORMANCES 393

_____________________________________________________________________
Jane Rigler
Traces/Huellas (for flute and electronics) ................................................................. 395

Renaud Chabrier, Antonio Caporilli
Drawing / Dance ....................................................................................................... 396
Joshua Fried
Radio Wonderland .................................................................................................... 397
Silvia Lanzalone
Il suono incausato, improvise-action
for suspended clarinet, clarinettist and electronics (2005) ........................................ 398
Luka Dekleva, Luka Prinčič, Miha Ciglar
FeedForward Cinema ............................................................................................... 399
Greg Corcoran, Hannah Drayson, Miguel Ortiz Perez, Koray Tahiroglu
The Control Group .................................................................................................... 400
Nicolas d'Alessandro
Cent Voies ................................................................................................................ 401
Cléo Palacio-Quintin, Sylvain Pohu
Improvisation for hyper-flute, electric guitar and real-time processing ...................... 402
Nicolas d'Alessandro, Sylvain Pohu
Improvisation for Guitar/Laptop and HandSketch ..................................................... 403
Ajay Kapur
Anjuna's Digital Raga ............................................................................................... 404
Jonathan Pak
Redshift .................................................................................................................... 405
INSTALLATIONS 407
_____________________________________________________________________
Olly Farshi
Habitat ...................................................................................................................... 409
Jeff Talman
Mirror of the moon .................................................................................................... 410
Joo Youn Paek
Fold Loud.................................................................................................................. 411
Kenneth Newby, Aleksandra Dulic, Martin Gotfrit
in a thousand drops... refracted glances .................................................................. 412
Jared Lamenzo, Mohit Santram, Kuan Huan, Maia Marinelli
Soundscaper ............................................................................................................ 413
Pasquale Napolitano, Stefano Perna, Pier Giuseppe Mariconda
SoundBarrier_ .......................................................................................................... 414
Art Clay, Dennis Majoe
China Gates.............................................................................................................. 415

WORKSHOPS 417
_____________________________________________________________________
Kia Ng
4th i-Maestro Workshop on Technology-Enhanced Music Education ....................... 419
Michael Zbyszyński
Tablet Workshop for Performers and Teachers ........................................................ 421
R. Benjamin Knapp, Marcelo Wanderley, Gualtiero Volpe
Techniques for Gesture Measurement in Musical Performance ............................... 423
Alexander Refsum Jensenius, Timothy Place, Trond Lossius,
Pascal Baltazar, Dave Watson
Jamoma Workshop ................................................................................................... 425
AUTHOR INDEX 427

_____________________________________________________________________

Papers

Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Network Musics - Play, Engagement and the Democratization of Performance
David Kim-Boyle
University of Maryland, Baltimore County
Department of Music
1000 Hilltop Circle, Baltimore, MD 21250
1-410-455 8190
kimboyle@umbc.edu
ABSTRACT
The rapid development of network communication technologies has allowed composers to create new ways in which to directly engage participants in the exploration of new musical environments. A number of distinctive aesthetic approaches to the musical application of networks will be outlined in this paper, each of which is mediated and conditioned by the technical and aesthetic foundations of the network technologies themselves. Recent work in the field by artists such as Atau Tanaka and Metraform will be examined, as will some of the earlier pioneering work in the genre by Max Neuhaus. While recognizing the historical context of collaborative work, the author will examine how the strategies employed in the work of these artists have helped define a new aesthetics of engagement in which play and spatial and temporal dislocation are amongst the genre’s defining characteristics.

Keywords
Networks, collaborative, open-form, play, interface.

1. INTRODUCTION
The development of high-speed network communication protocols and other wireless and telecommunications technology has allowed the creation of musical environments which directly engage participants in the realization of new forms of musical expression. These environments resituate the role of the composer as that of designer and transform the nature of performance into that of play. While the development of the genre has been informed by aesthetic concerns shared by all collaborative art, the spatial and, to some extent, temporal dislocation of participants conditions and mediates the nature of play itself to an unprecedented extent [1].

By actively engaging its audience, network-based musical environments recall the collaborative work of an earlier generation of composers such as Brown [3], Haubenstock-Ramati [10], Brün [4], Wolff [28], Pousseur [8], and Stockhausen [21], as well as the improvisatory work of groups such as AMM, Musica Elettronica Viva and artists associated with the Fluxus School, who directly situated the audience in performative roles.¹ Much as this earlier generation created unique opportunities for musical expression, composers working with networks create environments which are musically expressed through playful exploration. The musical forms that emerge from these explorations and the relationships that develop between participants should be considered, however, in the context of the social goals that propelled the work of this earlier generation of composers. Given the central aesthetic role that the exploration of network-based musical environments plays, the extent to which the network’s topology conditions the play of participants requires consideration [16]. While interactions between participants can occur over spatially distributed or localized environments, and the interactions and explorations themselves can be synchronous or asynchronous, the design of the interface through which these explorative behaviors are mediated is of equal importance. Informed by an understanding of the principles of game design theory, this paper will argue that meaningful interaction and truly democratized performance spaces can only emerge from carefully considered system and interface design [19].

¹ The term collaborative work will be used throughout in reference to any work in which performers or the audience are given creative responsibility for determining the order of musical events or, in some cases, for interpreting general musical processes. Open form and mobile form works are two examples of traditional collaborative work.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

2. MUSICAL APPROACHES
While a number of studies have been published outlining different ways in which agents can collaborate with each other through a network infrastructure [1, 18, 26, 27], significantly less attention has been given to the different aesthetic approaches that these topologies facilitate. While the classification of network structures is helpful, the ways in which such structures condition the behavior of participants are equally significant. Some of the ways in which network topologies mediate musical expression will be explored in the remainder of this paper. Central to this discussion are the musical effects of spatial and temporal dislocation and the role of interface design.

A number of distinctive approaches to the musical application of networks can be seen to have emerged since the earliest experiments in the genre in the 1960s. These include the
creation of network instruments, generative works, the integration of musical play within social networks and the creation of immersive environments.

2.1 Network Instruments
Amongst the earliest works to use telecommunication networks for artistic purposes are Max Neuhaus’s radio projects. Between 1966 and 1977, Neuhaus produced a series of works, which he termed “Broadcast Works”, in which the musical outcome depends upon the active responses of the audience. In the earliest of these works, Public Supply I (1966), radio listeners were asked to call in to a radio station and produce any sounds they wanted. Neuhaus then mixed the incoming signals to produce the musical results. Neuhaus has written of these works - “...it seems that what these works are really about is proposing to reinstate a kind of music which we have forgotten about and which is perhaps the original impulse for music in man: not making a musical product to be listened to, but forming a dialogue, a dialogue without language, a sound dialogue.” [16] The intention is strikingly similar to that expressed by Tanaka - “In classical art making, we have been mostly concerned with making objects - nodes - is (sic.) time to think about the process of creating connections?” [22] Like many of Tanaka’s network-based projects, Neuhaus’s work exists as an environment which promotes the agency of its participants through the initiation and development of musical dialogs. In Public Supply I, however, Neuhaus mediates those relationships through the mixing process, reinforcing musically interesting dialogs while downplaying those of less appeal.

In a later realization of Neuhaus’s project, listeners from across the Eastern United States were asked to call in and whistle a single pitch for as long as they were able. The work, entitled Radio Net, was produced in cooperation with National Public Radio. Unlike Public Supply I, in this work Neuhaus did not mix the responses live but instead devised an automated mixing system in which the output switched between various input signals based on the pitch of the input sounds. The input whistles were also subject to electronic transformation as the sounds looped from one broadcast station to another. While Radio Net’s realization was perhaps of more interest to its participants than to a passive audience, and despite the fact that some thousands of listeners participated in the realization of the work, the result as realized in its only 1977 performance was coherent, subtle, and at times quite beautiful [14].

To the extent that Radio Net was developed as an environment within which musical dialogs could be formed and developed, the work does present a number of themes which we will see taken up in various forms in most subsequent network-based music. These include the role of the agency of others in conditioning one’s own play, the degree to which dialogs are mediated by the mechanisms of the network, the public vs. private space of performance, the degree to which the dialogs enabled represent truly unique ways of communicating, and the new role of the composer as a designer of a musical environment rather than a creator of a self-contained musical work. Rather than attempt to address the extent to which all these themes are addressed in Neuhaus’s Broadcast Works, let us for now comment on the question of agency. The network infrastructure of Radio Net and the transformational processes employed would greatly inhibit the ability of participants to distinguish their own musical contribution, much less to engage in meaningful dialog with others. Nevertheless, through their participation, listeners were able to build a community brought together by the exploration of the network infrastructure. This would suggest that the goals of Radio Net were not so much participation in dialog but rather the playful exploration of a network environment.

Like these earlier works, one of Neuhaus’s most recent projects, Auracle (2004) [15], adopts a similar network infrastructure, although in this case the network no longer exists over radio transmissions but over the internet. In Auracle, participants form ensembles and collectively modify an audio stream broadcast by a server through the use of their voices. In a similar manner to the Broadcast Works, the resultant sounds of Auracle are affected by the proficiency of the participants but also by network latency. Network latency, a manifestation of temporal dislocation, is often considered a technical handicap for performers who wish to collaborate over the internet, but it is a key aesthetic consideration in the work of many composers who exploit it in the creation of unique musical environments. While latency is minimized in Auracle by the system architecture employed, it nevertheless clearly distinguishes the relationships participants form with the audio stream, and through that with other ensemble members, from the traditional relationships that exist between performers and their instruments.

Just as with the Broadcast Works, Neuhaus regards Auracle not as a self-contained musical work in itself but as a collective instrument or musical architecture through which participants develop relationships through musical dialog. As implied above, those dialogs are necessarily mediated by the design of the instrument itself. The algorithm used to extract control features from the sonic input is not made explicit, and the ability of participants to shape the audio stream with any degree of nuance is quite limited. Further, there is little direct indication as to how particular gestures modify the audio stream. While this would seem to inhibit the ability of participants to engage in meaningful dialog with other participants, it does reinforce the fact that, like any instrument, Auracle has its own idiosyncrasies.

In comparison with the Broadcast Works, the use of an interface also represents an important distinction. The interface is the window through which the environment is explored and dialogs with ensemble members are developed, and what is immediately notable about it is its simplicity. With the screen divided into discrete sections representing the geographical location of participants, the musical contributions of ensemble members are graphically represented by simple lines. Basic control functions allow participants to record brief audio samples which transform the audio stream. While the control functions are simple, they are a necessary consequence of the work’s open environment. The interface design also enables ensemble members to more clearly distinguish their own musical contributions from those of other members.
Figure 1. Interface for Max Neuhaus’s Auracle.

Much of Atau Tanaka’s work has employed networks to directly explore issues of collaboration and community building. In his Global String (1998), a project produced with Kasper Toeplitz, a network simulates an invisible resonant string whose nodes are anchored in different gallery installations. Tanaka writes of the project - “The installation consists of a real physical string connected to a virtual string on the network. The real string (12 mm diameter, 15 m length) stretches from the floor diagonally up to the ceiling of the installation space. On the floor is one end point - the ear. Up above is the connection to the network, connecting it to another end point at a remote location. Sensors translate vibrations into data to be transmitted to the other end via network protocol. The server is the “bridge” of the instrument - the reflecting point. It runs software that is a physical model of a string. Data is streamed back to each site as sound and as data - causing the string to become audible, and to be actuated to vibrate, on the other end.” [24] Players of the string are able to collaborate with other players located in different installations in a topology similar to that of Weinberg’s bridge model [27].

Figure 2. Installation setup for Tanaka’s Global String.

Just like Neuhaus’s Auracle, Global String is not a self-contained musical work but rather a network-based instrument that facilitates connections between participants across distributed space. In Global String, these connections are mediated by the latency of the network, which Tanaka considers analogous to instrumental resonance [22], an idea also explored in the work of artists such as Carsten Nicolai and Ulrike Gabriel [2]. In Global String, network traffic between nodes is used to drive the parameters of the audio synthesis engine [2007, pers. comm. April], directly correlating temporal dislocation to musical expression. It thus makes explicit the ways in which the network mediates communication between participants, bringing the question of information transparency to the fore. Global String is also one of the few network instruments to incorporate haptic feedback within its infrastructure. This is an especially important design characteristic as, unlike software-based instruments, it more directly rewards performance skill and in doing so increases the likelihood of more meaningful play emerging [11].

2.2 Generative Works
The work of composer Jason Freeman, a collaborator with Neuhaus on Auracle, often addresses ways in which an audience can be engaged in the creation of unique musical forms. The design of carefully considered interfaces is crucial to this endeavor. Graph Theory is a recent web-based work in which participants do not directly interact with each other but rather help realize an open-form musical work by navigating pathways through a range of musical possibilities. In the work, basic melodic cells are repeatedly performed by a violinist. The user chooses which cell will follow the currently performed cell from up to four subsequent cells (see Fig. 3, top). There are a total of sixty-one cells. While the order of cells is chosen by the participant, the range of possibilities is predetermined and displayed with a graphic representation of interconnected nodes. A novel aspect of this work is that a score can be generated for performance in which the order of the loops is determined by the popularity of choices made by users. While the content of the work is defined by the composer, the ability of a collective to determine its order is a unique feature and an extension of classic open-form works.

While the pathways chosen through the score are not overtly determined by the composer, they are certainly influenced by how the composer has decided to distribute musical phrases amongst nodal points. One of Freeman’s pre-compositional rules was that adjacent cells could have only one change between their respective pitch sets. This decision introduced melodic continuity and helped keep decision making for the participants relatively simple. No such rules were applied to rhythmic properties. The graphical representations employed were also considered in determining navigational pathways [2007, pers. comm. 2 January]. As participants navigating Graph Theory’s structure do not interact with each other, questions of spatial distribution and temporal latency are not pertinent. The interface that Freeman has designed, however, does condition the play of those who interact with the materials broadcast by the server. A map of all possible pathways through the work’s 61 nodes is presented in the bottom left quadrant of the interface. These pathways have a tripartite structure which encourages local exploration of neighboring nodes and implies greater musical contrast for larger cross-sectional explorations. A participant’s movement through the nodes of the work is also facilitated by the use of simple bar graphs to display rhythmic structure and pitch contour. This choice of display clearly makes the work more accessible to participants unable to read common practice notation.
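Freeman’s adjacency rule and the popularity-ordered score described above can be modeled in a few lines. The Python sketch below is illustrative only and is not Freeman’s implementation. Several details are assumptions made for the example: the five cells and their pitch content; the reading of “only one change between their respective pitch sets” as a single pitch added, removed or substituted; capping each cell at four successors taken in cell order; and a greedy rule that builds the score by always following the most popular successor not yet used. The names `CELLS`, `one_change_apart`, `build_graph`, `record_choice` and `generate_score` are hypothetical.

```python
from collections import Counter

# Invented pitch content (MIDI note numbers) for five cells; Graph Theory
# itself has 61 cells whose contents are not reproduced here.
CELLS = {
    0: frozenset({60, 62, 64}),
    1: frozenset({60, 62, 65}),
    2: frozenset({60, 64}),
    3: frozenset({59, 62, 64}),
    4: frozenset({59, 62, 65}),
}

MAX_SUCCESSORS = 4  # listeners pick the next cell from up to four options


def one_change_apart(a, b):
    """Assumed reading of the rule: one pitch added, removed or substituted."""
    added, removed = len(b - a), len(a - b)
    return (added, removed) in {(1, 0), (0, 1), (1, 1)}


def build_graph(cells):
    """Link each cell to at most MAX_SUCCESSORS cells that obey the rule."""
    graph = {}
    for cid, pitches in cells.items():
        allowed = [other for other, p in sorted(cells.items())
                   if other != cid and one_change_apart(pitches, p)]
        graph[cid] = allowed[:MAX_SUCCESSORS]
    return graph


def record_choice(popularity, graph, current, chosen):
    """Count one listener's choice of `chosen` to follow `current`."""
    if chosen not in graph[current]:
        raise ValueError(f"cell {chosen} cannot follow cell {current}")
    popularity[(current, chosen)] += 1


def generate_score(start, graph, popularity):
    """Greedy walk that always takes the most popular unvisited successor.

    Ties go to the lower cell number. The walk stops when every successor
    of the current cell has already been used.
    """
    order, current = [start], start
    while True:
        options = [c for c in graph[current] if c not in order]
        if not options:
            return order
        current = max(options, key=lambda c: (popularity[(current, c)], -c))
        order.append(current)


if __name__ == "__main__":
    graph = build_graph(CELLS)
    popularity = Counter()
    for current, chosen, times in [(0, 3, 3), (0, 1, 1), (3, 4, 2), (4, 1, 1)]:
        for _ in range(times):
            record_choice(popularity, graph, current, chosen)
    print(graph[0])                              # [1, 2, 3]
    print(generate_score(0, graph, popularity))  # [0, 3, 4, 1]
```

Forbidding revisits guarantees that the walk terminates, but it also means a cell can never recur. A generated score for live performance might well repeat loops, so this is one design choice among several.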
Through immediate visual and aural feedback, participants of Graph Theory are clearly able to discern their actions and evaluate them in the context of previous and future decisions. They are also able to compare their choices with those of others through a simple “popularity” index which rates the frequency with which subsequent cells are chosen. The choices made are given a further complexity in that they contribute to a more global index used to create a score for live performance. Participants thus contribute to two distinct levels of performance - the private space performance that takes place within their own immediate interaction with the network, and the public space performance which results from the collective play of many participants.

Figure 3. User interface for Graph Theory with an excerpt from the generated score.

2.3 Social Networks
In recent work, Atau Tanaka has used mobile network technology to build communities in which the members collaborate in shared musical experiences. His “Malleable Mobile Music” (2004) is a good example of this more recent direction. Using specially modified mobile communication devices equipped with physical sensors that measure both the pressure with which the device is held and its movement in space, participants are able to collectively remix a popular song chosen by the members of the network [23]. Various audio transformations such as time stretching and sampling can be applied, and rhythmic patterns and sequences can be generated from the original source material through various built-in software modules. Just as in open-form works, these transformations can be applied in any order, and the contributions of each group member become an individual track in the master mix. The physical proximity of the participants, which is determined through a GPS system, is also used to affect the dynamic balance of the resultant mix, directly correlating social proximity with musical presence. The results of the remixing and transformations are broadcast to all participants. More overtly than Neuhaus’s Auracle, Tanaka’s instrument creates immediate collaborative relationships and communities through the virtual environment of the network technology employed. The “Malleable Mobile Music” project has recently been employed in a new interactive work, Net_Dérive, for mobile, wearable communication devices. This latter project was produced in collaboration with Petra Gemeinboeck and received its premiere in Paris in 2006 [25].

Figure 4. Specially modified PDAs for Malleable Mobile Music.

2.4 Immersive Works
A different type of musical collaboration is explored in immersive works [7]. In Ecstasis by the Australian ensemble Metraform, four participants explore and decode a virtual environment through the use of head-mounted displays equipped with motion tracking devices. The images seen through the displays are also projected on four screens surrounding the participants. In Ecstasis, the participants, each graphically represented by an avatar, determine the nature and scope of the work through their interactions. Metraform has written of Ecstasis - “The relationship between the avatars modulates the space, colour, transparency and sound of the environment. The collective interaction results in a dynamic interplay with and within a continuously modifiable environment. This engagement transgresses from a preoccupation of ‘doing’ something in an environment to ‘being’ present to one’s self and others.” [13]

In Ecstasis and other recent work by the ensemble, sound is employed as a means of environmental understanding. The soundscape of the work was produced by composer Lawrence Harvey. The sounds heard and the sound transformations applied are determined by the virtual location of each of the four participants as well as by information derived from the
motion of the head-mounted displays. Consisting of sixteen channels of spatialized sound, the soundscape complements the images generated and provides easily discerned sonic cues that help establish cooperative relationships between the participants. Ecstasis defines clear goals for its participants and rewards their explorations with a greater understanding of their environment. Through the environmental space that the work presents, Ecstasis becomes a catalyst for collective individuation [12]. As its participants decode their environment and come to a greater collective awareness, it is clear that the disjunctions between interface and environment, and between public and private performance spaces, are no longer sustainable.

Figure 5. A screenshot from Ecstasis.

3. AESTHETIC THEMES
While each network project examined poses its own aesthetic questions, they all share a number of common concerns. These range from questions regarding the democratized performance space which network-based work promotes, through to questions provoked by the technology through which these works are sustained. Some of these questions concern how the spatial and temporal aesthetics of network technologies mediate collaborative relationships [11], while others make overt the influence of interface design in the promotion of democratized performance environments.

Given the creative role participants play in exploring their musical environments, the role of the composer has largely been transformed into that of designer, while the traditional role of the performer has been subsumed by that of player. To a certain extent this situation is paralleled in traditional open-form works in which composers design open musical environments which serve to facilitate an awareness of process and collective becoming. All network-based musical works posit environments within which relationships between participants are facilitated and developed. The directives which determine the extent to which these environments can be explored and relationships developed differ from composer to composer and from project to project. While artists such as Tanaka and Neuhaus encourage collaborative relationships and dialogs to be openly explored within the boundaries of their environments, other artists such as Metraform and Freeman adopt a less open approach and predefine particular social goals through and for their work. In Metraform’s Ecstasis, as we have seen, this took the form of an improved environmental understanding, while the creation of a performable work was an explicit goal of Freeman’s Graph Theory. Given the responsibility placed on participants, the composer or designer of that environment must also assume some responsibility for the quality of the relationships that emerge. Dobrian goes further and states that in a collective performance it is up to the composer to develop an environment within which compelling work can take place [9], while Tanaka has stated that interesting results can only be achieved by developing interesting processes [2007, pers. comm. April]. Bryan-Kinns and Healey have even shown that the effect of decay within a collective instrument significantly affects how participants engage with that environment [5]. As we have seen in the work of Neuhaus and Freeman, interface design is of critical importance in conditioning the ways in which processes, environments and relationships can be explored, while in Tanaka’s Global String, haptic feedback is a critical component in the development of meaningful play. Indeed, as has become evident, democratized performance spaces can only be realized through carefully considered interface design.

Transparent interface design also facilitates the ability of participants to surrender to their environment rather than having to decode the means through which it is presented. How that environment responds to their own agency is of particular importance. As noted by Phillips and Rabinowitz,

...when the audience expects instant response, asks the piece for self-affirmation or affirmation of a learned behavior, the effect closes down what the piece means to open up. Collaborative art asks for something as complex as inspired surrender and must elicit recognition, building from reflection. That moment of self-regard should then develop into more complicated correspondences. Otherwise, the piece can veer toward superficiality and rely on what we call a “supermarket door process of interactivity”: I walked up to it and it opened’ I have power [17].

While technology has not fundamentally changed the defining characteristics of collaborative art forms, it has certainly mediated them in distinctive ways. In some environments, such as Metraform’s Ecstasis, this has brought about unique modes of engagement, while in other projects network latency has produced collective instruments whose aesthetics are founded on immediacy and extended reflection [24]. Of defining character, of course, are the spatial and temporal properties of the network infrastructure or topology. While these can be exploited to musical effect, it is perhaps counterintuitive that spatial disjunction and temporal dislocation can also serve to facilitate a greater awareness of agency and collective becoming.

4. SUMMARY
The democratized performance spaces that network-based musical environments support are a natural response to the musical and social ideals that motivated the work of an earlier generation of composers for whom such technology did not exist. These technologies have brought about new modes of awareness of individual agency and of the creative relationships that can emerge with others through the playful exploration of the architectures that sustain musical collaboration. The aesthetic features unique to the genre emphasize the challenges of fully engaging participants in collaborative processes and moving participants beyond the easy solution of falling back on what Cage has referred to as superficial habits [6]. These challenges are amply rewarded,
however, by the exciting potential of network music to create unique forms of musical expression and new modes of musical agency and engagement, and in doing so to transcend the network architectures that make such dialogs and relationships possible.

5. ACKNOWLEDGMENTS
I am grateful to Jason Freeman, Lawrence Harvey and Atau Tanaka for providing further information on their work. I would also like to thank John Dack for generously providing a copy of his article on the Scambi project.

6. REFERENCES
[1] Barbosa, A. Displaced soundscapes: a survey of network systems for music and sonic art creation. Leonardo Music Journal, vol. 13, 2003, 53-59.
[2] Broeckmann, A. Reseau/Resonance - connective processes and artistic practice. Artmedia VIII, 2002, Viewed March 2007, <http://www.olats.org/projetpart/artmedia/2002eng/te_aBroeckmann.html>.
[3] Brown, E. Form in new music. Darmstädter Beiträge, vol. 10, 1965, 57-69.
[4] Brün, H. When music resists meaning: the major writings of Herbert Brün. Ed. A. Chandra, Wesleyan University Press, Middletown, CT, 2004.
[5] Bryan-Kinns, N., and Healey, P. G. T. Decay in collaborative music making. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, 2006, 114-117.
[6] Cage, J. Soundings: investigation into the nature of modern music. Neuberger.
[7] Chew, E., Kyriakakis, C., Papadopoulos, C., Sawchuk, A. A., and Zimmermann, R. From remote media immersion to distributed immersive performance. In Proceedings of the 2003 ACM SIGMM Workshop on Experiential Telepresence, Berkeley, CA, 2003, 110-120.
[8] Dack, J. “Open” forms and the computer. In Musiques, Arts, Technologies: Towards a Critical Approach. L’Harmattan, Paris, 2004, 401-412.
[9] Dobrian, C. Aesthetic considerations in the use of “virtual” music instruments. In Proceedings of the 2003 International Conference on New Interfaces for Musical Expression (NIME03), Montreal, 2003, 161-163.
[10] Haubenstock-Ramati, R. Notation - material and form. Perspectives of New Music, vol. 4, no. 1, 1965, 39-44.
[11] Leman, M. Embodied music cognition and mediation technology. MIT Press, Cambridge, MA, 2007.
[12] Massumi, B. The Political Economy of Belonging. In Parables for the Virtual: Movement, Affect, Sensation. Duke University Press, Durham, NC, 2002.
[13] Metraform. Ecstasis. 2004, Viewed December 2006, <http://www.metraform.com>.
[14] Neuhaus, M. Radio Net. 1977. Available at <http://www.ubu.com/sound/neuhaus_radio.html>.
[15] Neuhaus, M., Freeman, J., Ramakrishnan, C., Varnick, K., Burk, P., and Birchfield, D. Auracle. 2006, Viewed June 2006, <http://auracle.org>.
[16] Neuhaus, M. The broadcast works and Audium. 2007, Viewed January 2007, <http://www.max-neuhaus.info>.
[17] Phillips, L., and Rabinowitz, P. On collaborating with an audience. Collaborative Journal, 2006, Viewed January 2006, <http://www.artcircles.org/id85.html>.
[18] Rebelo, P. Network performance: strategies and applications. Presentation at the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, 2006, Viewed March 2007, <http://www.sarc.qub.ac.uk/~prebelo/index>.
[19] Salen, K., and Zimmerman, E. Rules of Play: Game Design Fundamentals. MIT Press, Cambridge, MA, 2004.
[20] Souza e Silva, A. Art by telephone: from static to mobile interfaces. Leonardo Electronic Almanac, vol. 12, no. 10, 2004.
[21] Stockhausen, K. ...how time passes.... Trans. C. Cardew, Die Reihe, 3 (1959), Bryn Mawr, PA, 10-40.
[22] Tanaka, A. Seeking interaction, changing space. In Proceedings of the 6th International Art + Communication Festival 2003, Riga, Latvia, 2003, Viewed July 2006, <http://www.csl.sony.fr/~atau/>.
[23] Tanaka, A. Mobile music making. In Proceedings of New Interfaces for Musical Expression 2004 Conference, Hamamatsu, 2004, 154-156.
[24] Tanaka, A. Global String. 2005, Viewed July 2006, <www.sensorband.com/atau/globalstring/globalstring.pdf>.
[25] Tanaka, A., Gemeinboeck, P., and Momeni, A. Net_Dérive, a participative artwork for mobile media. In-press, 2007.
[26] Turbulence.org. Viewed January 2006, <http://www.turbulence.org>.
[27] Weinberg, G. Interconnected musical networks: toward a theoretical framework. Computer Music Journal, vol. 29, no. 2, 2005, 23-39.
[28] Wolff, C. Open to whom and to what. Interface, vol. 16, no. 3, 1987, 133-141.
Ten-Hand Piano: A Networked Music Installation

Álvaro Barbosa
Research Center for Science and Technology of the Arts (CITAR)
Portuguese Catholic University – School of the Arts
Rua Diogo Botelho 1327, 4169-005 Porto, Portugal
+351 22 616 62 91
abarbosa@porto.ucp.pt
ABSTRACT
This paper presents the latest developments of the Public Sound Objects (PSOs) system, an experimental framework to implement and test new concepts for Networked Music. The project of a public interactive installation using the PSOs system was commissioned in 2007 by Casa da Musica, the main concert hall space in Porto. It resulted in a distributed musical structure with up to ten interactive performance terminals distributed along the Casa da Musica’s hallways, collectively controlling a shared acoustic piano. The installation allows visitors to collaborate remotely with each other, within the building, using a custom-developed software interface designed to facilitate collaborative music practices and requiring no previous knowledge of musical performance.

Keywords
Network Music Instruments; Real-Time Collaborative Performance; Electronic Music Instruments; Behavioral Driven Interfaces; Algorithmic Composition; Public Music; Sound Objects;

1. INTRODUCTION
The Public Sound Objects (PSOs) project consists of the development of a networked musical system, which is an experimental framework to implement and test new concepts for on-line music communication. It not only serves a musical purpose, but it also facilitates a straightforward analysis of collective creation and the implications of remote communication in this process.
The project was initiated in 2000 [1] [2] at the Music Technology Group (MTG) of the Pompeu Fabra University in Barcelona, and most developments since 2006 have been undertaken by the Research Center for Science and Technology of the Arts (CITAR) at the Portuguese Catholic University in Porto.

The PSOs system approaches the idea of collaborative musical performances over a computer network as a Shared Sonic Environment, aiming to go beyond the concept of simply using computer networks as a channel to connect performing spaces. It can run entirely over the WWW and its underlying communication protocol (Hypertext Transfer Protocol, HTTP), so that it can perform over a regular Internet connection and achieve the sense of a Public Acoustic Space where anonymous users can meet and be found performing in collective Sonic Art pieces.
The system itself is an interface-decoupled Musical Instrument, in which a remote user interface and a sound processing engine reside on different hosts. This makes it possible to accommodate an extreme scenario in which a user can access the synthesizer from any place in the world using a web browser.

Specific software features were implemented in order to reduce the disruptive effects of network latency [3], such as dynamic adaptation of the musical tempo and dynamics to communication latency measured in real time.

In particular, the recent developments presented in this paper result from a 2007 commission of an Interactive Sonic Art Installation from Casa da Musica, the main concert hall space in Porto. The resulting setup is a distributed musical structure with up to ten interactive performance terminals distributed along the Casa da Musica’s hallways, collectively controlling a shared acoustic piano.

It includes:
• The adaptation of the original synthesizer (a Pure-Data [4] sound engine) to a Yamaha Disklavier piano [5].
• Redesign of the interactive sound paradigm in order to constructively articulate multiple instances of experimental users to an ongoing musical piece in real time.
• Introduction of an Ethersound [6] acoustic broadcast system for the clients’ musical feedback.
• Design of a physical infrastructure, coherent with the Casa da Musica architecture, to support the client and server terminals.
2. BACKGROUND TOPICS
Permission to make digital or hard copies of all or part of this work for 2.1 Sound Objects
personal or classroom use is granted without fee provided that copies are Community-driven creation, results in a holistic process, i.e., its
not made or distributed for profit or commercial advantage and that properties cannot be determined or explained by the sum of its
copies bear this notice and the full citation on the first page. To copy components alone [7]. A community of users involved in a
requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).
9
creation process through a Shared Sonic Environment definitely constitutes a Whole in the holistic sense.

According to Jan Smuts (1870-1950), the father of Holism Theory, the concept of a Whole implies that its individual parts are flexible and adjustable. It must be possible for the part to be different in the whole from what it is outside the whole. In different wholes, a part must be different in each case from what it is in its separate state.

Furthermore, the whole must itself be an active factor or influence among the individual parts; otherwise it is impossible to understand how the unity of a new pattern arises from its elements. Whole and parts mutually and reciprocally influence and modify each other.

Similarly, when questioning the behavior of objects in Physics, it is often by looking for simple rules that the answers can be found. Once found, these rules can often be scaled to describe and simulate the behavior of large systems in the Real World.

This notion applies to the Acoustic Domain through the definition of Sound Objects as a relevant element of the music creation process by Pierre Schaeffer in the 1960s. According to Schaeffer, a Sound Object is defined as:

"Any sound phenomenon or event perceived as a coherent whole (…) regardless of its source or meaning" (Schaeffer, P., 1966).

The Sound Object (l'objet sonore) refers to an acoustical object for human perception and not to a mathematical or electroacoustical object for synthesis. One can consider a sound object the smallest self-contained particle of a Soundscape [8]. Defining a universe of sound events by subsets of Sound Objects is a promising approach for content-processing and transmission of audio [9], and from a psychoacoustic and perceptual point of view it provides a very powerful paradigm to sculpt the symbolic value conveyed by a Soundscape.

In an artistic context the scope for the user's personal interpretation is wider. Therefore such Sound Objects can have a much deeper symbolic value and represent more complex metaphors. Often there is no symbolic value in a sound, but once there is a variation in one of its fundamental parameters it might then convey a symbolic value.

All these ideas about Sound Objects and the holistic nature of community music are the basis for the main concept behind the Public Sound Objects System. In fact, in PSOs the raw material provided to each user to create his contribution to a shared musical piece is a simple Sound Object. These Sound Objects, individually controlled, become part of a complex collective system in which several users can improvise simultaneously and concurrently.

In the system, a server-side real-time sound synthesis engine (a Disklavier Piano in the case of the Casa da Musica installation) provides an interface to transform various parameters of a Sound Object, which enables users to add symbolic meaning to their performance.

2.2 About Networked Music
In his keynote speech at ICMC 2003, Roger Dannenberg mentioned "Networked Music" as one of the promising research topics, and at least four papers [2], [10] and [11] were centered on this topic, even though they were scattered over different panels instead of one distinct session.

Since then the term Networked Music has become increasingly consensual in defining the area. According to Jason Freeman's definition [12], it is about music practice situations where traditional aural and visual connections between participants are augmented, mediated or replaced by electronically-controlled connections.

In order to give a broad view of the scientific dissemination of Networked Music research, I present some of the most significant landmarks in the field over the last decade:

2.2.1 Summits and Workshops
The ANET Summit (August 20-24, 2004)
The summit, organized by Stanford University's Center for Computer Research in Music and Acoustics (CCRMA) and held at the Banff Center in Canada, was the first workshop event addressing the topic of high-quality audio over computer networks. The guest lecturers were Chris Chafe, Jeremy Cooperstock, Theresa Leonard, Bob Moses and Wieslaw Woszczyk. A new edition of the ANET Summit is planned for April 2008.

The Networked Music Workshop at ICMC (September 4, 2005)
This workshop was held in Barcelona and resulted from experience in previous ICMCs, which called for such an event. Guest lecturers were: Álvaro Barbosa (Pompeu Fabra University, MTG), Scot Gresham-Lancaster (Cogswell College, Sunnyvale, CA), Jason Freeman (Georgia Institute of Technology) and Ross Bencina (Pompeu Fabra University, MTG).

2.2.2 PhD Dissertations
These are some relevant dissertations published on the topic:
• 2002 Golo Föllmer, "Making Music on the Net, social and aesthetic structures in participative music" [13]
• 2002 Nathan Schuett, "The Effects of Latency on Ensemble Performance" [14]
• 2003 Jörg Stelkens, "Network Synthesizer" [15]
• 2003 Gil Weinberg, "Interconnected Musical Networks: Bringing Expression and Thoughtfulness to Collaborative Music" [16]
• 2006 Álvaro Barbosa, "Displaced Soundscapes" [17]

2.2.3 Journal Articles
There are a number of survey and partial overview articles on the topic of Networked Music [18], [19], [20], [21] and [22]; however, a special issue of the journal Organised Sound from 2005 [23], edited by Leigh Landy, focuses specifically on the topic of Networked Music and includes many of the relevant references in this area.

3. THE PSOs INSTALLATION
Casa da Musica is the main concert venue in the city of Porto, and it is very active in contemporary and experimental forms of music. The commission for the Public Sound Objects Installation had the underlying idea of bringing music to the hallways of the House of Music, so that visitors could actually interact with it.
Fig. 1 Casa da Musica Building (image source: "House of Music Opening Day", Wikimedia Commons, under the GFDL, GNU Free Documentation License)

The final implementation consists of a Disklavier Piano controlled via MIDI by a server located at the main foyer of Casa da Musica, which can simultaneously be used as a terminal. This server accepts incoming control data generated by 10 client computers located at various points along the hallways of a scenic route through the building. Incoming data is transmitted over the building's IP network using Open Sound Control [24].

Fig. 2 The PSOs Server connected to the Disklavier Piano and two of the clients which remotely control the same Piano

The sound generated at the server's site conveys the overall performance of every user and is streamed back to each client using an ETHERSOUND [6] system, which produces latencies under 100 ms on the building's LAN.

Fig. 3 A PSOs Client with the ETHERSOUND Hub, Speakers and Keyboard concealed in the structure

All the computer hardware for the server and clients has been cloaked by a metal structure created in coherence with the building's unique architecture (a project by Rem Koolhaas), so that users only access a one-button mouse and a screen or, in the case of the server, a touch screen.

3.1 The User Interface
The graphical user interface is based on a two-dimensional graphical metaphor of a ball that bounces continuously inside a square box. Each time the ball hits one of the walls, the server triggers a piano key whose pitch is defined by the value of a stylized fader that frames the box (each fader determines the pitch of the sound triggered by its adjacent wall).

Fig. 4 PSOs Client interface showing the representation of 5 users

Each client active at a given moment is visually represented in real time by a grey ball, while the user himself controls a distinctive orange ball. The user can also add a trail to his ball, producing an arpeggio (or a chord if the trail length is zero). The scale of notes each client can produce was chosen in advance so as to create a harmonic soundscape when different sounds overlap in time.

The PSOs system integrates several features to overcome network latency issues, already published in [3]. In this version, a new latency tolerance feature was also implemented to improve the perceptual correlation between an impact and the triggered sound. It uses a simple sound panorama adjustment at the sound server, which adds sound panning consistent with the object's behavior in the graphical user interface.

[Figure 5 diagram: ball impacts at t1, t2 and t3 on a time axis, with the corresponding triggered sounds on the R and L channels delayed by Δt1, Δt2 and Δt3]

Fig. 5 Representation of Impacts vs. Triggered Sound with sound panorama adjustment in the presence of latency (Δt)
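The impact behavior described in this section can be illustrated with a small server-side sketch. A wall hit triggers a piano key whose pitch comes from that wall's fader, and the sound is panned to match the side of the box that was hit: right wall to the right channel only, left wall to the left channel only, top and bottom walls to both. This is not the PSOs implementation. The `Impact` message, all function names, the pentatonic scale, the fixed MIDI velocity and the gain values are assumptions made for illustration. Only the wall-to-channel rule and the use of MIDI to drive the Disklavier come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Wall(Enum):
    LEFT = "left"
    RIGHT = "right"
    TOP = "top"
    BOTTOM = "bottom"


# Hypothetical note set: the text only says each client's scale was chosen so
# that overlapping notes stay harmonic. A two-octave C-major pentatonic scale
# (MIDI note numbers) is assumed here purely for illustration.
SCALE = (60, 62, 64, 67, 69, 72, 74, 76, 79, 81)


@dataclass(frozen=True)
class Impact:
    """One ball/wall collision reported by a client (hypothetical message)."""
    wall: Wall
    fader: float  # normalized value of the fader framing that wall, 0.0..1.0


def fader_to_pitch(fader: float, scale: tuple[int, ...] = SCALE) -> int:
    """Quantize a normalized fader position to a note of the assumed scale."""
    fader = min(max(fader, 0.0), 1.0)
    index = min(int(fader * len(scale)), len(scale) - 1)
    return scale[index]


def pan_gains(wall: Wall) -> tuple[float, float]:
    """(left, right) gains in the streamed stereo mix: right wall -> right
    channel only, left wall -> left channel only, top/bottom -> both."""
    if wall is Wall.RIGHT:
        return (0.0, 1.0)
    if wall is Wall.LEFT:
        return (1.0, 0.0)
    return (1.0, 1.0)


def note_on(pitch: int, velocity: int = 64, channel: int = 0) -> bytes:
    """Standard 3-byte MIDI note-on message (status byte 0x90 | channel)."""
    return bytes([0x90 | (channel & 0x0F), pitch & 0x7F, velocity & 0x7F])


def handle_impact(impact: Impact) -> tuple[bytes, tuple[float, float]]:
    """Server side: MIDI bytes for the piano plus pan gains for the mix."""
    return note_on(fader_to_pitch(impact.fader)), pan_gains(impact.wall)


if __name__ == "__main__":
    for imp in (Impact(Wall.RIGHT, 0.0), Impact(Wall.LEFT, 0.55),
                Impact(Wall.TOP, 1.0)):
        midi, gains = handle_impact(imp)
        print(imp.wall.value, midi.hex(), gains)
```

The mapping is deliberately stateless: each impact is handled on its own, so it says nothing about the latency-adaptive tempo of [3]. Because the panning follows the geometry of the box, a listener can still tell which wall produced a sound even when network delay separates the visual impact from the audible note.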
The basic idea is to transmit a sound object only through the right channel of the streamed soundscape stereo mix when a ball hits the right wall, only through the left channel when a ball hits the left wall, and through both channels (L+R) when the ball hits the top or bottom wall.

Sound Panorama Adjustment adds an extra perceptual cue for the temporal order of triggered Sound Objects and their correlation with ball impacts.

4. CONCLUSIONS AND FUTURE WORK
The PSOs Installation at Casa da Musica allows a piano to be controlled by 10 instances simultaneously (Ten Hands!) in a coherent and constructive manner, which would hardly be possible in a traditional way.

Even though the interface is radically different from the normal control paradigm of a piano, it is based on the same fundamental musical facets (Rhythm, Pitch, Timbre and Dynamics). It is therefore an engaging experience, since the users recognize a familiar result achieved in a totally different way.

The interface is simple enough to achieve a musical soundscape with zero learning time and without any previous musical experience, which made the system very accessible and popular among the average of 500 daily visitors to Casa da Musica.

Controlling a popular acoustic instrument brings users closer to the musical experience. In this sense, we would like to further develop this system by adding a pool of instruments to the piano, such as wind, string and percussion instruments controlled by robotics.

5. ACKNOWLEDGMENTS
The author would like to thank the people who collaborated in this project: Jorge Cardoso (UCP), Jorge Abade (UCP) and Paulo Maria Rodrigues (Casa da Musica).

6. REFERENCES
[1] Barbosa, A. and Kaltenbrunner, M. Public Sound Objects: A shared musical space on the web. 2002. IEEE Computer Society Press. Proceedings of the International Conference on Web Delivering of Music (WEDELMUSIC 2002), Darmstadt, Germany
[2] Barbosa, A., Kaltenbrunner, M. and Geiger, G. Interface Decoupled Applications for Geographically Displaced Collaboration in Music. 2003. Proceedings of the International Computer Music Conference (ICMC2003)
[3] Barbosa, A., Cardoso, J. and Geiger, G. Network Latency Adaptive Tempo in the Public Sound Objects System. 2005. Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 2005), Vancouver, Canada
[4] Puckette, M. Pure Data. 269-272. 1996a. International Computer Music Association. Proceedings of the International Computer Music Conference, San Francisco (ICMC96)
[5] Yamaha Disklavier Piano: http://www.yamaha.co.jp/english/product/piano/product/europe/dl/dl.html (Consulted 2008/01/30)
[6] ETHERSOUND: http://www.ethersound.com/ (Consulted 2008/01/30)
[7] Smuts, J. Holism and Evolution. 1926. Macmillan, London, UK
[8] Schaeffer, P. Traité des Objets Musicaux. Le Seuil, Paris, 1966
[9] Amatriain, X. and Herrera, P. Transmitting Audio Content as Sound Objects. 15-6-2002. Proceedings of the AES22 International Conference on Virtual, Synthetic and Entertainment Audio
[10] Stelkens, J. peerSynth: A P2P Multi-User Software with new techniques for integrating latency in real time collaboration. 2003. Proceedings of the International Computer Music Conference (ICMC2003)
[11] Obu, Y., Kato, T. and Yonekura, T. M.A.S.: A Protocol for a Musical Session in a Sound Field Where Synchronization between Musical Notes is not guaranteed. 2003. International Computer Music Association. Proceedings of the International Computer Music Conference (ICMC2003), Singapore
[12] Freeman, J. The Networked Music Workshop at ICMC 2005, Barcelona (September 4, 2005)
[13] Föllmer, G. 2002 Making Music on the Net, social and aesthetic structures in participative music. Ph.D. Thesis, Martin Luther Universität Halle-Wittenberg, Germany
[14] Schuett, N. 2002 The Effects of Latency on Ensemble Performance. Ph.D. Thesis, Stanford University, California, USA
[15] Stelkens, J. 2003 Network Synthesizer. Ph.D. Thesis, Ludwig Maximilians Universität, München, Germany
[16] Weinberg, G. 2003 Interconnected Musical Networks: Bringing Expression and Thoughtfulness to Collaborative Music Making. Ph.D. Thesis, Massachusetts Institute of Technology, Massachusetts, USA
[17] Barbosa, A. 2006 Displaced Soundscapes: Computer Supported Cooperative Work for Music Applications. Ph.D. Thesis, Pompeu Fabra University, Barcelona, Spain
[18] Jordà, S. 1999 Faust Music On Line (FMOL): An approach to Real-time Collective Composition on the Internet. Leonardo Music Journal, Volume 9, pp. 5-12
[19] Tanzi, D. 2001 Observations about Music and Decentralized Environments. Leonardo Music Journal, Volume 34, Issue 5, pp. 431-436
[20] Barbosa, A. 2003 Displaced Soundscapes: A Survey of Network Systems for Music and Sonic Art Creation. Leonardo Music Journal, Volume 13, Issue 1, pp. 53-59
[21] Weinberg, G. 2005 Interconnected Musical Networks: Toward a Theoretical Framework. Computer Music Journal, Vol. 29, Issue 2, pp. 23-29
[22] Traub, P. 2005 Sounding the Net: Recent Sonic Works for the Internet and Computer Networks. Contemporary Music Review, Vol. 24, No. 6, December 2005, pp. 459-481
[23] Landy, L. 2005 Organised Sound 10 (Issue 3), Cambridge University Press, U.K. (OS: ISSN: 1355-7718)
[24] Wright, M. and Freed, A. 1997 Open Sound Control: A New Protocol for Communicating with Sound Synthesizers. Proceedings of the International Computer Music Conference
[25] Nella, M. J. Constraint Satisfaction and Debugging for Interactive User Interfaces. Ph.D. Thesis, University of Washington, Seattle, WA, 1994.
Large-Scale Mobile Audio Environments for Collaborative Musical Interaction

Mike Wozniewski & Nicolas Bouillot
Centre for Intelligent Machines
McGill University
Montréal, Québec, Canada
{mikewoz,nicolas}@cim.mcgill.ca

Zack Settel
Université de Montréal
Montréal, Québec, Canada
zs@sympatico.ca

Jeremy R. Cooperstock
Centre for Intelligent Machines
McGill University
Montréal, Québec, Canada
jer@cim.mcgill.ca
ABSTRACT
New application spaces and artistic forms can emerge when users are freed from constraints. In the general case of human-computer interfaces, users are often confined to a fixed location, severely limiting mobility. To overcome this constraint in the context of musical interaction, we present a system to manage large-scale collaborative mobile audio environments, driven by user movement. Multiple participants navigate through physical space while sharing overlaid virtual elements. Each user is equipped with a mobile computing device, GPS receiver, orientation sensor, microphone, headphones, or various combinations of these technologies. We investigate methods of location tracking, wireless audio streaming, and state management between mobile devices and centralized servers. The result is a system that allows mobile users, with subjective 3-D audio rendering, to share virtual scenes. The audio elements of these scenes can be organized into large-scale spatial audio interfaces, thus allowing for immersive mobile performance, locative audio installations, and many new forms of collaborative sonic activity.

Keywords
sonic navigation, mobile music, spatial interaction, wireless audio streaming, locative media, collaborative interfaces

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008. Copyright remains with the author(s).

1. INTRODUCTION & BACKGROUND
With the design of new interfaces for musical expression, it is often argued that control paradigms should capitalize on natural human skills and activities. As a result, a wide range of tracking solutions and sensing platforms have been explored, which translate human action into signals that can be used for the control of music and other forms of media. The physical organization of interface components plays an important role in the usability of the system, since user motion naturally provides kinesthetic feedback, allowing a user to better remember the style of interaction and the gestures required to trigger certain events. Also, as digital devices become increasingly mobile and ubiquitous, we expect interactive applications to become more distributed and integrated within our physical environment. This prospect yields a new domain for musical interaction employing augmented-reality interfaces and large multi-user environments.

We present a system where multiple participants can navigate about a university campus, several city blocks, or an even larger space. Equipped with position-tracking and orientation-sensing technology, their locations are relayed to other participants and to any servers that are managing the current state. With a mobile device for communication, users are able to interact with an overlaid virtual audio environment containing a number of processing elements. The physical space thus becomes a collaborative augmented-reality environment where immersive musical interfaces can be explored. Musicians can input audio at their locations, while virtual effects processors can be scattered through the scene to transform those signals. All users, performers and audience alike, receive subjectively rendered spatial audio corresponding to their particular locations, allowing for unique experiences that are not possible in traditional music performance venues.

Figure 1: A mobile performer

1.1 Background
In earlier work, we have spent significant time exploring how virtual worlds can be used as musical interfaces. This investigation led to the development of the Audioscape engine [30] (available at www.audioscape.org), which allows for the spatial organization of sound processing and provides an audiovisual rendering of the scene for feedback. Audio elements can be arranged in a 3-D world, and the user is given precise control over the directivity of propagating audio. For example, an audio signal emitted by a sound generator may be steered toward a sound processor that exists at some 3-D location. The processed signal may again be steered
towards a virtual microphone that captures and busses the sound to a loudspeaker where it is heard. The result is a technique of spatial signal bussing, which lends itself particularly well to many common mixing operations. Gain control, for instance, is accomplished by adjusting the distance between two nodes in the scene, while filter parameters can be controlled by changing orientations.

The paradigm of organizing sound processing in three-dimensional space has been explored in some of our previous publications [27, 28, 26]. We have seen that users easily understand how to interact with these scenes, especially when actions are related to everyday activity. For instance, it is instantly understood that increasing the distance between two virtual sound elements will decrease the intensity of the transmitted signal, and that pointing a sound source in a particular direction will result in a stronger signal at the target location. We have designed and prototyped several applications using these types of interaction techniques, including 3-D mixing, active listening, and virtual effects racks [27, 29]. Furthermore, we began to share virtual scenes between multiple participants, each with subjective audio rendering and steerable audio input, allowing for the creation of virtual performance venues and support for virtual reality video conferencing [31].

While performers appreciated the functionality of these earlier systems, they were nevertheless hampered by constraints on physical mobility. These applications operated mainly with game-like techniques, where users stood in front of screens and navigated through the scene using controllers such as joysticks or gamepads. The fact that the gestures for moving and steering sound were abstracted through these intermediate devices resulted in a lack of immersive feeling and made the interfaces more complicated to learn.

We thus decided to incorporate more physical movement, for example, sensing the user's head movement with an orientation sensor attached to headphones, and applying this to effect changes in the apparent spatial audio rendering. To further extend this degree of physical involvement, we began to add real-world location awareness to the system, allowing users to move around the space physically instead of virtually. For example, our 4Dmix3 installation [4] tracked up to six users in an 80 m² gallery space. The motion of each user controlled the position of a recording buffer, which could travel among a number of virtual sound generators in the scene. The result was a type of remixing application, where users controlled the mix by moving through space.

In the remainder of this paper, we explore the use of larger-scale position tracking, such as that of a Global Positioning System (GPS), and the resulting challenges and opportunities that such technology presents. We evolve our framework to support a more distributed and mobile-capable architecture, which results in the need for wireless audio streaming and the distribution of information about the mobile participants. Sections 2 and 3 describe the additional technical elements that need to be introduced into the system to support wireless and mobile applications, while Section 4 demonstrates a prototypical musical application using this new architecture. Musicians in the Mobile Audioscape are able to navigate through an outdoor environment containing a superimposed set of virtual audio elements. Real physical gestures can be used to steer and move sound through the space, providing an easily understood paradigm of interaction in what can now be thought of as a mobile music venue.

1.2 Mobile Music Venues
By freeing users from the confines of computer terminals and interfaces that severely limit mobility, application spaces emerge that can operate in a potentially unbounded physical space. These offer many novel possibilities that can lead to new artistic approaches, or they can re-contextualize existing concepts that can then be revisited and expanded upon. An excellent example is parade music, where sound emission is spatially dynamic or mobile; a passive listener remains in one place while different music is coming and going. One hundred years ago, Charles Ives integrated this concept into symphonic works, where different musical material flowed through the score, extending our notions of counterpoint to include those based on proximity of musical material. The example of parade music listening expands to include two other cases: a mobile listener can walk with or against the parade, yielding additional relationships to the music. Our work also integrates the concept of active listening; material may be organized topographically in space, produced by mobile performers and encountered non-linearly by mobile listeners. From this approach come several rich musical forms which, like sculpture, integrate point of view; listeners/observers create their own unique rendering. Thus, artists may create works that explore the spatial dynamics of musical experience, where flowing music content is put in counterpoint by navigation. Musical scores begin to resemble maps, and listeners play a larger role in authoring their experiences.

1.3 Related Work
With respect to collaborative musical interfaces, Blaine and Fels provide an overview of many systems, classifying them according to attributes such as scale, type of media, amount of directed interaction, learning curve, and level of physicality, among others [7]. However, most of these systems rely on users being in a relatively fixed location in front of a computer. The move to augmented- or mixed-reality spaces seems like a natural evolution, offering users a greater level of immersion in the collaboration, while their respective locations can be used for additional control.

In terms of locative media, some projects have considered the task of tagging geographical locations with sound. The [murmur] project [2] is one simple example, where users tag interesting locations with phone numbers. Others can call the numbers using their mobile phones and listen to audio recordings related to the locations. Similarly, the Hear&There project [20] allows recording audio at a given GPS coordinate, while providing a spatial rendering of other recordings as users walk around. Unfortunately, this is limited to a single-person experience, where the state of the augmented reality scene is only available on one computer. Tanaka proposed an ad-hoc (peer-to-peer) wireless networking strategy to allow multiple musicians to share sound simultaneously using hand-held computers [22]. Later work by Tanaka and Gemeinboeck [23] capitalized on location-based services available on 3G cellular networks to acquire coarse locations of mobile devices. They proposed the creation of locative media instruments, where geographic localization is used as a musical interface.

Large areas can also be used for musical interaction in other ways. Sonic City [16] proposed using mobility, rather than location alone, for interaction. As a user walks around a city, urban sounds are processed in real time as a result of readings from devices such as accelerometers, light sensors, temperature sensors, and metal detectors. Similarly, the Sound Mapping [19] project included gyroscopes along with GPS sensors in a suitcase that users could push around a small area. Both position changes and subtle movements could be used to manipulate the sound that was transmitted between multiple cases in the area via radio signal.

Orientation or heading can also provide useful feedback,
since spatial sound conveys a great deal of information about the directions of objects and the acoustics of an environment. Projects including GpsTunes [21] and Melodious Walkabout [15] use this type of information to provide audio cues that guide individuals in specific directions.

We take inspiration from the projects mentioned above and incorporate many of these ideas into our work. However, real-time high-fidelity audio support for multiple individuals has not been well addressed. Tanaka's work [22], as well as some of our past experiences [8], demonstrates how we can deal with the latencies associated with distributed audio performance, but minimizing latency remains a major focus of our work. The ability to create virtual audio scenes will be supported with some additions to our existing Audioscape engine. To address the need for distributed mobile interaction, we are adding large-scale location sensing and the ability to distribute state, signals, and computation among mobile clients effectively. These challenges are addressed in the following sections.

2. LOCATIVE TECHNOLOGY
In order to support interaction in large-scale spaces, we require methods of tracking users and communicating between them. A variety of mobile devices are available for this purpose, potentially equipped with powerful processors, wireless transmission, and sensing technologies. For our initial prototypes, we chose to develop on Gumstix (verdex XM4-bt) processors with expansion boards for audio I/O, GPS, storage, and WiFi communication [17]. These devices have the benefit of being full-function miniature computers (FFMC) with a large development community, and as a result, most libraries and drivers can be supported easily.

2.1 Wireless Standards
Given that the most generally available wireless technologies on mobile devices are Bluetooth and WiFi, we consider the benefits and drawbacks of each of these standards. For transmission of data between sensors located on the body and the main processing device, Bluetooth is a viable solution. However, even with Bluetooth 2.0, a practical transfer rate is typically limited to approximately 2.1 Mbps. If we want to send or receive audio (16-bit samples at 44 kHz), approximately 700 kbps of bandwidth is needed for each stream. In theory, this allows for interaction between up to three individuals, where each user sends one stream and receives two. Given the need to support a greater number

tions that transmit error corrections over radio frequencies. The idea is that mobile GPS units in the area will have similar positional drift, and correcting this can yield accuracies of under 1 m. Another technique, known as assisted GPS (AGPS), takes advantage of common wireless networks (cellular, Bluetooth, WiFi) in urban environments to access reference stations with a clear view of the sky (e.g., on the roofs of buildings). Although accuracy is still on the order of 15 m, the interesting benefit of this system is that localization can be attained indoors (with an accuracy of approximately 50 m) [6].

2.3 Orientation & Localization
While GPS devices provide location information, it is also important to capture a listener's head orientation so that spatial cues can be provided, making the resulting sound appear to propagate from a particular direction. Most automotive GPS receivers report heading information by tracking the vehicle trajectory over time. This is a viable strategy for inferring the orientation of a vehicle, but a listener's head can change orientation independently of body motion. Moreover, the types of applications we are targeting will likely involve periods of time where a user does not change position, but stays in one place and orients his or her body in various directions. Therefore, additional orientation sensing seems to be a requirement.

In human psychoacoustic perception, accuracy and responsiveness of orientation information are important, since a listener's ability to localize sound is highly dependent on changes in phase, amplitude, and spectral content with respect to head motion. Responsiveness, in particular, is a significant challenge, considering the wireless nature of the system. Listeners will be moving their heads continuously to help localize sounds, and a delay of more than 70 ms in spatial cues can hinder this process [10]. Furthermore, it has been demonstrated that head-tracker latency is most noticeable in augmented reality applications, as a listener can compare virtual sounds to reference sounds in the real environment. In these cases, latencies as low as 25 ms can be detected, and they begin to impair performance in localization tasks at slightly greater values [11]. It is therefore suggested that latency be maintained below 30 ms.

To track head orientation, we attach an inertial measurement unit (IMU) to the headphones of each participant, capable of sensing instantaneous 3-D orientation with an error of less than 1 degree. It should be mentioned that not all applications will require this degree of precision, and
of participants, we are forced to use WiFi.2 Furthermore, some deployments could potentially make use of trajectory-
the range of Bluetooth is limiting, whereas WiFi can relay based orientation information. For instance, the Melodious
signals through access points. Furthermore, we can make Walkabout [15] uses aggregated GPS data to determine the
use of higher-level protocols such as Optimized Link State direction of travel, and provides auditory cues to guide in-
Routing protocol (OLSR) [18], which computes optimal for- dividuals in specific directions. Users hear music to their
warding paths for ad-hoc nodes. This is a viable way to left if they are meant to take a left turn, whereas a low-pass
reconfigure wireless networks if individuals are moving. filtered version of their audio is heard if they are traveling
in the wrong direction. We can conceive of other types of
2.2 GPS applications, where instantaneous head orientation is not
GPS has seen widespread integration into a variety of needed, and users could adjust to the paradigm of hear-
commodity hardware such as cell phones and PDAs. These ing audio spatialization according to trajectory rather than
provide position tracking in outdoor environments, typically line of sight. Of particular interest, are high-velocity appli-
associated with the 3-D geospatial coordinates of users. cations such as skiing or cycling, where users are generally
However, accuracy in consumer-grade devices is quite poor, looking forward, in the direction of travel. Such constraints
ranging between approximately 5m in the best case (high- can help with predictions of possible orientations, while the
quality receiver with open skies) [25] to 100 metres or more faster speed helps to overcome the coarse resolution of cur-
[6]. Several methods exist to reduce error, for example, rent GPS technology.
differential GPS (DGPS) uses carefully calibrated base sta-
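Trajectory-based orientation of the kind described above can be sketched in a few lines. The Python example below is a minimal illustration under stated assumptions, not the method used by the Melodious Walkabout or by Audioscape: it assumes the direction of travel is taken as the great-circle initial bearing from the oldest to the newest fix in a short window of GPS positions, and then expresses the bearing of a virtual sound source relative to that heading, so the source could be rendered to the listener's left or right. The function names `initial_bearing`, `heading_from_track` and `relative_azimuth` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Fix = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def initial_bearing(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle initial bearing from fix 1 to fix 2, in degrees
    clockwise from true north, normalized to [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    return math.degrees(math.atan2(x, y)) % 360.0


def heading_from_track(fixes: Sequence[Fix]) -> float:
    """Hypothetical helper: estimate the direction of travel from a window of
    fixes using the bearing from the oldest to the newest fix. A longer
    baseline makes the per-fix position error small relative to the distance
    travelled (assumption: the user moves roughly in a straight line)."""
    if len(fixes) < 2:
        raise ValueError("need at least two fixes to infer a heading")
    (lat1, lon1), (lat2, lon2) = fixes[0], fixes[-1]
    return initial_bearing(lat1, lon1, lat2, lon2)


def relative_azimuth(heading_deg: float, source_bearing_deg: float) -> float:
    """Bearing of a source relative to the direction of travel, in (-180, 180]:
    negative values lie to the listener's left, positive to the right."""
    diff = (source_bearing_deg - heading_deg) % 360.0
    return diff - 360.0 if diff > 180.0 else diff


if __name__ == "__main__":
    # Three fixes at the same longitude with increasing latitude: moving due north.
    track = [(51.1760, -115.5700), (51.1765, -115.5700), (51.1770, -115.5700)]
    heading = heading_from_track(track)
    # A source lying due west (bearing 270) would then be rendered to the left.
    print(heading, relative_azimuth(heading, 270.0))
```

The design choice here is deliberately coarse: it trades the responsiveness of an IMU for zero extra hardware, which only makes sense under the forward-facing, high-velocity conditions discussed above.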
² We note viable alternatives on the horizon, such as the newly announced SOUNDabout Lossless codec, which allows even smaller audio streams to be sent over Bluetooth.

3. WIRELESS AUDIO STREAMING
The move to mobile technology presents significant design challenges in the domain of audio transmission, largely
related to scalability and the effects of latency on user experience. More precisely, a certain level of quality needs to be maintained to ensure that mobile performers and audience members experience audio fidelity that is comparable to traditional venues. The design of effective solutions should take into account that WiFi networks provide variable performance based on the environment, and that small and lightweight mobile devices are, at present, limited in terms of computation capabilities.

3.1 Scalability
Reliance on unicast communication between users in a group leads to a potential n² growth in audio interactions between them (n users exchanging audio pairwise require n(n−1) unicast streams) and, in turn, to bandwidth explosion. We have investigated a number of solutions to deal with this problem.

Multicast technology, for instance, allows devices to send UDP packets to an IP multicast address that virtualizes a group of receivers. Interested clients are able to subscribe to the streams of relevance, drastically reducing the overall required bandwidth. However, IP multicast over IEEE 802.11 wireless LAN is known to exhibit unacceptable performance [14] due to unsupported collision avoidance and acknowledgement at the MAC layer. Our benchmark tests confirm that multicast transmission experienced higher jitter than unicast, mandating a larger receiver buffer to maintain quality. Furthermore, packet loss for the multicast tests was on the order of 10-15%, resulting in a distorted audio stream, while unicast had almost negligible losses of 0.3%. Based on these results, we decided to rely for now on a point-to-point streaming methodology while experimenting with emerging non-standard multicast protocols, in anticipation of future improvements.

3.2 Low Latency Streaming
Mobile applications tend to rely on compression algorithms to respect bandwidth constraints. As a result, they often incur signal delays that challenge musical interaction and performer synchronization. Acceptable latency tolerance depends on the style of music, with figures as low as 10ms [12] for fast pieces. More typically, musicians have difficulty synchronizing with latencies above 50ms [13]. Most audio codecs require more than this amount of encoding time.³ Due in part to the limited computational resources available on our mobile devices, we instead transmit uncompressed audio, thus fully avoiding codec delays in the system.

³ Possible exceptions are the Fraunhofer Ultra-Low Delay Codec (offering a 6ms algorithmic delay) [24] and the SOUNDabout Lossless codec (claiming under 10ms).

Other sources of latency include packetization delay, corresponding to the time required to fill a packet with data samples for transmission, and network delay, which varies according to network load and results in jitter at the receiver. Soundcard latencies also play a role, but we consider these to be outside of our control. The most effective method for managing these remaining delays may be to minimize the size of transmitted packets. By sending a smaller number of audio samples in each network packet, we also decrease the amount of time that we must wait for those samples to arrive from the soundcard.

In this context, we have developed a dynamically reconfigurable transmission protocol for low-latency, high-fidelity audio streaming. Our protocol, nstream, supports dynamic adjustment of sender throughput and receiver buffer size. This is accomplished by switching between different levels of PCM quantization (8, 16 and 32 bit), packet size, and receiver buffer size. The protocol is developed as an external for Pure Data [3], and can be deployed on both a central server and a mobile device.

In benchmark tests, we have successfully transmitted uncompressed streams with an outgoing packet size of 64 samples. The receiver buffer holds two packets in the queue before decoding, meaning that a delay of three packets is encountered before the result can be heard. With a sampling rate of 44.1kHz, this translates to a packetization and receiving latency of 3 × (64/44.1) ≈ 4.35ms. In addition, the network delay can be as low as 2ms, provided that the users are relatively close to each other, and typically does not exceed 10ms for standard wireless applications. The sum of these latencies is on the order of 7-15ms.

Practical performance will, of course, depend on the wireless network being used and the number of streams transmitted. Our experiments show that high packet rates result in network instability and high jitter. In such situations it is necessary to increase packet size to help maintain an acceptable packet rate. This motivates us, as future work, to investigate algorithms for autonomous adaptation of low-latency protocols that deal with both quality and scalability.

4. MOBILE AUDIOSCAPE
Our initial prototyping devices, Gumstix, were chosen to provide: 1) wireless networking for bidirectional high-quality, low-latency audio and data streams, 2) local audio processing, 3) on-board device hosting for navigation and other types of USB or Bluetooth sensors, 4) minimal size/weight, and 5) Linux support. A more detailed explanation of our hardware infrastructure, in particular the method of Bluetooth communication between Gumstix and sensors, can be found in another publication [9].

To develop on these devices, a cross-compilation toolchain was needed that could produce binaries for the ARM-based 400MHz Gumstix processors (Marvell’s PXA270). The first library that we needed to build was a version of Pure Data (Pd), which is used extensively for audio processing and control signal management by our Audioscape engine. In particular, we used Pure Data anywhere (PDa), a special fixed-point version of Pd for use with the processors typically found on mobile devices [5]. Several externals needed to be built for PDa, including a customized version of the Open Sound Control (OSC) objects, where multicast support was added, and the nstream object, mentioned in Section 3.2. The latter was also specially designed to support both regular Pd and PDa, using sample conversion for interoperability between an Apple laptop, PC and Gumstix units.

We also supplied each user with an HP iPAQ, loaded with a customized application that could graphically represent their location on a map. This program was authored with HP Mediascape software [1], which supports the playback of audio, video, and even Flash based on user position. The most useful aspect of this software was the fact that we could use Flash XML Sockets to receive GPS locations of other participants and update the display accordingly. Although we used the Compact Flash GPS receivers with the iPAQs for sending GPS data, the interface between Mediascape software and the Flash program running within it only allowed for updates at 2Hz, corresponding to a latency of at least 500ms before position-based audio changes were heard. The use of the GPSstix receiver, directly attached to the Gumstix processor, is highly recommended to anyone attempting to reproduce this work.

The resulting architecture is illustrated in Figure 2. Input audio streams are sent as mono signals to an Audioscape
server on a nearby laptop. The server also receives all control data via OSC from the iPAQ devices and stores location information for each user. A spatialized rendering is computed, and stereo audio signals are sent back to the users. For all streams, we send audio with a sampling rate of 44.1kHz and 16-bit samples.

In terms of network topology, wireless ad-hoc connections are used, allowing users to venture far away from buildings with access points (provided that the laptop server is moved as well). Due to the number of streams being transmitted, audio is sent with 256 samples per packet, which ensures an acceptable packet rate and reduces jitter on the network. The result is a latency of 3 × (256/44.1) ≈ 17.4ms for packetization and a minimal network delay of about 2ms. However, since audio is sent to a central server for processing before being heard, these delays are actually encountered twice, for a total latency of approximately 40ms (2 × (17.4 + 2) ≈ 39ms). This is well within the acceptable limit for typical musical performance, and was not noticed by users of the system.

The artistic application we designed allows users to navigate through an overlaid virtual audio scene. Various sound loops exist at fixed locations, where users may congregate and jam with accompanying material. Several virtual volumetric regions are also located in the environment, allowing some users to escape within a sonically isolated area of the scene. Furthermore, each of these enclosed regions serves as a resonator, providing musical audio processing (e.g., delay, harmonization or reverb) to signals played within. As soon as players enter such a space, their sounds are modified, and a new musical experience is encountered. Figure 3 shows two such performers, who have chosen to jam in a harmonized echo chamber. They are equipped with Gumstix and iPAQs, with both unobtrusively in their pockets.

Figure 2: Mobile Audioscape architecture. Solid lines indicate audio streaming while dotted lines show transmission of control signals.

Figure 3: Two participants jamming in a virtual echo chamber, which has been arbitrarily placed on the balcony of a building at the Banff Centre.

5. DISCUSSION
Approaching mobile music applications from the perspective of virtual overlaid environments allows novel paradigms of artistic practice to be realized. The virtualization of performer and audience movement allows for interaction with sound and audio processing in a spatial fashion that leads to new types of interfaces and thus, new musical experiences.

We have presented the challenges associated with supporting multiple participants in such a system, including the need for accurate sensing technologies and network architectures that can support low latency communication in a scalable fashion. The prototype application that we developed was well-received by those who experimented with it, but many improvements still need to be made. The coarseness of resolution available in consumer-grade GPS technology is such that an application must span a wide area for it to be of any value. This is problematic, since the range of a WiFi network is much smaller, mandating redirection of signals through additional access points or OLSR peers. If all signals must first travel to a server for processing, then distant nodes will suffer from very large latency.

One solution is to distribute the state of the virtual scene to all client machines, and perform rendering locally on the mobile devices. For the prototype application that we developed, this would cut latency in half since audio signals would only need to travel from one device to another, without the need to return from a central processing server. Furthermore, this strategy would allow users to be completely free in terms of mobility, rather than needing to stay within range of the server for basic functionality. However, for scenes of any moderate complexity, this demands much more processing power and memory than is currently available in consumer devices, and of course, the number of users will still be limited by the available network bandwidth required for peer-to-peer streaming.

A full investigation into distributing audio streams, state and computational load will be presented in future work, but for the moment we have provided a first step into the exploration of large-scale mobile audio environments. The multi-user nature of the system coupled with high-fidelity audio distribution provides a new domain for musical practice. We have already designed outdoor spaces for sonic investigation, and hope to perform and create novel musical interfaces in this new mobile context.

6. ACKNOWLEDGEMENTS
The authors wish to acknowledge the generous support of NSERC and Canada Council for the Arts, which have funded the research and artistic development described in this paper through their New Media Initiative. The prototype application described in Section 4 was produced in co-production with The Banff New Media Institute (Alberta, Canada). The authors would like to thank the participants
of the locative media residency for facilitating the work and in particular, Duncan Speakman, who provided valuable assistance with the HP Mediascape software.

7. REFERENCES
[1] HP Mediascape website. www.mscapers.com.
[2] The [murmur] project. murmurtoronto.ca.
[3] Pure Data. www.puredata.info.
[4] Webpage: 4Dmix3. audioscape.org/twiki/bin/view/Audioscape/SAT4Dmix3.
[5] PDa: Real time signal processing and sound generation on handheld devices. In International Computer Music Conference (ICMC), 2003.
[6] R. Bajaj, S. L. Ranaweera, and D. P. Agrawal. GPS: Location-tracking technology. Computer, 35(4):92–94, 2002.
[7] T. Blaine and S. Fels. Contexts of collaborative musical experiences. In Proceedings of the Conference on New Interfaces for Musical Expression (NIME), pages 129–134, Montreal, 2003.
[8] N. Bouillot. nJam user experiments: Enabling remote musical interaction from milliseconds to seconds. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), pages 142–147, New York, NY, USA, 2007. ACM.
[9] N. Bouillot, M. Wozniewski, Z. Settel, and J. R. Cooperstock. A mobile wireless platform for augmented instruments. In International Conference on New Interfaces for Musical Expression, Genova, Italy, 2008.
[10] D. Brungart, B. Simpson, R. McKinley, A. Kordik, R. Dallman, and D. Ovenshire. The interaction between head-tracker latency, source duration, and response time in the localization of virtual sounds. In Proceedings of the International Conference on Auditory Display (ICAD), 2004.
[11] D. S. Brungart and A. J. Kordik. The detectability of headtracker latency in virtual audio displays. In Proceedings of the International Conference on Auditory Display (ICAD), pages 37–42, 2005.
[12] E. Chew, A. A. Sawchuk, R. Zimmerman, V. Stoyanova, I. Tosheff, C. Kyriakakis, C. Papadopoulos, A. R. J. François, and A. Volk. Distributed immersive performance. In Proceedings of the Annual National Association of the Schools of Music (NASM), San Diego, CA, 2004.
[13] E. Chew, R. Zimmermann, A. A. Sawchuk, C. Papadopoulos, C. Kyriakakis, C. Tanoue, D. Desai, M. Pawar, R. Sinha, and W. Meyer. A second report on the user experiments in the distributed immersive performance project. In Proceedings of the 5th Open Workshop of MUSICNETWORK: Integration of Music in Multimedia Applications, 2005.
[14] D. Dujovne and T. Turletti. Multicast in 802.11 WLANs: an experimental study. In MSWiM ’06: Proceedings of the 9th ACM International Symposium on Modeling Analysis and Simulation of Wireless and Mobile Systems, pages 130–138, New York, NY, USA, 2006. ACM.
[15] R. Etter. Implicit navigation with contextualized personal audio contents. In Adjunct Proceedings of the Third International Conference on Pervasive Computing, pages 43–49, 2005.
[16] L. Gaye, R. Mazé, and L. E. Holmquist. Sonic city: the urban environment as a musical interface. In Proceedings of the Conference on New Interfaces for Musical Expression (NIME), pages 109–115, Singapore, 2003.
[17] Gumstix. www.gumstix.com.
[18] P. Hipercom. RFC 3626, Optimized Link State Routing protocol (OLSR), 2003.
[19] I. Mott and J. Sosnin. Sound mapping: an assertion of place. In Proceedings of Interface, 1997.
[20] J. Rozier, K. Karahalios, and J. Donath. Hear & There: An augmented reality system of linked audio. In Proceedings of the International Conference on Auditory Display (ICAD), 2000.
[21] S. Strachan, P. Eslambolchilar, R. Murray-Smith, S. Hughes, and S. O’Modhrain. GpsTunes: Controlling navigation via audio feedback. In International Conference on Human Computer Interaction with Mobile Devices & Services (MobileHCI), pages 275–278, New York, 2005. ACM.
[22] A. Tanaka. Mobile music making. In Proceedings of New Interfaces for Musical Expression (NIME), 2004.
[23] A. Tanaka and P. Gemeinboeck. A framework for spatial interaction in locative media. In Proceedings of New Interfaces for Musical Expression (NIME), pages 26–30, Paris, France, 2006. IRCAM.
[24] S. Wabnik, G. Schuller, J. Hirschfeld, and U. Krämer. Reduced bit rate ultra low delay audio coding. In Proceedings of the 120th AES Convention, May 2006.
[25] M. Wing, A. Eklund, and L. Kellogg. Consumer-grade global positioning system (GPS) accuracy and reliability. Journal of Forestry, 103(4):169–173, 2005.
[26] M. Wozniewski. A framework for interactive three-dimensional sound and spatial audio processing in a virtual environment. Master’s thesis, McGill University, 2006.
[27] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A framework for immersive spatial audio performance. In New Interfaces for Musical Expression (NIME), pages 144–149, Paris, 2006.
[28] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A paradigm for physical interaction with sound in 3-D audio space. In Proceedings of the International Computer Music Conference (ICMC), 2006.
[29] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A spatial interface for audio and music production. In Digital Audio Effects (DAFx), 2006.
[30] M. Wozniewski, Z. Settel, and J. R. Cooperstock. Audioscape: A pure data library for management of virtual environments and spatial audio. In Pure Data Convention, Montreal, 2007.
[31] M. Wozniewski, Z. Settel, and J. R. Cooperstock. User-specific audio rendering and steerable sound for distributed virtual environments. In International Conference on Auditory Displays (ICAD), 2007.
Open Sound Control: Constraints and Limitations

Angelo Fraietta
Smart Controller Pty Ltd
PO Box 859
Hamilton 2303, Australia
61-2-90431806
info@smartcontroller.com.au
ABSTRACT
Open Sound Control (OSC) is being used successfully as a messaging protocol among many computers, gestural controllers and multimedia systems. Although OSC has addressed some of the shortcomings of MIDI, OSC cannot deliver on its promises as a real-time communication protocol for constrained embedded systems. This paper will examine some of the advantages but also dispel some of the myths concerning OSC. The paper will also describe how some of the best features of OSC can be used to develop a lightweight protocol that is microcontroller friendly.

Keywords
MIDI, Open Sound Control, Data Transmission Protocols, Gestural Controllers.

1. INTRODUCTION
Open Sound Control (OSC) has been implemented as a communications protocol in more than a few hardware and software projects. The general impression appears to be that “MIDI is a simple and cheap way to communicate between a controller and computer, but it is limited in terms of bandwidth and precision and on the way out, OpenSound Control [sic] being a better alternative” [1]. In some cases, developers felt that they had to implement OSC in new instruments to maintain any sort of credibility in the NIME community [4]. It appears that the general consensus in computer music communities is that OSC is computer music’s new ‘royal robe’ to replace the outdated, slow, ‘tattered and torn’ MIDI and its “well-documented flaws” [18]. This perception may be due to the lack of papers critical of OSC.

OSC has provided some very useful and powerful features that were not previously available in MIDI, including an intuitive addressing scheme, the ability to schedule future events, and variable data types. Although more and more composers are developing and composing for low power, small footprint, and wireless instruments and human interfaces [3, 13, 14], a move toward OSC in these applications is not always possible, nor desirable. Although OSC has addressed some of the limitations of MIDI, OSC does not provide “everything needed for real-time control of sound” [17] and is unsuitable as an end-to-end protocol for most constrained embedded systems.

This paper will first describe some of the powerful features provided by OSC before dispelling some of the myths regarding OSC. Finally, some strategies will be proposed that could be used to develop a protocol to meet the needs of constrained systems.

2. OSC FEATURES
2.1 OSC Addressing Scheme
The OSC address scheme provides three main features: the ability to give the mapped address an intuitive name, the ability to increase the maximum number of namespaces, and the ability to define a range of addresses within a single message.

2.1.1 Intuitive Names
OSC is similar to MIDI in that it defines mapped points and values to be assigned to those points. For example, if a gestural controller had the left finger position mapped to ‘MIDI controller 12 on Channel 1’, a value of ‘127’ would be sent as the bytes ‘0xB0 0x0C 0x7F’¹. The point being mapped is defined by the first two bytes, while the value of the point is defined by the last byte. In OSC, setting a point with a value could be done with the following message: ‘/minicv/forefinger 127’, the address being ‘/minicv/forefinger’.

The ability to provide an intuitive name to a parameter is a function of composition rather than a function of performance. It is much easier for a composer to map a musical event to a meaningful name, such as ‘/lefthand’, than it is to map it to some esoteric set of numbers such as ‘0xB0 0x0C’.

2.1.2 Increased Namespace
The addressing feature of OSC enables users to increase the possible number of mapped points. In MIDI, for example, after continuous controllers 0 to 127 on channels 1 to 16 have all been assigned, the namespace for continuous controllers has been exhausted. In OSC, however, if two performers required the namespace ‘lefthand’, the address space could be expanded “through a hierarchical namespace similar to URL notation” [18]. For example, the two different performers could use ‘/performer1/lefthand’ and ‘/performer2/lefthand’. Each OSC client will receive these messages, and due to the packet paradigm of OSC [18], the client that does not require the message will discard it.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies
Copyright remains with the author(s).

¹ 0x in front of the number signifies it is a hexadecimal value.
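The byte-level difference between the two messages in Section 2.1.1 can be made concrete. The Python sketch below is an illustration under stated assumptions, not part of any OSC reference implementation: it encodes the MIDI Control Change from the example and an OSC message carrying a single 32-bit integer argument, following the OSC 1.0 encoding rules (NUL-terminated strings padded to a multiple of four bytes, a type tag string beginning with ‘,’, big-endian int32 arguments). The function names are hypothetical, and the byte counts exclude any transport framing such as UDP/IP headers or SLIP.

```python
import struct


def midi_control_change(channel: int, controller: int, value: int) -> bytes:
    """Three-byte MIDI Control Change message; channel is 1..16 as in the text."""
    if not 1 <= channel <= 16:
        raise ValueError("MIDI channel must be 1..16")
    status = 0xB0 | (channel - 1)  # 0xBn = Control Change on channel n + 1
    return bytes([status, controller & 0x7F, value & 0x7F])


def osc_string(s: str) -> bytes:
    """OSC-string: ASCII bytes, a terminating NUL, then NUL padding so the
    total length is a multiple of four bytes."""
    raw = s.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def osc_int_message(address: str, value: int) -> bytes:
    """OSC message with one int32 argument: address, type tags ",i", value."""
    return osc_string(address) + osc_string(",i") + struct.pack(">i", value)


if __name__ == "__main__":
    midi = midi_control_change(1, 12, 127)
    osc = osc_int_message("/minicv/forefinger", 127)
    print(midi.hex(), len(midi))  # b00c7f 3
    print(len(osc))               # 28 (20-byte address + 4-byte tags + 4-byte int)
```

Under these assumptions the OSC message occupies 28 bytes against 3 for MIDI, which is the “over twenty bytes” comparison the paper returns to in Section 3.2.1.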
2.1.3 Address Ranges
The namespace feature of OSC is extremely powerful in that it enables a significantly large number of namespaces and the ability to define a range of points in a single message. For example, the OSC message ‘/minicv/left* 127’ would set the value ‘127’ on every point whose address begins with ‘/minicv/left’, from ‘/minicv/leftThumb’ right through to ‘/minicv/leftPinkie’.

2.2 OSC Data Types
One of the brilliant features of OSC is the ability to define different data types that can be transmitted in a message. Although it is possible to send any data type of any resolution using MIDI system exclusive messages, OSC has provided a standard for software and hardware developers from different vendors.

2.3 Time Tags
OSC contains a feature where future events can be sent in advance, allowing “receiving synthesizers to eliminate jitter introduced during packet transport” [18] and providing “sub-nanosecond accuracy over a range of over 100 years” [19].

3. MYTHS
When one considers that OSC has been used in some very impressive installations and performances, such as “multichannel audio and video streams and robot control sequences at Disneyland” [OSC Newsgroup], it is not too difficult to understand why one may be reluctant to write a critical paper when OSC is gaining a ‘legendary’ reputation. If one is to consider using OSC on a constrained system, one should separate fact from fable, using maths to dispel the myths. The two myths this paper will dispel are that OSC is fast and that OSC is efficient.

3.1 OSC is Fast
There is a belief in the NIME community that OSC is a fast communications protocol; for example, “The choice for OSC … was for its high speed, powerful protocol, and driver/OS-independency” [5]. Statements such as these are normally based on a comparison of the data transmission rates of OSC and MIDI in their typical applications [OSC Newsgroup].² It is, however, misleading to compare the speed of OSC to MIDI based on the data transmission rate because OSC does not have a data transmission rate.

The Open Systems Interconnection (OSI) model [7] defines a communication model where applications communicate through a layered stack, where the transmitted message passes from the highest layer of the stack to its lowest layer on the transmitting end, and from the lowest layer to the highest layer on the receiver. OSC does not define anything below the presentation layer, but rather assumes the transport layer will have a bandwidth of greater than 10 megabits per second [19]. MIDI, however, can be defined using the OSI model [7], from its Application Layer defining the message type right down to the Physical Layer that defines the connector type and current loop [4]. The speed comparison between OSC and MIDI is always made at the 31.25 kilobits per second Data Layer in MIDI³, and so Wright and Freed state that MIDI is “roughly 300 times slower” [18] than OSC. Speed, by its definition, is a function of time, and the time taken to deliver a message depends on the transport that carries it; in the same way, weight is not just a function of mass, but also a function of gravity. Comparing the speed of MIDI with that of OSC is akin to comparing the weight of a 2 kg ball on earth with a 600 kg ball in outer space where the gravity is zero. A more accurate speed comparison between OSC and MIDI would be made by comparing the two protocols at identical layers on the OSI stack, comparing the time taken for the target data to be encoded and then decoded on identical layers on the stack using identical processors. If one were to compare the number of machine instructions required to parse a typical MIDI message with the number required for a typical OSC message, MIDI would win hands down.

² Personal communications on the Developer's list for the OpenSound Control [sic] (OSC) Protocol, osc_dev@create.ucsb.edu, will be referred to as OSC Newsgroup.
³ The MIDI Manufacturers Association (MMA) has approved a standard for MIDI over IEEE-1394 (FireWire) [8], and it is already being used by instrument manufacturers including Yamaha, Presonus, Roland, and M-Audio [Personal communications]. Furthermore, other implementations of MIDI over other protocols exist, so describing MIDI as limited to 31.25 kilobits per second is no longer technically correct.

3.2 OSC is Efficient
“Open SoundControl [sic] is … efficient… and readily implementable on constrained, embedded systems.” [18]. Efficiency is generally the ability to accomplish a particular task with the minimum amount of wastage or expenditure. In the context of a gestural controller, it would be the ability to provide the same or similar functionality with the minimum amount of processor speed, memory, power, and bandwidth. Efficiency is a relative term: what is deemed efficient today may be deemed inefficient tomorrow when newer technologies or algorithms are developed. In order to evaluate whether OSC is efficient, one does not necessarily need to compare it in its entirety to a pre-existing system, but rather to demonstrate how the resources are being wasted.

In a real-time system, such as a music performance, the ability to meet timing constraints is of primary importance [15]. The system “must respond to external events in a timely fashion, which means that for all practical purposes, a late computation is just as bad as an outright wrong computation” [8]. Many newer mobile musical interfaces are communicating wirelessly; for example, the “Pocket Gamelan”, which uses mobile telephones that communicate amongst themselves [14]. Although the speed of processors in wireless devices is increasing, this “increase in the processor speed is accompanied by increased power consumption” [13]. An increase in power consumption means a decrease in the period that a battery-powered controller can be used in a performance. Furthermore, power usage contributes to the carbon footprint of the instrument. Efficiency, therefore, is also an environmental issue.

The developers of OSC state “our design is not preoccupied with squeezing musical information into the minimum number of bytes. We encode numeric data in 32-bit or 64-bit quantities, provide symbolic addressing, time-tag messages, and in general are much more liberal about using bandwidth for important features” [18]. A major aspect of this “liberal use of bandwidth”
is the address, which defines the mapped point being referenced. As stated previously in this paper, the advantages given by the addressing scheme are the intuitive names, the increased namespace, and the ability to address a range of points; these will be discussed later in the paper.

3.2.1 Communications Bandwidth
An example was given previously of a mapped point addressed in two ways: one using MIDI, ‘0xB0 0x0C 0x7F’; the other using OSC, ‘/minicv/forefinger 127’. The first example uses only three bytes, while the second uses over twenty bytes. A significant problem with the increased message sizes for wireless systems is that “the more data that is transmitted the greater the chance that part of the message will need to be retransmitted due to noise - increasing latency and jitter” [OSC Newsgroup⁴]. Some developers of embedded and wireless instruments that have been using OSC have resorted to developing pseudo device drivers, whereby OSC is converted to a lightweight protocol before being transmitted [3], reporting a five hundred percent increase in throughput and efficiency [OSC Newsgroup⁵]. Although this is an efficient alternative to transmitting the whole OSC packet over the serial port, it effectively means that OSC is not the complete end-to-end, server-to-client protocol.

3.2.2 Processing Bandwidth
Although the transmission rate is taken into consideration (hence Wright and Freed’s assumption “that Open SoundControl [sic] will be transmitted on a system with a bandwidth in the 10+ megabit/sec range” [18]), many seem to forget that after transmission and reception, the packet also needs to be parsed by the target synthesiser. Furthermore, it is not just the target synthesiser that needs to parse the data: all synthesisers that are not the intended recipients are required to stop what they are doing and parse a significant number of bytes before rejecting the message. This in turn affects the minimum processing requirements of each and every component in the entire system. Although many microcontrollers are being developed with higher processing speeds, the “increase in the processor speed is ... accompanied by increased power consumption” [13].

3.2.3 Processing Efficiency
Although the string-based OSC namespace is more efficient for a human to evaluate, a numerical value is much more efficient for the computer, because computers are arithmetic devices. Apart from the number of bytes that need to be parsed, the OSC implementation requires that the namespace be parsed through some sort of string library, requiring additional computation and the memory space to contain the library. In a performance where a mapped point is changed one hundred times a second, the human would not be expected to read that value for every message sent; the computer, however, is. Hence, the message is optimised for the entity that requires it least during performance. The problem with the OSC addressing model is that the coupling between the human cognition of the namespace and the transmission mechanism to the target computer is too tight [6]: the naming, which is effectively the human interface, should be abstracted away from the implementation using a mapping strategy. Two strategies that use this type of mapping are the Internet Name Server [10] for addressing domain names, and the Address Resolution Protocol (ARP) [9] used on local Ethernet networks. Without going into the exact details, a brief explanation of how each mechanism operates is presented, showing how paradigms similar to the OSC address space are efficiently implemented.

⁴ Christopher Graham posting on 23 January 2008.
⁵ ibid.

3.2.3.1 Internet Mapping
The intuitive naming strategy used in OSC is similar to domain names on the internet. When addressing a computer on the internet, one does not normally type in the Internet Protocol (IP) [11] address; rather, one types in the domain name. This makes it very easy for a human to remember how to locate and communicate with a particular computer on the internet. The calling computer, however, does not send a request to every computer on the internet. Instead, the domain name is mapped to an IP address through an Internet Name Server [10]. For example, if one were to ‘ping’ a domain from the command line, the computer would obtain the IP address from the name server and then send ping messages to that IP address. For example:

    $> ping smartcontroller.com.au
    Pinging smartcontroller.com.au [210.79.16.38] with 32 bytes of data:
    Reply from 210.79.16.38: bytes=32 time=16ms TTL=56

This activity is done behind the scenes and is abstracted away from the user. Although obtaining the IP address before sending a message is effectively a two-step procedure, these two steps make it much more efficient than sending the domain name to every web server.

3.2.3.2 Local Network Ethernet Mapping
On local networks, the abstraction is done through the Media Access Control (MAC) address, using ARP [9]. If, for example, a computer whose IP address was ‘192.168.0.2’ on a local network wanted to send a message to the computer addressed ‘192.168.0.4’, it would not send a message to all the computers on the local network, expecting all but ‘192.168.0.4’ to reject it. If this were the case, every time a network card received a message, it would be required to interrupt the computer, impacting on the performance of the rejecting computer. Rather, the ARP layer maps the IP addresses of the computers on the network to MAC addresses. This MAC address is used to address the network card. The other network cards on the local network ignore the message and do not interrupt the computer. This mapping can be viewed on a computer by typing ‘arp -a’ from the command prompt:

    $> arp -a
    Interface: 192.168.0.2 --- 0x2
      Internet Address      Physical Address      Type
      192.168.0.1           00-04-ed-0d-f2-da     dynamic
      192.168.0.4           00-13-ce-f4-63-b6     dynamic

Although these steps are complicated, this is the sort of thing computers are good at, and it makes communication on complex networks very efficient. A similar approach could be used as an underlying layer to OSC. Implementation of such a mechanism for OSC is well beyond the scope of this paper; this does, however, show that such processes are being used by other technologies for improved efficiency and should probably be used in OSC.

3.2.4 Address Pattern Matching
The method of mapping multiple points to a single message, for example, the OSC namespace ‘/minicv/left* 127’, is based
on the UNIX name matching and expansion [18, 19]. Once again we see a tight coupling between the human interface and the computer implementation. The developers of OSC claim that “with modern transport technologies and careful programming, this addressing scheme incurs no significant performance penalty ... in message processing” [18]. Nevertheless, using two numbers to define a range would require significantly less processing than decoding a character string with wildcards. For example, in the address range ‘/minicv/left*’, every character would need to be parsed and tested to see if it was one of the defined wildcard characters. Next, one would have to factor in the string comparison that would be required for every mapped address on the client computer.

Protocols such as MODBUS [http://www.modbus-ida.org/] and DNP [http://www.dnp.org/] are used by telemetry units to control pump stations in real time [2]. These protocols can use a message type that sets a range of mapped points using a single message. When a range is defined using two numbers, it is a simple matter to test whether a mapped point is within the range. For example, if a message from such a protocol defined a range of mapped points by its limits ‘LOWER_RANGE’ and ‘UPPER_RANGE’, the test would be as follows:

    IF MAPPED_POINT <= UPPER_RANGE
       AND MAPPED_POINT >= LOWER_RANGE THEN
        ProcessValue
    ENDIF

As with the intuitive names, this requires an additional layer of mapping and abstraction, which in turn means work for the developer. Software engineering has a similar paradigm in which some languages are scripted and some are compiled. Scripted languages require the server to translate human-readable code each time it is executed, while compiled languages use a tool to convert human-readable code into something that is more efficient for the computer. The first type is more efficient for the programmer because he or she does not need to compile the code after each modification; however, there is a definite performance hit. Compiled languages require an extra step, compiling the human-readable code to machine code; however, there is an enhancement in performance. In terms of communications protocols, OSC is like a scripted language: extremely powerful, but requiring significantly more computing power than what is available to most embedded technologies today.

3.2.5 Message Padding
Another possible inefficiency is the padding of all message parameters to four-byte boundaries. For example, a parameter that is only one byte in length is padded to four bytes. The reasoning behind this is that the OSC data structure is optimised for thirty-two bit architectures [OSC Newsgroup]. There have not yet been any conclusive tests to determine whether the gains obtained from this optimisation exceed the additional overhead created by inserting and later filtering these additional padded bytes [OSC Newsgroup]; however, these results should be forthcoming in the near future. It does, however, mean that there would be a decrease in efficiency for eight, sixteen, and sixty-four bit architectures.

4. FAULT TOLERANCE
OSC is a packet-driven protocol that does not accommodate failure in the underlying OSI layers. UDP [12] is a protocol that is used by many implementations of OSC [19]. UDP does not guarantee that a packet will be received if transmitted; moreover, it does not guarantee that the target will receive packets in the order they were sent. OSC is based on the same paradigm as UDP in that it is packet driven. “This leads to a protocol that is as stateless as possible: rather than assuming that the receiver holds some state from previous communications” [18]. The problem with this paradigm is that it is no longer event driven, and it assumes that all the relevant data is transmitted at once. If a gestural controller sends an OSC message that is supposed to change a robot motor’s direction, immediately followed by a message to start the motor, the OSC receiver may receive those messages in the opposite order, which may be worse than not receiving the information at all. For example, if a server were to send the following messages using UDP:

    /lefthand/motor/direction 1
    /lefthand/motor/start

the client could receive them as follows:

    /lefthand/motor/start
    /lefthand/motor/direction 1

This means that the composer will need to address the possibility of messages arriving in the wrong order without any notice from the protocol. Although one could use TCP “in situations where guaranteed delivery is more important than low latency” [19], lower latency has been one of the OSC evangelists’ greatest catch cries.

5. STRATEGIES FOR IMPROVEMENT
The first strategy for improvement is the intelligent mapping of namespaces to numbers. OSC must move away from the stateless protocol paradigm and begin to embrace techniques such as caching [16], which has been used for many years now to improve the performance of networks, hard drives, and memory access on CPUs. MIDI’s use of running status is an example of how caching can improve performance by nearly thirty-three percent. Caching will be the key to efficient mapping of address patterns to simple numbers without significantly impacting upon performance.

OSC must move toward an event delegation model, where clients register whether to receive OSC messages within a particular namespace. Needlessly receiving and parsing large irrelevant messages from OSC servers is a waste of valuable processing power.

The developers of OSC must change their attitude towards MIDI. OSC has been anti-MIDI for a while, with OSC developers often ridiculing MIDI developers [personal correspondence]. Some OSC developers have made token gestures towards MIDI by providing a namespace, which is “an OSC representation for all of the important MIDI messages” [19]. This completely defeats the innovative address pattern provided by OSC. Instead, an underlying network layer should convert an intuitively mapped name, such as ‘/performer1/lefthand’, to a MIDI message and then transport it via MIDI, or vice versa. The MIDI controller number should be completely abstracted away from the application layer in order to reduce the coupling between the two. The OSC server should not need to know at the application layer that the motor that controls the robot’s left finger is MIDI controller 13. Likewise, the motor that is being controlled by controller 13 should not need to know that the OSC server is really addressing ‘/performer1/lefthand’. Although these sorts of strategies have
been employed in dynamic routing schemes in some OSC projects [19], this should be a function of the network layer, not the application layer. When one considers that the longest domain names on the internet can be addressed with only four bytes, it is not unreasonable to expect that even the most complex OSC namespaces could be translated into simple MIDI messages if required.

There needs to be a greater number of message types; currently there are only two. OSC needs to move towards an object-oriented paradigm in the communications protocol [4]. Currently, all the network, data link, and transport layers of transmission have been delegated to the application layer. This is above the presentation layer, which is where OSC exists; this is completely upside down when compared to the OSI model. OSC needs to develop an underlying OSI stack where the protocol between the client and server is abstracted away from the user. The underlying mapping should direct the message from the source to the destination.

6. CONCLUSION
Although OSC has provided a standard “protocol for communication among computers, sound synthesizers, and other multimedia devices” [19], and was supposed to overcome “MIDI's well-documented flaws … [, its] liberal [use] … of bandwidth” [18] may be its Achilles heel, preventing it from ever being the standard end-to-end protocol for communication for low-power and wireless microcontroller interfaces. If OSC is to have any hope of serving this significant and important area of the NIME community, an OSI stack needs to be developed that has efficiency and performance at the forefront, while at the same time implementing proven design patterns [6]. This, however, would be a significant research project in itself.

7. ACKNOWLEDGMENTS
I would like to thank Adrian Freed from the Center for New Music and Audio Technologies at the University of California, Berkeley for answering the many questions I asked about OSC. I would also like to thank all the members of the Developer's list for the OpenSound Control [sic] (OSC) Protocol, osc_dev@create.ucsb.edu, for their input.

8. REFERENCES
[1] Doornbusch, P. Instruments from now into the future: the disembodied voice. Sounds Australian, 2003(62): p. 18.
[2] Entus, M. Running lift stations via telemetry. Water Engineering & Management, 1989. 136(11): p. 41-43.
[3] Fraietta, A. Mini CV Controller - Conference Poster. In Generate and Test: the Australasian Computer Music Conference. 2005. Queensland University of Technology, Brisbane: Australasian Computer Music Association.
[4] Fraietta, A. The Smart Controller: an integrated electronic instrument for real-time performance using programmable logic control. School of Contemporary Arts, University of Western Sydney, 2006.
[5] Kartadinata, S. The gluion: advantages of an FPGA-based sensor interface. In International Conference on New Interfaces for Musical Expression (NIME). 2006. IRCAM - Centre Pompidou, Paris, France.
[6] Larman, C. Applying UML and patterns: an introduction to object-oriented analysis and design and the unified process. 2nd ed. 2002, Upper Saddle River, NJ: Prentice Hall PTR. xxi, 627.
[7] Lemieux, J. The OSEK/VDX Standard: Operating System and Communication. Embedded Systems Programming, 2000. 13(3): p. 90-108.
[8] Pawlicki, J. Formalization of embedded system development: history and present. In Quality Congress. ASQ's ... Annual Quality Congress Proceedings. 2003: PROQUEST Online.
[9] Plummer, D.C. RFC 826 - Ethernet Address Resolution Protocol: Or converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware. <http://www.faqs.org/rfcs/rfc826.html> accessed 28 January 2008.
[10] Postel, J. IEN-89 - Internet Name Server. <ftp://ftp.rfc-editor.org/in-notes/ien/ien89.txt> accessed 28 January 2008.
[11] Postel, J. RFC 760 - DoD standard Internet Protocol. <http://www.faqs.org/rfcs/rfc760.html> accessed 28 January 2008.
[12] Postel, J. RFC 768 - User Datagram Protocol. <http://www.faqs.org/rfcs/rfc768.html> accessed 21 January 2008.
[13] Schiemer, G. and M. Havryliv. Wearable firmware: the Singing Jacket. In Ghost in the Machine: the Australasian Computer Music Conference. 2004. University of Victoria, Wellington.
[14] Schiemer, G. and M. Havryliv. Pocket Gamelan: a Pure Data interface for java phones. In International Conference on New Interfaces for Musical Expression (NIME-2005). 2005. University of British Columbia, Vancouver.
[15] Son, S.H. Advances in real-time systems. 1995, Englewood Cliffs, N.J.: Prentice Hall. xix, 537.
[16] Vitter, J.S. External memory algorithms and data structures: dealing with massive data. ACM Comput. Surv., 2001. 33(2): p. 209-271.
[17] Wright, M. Introduction to OSC. <http://opensoundcontrol.org/introduction-osc> accessed 21 January 2008.
[18] Wright, M. and A. Freed. Open SoundControl: A New Protocol for Communicating with Sound Synthesizers. In International Computer Music Conference. 1997. Thessaloniki, Hellas: International Computer Music Association.
[19] Wright, M. and A. Freed. OpenSound Control: State of the Art 2003. In International Conference on New Interfaces for Musical Expression (NIME-03). 2003. Montreal, Quebec, Canada.
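A short sketch makes three byte counts in this paper concrete: the MIDI versus OSC comparison in Section 3.2.1, the four-byte padding discussed in Section 3.2.5, and the running-status saving mentioned in Section 5. The Python below is a minimal illustration only. It assumes the OSC 1.0 encoding rules: NUL-terminated strings zero-padded to a multiple of four bytes, a type-tag string beginning with ‘,’, and big-endian int32 arguments. It also assumes that no other status bytes interrupt a run of MIDI running status. The function names are hypothetical and do not come from any OSC or MIDI library.

```python
import struct


def osc_pad(data: bytes) -> bytes:
    """NUL-terminate an OSC string, then zero-pad it to a multiple of four bytes."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_int_message(address: str, value: int) -> bytes:
    """Encode an OSC message with a single int32 argument:
    padded address, padded type-tag string ',i', then the big-endian integer."""
    return osc_pad(address.encode("ascii")) + osc_pad(b",i") + struct.pack(">i", value)


def midi_cc(channel: int, controller: int, value: int) -> bytes:
    """A MIDI control-change message: status byte 0xB0 | channel, then two data bytes."""
    return bytes([0xB0 | channel, controller, value])


def midi_cc_stream_size(n: int, running_status: bool) -> int:
    """Bytes needed for n control changes on one channel. With running status the
    status byte is sent once and each following message carries only its two data bytes."""
    if n == 0:
        return 0
    return 3 + 2 * (n - 1) if running_status else 3 * n


osc = osc_int_message("/minicv/forefinger", 127)
midi = midi_cc(0, 0x0C, 0x7F)  # the '0xB0 0x0C 0x7F' example: channel 1, controller 12, value 127
print(len(osc), len(midi))  # 28 3
print(midi_cc_stream_size(100, False), midi_cc_stream_size(100, True))  # 300 201
```

Under these assumptions, ‘/minicv/forefinger 127’ occupies 28 bytes as an OSC message and three bytes as a MIDI message. Of those 28 bytes, the 18-character address alone takes 20 once it is terminated and padded. A run of 100 control changes shrinks from 300 to 201 bytes with running status. That saving, (n - 1)/3n, approaches one third as the run grows.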
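Section 3.2.4 contrasts the cost of matching wildcard address strings with a two-number range test, and Section 5 proposes caching the mapping from address patterns to numbers. Both ideas can be illustrated in a few lines. This is a simplified sketch, not the OSC pattern-matching algorithm. It supports only ‘?’ and ‘*’ (no ‘[...]’ or ‘{...}’ forms), and it matches one address component at a time so that a wildcard never crosses a ‘/’. The namespace (NAMESPACE) and the helper names (match_address, in_range, resolve) are hypothetical, chosen only for illustration.

```python
from functools import lru_cache


def match_part(pattern: str, name: str) -> bool:
    """Match one address component, character by character.
    Supports only '?' (any single character) and '*' (any run, possibly empty)."""
    if not pattern:
        return not name
    if pattern[0] == "*":
        # '*' may absorb zero or more characters: try every split point.
        return any(match_part(pattern[1:], name[k:]) for k in range(len(name) + 1))
    if name and pattern[0] in ("?", name[0]):
        return match_part(pattern[1:], name[1:])
    return False


def match_address(pattern: str, address: str) -> bool:
    """Compare component by component so that a wildcard never crosses a '/'."""
    p_parts, a_parts = pattern.split("/"), address.split("/")
    return len(p_parts) == len(a_parts) and all(
        match_part(p, a) for p, a in zip(p_parts, a_parts)
    )


def in_range(mapped_point: int, lower: int, upper: int) -> bool:
    """The numeric alternative from the MODBUS/DNP pseudocode: two integer comparisons."""
    return lower <= mapped_point <= upper


# Hypothetical namespace; each address is identified by its index.
NAMESPACE = ("/minicv/lefthand", "/minicv/leftfoot", "/minicv/righthand")


@lru_cache(maxsize=None)
def resolve(pattern: str) -> tuple:
    """Map a pattern to the numeric IDs it selects. The result is cached, so the string
    matching is paid once per distinct pattern rather than once per incoming message."""
    return tuple(i for i, address in enumerate(NAMESPACE) if match_address(pattern, address))


print(resolve("/minicv/left*"))  # (0, 1)
print([p for p in (10, 11, 12, 20) if in_range(p, 10, 12)])  # [10, 11, 12]
```

Once ‘/minicv/left*’ has been resolved to the IDs (0, 1), each later message needs only integer comparisons. This is one way to decouple the human-readable name from the transmitted form, as argued in Sections 3.2.3 and 5.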
SMuSIM: a Prototype of Multichannel Spatialization System with Multimodal Interaction Interface
Matteo Bozzolan
Department of Electronic Music
Conservatory of Music G.Verdi
Como, Italy
matteo.bozzolan@alice.it

Giovanni Cospito
Department of Electronic Music
Conservatory of Music G.Verdi
Como, Italy
giovanni.cospito@fastwebnet.it
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

ABSTRACT
The continuous evolution of the human-computer interface field has allowed the development of control devices that enable increasingly intuitive, gestural and non-invasive interaction.
Such devices also find a natural application in music informatics, and in particular in electronic music, which is always searching for new expressive means.
This paper presents a prototype of a system for the real-time control of sound spatialization in a multichannel configuration with a multimodal interaction interface. The spatializer, called SMuSIM, employs interaction devices that range from the simple and well-established mouse and keyboard to a classical gaming joystick (gamepad), and finally exploits more advanced and innovative typologies based on image analysis (such as a webcam).

Keywords
Sound spatialization, multimodal interaction, interaction interfaces, EyesWeb, Pure Data.

1. INTRODUCTION
Technology and music have always had a particular relationship and affinity. In particular, research and experimentation, first in the field of electricity and then in electronics and informatics, have allowed, in the last two centuries, the birth of a series of instruments for a new musical expressivity.
Moreover, increasingly available computational power, together with the development of new techniques and technologies for the acquisition and analysis of human gesture, has opened new paths in the field of human-computer interaction, giving rise to a new generation of interfaces that find a natural application even in music.
As reported in [11], the most widespread interaction devices currently used are (in increasing order of complexity): PC keyboard, mouse, joystick, MIDI keyboard, video camera, touchpad, touchscreen, 3D input devices (data gloves, electromagnetic trackers) and haptic devices.
This paper presents the results of experimenting with some of these interfaces in the realization of a system for the multichannel spatialization of sound sources. In particular, the devices explored in this work are: mouse and keyboard (very simple and primitive), a gamepad (classical gaming joystick) and a webcam (a low-cost USB camera that allows, through image analysis techniques, a totally non-invasive and free-hand interaction).
With respect to sound spatialization, the proposed prototype provides quadriphonic sound diffusion and allows control of up to four independent sound sources. The spatialization technique implemented is the well-known Amplitude Panning extended to the multichannel case. This simple choice is motivated by the fact that the primary aim of this work is the investigation of the interaction interfaces rather than the implementation of advanced spatialization algorithms. The sound projection space can be artificially altered by controlling the direct-to-reverberated signal ratio.

2. RELATED WORKS
Although spatial sound has been present since the origins of music and appears many times in classical Western music, it became a fundamental practice and a key aesthetic element mainly from the second half of the past century (first thanks to the development of electrical sound diffusion devices, and then because of the revolution of electronic and digital sound systems). For brevity, only some of the most recent works in the field of real-time digital sound spatialization systems are presented in this section.
A first example is MidiSpace [6], a system for the spatialization of MIDI sound files in virtual environments, realized at the end of the ’90s at the Sony Computer Science Lab in Paris. It is one of the earliest sound spatialization experiments in 3D worlds, and it gives the user two distinct graphic interfaces to control the application. The first one (two-dimensional) allows the user to place the various sound sources (identified by a set of musical instruments) in the projection space. The second one (three-dimensional and realized in VRML) controls the movements of an avatar in the virtual world. The spatialization technique is two-channel Amplitude Panning, and the interaction devices are mouse and keyboard.
A more recent work is ViMiC [3], a real-time system for the gestural control of spatialization for small ensembles of players. It belongs to the wider project Gesture Control of Spatialization, started in 2005 at the McGill University IDMIL Lab (Montreal, Canada). It is very interesting because it allows the user to control the placement of the sound sources simply by moving his hands in the air (thanks to a complex apparatus for movement interpretation and codification called Gesture Description Interchange Format). A set of 8 sensors (connected to an electromagnetic tracking system) is applied to the two
hands of the player.
Zirkonium [9] is software implemented to control spatialization within the Klangdom system at the ZKM (Germany). The Klangdom is formed by 39 speakers and can be controlled by Zirkonium through mouse and joystick. Zirkonium implements various spatialization algorithms (Wave Field Synthesis, Ambisonics, Vector Base Amplitude Panning and Sound Surface Panning), and it allows the user to define an arbitrary number of resources¹ to spatialize in the concert hall. The system is controlled through a simple graphic interface.
Challenging Bodies [4] is a complex multidisciplinary project for live performances by disabled people, realized at the Informatics and Music Department of the University of Regina (Canada). Within this wide project, the RITZ system uses various techniques to frontally spatialize up to 10 input signals coming from musical instruments over 7 loudspeakers placed in front of the players. Its control interface is made up of two windows. The first one, implemented in GEM², supplies graphical feedback of the loudspeaker configuration and allows the user to modify the position of the sound sources in the space. The second one, the main control patch implemented in Pure Data, gives the user the possibility to set the relative and absolute sound levels. The system is strongly oriented towards scalability and usability.
The last example is the work recently proposed by Schacher [10] at the ICMST of the Zurich university (Switzerland). It consists of a design methodology and a set of tools for the gestural control of sound sources in surround environments. The spatialization is performed through a structured and formalized analysis that maps the player’s gestures onto the sources’ movements by applying various typologies of geometric transformations. From the point of view of the input devices, the system does not have a consolidated structure, but the interfaces used up to now span from data gloves equipped with multiple sensors (pressure, position, bending) to haptic arms and graphic and multi-touch tablets. The spatialization algorithms used are Ambisonics and Vector Based Amplitude Panning.

3. IMPLEMENTATION: SMUSIM
SMuSIM is a multichannel sound spatialization system with a multiple and multimodal interaction interface. It is designed for real-time applications in expressive musical contexts (electronic music spatialization, distributed and collaborative network performances).

Figure 1: The system’s architecture.

In this first implementation, the speakers are supposed to be arranged in the spatialization room in a typical quadriphonic configuration, with the 4 loudspeakers placed at the corners of the room.

¹ A resource is a set of one or more audio sources coming from an audio file, a network stream or any audio device.
² http://gem.iem.at
³ http://www.puredata.org
⁴ http://crca.ucsd.edu/~jsarlo/gripd/
⁵ http://www.eyesweb.org
⁶ http://opensoundcontrol.org

The projection space can be artificially extended and modified by controlling the direct-to-reverberated signal ratio (for the creation of illusory acoustic spaces).
The spatializer allows the player to control up to four simultaneous sound sources, and graphical feedback shows the instantaneous state of the system.
The system offers a set of functionalities that allow complete and efficient control of the spatialization, in particular: precise placement of the sound sources in the space, control of relative and absolute volume levels, automation of the movements, non-linear interpolation of the position of the sources in time, and the possibility to load pre-recorded sound files or to acquire signals coming from a microphone or any audio device.
As shown in Figure 1, the system has been implemented in Pure Data³ (and its graphical interface GrIPD⁴) and EyesWeb⁵ (with the creation of ad hoc additional blocks), communicating through the OSC⁶ protocol. This makes SMuSIM a natively network-distributed application, which can run as one or more instances on several machines, allowing multiple distributed configurations.

3.1 Interaction interfaces
The prototype offers three different typologies of human-computer interaction devices for controlling the spatialization. Keyboard and mouse are the simplest and most widespread ones. The user controls the diffusion of the sound sources in the space through a combination of actions and commands coming from the PC keyboard and the mouse. In this case the system provides (in addition to the visual feedback window) a two-dimensional graphic environment where the player can place and move graphic objects representing the different sound sources.

Figure 2: Input devices used for SMuSIM.

The second device is a gamepad, a classical gaming controller with two axes and ten freely configurable buttons. Its very compact dimensions and ergonomic design make the device very usable and allow great playability.
The last interface is a standard low-cost USB webcam that acquires the movements of a set of colored objects. Each physical object is associated (through a color-based tracking algorithm) with a sound object in the sound projection space.
The player can use one or more devices at the same time (allowing a collaborative and multi-user performance). The proposed interfaces are deliberately simple, cheap and
widely available on the market, in order to make the system easily usable and accessible to users of any level.

3.2 Software components structure
As shown in Figure 3, the application is composed of several functional units that perform the various tasks required. Data coming from the input devices are acquired, formatted and analyzed by the Device controller unit, which consists of four sub-units, one for each input device.
In particular, the Mouse/Keyboard controller supplies a graphic window (interaction environment) in which the user can move the four objects representing the sound sources with the mouse. A set of keyboard key combinations performs predefined actions (shifting single sources or groups of sources, with or without maintaining their topological configuration, loading/saving default configurations, etc.).

Figure 3: Diagram of the software functionalities implemented in SMuSIM.

The Joystick controller allows the spatializer to be controlled with a standard 2-axis, 12-button gamepad. The interface between the gamepad and the spatializer is managed through GrIPD, which provides all the needed functionality. Buttons are used to select the sources to be controlled, while the two analog mini-sticks determine the changes in their position and volume. With this device it is easy to control more than one source at the same time⁷. The Webcam controller manages data coming from the video acquisition device. The interaction paradigm in this case is the following: the webcam films a flat, neutral-colored surface on which the objects to be tracked are placed; the webcam's field of view corresponds to the diffusion space, and the position of the colored objects determines the placement of the sound sources in the sound projection space. The unit provides a set of tools for the real-time selection of the color to track (simply by picking it out in a window showing the webcam video stream) and for the extraction of the centroids and bounding boxes of the color blobs. Bounding boxes are used to set the volume level of each source (a vertical orientation stands for maximum volume, a horizontal one for mute). The MIDI controller block processes data coming from an optional MIDI device (either hardware or software).
The source movements can be automated by the Automatization unit, while the Interpolator makes the position changes of the sound sources non-instantaneous, generating a smoothed and decelerated motion by non-linear interpolation of successive positional data.
The Spatializer is the unit that computes the attenuation levels to apply to the audio signals on each channel. The spatialization technique is Amplitude Panning extended to the multichannel case. On the basis of the positional data of the virtual sources coming from the input devices, a monophonic signal (considering a single source) is applied to the various channels with a gain factor as follows:

$$x_i(t) = g_i\,x(t), \qquad i = 1, \dots, N$$

where $x_i(t)$ is the signal sent to loudspeaker $i$, $g_i$ the gain factor of the corresponding channel, $N$ the number of loudspeakers and $t$ the time. The gain factor $g_i$ depends non-linearly on the position $(x, y)$ of a single sound source in the space. To overcome the 6 dB attenuation at the center of the projection space, a quadratic sinusoidal compensation curve is applied along the two dimensions. By considering all the sound sources involved, the resulting signal on loudspeaker $i$ can finally be defined as:

$$X_i(t) = \sum_{j=1}^{K} x_{i,j}(t)$$

where $x_{i,j}(t)$ is the signal $x_i(t)$ computed for source $j$, and $K$ is the maximum number of sound sources involved in the spatialization ($K = 4$ in the specific case of SMuSIM).
The graphic and audio feedback are produced by the Graphics display and Sound production units respectively. The latter prepares the audio stream to be sent to the loudspeakers. It essentially manages the reverberation algorithm, applying it to the signal that results from combining the original audio stream (supplied by the Audio streaming/playback unit) with the spatialization data, thereby allowing the creation of illusory acoustic spaces. By controlling the balance between the direct and reverberated signal independently for each channel, it is possible, besides increasing the overall perception of distance, to deform the sound projection environment (by acting along one or more dimensions of the room). Currently the functionality of this unit is extremely limited, pending the future integration of a sound synthesis engine for the real-time generation of sounds.

⁷ Thanks to the compact size and ergonomics of the gamepad, which allows more than one button to be pressed at the same time.

4. FUTURE WORK
The system developed is still at a prototype stage and has some limitations that can easily be addressed. First, it would be interesting to test other interaction interfaces (to broaden the multimodal aspect), such as higher-performance cameras (higher frame rate, infrared lighting) or other technologies for the gestural control of the instrument (electromagnetic or ultrasound tracking systems, data gloves). A study is currently under way on touch-sensitive interfaces (graphic tablets, multitouch and painterly interfaces).
A second improvement concerns the spatialization technique, which was not the crucial aspect of the work in this first phase of the project. The simple Amplitude Panning technique could be replaced by more complex and efficient algorithms such as Vector Base Amplitude Panning, Ambisonics extended to a multichannel configuration, and Wave Field Synthesis.
Another key issue is the performance of the system, which is the main requirement of the application in the context of real-time musical performance. In fact, there are currently some latency problems in the configuration with the webcam, particularly on low-performance machines or notebooks. This could be resolved by improving and optimizing both the tracking algorithm and the visual feedback production (if necessary, abandoning the EyesWeb and Pd platforms in favour of an integrated, stand-alone, dedicated software application).
As for automatization, it does not provide any way of interacting with the player; it is an autonomous and isolated mode. It could be interesting to implement rules for pattern learning and reproduction, so that the system is able to imitate and continue a performance initially guided by a human user.
Other possible developments concern the diffusion system (increasing the number of loudspeakers and their configurations) and the integration of a sound synthesis engine within the application.
During this first phase of the work there was no opportunity for an intensive and structured test session on a large and heterogeneous set of users. However, an evaluation experiment has been designed for future use.
The experiment lasts about 45 minutes in total and is composed of six sections:
1) free trial of the instrument (10 min) without any explanation of the working principles of the system (the user has previously read a short user manual);
2) supervised test (10 min) in which the user has to execute some tasks evaluated by the operators;
3) explanation of the working principles (5 min) by an operator, in order to increase the user's awareness of how the spatialization instrument is controlled and to accelerate the learning process;
4) repetition of the test (10 min) after the operator's explanations;
5) evaluation questionnaire (5 min) completed by the user;
6) interview (5 min) in which the operators explore some aspects that emerged during the test.
Each of the two tests contains a list of 21 tasks that the user has to execute. Each task receives a mark on a five-point Likert scale (1: not executed, 5: executed at the first trial). The tasks are sorted by increasing level of difficulty and are intended to test most of the functionalities of the instrument and its expressive possibilities. The questionnaire presents 22 questions divided into 5 categories: usability of the system (8), learnability (3), audio feedback (3), visual feedback (4) and overall opinion (4). In the questionnaire, too, the players give a mark on a five-point Likert scale (1: bad, 5: very good).

5. CONCLUSIONS
A real-time sound source spatialization system with a multimodal interaction interface has been developed.
The interaction interfaces have been built with very simple and inexpensive technologies and devices, which have nevertheless shown satisfactory expressive and interaction possibilities. In particular, as expected, the best results came from the gamepad and the webcam, devices that allow more freedom of movement and a more intuitive and natural interaction. Moreover, the webcam lets the user move each sound source independently (something impossible with both the mouse and the gamepad). On the other hand, performance is one of the key issues associated with this last kind of device, because the computational load of the image analysis techniques makes real-time operation a crucial aspect of the application.
In general, the graphic rendering operations for the creation of the visual feedback are also particularly onerous for the overall performance of the system. For this reason, the graphic feedback presented to the user is quite simple and minimal, but it proves very efficient and keeps the actual state of the sound sources in the diffusion space always under control.
From the point of view of sound spatialization, the Amplitude Panning technique produces the expected results. It is very efficient, it has no problems of computational complexity, and it is easily configurable for various performance and technical contexts (customization of the panning curves and of the number of diffusion channels).
Even though an intensive, large-scale test session has still to be conducted, SMuSIM has shown good results in terms of learnability, intuitiveness and expressiveness. There are various possible developments of this work, concerning both software and hardware issues (input devices, diffusion system) and applied and musical aspects.

6. REFERENCES
[1] A. Camurri et al. Toward real-time multimodal processing: EyesWeb 4. In Proceedings of the Convention on Motion, Emotion and Cognition (AISB04), Leeds, UK, 2004.
[2] J. M. Chowning. The simulation of moving sound sources. In Journal of the Audio Engineering Society, volume 19, pages 2–6, 1971.
[3] M. Marshall, Wanderley, et al. On the development of a system for the gesture control of spatialization. In Proceedings of the 2006 International Computer Music Conference (ICMC06), pages 360–366, New Orleans, USA, 2006.
[4] J. Nixdorf and D. Gerhard. Real-time sound source spatialization as used in Challenging Bodies: implementation and performance. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 318–321, Paris, France, 2006.
[5] N. Orio, N. Schnell, and M. M. Wanderley. Input devices for musical expression: borrowing tools from HCI. In Proceedings of the 2001 International Conference on New Interfaces for Musical Expression (NIME01), 2001.
[6] F. Pachet and O. Delerue. A mixed 2D/3D interface for music spatialization. In Proceedings of the First International Conference on Virtual Worlds, pages 298–307, Paris, France, 1998.
[7] M. Puckette. Pure Data: another integrated computer music environment. In Proceedings of the 1996 International Computer Music Conference (ICMC96), pages 269–272, Hong Kong, China, 1996.
[8] V. Pulkki. Spatial sound generation and perception by amplitude panning techniques. Graduation thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, 2001.
[9] C. Ramakrishnan, J. Gossmann, and L. Brummer. The ZKM Klangdom. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 140–143, Paris, France, 2006.
[10] J. C. Schacher. Gesture control of sounds in 3D space. In Proceedings of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 358–361, New York, USA, 2007.
[11] L. Schomaker, A. Camurri, et al. A taxonomy of multimodal interaction in the human information processing system. Technical report, Nijmegen University, 1995.
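The multichannel Amplitude Panning step of the SMuSIM Spatializer (gains $g_i$ applied per loudspeaker, then summed over the $K$ sources) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the SMuSIM implementation: we assume four loudspeakers at the corners of a square, source positions normalised to [0, 1], and we substitute the common sine/cosine (equal-power) law for the paper's "quadratic sinusoidal compensation curve", whose exact form is not given. The names `pan_gains`, `linear_gains` and `mix` are hypothetical.

```python
import math

# Hypothetical layout (not specified in the paper): four loudspeakers at the
# corners of a unit square, ordered front-left, front-right, rear-left, rear-right.
SPEAKERS = ("FL", "FR", "RL", "RR")


def axis_gains(u):
    """Equal-power (cos/sin) crossfade along one axis; u is clamped to [0, 1]."""
    u = min(max(u, 0.0), 1.0)
    theta = u * math.pi / 2.0
    return math.cos(theta), math.sin(theta)


def pan_gains(x, y):
    """Gains g_i of one source at (x, y): x runs left->right, y front->rear."""
    g_left, g_right = axis_gains(x)
    g_front, g_rear = axis_gains(y)
    return [g_left * g_front, g_right * g_front, g_left * g_rear, g_right * g_rear]


def linear_gains(x, y):
    """Uncompensated linear crossfade along both axes, kept for comparison."""
    return [(1 - x) * (1 - y), x * (1 - y), (1 - x) * y, x * y]


def total_power(gains):
    """Sum of squared gains (the power-sum criterion used for panning laws)."""
    return sum(g * g for g in gains)


def mix(sources, positions):
    """X_i(t) = sum over sources j of g_ij * s_j(t), for every loudspeaker i.

    sources: list of equal-length sample lists; positions: list of (x, y).
    """
    n = len(sources[0])
    out = [[0.0] * n for _ in SPEAKERS]
    for signal, (x, y) in zip(sources, positions):
        for i, g in enumerate(pan_gains(x, y)):
            for t, sample in enumerate(signal):
                out[i][t] += g * sample
    return out


if __name__ == "__main__":
    print(total_power(pan_gains(0.5, 0.5)))                      # ~1.0 (0 dB)
    print(10 * math.log10(total_power(linear_gains(0.5, 0.5))))  # ~-6.02 dB
```

The comparison shows where the 6 dB figure comes from: with a plain linear crossfade on both axes each loudspeaker receives 0.25 at the center, so the summed power drops to 0.25 (about -6 dB), whereas the equal-power law keeps the summed power at 1 for every position.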
Realtime Representation and Gestural Control of Musical Polytempi
Chris Nash and Alan Blackwell
Rainbow (Interaction & Graphics) Research Group, University of Cambridge
Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, CB3 0FD, UK
+44 (0)1223 334678
{christopher.nash, alan.blackwell}@cl.cam.ac.uk
ABSTRACT
Over the last century, composers have made increasingly ambitious experiments with musical time, but have been impeded in expressing more temporally-complex musical processes by the limitations of both music notations and human performers. In this paper, we describe a computer-based notation and gestural control system for independently manipulating the tempi of musical parts within a piece, at performance time. We describe how the problem was approached, drawing upon feedback and suggestions from consultations across multiple disciplines, seeking analogous problems in other fields. Throughout, our approach is guided and, ultimately, assessed by an established professional composer, who was able to interact with a working prototype of the system.

Keywords
Tempo, polytempi, performance, composition, realtime, gesture

1. INTRODUCTION
Although intricate and complex musical processes involving rhythm, melody and harmony are to be found in most musical genres, the use of and conventions relating to tempo are less adventurous [16]. Only in the last century or so have composers, such as Steve Reich and Conlon Nancarrow, experimented with simultaneous musical parts bearing differing tempi [9]. As regards experiments in musical time, the notion of polytempi is crucially different from the relatively more common concepts of polyrhythm and polymetre, which both rely on simple integer divisions of the bars or beats in the piece. In contrast, the multiple simultaneous tempi of polytempo music lead to situations where the bar lines and beats of each part in the piece are themselves incongruent. The timing relationships between the events in each part can no longer be thought of, or expressed in, simple integer fractions (e.g. 3 in the time of 2, or 3/2 vs. 6/4), but instead become irrational.
A number of explanations can be volunteered for the paucity of polytempo use in the modern musical repertoire. For the average listener: the barrage of incongruous notes resulting from multiple, misaligned passages of music could be argued to hold limited aesthetic appeal. For the performer: such perceptions of anarchy, chaos and randomness may seem ironic, as the musician attempts to follow the composer's explicit, highly-ordered and inflexible timing directions, unable to rely on implicit or explicit timing cues from other musicians or a conductor [19]; unable to rely on a universal, steady pulse of bar or beat. And, finally: if the performer struggles to manage an individual part of the piece, then the composer's task of developing and imagining the complete, combined performance becomes an almost impossibly hard mental operation.
In conventional music, an audience is invariably attuned to global tempo variations within a piece. When a simultaneous part with a differing tempo is introduced, an extra dimension is added to the audience's perception of the performance – the explicit interplay of parts in respect of time.

Figure 1. Perceived synchronisation in phase music.

Due to the periodic nature of much of the world's rhythms [3], there are various points where disjoint parts can appear more or less temporally-aligned, so that the perceived effect is determined not only by the absolute musical offset, but also by relative factors. For example, in Figure 1, the parts start in sync and gradually diverge because of differing tempi. Initially, the divergence is small enough that the listener can still corrupt their perception of the musical events onto a single time scale, dismissing the offset as they might a digital chorus effect, acoustic echo or performance prosody [9]. Over time, the offset increases and the two parts are more easily separated, becoming harder to align perceptually. Yet, by Bar 4, the absolute offset is approximately one beat, and thus the music can be aligned about the beat. As the process continues, such alignment occurs relative to other points in the bar, as well as to divisions of the beat, and inevitably relative to the bar itself.
The varying incongruity of notes can be seen to form a temporal harmony, where perceived aligned and misaligned episodes correspond to consonance and dissonance respectively. For centuries, these concepts have been powerful tools leveraged by composers in their engagement with tonal harmony; in the typical case, dissonance giving way to consonance, to provide
resolution [14]. As such, and as with dissonant harmonies, the average listener's aversion to apparently cacophonic, misaligned music serves only to reinforce the potential for temporal resolutions.
Harmony and pitch have been studied, codified and notated in a variety of ways to enable performance by musicians and experimentation by composers, yet few notations have arisen to explicitly express temporal relations [16]. In our research, we apply computer technology and interactive notations to tackle these remaining problems. Whereas the problem of directing and coordinating performers is unquestionably also a matter of notation (be it paper, digital, aural, static or interactive), this paper principally focuses on the earlier, prerequisite stage in the process – the composer's creation of the music. For our purposes, the "super-human virtuosity" currently required of polytempi performers can be provided by the computer [4].

2. BACKGROUND
The simple concept described in the Introduction and Figure 1 underpins the phase music of Steve Reich [15]. "Piano Phase" (1967) contains two musical parts with the same melodic and rhythmic content, but slightly differing (yet constant) tempi. The parts start together, then gradually diverge, or 'phase', in musical time, producing moments of dissonance and consonance as the parts become more or less aligned. Reich's interface to this process began with a tape machine, playing two looped tapes of the phrase at different speeds. Subsequently, and owing to the relative simplicity and repetitive nature of the musical content, he was able to carry the idea to the piano, whereupon two exceptionally disciplined and practised performers can play the music live. With the exception of the tape speed settings, general performance directions and the looped phrase itself, however, the piece is not fully scored, but is instead an example of the generative or procedural specification of music. Notably, it is difficult to inspect or manipulate specific, individual notes or events in the performance.
Conlon Nancarrow, a contemporary of Reich, took a different approach to the problems of notation and performance, replacing the human pianist with a pianola (or player piano) and notating his music on the paper rolls used by the machine [9]. Unlike score notation, the rolls represent time linearly, and the piano's mechanism eventually afforded the opportunity to dynamically vary tempo within a part. Unlike Reich, Nancarrow tended not to rely on the phasing of musical events in repetitive parts, but on a grander plan of having a single, climactic point of synchrony.
Alejandro Viñao [1], an established modern-day composer, has been much inspired by Nancarrow's efforts, and now brings a more personal perspective and motivation to our research, joining us as Composer in Residence at the Computer Laboratory. For more than 30 years and in a variety of areas and centres of research (including IRCAM and MIT's Media Lab), he has sought technologies to help him express his musical ideas. Yet his appropriation of technologies and methods from conventional music practice forces him into an unsatisfying compromise when it comes to exploring polytempi. Using scored music, Alejandro divides the bar into the finest performable resolution (e.g. 1/32nd notes), and uses varying note accents and stresses to give the impression of multiple tempi. Even though he admits such methods do not produce true polytempi, Alejandro manages to create pieces that are nonetheless able to present impressions of temporal harmony, with temporally dissonant passages resolving to consonance. Furthermore, his reliance on more established working practices affords him greater flexibility in instrumentation, arrangement and performance.
Both Nancarrow and Reich effectively used technology to address problems with the use of polytempi, but were both forced to pre-calculate and prescribe the tempo variations long in advance of performance, waiting hours, days or weeks to hear the result of their writing. In all three cases, the composers are forced to limit their creativity in some way, be it temporal freedom, dynamism, note-to-note control or instrumentation.
Approaches to managing complex musical timings tend to focus on performance requirements. Ghent [7] made one of the earlier attempts to use audio cues (e.g. multiple metronomes) for individual musicians. Ligeti [12] uses a similarly audio-based method. Such techniques isolate the musician from the ensemble and, more importantly, the part from the piece, which is not only incompatible with the composer's requirement of a macroscopic view of the music, but also inhibits performer interaction, an important component of the music, socially and aesthetically [19].
Other explicit considerations of polytempo music are sparse, and the paucity of published research in this area is marked by the writings of lamenting composers desperate to explore more advanced musical timings, such as the late Stockhausen [17]. A useful website, run by artist John Greschak [8], contains more information and unpublished articles about polytempo, as well as an annotated list of polytempo music. To our knowledge, no previously published work in the area of music interaction or interface design has significantly addressed musical tempo as the focus of control, or explicitly considered the composer and composition as target user and task.

3. A SYSTEM FOR POLYTEMPI
There are two principal requirements of a system allowing composers to interact with tempo and polytempi: a representation of the polytempi, including the temporal relations between parts (the notation); and a method of manipulating and managing the tempo of such parts (the system). In this latter case, interaction should occur in realtime, so that alternative material can be auditioned quickly and expressive refinements made.
However, before further considering issues of system design and implementation, we must tackle one of the fundamental goals of our research: the design of a notation for polytempi, upon which the system will be based.

3.1 Notation
The design of our system was arrived at by drawing on our prior research into notations for performance and composition in music and other expressive arts [2][5]. The lack of previous work on this specific problem encouraged us to look for analogies in other disciplines and fields where it is necessary to handle parallel streams, signals and processes – such as physics, data communication, computer security, graphics, and engineering fields.
In facilitated cross-disciplinary meetings with 10 different specialist research groups (see Section 8 for a full list), the concept of phase and synchronization was highlighted in a number of non-musical activities, possibly the closest cousin of
which is physical sound itself. A periodic waveform, such as a sine wave, at any moment has a phase, frequency and wavelength that might be adapted to music, in the forms of musical position, tempo and bar length, respectively¹.
Considering a musical part as a periodic signal, the challenge becomes representing multiple signals so that the relationship between them is evident. In many fields, phase can be plotted or graphed as a function of other properties of a given system, such as time (e.g. phase plot) or frequency (e.g. Bode plot). In this manner, it would be possible to plot musical position on a vertical axis against absolute time on the horizontal, but this would only be useful in plotting absolute synchronization and absolute time offsets – tempo would be implicitly presented as line gradient, and relative alignments would also be difficult to identify. Instead, we propose a plot of the phase of one signal against the phase of another, as in Figure 2(a). In music, this is the musical position within the bar of one part against that in another part, as shown in Figure 2(b).

[Figure 2: panels (a) and (b); axes labelled Bar X and Bar Y]
Figure 2. Plots of phase against phase. (a) A general case. (b) An adaptation for musical purposes (4/4).

Although the plot no longer allows the reader to deduce the individual tempos of each part, the relationship between them is clear – a diagonal line (45 degrees) implies matched tempi; a steeper or shallower line indicates that one part is faster or slower than the other. More importantly, the bar-level phase difference is also displayed, allowing the reader to easily deduce points of relative alignment, as shown by the guidelines in Figure 2(b).
From the diagram it is possible to see the salient factors of the polytempi process – the relative phases and synchronization of two parts – and to extrapolate how changes in each tempo, which affect the gradient of the plot, will affect the degree of synchrony over time. Figure 3 gives an illustration of a musical application.
To further illustrate how the plot functions, consider how Reich's "Piano Phase" would be represented: with two parts featuring close yet differing constant tempos, the line would be drawn with a gradient slightly off the diagonal. One part would reach the end of the bar sooner than the other, prompting the line to 'wrap around' using the dashed lines, as in Figure 2. The wrap-around line illustrates the relation between the two parts' bar lines (the other part's being formed by the axes of the graph), and gradually creeps across the grid as the bar lines diverge, eventually converging on the opposite extreme of the bar, whereupon the process concludes, having regained synchrony, albeit a bar adrift.

[Figure 3: panels (a) to (d); axes labelled Bar X and Bar Y]
Figure 3. An idealised example of using a musical phase plot to manage polytempi. (a) Part Y is progressing faster through the bar. (b) The part is slowed to the tempo of its counterpart, leaving them offset by 1 musical beat. (c) The part is again slowed so that, by (d), the parts are back in sync.

¹ Amplitude, the remaining fundamental characteristic of audio signals, constitutes an instantaneous property, and might be seen as the counterpart to similar musical properties such as dynamics, pitch, instrumentation, etc.

3.2 Interface
The examples above demonstrate how the plot can be used to inspect temporal aspects of a piece but, in order to be of use to composers, a system must allow the viewer to affect the tempi – to draw the line themselves – and react to what they see and hear.
It would be possible to expose the relative synchronisation as a control parameter, but this would require the composer to first select a reference part to which the synchronisation would be relative, effectively restricting tempo variation to a single part at any given time. Instead, we elected to simply control the tempi of both parts independently.
In addition to these two fundamental variables, we envisaged additional control parameters. Notably, the composer will, at different times, wish to effect tempo variations of varying scale. With Reich and Nancarrow, the tempo changes were gradual and finely controlled, but other composers, such as Alejandro Viñao, desire the expressive freedom to make both fine and more abrupt, coarser variations. Thus, a third control variable, range (or resolution), is required. Finally, observing that temporal harmony involves varying between two extremes (temporal consonance and dissonance), and that most pieces revolve around the journeys between them, we introduce a fourth factor in the interaction: a "gravitational" element that draws the two parts into consonant temporal congruity, to a varying degree. Altogether, this requires an interface offering at least 4 degrees of freedom, corresponding to: tempo of the first part, tempo of the second part, tempo control resolution and influence of gravity.
Our interface could simply be formed from common input widgets (sliders, rotary knobs, etc.). However, in designing our prototype, we turned to human gesture, where the body affords a large variety of motions to which our scales might be effectively mapped, and where their interrelationships and dependencies might be implicitly reflected. Gesture is often seen as a 'natural' interaction mode for computer-based musical applications, owing to the physical and tangible nature of interaction in traditional music making [13]. In this vein, we elected to use gestures, motions and actions that would not appear out of character with those established in live musical
performance. Specifically, the similarity between the expressive role of our user, the composer, and that of a conductor was a significant influence in our selection. The intended result was a method of interacting that would make users more comfortable in their manipulation of the system, where musical-like physical actions prompted clear musical results and users did not feel inhibited or self-conscious by having to make overt, overly-exuberant and uncharacteristic gestures.
Projecting the phase schematic onto a wall-mounted screen, we used a Vicon™ Motion Capture system [18] to capture body motion. A similar system has been used to control synthesizers and sound generation [6], but we could find no published account of an attempt to use such a system and gesture to allow higher-level, realtime control of musical composition and expression.
Our system (see Figures 4 and 5) was designed so that the height of each hand would set the tempo of the respective part. Walking forwards or backwards set the tempo range addressed by the hands, literally allowing more "up-close" adjustments or broader handling "from a distance". Appropriately, the effect of gravity could be controlled by bringing the hands closer together laterally, so that clasped hands (vertical and horizontal proximity) would ultimately bring about synchronisation of both tempo and relative position. Additional gestures were added to start and stop playback (a quick clap), and to allow the user to lock the tempo of each part (turning the respective palm up) so that they could focus on the other.

Figure 4. A Vicon™ Motion Capture-based system designed for controlling and representing polytempi.

4. TECHNICAL DETAILS
The Vicon™ system works by using multiple cameras that can detect infra-red light reflected off small reflective balls attached to the subject. Belts, hats, gloves and suits adorned with these balls can be worn to allow untethered movement to be recorded within a confined space. The raw camera data is processed by the Vicon™ system into a realtime stream of 3D coordinates, which can be combined into groups representing different bodies and limbs.
For our system, we used two gloves and a belt to determine the position of the hands relative to the body, and the position of the body relative to the space. The data was piped over a TCP/IP network to a PC running Cycling '74's Max/MSP and Jitter. Using C++, we developed a Max/MSP external that converted the data packets into usable Max variables. Variables corresponding to the positions, velocities and orientation of the waist and each hand were connected to the respective control variables of a MIDI playback engine (playing pre-recorded piano or percussive parts), so that they could be appropriately manipulated. In turn, the control variables, together with the status variables of the engine, were then passed to a Jitter patch that constructed a graphical representation of the musical phase plot, to be fed back to the screen.
Despite a diverse collection of protocols, the different technologies integrated well, and a basic system was up and running quickly, allowing us time to iteratively refine the interaction. The system outlined in Section 4 was designed to encapsulate the relative properties of the synchronisation between parts and, in doing so, would provide only limited insight into the more absolute characteristics of the performance – notably, absolute tempo or absolute part position. In Figures 4 and 5, the screen shows the musical phase plot in a 3D perspective, whereby a different plot is presented for each bar of a single part, flying forward in an abstract 3D space, appearing at a distance from the upper right (allowing bars to be read left-to-right), at a speed matching the part's tempo. The user is thus given the impression that they are progressing through the piece, and at what rate.
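The musical phase plot of Section 3.1 can be computed directly from the tempi of two parts. The following Python sketch is an illustration under stated assumptions, not the authors' Max/MSP and Jitter implementation: it assumes constant tempi in BPM and a shared 4/4 metre, and all function names are hypothetical. Each point pairs the position within the bar of part X with that of part Y; the unwrapped gradient equals the tempo ratio, so equal tempi give the 45-degree diagonal, and the modulo operation produces the wrap-around described for "Piano Phase".

```python
def bar_phase(elapsed_s, bpm, beats_per_bar=4):
    """Position within the current bar as a fraction in [0, 1)."""
    beats = elapsed_s * bpm / 60.0
    return (beats / beats_per_bar) % 1.0


def phase_plot(bpm_x, bpm_y, duration_s, step_s=0.05, beats_per_bar=4):
    """Sample (phase_x, phase_y) points of the musical phase plot.

    The modulo in bar_phase makes the line wrap around whenever either
    part crosses a bar line.
    """
    n = int(round(duration_s / step_s))
    return [(bar_phase(k * step_s, bpm_x, beats_per_bar),
             bar_phase(k * step_s, bpm_y, beats_per_bar))
            for k in range(n + 1)]


def plot_gradient(bpm_x, bpm_y):
    """Slope of the unwrapped line: 1.0 is the 45-degree diagonal."""
    return bpm_y / bpm_x


def offset_in_beats(elapsed_s, bpm_x, bpm_y):
    """Signed musical offset of part Y relative to part X, in beats."""
    return elapsed_s * (bpm_y - bpm_x) / 60.0


def beat_misalignment(elapsed_s, bpm_x, bpm_y):
    """Distance (0 to 0.5 beat) from the nearest beat-level alignment."""
    frac = offset_in_beats(elapsed_s, bpm_x, bpm_y) % 1.0
    return min(frac, 1.0 - frac)


def seconds_to_drift(beats, bpm_x, bpm_y):
    """Time for two constant tempi to drift apart by `beats` beats."""
    if bpm_x == bpm_y:
        raise ValueError("equal tempi never drift apart")
    return beats * 60.0 / abs(bpm_y - bpm_x)


if __name__ == "__main__":
    print(seconds_to_drift(1, 120, 121))  # 60.0: one-beat offset
    print(seconds_to_drift(4, 120, 121))  # 240.0: a full 4/4 bar adrift
```

With two hypothetical tempi of 120 and 121 BPM, the parts are one beat apart after 60 s (a point of beat-level alignment, where `beat_misalignment` returns 0) and a whole bar apart after 240 s, which corresponds to the line creeping across the full grid and regaining synchrony "a bar adrift".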
Figure 5. Alejandro Viñao (foreground) using the prototype.

5. DISCUSSION
Following development of the basic architecture, implementation of the prototype followed an iterative design process, based on feedback from our own interaction with the system and from three trials by Alejandro Viñao, which produced positive and useful feedback.
The differences between our interactions and those of a practising composer were revealing. To test the system, we used a variety of movements to ensure a robust and varied interaction. As demonstrated by the video (see Section 7), Alejandro’s gestures were significantly more subtle, focusing on fine control and reiterating the utility of the resolution control.
For Alejandro, the strength of the design concept was already evident in the early version of the system we had available for his first visit. The forwards-backwards tempo control resolution feature was introduced for his second visit and refined in the third, to allow a level of temporal control at which he could comfortably and confidently effect temporal manipulations in the music. Experimenting with temporality in a small selection of pre-prepared pieces, Alejandro mentioned that he was already eager to try the prototype with music of his own creation.
As with many musical instruments, mastering the interaction might have required more than the short exposure afforded Alejandro. In this respect, the “gravity” feature demonstrated its potential as a helper device, assisting actions that would otherwise require fine control and hand-eye coordination, and allowing Alejandro to more easily target and achieve alignment and temporal consonance. The prototypes were only implemented with a basic gravity effect that brought the two parts closer together in absolute musical position. A more flexible feature, whereby the effect might gradually match tempos or align position relative to either the bar, beat or sub-beat, would further improve the usability and creative flexibility of the system.
Alejandro observed that the system was well-suited to inspecting, manipulating and adapting to polytempo processes operating at the level of the bar, either within a bar or relative to the bar line. However, he noted that it was difficult to orientate oneself to the macroscopic aspects of the piece. For example, the system’s feedback when the two parts were offset by half a bar differed only minimally from its feedback when they were offset by one and a half bars, yet the difference could hold important musical implications. The display of absolute positions and tempos was limited to the status readout on the right of the display, together with the appropriate labelling of the axes (indicating the current bar of each part). Similarly, although the extrusion of the plots into 3D succeeded in providing a sense of progress and time passing, the wrap-around lines now leapt from plot to plot, making it harder to observe bar-to-bar trends, and we were led to conclude that the original, static 2D design might be more suitable in most musical applications.
Furthermore, a better macroscopic impression would be afforded by changing the 2D musical phase plot (drawn relative to bar lengths) into one where the axes simply represent a continuing, absolute musical position within each part. The viewport would then pan over the current musical position appropriately. This would remove the need for the wrap-around line; dropping it might reduce the visibility of bar-to-bar relationships, but it could be replaced by appropriate annotations that make bar transitions and relationships more explicit.

6. CONCLUSIONS
This paper identified an area of musical expression that has received relatively little attention from technologists and music researchers. In an effort to tackle the barriers between composers and polytempi, we have proposed and tested both a notation for representing multiple tempi in music and a gestural system for interacting with them in real time.
Alejandro Viñao’s assessment demonstrated the aptness of our underlying design concept, while identifying a number of minor interaction issues that would be relatively easy to address. Our system, however, is but one possible solution to the problem, based on but one suggestion for a notation supporting polytempi. It is yet to be established how well our system scales to pieces with more than two differing tempi (i.e. using multiple plots or multiple axes). Furthermore, a major challenge will be the integration of polytempi notation with both live performance and other elements of music (melody, harmony, dynamics, rhythm, etc.), both of which would afford the composer or conductor greater creative freedom for realising music.

7. SUPPORTING MATERIAL
A computer animation demonstrating the system, including the supported gestures, as well as a video of Alejandro Viñao using the system, are available online at:
http://polytempi.nashnet.co.uk

8. ACKNOWLEDGMENTS
Special thanks to Alejandro Viñao, who presented us with the challenge and gave us invaluable insights into modern musical practice; and additionally to Tristram Bracey and Joe Osborne, who worked on developing the prototype. For a wide range of other insights, thanks to the various groups who kindly met with us: the Digital Technology, NETOS, OPERA, Rainbow and Theory & Semantics research groups here in the Computer Laboratory; the Inference group at the Cavendish Laboratory; the Signal Processing group in the Engineering Department;
and the Socio-Digital Systems team at the Microsoft Research Centre. For additional input thanks also to Ian Cross, of the Centre for Music & Science. Lastly, many thanks to the Leverhulme Trust and the Engineering and Physical Sciences Research Council (EPSRC), without whose financial support this project would not have been possible.

9. REFERENCES
[1] Bellaviti, S. Perception, Reception, and All That Popular Music: An Interview with Alejandro Viñao. In Discourses in Music, 6, 2 (Spr-Sum ’07), University of Toronto, 2007.
[2] Blackwell, A. and Collins, N. The programming language as a musical instrument. In Proceedings of PPIG 2005 (Brighton, UK, June 29-July 1, 2005), 2005, 120-130.
[3] Clarke, E. Rhythm and Timing in Music. In The Psychology of Music (Second Edition, ed. Deutsch, D.), Elsevier Press, 1999, 725-792.
[4] Collins, N. Relating Superhuman Virtuosity to Human Performance. In Proceedings of MAXIS (Sheffield Hallam University, April 12-14, 2002), 2002.
[5] Delahunta, S., McGregor, W. and Blackwell, A. Transactables. Performance Research Journal, 9, 2 (Jun. 2004), 67-72.
[6] Dobrian, C. and Bevilacqua, F. Gestural Control of Music: Using the Vicon 8 Motion Capture System. In Proceedings of NIME ’03 (Quebec, Canada, May 22-24), 2003, 161-163.
[7] Ghent, E. Programmed Signals to Performers: A New Compositional Resource. In Perspectives of New Music, 6, 1, 1967, 96-106.
[8] Greschak, J. Polytempo Music Articles. Available at http://www.greschak.com/polytempo. Last Updated: Jan. 15, 2008. Last Checked: Jan. 30, 2008.
[9] Grout, D. and Palisca, C. A History of Western Music (Fifth Edition). W. W. Norton & Co. Inc., NY, 1996.
[10] Howard, D. and Angus, J. Acoustics and Psychoacoustics (Third Edition). Focal Press, Oxford, UK, 2006.
[11] Jordà, S. New Musical Interfaces and New Music-making Paradigms. In Proceedings of New Interfaces for Musical Expression (ACM CHI’01), ACM Press, New York, 2001.
[12] Ligeti, L. Beta Foly: Experiments with Tradition and Technology in West Africa. In Leonardo Music Journal, 10, 2000, 41-48.
[13] Magnusson, T. and Mendieta, E. The Acoustic, the Digital and the Body: A Survey on Musical Instruments. In Proceedings of NIME ’07 (New York, June 6-10), 2007.
[14] Piston, W. and DeVoto, M. Harmony (Fifth Edition). W. W. Norton & Co. Inc., NY, 1987.
[15] Potter, K. Four Minimalists: La Monte Young, Terry Riley, Steve Reich, Philip Glass (Music in the Twentieth Century). Cambridge University Press, Cambridge, UK, 2000.
[16] Read, G. Music Notation: A Manual of Modern Practice (Second Edition). Taplinger Publishing Company, New York, NY, 1979.
[17] Stockhausen, K. How Time Passes. In die Reihe, 3 (Musical Craftsmanship), 10-40.
[18] Vicon Motion Systems. The Vicon MX Motion Capture System. Detailed at http://www.vicon.com. Last Updated: Jan. 30, 2008. Last Checked: Jan. 30, 2008.
[19] Williamon, A. Musical Excellence: Strategies and Techniques to Enhance Performance. Oxford University Press, Oxford, UK, 2004.
Towards Idiomatic and Flexible Score-based Gestural Control with a Scripting Language

Mikael Laurson, CMT, Sibelius Academy, Helsinki, Finland, laurson@siba.fi
Mika Kuuskankare, CMT, Sibelius Academy, Helsinki, Finland, mkuuskan@siba.fi
ABSTRACT
In this paper we present our recent enhancements in score-based control schemes for model-based instruments. A novel scripting syntax is presented that adds auxiliary note information fragments to user-specified positions in the score. These mini-textures can successfully mimic several well-known playing techniques and gestures, such as ornaments, tremolos and arpeggios, that would otherwise be tedious or even impossible to notate precisely in a traditional way. In this article we focus on several ‘real-life’ examples from the existing repertoire from different periods and styles. These detailed examples explain how specific playing styles can be realized using our scripting language.

Keywords
synthesis control, expressive timing, playing styles

1. INTRODUCTION
The simulation of existing acoustical musical instruments, such as the classical guitar in this study [4], provides a good starting point when one wants to evaluate the quality of a synthesis algorithm and a control system. In this paper we present our recent research dealing with our score-based control scheme [8]. Various aspects of our score-based control system have already been presented in different papers, for instance time modification [5], playing technique realizations [9], and the more recent article dealing with macro-notes [6]. In the following we aim to combine these features and show how realistic playing simulations can be realized in an economical way. We will discuss three larger case studies from the existing guitar repertoire and explain how the system is able to reach convincing simulations. The realizations of these examples can be found as MP3 files on our home page: www.siba.fi/pwgl/pwglsynth.html.
Musical scores in our system are situated within a larger environment called PWGL [7]. PWGL is a visual programming language based on Lisp, CLOS and OpenGL. Scores are of primary importance in our system and they can be used in many compositional and analytical applications, such as producing musical material for instrumental music [3].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

2. SCORE-BASED SYNTHESIS CONTROL
Our score-based control scheme has several unique features. First, the input process is interactive. After listening to the result the user can modify the score and recalculate it until satisfied with the outcome. The user can select and edit any range from the score, polish it and hear the refinements in real time, without re-synthesizing the whole piece. The ability to work with only a small amount of musical material at a time has proven to be very useful. This is especially important when working with musical pieces of considerable length. Second, our system allows the use of performance rules that generate timing information and dynamics automatically, in a similar fashion to [1]. The user can, however, also work by hand using the graphical front-end of the notation package. In this case special expression markings can be inserted directly in the score. We have found that this kind of mixed approach, using automated rules and hand-given timing information, is very practical and allows time modifications to be defined in a more flexible way than using automatic rules only. Third, the system supports both local and global time modifications. The importance of this kind of approach has also been discussed in [2]. Local modifications involve only one note or chord (such as an expression that changes the time interval between notes). A global modification, in turn, handles a group of notes or chords (a typical example of this is a tempo function).

3. MACRO-NOTES
In this section we focus on an important component of our control system called the macro-note. The macro-note implementation has been revised and it is now compatible with our scripting language syntax. This syntax in turn has been used in demanding analytical and compositional tasks. The scripting syntax has a pattern-matching header that extracts complex score information, thus making it straightforward to produce side-effects in a score.
Macro-notes allow the use of notational shorthands, which are translated by the control system into short musical textures. In the simplest case this scheme allows ornaments such as trills and arpeggios to be mimicked. The reason for introducing the macro-note scheme in our system comes from our previous experiences using musical scores to generate control information. Realizing an ornament, say a baroque trill in a dance movement, just by using metrical notation without any abstraction mechanism can be an awkward and frustrating experience. What is worse, the result is typically ruined if the user changes the tempo. Thus, in order to capture the free-flowing accelerandi/ritardandi gestures typically associated with these kinds of ornaments we need better abstraction mechanisms: the system should respond gracefully to tempo changes or to changes in note
Figure 1: Two macro-note realizations that are labelled with "trr". The auxiliary notes are displayed after the main note as note-heads without stems.

duration; the system should know about the current musical context, such as dynamics, harmony and the number of notes in a chord; and the system should have knowledge about the current instrument and how it should react to various playing techniques.

4. MACRO-NOTE SYNTAX
Next we discuss the main features of the macro-note syntax. As already stated above, a macro-note expression uses our scripting syntax, which has three main parts: (1) a pattern-matching part (PM-part), (2) a Lisp-code part, and (3) a documentation string. In the following code example we give a simple macro-note script that adds auxiliary notes to the main note, simulating a repetition gesture (see also Figure 1):

(* ?1 (e ?1 "trr")                          ; (1) PM-part
   (?if (add-macro-note ?1                  ; (2) Lisp-code part
          :dur (synth-dur ?1)
          :dtimes '(.13 30* .12)
          :midis (m ?1)
          :indices 1
          :artic 50
          :time-modif (mk-bpf '(0 50 100) '(90 130 100))
          :update-function 'prepare-guitar-mn-data))
   "repetition")                            ; (3) Documentation

In the PM-part (1) we first state, with a wild-card, '*', and a variable, '?1', that this script is run for each note in the score (thus '?1' will be bound to the current note). Furthermore we check whether the note contains an expression with the label "trr". If this is the case we run the Lisp-code part (2). Here we call the Lisp function 'add-macro-note', which generates a sequence of notes according to its keyword parameters. The arguments are normally numbers, symbols, lists or break-point functions. Internally these arguments are converted to circular lists. In our example we first specify the duration of the sequence (':dur'). Next we give a list of durations (':dtimes'). After this we define the 'pitch-field' of our macro-note, ':midis', which is in our case the midi-value of the current note, '(m ?1)'. A closely related argument, ':indices', follows, which specifies how the pitch-field will be read. Here the pitch-field consists of only one pitch, and using the index 1 we get a sequence of repetitions. Two time-related parameters follow: the first one, ':artic', defines an articulation value (in our case 50 percent, meaning 'half-staccato'); the second, ':time-modif', is a tempo function, defined as a break-point function, where the x-values are relative to the duration of the note (from 0 to 100) and the y-values specify tempo changes as percentage values (100 percent means 'a tempo'). Thus in this gesture we start slower with 90 percent, make an accelerando up to 130 percent, and come back to the 'a tempo' state with 100 percent. Finally, the ':update-function' performs some instrument-specific calibration of the generated macro-note sequence. Figure 1 shows two applications of the macro-note script.

5. REALIZATION EXAMPLES
In this section we discuss three case studies. The first one is a tremolo study realization (the original piece was composed by Francisco Tarrega). The result is given in Figure 2. Although this example is more complex, it follows a similar scheme to the previous one. The following script was used to realize this example. Here the PM-part (1) accesses all chords in a score and runs the Lisp-code part (2) if the chord contains the expression with the label 'trmch' (the variable '?1' will be bound to the current chord). The pitch-field now consists of all sorted midi-values that are contained in the chord. The most complex part of the code deals with the generation of a plucking pattern for the tremolo gesture (see the large 'case' expression). This result defines the ':indices' parameter. Here different patterns are used depending on the note value of the chord. For instance, if the note value is a quarter note, 1/4, then the pattern will be '(2 3)', which will be expanded by the 'add-items' function to '(2 1 1 1 3 1 1 1)'. This means that we use a typical tremolo pluck pattern where we pluck the second note of the pitch-field once and then the first note three times, then the third note once and the first note three times, and so on. We also use here an extra keyword called ':len-function' that guarantees that the sequence is finished after the pattern has reached a given length.
A break-point function controls the overall amplitude contour, ':amp', of the resulting gesture. Note that this contour is added on top of the current velocity value.
Finally, we use two parameters that affect the timing of the result. The ':artic' parameter is now a floating-point value that is interpreted by our system as an absolute time value in seconds, here 5.0s (by contrast, in the previous example we used integers that in turn were interpreted as percentage values). This controls the crucial overlap effect of the tremolo gesture. 5.0s is used here as a shorthand to say: 'keep all sounds ringing'. The calculation of the final durations is, however, much more complicated (for instance the low bass notes will ring longer than the upper ones), but this is handled automatically by the update-function. The ':time-modif' parameter is similar to the one in the previous example: we make an accelerando/ritardando gesture during the tremolo event.

(* ?1 :chord (e ?1 "trmch")                 ; (1) PM-part
   (?if                                     ; (2) Lisp-code part
    (when (m ?1 :complete? T)
      (let* ((ms (sort> (m ?1)))
             inds len-function)
        ;; choose a plucking pattern according to the note value of the chord
        (case (note-value ?1)
          (3/4
           (setq inds (add-items '(4 3 2 3 2 3) 3 1)
                 len-function '(= (mod len 24) 0)))
          (1/4
           (setq inds (add-items '(2 3) 3 1)
                 len-function '(= (mod len 8) 0)))
          (1/2
           (setq inds (add-items '(4 3 2 3) 3 1)
                 len-function '(= (mod len 16) 0))))
        (add-macro-note ?1
                        :dur (synth-dur ?1)
                        :dtimes '(.13 30* .12)
                        :midis (mapcar 'list ms ms)
                        :indices inds
                        :len-function len-function
                        :amp (mk-bpf
                              '(0.0 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                              (g+ '(40 20 0 30 10 50 20 40) (vel ?1)))
                        :artic 5.0
                        :time-modif (mk-bpf '(0 50 100) '(90 130 100))
                        :update-function 'prepare-guitar-mn-data))))
   "tremolo chords")

Our next example is a realization of an arpeggio study by Heitor Villa-Lobos (Figure 3) and the script is quite similar
to the previous one. The main difference is that the pitch-field is sorted according to string number and not according to midi-value, as was the case in the tremolo study example. The ':indices' parameter is also different: now it is static, reflecting the idea of the piece, where the rapid plucking gesture is repeated over and over again.
We combine here two notions of timing control: a global one and a local one. A global tempo function (see the break-point function above the staff that is labelled "/time") makes a slow accelerando gesture lasting for 5 measures. This global timing control is reflected in our script, where the local ':dur' parameter gets gradually shorter and shorter.

(* ?1 :chord (e ?1 "vlarp")                 ; (1) PM-part
   (?if (when (m ?1 :complete? t)           ; (2) Lisp-code part
          ;; pitch-field ordered by string number (taken from the fingering)
          (let* ((ms (mapcar #'midi (sort (m ?1 :object T) #'<
                                          :key #'(lambda (n)
                                                   (first (read-key n :fingering)))))))
            (add-macro-note ?1
                            :dur (synth-dur ?1)
                            :dtimes '(.14 20* .12)
                            :midis (mapcar 'list ms ms)
                            :indices '(6 4 5 3 4 2 3 1 2 1 3 2 4 3 5 4)
                            :artic 1.0
                            :amp (mk-bpf
                                  '(0.0 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                                  (g+ (vel ?1) '(50 30 10 40 20 60 30 50)))
                            :len-function '(= len 32)))))
   "Villa-Lobos arp")

Our final example, an excerpt from J. S. Bach’s Sarabande, is the most complex one, and it is probably also the most delicate one, due to its slow basic tempo. The piece is ornamented with rich improvised textures, such as portamento glides, trills and arpeggios (see Figure 4). In the following we discuss the arpeggio script that is applied three times (see the chords with expressions having the label "carp"). The arpeggio script is similar to the tremolo example in that we have a database of plucking patterns. These are organized here, however, according to the number of notes in the pitch-field. Furthermore, the script can choose randomly (using the 'pick-rnd' function) from several alternatives. This results in arpeggio gesture realizations that are not static but can vary each time the score is recalculated, similar to baroque performance practice, where a player is expected to improvise ornaments.

(* ?1 :chord (e ?1 "carp")                  ; (1) PM-part
   (?if (when (m ?1 :complete? t)           ; (2) Lisp-code part
          (let* ((ms (sort> (m ?1)))
                 ;; plucking patterns organized by the number of notes in the chord
                 (ind (case (length ms)
                        (6 (pick-rnd
                            '(6 5 4 3 2 1 2 3 4 5)
                            '(1 2 1 3 4 3 5 6 5 6 5 4 3 2 1)
                            '(1 2 3 4 5 6 5 4 3 2 1)))
                        (5 (pick-rnd
                            '(5 4 3 2 1 2 3 4 5)
                            '(1 2 1 3 4 3 5 5 4 3 2 1)
                            '(1 2 3 4 5 4 3 2 1)))
                        (4 (pick-rnd
                            '(4 3 2 1 2 3 4)
                            '(1 2 1 3 4 3 4 3 2 1)
                            '(1 2 3 4 4 3 2 1)))
                        (3 (pick-rnd
                            '(3 2 1 2 3)
                            '(1 2 1 3 3 2 1))))))
            (add-macro-note ?1
                            :dur (* 0.95 (synth-dur ?1))
                            :dtimes '(.15 30* .13)
                            :midis (mapcar 'list ms ms)
                            :indices ind
                            :artic 5.0
                            :amp (mk-bpf
                                  '(0.0 0.25 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                                  (g+ (vel ?1) '(50 0 30 10 40 20 60 30 0)))
                            :time-modif (mk-bpf '(0 50 100) '(60 150 90))))))
   "Bach arp")

6. CONCLUSIONS
This paper presents our recent developments dealing with a score-based control system that allows a musical score to be filled with ornamental textures such as trills and arpeggios. After presenting the main syntax features we discussed three larger case studies that aim to show how the macro-note scheme can be used in a musical context.
These examples have been subjectively evaluated by the authors (the first author is a professional guitarist), and we consider that the macro-note scheme clearly improves the musical output of our model-based instrument simulations. While this paper concentrates on the simulation of existing musical instruments, our control scheme could potentially also be used to control new virtual instruments.

7. ACKNOWLEDGMENTS
The work of Mikael Laurson and Mika Kuuskankare has been supported by the Academy of Finland (SA 105557 and SA 114116).

8. REFERENCES
[1] A. Friberg. Generative rules for music performance: A formal description of a rule system. Computer Music Journal, 15(2):56–71, 1991.
[2] H. Honing. From time to time: The representation of timing and tempo. Computer Music Journal, 25(3):50–61, 2001.
[3] M. Kuuskankare and M. Laurson. Expressive Notation Package. Computer Music Journal, 30(4):67–79, 2006.
[4] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare. Methods for Modeling Realistic Playing in Acoustic Guitar Synthesis. Computer Music Journal, 25(3):38–49, Fall 2001.
[5] M. Laurson and M. Kuuskankare. Aspects on Time Modification in Score-based Performance Control. In Proceedings of SMAC 03, pages 545–548, Stockholm, Sweden, 2003.
[6] M. Laurson and M. Kuuskankare. Micro Textures with Macro-notes. In Proceedings of the International Computer Music Conference, pages 717–720, Barcelona, Spain, 2005.
[7] M. Laurson and M. Kuuskankare. Recent Trends in PWGL. In Proceedings of the International Computer Music Conference, pages 258–261, New Orleans, USA, 2006.
[8] M. Laurson, V. Norilo, and M. Kuuskankare. PWGLSynth: A Visual Synthesis Language for Virtual Instrument Design and Control. Computer Music Journal, 29(3):29–41, Fall 2005.
[9] M. Laurson, V. Välimäki, and C. Erkut. Production of Virtual Acoustic Guitar Music. In AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pages 249–255, Espoo, Finland, 2002.
Figure 2: Realization of the opening measures of the tremolo study "Recuerdos de la Alhambra" by Francisco Tarrega.
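The plucking-pattern logic of the tremolo script in Section 5 can be mirrored outside PWGL. The Python sketch below is an illustration under stated assumptions, not the PWGL implementation: `add_items` reproduces the expansion quoted in the text ('(2 3)' becomes '(2 1 1 1 3 1 1 1)'), indices are read 1-based and cyclically (the text says arguments become circular lists), the chord is assumed to be sorted high-to-low so that index 1 is the melody note, and `finish_length` is a hypothetical reading of ':len-function' as "keep adding notes until the count is a multiple of the pattern length". All names here are hypothetical.

```python
from itertools import cycle, islice


def add_items(pattern, count, item):
    """Insert `count` copies of `item` after every element of `pattern`."""
    result = []
    for p in pattern:
        result.append(p)
        result.extend([item] * count)
    return result


def read_pitch_field(pitch_field, indices, n_notes):
    """Read 1-based indices cyclically (like a circular list) over the pitch field."""
    return [pitch_field[i - 1] for i in islice(cycle(indices), n_notes)]


def finish_length(n_min, period):
    """Smallest note count >= n_min that is a multiple of `period`.

    Hypothetical model of :len-function '(= (mod len period) 0).
    """
    return -(-n_min // period) * period


# Tremolo patterns keyed by note value, mirroring the script's 'case' expression.
TREMOLO_PATTERNS = {
    "3/4": add_items([4, 3, 2, 3, 2, 3], 3, 1),
    "1/4": add_items([2, 3], 3, 1),
    "1/2": add_items([4, 3, 2, 3], 3, 1),
}

# Hypothetical chord, assumed sorted high-to-low so index 1 is the melody note.
chord = sorted([52, 76, 59], reverse=True)   # [76, 59, 52]
pattern = TREMOLO_PATTERNS["1/4"]
n = finish_length(13, len(pattern))          # suppose 13 notes fit; pad to 16
notes = read_pitch_field(chord, pattern, n)
```

The expanded pattern lengths for the three note values (24, 8 and 16) coincide with the moduli in the script's ':len-function' forms, which is consistent with reading that keyword as a guarantee that the last tremolo cycle is completed.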
Figure 3: Arpeggio study by Heitor Villa-Lobos. This example is challenging as we use macro-notes mixed
with ordinary guitar notation.
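Ordering the pitch-field by string number rather than by midi-value, the main difference in the Villa-Lobos script, matters whenever a note on a lower string sounds higher than a note on the string above it. The Python sketch below assumes each chord note is a hypothetical (midi, string) pair with string 1 as the highest string, and reads the static ':indices' pattern cyclically; with 16 indices, a 32-note sequence (the script's '(= len 32)') is exactly two passes through the pattern. The function names and the chord are illustrative only.

```python
from itertools import cycle, islice

# Static plucking pattern from the 'vlarp' script (1-based, string-ordered).
VL_INDICES = [6, 4, 5, 3, 4, 2, 3, 1, 2, 1, 3, 2, 4, 3, 5, 4]


def pitch_field_by_string(notes):
    """Order (midi, string) pairs by string number and return the midi values."""
    return [midi for midi, _ in sorted(notes, key=lambda note: note[1])]


def pitch_field_by_midi(notes):
    """Order the same notes by pitch, highest first (tremolo-style ordering)."""
    return sorted((midi for midi, _ in notes), reverse=True)


def arpeggio(pitch_field, indices, length):
    """Read 1-based indices cyclically until `length` notes are produced."""
    return [pitch_field[i - 1] for i in islice(cycle(indices), length)]


# Hypothetical chord in which fretted string 3 (C4) sounds above open string 2 (B3).
chord = [(64, 1), (59, 2), (60, 3), (52, 4), (45, 5), (40, 6)]
by_string = pitch_field_by_string(chord)   # [64, 59, 60, 52, 45, 40]
by_midi = pitch_field_by_midi(chord)       # [64, 60, 59, 52, 45, 40]
notes = arpeggio(by_string, VL_INDICES, 32)
```

With the string ordering, index 3 always addresses the third string, so the right-hand plucking pattern stays physically consistent even where the pitch ordering of strings 2 and 3 is inverted.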
Figure 4: Johann Sebastian Bach: Sarabande. This example contains macro-note arpeggios and trills,
vibrato expressions, a tempo function and a portamento expression.
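The Bach arpeggio script's pattern database, keyed by chord size with a random choice among alternatives, can be sketched as a dictionary. In this Python sketch, `random.Random.choice` stands in for PWGL's 'pick-rnd' (an assumption: a uniform pick among its arguments), the descending sort is an assumed reading of '(sort> (m ?1))', and the patterns are copied from the 'carp' script. `choose_arpeggio` is a hypothetical name.

```python
import random

# Plucking-pattern alternatives from the 'carp' script, keyed by chord size.
ARP_PATTERNS = {
    6: [[6, 5, 4, 3, 2, 1, 2, 3, 4, 5],
        [1, 2, 1, 3, 4, 3, 5, 6, 5, 6, 5, 4, 3, 2, 1],
        [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]],
    5: [[5, 4, 3, 2, 1, 2, 3, 4, 5],
        [1, 2, 1, 3, 4, 3, 5, 5, 4, 3, 2, 1],
        [1, 2, 3, 4, 5, 4, 3, 2, 1]],
    4: [[4, 3, 2, 1, 2, 3, 4],
        [1, 2, 1, 3, 4, 3, 4, 3, 2, 1],
        [1, 2, 3, 4, 4, 3, 2, 1]],
    3: [[3, 2, 1, 2, 3],
        [1, 2, 1, 3, 3, 2, 1]],
}


def choose_arpeggio(midis, rng):
    """Pick one pattern for this chord size and map its indices to pitches.

    Only chord sizes 3 to 6 are covered, as in the script.
    """
    field = sorted(midis, reverse=True)  # assumed reading of (sort> (m ?1))
    pattern = rng.choice(ARP_PATTERNS[len(field)])
    return pattern, [field[i - 1] for i in pattern]


rng = random.Random(2008)  # seeded only to make this example reproducible
pattern, notes = choose_arpeggio([50, 57, 62, 65], rng)
```

Leaving the generator unseeded gives a different ornament each time the score is recalculated, which is the improvisatory behaviour the text describes; a fixed seed is useful only when a particular realization must be reproduced.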
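Two keyword conventions recur in all of the macro-note scripts: ':time-modif' is a break-point tempo function whose x-values run from 0 to 100 over the note and whose y-values are tempo percentages, and ':artic' is read as a percentage when given as an integer and as absolute seconds when given as a float. The Python sketch below assumes linear interpolation between break-points and samples the tempo at each note's nominal onset; neither detail is specified in the text, and `bpf`, `onsets` and `sounding_duration` are hypothetical names.

```python
from bisect import bisect_right


def bpf(xs, ys, x):
    """Piecewise-linear break-point function, held constant outside its range."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    x0, x1, y0, y1 = xs[i], xs[i + 1], ys[i], ys[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def onsets(total, step, tempo_xs, tempo_ys):
    """Real onset times for notes spaced `step` apart in nominal time.

    Each interval is scaled by 100 / tempo%, so 90% is slower and 130% faster.
    """
    nominal, real, out = 0.0, 0.0, []
    while nominal < total:
        out.append(real)
        tempo = bpf(tempo_xs, tempo_ys, 100.0 * nominal / total)
        real += step * 100.0 / tempo
        nominal += step
    return out


def sounding_duration(ioi, artic):
    """Integer artic = percent of the inter-onset interval; float = seconds."""
    if isinstance(artic, float):
        return artic
    return ioi * artic / 100.0


times = onsets(1.0, 0.25, [0, 50, 100], [90, 130, 100])
gaps = [b - a for a, b in zip(times, times[1:])]
```

With the '(90 130 100)' curve of the first two scripts, the first interval is longer than nominal and the interval around the midpoint is shorter, giving the accelerando/ritardando shape described for the repetition and tremolo gestures.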
Enhancing the visualization of percussion gestures by virtual character animation

Alexandre Bouënard, Samsara / VALORIA, Univ. Européenne Bretagne, Vannes, France, alexandre.bouenard@univ-ubs.fr
Sylvie Gibet∗, Bunraku / IRISA, Univ. Européenne Bretagne, Rennes, France, sylvie.gibet@irisa.fr
Marcelo M. Wanderley∗, IDMIL / CIRMMT, McGill University, Montreal, Qc, Canada, marcelo.wanderley@mcgill.ca
∗Also with Samsara / VALORIA, Université Européenne de Bretagne (UEB), Vannes, France

ABSTRACT
A new interface for visualizing and analyzing percussion gestures is presented, proposing enhancements of existing motion capture analysis tools. This is achieved by offering a percussion gesture analysis protocol using motion capture. A virtual character dynamic model is then designed in order to take advantage of gesture characteristics, thereby improving gesture analysis with visualization and interaction cues of different types.

Keywords
Gesture and sound, interface, percussion gesture, virtual character, interaction.

1. INTRODUCTION
Designing new musical interfaces has been one of the most important trends of the past decades. Efforts have constantly been made to develop ever more efficient devices for capturing instrumental gestures. These technical advances have given rise to novel interaction opportunities between digital instruments and performers, and to the creation of new sound, image or tactile synthesis processes. Our main aim is to provide a set of pedagogical tools to help the study of percussion gestures. Among these, rendering real instrumental situations (interaction between performers and instruments) and exploring the gestural space (and its corresponding visual, gestural and sounding effects) are of great interest. Ultimately, our goal is to build new virtual instrumental situations, especially with gesture-sound interactions controlled by virtual characters. This paper offers a new tool for visualizing percussion gestures, which exploits both the analysis and synthesis of percussion gestures. The analysis process is achieved by capturing the movements of performers, while a physical model of a virtual character is designed for the synthesis. The visualization is composed of different views of both the virtual character and the instrument. It is finally enhanced with interactions between graphics modeling, physics synthesis of gesture and sound replay.
The paper is organized as follows. In section 2, previous work and motivations are discussed. The analysis process of percussion (timpani) gestures is detailed in section 3. Visualization and interaction concerns are discussed in section 4. Finally, we conclude with further perspectives.

2. RELATED WORK
Previous work concerns both percussion-related models and interfaces, and work combining virtual character animation and music.
Most of the work on percussion gesture and sound deals with the design of new electronic percussion devices, thus creating new interfaces (controllers), new sound synthesis models and algorithms, or both.
On the one hand, new interfaces are based on increasingly efficient devices that are able to track gestures. Electronic percussion instruments such as Radio Baton [1], Buchla Lightning [3], Korg Wavedrum [20] and ETabla [14] are digital musical instruments that improve on or emulate acoustic phenomena by taking into account gesture cues such as position, touch and pressure. More recent work takes advantage of various techniques, such as magnetic gesture tracking [17], computer vision [16] or the physical modeling of the drum skin [13].
On the other hand, new sound synthesis models and algorithms have been designed, ranging from purely signal-based to physically-based methods [9]. These works rarely include the study of the instrumental gesture as a whole, especially regarding its dynamic aspects or its playing techniques, even if some take into account real measurements [2] and the mapping of physical parameters to percussion gesture [5]. Playing techniques can be qualitatively observed and used ([14] [12] [8]) for a better understanding of percussive gestures.
They can also be quantified thanks to capturing techniques [24], among which the most used is motion capture by camera tracking. But whichever method is used to reproduce the quality of the instrumental gesture, it generally fails to convey its dynamic aspect. That is why in this paper we explore the possibility of physically animating a virtual character performing percussive gestures, so that its intrinsic features are available to our interface.
As for previous work combining virtual character animation and music, very few studies are available, especially
in the sense of taking advantage of virtual character animation to help the visualization of gestures. The DIVA project¹ used virtual character animation for audiovisual performance output driven by MIDI events [11]. Hints about how motion capture characteristics affect the quality of movement re-synthesis [15] have been proposed. The influence of music performance on virtual characters’ behavior [23] has also been emphasized. Some work aims at extracting expressive parameters from video data [4] to enhance video analysis. Finally, another solution consists in directly animating virtual models from the design of sound². These studies nevertheless fall outside the scope of using virtual character animation as a gestural controller for enhancing the visualization and the analysis of instrumental situations.

¹ DIVA project: www.tml.tkk.fi/Research/DIVA
² Animusic: www.animusic.com

3. TIMPANI PERFORMANCE
There are many classifications of percussion instruments; one of the most established typologies is based on the physical characteristics of instruments and the way in which they produce sound. According to this classification, timpani are considered membranophones, "producing sound when the membrane or head is put into motion" [6].

3.1 Timpani Basics
Timpani-related equipment is mainly composed of a bowl, a head and drumsticks (Figure 1). In general, timpanists have to cope with several timpani (usually four) with bowls varying in size [19]. As for timpani drumsticks, they consist of a shaft and a head. They are designed in a wide range of lengths, weights, thicknesses and materials [6], and their choice is of great importance [18].

Figure 1: Timpani player’s toolbox: bowl, head and drumsticks.

Timpani playing is characterized by a wide range of playing techniques. First, there are two main strategies for holding drumsticks (Figure 2, left side): the "French" grip (also called "thumbs-up") and the "German" grip (or "matched" grip).

Figure 2: Left: French (top) and German (bottom) grips; Right: Impact locations on the drumhead.

Players commonly use three distinct locations of impacts (Figure 2, right side). By far the most used is the one-third location, while the rim is used rather rarely.
A database of timpani gestures has been created and is composed of five gestures: legato, tenuto, accent, vertical accent and staccato. Each gesture is presented in Figure 3, showing the space occupation (Y-Z projection) of each drumstick’s trajectory and highlighting the richness of timpani playing pattern variations.

Figure 3: Timpani playing variations: trajectories of the drumstick tip (Y-Z projection). Legato is the standard up-and-down timpani gesture. Tenuto and accent timpani variations show an increase in velocity and a decrease in space occupation (in the Y direction). Vertical accent and staccato timpani variations also show an increase in velocity, and are characterized by an increase of space occupation (in the Y direction) for a more powerful attack and loudness.

Taking into account these various features, timpani gestures are thus characterized by a wide variability. The next section concerns the quantitative capture of these variations.
3.2 Motion capture protocol and database
We propose to quantitatively characterize timpani gestures by capturing the motion of several timpani performers. We use a Vicon 460 camera tracking system³ and a standard DV camera, which together allow the retrieval of both gesture and sound.
The main difficulty with such hardware is the choice of the sampling frequency for the analysis of percussive gestures, because of the short duration of the beat impact [7]. For our experiments, the cameras were set at 250 Hz. With a higher sampling frequency (500 Hz or 1000 Hz), we could expect to retrieve beat attacks more accurately, but the spatial capture range is then significantly reduced, so that it becomes impossible to capture the whole body.
In order to retrieve beat impacts, markers have also been placed on the drumsticks. The smaller timpani (23”) has been used to emphasize stick rebounds.

Figure 4: A subject performing the capturing protocol. The number of markers and their positions follow Vicon’s Plug-in Gait indications.

Three performers (cf. Figure 4) were asked to perform our timpani-dedicated capturing protocol, yielding our timpani gestures database. Table 1 summarizes the playing characteristics of each subject who performed our capturing protocol. The performers differ in their degree of expertise (Professor or Master student), the grip strategy they use (French or German), their dominant hand (Left or Right), and their gender.

Table 1: Timpani gestures data.

Subject  Expertise        Grip  Handedness  Gender
S1       Professor        F     Right       M
S2       Master student   G     Left        M
S3       Master student   G     Right       F

(Grip: F = French, G = German. Gender: M = male, F = female.)

Each performer has been asked to perform a single stroke roll of each gesture variation (legato, tenuto, accent, vertical accent and staccato) presented in Section 3.1. For each of these gestures, the performer has been asked to change the location of the beat impact according to Figure 2 (right side). Our database is thus composed of fifteen examples of timpani playing variations for each subject (five gestures × three impact locations), each example containing five beats per hand. This database will be used when studying the variations of the timpani gesture in detail.
The widespread analysis tools integrated in the Vicon software allow the representation of temporal sequences as Cartesian or angular trajectories (position, velocity, acceleration), but such a representation is not sufficient to finely represent the subtlety of gesture dynamics, and cannot be easily interpreted by performers. In the instrumental gesture context, we are mainly interested in also displaying characteristics such as contact forces, vibration patterns, and a higher-level interpretation of captured data (space occupation, 3D trajectories, orientation of segments).

4. VISUALIZATION
Our visualization framework proposes the design of a virtual instrumental scene, involving the physical modeling and animation of both virtual characters and instruments. Timpani gestures are taken from the database and physically synthesized, making available both kinematic and dynamic cues about the original motion.

4.1 Virtual instrumental scene
A virtual instrumental scene is designed using both graphics and physics layers. The OpenGL graphics API is used for rendering the virtual character, the timpani model, and the motion cues of these entities. It also allows users to explore the virtual instrumental space and to visualize the scene from different points of view.
The ODE physics API [22] is used for the physical simulation of the virtual character and of collisions.

Figure 5: Real-time visualization of segments’ orientations.

These graphics and physics layers build the primary visualization framework. Since the whole gesture is available, this visualization can be enriched with meaningful kinematic and dynamic motion cues.

³ Vicon: www.vicon.com
4.2 Kinematic cues
Kinematic motion cues can be of different types. First, the positions and orientations of any joint and segment composing the virtual character can be visualized in real time (Figure 5) by rendering their coordinate frames.
Temporal trajectories describing the motion can also be traced (Figure 6). These include position, velocity, acceleration and curvature trajectories, as well as plane projections and position/velocity and velocity/acceleration phase plots of segments and joints.
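The kinematic trajectories listed above can be derived directly from sampled marker positions. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it assumes an (N, 3) array of drumstick-tip positions sampled at the 250 Hz capture rate reported in Section 3.2, estimates derivatives with central finite differences (`numpy.gradient`, one choice among several), and also returns an axis-aligned bounding box of the kind rendered in Figure 7. The function name `kinematic_cues` and the synthetic stroke are hypothetical.

```python
import numpy as np

FS_HZ = 250.0  # capture rate reported for the Vicon setup


def kinematic_cues(tip_xyz: np.ndarray, fs: float = FS_HZ):
    """Derive simple kinematic cues from an (N, 3) array of tip positions.

    Returns velocity and acceleration (central finite differences), the
    Z-axis position/velocity phase-plot points, and the axis-aligned
    bounding box (min corner, max corner) of the trajectory.
    """
    dt = 1.0 / fs
    vel = np.gradient(tip_xyz, dt, axis=0)   # (N, 3), units per second
    acc = np.gradient(vel, dt, axis=0)       # (N, 3), units per second^2
    phase_z = np.column_stack((tip_xyz[:, 2], vel[:, 2]))  # (z, dz/dt) pairs
    bbox = (tip_xyz.min(axis=0), tip_xyz.max(axis=0))
    return vel, acc, phase_z, bbox


if __name__ == "__main__":
    # Hypothetical synthetic stroke: one second of a bouncing drumstick tip.
    t = np.arange(250) / FS_HZ
    tip = np.column_stack((np.zeros_like(t),
                           0.02 * np.sin(2 * np.pi * 2 * t),
                           0.15 * np.abs(np.sin(2 * np.pi * 2 * t))))
    vel, acc, phase_z, (lo, hi) = kinematic_cues(tip)
    print(phase_z.shape, lo.round(3), hi.round(3))
```

Finite differences amplify marker noise, so in practice positions would usually be low-pass filtered before differentiation; that step is omitted here.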
Figure 6: Example of kinematic trajectory plot. Tip of the drumstick: position/velocity phase along the Z axis.

Figure 6 shows an example of such plots: the trajectory represents the position/velocity phase of the drumstick, projected on the Z axis.
Although temporal trajectories (Figure 6) convey helpful information about the motion, they cannot yet be visualized at the same time as the rendering of our virtual instrumental scene. We therefore propose the real-time visualization of 3D trajectories and their corresponding bounding boxes (Figure 7). This helps in identifying the gestural space actually used during the performance.

Figure 7: Real-time rendering of 3D trajectory and bounding box. Drumstick tip trajectories help in identifying the gesture space that is actually used.

In addition to these kinematic cues, we offer the visualization of the dynamic characteristics of percussion gestures by physically modeling, simulating and controlling a virtual character.

4.3 Dynamic cues
The aim of visualizing the dynamic profiles of gestures is to facilitate the visualization of the interaction between the virtual character and the percussion model. Interaction information is available thanks to the physical modeling and simulation of instrumental gestures.

4.3.1 Virtual character modeling and simulation
The dynamic simulation of instrumental gestures has been achieved by first proposing a dynamic model of a virtual character, and then putting this physical model into motion through a simulation framework.
The virtual character is modeled by both its anthropometry and its physical representation. The anthropometry comes directly from motion capture. The physical representation of the virtual character is composed of segments (members) articulated by joints, each represented by its physical parameters (mass, volume, degrees of freedom).
The simulation framework is composed of two modules. The first one is the simulation of the equations of motion. Equations (1) and (2) describe the evolution of a solid $S$ of mass $m$. The acceleration of a point $M$ of the solid $S$ is $a_M$, and $F_M$ is the resulting force applied on $S$ at point $M$. The inertia matrix of $S$ expressed at the point $M$ is $I_M$, while $\Omega_S$ represents the angular velocity of $S$. Finally, $\tau_M$ is the resulting torque applied on $S$ at the point $M$.

$$ m\,a_M = F_M \tag{1} $$

$$ I_M\,\dot{\Omega}_S + \Omega_S \times \left( I_M\,\Omega_S \right) = \tau_M \tag{2} $$

Once the joints and members of the virtual character can be simulated by integrating the equations of motion, we offer a way to physically control the virtual character with motion capture data thanks to a Proportional-Integral-Derivative (PID) process (Figure 8).
The PID process translates the motion capture trajectories into forces and torques. Knowing the angular targets $\Theta_T$ and $\dot{\Theta}_T$ from motion capture, and the angular state $\Theta_S$ and $\dot{\Theta}_S$ of the virtual character, the PID computes the torque $\tau$ to be applied; $K_p$, $K_i$ and $K_d$ are coefficients to be tuned. This process completes the simulation framework and enables the virtual character to dynamically replay instrumental timpani sessions.
The interactions between the virtual character, the percussion model and the sound are discussed next.
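The PID process of Section 4.3.1 can be illustrated on the simplest possible case. The sketch below is not the authors' ODE-based implementation: it drives a single hinge joint rotating about one fixed principal axis, for which the gyroscopic term $\Omega_S \times (I_M \Omega_S)$ of Eq. (2) vanishes and the dynamics reduce to $I\,\ddot{\theta} = \tau$. It assumes the textbook PID law $\tau = K_p(\Theta_T-\Theta_S) + K_i\int(\Theta_T-\Theta_S)\,dt + K_d(\dot{\Theta}_T-\dot{\Theta}_S)$; the inertia, gains, time step and target trajectory are hypothetical values chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    integral: float = 0.0  # accumulated angle error

    def torque(self, theta_t, dtheta_t, theta_s, dtheta_s, dt):
        """Torque from the angle error (P and I terms) and velocity error (D term)."""
        err = theta_t - theta_s
        self.integral += err * dt
        return self.kp * err + self.ki * self.integral + self.kd * (dtheta_t - dtheta_s)


def track(target, dtarget, pid, inertia, duration, dt):
    """Simulate one hinge joint driven by the PID; return the tracking errors."""
    theta, omega = target(0.0), 0.0
    errors = []
    for k in range(int(round(duration / dt))):
        t = k * dt
        tau = pid.torque(target(t), dtarget(t), theta, omega, dt)
        omega += dt * tau / inertia   # 1-DOF form of Eq. (2): I * d(omega)/dt = tau
        theta += dt * omega           # semi-implicit Euler position update
        errors.append(target(t + dt) - theta)
    return errors


if __name__ == "__main__":
    amp, freq = 0.5, 2.0  # hypothetical 2 Hz, 0.5 rad joint oscillation

    def target(t):
        return amp * math.sin(2 * math.pi * freq * t)

    def dtarget(t):
        return amp * 2 * math.pi * freq * math.cos(2 * math.pi * freq * t)

    errs = track(target, dtarget, PID(kp=50.0, ki=1.0, kd=2.0),
                 inertia=0.01, duration=2.0, dt=0.001)
    print(max(abs(e) for e in errs[len(errs) // 2:]))
```

Feeding the target velocity into the derivative term, as Figure 8 suggests by listing $\dot{\Theta}_T$ among the inputs, lets the joint follow the captured motion with a small lag instead of merely damping toward each sampled angle.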
These interactions rely on the dynamic characteristics made available by our virtual character dynamic model.

Figure 8: PID process. From the motion capture targets (angles $\Theta_T$ and angular velocities $\dot{\Theta}_T$), the joints’ current state (angles $\Theta_S$ and angular velocities $\dot{\Theta}_S$) and the coefficients to be tuned ($K_p$, $K_i$ and $K_d$), torques $\tau$ are computed to physically control the virtual character.

4.3.2 Interaction
In order to account for the interaction between the virtual character’s sticks and the timpani model, we propose to render a propagating wave on the membrane of the timpani when a beat impact occurs. Although the rendered wave is not an exact solution of the wave equation, this model can take into account the biomechanical properties of the limbs and the properties of the sticks. Once the collision system detects an impact, kinematic and dynamic features, such as the velocity and the impact force, can be extracted. These features set the attributes of the propagating wave, making it possible to visualize the position and the intensity of the impact (Figure 9).
Once the kinematic and dynamic features of motion and physical interactions are obtained, we can set up sound production strategies. In this paper, we limit ourselves to triggering pre-recorded sounds available from the motion capture sessions. These sounds are played when impacts of the virtual character’s sticks are detected on the membrane of the timpani model.
Note that the time at which a sound is played does not depend on the motion capture data, but on the physical simulation and the interaction between the virtual performer and the percussion model. This provides an extensive way of designing new gesture-sound interactions based on both kinematic and dynamic gesture features.

Figure 9: Dynamic cues about beat impact: visualization of the location and magnitude of the attack by the propagation of a wave.

5. CONCLUSION
We have presented in this paper a new interface for visualizing instrumental gestures, based on the animation of a virtual expressive humanoid. This interface facilitates the 3D rendering of virtual instrumental scenes, composed of a virtual character interacting with instruments, as well as the visualization of both kinematic and dynamic cues of the gesture. Our approach is based on the use of motion capture data to control a dynamic character, thus making possible a detailed analysis of the gesture and the control of the dynamic interaction between the entities of the scene. It therefore becomes possible to enhance the visualization of the hitting gesture by showing the effects of the attack force on the membrane. Furthermore, the simulation of movement, including preparatory and interaction movement, provides a means of creating new instrumental gestures, associated with an adapted sound-production process.
In the near future, we expect to enrich the analysis of gesture by extracting relevant features, such as invariant patterns, from the captured motion. We will also introduce an expressive control of the virtual character from a reduced specification of the percussion gestures. Finally, we are currently implementing the connection of our simulation framework to well-known physical modeling sound-synthesis tools such as IRCAM’s Modalys [10] to enrich the interaction possibilities of this framework. A strategy similar to that of existing frameworks such as DIMPLE [21], in which the simulation engine generates Open Sound Control [25] messages, is being considered.

6. ACKNOWLEDGMENTS
The authors would like to thank the people who have contributed to this work, including Prof. Fabrice Marandola (McGill), Nicolas Courty (VALORIA), Erwin Schoonderwaldt (KTH), Steve Sinclair (IDMIL), as well as the timpani performers. This work is partially funded by the Natural Sciences and Engineering Research Council of Canada (Discovery and Special Research Opportunity grants), and the Pole de Competitivite Bretagne Images & Réseaux.

7. REFERENCES
[1] R. Boie, M. Mathews, and A. Schloss. The Radio Drum as a Synthesizer Controller. In Proc. of the 1989 International Computer Music Conference (ICMC89), pages 42–45, 1989.
[2] R. Bresin and S. Dahl. Experiments on gesture: walking, running and hitting. In Rocchesso & Fontana (Eds.): The Sounding Object, pages 111–136, 2003.
[3] D. Buchla. Lightning II MIDI Controller. http://www.buchla.com/. Buchla and Associates’ Homepage.
[4] A. Camurri, B. Mazzarino, M. Ricchetti, R. Timmers, and G. Volpe. Multimodal analysis of expressive gesture in music and dance performances. In A. Camurri, G. Volpe (Eds.): Gesture-Based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag, pages 20–39, 2004.
[5] K. Chuchacz, S. O’Modhrain, and R. Woods. Physical Models and Musical Controllers: Designing a Novel Electronic Percussion Instrument. In Proc. of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 37–40, 2007.
[6] G. Cook. Teaching Percussion. Schirmer Books, 1997. Second edition.
[7] S. Dahl. Spectral Changes in the Tom-Tom Related to the Striking Force. Speech, Music and Hearing Quarterly Progress and Status Report, KTH, Dept. of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden, 1997.
[8] S. Dahl. Playing the Accent: Comparing Striking Velocity and Timing in Ostinato Rhythm Performed by Four Drummers. Acta Acustica united with Acustica, 90(4):762–776, 2004.
[9] C. Dodge and T. A. Jerse. Computer Music: Synthesis, Composition and Performance. Schirmer - Thomson Learning, 1997. Second edition.
[10] N. Ellis, J. Bensoam, and R. Causse. Modalys Demonstration. In Proc. of the 2005 International Computer Music Conference (ICMC05), pages 101–102, 2005.
[11] R. Hanninen, L. Savioja, and T. Takala. Virtual concert performance - synthetic animated musicians playing in an acoustically simulated room. In Proc. of the 1996 International Computer Music Conference (ICMC96), pages 402–404, 1996.
[12] K. Havel and M. Desainte-Catherine. Modeling and Air Percussion for Composition and Performance. In Proc. of the 2004 International Conference on New Interfaces for Musical Expression (NIME04), pages 31–34, 2004.
[13] R. Jones and A. Schloss. Controlling a physical model with a 2D Force Matrix. In Proc. of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 27–30, 2007.
[14] A. Kapur, G. Essl, P. Davidson, and P. Cook. The Electronic Tabla Controller. Journal of New Music Research, 32(4):351–360, 2003.
[15] M. Peinado, B. Heberlin, M. M. Wanderley, B. Le Callennec, R. Boulic, and D. Thalmann. Towards Configurable Motion Capture with Prioritized Inverse Kinematics. In Proc. of the Third International Workshop on Virtual Rehabilitation, pages 85–96, 2004.
[16] T. Mäki-Patola, P. Hämäläinen, and A. Kanerva. The Augmented Djembe Drum - Sculpting Rhythms. In Proc. of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 364–369, 2006.
[17] M. Marshall, M. Rath, and B. Moynihan. The Virtual Bodhran - The Vodhran. In Proc. of the 2002 International Conference on New Interfaces for Musical Expression (NIME02), pages 153–159, 2002.
[18] F. W. Noak. Timpani Sticks. Percussion Anthology. The Instrumentalist, 1984. Third edition.
[19] G. B. Peters. Un-contestable Advice for Timpani and Marimba Players. Percussion Anthology. The Instrumentalist, 1984. Third edition.
[20] G. Rule. Keyboard Reports: Korg Wavedrum. Keyboard, 21(3):72–78, 1995.
[21] S. Sinclair and M. M. Wanderley. Extending DIMPLE: A Rigid Body Simulator for Interactive Control of Sound. In Proc. of the ENACTIVE’07 Conference, pages 263–266, 2007.
[22] R. Smith. Open Dynamics Engine. www.ode.org.
[23] R. Taylor, D. Torres, and P. Boulanger. Using Music to Interact with a Virtual Character. In Proc. of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), pages 220–223, 2005.
[24] A. Tindale, A. Kapur, G. Tzanetakis, P. Driessen, and A. Schloss. A Comparison of Sensor Strategies for Capturing Percussive Gestures. In Proc. of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), pages 200–203, 2005.
[25] M. Wright, A. Freed, and A. Momeni. Open Sound Control: The State of the Art. In Proc. of the 2003 International Conference on New Interfaces for Musical Expression (NIME03), pages 153–159, 2003.
Classification of Common Violin Bowing Techniques Using Gesture Data from a Playable Measurement System
Diana Young
MIT Media Laboratory
20 Ames Street
Cambridge, MA, USA
young@media.mit.edu
ABSTRACT
This paper presents the results of a recent study of common violin bowing techniques using a newly designed measurement system. This measurement system comprises force, inertial, and position sensors installed on a carbon fiber violin bow and electric violin, and enables recording of real player bowing gesture under normal playing conditions. Using this system, performances of six different common bowing techniques (accented détaché, détaché lancé, louré, martelé, staccato, and spiccato) by each of eight violinists were recorded. Using a subset of the gesture data collected, the task of classifying these data by bowing technique was undertaken. Toward this goal, singular value decomposition (SVD) was used to compute the principal components of the data set, and then a k-nearest-neighbor (k-NN) classifier was employed, using the principal components as inputs. The results of this analysis are presented below.

Keywords
bowing, gesture, playing technique, principal component analysis, classification

1. INTRODUCTION
Physical bowing technique is a topic of keen interest in research communities, due to the complexity of the bow-string interaction and the expressive potential of bowing gesture. Areas of interest include virtual instrument development [18], interactive performance [17, 2, 13, 8], and pedagogy [7]. For many applications, reliable recognition of the individual bowing techniques that make up right-hand bowing technique would be a great benefit.
Prior art on the classification of violin bowing technique in particular includes the CyberViolin project [9]. In this work, features are extracted from position data produced by an electromagnetic motion tracking system. A decision tree takes these features as inputs in order to classify up to seven different bowing techniques in real time.
Recently, the task of classifying individual violin bowing techniques was undertaken using gesture data from the Augmented Violin, another playable sensing system [11]. In this work, three bowing techniques (détaché, martelé, and spiccato) were classified using the minimum and maximum bow acceleration in one dimension as inputs to a k-nearest-neighbor (k-NN) algorithm.
In the study presented in this paper, a similar approach was taken to classify violin bowing techniques. Here, however, the analysis incorporated a greater diversity of gesture data, i.e., more data channels, to classify six different bowing techniques. Also, although a k-NN classifier was again used, in contrast to the research described above, the inputs to this classifier were determined by a dimensionality reduction technique using all of the gesture data. That is, the data reduction technique itself determines the most salient features of the data.
The data for this experiment were captured using a new measurement system for violin bowing [16]. Based on the earlier Hyperbow designs [15], this system includes force (downward and lateral bow force), inertial (3D acceleration and 3D angular velocity), and position sensors installed on a carbon fiber violin bow and electric violin, and enables recording of real player bowing gesture under normal playing conditions.

2. BOWING TECHNIQUE STUDY
The primary goal of the bowing technique study was to investigate the potential of using the new bowing measurement system described above to capture the distinctions between common bowing techniques. In this study, the gesture and audio data generated by eight violinists performing six different bowing techniques on each of the four violin strings were recorded for later analysis. The details of the study protocol, experimental setup, and participants are discussed below.

2.1 Study Protocol
In this study, each of the eight participants was asked to perform repetitions of a specific bowing technique originating from the Western “classical” music tradition. To help communicate the kind of bowstroke desired, a musical excerpt (from a work of the standard violin repertoire) featuring each bowing technique was provided from [1]. In addition, an audio example of the bowing technique for each of the four requested pitches was provided to the player. The bowing technique was notated clearly on a score, specifying the pitch and string, the tempo, and any relevant articulation markings, for each set of the recordings.

Two different tempi were used for each of the bowing techniques (on each pitch). First, trials were conducted using a characteristic tempo for each individual bowing technique. Immediately following these, trials were conducted using one common tempo. Though the target trials were actually those conducted with the same tempo across all of the bowing techniques, it was found early on that requesting performances at the characteristic tempo first enabled the players to perform at the common tempo with greater ease.
Both tempi required for each bowing technique were provided by a metronome. In some cases, a dynamics marking was written in the musical example, but the participants were instructed to perform all of the bowstrokes at a dynamic level of mezzo forte. Participants were instructed to take as much time as they required to play through the musical example and/or practice the technique before the start of the recordings, to ensure that the performances would be as consistent as possible.
Three performances of each bowing technique, comprising one trial, were requested on each of the four pitches (one on each string). During the first, preliminary set of recording sessions, which were conducted in order to refine the experimental procedure, participants were asked to perform these bowing techniques on the open strings. The rationale for this instruction was that the current measurement system does not capture any information concerning left-hand gestures. It was observed, however, that players do not play as comfortably and naturally on open strings as when they finger pitches with the left hand. Therefore, in the subsequent recording sessions that comprise the actual technique study, the participants were asked to perform the bowing techniques on the fingered fourth interval above the open string pitch, with no vibrato.
The bowing techniques that comprised this study were accented détaché, détaché lancé, louré, martelé, staccato, and spiccato. Brief descriptions of these techniques may be found in the Appendix.

2.2 Experimental Setup
In each trial of the bowing technique study, the physical gesture data were recorded simultaneously with the audio data produced in the performances of each technique. The experimental setup, depicted in Figure 1, was simple: the custom violin bowing measurement system installed on a CodaBow® Conservatory violin bow [3] and the Yamaha SV-200 Silent Violin™ [14]; headphones (through which the participants heard all pre-recorded test stimuli and the real-time sound of the test violin); an M-Audio Fast Track USB audio interface [4]; and an Apple MacBook with a 2 GHz Intel Core Duo processor (OS X) running Pure Data (Pd) version 0.40.0-test08 [10].
The audio and the gesture data were recorded to file by means of a Pd patch (shown in Figure 1), which encoded the gesture data as multi-channel audio in order to properly “sync” all of the data together. Each file was recorded with a trial number, repetition number, and time and date stamp. The Pd patch also allowed for easy playback of recorded files used as test stimuli.
The recordings took place in the Center for Interdisciplinary Research in Music Media and Technology (CIRMMT) of McGill University. Care was taken to create as quiet and natural a playing environment as possible.

Figure 1: This figure describes the experimental setup used in the recording sessions for the bowing technique study. The top half of the figure shows the interface for the Pd recording patch, and the lower half shows the individual elements of the setup. From left to right, they are the custom violin bowing measurement system installed on a Yamaha SV-200 Silent Violin™ and a CodaBow® Conservatory violin bow; headphones; M-Audio Fast Track USB audio interface; and an Apple MacBook with a 2 GHz Intel Core Duo processor (OS X).

The participants for the bowing technique study included eight violin students from the Schulich School of Music of McGill University, five of whom had taken part in the preliminary testing sessions and therefore already had experience with the measurement system and the test recording setup. The participants were recruited by means of an email invitation and “word of mouth”, and they were each compensated $15 CAD to take part in the study. All of the players were violin performance majors and had at least one year of conservatory-level training. They were also of approximately the same age.¹

¹ These studies received approval from the MIT Committee on the Use of Humans as Experimental Subjects (COUHES) [5].

3. TECHNIQUE STUDY EVALUATION
The main goal of the technique study was to determine whether the gesture data provided by the measurement system would be sufficient to recognize the six different bowing techniques (accented détaché, détaché lancé, louré, martelé, staccato, and spiccato) played by the eight violinists. To begin these classification explorations, only a subset of the gesture data provided by the measurement system was considered. Included in the analyses were data from the eight bow gesture sensors only: the downward and lateral forces; x, y, z acceleration; and angular velocity about the x, y, and z axes.
To answer this question, a simple supervised classification algorithm was used. The k-nearest-neighbor
(k-NN) algorithm was chosen because it is simple and robust for well-conditioned data. Because each data point in the time series was included, the dimensionality of the gesture data vector, 9152 (1144 samples in each time series × 8 gesture channels), was very high. Therefore, the dimensionality of the gesture data set was first reduced before being input to the classifier.

[Scatter plot: Player 1, Principal Components 1-3]
Figure 2: Scatter plot of all six bowing techniques for player 1 (of 8). Accented détaché (square), détaché lancé (triangle), louré (pentagon), martelé (circle), staccato (star), spiccato (diamond). The axes correspond to the first three principal components.

3.1 Computing the Principal Components
Principal component analysis (PCA) is a common technique used to reduce the dimensionality of data [12]. PCA is a linear transform that maps the data set into a new coordinate system such that the variance of the data vectors is maximized along the first coordinate dimension (known as the first principal component). That is, most of the variance is represented, or “explained”, by this dimension. Similarly, the second greatest variance is along the second coordinate dimension (the second principal component), the third greatest variance is along the third coordinate dimension (the third principal component), et cetera. Because the variance of the data decreases with increasing coordinate dimension, higher components may be disregarded for similar data vectors, thus reducing the dimensionality of the data set.
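The PCA described above can be computed with the SVD, as in the study. The NumPy sketch below is illustrative, not the author's MATLAB code; it assumes the rows of the input are examples and the columns are features, and it mean-centers the columns first, which is standard for PCA although the paper does not state whether the data were centered. The squared singular values give the variance along each principal component, which is what justifies discarding the higher components. The function name `pca_svd` is hypothetical.

```python
import numpy as np


def pca_svd(X: np.ndarray, n_components: int):
    """PCA of the rows of X (examples x features) via the thin SVD.

    Returns the column means, the principal axes (n_components x features),
    the scores of each example on those axes, and the fraction of the total
    variance explained by each retained component.
    """
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center every feature
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)                # variance share, descending
    axes = vt[:n_components]
    scores = Xc @ axes.T                           # coordinates in the new basis
    return mean, axes, scores, explained[:n_components]
```

For a 576 × 9152 matrix, the thin SVD returns at most 576 singular values, so keeping only a handful of components is a drastic reduction of the 9152-dimensional input.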
In order to reduce the dimensionality of the bowing gesture data in this study, the data were assembled into a matrix and the principal components were computed using the efficient singular value decomposition (SVD) algorithm. For this bowing technique study, there were 576 recorded examples produced by the participants (8 players × 6 techniques × 4 strings × 3 performances of each), and for each example, 8 channels of the bow gesture data were used. These data were used to form a 576 × 9152 matrix M, which was input to the SVD in order to enable the following analyses.

[Scatter plot: Player 5, Principal Components 1-3]
Figure 3: Scatter plot of all six bowing techniques for player 5 (of 8). Accented détaché (square), détaché lancé (triangle), louré (pentagon), martelé (circle), staccato (star), spiccato (diamond). The axes correspond to the first three principal components.

Before continuing with the classification step, it was informative to illustrate the separability of the bowing techniques produced by the individual players. From the matrix M, a smaller matrix composed of the 72 rows corresponding to each violinist (6 techniques × 4 strings × 3 performances of each) was taken and then decomposed using the SVD algorithm to produce the principal components of each individual player’s bowing data. A scatter plot was then produced for each player’s data, showing the first three principal components corresponding to each bowing technique. Two of these plots are shown in Figures 2 and 3. As these examples show, the bowing techniques of individual players are clearly separable using only three dimensions.
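The construction of M and of the per-player 72-row submatrices can be sketched as follows. The array layout (player, technique, string, performance, channel, sample) and the resulting row ordering are assumptions made for illustration; the paper does not specify them. With this ordering, each player's examples occupy a contiguous block of 6 × 4 × 3 = 72 rows, and each row concatenates the 8 channels of 1144 samples (9152 values). All function names are hypothetical.

```python
import numpy as np

# Sizes reported in the study; the axis order below is a hypothetical choice.
N_PLAYERS, N_TECH, N_STRINGS, N_PERF = 8, 6, 4, 3
N_CHANNELS, N_SAMPLES = 8, 1144


def build_matrix(recordings: np.ndarray) -> np.ndarray:
    """Flatten (player, technique, string, performance, channel, sample)
    data into one row per recorded example: 576 x 9152."""
    n_examples = N_PLAYERS * N_TECH * N_STRINGS * N_PERF          # 576
    return recordings.reshape(n_examples, N_CHANNELS * N_SAMPLES)  # 9152 columns


def player_rows(M: np.ndarray, player: int) -> np.ndarray:
    """Rows of one violinist; contiguous because 'player' is the slowest axis."""
    per_player = N_TECH * N_STRINGS * N_PERF                      # 72
    return M[player * per_player:(player + 1) * per_player]


def first_three_pcs(rows: np.ndarray) -> np.ndarray:
    """Scores of each row on its own first three principal components,
    i.e. the coordinates shown in a per-player scatter plot."""
    centered = rows - rows.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:3].T


def technique_labels() -> np.ndarray:
    """Technique index of each of a player's 72 rows (12 rows per technique)."""
    return np.repeat(np.arange(N_TECH), N_STRINGS * N_PERF)
```

Plotting the three columns returned by `first_three_pcs`, colored by `technique_labels()`, would give a scatter plot of the kind shown in Figures 2 and 3.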
3.2 k-NN Classification

After the principal components were computed with the SVD method, the data were classified using the full data matrix (including all players' data together). Toward this goal, a k-nearest-neighbor (k-NN) classifier was used; specifically, Nabney's MATLAB implementation [6] was employed. In this case, a subset of the data contributed by all of the players was used to train the k-NN algorithm in order to classify the remaining data from all of the players by technique.

In each case, the principal components of the training data set were first computed using the SVD method. The remaining data (to be classified) were then projected into the eigenspace determined by this step. A chosen number of the leading principal components of the training data were then input to the k-NN algorithm, enabling the remaining data to be classified according to technique. For each case, a three-fold cross-validation procedure was followed: the process was repeated as the training data (and the data to be classified) were rotated. The final classification rate estimates were taken as the mean and standard deviation of the classification rates of the cross-validation trials.

Figure 4 illustrates the effect of the number of principal components on the overall classification rate. As seen in Table 1, using 7 principal components enables classification of the 6 bowing techniques with an accuracy of 95.3 ± 2.6% on the remaining data.
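The classification protocol can be sketched as a k-NN classifier inside a three-fold cross-validation. The folds split each player's data two-thirds for training and one-third for testing, as Table 1 describes. This is not Netlab's code, and several simplifications apply:
- The paper does not give k, so k = 3 here is a placeholder.
- The sample standard deviation is an assumption.
- For brevity, the features are taken as already projected onto the principal components. A faithful version would recompute the principal components from each training fold and project the held-out fold into that eigenspace before calling `knn_predict`.

```python
import math
import random
import statistics
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training points nearest to `query` (Euclidean)."""
    nearest = sorted(zip((math.dist(query, x) for x in train_x), train_y))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def assign_folds(groups, folds=3, seed=0):
    """Deal each group's items round-robin into folds, so every player is split evenly."""
    rng = random.Random(seed)
    members = {}
    for i, g in enumerate(groups):
        members.setdefault(g, []).append(i)
    fold_of = [0] * len(groups)
    for idx in members.values():
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % folds
    return fold_of


def cross_validate(xs, ys, groups, k=3, folds=3, seed=0):
    """Rotate the held-out fold; return (mean, sample std) of the per-fold accuracy."""
    fold_of = assign_folds(groups, folds, seed)
    rates = []
    for f in range(folds):
        train = [i for i in range(len(xs)) if fold_of[i] != f]
        test = [i for i in range(len(xs)) if fold_of[i] == f]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        hits = sum(knn_predict(tx, ty, xs[i], k) == ys[i] for i in test)
        rates.append(hits / len(test))
    return statistics.mean(rates), statistics.stdev(rates)


# Toy usage: two well-separated "techniques" played by two "players".
xs = [(0.0, 0.1 * i) for i in range(6)] + [(5.0, 0.1 * i) for i in range(6)]
ys = ["martelé"] * 6 + ["spiccato"] * 6
players = [i % 2 for i in range(12)]
print(cross_validate(xs, ys, players))   # (1.0, 0.0)
```

Stratifying the folds by player, rather than shuffling all 576 rows together, keeps every player represented in both the training and the held-out data of each fold.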
actual \ classified   acc. det.   det. lancé   louré   martelé   staccato   spiccato
acc. det.             0.938       0.010        0.010   0.042     0.000      0.000
det. lancé            0.000       0.917        0.000   0.010     0.021      0.052
louré                 0.000       0.000        0.979   0.000     0.021      0.000
martelé               0.042       0.021        0.000   0.938     0.000      0.000
staccato              0.000       0.010        0.010   0.000     0.979      0.000
spiccato              0.000       0.031        0.000   0.000     0.000      0.969

Table 1: Training on two-thirds of the data from each of the eight players, predicting the remaining third of each player's data (with overall prediction of 95.3 ± 2.6%) with seven principal components. Rows give the actual technique; columns give the classified technique.

Figure 5: Scatter plot of all six bowing techniques for player 6. Accented détaché (square), détaché lancé (triangle), louré (pentagon), martelé (circle), staccato (star), spiccato (diamond). The axes correspond to the first three principal components. As can be seen here, the détaché lancé and spiccato techniques are not separable in three dimensions.
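Every entry of Table 1 is, to three decimals, a multiple of 1/96, and 96 is the number of examples per technique (576 / 6). It is therefore plausible that the table pools raw counts over all three folds, although the paper does not say so. Under that inference, the overall rate can be recomputed as 549 correct out of 576, about 95.3%. The largest off-diagonal entries also point to the confusions discussed in Section 4. The counts below are that inference, not published data.

```python
# Counts inferred from Table 1 (entry x 96); rows = actual, columns = classified.
TECHNIQUES = ["acc. det.", "det. lancé", "louré", "martelé", "staccato", "spiccato"]
COUNTS = [
    [90, 1, 1, 4, 0, 0],
    [0, 88, 0, 1, 2, 5],
    [0, 0, 94, 0, 2, 0],
    [4, 2, 0, 90, 0, 0],
    [0, 1, 1, 0, 94, 0],
    [0, 3, 0, 0, 0, 93],
]


def row_rates(counts):
    """Row-normalise a confusion matrix: share of each actual class per predicted class."""
    return [[c / sum(row) for c in row] for row in counts]


def overall_accuracy(counts):
    """Correct predictions (the diagonal) over all predictions."""
    correct = sum(counts[i][i] for i in range(len(counts)))
    return correct / sum(sum(row) for row in counts)


def most_confused_with(counts, labels, actual):
    """Label that `actual` is most often misclassified as."""
    i = labels.index(actual)
    j = max((j for j in range(len(labels)) if j != i), key=lambda j: counts[i][j])
    return labels[j]


print(round(overall_accuracy(COUNTS), 4))                      # 0.9531
print(most_confused_with(COUNTS, TECHNIQUES, "acc. det."))     # martelé
print(most_confused_with(COUNTS, TECHNIQUES, "det. lancé"))    # spiccato
```

Because every class has the same number of examples, the overall rate also equals the mean of the table's diagonal. The ± 2.6% spread across folds cannot be recovered from the pooled table.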
Figure 4: Mean prediction rates produced by k-NN using two-thirds of the data from each of the eight players to predict the remaining one-third of all player data, with the number of principal components increasing from one to ten (x-axis: Number of Principal Components; y-axis: Classification Rate).

4. DISCUSSION
The results of this bowing technique study are encouraging. Using a relatively small number of principal components, the k-NN classification yielded over 95% average classification of the six bowing techniques produced by the eight participants. Some of the error in this result can be understood from Table 1. This confusion matrix shows that accented détaché is most often misclassified as martelé, which is not surprising, as these two techniques are somewhat similar in execution. Interestingly, there was considerable error from misclassifying détaché lancé as spiccato. Although these two techniques are quite different from each other, Figure 5 implies that they were confused by one of the participants (player 6). This discrepancy alone explains much of the error in classifying these two techniques.

Of course, there is much to be done to build on the work begun here. The analysis described here involved the classification of six different bowing techniques, in which each trial actually comprised a repetition of one of these techniques. An immediate next step is to analyze the same data set using individual bowstrokes. Also, only a subset of the gesture channels captured by the bowing measurement system was used for this study. For future studies that may include more techniques and players, the benefit of the remaining channels should be explored.

The SVD and k-NN algorithms were chosen for this experiment partly for ease of implementation. Other techniques, however, should be evaluated in pursuit of robustness and higher classification rates.

Finally, a more rigorous evaluation of bowing technique classification should include qualitative listening evaluations of the bowing audio to complement the quantitative evaluation of the bowing gesture data.

5. ACKNOWLEDGMENTS
Special thanks to the violinists of the Schulich School of Music of McGill University for their time and participation; to André Roy, Ichiro Fujinaga, and Stephen McAdams for their help in organizing this study; to Roberto Aimi for his Pd expertise; to Joseph Paradiso for discussion; and much gratitude to the violinists from the Royal Academy of Music who participated in the early pilot studies for this research.

6. APPENDIX
Descriptions, taken from [1], of the six bowing techniques featured in this study are included below.

• accented détaché: A percussive attack, produced by great initial bow speed and pressure, characterizes this stroke. In contrast to the martelé, the accented détaché is basically a non-staccato articulation and can be performed at greater speeds than the martelé.

• détaché: Comprises a family of bowstrokes, played on-the-string, which share in common a change of bowing direction with the articulation of each note. Détaché strokes may be sharply accentuated or unaccentuated, legato (only in the sense that no rest occurs between strokes), or very slightly staccato, with small rests separating strokes.

• détaché lancé: “Darting” détaché. Characteristically, a short unaccented détaché bowstroke with some staccato separation of strokes.
• legato: Bound together (literally, “tied”). Without interruption between the notes; smoothly connected, whether in one or several bowstrokes.

• louré: A short series of gently pulsed, slurred, legato notes. Varying degrees of articulation may be employed. The legato connection between notes may not be disrupted at all, but minimal separation may be employed.

• martelé: Hammered; a sharply accentuated, staccato bowing. To produce the attack, pressure is applied an instant before bow motion begins. Martelé differs from accented détaché in that the latter has primarily no staccato separation between strokes and can be performed at faster speeds.

• staccato: Used as a generic term, staccato means a non-legato martelé type of short bowstroke played with a stop. The effect is to shorten the written note value with an unwritten rest.

• spiccato: A slow to moderate speed bouncing stroke. Every degree of crispness is possible in the spiccato, ranging from gently brushed to percussively dry.

7. REFERENCES
[1] J. Berman, B. G. Jackson, and K. Sarch. Dictionary of Bowing and Pizzicato Terms. Tichenor Publishing, Bloomington, Indiana, 4th edition, 1999.
[2] F. Bevilacqua, N. H. Rasamimanana, E. Fléty, S. Lemouton, and F. Baschet. The augmented violin project: research, composition and performance report. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME-06), Paris, 2006.
[3] CodaBow. Conservatory Violin Bow. http://www.codabow.com/.
[4] M-Audio. Fast Track USB. http://www.m-audio.com/.
[5] MIT Committee on the Use of Humans as Experimental Subjects (COUHES). http://web.mit.edu/committees/couhes/.
[6] I. T. Nabney. Netlab neural network software. http://www.ncrg.aston.ac.uk/netlab/index.php.
[7] K. Ng, B. Ong, O. Larkin, and T. Koerselman. Technology-enhanced music learning and teaching: i-maestro framework and gesture support for the violin family. In Association for Technology in Music Instruction (ATMI) 2007 Conference, Salt Lake City, 2007.
[8] J. Paradiso and N. Gershenfeld. Musical applications of electric field sensing. Computer Music Journal, 21(3):69–89, 1997.
[9] C. Peiper, D. Warden, and G. Garnett. An interface for real-time classification of articulations produced by violin bowing. In Proceedings of the 2003 Conference on New Interfaces for Musical Expression (NIME-03), Montreal, 2003.
[10] M. Puckette. Pure Data (Pd). http://www.crca.ucsd.edu/~msp/software.html.
[11] N. Rasamimanana, E. Fléty, and F. Bevilacqua. Gesture analysis of violin bow strokes. Lecture Notes in Computer Science, pages 145–155, 2006.
[12] G. Strang. Linear Algebra and Its Applications. Brooks Cole, Stamford, CT, 4th edition, 2005.
[13] D. Trueman and P. R. Cook. BoSSA: The deconstructed violin reconstructed. In Proceedings of the International Computer Music Conference, Beijing, 1999.
[14] Yamaha. SV-200 Silent Violin. http://www.global.yamaha.com/index.html.
[15] D. Young. Wireless sensor system for measurement of violin bowing parameters. In Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, August 2003.
[16] D. Young. A Methodology for Investigation of Bowed String Performance Through Measurement of Violin Bowing Technique. PhD thesis, M.I.T., 2007.
[17] D. Young, P. Nunn, and A. Vassiliev. Composing for Hyperbow: A collaboration between MIT and the Royal Academy of Music. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME-06), Paris, 2006.
[18] D. Young and S. Serafin. Investigating the performance of a violin physical model: Recent real player studies. In Proceedings of the International Computer Music Conference, Copenhagen, 2007.
Slide guitar synthesizer with gestural control
Jyri Pakarinen
Department of Signal Processing and Acoustics
Helsinki University of Technology
P.O. Box 3000
FI-02015 TKK, Finland
jyri.pakarinen@tkk.fi

Vesa Välimäki
Department of Signal Processing and Acoustics
Helsinki University of Technology
P.O. Box 3000
FI-02015 TKK, Finland
vesa.valimaki@tkk.fi

Tapio Puputti
Helsinki University of Technology
P.O. Box 3000
FI-02015 TKK, Finland
tapio.puputti@tkk.fi
ABSTRACT
This article discusses a virtual slide guitar instrument, recently introduced in [7]. The instrument consists of a novel physics-based synthesis model and a gestural user interface. The synthesis engine uses energy-compensated time-varying digital waveguides. The string algorithm also contains a parametric model for synthesizing the tube-string contact sounds. The real-time virtual slide guitar user interface employs optical gesture recognition, so that the user can play this virtual instrument simply by making slide guitar playing gestures in front of a camera.

Keywords
Sound synthesis, slide guitar, gesture control, physical modeling

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
The terms slide guitar and bottleneck guitar refer to a specific traditional playing technique on a steel-string acoustic or electric guitar. When playing the slide guitar, the musician wears a slide tube on the fretting hand. Instead of pressing the strings against the fretboard, she or he glides the tube on the strings while the picking hand plucks the strings in a regular fashion. This produces a unique, voice-like tone with stepless pitch control. Although the tube is usually slid along all six strings, single-note melodies can be played by plucking just one string and damping the others with the picking hand. The slide tube, usually made of glass or metal, also generates a squeaking sound while moving along the wound metal strings. In most cases, the slide guitar is tuned to an open tuning (for example the open G tuning: D2, G2, D3, G3, B3, and D4, starting from the thickest string). This allows the user to play simple chords just by sliding the tube into different positions on the guitar neck. The player usually wears the slide tube on the pinky or ring finger, and the other fingers are free to fret the strings normally.

A virtual slide guitar (VSG) [7, 4] is described in this paper. The VSG consists of an infra-red (IR) camera, an IR-reflecting slide tube and ring, a computer running a physics-based string algorithm, and a loudspeaker. The VSG is played by wearing the slide tube on one hand and the ring on the other, and by making guitar-playing gestures in front of the camera. The user's gestures are mapped into synthesis control parameters, and the resulting sound is played back through the loudspeaker in real time. More information on gestural control of music synthesis can be found, e.g., in [8] and [16].

From the control point of view, the VSG can be seen as a successor of the virtual air guitar (VAG) [1] developed at Helsinki University of Technology a few years ago. The major difference between these gesture-controlled guitar synthesizers is that, as on a real slide guitar, the VSG allows continuous control over the pitch, and it also sonifies the contact sounds emanating from the sliding contact between the slide tube and the imaginary string.

The VSG uses digital waveguides [11, 12] for synthesizing the strings. A model-based contact sound generator is added for simulating the friction-based sounds created by the sliding tube-string contact. More information on physics-based sound synthesis methods can be found in [14].

Figure 1: The signal flow diagram of the slide guitar string synthesizer. The energy compensation block compensates for the artificial energy losses due to the time-varying delays. The contact sound generator (see Figure 2) simulates the handling noise due to the sliding tube-string contact. (Diagram blocks between input x(n) and output y(n): contact sound generator CSG, loop filter H_l(z), energy compensation g_c, fractional delay z^(-L_f), and integer delay z^(-L_I).)

2. STRING SYNTHESIS
A single-delay loop (SDL) digital waveguide (DWG) model [2] with time-varying pitch forms the basis of the slide guitar synthesis engine, as illustrated in Fig. 1. The string model consists of a feedback delay loop with an additional loop filter, an energy scaling coefficient, and a contact sound generator block. The fractional delay filter in Fig. 1 allows for a smooth transition between pitches, and also enables the correct tuning of the string. There are several techniques for implementing fractional delay filters; a thorough tutorial can be found in [3]. For the purpose of this work, a fifth-order Lagrange interpolator was found to work sufficiently well. Note that both the integer delay line length and the fractional delay filter are time-varying, i.e., the user controls the total loop delay value, and thus also the pitch, at run time.
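The loop-delay bookkeeping can be sketched as follows. This is a minimal sketch under stated assumptions, not the VSG code:
- The fundamental of an SDL string is taken as f0 = fs / N for a total loop delay of N samples, at the 22.05 kHz loop rate given in Section 3.1.
- The small phase delay of the loop filter is ignored.
- The open G notes are mapped to frequencies with 12-tone equal temperament at A4 = 440 Hz.
- A relative string length L shortens the delay in proportion, as for an ideal string.

The fractional part uses fifth-order Lagrange interpolation coefficients h(n) = ∏_{k≠n} (D − k)/(n − k), n = 0, ..., 5. D is kept between 2 and 3, near the filter's centre, where Lagrange interpolators perform best. The helper names are hypothetical.

```python
from __future__ import annotations

import math

# Open G tuning from the text, thickest string first, as MIDI note numbers.
OPEN_G = {"D2": 38, "G2": 43, "D3": 50, "G3": 55, "B3": 59, "D4": 62}
LOOP_FS = 22050.0   # waveguide loop sampling rate (Section 3.1)
ORDER = 5           # fifth-order Lagrange interpolator


def midi_to_hz(note: int) -> float:
    """12-TET frequency with A4 (MIDI 69) = 440 Hz (an assumption)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def loop_delay(f_open: float, rel_length: float = 1.0, fs: float = LOOP_FS) -> float:
    """Total loop delay in samples; a shorter string gives a shorter loop and a higher pitch.

    Simplification: ignores the loop filter's phase delay.
    """
    return fs * rel_length / f_open


def split_delay(total: float, order: int = ORDER) -> tuple[int, float]:
    """Integer delay-line length plus a Lagrange delay D in [(order-1)/2, (order+1)/2)."""
    base = (order - 1) // 2
    whole = math.floor(total)
    n_int = whole - base
    if n_int < 1:
        raise ValueError("loop delay too short for this interpolator order")
    return n_int, base + (total - whole)


def lagrange_coeffs(d: float, order: int = ORDER) -> list[float]:
    """FIR coefficients h[0..order] approximating a delay of d samples."""
    h = []
    for n in range(order + 1):
        c = 1.0
        for k in range(order + 1):
            if k != n:
                c *= (d - k) / (n - k)
        h.append(c)
    return h


for name, note in OPEN_G.items():
    n_int, d = split_delay(loop_delay(midi_to_hz(note)))
    print(f"{name}: {midi_to_hz(note):7.2f} Hz -> {n_int} + {d:.3f} samples")
```

As the user slides, `loop_delay` changes continuously. Only `d`, and occasionally `n_int`, need updating, which is why both parts are time-varying in the model.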
Figure 2: The contact sound generator block. The sliding velocity controlled by the user commands the synthetic contact noise characteristics. The sub-blocks are (a) the noise pulse generator, (b) a resonator creating the first harmonic of the time-varying noise structure, (c) a static nonlinearity generating the upper time-varying harmonics, and (d) an IIR filter simulating the general spectral characteristics of the noise. (Diagram signals: the input L(n) passes through a first difference with z^(-1), a smoothing block, an absolute value |x| and scaling by n_w, giving f_c(n); the outputs are weighted by g_bal, 1 − g_bal, g_TV and g_user.)

The loop filter is a one-pole lowpass filter that simulates the vibrational losses of the string. Different filter parameters are used depending on the length and type of the string, as suggested in [15]. Also, when the length of a DWG string is changed during run time, the signal energy varies [5]. In practice, this can be heard as an unnaturally quick decay of the string sound. A time-varying scaling technique, introduced in [5], was used as compensation. This results in an additional scaling operation inside the waveguide loop, as illustrated in Fig. 1.

2.1 Contact Sound Synthesis
The handling sounds created by the sliding tube-string contact are very similar to the handling sounds of a sliding finger-string contact. A recent study [6] revealed that these squeaky sounds consist mainly of lowpass-type noise with both static and time-varying harmonic components. The lowpass cutoff frequency, the frequencies of the time-varying harmonics, and the overall magnitude of the contact noise are controlled by the sliding velocity.

For synthesizing the handling sounds, we chose a noise pulse train as the excitation signal. This is based on the assumption that when the tube slides over a single winding, it generates a short, exponentially decaying noise burst. The time interval between the noise pulses is controlled by the sliding velocity; a fast slide results in a temporally dense pulse train, while a slow slide makes the pulses appear further apart. In fact, the contact sound synthesizer can be seen as a periodic impact sound synthesis model rather than a friction model.

The general structure of the contact noise generator block is illustrated in Fig. 2. The input variable L(n) denotes the relative string length, controlled by the distance between the user's hands; n is the time index. Since the contact noise depends on the sliding velocity, a time difference is taken from the input signal. If the control rate of the signal L(n) differs from the sound synthesis sampling rate, as is often the case, a separate smoothing block is required after the differentiator. The smoothing block changes the sampling rate of L(n) to equal the sound synthesis sampling rate and uses polynomial interpolation to smooth the control signal. Furthermore, since the contact noise is independent of the direction of the slide (up or down the string), the absolute value of the control signal is taken. The scaling coefficient n_w denotes the number of windings on the string. The signal f_c after this scaling can therefore be seen as the noise pulse firing rate.

The basis of the synthetic contact sound for wound strings is produced in the noise pulse train generator (Fig. 2, block (a)). It outputs exponentially decaying noise pulses at the given firing rate. In addition, the type of the string determines the decay time and duration of an individual pulse. To enhance the harmonic structure of the contact noise on wound strings, the lowest time-varying harmonic is emphasized by filtering the noise pulse train with a second-order resonator (block (b)), where the firing rate controls the resonator's center frequency. The higher harmonics are produced by distorting the resonator's output with a suitable nonlinear waveshaper (block (c)). A scaled hyperbolic tangent function is used for this; hence, the number of higher harmonics can be controlled by changing the scaling of this nonlinear function.

A 4th-order IIR filter (block (d)) is used for simulating the static longitudinal string modes and the general spectral shape of the contact noise. As the noise characteristics depend on the tube material and string type, different filter parameters are used for different slide tube and string configurations. In Fig. 2, the scaling coefficient g_bal controls the ratio between the time-varying and static contact sound components. Finally, the total amplitude of the synthetic contact noise is controlled by the slide velocity f_c(n), via a scaling coefficient g_TV. Parameter g_user allows the user to control the overall volume of the contact sound. For plain, i.e. unwound, strings, the contact sound synthesis block is simplified by replacing the noise burst generator (block (a) in Fig. 2) with a white noise generator, and by omitting blocks (b), (c), and (d).

3. REAL-TIME IMPLEMENTATION
Since the user controls the pitch of the VSG in a continuous manner, it is important that there is no large latency between the user's action and the resulting sound. Thus, a high frame rate (120 fps) infra-red (IR) camera is used for detecting the user's hand locations. The camera operates by lighting the target with IR LEDs and sensing the reflected IR light. A real slide tube coated with IR-reflecting fabric is used for detecting the user's fretting hand. For recognition of the picking hand, a small ring of IR-reflecting fabric is worn on the index finger.

3.1 Technical Details
The implementation runs on a 2.66 GHz Intel Pentium 4 CPU with 1 GB of RAM and a SoundMax Integrated Digital Audio soundcard. Both the sound synthesis part and the camera interface operate in the Windows XP environment. The sound synthesis uses PD (Pure Data) [9] version 0.38.4-extended-RC8. The sampling frequency for the synthesis algorithm is 44.1 kHz, except for the string waveguide loop, which runs at 22.05 kHz, as suggested in [13]. A Naturalpoint TrackIR4:PRO USB IR camera is used for gesture recognition. Its output is a 355 × 290 binary matrix, in which the reflecting areas appear as blobs. As a side note, a recent article describing a PD patch for multichannel guitar effects processing can be found in [10].

3.2 Camera API
For the camera API (Application Programming Interface), Naturalpoint's OptiTrack SDK version 1.0.030 was used. The API was modified in the Visual Studio environment to include gesture-recognition features. The added features consist of the distinction between the two blobs (i.e. slide and plucking hand), calculation of the distance between them, recognition of the plucking and pull-off gestures, and transmission of the control data to PD as OSC (Open Sound Control) messages. Also, an algorithm was
added to keep track of the virtual string location, i.e. an imaginary line representing the virtual string. This is very similar to the work presented in [1]. The line is drawn through the tube and the averaged location of the plucking hand, so that the virtual string slowly follows the player's movements. This prevents the user from drifting away from the virtual string. The API detects the direction of the plucking hand movement, and when the virtual string is crossed, a pluck event and a direction parameter are sent. Also, a minimum velocity limit is defined for the plucking gesture in order to avoid false plucks.

3.3 PD Implementation
When the PD implementation receives an OSC message containing a pluck event, an excitation signal is inserted into each waveguide string. The excitation signal is a short noise burst simulating a string pluck. There is also a slight delay (20 ms) between different string excitations for creating a more realistic strumming feel. The order in which the strings are plucked depends on the plucking direction. Figure 3 illustrates the structure and signaling of the PD patch.

Figure 3: Structure and signaling of the PD patch.

The camera software can be set to show the blob positions on screen in real time. This is not required for playing, but it helps the user to stay in the camera's view. The camera API uses roughly 10% of CPU power without the display and 20–40% with the display turned on. Since PD uses up to 80% of CPU power when playing all six strings, the current VSG implementation can run all six strings in real time without a noticeable drop in performance, provided that the blob tracking display is turned off. By selecting fewer strings, switching the contact sound synthesis off, or halving the API frame rate, the user can view the display while playing.

3.4 Virtual Slide Guitar
The virtual slide guitar system is illustrated in Fig. 4. The camera API recognizes the playing gestures and sends the plucking and pull-off events, as well as the distance between the hands, to the synthesis control block in PD. The synthesis block consists of the DWG models illustrated in Fig. 1. At its simplest, the VSG is easy to play and needs no calibration. The user simply puts the slide tube and reflecting ring on and starts to play. For more demanding users, the VSG provides extra options, such as altering the tuning of the instrument, selecting the slide tube material, setting the contact sound volume and the balance between static and dynamic components, or selecting an output effect (a reverb or a guitar amplifier plugin).

The tube-string contact sound gives the user direct feedback on the slide tube movement, while the pitch of the string serves as a cue for the tube position. Thus, visual feedback is not needed in order to know where the slide tube is situated on the imaginary guitar neck.

Figure 4: The complete components of the virtual slide guitar. (Diagram: IR camera, PC running the camera API (control) and PD (synthesis), and soundcard; the camera view shows example control values BlobDist=78p in the camera API and StrLength=0.49 in PD, with Pluck=up and PullOff=false.)

4. CONCLUSIONS
This paper discussed a real-time virtual slide guitar synthesizer with camera-based gestural control. Time-varying digital waveguides with energy compensation are used for simulating the string vibration. The contact noise between the strings and the slide tube is generated with a parametric model. The contact sound synthesizer consists of a noise pulse generator, whose output is fed into a time-varying resonator and a distorting nonlinearity. By controlling the noise pulse firing rate, the resonator's center frequency, and the overall dynamics with the sliding velocity, a realistic time-varying harmonic structure is obtained in the resulting synthetic noise. The overall spectral shape of the contact noise is set with a 4th-order IIR filter.

The slide guitar synthesizer is operated using an optical gesture recognition user interface, similar to the one suggested in [1]. However, instead of a web camera, a high-speed infrared video camera is used for attaining a lower latency between the user's gesture and the resulting sound. This IR-based camera system could also be used for gestural control of other latency-critical real-time applications. The real-time virtual slide guitar model has been realized in PD. A video file showing the virtual slide guitar in action can be found on the Internet: http://youtube.com/watch?v=eCPFYKq5zTk.

5. ACKNOWLEDGMENTS
This work has been supported by the GETA graduate school, the Cost287-ConGAS action, the EU FP7 SAME project, and the Emil Aaltonen Foundation.

6. REFERENCES
[1] M. Karjalainen, T. Mäki-Patola, A. Kanerva, and A. Huovilainen. Virtual air guitar. J. Audio Eng. Soc., 54(10):964–980, Oct. 2006.
[2] M. Karjalainen, V. Välimäki, and T. Tolonen. Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond. Computer Music J., 22(3):17–32, 1998.
[3] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Proc. Mag., 13(1):30–60, 1996.
[4] J. Pakarinen. Modeling of Nonlinear and Time-Varying Phenomena in the Guitar. PhD thesis, Helsinki University of Technology, 2008. Available on-line at http://lib.tkk.fi/Diss/2008/isbn9789512292431/ (checked Apr. 14, 2008).
[5] J. Pakarinen, M. Karjalainen, V. Välimäki, and S. Bilbao. Energy behavior in time-varying fractional delay filters for physical modeling of musical instruments. In Proc. Intl. Conf. on Acoustics, Speech, and Signal Proc., volume 3, pages 1–4, Philadelphia, PA, USA, Mar. 19–23, 2005.
[6] J. Pakarinen, H. Penttinen, and B. Bank. Analysis of handling noises on wound string. J. Acoust. Soc. Am., 122(6):EL197–EL202, Dec. 2007.
[7] J. Pakarinen, T. Puputti, and V. Välimäki. Virtual slide guitar. Computer Music J., 32(3), 2008. Accepted for publication.
[8] J. Paradiso and N. Gershenfeld. Musical Applications of Electric Field Sensing. Computer Music J., 21(2), 1997.
[9] M. Puckette. Pure data. In Proc. Intl. Computer Music Conf., pages 269–272, 1996.
[10] M. Puckette. Patch for guitar. In Proc. PureData Convention 07, Aug. 21–26, 2007. Available on-line at http://artengine.ca/~catalogue-pd/19-Puckette.pdf (checked Apr. 9, 2008).
[11] J. O. Smith. Physical modeling using digital waveguides. Computer Music J., 16(4):74–87, Winter 1992.
[12] J. O. Smith. Physical Audio Signal Processing. Aug. 2004 Draft, http://ccrma.stanford.edu/~jos/pasp/.
[13] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Janosy. Physical modeling of plucked string instruments with application to real-time sound synthesis. J. Audio Eng. Soc., 44(5):331–353, 1996.
[14] V. Välimäki, J. Pakarinen, C. Erkut, and M. Karjalainen. Discrete-time modelling of musical instruments. Reports on Progress in Physics, 69(1):1–78, Jan. 2006.
[15] V. Välimäki and T. Tolonen. Development and calibration of a guitar synthesizer. J. Audio Eng. Soc., 46(9):766–778, 1998.
[16] M. Wanderley and P. Depalle. Gestural control of sound synthesis. Proc. IEEE, 92(4):632–644, 2004.
An Approach to Instrument Augmentation: the Electric Guitar
Otso Lähdeoja
CICM MSH Paris Nord, Paris 8 University
4, rue de la croix faron 93210 St Denis, France
(+33) 01 49 40 66 12
otso.lahdeoja@free.fr

ABSTRACT
In this paper we describe ongoing research on augmented instruments, based on the specific case study of the electric guitar. The key question of the relationship between gesture, instrument and sound is approached via an analysis of the electric guitar's design, playing technique and interface characteristics. The study points out some inherent shortcomings in the guitar's current forms of acoustic-electric hybridization, as well as new perspectives for a better integration of the relationship between instrumental gesture and signal processing. These considerations motivate an augmented guitar project at the CICM, in which a gestural approach to augmentation is developed, emphasising the role of the instrumentist's repertoire of body movements as a source for new gesture-sound « contact points » in the guitar playing technique.

Keywords
Augmented instrument, electric guitar, gesture-sound relationship

1. INTRODUCTION
The current research field on augmented instruments is motivated by the assumption that the combination of traditional acoustic instruments with today's sound technology yields a high potential for the development of tomorrow's musical instruments. Integrating the tactile and expressive qualities of the traditional instruments with the sonic possibilities of today's digital audio techniques creates a promising perspective for instrument design. A substantial research effort has already been conducted in the field of instrument augmentation. Some projects, like the MIT hyperinstruments group [7] [5], the augmented violin of IRCAM [3] [9], and research work at STEIM [8], have attained emblematic status, establishing technological and methodological models in the research field of augmentation, such as the use of sensor technology and signal analysis techniques to « tap into » the instrumental gesture.

1.1. Electric guitar - precursor of augmentation
A short survey of the aforementioned works on instrument augmentation shows that there has been a general tendency to work with acoustic instruments from the classical orchestra instrumentarium. In the research project presented here, we are working on a form of augmentation of the electric guitar, which distinguishes itself as already being an acoustic–electric hybrid instrument. Initially developed as an augmented instrument of its time, the electric guitar is intrinsically connected with technology. Over the decades, it has undergone extensive experimentation and development following the technological shifts from analogue electronics to MIDI, and to digital audio. The electric guitar itself embodies key issues of live electronic music, such as signal processing, amplification, interface and control. Moreover, this « live electronic » praxis is, and has been, widely shared, tested, and discussed by a worldwide community of users, in a wide variety of musical styles and expressions. With all its effects, pedals, amplifiers, and more recently computers, the electric guitar stands out as a pioneer instrument in the area of acoustic–electronic hybridization.

Nevertheless, as we will try to demonstrate in this article, the solutions adopted by the electric guitar fall far short of an ideal augmented instrument. In its current state, it offers a complex and often clumsy working environment, and much too often a stereotyped, reductive approach to the musical possibilities of signal processing and synthesis. For us, the real point of interest lies in understanding the causes of the rather poor integration between the playing technique, the guitar and the electronics, as found in current electric guitar set-ups. This leads us to question, on a fundamental level, the design of the gesture–sound relationship in an augmented instrument.

2. TECHNOLOGICAL EVOLUTION OF THE ELECTRIC GUITAR
The first electric guitar (Rickenbacker « frying pan », patent filed in 1934) was an amplified acoustic guitar, motivated by popular music's need for louder volume levels. Its timbral qualities were poor compared to acoustic instruments, and because of this it was disregarded in its early days [11]. From the 1950s onward, technological progress and the rise of new popular music styles promoting the values of individuality and originality created a demand for large-scale experimentation with the sound possibilities offered by the new electric instrument. Starting with the development of guitars, microphones and amplifiers, the experimentation went on to signal processing with analog « effects », guitar-driven analog synthesisers (Roland GR-500, 1977), bridges between audio and the MIDI protocol (guitar synthesiser, Roland GR-700, 1984), and the adoption of digital audio processing in the early 1980s [10]. Currently, the electric guitar is continuing its development through the integration of ever more powerful microprocessors, whether built into the instrument, as in the « many-guitars-in-one » modeling system proposed by Line6 (Variax), or on a PC with a « plug & play » environment like the « guitar rig » of Native Instruments. Other approaches are being explored (and
commercialised), like the Gibson HD Digital Guitar. It features onboard analog-to-digital conversion and an Ethernet connection that outputs an individual digital audio stream for each string.

3. SOUND AND GESTURE RELATIONSHIP IN THE ELECTRIC GUITAR
3.1. The gesture – string – signal continuum
The basis of the electric guitar is the transduction of the vibrating strings' mechanical energy into electricity by a microphone. The electromagnetic pick-up converts the vibration of the string directly into voltage. This creates an immediate causal relationship between the instrumental gesture that provides the initial energy, the string, and the electric signal produced. The basis of the electric guitar thus preserves the fundamental characteristic of acoustic instruments: the connection between gesture and sound through direct energy transduction, as described by Claude Cadoz [4]. This intimacy ensures a high-quality instrumental relationship between the player and the guitar, which has certainly contributed to the success of the electric guitar over other experimental instruments. Players experience an immediate response, multimodal feedback (haptic, aural, and to a lesser degree visual), and a sense of « connectedness » to the instrument.

3.2. A cumulative model of augmentation
While the basis of the electric guitar is a genuine « electrified » acoustic instrument, its hybrid quality becomes more abstruse with the addition of various sound-shaping modules or « effects », which are essential in creating the instrument's tone. These analog or digital extensions are powered by electricity and have no direct energy connection to the initial playing gesture, so alternative strategies for their control must be conceived. This constitutes a second « level » of the hybrid instrument. At this level, the gesture–sound relationship has to be designed solely by creating correspondences between acquired gesture data and sound processing parameters. The design of the « electric » level of the hybrid instrument is a central question of instrument augmentation. It is all the more challenging because the electric « implant » should integrate and enhance the instrument without hindering the sonic possibilities and playing technique of the acoustic level.

In the case of the electric guitar, the question of coexistence between the acoustic and electric levels of the instrument has been addressed with a cumulative model of augmentation. In this process, the electric level is conceived as an extension of the initial acoustic instrument, leaving the latter relatively intact. Thus the core of the electric guitar does not differ much from that of the acoustic guitar: both hands are involved in the initial sound production, working on the mechanical properties of the strings. The augmented part of the instrument is grafted « on top » of this core by adding various sound processing modules and their individual control interfaces. The consequences of this cumulative process of augmentation are twofold:
1) The playing environment becomes more complex as interfaces are added, each new module requiring a separate means of control. Moreover, since both hands work mainly on the initial sound production, the control of the augmented level has to be relegated to the periphery of the playing technique. It must use the little free « space » that can be found between the existing hand gestures, or in other parts of the body such as the feet. Because of this marginal position, the control of signal processing seems very limited and qualitatively poor compared with the sonic possibilities offered by the technologies used.
2) The instrument undergoes spatial extension, going from a single object to a collection of interconnected modules. A common electric guitar playing environment comprises a guitar, a set of « effect » pedals and an amplifier. Together these form an environment that may easily exceed a single person's physical capacity for simultaneous control.

It appears to us that the cumulative approach to « electrification » and augmentation adopted by the electric guitar carries inherent problems for the control of signal processing, which degrade its sonic and expressive possibilities. Nevertheless, the established modular set-up of the electric guitar is currently undergoing a profound transformation with the advent of digital audio computing, either within the guitar itself or in PC plug-and-play environments. This development could offer an opportunity to redesign the electric guitar by efficiently integrating the signal processing with the player's gestures. The electronic graft could then be connected to the instrument and to its playing technique on a fundamental level.

4. « CONTACT POINTS »: AN ANALYSIS OF THE GESTURE-SOUND RELATIONSHIP
Our augmentation project is based on the observation that a musical instrument is not simply an object. It is a meeting point between a gesture and an object, and the sound produced is the result of this encounter. For us, a musical instrument loses its essence when taken out of its context, i.e. its relationship with the human body. In this « gestural » approach to the instrument, the central question is how to understand the link between the body, the object and the sound produced. The nature of the continuum between gesture and sound, mediated by the instrument, is a key factor for the expressive and musical qualities of an instrument. A highly functional continuum enables the player to gradually embody the instrument. In this process the musician's proprioception extends to the instrument, resulting in an experience of encompassing the instrument and playing directly with the sound [2].

Observing how the musician connects to the instrument shows that the body manipulates the instrument with a repertory of very precisely defined movements. Each part of the body that connects to the instrument has its own « vocabulary » of gestures, adapted to its task and to the constraints of the object. This repertory forms the « instrumental technique », in which elements of the corporal « vocabulary » are combined in real time to form an instrumental discourse. Each movement and combination of movements has its characteristic sonic result. We use the term « contact points » for these convergences between gesture and object that result in the production or modification of a sound. The notion allows us to think in terms of a continuum between gesture, object and sound, and to establish a « map » of their relationships in the playing environment.

For instance, mapping « contact points » on the electric guitar yields a precise repertory of the gestural vocabulary of the playing technique, related to each gesture's corresponding interface and sonic result. We can thus establish a typology of two kinds of contact points. The first are the initial sound-producing « contact points » (left- and right-hand techniques on the strings). The second are the gesture-interface couples that control the instrument's electric level (potentiometers, switches, pedals, etc. and their corresponding gestures). This allows for a comprehensive
articulation of the instrumental environment, with a view to establishing strategies for further and/or alternative augmentations.

Figure 1. Mapping « contact points » on the electric guitar (gesture-specific detail not included here).

From the perspective of instrument augmentation, mapping « contact points » is of dual interest. On the one hand, it breaks the complexities of instrumental playing down into a set of « meta-gestures » and their corresponding sonorities. This lets us focus on strategies for « tapping into » the gestures of the standard instrumental technique, guided by an intimate knowledge of gesture and medium. Gesture data acquisition can thus be adapted to the instrument according to its technical and playing specificities, using both direct and indirect acquisition techniques [13]. On the other hand, a map of contact points allows the instrument to be articulated according to a typology of zones. « Active zones » participate in sound production. « Silent zones » are convergences of gestures and localisations that play no role in sound production. From this « map » of « active » and « silent » regions of the instrumental environment, we may go on to find ways of « activating » the silent zones, creating new contact points and new gestures.

5. THE AUGMENTED GUITAR PROJECT
We are currently developing an augmented guitar at the CICM, motivated by the considerations set out in this article. The project is based on the simultaneous and crossover use of direct and indirect gesture data acquisition (i.e. sensors and signal analysis) [13], as well as on both existing and new « contact points ». The technological platform is a standard Fender Stratocaster electric guitar equipped with an additional piezoelectric pickup and a selection of sensors (tilt, touch, pressure). The 2-channel audio and multichannel MIDI sensor data output is routed to a PC that performs a series of analysis operations: perceptual feature extraction from the audio signal (attacks, amplitude, spectrum-related data) [5], and gesture recognition on the sensor data. The resulting data is mapped to the audio engine, where it provides information for the dynamic control of signal processing. The project is developed in the Max/MSP environment.

Figure 2. The CICM augmented electric guitar set-up

In our augmentation project we have adopted a gesture-based methodology that begins with a mapping of the « contact points » in the guitar's basic playing technique. The augmentation potential of each gesture is evaluated against the available acquisition and sound processing/synthesis techniques. In parallel, we study the musician's body in the playing context, looking for potential « ancillary » [12] gestures not included in the conventional playing technique. We then look for ways of tapping into these gestures with an adapted gesture acquisition system (sensors or signal analysis), thus activating a new « contact point ».

The following is a selection of the augmentations we are working on. Audio and video material of the augmented guitar and its related M A A music project can be found at: www.myspace.com/maamusique

5.1. « Tilt – Sustain » augmentation
This augmentation is motivated by two observations. First, the upper-body movements that characterise the performance of many guitarists remain disconnected from the actual sound, although they carry an untapped expressive potential. Second, the sound of the guitar has a very limited duration, which prevents the player from using long, sustained sounds. The development of the guitar can be seen as a long search for this sustained quality [6]. With distortion and feedback, the electric guitar effectively attains it, but only with a very distinct « overdriven » tone and at high volume levels. The idea of our augmentation is to create a sustainer controlled by the tilt of the guitar and of the player's torso: the more vertical the guitar, the more sustain. The augmentation is developed with a 2-axis tilt sensor attached to the guitar. The sensor is mapped to a real-time granular synthesis engine that records the guitar sound and recycles it into a synthesised sustain. The tilt–sustain augmentation activates a new « contact point » in the electric guitar playing technique, incorporating torso movements into sound creation.

5.2. « Golpe »: the percussive electric guitar
The acoustic guitar allows percussive techniques to be played on the instrument's body. Because of the design of its pick-ups, the electric guitar has lost this ability. The percussive augmentation we are working on aims to restore a percussive dimension to the electric guitar, thus reactivating a traditional « contact point » that has fallen out of use. To tap into the sounds of the guitar's body, we have installed a piezo microphone. The system detects the percussive attacks and then analyses the spectral content of the signal. When hit, different parts of the instrument resonate with specific spectra, which allows us to build up a set of localisation–sound couples. The analysed signal drives a sampler in which the piezo output is convolved with prerecorded percussive sounds, inspired by Roberto Aimi's work on augmented percussion [1].

5.3. « Bend »: an integrated « wah-wah » effect
The left-hand fingers on the fretboard play an essential role in producing intonation through horizontal and vertical movements. These range from a minute vibrato to an extended four-semitone « bend ». This technique is widely used on the electric guitar, allowing the player to work in the domain of continuous pitch variation rather than the semitone divisions of the fretboard. The « bend » technique is often used to enhance the expressiveness of the playing, giving the guitar a « vocal » quality. The motive of this augmentation is to extend the inflection gesture's effect on the sound from a
variation of pitch alone to a double variation of both pitch and timbre. In our system, we use attack detection and pitch following to track the note's evolution relative to its initial pitch. The resulting pitch variation data is mapped to a filter section that emulates the behavior of the classic « wah-wah » effect. We find that controlling the filter through an expressive playing gesture integrates the effect into the musical discourse more subtly than the expression pedal traditionally used for this type of effect.

5.4. « Palm muting »: an augmented effect switch
A popular playing technique on the electric guitar consists of muting the strings with the palm of the picking hand, producing a characteristic short, muffled sound. Our augmentation detects the muting gesture by analysing the spectral content of the guitar's signal. Muting shows up as a loss of energy in the upper zones of the spectrum, regardless of which strings are being played. Our system compares the incoming signal with a « model » spectrum and interprets closely matching signals as the result of a muted attack. The acquired « muting on/off » data is used in our guitar as a haptic augmentation of an effect pedal's on/off switch. The player can thus add a desired timbre quality (« effect ») to the sound simply by playing in muted mode.

6. CONCLUSION AND FUTURE WORK
The augmented guitar project is currently evolving at a steady pace, exploring new augmentations and sound–gesture relationships. Two different directions seem to emerge from this work. The first refines the traditional electric guitar working environment by replacing the poorly integrated effect modules with signal processing control systems more closely connected to the guitar's playing technique. The second points towards more radical augmentations of the guitar's soundscape. It is associated with the aim of expanding the guitar's melodically and harmonically oriented musical environment towards novel possibilities of working with timbre and sound texture. A central factor in this research is the establishment of an interactive working relationship between technological innovation and music. Live playing experience provides high-quality feedback on our augmentations, and it plays a central role in (in)validating our work. As the augmentations stabilise and become more refined, we look forward to conducting a series of user evaluations, which could provide useful insight for the further development of the augmented guitar.

7. REFERENCES
[1] Aimi R. M. « Hybrid Percussion: Extending Physical Instruments Using Sampled Acoustics » PhD thesis, Massachusetts Institute of Technology, 2007, p. 41
[2] Berthoz A. La décision, Odile Jacob, Paris, 2003, pp. 153-155
[3] Bevilacqua F. « Interfaces gestuelles, captation du mouvement et création artistique » L'inouï #2, Léo Scheer, Paris, 2006
[4] Cadoz C. « Musique, geste, technologie » Les nouveaux gestes de la musique, Parenthèses, Marseille, 1999, pp. 49-53
[5] Jehan T. « Perceptual Synthesis Engine: An Audio-Driven Timbre Generator » Master's thesis, Massachusetts Institute of Technology, 2001
[6] Laliberté M. « Facettes de l'instrument de musique et musiques arabes » De la théorie à l'art de l'improvisation, Delatour, Paris, 2005, pp. 270-281
[7] Machover T. « Hyperinstruments homepage » http://www.media.mit.edu/hyperins/
[8] Overholt D. « The Overtone Violin » Proceedings of NIME 2005, Vancouver, 2005
[9] Rasamimanana N. H. « Gesture Analysis of Bow Strokes Using an Augmented Violin » Master's thesis, Paris VI University, 2004
[10] « Roland database » http://www.geocities.com/SiliconValley/9111/roland.htm
[11] Smithsonian Institution, « The Invention of the Electric Guitar » http://invention.smithsonian.org/centerpieces/electricguitar
[12] Verfaille V. « Sonification of musicians' ancillary gestures » Proceedings of ICAD 2006, London, 2006
[13] Wanderley M. « Interaction musicien-instrument: application au contrôle gestuel de la synthèse sonore » PhD thesis, Paris VI University, 2001, pp. 40-44
Sormina – a new virtual and tangible instrument

Juhani Räisänen
University of Arts and Design
Helsinki, Media Lab
Voudinkuja 3 B 8
02780 Espoo, Finland
+358 40 5227204
Juhani.raisanen@taik.fi
ABSTRACT
This paper describes the Sormina, a new virtual and tangible instrument whose origins lie both in virtual technology and in the heritage of traditional instrument design. The motivation behind the project is presented, together with the hardware and software design. Insights gained through collaboration with acoustic musicians are also presented, along with a comparison to historical instrument design.

Keywords
Gestural controller, digital musical instrument, usability, music history, design.

1. INTRODUCTION
Sormina is a new musical instrument created as part of a research project at the University of Arts and Design Helsinki, Media Lab. Sormina uses sensors and wireless technology to play music. Its design is guided by traditional instrument building.

With new wireless technology, the instrument loses part of its traditional character. The physical connection between the sounding material and the fingers (or lips) is lost. The material no longer guides the design, which puts the designer in a totally new situation with new questions. This study tries to answer these questions by exploring the design of a new instrument intended for use in the context of a live symphony orchestra. The research started from the concept of the interface, which is traditionally held in the hands or put in the mouth. The playing posture of the musician, the delicate controllability of the instrument and the ability to create nuances are considered the key phenomena of the new design. Visual aesthetics and usability are of equal importance.

Sormina aims to take the musician on a tour of the ancient world, where tools were built to fit the fingers of human beings and technology was meant to serve humanity. Technological tools have changed over the centuries, but the idea of music making stays the same. Using the most modern technology for music making does not have to result in underrating our common heritage.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. MOTIVATION
The motivation for this innovation is the desire to create totally new musical instruments for classical music using computers and sensors. We are interested in designing digital instruments that could be accepted as part of the standard symphony orchestra. We believe that classical music can benefit from current developments in digital technology. The symphony orchestra has been quite stable during the last century, although there have been some experiments with electronics. Sormina aims to encourage the symphony orchestra to develop further and meet the challenges of the digital era. A handheld computer interface is operated very close to the body, which makes the user experience quite intimate. By offering new modes of sensory engagement and intimate interaction, Sormina contributes to a change in the digital world, from disembodied, formless and placeless interaction towards materiality and intimacy.

This project belongs to a long tradition of similar innovations, starting with the Theremin, a rare example of a musical innovation that became part of classical music practice. Besides the Theremin, one of the most influential works for the current research has been Rubine and McAvinney's article in Computer Music Journal (1990). In it they presented their VideoHarp controller and discussed issues related to its construction [1]. Michel Waisvisz and his Hands have also been a great inspiration [2]. More recently, Malloch and Wanderley have proposed the T-Stick [3]. Important questions concerning parameter mapping have been discussed by Hunt, Wanderley and Paradis [4].

3. CONSTRUCTING THE INSTRUMENT
3.1 Hardware, sensors
Structurally, the Sormina is built from a Wi-microDig analog-to-digital encoder, a circuit board for the wiring, and 8 potentiometer sensors with custom-made wooden knobs. The Wi-microDig is a thumb-sized, easily configurable hardware device. It encodes up to 8 analog sensor signals into high-resolution, multimedia-industry-compatible messages and transmits them wirelessly to a computer in real time for analysis and/or control purposes [8]. The custom-made circuit board takes care of the wiring. The potentiometers are mounted upright on the circuit board, and the encoder unit is also attached to the board. The knobs of the potentiometers are arranged in a straight line on top of the instrument.

The manufacturer of the Wi-microDig states that its 8 inputs, each with 10-bit resolution, can sample at up to 1500 Hz with only
milliseconds of latency [8]. The wireless transmission complies with the Bluetooth v2.0 standard, which is claimed to be reliable. At 115 kbps it is also much faster than MIDI, which runs at 31.25 kbps. As a Bluetooth class 1 device, the encoder is guaranteed a wireless range of up to 100 meters without obstructions. With the prototype there were considerable problems with the connection range. The encoder in question, however, was an older model than the Wi-microDig.

The construction of the controller is open: it is not enclosed in a box or cover. This arrangement makes the visual design appear light and spacious. However, the decision to use no cover may change in forthcoming prototypes, because the openness makes the construction vulnerable to dust and moisture.

The Sormina uses 8 potentiometer sensors, the maximum number of sensors that can be connected to the encoder. The choice of sensors was based on three main criteria: stability, precision and tangibility. The Wi-microDig encoder comes with only one potentiometer, which did not meet the standards set for the instrument design, so suitable potentiometers were purchased separately.

The first criterion for selecting the sensor type was stability. A stable instrument requires sensors that are themselves stable. In this context, stability means that a sensor preserves its state when it is not touched. Most available sensors are built according to a convention that does not meet this demand; force-sensing resistors, for example, return to their rest value when released. A potentiometer, by contrast, changes its state only through intentional action. An instrument also needs stability in the sense of durability and robustness, and potentiometers proved to be stable in this sense as well.

Figure 1. Sormina is a virtual instrument with wooden knobs

3.2 Software
The software for Sormina has been programmed using Max/MSP and Reaktor. It consists of three parts: the first handles communication with the encoder over Bluetooth, the second takes care of the user interface, and the third produces the sound. In addition, external software, Sibelius, was used for the notation.

The Wi-microDig comes with its own software, which is not actually used in this project. This software is meant to handle the Bluetooth connection and let the user decide how the sensor data is interpreted before it is sent on as MIDI information. Besides this rather laborious software, the company also offers a Max/MSP patch for the same purpose on its web site, which proved handier for the project. The wi-microdig patch for Max/MSP handled Bluetooth communication with the encoder quite reliably. The Max/MSP programming environment was also favoured for its usefulness in other parts of the project.

The wi-microdig patch outputs the sensor data as 7-bit values, which was found to be sufficient for the project. Tests showed that finger movements on the Sormina's small potentiometer knobs could not exploit any finer resolution.

A visual user interface, which also handles the connection to the encoder, was programmed in Max/MSP. One purpose of the interface is to give the musician visual cues for controlling the instrument. This proved especially beneficial in the learning phase. In addition to the feel at the fingertip, it was helpful to see the state of all the sensors on the screen at a glance.

The visual interface comprises sliders, number boxes and basic notations for the sensor input. The interface can also record a control sequence, which was found useful for learning to play the instrument. While a recorded sequence is played back, the interface shows the state of the sensors.

Figure 2. Part of the visual interface

The sound is created by a sound synthesis patch for the Reaktor software. The patch allows control of several features of the sound synthesis. The mapping of the sensors to the sound synthesis software proved to be of crucial importance.

Mainly because of the capabilities of the encoder, it was decided that there should be 8 sensors. This nevertheless turned out to be a very useful restriction. It was assumed that a human being cannot handle too many controls at the same time. Too many
options could result in indeterminacy. Moreover, with 8 sensors nearly all of the fingers can still be used for control.

The Reaktor software was chosen as the sound engine for the project, although using two different pieces of software instead of one has its drawbacks. For the purposes of this project, Reaktor was found to be more amenable than Max/MSP.

The sound synthesis patch in Reaktor comprises a 96-voice noise generator with filters and reverb. The patch offers 26 controls for mapping, but because of the restrictions of the hardware only some of them could be used. One solution to the mapping problem would have been to let one sensor drive several controls in the sound software. This was found to be unwise on a large scale, although some sensors are connected to two parameters.

3.3 Notation
One important part of the new instrument design was the attempt to notate the music created with the Sormina. It was challenging to establish a link between Max/MSP and notation software so that eight parameters could be notated in the same score.

The Sibelius software was chosen for this purpose. The note heads were changed to triangles to distinguish them from normal pitched notes. For greater precision, a number was added near each note head.

Figure 3. Notation of the parameters of the Sormina

4. IN PERFORMANCE
Much of the development of the Sormina has been carried out in collaboration with other musicians. The sound synthesis software, and especially the parameter mapping, has remained open to change, so the insights of other performers have been welcome. Still, to create a stable instrument, it would have been preferable to fix the mapping at a very early stage of development. This conflict has been one of the most challenging aspects of the project.

The sound created by the Sormina seems to fit quite well with string instruments, especially the cello. This was attributed to the use of noise generators as the main sound source, since the sound of acoustic instruments has many characteristics of white noise. Singing voices showed a similar resemblance to the Sormina sound.

4.1 Concerts
Several public concerts have been given during the first year of the instrument's existence. The Sormina has also been presented to researchers and students and in seminars. The first concert, in November 2006, was given with the cellist Juho Laitinen and the soprano Tuuli Lindeberg. In November 2007 the Sormina was played with the chamber choir Kampin Laulu. The last performances of 2007 took place in December in Los Angeles, where the instrument was presented to art students at the California Institute of the Arts (CalArts). Two concerts were also given in art galleries and jazz cafes in the area.

Figure 4. The author playing the Sormina in a concert
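The sensor data path described in Sections 3.1 and 3.2 can be summarised as a short sketch. Eight potentiometers are sampled at 10 bits, the wi-microdig patch reduces the values to 7 bits, and the values are mapped onto a subset of the Reaktor controls, with some sensors driving two parameters. This is a minimal illustration under stated assumptions, not the author's Max/MSP or Reaktor patch. The right-shift reduction is an assumption about how the patch scales the data. The parameter names and the sensor-to-parameter assignment (`MAPPING`, `to_7bit`, `map_frame`) are hypothetical, because the paper does not specify them.

```python
from typing import Dict, List

NUM_SENSORS = 8  # the Wi-microDig accepts at most 8 analog inputs

# HYPOTHETICAL sensor-to-parameter assignment: the paper only states that some
# sensors drive two parameters, not which ones. All names are illustrative.
MAPPING: Dict[int, List[str]] = {
    0: ["filter_cutoff", "reverb_mix"],
    1: ["filter_resonance"],
    2: ["noise_level"],
    3: ["voice_density"],
    4: ["attack_time"],
    5: ["release_time"],
    6: ["stereo_width", "reverb_size"],
    7: ["master_gain"],
}


def to_7bit(raw_10bit: int) -> int:
    """Reduce a 10-bit reading (0..1023) to a 7-bit value (0..127).

    ASSUMPTION: a plain right shift; the actual patch may scale differently.
    """
    if not 0 <= raw_10bit <= 1023:
        raise ValueError("expected a 10-bit value in 0..1023")
    return raw_10bit >> 3  # discard the 3 least significant bits


def map_frame(raw_frame: List[int]) -> Dict[str, float]:
    """Convert one frame of 8 raw readings into normalised (0..1) parameter values."""
    if len(raw_frame) != NUM_SENSORS:
        raise ValueError(f"expected {NUM_SENSORS} readings, got {len(raw_frame)}")
    params: Dict[str, float] = {}
    for sensor, raw in enumerate(raw_frame):
        value = to_7bit(raw) / 127.0
        for name in MAPPING[sensor]:  # one sensor may feed two parameters
            params[name] = value
    return params


# Ratio behind "much faster than MIDI": 115 kbit/s versus MIDI's 31.25 kbit/s.
BLUETOOTH_BPS = 115_000
MIDI_BPS = 31_250
SPEEDUP = BLUETOOTH_BPS / MIDI_BPS  # 3.68
```

Consider `map_frame([1023, 0, 0, 0, 0, 0, 512, 0])`. Both `filter_cutoff` and `reverb_mix` come out at 1.0, and `stereo_width` and `reverb_size` share the value 64/127 (about 0.50). With 7 bits, each knob has 128 steps. This matches the paper's observation that finger movements on the small knobs could not use a finer resolution.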
5. DISCUSSION
The aim of the Sormina project was to explore the main principles of the instruments in classical music, from the musician's point of view, and with these findings to create a new, stable electronic music instrument that could be accepted in a symphony orchestra. The results suggested the importance of three layers in the design of new instruments. The first layer is the sound synthesis that defines the audible response. The second is the mapping of the gestures to the sound parameters, which conceptually constitutes the instrument for the musician. The third layer, often overlooked in the creation of new digital music instruments (DMIs), is the materiality and usability layer of the controller.

Much weight in the research has been placed on the human hand and its capabilities. The author has followed Curt Sachs' findings about the hands and feet being the first instruments [5], and Malcolm McCullough, who praises our hands as the best source of personal knowledge [6]. A remarkable source for understanding the importance of music playing has been Tellef Kvifte, who formulates a classification of instruments based on playing techniques rather than on the way the musical sound is produced [7].

The Sormina research suggests that the touch and feel of the interface are important to take into account when designing new instruments. The musician uses subtle, almost intuitive and unconscious movements of her body. The fingers, for example, have developed through evolution to take care of the most sophisticated and precise actions. Therefore it is reasonable to use the fingers for playing music. In human culture, the fingers have been crucial for survival. Even today, they are used extensively to express our thoughts, by writing with a pen or a computer.

In the course of history, traditional instruments have matured to be well adapted to the human body. Their long evolution has given them the power to survive even in the era of computers. Through careful examination of their principles, it is possible to learn from their pattern and use the results in the design of totally new electronic instruments. In the present research, the role of the physical interface has been found to be fundamental for such a design. It appears that attention should be paid to the physical appearance of the instruments in order to build stable instruments.

Sormina aims to be more than a controller. As Rubine and McAvinney formulate, a musical instrument may be thought of as a device that maps gestural parameters to sound control parameters and then maps the sound control parameters to sound [1]. By binding together a fixed set of sensors with a stable sound source, we have developed Sormina into an instrument, not a controller.

Sormina attempts to be engaging for new musicians, but also rewarding for professionals. Based on the current evidence, these goals have been reached to a large extent.

The Sormina has been played in concert situations, both solo and with acoustic musicians. Playing with an acoustic cello has been rewarding, but an a cappella choir also made a good combination with the electronic sounds of the Sormina.

The experience of concerts with acoustic instruments and singers points out that the sound quality and playing techniques of Sormina are well suited to the classical orchestra. The possibility of notating the playing is another characteristic that is useful when working with a symphony orchestra.

6. FUTURE DIRECTION
The current research has used the observation of traditional musical instruments and their user experience for the design of a new electronic music instrument. Still, the scope of the exploration has been narrow, concentrating primarily on the author's experience of acoustic instruments. In the future, a more systematic inquiry will be carried out, in which professional musicians will be observed and interviewed about their playing habits. Also, studying the learning process of classical music instruments can reveal qualities that could then assist in new instrument design.

One direction in the development of the instrument is to combine the sound output with a live visual output. This is especially attractive because Max/MSP/Jitter is readily able to process and produce video and other moving images. Using the same parameters in video processing raises interesting questions about the connection between the auditory and visual sensory systems.

To enhance the usability of the instrument, its robustness needs more attention. Also, in order to compete with traditional instruments, the Sormina should be developed further in the direction of a consumer product.

7. ACKNOWLEDGMENTS
The author would like to acknowledge the important contributions of many people to this project, including Martijn Zwartjes, Risto Linnakoski, and Matti Kinnunen. The author received research funds from the Wihuri Foundation and the Runar Bäckström Foundation. The University of Art and Design Helsinki has also given grants for the research.

8. REFERENCES
[1] Rubine, D. and McAvinney, P. Programmable Finger-tracking Instrument Controllers. Computer Music Journal, Vol. 14, No. 1, Spring 1990, 26–40.
[2] Waisvisz, M. The Hands, A Set of Remote MIDI Controllers. In Proceedings of the 1985 International Computer Music Conference. Computer Music Association, San Francisco, 1985.
[3] Malloch, J. and Wanderley, M. The TStick: From Musical Interface to Musical Instrument. In Proc. of the 2007 Conf. on New Interfaces for Musical Expression (NIME-07), 2007, 66–69.
[4] Hunt, A., Wanderley, M. and Paradis, M. The importance of parameter mapping in electronic instrument design. In Proc. of the 2002 Conf. on New Interfaces for Musical Expression (NIME-02), 2002, 149–154.
[5] Sachs, C. The History of Musical Instruments. Norton, New York, 1940, 25–26.
[6] McCullough, M. Abstracting Craft: The Practiced Digital Hand. The MIT Press, Cambridge, Massachusetts, 1996, 1–15.
[7] Kvifte, T. Instruments and the Electronic Age: Toward a Terminology for a Unified Description of Playing Technique. Solum förlag, Oslo, 1988, 1.
[8] Wi-Microdig v6.00/6.1. <http://infusionsystems.com/catalog/info_pages.php?pages_id=153>
Practical Hardware and Algorithms for Creating Haptic Musical Instruments

Edgar Berdahl, CCRMA/Stanford University, eberdahl@ccrma.stanford.edu
Hans-Christoph Steiner, ITP/NYU, hans@at.or.at
Collin Oldham, CCRMA/Stanford University, coldham@ccrma.stanford.edu

ABSTRACT
The music community has long had a strong interest in haptic technology. Recently, more effort has been put into making it accessible to instrument designers. This paper covers some of these technologies with the aim of helping instrument designers add haptic feedback to their instruments. We begin by giving a brief overview of practical actuators. Next, we compare and contrast using embedded microcontrollers versus general purpose computers as controllers. Along the way, we mention some common software environments for implementing control algorithms. Then we discuss the fundamental haptic control algorithms as well as some more complex ones. Finally, we present two practical and effective haptic musical instruments: the haptic drum and the Cellomobo.

Keywords
haptic, actuator, practical, immersion, embedded, sampling rate, woofer, haptic drum, Cellomobo

1. INTRODUCTION
A haptic musical instrument consists of actuators that exert forces on the musician, sensors that detect the gestures of the musician, an algorithm that determines what forces to exert on the musician, and a controller that runs the algorithm and interfaces with the sensors and actuators. The instrument often synthesizes sound as well. Figure 1 illustrates how the musician is included in the haptic feedback loop.

Figure 1: A musician interacting with a haptic musical instrument (diagram blocks: Musician, Actuator, Sensor, Controller; Sound Signals, Gesture Signals)

There has been a wide array of research into haptics over the past decades, the vast majority taking place in specialized research labs with elaborate and custom equipment. Haptic feedback plays a key role in playing traditional instruments, and with the push to further develop electronic instruments, musicians have begun integrating haptic feedback into electronic instruments.

Recently, a number of developments have opened up haptic exploration to projects with smaller budgets and more common facilities. Additionally, as it becomes easier to access haptics equipment, it becomes possible to create haptics platforms oriented to musical instrument designers. This is especially interesting to designers looking to create their own instruments, since it means that they can design and employ useful haptic feedback in their own instruments.

Permission to make digital or hard copies of all or part of this work for any purpose is granted under a Creative Commons Attribution 3.0 license: http://creativecommons.org/licenses/by/3.0/
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

2. ACTUATORS
Actuators form the core of any haptic musical instrument. The ideal actuator is linear and time invariant (LTI), has infinite bandwidth, can render arbitrarily large forces, and is accompanied by an LTI sensor with infinite resolution. In practice, the actuator usually limits the performance of haptic feedback in a haptic musical instrument. One effective design approach is to choose the actuator so that it directly complements the metaphor of the target haptic musical instrument. For instance, for a haptic drum, use a woofer to mimic a vibrating drum membrane.

2.1 Vibrotactile Actuators
Marshall and Wanderley [23] and Hayward and MacLean [20] provide good overviews of some actuators, so here we cover only the most effective and practical actuators for musical instrument designers.
2.1.1 Vibrating Motors
Vibrating motors are the most common haptic actuators. They are widely used in mobile phones and other communications devices. They are built using a motor with an unbalanced weight attached to the spindle. They are almost always used to generate a fixed-frequency vibration, but some variation is possible. They are cheap, simple, and easy to obtain, but they have a slow ramp-up time, which limits their application.

Table 1: Approximate Actuator Costs in U.S. $
Device                     Price
Vibrating motor            $1-$20
Tactor                     $5-$200
Alps motorized fader       $30
Woofer/shaker              $40
Servomotor with encoder    $400
Novint Falcon              $200
SensAble Omni              $1000

2.1.2 Tactors
Tactors are specialized motors that produce vibrations in a frequency range appropriate for sensing by the skin. They are included in devices like the iFeel mice. Immersion builds its tactors using "Inertial Harmonic Drive," which basically means a motor with a very small gear ratio whose spindle is attached to a surface by a somewhat flexible nylon linkage. The motor yanks on the linkage to generate a pulse. Another type of tactor is made using a piezoelectric element to actuate a plate under tension. It is also possible to build low-cost tactors using vibrating motors [17].

2.2 Force Feedback Actuators
In order to provide force feedback in practice, it is necessary to measure the behavior of the haptic device in the same dimension as it is actuated, making force feedback setups more complex.

2.2.1 Motorized Faders
Alps Electric Co. and other manufacturers make motorized faders designed for use in digital control surfaces. These faders consist of a belt motor drive attached to a linear slider potentiometer. The potentiometer can serve as the position sensor for the haptic feedback loop controlling the motor. Since the motor is relatively small, these faders cannot exert large forces, but they are cheap, pre-assembled, and relatively easy to procure.

2.2.2 Servomotors with Optical Encoders
To produce relatively large forces, we have been using servomotors with built-in optical encoders that sense position [27]. We use the Reliance Electric ES364 servomotor with a peak-torque specification of 6.5 kg-cm and an encoder resolution of 1000 pulses/rev (4000 counts).¹ An arm attached to the motor shaft makes it possible to interface the motor effectively with the hand. A force-sensitive resistor placed at the end of the shaft provides an additional sensed quantity useful in further fine-tuning the force feedback.

¹ While the ES364 is now out of production, Applied Motion sells the comparable VL23-030D with an optical encoder for $400. It provides a maximum peak torque of 5.9 kg-cm. This type of motor can be obtained surplus for prices as low as $7 each.

2.2.3 Woofers and Shakers
In contrast with rotational servomotors, woofers and shakers are linear actuators, and the maximum displacements they provide are typically limited to a couple of centimeters or less. Nevertheless, these actuators can be easily obtained at low cost. Shakers are similar to woofers, but they have no cone for pushing air. Instead, they mount to and shake a piece of furniture so that a listener can feel bass and infrasonic frequencies in music and movie soundtracks. The challenge in applying woofers and shakers effectively typically lies in integrating a sensor with the actuator.

2.2.4 Multi-DOF Haptic Devices
Commercial robotic arms like the 6DOF² SensAble Phantom have been available for a number of years now. They are typically designed to be held in the hand like a pen. They have traditionally been expensive and relatively rare; however, advancements in teleoperation and minimally-invasive surgery in particular have driven production costs per unit down significantly, so that the Phantom Omni can be obtained for $1000.
The Novint Falcon is a more limited 3DOF haptic device that is designed for gaming. While it does not provide the flexibility or fidelity of the cheapest Phantom, it is available for less than $200.

² six degrees of freedom

3. CONTROLLERS
To provide force feedback, a control loop is usually called every 1/fS seconds, where fS is the sampling rate. This control loop reads inputs from the sensors, computes appropriate outputs to the actuators, and then immediately sends the outputs to the actuators. In order to have a responsive haptic musical instrument, the controller must be quick. In other words, the system delay (also known as input-output delay) should be short, and the sampling rate should be high. For most operating systems, these requirements are mutually exclusive, so in the following sections, we consider common control hardware implementations.
The sampling rate is an important factor. Typical haptics applications do not require sampling rates as high as those used for audio. For instance, the CHAI 3D haptics framework does not support sampling rates above 1kHz for most devices [8]. However, some haptic musical instruments send audio signals through the feedback loop. The human range of hearing spans roughly 20Hz to 20kHz. According to the Nyquist-Shannon sampling theorem, the sampling rate must be at least 40kHz (twice the 20kHz upper limit) so that the whole bandwidth that humans hear can be sampled and reconstructed within the feedback loop. Haptic musical instruments taking full advantage of feeding aurally-relevant acoustic signals back through the haptic device must therefore run at sampling rates on the order of 40kHz, much higher than those of typical haptics applications. It is true that these higher frequencies are very poorly sensed by the human tactile system, but in a bowed string experiment, users reported that the system nevertheless felt much more real when the haptic sampling rate was 44kHz instead of 3kHz. They made comments regarding the "strong presence of the string in the hand," "the string in the fingers," and "the string is really here" [22].
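The fixed-rate control loop described in Section 3 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: `read_sensor`, `compute_force`, and `write_actuator` are hypothetical placeholders for real device I/O, and `time.sleep` on a general purpose operating system cannot reliably hold a 25 μs period, so the sketch shows only the structure of the loop (read, compute, write immediately, wait for the next 1/fS deadline) and the Nyquist arithmetic behind the 40kHz figure.

```python
import time
from typing import Callable


def min_sampling_rate(max_freq_hz: float) -> float:
    """Nyquist-Shannon bound: sample at least twice the highest frequency
    that should pass through the feedback loop."""
    return 2.0 * max_freq_hz


def run_control_loop(read_sensor: Callable[[], float],
                     compute_force: Callable[[float], float],
                     write_actuator: Callable[[float], None],
                     fs_hz: float,
                     n_steps: int) -> int:
    """Run the read/compute/write cycle once every 1/fs_hz seconds.

    Returns how many iterations finished their work after their deadline.
    The three callbacks are hypothetical stand-ins for real device I/O.
    """
    period = 1.0 / fs_hz
    next_deadline = time.perf_counter()
    missed = 0
    for _ in range(n_steps):
        x = read_sensor()                 # read inputs from the sensors
        write_actuator(compute_force(x))  # compute and immediately send the output
        next_deadline += period           # absolute deadlines avoid cumulative drift
        slack = next_deadline - time.perf_counter()
        if slack > 0:
            # A general purpose OS scheduler may wake us late; this jitter
            # is what limits the achievable sampling rate.
            time.sleep(slack)
        else:
            missed += 1
    return missed


if __name__ == "__main__":
    # A 20 kHz audio bandwidth needs fS >= 40 kHz, i.e. a 25 microsecond period.
    fs = min_sampling_rate(20_000.0)
    print(fs, 1.0 / fs)
    # Drive a virtual spring (F = -k x) from a constant fake sensor at 1 kHz.
    forces = []
    late = run_control_loop(lambda: 0.01, lambda x: -100.0 * x, forces.append,
                            fs_hz=1000.0, n_steps=100)
    print(len(forces), late)
```

Scheduling against absolute deadlines, rather than sleeping a fixed period after each iteration, keeps the average rate at fS even when individual wake-ups are late; the returned count of missed deadlines gives a rough view of the scheduler interference discussed in Section 5.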
4. EMBEDDED MICROCONTROLLERS
Embedded microcontrollers can be run without any operating system or extraneous processes, which might interfere with the control loop timing. In addition, they are small, allowing them to be easily embedded within musical instruments, and they can be configured to interface with a wide variety of sensors and actuators. Atmel processor-based microcontrollers such as the AVR [5] and especially the Arduino [7] have recently become popular in computer music. Note that these microcontrollers do not natively support floating-point calculations.³ This is generally of no concern for simple algorithms, but more complex algorithms become much more difficult to implement without loss of fidelity.

³ They emulate floating-point calculations using integer arithmetic, which is too slow to be useful in most haptic algorithms.

4.1 Generic Programming Tools
Sometimes it is most convenient to program the control loop directly using generic tools. Microcontroller libraries such as AVR-lib make reading data from the sensors and writing data to the actuators straightforward [5]. For teaching purposes, we use a combination of the AVRMini Atmel-based microcontroller board, the spyglass unit for producing debugging output, and the AVR motor controller board [3].

4.2 Immersion and USB PID
Immersion, Inc. sells a number of tools to make designing haptic feedback easier. Immersion devices use "effects," which are built upon wavetables and envelopes and are handled by embedded microcontrollers. These effects can either be linked directly to button and position data using the microcontroller, or they can be controlled by the host computer via USB. The latency and jitter of USB are too high to handle the feedback loop,⁴ so the microcontroller maintains the feedback loop. In Immersion-compliant devices, the feedback loop controlling the motors probably runs at 1kHz or faster.⁵ The data sent over USB is used purely for configuring and triggering the microcontroller.

⁴ USB HID devices usually communicate with the host computer every 8-10ms; some devices can communicate faster, at intervals as short as 1ms.
⁵ The manufacturers do not publish these rates.

While a number of Immersion's haptic devices are not easily procured, such as the tools for the medical and automotive industries, the tools aimed at video game development are practical for creating haptic feedback in musical applications. Joysticks and steering wheels provide kinesthetic and vibrotactile feedback using motors and position sensors; mice and gamepads provide vibrotactile feedback using tactors and vibrating motors.

4.2.1 USB Physical Interface Devices
Immersion, Inc. has worked to get its protocol into the USB Human Interface Devices (HID) [11] standard in a new subsection called PID (Physical Interface Devices) [12]. To program USB PID devices, each operating system has its own API: Apple has the HID Manager and ForceFeedback APIs, Microsoft has the DDK HID and Immersion APIs, and the Linux kernel has the iforce module and the libff API.

4.2.2 Immersion Studio
Immersion Studio, proprietary software available only for Windows, is required to create and edit Immersion effects. Immersion classifies the available effects as follows: Vibrational (Periodic), Positional (Texture, Enclosure, Ellipse, Spring, Grid), Directional (Constant, Ramp), and Resistive (Damper, Friction, Inertia) [1]. Immersion Studio allows designers to experiment with the set of effects and build them into an object that can be integrated into and triggered within a program. It is possible to create more elaborate compound effects by combining effects with waveforms and envelopes into an object that can be triggered as a single unit.
Immersion also produces more specialized versions of its Studio program for other markets, including medical and automotive applications. Additionally, it offers the VibeTonz SDK for controlling the vibrating motor in some mobile devices and the VirtualHand SDK for controlling its glove systems. In general, this special software and equipment is targeted at specific markets and is not easy to obtain.

5. GENERAL PURPOSE COMPUTERS
In contrast with the aforementioned microcontrollers, general purpose computers are much faster and support native floating-point calculations. However, general purpose computers face a considerable drawback when controlling haptic feedback: the operating system schedulers, the bus systems, and device interface protocols can interfere with the ideally deterministic timing of the control loop. Using an RS232 serial port directly can help, but the maximum sampling rate will still be limited by the scheduler.

5.1 DIMPLE
Allowing musical instrument designers to incorporate a wide range of haptic behaviors into an instrument [25], DIMPLE takes full advantage of the CHAI 3D [8] and the Open Dynamics Engine (ODE) [10] libraries. ODE models the state of the virtual world, and CHAI 3D renders visual feedback and mediates the link between the virtual and the haptic worlds. The CHAI 3D library is compatible with Windows and GNU/Linux, and it supports a wide variety of haptic interfaces including the SensAble devices and the Novint Falcon. With the SensAble Phantom Omni, the maximum sampling rate is 1kHz, which limits haptic interaction at audio frequencies. The most recent release of DIMPLE incorporates a method for sending downsampled audio-frequency data to the actuators, but the delay, which is probably longer than 5ms,⁶ prevents practical implementation of high-bandwidth feedback control.

⁶ This theoretical lower limit has been derived during personal communication with Stephen Sinclair and still needs to be measured.

5.2 TFCS
In contrast, the open-source Toolbox for the Feedback Control of Sound (TFCS) facilitates the implementation of haptic algorithms with large feedback bandwidths when using general purpose computers [14]. Virtual musical instrument models are provided via the Synthesis Toolkit (STK). Since they are implemented efficiently using digital waveguide technology, they can operate in synchrony with the haptic device at sampling rates as high as 40kHz with less than one sample of delay. The TFCS ensures that the control loop is called regularly by using the Real-Time Application Interface (RTAI) for Linux [6] and the Linux Control and Measurement Device Interface (Comedi) [9]. In multiprocessor machines, the control loop runs isolated on one processor, while all other code is executed on the remaining processors.

Table 2: Control Hardware
Control hardware   Maximum sampling rate   Approximate minimum delay   Native floating point
ATMEL-based        ≈ 20kHz                 ≈ 50μs                      N
DIMPLE             1kHz                    < 1ms                       Y
TFCS               40kHz                   ≈ 20μs                      Y
ASP                96kHz typ.              10ms typ.                   Y

5.3 Audio Signal Processing (ASP) Environments
Most general purpose computers also come equipped with sound interfaces, so designers should consider whether a sound interface can be used for implementing the control loop. However, sound interfaces are not designed for very low-latency applications. Besides employing block-based processing, sound interfaces use sigma-delta modulator converters that add considerable system delay [13]. The smallest system delay we were able to achieve on a 4.4GHz dual core AMD-based machine⁷ was 4ms, with fS = 96kHz. Nevertheless, this hardware/software solution is acceptable for some kinds of instruments. For example, haptic instruments that respond slowly to the environment can be implemented without problems.

⁷ The machine was running the Planet CCRMA distribution of Fedora Core, which has a patched kernel allowing low-latency audio.

6. ALGORITHMS

6.1 Standard Haptic Algorithms

6.1.1 Spring
An actuator induces a force F on the haptic device. Most haptic devices measure the movement of the device in response as a displacement x. Hence, the most fundamental (i.e., memoryless) haptic algorithm for these devices implements a virtual spring with spring constant k:

F = -kx    (1)

The virtual spring in combination with the physical mass and damping of the haptic device forms a damped harmonic oscillator, which can be plucked or bowed. By obtaining estimates of the haptic device's velocity or acceleration, the device's damping and mass can be controlled analogously.

6.1.2 Wall
An algorithm similar to the spring implements a wall at x = 0:

F = -kx · (x > 0)    (2)

where (x > 0) evaluates to 1 when the device is inside the wall and to 0 otherwise. Whenever the haptic device is pushed inside the virtual wall (i.e., x > 0), a spring force acts to push the device back out of the wall. So that the wall feels stiff, k should be chosen large. The maximum stiffness that a haptic device can render is governed by a fundamental limit, which is chiefly a function of the system delay, the sampling rate, and the internal physical damping of the device [19]. In general, more expensive haptic devices are required for rendering especially stiff virtual springs and walls.

Figure 2: Force profile F(x) (above) and terrain height profile z(x) (below) for a simple detent.

6.1.3 Detents and Textures
Detents can help the musician orient himself or herself within the playing space of the instrument. Detents can be created even with 1DOF haptic devices. Figure 2 illustrates how to implement a simple detent. Near the origin, the force profile looks like that of a spring, while the force goes to zero when the position x moves further from the detent [27]. A simple potential energy argument (the force is the negative gradient of a potential energy proportional to the terrain height) implies that the force profile F(x) is proportional to the negative derivative of the terrain height z(x) (see Figure 2), allowing arbitrary terrains and textures to be created.

6.1.4 Event-Based Haptics
Another effective algorithm uses the sensors to detect certain events. When an event occurs, a stored waveform is sent to the actuators. A common example in gaming is sending a recoil force waveform to the actuators when the user fires a weapon. Since virtual walls cannot be made infinitely stiff, some musical instrument designers may consider sending ticks or pulses to the haptic interface whenever the interface enters a virtual wall. This type of event-based feedback is known to improve the perception of hardness [21].

6.2 Algorithms Requiring High Sampling Rates

6.2.1 Virtual Instruments
Extensive studies on the physical modeling of acoustic musical instruments have led to the development of many different acoustic musical instrument models. One simple way to create a haptic musical instrument is to interface a haptic device with a virtual instrument according to the laws of physics [18]. For efficiency reasons, it is often convenient to run the haptic control loop at a standard haptic sampling rate, while the musical instrument model runs at a higher sampling rate to provide high-quality audio. For example, the Association pour la Création et la Recherche sur les Outils d'Expression (ACROE) often employs a haptic sampling rate of about 3kHz, while audio output is often synthesized at standard audio sampling rates, such as 44kHz. However, ACROE sometimes employs its ERGOS device with dedicated DSP hardware to run both the haptic and audio loops at 44kHz in real time [22].

Figure 3: Haptic drum
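The standard algorithms of Section 6.1 each reduce to a short force law evaluated once per control-loop iteration. The Python sketch below is illustrative only and is not the authors' code: forces and positions are in arbitrary units, the Gaussian valley used as the detent terrain z(x) is an assumption (the paper does not specify the profile drawn in Figure 2), and the tick amplitude, decay time, and length in the event-based part are hypothetical values. The detent force is computed as the negative slope of the terrain, so it behaves like the spring of Eq. (1) near the detent and fades to zero away from it.

```python
import math


def spring_force(x: float, k: float) -> float:
    """Eq. (1): virtual spring, F = -k x."""
    return -k * x


def wall_force(x: float, k: float) -> float:
    """Eq. (2): wall at x = 0; the spring acts only inside the wall (x > 0)."""
    return -k * x if x > 0 else 0.0


def detent_height(x: float, depth: float, width: float) -> float:
    """Hypothetical terrain z(x): a Gaussian valley centred on the detent at x = 0."""
    return -depth * math.exp(-x * x / (2.0 * width * width))


def detent_force(x: float, depth: float, width: float, gain: float) -> float:
    """F(x) = -gain * dz/dx: the device is pushed downhill, toward the detent.

    Near x = 0 this acts like a spring with k = gain * depth / width**2;
    far from the detent the force decays to zero.
    """
    dz_dx = depth * x / (width * width) * math.exp(-x * x / (2.0 * width * width))
    return -gain * dz_dx


def entered_wall(prev_x: float, x: float) -> bool:
    """Event detector: true on the sample where the device crosses into the wall."""
    return prev_x <= 0.0 < x


def tick_waveform(amplitude: float, decay_s: float, fs_hz: float, n: int) -> list:
    """Stored event waveform: an exponentially decaying pulse sampled at fs_hz."""
    return [amplitude * math.exp(-i / (decay_s * fs_hz)) for i in range(n)]


def render(positions, k: float, fs_hz: float) -> list:
    """Wall force plus a stored tick that starts when the wall is entered."""
    out = []
    pending = []
    prev_x = positions[0] if positions else 0.0
    for x in positions:
        if entered_wall(prev_x, x):
            # Hypothetical tick: negative sign pushes the device back out of the wall.
            pending = tick_waveform(-1.0, 0.002, fs_hz, 8)
        f = wall_force(x, k)
        if pending:
            f += pending.pop(0)  # play the stored waveform one sample per iteration
        out.append(f)
        prev_x = x
    return out
```

Because `render` starts the stored tick only on the sample where the device enters the wall, the tick can add a sense of hardness even when `k` must be kept moderate for stability, which is the motivation given in Section 6.1.4.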
6.2.2 Actively Controlled Acoustic Instruments
An actively controlled acoustic musical instrument is an acoustic musical instrument that is augmented with sensors, actuators, and a controller. These instruments can be considered a special case of haptic musical instruments where the interface is the entire acoustic instrument itself. For example, a monochord string can be plucked and bowed at various positions as usual, while its acoustic behavior is governed by the control hardware. Simple and appropriate control algorithms emulate passive networks of masses, springs, and dampers or implement self-sustaining oscillators [15].

7. EXAMPLES

7.1 Haptic Drum
The haptic drum is a haptic musical instrument that can be constructed out of components found in practically any computer music laboratory [2]. It employs an event-based haptics algorithm that is implemented using a woofer actuator, a general purpose computer, and an ASP environment.
The woofer actuator conforms to the metaphor of a vibrating drum membrane. A sunglass lens is attached rigidly to the cone but held away from the sensitive surround part by way of a toilet roll (see Figure 3). Whenever a drumstick strikes the sunglass lens, it makes a loud "crack" sound. A nearby microphone (not shown) provides an input signal to a sound interface. A Pure Data patch detects drumstick collisions by checking whether the microphone signal envelope exceeds a threshold. Whenever a collision is detected, an exponentially-decaying pulse is sent to the woofer that effectively modifies the coefficient of restitution of the collision. The haptic drum can be configured to make it easier to play (one-handed) drum rolls. It also facilitates playing various "galloping" and "backwards" drum rolls, which are otherwise nearly impossible to play using one hand [16]. If instead a ping pong ball is placed on the lens, and if the lens is driven sinusoidally, various period-doubling and apparently chaotic effects may be observed.

Figure 4: Cellomobo front (left) and back (right)

7.2 Cellomobo
The Cellomobo is an instrument allowing the musician to bow a virtual string using a haptic interface [4]. The length of the string is adjusted by a resistive ribbon controller (see Figure 4, left). The vibrating string element consists of a piezoelectric disc pickup (see Figure 4, bottom left), which is mounted upon a shaker (see Figure 4, bottom). The haptic feedback and sound synthesis algorithms run at the audio rate in Pure Data.
Figure 5 shows a diagram of the Cellomobo's combined haptic feedback/sound synthesis engine. The dotted box encloses the digital waveguide model of a lightly damped vibrating string. N/fS is the period of the note being played. The internal feedback loop gain g is between 0.9 and 0.999 and is controlled by a knob. H_LP(z) is a lowpass filter causing the higher partials to decay more quickly [26]. The outer feedback loop is closed around the shaker and piezoelectric pickup, which provides the excitation input to the instrument. H_2LP(z) is a second-order lowpass filter that removes upper partials from the feedback loop. The cut-off frequency of the filter is controlled by left hand finger pressure, to give the musician control of tone color. Before the output signal reaches the actuator, a hard clipping nonlinearity clips off the tops of the waveform. This gives the haptic signal more of a square shape, causing the bow to release from the bowing surface more easily.
The novel addition of the inner feedback loop is nonphysical, but it allows the instrument to be less sensitive to the dynamics of the sensor and actuator. This structure enhances the playability of the instrument and differentiates the Cellomobo from previous research efforts [22]. In fact, the behavior is so robust that the instrument functions despite the large ASP system delay (A + B)/fS ≈ 20ms. Note that this delay is an order of magnitude longer than the period of the highest note that can be played on the instrument, which is about 1ms.

Figure 5: The Cellomobo block diagram (blocks: bowing surface with piezo pickup; delays z^{-A}, z^{-N}, and z^{-B}; gain g; lowpass filters H_LP(z) and H_2LP(z); power amplifier; shaker)

8. CONCLUSIONS
Making haptic musical instruments is not so difficult given some forethought and knowledge about the field! Incorporating haptic feedback is also often worth the effort: haptic feedback has been shown to improve the user's impression of playing a haptic musical instrument [22]. Haptic feedback has been informally found to make it easier for users to play various types of drum rolls [16]. Finally, haptic feedback has been further shown to improve the accuracy of musicians playing a haptic musical instrument [24].
In this paper, we presented ideas on how to practically implement such instruments given today's technology. We hope our efforts will help make haptic technologies more accessible to designers and musicians. We expect superior haptic technologies to become even more accessible as other fields drive haptic device development.

9. ACKNOWLEDGEMENTS
We would like to thank all of the people at or from CCRMA who have helped us and inspired us to study haptics: Bill Verplank, Julius O. Smith III, Günter Niemeyer, Chris Chafe, Sile O'Modhrain, Brent Gillespie, and Charles Nichols.

[9] http://www.comedi.org/.
[10] http://www.ode.org/.
[11] USB HID. http://www.usb.org/developers/hidpage/.
[12] USB PID. http://www.usb.org/developers/devclass_docs/pid1_01.pdf.
[13] M. Antila. Contemporary electronics solutions for active noise control. In Proc. Int. Symposium on Active Noise and Vibration Control, September 2004.
[14] E. Berdahl, N. Lee, G. Niemeyer, and J. O. Smith. Practical implementation of low-latency DSP for feedback control of sound in research contexts. In Proc. of the Acoustics '08 Conference, June 2008.
[15] E. Berdahl, G. Niemeyer, and J. O. Smith. Applications of passivity theory to the active control of acoustic musical instruments. In Proc. of the Acoustics '08 Conference, June 2008.
[16] E. Berdahl, B. Verplank, J. O. Smith, and G. Niemeyer. A physically-intuitive haptic drumstick. In Proc. Internat'l Computer Music Conf., volume 1, pages 363–366, August 2007.
[17] A. Bloomfield and N. I. Badler. A low cost tactor suit for vibrotactile feedback. Technical Report 66, Center for Human Modeling and Simulation, University of Pennsylvania, 2003.
[18] N. Castagne and C. Cadoz. Creating music by means of 'physical thinking': The musician oriented Genesis environment. In Proc. of the Fifth Int. Conf. on Digital Audio Effects, pages 169–174, 2002.
[19] N. Diolaiti, G. Niemeyer, F. Barbagli, J. K. Salisbury, and C. Melchiorri. The effect of quantization and Coulomb friction on the stability of haptic rendering. In Proc. of the First Joint Eurohaptics Conf. and Symp. on Haptic Interfaces, pages 237–246, 2005.
[20] V. Hayward and K. MacLean. Do it yourself haptics, part 1. IEEE Robotics and Automation Magazine, 14(4):88–108, December 2007.
[21] K. Kuchenbecker, J. Fiene, and G. Niemeyer. Improving contact realism through event-based haptic feedback. IEEE Transactions on Visualization and Computer Graphics, 12(2):219–230, March/April 2006.
[22] A. Luciani, J.-L. Florens, D. Couroussé, and C. Cadoz. Ergotic sounds. In Proc. of the 4th Int. Conf. on Enactive Interfaces, pages 373–376, November 2007.
[23] M. Marshall and M. Wanderley. Vibrotactile feedback in digital musical instruments. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, France, 2006.
10. REFERENCES [24] S. O’Modhrain and C. Chafe. Incorporating haptic
[1] Immersion fundamentals. feedback into interfaces for music applications. In
http://www.immersion.com/developer/downloads/ Proc. of ISORA, World Automation Conference, 2000.
ImmFundamentals/HTML/ImmFundamentals.htm. [25] S. Sinclair and M. Wanderley. Extending dimple: a
[2] http://ccrma.stanford.edu/~eberdahl/Projects/ rigid body haptic simulator for interactive control of
HapticDrum. sound. In Proc. of 4th Int. Conf. on Enactive
Interfaces, November 2007.
[3] http://cm-wiki.stanford.edu/wiki/AVR.
[26] J. O. Smith. Physical Audio Signal Processing: For
[4] http://homepage.mac.com/coldham/klang/
Virtual Musical Instruments and Audio Effects.
cellomobo.html.
http://ccrma.stanford.edu/˜jos/pasp/, 2007.
[5] http://hubbard.engr.scu.edu/embedded/.
[27] B. Verplank. Haptic music exercises. In Proc. of the
[6] https://www.rtai.org/. Int. Conf. on New Interfaces for Musical Expression,
[7] http://www.arduino.cc/. pages 256–257, 2005.
[8] http://www.chai3d.org/.
66
Considering Virtual & Physical Aspects in Acoustic Guitar Design

Amit Zoran
MIT Media Laboratory
20 Ames Street
Cambridge, MA 02139
amitz@mit.edu

Pattie Maes
MIT Media Laboratory
20 Ames Street
Cambridge, MA 02139
pattie@media.mit.edu
ABSTRACT
This paper presents a new approach for designing acoustic guitars, making use of the virtual environment. The physical connection between users and their instruments is preserved, while offering innovative sound design. This paper will discuss two projects: reAcoustic eGuitar, the concept of a digitally fabricated instrument to design acoustic sounds, and A Physical Resonator For a Virtual Guitar, a vision in which the guitar can also preserve the unique tune of an instrument made from wood.

Keywords
Virtual, acoustic, uniqueness of tune, expressivity, sound processing, rapid prototype, 3D printing, resonator.

1. BACKGROUND
Each acoustic instrument made of wood is unique. Each piece of wood is different, leading to a uniqueness of tune in the acoustic sound that is created. Both uniqueness and expressivity are the most important characteristics of the acoustic instrument. Digital instruments lack this uniqueness but usually allow more sound flexibility [1], by offering digital sound processing or synthesis [2].

Digital keyboard instruments have been significantly more successful than bowed or plucked instruments, which suffered from a lack of expressivity and uniqueness of tune. On the one hand, the digital instrument can add new interfaces, controllers and sound abilities to the musical experience. On the other hand, there is a significant cost for modeling the captured information into a pre-defined digital structure. Besides the processing problem, this usually reduces or cancels the uniqueness of tune and expressivity of the instrument.

The main approach to the expressivity problem lies in the field of sound processing, instead of synthesis. One option within this approach is to capture an expressive signal and modify some parameters while preserving the expressive behavior [3].

We suggest a different approach. We believe that significant work can be done by combining the benefits of both worlds (digital and physical): preserving the values of acoustic instruments while applying digital control to their structures.

1.1. Acoustic, Electric and Virtual Guitar
The design of a guitar is influenced by its cultural context. For thousands of years lutes and afterwards guitars evolved: starting with ancient instruments that were made out of natural chambers (turtle shells, gourds), through fine handmade wooden chambers [4], to electrically amplified guitars. Carfoot [7] presents and analyzes the huge changes in the guitar during the 20th century; electric guitars, which use electrical amplification instead of acoustic chambers, evolved in the middle of the century and were a part of the musical revolution of Rock & Roll and its distorted sound.

The guitar has been influenced by electrical technologies. It is to be expected that digital technologies will now take a significant part in the guitar’s evolution. While sound design has conventionally been done using digital software, expressive digital instruments are starting to appear as well. The Line 6 Variax [5] guitar gives a variety of preset sounds, from classic acoustic and electric tones to sitar and banjo. It allows the player to plug into a computer and customize a chosen tone. Expressive playing and sound flexibility are enhanced with the digital guitar. Another example is Fender’s VG Stratocaster [6], a hybrid electric and digital guitar.

Carfoot uses the term virtual instead of digital. If digital defines the type of process being done, virtual better describes the context of the experience. Like virtual reality, the virtual sound created in a digital environment imitates a real life experience. This experience feels natural to our senses, but it was created with a computer model of that real life experience. In sections 2 and 3 we present our approach of using the virtual sound experience to create a new physical guitar (a conceptual work). In section 4 we present a different vision in which the guitar can also preserve the unique tune of a material (a work in progress).

2. COMBINING VIRTUAL AND PHYSICAL IN GUITAR DESIGN
3D design, sound design and digital music software are becoming common and easier to use. Their combination is leading to the possibility of designing, simulating and printing objects according to a predefined acoustic behavior.
Gershenfeld [8] presents a future realm in which personal 3D printers become as common as color printers. RedEye RPM [9] is a rapid prototyping company that creates guitars using digital manufacturing technology. Synthetic materials, such as carbon-fiber epoxy composites, could be used instead of wood in guitar soundboards [10]. Blackbird Guitars created the Blackbird Rider Acoustic [11], a commercial guitar digitally designed and made from composite materials. This kind of new material enables a significant decrease in the chamber’s size while preserving the instrument’s loudness.

3. reACOUSTIC eGUITAR
Three perspectives are fundamental to the sound experience created by a musical instrument: the listener, the performer and the instrument constructor [12].

The vision of reAcoustic eGuitar invites players to become creators of their acoustic instruments and their sounds, with endless possibilities for the sounds to be re-shaped. Players will customize their own sounds by assembling different small chambers instead of using a single large one. Each string has its own bridge; each bridge is connected to a different chamber. Changing the chamber size, material or shape will change the guitar’s sound.

Designing sounds digitally allows the player to share the experience of the constructor. This might lead to a change in the relationship between players and their instruments. Today rapid prototype materials have a broad range of qualities. Players can now take part in designing their own acoustic sounds by modifying the physical structure of their instruments, revealing the characteristics of new materials (see Figure 1).

Figure 1: Constructing principles: Searching, downloading, modifying, printing and assembling the chambers.

We created a simple chamber in a rapid prototype process. This chamber adds a significant amplification to a single string (see Figure 2)¹, even without optimizing acoustical parameters such as membrane thickness and sound box size.

Figure 2: The 3D printed chamber connected to a single string on a wood structure vs. a string on a wood structure without a chamber.

In the reAcoustic eGuitar vision, digital technology will be used to design the acoustic guitar structure (see Figure 3 for a design suggestion). It presents a novel sound design experience between users, their objects and the digital environment.

Re-designing the guitar according to the characteristics of rapid prototyping materials could lead to sound innovations. Open source and shared-file environments could create a reality in which a player downloads or designs his own sound cells, and plugs them into his instrument (see Figure 4).

Starting from virtual sound, getting the desired virtual shape and then printing it, the reAcoustic eGuitar offers a new user experience for the guitar player.

The main disadvantage of the reAcoustic eGuitar concept lies in the rapid prototype process itself. The process is expensive and doesn’t preserve uniqueness of tune as wood does. Perhaps in a few years 3D printers will become less expensive and more accessible, so that this idea can be reconsidered.

¹ The 3D printed chamber, made on a 3D Systems InVision HR 3-D, is presented at ambient.media.mit.edu/projects.php, January 27, 2008.
Figure 3: reAcoustic eGuitar, a design suggestion.

Figure 4: Examples of different chambers.

4. A PHYSICAL RESONATOR FOR A VIRTUAL GUITAR
The former project led to a new vision, A Physical Resonator For A Virtual Guitar. It is a concept of combining the values of the virtual guitar with the uniqueness of the wooden acoustic guitar’s tune. By doing so we can achieve expressive playability in a unique tool that also enables the player to design the required sound with the computer.

The uniqueness of a musical instrument influences more than just its sound. By differing from other instruments, it assumes an individual economic value and establishes a unique relationship with its owner. The structure of the wood is the main reason for the acoustic instrument’s unique behavior. The grain of the soundboard [13], the wood’s humidity, the exact thickness and more influence how it transfers different frequencies. Luthiers [14,15] have used their experience to tune the instrument by modifying the wood until it gives the required results.

A Physical Resonator For A Virtual Guitar focuses on the influence of the chamber on the sound of the acoustic guitar. The chamber’s main parameters are its shape and material [14,15]. The structure and shape can be virtually designed on a computer and used as a virtual chamber. The material will not be synthesized or modulated. In this way we will get a hybrid chamber: part of it is physical (the guitar’s resonator) and part of it is virtual (see Figure 5).

A replaceable slice of the material (the guitar resonator) will be connected to the guitar bridge using a mechanism that enables easy replacement. Piezo sensors will capture the frequencies being developed on the guitar’s resonator. The signal will be transferred to a digital signal-processing unit (DSP). The DSP will modify the sound by simulating different chamber shapes and sizes, thickness and surface smoothness.

Figure 5: Physical resonator in virtual shape.

By combining the virtual with the physical, we believe we can preserve both worlds’ values. More than that, the new approach of the physical resonator can play an important role in continuing the traditional relationship between players and their unique instruments. The digital part can be replaced and updated; the resonators can be collected and saved. A player could take one guitar body with many resonators, instead of a lot of guitars.

The use of a physical resonator is not limited to wood. The resonator can also be created in a rapid prototype process, similar to the concept presented in section 3.
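The paper leaves the DSP method open. As a hypothetical sketch only, the virtual part of the hybrid chamber could be treated as a linear filter: the piezo signal from the physical resonator is convolved with an impulse response standing in for the simulated chamber. Convolution, the toy modal impulse response, the `dry_mix` parameter and all function names are our assumptions for illustration, not the authors' design.

```python
import math


def chamber_ir(modes, length, fs=44100.0):
    """Toy impulse response of a *virtual* chamber (hypothetical model):
    a sum of exponentially decaying sinusoids.

    modes: (frequency_hz, decay_time_s, gain) tuples; a larger virtual body
    would be approximated by lowering the mode frequencies."""
    return [sum(g * math.exp(-n / (tau * fs)) * math.sin(2.0 * math.pi * f * n / fs)
                for f, tau, g in modes)
            for n in range(length)]


def hybrid_chamber(piezo, ir, dry_mix=0.5):
    """Convolve the piezo signal picked up from the physical resonator with the
    virtual chamber response, then mix back some of the unprocessed signal."""
    wet = [0.0] * (len(piezo) + len(ir) - 1)
    for i, x in enumerate(piezo):          # direct-form convolution
        for j, h in enumerate(ir):
            wet[i + j] += x * h
    out = [(1.0 - dry_mix) * w for w in wet]
    for i, x in enumerate(piezo):          # add the dry resonator signal
        out[i] += dry_mix * x
    return out
```

For instance, `hybrid_chamber(piezo, chamber_ir([(110.0, 0.05, 1.0)], 2048))` would color a recorded piezo signal with a single hypothetical 110 Hz body mode. Swapping the impulse response changes the virtual chamber, while the physical resonator, and with it the unique tune of its material, stays the same.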
5. CONCLUSION AND FUTURE WORK
We believe that the future of the guitar lies in the connection between digital sound design and acoustic experience. Digital processing can create new options for sound design, while the acoustic part of the instrument provides the expressivity and uniqueness of tune. The reAcoustic eGuitar concept is based on rapid prototyping techniques and 3D printers. This process is expensive and not accessible to the majority of guitar players, and there is not yet enough knowledge of, or experience with, using rapid prototyping to create acoustic instruments. However, we believe that this may become more feasible in the future.

A Physical Resonator For A Virtual Guitar is a work in progress. We believe that by creating a chamber that is part virtual and part physical, we will preserve expressivity and uniqueness of tune in digital sound design innovations. We intend to develop a working model for A Physical Resonator For A Virtual Guitar. This process will be divided into different parts: from a mechanical solution for the replaceable resonator, through the development of a piezo sensor system that will be able to capture the resonator vibration in different locations. We also intend to develop a DSP unit that will implement the digital modeling of the structure.

6. ACKNOWLEDGMENTS
The authors want to thank MIT Media Laboratory, Marco Coppiardi, Cati Vaucelle, Nan-Wei Gong and Tamar Rucham for their help and support.

7. REFERENCES
[1] Magnusson T., Mendieta E. H. The Acoustic, The Digital and the Body: A Survey on Musical Instruments. NIME 07, June 6-10, 2007. New York, New York, USA.
[2] Poepel C., Overholt D. Recent Developments in Violin-related Digital Musical Instruments: Where Are We and Where are We Going? NIME 06, June 4-8, 2006. Paris, France.
[3] Merrill D., Raffle H. The Sound of Touch. CHI 2007, April 28 – May 3, 2007, San Jose, California, USA.
[4] Jahnel F. (1962). Manual of Guitar Technology: The History and Technology of Plucked String Instruments. English Version of Die Gitarre und ihr Bau by Harvey J. C. (1981). The Bold Strummer Ltd, First English Edition, 1981.
[5] Line 6, Variax®. Product website line6.com/variax. Last accessed: January 27, 2008.
[6] Fender, VG Stratocaster®. Product website www.fender.com/vgstrat/home.html. Last accessed: January 30, 2008.
[7] Carfoot G. Acoustic, Electric and Virtual Noise: The Cultural Identity of the Guitar. Leonardo Music Journal, Vol. 16, pp. 35-39, 2006.
[8] Gershenfeld N. FAB: The Coming Revolution on Your Desktop - From Personal Computers to Personal Fabrication, pp. 3-27. Basic Books, April 12, 2005.
[9] RedEye RPM. Guitar with digital manufacturing technology. Company website www.redeyerpm.com. Last accessed: January 27, 2008.
[10] Jonathan H. Carbon Fiber vs. Wood as an Acoustic Guitar Soundboard. PHYS 207 term paper.
[11] Blackbird Guitars. Blackbird Rider Acoustic. Company website www.blackbirdguitar.com. Last accessed: January 27, 2008.
[12] Kvifte T., Jensenius A. R. Towards a Coherent Terminology and Model of Instrument Description and Design. NIME 06, June 4-8, 2006. Paris, France.
[13] Buksnowitz C., Teischinger A., Muller U., Pahler A., Evans R. (2006). Resonance wood [Picea abies (L.) Karst.] – evaluation and prediction of violin makers’ quality-grading. J. Acoustical Society of America 121, 2007.
[14] Kinkead J. Build Your Own Acoustic Guitar: Complete Instructions and Full-Size Plans. Published 2004 by Hal Leonard.
[15] Cumpiano W. R., Natelson J. D. Guitarmaking: Tradition and Technology: A Complete Reference for the Design & Construction of the Steel-String Folk Guitar & the Classical Guitar (Guitar Reference). Published 1998 by Chronicle Books.
Virtual Intimacy: Phya as an Instrument

Dylan Menzies
Dept. Computer Science and Engineering
De Montfort University
Leicester, UK
rdmg@dmu.ac.uk
ABSTRACT
Phya is an open source C++ library originally designed for adding physically modeled contact sounds into computer game environments equipped with physics engines. We review some aspects of this system, and also consider it from the purely aesthetic perspective of musical expression.

Keywords
NIME, musical expression, virtual reality, physical modeling, audio synthesis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
The use of impact sounds coupled to a modeled environment was introduced in [5]. Refinements of impact sound models have since been made [1]. The first working models for sustained contact sounds integrated with a physical environment were made in [13], greatly expanding the overall realism of the simulation by relating audio and visual elements continuously. Frictional models have been created for musical instruments, and have also been applied to surfaces in simulated environments [2]. Further models have been proposed for other environmental sounds, including fluids [12]. In [7] a framework was presented for a physical audio system, designed to operate closely with a physics engine providing rigid body dynamics. The emphasis was on using robust techniques that could be scaled up easily and accommodate an environment that was rapidly changing. This work developed into the Phya physical audio library discussed here.

The principal goal for Phya has been to generate sounds that arise from dynamical interactions that are either clearly visually apparent or directly affected by user control. This is because when audio can be closely causally correlated to other percepts, the overall perceptual effect and sense of immersion is magnified considerably. A wide selection of sounds fall into this category, including collisions between discrete solid and deformable objects. The complex dynamics of these objects is captured well by the many physics engines that have been developed. The audio generated is a modulation of the audio-rate dynamics of excitation and resonance by the relatively slow, large-scale dynamics of objects. Simple audio synthesis processes can lead to convincing results, but only if coupled carefully with the large-scale dynamics.

While physical modeling provides a powerful way to generate strong percepts, a balance must be struck on the level of detail, so that the output is not overly constrained. In practice this leads to the development of semi-physical-perceptual models that provide some freedom for the sound designer to more easily mould a virtual world.

It was apparent from early on that Phya offers an inherently musical experience, even from the limited control environment of a desktop. The richness of dynamic behavior and multi-modal feedback are characteristic of musical performance. A later section explores this further. Coupled musical-visual performances have become common; however, performances within a physical audio-visual world are still apparently scarce, as are physical audio-visual worlds in computer games. This state of affairs has prompted this article.

2. TECHNOLOGICAL REVIEW
Below we briefly review the components of Phya, and the overall structure used to accommodate them.

2.1 Impacts

2.1.1 Simple spring
The simplest impacts consist of a single excitation pulse, which then drives the resonant properties of the colliding objects. The spectral brightness of the pulse depends on the combined hardness of the two surfaces. Using a spring model, the combined spring constant, which determines the duration and so the spectral profile of a hit, is $k = (k_1^{-1} + k_2^{-1})^{-1}$, where $k_1$ and $k_2$ are the spring constants of the individual surfaces. A model which just takes $k$ to be the lesser value is also adequate. The duration is $\pi\sqrt{m/k}$, where $m$ is the effective mass $(m_1^{-1} + m_2^{-1})^{-1}$. The effective mass can be approximated by the lesser mass. If one object is fixed, like a wall, the effective mass is the free object’s mass.

The impact displacement amplitude in this model is $A = v\sqrt{m/k}$, where $v$ is the relative normal contact speed. To give the sound designer more freedom over the relation between collision parameters and the impact amplitude, a linear breakpoint scheme is used, with an upper limit also providing a primary stage of audio level limiting. Note that the masses used for impact generation do not have to be in exact proportion to the dynamics engine masses.

Audio sensitivity to surface hardness and object mass helps to paint a clearer picture of the environment. From a musical perspective it adds variation to the sound that can be generated, in an intuitive way.
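The simple spring model can be checked numerically. The Python sketch below evaluates the formulas above together with a breakpoint amplitude mapping; it is not Phya's C++ API, and the function names, the rule of extending the last breakpoint segment before clamping, and the example numbers are assumptions for illustration.

```python
import math
from typing import Optional, Sequence, Tuple


def combined_stiffness(k1: float, k2: float) -> float:
    """Two contact springs in series: k = (k1^-1 + k2^-1)^-1."""
    return 1.0 / (1.0 / k1 + 1.0 / k2)


def effective_mass(m1: float, m2: Optional[float] = None) -> float:
    """Effective mass (m1^-1 + m2^-1)^-1; m2=None models a fixed object such as a wall."""
    if m2 is None:
        return m1
    return 1.0 / (1.0 / m1 + 1.0 / m2)


def impact_duration(m: float, k: float) -> float:
    """Contact time of the mass-spring impact (half a period): pi * sqrt(m / k)."""
    return math.pi * math.sqrt(m / k)


def impact_amplitude(v: float, m: float, k: float) -> float:
    """Peak displacement A = v * sqrt(m / k) for relative normal contact speed v."""
    return v * math.sqrt(m / k)


def breakpoint_map(x: float, points: Sequence[Tuple[float, float]],
                   upper_limit: float) -> float:
    """Piecewise-linear map through (x, y) breakpoints sorted by x (at least two).

    The last segment is extended beyond the final breakpoint (an assumption),
    and the result is clamped to upper_limit as a first stage of level limiting."""
    if x <= points[0][0]:
        return min(points[0][1], upper_limit)
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x <= x1:
            break
    # (x0, y0)-(x1, y1) is now the segment containing x, or the last segment.
    y = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return min(y, upper_limit)
```

For example, two 0.2 kg objects whose surfaces each have stiffness $2\times10^5$ N/m give $k = 10^5$ N/m and $m = 0.1$ kg, so the impact lasts $\pi\sqrt{10^{-6}}$ s, about 3.1 ms, and at $v = 2$ m/s its displacement amplitude is 2 mm.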
2.1.2 Stiffness
Impact stiffness is important for providing cues to the listener about impact dynamics, because it causes spectral changes in the sound depending on impact strength, whereas impact strength judged from the amplitude level of an impact received by a listener is ambiguous because of the attenuating effect of distance. Stiffness can be modeled by making the spring constant increase with impact displacement. This causes an overall decrease in impact duration for an increase in impact amplitude, and makes the impact spectrally brighter, as illustrated in Figure 1. The variation in stiffness with impulse is a property of the surface and can be modeled reasonably well with a simple breakpoint scheme that can be tuned by the sound designer directly.

Figure 1: Displacements from three impacts, one of which is stiff (displacement vs. time: pulses with constant k/m, and a shorter pulse where k increases above a threshold).

Increasing brightness with note loudness is an important attribute of many musical instruments, acoustic and electronic, and is rooted in our everyday physical experience. It might even be called a universal element of expression. Phya incorporates this behavior naturally.

2.1.3 Multiple hits and grazing
Sometimes several hits can occur in rapid succession. A given physics engine can only generate this impact information down to a certain time scale. The effect can be simulated by generating secondary impulses according to a simple Poisson-like stochastic process, so that for a larger impact the chance of secondary impacts increases.

Also common are grazing hits, in which an impact is associated with a short period of rolling and sliding. This is because the surfaces are uneven, and the main impulse causing the rebound occurs during a period of less repulsive contact. Such fine dynamics cannot be captured by a typical physics engine. However, good results can be achieved by combining audio impulse generation with continuous contact generation, according to the speed of collision and angle of incidence (see Figure 2). The component of velocity parallel to the surface is used for the surface contact speed.

Figure 2: A grazing impact (contact layer and surface).

2.2 Continuous contacts

2.2.1 Basic model
Continuous contact generation is a more complex process. The first method introduced [13] was to mimic a needle following the groove on a record. This corresponds to a contact point on one surface sliding over another surface, and is implemented by reading or generating a surface profile at the contact point to generate an audio excitation.

Rolling is similar to sliding, except there is no relative movement at the contact point, resulting in a spectrally less bright version of the sliding excitation. This can be modeled by appending a lowpass filter that can be varied according to the slip speed at the contact, creating a strong cue for the dynamics there (see Figure 3). A second order filter is useful to shape the spectrum better. The contact excitation is also amplified by the normal force, in the same way impacts are modified by collision energy. More subtle are modifications to spectral brightness according to the m/k ratio that determines the brightness of an impact. Low m/k corresponds to a light needle reading the surface at full brightness. Heavier objects result in a slower response, which can again be modeled by controlling the lowpass filter. Although simple, this efficient model is effective because it takes in the full dynamic information of the contact and uses it to shape the audio, which we then correlate with the visual portrayal of the dynamics. It is also easily customized to fit the sound designer’s requirements. When flat surfaces are in contact over a wide area, this can be treated as several spaced-out contact points, which can often be supplied directly by the dynamics-collision system.

Figure 3: Surface excitation from rolling and sliding (block diagram: contact position and speed drive a surface profile generator, followed by a lowpass filter whose cutoff is set by slip speed and m/k, and a gain set by the normal force, producing the excitation).

2.2.2 Contact jumps
Even for a surface that is completely solid and smooth, the excitations do not necessarily correspond very well with the surface profile. A contact may jump, creating a small micro-impact, due to the blunt nature of the contact surfaces (see Figure 4). The sound resulting from this is significant and cannot be produced by reading the surface profile directly. Again, the detailed modeling of the surface interactions is beyond the capabilities available from dynamics and collision engines, which are not designed for this level of detail. Good results can instead be achieved by adding the jumps, pre-processed, into the profile (Figure 5). Downsampling a jump results in a bump, unless it is sampled with sufficient initial resolution, which may be impractical. A useful variation is therefore to downsample jumps to jumps, by not interpolating. This retains the ’jumpiness’ and avoids the record-slowing-down effect.

2.2.3 Programmatic and stochastic surfaces
Stored profiles can be mapped over surface areas to create varying surface conditions. This can be acceptable for sparse jump-like surfaces that can be encoded at reduced sample rates, but in general the memory requirements can be unreasonable. An alternative is to describe surfaces programmatically, in either a deterministic or a fully stochastic way. The advantage of a largely deterministic process is that repetitions of a surface correlate closely, for instance when something is rolling back and forth, providing consistency cues to the dynamic behavior even without visuals. Indexable random number generators provide a way to deterministically generate random surfaces.
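The stiffness, secondary-hit and grazing behavior of Sections 2.1.2 and 2.1.3 can be sketched as follows. This is an illustration rather than Phya's code: the breakpoint stiffness law (constant $k_0$ up to a threshold compression, then rising linearly), the linear relation between impact energy and secondary-hit rate, the semi-implicit Euler integration and all names are assumptions.

```python
import math
import random


def stiff_k(x, k0, threshold, slope):
    """Hypothetical breakpoint stiffness law: constant k0 up to `threshold`
    compression, then rising linearly with further compression."""
    return k0 if x <= threshold else k0 + slope * (x - threshold)


def impact_pulse(v0, m, k0, threshold, slope, dt=1e-6):
    """Integrate m*x'' = -k(x)*x from x = 0, x' = v0 until the surfaces separate.

    Returns the displacement samples (spacing dt) while the contact is closed."""
    x, v, out = 0.0, v0, []
    while True:
        v -= stiff_k(x, k0, threshold, slope) * x / m * dt  # semi-implicit Euler
        x += v * dt
        if x <= 0.0:
            return out
        out.append(x)


def secondary_hit_times(energy, window, rate_per_energy, rng):
    """Times of extra hits within `window` seconds after an impact, drawn from a
    Poisson process whose rate grows (here linearly) with impact energy."""
    rate = rate_per_energy * energy
    times, t = [], 0.0
    if rate <= 0.0:
        return times
    while True:
        t += rng.expovariate(rate)  # exponential gaps give a Poisson process
        if t >= window:
            return times
        times.append(t)


def grazing_split(velocity, normal):
    """Split a relative velocity into the normal speed, which drives the impact
    impulse, and the tangential speed, used as the contact speed of the brief
    rolling/sliding phase of a grazing hit."""
    n_len = math.sqrt(sum(c * c for c in normal))
    n = [c / n_len for c in normal]
    vn = sum(a * b for a, b in zip(velocity, n))
    tangential = [a - vn * b for a, b in zip(velocity, n)]
    return abs(vn), math.sqrt(sum(c * c for c in tangential))
```

With $m = 0.1$ kg, $k_0 = 10^5$ N/m and a 1 mm threshold, a 0.5 m/s impact stays in the linear region and lasts about $\pi$ ms, while a 5 m/s impact is both larger and shorter, hence brighter, as in Figure 1.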
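The needle-and-groove contact model of Sections 2.2.1 to 2.2.3 can be sketched in the same way. Here an integer hash plays the role of an indexable random number generator, so no profile needs to be stored. The contact position advances with the contact speed, the excitation is lowpass filtered according to slip speed and scaled by the normal force, and a flag switches between interpolated reading and the non-interpolating "jumps to jumps" variant. The hash constants, the one-pole filter (the text recommends second order), the linear cutoff law and the names are assumptions. Brightness control by m/k is omitted.

```python
import math


def surface_height(index, seed=0):
    """Indexable pseudo-random surface: the same (index, seed) always gives the
    same height in [-1, 1), so an object rolling back and forth re-reads
    identical detail without any stored profile."""
    h = (index * 0x9E3779B1 + seed * 0x85EBCA77) & 0xFFFFFFFF
    h ^= h >> 16
    h = (h * 0x7FEB352D) & 0xFFFFFFFF
    h ^= h >> 15
    h = (h * 0x846CA68B) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2**31 - 1.0


def read_profile(profile, position, interpolate=True):
    """Read a profile (any function of an integer index) at a fractional position.

    With interpolate=False (zero-order hold) a step stays a step however slowly
    it is read: the 'downsample jumps to jumps' variant. Linear interpolation
    instead stretches a step into a ramp at low contact speeds."""
    i = math.floor(position)
    a = profile(i)
    if not interpolate:
        return a
    return a + (profile(i + 1) - a) * (position - i)


def contact_excitation(n, speed, slip_speed, normal_force, fs=44100.0,
                       profile=surface_height, base_cutoff=200.0,
                       cutoff_per_slip=2000.0, interpolate=True):
    """n samples of excitation for a contact moving over a surface.

    speed: profile samples traversed per second at the contact point
    slip_speed: relative sliding speed; 0 models pure rolling (darkest tone)
    normal_force: gain applied to the excitation"""
    cutoff = min(base_cutoff + cutoff_per_slip * slip_speed, 0.45 * fs)
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff / fs)  # one-pole coefficient
    y, pos, out = 0.0, 0.0, []
    for _ in range(n):
        y += alpha * (read_profile(profile, pos, interpolate) - y)
        out.append(normal_force * y)
        pos += speed / fs
    return out
```

Because the surface is a pure function of position, reversing the contact direction replays the same detail backwards, which is the consistency cue described above.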
Other options include repeating functions that generate pattern-based surfaces such as grids.

A useful range of surfaces can be generated by stochastically generating pulses of different widths, with control over the statistical parameters. A change of contact speed is then achieved by simply varying the parameters.

Secondary excitations can also be generated stochastically, for instance to simulate the disturbance of gravel on a surface, in a similar manner to the physically informed footsteps in [3] (Figure 6). In this scheme the collision parameters are used to determine the activity rate of a Poisson-like process, which then generates impulses mimicking the collisions of gravel particles. A low-frequency lowpass filter is used to simulate the duration of the particle spray following an impact. The impulses have randomly selected amplitudes and are shaped or filtered to reflect increased particle collision brightness with increased contact force and speed, before exciting a particle resonance. This model is a simplification: in reality, even at high system collision energies some particle collisions still occur at low energy. It also assumes all particles have the same resonance. The model does, however, have enough temporal and spectral dynamics to be interesting. Three levels of dynamics can be distinguished here: the gross object dynamics, the simulated gravel dynamics, and the audio resonance. The detail that can be encoded in surface excitations is critical from the musical point of view. It provides the foundation from which the full sound evolves.

Figure 4: Micro-impact occurring due to contact geometry.

Figure 5: Preprocessing a surface profile to include jumps.

Figure 6: Modeling loose surface particle sound (block diagram: tangent speed and normal force → lowpass filter → Poisson random event generator → random amplitude → pulse → lowpass filter → particle resonance).

2.2.4 Friction
Friction stick and slip processes are important in string instruments. In virtual environments they are a much less common source of sound than the interactions considered so far. A good example is door creaking, which is visually linked to the movement of the door. Stick and slip for discrete solid objects is simulated well by the generation of pulses at regular linear or angular intervals, with the amplitude and spectral profile of the pulses modified as the contact force and speed change. As contact force increases, the interval between pulses normally increases, due to the increased static friction limit, with a more or less constant lateral spring constant at the contact.

2.2.5 Buzzing
Common phenomena are buzzing and rattling at a contact, caused by objects in light contact that have been set vibrating. Like impact stiffness, buzzing provides a distance-independent cue of dynamic state, which in this case is the amplitude of vibration. Objects that are at first very quiet can become loud when they begin to buzz, due to the nonlinear transfer of low frequency energy up to higher frequencies that are radiated better. Precise modeling of this with a dynamics-collision engine would be infeasible. However, the process can be modeled well by clipping the signal from the main vibrating object, as shown in Figure 7, and feeding it to the resonant objects that are buzzing against each other. The process could be made more elaborate by calculating the mutual excitation due to two surfaces moving against each other.

Figure 7: Clipping of resonator output to provide buzz excitation.

2.3 Resonators

2.3.1 Modal resonators, calibration, location dependence
There are many types of resonator structure that have been used to simulate sounding objects. For virtual environments we require a minimal set of resonators that can be easily adapted to a wide variety of sounds, and can be run efficiently in large numbers. The earliest forms of resonator used for this purpose were modal resonators [5, 13], which consist of parallel banks of second order resonant filters, each with individual coupling constants and damping. These are particularly suited to objects with mainly sharp resonances, such as solid objects made from glass, stone and metal. It is possible to identify spectral peaks in the recording of such an object, and also the damping, by tracking how quickly each peak decays [11]. A command line tool is included with Phya for automating this process. The resultant data is many times smaller than even a single collision sample. Refinements to this process included sampling over a range of impact points, and using spatial sound reconstruction. The associated complexities were not considered a priority in Phya. Hitting an object in different places produces different sounds, but even hitting an object in the same place repeatedly produces different sounds each time, due to the changing state of the resonant filters. It is part of the attraction of physical modeling that such subtleties are manifested.
several different collision objects, with a different Phya sound object associated with each.

2.3.2 Diffuse resonance
For a large enough object of a given material, the modes become very numerous and merge into a diffuse continuum. This coincides with the emergence of time-domain structure at scales of interest to us, so that, for instance, a large plate of metal can be used to create echoes and reverberation. For less dense, more damped materials such as wood, pronounced diffuse resonance occurs at modest sizes, for instance in chairs and doors. Such objects are very common in virtual environments, and yet a modal resonator cannot efficiently model diffuse resonance or be matched to a recording. Waveguide methods have been employed to model diffuse resonance, using either abstract networks, including banded waveguides [4] and feedback delay networks [9], or more explicit structures such as waveguide meshes [14, 15]. An alternative approach, introduced in [6], is to mimic a diffuse resonator by dividing the excitation into frequency bands and feeding the power in each band into a multi-band noise generator, via a filter that generates the time decay for each band (see Figure 8). This perceptual resonator provides a diffuse response that follows the input spectrum. When combined with modal modeling for lower frequencies, it can efficiently simulate wood resonance, and it can be easily manipulated by the sound designer. A similar approach had been used in [10] to simulate the diffuse resonance of sound boards in response to hammer strikes; the difference here, however, is that the resonator follows the spectral profile of a general input.

Figure 8: Outline of a perceptual resonator (per band: bandpass, envelope follower and lowpass, setting the gain of band-passed noise; the bands are summed).

2.3.3 Surface damping
A common feature of resonant objects is that their damping factors are increased by contact with other objects. For instance, a cup placed on a table sounds less resonant when struck. This behavior has a strong visual-dynamic coupling, and provides information about the surfaces. It can be simulated by accumulating a damping factor for each resonator as the sum of the damping factors associated with the surfaces that are in contact.

2.3.4 Nonlinear resonance
Many objects enter nonlinear regimes when vibrating strongly, sometimes causing a progressive shift of energy to higher frequencies. For a modal system this can be modeled by letting lower modes excite higher modes via nonlinear couplings. In waveguide systems the nonlinearities can be built into the network.

2.3.5 Deformable objects
There is a common class of objects that are not completely rigid but still resonate clearly, for example a thin sheet of metal. Such objects have resonance characteristics that vary with their shape. While explicit modeling of the resonance parameters according to shape is prohibitive, an excellent qualitative effect that correlates well with visual dynamics is to vary the resonator parameters about a calibrated set, according to variations of shape from the nominal. In a physical model of a deformable body this can be quantified using stress parameters or linear expansion factors. The large-scale oscillation of such a body modulates the audio frequencies, providing an excellent example of audiovisual dynamic coupling.

2.4 Phya overall structure and engine
Phya is built in the C++ language, and is based around a core set of general object types that can be specialized and extended. Sounding objects are represented by a containing object called a Body, which refers to an associated Surface and Resonator object (see Figure 9). Specializations of these include SegmentSurface for recorded surface profiles, RandSurface for deterministically generated stochastic surfaces, and GridSurface for patterns. The resonator subtypes are ModalResonator and PerceptualResonator. Bodies can share the same surface and resonator if required, in order to handle groups of objects more efficiently. Collision states are represented using Impact and Contact objects that are dynamically created and released as collisions occur between physical objects. These objects take care of updating the state of any associated surface interactions.

Figure 9: Main objects in Phya (a body with its resonator and surface; impact and contact objects linking body1 and body2, with an impact generator and a contact generator).

2.4.1 System view
The top-level system view is shown in Figure 10. The collision system in the environment simulator must trigger updates in Phya's collision update section, for example using a callback system. This in turn reads dynamic information from the dynamics engine and updates parameters that are used by the Phya audio thread to generate audio samples. The most awkward part of the process is finding a way for Phya to keep track of continuous contacts.

Figure 10: Phya system overview (physics engine: Dynamics, Collision; Phya: Collision update, DSP / Audio).
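The resonator behavior described in Sections 2.3.1, 2.3.3 and 2.3.5 can be illustrated with a minimal Python sketch, assuming one two-pole resonant filter per mode, parameterized by frequency, decay rate and coupling gain. The names ModalBank, set_contact_damping and freq_scale are hypothetical and are not Phya's C++ ModalResonator API; contact damping is simply added to every mode's decay rate, and a single frequency scale stands in for a deformation factor.

```python
import math


class ModalBank:
    """Hypothetical modal resonator sketch: a parallel bank of two-pole
    resonant filters, one per mode. Not Phya's actual API."""

    def __init__(self, freqs_hz, decay_rates, gains, sample_rate=44100.0):
        self.fs = sample_rate
        self.base_freqs = list(freqs_hz)      # calibrated mode frequencies (Hz)
        self.base_decays = list(decay_rates)  # calibrated decay rates (1/s)
        self.gains = list(gains)              # per-mode coupling constants
        self.contact_damping = 0.0            # extra decay from touching surfaces
        self.freq_scale = 1.0                 # e.g. driven by a deformation factor
        n = len(self.base_freqs)
        self.y1 = [0.0] * n                   # per-mode state y[n-1]
        self.y2 = [0.0] * n                   # per-mode state y[n-2]

    def set_contact_damping(self, surface_dampings):
        # Surface damping: sum the damping factors of all surfaces in contact.
        self.contact_damping = sum(surface_dampings)

    def _coefficients(self):
        coeffs = []
        for f0, d0 in zip(self.base_freqs, self.base_decays):
            r = math.exp(-(d0 + self.contact_damping) / self.fs)   # pole radius
            theta = 2.0 * math.pi * f0 * self.freq_scale / self.fs  # pole angle
            coeffs.append((2.0 * r * math.cos(theta), -r * r))
        return coeffs

    def process(self, excitation):
        """Filter one block of excitation samples. Parameters are read once
        per block, like a control-rate update; gains are not normalized."""
        coeffs = self._coefficients()
        out = []
        for x in excitation:
            total = 0.0
            for k, (a1, a2) in enumerate(coeffs):
                y = self.gains[k] * x + a1 * self.y1[k] + a2 * self.y2[k]
                self.y2[k], self.y1[k] = self.y1[k], y
                total += y
            out.append(total)
        return out
```

For instance, calling set_contact_damping([20.0]) while an object rests on a table shortens the decay of every mode, and setting freq_scale from a stretch factor shifts all modes together. Reading parameters once per block is meant to echo the separation between collision-rate updates and the audio thread; a real-time version would be written in C++ rather than as per-sample Python loops.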
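Section 2.4.1 ends by naming contact tracking as the awkward step, and Section 2.4.2 describes a list-matching approach for collision engines that forget contacts between frames. The following is a minimal Python sketch of that matching, assuming each contact is keyed by the unordered pair of Phya bodies involved; PhyaContact and update_contacts are hypothetical names, not part of Phya.

```python
from dataclasses import dataclass


@dataclass
class PhyaContact:
    """Hypothetical stand-in for a persistent Phya contact between two bodies."""
    body_pair: frozenset
    frames_alive: int = 0


def update_contacts(physical_contacts, phya_contacts, body_map):
    """Match this frame's non-persistent physical contacts against the
    current Phya contacts.

    physical_contacts: iterable of (physical_body_a, physical_body_b) pairs
    phya_contacts: dict mapping frozenset of two Phya bodies -> PhyaContact
    body_map: dict mapping physical bodies to their Phya bodies
    Returns (created, released) lists of PhyaContact objects.
    """
    matched = set()
    created = []
    for a, b in physical_contacts:
        if a not in body_map or b not in body_map:
            continue  # at least one body has no sound attached
        key = frozenset((body_map[a], body_map[b]))
        if key in matched:
            continue  # several contact points on one pair: treat as one contact
        contact = phya_contacts.get(key)
        if contact is None:
            contact = PhyaContact(key)  # no pair matched: form a new contact
            phya_contacts[key] = contact
            created.append(contact)
        else:
            contact.frames_alive += 1   # pair found: the contact continues
        matched.add(key)
    # Any Phya contact not seen this frame is released.
    released = [c for k, c in phya_contacts.items() if k not in matched]
    for c in released:
        del phya_contacts[c.body_pair]
    return created, released
```

Folding multiple contact points between the same pair into one contact is only one simple way of handling the confusion mentioned in Section 2.4.2; an engine with persistent contacts, or with callbacks on contact creation and destruction, would make this search unnecessary.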
2.4.2 Tracking contacts
Most collision engines do not use persistent contacts, meaning they forget information about contacts from one collision frame to the next. Phya, on the other hand, needs to remember contacts because it has audio processes that generate excitations continuously during a contact. The problem can be attacked either by modifying the collision engine, which is difficult or impossible, or by searching contact lists. In the simplest case, the physics engine provides a list of non-persistent physical contacts at each collision step, and no other information. For each physical contact, the associated Phya bodies can be found and compared with a list of current Phya contact pairs. If no pair matches, a new Phya contact is formed. If a pair is found, it is associated with the current physical contact. For any pairs left unmatched, the associated Phya contact is released (see Figure 11). This works on the mostly true assumption that if a physical contact exists between two bodies in two successive frames, then it is a single continuous contact that is evolving. If two bodies are in contact in more than one place then some confusion can occur, but this is offset by the fact that the sound is more complex. Engines that keep persistent contacts are easier to handle. The ability to generate callbacks when contacts are created and destroyed helps even more.

Figure 11: Find a Phya contact from a physical contact (Physical Body1, Physical Contact and Physical Body2; look in Phya contacts; Phya Body1, Phya Contact and Phya Body2).

2.4.3 Smooth surfaces
Another problem of continuous contacts arises from the collision detection of curved surfaces. For example, the collision of a cylinder can be detected using a dedicated algorithm, or a more general one applied to a collision net that approximates a cylinder. From a visual dynamic point of view the general approach may appear satisfactory. However, the dynamic information produced may lead to audio that is clearly consistent with an object that has corners rather than a smooth one. One way to improve this situation is to smooth the dynamic information with linear filters when the surface is intended to be smooth. This requires Phya to check the tags on the physical objects associated with a new contact to see if smoothing is intended.

2.4.4 Limiters
The unpredictable nature of physical environmental sound requires automated level control, both to ensure that it is sufficiently audible and that it is not so loud as to dominate other audio sources or clip the audio range. This has already been partly addressed at the stage of excitation generation; however, because of the unpredictability of the whole system, it is also necessary to apply limiters to the final mix. This is best achieved with a short look-ahead brick-wall limiter, which can guarantee a limit while also reducing the annoying artifacts that would be caused without any look-ahead. Too much look-ahead would compromise interactivity; however, the duration of a single audio system processing vector, which is typically 128 samples, is found to be sufficient.

3. A VIRTUAL MUSICAL INSTRUMENT
While Phya was designed for general-purpose virtual worlds, the variety and detail of sonic interactions on offer lend themselves to the creation of musical virtual instruments. From a more abstract view, the layered, multi-scale dynamics within Phya capture the layered dynamics present in real acoustic instruments. It is sometimes claimed that this structure is particularly relevant to musical performance [8]. Electronic performance systems often fail to embody the full range of dynamic scales, even within physically modeled instruments, which sometimes lack physical control interfaces with appropriate embedded dynamics.

Although grounded in physical behavior, and therefore naturally appealing to human psychology, the intimate interactions can be tailored to more unusual simulations that would be difficult or impossible in the real world. For instance, very deep resonances that would require very heavy objects can easily be created, as can unusual resonances. Likewise, the parameters of surfaces can be composed to ensure the desired musical effect. The physical behavior of objects can be matched to any desired scale of distance, time or gravity. Because the graphical world is virtual, it too can be composed artistically with more freedom than the real world.

The graphical output not only provides additional feedback to the performer, but also adds the kind of intimate visual association that is present in traditional musical performance but lacking in much live electronic music, especially music focused around keyboard and mouse control. Phya provides the audience with an alternative to the performer as a visual focus. The mouse interface is readily extended to a more haptically and visually appropriate controller using a device such as a Nintendo Wii remote. This has the effect of making the control path correspond directly to the object path, improving the sense of immersion for the performer. In a CAVE-like environment the performer can maneuver within a spatial audio environment, although without an audience. In a full-headset virtual reality environment, the performer can interact directly with objects through virtual limbs, with virtual co-performers and a virtual audience.

While Phya has not yet been used to produce an extended musical work, we discuss musical aspects of some demonstrations. Figures 12 and 13 show simple examples of sonic toys constructed with Phya. In the first, nested spheres form a kind of virtual rattle, with the lowest resonance associated with the biggest sphere. The user interacts by dragging the middle sphere around by invisible elastic. The second shows a deformable teapot with a range of resonances. The deformation parameters are used to modify the resonant frequencies on the fly. The effect is at once familiar and surreal. Further examples demonstrate the stacking of many different resonant blocks. Configuring groups of blocks becomes a musical, zen-like process.

Figure 12: Nested sonic spheres.

4. COPING WITH NETWORK LATENCY
There has been considerable interest in collaborative interactive musical performance over networks. One aspect of such systems is the delay or latency required to transmit information around the network, which can be musically significant for long-distance collaborations. In the case of
performance with acoustic instruments, it is impossible to make each side hear the same total performance while also playing their instruments normally. Virtual instruments of the kind described here offer another possibility, because the dynamics of the virtual world are strictly separated from the control in the outer world. Figure 14 shows a collaboration between two performers across a network. Adding local delays to match the network latency keeps the two virtual worlds synchronized. In each world the audio and graphical elements are, of course, synchronized. Performance gestures are delayed, but this is not such a severe handicap because the visual feedback remains synchronized, and it is a price worth paying to maintain overall synchronization over the network. If control is by force rather than position, the gesture delay is even less intrusive. To eliminate drift between the virtual worlds, and to handle many performers efficiently, a central virtual world can be used, as shown in Figure 15. This adds return latency delays.

Figure 13: Deformable sonic teapot.

Figure 14: Two performers with local virtual worlds (Delay D, Virtual world, Latency D, Virtual world, Delay D).

Figure 15: Many performers with a central virtual world (Latency 1, Latency 2).

5. BACK TO REALITY
The aesthetics of Phya partly inspired a tangible musical performance piece, which we mention briefly because it provides an interesting example of how the boundary between virtual and real can become blurred. Ceramic Bowl¹ centers around a bowl with four contact microphones attached around the base, where there is a hole. Objects are launched manually into the bowl, where they roll, slide and collide in orbit until they exit. The captured sound is computer-processed under real-time control and diffused onto an eight-speaker rig. The microphone arrangement allows the spatial sound events to be magnified over a large listening area.

¹ First performed at the Electroacoustic Music Studies conference, Leicester, 14 June 2007. Broadcast on BBC Radio 3 Hear and Now, 25 August 2007.

6. CONCLUSION
The original goal was to create a system that can capture the sonic nuance and variety of collisions, and that is easy to configure and use within a virtual reality context. This required the consideration of a variety of interdependent factors. The result is a system that is not only useful from the point of view of virtual reality, but also has natural aesthetic interest and application in musical performance. The integrated graphical output is part of a fused perceptual aesthetic. Phya is now an open source project.²

7. REFERENCES
[1] F. Avanzini, M. Rath, and D. Rocchesso. Physically-based audio rendering of contact. In Proc. IEEE Int. Conf. on Multimedia and Expo (ICME2002), Lausanne, volume 2, pages 445–448, 2002.
[2] F. Avanzini, S. Serafin, and D. Rocchesso. Interactive simulation of rigid body interaction with friction-induced sound generation. IEEE Tr. Speech and Audio Processing, 13(5.2):1073–1081, 2005.
[3] P. Cook. Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21:3, 1997.
[4] G. Essl, S. Serafin, P. Cook, and J. Smith. Theory of banded waveguides. Computer Music Journal, spring 2004.
[5] J. K. Hahn, H. Fouad, L. Gritz, and J. W. Lee. Integrating sounds and motions in virtual environments. In Sound for Animation and Virtual Reality, SIGGRAPH 95, 1995.
[6] D. Menzies. Perceptual resonators for interactive worlds. In Proceedings AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, 2002.
[7] D. Menzies. Scene management for modelled audio objects in interactive worlds. In International Conference on Auditory Display, 2002.
[8] D. Menzies. Composing instrument control dynamics. Organised Sound, 7(3), April 2003.
[9] D. Rocchesso and J. O. Smith. Circulant and elliptic feedback delay networks for artificial reverberation. IEEE Trans. Speech and Audio, 5(1), 1997.
[10] J. O. Smith and S. A. Van Duyne. Developments for the commuted piano. In Proceedings of the International Computer Music Conference, Banff, Canada, 1995.
[11] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, 1998.
[12] K. van den Doel. Physically-based models for liquid sounds. ACM Transactions on Applied Perception, 2:534–546, 2005.
[13] K. van den Doel, P. G. Kry, and D. K. Pai. FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In Computer Graphics (ACM SIGGRAPH 01 Conference Proceedings), 2001.
[14] S. A. Van Duyne and J. O. Smith. Physical modeling with the 2-D digital waveguide mesh. In Proc. Int. Computer Music Conf., Tokyo, 1993.
[15] S. A. Van Duyne and J. O. Smith. The 3D tetrahedral digital waveguide mesh with musical applications. In Proceedings International Computer Music Conference, 2001.

² Details at www.zenprobe.com/phya
Creating Pedagogical Etudes for Interactive Instruments

Jennifer Butler
University of British Columbia
Vancouver, B.C. Canada
+1.604.999.1143
jaebutler@yahoo.com

ABSTRACT
In this paper I discuss the importance of, and need for, pedagogical materials to support the development of new interfaces and new instruments for electronic music. I describe my method for creating a graduated series of pedagogical etudes composed using Max/MSP. The etudes will help performers and instrument designers learn the most commonly used basic skills necessary to perform with interactive electronic music instruments. My intention is that the final series will guide a beginner from these initial steps through a graduated method, eventually incorporating some of the more advanced techniques regularly used by electronic music composers.
I describe the order of the series, and discuss the benefits (both to performers and to composers) of having a logical sequence of skill-based etudes. I also connect the significance of skilled performers to the development of two essential areas that I perceive to be still emerging in this field: the creation of a composed repertoire and an increase in musical expression during performance.

Keywords
Pedagogy, musical controllers, Max/MSP, etudes, composition, repertoire, musical expression

1. INTRODUCTION
The inspiration for developing a series of concert-etudes for interactive musical instruments grew from my experiences creating and performing music with a P5 glove (see Figure 1). Like most composers working in this field, I was not only designing the music, but also learning how to perform on this new instrument. Predictably, I found myself limited by my lack of technical skill. I observe this to be a common problem among composers and instrument designers in this field, with performances featuring interactive electronics often sounding more like demonstrations or experiments than musical performances.
As a musician with numerous years of training, I was not surprised that I needed to put in significant time to become proficient on this instrument. However, learning an instrument requires not only time but also a method. Currently, there are no existing methods for learning how to play a P5 glove, or any other interactive electronic instrument. The creation of such a method will, I believe, help to guide both composers and instrument builders in the development of a composed repertoire for interactive instruments, and lead to an increase in the expressive capabilities of both the performers and the instruments they use.

NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

Figure 1. The P5 glove

2. THE ETUDES
2.1 Providing a Musical Context
Since the eighteenth century, it has been common practice for composers and performers to write etudes for the development of technique on virtually every established instrument. All instrumentalists who have achieved some level of virtuosity on their instruments have done so through diligent practice of technical exercises such as scales, arpeggios, tone practice, and composed etudes.
Wanderley and Orio [6] describe another important purpose of etudes: evaluation of different instruments. They describe a method used to compare interactive music systems. This method uses short, repetitive “musical tasks.” With traditional musical instruments, they explain, “this task is facilitated thanks to the vast music literature available. This is not the case [for] interactive music instruments that have a limited, or even nonexistent, literature.”
Etudes fulfill an important role in learning an instrument by providing an ingredient that short repetitive exercises cannot: a musical context for the techniques they are teaching [3]. As a composer, I propose that instead of compensating for the lack of repertoire, we start composing a literature for interactive electronic music instruments.

2.2 Virtuosity
Historically, one important role of the etude has been to build virtuosity. For the purposes of this paper, I am using the
definition of virtuosity put forward by Dobrian and Koppelman [1]: “the ability to call upon all the capabilities of an instrument at will with relative ease.” As the authors point out, when working with computers it does not make sense to judge virtuosity only by the factor of speed, because computers can unquestionably play faster than humans.
When a performer has achieved virtuosity on an instrument, many levels of control and technique have become subconscious, and “when control of the instrument has been mastered to the point where it is mostly subconscious, the mind has more freedom to concentrate consciously on listening and expression.” [1]
Virtuosic performers are highly valuable to composers and instrument designers. Without virtuosic performers, and instruments capable of adequate expression, composers cannot hear their music fully realized. In many cases, instrument designers and programmers have to rely on their own, often limited, performing skills when first testing a new piece or instrument.
Etudes help to develop virtuosity, and therefore play a crucial role in further developing a repertoire for an instrument. Without etudes, players of acoustic instruments would not be able to handle the technique needed to perform musical works, and composers would not have performers to play the music they imagine. As the New Grove Dictionary puts it, “the true virtuoso has always been prized not only for his rarity but also for his ability to widen the technical and expressive boundaries of his art.” [4]

3. STARTING AT THE BEGINNING
3.1 Ordering the Series
My initial series of etudes includes ten graduated studies that introduce the basic skills needed to manipulate different elements of musical sound. This series is designed for a beginner or novice performer on interactive instruments. The etudes are designed to create a non-intimidating experience for a musician with little or no previous experience with electronic music.
In choosing which musical elements and types of controls to include in the etudes, and in which order they will appear, I have also created a priority list. Undoubtedly, my etudes focus on the skills and musical elements most likely to be needed for my own compositions. However, I have tried to make the etudes stylistically diverse. By the end of the series the performer will have experience with triggers, toggles, and more fluid or continuous parameters.

3.2 The Etudes
Each etude contains four elements: 1 – a basic description of the purpose and intent of the etude, including a simulation performance of the etude; 2 – a graphically notated score; 3 – the Max/MSP etude patch; and 4 – a Max/MSP patch that will be used to connect the interactive instrument to the etude patch.
Etude 1 introduces the performer to different approaches to rhythm and synchronization. At times rhythmic freedom is encouraged, and at times strict rhythm is required. Etude 2 focuses on pitch control, while Etude 3 focuses on dynamic, or volume, control. Etude 4 combines the elements of rhythm, pitch, and volume control. Etude 5 focuses on spatialization and localization, and Etude 6 on timbre and envelope manipulation. Etude 7 combines the elements used in the first six etudes. Etude 8 introduces different methods of synthesis (for example, granular and FM), and Etude 9 is a study in changing tempos. Etude 10, the final etude in the series, brings together all the skills learned in the earlier etudes.
Each of these introductory etudes is notated along a timeline that the performer must follow, using a clock that has been placed in the etude patch (see Figure 2). The main goal is for the performer to become fluent enough on the instrument in these basic control parameters to be ready when further complexity is added.

Figure 2. Example of Notation

Complexity is increased gradually throughout the series. It is understood that the level of complexity might depend on the characteristics of each interactive instrument. The main method of adding complexity is to increase the number of different control elements (for example, the number of triggers or different layers of sounds to be controlled) or the speed at which these elements need to be controlled. The first three etudes use only one dimension, layer, or direction of moveable data (a continuous flow of values between 0 and 127). Etudes 5 and 6 will involve two such layers. For example, one stream of data could control volume, and the other spatialization. With some instruments or mappings, the gestures that control this data may be completely separate (such as with a keyboard, or different pedals), and with others they may be more connected (such as with a Wii, glove, or mouse). The final etudes will be the most complex, including many control parameters and requiring more intricate synchronization.
However, it is important to keep in mind that for now this is a series of beginner etudes, designed to prepare a beginning performer for future compositions that may require a much higher level of complexity and technique.

4. COMPATIBILITY
4.1 A Universal Interface
One of the most important features of these etudes is their adaptability to many different controllers. Each etude is designed so that it is playable by any device that can produce the required types of data. The interface for each etude lists the data needed and provides the necessary links into the etude. For example, Etude 1 requires an instrument that can produce eight separate triggers for sample playback (see Figure 3).
Different mappings and interpretations can easily be tried with each etude. This flexibility will allow performers to practice different movements for different musical parameters, helping them to assess which of the movements will work best. Performers can gain a deeper understanding of the particular strengths and weaknesses of their instrument.
The etudes do not require specific movements, so the performer can choreograph all the gestures. For example, depending on the instrument being used, different actions can activate each trigger; different parameters (position, amplitude, pitch) can produce the same types of continuous numbers – yet the
resulting sounds will always be the same. Similar gestures, listening skills, and types of coordination are used by a large number of interactive instruments. Therefore, the skills a performer develops while learning this series of etudes on one controller will very likely be transferable to other controllers.

Figure 3. Etude 1 Interface

4.2 The Etude Patches
Each etude will have two Max/MSP components. The primary component is the etude patch (see Figures 3 and 5). This patch contains all the programming needed for each etude, and should not be edited. Each etude patch includes an On/Off switch, reset button, simulation button, and clock. The patch shows all the information needed for performing the specific etude.
Each etude will also come with an optional User Interface (see Figures 4 and 6). This interface will include all the “send objects” needed to communicate with the etude patch, as well as information about the type of data that the etude patch is programmed to receive. Performers will need to edit this patch, or create a new patch, that sends the necessary information from their interactive instrument into the etude patch.

4.3 A Shared Repertoire
Having a notated repertoire that can be performed by different musicians, as well as on different instruments, is important to the development of any musical genre. Currently there is no such repertoire for interactive electronic instruments, and consequently no way to make musical comparisons between performers or instruments.
There is also extensive historical precedent for sharing repertoire across instruments, especially when the repertoire for one instrument is lacking. For example, several sonatas in the violin canon (Franck, Mozart, and Prokofiev) are commonly also played on the flute, and the Bach Suites for solo cello are performed on many instruments, including trombone and marimba. The various strengths and weaknesses of each instrument become quickly apparent when repertoire is shared. Also, many composers, notably John Cage, have written pieces for open instrumentation. Performances of these works can vary widely depending on the instruments chosen. Traditional etudes are also typically practised using a variety of approaches that challenge players in different ways (for example, with different articulations or dynamic levels).

Figure 4. Etude 1 User Interface

Figure 5. Etude 2 Interface

4.4 Point of Reference
One significant role these etudes will fill is providing a reliable point of reference when making comparisons between performers, performances, different instruments (level of subtlety and expressiveness achievable; ease of learning; performer reactions), and different mappings. Each etude will also focus on different musical or control elements, allowing a user to quickly determine the controller’s effectiveness and ability in each aspect of music.
The etudes may also be a good test of which type of controller might be best suited for a certain piece of music. This could be especially useful while the piece is still being composed. A more skilled performer could easily learn these basic etudes on several different controllers and quickly evaluate their effectiveness on many musical levels. As Wanderley and Orio state, “Musical tasks are already part of the evaluation process of acoustic musical instruments, because musicians and composers seldom choose an instrument without extensive testing to how specific musical gestures can be performed.” [6]

Figure 6. Etude 2 User Interface

5. CONCLUSIONS
My primary goals in writing these etudes are to:
1. Create a learning environment in which beginners can experience a non-intimidating introduction to interactive performance.
2. Encourage other composers and performers to create their own etudes and pieces that can be exchanged to broaden the level of shared knowledge, and help to define the skills needed for performing on interactive electronic instruments.

Interactive electronic music is an emerging field that has yet to solidly establish a repertoire or performance practice. I believe one of the most important steps in developing both of these fundamental parts of a musical genre is to create a method for learning performance technique. In the near future I hope to see strong performances of well-written pieces replacing the demonstrations and experiments that currently occupy many concert spots. For this to occur, I believe composers, instrument designers and performers must work together.
These etudes can strengthen such collaborations by providing a foundation for evaluation of both the instrument and the performer. This basis for evaluation is an essential ingredient in building a lasting repertoire for interactive instruments.

6. ACKNOWLEDGMENTS
I am very grateful to Dr. Bob Pritchard and Dr. Keith Hamel for their support of this project, programming help, and generous feedback on my work. Thank you also to my husband Michael Begg for his invaluable editing skills. This project is supported in part by the Social Sciences and Humanities Research Council of Canada, grant 848-2003-0147, and by the University of British Columbia Media and Graphics Interdisciplinary Centre (MAGIC), the UBC Institute for Computing, Information and Cognitive Science (ICICS), and the School of Music.

7. REFERENCES
[1] Dobrian, C., and Koppelman, D. “The ‘E’ in NIME: Musical Expression with New Computer Interfaces”. Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME06), Paris, France, 2006.
[2] Fels, S., Gadd, A., and Mulder, A. “Mapping transparency through metaphor: towards more expressive musical instruments”. Organised Sound 7:2, 109-126. Cambridge University Press, 2002.
[3] Ferguson, H., and Hamilton, K. L. “Study”. Grove Music Online. L. Macy, ed. http://www.grovemusic.com
[4] Jander, O. “Virtuoso”. Grove Music Online. L. Macy, ed. http://www.grovemusic.com
[5] Iazzetta, F. “Meaning in Musical Gesture”. Trends in Gestural Control of Music, M. M. Wanderley and M. Battier, eds. Paris, France: IRCAM - Centre Georges Pompidou, 2000.
[6] Wanderley, M. M., and Orio, N. “Evaluation of Input Devices for Musical Expression: Borrowing Tools from HCI”. Computer Music Journal 26:3, 62-76. MIT Press,
3. Create a tool that will guide performers and instrument 2002.
builders towards higher levels of control and musical
expression.
80
Discourse analysis evaluation method for expressive musical interfaces
Dan Stowell, Mark D. Plumbley, Nick Bryan-Kinns

Centre for Digital Music
Queen Mary, University of London
London, UK
dan.stowell@elec.qmul.ac.uk
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).
ABSTRACT
The expressive and creative affordances of an interface are difficult to evaluate, particularly with quantitative methods. However, rigorous qualitative methods do exist and can be used to investigate such topics. We present a methodology based around user studies involving Discourse Analysis of speech. We also present an example of the methodology in use: we evaluate a musical interface which utilises vocal timbre, with a user group of beatboxers.
Keywords
Evaluation, qualitative methods, discourse analysis, voice, timbre, beatboxing
1. INTRODUCTION
One of the motives for founding the NIME conference was to foster dialogue on the evaluation of musical interfaces [11]. Yet a scan of NIME conference proceedings finds only a few papers devoted to the development or application of rigorous evaluation methods. Many published papers do not include evaluation, or include only informal evaluation (e.g. quotes from, or general summaries of, user interviews). This may of course be fine, depending on the paper’s purpose and context, and the stage of development of the research. But the further development of well-founded evaluation methods can only be of benefit to the field.
In a very useful discussion, Wanderley and Orio [18] look to the wider field of Human-Computer Interaction (HCI) for applicable methodologies, and suggest specific approaches for evaluating musical interfaces. Much of HCI focuses on interfaces which can be evaluated using goal-based tasks, where measurements can be made of (for example) how long a task takes, or how often users fail to achieve the goal. Wanderley and Orio’s framework follows this route, recommending that experimenters evaluate users’ precision in reproducing musical units such as glissandi or arpeggios. Later work uses Wanderley and Orio’s framework [9, 10].
Precision is important for accurate reproduction. But for composers, sound designers, and performers of expressive or improvised music, it is not enough: interfaces should (among other things) be in some sense intuitive and offer sufficient freedom of expression [11, 8]. “Control = expression” [4].
Using precision-of-reproduction as a basis for evaluation also becomes problematic for musical systems which are not purely deterministic. “Randomness” would seem to be the antithesis of precision, and therefore undesirable according to some perspectives, yet there are many musical systems in which stochastic or chaotic elements are deliberately introduced.
The question arises of how to evaluate interfaces more broadly than precision-of-reproduction. It is difficult to design an experiment that can reliably and validly measure qualities such as expressiveness and aesthetics.
Poepel [10] operationalises “expressivity” into a number of categories for stringed-instrument playing, and investigates these numerically using tasks followed by Likert-scale questionnaires. This limits users’ responses to predefined categories, although a well-designed questionnaire can yield useful results. Unfortunately Poepel analyses the data using mean and ANOVA, which are inappropriate for Likert-scale (ordinal) data [6]. The questionnaire approach also largely reduces “expressivity” to “precision”, since in this case the tasks presented concern the reproduction of musical units such as vibrato and dynamic changes.
Paine et al. [9] use a qualitative analysis of semi-structured interviews with musicians to derive “concept maps” of factors involved in expressive performance (for specific instruments). These are not used for evaluation, but rather to guide design. In the evaluation of their instrument, the authors turn to a quantitative approach, analysing how closely users can match the control data used to generate audio examples.
We propose that qualitative approaches may prove to be useful tools for the evaluation of musical interfaces. This paper aims to be a contribution in that area, applying a rigorous qualitative method to study the use and affordances of a new musical interface.
1.1 Discourse Analysis
Interviews and free-text comments are sometimes reported in studies on musical interfaces. However, they are often conducted in a relatively informal context, and only quotes or summaries are reported rather than any structured analysis, which provides little analytic reliability. Good qualitative methods penetrate deeper than simple summaries, offering insight into text data [1]. Discourse Analysis (DA) is one such approach, developed and used in disciplines such as linguistics, psychology, and social sciences [14, chapter 6].
Essentially, DA’s strength comes from using a structured method which can take apart the language used in discourses (e.g. interviews, written works) and elucidate the connections and implications contained within, while remaining faithful to the content of the original text [1]. DA is designed to go beyond the specific sequence of phrases used in a conversation, and produce a structured analysis of the conversational resources used, the relations between entities, and the “work” that the discourse is doing.
Uszkoreit [17] summarises the aim of DA very compactly:
The problems addressed in discourse research
aim to answer two general kinds of questions:
(1) what information is contained in extended
sequences of utterances that goes beyond the
meaning of the individual utterances themselves?
(2) how does the context in which an utterance
is used affect the meaning of the individual ut-
terances, or parts of them?
We should point out that DA is not usually regarded as one
single method – rather, it’s an approach to analysing texts.
Someone looking for the single recipe to perform a DA of
a text will be disappointed. However, specific DA methods
do exist in the literature. Our DA method is elaborated in
section 3.3.
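The itemisation step of the DA method (step (c) in section 3.3) is carried out by hand, but its bookkeeping can be pictured as a simple data structure. The Python sketch below is a hypothetical illustration, not the authors’ tooling: the `Mention` and `Itemisation` names, their fields, and the sample entries are invented for this example. Each mention records an object in the participant’s own terms, a rewritten description of that instance, and whether the object acts with agency; the helpers then list the actors and the most commonly-occurring objects, which are the starting point for reconstructing the described world.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Mention:
    """One instance of an object in a transcript (hypothetical schema)."""
    obj: str          # the entity, named in the participant's own terms
    description: str  # the phrase rewritten around this instance
    is_actor: bool    # True if the object acts with agency in this instance


@dataclass
class Itemisation:
    mentions: List[Mention] = field(default_factory=list)

    def add(self, obj: str, description: str, is_actor: bool = False) -> None:
        self.mentions.append(Mention(obj, description, is_actor))

    def most_common(self, n: int) -> List[Tuple[str, int]]:
        """The n most frequently mentioned objects, with their counts."""
        return Counter(m.obj for m in self.mentions).most_common(n)

    def actors(self) -> List[str]:
        """Objects acting with agency in at least one instance (need not be human)."""
        return sorted({m.obj for m in self.mentions if m.is_actor})

    def descriptions_of(self, obj: str) -> List[str]:
        """All per-instance descriptions recorded for one object."""
        return [m.description for m in self.mentions if m.obj == obj]


# Invented entries for illustration only; not taken from the study's DA tables.
items = Itemisation()
items.add("the system", "makes noises in response to noises made into the microphone", is_actor=True)
items.add("the microphone", "picks up the beatboxing")
items.add("the system", "is more controllable in Q", is_actor=True)
items.add("the examples", "were made by the original person")
items.add("the original person", "made the examples", is_actor=True)

print(items.most_common(1))  # [('the system', 2)]
print(items.actors())        # ['the original person', 'the system']
```

In practice the descriptions would be written by the analyst while reading the transcript; the structure only keeps the counts and actor lists consistent.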
In this paper we use DA to analyse interview data, in
the context of a project to develop voice-based interfaces
for controlling musical systems. First we give an overview
of the interface we wish to evaluate.
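The remapping described in the overview that follows ends with two steps that are easy to sketch: warping each timbre dimension so that both spaces share a similar distribution, and a k-d tree nearest-neighbour lookup from a warped voice point to stored synthesiser settings. The Python sketch below assumes the features have already been extracted and reduced by PCA (those steps, including discarding the pitch-dominated component, are omitted). Its warp, which maps each dimension’s minimum, mean, and maximum to -1, 0, and +1 with two linear segments, is only one plausible reading of the piecewise linear warping the authors describe. All names and the toy data are hypothetical, and the string payloads stand in for synthesiser parameter settings.

```python
import math
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Warp:
    """Hypothetical piecewise linear warp: min -> -1, mean -> 0, max -> +1."""
    lo: float
    mean: float
    hi: float

    @classmethod
    def fit(cls, values: Sequence[float]) -> "Warp":
        return cls(min(values), sum(values) / len(values), max(values))

    def __call__(self, v: float) -> float:
        # Two linear segments meeting at the mean; values outside the
        # training range are extrapolated linearly.
        span = (self.mean - self.lo) if v <= self.mean else (self.hi - self.mean)
        return 0.0 if span == 0 else (v - self.mean) / span


def warp_point(warps: List[Warp], feats: Sequence[float]) -> Point:
    return tuple(w(x) for w, x in zip(warps, feats))


def fit_space(samples):
    """samples: list of (feature_vector, payload). Returns (warps, warped items)."""
    dims = len(samples[0][0])
    warps = [Warp.fit([feats[i] for feats, _ in samples]) for i in range(dims)]
    return warps, [(warp_point(warps, feats), payload) for feats, payload in samples]


@dataclass
class Node:
    point: Point
    payload: Any
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_kdtree(items, depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting at the median along alternating axes."""
    if not items:
        return None
    axis = depth % len(items[0][0])
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, payload = items[mid]
    return Node(point, payload, axis,
                build_kdtree(items[:mid], depth + 1),
                build_kdtree(items[mid + 1:], depth + 1))


def nearest(node: Optional[Node], target: Point, best=None):
    """Return (distance, node) for the stored point closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Search the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


# Toy data (invented): 2-D features per synth setting and per voice frame.
synth_samples = [((0.0, 0.0), "A"), ((4.0, 0.0), "B"), ((0.0, 4.0), "C"),
                 ((4.0, 4.0), "D"), ((2.0, 2.0), "E")]
voice_samples = [((100.0, 10.0), None), ((300.0, 10.0), None),
                 ((100.0, 30.0), None), ((300.0, 30.0), None),
                 ((200.0, 20.0), None)]

_, synth_items = fit_space(synth_samples)
tree = build_kdtree(synth_items)
voice_warps, _ = fit_space(voice_samples)


def remap(voice_feats: Sequence[float]) -> Any:
    """Return the stored synth setting nearest to a voice frame in warped space."""
    return nearest(tree, warp_point(voice_warps, voice_feats))[1].payload


print(remap((300.0, 30.0)))  # prints D
```

Because the lookup only compares points in the warped target space, no invertible model of the synthesiser’s controls is needed. A library k-d tree such as SciPy’s `cKDTree` would normally replace the hand-written one.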
2. VOICE TIMBRE REMAPPING
With recent improvements in timbre analysis and in computer power, the potential arises to analyse the timbre of a signal in real time, and to use this analysis as a controller for synthesis or for other processes – in particular, the potential to “translate” the timbral variation of one source into the timbral variation of another source. This is the process which we refer to as timbre remapping [16]. De Poli and Prandoni [3] made an early attempt at such control, more recently investigated by Puckette [13].
One of the main issues is the construction of a useful timbre space for the purpose of timbre remapping. Timbre is often very loosely defined, and often taken to refer to all aspects of a sound beyond its pitch and loudness [7]. There are many options as to which acoustic features to derive, and how to transform them, so as to provide a continuous space that provides usable control to the performer. Some features exhibit interactions with pitch, and the variation of some features may depend strongly upon the type of source.
In the present work we derive a heterogeneous set of timbral features, mostly spectral but some time-domain. We then apply a Principal Components Analysis (PCA) to decorrelate the features and reduce dimensionality. Finally we apply a piecewise linear warping (using the range, mean, and standard deviation statistics) to shape the distribution of data points; we will come back to the reasons for this shortly. The construction of the timbre space is summarised in figure 1.
Figure 1: Constructing a timbre space for timbre remapping
Thus far we have a procedure for creating a timbre space based on any input signal. We might want to analyse two different classes of signal in this way, and then map the timbral trajectory of one system onto another: for example, use the timbral trajectory of a voice to control the settings of a synthesiser, and produce the corresponding timbral trajectory. To do this, we take a point in the voice’s timbre space, and find its nearest neighbour in the synthesiser’s timbre space. If we can retrieve the synthesiser parameters which created this timbre, we can send those parameters to the synthesiser, thus “remapping” from the vocal timbre to the synthesiser timbre. This approach has the advantage of being independent of the exact relation of the target system’s control space to its timbre space: it works even if the target system’s controls have a highly nonlinear and obscure relation to the timbres produced.
Such a mapping from one timbre space to another depends on being able to find a suitable “nearest neighbour” in the target space. This is facilitated if the spaces are covered by a similar distribution of data points, ensuring that the resolution of a timbral trajectory can be adequately reflected in the target timbre space. This is why we perform a warping during the construction of the timbre space: it ensures that the timbre dimensions are covered in a certain way (guaranteeing various aspects of the distribution, such as that it is centred and its standard deviation lies within a certain range).
One aspect of our timbre remapping system is that we typically wish to remove pitch-dependencies from the timbral data. Many acoustic measures such as MFCCs or spectral percentile statistics can exhibit interactions with pitch. Our current approach to mitigating this is to include a pitch analysis as one of the features passed to the PCA and therefore used in constructing the space. We then identify the PCA component with the largest contribution from pitch, and discard that, on the assumption that it is essentially composed of “pitch plus the pitch-dependent components of other features”. This approach makes simplifying assumptions such as the linearity of pitch and timbre dimensions and their interaction, but it leads to usable results in our experience.
2.1 Real-time operation
We wish to develop a timbre remapping system that can operate efficiently in real time, so the relative speed and efficiency of the processes used is paramount. In fact this is the strongest motivation behind using PCA for the decorrelation, dimension reduction, and pitch-removal. PCA is a straightforward process and computationally very simple to apply. More sophisticated methods, including non-linear methods, exist, and may be capable of improved results (such as better pitch-removal), but imply a significant cost in terms of the processing power required.
Efficiency is also important in the process which retrieves a nearest-neighbour data point from the target system’s timbre space. We use a k-d tree data structure [12, chapter
2] for fast multidimensional search.
3. METHOD
In evaluating a musical interface such as the above, we wish to develop a qualitative method which can explore issues such as expressivity and affordances for users. Longitudinal studies may be useful, but imply a high cost in time and resources. Therefore our design aims to provide users with a brief but useful period of exploration of a new musical interface, including interviews and discussion which we can then analyse.
In any evaluation of a musical interface one must decide the context of the evaluation. Is the interface being evaluated as a successor or alternative to some other interface (e.g. an electric cello vs an acoustic cello)? Who is expected to use the interface (e.g. virtuosi, amateurs, children)? Such factors will affect not only the recruitment of participants but also some aspects of the experimental setup.
Our method is designed either to trial a single interface with no explicit comparison system, or to compare two similar systems (as is done below in our case study). The method consists of two types of user session (solo sessions followed by group session(s)), plus the Discourse Analysis of the data collected.
3.1 Solo sessions
In order to explore individuals’ personal responses to the interface(s), we first perform solo sessions in which a participant is invited to try out the interface(s) for the first time. If there is more than one interface to be used, the order of presentation is randomised in each session.
The solo session consists of three phases for each interface:
Free exploration: The participant is encouraged to try out the interface for a while and explore it in their own way.
Guided exploration: The participant is presented with audio examples of recordings created using the interface, and encouraged to create recordings inspired by those examples. This is not a precision-of-reproduction task; precision-of-reproduction is explicitly not evaluated, and participants are told that they need not replicate the examples.
Semi-structured interview: The interview’s main aim is to encourage the participant to discuss their experiences of using the interface in the free and guided exploration phases, both in relation to prior experience and to the other interfaces presented if applicable. Both the free and guided phases are video recorded, and the interviewer may play back segments of the recording and ask the participant about them, in order to stimulate discussion.
The raw data to be analysed is the interview transcript. Our aim is for the participant to construct their own descriptions and categories, which means it is very important that the interviewer is experienced in neutral interview technique, and can avoid (as far as possible) introducing labels and concepts that do not come from the participant’s own language patterns.
3.2 Group session
To complement the solo sessions we also conduct a group session: peer group discussion can produce more and different discussion around a topic, and can demonstrate the group negotiation of categories, labels, comparisons, etc. The focus-group tradition provides a well-studied approach to such group discussion [15]. Our group session has a lot in common with a typical focus group in terms of the facilitation and semi-structured group discussion format. In addition we make available the interface(s) under consideration and encourage the participants to experiment with them during the session.
As in the solo sessions, the transcribed conversation is the data to be analysed, which means that a neutral facilitation technique is important – to encourage all participants to speak, to allow opposing points of view to emerge in a non-threatening environment, and to allow the group to negotiate the use of language with minimal interference.
3.3 Data analysis
Our DA approach to analysing the data is based on that of [2, p. 95–102], adapted to our study context. The DA of text is a relatively intensive and time-consuming method. It can be automated to some extent, but not completely, because of the close linguistic attention required. Our approach consists of the following five steps:
(a) Transcription
The speech data is transcribed, using a standard style of notation which includes all speech events (including repetitions, speech fragments, pauses). This is to ensure that the analysis can remain close to what is actually said, and avoid adding a gloss which can add some distortion to the data. For purposes of analytical transparency, the transcripts (suitably anonymised) should be published alongside the analysis results.
(b) Free association
Having transcribed the speech data, the analyst reads it through and notes down surface impressions and free associations. These can later be compared against the output from the later stages.
(c) Itemisation of transcribed data
The transcript is then broken down by itemising every single object in the discourse (i.e. all the entities referred to). Pronouns such as “it” or “he” are resolved, using the participant’s own terminology as far as possible, and for every object an accompanying description is extracted, of the object as it is in that instance – again using the participant’s own language, essentially by rewriting the sentence/phrase in which the instance is found.
The list of objects is scanned to determine if different ways of speaking can be identified at this point. Also, those objects which are also “actors” (or “subjects”) are identified – i.e. those which act with agency in the speech instance; they need not be human.
It is helpful at this point to identify the most commonly-occurring objects and actors in the discourse.
(d) Reconstruction of the described world
Starting with the list of most commonly-occurring objects and actors in the discourse, the analyst reconstructs the depictions of the world that they produce. This could for example be achieved using concept maps to depict the interrelations between the actors and objects. If different ways of speaking have been identified, there will typically be one reconstructed “world” per way of speaking. Overlaps and contrasts between these worlds can be identified.
The “worlds” we produce are very strongly tied to the participant’s own discourse. The actors, objects, descriptions, relationships, and relative importances are all derived from a close reading of the text. These worlds are
essentially just a methodically reorganised version of the participant’s own language.
In our particular context, we may be interested in the user’s conceptualisation of musical interfaces. It is particularly interesting to look at how these are situated in the described world, and particularly important to avoid preconceptions about how users may describe an interface: for example, a given interface could be: an instrument; an extension of a computer; two or more separate items (e.g. a box and a screen); an extension of the individual self; or it could be absent from the discourse.
(e) Examining context
The relevant context of the discourse typically depends on the field of study, for example whether it is political or psychological. Here we have created an explicit context of other participants. After running the previous steps of DA on each individual transcript, we compare and contrast the described worlds produced from each transcript, first comparing those in the same experimental condition (i.e. same order of presentation, if relevant), then across all participants. We also compare the DA of the focus group session(s) against that of the solo sessions.
4. THE METHOD IN ACTION: EVALUATING VOICE TIMBRE REMAPPING
In our study we wished to evaluate the timbre remapping system with beatboxers (vocal percussion musicians), for two reasons: they are one target audience for the technology in development; and they have a familiarity and level of comfort with manipulation of vocal timbre that should facilitate the study sessions.
We recruited by advertising online (a beatboxing website) and around London for amateur or professional beatboxers. Participants were paid £10 per session plus travel expenses to attend sessions in our (acoustically-isolated) studio. We recruited five participants from the small community, all male and aged 18–21. One took part in a solo session; one in the group session; and three took part in both. Their beatboxing experience ranged from a few months to four years. Their use of technology for music ranged from minimal to a keen use of recording and effects technology (e.g. Cubase).
In our study we wished to investigate any effect of providing the timbre remapping feature. To this end we presented two similar interfaces: both tracked the pitch and volume of the microphone input, and used these to control a synthesiser, but one also used the timbre remapping procedure to control the synthesiser’s timbral settings. The synthesiser used was an emulated General Instrument AY-3-8910 [5], which was selected because of its wide timbral range (from pure tone to pure noise) with a well-defined control space of a few integer-valued variables. We used the method as described in section 3. Analysis of the interview transcripts took approximately 10 hours per participant (around 2000 words each).
We do not report a detailed analysis of the group session transcript here: the group session generated information which is useful in the development of our system, but little which bears directly upon the presence or absence of timbral control. We discuss this outcome further in section 5.
In the following, we describe the main findings from analysis of the solo sessions, taking each user one by one before drawing comparisons and contrasts. We emphasise that although the discussion here is a narrative supported by quotes, it reflects the structures elucidated by the DA process – the full transcripts and Discourse Analysis tables are available online¹. In the study, condition “Q” was used to refer to the system with timbre remapping active, “X” for the system with timbre remapping inactive.
¹ http://www.elec.qmul.ac.uk/digitalmusic/papers/2008/Stowell08nime-data/
4.1 Reconstruction of the described world
User 1
User 1 expressed positive sentiments about both Q and X, but preferred Q in terms of sound quality, ease of use and being “more controllable”. In both cases the system was construed as a reactive system, making noises in response to noises made into the microphone; there was no conceptual difference between Q and X – for example in terms of affordances or relation to other objects.
The “guided exploration” tasks were treated as reproduction tasks. User 1 described the task as difficult for X, and easier for Q, and situated this as being due to a difference in “randomness” (of X) vs. “controllable” (of Q).
User 2
User 2 found that the system (in both modes) “didn’t sound very pleasing to the ear”. His discussion conveyed a pervasive structured approach to the guided exploration tasks, in trying to infer what “the original person” had done to create the examples and to reproduce that. In both Q and X the approach and experience were the same.
Again, User 2 expressed preference for Q over X, both in terms of sound quality and in terms of control. Q was described as more fun and “slightly more funky”. Interestingly, the issues that might bear upon such preferences are arranged differently: issues of unpredictability were raised for Q (but not X), and the guided exploration task for Q was felt to be more difficult, in part because it was harder to infer what “the original person” had done to create the examples.
User 3
User 3’s discourse placed the system in a different context compared to the others. It was construed as an “effect plugin” rather than a reactive system, which implies different affordances: for example, as with audio effects it could be applied to a recorded sound, not just used in real time; and the description of what produced the audio examples is cast in terms of an original sound recording rather than some other person. This user had the most computer music experience of the group, using recording software and effects plugins more than the others, which may explain this difference in contextualisation.
User 3 found no difference in sound or sound quality between Q and X, but found the guided exploration of X more difficult, which he attributed to the input sounds being more varied.
User 4
User 4 situated the interface as a reactive system, similar to Users 1 and 2. However, the sounds produced seemed to be segregated into two streams rather than a single sound – a “synth machine” which follows the user’s humming, plus “voice-activated sound effects”. No other user made such a separation in their discourse.
“Randomness” was an issue for User 4 as it was for some others. Both Q and X exhibited randomness, although X was much more random. The greater randomness of X meant that User 4 found Q easier to control. The pitch-following sound was
felt to be accurate in both cases; the other (sound effects / percussive) stream was the source of the randomness.
In terms of the output sound, User 4 suggested some small differences and found it difficult to pin down any particular one, but felt that Q sounded better.
4.2 Examining context
Effect of order-of-presentation
Users 1 and 2 were presented with the conditions in the order XQ; Users 3 and 4 in the order QX. Order-of-presentation may have some small influence on the outcomes: Users 3 and 4 identified little or no difference in the output sound between the conditions (User 4 preferred Q but found the difference relatively subtle), while Users 1 and 2 felt more strongly that they were different and preferred the sound of Q. It would require a larger study to be confident that this difference really was caused by order-of-presentation.
In our study we are not directly concerned with which condition sounds better (both use the same synthesiser in the same basic configuration), but this is an interesting aspect to come from the study. We might speculate that differences in perceived sound quality are caused by the different way the timbral changes of the synthesiser are used. However, participants made no conscious connection between sound quality and issues such as controllability or randomness.
Considerations across all participants
Taking the four participant interviews together, no strong systematic differences between Q and X are seen. All participants situate Q and X similarly, albeit with some nuanced differences between the two. Activating/deactivating the timbre remapping facet of the system does not make a strong enough difference to force a reinterpretation of the system.
A notable aspect of the four participants’ analyses is the differing ways the system is situated (both Q and X). As designers of the system we may have one view of what the system “is”, perhaps strongly connected with technical aspects of its implementation, but the analyses presented here illustrate the interesting way that users situate a new technology alongside existing technologies and processes. The four participants situated the interface in differing ways: either as an audio effects plugin, or a reactive system; as a single output stream or as two. We emphasise that none of these is the “correct” way to conceptualise the interface. These different approaches highlight different facets of the interface and its affordances.
During the analyses we noted that all participants maintained a conceptual distance between themselves and the system, and analogously between their voice and the output sound. There was very little use of the “cyborg” discourse in which the user and system are treated as a single unit, a discourse which hints at mastery or “unconscious competence”. This fact is certainly understandable given that the participants each had less than an hour’s experience with the interface. It demonstrates that even for beatboxers with strong experience in manipulation of vocal timbre, controlling the vocal interface requires learning – an observation confirmed by the participant interviews.
The issue of “randomness” arose quite commonly among the participants. However, randomness emerges as a nuanced phenomenon: although two of the participants described X as being more random than Q, and placed randomness in opposition to controllability (as well as preference), User 2 was happy to describe Q as being more random and also more controllable (and preferable).
A uniform outcome from all participants was the conscious interpretation of the guided exploration tasks as precision-of-reproduction tasks. This was evident during the study sessions as well as from the discourse around the tasks. As one participant put it, “If you’re not going to replicate the examples, what are you gonna do?”
A notable absence from the discourses, given our research context, was discussion which might bear on expressivity, for example the expressive range of the interfaces. Towards the end of each interview we asked explicitly whether either of the interfaces was more expressive, and responses were generally non-committal. We propose that this was because our tasks had failed to engage the participants in creative or expressive activities: the (understandable) reduction of the guided exploration task to a precision-of-reproduction task must have contributed to this. We also noticed that our study design failed to encourage much iterative use of record-and-playback to develop ideas. In section 5 we suggest some possible future directions to address these issues.
5. DISCUSSION
The analysis of the solo sessions provides useful information on the user experience of a voice-controlled music system and the integration of timbre remapping into such a system. Here we wish to focus on methodological issues arising from the study.
Above we raised the issue that our “guided exploration” task, in which participants were asked to record a sound sample on the basis of an audio example, was interpreted as a precision-of-reproduction task. Possibilities to avoid this in future may include: using audio examples which are clearly not originally produced using the interface (e.g. string sections, pop songs), or even non-audio prompts such as pictures; or forcing a creative element by providing two examples and asking participants to create a new recording which combines elements of both.
Other approaches which encourage creative work with an interface could involve tasks in which participants are asked to create compositions, or iteratively develop live performance. We would expect that the use of more creative tasks should produce more participant discussion of creative/expressive aspects of an interface.
Such tasks could also be used to provide more structure during the group sessions: one reason the group session produced less relevant data than the solo sessions is (we believe) the lack of activities, which could have provided a more structured exploration of the interfaces.
6. CONCLUSIONS
We have applied a detailed qualitative analysis to user studies involving a voice-driven musical interface with and without the use of timbre remapping. It has raised some interesting issues in the development of the interface, including the unproblematic integration of the timbral aspect, and the nuanced interaction of issues such as control and randomness.
However, the primary aim of this paper has been to investigate the use of Discourse Analysis to provide a robust qualitative approach to evaluating the affordances and user experience of a musical interface. Results from our DA-based user study indicate that with some modification of the user tasks, the method can derive detailed information about how musicians interact with a new musical interface and accommodate it in their existing conceptual repertoire.
We have presented one specific method for evaluating a musical interface, but of course there may be other appropriate methods. As discussed in the introduction, the state
of the art in evaluating musical interfaces is relatively un- [18] M. M. Wanderley and N. Orio. Evaluation of input
derdeveloped, and we would hope to encourage others to devices for musical expression: Borrowing tools from
explore reliable methods for evaluating new musical inter- HCI. Computer Music Journal, 26(3):62–76, 2002.
faces in authentic contexts.
7. REFERENCES
[1] C. Antaki, M. Billig, D. Edwards, and J. Potter.
Discourse analysis means doing analysis: A critique of
six analytic shortcomings. Discourse Analysis Online,
2004.
[2] P. Banister, E. Burman, I. Parker, M. Taylor, and
C. Tindall. Qualitative Methods in Psychology: A
Research Guide. Open University Press, Buckingham,
1994.
[3] G. De Poli and P. Prandoni. Sonological models for
timbre characterization. Journal of New Music
Research, 26(2):170–197, 1997.
[4] C. Dobrian and D. Koppelman. The ‘E’ in NIME:
musical expression with new computer interfaces. In
Proceedings of New Interfaces for Musical Expression
(NIME), pages 277–282. IRCAM, Centre Pompidou
Paris, France, 2006.
[5] General Instrument. GI AY-3-8910 Programmable
Sound Generator datasheet, early 1980s.
[6] J. Harvey. Evaluation Cookbook, chapter So You Want
to Use a Likert Scale? Learning Technology
Dissemination Initiative, 1998.
[7] J. Kreiman, D. Vanlancker-Sidtis, and B. R. Gerratt.
Defining and measuring voice quality. In Proceedings
of From Sound To Sense: 50+ Years of Discoveries in
Speech Communication, pages 115–120. MIT, June
2004.
[8] T. Magnusson and E. H. Mendieta. The acoustic, the
digital and the body: A survey on musical
instruments. In Proceedings of New Interfaces for
Musical Expression (NIME), June 2007.
[9] G. Paine, I. Stevenson, and A. Pearce. The thummer
mapping project (ThuMP). In Proceedings of New
Interfaces for Musical Expression (NIME), pages
70–77, 2007.
[10] C. Poepel. On interface expressivity: A player-based
study. In Proceedings of New Interfaces for Musical
Expression (NIME), pages 228–231, 2005.
[11] I. Poupyrev, M. J. Lyons, S. Fels, and T. Blaine. New
Interfaces for Musical Expression. Workshop proposal,
2001.
[12] F. P. Preparata and M. I. Shamos. Computational
Geometry: An Introduction. Springer-Verlag, 1985.
[13] M. Puckette. Low-dimensional parameter mapping
using spectral envelopes. In Proceedings of the
International Computer Music Conference
(ICMC’04), pages 406–408, 2004.
[14] D. Silverman. Interpreting Qualitative Data: Methods
for Analysing Talk, Text and Interaction. Sage
Publications Inc, 2nd edition, 2006.
[15] D. W. Stewart. Focus groups: Theory and practice.
SAGE Publications, 2007.
[16] D. Stowell and M. D. Plumbley. Pitch-aware real-time
timbral remapping. In Proceedings of the Digital
Music Research Network (DMRN) Summer
Conference, July 2007.
[17] H. Uszkoreit. Survey of the State of the Art in Human
Language Technology, chapter 6 (Discourse and
Dialogue). Center for Spoken Language
Understanding, Oregon Health and Science
University, 1996.
86
HCI Methodology For Evaluating Musical Controllers: A Case Study
Chris Kiefer
Department of Informatics, University of Sussex, Brighton, UK
C.Kiefer@sussex.ac.uk
Nick Collins
Department of Informatics, University of Sussex, Brighton, UK
N.Collins@sussex.ac.uk
Geraldine Fitzpatrick
Interact Lab, Department of Informatics, University of Sussex, Brighton, UK
G.A.Fitzpatrick@sussex.ac.uk
ABSTRACT
There is a small but useful body of research concerning the evaluation of musical interfaces with HCI techniques. In this paper, we present a case study in implementing these techniques: we describe a usability experiment which evaluated the Nintendo Wiimote as a musical controller, and reflect on the effectiveness of our choice of HCI methodologies in this context. The study offered some valuable results, but our picture of the Wiimote was incomplete as we lacked data concerning the participants’ instantaneous musical experience. Recent trends in HCI are leading researchers to tackle this problem of evaluating user experience; we review some of their work and suggest that with some adaptation it could provide useful new tools and methodologies for computer musicians.
Keywords
HCI Methodology, Wiimote, Evaluating Musical Interaction
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008. Copyright remains with the author(s).
1. INTRODUCTION
A deep understanding of a musical interface is a desirable thing to have. It can provide feedback which leads to an improved design and therefore a better creative system; it can show whether a design functions as it was designed to, and whether it functions in ways which may have been unexpected. The field of Human-Computer Interaction (HCI) [4] provides tools and methodologies for evaluating computer interfaces, but applying these to the specific area of computer music can be problematic. HCI methodology has evolved around a task-based paradigm and the stimulus-response interaction model of WIMP (windows, icons, menus, pointer) systems, as opposed to the richer and more complex interactions that occur between musicians and machines. Höök [5], discussing the relationship between HCI and installation art, suggests that art and HCI are not easily combined, and this may also be true in the multi-disciplinary field of computer music.
Wanderley and Orio’s article [11] from 2002 built a bridge between HCI usability evaluation methodology and computer music, reviewing current HCI research and suggesting ways in which it could be applied specifically to the evaluation of musical controllers. Since then, research in this area has been relatively sparse and the adoption of these evaluation techniques by the community seems relatively low. A review of the 2007 NIME proceedings, for example, showed that 37% of papers presenting new instruments described some sort of formal usability testing, though often without reference to the wider HCI literature. One possible reason for the slow uptake of HCI methods is that the practicalities of carrying out a usability study are something of a black box since, understandably, papers tend to focus on results rather than methodology. There is clearly much more that could be done to draw on HCI; this paper is a modest response to this challenge, explicitly articulating the processes and lessons learnt in applying HCI methodology within the music field.
To date there is only limited HCI literature which focuses specifically on computer music. Höök [5] examines the use of HCI in interactive art, an area which shares common ground with computer music. She describes her methodology for evaluating interaction in an installation, and examines the issue of assessing usability when artists might want to build systems for unique rather than ‘normal’ users; music shares similar characteristics with art. Poepel [10] presents a method for evaluating instruments through the measurement of musical expressivity. This technique is based on psychology research on cues for musical expression; it evaluates players’ estimations of a controller’s capability for creating these cues. Wanderley and Orio [11] have conducted the most comprehensive review of HCI usability methodologies which can be applied to the evaluation of musical systems. They discuss the importance of testing within well-defined contexts or metaphors, and suggest some that are commonly found in computer music. They propose the use of simplistic musical tasks for evaluation, and highlight features of controllers which are most relevant in usability testing: learnability, explorability, feature controllability and timing controllability. Their research fitted best with our objectives for evaluating the Wiimote, and had the largest influence on the methodology used.
We present a case study on the musical usability of the Nintendo Wiimote. This practical example will help to ground this research and provide a talking point for the employment of HCI evaluation for interactive systems. We will go on to review more recent developments in HCI, from the so-called ‘third paradigm’, and discuss how they might be applied in our field in the future.
2. A CASE STUDY
The semi-ubiquitous Nintendo Wiimote is becoming popular with musicians, as can be seen from the multiplicity of demo videos on YouTube. This motivated us to carry out
a formal evaluation, asking the broad question: how useful is the Wiimote as a musical controller? Answering this question presents a number of challenges. What should be evaluated to give an overall picture of the device? How can the capabilities of the controller be judged with minimal influence on the results from software design, while acknowledging the potentially differing musical and gaming skill levels of the participants?
Figure 1: The Nintendo Wiimote and the Roland HandSonic
2.1 Experimental Design
The Wiimote is essentially a wireless 3-axis accelerometer with some buttons and an IR camera. Because the accelerometer uses the force of gravity as its reference, only rotation around the roll and tilt axes can be measured accurately; rotation about the vertical axis leaves the sensed gravity vector unchanged. Following Wanderley and Orio’s guidelines, we decided to test the core musical capabilities of the accelerometer using simplistic musical tasks; an evaluation of the basic functions could be extrapolated from this to help assess the Wiimote’s use in more complex musical situations. The core functions that were tested were triggering (with drumming-like motions), continuous control using the roll and tilt axes, and gestural control using shape recognition. Continuous control was divided into two categories: precise and expressive.
Practical constraints had to be considered. The participants in the study were volunteers, so a balance had to be struck between the length of the experiment and the degree to which this might dissuade potential participants. A length of 30 minutes was selected.
Whenever possible, in order to give the participants a baseline for comparison of the Wiimote functions, a controller which represented a typical way of performing the musical tasks was provided. The Roland HPD-15 HandSonic was selected for this purpose, as it has a drum pad for comparison of triggering and knobs for comparison of continuous control tasks. The data from this controller would also provide a basis for statistical comparisons.
The triggering task involved participants drumming a set of simple patterns along with a metronome, to obtain timing data. The precise control task required participants to co-ordinate discrete changes of pitch of a sawtooth wave to the beats of a metronome, and was repeated once for each Wiimote axis as well as for turning a knob on the HandSonic. The expressive control task involved modifying the filter and grain density parameters of a synthesiser patch simultaneously. Finally, the gestural recognition task was to control five tracks of percussion, by muting and un-muting layers through casting shapes with the Wiimote. Before each task the participants were given a period of practice time; after each task they were interviewed while the experience was still fresh, and asked about their preferred controller. We considered using the ‘think aloud’ method of gathering data during the tasks, but decided that this would be incompatible with a musical study as it would distract the participants’ attention. All data was recorded for later analysis.
A script was written which described the events in the experiment and the wording of the interview questions, in order to help the experimenter keep these constant for each participant. Participants were asked up front about factors which might affect the experiment, such as musical experience and experience of using the Wiimote. After each task, questions probed their experience in using the controllers. To reduce learning effects, the order of use of the HandSonic and the Wiimote was alternated between participants.
A call for participants was sent out to university mailing lists and local musicians, and 21 people volunteered in total. The study commenced as a rolling pilot, with experimental parameters being checked and adjusted until a stable setup had been reached. This was important in particular to adjust the difficulty of the tasks and to assess what could be fitted into the 30-minute runtime. The first four sessions ended up as pilot sessions, so the final results were taken from the remaining 17 participants.
During the study participants were videoed, to observe their gestures while using the Wiimote and also to record the interviews which occurred throughout the experiment. The SuperCollider audio software [9] was used to construct the experiment. This software allowed us to record a log file of the participants’ actions which would be analysed later for quantitative results.
2.2 Post-Experiment Analysis
The initial data analysis fell into two main areas: the analysis of the qualitative interview data and of the quantitative log file data. Results of the analysis were stored in a MySQL database to facilitate flexible analysis later.
The analysis of the interview data happened over several stages. We used a process of reduction from the raw videos to a final document containing statements summarising the participants’ answers to each question along with any interesting quotes. Key parts of answers from the interviews were transcribed from the video data and stored in the database. These quotes were then coded according to an emerging set of categories and then re-coded until the categorisation was stable. The categorised set of quotes was summarised to produce the final document of results.
For the quantitative data, the log files were processed in SuperCollider to extract specific data such as timing information from the triggering task. This data was exported to MATLAB for statistical analysis using ANOVA and other tests.
Because we wish to concentrate on methodology, we only have space to give highlights of the results of this study. In the interviews, several people commented on the lack of physical feedback in the triggering task, saying that this made it difficult to determine the triggering point. The pitch task revealed some insights into the ergonomics of the device; some participants described how going past certain points of rotation felt unnatural. Some perceived the Wiimote as less accurate than the HandSonic. Participants commented on the Wiimote’s intuitive nature when used for expressive control. They described it as ‘embodied’, and some felt that it widened the scope of editing possibilities. In general, many participants commented on the fun aspect of using the Wiimote, even when they may have preferred the HandSonic. An overall
criticism was of the device’s lack of absolute positioning capability. The statistics revealed little significant difference between the two controllers; participants displayed no overall preference and the timing errors showed no significant variance.
2.3 Reflections
What we have done by presenting this case study is to make explicit the practical issues of conducting a usability experiment. This is often mundane detail that gets omitted from experimental reports, but it may be the type of essential detail that will make it easier for others to try out HCI methods. As such, it is worth noting a few key points about how the study was implemented. Firstly, the importance of a pilot study is easy to underestimate. The best way to expose flaws in a script is to put it into practice; for valid results the experimental parameters need to stay constant throughout the study, so flaws need to be removed at this early stage. In retrospect, the difficulty of some tasks in the Wiimote study could still have been better optimised at the pilot stage to suit the range of participants’ skill levels.
Secondly, an issue of particular importance in a musical usability study is the allotted practice time. There is a lower limit on the time participants need to spend becoming accustomed to the features of an instrument; getting this amount wrong can result in unrepresentative attempts at a task, concealing the true results. Again, this is something which can be assessed during the pilot study.
Thirdly, the gathering of empirical data presents some challenges. In order for the data to be valid, the participants needed to perform the tasks in the same way, though getting people to perform a precise task can be difficult, especially when you have creative people performing a creative task. There needs to be some built-in flexibility in the tasks which allows for this.
Finally, the time and effort involved in transcribing interviews cannot be underestimated. Even supposing voice-recognition software of sufficient accuracy were available to help avoid the hard slog of manual annotation, it might be at the cost of the researcher not engaging so deeply with the data by parsing it themselves. An alternative approach is transcribing just the ‘interesting’ sections, which can save time, though this selection process entails some subjectivity. Tagging log file data for analysis was also a long process, as the correct data had to be found manually by comparison to the video; this could have been improved if the logging had been automatically synchronised to the video data.
3. DISCUSSION
The previous section discussed the details of applying the specific methodology we used in this study. It is also useful to reflect more generally on the structuring of the case study and the efficacy of the HCI evaluation. Was it useful to carry out the Wiimote usability study with the methods we chose? Where were the gaps in the results and how could the methodology be improved to narrow these gaps?
The most ‘interesting’ results came from analysis of the interview data. The interviews confirmed some expected results about the controller but, more usefully, brought up some unexpected issues that some people found with certain tasks, and some surprising suggestions about how the controller could be used. This is the kind of data that shows the benefits of conducting a usability study: data that is difficult to determine by intuition alone and that is best collected from the observations of a larger group of people. Of the remaining results, the quantitative results provided objective backup to certain elements of the interview results, some useful data about the functional side of the controller, and insight into global trends of the participants. However, the conclusions reached from these results alone seemed to be a limited measure of the device compared to the subtlety of the participants’ observations.
Did the study provide a complete answer to the research question: how useful is the Wiimote as a musical controller? It is difficult to answer this objectively, but it can be observed that the results showed a detailed and intimate understanding of the controller in a musical context. One important thing the results do lack is any measure of the participants’ experience while using the controller. The more interesting results came from post-task interviews, but there is no data about their experience in the moment while they were using the device, something that would seem important for a musical evaluation. This gap in the results is partly due to a lack of technology and partly due to a lack of methodology. How can musicians self-report their experience while they are using a musical controller without disrupting the experience itself? Are there post-task evaluation techniques that can give a more accurate and objective analysis of a musical experience than an interview? More recent research in HCI is starting to address similar issues and can point to possibilities.
3.1 The ‘Third Paradigm’
Kaye et al. [7], in 2007, described a growing trend in HCI research towards experience-focused rather than task-focused HCI. With this trend comes the requirement for new evaluation techniques to respond to the new kinds of data being gathered. This trend is a response to the evolving ways in which technology is utilised as computing becomes increasingly embedded in daily life, a shift in focus away from productivity environments [8], and from evaluation of efficiency to evaluation of affective qualities [3]. As HCI is increasingly involved in other ‘highly interactive’ fields of computing such as gaming and virtual reality, the requirement for evaluating user experience becomes stronger. This new trend is known as the ‘third paradigm’, and researchers have started to tackle some of the challenges presented by this approach.
The Sensual Evaluation Instrument (SEI), designed by Isbister et al. [6], is a means of self-reporting affect while interacting with a computer system. Users utilise a set of biomorphic sculptured shapes to provide feedback in real time. Intuitive and emotional interaction occur cognitively on a sub-symbolic level, so the system uses non-verbal communication in order to represent this more directly. With its sub-verbal reporting method, the SEI is a step in the right direction for evaluation of musical interfaces; however, as the reporting technique itself involves some interaction, it could only be used effectively in less interactive contexts such as evaluating some desktop software. The most dynamic example of its use is from the designers’ tests with a computer game, and they acknowledge in their results that it is not ideal for time-critical interfaces or tasks that require fine-grained data.
For more interactive tasks such as playing a musical controller, a non-interactive data gathering mechanism is essential, so the measurement of physiological data may yield real-time readings without interrupting the users’ attention. Some studies concentrate on this area of evaluation. Chateau and Mersiol’s AMUSE system [1] is designed to collect and synchronise multiple sources of physiological data to measure a user’s instantaneous reaction while they interact with a computer system. This data might include eye gaze, speech, gestures and physiological readings such as EMG, ECG, EEG, skin conductance and pulse. Mandryk [8] examines the issues associated with the evaluation of affect
using these physiological measures: how to calibrate the sensor readings and how to correlate multi-point sensor data streams with single-point subjective data. Both studies acknowledge that physiological readings are more valuable when combined with qualitative data. The challenge here is to interpret the data effectively, and research needs to be done into how to calibrate this data for musical experiments.
Fallman and Waterworth [3] describe how the Repertory Grid Technique (RGT) can be used for affective evaluation of user experience. RGT is a post-task evaluation technique based on Kelly’s Personal Construct Theory, and it involves eliciting qualitative constructs from a user which are then rated quantitatively. It sits on the border between qualitative and quantitative methods, allowing empirical analysis of qualitative data. RGT is not ideal in a musical context as the data is not collected in the moment of the experience it evaluates; however, it could be an improvement on interviews, and has the practical advantage that the data analysis is less time-consuming.
A number of experience evaluation techniques gather data from multiple sources in order to triangulate an overall result. This way of working brings the challenge of synchronising and re-integrating the data sources, and some researchers are creating tools to deal with this [2]. These kinds of tools would have been of great value to the data analysis in the Wiimote study, especially because of the need for log file to video synchronisation.
Developments in new HCI research are encouraging, but how useful are they in a computer music context? All these methodologies need to be assessed specifically in terms of evaluation of musical experience as well as user experience.
4. CONCLUSION
We have examined current intersections between HCI evaluation methodology and computer music, presented a case study of an evaluation based on this methodology, and looked at some of the new research in HCI which is relevant to our field. The evaluation of the Wiimote produced some valuable insights into its use as a musical controller, but it lacked real-time data concerning the participants’ experience of using the device. The third wave of HCI holds promising potential for computer music; the two fields share the common goal of evaluating experience and affect between technology and its users. The analysis of musical interfaces can be considered a very specialised area of experience evaluation, though techniques from new HCI research are not necessarily immediately applicable to music technology. New research is needed to adapt and test these methodologies in musical contexts, and perhaps these techniques might inspire new research which is directly useful to musicians.
5. REFERENCES
[1] Noel Chateau and Marc Mersiol. AMUSE: A tool for evaluating affective interfaces. In CHI’05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[2] Andy Crabtree, Steve Benford, Chris Greenhalgh, Paul Tennent, Matthew Chalmers, and Barry Brown. Supporting ethnographic studies of ubiquitous computing in the wild. In DIS ’06: Proceedings of the 6th conference on Designing Interactive systems, pages 60–69, New York, NY, USA, 2006. ACM.
[3] Daniel Fallman and John Waterworth. Dealing with user experience and affective evaluation in HCI design: A repertory grid approach. In CHI’05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[4] Alan Dix, Janet Finlay, Gregory D. Abowd, and Russell Beale. Human-Computer Interaction. Prentice Hall, 3rd edition, 2004.
[5] Kristina Höök, Phoebe Sengers, and Gerd Andersson. Sense and sensibility: evaluation and interactive art. In CHI ’03, pages 241–248, New York, NY, USA, 2003. ACM.
[6] Katherine Isbister, Kia Höök, Jarmo Laaksolahti, and Michael Sharp. The sensual evaluation instrument: Developing a trans-cultural self-report measure of affect. International Journal of Human-Computer Studies, 65:315–328, April 2007.
[7] Joseph ’Jofish’ Kaye, Kirsten Boehner, Jarmo Laaksolahti, and Anna Staahl. Evaluating experience-focused HCI. In CHI ’07: CHI ’07 extended abstracts on Human factors in computing systems, pages 2117–2120, New York, NY, USA, 2007. ACM.
[8] Regan Lee Mandryk. Evaluating affective computing environments using physiological measures. In CHI’05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[9] James McCartney. Rethinking the computer music language: SuperCollider. Computer Music Journal, 26(4):61–68, 2002.
[10] Cornelius Poepel. On interface expressivity: a player-based study. In NIME ’05: Proceedings of the 2005 conference on New interfaces for musical expression, pages 228–231, Singapore, Singapore, 2005. National University of Singapore.
[11] Marcelo Mortensen Wanderley and Nicola Orio. Evaluation of input devices for musical expression: Borrowing tools from HCI. Comput. Music J., 26(3):62–76, 2002.
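Section 2.1 of the Wiimote case study states that, because the accelerometer depends on gravity, only roll and tilt can be measured accurately. The short sketch below illustrates why, under stated assumptions: it simulates the reading of a motionless 3-axis accelerometer using a Z-Y-X (yaw, pitch, roll) rotation convention chosen purely for illustration (the Wiimote's own axis labels and signs may differ), then recovers roll and pitch with the standard atan2 formulas. The helper names (`simulated_reading`, `roll_pitch`, and the matrix helpers) are hypothetical; they are not part of any Wiimote or SuperCollider API, and this is not the software used in the study.

```python
import math

# Minimal sketch, assuming a static device and a Z-Y-X Euler convention
# (an illustrative assumption, not the Wiimote's documented frame).
Matrix = list[list[float]]


def rot_x(a: float) -> Matrix:
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(a: float) -> Matrix:
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(a: float) -> Matrix:
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def matvec(a: Matrix, v: list[float]) -> list[float]:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def simulated_reading(roll_deg: float, pitch_deg: float,
                      yaw_deg: float) -> list[float]:
    """Body-frame reading (in units of g) of a motionless device.

    Orientation R = Rz(yaw) @ Ry(pitch) @ Rx(roll) maps body to world axes.
    At rest the sensor measures the reaction to gravity, i.e. the world
    'up' vector (0, 0, 1) expressed in body coordinates: R^T @ (0, 0, 1).
    """
    r, p, y = (math.radians(d) for d in (roll_deg, pitch_deg, yaw_deg))
    orientation = matmul(rot_z(y), matmul(rot_y(p), rot_x(r)))
    return matvec(transpose(orientation), [0.0, 0.0, 1.0])


def roll_pitch(ax: float, ay: float, az: float) -> tuple[float, float]:
    """Estimate roll and pitch in degrees from a static reading (|pitch| < 90)."""
    roll = math.degrees(math.atan2(ay, az))
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    return roll, pitch


if __name__ == "__main__":
    # Same roll and pitch, three different yaw angles.
    for yaw in (0.0, 45.0, 90.0):
        reading = simulated_reading(20.0, -10.0, yaw)
        est_roll, est_pitch = roll_pitch(*reading)
        print(f"yaw={yaw:5.1f}  reading={[round(v, 4) for v in reading]}  "
              f"roll={est_roll:.1f}  pitch={est_pitch:.1f}")
```

In this sketch the three yaw values produce identical readings and identical roll/pitch estimates, which is the sense in which rotation about the vertical axis is invisible to a gravity-referenced sensor. The formulas also assume the controller is held still; during drumming-like motions the sensed acceleration mixes movement with gravity, so such angle estimates become less reliable.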
The A20: Musical Metaphors for Interface Design
Olivier Bau
in|situ| lab, INRIA & LRI, Bât 490 Université Paris-Sud 11, 91405 Orsay Cedex, France
bau@lri.fr
Atau Tanaka
Culture Lab, Newcastle University, NE1 7RU, United Kingdom
atau.tanaka@ncl.ac.uk
Wendy E. Mackay
in|situ| lab, INRIA & LRI, Bât 490 Université Paris-Sud 11, 91405 Orsay Cedex, France
mackay@lri.fr
ABSTRACT
We combine two concepts, the musical instrument as metaphor and technology probes, to explore how tangible interfaces can exploit the semantic richness of sound. Using participatory design methods from Human-Computer Interaction (HCI), we designed and tested the A20, a polyhedron-shaped, multi-channel audio input/output device. The software maps sound around the edges and responds to the user’s gestural input, allowing both aural and haptic modes of interaction as well as direct manipulation of media content. The software is designed to be very flexible and can be adapted to a wide range of shapes. Our tests of the A20’s perceptual and interaction properties showed that users can successfully detect sound placement, movement and haptic effects on this device. Our participatory design workshops explored the possibilities of the A20 as a generative tool for the design of an extended, collaborative personal music player. The A20 helped users to enact scenarios of everyday mobile music player use and to generate new design ideas.
KEYWORDS
Generative design tools, Instrument building, Multi-faceted audio, Personal music devices, Tangible user interfaces, Technology probes
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
1. INTRODUCTION
We are interested in creating tangible user interfaces that exploit the semantic richness of sound. Our research draws from two disciplines: Human-Computer Interaction (HCI) and NIME instrument design. The former offers a number of examples of the use of sound in graphical interfaces, including Buxton et al.’s [2] early work, Gaver’s auditory icons [5] and Beaudouin-Lafon and Gaver’s [1] ENO system. These systems focused primarily on sound as a feedback mechanism, with an emphasis on graphical rather than tangible user interfaces.
We draw upon HCI design methods, particularly participatory design [7][12], that emphasize the generation of ideas in collaboration with users. In particular, technology probes [9] engage users as well as designers to create novel design concepts, inspired by the use of the technology in situ. This generative design approach challenges both users and designers to explicitly question traditional ways of thinking and open up novel design directions. Our goal was to create a technology probe that focuses on the sonic aspects of tangible interfaces, using participatory design to create and explore the possibilities of a working prototype.
We also draw on the instrument building approach from NIME, which offers a similar notion of generative design. Musical instruments are developed as open-ended systems that allow the creation of novel compositions and interpretations, while idiomatic composition recognizes that limitations are imposed by the characteristics of the system or instruments. We use this instrument building metaphor as one of the foundations for our generative design approach: the limitations of the instrument serve both to define and to constrain the design space, with respect to the given research problem.
Figure 1. The A20 is a working prototype of a technology probe for exploring music and sound in a tangible interface.
This paper describes the design and development of the A20 (Figure 1), a polyhedron-shaped, multi-channel audio device that allows direct manipulation of media content through touch and movement, with various forms of aural and haptic feedback. During a series of participatory design sessions, both users and designers used the A20 to generate and explore novel interface designs. The easily modifiable software architecture allowed us to create various mappings between gestural and pressure inputs, producing specific sounds and haptic output. Meanwhile, the flexibility of the A20 as an interface allowed users a range of interpretations for any given mapping. The A20 was never intended as a prototype of a specific future system. Instead, we sought to use it as a design tool to explore the potential of music and sound in tangible interfaces. Our participatory design workshops served both to evaluate the A20 itself and to explore novel interface designs, including social
Copyright remains with the author(s). interaction through portable music players.
2. RELATED WORK
Since our goal was to maximize the user’s ability to explore new forms of interaction, we needed a generic shape that would maximize the user’s freedom of expression and could be easily adapted to a variety of types of interaction, preferably through direct manipulation of a topological display. The D20 [13], co-designed by one of the authors, is a design concept for a visual interface embodied as an icosahedron. An icosahedron is a nearly spherical polyhedron with 20 discrete triangular facets. Figure 2 shows how this shape permits a variety of display options and modes of rotation-based interaction, such as around the equator, as slices of a pie, or simply as binary choices.

Figure 2. The D20 interaction modes that emerge from three facet patterns: equator, pie and binary.

The D20 was created as a design concept, using a computer simulation that emphasized the visual display properties of the icosahedron. We decided to adopt the same form, but this time as a functional prototype, focusing on its audio and haptic possibilities.

Several other researchers have created omni-directional spherical sound output devices. For example, Warusfel [17] controls radiation patterns from a single sound source across a spherical matrix. Freed et al. extended this to a multi-channel approach that simulates acoustical instrument propagation [4]. SenSAs [16] add sensors to create a form of electronic chamber music performance. The primary focus of these projects was to recreate multi-directional sound radiation patterns that approach those of acoustic instruments: they create non-frontal forms of amplified sound reinforcement so as to better situate electronic sounds in context with acoustic sources. However, none have used a spherical form factor to play multiple sound sources in the context of end-user music devices such as MP3 players.

The other relevant research relates to generative design methods. For example, cultural probes [6] provide people with unusual artefacts in various settings, with the goal of inspiring novel ideas. The idea is to move away from the more classical HCI approach, in which users are viewed as a source of data, toward activities in which users become a source of inspiration. Technology probes [9] also focus on design inspiration with users, but in an explicitly participatory design context. Technology probes are founded on the principle of triangulation [11], which fulfills three “interdisciplinary goals: the social science goal of understanding the needs and desires of users in a real-world setting, the engineering goal of field-testing the technology, and the design goal of inspiring users and researchers to think about new technologies”.

Technology probes were originally designed to study human-to-human communication and were tested in the homes of remote family members. Most focused on the exchange of visual information, such as the VideoProbe [9], which snaps a picture from a webcam in the living room (but only if the person does not move for three seconds) and shares it with a VideoProbe in the living room of a remote family member. Another device, TokiTok [10], explored communication via simple sounds: users could transmit ‘knocks at a distance’, which conveyed simple information such as ‘I’m home’ but also allowed participants to establish more elaborate codes to stay in touch. However, sound and music have not been the focus of technology probe research thus far.

The A20 project seeks to leverage the complementary aspects of music research and user interface design methods. We use the notion of technology probes to understand users and draw inspiration, but in simulated settings in design workshops rather than in the real world. We also take advantage of techniques from NIME, with the inherently expressive properties of musical instruments, to explore this design space.

3. CROSSING DESIGN TRADITIONS

3.1 IDIOMATIC WRITING AND SEAMFUL DESIGN
Musical instruments are built to be vehicles of expressive communication. An instrument is generic in the sense that many kinds of music can be composed for the same instrument. At the same time, an instrument is idiosyncratic in that it is capable of specific modes of articulation, a limited melodic range and particular harmonic combinations. An instrument is not necessarily designed to have a set of limitations, but a successful musical work must take these characteristics into account. A musical composition that respects and plays upon the idiosyncratic nature and limits of an instrument is considered an example of idiomatic writing [15]. Applying this approach, the creative musical use of acoustical properties and limitations, to the properties of digital interaction is one of the core research areas of NIME.

In the field of HCI, various design methodologies exist to create usable or efficient user interfaces. These range from performance optimization in the technical sense to taking the end-user’s needs into account in the design process, as in User-Centered Design. A technique similar to idiomatic writing in music exists in HCI, whereby the limitations of a technological system are used as part of the design process. This is called seamful design [3]. Chalmers argues that accepting all of a system’s “physical and computational characteristics [whether they are] weaknesses or strengths” not only offers more robust system design, but may also inspire novel interface ideas.

Composing idiomatic music for an instrument can be considered an act of seamful design: we can draw a link between writing a composition that takes an instrument’s limitations into account and creating an interface that takes advantage of a system’s characteristics. In the user-interface design process, seamfulness helps define a design space, while open-endedness helps in interpreting it. Here we apply the duality of seamful composition and open-ended instrument to create a tool for generative user interface design.
3.2 INSTRUMENT METAPHORS FOR INTERFACE DESIGN
In the development of the A20, we sought an application-neutral approach that would yield a flexible interface. The design of the A20 is not a direct response to specific interface design questions. Instead, a metaphor-based conceptual development allowed us to pursue an open-ended process to explore the design space of audio interfaces. We called upon three metaphors from the musical tradition: instrument building, composition, and expressivity of interpretation.

When building digital musical instruments, unlike acoustic instruments, we must define the mappings between input and output [8]. For a given system specification, we can conceive of many mappings to create a variety of input and output connections. This range of mappings turns the system into a potential family of instruments, or a corpus of articulations for a given instrument. This contrasts with most user interface design, in which the goal is to find the single optimal mapping of input and output that will create the desired interaction for a specific design problem.

We also draw from the metaphor of composition for musical instruments, which emphasizes expressivity and interpretation. A composition exists as a musical structure that can be executed and re-interpreted in the context of a musical performance. These two metaphors, musical instruments and composition, encourage us to re-examine the traditional user interface design concept of a scenario and redefine it as a compositional abstraction that can be executed on that tool/instrument. In a participatory design process, scenario creation and scenario enactment can be seen as composition and interpretation. These metaphors serve to situate and enrich our interaction scenarios, while also guiding the design specification of the system.

The metaphors of instrument, composition and interpretation correspond to two levels of abstraction of the A20. At the lower level of abstraction, the instrument is defined by the hardware specification (form factor, sensors, audio output) and the software specification (mapping between input and output). As a design tool, the A20’s hardware establishes the first set of constraints for the design space: gestural and pressure input on the one hand, and multidirectional, multi-channel output capabilities on the other. The software defines the ‘elements of interaction’ that turn the A20 into an instrument. For example, the user can create a sound that moves around the device and then make it stop by shaking the device.

The upper level of abstraction comprises composition and interpretation, which allow the user to play the device in the context of a specific design scenario. Different interpretations can be seen as different instantiations of an open-ended interaction mapping. For example, shaking the device could be interpreted as a gesture to validate playlist creation, to send a song to a friend’s device, or as an action in a collaborative music game. The expressivity of the resulting instrument allows a wide range of interpretations and instantiations for different design questions. The software that defines the A20’s interaction is highly flexible, which enables us as user interface designers to invent a diverse set of instruments, invite users to ‘play’ them, and understand both the problems and the potential of each.

4. A20 INSTRUMENT DESIGN
4.1 Hardware
The first version of the A20 was a simple cube, which helped us to develop the software for integrating sound processing and sensor data. The second version was an icosahedron, which we used in our studies with users.

Figure 3. The A20 frame (left) consists of 20 triangles, 16 of which hold flat speakers. Transducer and Force Sensing Resistors (right) fit under each speaker.

Figure 3 (left) shows the A20’s frame, built with rapid-prototyping stereolithography. An audio interface and sensors were housed within the structure (Figure 3, right). The icosahedron had 14 cm edges, resulting in a diameter of approximately 22 cm. We attached commercially available lightweight flat-panel loudspeakers along the outside of the frame, with each panel cut to a triangular shape. The assembled, working version can be seen in Figure 1.

The sixteen flat speakers are driven independently by two USB 8-channel 16-bit sound cards with a 44.1 kHz sampling rate. Thus, only 16 of the faces are able to display independent sound. Sensors include a Bluetooth six-degrees-of-freedom (6DoF) inertial sensor pack with a triaxial accelerometer and a triaxial gyroscope for rotation [18]. Force Sensing Resistors (FSR) are integrated under each speaker transducer, with a 10-bit analog-to-digital conversion processor on a separate micro-controller-based acquisition board¹. The micro-controller acquires the pressure sensor data with 12 bit resolution, which is then sent over a standard serial port. Figure 4 (lower box) illustrates the hardware architecture.

¹ www.arduino.org

Figure 4. A20 hardware and software architecture

4.2 SOFTWARE
The software architecture is based on a client/server model and consists of sensor acquisition and interaction mapping modules and an audio engine, shown in Figure 4 (upper box). The data collected from the A20’s sensors is broadcast to a control module on the computer, which integrates the sensor data and
defines the interaction mappings. A second module is in charge of audio processing and sends data back to the A20. Both modules communicate via Open Sound Control². We chose the UDP protocol for its efficiency in time-sensitive applications.

² www.opensoundcontrol.org

The A20 interaction mappings are implemented in C++ as a server process that aggregates data from the accelerometer, gyroscope and pressure sensors. We used the OpenGL graphics library to program a visual representation of the physical prototype for debugging interaction mappings and to accelerate matrix operations during real-time sound mapping on the device. We vectorized sound location across the surface of the icosahedron in a way similar to Pulkki’s work on vector-based sound positioning [14]. Vector-based audio panning extends the principle of stereo panning with an additional dimension, making it useful when the listener is not fixed at a sweet spot or when the sound distribution includes a vertical dimension.

In the control software, 3D vectors represent sound sources. The origin of the 3D coordinate system is the center of the object, in this case the center of the A20. Each face and its corresponding speaker is represented by a vector from that origin to the center of the face. The control software outputs, in real time, a vector angle for each sound source. The audio engine then calculates amplitude levels from the angular distance between the vectors representing the sound sources and those representing the speakers. The control software dynamically calculates the source vectors, so that sounds move across a series of faces. After audio processing, this results in a gradual multidimensional panning between successive faces, giving the impression of sound moving across the surface of the object.

This software can be adapted to a range of different shapes. The vectors representing the faces are computed according to the number of speakers and their placement. The audio engine is then configured with the proper number of speakers and data on their output capabilities, such as physical size and amplitude range. Thus the same software works for the original cube-shaped prototype and for the 20-sided icosahedron.

The audio engine is written in Max/MSP and is divided into two parts. The main control program is the master of two slave patches, each controlling a sound card. The audio engine manages multiple sound streams that can be placed at different positions on the device according to location attributes sent by the control software. This software allows us to use synthesized sounds as well as samples of recorded music in MP3 format. Post-treatment algorithms are applied to reproduce real-world acoustical effects. For example, Doppler shift changes the pitch of a sound as it moves closer or further away, and filtering effects change its timbre as it moves behind obscuring objects, thus enhancing the effect of sound movement around the device.

5. EVALUATION
In order to evaluate the A20, we invited non-technical users to the third in a series of participatory design workshops. The first two sessions, not reported here, focused respectively on an interview-based exploration of existing personal music player usage and on structured brainstorming about communicating entertainment devices. Evaluation of the A20 comprised two activities. The first focused on its perceptual characteristics as a multi-faceted, multi-channel audio device. The second used the A20 as a technology probe and an instrument, to inspire and explore different forms of interaction with a tangible audio device.

5.1 Multi-faceted Audio Perception
The purpose of the first set of tests was to assess the users’ ability to perceive different modes of audio display on the A20, including their ability to perceive sound position, motion around the device, and haptic patterns. We also wanted to familiarize them with the A20 so they could take part in the subsequent participatory design exercises.

Figure 5. Testing how a user perceives the A20

We asked 16 participants to perform a set of tests in individual sessions lasting approximately 10 minutes each. Each participant was given the A20 (Figure 5) and asked to perform the following tasks:

Test 1: Localizing Sound
An impulse sound was played on a randomly chosen facet and the participant was asked to identify the source facet without touching the A20. (Repeated five times.)

Test 2: Detecting Direction of Movement
An impulse train was panned around the equator of the device to simulate a moving sound source with a circular pattern. The participant was asked to identify whether the direction of movement was clockwise or counter-clockwise, without touching the A20.

Test 3: Distinguishing Static from Dynamic
We combined the first two tests to determine whether the participant could distinguish a moving sound from a static sound. The participant was presented with four conditions: two static sounds (derived from Test 1) and two moving sounds (clockwise and counter-clockwise), in a counterbalanced presentation sequence.

Test 4: Distinguishing Haptic Stimuli
We combined the auditory and haptic channels to create various combinations: some where the two modes were synchronous, reinforcing perception of a single source, and others that presented two distinct sources, one in each modality. The haptic channels were presented on the lateral faces under the participant’s hands, whereas the auditory channel (a musical excerpt from a well-known pop song) was presented on the ‘pie’ zone at the top of the A20. In some combinations, the haptic channel corresponded to the music being heard, while in others the haptic and audio stimuli were independent. The participant was asked to indicate whether or not the haptic and audio signals were the same. In cases where the haptic signal was derived from the music, several variations were made to bring more or less of the music into the haptic range. These included generating the haptic signal from the amplitude envelope of the music, or low-pass filtering the music before generating the corresponding haptic stimulus.
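Test 4 derives the haptic stimulus from the music either through its amplitude envelope or by low-pass filtering. The paper does not state which filters, cutoff frequencies or time constants were used, so the following Python sketch only illustrates the two routes under stated assumptions: a first-order low-pass filter, a peak envelope follower with hypothetical attack and release times, and a hypothetical 150 Hz carrier that turns the slowly varying envelope into a vibration a speaker transducer can reproduce. All function names and parameter values are illustrative, not taken from the A20 software.

```python
import math

SAMPLE_RATE = 44100  # matches the 44.1 kHz sound cards described in Section 4.1


def one_pole_lowpass(samples, cutoff_hz, sample_rate=SAMPLE_RATE):
    """First-order IIR low-pass: attenuates content above cutoff_hz (6 dB/octave)."""
    a = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in samples:
        y = (1.0 - a) * x + a * y
        out.append(y)
    return out


def amplitude_envelope(samples, attack_ms=5.0, release_ms=50.0,
                       sample_rate=SAMPLE_RATE):
    """Peak envelope follower: rectify, then smooth with separate attack/release."""
    att = math.exp(-1.0 / (attack_ms * 1e-3 * sample_rate))
    rel = math.exp(-1.0 / (release_ms * 1e-3 * sample_rate))
    out, env = [], 0.0
    for x in samples:
        level = abs(x)
        coeff = att if level > env else rel  # rise quickly, fall slowly
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out


def envelope_to_vibration(envelope, carrier_hz=150.0, sample_rate=SAMPLE_RATE):
    """Amplitude-modulate a low-frequency carrier (hypothetical 150 Hz) by the envelope."""
    return [e * math.sin(2.0 * math.pi * carrier_hz * n / sample_rate)
            for n, e in enumerate(envelope)]


if __name__ == "__main__":
    # One second of a stand-in "music" signal: a 1 kHz tone under a slow swell.
    music = [math.sin(2 * math.pi * 2 * n / SAMPLE_RATE) ** 2
             * math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE)
             for n in range(SAMPLE_RATE)]
    haptic_from_envelope = envelope_to_vibration(amplitude_envelope(music))
    haptic_from_lowpass = one_pole_lowpass(music, cutoff_hz=100.0)
    print(max(haptic_from_envelope), max(abs(x) for x in haptic_from_lowpass))
```

A first-order filter rolls off gently, so some audible content leaks into the haptic signal; a steeper filter would separate the two channels more cleanly, while the envelope route discards pitch entirely and keeps only the rhythm and dynamics of the music.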
Test 5: Distinguishing Haptic Stimuli

Participants were asked to hold the A20. We generated two
different haptic stimuli, one under each hand. These were low
frequency vibration patterns that were not in the audible range
(using pulse width and frequency modulation). The participant
was asked whether or not the two patterns were the same. For
each task, trial order was counterbalanced across participants.
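Section 4.2 states that the audio engine derives each speaker’s amplitude from the angular distance between a source vector and the face vectors, and Tests 1 to 3 rely on this to place a sound on one facet or sweep it around the equator. The paper does not give the gain law, so the following Python sketch assumes a simple linear falloff within a hypothetical 60° spread, followed by constant-power normalisation. It is not Pulkki’s VBAP [14], which instead solves for the gains of the speaker triplet enclosing the source. Face vectors are computed from the standard icosahedron vertex coordinates; the orientation (and therefore which plane counts as the “equator”) is an assumption, and the sketch drives all 20 faces, whereas the A20 itself has speakers on only 16 of them.

```python
import itertools
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def icosahedron_face_vectors():
    """Unit vectors from the centre of an icosahedron to its 20 face centres."""
    verts = []
    for a, b in itertools.product((-1, 1), repeat=2):
        # Cyclic permutations of (0, ±1, ±PHI): 12 vertices with edge length 2.
        verts += [(0, a, b * PHI), (a, b * PHI, 0), (b * PHI, 0, a)]
    faces = []
    for tri in itertools.combinations(verts, 3):
        # A face is a triple of vertices that are pairwise one edge apart.
        if all(abs(math.dist(p, q) - 2.0) < 1e-9
               for p, q in itertools.combinations(tri, 2)):
            centre = tuple(sum(c) / 3 for c in zip(*tri))
            faces.append(normalize(centre))
    return faces


def panning_gains(source, speakers, spread=math.radians(60)):
    """Gain per speaker from its angular distance to the source direction.

    Linear falloff reaching zero at `spread` (hypothetical), then
    constant-power normalisation so that the squared gains sum to 1.
    """
    src = normalize(source)
    raw = []
    for spk in speakers:
        cos_angle = max(-1.0, min(1.0, sum(s * p for s, p in zip(src, spk))))
        raw.append(max(0.0, 1.0 - math.acos(cos_angle) / spread))
    power = math.sqrt(sum(g * g for g in raw))
    if power == 0.0:
        raise ValueError("no speaker within the panning spread")
    return [g / power for g in raw]


def equator_source(t, period=4.0, clockwise=True):
    """Source direction circling the z = 0 plane once every `period` seconds."""
    angle = 2 * math.pi * t / period
    if clockwise:  # clockwise when viewed from +z
        angle = -angle
    return (math.cos(angle), math.sin(angle), 0.0)


if __name__ == "__main__":
    faces = icosahedron_face_vectors()
    for t in (0.0, 0.5, 1.0):
        gains = panning_gains(equator_source(t), faces)
        loudest = max(range(len(faces)), key=gains.__getitem__)
        print(f"t={t:.1f}s loudest face={loudest}")
```

Because the gains are normalised so that their squares sum to one, a moving source keeps its summed power constant while the energy shifts from one face to its neighbours, which approximates the gradual multidimensional panning described in Section 4.2. Swapping `icosahedron_face_vectors()` for six axis-aligned unit vectors would give the corresponding setup for the cube prototype.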
5.2 Results
Figure 6 shows the results of each of the five tests. Participants were reliably able to locate the position of a sound on the device (Test 1, 85% accuracy), to detect the direction of motion (Test 2, 77% accuracy) and to perceive whether the sound was moving or not (Test 3, 79% accuracy). However, users had greater difficulty determining whether the haptic stimulus under their hands was a filtered version of the music being heard (Test 4, 69% accuracy). Participants were particularly successful in distinguishing between haptic stimuli (Test 5, 91% accuracy).

Figure 6. Results of simple perception tests

5.3 Participatory Design Workshop
We organized the workshop into four major design activities. The first asked participants to create personal scenarios that address the theme of mobile social interaction through music. The second and third activities were conducted in parallel. In the second activity, small groups collaborated on creating a scenario that combined and deepened the individual scenarios from activity one. During this time, we invited individuals to test the A20, as described in the previous section. When all the members of a group had completed the individual perception tests, we used the A20 as a design tool to help each group imagine novel interaction scenarios. We implemented three interaction mappings that allowed participants to play with three different forms of gesture-based interaction:
1. Flick the A20 left or right to change the current music track playing on the top of the device.
2. Press on a facet to make a sound rotate around the equator, starting from the pressed speaker and then fading away.
3. As the user physically turns in a circle, compensate by panning the A20 so that the music stays fixed relative to the surrounding space.
The fourth activity (Figure 7) asked pairs of participants to create a meta-scenario that incorporated their newfound interpretations of the A20 interaction mappings and to design a user interface that exploited its sound properties. The resulting scenarios were sketched out on storyboards, acted out, and videotaped.

Figure 7. Working with cardboard mockups and drawing storyboards to illustrate shared scenarios.

5.4 Results
One of our constraints was that we had only one working prototype of the A20, which meant that participants played with it at different points in their design exercises. However, this enabled us to observe how the A20 affected their designs, and to compare designs from those who experienced it early or late in their design processes.

As one would expect, people had various interpretations of the A20 and incorporated its features differently into their designs. Some were directly influenced by the interaction elements that they experienced in the perceptual tests. For example, one group’s concept emerged from the first interaction mapping: they extended the idea of flicking the A20 to navigate through sounds and created a collaborative music game. One user would perform a succession of back-and-forth flicks to create a sound sequence. The remote player would then execute the same sound sequence, adding one new sound at the end. As they play, the sequence becomes successively more difficult to master, until one player cannot reproduce it.

Another group imagined a file browsing technique that involved manipulating the sound source directly. This exploited the whole-object interaction and audio-only nature of the A20. One participant applied this functionality to common MP3 players by adding gestural input and spatialized sound. This modified the concept of the playlist so that it was no longer a textual representation of the music, but the music itself, sequentially laid out across the faces of the A20.

The second interaction mapping allows users to send a sound around the equator of the A20, so that the sound moves from the pressed face to its opposite face. Although this was presented only as an abstract interaction element, several participants seized upon the idea of generating sonic feedback when sending a music file to someone else. One participant imagined a scenario that combined the second and third interaction mappings. He would turn physically in space with the A20 so as to orient himself with respect to his distant correspondent, effectively associating a physical person in real space with the topology of the A20. He would then select a piece of music from a particular face to share with the other person.

The third interaction mapping inspired another group to propose a device that acts like a sound memory compass: “The A20 can be a recorder for use while traveling, to capture impressions from different places. Each face saves a sonic snapshot from a place I visit.” They attached sounds to virtual objects in the environment and proposed navigating through this collection of objects by pointing the A20 in different directions in the space.

Other users imagined scenarios that exploited the A20’s form factor. For example, one group proposed throwing the A20 “like a die onto the floor”, which would turn on shuffle mode and “fill the living room with sound”. Another group proposed using groups of A20s like stackable bricks, to create a variety of different sound or music effects. These examples illustrate some
of the richness and innovation of the ideas generated by non-technical users, which go far beyond the creativity we saw in previous workshops, when they had no specific instrument on which to play and explore ideas.

6. CONCLUSION AND FUTURE WORK
Our goal has been to use the expressivity and open-endedness typical of musical instruments to create generative design tools, encouraging both users and designers to imagine new interfaces using the evocative richness of sound. In workshops, users experienced, tested and explored design ideas, immersed in the context provided by the workshop theme and the A20’s specific sound characteristics. We feel that the A20 successfully acted as an expansive platform for generating and exploring new sound interaction ideas.

The icosahedron form served as a generic interface that could be reinterpreted in different ways. The A20 constrained the design space to gestural input and multi-directional sound output, and its idiosyncratic form factor influenced some participants’ scenario interpretations. However, since the sound control software can easily be adapted to other form factors, different shapes could be used depending on the design questions to be addressed, allowing us to transpose the design space accordingly. This could be achieved by creating a wider range of simple forms, or even by using Lego-like building blocks to create a shape around the multidirectional sound source.

In our future work, we plan to extend the output and networking capabilities of the A20. We found the preliminary perception tests with haptic patterns interesting, and we also plan to explore audio-haptic correlation and audio-to-haptic information transitions and to add these features to another instrument interface. This would allow user interface designers to take the haptic capabilities of audio displays into account and to further explore the multimodal potential of sound and touch together. We hope to develop a fully wireless, lightweight version of the A20, and would also like to add networking features so that multiple A20s can communicate with each other and encourage diverse forms of musical collaboration among their users.

7. ACKNOWLEDGMENTS
This project was developed at Sony Computer Science Laboratory Paris. Our thanks to the project interns, Emmanuel Geoffray from IRCAM and Sonia Nagala from Stanford University, and to Nicolas Gaudron for the icosahedron structure.

8. REFERENCES
[1] Beaudouin-Lafon, M. and Gaver, W. (1994). ENO: synthesizing structured sound spaces. In Proc. of UIST’94. ACM. pp. 49-57.
[2] Buxton, W., Gaver, W. and Bly, S. (1994). Auditory Interfaces: The Use of Non-Speech Audio at the Interface. http://www.billbuxton.com/Audio.TOC.html
[3] Chalmers, M. and Galani, A. (2004). Seamful interweaving: heterogeneity in the theory and design of interactive systems. In Proc. of DIS’04. ACM. pp. 243-252.
[4] Freed, A., Avizienis, R., Wessel, D. and Kassakian, P. (2006). A Compact 120 Independent Element Spherical Loudspeaker Array with Programmable Radiation Patterns. In Proc. of AES’06. Paper 6783.
[5] Gaver, W. (1989). The Sonic Finder: An Interface that Uses Auditory Icons. Human Computer Interaction, 4(1). pp. 67-94.
[6] Gaver, W.W. and Dunne, A. (1999). Projected Realities: Conceptual Design for Cultural Effect. In Proc. of CHI’99. pp. 600-608.
[7] Greenbaum, J. and Kyng, M. (Eds.) (1992). Design at Work: Cooperative Design of Computer Systems. Lawrence Erlbaum Associates, Inc.
[8] Hunt, A., Wanderley, M.M. and Paradis, M. (2002). The importance of parameter mapping in electronic instrument design. In Proc. of NIME’02. pp. 149-154.
[9] Hutchinson, H., Mackay, W.E., Westerlund, B., Bederson, B., Druin, A., Plaisant, C., Beaudouin-Lafon, M., Conversy, S., Evans, E., Hansen, H., Roussel, R., Eiderbäck, B., Lindquist, S. and Sundblad, Y. (2003). Technology Probes: Inspiring Design for and with Families. In Proc. of CHI’03. pp. 17-24.
[10] Lindquist, S., Westerlund, B., Sundblad, Y., Tobiasson, H., Beaudouin-Lafon, M. and Mackay, W. (2007). Co-designing Technology with and for Families: Methods, Experiences, Results and Impact. In Streitz, N., Kameas, A. and Mavrommati, I. (Eds.), The Disappearing Computer, LNCS 4500. Springer Verlag. pp. 99-119.
[11] Mackay, W.E. and Fayard, A-L. (1997). HCI, Natural Science and Design: A Framework for Triangulation Across Disciplines. In Proc. of DIS’97. ACM. pp. 223-234.
[12] Muller, M.J. and Kuhn, S. (Eds.) (1993). Communications of the ACM, Special Issue on Participatory Design, 36(6). pp. 24-28.
[13] Poupyrev, I., Newton-Dunn, H. and Bau, O. (2006). D20: interaction with multifaceted display devices. In CHI’06 Extended Abstracts. ACM. pp. 1241-1246.
[14] Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc., 45(6). pp. 456-466.
[15] Tanaka, A. (2006). Interaction, Agency, Experience, and the Future of Music. In Brown, B. and O’Hara, K. (Eds.), Consuming Music Together: Social and Collaborative Aspects of Music Consumption Technologies. Computer Supported Cooperative Work (CSCW) Vol. 35. Springer, Dordrecht. pp. 267-288.
[16] Trueman, D., Bahn, C. and Cook, P. (2000). Alternative Voices for Electronic Sound: Spherical Speakers and Sensor-Speaker Arrays (SenSAs). In Proc. of ICMC’00.
[17] Warusfel, O. and Misdariis, N. (2001). Directivity Synthesis with a 3D Array of Loudspeakers: Application for Stage Performance. In Proc. of DAFx’01.
[18] Williamson, J., Murray-Smith, R. and Hughes, S. (2007). Shoogle: excitatory multimodal interaction on mobile devices. In Proc. of CHI’07. ACM. pp. 121-124.
Low Force Pressure Measurement: Pressure Sensor Matrices for Gesture Analysis, Stiffness Recognition and Augmented Instruments
Tobias Grosshauser
IRCAM/ ReactiveS.net
1, Place Igor Stravinsky, Paris
ReactiveS Lab, Munich
+49- 176- 242 99 241
Tobias@Grosshauser.de
ABSTRACT
The described project is a new approach to using highly sensitive low force pressure sensor matrices for detecting malposition, cramping and tension of the hands and fingers, for gesture and keystroke analysis, and for new musical expression. In the latter, the sensors are used as additional touch-sensitive switches and keys. Regarding pedagogical issues, new ways of technology-enhanced teaching, self-teaching and exercising are described. The sensors used are custom made in collaboration with the ReactiveS Sensorlab.

Keywords
Pressure Measurement, Force, Sensor, Finger, Violin, Strings, Piano, Left Hand, Right Hand, Time Line, Cramping, Gesture and Posture Analysis.

1. INTRODUCTION
Many audio and gesture parameters have already been explored and described in exercising, teaching and performing of musical instruments. The method suggested in this paper extends these established practices. The basic technology is a highly sensitive pressure sensor. Lining up several of these extremely lightweight sensors in arrays allows a broad field of applications. Combining them in matrices allows a 3-dimensional representation of the linearised data with position and pressure visualisation. The position, pressure/force and the data representation with, for instance, time line alignment show the change of the overall energy, which is visualised graphically. Along the time axis, the change of the applied force, or respectively the pressure, can be observed.
Many different visualisation, recording, sonification and feedback tools are programmed in PD and MaxMSP or similar software environments and can be applied to the generated data.
All in all, the target group ranges from beginners up to professional musicians in the areas of teaching, performance, composition and posture and gesture analysis.
In music and art, sensors can be an alternative or an enhancement for traditional interfaces like computer keyboard, monitor, mouse and camera in man-machine interaction. Position, pressure or force sensing is a possibility to translate the haptic reality to the digital world. There is already a great choice of high-performance motion and position tracking systems, but techniques for pressure recording are still under-represented. Besides the expense, this is due to the complicated measurement technology needed for the mostly high capacitance of industrial sensors and to their complicated and damageable mechanical setup. After the first development period of the pressure sensors, the main goals, high sensitivity and low weight, were achieved. Later, the following additional requirements were also achieved:
- cheap and stable, “live-performance-proof“
- easy to use and to install
- no distraction of gesture or movements
- usable in performances with and without computer, “stand-alone system“
- every sensor can be detected autonomously
- high resolution AD-conversion, but also
- compatibility to standards like MIDI

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

2. STATE OF THE ART OF RESEARCH IN GESTURE, PRESSURE AND POSITION RECOGNITION
Poepel gives an overview of extended violins, covering playing with ASDSS sounds, playing with expanded existing instruments and playing with new gestures [1]. Askenfelt already measured bow motion and bow force with custom electronic devices [2]: a thin resistance wire among the bow hairs, in combination with electrified strings, provides the bow position and the bow-bridge distance. Paradiso used the first wireless measurement system, with two oscillators on the bow and an antenna combined with a receiver [3]; the pressure of the forefinger and the pressure between the hair and the wood were also measured. Young obtained pressure data from foil strain gauges placed in the middle of the bow [4]. Demoucron attaches accelerometers to the bow and measures
the complete pressure of the bow with sensors connected to the bow hair [5].
Maestre presents a gesture tracking system based on a commercial EMF device [6]. One sensor is glued on the bottom near the neck of the violin, a second one on the bow. From these data the position can be calculated, and the bow pressure can be derived by relating the deformation of the bow to the captured data. Many more systems exist, but most are combined with a camera, which does not seem stable and reliable enough for performances and everyday use.
A different approach was developed at IRCAM by Bevilacqua [7]. Sensing capabilities are added to the bow to measure the bow acceleration in real time, and a software-based recognition system detects different bowing styles.
Guaus measures the overall bow pressure [8], not the individual fingers that cause the pressure on the bow. The sensors are fixed on the bow hairs at the tip and at the frog. This means additional weight at the tip, which could influence professional violin playing because of the leverage effect.
The recent paper by Young [9] describes a database of bow strokes with many sensor data, such as 3D acceleration, 2D bow force and electric field position sensing, again with an overall bow force measurement.
The measuring system presented here is easy to install: the flexible sensor, less than 1 mm thick, is simply stuck onto the bow or finger and connected to the converter box. As every single finger is measured individually, muscle cramps and wrong finger positions can be detected, in addition to the pressure and force allocation and its changes between the fingers in different playing techniques.

3. BASIC SETUP
Each sensor is connected directly to a converter box. If less data is required, fewer sensors can be plugged into the converter box. Standard stereo jacks are used as plugs, and each sensor/plug has its own control channel. This allows individual and minimised setups, and a better overview if fewer channels are used, especially with younger students. Wireless transmission is partly possible, but not always practicable. The connector box can be worn on a belt. Data can be transmitted either to a computer or directly to synthesizers or other modules via MIDI. The connector box provides a power supply for each sensor and a direct MIDI out.
The basic sensor is 5 x 5 x 2 mm; larger dimensions are possible. It weighs only a few grams, depending on the dimensions of the surface area. The sensors are usually combined in 2 to 4 rows, each consisting of 4 to 8 sensors, stuck on a flexible foil.
The basic setup consists of at least one 16-channel programmable converter and connector box and sensor matrices for the shoulder rest, chin rest and bow, plus a computer with MaxMSP, MATLAB or common music software to process, record or display the data.

4. PRESSURE AND POSITION MEASUREMENT
4.1 Strings
The basic measurements on the violin (exemplary for strings) are:
4.1.1 Pressure and Position of each Finger of the Right Hand
Figure 1. Change of force during one bow stroke
These integrated sensors show when the position and/or pressure changes during the movement of the bow (see figure 1) or exceeds a certain force limit. This limit can be adjusted individually, and either visual feedback or plain data recording is possible. This allows an ex post analysis of the performed music piece, or simply provides information for beginners. The sensors of the middle and the ring finger can also be used for steering or switching peripherals on/off, or for continuous sound manipulation.
Figure 2 shows the integration of the sensors into the bow and the finger posture of the right hand. Every sensor is installed at the right place, individually adapted to the ergonomically and technically correct position of the musician's fingers. For beginners, preferably older than 15 years, this is a simple check of the correctness of the posture when they exercise alone at home, and it detects wrong exposure or stiffness of the hand and fingers, for example too much or wrongly directed pressure on the forefinger.
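The converter box of Section 3 sends each sensor on its own control channel via MIDI, and Section 4.1.1 flags readings that exceed an individually adjustable force limit. The following Python sketch shows how such per-sensor processing might look. It assumes normalised readings between 0.0 and 1.0, one zero-based MIDI channel per sensor, and a hypothetical controller number `CC_NUMBER = 1`; the paper specifies none of these details, and `to_midi_cc` and `ForceLimit` are illustrative names, not part of the described hardware.

```python
from dataclasses import dataclass, field

CC_NUMBER = 1  # hypothetical controller number; the paper does not specify one


def to_midi_cc(reading: float, channel: int, cc: int = CC_NUMBER) -> bytes:
    """Encode a normalised sensor reading (0.0-1.0) as a 3-byte MIDI control change."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be 0-15")
    clamped = min(max(reading, 0.0), 1.0)
    value = round(clamped * 127)       # 7-bit MIDI data byte
    status = 0xB0 | channel            # 0xB0 = control change, low nibble = channel
    return bytes([status, cc & 0x7F, value])


@dataclass
class ForceLimit:
    """Reports the moment a sensor's reading rises above an adjustable limit."""
    limit: float
    above: bool = field(default=False, init=False)

    def update(self, reading: float) -> bool:
        # True only on the rising edge, so a held overload is reported once
        crossed = reading > self.limit and not self.above
        self.above = reading > self.limit
        return crossed


# Example: one sensor on channel 2 with an individual limit of 0.6
limit = ForceLimit(limit=0.6)
for r in [0.2, 0.5, 0.7, 0.8, 0.4, 0.65]:
    msg = to_midi_cc(r, channel=2)
    if limit.update(r):
        print(f"limit exceeded at {r}: CC bytes {msg.hex()}")
```

Reporting only the rising edge keeps the feedback from repeating while the finger stays above the limit; a teacher could lower `limit` for a beginner without touching the MIDI mapping.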
Figure 2. Pressure sensors on the bow
4.1.2 Pressure of the Left Hand Sensor Array
This is not an everyday solution, because the flexible sensors are stuck on each finger. But for several methodical and technical issues, useful information is generated: not the maximum pressure itself, but how the pressure changes, for example in different types of trills or vibrato, and how the pressure is divided between the fingers when double stops are played. In combination with the bow position sensor, left-right hand coordination can be explored very accurately.
4.1.3 Areal Distribution of Pressure and Position of the Chin on the Chin Rest
First measurements of chinrest pressure in violin playing were done by M. Okner [10]. Five dependent variables were evaluated: peak pressure, maximum force, pressure/time integral, force/time integral and total contact area. Similar variables are studied in our measurements and shown in figure 3.
Figure 3. Pressure Allocation Chin Rest
Figure 3 shows the measurement data of the chin rest pressure and force sensor matrix in a 3-axis coordinate system. This optical representation of the force and pressure data seems to be a practical way of showing the measurement results.
The sensor matrix enables the detection of the pressure distribution over the whole area, compared to a pressure measurement at one point only. Both position changes and muscle cramping are detectable; this could help prevent back and neck pain and malposition in beginners, especially in long-lasting exercising situations or in cases of general inattentiveness.
Figure 4. Shoulder Rest Pressure Allocation, SR1 and SR2
Figure 5. Shoulder Rest Malposition, SR1 and SR2 Sensor Position
Similar to the above-mentioned chin rest solution, the shoulder rest matrix (see figure 4) detects malposition and false posture, often caused by misalignment of the shoulder. In figure 5, SR2 shows an incorrect pressure allocation on the shoulder rest, compared to SR2 in figure 4. Correct violin position is a basic precondition for learning the further playing techniques. This issue mainly concerns beginners; advanced musicians can use, for example, a defined pressure rise of the chin or shoulder for switching and steering interactions. (Besides avoiding a badly positioned shoulder rest, good shoulder and neck posture can prevent backache.)
4.1.4 Comparison Shoulder-Chin Rest Pressure and Position
Figure 6 shows the pressure allocation of chin and shoulder in one coordinate system. The upper area is the chin pressure area, the lower area the shoulder rest pressure allocation area. The optical representation is important for simple everyday use. In this case, wrong posture or malposition would appear as a brighter colour or an inhomogeneous shape of the 2 areas.
Position changes of the violin itself can be detected while playing. This enables the musician or teacher to analyse changes of the violin position in addition to pressure and force changes. Evaluations and studies of expressive gestures during a live performance or in practising situations are possible.
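Okner's five variables (Section 4.1.3) and the position changes of Section 4.1.4 can all be derived from a recorded sequence of sensor-matrix frames. The sketch below is one possible set of definitions, not the paper's: it assumes each frame is a grid of cell pressures in kPa sampled every `dt` seconds, a known cell area (for example 2.5e-5 m² for the 5 x 5 mm basic sensor), a hypothetical contact threshold of 1 kPa, and takes the pressure/time integral over the per-frame peak pressure. The centre of pressure is added as a simple way to track shifts of the chin or shoulder.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[float]]  # rows x cols of cell pressures in kPa (assumed unit)


def chin_rest_metrics(frames: List[Frame], dt: float, cell_area_m2: float,
                      contact_threshold_kpa: float = 1.0) -> dict:
    """Summary variables for a recorded sensor-matrix sequence (assumed definitions)."""
    peak_pressure = 0.0           # kPa, highest single-cell value in any frame
    max_force = 0.0               # N, highest total force in any frame
    pressure_time_integral = 0.0  # kPa*s, integral of the per-frame peak pressure
    force_time_integral = 0.0     # N*s, integral of the per-frame total force
    touched = set()               # cells that were loaded at least once
    for frame in frames:
        frame_peak = 0.0
        frame_force = 0.0
        for r, row in enumerate(frame):
            for c, p in enumerate(row):
                frame_peak = max(frame_peak, p)
                frame_force += p * 1000.0 * cell_area_m2  # kPa -> Pa, times m^2 -> N
                if p >= contact_threshold_kpa:
                    touched.add((r, c))
        peak_pressure = max(peak_pressure, frame_peak)
        max_force = max(max_force, frame_force)
        pressure_time_integral += frame_peak * dt
        force_time_integral += frame_force * dt
    return {
        "peak_pressure_kpa": peak_pressure,
        "max_force_n": max_force,
        "pressure_time_integral_kpa_s": pressure_time_integral,
        "force_time_integral_n_s": force_time_integral,
        "contact_area_m2": len(touched) * cell_area_m2,
    }


def centre_of_pressure(frame: Frame) -> Tuple[float, float]:
    """Pressure-weighted (row, col) centroid of one frame."""
    total = sum(p for row in frame for p in row)
    if total == 0:
        raise ValueError("no load on the matrix")
    r_c = sum(r * p for r, row in enumerate(frame) for p in row) / total
    c_c = sum(c * p for row in frame for c, p in enumerate(row)) / total
    return r_c, c_c
```

Comparing the centre of pressure of the chin rest and shoulder rest matrices over time gives a single number per area whose drift may indicate a change of violin position.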
Figure 6. Comparison Shoulder-Chin Rest Pressure and Position
4.1.5 Stiffness Recognition
A common problem in teaching, especially in beginners' lessons, is too much tension or stiffness in the bow hand. Besides wrong posture of the hand, elbow and fingers, force is often applied to the wrong fingers, sometimes because of the impression that the bow could fall down. Such wrongly applied force can be detected by the sensors. Most of the time, too much pressure is applied to fingers where usually almost no force is needed; in this example on the violin bow, the middle and the ring finger. A visualisation tool (figure 7) was programmed in PD and MaxMSP. A blue ball reacts to the pressure of the ring finger. If there is too much pressure, it moves to the right (figure 8) and its colour and shape change. In addition, a sonification is implemented: the more force, the louder and more distorted the generated sound. When the opposite occurs, the ball moves in the other direction.
Basically all sensor data can be visualised and sonified with this tool, but every student reacts individually to technology in teaching situations. First practical experience shows quite good acceptance and feedback, even from the younger pupils.
Figure 8. Visualisation of 3rd Finger, “Stiffness Control”, too much force/pressure
4.1.6 Pedagogical Issues
The core application right now is posture and tension control for students older than 15 years. Usually they like this feedback tool and are used to working with computers. A playful approach is always important, and technical experimentation often ends up in long exercising periods. This kind of motivation does not always work, but the other positive results, like remembering the posture learned in the last lesson and being able to record data besides the music, are interesting.
The possibilities of providing a simple visualisation tool to obtain objective data for self-study or practising scenarios are manifold. Some useful scenarios are:
• Early detection of too much muscle tension, caused by wrong finger position or fatigue
• Easy feedback for beginners on whether the posture of their right hand is OK
• Avoiding lasting and time-consuming postural corrections
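The ball-and-sound feedback of Section 4.1.5 is essentially a mapping from one finger's pressure to a position, a colour and a sound. The sketch below expresses such a mapping in Python; it is not the PD/MaxMSP tool itself. The target pressure, tolerance band, scaling constants and the grey/red colours are hypothetical, and the tanh soft clipper is a common stand-in for "more distorted" sound that the paper does not specify.

```python
import math
from dataclasses import dataclass


@dataclass
class Feedback:
    x: float       # ball position: -1.0 far left, 0.0 centre, +1.0 far right
    colour: str    # "blue" inside the tolerated band, otherwise a warning colour
    gain: float    # loudness of the sonification, 0.0 .. 1.0
    drive: float   # distortion drive for the waveshaper, 1.0 = almost clean


def stiffness_feedback(pressure: float, target: float = 0.2, tolerance: float = 0.1,
                       full_scale: float = 0.5, max_drive: float = 8.0) -> Feedback:
    """Map a normalised ring-finger pressure (0.0-1.0) to visual and audio feedback."""
    p = min(max(pressure, 0.0), 1.0)
    deviation = p - target
    x = min(max(deviation / full_scale, -1.0), 1.0)  # too much pressure moves right
    if abs(deviation) <= tolerance:
        colour = "blue"
    elif deviation > 0:
        colour = "red"   # too much force: colour change
    else:
        colour = "grey"  # less force than the target
    excess = max(0.0, deviation - tolerance)
    headroom = max(1.0 - target - tolerance, 1e-9)
    drive = 1.0 + (max_drive - 1.0) * min(excess / headroom, 1.0)
    return Feedback(x=x, colour=colour, gain=p, drive=drive)


def waveshape(sample: float, drive: float) -> float:
    """Soft-clip one audio sample in -1..1; a larger drive gives more distortion."""
    return math.tanh(drive * sample) / math.tanh(drive)
```

Keeping the tolerance band separate from the scaling means the ball already leaves the centre before the sound starts to distort, which gives a gentle early warning.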
With professional violinists, interesting tests were made. After long exercising or job-related playing periods, physical fatigue can be measured through posture or pressure changes. In exercising situations, visual feedback could inform the student and suggest a short recovery phase.
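One simple way to turn "fatigue measured by posture or pressure changes" into a prompt for a break is to compare a recent moving average of a pressure feature with a baseline recorded at the start of the session. The paper does not describe its detection method; the class below is a hypothetical sketch, and the baseline length, window length and 25 % relative threshold are assumptions to be tuned per player.

```python
from collections import deque
from statistics import mean
from typing import Deque, List, Optional


class FatigueMonitor:
    """Flags a sustained drift of a pressure feature away from its initial baseline."""

    def __init__(self, baseline_len: int = 50, window_len: int = 50,
                 rel_threshold: float = 0.25):
        self.baseline_len = baseline_len
        self.rel_threshold = rel_threshold
        self.baseline_samples: List[float] = []
        self.baseline: Optional[float] = None
        self.window: Deque[float] = deque(maxlen=window_len)

    def update(self, value: float) -> bool:
        """Feed one reading; return True when a short rest should be suggested."""
        if self.baseline is None:
            # first collect the baseline at the start of the session
            self.baseline_samples.append(value)
            if len(self.baseline_samples) == self.baseline_len:
                self.baseline = mean(self.baseline_samples)
            return False
        self.window.append(value)
        if len(self.window) < self.window.maxlen:
            return False  # wait until the moving window is full
        drift = abs(mean(self.window) - self.baseline) / max(abs(self.baseline), 1e-9)
        return drift > self.rel_threshold
```

Averaging over a window rather than reacting to single readings avoids suggesting a break after one intentionally strong accent.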
4.2 Keyboard Instruments

Several basic measurements are applied in piano playing. The main goals are detecting malposition of the hands or fingers and too much muscle tension or cramping of the fingers, hands and elbow. Furthermore, the force of single keystrokes and attacks was measured and visualised for analysis.
With these sensor arrays, augmented pianos are possible, and new playing techniques and ways of expression could be found.
Figure 7. Visualisation of 3rd Finger, “Stiffness Control”
4.2.1 Pressure/ Keystroke and Fingertip Position of each Finger of the left and right Hand
There are two possibilities for keystroke and fingertip position recognition: first, sticking the sensor onto the keys (difficult to play); second, fixing one sensor on each fingertip. Only the results of the second variant were useful.
Keystroke recognition is explored by R. Möller [11] with „High-speed-camera Recording of Pulp Deformation while Playing Piano or Clavichord“. Different sensor types are discussed, but under laboratory conditions, with precise but usually unaffordable equipment, and with only one playing mode. The low cost system described here is fast enough to detect the maximum forces and the gradient of each finger in different playing modes, like trills or fast scales.
Figure 10. Piano Key Strokes, from “piano” to “forte”, Sensors stuck on the finger tips
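Detecting "the maximum forces and the gradient of each finger" from a fingertip sensor amounts to segmenting the force trace into keystrokes and summarising each one. The sketch below shows one way to do this; it is not the paper's implementation. It assumes a uniformly sampled trace, hypothetical onset and release thresholds (hysteresis, so noise near one threshold does not split a stroke), and also flags a key that is still loaded when the recording ends.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Keystroke:
    start: int           # sample index where force rose above the onset threshold
    end: int             # sample index where it fell below the release threshold
    peak_force: float    # maximum force during the stroke
    max_gradient: float  # steepest rise, in force units per second
    released: bool       # False if the trace ended while the key was still loaded


def _summarise(force: Sequence[float], start: int, end: int,
               dt: float, released: bool) -> Keystroke:
    """Peak force and steepest rise for samples start..end-1."""
    segment = force[start:end]
    lo = max(start - 1, 0)  # include the sample before onset so the initial rise counts
    rises = [(force[i] - force[i - 1]) / dt for i in range(lo + 1, end)]
    return Keystroke(start, end, max(segment), max(rises, default=0.0), released)


def analyse_keystrokes(force: Sequence[float], dt: float,
                       onset: float = 0.5, release: float = 0.3) -> List[Keystroke]:
    """Segment one fingertip force trace into keystrokes using hysteresis thresholds."""
    strokes: List[Keystroke] = []
    start = None
    for i, f in enumerate(force):
        if start is None and f >= onset:
            start = i
        elif start is not None and f < release:
            strokes.append(_summarise(force, start, i, dt, released=True))
            start = None
    if start is not None:
        strokes.append(_summarise(force, start, len(force), dt, released=False))
    return strokes
```

Comparing `peak_force` and `max_gradient` across strokes from "piano" to "forte" gives the kind of overview shown in Figure 10, and `released=False` can point to a key that was never let go.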
Common keystroke recognition works with two measuring points and the delay between them for every stroke (see Figure 9). Guido Van den Berghe describes such a system in [12] and mentions the unsatisfactory possibilities for creating differentiated keystrokes on electric pianos. He also developed a force measurement system in combination with a high-speed camera, but not an easy-to-use external tool that needs no connections in or on the piano and no sensors or fixed parts. Even if better systems exist now, there is no “force-detection” interface for pianos or keyboards, apart from the occasional MIDI attack recognition, that offers a fine-grained, high-resolution force and pressure measurement.
Figure 9. Key velocity measurement system in an electric piano [10]
4.2.2 Pedagogical Issues, Stiffness Recognition
In Figure 10, the changes of force in slow keystrokes with different attacks are shown. Explanations about keystrokes could be assisted with this tool. Self-study at home could be compared with the data recorded in the music lesson or from other musicians. Several basic pedagogical problems were explored. Not releasing the keys, a common beginner's fault, can be detected and visualised. Fatigue and cramping are other reasons for this behaviour, even in advanced pupils or professionals.
Many more examples could be given; one more is the problem of rhythmic and dynamic irregularity. Time line, score and audio alignment of the data, with adjustable time resolution, can clearly show the problem, which might be difficult to hear. In these cases, basic visualisations can simplify long-winded explanations.
4.2.3 Further Analysis
First experiments with further combined measurements have been made, for example foot pedal usage and pressure measurement. Two aspects of this measurement are useful. First, one can see how and when, for example, a professional player uses the pedals; this could be recorded in commonly used audio software and analysed with the sensor data aligned to an audio recording. Second, self-recording helps to explore one's own pedal usage, or its change during a longer exercising period.
4.3 Extended Music, Enhanced Scores and Augmented Instruments
Figure 11 shows a score with additional fingerings for the thumb. The zero before the conventional fingering denotes the left hand thumb on the violin, which is used like an additional finger. The second number next to the “0” gives the position of the thumb if the sensor offers more than one active area. In this case three sensitive areas are used, and in each the pressure of the thumb is detected. This allows not only switching effects on and off but also adjusting the amount of data or sound manipulation. Extended playing techniques, data or sound manipulation and more voices become feasible.
The goal was the integration of sensors in common playing modes and gestures: on the one hand, thumb-steered real-time interaction with electronic peripherals like computers and synthesizers; on the other hand, switching, manipulating and mixing of sound effects, accompaniment and 3D spatial sound position with sensors integrated beside the fingerboard, within reach of the thumb.
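With three sensitive areas under the thumb, each reading has to be reduced to "which area" (the second number next to the “0” in the score) and "how much" (the amount of sound manipulation). The sketch below shows one possible reduction; the threshold, the winner-takes-all rule and the one-effect-per-area assignment are assumptions, not the mapping used in “concertare”.

```python
from typing import List, Optional, Sequence, Tuple


def thumb_area(readings: Sequence[float],
               threshold: float = 0.1) -> Optional[Tuple[int, float]]:
    """Return (area number, amount) for the most strongly pressed thumb area.

    Areas are numbered from 1, matching a score notation such as "0 2"
    (thumb on area 2). Returns None when no area reaches the threshold.
    """
    if not readings:
        return None
    index = max(range(len(readings)), key=lambda i: readings[i])
    value = readings[index]
    if value < threshold:
        return None
    amount = (min(value, 1.0) - threshold) / (1.0 - threshold)  # rescale to 0..1
    return index + 1, amount


def effect_parameters(readings: Sequence[float]) -> List[float]:
    """One wet/dry amount per area: only the pressed area's effect is active."""
    params = [0.0] * len(readings)
    hit = thumb_area(readings)
    if hit is not None:
        area, amount = hit
        params[area - 1] = amount
    return params
```

Subtracting the threshold before rescaling means a light resting thumb produces no effect at all, while pressing harder moves the chosen effect smoothly from dry to fully wet.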
Similar methods could be applied to the right hand sensors, but it is quite difficult to change the pressure without influencing the sound too much.
Figure 11. Extended Score
This extended score is part of the piece “concertare” by Tobias Grosshauser [13].
For keyboard instruments, for instance, modulation is possible simply by changing the finger pressure after the key has been struck. This new playing technique allows new ways of articulation even when the key is already pressed, and sound effects like vibrato on the piano or other keyboard instruments.
4.4 Further Scenarios and Research
Further research will be carried out with finger pressure measurements on wind instruments and drums. In combination with position recognition and acceleration sensors, the most important parameters are detected.
These pressure and force sensors provide more and more possibilities for new music compositions in combination with extended scores and simplified real-time interaction within electronic environments.
Concerning pedagogic issues, the systems and methods will become more and more accurate and user friendly for a wider range of usage and target audiences.
The combination of traditional instruments, computers and high-tech tools like new sensors could motivate a new generation of young musicians to learn with new methods that they like and are more and more used to. Cheap and easy-to-use sensor systems would support this development. If teaching and pedagogy were more of an adventure and a search for new possibilities and “unknown terrain”, making music, learning and playing a musical instrument, and practising could be more fascinating.
5. REFERENCES
[1] C. Poepel, D. Overholt. Recent Developments in Violin-related Digital Musical Instruments: Where Are We and Where Are We Going? NIME06, 6th International Conference on New Interfaces for Musical Expression, 2006.
[2] A. Askenfelt. Measurement of bow motion and bow force in violin playing, Journal of the Acoustical Society of America, 80, 1986.
[3] J. A. Paradiso and N. A. Gershenfeld. Musical applications of electric field sensing, Computer Music Journal, 21:2, pp. 69-89, MIT Press, Cambridge, Massachusetts, 1997.
[4] D. S. Young. Wireless sensor system for measurement of violin bowing parameters, Stockholm Music Acoustics Conference, 2003.
[5] M. Demoucron, R. Caussé. Sound synthesis of bowed string instruments using a gesture based control of a physical model, International Conference on Noise & Vibration Engineering, 2007.
[6] E. Maestre, J. Janer, A. R. Jensenius and J. Malloch. Extending gdif for instrumental gestures: the case of violin performance, International Computer Music Conference, Submitted, 2007.
[7] F. Bevilacqua, N. Rasamimanana, E. Flety, S. Lemouton, F. Baschet. The augmented violin project: research, composition and performance report, NIME06, 6th International Conference on New Interfaces for Musical Expression, 2006.
[8] E. Guaus, J. Bonada, A. Perez, E. Maestre, M. Blaauw. Measuring the bow pressure force in a real violin performance, International Conference on Noise & Vibration Engineering, 2007.
[9] D. Young, A. Deshmane. Bowstroke Database: A Web-Accessible Archive of Violin Bowing Data, NIME07, 7th International Conference on New Interfaces for Musical Expression, 2007.
[10] M. Okner, T. Kernozek. Chinrest pressure in violin playing: type of music, chin rest, and shoulder pad as possible mediators, Clin Biomech (Bristol, Avon), 12(3):S12-S13, April 1997.
[11] R. Möller, Wentorf. High-speed-camera Recording of Pulp Deformation while Playing Piano or Clavichord. Musikphysiologie und Musikermedizin, 2004, vol. 11, no. 4,
[12] Guido Van den Berghe, Bart De Moor, Willem Minten. Modeling a Grand Piano Key Action, Computer Music Journal, Vol. 19, No. 2 (Summer, 1995), pp. 15-22, The MIT Press.
[13] T. Grosshauser. Concertare, www.extendedmusic.net, click “concertare”.
The development of motion tracking algorithms for low cost inertial measurement units
- POINTING-AT -
Giuseppe Torre
Interaction Design Centre
University of Limerick
Limerick, Ireland
giuseppe.torre@ul.ie
Javier Torres
Tyndall National Institute
Cork University
Cork, Ireland
javier.torres@tyndall.ie
Mikael Fernstrom
Interaction Design Centre
University of Limerick
Limerick, Ireland
mikael.fernstrom@ul.ie
ABSTRACT
In this paper, we describe an algorithm for the numerical evaluation of the orientation of an object to which a cluster of accelerometers, gyroscopes and magnetometers has been attached. The algorithm is implemented through a set of new Max/Msp and pd externals. Through the successful implementation of the algorithm, we introduce Pointing-at, a new gesture device for the control of sound in a 3D environment. This work has been at the core of the Celeritas Project, an interdisciplinary research project on motion tracking technology and multimedia live performances between the Tyndall Institute of Cork and the Interaction Design Centre of Limerick.
Keywords
Tracking Orientation, Pitch Yaw and Roll, Quaternion, Euler, Orientation Matrix, Max/Msp, pd, Wireless Inertial Measurement Unit (WIMU) Sensors, Micro-Electro-Mechanical Systems (MEMS), Gyroscopes, Accelerometers, Magnetometers
1. INTRODUCTION
Motion tracking technology has interested the multimedia art community for two or more decades. Most of these systems have tried to offer a valid alternative to camera-based systems such as VNS [2] and EyesWeb [14]. Among them are: DIEM [1], Troika Ranch [15], Shape Wrap, Pair and Wisear [19], Eco [17], Sensemble [13], The Hands [8] and Celeritas [20, 16] from the authors.
In this paper we describe the algorithm to numerically solve the orientation of each single mote in our Celeritas system. We also aim to give an introduction to the topic to people who want to develop their own tracking device (using Arduino, for example). Although a full Max/Msp and pd library has been developed and made available at [10], we have listed in the references of this paper other Max/Msp developers [11, 12, 18] whose work has been freely released, though their work focuses only on the conversion between different numerical representations and does not interact with the specific device described above.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Figure 1: Mote and its cluster of sensors with battery pack. Dimensions are 25 x 25 x 50 mm
On the basis of the results achieved, we introduce Pointing-at in the last section, a new gesture device for the control of sound in a 3D space or any surround system. The device can be used both in the studio and in live performances.
Our Celeritas system is built around Tyndall's 25mm WIMU, which is an array of sensors combined with a 12-bit ADC [6, 4, 5, 7]. The sensor array is made up of three single-axis gyroscopes, two dual-axis accelerometers and two dual-axis magnetometers.
The accelerometers measure the acceleration on the three orthogonal axes (U, V and W as shown in Figure 2).
The gyroscopes measure the angular rate around the three orthogonal axes.
The magnetometers measure the earth's magnetic field on the three orthogonal axes.
2. TERMINOLOGY
Before going into the description of the algorithm, we would like to introduce the reader to some of the most common terms in use, to make the following sections easier to understand. A good explanation of these terms and of the 3D math can also be found at [9].
System of Reference. We will discuss two systems of reference: the Earth-fixed one (x, y, z), which has the x axis pointing at the North Pole, the y axis at west and the z axis at the Earth's core; and the IMU-fixed frame (u, v, w), with three orthogonal axes parallel to the sensor's sensitive axes.
Quaternions form a 4-dimensional normed division algebra over the real numbers. Rotations are represented by unit quaternions $q = (q_w, q_x, q_y, q_z)$, which satisfy $q_w^2 + q_x^2 + q_y^2 + q_z^2 = 1$. Quaternions are used to represent the rotation of an object in 3D space. They are very common in programming, as they do not suffer from the singularities at 90 degrees that affect Euler angles.
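The unit-quaternion condition above, and the "angle, x, y, z" representation used later by the library, can be made concrete in a few lines. The Python sketch below normalises a quaternion, rotates a vector with q·v·q*, and converts the quaternion to angle and axis. It is a generic illustration, not the authors' Max/Msp or pd externals; the (qw, qx, qy, qz) ordering and reporting the angle in degrees are assumptions.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # (qw, qx, qy, qz), ordering assumed


def normalise(q: Quat) -> Quat:
    """Scale q so that qw^2 + qx^2 + qy^2 + qz^2 = 1."""
    n = math.sqrt(sum(c * c for c in q))
    if n == 0:
        raise ValueError("zero quaternion cannot represent a rotation")
    return tuple(c / n for c in q)


def multiply(a: Quat, b: Quat) -> Quat:
    """Hamilton product a*b."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)


def rotate(q: Quat, v: Tuple[float, float, float]) -> Tuple[float, float, float]:
    """Rotate vector v by unit quaternion q, computing q * (0, v) * conj(q)."""
    w, x, y, z = q
    _, rx, ry, rz = multiply(multiply(q, (0.0, *v)), (w, -x, -y, -z))
    return rx, ry, rz


def to_angle_axis(q: Quat) -> Tuple[float, float, float, float]:
    """Return (angle in degrees, x, y, z) for the rotation described by q."""
    w, x, y, z = normalise(q)
    w = max(-1.0, min(1.0, w))      # guard acos against rounding
    angle = 2.0 * math.acos(w)
    s = math.sqrt(1.0 - w * w)
    if s < 1e-9:                    # no rotation: any axis will do
        return 0.0, 1.0, 0.0, 0.0
    return math.degrees(angle), x / s, y / s, z / s
```

For example, `rotate((0.5 ** 0.5, 0.0, 0.0, 0.5 ** 0.5), (1.0, 0.0, 0.0))` turns the first axis onto the second (a 90 degree rotation about the third axis), and `to_angle_axis` of the same quaternion returns approximately (90, 0, 0, 1).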
Euler Angles. The Euler angles are usually given in aeronautical terms as Pitch, Roll and Yaw, as shown in Figure 2, where Pitch is the rotation around the lateral (V) axis, Roll the rotation around the longitudinal (U) axis and Yaw the rotation around the perpendicular (W) axis. The calculation involves non-commutative matrix multiplication, so the order in which the rotations are applied matters.
Figure 2: sensor and IMU reference system.
Orientation Matrix. The Orientation Matrix mathematically represents a change of basis in 3-dimensional space; thus, we can translate the sensor output coordinates, given with respect to the IMU-fixed frame, into the Earth reference frame using the Orientation Matrix.
Angle x, y, z describes a rotation using a unit vector indicating the direction of the axis and an angle indicating the magnitude of the rotation about that axis.
NIME08, Genova, Italy
Copyright 2008. Copyright remains with the author(s).
3. ALGORITHM
With our cluster of sensors we calculate the orientation of the sensor with respect to the Earth-fixed frame of reference. The orientation is retrieved using two sources of estimation: the output of the gyroscopes on the one hand, and the combination of accelerometers and magnetometers on the other. The reason for this is that gyroscopes are not self-sufficient for long-term precision, because of the drift associated with their readings. Accelerometers and magnetometers, on the other hand, are good for long-term stability but not for short-term accuracy, due to occasional inaccuracy caused by linear and rotational acceleration. Thus, our algorithm combines the short-term precision of the gyroscopes with the long-term precision of the accelerometers and magnetometers.
Figure 3: Orientation Algorithm.
3.1 Reading the values from the sensor
As the data from the motes are sent wirelessly to a base station connected to the host computer via serial port, we designed a C driver to handle this stream. Ultimately, we compiled a new external (mote) to import this stream into Max/Msp or Pd. Values appearing in our host application environment are ADC values in the range 0 to 4095 (our microcontroller has 12-bit resolution). After calculating the offset by averaging the first thousand incoming values while the sensor is held in a steady position, we can read the values related to the movement of the sensor by subtracting the calculated offset. Then we need to convert the ADC values from each sensor into the proper units of measurement, as shown in Table 1. This can be done by multiplying the offset-subtracted ADC value by the rate resolution value of each sensor, which can be found in the specification sheet of each sensor or determined empirically.
Table 1: Unit of measurement
Sensor | From | To
Gyroscopes | ADC | degrees/second
Accelerometers | ADC | m/s²
Magnetometers | ADC | gauss
3.2 Orientation using Gyroscopes
Sampling at a constant rate, with sampling interval Δt, enables us to update α, φ and θ at each step by applying the following formulas:
$$\Delta\theta(k+1) = \Delta t \cdot \Delta\dot{\theta}(k+1)$$
$$\Delta\phi(k+1) = \Delta t \cdot \Delta\dot{\phi}(k+1)$$
$$\Delta\alpha(k+1) = \Delta t \cdot \Delta\dot{\alpha}(k+1)$$
where Δθ, Δφ and Δα represent the incremental angles around W, V and U respectively, and the dotted terms are the corresponding angular rates measured by the gyroscopes. Next, the algorithm constructs the rotation matrix around each particular axis and multiplies them together:
$$R(w,\theta,k+1) = \begin{pmatrix} \cos\Delta\theta(k+1) & -\sin\Delta\theta(k+1) & 0 \\ \sin\Delta\theta(k+1) & \cos\Delta\theta(k+1) & 0 \\ 0 & 0 & 1 \end{pmatrix}$$
$$R(v,\phi,k+1) = \begin{pmatrix} \cos\Delta\phi(k+1) & 0 & \sin\Delta\phi(k+1) \\ 0 & 1 & 0 \\ -\sin\Delta\phi(k+1) & 0 & \cos\Delta\phi(k+1) \end{pmatrix}$$
$$R(u,\alpha,k+1) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\Delta\alpha(k+1) & -\sin\Delta\alpha(k+1) \\ 0 & \sin\Delta\alpha(k+1) & \cos\Delta\alpha(k+1) \end{pmatrix}$$
which can be written in general as:
$$\mathrm{Rotation}(k+1) = R(w,\theta,k+1)\, R(v,\phi,k+1)\, R(u,\alpha,k+1)$$
Therefore we define our Orientation, in matrix format, as:
$$\mathrm{Orientation}(k+1) = \mathrm{Rotation}(k+1)\, \mathrm{Orientation}(k)$$
From these results, the algorithm converts the resulting matrix into quaternion and angle, x, y, z format, which facilitates ease of use in graphically oriented programming languages such as Max/Msp and pd.
3.3 Orientation using Accelerometers and Magnetometers
So far we considered the 3 x 3 Orientation Matrix as the matrix describing the orientation of the IMU-fixed frame in relation to the Earth-fixed frame. Conversely, the Inverse Orientation Matrix describes the orientation of the Earth-fixed
frame in relation to the IMU-fixed frame and can be written as:
$$\mathrm{Orientation}^{-1} = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$
The two dual-axis magnetometers enable the reading of the earth's magnetic field on the three orthogonal axes. These values are used to calculate the first column of the Inverse Orientation Matrix (a11, a21, a31) using the following formula:
$$\begin{pmatrix} a_{11} \\ a_{21} \\ a_{31} \end{pmatrix} = \frac{1}{H} \begin{pmatrix} H_u \\ H_v \\ H_w \end{pmatrix}$$
where H is the magnitude of the earth's magnetic field and Hu, Hv and Hw are the components of the magnetic field vector measured by the sensor along U, V and W respectively.
To calculate the third column (a13, a23, a33) of the Inverse Orientation Matrix, we use the values read from the two dual-axis accelerometers. The formula used is:
$$\begin{pmatrix} a_{13} \\ a_{23} \\ a_{33} \end{pmatrix} = \frac{1}{g} \begin{pmatrix} G_u \\ G_v \\ G_w \end{pmatrix}$$
where g is the magnitude of the earth's gravitational acceleration and Gu, Gv and Gw are the components of the acceleration vector measured by the IMU along U, V and W.
Finally, the second column (a12, a22, a32) is calculated from the cross product between the first and the third column, as described below:
$$\begin{pmatrix} a_{12} \\ a_{22} \\ a_{32} \end{pmatrix} = \begin{pmatrix} a_{21}a_{33} - a_{31}a_{23} \\ a_{31}a_{13} - a_{11}a_{33} \\ a_{11}a_{23} - a_{21}a_{13} \end{pmatrix}$$
4. ORIENTATION LIBRARY AND MAPPING
For the purpose of the numerical evaluation of the sensor's orientation, an ad hoc set of Max/Msp and pd externals was developed. The most important are listed below.
Orientation Calculates the Orientation of the IMU. The inlet receives a list made up of the following elements: pitch, yaw, roll, n pack, sampling time, Alpha rate resolution, Phi rate resolution, Theta rate resolution. Each of the 9 outlets
quat2axis Converts the quaternion format to the angle, x, y, z format.
azi ele Converts the input to azimuth and elevation numbers, making the format readable by Vector Base Amplitude Panning (VBAP) or other multi-channel libraries.
A schematic of the Max patch is shown in Figure 4.
Figure 4: Pseudo Max patch.
5. APPLICATION DEVELOPMENT
On the basis of the above algorithm, we developed several applications for multidisciplinary live performances, like Vitruvian for live dance performance and DjMote. In this paper we introduce Pointing-at, a new gesture device for the control of sounds in a 3-D surround environment.
5.1 Pointing-at
Pointing-at is a new wearable wireless glove that uses the results of our Celeritas project and the reliability of Tyndall's 25mm WIMU to control sounds in a 3-D or any other surround system. Its design is focused on the analysis of the methodologies concerning the gestural mapping of sounds in a fully three-dimensional environment. As the most natural movement related to directionality is the simple pointing of the hand in a given direction, we decided to use the orientation of the hand/arm as an indicator of this direction. Thus, we fitted the WIMU in a glove, which has a protective pocket on the top of the hand's dorsum, as shown in Figure 5.
Figure 5: Pointing-at out of the shell.
5.2 Gesture Mapping
The orientation data retrieved from the WIMU are translated into azimuth and elevation coordinates, making the data compatible with libraries such as VBAP [3]. Azimuth and elevation are calculated taking the z-axis as the main axis of reference. The third variable that characterises surround sound editor systems is source distance. The gestural mapping of this parameter has been solved in the following way: a 90 degree roll movement enables the azimuth to be read as a distance value in the range between 0 and 90, where 0 indicates the farthest distance and 90 the closest (Fig 6).
6. CONCLUSION AND FUTURE WORK
In this paper we described an algorithm that is used to retrieve the orientation of an object to which a cluster of sensors made up of accelerometers, gyroscopes and magnetometers is attached. We also introduced a new wearable wireless glove, Pointing-at, for the gestural control of sounds
are element of the 3 x 3 Matrix format representing the in a 3-D surround space. The device was tested in our lab
IMU’s orientation. and proved to be a reliable tool for live performances. At
matrix2quat Converts the 3 x 3 Matrix to Quaternion the moment our team is working on the implementation of
format. a bending sensor (see red strip in Figure 5) to enable the
grabbing and release feature of sound on the fly. In future work we aim to reduce the size of the WIMU to 10mm to improve wearability.
Figure 6: Gesture to control distance parameter.
7. ACKNOWLEDGMENTS
We would like to thank the Tyndall National Institute for allowing us access under the Science Foundation Ireland sponsored National Access Programme. Many thanks also to the staff at the Interaction Design Centre of the University of Limerick for their input into the project and realization of Pointing-at. Finally, thanks to all the researchers and students at Tyndall and UL who gave important input to this project.
8. ADDITIONAL AUTHORS
Additional Author: Brendan O'Flynn, Tyndall National Institute - Cork University, email: brendan.oflynn@tyndall.ie
9. REFERENCES
[1] http://hjem.get2net.dk/diem/products.html.
[2] http://homepage.mac.com/davidrokeby/vns.html.
[3] http://www.acoustics.hut.fi/~ville/.
[4] http://www.analog.com/en/prod/0,2877,adxl202,00.html.
[5] http://www.analog.com/en/prod/0,,764_801_adxrs150,00.html.
[6] http://www.analog.com/en/prod/0,,ad7490,00.html.
[7] http://www.atmel.com/dyn/products/product_card.asp?part_id=2018.
[8] http://www.crackle.org/the%20hands%201984.htm.
[9] http://www.euclideanspace.com/.
[10] http://www.idc.ul.ie/idcwiki/index.php/celeritas.
[11] http://www.jasch.ch/code_release.html.
[12] http://www.maxobjects.com/?v=libraries&id_library=111.
[13] R. Aylward, S. D. Lovell, and J. Paradiso. Sensemble: A wireless, compact, multi-user sensor system for interactive dance. The 2006 International Conference on New Interfaces for Musical Expression (NIME'06), pages 134-139, June 2006.
[14] A. Camurri, S. Hashimoto, M. Ricchetti, A. Ricci, K. Suzuki, R. Trocca, and G. Volpe. EyesWeb: Toward gesture and affect recognition in interactive dance and music systems. Computer Music Journal, 24(1):57-69, April.
[15] M. Coniglio. http://www.troikaranch.org/mididancer.html.
[16] B. O'Flynn, G. Torre, M. Fernstrom, T. Winkler, A. Lynch, J. Barton, P. Angove, and C. O'Mathuna. Celeritas - a wearable wireless sensor system for interactive digital dance theater. Body Sensor Networks, 4th International Workshop on Wearable and Implantable Body Sensor Networks, 2007.
[17] C. Park and P. H. Chou. Eco: ultra-wearable and expandable wireless sensor platform. Proceedings of the International Workshop on Wearable and Implantable Body Sensor Networks (BSN'06), pages 162-165, April 2006.
[18] D. Sachs. A forearm controller and tactile display, 2005.
[19] D. Topper and P. Swendensen. Wireless dance control: Pair and wisear. Proc. of the 2005 International Conference on New Interfaces for Musical Expression (NIME'05), pages 76-79, May 2005, Canada.
[20] G. Torre, M. Fernstrom, B. O'Flynn, and P. Angove. Celeritas wearable wireless system. Proc. of the 2007 International Conference on New Interfaces for Musical Expression (NIME'07), pages 205-208, 2007, New York.
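The Pointing-at paper above describes two stages. The first builds the Inverse Orientation Matrix from a magnetometer and an accelerometer reading. The second is a chain of externals (matrix2quat, quat2axis, azi ele). The Python sketch below illustrates those steps under stated assumptions. It is not the authors' Max/Msp or Pd code.

The assumptions are as follows:
- The readings are calibrated 3-vectors along U, V, W.
- The measured vector norms stand in for H and g.
- The quaternion conversion is the standard trace-based one, which expects a proper rotation matrix.
- Which body axis is passed to the hypothetical `azi_ele` helper as the pointing direction is our choice.

Two caveats follow directly from the formulas. First, the first and third columns are only orthogonal when the measured field is perpendicular to gravity. Second, with right-handed U, V, W axes, column 1 × column 3 yields a negative determinant. That means a reflection rather than a rotation.

```python
import math

def inverse_orientation(mag, acc):
    """Inverse Orientation Matrix from one magnetometer and one accelerometer
    reading (components along U, V, W), following the column formulas in the text.
    Assumption: the measured vector norms stand in for H and g.
    Caveats: columns are orthogonal only if the field is perpendicular to gravity;
    with right-handed U, V, W axes the column1 x column3 order gives a negative
    determinant (-1 when the two readings are perpendicular)."""
    h = math.sqrt(sum(c * c for c in mag))
    g = math.sqrt(sum(c * c for c in acc))
    a11, a21, a31 = (c / h for c in mag)  # first column: (Hu, Hv, Hw) / H
    a13, a23, a33 = (c / g for c in acc)  # third column: (Gu, Gv, Gw) / g
    # Second column: cross product of the first and third columns, as in the text.
    a12 = a21 * a33 - a31 * a23
    a22 = a31 * a13 - a11 * a33
    a32 = a11 * a23 - a21 * a13
    return [[a11, a12, a13], [a21, a22, a23], [a31, a32, a33]]

def matrix2quat(m):
    """Standard trace-based conversion to a quaternion (w, x, y, z).
    Only meaningful for a proper rotation matrix (orthonormal, determinant +1)."""
    tr = m[0][0] + m[1][1] + m[2][2]
    if tr > 0:
        s = 2.0 * math.sqrt(tr + 1.0)
        return (0.25 * s, (m[2][1] - m[1][2]) / s,
                (m[0][2] - m[2][0]) / s, (m[1][0] - m[0][1]) / s)
    if m[0][0] > m[1][1] and m[0][0] > m[2][2]:
        s = 2.0 * math.sqrt(1.0 + m[0][0] - m[1][1] - m[2][2])
        return ((m[2][1] - m[1][2]) / s, 0.25 * s,
                (m[0][1] + m[1][0]) / s, (m[0][2] + m[2][0]) / s)
    if m[1][1] > m[2][2]:
        s = 2.0 * math.sqrt(1.0 + m[1][1] - m[0][0] - m[2][2])
        return ((m[0][2] - m[2][0]) / s, (m[0][1] + m[1][0]) / s,
                0.25 * s, (m[1][2] + m[2][1]) / s)
    s = 2.0 * math.sqrt(1.0 + m[2][2] - m[0][0] - m[1][1])
    return ((m[1][0] - m[0][1]) / s, (m[0][2] + m[2][0]) / s,
            (m[1][2] + m[2][1]) / s, 0.25 * s)

def quat2axis(q):
    """Quaternion (w, x, y, z) -> (angle in radians, axis x, axis y, axis z)."""
    n = math.sqrt(sum(c * c for c in q))
    w, x, y, z = (c / n for c in q)
    angle = 2.0 * math.acos(max(-1.0, min(1.0, w)))
    s = math.sqrt(max(0.0, 1.0 - w * w))
    if s < 1e-9:  # no rotation: any axis is valid, return the x axis
        return (0.0, 1.0, 0.0, 0.0)
    return (angle, x / s, y / s, z / s)

def azi_ele(pointing):
    """Hypothetical azi/ele step: azimuth in the x-y plane and elevation above it,
    with z as the reference axis; both in degrees."""
    x, y, z = pointing
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```

For example, a field reading along U with gravity along W gives diag(1, -1, 1), which is the reflection mentioned above. With right-handed sensor axes, one would swap the cross-product order before passing the matrix to `matrix2quat`.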
Application of new Fiber and Malleable Materials for Agile Development of Augmented Instruments and Controllers
Adrian Freed
CNMAT (Center for New Music and Audio Technologies)
Dept of Music UC Berkeley,
1750 Arch Street, Berkeley CA 94709
+1 510 455 4335
adrian@cnmat.berkeley.edu
ABSTRACT
The paper introduces new fiber and malleable materials, including piezoresistive fabric and conductive heat-shrink tubing, and shows techniques and examples of how they may be used for rapid prototyping and agile development of musical instrument controllers. New implementations of well-known designs are covered as well as enhancements of existing controllers. Finally, two new controllers are introduced that are made possible by these recently available materials and construction techniques.
Keywords
Agile Development, Rapid Prototyping, Conductive fabric, Piezoresistive fabric, conductive heatshrink tubing, augmented instruments.
1. INTRODUCTION
Many human activities have a variant form optimized to deliver results in the shortest time. The idea is the same although the names vary: short story writing, hacking, sketching, composing esquisses, short-order cooking, improv., fast-turn, live coding, rapid prototyping etc. Rapid and agile development of augmented instruments and controller prototypes is valuable because many of the important insights that guide design refinements are unavailable until performers have experienced and used a device. The best predictor of the effectiveness of a new controller design is usually the number of design iterations available.
Controller projects usually involve co-design of musical mapping software, electronics for the sensor data acquisition, and the physical interface. Providing an approximate physical interface early speeds development of the other components of the system by providing both a real data source and time for performance skills to be acquired.
This paper focuses on new physical design techniques and materials. We will not explore electronic interfacing and music control structure programming issues. Mature rapid-prototyping methods [2] for these are already well known to the community, notably:
- high channel count, high resolution data acquisition [1, 5, 12];
- OSC wrapping, mapping, scaling and calibrating [17];
- visual programming dataflow languages [15] tuned for media and arts applications, e.g., Max/MSP, Pd, SuperCollider, Processing, etc.
The paper is structured as follows. Section 2 introduces new materials that facilitate rapid processing by describing a series of variations on a single theme: the humble footswitch. Section 3 shows how existing controllers can be rapidly improved using those new materials. Section 4 describes novel controllers made possible by the new materials. A conclusion considers the challenges of teaching the design and construction techniques that these new materials demand.
2. Variants of the footswitch
A simple controller perhaps, but the humble footswitch finds wide use in technology-based musical performance contexts because the performer's hands are usually actively engaged in playing an instrument. The traditional approach is to mount a heavy-duty mechanical switch into a solid metal box. These “stomp boxes” are standard tools of the electric guitarist. As a vehicle for exploration of new fiber and malleable materials, we will improve on the traditional stomp box by adding the requirement that switch operation should be silent.
The basic design pattern for a switch is to combine an interruptible electrical conduction path (contact system) with a device that returns the contacts to a stable rest equilibrium when the actuating force is removed. The additional challenge we face is that the motions created by these forces have to be damped to minimize the sounds they make.
2.1 Floor Protector + FSR
Figure 1 shows a solution that can be assembled in a few minutes.
Figure 1: FSR Footswitch
The foot/switch interface is provided by a soft but “grippy” toroidal ring of molded rubber embedded in a hard PVC disk.
These disks are sold as “floor protectors” for furniture. A flat ring of polyurethane foam with a peel-off adhesive is on the opposite face of the disk. This foam provides the restoring force for the “switch”, which is implemented with an Interlink Force Sensing Resistor (FSR). A screw provides the necessary adhesion to resist kicks of the foot and also preloads the FSR. Although not generally a good idea, punching a hole at the center of the FSR (with an office hole punch) provides a clear path for the screw to reach the wooden substrate below. The foam prevents dust from entering the gap between the interdigitated conductor layer and the piezoresistive film layer. This sensor works adequately, but the tension adjustment is delicate because the FSR saturates easily at the range of forces a foot can easily apply.
2.2 Floor Protector + Piezoresistive Fabric
Figure 2: Fabric/PCB Footswitch
In Figure 2 we have replaced the FSR package with a PCB containing dozens of adjacent conducting strips and a patch of piezoresistive fabric made by EEonyx (http://eeonyx.com). The measuring principle is basically that of the FSR, except that we have exchanged the position of the conducting elements and piezoresistor with respect to the foot. This interface took a few minutes longer to integrate because the circuit board needed to be wired so that alternating traces were connected. The board is a Schmartboard surface-mount prototyping board (http://schmartboard.com). The tension adjustment is unproblematic because the piezoresistive fabric supports a very high dynamic range of forces. By combining plies of fabric or selecting a thicker felt substrate, the foam layer could be eliminated from the design, with the fabric itself providing the restoring force. This would be a more reliable solution, as polyurethane foam breaks down, especially on PVC substrates [14].
2.3 Floor Protector + Piezoresistive Fabric + Conductive Adhesive Copper Tape and Conductive Fabric
The sensor shown in Figure 3 uses the same piezoresistive felt fabric as that of Figure 2 but avoids the need to print a conductor pattern by using a conductor on either side of the fabric. The base conductor is a series of overlapped adhesive copper strips. They form a single conductor because the adhesive itself is a conductive acrylic.
Figure 3: Fabric/Foil Footswitch
The top conductor is a copper impregnated nylon fabric. The advantage these materials offer is that the sensor can be scaled in either direction simply by cutting more tape or cutting larger pieces out of the base fabric stock. Wires can be soldered directly to the copper tape and, with some practice, even to the copper fabric.
Many interesting variants of this design are possible by substituting the copper tape with conductive fabric and using conductive and insulating thread for adhesion. These techniques are the basis of many wearable switches and sensors [3, 4, 11].
2.4 Half-round + Piezoresistive Fabric + Conductive Heatshrink + Copper Tape
The previous designs are limited by the sizes of floor protecting disks available. The construction illustrated in Figure 4 is free of these size constraints.
Figure 4: Fabric/Heatshrink Footswitch
A half-round wooden strip is cut to the desired length and two separate strips of copper tape are employed on the flat length and the curve. A strip of piezoresistive fabric is trapped and preloaded under the flat copper tape by a length of ShrinkMate heat-shrink tubing that is conductive on the inside (http://methodedevelopment.com). This tubing connects one side of the fabric to the top conductive strip.
If we ground the upper strip and measure the resistance from the inner conductor to this ground, the whole sensor is self-shielding. We chose the half-round configuration for the footswitch application, but this technique can be used for cylinders and other shapes. Figure 5 shows a grip sensor built with a larger dowel.
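Each resistive footswitch above is read electrically as a variable resistance between two conductors. A common way to read one is a voltage divider sampled by a microcontroller ADC. This is also the arrangement the kalimba of Section 4.1 describes: a pull-up resistor in series with the sensor.

The Python sketch below converts an ADC count to sensor resistance. It then turns that reading into an on/off state with hysteresis, so that a foot resting near the threshold does not make the switch chatter. The 10-bit full scale, the 10 kΩ pull-up and both thresholds are hypothetical example values, not figures from the paper.

```python
def sensor_resistance(adc_count, full_scale=1023, r_pullup=10_000.0):
    """Resistance of a sensor wired from the ADC input to ground, with a pull-up
    resistor to the supply: count / full_scale = R_s / (R_s + R_pullup), so
    R_s = R_pullup * count / (full_scale - count).
    full_scale and r_pullup are hypothetical example values."""
    if adc_count >= full_scale:
        return float("inf")  # reading at the rail: sensor is open (no pressure)
    return r_pullup * adc_count / (full_scale - adc_count)

class HysteresisSwitch:
    """On/off state from a resistance reading. Pressing lowers the resistance of
    an FSR or piezoresistive fabric; separate on and off thresholds prevent
    chatter. The threshold values are hypothetical."""

    def __init__(self, on_below=2_000.0, off_above=5_000.0):
        assert on_below < off_above
        self.on_below = on_below
        self.off_above = off_above
        self.pressed = False

    def update(self, resistance):
        if not self.pressed and resistance < self.on_below:
            self.pressed = True
        elif self.pressed and resistance > self.off_above:
            self.pressed = False
        return self.pressed
```

The gap between the two thresholds is a design choice. A wider gap tolerates more mechanical noise in the foam or fabric, at the cost of requiring a firmer press and a fuller release.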
Figure 5: Fabric Round Touch Rod
The inner conductors are strips of separate conductive tape, allowing for separate pressure measurements around the dowel. By carefully controlling the time of application of the heat gun, the amount of shrink has been controlled to allow the tubing's inner conductor to “self connect” to tape on the dowel without constricting the piezoresistive fabric. This can be clearly seen on the left of Figure 5. These structures are mechanically stable without adhesives.
Figure 6: Bendable Pressure sensor
The sensor in Figure 6 uses the same design principle as the foot sensors but uses a 1/4 inch malleable aluminum tube as the substrate. The interesting feature explored in this configuration is that the shape of the entire sensor assembly can be adjusted after construction. The key to this design is the helical wrap of the piezoresistive fabric under the flexible conductive heat-shrink tubing. The overlap in the wrap avoids a short circuit path. The stray fabric end illustrates the value in rapid prototyping of the “measure-once cut and trim” principle, which is in direct opposition to the traditional carpenter's maxim “measure twice and cut once”. The majority of prototypes illustrated in this paper were sized “on the fly” with reference to other instruments or the performer's body rather than to numerical measurements.
2.5 E-field (capacitive) Sensing
The previous implementations use the same physical sensing principle: gesture modulating current flow. Capacitive or e-field proximity sensing is an interesting alternative principle to apply for rapid prototyping. Only one conductor is needed, and there is more flexibility about where it is placed: it can be under glass, plastic, wood or other non-conducting material.
Figure 7 shows a foot switch built with an approach that can be used to create sensors in less than a minute. A wire with a crocodile clip is attached to a conductive spandex fabric (http://eeonyx.com) that is stuffed inside a length of plastic door insulation. The fabric needn't be very conductive for this to work well, as no current flows through it.
Figure 7: Capacitive footswitch
Although useful for rapid prototyping, we have found that most capacitive sensing integrated circuits are challenging to apply for musical applications. In stage situations we have found them to be susceptible to external electromagnetic fields (from lighting dimmers, for example). Reliable sensing also often requires careful calibration and multiple measurements, resulting in delays that are unacceptably long, i.e. hundreds of milliseconds.
2.6 Position sensing strip + Rubber Door Threshold
To create multiple foot switches we can simply tile out arrays of the previously described devices. There is a better, faster way with the additional convenience that the switch functions are built into a single strip that is constructed quickly in a few steps. Instead of employing the principle of piezoresistivity change, we employ a printed-resistor position-sensing strip such as the SlideLong (http://infusionsystems.com) or a Softpot (http://spectrasymbol.com).
Modifying an existing interface in this way is often the fastest way to develop a new controller. These position-sensing strips were originally designed for finger touch interaction. To protect the sensor strip and provide tactile feedback, we attach it to the base of a length of rubber door threshold. This is molded with corrugations that grip the sole of the foot well and a curved surface above the sensor. The flat sensor stays where it is put on smooth flooring and carpets.
Unfortunately, the lateral sensitivity region of position sensing strips is narrow, and the top of the door threshold may simply collapse and not activate it. The solution is to fill the gap above the sensor with a lightweight, flexible, incompressible material. Two lengths of nylon rope worked well in the prototype shown in Figure 8.
Figure 8: Position Switch
This device was designed to roll up and fit into a guitar case. The electronics are configured to measure the positions of up to two concurrent depressions of the strip.
3. Augmenting controllers
3.1 Pressure sensing buttons
In a project augmenting the cello [6, 7], the author discovered many situations where it was as easy to install a pressure sensor as a switch. Many microcontrollers have built-in A/D converters, so there is often little or no additional cost to using pressure sensors for switches. Continuing this idea (that the fastest route to a new controller may be to modify an existing sensor), we see in Figure 9 how to retrofit the Monome button array with pressure sensors, the grey octagonal disks.
Figure 9: Pressure Sensitive Monome Adaptation
Monome (http://monome.org) interfaces are square arrays of lit switches interfaced over USB using OSC messaging. A large part of the desirability of this interface is the tactile quality of the buttons, created with careful design of the silicone molding. Each button has a ring of conductive rubber attached to connect with a circular array of interdigitated contacts. The conductivity of this connection does change with pressure, but the contact is so highly conductive that the change is hard to measure accurately. By cutting a small disk of piezoresistive fabric with a central hole, we can retrofit a higher resistance range pressure sensor. With careful design of the interface electronics, we can even eliminate the array of diodes needed to scan concurrent depressions of the buttons [9, 13].
3.2 Dual Touch Pad
Because of the number of connections required for matrix scanning, it is difficult to rapidly prototype multi-touch systems. Even optical systems that avoid matrix scanning on the surface itself are hard to build quickly. Two difficulties are sufficiently illuminating the interior of the touch surface and calibrating the optical path of the camera [8].
Figure 10: Dual Touch Pad
We can still explore some multitouch gestures by assembling a pad that senses two simultaneous touches, as shown in Figure 10. A pair of SlideWide sensors (http://infusionsystems.com) are stuck to each other at right angles.
Instead of measuring single touch position for each axis using the well-known potential divider method, we ground the “wiper” contact. We then measure the two end point resistances to this ground node to estimate the position of the outermost touch point pair. This idea was patented in 1972 for duophonic analog synthesizer keyboards (USPO3665089). This method was independently rediscovered for resistive touch applications by the author and Mr. Loviscach [10].
The controller in Figure 10 also includes a sheet of piezoresistive fabric to measure a single pressure estimate. The SlideWide sensors flex sufficiently for a useful touch pressure range.
3.3 Touch Pad
Most computer laptop touch pads use capacitive measuring techniques because of the low costs of high volume PCB production. Touch pressure cannot be measured by these pads. This is unfortunate, as it is an extremely useful control parameter in musical applications, especially in combination with spatial location [16]. Resistive touch pads, by contrast, provide x, y and z axis sensing, require simple calibration and are less prone to electrical interference and variations in ambient humidity.
Interlink, the main supplier of resistive xyz touch pads, offers them in only a few small standard sizes. They are rather expensive and technically challenging to employ in large arrays. By combining Velostat (http://3m.com), an electrically resistive plastic sheet material, with piezoresistive fabric, we can rapidly build xyz pads for modest cost and in a wide range of sizes, as illustrated in Figure 11.
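The grounded-wiper method of Section 3.2, also used for the two-depression position switch of Section 2.6, can be sketched as follows in Python.

With the wiper grounded, the resistance from end A to ground runs along the track only as far as the touch nearest A, and likewise for end B. The two readings therefore locate the outermost pair of touches. With one finger down, the two estimates coincide.

The sketch makes these assumptions, none of which come from the text:
- The track is uniform, so resistance is proportional to distance.
- Contact resistance is negligible.
- The endpoint resistances have already been measured.
- The no-touch tolerance and the derived center and spread values are hypothetical.

```python
def outer_touch_pair(r_end_a, r_end_b, r_track, length, tolerance=0.05):
    """Estimate the outermost touch positions (measured from end A) on a resistive
    strip with a grounded wiper. Assumes a uniform track of total resistance
    r_track over `length`. Returns None when no touch is detected."""
    # One touch: r_end_a + r_end_b == r_track. Two touches: the sum is smaller,
    # because the track between them is shorted. Clearly larger means no contact.
    if r_end_a + r_end_b > r_track * (1.0 + tolerance):
        return None
    ohms_per_unit = r_track / length
    near_a = min(max(r_end_a / ohms_per_unit, 0.0), length)           # touch closest to A
    near_b = min(max(length - r_end_b / ohms_per_unit, 0.0), length)  # touch closest to B
    return near_a, near_b

def pair_center_and_spread(pair):
    """Hypothetical derived controls: midpoint and distance between the touches.
    A spread near zero suggests a single touch."""
    if pair is None:
        return None
    a, b = pair
    return (a + b) / 2.0, max(0.0, b - a)
```

For example, on a 100-unit strip of 10 kΩ, end readings of 2 kΩ and 3 kΩ place the outer touches at 20 and 70 units. Their center is at 45 units and their spread is 50 units.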
Figure 11: Conductive Plastic XYZ TouchPad
The spatial sensing principle has long been used in touch screen sensors. Conductive strips at the edge of each sheet of resistive plastic form the nodes of a wide resistor. Two such sheets are arranged at right angles to each other. An intervening layer of piezoresistive fabric establishes current paths at the point of connection between the two sheets. With a carefully designed interface circuit, the three desired parameters can be extracted from the four nodes of the sensor [16]. As well as offering size flexibility, this design avoids the need for a fine array of insulating spacer dots or the special air gap used in Interlink touchpads.
4. New Controllers
4.1 Kalimba
Figure 12 shows a simple controller inspired by the kalimba. It illustrates rapid prototyping practice, integrating the materials and techniques we have seen earlier into a complete controller that can be assembled in an hour or two. The core kalimba design lends itself to rapid assembly because it uses a single central bar, held down by two screws, to trap the array of tines between two pivot bars.
Figure 12: Fabric and Stick Kalimba
Wooden tines are used in this prototype because they are faster to shape than the traditional metal, and this controller doesn't require the tines' resonances to be tuned. The flexibility of copper tape is exploited as strips follow the contour of the flat base around the curve of a half-round pressure pivot.
Each tine is covered in conductive copper tape. Trapped between this copper strip and the base strip is a piece of piezoresistive fabric. The rear pivot of the tines also has a copper strip that is a grounding bus for the tines. The 18F2553 microcontroller has 10 ADC inputs. For each tine, it sends an OSC-encoded estimate of the voltage formed by a pull-up resistor and the variable-resistance pressure sensor.
Notice that the length of each base copper strip is trimmed to simplify the wiring flow of the conductors to the microcontroller.
4.2 The Tablo
The novel controller of Figure 13 exploits recently available conductive stretchable fabric and an approximation to the curve known as the witch (sic) of Agnesi.
Figure 13: Tablo Fabric Multitouch Controller
The fabric is stretched in an embroidery hoop and draped over an inverted circular bowl.
A piece of conductive plastic cut in a special shape forms a corolla on the surface of the bowl. The tips of each petal are folded inside the bowl and taped with conducting adhesive copper tape. The microcontroller board measures the electrical resistances of these petals from their tips to a common center established with a conductor at the flat of the bowl. As the conductive stretchable fabric (the “calyx”, to complete the flower analogy) is displaced towards the bowl, it shorts out different lengths of each conductive plastic petal.
The result is a circular array of nearly mass-less displacement sensors. Unlike the Continuum Fingerboard (http://HakenAudio.com), the gesture-to-displacement relationship changes sensitivity according to distance from the center of the bowl. This allows for several different playing styles. One style, similar to hand drum technique, involves tapping the fabric surface directly onto the bowl with the fingers of one hand and leaning towards the other side of the bowl with the palm.
Another style involves both hands interacting from the outer hoop towards and around the base of the bowl. The latter style affords some interesting parameter mappings. One fruitful approach is to divide the circular petal array into two halves and compute the direction and amplitude of a pair of vectors by summing contributions of sensors accessible to each hand. An additional third parameter for each hand, representing the “size” of the gesture, is obtained by computing the ratio of the arithmetic and geometric means of the displacement values.
Two refinements were added recently: a pressure sensitive “aftertouch” using fabric on the base around the bowl, and a pressure sensing fabric disk at the very center of the controller.
5. Conclusion
New fiber and malleable materials present interesting challenges and potential beyond the rapid prototyping advantages described here. It is surprisingly hard to find learning materials or a learning environment to exploit this
potential. Most engineering departments still focus on high manufacturing volume materials made with standard milling and printing techniques.
A difficult problem for experienced designers is that they have to abandon standard assumptions, such as “conductors are metals and plastics are nonconductive”. Polymers now exist that are nearly as conductive as copper, and they are expected soon to be more conductive. Even translucent concrete is now available.
Physical computing books mostly encapsulate workable recipes that are twenty years old. Vendor application notes usually address very narrow application spaces.
Effective application of the new materials requires a new curriculum based on emerging design patterns. It will also require a context where the wisdom and experience of fiber and malleable materials artists can be melded with that of material scientists and application developers.
6. ACKNOWLEDGMENTS
Frances Marie Uitti's slider controller bag motivated the author's exploration of fabric sensing. Thanks to Leah Buechley and Syuzi Pakchyan for generously sharing their sources and techniques. Thanks to Judi Pettite for providing a challenging and rewarding studio environment in her fiber arts and malleable materials class.
7. REFERENCES
[1] R. Avizienis and A. Freed, OSC and Gesture features of CNMAT's Connectivity Processor, Open Sound Control Conference, Berkeley, CA, 2004.
[2] B. Hartmann, S. R. Klemmer, M. Bernstein, L. Abdulla, B. Burr, A. Robinson-Mosher and J. Gee, Reflective Physical Prototyping through Integrated Design, Test, and Analysis, UIST 06, ACM, Montreux, Switzerland, 2006.
[3] L. Buechley, N. Elumeze and M. Eisenberg, Electronic/computational textiles and children's crafts, Interaction Design And Children (2006), pp. 49-56.
[4] A. Chang and H. Ishii, Zstretch: a stretchy fabric music controller, Proceedings of the 7th international conference on New interfaces for musical expression (2007), pp. 46-49.
[5] A. Freed, R. Avizienis and M. Wright, Beyond 0-5V: Expanding Sensor Integration Architectures, International Conference on New Interfaces for Musical Expression, Paris, France, 2006.
[6] A. Freed, A. Lee, J. Schott, F.-M. Uitti, M. Wright and M. Zbyszynski, Comparing Musical Control Structures and Signal Processing Strategies for the Augmented Cello and Guitar, International Computer Music Conference, International Computer Music Association, New Orleans, LA, 2006, pp. 636-642.
[7] A. Freed, F.-M. Uitti, D. Wessel and M. Zbyszynski, Augmenting the Cello, International Conference on New Interfaces for Musical Expression, Paris, France, 2006, pp. 409-413.
[8] J. Y. Han, Low-cost multi-touch sensing through frustrated total internal reflection, Proceedings of the 18th annual ACM symposium on User interface software and technology, ACM, Seattle, WA, USA, 2005.
[9] W. D. Hillis, A High-Resolution Imaging Touch Sensor, The International Journal of Robotics Research, 1 (1982), pp. 33.
[10] J. Loviscach, Two-finger input with a standard touch screen, Proceedings of the 20th annual ACM symposium on User interface software and technology, ACM, Newport, Rhode Island, USA, 2007.
[11] R. Koehly, D. Curtil and M. M. Wanderley, Paper FSRs and latex/fabric traction sensors: methods for the development of home-made touch sensors, Proceedings of the 2006 conference on New interfaces for musical expression (2006), pp. 230-233.
[12] D. Overholt, Musical Interaction Design with the CREATE USB Interface: Teaching HCI with CUIs instead of GUIs, ICMC, New Orleans, LA, USA, 2006.
[13] J. A. Purbrick, A Force Transducer Employing Conductive Silicone Rubber, Proceedings of the 1st International Conference on Robot Vision and Sensory Controls (1981), pp. 73-80.
[14] J. T. Remillard, J. R. Jones, B. D. Poindexter, J. H. Helms and W. H. Weber, Degradation of Urethane-Foam-Backed Poly (vinyl chloride) Studied Using Raman and Fluorescence Microscopy, Applied Spectroscopy, 52 (1998), pp. 1369-1376.
[15] W. M. Johnston, J. R. P. Hanna and R. J. Millar, Advances in dataflow programming languages, ACM Comput. Surv., 36 (2004), pp. 1-34.
[16] D. Wessel, R. Avizienis, A. Freed and M. Wright, A Force Sensitive Multi-touch Array Supporting Multiple 2-D Musical Control Structures, New Interfaces for Musical Expression, New York, 2007.
[17] M. Wright, Open Sound Control: an enabling technology for musical networking, Organised Sound, 10 (2005), pp. 193-200.
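The two-hand Tablo mapping of Section 4.2 in the Freed paper above can be sketched in Python. The paper names the quantities but gives no formulas, so several choices here are assumptions:
- The petals are evenly spaced around the bowl.
- The two "halves" are the first and second half of the petal list.
- Each hand's vector is the displacement-weighted sum of unit vectors pointing at its petals.
- Displacements are non-negative.
- A small hypothetical floor `eps` keeps the geometric mean positive when a petal reads zero.

The "size" parameter is the ratio of arithmetic to geometric mean named in the text. By the arithmetic-geometric mean inequality, it equals 1 when all displacements in a half are equal and grows as they become more uneven.

```python
import math

def hand_features(displacements, angles, eps=1e-6):
    """Direction (radians), amplitude and arithmetic/geometric mean ratio for
    one hand's petals. `eps` floors each displacement so the geometric mean
    stays positive (a hypothetical choice)."""
    # Displacement-weighted sum of unit vectors toward each petal (assumption).
    vx = sum(d * math.cos(a) for d, a in zip(displacements, angles))
    vy = sum(d * math.sin(a) for d, a in zip(displacements, angles))
    direction = math.atan2(vy, vx)
    amplitude = math.hypot(vx, vy)
    floored = [max(d, eps) for d in displacements]
    am = sum(floored) / len(floored)
    gm = math.exp(sum(math.log(d) for d in floored) / len(floored))
    return direction, amplitude, am / gm

def tablo_two_hands(displacements):
    """Split an evenly spaced circular petal array into two halves, one per
    hand, and compute the features of each half. Even spacing is assumed."""
    n = len(displacements)
    angles = [2.0 * math.pi * i / n for i in range(n)]
    half = n // 2
    left = hand_features(displacements[:half], angles[:half])
    right = hand_features(displacements[half:], angles[half:])
    return left, right
```

Summing vectors rather than picking the most displaced petal keeps the direction estimate smooth as a hand rolls across neighboring petals. That is one plausible reading of "summing contributions of sensors accessible to each hand".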
Transforming Ordinary Surfaces into Multi-touch Controllers
Alain Crevoisier †, ‡   Greg Kellum ‡
† University of Applied Sciences Western Switzerland (HES-SO / HEIG-VD), Rue Galilée 15, CH-1400 Yverdon
‡ Music Conservatory of Geneva, Rue de l’Arquebuse 12, CH-1211 Genève
alain.crevoisier@heig-vd.ch   greg.kellum@cmusge.ch
ABSTRACT
In this paper, we describe a set of hardware and software tools for creating musical controllers from any flat surface or simple object, such as tables, walls, metal plates or wooden boards. The system makes it possible to transform such physical objects and surfaces into virtual control interfaces, using computer vision to track the musician’s interaction, whether with the hands, mallets or sticks. These freely reconfigurable musical interfaces can control standard sound modules or effect processors: zones are defined on the surface and assigned musical commands, such as triggering notes or modulating parameters.
Keywords
Computer Vision, Multi-touch Interaction, Musical Interfaces.
1. INTRODUCTION
1.1 Multi-touch Everywhere
There has been a strong focus on multi-touch interaction in HCI, especially since the work of Jeff Han at New York University, who opened the way to radically new interaction paradigms using a large multi-touch display [3]. With a smaller screen, the Lemur controller [18] has also shown that multi-touch has great potential for innovative music applications, in particular for creating virtual control interfaces that users can fully reconfigure. Multi-touch screens have even reached mass production with the release of Apple’s iPhone and iPod Touch. However, most available technologies and approaches only work under specific conditions and are not suitable for ordinary surfaces. For instance, some sensing systems are embedded into the surface [2, 11, 20], while others are specific to screens, either as an overlay on the screen (Lemur, iPhone, iPod Touch) or as a vision system placed behind a rear-projected diffusion screen [3, 4, 19, 25].
Solutions exist for tracking multiple fingers on a generic surface [6, 7, 15], but they are not suitable for detecting individual contact points, that is, whether or not fingers are actually touching the surface. True touch detection can be achieved coarsely using stereoscopy [8, 14], or more precisely with four cameras placed in the corners of the interactive area [9, 23]. It can also be achieved with a single camera, either by analyzing the shadows of the fingers [13] or by watching fingers intercept a plane of infrared light projected above the surface [12]. Virtual keyboards currently on the market [21, 22] are based on this last approach, which has the advantage of requiring less computational power than the others. However, these devices do not compute true touch coordinates, and their interactive area is limited to the size of a keyboard. We have adapted this method to work with larger surfaces and combined it with acoustic onset detection to obtain precise timing information. In addition to fingers, our system can detect oblong objects striking the surface, such as sticks and mallets, and it also measures the intensity of taps or impacts, so that the interface can be played with both percussive and touch gestures.
1.2 Screens vs. Ordinary Surfaces
When considering a large, reconfigurable music controller, one may easily think of something like the Lemur with a rear-projected screen, made multi-touch sensitive by one of the approaches mentioned above. However, setups of this kind are better suited to fixed installations and are not very practical to transport to different venues. The same holds for large displays in general. Even with simple front projection, placing the projector on a shelf or on the ceiling is usually not straightforward, and once the installation is complete, the system and the projection surface cannot easily be moved [13]. By contrast, flat surfaces suitable for use as an interface, such as tables and walls, are available everywhere. There is no need to transport the interface if we can simply carry a compact system that transforms an ordinary surface into an input device.
Another motivation for finding a technology that can use ordinary surfaces instead of screens or other dedicated surfaces is the possibility of inventing new musical instruments that are not only control devices but also sound sources. In this case, the idea is to use the surface of a vibrating object, such as a metal plate or a drum head, both to generate a sound and to control it via real-time sound processing [1].
Our system is therefore designed from the ground up to suit various use cases, either strictly as a virtual control surface or as an augmented percussion instrument, in combination with all sorts of flat vibrating objects and surfaces. In either case, visual feedback is a particular concern, since users do not interact with an image as they do with screen-based controllers. The solutions we investigated are presented in section 3.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
2. SYSTEM OVERVIEW
2.1 Setup
Figure 1 gives an overview of the setup. The system consists of an infrared camera placed above the upper edge of the surface and two custom-designed illuminators placed in the corners of the surface to be made touch-sensitive. The illuminators are connected to a personal computer, where touch positions are mapped to MIDI and OSC control events using a dedicated software editor. The camera is an OptiTrack Slim:V100 [26], which performs blob tracking on board at 100 fps, giving much faster performance and lower CPU usage than a standard camera.
Figure 1. Setup overview, with camera and integrated illuminators.
2.2 Multi-touch Detection
The illuminators generate a plane of infrared light about 1 cm above the surface. When fingers or other objects intersect the plane, the light they reflect is detected by the infrared camera as bright spots in the image (Figure 2). Simple blob tracking is performed in the camera using a high-pass filter, and the blob positions are then sent via USB to the computer, where a calibration procedure converts them to physical surface coordinates using interpolation techniques.
2.3 Integrated Illuminators
Besides their primary function of generating a thin plane of infrared light parallel to the surface, the illuminators integrate acoustic sensors that determine the timing of impacts on the surface more precisely, as well as their intensity and frequency content. They also include several other functions needed for the system to work properly. The functions performed by the integrated illuminators are:
- Generation of the light plane, using an infrared laser, mirror and line generator (Figure 3).
- Control of the laser power to adapt to ambient light conditions (less power is required in low light).
- Signal conditioning and amplification for the piezo-acoustic sensors.
- Characterization of impacts (onset detection, intensity, and distinction between hard and soft impacts).
- USB hub for connecting the camera.
- Synchronization of the laser with the camera shutter.
- Safety management, notably through micro-switches on the bottom of the case (the laser is disabled if the illuminator is not firmly attached to the surface).
Figure 3 gives a schematic view of the main (left) illuminator. There are two USB connectors, one for the camera and one for the computer. The synchronization signal is provided through a separate cable. A DIN connector is used to connect the second illuminator, which is simpler, since electronic control and signal processing are performed only on the main illuminator.
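The impact characterization listed in Section 2.3 (onset detection and intensity) runs in the illuminator hardware, and the paper does not describe the algorithm. As a rough sketch only, assuming the conditioned piezo signal is available as a list of samples, a simple threshold detector could mark an onset when the rectified signal first exceeds a threshold, take the peak within a short window as the intensity, and then skip a refractory period so that the ringing of the same impact is not counted twice. The function name and the `threshold`, `window` and `refractory` parameters are hypothetical, and the hard/soft distinction is not attempted here.

```python
from typing import List, Sequence, Tuple


def detect_impacts(samples: Sequence[float], threshold: float,
                   window: int, refractory: int) -> List[Tuple[int, float]]:
    """Hypothetical threshold onset detector for a piezo signal.

    Returns (onset_sample_index, peak_amplitude) for each detected impact.
    """
    if window < 1 or refractory < 1:
        raise ValueError("window and refractory must be at least one sample")
    impacts: List[Tuple[int, float]] = []
    i = 0
    while i < len(samples):
        if abs(samples[i]) >= threshold:
            # Intensity: largest rectified value shortly after the onset.
            peak = max(abs(s) for s in samples[i:i + window])
            impacts.append((i, peak))
            # Skip the decaying ringing of this same impact.
            i += refractory
        else:
            i += 1
    return impacts


# Two synthetic impacts separated by silence.
signal = [0.0] * 10 + [0.2, 0.9, -0.7, 0.3] + [0.0] * 20 + [0.5, -0.4] + [0.0] * 5
print(detect_impacts(signal, threshold=0.1, window=4, refractory=8))
# [(10, 0.9), (34, 0.5)]
```

Dividing the onset index by the sampling rate would give the impact time that is combined with the camera's touch position.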
Figure 3. Main illuminator.
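Section 2.2 says only that blob positions are converted from camera coordinates to physical coordinates "using interpolation techniques", without naming the method. As one possible sketch, assuming a flat surface, negligible lens distortion and four calibration touches at known physical positions, a projective (homography) mapping can be fitted and applied to every blob. This swaps in a homography for the authors' unspecified interpolation; the function names and the corner values below are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate calibration points (three collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_homography(cam: Sequence[Point], phys: Sequence[Point]) -> List[float]:
    """Fit the 8 homography coefficients from 4 camera/physical point pairs."""
    if len(cam) != 4 or len(phys) != 4:
        raise ValueError("exactly four calibration points are expected")
    rows: List[List[float]] = []
    rhs: List[float] = []
    for (x, y), (px, py) in zip(cam, phys):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * px, -y * px])
        rhs.append(px)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -x * py, -y * py])
        rhs.append(py)
    return solve_linear(rows, rhs)


def cam_to_phys(h: Sequence[float], p: Point) -> Point:
    """Map a blob position from camera pixels to surface coordinates."""
    x, y = p
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Hypothetical calibration: corners seen by a tilted camera (pixels)
# and the same corners measured on a 120 x 80 cm table.
corners_cam = [(102.0, 80.0), (538.0, 95.0), (600.0, 420.0), (40.0, 400.0)]
corners_cm = [(0.0, 0.0), (120.0, 0.0), (120.0, 80.0), (0.0, 80.0)]
h = fit_homography(corners_cam, corners_cm)
print(cam_to_phys(h, (320.0, 250.0)))
```

A homography is exact for a pinhole camera viewing a plane; a denser calibration grid with piecewise interpolation, which the paper's wording may refer to, would also absorb lens distortion.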
2.4 Communication
Contact points and their intensity information are sent to the client application using the OSC protocol. This way, our multi-touch system can be used as an input device by many OSC-compatible applications (Reaktor, Max/MSP, SuperCollider, and so on). The messages are formatted as follows:
/touchEvent id touchState xPos yPos amplitude frequency
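The message layout above does not state the OSC argument types. Assuming integer `id` and `touchState` and 32-bit float coordinates, amplitude and frequency (an assumption, not given in the paper), a /touchEvent message could be encoded by hand following the OSC 1.0 rules: null-terminated strings padded to four bytes, a type tag string beginning with a comma, and big-endian arguments. The helper names and the client address are hypothetical.

```python
import socket
import struct


def osc_string(s: str) -> bytes:
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_touch_event(touch_id: int, touch_state: int, x: float, y: float,
                       amplitude: float, frequency: float) -> bytes:
    """Build /touchEvent id touchState xPos yPos amplitude frequency.

    The argument types (two int32, four float32) are assumed.
    """
    return (osc_string("/touchEvent")
            + osc_string(",iiffff")
            + struct.pack(">iiffff", touch_id, touch_state,
                          x, y, amplitude, frequency))


def send_touch_event(host: str, port: int, packet: bytes) -> None:
    """Send one encoded message as a single UDP datagram."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


packet = encode_touch_event(3, 1, 0.5, 0.25, 0.75, 440.0)
# send_touch_event("127.0.0.1", 9000, packet)  # hypothetical client address
```

In practice an existing OSC library would normally do this encoding; the hand-written version only makes the wire format visible.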
Figure 2. Image seen by the camera. Visible light is filtered out using an 800 nm pass filter.
The TUIO protocol, developed for communication with table-top tangible user interfaces [5], is also supported. Messages are sent
in this protocol using the message type for 2-D cursors, 2Dcur. These TUIO messages do not, however, carry all of the information sent in the OSC message format given above. TUIO conveys the identifiers of touch points and their x and y positions explicitly, and their touch state implicitly. It has no explicit support for amplitude or frequency information, but it does provide a single free parameter that can be used to send an int, float, string or blob. We use this free parameter to send the amplitude of touch events and discard their frequencies.
Both our custom OSC messages and TUIO can be received by a variety of software clients. We initially worked with Max as our preferred client, but found that mapping a surface in Max, that is, assigning functions to the various zones of the interface, was quite cumbersome. Even though we used a scripting language inside Max to map contact points to zones, writing the mapping script still took an inordinate amount of time. We have therefore designed a dedicated application for mapping input gestures to MIDI or OSC events, described in section 4.
3. IN USE
Since no image is projected on the surface, users need another way of knowing what they are doing and what state their actions have produced. We have explored three different interaction strategies, presented below.
3.1 Auxiliary Screen
In this configuration, there is no visual feedback at all on the surface; users watch a computer or laptop screen placed nearby, on which finger positions are shown as colored dots. Control widgets, such as faders or buttons, are selected by tapping the corresponding finger on the surface (in fact, a quick sequence of Touch Up and Touch Down events at the same contact position). The surface thus behaves like a giant touch pad with the traditional two-step procedure: positioning the cursor on the appropriate spot, then selecting. The approach is simple, but the two steps required to select a widget make it less suitable for tasks such as triggering notes and samples.
3.2 Auxiliary Screen & Reference Grid
In this configuration, a visual reference in the form of a grid representing the control area is placed on the surface (Figure 4). Control widgets displayed on the screen are aligned to the same arrangement of rows and columns. Figure 5 shows three different mapping layouts designed in Max/MSP. Users can switch from one page to another during performance using the two buttons at the bottom right of each page. The first page features a 4x4 array of pads with a single fader, the second a 2D continuous controller with the same single fader, and the third an array of 5 faders.
In practice, experiments showed that the grid on the surface gave enough information to establish a clear correspondence between the screen and the surface, allowing users to select and activate the desired control widgets in a single step. The result is a more direct and engaging interaction than with the previous approach.
Figure 5. Three different mapping layouts aligned to the same 6x4 grid.
3.3 Reference Grid Only
The last interaction strategy tested uses only the reference grid. This is certainly the preferred approach for augmented percussion, where the goal is not to control multiple faders or a dense set of control widgets, but rather to trigger samples or use a simple 2D controller mapped to the entire surface. In this case, the absence of a screen did not seem to be a drawback; on the contrary, it allowed players to concentrate more on playing the instrument. The only problem found was knowing which page was active when several switchable layouts were used. One suggestion was to change pages with a foot controller (many foot controllers have an LED indicating the active switch). Another would be to include a set of LEDs in a future version of the integrated illuminator, which users could assign freely to page changes or other actions. Lastly, if the surface is horizontal, small objects can be left on it as visual feedback for the value of a continuous controller or the state of a switch.
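Both grid-based strategies (Sections 3.2 and 3.3) come down to looking up which grid cell, and hence which widget on the active page, a contact point falls into. The sketch below assumes normalized contact coordinates (0 to 1), reads the "6x4 grid" as 6 columns by 4 rows, and places the two page buttons in the last column at the bottom right. The three pages are modelled loosely on Figure 5, and all names are hypothetical; this is not the Max/MSP patch the authors used.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed reading of the "6x4 grid": 6 columns by 4 rows.
COLS, ROWS = 6, 4


@dataclass
class Widget:
    name: str
    col: int
    row: int
    width: int = 1
    height: int = 1


def page_buttons() -> List[Widget]:
    # Two page-switch buttons at the bottom right, in the last column.
    return [Widget("prev_page", 5, 2), Widget("next_page", 5, 3)]


class GridLayout:
    def __init__(self, pages: List[List[Widget]]):
        self.pages = pages
        self.active = 0

    def hit(self, x: float, y: float) -> Optional[Tuple[str, float, float]]:
        """Handle a Touch Down at normalized (x, y).

        Returns (widget name, u, v), where u and v are the touch position
        inside the widget normalized to 0..1 (e.g. a fader value), or None.
        """
        gx, gy = x * COLS, y * ROWS  # position in grid units
        for w in self.pages[self.active]:
            if w.col <= gx < w.col + w.width and w.row <= gy < w.row + w.height:
                if w.name == "next_page":
                    self.active = (self.active + 1) % len(self.pages)
                elif w.name == "prev_page":
                    self.active = (self.active - 1) % len(self.pages)
                return w.name, (gx - w.col) / w.width, (gy - w.row) / w.height
        return None


pads = [Widget(f"pad_{c}_{r}", c, r) for r in range(4) for c in range(4)]
fader = Widget("fader", 4, 0, 1, 4)
layout = GridLayout([
    pads + [fader] + page_buttons(),                            # 4x4 pads + fader
    [Widget("xy_pad", 0, 0, 4, 4), fader] + page_buttons(),     # 2D controller + fader
    [Widget(f"fader_{i}", i, 0, 1, 4) for i in range(5)] + page_buttons(),  # 5 faders
])
```

In the grid-only mode of Section 3.3 the same lookup applies unchanged; only the on-screen rendering is dropped.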
Figure 4. Reference grid on the surface and auxiliary screen.
4. SURFACE EDITOR
In order to create control layouts and configure surfaces more easily than with Max, we are currently developing a dedicated software tool. The Surface Editor is organized around a main
window, representing the interface, and several configuration and browsing windows that can be either floating or docked on the border of the main screen (Figure 6). The editor has two modes: the editing mode, where all configuration windows are visible, and the full-screen mode, where only the interface is visible. Information on the latest version is available on our website [17].
Figure 6. The main screen of the Surface Editor.
5. ACKNOWLEDGMENTS
The project presented here is supported by the Swiss National Funding Agency and the University of Applied Sciences. Special thanks to all the people involved in the developments presented here, in particular Pierrick Zoss for programming the editor’s interface, Aymen Yermani for the initial development of the multi-touch technology, and Mathieu Kaelin for his work on the integrated illuminator.
6. REFERENCES
1. Crevoisier, A. Future-instruments.net: Towards the Creation of Hybrid Electronic-Acoustic Musical Instruments. Proc. of the CHI Workshop on Sonic Interaction Design, 2008.
2. Dietz, P.H., and Leigh, D.L. DiamondTouch: A Multi-User Touch Technology. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2001.
3. Han, J.Y. Low-Cost Multi-Touch Sensing through Frustrated Total Internal Reflection. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2005.
4. Jordà, S., Kaltenbrunner, M., Geiger, G., and Bencina, R. The reacTable*. Proc. of the International Computer Music Conference (ICMC 2005), Barcelona, Spain.
5. Kaltenbrunner, M., Bovermann, T., Bencina, R., and Costanza, E. TUIO - A Protocol for Table Based Tangible User Interfaces. Proc. of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France.
6. Koike, H., Sato, Y., and Kobayashi, Y. Integrating Paper and Digital Information on EnhancedDesk: A Method for Realtime Finger Tracking on an Augmented Desk System. ACM Transactions on Computer-Human Interaction (TOCHI), 8(4), 307-322.
7. Letessier, J., and Berard, F. Visual Tracking of Bare Fingers for Interactive Surfaces. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2004.
8. Malik, S., and Laszlo, J. Visual Touchpad: A Two-Handed Gestural Input Device. Proc. of the International Conference on Multimodal Interfaces, 2004, 289-296.
9. Martin, D.A., Morrison, G., Sanoy, C., and McCharles, R. Simultaneous Multiple-Input Touch Display. Proc. of the UbiComp 2002 Workshop.
10. Polotti, P., Sampietro, M., Sarti, A., and Crevoisier, A. Acoustic Localization of Tactile Interactions for the Development of Novel Tangible Interfaces. Proc. of the 8th Int. Conference on Digital Audio Effects (DAFX-05), Madrid, Spain, 2005.
11. Rekimoto, J. SmartSkin: An Infrastructure for Freehand Manipulation on Interactive Surfaces. Proc. of CHI 2002, 113-120.
12. Tomasi, C., Rafii, A., and Torunoglu, I. Full-size Projection Keyboard for Handheld Devices. Communications of the ACM, 46(7) (2003), 70-75.
13. Wilson, A. PlayAnywhere: A Compact Tabletop Computer Vision System. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2005.
14. Wilson, A. TouchLight: An Imaging Touch Screen and Display for Gesture-Based Interaction. Proc. of the International Conference on Multimodal Interfaces, 2004.
15. Wu, M., and Balakrishnan, R. Multi-finger and Whole Hand Gestural Interaction Techniques for Multi-User Tabletop Displays. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2003.
16. http://www.nime.org
17. http://www.future-instruments.net
18. http://www.jazzmutant.com
19. http://www.surface.com
20. http://www.tactex.com
21. http://www.celluon.com
22. http://www.lumio.com
23. http://www.smarttech.com
24. http://www.merl.com/projects/DiamondTouch/
25. http://nuigroup.com/wiki/Diffused_Illumination_Plans/
26. http://www.naturalpoint.com/
A Study of Two Thereminists: Towards Movement Informed Instrument Design
Nicholas Ward
Sonic Arts Research Centre, Queen’s University Belfast
BT7 1NN, Northern Ireland
+44 (0)28 90974829
Nward04@qub.ac.uk
Kedzie Penfield
Queen Margaret University Edinburgh
Edinburgh EH21 6UU
+44 (0)131 474 0000
KPenfield@qmu.ac.uk
Sile O’Modhrain, R. Benjamin Knapp
Sonic Arts Research Centre, Queen’s University Belfast
BT7 1NN, Northern Ireland
+44 (0)28 90974829
{sile, bknapp}@qub.ac.uk
ABSTRACT
This paper presents a comparison of the movement styles of two theremin players based on observation and analysis of video recordings. The premise behind this research is that a consideration of musicians’ movements could form the basis of a new framework for the design of new instruments. Laban Movement Analysis is used to qualitatively analyse the movement styles of the musicians and to argue that the Recuperation phase of their phrasing is essential to achieving satisfactory performance.
Keywords
Effort Phrasing, Recuperation, Laban Movement Analysis, Theremin
NIME08, June 4-8, 2008, Genova, Italy
1. INTRODUCTION
A decoupling occurs in the design of a new digital music instrument (DMI) due to the freedom of not having to match physical exertion to driving energy. The instrument can be viewed as being composed of distinct interconnected parts such as the interface, mapping, sound engine, and sound reinforcement. This decoupling has been the focus of much research, both in terms of the opportunities it affords and in terms of the problems that arise for the instrumentalist and audience when the relationship between gesture and sonic result is obfuscated or removed.
In a ‘traditional’ acoustic instrument, the physics of sound production provides a guiding framework within which the instrument’s design evolves. The excitation of a string, the physics of a standing wave in a column of air: these physical realities exert their influence on the instrument’s overall physical realisation, its size, where the valves and buttons are positioned, etc. In the design of Digital Musical Instruments (DMIs) this framework is absent.
Many different approaches to DMI design are evident in the literature. The field of tangible interface design points towards the notion of physicality in the interface, particularly when contrasted with standard mouse and keyboard paradigms for computer control [1]. Other designers look to HCI for a framework on which to base the design process. Design practice may be informed by ergonomics, with a task-based view of instrumental performance and an associated desire to reduce the effort required by the performer to complete a musical task [2]. Ryan and others, however, have pointed towards a notion of desirable effort in instrumental performance, expressing the view that an integral part of expressive musicianship stems from the struggle with the instrument in the creation of the sound: “Though the principle of effortlessness may guide good word processor design, it may have no comparable utility in the design of a musical instrument. In designing a new instrument it might be just as interesting to make control as difficult as possible. Physical effort is a characteristic of the playing of all musical instruments.” [3]
The notion of introducing physicality by making ‘control difficult’ has been explored in several new interfaces [4]. As previously described by the first author, the GSpring can be seen in this vein [5]. This approach, however, trivializes the complexity of human expressive movement. Simply requiring more force for a given result does not necessarily engender a more expressive performance. The Theremin, for example, is an instrument that at first would seem to require little force to play, owing to its ‘hands free’ non-contact interface. It does, however, allow for rich expressive movement as part of its performance. Clearly there is more to this notion of effortful performance than merely the requirement for physical exertion. As Waisvisz indicates in his comments regarding effort and expression, “In the early eighties I formulated thoughts about the importance of forcing the performer to apply physical effort when playing sensor instruments. I assumed that also this effort factor was crucial in the transmission of musicality through electronic instruments. Now I think the crucial aspect of perceived musicality is not the notion of effort itself, but what we feel and perceive of how the physical effort is managed by the performer.” [6] The term ‘managed’ is key here. It acknowledges the temporality of movement: movement unfolds in time, and therefore to consider musical performance is to consider how the performer’s movements unfold over time. Analogous to the musical idea of phrasing, we must consider how the performer phrases their movement, and how this correlates with the musical result, if we are to enquire into the nature of effortful musical expression. What are the qualities that establish a given relationship between movement phrasing and sonic phrasing as desirable? It is our belief that a better understanding of these qualities could inform the design of
new DMIs that allow for the visceral physicality of performance visible on acoustic instruments without being simply mimetic.
As a starting point in our enquiry into the nature of physicality in musical performance, this paper presents an analysis of two musicians’ theremin performances and attempts to draw from these observations lessons that may inform the design of new instruments. Our premise is simple: when playing a musical instrument, the human performing artist needs to use their body in order to allow an expressive musical process to occur. Therefore, in designing a new musical instrument, we need to take this premise into account and design instruments that allow for, perhaps even invite, expressive human movement in the production of the sound.
In our performer observations, Laban Movement Analysis (LMA) is used as a qualitative framework for describing this movement. In particular, we focus on one concept taken from the Laban framework, Exertion/Recuperation, and its role in movement phrasing. All terms that are part of LMA are capitalized; within LMA these terms have specific, defined meanings.
2. BACKGROUND
The study of musicians’ movement is a well-explored area of research within the NIME field, with much attention given to classifying musicians’ movement in terms of gesture types [7]. At the quantitative level, motion capture has been used to study the ancillary gestures of clarinetists [8]. LMA has been applied to provide a qualitative description of clarinetists’ ancillary gestures [9]. Elsewhere, Laban’s theory of Effort, which forms part of the LMA framework, has been used to investigate the impact of dynamic resistance modulation on performers’ movement [10]. Since it is the musician’s body movement that produces the sound from the instrument, we believe that qualitatively observing and describing the body’s movement, in both conversational and performing situations, can give us an understanding of how a performer produces his/her expressivity while playing.
2.1 Overview of LMA
A full description of LMA is beyond the scope of this paper. Here we present a general overview of the framework and explain the background to Effort phrasing, in particular Exertion/Recuperation.
LMA provides a rich overview of the scope of movement possibilities. The basic elements of Body, Effort, Shape and Space (BESS) can be used to describe movement and to provide an inroad to understanding it. Every human being combines these movement factors in their own unique way and organizes them to create phrases and relationships that reveal personal, artistic, or cultural style [11].
An important distinction between LMA and other forms of movement description is that LMA describes the movement of the body in qualitative terms, not in aesthetic, quantitative or anatomical terms. The qualitative aspect of movement is the key to describing, and therefore understanding, expressive non-verbal communication. If we are describing a businessman losing his temper at a board meeting and we say, “His arm traveled downwards until his closed hand came to the table” (a quantitative or mechanical description of an action), we have no sense of how the gesture expressed his fury. If we say, “He brought his fist down onto the table with a diminished Punch: Strong, Quick, Direct”, we have a better idea of the expression. Equally, if we saw the gesture (an arm coming down with a clenched fist toward the table) done with that Strong, Quick, Direct Effort quality, we would not have to hear the words, nor would the hand even have to hit the table, for us to see and interpret the expression of the gesture as angry. If the hand came down (clenched or not) with Light, Free Flow, Sustained Effort, we would probably not interpret it as an angry gesture. The expression in the communication comes through the quality of the movement. This is an example of Effort.
Body, Shape and Space also inform us about the individual’s movement patterns in other ways. It is assumed that each individual has their own movement preferences in all four categories of movement and that these preferences are recognisable aspects of that person’s personality and expressive style.
Figure 1 Basic Elements of LMA [12]
An aspect that we feel is of particular interest in studying musicians’ movement is how Efforts are sequenced together. Laban emphasised the role of rhythm in manual work, which led to the development of his concept of phrasing in movement. This was particularly evident in his work with Lawrence on the movement of factory workers in wartime Britain. There, Laban’s remit was to alleviate strain and dissatisfaction among conveyor-belt workers in factories. Before this, ‘time and motion’ studies had been used to minimize the amount of movement required for a particular job, in the belief that this approach would maximize productivity. Laban, however, emphasized the need for full-body movement as part of any process to alleviate strain, and through his Effort system he developed movement training programs for workers that allowed for Recuperation as part of the production process. He emphasized the correct phrasing of Efforts to allow for Recuperation following Exertion. In this way, workers were able to minimize strain, enjoy their work more and work for longer [13].
Many movement observers [14][15][16] have developed Laban’s concept of phrasing, or what he originally called “rhythm”, in their work. Peggy Hackney defines phrases as perceivable units of movement that are in some sense meaningful; they begin and end while containing a through line [17]. Irmgard Bartenieff was particularly interested in how a well-phrased movement was simultaneously more expressive and more functionally effective: “Thus, it is not just the activity that identifies the behavior but it is the sequence and phrasing with their distinctive rhythms that express and reinforce verbal and emotional content.” [18] Maletic
comments on a particular basic training exercise that Laban used: “Its characteristic sequence is a rhythmic chain of a preparatory swing (Anschwung) followed by a main swing (Aufschwung) and its expiration which can coincide with its re-initiation.” [19]
Practitioners using the concept of a phrase of movement have divided it up in various ways. Hackney depicts this graphically:
Figure 2 Phases of Phrasing [20]
2.2 Phases of Phrasing
In LMA, a movement is commonly seen as organized into three sections: a Preparation phase that may overlap with the initiation phase, a Main Action, and finally a Recuperation or follow-through phase that may resolve into a transition and a subsequent preparation phase for the next action. Here we use the idea that a phrase has three phases: Preparation, Main Action, and Recuperation.
2.2.1 Preparation Phase
In order to do any physically demanding or complex task, we see an individual prepare themselves, whether it is a ballet dancer before a pirouette, a pole vaulter about to run or a new graduate about to go into a job interview. The body prepares as the mind (i.e., the focus, concentration) prepares for the task. “It is in the preparation moment that we claim our intention. Intention patterns the organism.” [21]
2.2.2 Main Action
The Main Action is the part most connected with the sound produced, and it will vary according to the individual’s style and understanding of what they are doing. In terms of playing the instrument, the Main Action can often be associated with the excitation phase of the movement.
2.2.3 Recuperation
“After the Main Action happens, there is generally a natural change of movement quality which allows the mover to recuperate from the exertion of that action.” [22] Whether this Recuperation is letting the breath out after a tense or high-energy interaction, or the follow-through of a tennis serve in which the Flow quality goes from Bound to Free, the individual needs a recovery from their Main Action. This recovery may be described in any of the movement categories but will be as individual and unique as the rest of the performer’s movement style.
We propose that the ability to recuperate and prepare as part of the performed movement is essential for musicians if they are to access their expressivity: if the performer is unable to recuperate, the flow of their performance, which is the vehicle for expression, is disrupted. It must be emphasized that Recuperation need not be a moment where no sound is produced or control of the instrument is relinquished. Further, the ability to recuperate can be seen as a function of several things: the movement repertoire of the performer, the affordances of the instrument, and the score or musical goal.
3. MATERIALS AND METHODS
3.1 Choice of Material
These observations are taken from a DVD produced in 1998 by Moog Music Inc. called Mastering the Theremin [23]. Two Theremin players, Lydia Kavina and Clara Rockmore, demonstrate, discuss and perform pieces on the theremin. A movement analysis of the two women was carried out in both formal performance and informal interactive situations. For Clara Rockmore, three different situations were analysed: first, a social gathering at her apartment, discussing the theremin with her sister and nephew; second, a demonstration of theremin performance technique in which she explains the basics of performing with the instrument; and finally, her performances of three pieces accompanied by her sister on piano. For Lydia Kavina, we analysed six lessons she gave to camera and four performances. The performers were chosen because both are considered experts on this instrument. The availability of footage showing the two performers both in and out of performance was a requirement for the study, as we wished to compare both styles in as wide a context as possible. We were also interested in observing their individual movement patterns of phrasing in different situations.
3.2 Analysis Method
Each section of the DVD described above was observed on four different days over a period of a month. Each observation session lasted between two and four hours. Three other Laban Movement Analysts were consulted on an informal basis to compare observations. All four categories of BESS were used to establish some understanding of the differences in the performers’ movement styles before focusing on their respective Recuperation phase of phrasing.
3.3 Results: Comparison of Performances
Here we give a brief overview of each performer’s movement style in order to set their respective Recuperation styles in the context of their overall phrasing.
3.3.1 Lydia Kavina’s Movement Qualities
Lydia has a preference for the Vertical Plane, Light Effort and a wide range of Free and Bound Flow, with some Shape Flow with Quick or Sustained Efforts. She tends to Prepare with a Shape Flow Widening and Lightness of the upper torso, seen clearly as she lifts her elbows to the side. (A clear example of this occurs at the beginning of Swamp Music.) Her Recuperation tends to be a Light Free Flow gesture of the right arm, often accompanied by a Postural Shift of the body stepping back or to the side of the instrument. For example, at the end of lesson 4 [24], she finishes playing “Somewhere Over The Rainbow” with a Light Free Flow gesture of the right arm before she turns off the instrument and steps back from it. We see variations of this pattern throughout her demonstration: sometimes she simply drops her arms in Passive Weight and Shape Flow; sometimes she performs the more active Recuperation of a large Kinesphere Gesture with Free Flow.
119
Figure 3: Lydia's Effort Phrasing

3.3.2 Clara Rockmore's Movement Qualities
Clara has a preference for the Sagittal plane, and she uses Quickness, Direct Focus (Space Effort) and Bound Flow in her personal movement repertoire. Her Bound Flow and her use of the Carving Mode of Shape Change while playing are the hallmark of her performing movement style; her face (in particular, her eyebrows) is often where the expressiveness of her playing is most evident. Her Recuperation is almost a collapse: as we see at the end of Hebrew Melody, she steps back from the instrument and drops her Focus, Flow and Sagittal stance.

Figure 4: Clara's Effort Phrasing

4. DISCUSSION
Laban's theory emphasizes the importance of phrasing in movement: “a movement makes sense only if it progresses organically and this means that phases which follow each other in a natural succession must be chosen” [25]. For the performer of an established instrument, the ability to evolve phases that can be efficiently sequenced develops as part of the learning process. In this context, Laban's theory of phrasing emphasizes that the ability to find sequences of movement that allow for preparation, action and Recuperation is key to the development of musicianship. It must be noted that the Recuperation from one action may also form the preparation for the next action, and that Recuperation may be active, in that sound is produced during this phase.

In terms of phrasing, Clara does not seem to have much visible preparation or recovery. The camera often does not let us see the moment before or after she performs, so we are making assumptions based on observations of her in social interaction and in demonstration of the theremin. Her preference for Flow variation at the Bound end of the continuum seems to allow her to play the instrument with almost no Preparation or Recovery. She comes from and returns to a Neutral or Bound Flow state, which seems to allow her to Prepare or to Recuperate without a high degree of variation in the Efforts used.

Lydia's Recuperation of a large Kinesphere Gesture with Free Flow is notable in that the instrument would not allow such an amount of Free Flow or such large looped movement while playing the piece; it is interesting to speculate that she incorporates these qualities into her Recuperation because they are part of her personal repertoire and she needs to return to them in order to Recuperate from the Exertion of playing the piece. In their Recuperation Phase, both performers return to at least one Effort that is present in their Preparation and in their personal movement style: Bound Flow in the case of Clara Rockmore, Lightness in the case of Lydia Kavina.

With all of this observation, our belief is that, within the demands of the task, individuals must find their own way of playing the instrument, ‘matching’, so to speak, their individual movement style with the technical demands of the task. In the case of the Theremin, the design of the instrument dictates a very specific set of demands: because body proximity affects tuning, the performer must remain quite still except for the arms and hands. In Laban terms, Bound Flow, spatial accuracy through Direct Space Effort, a held torso and head, and prescribed weight shifts are encouraged for the performance of tonal music. Despite these limitations, both Clara and Lydia find very individual ways of playing the Theremin. In the context of NIME, we feel this point highlights the importance of considering not just the range of movement sensed and used directly by the interface but also the range of movement that is left free to the performer.

In his work with industry, Laban emphasized the need for Recuperation in order to avoid strain. In musical performance, the degree to which a performer may recuperate may be seen to impact upon musical tension. Composers for voice and wind instruments write with an understanding of the need to breathe. In considering the need to recuperate we see three interrelated factors: the movement phrasing available to the musician, developed through the attainment of musical skill and personal movement preference; the movement requirements of the instrument by virtue of its physical realization and mapping structure; and the particular requirements of the score or the performer's intention. In seeking to use movement as an interaction design element, the designer must address the interdependence of these three factors.

An understanding of the relationship between movement phrasing and sonic phrasing evident in many acoustic instruments may be used to inform the development of a higher-order mapping system as part of a new instrument. We believe that focusing on this relationship can inform the process of creating new interfaces for musical expression that seek to leverage the notion of effortful interaction exemplified by many acoustic instruments, whilst avoiding a purely mimetic approach. Further, a consideration of the availability of Recuperation provided by the instrument may be used to allow for the creation and resolution of musical tension in music composed for the instrument.

The theremin interface can be seen simply as two proximity detectors mapped to pitch and volume respectively. In the context of tonal music, we observe that this simple mapping of distance to pitch encourages a correlation between the LMA element of Flow, which ranges from Free to Bound, and pitch articulation, which ranges from glissando to staccato. This need not be the case for non-tonal music, however, which highlights the requirement to have a musical goal in mind as part of the instrument design process. For the designer of a new instrument, a consideration of movement phrasing in the context of musical outcomes could form the basis for a new design that leverages the enjoyment of expressive movement rather than seeking simply to minimize effort.

5. CONCLUSIONS AND FUTURE DIRECTIONS
In this paper we have drawn on the Effort theory of Rudolf von Laban and the related discipline of LMA in an enquiry into what might constitute effortful performance on new interfaces for
musical expression. Though LMA has been applied to the analysis of musicians' movement in NIME before, an aspect that has until now been ignored is Effort phrasing. Here we have particularly focused on Recuperation as part of phrasing, hypothesising that the degree to which the performer may recuperate influences the perceived musical tension. We have highlighted the factors which influence the Recuperation phase, namely the instrument's realisation, the musical goal and the performer's skill, in the belief that an early consideration of the interdependent nature of these three factors can inform the design of a new interface.

Specifically, we have presented an analysis of two thereminists' movement. We have focused on the manner in which they Recuperate following the Exertion of playing the instrument. Our analysis shows that each performer has a different style of Recuperation, and we have demonstrated the utility of LMA in qualitatively describing this difference.

In our analysis of Recuperation presented here, we have focused on the Recuperation evident as the performer finishes a piece or takes a rest while the accompanist carries the music. Future work will seek to apply LMA's theory of phrasing to musicians' movement, looking at how performers phrase their Exertion and Recuperation whilst actively playing, and seeking to correlate this with perceived musical tension.

6. REFERENCES
[1] Ishii, H. and Ullmer, B. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. In Proceedings of CHI, 1997, 234-241.
[2] Wanderley, M. M. and Orio, N. Evaluation of Input Devices for Musical Expression: Borrowing Tools from HCI. Computer Music Journal, 26(3), 2002, 62-76.
[3] Ryan, J. Some Remarks on Musical Instrument Design at STEIM. Contemporary Music Review, 6(1), 1991, 3-17.
[4] Bennett, P., Ward, N., O'Modhrain, S. and Rebelo, P. DAMPER: a platform for effortful interface development. In Proceedings of the 7th International Conference on New Interfaces for Musical Expression, 2007, 273-276.
[5] Lebel, D. and Malloch, J. The G-Spring controller. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (Paris, France: IRCAM, Centre Pompidou, 2006), 85-88.
[6] Waisvisz, M. Composing the now. Available at: http://www.crackle.org/composingthenow.htm [Accessed September 6, 2007].
[7] Cadoz, C. and Wanderley, M. M. Gesture Music. Reprint from: Trends in Gestural Control of Music, M. M. Wanderley and M. Battier, eds., 2000.
[8] Wanderley, M. M. et al. The Musical Significance of Clarinetists' Ancillary Gestures: An Exploration of the Field. Journal of New Music Research, 34(1), 2005, 97-113.
[9] Campbell, L. On the use of Laban-Bartenieff techniques to describe ancillary gestures of clarinetists [Internet]. 2005. Available from: http://www.music.mcgill.ca/musictech/clarinet/LBMF_Final_Report.pdf
[10] Bennett, P., Ward, N., O'Modhrain, S. and Rebelo, P. DAMPER: a platform for effortful interface development. In Proceedings of the 7th International Conference on New Interfaces for Musical Expression, 2007, 273-276.
[11] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge, 1998, 237.
[12] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge, 1998.
[13] Davies, E. Beyond Dance: Laban's Legacy of Movement Analysis. Brechin Books, 2001, 44-45.
[14] North, M. Personality Assessment Through Movement. Plays, Inc., 1975.
[15] Penfield, K. Comparison of Two Dancers. Unpublished Certificate Dissertation, Dance Notation Bureau, NYC, 1972.
[16] Davis, M. Movement characteristics of hospitalized psychiatric patients. American Journal of Dance Therapy, 4(1), 1981, 52-71.
[17] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge, 1998, 239.
[18] Bartenieff, I. and Lewis, D. Body Movement: Coping with the Environment. Routledge, 1980, 73.
[19] Maletic, V. Body, Space, Expression: The Development of Rudolf Laban's Movement and Dance Concepts. Walter de Gruyter, 1987, 96.
[20] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge, 1998.
[21] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge, 1998, 237.
[22] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge, 1998, 240.
[23] Mastering The Theremin. DVD. Moogmusic, 2004.
[24] Mastering The Theremin. DVD. Moogmusic, 2004.
[25] Laban, R. and Ullmann, L. Choreutics. Macdonald & Evans, 1966, 4.
Towards an affective gesture interface for expressive music performance

Vassilios-Fivos A. Maniatakos
RepMus-IMTR
IRCAM-Centre Pompidou
1, place Igor Stravinsky
75004 Paris, France
fivos.maniatakos@ircam.fr

Christian Jacquemin
AMI
LIMSI-CNRS & Univ. Paris 11
BP 133
91403 Orsay, France
christian.jacquemin@limsi.fr

ABSTRACT
This paper discusses the use of ‘Pogany’, an affective anthropomorphic interface, for expressive music performance. For this purpose the interface is equipped with a module for gesture analysis: a) at a direct level, in order to conceptualize measures capable of driving continuous musical parameters, and b) at an indirect level, in order to capture high-level information arising from ‘meaningful’ gestures. The real-time recognition module for hand gestures and postures is based on Hidden Markov Models (HMMs). After an overview of the interface, we analyze the techniques used for gesture recognition and the decisions taken for mapping gestures to sound synthesis parameters. To evaluate the system as an interface for musical expression, we conducted an experiment with real subjects. The results of this experiment are presented and analyzed.

Keywords
affective computing, interactive performance, HMM, gesture recognition, intelligent mapping, affective interface

1. INTRODUCTION
The shift of interest in Human Computer Interaction (HCI) towards emotions and social interaction has resulted in intensive studies of Affective Interfaces: interfaces that appeal to the emotional state of their users and allow them to express themselves emotionally, by receiving information of emotional content and decoding it through appropriate techniques. In this work we argue that high-level gesture information can be revealing of the expressive diathesis or emotional state of the user. Furthermore, an interface that succeeds in decoding such information can be inspiring to use as a virtual music instrument. Using strategies 1) to decode high-level gesture information from the user, 2) to link this information with its semantic meaning, and 3) to create intelligent correspondences between these semantics and music synthesis parameters, we have built and evaluated a music performance system for the ‘Pogany’ interface.

2. SCOPE AND MOTIVATION
‘Pogany’ is an affective, anthropomorphic (head-shaped), hand-manipulated interface designed at the LIMSI laboratory [1]. The designers' purpose was to offer a new medium of communication that can involve the user in an affective loop. Related work in terms of design and motivation can be found in [2] for a doll interface to emotional expression and in [3] for the analysis of voice expressivity through a hand-puppet interface.

What we investigate in our work is the appropriateness of an interface such as Pogany, beyond human communication, for musical creation and interaction purposes. The deeper scope of this work is to provide the user with an interactive performance system that captures expressivity. For Pogany, such a task sounds challenging a priori, basically for two reasons:

1. The familiarity of a user with the human face, either by view or by touch, can help the user associate the instrumental gestures used to manipulate the interface with common hand gestures. Thus, such a music interface eases the performer's apprenticeship.

2. Particular gesture patterns may correspond to high-level expressive or emotional information. For instance, if we regard a real human face as the interface itself, and we somehow detect the facial expressions produced by the alteration of the face parts (nose, lips, etc.), we can then directly link these expressions to the corresponding emotions [4]. In our case, visual feedback (in the form of an animated head) helps the user create a link between gestures in the vicinity of Pogany and emotions through the intermediate semantic level of facial expressions. Apart from this type of indirect association between gestures and emotions, additional emotional information can arise from the type and the particular area of contact that the user has with the interface. For instance, when someone touches a face on the cheek, depending on the force used and the speed and suddenness of the gesture, this action could each time be attributed to contradictory emotional intentions: from an expression of calmness and tenderness to inelegance and brutality, with an extreme variability such as the one that exists between a caress and a punch. Such emotions

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).
would be a very interesting input in the design of a virtual music instrument, and what remains to be done is the validation, classification and detection of such emotions with the proper interface. Therefore Pogany, as a member of the affective interfaces family, seems to have a priori a major advantage over other interfaces in the context of music performance and interaction.

3. OVERVIEW OF POGANY INTERFACE

Figure 1: a) physical interface b) interaction holes (KeyPoints) c) types of meaningful gestures detected

‘Pogany’ is a head-shaped tangible interface for the generation of facial expressions through intuitive contacts or proximity gestures. The input to the interface consists of intentional and/or natural affective gestures. The interface takes advantage of camera-capture technology, passing a video stream to a computer for processing. A number of constraints mentioned in [1] gave the interface the size of a joystick and the form shown in figure 1. The position of the KeyPoints, small holes on the surface of the head used for capturing finger positions, was inspired by the MPEG-4 control points. In a lit environment, passing over or covering these holes with the hands varies the luminosity level captured by a camera placed inside the facial interface. From each frame of the raw video image captured, we analyze only the pixel blocks that correspond to KeyPoints, and thus to gestural information in the vicinity of the head.

Figure 2: Architecture of Pogany music interface

In figure 2 we show an overview of our system. In the front part, we isolate the image pixel blocks associated with KeyPoint holes in each frame of the video. Then, in the middle part of the system, we process this information in order to extract the important features from the gesture. These features are used either directly for mapping to music parameters or as an input to a second, higher layer of processing (the Gesture Recognition Module) employing HMMs. In the last part of the system, after processing, we map the processed data to a sound synthesis module that is responsible for producing the continuous sound feedback of gestural action.

3.1 Front-end: gesture capture
The front-end module of the system is based on the use of a camera and suitable video-capture software interfacing to the ‘Virtual Choreographer’ (VirChor) environment [5]. An image segmentation tool integrated in VirChor keeps only the important blocks from the image and finds the normalized mean luminosity value of the pixels that belong to each block. In this way we keep just one normalized value of light intrusion (called the alpha value) for each of the pixel blocks that correspond to each KeyPoint.

$$\text{alpha value} = \frac{\text{current luminosity}}{\text{luminosity at calibration time}} \qquad (1)$$

The alpha value is bounded between 0 and 1, with 0 corresponding to maximum light intrusion (the hole is not covered, hence zero activity) and 1 to minimum light intrusion (the hole is fully covered, maximum activation of the KeyPoint). The output of the front-end of the system consists of instances of a 43-float vector at a rate of 30 frames/s, thus providing the gesture recognition core with a low-dimensional vector instead of raw image data. Further information concerning the particular techniques used (image segmentation, calibration tool) can be found in [1], [8].

3.2 Middle part: gesture processing
The middle part of the system includes the processing and feature extraction unit and the gesture recognition module.

3.2.1 Gesture analysis-feature extraction
Here we extract useful features from the gesture, such as energy and velocity.

Energy: Of particular importance for the mapping procedure in the next stage is the energy of the signal that denotes activation in front of the interface. We call this multidimensional signal $X_t$. If we define energy as $E_t = X_t^2$, where $E_t$ is the temporal energy vector for frame $t = 0, 1, \ldots, n$, the normalized mean short-time energy of the signal at frame $t$ is:

$$E_t = \frac{1}{N_{kp}} \sum_{j=1}^{N_{kp}} X_{j,t}^2 \qquad (2)$$

where $N_{kp}$ is the total number of KeyPoint holes. As the signal does not take negative values, instead of energy we may also consider $M_t = \sum_{j=1}^{N_{kp}} X_{j,t}$, where $M_t$ is the Mean Magnitude Value per frame (MMV), a metric for the activation of the KeyPoints of the interface.

Velocity: The velocity of the multidimensional signal is defined as $V_t = (X_t - X_{t-\delta t})/\delta t$, $t = 1, 2, \ldots, n$. We assume that $V_0 = 0$. If $\delta t = 1$ and $t \geq 1$, the mean velocity value at frame $t$ is:

$$V_t = \frac{1}{N_{kp}} \sum_{j=1}^{N_{kp}} \left( X_{j,t} - X_{j,t-1} \right) \qquad (3)$$

A useful measure for gesture segmentation is the Mean Activation Rate (MAR), defined as:

$$\mathrm{MAR}_t = \frac{1}{N_{kp}} \sum_{j=1}^{N_{kp}} \left| X_{j,t} - X_{j,t-1} \right| \qquad (4)$$
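Equations (2) to (4) reduce each frame of KeyPoint alpha values to a few scalars. The following is a minimal Python sketch of that per-frame computation, not the authors' VirChor implementation. It assumes the front-end already delivers one list of alpha values per frame (43 KeyPoints at 30 frames/s in the text); the names `FrameFeatures` and `extract_features` are hypothetical. MMV is computed as the plain sum printed above, and MAR at the first frame is set to 0 by analogy with the stated $V_0 = 0$, which is our assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FrameFeatures:
    energy: float    # E_t, eq. (2): mean of squared alpha values
    mmv: float       # M_t: plain sum of alpha values, as printed in the text
    velocity: float  # V_t, eq. (3) with delta_t = 1
    mar: float       # MAR_t, eq. (4): mean absolute frame-to-frame change


def extract_features(frames: Sequence[Sequence[float]]) -> List[FrameFeatures]:
    """Compute E_t, M_t, V_t and MAR_t for each frame.

    frames[t][j] holds X_{j,t}, the alpha value of KeyPoint j at frame t
    (hypothetical input layout; alpha values are expected in [0, 1]).
    """
    if not frames:
        return []
    n_kp = len(frames[0])
    if n_kp == 0:
        raise ValueError("each frame must contain at least one KeyPoint value")

    features: List[FrameFeatures] = []
    prev: Optional[Sequence[float]] = None
    for frame in frames:
        # Every frame must carry exactly one alpha value per KeyPoint.
        if len(frame) != n_kp:
            raise ValueError(f"expected {n_kp} KeyPoint values, got {len(frame)}")
        energy = sum(x * x for x in frame) / n_kp
        mmv = sum(frame)
        if prev is None:
            # V_0 = 0 as stated in the text; MAR_0 = 0 is an assumption.
            velocity, mar = 0.0, 0.0
        else:
            diffs = [x - p for x, p in zip(frame, prev)]
            velocity = sum(diffs) / n_kp
            mar = sum(abs(d) for d in diffs) / n_kp
        features.append(FrameFeatures(energy, mmv, velocity, mar))
        prev = frame
    return features


if __name__ == "__main__":
    # Toy example with 4 KeyPoints instead of 43: a hand covers some holes,
    # then partly withdraws.
    demo = [[0.0, 0.0, 0.0, 0.0], [1.0, 0.5, 0.0, 0.5], [0.0, 0.5, 0.0, 0.5]]
    for f in extract_features(demo):
        print(f)
```

Note that the signed velocity $V_t$ can cancel out when some holes are covered while others are uncovered in the same frame, whereas MAR cannot; this is consistent with the text's choice of MAR, rather than $V_t$, as the measure used for segmentation.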
3.2.2 Real-time Gesture Recognition Module
The gesture recognition module is responsible for identifying, in real time, a ‘meaningful’ gesture or posture that the user addresses to the interface out of a continuous stream of gesture data. The difference between gestures and postures lies in the motion or motionlessness of the hand in front of the interface. Meaningful gestures (figure 1c) correspond to gestures with a particular significance that the system has been trained to recognize; classified at a high level, they function as expressivity-related commands that modify the sound synthesis procedure in the form of modulations or interrupts. In order to be distinguished from raw gesture data with a higher success rate, these gestures demand permanent contact with the interface.

Inspired by our experiments on off-line isolated gestures based on HMMs presented in [6], we developed a real-time module for continuous gesture recognition. In parallel, we were interested in keeping a high degree of expandability for the system, that is, in leaving open future enhancements with multiple gestures, complex gestures and a large-scale gesture vocabulary.

HMM configuration: In our HMM models the number N of states is set to 4, plus the two non-emitting states at the start and at the end. We use a left-to-right, no-skips topology and an observation vector of size 43. The training of the system is based on the Baum-Welch algorithm. For recognition we employed a non-consuming Viterbi-like algorithm.

Segmentation for continuous gesture: An important issue for the recognition of continuous gesture is segmentation, which is implemented in the activity detector module. This module is responsible for detecting predefined meaningful gestures and postures in raw gesture data (meaningless gestures and silent parts). It makes use of the previously defined MMV and MAR metrics in combination with a number of constraints. We provide the core of the algorithm for a) the separation of activity parts (gestures and postures) from silent parts (no activity in front of the interface) and b) the separation of gesture from posture:

    if MMV > thresh1 then ‘activity’
    else ‘silence’
    if ‘activity’ then
        if MAR > thresh2 then ‘gesture’
        else ‘posture’

MMV represents the general amount of activation in the vicinity of the interface. Therefore, it gives evidence for or against the existence of some kind of activity (gestural or postural) or, for values near zero, of what we call ‘gesture silence’. MAR expresses the speed of the gesture and is therefore useful in separating gestural from postural activity. thresh1 and thresh2 are thresholds used to regulate the procedure relative to the light conditions. According to the output of the activity detector module described above, the system triggers (or does not trigger) gesture and posture recognition and responds according to the vocabulary of meaningful types of gestural/postural activity it is trained to detect.

3.2.3 Implementations
In order to support the interface, we have implemented a variety of cooperating modules, which we have integrated into the VirChor rendering environment (image segmentation, gesture collection and data transformation algorithm, gesture detection module, HMM recognition core, etc.). It is also worth mentioning a module for visual feedback, in the form of an animated head for facial expressions: this permits the implicit link of user gestures with emotions arising from facial expressions. Finally, for the HMM core we used HTKLib (the library of the HTK toolkit for speech recognition) [7], adapted to cope with the issues of real-time gesture recognition. Details of these modules, as well as of a module for gesture intention recognition based on the Token-Passing algorithm (estimating the type of gesture before it is completed), are described in [8].

4. MAPPING STRATEGIES
For the mapping module (see figure 2) we followed mixed direct and indirect strategies: the former concern low-level continuous information arising from direct gesture processing; the latter refer to the semantic (high-level) information of meaningful gestures and postures. We linked this information with parameters from two types of synthesis: FM and Granular Synthesis (GS). In general, we used one-to-one mapping for low-level information and one-to-many mapping for high-level information. The correspondences are shown in table 1.

Table 1: Mapping low & high level information to FM and GS parameters

                      Low level:   Low level:            High level:
                      Energy       Velocity              Gesture
FM                    loudness     Modulation Index      Frequency Ratio
Granular Synthesis    loudness     time between grains   audio sample, grain duration,
                                                         pitch transposition, ...

4.1 Direct Mapping strategies
The MMV and MAR metrics mentioned in the previous section serve as continuous parameters that adjust music parameters in the synthesis procedure.

4.1.1 Mapping Energy
The Magnitude Value per Frame (MpF) represents loudness. The function selected for this transformation was:

$$\mathrm{MpF}(\mathrm{nMMV}) = 1 - e^{-\mathrm{nMMV}/a}, \qquad 0 \le \mathrm{nMMV} \le 1 \qquad (5)$$

where nMMV is the normalized MMV in [0, 1] and a is a parameter controlling the gradient of MpF(x). This parameter helps to adjust the radius of sensitivity around the interface. MpF reconciles the need to quasi-linearize the distance factor with the need to preserve the additive effect of multiple-finger haptic interaction at zero distance.

4.1.2 Mapping Velocity
MAR was defined as a metric for the speed of the gesture in front of the interface. According to theory, the Modulation Index $MI = A_m/F_m$ in FM is responsible for the brightness of the sound, as the relative strength of the different sidebands (which affects the timbre) is determined by the relationship between the modulator amplitude ($A_m$) and the modulator frequency ($F_m$). Hence, we have set $MI = \mathrm{MAR}/b$, where b is a normalization factor which gives
to the continuously changing value a meaningful -in musi- 3) learnability, and 4) explorability. Furthermore, the eval-
cal terms- range [8]. The MAR metric was also adapted for uation process was properly adapted in order to provide an
granular synthesis, this time in order to control the time objective measure for judging the effect of high-level discrete
between grains. gestural information to musical expressivity.
4.2 Indirect mapping strategies 5.2 Preparation of the experiment

Gesture recognition acts in two levels of interest for the The experiment was divided in two sessions. Both ses-
mapping procedure: First, at the level of gesture recogni- sions made use of the same synthesizer modules (FM and
tion; hand gestures with a number of frames varying from GS), and also shared the same mapping elements, as far as
5-70 frames (0.18-2.3 sec), are isolated and probable to get direct strategies are concerned. This means that we were
identified by the system. Second, at the level of posture motivated to create the ‘loudness by distance (MMV)’ and
recognition; in between gestures, recognizable or not by the ‘brightness by MAR’ correspondences (described in 4.1) in
system, whenever hands remain almost steady, the recog- order to drive the two synthesizers in both experiments.
nizer estimates the probability that this posture of the hands The main difference concerns the indirect mapping strate-
corresponds to one of the pre-learned postures. gies. In the second experiment we made use of indirect map-
The decisions we took concerned recognition for four prin- ping exactly as described in the previous session. Whenever
cipal gestures (figure 1c): ‘eyes up’, ‘eyes close’, ‘smile’, the user performed a meaningful gesture, this information
‘sad’, that correspond to the emotions: ‘surprise’, ‘suspi- was set to adjust a set of parameters inside the synthesizer,
cion’, ‘joy’, ‘sadness’ respectively (for further details con- linearly and with a certain amount of delay.
cerning the different types of gestures see [6]). On the contrary, in the first session the mappings of high-
level information to the music parameters of the synthesizers
4.2.1 Mapping to FM were arbitrary. This means that after a meaningful gesture
From FM synthesis practice, the Frequency Ratio pro- the change in parameters of FM and GS was not the one
duces harmonic sounds when it is a multiple of 1. On which corresponded to this particular gesture but an arbi-
the contrary, with non-integer frequency ratio, inharmonic trary selection from a sum of preset values of all gestures.
partials become more prominent. In our situation, facial It is worth to mention that for the second experiment,
emotions with a rather positive impact, such as ‘joy’ and an animated head was connected with the system in order
‘surprise’, were associated with values of harmonicity ratio to execute face-animation commands according to gestures
that lead to a consonant result (an harmonic sound). On over the interface. The recruitment of such a feedback was
the other side, ‘suspicion’ and ‘sadness’, as emotions with mostly negative impact, were less likely to result in a harmonic spectrum. Thus, by mapping gesture information to the Frequency Ratio, we gained control over the consonance of the resulting sound.

4.2.2 Mapping to GS
With an appropriate selection of the audio material used for the grains of GS, we can adequately control the nature of the sound so that it represents positive or negative emotional impacts. The granular synthesis parameters that take part in the mapping with gestures are the upper and lower limits of the grain duration, the limits of transposition, the time between grains and, of course, the location from which the grains are extracted. Whenever a meaningful gesture is recognized, these GS parameters are changed relative to their instantaneous values.

5. EVALUATION OF THE INTERFACE
To evaluate our design choices for the interface, we organized an experiment in which human subjects interacted musically with the interface. This section describes the context, preparation and conclusions of this experiment.

5.1 Context
Because there is no agreement on evaluation criteria and methods [8], there is no established evaluation process to follow. Under these circumstances, we based our evaluation method on the axes set by Wanderley [9], adapted to the particularities of the ‘Pogany’ interface. Thus, the interface set-up for the experiment aimed to give clues for four main attributes: 1) time controllability, 2) sound controllability,

necessary in order to ensure that, even implicitly, the user assigns a set of actions to corresponding facial emotions, and thereafter to the corresponding sound feedback.
The system in the second experiment was trained to recognize four meaningful gestures corresponding to four basic emotions: joy, sadness, surprise and suspicion. A change to one of these four moods triggered relative changes in the harmonicity ratio of the FM synthesis and in the duration, pitch and grain source of the granular synthesis module. Additionally, five postures were recognized during the session. These corresponded to five different types of activation in front of the main areas of the facial interface: the eyebrows, the eyes, the cheeks, the mouth and the nose. The activation of these cues was mapped to minor changes in the sound, compared with the changes caused by the primary gestures. The two symmetrical halves of the face were designed to give equal sound results.
In the two experiments we tried to keep the overall sound quality similar, as well as its temporal evolution and duration variability. This allows a fair comparison of the other parameters that differed between the two sessions (indirect mapping).
The experiment: The experiment took place at the LIMSI laboratory, Orsay. The lighting conditions during the experiment were natural (slightly non-homogeneous).
The procedure was as follows. The subject received some explanations concerning the devices that he/she would use and the general concept of the experiment. He/she was then given a short time to become familiar with the interface by observation and touch, without any kind of feedback. Next, the procedure was introduced: during two sessions of 5 minutes each, the subject was free to interact with the interface in any desired manner. The subjects were encouraged to perform quick or slow
movements, by distance or touch, in front of or near the interface. Alternative modes of action were also proposed, such as tapping, caressing or (lightly) hitting, using each hand separately or both hands simultaneously. After the two sessions, the subjects were asked to fill in a questionnaire about the experiment.
Six subjects (five male and one female) took part in the music experiment with Pogany. Their ages ranged from 23 to 29 years. All subjects had previously used interfaces connected to a computer. Two subjects had used an interface for music more than five times and three subjects fewer than five times; for one subject it was the first such experience.
After interacting with the interface, all subjects answered a set of questions about their experience. Several of these questions aimed to elicit their subjective view of the controllability potential of the setups in terms of time, sound quality, sound modification capabilities and expressiveness. Other questions concerned the ability to recover past gesture-sound dyads and to repeat performed patterns during the performance. Another set of questions focused on how easy it was to explore new sounds in the given time, and on expectations for a hypothetical second chance. Finally, other questions referred to the effect of the visual feedback, the willingness of the subjects to use the same or similar interfaces in the future, etc. Apart from the questionnaires, audio material was collected for each participant's sessions, as well as text records of the number of meaningful gestures activated in both sessions.

Figure 3: Subjects' evaluation of controllability in time and of sound evolution control for the 1st (left) and 2nd (right) session. The horizontal axis represents the subjects and the vertical axis the evaluation score (between 0 and 5).

5.3 Results
Despite the limited number of participants, the data gathered proved sufficient to provide important feedback and a basis for a number of conclusions. Furthermore, it gave clues as to whether or not the interface is capable of expressive music creation.
In the domain of controllability, the participants found the quality of control in time and timbre adjustment more than satisfactory (Figure 3). On a scale from 0 to 5 (with 0 corresponding to ‘very bad’ and 5 to ‘excellent’), subjects rated the system with an average of 3.66 for time and 3.33 for timbre modification flexibility in the first session. In the second session only the average value for timbre modification increased slightly, while the temporal modification potential remained unchanged. These values depend on factors such as the complexity of the mapping and the degree of polyphony, which were not the focus in our situation. Nevertheless, the results generally show a strong acceptance of the interface as a virtual instrument. It is important to note that the values for the two experiments show minimal differences. This seems reasonable, as the question mostly referred to the direct mapping strategies. These strategies are responsible for modifying the most prominent parameters of the sound itself: loudness and brightness (for FM).
At this point, it was important to correlate the answers on the questionnaires with real data extracted from the performances. It was difficult to set objective evaluation criteria for the character of each performance. An interesting approach to this matter was to use a normalized MAR as a criterion for the kind of activity in the vicinity of the interface: high values of MAR indicate high movement velocity. Hence we decided to calculate the zero-crossing rates of $\mathrm{TMAR} = m - \mathrm{MAR}$, where $m$ is the estimated mean value of MAR during gesture activation. Variations of the MAR value were recorded for each session and user. After processing, the mean value of MAR was set to $m = 0.7$. Table 2 shows the zero-crossing rates for the Transposed MAR (TMAR).

Table 2: Zero-crossing rate of the TMAR
Subject   1st   2nd   3rd   4th   5th   6th   Average
TMAR        7    26    27    28    43    69      33.3

Comparing Table 2 with the graphs in Figure 3 shows that subject 1, who claimed less control, had the lowest TMAR score. This means that the general activation rate (i.e. the velocity of his gestures) was limited: his value was 7, compared with a mean of 33.3 among subjects. On the other hand, subjects 5 and 6, who rated the system as very good or excellent, clearly had a more ‘attacking’ approach. It is also important to note that subject 1 had no previous experience with a music interface. This suggests that previous experience with interfaces probably affects the learning curve and the ease of manipulation, as well as the overall view of the effectiveness of such a system.
One of the most important issues of our work concerned expressivity through the interface. This had a direct impact on the decisions taken about the configuration of each session. After the experiments, participants were asked in the questionnaire to judge the expressive capabilities of the system. The question compared the sessions, asking whether one session in particular helped them more in expressing themselves. Remarkably, all but one subject found the second configuration better for expressing themselves, while the remaining subject found the two sessions equally expressive. In contrast, the evaluation of time and timbral control diverged only slightly between the sessions. Taken together, these results suggest that the gain in expressivity is related exclusively to the higher-level decoding of gesture cues employed in the second session. It is also striking that one of the subjects (the one with the best TMAR score) rated the level of control as better in the first experiment, while at the same time confirming the superiority of the second session in terms of expressivity.
In terms of learnability, five of the six subjects claimed
that they definitely succeeded in learning new gestures during the short time they were given for manipulation. The sixth (referred to as the 1st in the statistics) also gave a positive answer, but with less certainty. Asked whether they could still recall the correspondences between gestures and resulting sounds after the experiment, all subjects answered positively, with varying degrees of certainty. The subjects' opinions on this matter were of great interest. Their spontaneous thoughts underlined one of the most important issues for an interface: how to establish a learning curve that does not discourage amateurs from continuing to learn. At the same time, the interface should set high limits for the perfection of performance, so that it remains intriguing for more experienced users who go on exploring the capabilities of the virtual instrument. Since learnability converges with the issues raised by explorability, it is worth looking at the statements of some of the subjects (1st, 5th and 6th respectively):
‘Many difficulties encountered when trying to explore new sounds... difficulties to find a logic and patterns...’
‘...For the manipulation some time is necessary to explore the possibilities but when it’s done, it is very interesting to produce different sounds.’
‘... However, the control on the second experiment was less effective, maybe due to that it demanded a higher degree of expertise gained through practice.’
When asked about their expectations for exploring new sounds in a hypothetical second chance with the interface, all subjects responded positively. They apparently felt that part of the potential of the interface was still undiscovered. Some of the subjects underlined the importance of the visual feedback, in the form of an animated head, for exploring the sound capabilities of the interface. All subjects found this kind of feedback useful in every respect. They also mentioned ‘control’ and ‘logic’ in the sound as aspects of the creation to which it can contribute.
Regarding the general impression of the interface, the 1st subject was rather negative. He insisted on the problems he encountered in trying to understand how exactly it works. All the other subjects found the interface at least interesting. Some subjects claimed not to be familiar with the ‘type’ of music it produced, or even not to find it pleasant. This did not prevent them from forming a good general impression:
‘...sometimes it is noisy, but it’s funny. I felt like playing (good or bad!) a music instrument...’
One subject underlined how well the construction of the ‘Pogany’ interface suits such a purpose:
‘Touching the interface seems important and the contact/touch impression is quite nice...’
Finally, some of the subjects proposed uses in which setups such as ‘Pogany’ for music would prove particularly useful, for example for blind people. One of the subjects also offered an inspiring point of view, mostly concerning the intuitive purposes of tangible interfaces for music:
‘With this interface, people have to guess how to touch it, to learn it by themselves...perhaps a ‘traditional’ instrument player, after practicing with an interface such as the head interface, will try to find other manners to play with his instrument and produce new sounds.’

6. CONCLUSIONS-FUTURE RESEARCH
The impressions we obtained from this experiment were encouraging at many different levels. First, the high-level gestural information decoding module in the second session proved particularly useful for the expressivity of the user. All the subjects stated this, and it is confirmed by the equivalence of the two sessions in all other aspects of overall synthesis quality. Second, even with a simple mapping, the general impressions of timbre modification and time precision were positive, as were those of the interface itself as a device. Third, the interface succeeded in providing sufficient conditions for learning patterns and exploring new gestures, with a priority on the learning curve of advanced users. Finally, some expandability options of the setup were left open during the architectural design, such as the option of training the interface for complex gestures. This particular experiment did not test them, but its results do not discourage them either.
Recent results showed that the use of an interface for music within an affective protocol could be beneficial. In the future we will focus on consolidating our results with further experiments and artistic performance use cases. In this framework it is also worth dealing with technical issues concerning the interface: robustness, increased sensitivity and enhanced multimodal techniques, instability under difficult light conditions, latency, etc. Improvements to the recognition itself could also help to improve the overall performance of the interface.
Finally, consider an affective interface such as ‘Pogany’, where the visual head-animation feedback implicitly creates correspondences between the user's emotions and the sound results. Even so, a study of related research in psychology (such as a model for touching parts of the body) is imperative. However, such a model is difficult to evaluate, due to the polyparametric nature of touch between people and the effect of social factors. Nevertheless, it would surely help create a solid base for linking the semantic space to gestural information.

7. REFERENCES
[1] Jacquemin, C. ‘Pogany: A tangible cephalomorphic interface for expressive facial animation’, ACII 2007, Lisbon, Portugal, 2007.
[2] Paiva, A., Andersson, G., Hook, K., Mourao, D., Costa, M., Martinho, C. ‘Sentoy in FantasyA: Designing an affective sympathetic interface to a computer game’, Personal Ubiquitous Comput. 6(5-6):378-389, 2002.
[3] Yonezawa, T., Suzuki, N., Mase, K., Kogure, K. ‘HandySinger: Expressive Singing Voice Morphing using Personified Hand-puppet Interface’, NIME 06, Paris, 2006.
[4] Ekman, P., Friesen, W.V. ‘Facial action coding system: A technique for the measurement of facial movement’, Consulting Psychologists Press, Palo Alto, CA, USA, 1978.
[5] http://virchor.sourceforge.net
[6] Maniatakos, F. ‘Affective interface for emotion-based music synthesis’, Sound and Music Computing conference SMC07, Leykada, Greece, 2007.
[7] htk.eng.cam.ac.uk/
[8] Maniatakos, F. ‘Cephalomorphic interface for emotion-based musical synthesis’, ATIAM Master Thesis, UPMC Paris 6 & IRCAM, LIMSI-CNRS, Orsay, France, 2007.
[9] Wanderley, M., Orio, N. ‘Evaluation of input devices for musical expression: borrowing tools from HCI’, Computer Music Journal, 26(3):62-76, 2002.
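The Frequency Ratio mapping described just before Section 4.2.2 relies on a standard property of two-oscillator FM. With carrier frequency fc and modulator frequency fm = ratio · fc, the spectrum contains components at |fc + k·fm| for integer k. The Python sketch below is not the authors' implementation; it only illustrates why the ratio controls consonance. If the ratio is a fraction p/q in lowest terms, every component is an integer multiple of fc/q. A simple ratio such as 1 therefore yields a dense, low-order harmonic series. A ratio such as 7/5 instead scatters the components over high harmonics of a much lower implied fundamental. The 200 Hz carrier, both ratios and the helper names are hypothetical choices for illustration.

```python
from fractions import Fraction


def fm_partials(fc, ratio, n_sidebands=3):
    """Positive component frequencies |fc + k*fm| of simple FM, k = -n..n.

    fm = ratio * fc. Negative frequencies fold back (abs) and the
    0 Hz (DC) term is dropped. Exact arithmetic via Fraction.
    """
    fc = Fraction(fc)
    fm = fc * Fraction(ratio)
    freqs = {abs(fc + k * fm) for k in range(-n_sidebands, n_sidebands + 1)}
    return sorted(f for f in freqs if f > 0)


def implied_fundamental(fc, ratio):
    """For ratio = p/q in lowest terms, every component is a multiple of fc/q."""
    return Fraction(fc) / Fraction(ratio).denominator


def harmonic_numbers(fc, ratio, n_sidebands=3):
    """Express each FM component as a harmonic number of the implied fundamental."""
    f0 = implied_fundamental(fc, ratio)
    return [int(f / f0) for f in fm_partials(fc, ratio, n_sidebands)]


# Hypothetical 200 Hz carrier; ratios chosen only to contrast the two cases.
print(harmonic_numbers(200, Fraction(1)))     # [1, 2, 3, 4]
print(harmonic_numbers(200, Fraction(7, 5)))  # [2, 5, 9, 12, 16, 19, 26]
```

Irrational ratios (for example √2) have no common fundamental at all, which is the strongly inharmonic end of the same control dimension. How many sidebands are actually audible depends on the modulation index, which this sketch deliberately ignores.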
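The TMAR measure of Section 5.3 can be sketched as follows, assuming MAR is available as a sequence of per-frame samples for each session. The sketch transposes the samples around the estimated mean (TMAR = m − MAR, with m = 0.7 as in the text) and counts sign changes, skipping samples that are exactly zero. The text does not say whether the authors normalized the count by session length. Dividing by the 5-minute session duration would turn the count into a rate per unit time. The MAR samples below are hypothetical. The last lines reproduce the average reported in Table 2.

```python
def zero_crossings(signal):
    """Count sign changes between consecutive non-zero samples."""
    count = 0
    prev_sign = 0
    for x in signal:
        sign = (x > 0) - (x < 0)  # +1, -1 or 0
        if sign == 0:
            continue              # samples exactly at zero are skipped
        if prev_sign != 0 and sign != prev_sign:
            count += 1
        prev_sign = sign
    return count


def tmar_zero_crossings(mar_values, m=0.7):
    """Transpose MAR around its estimated mean m (TMAR = m - MAR) and count zero crossings."""
    return zero_crossings([m - v for v in mar_values])


# Hypothetical per-frame MAR samples for one session.
mar = [0.5, 0.9, 0.8, 0.6, 0.6, 0.9, 0.7, 0.5]
print(tmar_zero_crossings(mar))  # 4

# Average of the per-subject values reported in Table 2.
table2 = [7, 26, 27, 28, 43, 69]
print(round(sum(table2) / len(table2), 1))  # 33.3
```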
Hoppsa Universum – An interactive dance installation for children

Anna Källblad
University College of Dance
Stockholm, Sweden
+46 73 6870718
anna.kallblad@bredband.net

Anders Friberg
Speech, Music and Hearing, CSC, KTH
Stockholm, Sweden
+46 8 7907876
afriberg@csc.kth.se

Karl Svensson
SKIMRA ljusproduktion STHLM
Hallandsgatan 50
SE 11857 Stockholm, Sweden
+46 705 617714
karl@skimra.com

Elisabet Sjöstedt Edelholm
University College of Dance
Stockholm, Sweden
elisabet.sjostedt-edelholm@danshogskolan.se
ABSTRACT
It started with an idea to create an empty space in which you activated music and light as you moved around. In responding to the music and lighting you would activate more or different sounds and thereby communicate with the space through your body. This led to an artistic research project with three parts: children’s spontaneous movement was observed, a choreography was made based on the children’s movements, and music was written and recorded for the choreography. This music was then decomposed and choreographed into an empty space at Botkyrka konsthall, creating an interactive dance installation. It was realized using an interactive sound and light system. In this system, 5 video cameras detecting the motion in the room were connected to a 4-channel sound system and a set of 14 light modules. During five weeks people of all ages came to dance and move around in the installation. The installation attracted a wide range of people of all ages. The tentative evaluation indicates that it was very positively received and that it encouraged free movement in the intended way. Besides observing the activity in the installation, we interviewed schoolchildren aged 7 who had participated in the installation.

Keywords
Installation, dance, video recognition, children’s movement, interactive multimedia

Permission to make digital or hard copies of all or part of this work for [...] otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy

1. INTRODUCTION
Observations of children running around in big spaces led to the idea of creating an open space that responded with sound and light to movement. The children’s spontaneous movement, with elements of repetition, theme and variation, could be perceived as a form of choreography (it could be called social choreography or play choreography). The present aim was to create an environment where visitors of all ages felt inspired to move. Their focus would be on their own movement, space and sound, and on other people’s movement, rather than on understanding the technology or the underlying system. The focus was on the physical and emotional experience of being in the space and on the awareness of self through exploration with the body in space and music, all in a playful manner. These ideas of improvisation were inspired by co-author Sjöstedt Edelholm’s work on children’s dance education and the use of improvisation in technical training [8]. Choreographer Anna Källblad initiated an artistic research project, Hoppsa Universum, to realize these ideas at University College of Dance, Stockholm, together with Modern Dance Theater, Stockholm.
Källblad’s work deals with themes such as people’s need for contact, their self-expression and the forms this takes. This has resulted in a series of works dealing with the roles of the performer and the audience. These works also examine how the shape of the performance space and the physical relationship between the performer and the audience affect the audience experience¹. In “For Your Eyes Only”, a dancer performed a dance live for one audience member at a time inside a closed box measuring 2x1 m. The box was placed in various places such as theatre lobbies, hospital lobbies, nightclubs and art galleries. Källblad has been inspired by works by Susan Kozel and Gretchen Schiller, for example “Trajets”, in which large screens showing graphic images of moving bodies are controlled by the visitors’ movements in the room².

¹ www.wipsthlm.se/sk_konstnarer.html
² www.incult.es/projectinfo.php?id=8&pi=2

The idea of controlling music with gestures is not new. The DIST group at the University of Genova has investigated several ways of artistic interaction between gesture and sound. The analysis of general gesture parameters has been inspired by the theory of Rudolf Laban and by the Japanese KANSEI research [3]. It has resulted in the EyesWeb system, a general-purpose graphical programming environment for extracting gesture analysis from
video input [1], which has been used in several European research projects (e.g. [2]). A multitude of other software exists for real-time video analysis and manipulation (e.g. Troikatronix³, MAX/MSP/Jitter⁴). However, EyesWeb was designed primarily for the analysis of expressive human gestures and was thus chosen for the present application.

³ www.troikatronix.com
⁴ www.cycling74.com

Gesture control of audio with video recognition has been used in several applications at KTH. The first attempt was the Groove Machine. In it, a dancer controlled the mixing of different music loops through the emotional expression of the dance, which was predicted from a combination of overall motion features extracted from a video camera. This later evolved into the computer game Ghost in the Cave, centered on the emotional expression of gestures and music [6]. It was played by two competing teams, each with one to three main players. The main players had to express different emotions, either with gestures or vocally, which the computer then recognized using fuzzy logic techniques [4]. An important feature was the collaborative aspect of the game. For example, the team controlled the speed of an avatar while the main player controlled the steering. The team also controlled the music: the more they danced, the more intense the music became. The music was synchronized across the teams. One of the teams controlled the percussion instruments and the other team all the other instruments.
In the recent artistic research project “Nu Moove”, led by Lisa Ladberg, Royal College of Music, Stockholm, gesture control of sounds was used in a stage production. Two professional dancers controlled an interactive sound synthesis using both overall motion parameters and different zones on the stage.
Siegel and Jacobsen [7] describe an example of an interactive dance application that has many ideas in common with the present study. Instead of non-invasive video cameras, they used custom-built bending sensors attached to the joints of a professional dancer. The bending data was transferred wirelessly to a computer and mapped to sound-generating software. Several scenes were defined using different mappings and sound material. They also discussed the interesting shift in the roles of composer and dancer. When the dancer controls the music, there is a shift from being a dancer to becoming a musician. This new role of using gestures to control the sound may conflict with the visual impression of the gesture, the latter obviously being the modus operandi of a dancer.
The dancers’ role in the interaction between choreographer and dancers has changed considerably in recent years. Dancers no longer only interpret the choreographer’s intentions and movements. They now actively participate in the creative process by also providing more of the movements, that is, going from interpreters to creators. In the current project this has been taken one step further, in that there is no fixed choreography and the audience themselves become performers taking an active part in the result.

2. METHOD
Children’s Moving Session
The development of the installation was the last stage in a three-step process that started with observation of children’s free movement. Twelve groups of children aged 3-6, with 7-9 children in each group (divided by age), were let into an empty dance studio (without mirrors) measuring 150 m². Each session was filmed from a fixed video camera. The instruction was “Welcome in, we will start in a while”. The children, who all knew each other from the same preschool group, immediately started to move around in the space. After approx. 5 minutes (depending on the activity in the room) music was put on. Music of different styles (orchestral, pop songs) with different rhythms and tempi was played for each group. The music was played for 10-15 minutes, after which the children were gathered in a circle to rest and to discuss what had happened. The children were asked if they remembered how they had moved. The same music was played again. The children then repeated some of the movements and made new ones. During the second phase of moving around, 2-3 children at a time were invited to a 5-minute recorded interview. Accompanied by the interviewer they left the room and followed blue arrows taped on the floor in the outside corridor to another room close by. After the interview they returned to the dance studio and joined in the ongoing activity. In summary, the interviews showed that most of the children experienced the session in the dance studio as positive. No one expressed negative feelings, with the exception of a few responses referring to a particular music example and to the dark studio being scary. (Two groups came a second time and spent the session in a black dance studio instead, with purposely little lighting and using flashlights. The idea was to observe whether a nearly dark room affected the amount of movement activity. It turned out that it influenced the movement very little.) In these interviews, the most frequent answer to the question “What is your favorite type of movement?” was running.
Choreographic Processing With Dancers
In the second step, a 23-minute dance was choreographed based on an analysis of what happened in the sessions with the children. The analysis looked at shapes, effort and timing of movement; balance and shift of weight of the body; overall movement patterns, interaction, and moods. This was done in collaboration with five professional contemporary dancers. The dancers began by learning some of the children’s movements in detail. Compared to other choreographic processes, what stood out in this process was a relationship to time, interaction and perception of self. In the children’s movement there was no expression of anticipation, planning, or judging. The adult dancers tried to move with the same intent and found this very difficult. When trying to do the children’s material they became aware of their own habits of, for example, anticipation or judging. Trying to move without these learned habits became one of the main focuses of the work. It affected the work on all levels: space, shape, timing, emotional expression, the way scenes were structured, overall dramaturgy, and the relationships between the characters/dancers and the audience. When looking at the movement and working with it we found an emotional content that we tried to put into the choreography and its performance. This came to play an important role in the choreography and therefore also in the music based on the choreography. An example of these types of gestures can be seen in Figure 1.
Composing Music
In the third step, the composer Niko Röhlcke composed music for the choreography, based on a live dance performance and on a video recording of the choreography. He analyzed the choreography and made a chart of it along a timeline, marking for example rhythmical and spatial patterns and themes. He also worked with tempos and efforts set by the dancing,
images that the different sections evoked and specific movements in the choreography. The finished music contained six sections of clearly different character.

Figure 1. An example taken from the dance performance with the dancers interpreting children’s movements.

Choreography Of Music
In the fourth step the actual music material of the installation was obtained by decomposing the music. It was split by section and, within a section, according to where in space and time movement and music had coexisted in the choreography. In this way the different sounds were “choreographed” into the room. The interaction was tested in advance in a dance studio with four cameras temporarily placed in the ceiling using tripods. The aim was to create a sound environment that invited and encouraged movement throughout the whole space in different ways in different musical sections.
The decomposition of the music was based on the choreography. Even so, it was more important that each scene worked on its own, was musically interesting and inspired movement. Staying true to, for example, exact patterns in the choreography according to the original musical score mattered less. In each step of the process, artistic choices were made so that the installation would express what the artist wished.

3. TECHNICAL DESCRIPTION
The final system was realized in the Botkyrka art exhibition hall situated in Tumba outside Stockholm, see Figure 2. In the exhibition hall, a room measuring 130 m² (10x13 m) was constructed using moveable walls. The aim was to use the whole floor as the active space for controlling the music. This was realized by using four video cameras mounted in the ceiling about 4 m above the floor, pointing downwards. Variable zoom lenses on the cameras were adjusted so that each camera covered one fourth of the floor area. To be as independent as possible of the lighting, we used night-surveillance video cameras equipped with an infrared light ring (KPC-N600) and an additional filter that removed visible light. In this way it was possible to use dynamically changing light without interfering with the movement detection, provided that the light changes were not too fast.
The light design had several purposes. It aimed to stimulate movement and to hint at the functionality of the installation. It also emphasized the difference between the scenes, making the changes obvious. Finally, it created a visual atmosphere that corresponded to, and worked together with, the music and the sound effects.
Even though the installation was temporary, we wanted the equipment to look as if it was integrated into the room. By removing distractions we wanted to help the participants focus on their own bodies in the interaction with the music and the light. The exhibition hall could never meet the weight and power requirements of a full light rig, so we had to find a flexible, light and low-power solution. Therefore we chose fluorescent light modules from the manufacturer Leader Light. Each module is fitted with four tubes in the colors red, green, blue and white. The modules offer full color blending and a very bright output, with very low power usage and a reasonable weight. Fourteen modules, divided into two parallel rows, were neatly fitted into gaps in the acoustic padding in the ceiling, covering the room’s entire length. The light installation was programmed and controlled through an AVAB Pronto! ver. 3.1 DMX light console.
A speaker was placed in each of the four corners. All the equipment was connected to one PC running Windows XP, equipped with a sound card (E-MU 0404) and two analog video capture boards (IEI, IVC-200).

Figure 2. A picture of the installation showing two of the cameras and two of the loudspeakers.

The sequences and different scenes were pre-programmed in the light console. The overall control, including change of scene and interactive control from the motion, was handled from the computer using the MIDI protocol.
The motion analysis of the video cameras was done in the EyesWeb program version 3.3.0 [1]. Each camera zone (covering one fourth of the floor) was subdivided into four regions, resulting in 16 different active zones covering the floor area. These zones were mapped differently to the music and light in each scene. The analysis of the motion was rather simple: it consisted of computing the overall quantity of motion (QoM) for each camera and each sub zone (see also [5]). Briefly, QoM was computed by taking the difference between consecutive frames of the black and white video input, applying a threshold and counting the number of white pixels. This count is the resulting QoM measure and reflects the amount of total movement within the camera view (or sub zone). In addition, the frame difference was also computed for the inverted input video signal. With this double detection, the number of changing pixels for a given gesture is approximately doubled, since both sides of the body contour are detected. This considerably improves the detection
accuracy. The resulting output from the video analysis is a stream of numbers reflecting the amount of motion in each zone.
A patch written in Pure Data (pd)⁵ served as the overall control unit. The complete sequence consisted of six different scenes controlled by a timer unit. Each scene was active for a fixed duration. The whole sequence took about 20 min to cycle through and was then repeated. All the audio control was done within the pd patch, with a separate subpatch for each scene containing its audio samples and scene-specific control. All audio samples were panned over four channels and positioned according to their trigger position in the room, which made it easier for users to understand the interaction.
A large floor map showing the correlation between movement and sound response in the different scenes was on display outside the exhibition space and was used to explain to visitors how the interaction worked. Placing the map on the floor was meant to couple the explanation more closely with the installation and to encourage visitors to act physically while learning how the system worked. The following is a description of each scene and its interaction.

Scene 1 Disco
The tracks in the original soundtrack were divided into four main parts coupled to the four cameras. The music was groove-oriented, with two different percussion parts (each divided into two levels), one bass part, and one part with the rest of the melodic lines and the accompaniment. The interaction can be seen as an advanced mixer in which the music plays continuously and the volume of each track is controlled by the QoM from the corresponding camera. Thus, when nobody is moving in the room, it is silent. Moving in one zone activates the corresponding track. The amount of motion controls the volume but also activates new tracks, in order to enhance the coupling between motion and sound. Thus, to “play” the whole piece, several people must be moving. Scene 1 was optimized so that children's typical movement of running around the room in a big circle would maximize the music output.
The basic light was dim blue with just a tint of red. When movement occurred in one of the zones, the blue light started to move in a smooth chase clockwise around the room. If two zones were activated simultaneously, a red chase was added. Just as smooth but slightly faster, the red chase allowed every nuance of color between blue, purple and red to run through the room. Activating a third zone made the intensity of the blue chase reach its maximum, and adding movement in the fourth zone made the red chase peak as well.
The main idea was that the running light would stimulate the participants to run. In practice, if a few people were running around very fast, they could activate all four zones, thus getting the entire soundtrack as well as the full light.

Scene 2 Chinese percussion
In this scene different traditional Chinese instruments could be played. There were several different percussion sounds as well as bowed and plucked instruments. All 16 zones were used for triggering individual instrument samples, which were divided into instrument groups such as percussive sounds and string sounds. A sound started when the QoM reached a fixed threshold, and its volume was controlled by the amount of QoM.
The light started with an instant, bright, all-covering green light, sharply contrasting with the purple color of the previous scene. This light quickly faded to the basic dim light of the scene. We worked exclusively with the green color in this scene. Each of the sixteen zones was connected to the closest corresponding light. As the participants played the different instruments, they were at the same time playing the light. In this way the light hinted at the different camera zones, giving the participants a visual reference to where a particular sound was played.

Scene 3 Waltz
Similar to Scene 1, the tracks of the original music were divided into groups. However, the control was slightly different. The melody track was activated if there was movement anywhere in the room. Thus one person could walk around the whole room and play the melody. When there was movement in two zones simultaneously, different effect-type sounds were also activated. When there was movement in at least three major zones, the rhythmic accompaniment was finally activated.
In Scene 3 we worked with a pale yellow color base. When a zone was activated, as the music faded up, the pale yellow light dimmed slightly and gave way to a soft red pulse. This change only occurred in the activated zone. The idea was to stimulate movement in one zone at a time. With participants in all four zones, the entire room became warmer and more suitable for a soothing waltz.

Scene 4 Lightning
The major interaction was the possibility to exchange “lightning” pulses from one side of the room to the other. A large movement in one corner activated a pulse: a light moved rapidly to the other side, with a corresponding sound moving in the same direction. A pulse could not be retriggered before it had been “sent” back from the other side. In addition, movement in the outer areas activated some background sounds. The light base was space blue. The lightning pulse was created by successively lighting the white fluorescents in one row.

Scene 5 Techno/metal
Similar to Scenes 1 and 3, the music was divided into tracks. The major groove was activated if everybody was gathered in the middle of the room and dancing forcefully. Some effect sounds were triggered in the corners of the room.
The green and blue lights corresponding to the activation zones in the middle were dimly lit from the start. When the first of these zones was activated, the green lights started to strobe to the beat of the music. Upon adding activity in the second and third zones, the blue and white fluorescents started to strobe as well. Activating the outer corner zones sent red light running back and forth through the entire length of the room.

Scene 6 Dawn
Here the light was sequenced to simulate a dawn, with gradually increasing light and changing colors. The only way to make sound was to move on the floor of the room, as detected by the fifth video camera mounted in a corner. When everybody was creeping on the floor, a bird singing sound emerged, followed by a drone sound. Scene 6 started completely dark. All the lights were then randomly turned on at the lowest possible intensity and slowly faded up over a period of three minutes, leaving the room very brightly lit at the end.

⁵ puredata.info
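The QoM measure used for the camera zones in this installation can be sketched as follows. The steps are frame difference, threshold and white-pixel count, followed by a second pass on the inverted video signal. This is a minimal illustration under stated assumptions, not the implementation used in the installation.

The sketch makes three assumptions:
- Frames are 8-bit grayscale images stored as nested lists.
- Sub zones are rectangles.
- Frame subtraction saturates at zero, as in common 8-bit image arithmetic. Under this assumption a single difference registers only the pixels that became brighter, and the pass on the inverted signal picks up the pixels that became darker. This is why the double detection roughly doubles the pixel count for a moving body contour.

All function names, the `Zone` layout and the threshold value 30 are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumption: a frame is an 8-bit grayscale image stored as rows of ints (0..255).
Frame = List[List[int]]
# Hypothetical rectangular sub zone: (row_start, col_start, row_end, col_end), end-exclusive.
Zone = Tuple[int, int, int, int]


def saturating_diff(curr: Frame, prev: Frame) -> Frame:
    """Pixelwise max(curr - prev, 0): negative differences are clipped to zero."""
    return [[max(c - p, 0) for c, p in zip(row_c, row_p)]
            for row_c, row_p in zip(curr, prev)]


def invert(frame: Frame) -> Frame:
    """Invert the grayscale signal (black becomes white and vice versa)."""
    return [[255 - v for v in row] for row in frame]


def count_white(diff: Frame, threshold: int, zone: Optional[Zone] = None) -> int:
    """Threshold a difference image and count the 'white' pixels, optionally in a sub zone."""
    if zone is None:
        zone = (0, 0, len(diff), len(diff[0]))
    r0, c0, r1, c1 = zone
    return sum(1 for row in diff[r0:r1] for v in row[c0:c1] if v > threshold)


def quantity_of_motion(prev: Frame, curr: Frame, threshold: int = 30,
                       zone: Optional[Zone] = None) -> int:
    """QoM for one pair of consecutive frames, using the double (normal + inverted) detection."""
    # Pass 1: pixels that became brighter (one side of a moving contour).
    brighter = count_white(saturating_diff(curr, prev), threshold, zone)
    # Pass 2: the same difference on the inverted signal, i.e. pixels that became darker.
    darker = count_white(saturating_diff(invert(curr), invert(prev)), threshold, zone)
    return brighter + darker


if __name__ == "__main__":
    # A bright vertical bar moves one column to the right on a dark background.
    prev = [[0, 200, 0, 0] for _ in range(4)]
    curr = [[0, 0, 200, 0] for _ in range(4)]
    print(quantity_of_motion(prev, curr))                     # 8 (4 brighter + 4 darker pixels)
    print(quantity_of_motion(prev, curr, zone=(0, 0, 4, 2)))  # 4 (left half only)
```

For any pixel, at most one of the two clipped differences is non-zero. The sum of both passes therefore equals thresholding the absolute difference |curr − prev|. With saturating subtraction, the inverted pass is what lets the measure see both sides of the moving body.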
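The Scene 3 control rule amounts to counting active zones. Movement anywhere plays the melody, two active zones add the effect sounds, and at least three of the major zones add the rhythmic accompaniment. The sketch below is one hypothetical reading of that rule, not code from the pd patch. It makes three assumptions:
- A zone counts as active when its QoM exceeds a fixed threshold.
- The four camera zones are the "major zones".
- "Two zones" means at least two.

The zone names, the threshold of 50 and the function name are illustrative.

```python
from typing import Dict, Set


def scene3_tracks(zone_qom: Dict[str, float], threshold: float = 50.0) -> Set[str]:
    """Return the track groups that should be audible for the current QoM values.

    zone_qom maps each major (camera) zone to its current QoM value (hypothetical units).
    """
    active = [zone for zone, qom in zone_qom.items() if qom > threshold]
    tracks: Set[str] = set()
    if len(active) >= 1:  # movement anywhere in the room
        tracks.add("melody")
    if len(active) >= 2:  # movement in two zones simultaneously
        tracks.add("effects")
    if len(active) >= 3:  # movement in at least three major zones
        tracks.add("rhythm")
    return tracks


if __name__ == "__main__":
    print(sorted(scene3_tracks({"A": 120, "B": 0, "C": 0, "D": 0})))   # ['melody']
    print(sorted(scene3_tracks({"A": 120, "B": 80, "C": 60, "D": 0})))  # ['effects', 'melody', 'rhythm']
```

Because the conditions are cumulative, a single visitor typically hears only the melody. The full arrangement needs several people spread over different zones.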
4. EVALUATION
The exhibition was open to the public during a five-week period. Schools could book a guided tour of the installation and a workshop. During public opening hours, staff were available to answer questions and to guide visitors through the installation. The installation attracted people of all ages, from preschool children to senior citizens. All available school tours were fully booked, and there were about 3980 visitors in total.
Interviews were made with 25 children aged 7, two weeks after they had visited the installation. In summary, the children all thought it was a fun, exciting and magical experience. They were fascinated by the fact that the sounds were invisible: they could not see any instruments, and they were impressed that one could not see where the sound came from. They felt great freedom to dance and to move around freely without instructions from any adult. This indicates that the installation achieved its intended purpose with the children. It was also informally confirmed by observing the visitors in activity in the installation.
Following are some representative quotes from the interviews:
Q: What did you think when you heard the music?
A: I thought it was fun and that you could dance how you wanted and you didn't need to decide, as it usually is.
A: Different things, perhaps that it was fun.
A: I thought when there was lightning, and then I was just playing and having fun. It was a bit strange because they were transparent; I mean invisible instruments in that room.
A: I thought it was very well done because I would never dream about making such invisible instrument in the rooms. It was like magic. The lightning, it was like you only did like that with the hand in the air. That was very cool.
A: It was exciting.
Q: What was exciting?
A: That the music started without you knowing it. It changed.
Q: Why did you want to move to that music in that room?
A: It felt so good to be able to move when the music was on.
A: It's fun, instead of standing still. It is more fun to dance.
A: It's more fun to move than to stand still.
A: It was so, it was like it steered me. It was so fun I felt I had to move.
A: They (the instruments) almost steered me. It felt a little weird sometimes you could do it but you could not. (Shows a movement)
Q: You could not?
A: I wanted to do it then it steered so I started running instead.
Q: How did it feel to move in that room?
A: Good!
A: It was fun. To be with friends and play with them at the same time you are dancing.
A: It was fun with my friends.
A: It was a lot of fun.
Q: Why was it fun?
A: It was just like that with different music and dances and different light and everything was sort of higgledy-piggledy.
Following are some comments from the staff at the exhibition hall in response to the question “What have you seen children and adults do?”
Children were open and unafraid; grown-ups asked what one should do. The children were more spontaneous and wild. I told them to go in, to try it a little and to come back. I tried to get them to discover new movement patterns. Many people were looking for an answer/key in order to interpret it in the right way. Adults with little children were more relaxed. Older people were both shy and let go. Many of them brought their grandchildren. 9-13 year-olds who hang out in the nearby shopping center found “Hoppsa”, where they ran around and played for a while.
Many old people came, some from the elderly home, as well as groups from schools for people with special needs. These groups spontaneously saw a room that gave energy, enabling them to walk around. They were sometimes hesitant at first, standing in a corner. High school kids went wild, the teachers backed off, and groups formed within the groups. The “cool guy” was actively participating, but the shy ones dared too.

5. DISCUSSION
The idea of the project, both in the process of making it and in the final installation, was to be very open to input from those who participated and to allow them to shape the project. Making room in the process and in the final “product” for shared ideas and working material (movements, music) contributed greatly to the project and gave it a lot of energy and momentum. In this process, with its different steps, how something was done also shaped, to a great extent, what was done.
Something can be said about usefulness and purpose together with openness. The aim was to offer a room that could become different things depending on how one moved in it. So one could see it as a place where participants used the room for their own needs and at the same time, through their dancing, expressed this in a sort of performance. This could thus be seen as a “product” that consists of an empty room, with no light and no sound, until someone steps in and shapes it according to her or his needs or pleasures. This also touches on questions such as: does the visitor become a dancer? Does the dancer become a musician, or does the music become a dance or a choreography? In our experience from the installation, professional dancers tended to see it as a dance improvisation focusing on the physical and emotional experience in interaction with music and light. Musicians, on the other hand, tended to see it as musical instruments controlled by gestures. Either way, it is through their bodies in motion that they interact with the room.
When meeting with the dance and art teachers who were to receive school groups in the installation, a discussion came up about to what extent the visitors/participants should be given instructions. The teachers had planned a workshop in and around the installation that included painting and making a performance. The choreographer wanted the visitors to have the freedom to do what they felt like in the installation. Depending on one's background, one had different preferences for how to use the installation. This relates to the possibility of shaping the “product” or experience: how one uses it affects what one gets, in this case the children's experience of being in the space.
From elementary to high school, it is popular in education to use art and dance for learning about other things, for example communication skills, discipline, conflict resolution, physical exercise, mathematics and history.⁶
⁶ http://www.oru.se/templates/oruExtNormal____37079.aspx
We argue that it is also very important to practice art as art, in order to understand it and its processes and techniques, including creativity, and to make full use of it in all those other areas. From the interviews with the 7-year-olds who had visited the installation, one can see
that they had strong memories and impressions. The opportunities they were given to explore and make decisions themselves played an essential part in their experience and memory of it.

6. ACKNOWLEDGMENTS
This project was supported by The University College of Dance, Stockholm, Swedish Arts Council, City of Stockholm, Stockholm County Council, The Modern Dance Theater, Stockholm, and The Municipality of Botkyrka.
We would like to thank the composer Niko Röhlcke, the dancers Linda Adami, Johanna Klint, Kerstin Abrahamsson, Maryam Nikandish and Stina Nyberg, and the set designer Tove Axelsson.

7. REFERENCES
[1] Camurri, A., Hashimoto, S., Ricchetti, M., Trocca, R., Suzuki, K., and Volpe, G. EyesWeb: Toward Gesture and Affect Recognition in Interactive Dance and Music Systems. Computer Music Journal, 24(1), 57-69, 2000.
[2] Camurri, A., De Poli, G., Friberg, A., Leman, M., and Volpe, G. The MEGA project: analysis and synthesis of multisensory expressive gesture in performing art applications. Journal of New Music Research, 34(1), 5-21, 2005.
[3] Camurri, A., Hashimoto, S., Suzuki, K., and Trocca, R. KANSEI Analysis of Dance Performance. In Proceedings of the 1999 IEEE International Conference on Systems, Man, and Cybernetics, SMC'99. IEEE Computer Society Press, New York, 1999.
[4] Friberg, A. A fuzzy analyzer of emotional expression in music performance and body motion. In Brunson, W., and Sundberg, J. (Eds.), Proceedings of Music and Music Science, Stockholm 2004, 2005.
[5] Friberg, A. Home conducting: Control the overall musical expression with gestures. In Proceedings of the 2005 International Computer Music Conference, ICMA, San Francisco, 479-482, 2005.
[6] Rinman, M-L., Friberg, A., Bendiksen, B., Cirotteau, D., Dahl, S., Kjellmo, I., Mazzarino, B., and Camurri, A. Ghost in the Cave: an interactive collaborative game using non-verbal communication. In Camurri, A., and Volpe, G. (Eds.), Gesture-based Communication in Human-Computer Interaction, 549-556, Springer Verlag, Berlin, 2004.
[7] Siegel, W., and Jacobsen, J. The Challenges of Interactive Dance: An Overview and Case Study. Computer Music Journal, 22(4), 29-43, 1998.
[8] Sjöstedt Edelholm, E., and Wigert, A. Att känna rörelse - en danspedagogisk metod (To know/feel movement: a dance pedagogical method). Carlsson, Stockholm, 2005.
Mappe per Affetti Erranti: a Multimodal System for Social Active Listening and Expressive Performance
Antonio Camurri, Corrado Canepa, Paolo Coletta, Barbara Mazzarino, Gualtiero Volpe
InfoMus Lab – Casa Paganini
DIST - University of Genova
Piazza Santa Maria in Passione 34
16123 Genova, Italy
+39 010 2758252
{toni, corrado, colettap, bunny, volpe}@infomus.org
NIME08, June 4-8, 2008, Genova, Italy. Copyright remains with the author(s).

ABSTRACT
This paper presents our new system Mappe per Affetti Erranti (literally Maps for Wandering Affects), enabling a novel paradigm for social active experience and dynamic molding of the expressive content of a music piece. Mappe per Affetti Erranti allows multiple users to interact with the music piece at several levels. On the one hand, multiple users can physically navigate a polyphonic music piece, actively exploring it; on the other hand, they can intervene on the music performance, modifying and molding its expressive content in real time through full-body movement and gesture. An implementation of Mappe per Affetti Erranti was presented in the framework of the science exhibition “Metamorfosi del Senso”, held at Casa Paganini, Genova, in October – November 2007. On that occasion Mappe per Affetti Erranti was also used for a contemporary dance performance. The research topics addressed in Mappe per Affetti Erranti are currently being investigated in the new EU-ICT Project SAME (Sound and Music for Everyone, Everyday, Everywhere, Every Way, www.sameproject.eu).

Keywords
Active listening of music, expressive interfaces, full-body motion analysis and expressive gesture processing, multimodal interactive systems for music and performing arts applications, collaborative environments, social interaction.

1. INTRODUCTION
Music making and listening are a clear example of a human activity that is above all interactive and social. However, mediated music making and listening today is usually still a passive, non-interactive, and non-context-sensitive experience. Current electronic technologies, with all their potential for interactivity and communication, have not yet been able to support and promote this essential aspect of music making and listening. This can be considered a degradation of the traditional listening experience, in which the public can interact in many ways with performers to modify the expressive features of a piece.
The need to recover such an active attitude towards music is strongly emerging, and novel paradigms of active
experience are being developed. By active experience and active listening we mean that listeners are enabled to operate interactively on music content, modifying and molding it in real time while listening. Active listening is the basic concept for a novel generation of interactive music systems [1], which are particularly addressed to a public of beginners, naïve and inexperienced users, rather than to professional musicians and composers.
Active listening is also a major focus of the new EU-ICT Project SAME (Sound and Music for Everyone, Everyday, Everywhere, Every Way, www.sameproject.eu). SAME aims at: (i) defining and developing an innovative networked end-to-end research platform for novel mobile music applications, allowing new forms of participative, experience-centric, context-aware, social, shared, active listening of music; (ii) investigating and implementing novel paradigms for natural, expressive/emotional multimodal interfaces, empowering the user to influence, interact, mould, and shape the music content by intervening actively and physically in the experience; and (iii) developing new mobile context-aware music applications, starting from the active listening paradigm, which will bring the social and interactive aspects of music back to our information technology age.
In the direction of defining novel active listening paradigms, we recently developed a system, the Orchestra Explorer [2], which allows users to physically navigate inside a virtual orchestra, to actively explore the music piece the orchestra is playing, and to modify and mold the music performance in real time through expressive full-body movement and gesture. By walking and moving on the surface, the user discovers each single instrument and can operate through her expressive gestures on the music the instrument is playing. The interaction paradigm developed in the Orchestra Explorer is strongly based on the concept of navigation in a physical space where the orchestra instruments are placed. The Orchestra Explorer is intended for use by a single user.
Our novel multimodal system for social active listening, Mappe per Affetti Erranti, builds on the Orchestra Explorer. It draws on the lessons learned in over one year of permanent installation of the Orchestra Explorer at our site at Casa Paganini, and in several installations of the Orchestra Explorer at science exhibitions and public events.
Mappe per Affetti Erranti extends and enhances the Orchestra Explorer in two major directions. On the one hand, it reworks and extends the concept of navigation by introducing multiple levels: from navigation in a physical space populated by virtual objects or subjects (as in the Orchestra Explorer) up to navigation in virtual emotional spaces populated by
different expressive performances of the same music piece. Users can navigate such affective spaces by their expressive movement and gesture. On the other hand, Mappe per Affetti Erranti explicitly addresses use by multiple users and encourages collaborative behavior: only social collaboration allows a correct reconstruction of the music piece. In other words, while users explore the physical space, the (expressive) way in which they move and the degree of collaboration between them allow them to explore an affective, emotional space at the same time.
Section 2 presents the concept of Mappe per Affetti Erranti. Section 3 focuses on the specific aspect of expressive movement analysis and describes the model we designed for navigating the affective space. Sections 4 and 5 illustrate the implementation of an installation of Mappe per Affetti Erranti developed for the science exhibit “Metamorfosi del Senso” (Casa Paganini, Genova, Italy, October 25 – November 6, 2007). The conclusions summarize some issues and future work that emerged from this installation.

2. CONCEPT
The basic concept of Mappe per Affetti Erranti is the collaborative active listening of a music piece through the navigation of maps at multiple levels, from the physical level to the emotional level.
At the physical level, space is divided into several areas. A voice of a polyphonic music piece is associated with each area. The presence of a user (even a single user) triggers the reproduction of the music piece. By exploring the space, the user walks through several areas and listens to the single voices separately. If the user stays in a single area, she listens only to the voice associated with that area. If the user does not move for a given time interval, the music fades out and turns off.
The user can mold the voice she is listening to in several ways. At a low level, she can intervene on parameters such as loudness, density, and amount of reverberation. For example, by opening her arms, the user can increase the density of the voice (she listens to two or more voices in unison). If she moves toward the back of the stage, the amount of reverberation increases, whereas toward the front of the stage the voice becomes drier.
At a higher level, the user can intervene on the expressive features of the music performance. This is done through the navigation of an emotional, affective space. The system analyzes the expressive intention the user conveys with her expressive movement and gesture and translates it into a position (or a trajectory) in an affective, emotional space. Like the physical space, this affective, emotional space is also divided into several areas, each one corresponding to a different performance of the same voice with a different expressive intention. Several examples of such affective, emotional spaces are available in the literature, for example the spaces used in dimensional theories of emotion (e.g., see [3][4]) or those developed especially for the analysis and synthesis of expressive music performance (e.g., see [5][6][7]).
Users can thus explore the music piece from a twofold perspective: navigating the physical space, they explore the polyphonic musical structure; navigating the affective, emotional space, they explore music performance. A single user, however, can only listen to and intervene on a single voice at a time: she cannot listen to the whole polyphonic piece with all the voices.
Only a group of users can fully experience Mappe per Affetti Erranti. In particular, the music piece can be listened to in its whole polyphony only if a number of users at least equal to the
number of voices is interacting with the installation. Moreover, since each user controls the performance of the voice associated with the area she occupies, the whole piece is performed with the same expressive intention only if all the users are moving with the same expressive intention. Thus, the more users move with different, conflicting expressive intentions, the more incoherent and chaotic the musical output becomes. Conversely, the more users move with similar expressive intentions and in a collaborative way, the more coherent the musical output becomes, and the music piece is heard in one of its different expressive performances.
Mappe per Affetti Erranti can therefore be experienced at several levels:
- by a single user, who has a limited but still powerful set of interaction possibilities;
- by a group of users, who can fully experience the installation;
- by multiple groups of users.
In fact, each physical area can be occupied by a group of users. In this case each single group is analyzed, and each participant in a group contributes to intervening on the voice associated with the area the group is occupying. Therefore, at this level collaborative behavior is encouraged both among the participants in each single group and among the groups participating in the installation.
The possibility of observing a group or multiple groups of users during their interaction with Mappe per Affetti Erranti makes this installation an ideal test-bed for investigating and experimenting with group dynamics and social network scenarios.

3. EXPRESSIVE MOVEMENT ANALYSIS
This section focuses on a specific and relevant aspect of Mappe per Affetti Erranti, i.e., how the system analyzes the expressive intentions conveyed by a user through her expressive movement and gesture. This information is used for navigating the affective, emotional space and for controlling the expressive performance of a voice in the polyphonic music piece.
Expressive movement analysis is discussed with reference to an implementation of Mappe per Affetti Erranti we recently developed (see Section 4). In this implementation we selected four different expressive intentions: the first refers to a happy, joyful behavior, the second to solemnity, the third to an intimate, introverted, shy behavior, and the fourth to anger. To make the description easier, we label these expressive intentions Happy, Solemn, Intimate, and Angry. Please note, however, that we consider the reduction to such labels too simplistic a way of describing very subtle nuances of both movement and music performance. In fact, we never described Mappe per Affetti Erranti to users in terms of such labels. Rather, we provided (when needed) more articulated descriptions of the kind of expressive behavior we (and the system) expected, and we let users discover the installation themselves step by step.
These four expressive intentions were selected because they are different and characterized enough to be easily conveyed and recognized by users. Furthermore, they are examples of low/high positive/negative affective states that can be easily mapped onto existing dimensional theories of emotion (e.g., valence-arousal or Tellegen's space).

3.1 Feature extraction
In our current implementation, analysis of expressive gesture is performed by means of twelve expressive descriptors: Quantity of Motion computed on the overall body movement and on translational movement only, Impulsiveness, vertical and horizontal components of velocity of peripheral upper parts of the body, speed of the barycentre, variation of the Contraction Index, Space Occupation Area, Directness Index, Space Allure,
Amount of Periodic Movement, Symmetry Index. Such following motion phase. The variance of such inter-onset
descriptors are computed in real-time for each user. Most of intervals is taken as an approximate measure of PM.
descriptors are computed on a time window of 3 s. In the
context of Mappe per Affetti Erranti, we considered such time Symmetry Index (SI) is computed from the position of the
interval as a good trade-off between the need on the one hand of barycenter and the left and right edges of the body bounding
having an enough responsive system, and the need on the other rectangle. That is, it is the ratio between the difference of the
hand to give the users a time long enough for displaying an distances of the barycenter from the left and right edges and the
expressive intention. width of the bounding rectangle:
Quantity of Motion (QoM) provides an estimation of the | | | |

amount of overall movement (variation of pixels) the | |
videocamera detects [8]. Quantity of Motion computed on
where xB is the x coordinate of the barycentre, xL is the x
translational movement only (TQoM) provides an estimation of
coordinate of the left edge of the body bounding rectangle and
how much the user is moving around the physical space. Using
xR is x coordinate of the left edge of the body bounding
Rudolf Laban’s terminology [9][10], whereas Quantity of
rectangle.
Motion measures the amount of detected movement in both the
Kinesphere and the General Space, its computation on 3.2 Classification
translational movements refers to the overall detected In order to classify movement with respect to the four
movement in the General Space only. TQoM, together with expressive intentions, the values of the descriptors are quantized
speed of barycentre (BS) and variation of the Contraction Index in five levels: Very Low, Low, Medium, High, Very High.
(dCI) are introduced to distinguish between the movement of
the body in the General Space and the movement of the limbs in Starting from previous works by the authors (e.g., [8][13][14])
the Kinesphere. Intuitively, if the user moves her limbs but does and results of psychological studies (e.g., [11][15][16][17]), we
not change her position in the space, TQoM and BS will have characterized each expressive intention with a combination of
low values, while QoM and dCI will have higher values.

Impulsiveness (IM) is measured as the variance of the Quantity of Motion in a time window of 3 s. In other words, a user is considered to move impulsively if the amount of her movement that the video camera detects changes considerably within the time window.

The vertical and horizontal components of the velocity of the peripheral upper parts of the body (VV, HV) are computed from the positions of the upper vertices of the body's bounding rectangle. The vertical component, in particular, is used to detect upward movements, which psychologists (e.g., Boone and Cunningham [11]) have identified as a significant indicator of positive emotional expression.

Space Occupation Area (SOA) is computed from the movement trajectory integrated over time. This yields a bitmap summarizing the trajectory followed during the considered time window (3 s). An elliptical approximation of the shape of the trajectory is then computed [12], and the area of this ellipse is taken as the Space Occupation Area of the movement trajectory. Intuitively, a trajectory spread over the whole space gets high SOA values, whereas a trajectory confined to a small region gets low SOA values.

Directness Index (DI) is computed as the ratio between the length of the straight line connecting the first and last points of a given trajectory (in this case, the movement trajectory in the selected 3 s time window) and the sum of the lengths of the segments composing the trajectory.

Space Allure (SA) measures local deviations from the straight-line trajectory. Whereas DI indicates whether the trajectory followed during the 3 s time window is direct or flexible, SA refers to waving movements around the straight trajectory in shorter time windows. In the current implementation SA is approximated by the variance of DI in a time window of 1 s.

The Amount of Periodic Movement (PM) gives preliminary information about the presence of rhythmic movements. Computation of PM starts from QoM: movement is segmented into motion and pause phases using a threshold on QoM [13]; inter-onset intervals are then computed as the time elapsing between the beginning of a motion phase and the beginning of the

levels for each descriptor. The Happy intention is characterized by high-energy, upward, and fluent movement; almost constant values of the kinematic descriptors with periodic peaks are associated with Solemn behavior; low-energy, contracted, and localized movements are typical of shy (Intimate) behavior; impulsiveness, high energy, and rigid movement are associated with Anger. Table 1 summarizes this characterization.

Classification follows a fuzzy-logic-like approach. Such an approach has the advantage that it does not need a training set of recorded movements, and it is flexible enough to be applied to the movement of different kinds of users (e.g., adults, children, elderly people).

The expressive intention EI, referred to a time window covering the last 3 seconds of movement, is computed as:

    EI = \arg\max_{h = 1, \dots, M} \; \sum_{k=1}^{N} w_k \, f_{hk}(x_k)

where M = 4 is the number of expressive intentions, N = 12 is the number of motion descriptors, w_k is the weight of the k-th motion descriptor, x_k is the value of the k-th motion descriptor, and f_hk is a function applied to the k-th motion descriptor. The function applied depends on the level expected for that motion descriptor when the h-th expressive intention is conveyed. In this first implementation the weights were selected empirically while developing and testing the system.

A Gaussian function is applied to motion descriptors whose values are expected to be High, Medium, or Low:

    f_{hk}(x_k) = A \exp\left( -\frac{(x_k - \mu_{hk})^2}{2\sigma_{hk}^2} \right)

where A is the amplitude (set to 1), μ_hk is the expected value of descriptor k when the user shows expressive intention h, and the variance σ_hk² is used for tuning the range of values for which the descriptor can be considered to be at the appropriate level (High, Medium, Low).

A sigmoid is applied to descriptors expected to be Very High or Very Low:

    f_{hk}(x_k) = \frac{1}{1 + e^{\, s_{hk} \lambda_{hk} (x_k - \theta_{hk})}}
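A minimal Python sketch of the classification rule of this section, EI = argmax over h of Σ_k w_k f_hk(x_k), with a Gaussian membership for descriptors expected to be High, Medium or Low and a sigmoid membership for Very High or Very Low ones. It is an illustration, not the EyesWeb module used in the installation: it keeps only three of the twelve descriptors (QoM, IM, PM, with levels taken from Table 1), assumes descriptor values normalized to [0, 1], and the numeric centres (0.2, 0.5, 0.8), the spread σ = 0.15, the steepness λ = 20, the sigmoid thresholds (0.8 for Very high, 0.2 for Very low) and the unit weights are all hypothetical, since the paper only states that such parameters were set empirically.

```python
import math

# Hypothetical numeric encoding of the qualitative levels of Table 1;
# the paper does not publish the values actually used.
GAUSS_CENTRES = {"Low": 0.2, "Medium": 0.5, "High": 0.8}
SIGMA = 0.15       # assumed spread of every Gaussian
STEEPNESS = 20.0   # assumed lambda of every sigmoid


def gaussian(x, mu, sigma, amplitude=1.0):
    """Membership for descriptors expected to be High, Medium or Low."""
    return amplitude * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def sigmoid(x, theta, lam, s):
    """1 / (1 + exp(s * lam * (x - theta))), computed without overflow.

    s = +1 for descriptors expected to be Very Low (tends to 1 for low x);
    s = -1 for descriptors expected to be Very High (tends to 1 for high x).
    """
    z = s * lam * (x - theta)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def membership(level):
    """Return the function f_hk associated with one qualitative level."""
    if level in GAUSS_CENTRES:
        mu = GAUSS_CENTRES[level]
        return lambda x: gaussian(x, mu, SIGMA)
    if level == "Very high":
        return lambda x: sigmoid(x, 0.8, STEEPNESS, -1)
    if level == "Very low":
        return lambda x: sigmoid(x, 0.2, STEEPNESS, +1)
    raise ValueError(f"unknown level: {level}")


# Expected levels of QoM, IM and PM for each intention (rows of Table 1).
EXPECTED = {
    "Happy":    ("High", "Medium", "High"),
    "Solemn":   ("Low", "Low", "Very high"),
    "Intimate": ("Low", "Very low", "Low"),
    "Angry":    ("High", "Very high", "Very low"),
}


def classify(x, weights=(1.0, 1.0, 1.0)):
    """EI = argmax_h sum_k w_k * f_hk(x_k); returns (intention, scores)."""
    scores = {
        h: sum(w * membership(lvl)(xk) for w, lvl, xk in zip(weights, levels, x))
        for h, levels in EXPECTED.items()
    }
    return max(scores, key=scores.get), scores
```

With these assumed parameters, classify((0.8, 0.5, 0.8)) returns "Happy", because each Happy membership is evaluated exactly at its peak and the Happy score reaches the maximum possible value of 3. The sigmoid is evaluated through whichever exponential has a non-positive argument, which avoids overflow for steep sigmoids.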
In the sigmoid function, θ_hk is used for tuning the range of values for which the descriptor can be considered to be at the appropriate level (Very High or Very Low), λ_hk controls the steepness of the sigmoid, and s_hk controls its type: s_hk = 1 if the descriptor is expected to be Very Low, and s_hk = -1 if it is expected to be Very High (an inverse sigmoid tending to 1 for high values).

Intuitively, the output of the Gaussian and sigmoid functions measures how close the actual value of a motion descriptor is to the value expected for a given expressive intention. For example, if a motion descriptor is expected to be Low for a given expressive intention and its expected value is 0.4, a Gaussian with its peak (normalized to 1) centered at 0.4 is used. That motion descriptor therefore provides its highest contribution to the overall sum when its actual value equals the expected value. As a consequence, the highest sum is obtained for the expressive intention whose expected descriptor levels (Table 1) best match the actual computed values.

Table 1. Expected levels of each motion descriptor for the four expressive intentions

Motion descriptor   Happy          Solemn         Intimate       Angry
QoM                 High           Low            Low            High
TQoM                High           Low            Low            High
IM                  Medium         Low            Very low       Very high
VV                  High           Low            Low            Medium
HV                  High           Medium         Low            High
BS                  Not relevant   Not relevant   Low            Medium
dCI                 Medium         Low            Low            Very high
SOA                 Not relevant   Not relevant   Low            High
DI                  Medium         High           Low            Low
SA                  Low            Low            Medium         Low
PM                  High           Very high      Low            Very low
SI                  Medium         Medium         Not relevant   Low

4. THE INSTALLATION AT THE SCIENCE EXHIBITION “METAMORFOSI DEL SENSO”
Mappe per Affetti Erranti was presented for the first time at the science exhibition “Metamorfosi del Senso”, held at Casa Paganini, Genova, Italy, from October 25th to November 6th, 2007. The exhibition was part of “Festival della Scienza”, a large international science festival held in Genova every year. Mappe per Affetti Erranti was installed on the stage of the 250-seat auditorium at Casa Paganini, an international center of excellence for research on sound, music, and new media, where InfoMus Lab has its main site. The installation covered a surface of about 9 m × 3.5 m. A single video camera observed the whole surface from the top, about 7 m high and at a distance of about 10 m from the stage (we did not use sensors or additional video cameras in this first experience). Four loudspeakers were placed at the four corners of the stage for audio output. A white screen covered the back of the stage for its whole 9 m width; it served as scenery, since the current implementation of the installation does not include video feedback. Lights were set to enhance the users' feeling of immersion and to light the stage homogeneously.

The music piece we selected is “Come again” by John Dowland for four singing voices: contralto, tenore, soprano, and basso. With the help of singer Roberto Tiranti and composer Marco Canepa we chose a piece that could be convincingly interpreted with different expressive intentions (i.e., without becoming ridiculous) and that non-expert users would find interesting and agreeable. We asked professional singers to sing it with four different expressive intentions: Happy, Solemn, Intimate, and Angry. The piece was performed so that changes in the interpretation could be perceived even by non-expert users.

The physical map consists of four parallel rectangular areas on the stage. The tenore and soprano voices are associated with the central areas, the contralto and basso with the lateral ones. This arrangement alternates female and male voices and attracts users toward the “stronger” voices, i.e., the central ones.

Navigation in the affective, emotional space is obtained with the techniques for expressive movement analysis and classification discussed in Section 3. As for the music performance, each recorded file was manually segmented into phrases and sub-phrases. A change in the expressive intention detected from movement triggers a switch to the corresponding audio file, at a position coherent with the position reached by that expressive interpretation as a result of the movement of other users or groups. In this way the single voices are continuously resynchronized depending on the expressive intentions conveyed by users.

For the opening of “Metamorfosi del Senso”, choreographer Giovanni Di Cicco and his dance ensemble designed and performed a contemporary dance performance on Mappe per Affetti Erranti. In this performance, dancers interacted with the installation for over 20 min, repeatedly moving from order to chaos. The dance performance was attended by more than 400 people over 3 days. Figure 1 shows a moment of the dance performance and a group of users experiencing Mappe per Affetti Erranti. The installation was experienced by more than 1500 people during “Metamorfosi del Senso”, with generally positive and sometimes enthusiastic feedback.

5. IMPLEMENTATION: THE EYESWEB XMI OPEN PLATFORM AND THE EYESWEB EXPRESSIVE GESTURE PROCESSING LIBRARY
The instance of Mappe per Affetti Erranti we developed for the exhibit “Metamorfosi del Senso” was implemented using a new version of our EyesWeb open platform [13][18]: EyesWeb XMI (for eXtended Multimodal Interaction). The EyesWeb open platform and related libraries are available for free on the EyesWeb website www.eyesweb.org.

With respect to its predecessors, EyesWeb XMI strongly enhances support for the analysis and processing of synchronized streams at different sampling rates (e.g., audio, video, sensor data). We exploited this support for the synchronized processing and reproduction of the audio tracks of “Come Again”. The whole installation was implemented as a pair of
EyesWeb applications (patches): the first one managing video processing, extraction of expressive features from movement and gestures, and navigation in the physical and affective space; the second one devoted to audio processing, real-time audio mixing, and control of audio effects. Every single component of the two applications was implemented as an EyesWeb sub-patch. The two applications ran on two workstations (Dell Precision 380, equipped with two Pentium 4 CPUs at 3.20 GHz, 1 GB RAM, Windows XP Professional) connected by a fast network.

Extraction of expressive descriptors and models for navigating physical and expressive spaces were implemented as EyesWeb modules (blocks) in a new version of the EyesWeb Expressive Gesture Processing Library.

Figure 1. Mappe per Affetti Erranti: top, a snapshot from the dance performance; bottom, a group of users interacting with the installation.

6. CONCLUSIONS
From our experience with Mappe per Affetti Erranti, especially at the science exhibit “Metamorfosi del Senso”, several issues emerged that need to be taken into account in future work.

A first issue concerns the expressive movement descriptors and the ways in which Mappe per Affetti Erranti can be experienced. The installation can be experienced by a single user, by a group, or by multiple groups. However, the expressive descriptors have been defined and developed for analyzing the movement and expressive intention of single users. To what extent can they be applied to groups of users? Can we approximate the expressive intention of a group as a kind of average of the expressive intentions conveyed by its members, or do more complex group dynamics have to be taken into account? Research on computational models of emotion, affective computing, and expressive gesture processing usually focuses on the expressive content communicated by single users. Such group dynamics and their relationships with emotional expression are still largely uninvestigated.

Another issue concerns the robustness of the selected expressive movement descriptors with respect to different analysis contexts. For example, the kind of motion a user performs when she stays inside an area of the space often differs in several respects from the motion she performs when wandering around the whole space. Motion inside an area is characterized by movement of the limbs: the amount of energy mainly depends on how much the limbs move, and the expressive intention is conveyed through movement in the Kinesphere. Walking, instead, is the main action characterizing motion around the space: the energy of walking is much higher than that of any other arm movements, and the expressive intention is conveyed through the walking style. The system should be able to adapt to such different analysis contexts, and sets of motion descriptors should be developed that are either specific to a given context or robust across contexts.

Future work will also include refinements to the classifier and formal evaluation with users. The classifier encompasses many parameters (e.g., weights, parameters of the functions applied to the movement descriptors) that need to be fine-tuned. In this first installation such parameters were set empirically during tests with dancers and potential users. However, a deeper investigation based on rigorous experiments would be needed to identify a minimum set of statistically significant descriptors and to find suitable values or ranges of values for their parameters. Formal evaluation with professional and non-expert users is needed to correctly estimate the effectiveness and usability of the installation.

Such future work will be addressed in the framework of the EU-ICT Project SAME (www.sameproject.eu), focusing on new forms of participative and social active listening of music.

7. ACKNOWLEDGMENTS
We thank our colleague and composer Nicola Ferrari for his valuable contribution in developing the concept of Mappe per Affetti Erranti; choreographer Giovanni Di Cicco and singer Roberto Tiranti for the useful discussions and stimuli during the preparation of the dance and music performance; and composer Marco Canepa for recording and preparing the audio material. We also thank singers Valeria Bruzzone, Chiara Longobardi, and Edoardo Valle, who performed “Come Again” with different expressive intentions together with Roberto Tiranti, and dancers Luca Alberti, Filippo Bandiera, and Nicola Marrapodi, who performed the dance piece on Mappe per Affetti Erranti together with Giovanni Di Cicco. Finally, we thank our colleagues at DIST – InfoMus Lab for their concrete support to this work, Festival della Scienza, and the visitors of the science exhibition “Metamorfosi del Senso”, whose often enthusiastic feedback strongly encouraged us to pursue this research.

8. REFERENCES
[1] Rowe R. Interactive Music Systems: Machine Listening and Composing. MIT Press, Cambridge MA, 1993.
[2] Camurri A., Canepa C., and Volpe G. Active listening to a virtual orchestra through an expressive gestural interface: The Orchestra Explorer. In Proceedings of the 7th Intl. Conference on New Interfaces for Musical Expression (NIME-07) (New York, USA, June 2007).
[3] Russell J. A. A circumplex model of affect. Journal of Personality and Social Psychology, 39 (1980), 1161-1178.
[4] Tellegen A., Watson D., and Clark L. A. On the dimensional and hierarchical structure of affect. Psychological Science, 10, 4 (Jul. 1999), 297-303.
[5] Juslin P. N. Cue utilization in communication of emotion in music performance: relating performance to perception. Journal of Experimental Psychology: Human Perception and Performance, 26, 6 (2000), 1797-1813.
[6] Canazza S., De Poli G., Drioli C., Rodà A., and Vidolin A. Audio Morphing Different Expressive Intentions for Multimedia Systems. IEEE Multimedia, 7, 3 (2000), 79-83.
[7] Vines B. W., Krumhansl C. L., Wanderley M. M., Ioana M. D., and Levitin D. J. Dimensions of Emotion in Expressive Musical Performance. Ann. N.Y. Acad. Sci., 1060 (2005), 462-466.
[8] Camurri A., Lagerlöf I., and Volpe G. Recognizing Emotion from Dance Movement: Comparison of Spectator Recognition and Automated Techniques. International Journal of Human-Computer Studies, 59, 1-2 (2003), 213-225, Elsevier Science.
[9] Laban R., and Lawrence F. C. Effort. Macdonald & Evans Ltd., London, 1947.
[10] Laban R. Modern Educational Dance. Macdonald & Evans Ltd., London, 1963.
[11] Boone R. T., and Cunningham J. G. Children's decoding of emotion in expressive body movement: The development of cue attunement. Developmental Psychology, 34 (1998), 1007-1016.
[12] Kilian J. Simple Image Analysis By Moments. Open Computer Vision (OpenCV) Library documentation, 2001.
[13] Camurri A., De Poli G., Leman M., and Volpe G. Toward Communicating Expressiveness and Affect in Multimodal Interactive Systems for Performing Art and Cultural Applications. IEEE Multimedia Magazine, 12, 1 (Jan. 2005), 43-53.
[14] Camurri A., Mazzarino B., Ricchetti M., Timmers R., and Volpe G. Multimodal analysis of expressive gesture in music and dance performances. In A. Camurri, G. Volpe (Eds.), Gesture-based Communication in Human-Computer Interaction, LNAI 2915, 20-39, Springer Verlag, 2004.
[15] Wallbott H. G. Bodily expression of emotion. European Journal of Social Psychology, 28 (1998), 879-896.
[16] Argyle M. Bodily Communication. Methuen & Co Ltd, London, 1980.
[17] De Meijer M. The contribution of general features of body movement to the attribution of emotions. Journal of Nonverbal Behavior, 13 (1989), 247-268.
[18] Camurri A., Coletta P., Demurtas M., Peri M., Ricci A., Sagoleo R., Simonetti M., Varni G., and Volpe G. A Platform for Real-Time Multimodal Processing. In Proceedings of the International Conference Sound and Music Computing 2007 (SMC2007) (Lefkada, Greece, July 2007).
NEW DATA STRUCTURE FOR OLD MUSICAL OPEN WORKS

Sergio Canazza
Centro Polifunzionale di Pordenone, University of Udine
Via Prasecco 3/A - 33170 Pordenone
+39 0434239404
sergio.canazza@uniud.it

Antonina Dattolo
Dept. of Informatics and Mathematics, University of Udine
Via delle Scienze 206 - 33100 Udine
+39 0432558757
antonina.dattolo@dimi.uniud.it
ABSTRACT
Musical open works can often be thought of as sequences of musical structures, which can be arranged by anyone who has access to them and wishes to realize the work. This paper proposes an innovative agent-based system to model the information and organize it into structured knowledge; to create effective, graph-centric browsing perspectives and views for the user; and to use authoring tools for the performance of open works of electro-acoustic music.

Keywords
Musical Open Work, Multimedia Information Systems, Software Agents, zz-structures.

1. INTRODUCTION
A classical musical composition (a Beethoven symphony, a Mozart sonata, or Stravinsky's Rite of Spring) posits an assemblage of sound units that the composer arranged in a closed, well-defined manner before presenting it to the listener. He converted his idea into conventional symbols, obliging (more or less) the (eventual) performer to reproduce the format devised by the composer himself. By contrast, a number of music pieces (or, more generally, of multimedia works) share a common feature: the considerable autonomy left to the individual performer¹ in the way he chooses to play the work. He is not merely free to interpret the composer's instructions at his own discretion (as happens in traditional music); he must impose his judgment on the form of the piece, as when he decides in what order to group the sounds: a real act of improvised creation.

In Klavierstück XI, Karlheinz Stockhausen presents to the performer a single large sheet of music paper with a series of note groupings; the performer then has to choose among these groupings, first for the starting unit and then for the successive ones, in the order in which he elects to weld them together. In this way, he can mount the sequence of musical units in the order he chooses, changing the combinative structure of the piece. In Pierre Boulez' Third Sonata for piano, the first section (Antiphonie, Formant 1) is made up of ten different pieces on ten corresponding sheets of music paper. These can be arranged in different sequences like a stack of filing cards, though not all possible permutations are permissible. A particularly representative example of a musical open work is Scambi, an analogue tape work created in 1957 by the Belgian composer Henri Pousseur at the Studio di Fonologia in Milan. By means of a specific process called dynamic filtering (realized with special equipment designed by Alfredo Lietti, the studio's engineer), the composer was able to extract animated time structures from noise, to process them further in different parameters, and thus to produce 32 sequences. These sequences can be arranged by anyone who has access to them and wishes to realize the work, according to certain rules regarding their order and possible overlapping.

¹ As is well known, the practical intervention of a performer (the musician who plays a music piece) is different from that of an interpreter in the sense of a consumer (somebody who listens to a musical composition performed by somebody else). In this context, however, both cases are seen as different manifestations of the same interpretative attitude [5].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

Today several multimedia artistic works are ‘in progress’, ‘open’ in structure or in realization. In these works the audio recording is embedded in a complex audio signal processing procedure in which different writing systems (video, text, static pictures, gestures) flow together. This creates the need to define suitable data structures and to convey, in a single (digital) medium, verbal and musical documents, pictures, and audio and video signals.

The aim of this paper is to provide an innovative system for the performance of open works of electro-acoustic music, considered as a representative subset of multimedia artistic works. We introduce innovative authoring tools able to make a musical work open, enabling the user to become the author of new versions of a given work.

These topics are addressed using an agent-based extension of the ZigZag model. Conceiving a system as an aggregation of autonomous, cooperating agents is one of the most exciting aspects of the challenging arena known as multi-agent systems (MAS); this perspective radically changes how a model may be conceived and how it works [3]. In addition, the ZigZag model guarantees a graph-centric browsing tool that supports the representation of persistent context: the user visualizes the unit of interest in relation to other associated units. Moreover, he sees the unit of interest from that unit's position relative to multiple perspectives or orientations.
As a case study, we chose the complex electro-acoustic open work cited above (Scambi by Henri Pousseur), characterized by a variety of sequences and by different degrees of freedom in performance.

2. ACTOR-BASED ZIGZAG MODEL
To prepare the presentation of our model in Section 3, this section introduces, in two separate subsections (2.1 and 2.2), the basics of the ZigZag model and a brief description of actors, a particular class of computational agents.

2.1 The ZigZag model
ZigZag [6] introduces a new, graph-centric system of conventions for data and computing. It separates the structure of information from its visualization (i.e., the way the data, whether text, audio, or video, is presented to the user); therefore a ZigZag structure can handle all the different visualizations needed to realize an Electronic Edition of musical works.

The main element of the ZigZag model is the zz-structure: it can be viewed as a multigraph with colored edges, with the restriction that every vertex, called a zz-cell, has at most two incident edges of the same color. The sub-graphs, each of which contains the edges of a single color, are called dimensions. The cells in the same dimension are linked into linear, directed sequences called ranks. Each dimension can contain a number of parallel ranks, each of which is a series of distinct cells connected sequentially.

Since there is no canonical visualization, the pseudo-space generated by zz-structures is called zz-space and may be viewed in various ways. A view is a presentation of some portion of zz-space and is produced by a view program, which visualizes, for example, a region around a particular cursor.

A 2D view can be drawn by picking a single cell as a focal point and drawing the neighborhood around that cell along two chosen dimensions. By changing the chosen pair of dimensions, we can visually reveal, hide, and rearrange nodes in interesting ways. Since a zz-structure may be very large and there is usually not enough room in the 2D view for all of the cells, we restrict the size of the 2D view.

Some observations on the zz-cells are necessary. They are the principal unit of the system and are conceived not only as passive containers of primitive data (e.g., text, graphics, audio) but can also have types, based either on their functions or on the types of data they contain. Thus, a zz-cell may have a variety of different properties and functions: it may contain executable programs or scripts (a cell of this type is called a progcell), or represent a package of different cells (a referential cell).

Analogous observations can be made on the dimensions: they can be passive and nominal (merely receiving and presenting data) or operational, programmed to monitor changing zz-structures and events and to calculate and present results automatically (for example, the dimensions d.cursor and d.clone). These considerations show that it is reductive to treat zz-cells as passive entities, like simple nodes in a graph. We therefore chose to model a zz-cell by means of a specific class of computational agents, the actors.

2.2 Brief description of the actor model
The actor model [1] is a model of concurrent computation in distributed systems. It is organized as a universe of inherently autonomous computational agents, called actors, which interact with each other by sending messages, overcoming the sequential limitations of passive objects.

Each actor is defined by three parts: a passive part, which is a set of local variables, termed acquaintances, that constitute its internal state; an active part, which reacts to the external environment by executing its procedural skills, called scripts, and constitutes the actor's behaviour; and a mail queue, which buffers incoming communication (i.e., messages).

Each actor has a unique name (the uniqueness property) and a given behaviour; it communicates with other actors via asynchronous messages. Actors are reactive in nature, i.e., they execute only in response to the messages they receive.

On receiving a message, an actor can perform three basic actions: create a finite number of actors with universally fresh names, send a finite number of messages, and assume a new behaviour.

The actor's behaviour is deterministic in that its response to a message is uniquely determined by the message contents and its internal state. Furthermore, all actions performed on receiving a message are concurrent.

To describe the actors in our model we adopt the formalism used in [3]:

(DefActor ActorName
    [inherits-from Class-Name]
    <acquaintances list>
    {scripts list})

Thus an actor is described by specifying its superclass, its data part, and its script part; the script part represents the set of scripts that can be executed.

3. THE MODEL
The architecture of our model is organized in two layers: a component layer contains the zz-cells, which are actors specifically designed to model the domain of audio documents; a meta layer contains actor classes specialized, for example, in managing connections among zz-cells or in generating specific views on them. The interaction between actors is defined using the diagrammatic language AUML (Agent Unified Modelling Language) [8], an extension of UML for agents.

3.1 Component layer
The component layer is defined in relation to the structure of magnetic tape: each open reel is usually composed of several physical segments, i.e., pieces of magnetic tape connected by means of adhesive tape (called a junction). In each segment, the audio signal is recorded on one, two, or more tracks. Following this structure, we define the actors Source, PhysicalSegment, and DigitalSignal. Moreover, the actor LogicalSegment is introduced in order to compare the sources on the basis of a segmentation different from the physical one. The actor Source represents the overall characteristics of the document, such as the tape width (typical values are 1/4, 1/2, 1, and 2 inch) and the cataloguing fields.

(DefActor Source
    <physicalSegments width archive shelfMark inventory
     conservationCondition ...>
    {calculateDuration ...})

It also contains a list of physicalSegments, which make up the open reel tape. This actor is able to perform several
actions; e.g., the script calculateDuration asks each physical segment for its length and rate and calculates the total duration of the tape. The LogicalSegment carries out a virtual partition of the Source.

(DefActor LogicalSegment
    <start length source rate equalization
     noiseReductionSystem tracksLayout digitalSignal audioQuality ...>
    {getQuality getSignalProperties ...})

Its acquaintances are the start time and the duration of the segment, a pointer to the corresponding source, the recording rate (typical values are 7.5 or 15 inch/s), the equalization (e.g., IEC1/CCIR), the noiseReductionSystem (e.g., Dolby or dbx), the tracksLayout (i.e., the number and width of the tracks), a pointer to the digitalSignal representing the audio recorded on each track, and an audioQuality index, which can be specified subjectively by the user. If the audioQuality field is left blank, the script getQuality asks the digitalSignal to estimate the signal-to-noise ratio of each track and returns a quality index. The script getSignalProperties asks the digitalSignal for the digital audio of each track and estimates whether the audio signal is monophonic, stereophonic, or polyphonic. This information does not always correspond to the number of tracks because, following the practice of magnetic recording, a monophonic signal can be recorded on a multi-track format by writing the same signal on all the tracks. A specialization of this actor class is the PhysicalSegment actor, which specifies geometrical features of the segment, such as the angle of its junction.

(DefActor PhysicalSegment
    [inherits-from LogicalSegment]
    <junctionAngle ...>
    {getFadeDuration ...})

The script getFadeDuration, starting from the geometrical properties of the junction, calculates the duration of the fade-in and fade-out of the audio at the segment edges.

The NumericSignal actor represents the audio signal recorded on the tracks.

(DefActor NumericSignal
    <size samplingRate resolution data[i,j]>
    {estimateSNR calculateSTFT
     calculateAmplitudeEnvelope})

Its acquaintances are related to the numeric format: number of samples, sampling rate, and resolution. data[i,j] is a matrix with a row for each track and a column for each sample. Using signal-processing techniques, this actor can calculate the signal-to-noise ratio, the short-time Fourier transform, the amplitude envelope, or other kinds of representations.

3.2 Meta layer
In order to manage the information in a zz-structure context, a meta layer is added. Following the rank definition introduced in the ZigZag model, the actor Rank can be described as an ordered list of actors belonging to the same dimension. Each specific dimension contains one or more ranks. Examples of dimensions are:
• source dimension, which composes given sources as an ordered sequence of segments;
• equivalent segments dimension, which links segments with a common music content present in different sources.
For each dimension, we define specialized classes of actors (such as SourceRank and EquivalentSegmentsRank).

(DefActor SourceRank
    [inherits-from Rank]
    <source ...>
    {swapSegment ...})

(DefActor EquivalentSegmentsRank
    [inherits-from Rank]
    <sources ...>
    {compareQuality ...})

Another type of meta-actor is the generativeProcess; this class is devoted to generating dynamic new virtual hyperdocuments.

(DefActor generativeProcess
    <...>
    {generateView createSource})

4. CASE STUDY
We now propose the analysis of an interesting case study, in which we use a subset of the scripts and actors defined above: Pousseur's electronic work Scambi. Writing about his work in 1959, Pousseur ended by envisaging the day when technology would allow listeners to make their own realizations of the work (either following his ‘connecting rules’ or not) and would give the now active listener the experience of a temporal event open to his intervention, which could therefore be elevated in type, as vital, creative freedom [7]. The active listener becomes, in effect, a composer; reception and interpretation are expressed as (musical) production.

In our paper, Pousseur's invitation to interpret his work creatively as re-composition has been extended to remix and other types of appropriation that were not only permitted but welcomed by the composer [7] (a position that associates him with popular-music culture, in which such freedom is assumed).

In our case study, we collected the original 32 audio sequences realized by Pousseur, thanks to the ‘Scambi Project’, Lansdown Centre for Electronic Arts, School of Arts, Middlesex University, UK (http://www.scambi.mdx.ac.uk).

Thanks to the cooperation of different classes of actors, our system allows the user-author to browse the existing performances of Scambi (by the composer, Luciano Berio, and others) and to create a new virtual source, automatically picking up the audio sections according to the ‘connecting rules’ proposed by the composer. Moreover, the user-author can define his own stochastic rules.

Thanks to the separation of structure (dimensions) from content (zz-cells) in zz-structures, different users can customize their workplace and the visualization of the same contents, creating new dimensions or new virtual performances. This is achieved by activating specific actor collaboration schemes, as shown in Section 4.1.

4.1 New performances
The four acoustic parameters taken into account by Pousseur are:
1) statistical tempo (from slow to fast);
2) relative pitch (from low to high);
3) homogeneity of the sound pattern (from dry to strong reverb);
4) continuity (from long breaks to continuous sound).
Figure 1 details the start and end situations of each sequence.

We define as equivalent segments the sequences that can follow the current sequence on the basis of Pousseur's intention to ensure a transition without break (in the four
142
parameters) from one sequence to the next (a sort of ‘continuity principle’).

Figure 1. Characteristics per sequence (from: Decroupet [4]).

This segmentation process can be iteratively applied to all the sequences, obtaining a set of audio segments linked along two dimensions. The user-author can generate new performances by mixing different sequences, also in a polyphonic structure. To do so, the user can apply deterministic laws (given by the composer), stochastic models, or self-oriented choices; this allows the user to generate new ‘reading’ performances of an open work.

Figure 2. A screenshot of the system. X-axis: time; Y-axis: sequences. The user can realize a polyphonic structure and modify the pitch, the duration, and the volume of each sequence.

We assume that a user is interested in creating a new performance starting from a segment srcn; this request is captured by the meta-actor generativeProcess and forwarded to the source rank rsrc (which manages the logical segments src1, …, src32). rsrc sends a synchronous multicast message (CalculateRule) to all its logical segments. In order to enable the comparison among its features and those of the other segments, each segment (srcj, j = 1, …, 32) assigns this task to the rank re‐sj (which manages the logical segments able to follow srcn on the basis of Pousseur’s ‘continuity principle’). Each re‐sj contacts all its components (e‐srcjs, s = 1, …, m) by synchronous multicast and, following a stochastic law defined off-line by the user or by means of user input, chooses the components with the best match. This information is returned to generativeProcess, which collects the segments and creates the new requested performance. Figure 2 shows a screenshot of the system.

5. CONCLUSION
The era of high modernism, in which the concept of the open work was a radical resistance to this dominant aesthetic, has been relegated to history. Contemporary Western culture, as is well known, assumes that all musical works are open to perpetually renewed interpretation by listeners, musicologists, analysts, and performers [2]. In particular, in the multimedia domain no work is permitted to resist endless (interactive) interpretation. This contemporary situation is partly the effect of the invention of the concept of the open musical work, of which Pousseur was a precursor. For this reason, the interest in Scambi is particularly high today, as also shown by the success obtained by the Scambi Project (www.scambi.mdx.ac.uk/). One effect of our work might be to free the historical musical open work from its iconic status as history, to revive and redefine its specific openness within general (digital and interactive) openness, and to return a continuous presence to it by opening it up to interpretive renewal.

6. REFERENCES
[1] Agha, G. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, 1986.
[2] Ayrey, C. Pousseur’s Scambi (1957), and the new problematics of the open work. Proc. of Symposium on Scambi at Goldsmiths College, University of London, 2005.
[3] Dattolo, A. and Loia, V. Distributed Information and Control in a Concurrent Hypermedia-oriented Architecture. International Journal of SEKE, Vol. 10, n. 6, pp. 345-369, 2000.
[4] Decroupet, P. Vers une théorie générale – Henri Pousseurs “Allgemeine Periodik” in Theorie und Praxis. In MusikTexte 98, pp. 31-43, 2003.
[5] Eco, U. The Role of the Reader: Explorations in the Semiotics of Texts. Indiana University Press, USA, 1979.
[6] Nelson, T. H. Cosmology for a Different Computer Universe. Journal of Digital Information, Vol. 5, Issue 1, 2004.
[7] Pousseur, H. Scambi. In Gravesaner Blätter IV, pp. 36-54, 1959.
[8] Winikoff, M. Toward making agent UML practical: a textual notation and a tool. First International Workshop on Integration of Software Engineering and Agent Technology, Melbourne, Australia, pp. 401-412, 2005.
An Agent-based System for Robotic Musical Performance

Arne Eigenfeldt
School of Contemporary Arts
Simon Fraser University
Burnaby, BC, Canada
arne_e@sfu.ca

Ajay Kapur
School of Music
California Institute of the Arts
Valencia, CA, USA
akapur@calarts.edu

ABSTRACT
This paper presents an agent-based architecture for robotic musical instruments that generate polyphonic rhythmic patterns that continuously evolve and develop in a musically “intelligent” manner. Agent-based software offers a new method for real-time composition that allows for complex interactions between individual voices while requiring very little user interaction or supervision. The system described, Kinetic Engine, is an environment in which individual software agents emulate drummers improvising within a percussion ensemble. Player agents assume roles and personalities within the ensemble, and communicate with one another to create complex rhythmic interactions. In this project, the ensemble consists of a 12-armed musical robot, MahaDeviBot, in which each limb has its own software agent controlling what it performs.

Keywords
Robotic Musical Instruments, Agents, Machine Musicianship.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

1. INTRODUCTION
MahaDeviBot [11, 12] is a robotic drummer with twelve arms, which performs on a number of different instruments from India, including frame drums, shakers, bells, and cymbals. As such, it is, in itself, an ensemble rather than a single instrument; to effectively create music for it, particularly generatively in real-time performance, an intelligent method of interaction between the various instruments is required.

The promise of agent-based composition in musical real-time interactive systems has already been suggested [23, 18, 16], specifically in their potential for emulating human performer interaction. Agents have been defined as autonomous, social, reactive, and proactive [22], attributes also required of performers in improvisation ensembles.

The notion of an “agent” varies greatly: Minsky’s original agents [15] are extremely simple abstractions that require interaction in order to achieve complex results. Recent work by Beyls [1] offers one example of such simple agents, which individually have limited abilities but can co-operate to achieve high levels of musical creation.

The authors’ view of agency is directly related to an existing musical paradigm: the improvising musician. Such an agent must have a much higher level of knowledge but, similar to other multi-agent systems, each agent has a “limited viewpoint” of the artistic objective; as such, collaboration is required between agents to achieve (musical) success.

Kinetic Engine [6, 7], created in Max/MSP, is a real-time generative system in which agents are used to create complex, polyphonic rhythms that evolve over time, similar to how actual drummers might improvise in response to one another. A conductor agent loosely co-ordinates the player agents and manages high-level performance parameters, specifically density: the number of notes played by all agents. Each agent manages one of the percussion instruments of MahaDeviBot, aware of its function within the ensemble and of its specific physical limitations.

2. RELATED WORK
2.1 Multi-agent Systems
Multiple-agent architectures have been used to track beats within acoustic signals [10, 5], in which agents operate in parallel to explore alternative solutions. Agents have also been used in real-time composition [21, 3]. Burtner suggests that multi-agent interactive systems offer the possibility for new complex behaviours in interactive musical interfaces that can “yield complexly organic structures similar to ecological systems”. Burtner’s research has focused upon performance and extending instrumental technologies, rather than interactive composition; as such, his systems are reactive rather than proactive, proactivity being a necessary property of agency.

Dahlstedt and McBurney [4] developed a multi-agent model based upon Dahlstedt’s reflections on his own compositional processes. They suggest such introspection will “yield lessons for the computational modeling of creative processes”. Their system produces output “that (is) not expected or predictable – in other words, a system that exhibits what a computer scientist would call emergent properties.”

Wulfhorst et al. [23] created a multi-agent system where software agents employ beat-tracking algorithms to match their pulse to that of human performers. Although of potential benefit for real-time computer music and robotic performance, the research’s musical goals are rather modest: “Each agent has a
defined rhythmic pattern. The goal of an agent is to play his instrument in synchronism with the others.”

Murray-Rust and Smaill’s AgentBox [17] uses multiple agents in a graphic environment, in which agents “listen” to those agents physically (graphically) close to one another. A human conductor can manipulate the agents, by moving them around, in a “fast and intuitive manner”, allowing people to alter aspects of music “without any need for musical experience”. The stimulus behind AgentBox is to create a system that will “enable a wider range of people to create music,” and facilitate “the interaction of geographically diverse musicians”.

2.2 Rhythm Generation
Various strategies and models have been used to generate complex rhythms within interactive systems. Brown [2] describes the use of cellular automata (CA) to create monophonic rhythmic passages and polyphonic textures in “broad-brush, rather than precisely deterministic, ways.” He suggests “CA provide a great deal of complexity and interest from quite simple initial set-up”. However, complexity generated by CA is no more musical than complexity generated by constrained randomness. Brown recognises this when he states that rhythms generated through the use of CA “often result in a lack of pulse or metre. While this might be intellectually fascinating it is only occasionally successful from the perspective of a common aesthetic.”

Pachet [19] proposes an evolutionary approach for modeling musical rhythm, noting that “in the context of music catalogues, [rhythm] has up to now been curiously under studied.” In his system, “rhythm is seen as a musical form, emerging from repeated interaction between several rhythmic agents.” Pachet’s model is that of a human improvisational ensemble: “these agents engage into a dynamic game which simulates a group of human players playing, in real time, percussive instruments together, without any prior knowledge or information about the music to play, but the goal to produce coherent music together.” Agents are given an initial rhythm and a set of transformation rules from a shared rule library; the resulting rhythm is “the result of ongoing play between these co-evolving agents.” The agents do not actually communicate, and the rules are extremely simple: i.e. add a random note, remove a random note, move a random note. The system is more of a proof of concept than a performance tool; it developed into the much more powerful Continuator [20], a real-time stylistic analyser and variation generator.

Martins and Miranda [13] describe a system that uses a connectionist approach to representing and learning rhythms using neural networks. The approach allows the computer to learn rhythms through similarity by mapping incoming rhythms in a three-dimensional space. The research is part of a longer project [16, 14] in which self-organising agents create emergent music through social interactions; as such, the emphasis is not so much upon the interaction of rhythms as upon the emergence of new and/or related rhythmic patterns.

Gimenes [9] explores a memetic approach that creates stylistic learning methods for rhythm generation. As opposed to viewing rhythmic phrases as consisting of small structural units combined to form larger units (a more traditional method of musical analysis), the memetic approach suggests longer blocks that are dependent upon the listener (suggesting a more recent cognitive method of rhythmical analysis that utilizes “chunking”). RGeme “generates rhythm streams and serves as a tool to observe how different rhythm styles can originate and evolve in an artificial society of software agents.”

Kinetic Engine, in collaboration with MahaDeviBot, builds upon such previous efforts; however, it is fundamentally different in two respects: firstly, it is a real-time system with performance as its primary motivation; secondly, the software controls a physical instrument that requires mechanical movement.

3. AGENT-GENERATED RHYTHM
It is important to recognize that rhythmic intricacy can result not only from the evolution of individual rhythms, but also from the interaction of quite simple parts; such interaction can produce musical complexity within a system. The interrelationship of such simple elements requires musical knowledge in order to separate interesting from pedestrian rhythm. Such interaction suggests a multi-agent system, in which complexity results from the interaction of independent agents.

Existing musical models for such a system can be found in the music of African drum ensembles and Central and South American percussion ensembles (note that Indian classical music, which contains rhythmic constructions of great complexity, is fundamentally solo, and therefore lacks rhythmic interaction of multiple layers). Furthermore, models for the relationship of parts within an improvising ensemble can be found in jazz and certain forms of Techno. For more information on such modeling, see [8].

4. TOOLS
4.1 MahaDeviBot

Figure 1. MahaDeviBot controlled by Kinetic Engine.

The development of the MahaDeviBot serves as a paradigm for various types of solenoid-based robotic drumming techniques, striking twelve different percussion instruments gathered from around India, including frame drums, bells, finger cymbals, wood blocks, and gongs. The machine even has a bouncing head that can portray tempo to the human performer. The MahaDeviBot serves as a mechanical musical instrument that extends North Indian musical performance scenarios; it arose out of a desire to build a pedagogical tool to keep time and help portray complex rhythmic cycles to novice performers in a way that no audio speakers can ever emulate. It accepts MIDI messages to communicate with any custom software or hardware interface.
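Because MahaDeviBot is driven by MIDI, each strike that a software agent requests ultimately becomes a short channel message. The Python sketch below shows how a raw MIDI 1.0 note-on message for one instrument might be assembled. It is an illustration under stated assumptions, not Kinetic Engine’s Max/MSP code. The instrument names in ARM_NOTES, their note numbers, and the helper names note_on and strike are all hypothetical, because the paper does not give MahaDeviBot’s actual note mapping.

```python
# Sketch: raw MIDI 1.0 note-on messages for a solenoid-driven percussion robot.
# ARM_NOTES is a hypothetical instrument -> MIDI note table; the paper does not
# specify MahaDeviBot's real note assignments.
ARM_NOTES = {
    "frame_drum": 36,
    "bell": 48,
    "finger_cymbals": 50,
    "wood_block": 52,
    "gong": 55,
}

NOTE_ON = 0x90  # status nibble 0x9 = note-on; the low nibble carries the channel


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Build a 3-byte note-on message after range-checking each field."""
    if not 0 <= channel <= 15:
        raise ValueError("MIDI channel must be 0-15")
    if not (0 <= note <= 127 and 0 <= velocity <= 127):
        raise ValueError("note and velocity must be 0-127")
    return bytes([NOTE_ON | channel, note, velocity])


def strike(instrument: str, velocity: int, channel: int = 0) -> bytes:
    """Turn an agent's request to strike one instrument into a note-on message."""
    return note_on(channel, ARM_NOTES[instrument], velocity)


if __name__ == "__main__":
    print(strike("bell", 90).hex())  # prints 90305a
```

By MIDI convention, a note-on with velocity 0 is treated as a note-off. A solenoid instrument would therefore only receive velocities above zero, and the per-instrument velocity limits described in Section 8 would narrow that range further.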
4.2 Kinetic Engine
Kinetic Engine is a real-time composition/performance system created in Max/MSP, in which intelligent agents emulate improvising percussionists in a drum ensemble. It arose out of a desire to move away from constrained random choices and utilise more musically intelligent decision-making within real-time interactive software.

The principal human control parameter in performance is limited to density: how many notes are played by all agents. All other decisions (when to play, what rhythms to play in response to the global density, how to interact with other agents) are left to the machine’s individual agents.

Agents generate specific rhythms in response to a changing environment. Once these rhythms have been generated, agents “listen” to one another, and potentially alter their patterns based upon these relationships. No databases of rhythms are used: instead, pre-determined musical rules determine both generation and alteration of rhythmic patterns.

5. AGENTS
Agent-based systems allow for limited user interaction or supervision, allowing more high-level decisions to be made within software. This models interactions between intelligent improvising musicians, albeit with a virtual conductor shaping and influencing the music.

There are two agent classes: a conductor and an indefinite number of players (although in this case the agents are limited to the twelve instruments of the robot).

5.1 Conductor Agent
The conductor agent (hereafter simply referred to as “the conductor”) has three main functions: firstly, to handle user interaction; secondly, to manage (some) high-level organisation; thirdly, to send a global pulse.

Kinetic Engine is essentially a generative system, with user interaction being limited to controlling only a few global parameters:
• individual on/off – individual agents can be forced to “take a rest” and not play.
• density – the relative number of notes played by all agents (described in Section 6.1).
• global volume – the approximate central range of an agent’s velocity. Agents vary their velocities independently, and will “take solos” (if they feel they are playing something interesting) by increasing their velocity range; however, their central velocity range can be overridden by the conductor.
• agent parameter scaling – the user can influence how the individual agents may react (described in Section 5.2).
• new pattern calculation – agents can be forced to “start again” by regenerating their patterns based upon the environment.

Metre, tempo, and subdivision are set by the user prior to performance, and remain static for a composition. The conductor also sends a global pulse, to which all player agents synchronise.

5.2 Player Agents
Upon initialisation, player agents (hereafter referred to simply as “agents”) read a file from disk that determines several important aspects of their behaviour, namely their type and their personality.

Type can be loosely associated with the instrument an agent plays, and the role such an instrument would have within the ensemble. Table 1 describes how type influences behaviour.

Table 1. Agent type and influence upon agent behaviour.
Type: Low | Mid | High
Timbre: low frequency (frame drums, gongs) | midrange frequency (shakers) | high frequency (hand drum, tambourine)
Density: lower than average | average | higher than average
Variation: less often | average | more often

The stored personality traits include Downbeat (preference given to notes on the first beat), Offbeat (propensity for playing off the beat), Syncopation (at the subdivision level), Confidence (number of notes with which to enter), Responsiveness (how responsive an agent is to global parameter changes), Social (how willing an agent is to interact with other agents), Commitment (how long an agent will engage in a social interaction), and Mischievous (how willing an agent is to disrupt a stable system). A further personality trait is Type-scaling, which allows agents to be less restricted to their specific types. For example, low agents will tend to have lower densities than other types, but a low agent with a high type-scaling will have higher than usual densities for its type. See Figure 2 for a display of all personality parameters.

Figure 2. Example personality parameters for a player agent.

6. RHYTHMIC CONSTRUCTION
6.1 Density
Agents respond to the global density variable, which correlates to the number of notes played within a measure. Agents are unaware of the exact global density required, and instead rely upon the conductor to rate the requested density as “very low”, “low”, “medium”, or “high” and broadcast this rating. Agents know the average number of notes in a pattern based upon this rating, which is scaled by the agent’s type and type-scaling parameter. Agents apply a Gaussian distribution around this average and choose an actual density from within this curve, thereby maintaining some unpredictability in actual density distribution.

The conductor collects all agent densities, determines whether the accumulated densities are “way too low/high”, “too low/high”, or “close enough” in comparison to the global density, and broadcasts this success rating.
[1] if the accumulated density is “way too low”, non-active agents can activate themselves and generate new densities (or conversely, active agents can deactivate if the density is “way too high”).
[2] if the accumulated density is “too low”, active agents can add notes (or subtract them if the density is “too high”).
[3] if the accumulated density is judged to be “close enough”, agent densities are considered stable.

6.2 Density Spread
An agent’s density (e.g. seven notes) is “spread” across the available beats (e.g. four beats) using fuzzy logic to determine probabilities, influenced by the agent’s downbeat and offbeat parameters (see Figure 3 for an example of probability weightings spread across four beats). Thus, an example spread of seven notes for agent A, below, might be (3 1 2 1), in which each beat is indicated with its assigned number of notes.

Figure 3. Example density spread weightings for two agents, 4/4 time with different downbeat and offbeat parameter values.

Agents determine the placement of the notes within the beat using a similar technique, but influenced by the agent’s syncopation parameter.

6.3 Pattern Checking
After an initial placement of notes within a pattern has been accomplished, pattern checking commences. Each beat is evaluated against its predecessor and compared to a set of rules in order to avoid certain patterns and encourage others.

Figure 4. Example pattern check: given a previous beat’s rhythm, with one note required for the current beat, two “preferred” patterns for the current beat (Pattern A: 30%; Pattern B: 90%).

In the above example, if the current beat has one note in it, and the previous beat contains the given rhythm, a test is made (a random number is generated between 0 and 1). If the generated number is less than the coefficient for pattern A (.3, or a 30% chance), the test passes, and pattern A is substituted for the original pattern. If the test fails, another test is made for pattern B, using the coefficient of .9 (or 90%). If this last test fails, the original rhythm is allowed to remain. Using such a system, certain rhythmic patterns can be suggested through probabilities. Probability coefficients were hand-coded by the first author after extensive evaluation of the system’s output.

7. SOCIAL BEHAVIOUR
Once all agents have achieved a stable density and have generated rhythmic patterns based upon this density, agents can begin social interactions. These interactions involve potentially endless alterations of agent patterns in relation to other agents; these interactions continue as long as the agents have a social bond, which is broken when a test of an agent’s social commitment parameter fails. This test is done every “once in a while”, an example of a “fuzzy” counter.

Social interaction emulates how musicians within an improvising ensemble listen to one another, make eye contact, and interact by adjusting and altering their own rhythmic patterns in various ways. In order to determine which agent to interact with, agents evaluate other agents’ density spreads. Evaluation methods include comparing density spread averages and weighted means, both of which are fuzzy tests.

Table 2. Example density spreads in 4/4: comparing agent 1 with agents 2 and 3.
Agent #: 1 | 2 | 3
Density spread: (3 1 2 2) | (1 2 2 1) | (2 3 3 3)
Similarity rating: – | 0.53 | 0.48
Dissimilarity rating: – | 0.42 | 0.33

An agent generates a similarity and a dissimilarity rating between its density spread and that of every other active agent. The highest overall rating determines the type of interaction: a dissimilarity rating results in rhythmic polyphony (interlocking), while a similarity rating results in rhythmic heterophony (expansion). Note that interlocking interactions (dissimilarities) are actually encouraged through weightings.

Once another agent has been selected for social interaction, the agent attempts to “make eye contact” by messaging that agent. If the other agent does not acknowledge the message (its own social parameter may not be very high), the social bond fails, and the agent will look for other agents with which to interact.

Figure 5. Social messaging between agents.

7.1 Interaction types: Polyphonic
In polyphonic interaction, agents attempt to “avoid” partner notes, both at the beat and pattern level. For example, given a density spread of (3 1 2 2) and a partner spread of (1 2 2 1), both agents would attempt to move their notes to where their partner’s rests occur (see Figure 6). Because both agents are continually adjusting their patterns, stability is actually difficult to achieve.
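The two rule-based steps described in Sections 6.3 and 7.1 can be sketched in a few lines of Python. This is not Kinetic Engine’s Max/MSP implementation. It assumes each pattern is a list of 0/1 onsets with one entry per sixteenth-note subdivision, and the function names pattern_check and avoid_partner are hypothetical. pattern_check follows the sequential test of Section 6.3 literally: each preferred pattern is tried in order against its coefficient. avoid_partner is a deliberately simplified, one-sided version of polyphonic interaction, in which only one agent moves its colliding notes, and only within the same beat, to subdivisions where the partner rests.

```python
import random
from typing import Callable, List

# Assumed representation: one 0/1 onset per sixteenth-note subdivision.
Pattern = List[int]


def pattern_check(current: Pattern, preferred: List[Pattern],
                  coefficients: List[float],
                  rng: Callable[[], float] = random.random) -> Pattern:
    """Sequential probability test (Section 6.3).

    Each preferred pattern is tested in order: draw a number in [0, 1) and,
    if it is below that pattern's coefficient, substitute the pattern.
    If every test fails, the original beat is kept.
    """
    for pattern, coefficient in zip(preferred, coefficients):
        if rng() < coefficient:
            return list(pattern)
    return list(current)


def avoid_partner(own: Pattern, partner: Pattern,
                  subdivisions: int = 4) -> Pattern:
    """One-sided polyphonic avoidance (simplified sketch of Section 7.1).

    Within each beat, a note that coincides with a partner note moves to the
    first subdivision of that beat where neither agent plays. If no such slot
    exists the note stays put, so per-beat densities never change.
    """
    if len(own) != len(partner) or len(own) % subdivisions:
        raise ValueError("patterns must be equally long and span whole beats")
    result = list(own)
    for start in range(0, len(result), subdivisions):
        beat = range(start, start + subdivisions)
        for i in beat:
            if result[i] and partner[i]:
                free = [j for j in beat if not result[j] and not partner[j]]
                if free:
                    result[i], result[free[0]] = 0, 1
    return result


if __name__ == "__main__":
    # Density spreads (3 1 2 2) for agent A and (1 2 2 1) for its partner B.
    a = [1, 1, 1, 0,  1, 0, 0, 0,  1, 0, 1, 0,  1, 0, 1, 0]
    b = [1, 0, 0, 0,  1, 0, 1, 0,  1, 0, 0, 1,  1, 0, 0, 0]
    print(avoid_partner(a, b))
    # prints [0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0]

    # Figure 4 coefficients (.3, then .9); the one-note patterns are hypothetical.
    original, pat_a, pat_b = [0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0]
    print(pattern_check(original, [pat_a, pat_b], [0.3, 0.9]))
```

With the coefficients of Figure 4, pattern A is chosen with probability 0.3 and pattern B with 0.7 × 0.9 = 0.63. The original beat survives with probability 0.7 × 0.1 = 0.07. For the spreads (3 1 2 2) and (1 2 2 1), the one-sided sketch keeps agent A’s per-beat densities and removes every coincidence. In Kinetic Engine both agents adjust at once, and, as the Figure 6 caption notes, not every collision is necessarily removed.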
Figure 6. Example polyphonic interaction result between agents A and B, with density spreads of (3 1 2 2) and (1 2 2 1). Note that not all notes need to successfully avoid one another (beats 3 and 4).

7.2 Interaction types: Heterophonic
In heterophonic interaction, agents alter their own density spread to more closely resemble that of their partner, but no attempt is made to match the actual note patterns (see Figure 7).

Figure 7. Example heterophonic interaction result between agents A and B, with density spreads of (3 1 2 2) and (2 1 2 1). Agent B had an initial spread of (1 2 2 1).

8. ADDITIONAL AGENT KNOWLEDGE
Because each agent is sending performance information, via MIDI, to a specific percussion instrument, agents require detailed knowledge about that instrument. Each instrument has a discrete velocity range, below which it will not strike, and above which it may double strike. These ranges change each time the robot is reassembled after moving. Therefore, a velocity range test patch was created which determines these limits quickly and efficiently before each rehearsal or performance. These values are stored in a global array, which each agent directly accesses in order to choose velocities within the range of its specific instrument.

Similarly, each instrument has a physical limit on how fast it can re-strike; this limit is also determined through a test patch, which informs the program of potential tempo limitations. For example, the frame drums have limits of approximately 108 BPM for three consecutive sixteenths (138 ms inter-onset times), while the tambourine and hand drum can easily play the same three sixteenths at over 200 BPM (inter-onset times below 75 ms). The conductor will limit the overall tempo and subdivisions so as not to exceed these limitations; furthermore, individual agents will attempt to limit consecutive notes for each drum at contentious tempi.

9. CONCLUSION
Kinetic Engine has been used previously as an independent ensemble, both autonomously (as an installation) and under performance control (via a network of nine computers for the composition Drum Circle); its use as a generative environment for the control of MahaDeviBot has been discussed here. This “collaboration” has been used in performance, in which the first author controlled Kinetic Engine’s conductor agent via a Lemur control surface, and the second author performed on ESitar [11]. In this case, the experience was very much like working with an improvising ensemble, in that high-level control was possible (density/volume/instrument choice), but low-level control (specific pattern choice or individual agent control) was not. At the same time, the intricacy of musical interaction created by the intelligent agents led to the robot being perceived as a complex organism, capable of intelligent musical phrasing and creation, rather than a simple tool to play back pre-programmed rhythms; combined, they provided a genuinely new and powerful interface for musical expression.

10. ACKNOWLEDGMENTS
We would like to thank Trimpin and Eric Singer for their support in building the MahaDeviBot.

11. REFERENCES
[1] Beyls, P. Interaction and Self-Organization in a Society of Musical Agents. Proceedings of ECAL 2007 Workshop on Music and Artificial Life (MusicAL 2007) (Lisbon, Portugal, 2007).
[2] Brown, A. Exploring Rhythmic Automata. Applications on Evolutionary Computing, Vol. 3449 (2005), 551-556.
[3] Burtner, M. Perturbation Techniques for Multi-Agent and Multi-Performer Interactive Musical Interfaces. Proceedings of the New Interfaces for Musical Expression Conference (NIME 2006) (Paris, France, June 4-8, 2006).
[4] Dahlstedt, P., McBurney, P. Musical agents. Leonardo, 39, 5 (2006), 469-470.
[5] Dixon, S. A lightweight multi-agent musical beat tracking system. Pacific Rim International Conference on Artificial Intelligence (2000), 778-788.
[6] Eigenfeldt, A. Kinetic Engine: Toward an Intelligent Improvising Instrument. Proceedings of the 2006 Sound and Music Computing Conference (SMC 2006) (Marseille, France, May 18-20, 2006).
[7] Eigenfeldt, A. Drum Circle: Intelligent Agents in Max/MSP. Proceedings of the 2007 International Computer Music Conference (ICMC 2007) (Copenhagen, Denmark, August 27-31, 2007).
[8] Eigenfeldt, A. Multi-agent Modeling of Complex Rhythmic Interactions in Real-time Performance. Sounds of Artificial Life: Breeding Music with Digital Biology, Eduardo Miranda, ed., A-R Editions (forthcoming in 2008).
[9] Gimenes, M., Miranda, E. R. and Johnson, C. A Memetic Approach to the Evolution of Rhythms in a Society of Software Agents. Proceedings of the 10th Brazilian Symposium on Computer Music (Belo Horizonte, Brazil, 2005).
[10] Goto, M., Muraoka, Y. Beat Tracking based on Multiple-agent Architecture - A Real-time Beat Tracking System for Audio Signals. Proceedings of the Second International Conference on Multiagent Systems (1996), 103-110.
[11] Kapur, A., Davidson, P., Cook, P.R., Driessen, P.F., and Schloss, W. A. Evolution of Sensor-Based ETabla, EDholak, and ESitar. Journal of ITC Sangeet Research Academy, Vol. 18 (Kolkata, India, 2004).
[12] Kapur, A., Singer, E., Benning, M., Tzanetakis, G., Trimpin. Integrating HyperInstruments, Musical Robots & Machine Musicianship for North Indian Classical Music. Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME 2007) (New York, New York, June 6-10, 2007).
[13] Martins, J., Miranda, E.R. A Connectionist Architecture for the Evolution of Rhythms. Lecture Notes In Computer Science, Vol. 3907 (2006). Springer, Berlin, 696-706.
[14] Martins, J. and Miranda, E. R. Emergent rhythmic phrases in an A-Life environment. Proceedings of ECAL 2007 Workshop on Music and Artificial Life (MusicAL 2007) (Lisbon, Portugal, September 10-14, 2007).
[15] Minsky, M. The Society of Mind. Simon & Schuster, Inc (1986).
[16] Miranda, E.R. Evolutionary music: breaking new ground. Composing Music with Computers. Focal Press (2001).
[17] Murray-Rust, D. and Smaill, A. The AgentBox. http://www.mo-seph.com/main/academic/agentbox
[18] Murray-Rust, D., Smaill, A. MAMA: An architecture for interactive musical agents. Frontiers in Artificial Intelligence and Applications, Vol. 141 (2006).
[19] Pachet, F. Rhythms as emerging structures. Proceedings of the 2000 International Computer Music Conference (ICMC 2000) (Berlin, Germany, August 27-September 1, 2000).
[20] Pachet, F. The Continuator: Musical Interaction With Style. Journal of New Music Research, 32, 3 (2003), 333-341.
[21] Spicer, M. AALIVENET: An agent based distributed interactive composition environment. Proceedings of the International Computer Music Conference (ICMC 2004) (Miami, Florida, November 1-6, 2004).
[22] Wooldridge, M., Jennings, N. R. Intelligent agents: theory and practice. Knowledge Engineering Review, 10, 2 (1995), 115-152.
[23] Wulfhorst, R.D., Flores, L.V., Flores, L.N., Alvares, L.O., Vicari, R.M. A multiagent approach for musical interactive systems. Proceedings of the second international joint conference on Autonomous agents and multiagent systems. ACM Press, New York, NY, 2003, 584-591.
Elementary Gestalts for Gesture Sonification

Maurizio Goina
Conservatorio G. Tartini di Trieste
Via Ghega, 12, Trieste, Italy
maurizio@goina.it

Pietro Polotti
Department of Arts and Industrial Design
University IUAV of Venice, Italy
polotti@iuav.it
ABSTRACT
In this paper, we investigate the relationships between gesture and sound by means of an elementary gesture sonification. This work takes inspiration from the Bauhaus ideals and from Paul Klee’s investigation into forms and pictorial representation. In line with these ideas, the main aim of this work is to reduce gesture to a combination of a small number of elementary components (gestalts) used to control a corresponding small set of sounds. By means of a demonstrative tool, we introduce a line of research that is at its initial stage. The envisaged goal of future developments is a novel system that could serve both as a composing/improvising tool and as an interface for interactive dance and performance.

Keywords
Bauhaus, Klee, gesture analysis, sonification.

1. INTRODUCTION
This work takes inspiration from the “musically-oriented” thinking of the Bauhaus, particularly evident, for example, in the activity of Paul Klee. Andrew Kagan, in his essay on Klee, says: “of all those concerned with the question of musical-pictorial interrelationships, no one devoted more time and energy to it, and no one arrived at more compelling answers, solutions, and insights than Paul Klee” [1]. Klee was a violinist, and for a period he played with the Bern municipal orchestra and other Swiss musical organizations as a semi-professional musician [1]. We like to imagine that Klee’s practice with the violin was a source of inspiration for his pictorial formalism, and that the lines and curves that populate his paintings were somehow related to the practice of drawing the bow across the violin strings.
Following Klee’s teachings, we start from a dot, what he calls the mobility agent. By moving, the dot generates lines. This concept is well illustrated by Klee in the first example of his “Pedagogical Sketchbook” [2], a book intended as the basis for the course in Design Theory at the Bauhaus. The dot is conceived as the atomic element that generates lines and planes. In a similar way, we think of gesture as generated by sequences of dots forming structures that become complex at different levels. The approach is essentially abstract. Even though we aim at an interactive performative environment, no reference to physical metaphors or models (as, for example, in [3] or [4]) is pursued.
This paper is intended as the starting point for work that looks promising and will be further developed in the future. As discussed later, we envisage an extension of the principles discussed here to the visual domain, and we look at the definition of elementary gestalts, intended as unitary perceptual/expressive structures activated by gestures, as a possible way of defining an effective (indirect) mapping between visual forms and sounds.
Here we present an initial demonstrative tool, realized with Max/MSP/Jitter, that implements a sonification of gestures generated by hand movements. Elementary sounds are defined and employed for the sonification of two main categories of gestures: straight movements and circular movements.
The following section gives an overview of the aesthetics and ideals of the Bauhaus, and in particular of Paul Klee, in relation to our work. Section 3 develops the issue of Elementary Gestalts for Gesture Sonification (EGGS). Section 4 illustrates the first prototype of a system based on EGGS. In Section 5 we discuss possible future developments and applications, and draw our conclusions.

2. KLEE, BAUHAUS AND TENETS OF DESIGN AND ARTISTIC PRODUCTION
In 1921, Klee joined the Bauhaus School of Art and Architecture, where he taught until 1931, together with other important artists such as the Russian painter Wassily Kandinsky, the German architect and designer Walter Gropius, the German painter Josef Albers, the Hungarian painter and photographer László Moholy-Nagy, and others.
Basically, Bauhaus teaching encouraged the idea that there is a universal, non-figurative, visual language, and parallels were often made with the ‘universal’ language of music [5]. In particular, Klee’s interest in the musical aspects of painting is related to rhythm. In his essay [1], Kagan says: “It was the faceted shadings of Cubism which gave Paul Klee his first solid basis for musical-pictorial thinking. In Cubist patterns of alternating light and dark facets, he perceived a link to the foundation of music – rhythm. [...] Throughout his career, Klee continued to work with and refine his concepts of pictorial rhythm”. On the other hand: “Klee himself, particularly during the early years of his career, was extremely circumspect about drawing analogies between the arts” and “Klee’s effective application of musical models to his art only came through a very long and slow process of evolution”. Also, Klee “believed in Goethe's assertion that color and sound do
not admit of being directly compared [...] but both are referable to a universal formula.” These concepts seem perfectly in line with present-day psychological and technological research on cross-modality [6]. What we pursue in our present and future research is the exploration of cross-modal features (i.e., of Goethe's universal formula) by investigating an abstract version of the gesture-sound-image triangle.
In his Pedagogical Sketchbook, Klee sets out a didactic path for his students at the Bauhaus but, at the same time, presents the general principles of his artistic research. In the first part of the book, Klee introduces the transformation of the static dot into linear dynamics. In the colorful words of Sibyl Moholy-Nagy’s preface to the Sketchbook, the line, being a sequence of dots, “walks, circumscribes, creates passive-blank and active filled planes” (see Figures 1, 2 and 3).

Figure 1: Dot generating an active line. Circular movement. Drawings by Paul Klee [1].
Figure 2: Dot generating an active line. Circular movement. Drawings by Paul Klee [1].
Figure 3: Dot generating an active line. Straight movement. Drawings by Paul Klee [1].

This abstract approach to visual representation is somehow equivalent to what we want to do in the auditory domain: free sound from the task of expression and symbolization and give it an autonomous life in relation to the linearity (or circularity) of gesture. Klee considered music a mature model of what he wanted to achieve in the visual domain: he viewed “...the ultimate greatness of Mozart, Bach and Beethoven...” as a path to “...an equally monumental, universal visual art of the future...” [1]. The ultimate goal was “...to discover what universally applicable aesthetic properties could be isolated from the accomplishments of the titans of music and then to translate those discoveries... into practical, concrete, effective visual terms” [1].
What we are trying to do here is a sort of reverse process, from gesture to sound, drawing on Klee’s lesson on dots and lines to define a new way of designing sound through gesture.

3. TOWARDS A SONIFICATION OF GESTURE THROUGH ELEMENTARY SOUNDS
At this stage of our research, the aim is to create a virtual instrument that produces “abstract” sounds via gesture analysis and recognition, where gesture is understood as an abstract entity. The objective is to look for original relationships between gesture and sound through the recombination of elementary categories. In our conception, we assume that there is no necessary relationship between gesture and sound. On the contrary, the goal is to show how it is possible to build new, effective and meaningful relationships between gesture and sound by defining abstract relationships and appropriate mappings. The main idea is to define a number of elementary components of gesture trajectories and to associate a specific category of sounds with each of them. In this section we discuss the principles adopted and the preliminary results obtained.

3.1 Elementary Gesture
Our mobility agent, the equivalent of Klee’s dot, is a colored hand. Its movement produces lines and curves in a 2D space, and these control sound generation. Klee starts from a point on paper as an atomic element to generate lines and, at a higher level, planes. In a similar way, we start from a position (the hand) in space in order to generate gestures and then sounds.
Indeed, sound/music production in general originates from gesture. Research on gesture analysis and interpretation is a vast topic [7]. A gesture is directly related to movement, and it is charged with meanings related to dynamics, effort and inertia as attributes of movement. In our work, all of this is bypassed by an abstract geometrical approach to gesture analysis.
At this stage of the work, we decided to reduce the set of gesture components to linear segments and curvilinear segments. By means of these two simple and very generic categories, we decompose gestures into a sequence of straight lines and curves. Accordingly, the sonification of a gesture is a sequence of sounds corresponding to the two categories. This is the basic geometrical principle that controls the selection of a sound family. Beyond this abstract (pictorial-like) part, a number of secondary parameters are considered in order to make the sound response more perceptually coherent with the evolution of the gesture. This is discussed in the following subsection.

3.2 Elementary Sounds
As already said, we need to define two main categories of sound, corresponding to straight gestures (movements) and circular gestures (movements). At present, we employ sounds generated by simple additive synthesis: two harmonics form the “linear-sound” and eight inharmonic partials generate the “curvilinear-sound”. The curvilinear-sound produces a fast and continuous glissando, more precisely an infinite glissando in Shepard’s fashion (see [8] and [9] p. 1069). An infinite glissando was chosen in order to apply to the sound the concept of
rotation. Within this category, a further distinction is made between clockwise and counter-clockwise rotation.
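The endless glissando of the curvilinear-sound can be illustrated with a small additive-synthesis sketch. This is not the authors’ Max/MSP patch: the paper does not give the partial frequencies, so here we assume the classic Shepard construction (see [8]), i.e. octave-spaced partials with a raised-cosine amplitude envelope over log-frequency, a lowest frequency `f_min` and a sweep of one octave every `period` seconds. All names and values are hypothetical. `direction=1` gives a rising glissando and `direction=-1` a falling one, mirroring the clockwise/counter-clockwise distinction.

```python
import math


def partial_state(k, t, direction=1, n_partials=8, f_min=20.0, period=10.0):
    """Frequency (Hz) and amplitude of partial k at time t (seconds).

    Each partial sits at its own position on a log-frequency axis spanning
    n_partials octaves and drifts by one octave every `period` seconds
    (upwards for direction=1, downwards for direction=-1). When a partial
    reaches one end of the range it wraps to the other end; the envelope is
    zero at both ends, so the wrap happens where the partial is silent.
    """
    octave = (k + direction * t / period) % n_partials
    freq = f_min * 2.0 ** octave
    # Raised-cosine envelope over the octave position: 0 at the ends, 1 in the middle.
    amp = 0.5 - 0.5 * math.cos(2.0 * math.pi * octave / n_partials)
    return freq, amp


def render_glissando(duration, sample_rate=44100, direction=1, n_partials=8,
                     f_min=20.0, period=10.0):
    """Render an endless glissando as a list of float samples in [-1, 1]."""
    phases = [0.0] * n_partials
    samples = []
    for i in range(int(duration * sample_rate)):
        t = i / sample_rate
        value = 0.0
        for k in range(n_partials):
            freq, amp = partial_state(k, t, direction, n_partials, f_min, period)
            # Integrate frequency into phase so each partial glides without clicks.
            phases[k] = (phases[k] + 2.0 * math.pi * freq / sample_rate) % (2.0 * math.pi)
            value += amp * math.sin(phases[k])
        samples.append(value / n_partials)  # each amp <= 1, so |value| <= n_partials
    return samples
```

After one `period`, every partial has moved exactly to the position its neighbour occupied at the start, so the spectrum is periodic while the perceived pitch keeps moving in one direction.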
An important aspect added to the abstract approach concerns movement attributes other than trajectory: absolute position and velocity. Even though no acceleration or effort-like features are considered, these attributes draw us back from an abstract world to a physical one. This is necessary to avoid the risk of monotony. Absolute position is mapped to pitch (higher position = higher pitch) and to stereo spatialization, and velocity is mapped to sound volume.
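The position and velocity mapping just described can be sketched as a pure function. The specific choices below are our assumptions, not values from the paper: vertical position drives pitch and horizontal position drives stereo placement, coordinates are normalized to [0, 1] with y = 1 at the top, the pitch range spans MIDI notes 48 to 84, volume saturates at a hypothetical `max_speed`, and equal-power panning is used for the stereo spatialization.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundParams:
    frequency_hz: float  # pitch, driven by vertical position
    left_gain: float     # stereo gains, driven by horizontal position and speed
    right_gain: float


def clamp(v, lo=0.0, hi=1.0):
    return min(max(v, lo), hi)


def map_gesture(x, y, speed, low_midi=48.0, high_midi=84.0, max_speed=2.0):
    """Map a normalized hand position (x, y in [0, 1], y = 1 at the top)
    and its speed to pitch, stereo position and volume (hypothetical ranges)."""
    x, y = clamp(x), clamp(y)
    # Higher position -> higher pitch, linear in semitones.
    midi = low_midi + y * (high_midi - low_midi)
    frequency = 440.0 * 2.0 ** ((midi - 69.0) / 12.0)
    # Faster movement -> louder sound, saturating at max_speed.
    volume = clamp(speed / max_speed)
    # Equal-power pan: x = 0 is hard left, x = 1 is hard right.
    theta = x * math.pi / 2.0
    return SoundParams(frequency, volume * math.cos(theta), volume * math.sin(theta))
```

For example, `map_gesture(0.5, 1.0, 1.0)` (hand at the top centre, moving at half the maximum speed) yields about 1046.5 Hz with both channel gains near 0.354. Equal-power panning keeps the summed power constant as the hand moves sideways, so loudness depends only on speed.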
4. EGGS, A SYSTEM FOR ELEMENTARY SONIFICATION
The first EGGS realization has been implemented in Max/MSP/Jitter. As already discussed, we focused on two types of movement, straight and circular (see Figures 4, 5 and 6). In the future, we intend to use these elementary “bricks” to define compound gestures and a wider set of (sub-)categories, such as circles, ellipses, ovals and spirals, to be mapped to appropriate sounds derived from the basic ones. Also, our analysis is presently limited to the two-dimensional case. However, even with such an elementary mapping, the result “sounds” surprisingly rich and effective, and the system has revealed unexpected potential for exploring the relationship between sound and movement.

Figure 4: Trajectory detection and classification: a straight movement.
Table 1. Relation between movement and sound

Type of gesture    Sound production
still              silence
CW                 ascending Shepard + simple glissando, depending on height
CCW                descending Shepard + simple glissando, depending on height
straight           simple glissando, depending on height
MnM [10] is a package included in FTM [11], an external library for Max/MSP, and it provides a Gesture-Follower (see [12] and [13]). Unfortunately, it was not suitable for our purposes. This tool is intended for recognizing a large collection of specific objects, whereas we need to recognize only a few, more abstract characteristics. Here the purpose is to identify a characteristic shared by an infinite number of objects. MnM needs to learn many individual object families in order to recognize similar ones. Our aim is to find a common algorithm, a model that is valid for all cases of a general category, for instance, the curvilinear movements (e.g., circles and spirals belong to the same category).
In EGGS, visual data concerning gesture are processed by a color tracking routine that returns five values. The first one, ranging from 0 to 3, discriminates between stillness, circular counterclockwise (CCW) movement, straight movement, and circular clockwise (CW) movement (see Table 1).
The second value is the scalar velocity of the gesture. The third one is the angle, in radians, of the velocity vector, calculated from the origin. The fourth value is the total angle, in radians, calculated from the start of the session; this value is useful for obtaining a continuously varying angle, avoiding the jump between the end of one circle and the beginning of the next.

Figure 5: Trajectory detection and classification: a CCW curvilinear movement.

From a technical point of view, the discrimination between straight and circular movements is obtained by measuring the angle variations of the segments generated by three subsequent pairs of points, i.e., the centripetal acceleration of the motion. A variation close to zero is classified as a straight trajectory; otherwise, the curvilinear category is chosen.
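The straight/curvilinear discrimination and the first four tracker outputs can be sketched as follows. This is a hypothetical reconstruction, not the EGGS Max/MSP/Jitter code: it keeps the last three tracked positions, measures the signed change of direction between the two segments they define (the turn angle, which grows with the centripetal acceleration), and compares it with a tolerance. The thresholds `still_speed` and `straight_tol`, the frame interval `dt`, the y-up convention (in which a positive turn is counterclockwise), and reading the third value as the heading of the current segment are our assumptions. The fifth output value is not described here and is omitted. Category codes follow the order given in the text (0 = still, 1 = CCW, 2 = straight, 3 = CW).

```python
import math

# Category codes, in the order listed in the text.
STILL, CCW, STRAIGHT, CW = 0, 1, 2, 3


def wrap_angle(a):
    """Wrap an angle in radians to the interval (-pi, pi]."""
    while a <= -math.pi:
        a += 2.0 * math.pi
    while a > math.pi:
        a -= 2.0 * math.pi
    return a


class GestureTracker:
    """Classify a stream of 2D hand positions (y axis pointing up)."""

    def __init__(self, dt=1 / 25, still_speed=0.05, straight_tol=0.05):
        self.dt = dt                      # seconds between frames (assumed)
        self.still_speed = still_speed    # below this speed the hand is "still"
        self.straight_tol = straight_tol  # max |turn| (rad) counted as straight
        self.points = []                  # last three positions
        self.total_angle = 0.0            # unwrapped turning since session start

    def update(self, x, y):
        """Add a position; return (category, speed, heading, total_angle)."""
        self.points = (self.points + [(x, y)])[-3:]
        if len(self.points) < 3:
            return STILL, 0.0, 0.0, self.total_angle
        (x0, y0), (x1, y1), (x2, y2) = self.points
        speed = math.hypot(x2 - x1, y2 - y1) / self.dt
        heading = math.atan2(y2 - y1, x2 - x1)
        if speed < self.still_speed:
            return STILL, speed, heading, self.total_angle
        # Signed change of direction between the two consecutive segments.
        if math.hypot(x1 - x0, y1 - y0) > 1e-12:
            turn = wrap_angle(heading - math.atan2(y1 - y0, x1 - x0))
        else:
            turn = 0.0  # previous segment has no direction (hand was still)
        # Summing the turns gives an angle that keeps growing across
        # successive circles instead of jumping back by 2*pi.
        self.total_angle += turn
        if abs(turn) < self.straight_tol:
            category = STRAIGHT
        elif turn > 0:
            category = CCW
        else:
            category = CW
        return category, speed, heading, self.total_angle
```

Fed with points sampled along a counterclockwise circle, the tracker returns category 1 with a total angle that keeps increasing past 2π, which is the continuity property the fourth value is meant to provide.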
Figure 6: EGGS in action: detection of a curvilinear movement.

5. PERFORMATIVE POTENTIALITIES AND FUTURE DEVELOPMENTS
EGGS provides a basic performance system. Many possibilities of articulation and combination of the elementary mapping are conceivable. We have tested a simple realization of an accumulation process, in which stillness is the signal that starts the looping of a sonification. A fast alternation of movements and still instants creates polyphonic situations, in which every loop automatically fades out over time.
Also, as in any musical practice, learnability is fundamental. Practice is important in order to understand the possibilities of the instrument and obtain relevant results. However, not many technical skills are needed, as any simple gesture produces a meaningful sonification.
Furthermore, following once more the teaching of Klee and the Bauhaus and the “Punkt, Linie, Flaeche” (point, line, plane) paradigm, we are working on an extension of the system in order to define plane sonification. From a sonic point of view, this will correspond to sound textures. More generally, our future plans are to investigate the idea of using gesture to control both sound and image generation. We can imagine three directions in creating correspondences between sounds and images: mapping sound to image, mapping image to sound, and concurrent generation of sound and image. With EGGS, the ultimate objective would be to search for novel relations between sound and image by recombining abstract categories controlled by gesture. The intention is to investigate whether the definition of abstract (gestural) categories and the definition of effective (and independent) mappings for both sound generation and image generation will reveal unexpected relations between images and sounds, or between video and sounds. A system like this could be adopted for artistically oriented investigation of cross-modal and multimodal domains spanning sonic and visual media.
The system was conceived with possible applications in the performing arts and interactive dance in mind. EGGS may become a complete system through which a dancer, becoming an audio-video composer, would be an ideal (virtuoso) player of an instrument able to reveal novel correlations between audio and video.

6. REFERENCES
[1] Kagan, A. Paul Klee: Art & Music. Cornell University Press, Ithaca, New York, 1987.
[2] Klee, P. Pedagogical Sketchbook, trans. Sibyl Moholy-Nagy. Frederick A. Praeger, New York, 1965.
[3] Cadoz, C., Luciani, A., and Florens, J.-L. Artistic creation and computer interactive multisensory simulation force feedback gesture transducers. In Proc. Conf. on New Interfaces for Musical Expression (NIME), pages 235–246, Montreal, Canada, May 2003.
[4] Rocchesso, D. and Fontana, F., editors. The Sounding Object. Mondo Estremo, Firenze, 2003.
[5] Kennedy, A. Bauhaus. Flame Tree Publishing, London, 2006.
[6] Camurri, A., Drioli, C., Mazzarino, B., and Volpe, G. “Controlling Sound with Senses: multimodal and cross-modal approaches to control of interactive systems”. In P. Polotti and D. Rocchesso, eds. Sound to Sense, Sense to Sound. A State of the Art in Sound and Music Computing. Logos Verlag, Berlin, 2008.
[7] Camurri, A. and Volpe, G., eds. Gesture-based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag, February 2004.
[8] http://en.wikipedia.org/wiki/Shepard_tone
[9] Roads, C. The Computer Music Tutorial. The MIT Press, Massachusetts, 1996.
[10] Bevilacqua, F., Müller, R., and Schnell, N. MnM: a Max/MSP mapping toolbox. Proceedings of the New Interfaces for Musical Expression Conference, NIME, Vancouver, Canada, 2005.
[11] Schnell, N., Borghesi, R., Schwarz, D., Bevilacqua, F., and Muller, R. “FTM – Complex Data Structures for Max.” Proc. of ICMC 2005. International Computer Music Association, Barcelona, Spain, 2005.
[12] http://ftm.ircam.fr/index.php/Gesture_Follower
[13] Bevilacqua, F., Guédy, F., and Schnell, N. “Wireless sensor interface and gesture-follower for music pedagogy.” In Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, NY, USA.
Sonically Augmented Found Objects

Stefano Delle Monache
Computer Science Dept., University of Verona
Strada le Grazie, 15, Verona, Italy
dellemonache@sci.univr.it

Pietro Polotti
Computer Science Dept., University of Verona
Strada le Grazie, 15, Verona, Italy
polotti@sci.univr.it

Stefano Papetti
Computer Science Dept., University of Verona
Strada le Grazie, 15, Verona, Italy
papetti@sci.univr.it

Davide Rocchesso
DADI, University IUAV of Venice
Venezia, Italy
roc@iuav.it
ABSTRACT
We present our work with augmented everyday objects transformed into sound sources for music generation. The idea is to give voice to objects through technology. More specifically, the paradigm of the birth of musical instruments as a sonification of objects used in everyday domestic or work environments is considered here and transposed into the technologically augmented scenarios of our contemporary world.

Keywords
Rag-time washboard, sounding objects, physics-based sound synthesis, interactivity, sonification, augmented everyday objects.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

1. INTRODUCTION
Usually, in electronic music and performance, the sound material is somehow given a priori: it results from some synthesis method or some processing technique applied to a selection of samples. Interface invention and design in the context of Sound and Music Computing (SMC) start from music and face, as their main issue, the achievement of an appropriate interface/instrument that enables users to control a specific sound material in a musical and expressive way. It could be said that, once a certain musical style and aesthetics are established, the problem is how to produce music consistently through the manipulation of some physical system that controls computer-generated sound in an effective (expressive) way [1] [2].
In this work, we try to turn things upside down: given some objects, we wonder how to provide them with an “expressive voice”. This voice will be the source of music. In other words, music becomes the consequence of a process that starts from the object, goes through the definition of a sound (the voice of the object), and ends in some form of organized sonic events that give birth to music.
In addition, these kinds of processes involve a manipulation of the objects that somehow transmits expressivity to sound through gesture. Using everyday objects, thus, can provide sounds with the richness of meaning and expressivity of everyday actions. Indeed, the interaction with and manipulation of everyday objects involves “natural” and well-known gestures, easy for the player to perform and for listeners to interpret visually.
A third issue considered in this work is the practice of augmenting everyday objects with sonic features. This is, for example, what is envisaged in one of the three future scenarios depicted in the SMC Roadmap [3]: “[…] many sound devices will have a general purpose computer in them and will include quite a number of real-time interaction capabilities, sensors and wireless communication. Basically, any sound-producing device will be able to behave like a personalised musical instrument. Music making will become pervasive and many new forms of communication with sound and music will become available. Music content will be inherently multimodal and music making available to everyone: music from all to all.”
The paper has the following structure: in Section 2, the theme of everyday objects employed as musical instruments is introduced; Section 3 deals with embodiment and gesture in the design of musical interfaces; Section 4 concerns aspects related to physics-based Sonic Interaction Design (SID); in Section 5 we present our former work and summarize the issues emerging from the previous sections; in Section 6 examples of what we mean by Sonically Augmented Found Objects (SAFOs) are provided; in Section 7 conclusions are drawn.

2. EVERYDAY OBJECTS AND MUSIC
From the point of view of ethnomusicology, the transformation of everyday objects into musical instruments is quite a common process: music arises from the use of working tools, domestic (e.g., kitchen) objects, natural products of human activities, or natural objects tout court [4]. Examples are drums obtained from clay vessels or from pumpkins, bones with notches scratched by a stick, shells used as trumpets, bamboos that become whistles, the musical bow, and so on. Other examples are cross-cultural and well known to everybody, such as a blade of grass stretched between two fingers and used as a reed. This seems to respond to a human impulse to make the things of our environment talk.
A very good example (maybe even a paradigm) of what we mean by the use of everyday objects is the washboard employed in the rag-time tradition: two different tools, a washing board and a set of five thimbles (washing and sewing activities together), employed to produce a sound able to act as a surrogate for a whole drum set thanks to its natural usability.
In this perspective, the musical practice of “found objects” is very appealing and challenging. Initiated by experienced artists such as Marcel Duchamp, this kind of aesthetics focuses on the use of existing objects that have not been designed for artistic purposes. Found objects may exist either as utilitarian, manufactured items,
or things which occur in nature. In any case, the artist (e.g., the musician) exploits the potential of the objects as a vehicle of artistic meaning. These objects are called “found” in order to distinguish them from other items purposely created for use in the arts. From early compositions of Musique Concrète such as Pierre Henry’s Variations pour une porte et un soupir (1963), through John Cage’s compositions, to astonishing soundtracks such as that of Jacques Tati’s movie Playtime, this practice has continued to investigate the expressive qualities of everyday artifacts¹, electronics included (see, for instance, [5] for the practice of circuit bending, and [6] for the notion of infra-instruments).

3. EMBODIED INTERFACES
Embodiment [7][8] is a fundamental issue in musical practice. Generally speaking, embodiment is the outcome of a learning process based on perceiving by doing and doing by perceiving. The action-perception loop model (the so-called Enaction²) describes the usual modality of most of our actions. Indeed, a perceptually guided action is what a musician performs when playing an instrument. Traditional musical instruments can, in a sense, be seen as means for transforming physical movements into musical sounds. In this sense, musical composition becomes an implicit process of organizing and directing physical human gestures on a musical instrument. In other words: music as sonification of gesture.
Nowadays, in an SMC scenario, embodiment is not a necessary feature. Movement is no longer limited to the physical actions required to play traditional acoustic and electro-acoustic instruments. A whole new range of musical “gestures” can be imagined and designed for new interfaces [9]. A physical gesture can affect music at different levels: it may modify the structure of the musical discourse (macro-level) or adjust some parameters of sound synthesis or processing (micro-level). Also, new instruments can take any shape or size [10]. For instance, they could occupy a large space, or be split into individual parts forming a kind of network.
All of these points offer new and exciting perspectives for musical production [11]. However, it is well known that they also involve the risk of (paradoxically) poor results. Disembodiment, weak mapping strategies and loss of expressive detail are problems often faced in the design of new interfaces.
Ishii’s Tangible User Interfaces (TUIs) represent a fundamental innovation in the sense of recovering the body [12]. The idea is to employ physical objects and the surrounding space as media bridging the gap between the virtual and physical worlds. Since the early 1990s, the Hyperinstruments Group of the MIT Media Lab has developed a number of applications, such as musical toys for children and other musical devices requiring no pre-existing traditional instrumental skills. One of the musical applications realized within the group is the Squeezables [13]. Controlling sound parameters by means of physical effort appears to be a successful strategy. Another example is given by Tangible Acoustic Interfaces (TAIs) [14]. The idea of using acoustic signals generated by mechanical interactions with objects as control signals has many benefits: the physical/gestural expressivity of the manipulation is transmitted through acoustic waves in the solid; furthermore, thanks to the analogy of these waves with sound, they can be “naturally” mapped to a perceptually clear and energetically consistent sound response. The limit (or the advantage) of TAIs with respect to TUIs is a restriction of scope: from “no limit” in the physical design of the input interface, to “no limit” in the choice of any object as an input interface. The possibility of using “any object” offers the great opportunity to skip (to a certain extent) any training or practice stage: the interaction-sound mapping can be designed so that the sound responds effectively to usual, everyday interactions with the objects.
In the present work, we consider embodiment as a consequence of employing everyday objects. As in the case of TAIs, we exploit the fact that people can play the “instruments” by means of usual (everyday) and well-known gestures.

¹ For instance http://www.youtube.com/watch?v=Z7h8qkMBE_E
² https://www.enactivenetwork.org/index.php?8/objectives
³ http://closed.ircam.fr/

4. PHYSICALLY-BASED INTERACTION
The third aspect considered in this work is related to sound synthesis and control. The goal is to adopt sound synthesis algorithms that allow effective control over sound production. Sound is a pressure signal generated by interactions with and between objects. By modeling sound sources in terms of their physical behavior, it is possible to define a natural mapping between human gestures and the control parameters of the sound model, thus providing physical consistency between action and sound.
Examples of physics-based sound models are those developed in the context of the SOb (the Sounding Object) [15] and CLOSED³ (Closing the Loop Of Sound Evaluation and Design) research projects. The SOb/CLOSED algorithms follow the modular resonator-interactor-resonator structure, thus representing the interaction between two resonating objects. Thanks to the modularity of the framework adopted, it is possible to connect any pair of resonators through complex (non-linear) interaction models. The sound models were developed following the guidelines given by so-called ecological acoustics [16]. Simple sound events were modeled, for instance, as impacts, frictions and bubbles. These have been recognized as the basic sonic events underlying many complex processes. For example, rolling, bouncing and crumpling sounds are implemented by means of complex temporal patterns controlling the generation of elementary impact events; rubbing, squeaking and braking sounds can be traced back to friction; finally, the bubble model is the basis for burbling, dripping, pouring and frying sounds. As a result of the physical consistency of the models, it is straightforward to map their control parameters to continuous physical interactions, and to describe resonators and interaction models by means of their physical and geometric properties.
This kind of approach is the one most widely adopted in SID, a novel discipline that has emerged in the last decade from the fields of ecological acoustics, soundscape and everyday listening studies, and interaction design. In SID, the functional aspect of sound, that is, its role as a “carrier of information”, plays a fundamental role. The goal of SID is to create or reveal new functionalities, to enhance the sonic identity of objects, or to improve their usability and user performance during the interaction. Sonic information contributes, together with visual, tactile and haptic qualities, to forming the experience of an object. Moreover, when embedded
sound and (inter-)action are tightly coupled, the kinesthetic and tactile experiences establish a strong perceptive link between sound feedback and the artifact-source. This happens to such an extent that we say that the sound is the thing.
All of these aspects are inherited from the authors' experience in the context of SID and transposed into this new work. The goal is then to turn sonically interactive everyday artifacts into musical instruments by testing the application of a solmization system to them. The aim is to give birth to a new generation of sonically augmented found objects (SAFOs) [17] [18].
Figure 1 - Max/MSP patch for the fork.

5. AUGMENTED TABLES
A conspicuous set of tangible interfaces and table-based interactive sonic devices has been developed during the last decade. Most existing tabletop tangible interfaces, such as the reacTable [19] and related devices, act as controllers of a sound synthesis and processing engine, focusing on questions of expression. Another example, the Table Recorder⁴, is a sonically augmented table whose concept and realization cleverly couple interaction and real everyday sounding objects (such as glasses, cans, dishes and so on). The Cardboard Box Garden [20] explores everyday interaction with an augmented cardboard box as a container of sounds; interaction with the boxes allows users to manipulate stored sounds in a simple and intuitive way. Also of interest is Tactophonics [21], a design research project on musical affordance that uses sounds as control signals. In most cases these tangible interfaces serve as controls for complex sound processing, effects or sequencer-based music organization. They describe systems that can be effectively controlled live and with a very intuitive approach. The interfaces are used as media to recover human gestures and manipulations. However, both as spectators and performers, we are not able to infer any musical quality of these tangible interfaces themselves. In a way, the produced music still remains detached from a real source: the sound controlled or generated via the interface manipulation is still not the sound of a physical object.

4 F. Gmeiner, "The Table Recorder: Instrument for everyday life's patterns," http://www.fregment.com/table

The Gamelunch [22], a sonically augmented dining table, follows a complementary approach: various sensing devices (force transducers, light and magnetic field sensors) capture continuous interaction between humans and everyday (dining) objects, and the sensed data drive the control parameters of physics-based sound models. With its set of immediate and natural gestures and actions, the dining scenario provides a fertile context for an investigation of interactive sound generation. Simple actions such as cutting, sticking, drinking, pouring, grasping, stirring and mixing have been analyzed in terms of source behavior and generated sounds. The current set of sound-enhanced interactive objects includes:
- the fork: a continuous friction/squeaking sound sonifies the action of lifting the fork while eating;
- the knife: the action of cutting is sonified as rubbing on a wrinkled plastic surface;
- the shakers (a set of bottles for making cocktails): the interactions and the corresponding sound feedback are addressed to barmen. Free, yet energetically consistent sonifications synesthetically represent the qualities (alcohol content, taste, flavor, color) of the liquids. The continuous sound feedback also informs about the quantity of liquid that has been poured;
- the decanter: the action of pouring liquids is sonified as a continuous friction/braking sound feedback;
- the sangria bowl: the rotation of the ladle is sonified by means of a dense granular crumpling sound, like the sound produced by footsteps on sand;
- the salad bowl: continuous dripping and boiling sounds are coupled with the action of stirring and mixing the salad;
- the tray: during the action of balancing the tray while serving beverages, continuous dripping and burbling sounds inform about its inclination.

Figure 2 - Max/MSP patch and details of two top dressing bottles.

Our aim is now to develop a brand new solmization system for these sonically augmented artifacts, analogous to those that already exist for crafted musical instruments or for found objects.

6. EVERYDAY MUSICAL INSTRUMENTS
Focusing on the more immediate musical aspects of the dining scenario, we extracted cutlery and dressing bottles as candidate prototypes of SAFOs.
For the realization of the cutlery and bottle prototypes, we made wide use of the Nintendo Wii Remote controller, since it provides an ergonomic handle, plus 3D accelerometers and a set of buttons that can be easily interfaced with our sound synthesis engine.
While maintaining the type of sonification discussed above, we focused on creatively exploring and pushing the boundaries of the sound design space. Three aspects are considered: 1) Availability to the user of a wide range of sonic material to work with. This happens by dynamically modifying the configuration of the
physics-based sound models during the interaction. To this end, we created families of parameter configurations among which to morph; 2) Thanks to the possibility of recording gestural data, it is possible to interact with gestural loops in a "sequence and playback" style; 3) Interaction modalities (configurations) are investigated in order to set basic musical gestures (e.g., bending or finger-picking for a guitar). In detail:
- the cutlery: both the fork and the knife make use of the friction sound model. By exploiting combinations of buttons and movements, users can range over different presets, or effectively and reliably drive the control parameters of the sound model, such as the stiffness and viscosity of the interaction, or the mass and the resonant qualities of the objects (Figure 1);
- the bottles make use of a continuous-crumpling sound model [23]. The available control parameters are the stiffness and shape of the particles, and material resistance as a metaphor for the quantity of liquid present (Figure 2);
- the steak configuration: typically holding the fork with the left hand and the knife with the right one;
- the pasta configuration: holding the fork with one hand and a dressing bottle with the other.

7. CONCLUSIONS
In this paper, we present a development, in a musical direction, of our former work on sonic interaction design for artifacts. Some examples of what we called SAFOs are illustrated. These new instruments reflect the impulse of giving voice to everyday objects, an impulse that belongs to the musical traditions of every time and culture. This practice is here brought to the present by making use of current technologies and interaction design.

8. REFERENCES
[1] A. Gadd and S. Fels, "MetaMuse: metaphors for expressive instruments," Proc. Conf. on New Interfaces for Musical Expression (NIME), Dublin, Ireland, May 24-26, 2002.
[2] R. Hoskinson, K. van den Doel and S. Fels, "Real-time Adaptive Control of Modal Synthesis," Proc. Conf. on New Interfaces for Musical Expression (NIME), Montreal, pp. 99-103, 2003.
[3] "A Roadmap for Sound and Music Computing," http://smcnetwork.org/roadmap
[4] C. Sachs, "The History of Musical Instruments," Norton and Company, Inc., 1940.
[5] R. Ghazala, "Circuit-Bending: Build Your Own Alien Instruments," Wiley Publishing Inc., Indianapolis, USA, 2005. http://www.anti-theory.com/soundart/
[6] J. Bowers and P. Archer, "Not Hyper, Not Meta, Not Cyber but Infra-Instruments," Proc. Conf. on New Interfaces for Musical Expression (NIME), Vancouver, BC, Canada, 2005.
[7] P. Dourish, "Where the Action Is: The Foundations of Embodied Interaction," MIT Press, Cambridge, MA, USA, 2001.
[8] F. J. Varela, E. Thompson and E. Rosch, "The Embodied Mind: Cognitive science and human experience," MIT Press, Cambridge, MA, USA, 1991.
[9] N. Armstrong, "An Enactive Approach to Digital Musical Instrument Design," PhD thesis, Princeton University, 2006.
[10] S. Fels, L. Kaastra, S. Takahashi and G. McCaig, "Evolving Tooka: from experiment to instrument," Proc. Conf. on New Interfaces for Musical Expression (NIME), Hamamatsu, Shizuoka, Japan, June 3-5, 2004.
[11] G. Essl and S. O'Modhrain, "An enactive approach to the design of new tangible musical instruments," Org. Sound 11, 3 (Dec. 2006), pp. 285-296.
[12] J. Patten, B. Recht and H. Ishii, "Interaction Techniques for Musical Performance with Tabletop Tangible Interfaces," ACE 2006 Advances in Computer Entertainment, Hollywood, California, June 14-16, 2006.
[13] G. Weinberg, "Playpens, Fireflies and Squeezables – New Musical Instruments for Bridging the Thoughtful and the Joyful," Leonardo Music Journal, MIT Press, vol. 12, pp. 43-51.
[14] A. Crevoisier and P. Polotti, "Tangible Acoustic Interfaces and their Applications for the Design of New Musical Instruments," Proc. Conf. on New Interfaces for Musical Expression (NIME), Vancouver, Canada, May 26-28, 2005.
[15] D. Rocchesso and F. Fontana, editors, "The Sounding Object," Mondo Estremo, 2003. Available at http://www.soundobject.org/
[16] W. W. Gaver, "How do we hear in the world? Explorations of ecological acoustics," Ecological Psychology, vol. 5, no. 4, pp. 285-313, 1993.
[17] K. Moriwaki, "MIDI scrapyard challenge workshops," Proc. Conf. on New Interfaces for Musical Expression (NIME), New York, June 6-10, 2007.
[18] P. Cook, "Musical Coffee Mugs, Singing Machines, and Laptop Orchestras," 151st Meeting of the Acoustical Society of America, Providence, May 2006.
[19] S. Jordà, M. Kaltenbrunner, G. Geiger, and R. Bencina, "The reacTable," Proc. Intern. Computer Music Conf. (ICMC), 2005.
[20] K. Ferris and L. Bannon, "The Musical Box Garden," Proc. Conf. on New Interfaces for Musical Expression (NIME), Dublin, Ireland, May 24-26, 2002.
[21] A. A. Cook and G. Pullin, "Tactophonics: Your Favourite Thing Wants to Sing," Proc. Conf. on New Interfaces for Musical Expression (NIME), pp. 285-288, New York, NY, USA, 2007.
[22] P. Polotti, S. Delle Monache, S. Papetti and D. Rocchesso, "Gamelunch: Forging a Dining Experience through Sound," Proc. Conf. on Human Factors in Computing Systems (CHI), Florence, Italy, 2008. http://www.vimeo.com/874774
[23] R. Bresin, S. Delle Monache, F. Fontana, S. Papetti, P. Polotti and Y. Visell, "Auditory feedback through continuous control of crumpling sound synthesis," Proc. CHI - Sonic Interaction Design workshop, Florence, Italy, April 6th, 2008.
Sonified Motion Flow Fields as a Means of Musical Expression

Jean-Marc Pelletier
International Academy of Media Arts & Sciences
3-95 Ryoke-cho, Ogaki-shi, Gifu 503-0014, Japan
+81-584-75-6600
jmp@iamas.ac.jp

Graduate School of Media and Governance, Keio University
5322 Endo, Fujisawa-shi, Kanagawa 252-8520, Japan
+81-0466-47-5111
ABSTRACT
This paper describes a generalized motion-based framework for the generation of large musical control fields from imaging data. The framework is general in the sense that it does not depend on a particular source of sensing data. Real-time images of stage performers, pre-recorded and live video, as well as more exotic data from imaging systems such as thermography, pressure sensor arrays, etc. can be used as a source of control. Feature points are extracted from the candidate images, and motion vector fields are calculated from them. After some processing, these motion vectors are mapped individually to sound synthesis parameters. Suitable synthesis techniques include granular and microsonic algorithms, additive synthesis and micro-polyphonic orchestration. Implementation details of this framework are discussed, as well as suitable creative and artistic uses and approaches.

Keywords
Computer vision, control field, image analysis, imaging, mapping, microsound, motion flow, sonification, synthesis

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists,
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

1. INTRODUCTION
Since the pioneering work of Erkki Kurenniemi [17] and David Rokeby [20], a great number of musical interfaces using moving images as a source of control have been put forward. The simple design of Kurenniemi's DIMI-O and Rokeby's VNS allowed a great degree of freedom as to what their cameras could shoot, but it also limited the variety of output of which they were capable. More recent work, notably using the EyesWeb platform, has focused on extracting higher-level information from images, such as expressive content [3]. These advances allow a richer form of interaction between performers and musical systems, but to achieve this, they require the assumption that there is a performer in the first place.

Despite severe limitations such as poor temporal resolution and latency, image-based interfaces have important merits. One, highlighted above, is that a great amount of information can be extracted from them. Another important characteristic is that they typically require no physical contact between the sensor and the object. While this is certainly useful for stage performers, it also allows a great degree of freedom as to the nature of the objects being sensed. Smoke, clouds, traffic, crowds and countless other phenomena can only be sensed remotely. Such phenomena exhibit complex natural patterns that may prove interesting when they are translated to music. Nature, after all, has been a continuing source of inspiration for countless artists.

Generating the large numbers of control parameters frequently required to produce complex and organic sound structures is a recurring problem. Granular synthesis typically requires the generation of a large number of grains per second [19], each one having its own set of parameters, and additive synthesis requires dozens if not hundreds of amplitude envelopes to be generated simultaneously. Several approaches have been proposed over the years to address this issue, many of which involve one-to-many mappings or stochastic processes [19] to generate large control fields.

Thus, on the one hand, imaging technologies give us access to potentially large amounts of information originating from varied sources that often possess inherently complex structures. On the other hand, sound synthesis algorithms like granular and additive synthesis require a large number of control parameters to create complex sounds. It follows that moving images should be well suited to controlling such algorithms.

As has been pointed out in previous research [16][26], motion provides a link between images and sound. Hence, in order to access and sonify the complex structures that can be found in imaging data, we use a motion flow field describing the motion at several key points in the image. Instead of deriving high-level descriptors from this field, its individual components are used to control matching components of dense synthesis techniques such as granular and additive synthesis.

2. PREVIOUS WORK
2.1 Sonification
There has been surprisingly little research on the automatic sonification of generic video sequences for artistic purposes. Some systems have been proposed for automatically generating soundtracks for existing movies [18], but these are typically not aimed at musicians or composers. Some more artistically relevant work has been done with still images:
sonArt [30], for instance, is a system for creating music from information contained in still pictures.

Some research has been done from a more scientific perspective on the sonification of vector fields, which at the mapping stage shares some similarities with the framework presented here.

Funk, Kuwabara and Lyons used an optical flow field in conjunction with face detection and zones to devise a musical interface that can be played with the muscles of the face [7]. Jakovich and Beilharz used a dense optical flow field (one computed at every pixel in the image) to alter the cells of a cellular automaton running a "game of life", which in turn controlled a granular synthesizer [10].

The most similar research to date is that of Kapur et al., who used motion data from a VICON system to control parameters of various synthesis algorithms [11]. Their direct mapping (for instance, using n motion vectors to control n sinusoids of an additive synthesizer) closely mirrors that of the framework presented here. However, the VICON system, with its six cameras and physical markers, imposes significant physical and technological constraints that limit the range of its practical uses. Furthermore, the authors focused their research on human gestures, whereas this research aims towards the use of arbitrary imaging data.

3. FEATURES AND FLOW
3.1 Image Features
Raw images contain a vast amount of information: a single-channel 320 by 240 pixel 8-bit image contains 76,800 bytes, which at 30 frames per second translates to 2,304,000 bytes per second. By contrast, a stereo 16-bit audio stream at 44.1 kHz yields only 176,400 bytes per second. In order to reduce this amount of data, salient image features must first be identified.

Figure 1: Features computed at two different scales

While the term "feature" is used extensively in the computer vision literature, its definition remains somewhat vague. A feature can be seen as "an interesting image structure that could arise from a corresponding interesting scene structure. Features can be single points such as interest points, curve vertices, image edges, lines or curves or surfaces, etc." [5] For the purpose of this paper, however, features can be seen as having the following properties: 1) They are local, that is, they have a specific (x,y) position. 2) They exist at a given scale. For example, a square can either yield a single large-scale feature or four small-scale features, one at each corner. 3) Features are the local maximum of some image intensity variation metric. The features that match these properties are often referred to as "corners".

As feature detection is now one of the most fundamental processes in computer vision, several algorithms have been put forward [15]. The Harris detector [8], its multi-scale variant [14] and the very closely related Shi-Tomasi detector [22], which are based on the partial derivatives of the image intensity, are some of the most commonly used algorithms. Other detectors include the difference of Gaussians (DoG), the SUSAN corner detector [25] and the FAST corner detector [21]. If we limit our search to smallest-scale features (those occurring in a 9 pixel by 9 pixel neighborhood), the machine-learning-based FAST detector is well suited due to its rapid execution time. However, the Shi-Tomasi detector and the DoG detector may prove better choices in certain situations.

3.2 Motion Flow
As a result of performing feature detection, the image is described as a field of image coordinates (and optionally scale values) corresponding to the features in the image. While it would be possible to use this information as it is, more significant mappings require knowing how these features move from frame to frame. The techniques to achieve this can be broadly classified into two categories: feature matching techniques and optical flow-based techniques.

Feature matching techniques [25] involve finding the features in two different frames and matching each feature in one frame to the most similar feature in the second. A number of statistical metrics can be used to measure the similarity of two features based on the values of their pixel neighborhoods. The sum of squared differences and the earth mover's distance are two such metrics that perform well [25]. It should be noted that feature matching is an asymmetric process: not all features in both images can be matched into pairs. Some features in the first image will be lost, some in the second will appear and some, outliers, will be mismatched.

Instead of computing features for each frame and finding matching pairs, it is also possible to start with a given set of features and calculate the optical flow at each of these points. The optical flow is a $(\Delta x, \Delta y)$ vector expressing apparent motion at a point. Common optical flow estimation algorithms can be classified into block-matching methods [1] (which are computationally similar to feature matching algorithms) and differential methods such as the Lucas-Kanade algorithm and its more robust pyramidal implementation [2]. Knowing the displacement value, it is possible to compute the new position of every feature at each frame. Because in most cases features will be lost, for example by moving outside the image bounds,
and new features are bound to appear, it is necessary to update the feature list in parallel with the optical flow calculation. Hence, the image is processed in this way:
- optical flow is calculated for existing features and their positions are updated;
- features that could not be successfully tracked are removed from the list;
- new features are searched for in image areas where there are currently no features.

Regardless of the combination of feature detection and tracking algorithms used, the result is conceptually the same: a field of motion vectors either in the format $(x, y, \Delta x, \Delta y)$ or $(x, y, s, \Delta x, \Delta y, \Delta s)$, where $x$ and $y$ denote position and $s$ denotes scale.

3.3 The Flow Field
In its raw state, the motion vector field computed above is not usable for musical purposes. It will typically contain a certain number of outlier vectors, which tend to produce jarring and unpredictable results when mapped to sound synthesis parameters. It is thus necessary to run a rather strict filter on the motion field to get rid of these outliers. This filter can be implemented in several different ways, including the median flow technique described by Smith et al., in which "each vector in turn is compared with its neighbours. If it points in a similar direction or is a similar, small, length when compared to the 'median flow' in that area, then it is classified as an inlier, otherwise it is discarded as an outlier." [25]

While the motion flow field is expressed using Cartesian coordinates and deltas, for later mapping purposes it is useful at this stage to translate at least the displacement values to a polar coordinate system. This yields the motion vector $(\rho, \theta, s, r, \varphi, \Delta s)$ or, in hybrid form, $(x, y, s, r, \varphi, \Delta s)$, where $\rho = \sqrt{x^2 + y^2}$ and $\theta = \operatorname{atan2}(y, x)$ locate the feature relative to the image origin, and $r = \sqrt{\Delta x^2 + \Delta y^2}$ and $\varphi = \operatorname{atan2}(\Delta y, \Delta x)$ are the length and direction of its displacement. (Here also, the scale dimension is optional.)

It would be possible at this stage to perform further analysis on the motion field. 3D reconstruction algorithms would allow us to recover some form of depth information, either in the form of camera ego-motion or scene structure. More general algorithms can quantify certain types of macroscopic motion, such as contractions and expansions, as well as perform object segmentation. However, in this framework, this step is skipped in favor of using the vectors directly.

4. MAPPING
4.1 Time
Depending on the type of synthesis technique used, it may be necessary to process the motion vectors temporally. Using current hardware, frame rates of 30 Hz are typical for camera-based systems. More specialized cameras can capture images at up to 120 Hz, but processing these images in real time becomes problematic. For additive synthesis and similar generators, a control rate as low as 30 Hz may not be a significant problem. However, the time quantization artifacts that result when motion vectors are used to generate sonic grains are quite noticeable and likely undesirable. The solution to this problem is to smooth out the vector field temporally by delaying each vector individually by a random value between 0 and the projected time until the next frame is processed (about 33 ms at 30 Hz).

4.2 Space
Since images are inherently spatial, the most natural and motivated mapping is that of vector position to sound position. As a matter of fact, the framework outlined in this paper is particularly apt at creating complex spatial trajectories.

The simplest type of spatial mapping is to assign the normalized x value of each vector to the stereo pan position of the sonic component it corresponds to. There is typically more freedom as to how the y axis can be interpreted: in a planar surround playback environment, it can be mapped to the front-back axis, although in some setups it could also be assigned to the up-down axis.

It is also possible to generate positional vectors for the various audio spatialization methods available. The scale dimension (if it is calculated) can be mapped to the z or y axes, with larger features being mapped to closer positions. While this would correctly translate features becoming smaller and larger into sounds moving further away and closer, it is a rather naïve mapping that can often lead to undesirable results, because large features do not necessarily correspond to closer objects. When scale is not used as a depth cue, one dimension must be assumed to be constant: the vectors are assumed to move along a plane, although in some cases this is not an accurate representation of the motion of the object being sensed. In some situations it should also be possible to extrapolate the z axis displacement of motion vectors using 3D reconstruction algorithms.

One of the great advantages of using motion vectors for spatialization is that, since we know not only where a feature lies but also how fast and in what direction it is moving, it is also possible to use this information to control Doppler shift simulations.

4.3 Amplitude
An often convincing approach to controlling the amplitude parameters of synthesis components is to assign the length of the displacement vector ($r$) to amplitude. As this length is directly related to motion velocity, faster objects will sound louder. This relationship is somewhat metaphorically grounded: if the sound is thought to be generated through friction, then faster gestures will indeed produce louder sounds. Hence, the velocity-to-amplitude mapping is to an extent perceptually motivated.

Overall amplitude is also indirectly controlled via vector density. As has already been mentioned, motion flow over a given area exhibits smooth transitions. This means that areas with a high density of features will tend to produce several similar sound components, which add up to a greater overall amplitude.

Lastly, when scale is taken into consideration, it can make sense to use it to control amplitude, with larger features sounding louder. Note that since spatialization beyond simple linear panning also affects amplitude, it might not always be necessary to control amplitude directly.

4.4 Frequency and Timbre
The most difficult mapping to motivate is that of parameters that affect the pitch and timbre of the sound. That is not to say that such mappings must always be arbitrary, but they largely depend on the nature of the image used, the type of synthesis technique
employed and most importantly the intent of the composer or three voices to the music corresponding to each dancer, they
performer. were never explicitly identified by the system. The polyphonic
aspect of the music was a direct translation of the “polyphonic”
Even in situations where there is no spatialization, mapping
nature of the action on stage.
vector displacement in a pseudo-doppler fashion can often result
in interesting sound textures. Here, frequency is a function of To a limited extent, this form of control gestalt, where global
the displacement relative to the image origin. control structures implicitly result in similar global output was
already present in early zone-based systems like the VNS.
In some cases, where the image is to be controlled by a musical
However, the greater amount of information contained in motion
performer, simply assigning a given axis value to pitch can be a
vector fields coupled with microsonic sound generation means
convincing and easy to understand approach.
that these relationships occur at a much finer degree.
Other possible mappings for frequency include distance from
The performance scenarios outlined above use image analysis in
origin (), displacement direction (similar conceptually to
a traditional fashion. Some more exotic approaches include the
accordions and harmonicas) or displacement amplitude (related
use of pre-recorded video as a composition tool. Translating
to the pseudo-doppler approach).
visual structures and movements to musical forms can be a very
Timbre control in this framework is achieved by the efficient and rewarding method of generating musical material
superposition of a great number of sound components and by that can be further edited or processed as part of a composition.
altering the pitch and amplitude of these components. It is also The motion flow field lends itself especially well to the
possible to affect the timbre through the number of features generation of dense, micro-polyphonic scores.
present in the image. This can be done either by changing the
Returning to the realm of performance, the framework has also
input image so that it is less complex or by changing the
been used in conjunction with video content generated in real-
threshold of the feature detector. A greater number of features
time by a VJ, in order to have the visuals linked to part of the
directly translates to a greater number of synthesis components.
music. The possibility of robustly coping with a vast range of
When using granular processing of recorded sound, it is also possible input structure is a great asset in this scenario.
possible to control the timbre of the resulting sound by
In a somewhat less musical vein, it is also possible to use
assigning vector position to sound file position. For example, a
systems based on motion flow fields to perform automated foley
vector moving from the left edge of the image to the right edge
tasks. While some research has been done in this direction in the
might trigger a sound to be played back from start to end (or
past [17], it assumed that the motion of objects in the scene was
vice-versa.)
already known. With some adjustments and proper sound generation algorithms, it is possible to create convincing sound effects, especially considering the spatial gestalts outlined above.

5. AESTHETIC ISSUES
The framework presented here is meant to be general in nature and adaptable to many different situations. As a performance tool, it offers a natural method of controlling sound clouds and dense textures. Two usage examples highlight an important aesthetic aspect, that of control gestalt. This control gestalt acts as a binding agent between perceptual groups, or clusters, in both the source image and its sonified form [28].

Recent portable computers often come equipped with a camera mounted somewhere above the screen. With this camera, we can control the parameters of a sound mass generated through additive synthesis. If frequency is a function of the motion vector's position, then head movements towards and away from the screen will result in sonic expansion and contraction, as the components' frequencies move towards and away from each other. The image features are likewise contracting and expanding relative to each other. However, we do not need to measure this change explicitly: by virtue of direct sonification, the global characteristics of the motion flow field are expressed in the sound output.

The first use of a system based on this framework by the author occurred in January 2007, for an improvised dance performance held at the Hyogo Performing Arts Center. The sounds were generated by granular processing of sound files, with each grain's spatial location mapped so that, to the audience, it would sound as though it were coming from where the dancer was. If two dancers stood at opposite sides of the stage, two different sound clusters could be heard in those positions. When a third dancer raced across the stage, yet another sound followed him. However, while it sounded as though there were

6. IMPLEMENTATION
While the general concepts of how the framework can be implemented are presented in earlier sections, the current system implementation will be described here in greater detail.

Despite its usefulness, the computation of the motion flow field remains fairly intensive, limiting the maximum frame rate, the minimum latency, the image size and the CPU cycles left for sound generation. This is especially problematic since the sound synthesis algorithms used also tend to be rather taxing. In the earliest implementation, the solution was to use two computers, one dedicated to image analysis and the other to sound generation. This solution worked well, but it is bulky and costly.

In recent years, much attention has been directed towards general-purpose computation on graphics processors (GPGPU) [13]. A number of libraries, such as OpenVIDIA [6], already implement computer vision tasks on the GPU, freeing the CPU for other tasks and sometimes yielding performance improvements of an order of magnitude [23].

The system is currently implemented as an external object for Cycling '74's Jitter system. Standard Jitter functionality is used for image input, but all further processing is carried out internally. While this most recent implementation of the framework uses the GPU to perform the image analysis, it is independent of existing software libraries. When GPU processing is available, features are identified using the Shi-Tomasi method and matching is performed using the sum of
squared differences. If the computation cannot be performed on the GPU, the system reverts to the previous CPU-based algorithm, in which features are selected using the FAST method and then tracked using pyramidal Lucas-Kanade optical flow estimation. It should be noted that, since different feature detection and tracking algorithms are used, the vector fields generated by the GPU and CPU implementations will differ. In practice, however, they display similar characteristics that result in very similar sound output.

After the motion flow field has been processed to remove noise and adjust its coordinates, it is sent to the sound synthesizer via OSC [29]. OSC is used to decouple the analysis module from the synthesis module, which is meant to be implemented by the user. Temporal smoothing through random delay can optionally be applied prior to output.

7. CONCLUSION
Motion flow fields are not a perfect method of controlling musical parameters. As outlined above, the temporal resolution is comparatively poor. The biggest flaw is probably that feature detection and tracking algorithms are not perfectly robust. When the system is used as an instrument, it is often very difficult to finely control individual components, as one cannot know with certainty where precisely features will be identified in real-world situations. However, motion flow fields are better suited to controlling dense masses of sound, which in practice alleviates the problem. The main merits of the approach lie in its generality, the possibility of using natural structures as a source of sonic complexity, and the control gestalts outlined above.

8. REFERENCES
[1] S. S. Beauchemin and J. L. Barron, "The Computation of Optical Flow," ACM Computing Surveys, vol. 27, no. 3, pp. 433-367, 1995.
[2] J.-Y. Bouguet, "Pyramidal implementation of the Lucas-Kanade feature tracker," Intel Corporation Microprocessor Research Labs, 1999.
[3] A. Camurri, S. Hashimoto, M. Ricchetti, A. Ricci, K. Suzuki, R. Trocca, and G. Volpe, "EyesWeb: Toward Gesture and Affect Recognition in Interactive Dance and Music Systems," Computer Music Journal, vol. 24, no. 1, pp. 57-69, Apr. 2000.
[4] M. Cardle, S. Brooks, Z. Bar-Joseph, and P. Robinson, "Sound-by-numbers: motion-driven sound synthesis," in Proceedings of the 2003 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, pp. 349-356, 2003.
[5] R. Fisher, K. Dawson-Howe, A. Fitzgibbon, C. Robertson, and E. Trucco, Dictionary of Computer Vision and Image Processing, New York: Wiley, 2005.
[6] J. Fung and S. Mann, "OpenVIDIA: parallel GPU computer vision," in Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 849-852, 2005.
[7] M. Funk, K. Kuwabara, and M. J. Lyons, "Sonification of facial actions for musical expression," in Proceedings of the 2005 Conference on New Interfaces for Musical Expression, pp. 127-131, 2005.
[8] C. G. Harris, "Determination of ego-motion from matched points," in Proceedings of the 3rd Alvey Vision Conference, pp. 189-192, 1987.
[9] A. Hunt, M. Wanderley, and R. Kirk, "Towards a Model for Instrumental Mapping in Expert Musical Interaction," in Proceedings of the 2000 International Computer Music Conference, pp. 209-212, 2000.
[10] J. Jakovich and K. Beilharz, "ParticleTecture: interactive granular soundspaces for architectural design," in Proceedings of the 2007 International Conference on New Interfaces for Musical Expression, pp. 185-190, 2007.
[11] A. Kapur, G. Tzanetakis, N. Virji-Babul, G. Wang, and P. Cook, "A Framework for Sonification of Vicon Motion Capture Data," in Proceedings of the 8th International Conference on Digital Audio Effects, 2005.
[12] E. Klein and O. Staadt, "Sonification of Three-Dimensional Vector Fields," in Proceedings of the SCS High Performance Computing Symposium, pp. 115-121, 2004.
[13] D. Luebke, M. Harris, N. Govindaraju, A. Lefohn, M. Houston, J. Owens, M. Segal, M. Papakipos, and I. Buck, "GPGPU: general-purpose computation on graphics hardware," in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, p. 208, 2006.
[14] K. Mikolajczyk and C. Schmid, "Scale & Affine Invariant Interest Point Detectors," International Journal of Computer Vision, vol. 60, no. 1, pp. 63-86, Oct. 2004.
[15] F. Mokhtarian and F. Mohanna, "Performance evaluation of corner detectors using consistency and accuracy measures," Computer Vision and Image Understanding, vol. 102, no. 1, pp. 81-94, Apr. 2006.
[16] N. Moody, N. Fells, and N. Bailey, "Ashitaka: an audiovisual instrument," in Proceedings of the 2007 International Conference on New Interfaces for Musical Expression, pp. 148-153, 2007.
[17] M. Nayak, S. H. Srinivasan, and M. S. Kankanhalli, "Music synthesis for home videos: an analogy based approach," in Proceedings of the IEEE Pacific-Rim Conference on Multimedia, pp. 1556-1560, 2003.
[18] M. Ojanen, J. Suominen, T. Kallio, and K. Lassfolk, "Design principles and user interfaces of Erkki Kurenniemi's electronic musical instruments of the 1960's and 1970's," in Proceedings of the 2007 International Conference on New Interfaces for Musical Expression, pp. 88-93, 2007.
[19] C. Roads, Microsound, Cambridge, Mass., USA: MIT Press, 2001.
[20] D. Rokeby, "Very Nervous System," Nov. 2000. [Online]. Available: http://homepage.mac.com/davidrokeby/vns.html [Accessed Apr. 15, 2008].
[21] E. Rosten and T. Drummond, "Machine learning for high-speed corner detection," in Proceedings of the 9th European Conference on Computer Vision, pp. 430-443, 2006.
[22] J. Shi and C. Tomasi, "Good Features to Track," in Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
[23] S. Sinha, J.-M. Frahm, M. Pollefeys, and Y. Genc, "GPU-based video feature tracking and matching," presented at the Workshop on Edge Computing Using New Commodity Architectures, Chapel Hill, North Carolina, USA, May 2006.
[24] P. Smith, D. Sinclair, R. Cipolla, and K. Wood, "Effective Corner Matching," in Proceedings of the 9th British Machine Vision Conference, pp. 545-556, 1998.
[25] S. M. Smith and J. M. Brady, "SUSAN—A New Approach to Low Level Image Processing," International Journal of Computer Vision, vol. 23, no. 1, pp. 45-78, May 1997.
[26] S. Soto-Faraco and A. Kingstone, "Multisensory Integration of Dynamic Information," in The Handbook of Multisensory Processes, G. A. Calvert, C. Spence, and B. E. Stein, Eds. Cambridge, Mass., USA: MIT Press, pp. 49-68, 2004.
[27] B. Truax, "Real-Time Granular Synthesis with a Digital Signal Processor," Computer Music Journal, vol. 12, no. 2, pp. 14-26, Summer 1988.
[28] S. Williams, "Perceptual Principles in Sound Grouping," in Auditory Display: Sonification, Audification and Auditory Interfaces, G. Kramer, Ed. Santa Fe Institute Studies in the Sciences of Complexity, Proc. Vol. XVIII, Reading, MA: Addison Wesley, pp. 95-125, 1994.
[29] M. Wright and A. Freed, "Open Sound Control: A New Protocol for Communicating with Sound Synthesizers," in Proceedings of the 1997 International Computer Music Conference, pp. 101-104, 1997.
[30] W. Yeo and J. Berger, "Application of Image Sonification Methods to Music," in Proceedings of the 2005 International Computer Music Conference, 2005.
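The direct sonification described under 5. AESTHETIC ISSUES in the motion-flow paper above (frequency as a function of a motion vector's position, feeding an additive-synthesis sound mass) can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the author's Jitter external: the names `MotionVector`, `vector_to_partial` and `render_block` are hypothetical, and the use of vertical position for pitch, the exponential 100 Hz to 4000 Hz range and the magnitude-to-amplitude mapping are illustrative choices.

```python
import math
from dataclasses import dataclass


@dataclass
class MotionVector:
    x: float   # horizontal position, normalized to 0..1
    y: float   # vertical position, normalized to 0..1 (0 = top of the image)
    dx: float  # displacement since the previous frame, normalized units
    dy: float


def vector_to_partial(v, f_min=100.0, f_max=4000.0):
    """Map one motion vector to a (frequency in Hz, amplitude) pair.

    Assumption: pitch follows vertical position on an exponential scale,
    and loudness follows the length of the motion vector.
    """
    height = 1.0 - v.y                        # top of the image -> highest pitch
    freq = f_min * (f_max / f_min) ** height  # exponential, i.e. pitch-linear, mapping
    amp = math.hypot(v.dx, v.dy)              # faster motion -> louder partial
    return freq, amp


def render_block(vectors, n_samples, sample_rate=44100):
    """Additively synthesize one block: one sine partial per motion vector."""
    partials = [vector_to_partial(v) for v in vectors]
    total_amp = sum(a for _, a in partials) or 1.0  # normalize so |sample| <= 1
    block = []
    for n in range(n_samples):
        t = n / sample_rate
        s = sum(a * math.sin(2.0 * math.pi * f * t) for f, a in partials)
        block.append(s / total_amp)
    return block
```

Because features that drift apart in the image also drift apart in pitch, the expansion and contraction effect described in the paper emerges without measuring it explicitly. Each call restarts every partial at phase zero, so a real-time version would need to carry per-partial phase across blocks to avoid clicks.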
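The implementation section of the motion-flow paper says that, on the GPU path, features are matched between frames using the sum of squared differences (SSD). The Python sketch below shows that matching criterion for a single feature as an exhaustive block search over a small window, assuming grayscale frames stored as nested lists of numbers. The function names and the default patch and search radii are hypothetical; this is neither the author's GPU code nor the CPU fallback's pyramidal Lucas-Kanade tracker.

```python
def patch_ssd(prev, curr, x0, y0, x1, y1, r):
    """Sum of squared differences between the (2r+1)x(2r+1) patch centred on
    (x0, y0) in `prev` and the patch centred on (x1, y1) in `curr`."""
    total = 0
    for oy in range(-r, r + 1):
        for ox in range(-r, r + 1):
            d = prev[y0 + oy][x0 + ox] - curr[y1 + oy][x1 + ox]
            total += d * d
    return total


def match_feature(prev, curr, x, y, patch_radius=2, search_radius=4):
    """Return the displacement (dx, dy) whose patch in `curr` has the lowest
    SSD against the patch around feature (x, y) in `prev`.

    Assumes (x, y) lies at least `patch_radius` pixels inside `prev`.
    """
    height, width = len(curr), len(curr[0])
    best_score, best_vec = None, (0, 0)
    for dy in range(-search_radius, search_radius + 1):
        for dx in range(-search_radius, search_radius + 1):
            cx, cy = x + dx, y + dy
            # Skip candidates whose patch would fall outside the current frame.
            if not (patch_radius <= cx < width - patch_radius
                    and patch_radius <= cy < height - patch_radius):
                continue
            score = patch_ssd(prev, curr, x, y, cx, cy, patch_radius)
            if best_score is None or score < best_score:
                best_score, best_vec = score, (dx, dy)
    return best_vec
```

The returned displacement is the motion vector for that feature. Exhaustive search scores every candidate offset independently, which is one reason this kind of matching maps well onto a GPU, where many features and offsets can be evaluated in parallel.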
P[a]ra[pra]xis: Poetry in Motion

Josh Dubrau
Faculty of Arts and Social Sciences
University of New South Wales
Australia
joshninelives@internode.on.net

Mark Havryliv
Sonic Arts Research Network
Faculty of Creative Arts
University of Wollongong
Australia
mhavryliv@gmail.com
ABSTRACT
P[a]ra[pra]xis is an open two-part software suite and Java library (JAR) that facilitates the realtime creation and simultaneous sonification of poetry and prose. It is particularly designed to implement word substitutions based on the psychoanalytical principles of free association and metonymic slippage.

The first part, P[a]ra[pra]xis Collection Editor, allows a user to create and maintain a dictionary of words and their grammatical properties (e.g. verb, singular noun, pronoun) and the corresponding properties of user-defined substitutions for those words. The second part, Realtime P[a]ra[pra]xis, executes these substitutions as the user/performer types, and broadcasts OSC messages containing the properties of the original and substituted words, along with discrete notifications of keyboard events.

A case study (based on a live networked performance) is presented which highlights one particular usage of this program: an Instant Messenger (IM) style chat with interpolated ‘Freudian slips’, creating a dialogue which changes between the point of transmission and the point of reception and which spontaneously generates music reflecting physical and emotional changes in the dialogue.

Keywords
Poetry, language sonification, psychoanalysis, linguistics, Freud, realtime poetry.

1. INTRODUCTION
Text-to-sound converters are not uncommon. Realtime music software such as pd, Csound and SuperCollider can receive discrete keyboard events when a key is typed. Other software maps text (as ASCII characters) either to MIDI note numbers or to an MP3 file, invariably based on transmogrifications of alphabet position into pitch, texture or rhythm. More advanced converters create meta-descriptors (which may be based on a readability index or some other lingual parser) which are then used to control musical parameters. See [1] for an extensive listing and discussion of such software.

At the other end of the spectrum, sonification mechanisms have been developed that can be linked to specialist language systems. SoniPy [2] is an open framework for sonification written in Python, and can therefore be linked to the Natural Language Toolkit (NLTK) [3]. In turn, NLTK can import data and functions from WordNet, a dictionary and development toolkit in which “[n]ouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms, each expressing a distinct concept” [4]. Words are treated (and encapsulated) as objects, with properties and relationships to other words that can be evaluated and used in realtime; it becomes possible to sonify text as whole words, using well-defined relationships between different words, rather than sonifying text as characters or keyboard events alone. However, this raw power presents us with a ‘blank-slate’ problem: how do we create an appropriate framework for this linguistic data? How do we incorporate these extra dimensions of words into a Human Computer Interface (HCI), and use them as a creative tool that can be meaningfully integrated with music?

Magnusson [5] presents a useful discussion in which designing music software is framed as a semiotic act, “structur[ing] a system of signs into a coherent whole that incorporates some compositional ideology (or an effort to exclude it)”. He draws a distinction between traditional HCI design (representational and task-based, often imitating real-world tasks in order to prepare and organise information) and the type of design that uses the computer for artistic creation. The distinction between the two, however, is problematic. Magnusson argues that whilst a user engaged in creative practice “deploy[s] software to achieve some end goals...this very software is also a system of representational meanings, thus influencing and coercing the artist into certain work patterns.” This is as true for the most permissive musical software as it is for the most restrictive, as seen in the user modifications (such as GUI extensions and sliders) created even for such flexible programs as pd.

In the case of text-to-music converters, software at both ends of the spectrum remains creatively restrictive despite program sophistication and flexibility. The NLTK, for example, has the ability to track words along multiple axes (synonym, homonym, antonym etc.), yet it still treats words only as raw data; music produced through a linkage to the NLTK that is not based on a structured relationship between the performer, language and sound can harness no more of the power of language than an ASCII conversion. In a creative environment increasingly rich with collaborative and multi-modal performances, there exists a gap: a loss of meaning in the translation of text to sound.

Common performance techniques include poetry performance with sonification designed and improvised in response to the performed poetry in real time, or the previously mentioned ASCII conversion of typed text, perhaps with extra manipulation [6]. In many performances which use data to generate sound, the source of the data, or its potential ‘meaning’, is often considered largely irrelevant. However, where text is employed as the data source, it is mostly on display in some form, whether auditory or visual. This would seem to imply that the text/sound relationship has a
certain importance, which may not be fully realised in current processes of sound generation.

2. P[A]RA[PRA]XIS: A SEMI[ER]OTIC MACHINE
P[a]ra[pra]xis provides a platform for the performer (or musician, or writer) to sculpt a personally meaningful system of linguistic substitution within a self-created text. Although the P[a]ra[pra]xis Suite software is applicable to any project involving the sonification of data gathered from lingual substitutions, it was created with a particular direction in mind. The term ‘Parapraxis’ emerged as an English translation of what Freud termed die Fehlleistung, literally ‘faulty action’, used to describe the unintentional miscommunication occurring during even the most banal of daily human interactions [7]. It encompasses the range of mistaken perceptions, actions or speech which occur when the subconscious and the conscious mind are, as is generally the case, working to non-aligned agendas, and is commonly known as the Freudian slip, where you may ‘say one thing but mean your mother’. Needless to say, its motives are often classed as sexual.

The unique combinations of words and concepts which parapraxis creates also lend an additional flexibility to grammatical norms. Whereas Freud’s ‘parapraxis’ is either a singular instance or a genre-descriptor of such an error, and constitutes a kind of ‘sub-normal activity’ in relation to the business of perception and communication, our version, P[a]ra[pra]xis, conflates the nuance of ‘para’, meaning ‘beyond’ or ‘outside of’, with the academic notion of ‘praxis’ as theory put into action: thus it comes to describe an entire way of creatively exploring language and music through the building of user-initiated dictionaries based on free association and metonymic slippage [8].

In the early 1900s, the Swiss linguist Ferdinand de Saussure developed a linguistic apparatus which redefined the focus of the relationship between words and the ways in which meanings become attached to them. Saussure described the linguistic sign as “a two-sided psychological entity”, consisting only of “a concept which exists in equilibrial relationship with a sound pattern” [9].

Figure 1. de Saussure’s diagram, demonstrating the relationship between concept and sound pattern.

In one of Saussure’s best-known diagrams (see Figure 1) the concept is designated by a word which ‘stands in for’ an actual physical object. The word ‘tree’ (or arbor, the Latin term used in Saussure’s diagram) has nothing to do with either the image it conjures up or the physical reality of a tree. This idea, that sign and signified have no innate connection, has played out in many different guises over the course of the last hundred-odd years, beginning with early modernism and culminating in multiple instances of user-created semiotic systems, where any sign may be attached to any signifier, as long as the relationship is predetermined. In the paper previously mentioned, Magnusson observes that “actors and the contexts in which they function are all elements in a semiotic language…We provide a semiotics or suggest language games where the behaviour of an actor maps onto some parameters in a sound engine. For example, vertical location of an actor could signify the pitch of a tone or playback rate of a sample” [5].

In taking on Saussure’s notion that ‘the link between signal and signification is arbitrary’, many conceptual versions of semiotic systems fail to take a key factor into account: much of the power of language arises precisely because of the false innate meaning we ascribe to individual words. P[a]ra[pra]xis aims to utilise this power by involving the performer/user in a tension between the emotional or psychic resonances which may be attached to particular word significations and the implementation of a rule-set which can make what may at first appear to be extremely radical changes to the associations between words as we generally use them.

This returns us to Freud’s investigation of the hidden associations lurking in every parapraxis; P[a]ra[pra]xis works to open up these associations in several ways. Firstly, a user involved in entering or modifying words for the dictionary file is free to explore their own mental links between sounds, text and ideas. When dealing with the word ‘box’, one man’s ‘bo[ra]x’ may be another man’s ‘b[ot]ox’. When playing P[a]ra[pra]xis in realtime, users will be forced to respond to lingual substitutions determined by a dynamic but grammatically oriented rule-set. A player writing a poem or story will be subjected to a continually altering narrative, and will thus involuntarily form new chains of signification, by either engaging or refusing to engage with the material presented.

3. THE P[A]RA[PRA]XIS SOFTWARE SUITE
The language substitutions that occur when a performer enters a dictionary word in the P[a]ra[pra]xis set are predicated on six linguistic conditions: anagram; phonetic substitution; predictive; additive; subtractive; midrash. For a more detailed outline, see [10]. The P[a]ra[pra]xis Software Suite includes two applications which together enable the creation and implementation of the word substitution process described above. P[a]ra[pra]xis Collection Editor is a straightforward Java application which manages the relationships between words and their possible substitutions. Realtime P[a]ra[pra]xis is also a Java application; it applies the rules designed within it, in realtime, to a dictionary file created in the Collection Editor. Here, a rule describes the conditions that must be met for a word to be substituted by another word.

A typed word is only replaced if two conditions are met: the typed word exists as an ‘original’ word in the dictionary; and the typed
word has at least one substitution that meets the conditions of a rule. For example, if the rule stipulates that nouns can only be replaced with other nouns, and the typed word is a noun but none of its possible replacements are nouns, no substitution is made. Figure 2 shows how a set of possible substitutions is filtered into a set of legal substitutions.

Figure 2. A screenshot from Realtime P[a]ra[pra]xis. A user has just typed ‘I’ve been on the net, trawling’ when ‘trawling’ is found to have possible word substitutions, shown in the ‘All P[a]ra[pra]xes’ column. The rule here, however, only allows present-tense verbs to be replaced with a word that is either an adjective or a past-/present-tense verb. Further, this rule only permits a phonetic substitution, disallowing ‘[ex]tra[]w[il]ling’ as a possible substitution.

Realtime P[a]ra[pra]xis broadcasts on four different OSC address patterns which, interpreted together, give an external application insight across the continuum of a performer’s input:

1) /key (Integer): the ASCII code for each key typed;

2) /word (String): a String that contains the last word typed. This is sent whenever a non-alphanumeric character is typed and the system assumes a word has been completed, but this word is not found in the original word list in the dictionary;

3) /knownWord (String): a String that contains the last word typed if that word appears in the dictionary’s original word list (whether or not the word has been substituted), followed by a list containing that word’s properties; and

4) /replacement (String): a String that contains the word that replaced the last word typed, sent only if a legal substitution was made, followed by a list containing that replacement word’s properties.

Figure 3 shows the OSC output as received in pd.

Further, P[a]ra[pra]xis Collection Editor allows users to create their own word/replacement descriptors as well as the standard properties. These are appended sequentially to the list broadcast on the /knownWord and /replacement address patterns¹.

¹ Whilst the default OSC output is a space-separated string that contains a word’s properties, the software can also output a list of Boolean states that may be more easily interpreted in different music software.

Figure 3. Pd receives information regarding the performer’s input in realtime. Key presses, unknown words, known words and (when applicable) replacement words are broadcast, along with their properties.

As well as the two Java applications, we are making available a Java library (JAR file) which provides all the functionality of P[a]ra[pra]xis. This library can be used to develop custom graphical interfaces, and to manipulate words and word-substitution relationships in a unique way and from the ground up.

4. CASE STUDY
Po[or Symm]etry [Dra]in[s] [E]motion[s] is a live networked performance piece developed with P[a]ra[pra]xis software and implemented using the JAR library and a custom GUI. It should be stressed that this description is not prescriptive; lyrical and musical decisions are entirely decoupled from the core software. The work presents an Instant Messenger (IM)-like conversation between two people in an obviously troubled relationship. Both screens are presented separately to the audience.

As performer A begins to type, the text is displayed unadulterated on their screen. When they click ‘send’, however, the text briefly appears in its original form on performer B’s screen before being ‘re-written’, converted on screen as though it were being typed. Figure 4 shows part of this conversation.

The dictionary for this piece consists of about 160 words; there are currently 370 possible substitutions which can be made. Even if the performers have worked with the piece before, there will still be linguistic surprises.

The piece presents both an auditory and a visual rendering of the ways in which the ‘meaning’ of language shifts: from ‘speaker’ to
‘hearer’, and from the utilitarian meanings we ascribe to words for the sake of shared communication to the (often unwelcome) metonymic resonances which are engendered in the unconscious mind.

Figure 4. Screen shots of the IM performance. Each person sees the original text they type whilst only seeing the altered version of the other person’s text.

Because the text is re-written in realtime on the other person’s screen (typically animated at around 25 ms per character), the performance develops its own pace. A visual counterpoint also develops between the two screens, as the square brackets make substituted sections appear especially dense.

The music is generated by interpreting a number of performance artefacts. Based on a set of endless glissandi [Risset], their relative base frequencies and speed are continually modified as a counterpoint to the tension in the dialogue. Specific factors controlling musical parameters are: average time between keystrokes; sentence length; phrase length (how much a person types before pressing the ‘send’ button); and type of substitution. Interpreting the type of substitution is especially powerful. Whilst most of the substitutions are midrashes and use square brackets, phonetic substitutions and anagrams provide visual relief as well as prompting a different kind of intellectual reaction from an audience. The music-generating algorithm uses these to structure the relationships between glissandi in a fugal counterpoint, and to signal the start of a new invocation of the cantus firmus, or principal melodic line.

As the performer/musician/writer has complete control not only over the possible substitutions created for dictionary words, but also over the framework in which their relationships are defined, it is very easy to generate audio output which maps the emotionality of the piece through changes in the text.

5. CONCLUSION
The development of this P[a]ra[pra]xis software suite marks a milestone in a continually evolving and expanding project. Starting from the simple shared idea of a basic real-time interactive poetry generator, we have been drawn to grammar, linguistics, psychoanalytical theory and serial, electronic composition as tools to investigate the human relationship to language.

P[a]ra[pra]xis marks a collaboration between two authors from divergent backgrounds within the Creative Arts field: poetry and sonic arts. In order to make P[a]ra[pra]xis a genuine collaboration, not just an outsourcing of difficult specialist tasks, we have had to adjust and develop our perceptions of our own and each other’s language, just as those who play P[a]ra[pra]xis will. We hope others will find this as beneficial as we have.

6. REFERENCES
[1] Judge, A. <http://www.laetusinpraesens.org/docs00s/convert.php> (draft, 2007)
[2] Worrall, D., Bylstra, M., Barrass, S., and Dean, R. SoniPy: The Design of an Extendable Software Framework for Sonification Research and Auditory Display. Proc. ICAD 2007, Montreal, Canada.
[3] NLTK <http://nltk.sourceforge.net>
[4] WordNet <http://wordnet.princeton.edu>
[5] Magnusson, T. Screen-Based Musical Interfaces as Semiotic Machines. Proc. NIME 2006, Paris, France.
[6] See, for example, the work of Jenkins, G.S. at <http://www.1-4inch.com/archive05.html>
[7] Freud, S. ‘The Psychopathology of Everyday Life’ in The Standard Edition of the Complete Psychological Works of Sigmund Freud, Vol. VI (London: The Hogarth Press and the Institute of Psychoanalysis, 1966).
[8] Detailed discussion of the role played by metonymic slippage in the functioning of the unconscious can be found in: Lacan, J. The Four Fundamental Concepts of Psycho-analysis, ed. Jacques-Alain Miller, trans. Alan Sheridan (London: Hogarth Press, 1973).
[9] de Saussure, F. Course in General Linguistics, ed. Charles Bally and Albert Sechehaye with Albert Riedlinger, trans. Roy Harris (London: Duckworth, 1983), 67.
[10] Dubrau, J. and Havryliv, M. P[a]ra[pra]xis. Proc. ACMC, June 2007.
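The two replacement conditions described in Section 3 of the P[a]ra[pra]xis paper (the typed word must be an ‘original’ dictionary word, and at least one of its substitutions must satisfy a rule) can be modelled as a simple filter. The Python below is a hypothetical sketch, not the Java JAR's API: the `Entry`, `Substitution` and `Rule` classes, the property labels and the substitute ‘crawling’ are invented for illustration, and treating a rule that does not cover the typed word's category as ‘no substitution’ is an assumption.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Substitution:
    word: str
    properties: frozenset  # hypothetical labels, e.g. {"present-verb"}
    kind: str              # substitution type, e.g. "phonetic", "anagram", "midrash"


@dataclass
class Entry:
    properties: frozenset  # properties of the 'original' dictionary word
    substitutions: list = field(default_factory=list)


@dataclass
class Rule:
    source_property: str          # the typed word must carry this property
    target_properties: frozenset  # a substitute must carry at least one of these
    allowed_kinds: frozenset      # substitution types the rule permits

    def permits(self, sub):
        return bool(sub.properties & self.target_properties) and sub.kind in self.allowed_kinds


def substitute(word, dictionary, rule, rng=random):
    """Return a legal substitute for `word`, or None if no substitution is made."""
    entry = dictionary.get(word.lower())
    if entry is None:
        return None  # condition 1 fails: not an 'original' word in the dictionary
    if rule.source_property not in entry.properties:
        return None  # assumption: the rule does not cover this kind of word
    legal = [s for s in entry.substitutions if rule.permits(s)]
    if not legal:
        return None  # condition 2 fails: no substitution meets the rule
    return rng.choice(legal).word


# Hypothetical data loosely modelled on the Figure 2 example.
EXAMPLE_DICTIONARY = {
    "trawling": Entry(frozenset({"present-verb"}), [
        Substitution("crawling", frozenset({"present-verb"}), "phonetic"),
        Substitution("[ex]tra[]w[il]ling", frozenset({"present-verb"}), "midrash"),
    ]),
}
FIGURE_2_RULE = Rule("present-verb",
                     frozenset({"adjective", "past-verb", "present-verb"}),
                     frozenset({"phonetic"}))
```

With `FIGURE_2_RULE`, the midrash ‘[ex]tra[]w[il]ling’ is filtered out because only phonetic substitutions are permitted, mirroring the caption of Figure 2; when several substitutions survive, one is chosen at random.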
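Realtime P[a]ra[pra]xis broadcasts on the /key, /word, /knownWord and /replacement address patterns. The sketch below encodes such messages by hand following the OSC 1.0 binary layout (NUL-terminated strings padded to four bytes, a type-tag string beginning with a comma, big-endian 32-bit integers) and sends them over UDP. Sending a word's properties as one space-separated string argument follows the footnote's description of the default output, but the exact argument layout, the helper names, and the host and port are assumptions.

```python
import socket
import struct


def osc_string(s):
    """Encode an OSC string: bytes plus a terminating NUL, padded to a multiple of 4."""
    data = s.encode("utf-8") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *args):
    """Build an OSC 1.0 message supporting only int32 ('i') and string ('s') arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)  # big-endian 32-bit signed integer
        elif isinstance(arg, str):
            type_tags += "s"
            payload += osc_string(arg)
        else:
            raise TypeError("unsupported OSC argument: %r" % (arg,))
    return osc_string(address) + osc_string(type_tags) + payload


def messages_for_word(word, known, properties="", replacement=None, replacement_properties=""):
    """Messages a completed word might produce (hypothetical argument layout:
    the word first, then its properties as one space-separated string)."""
    if not known:
        return [osc_message("/word", word)]
    msgs = [osc_message("/knownWord", word, properties)]
    if replacement is not None:
        msgs.append(osc_message("/replacement", replacement, replacement_properties))
    return msgs


def send_all(messages, host="127.0.0.1", port=9000):
    """Send each message as its own UDP datagram (host and port are placeholders)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for msg in messages:
            sock.sendto(msg, (host, port))
```

A keystroke would be sent as `osc_message("/key", 97)` for ‘a’, and a completed dictionary word with a legal substitution as two messages from `messages_for_word`. Keeping the encoder separate from the transport mirrors the paper's point that the receiving music software, not the sender, decides how to interpret the stream.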
davos soundscape, a location based interactive composition
Jan C. Schacher
Zurich University of the Arts
Institute for Computer Music and Sound Technology
Baslerstrasse 30
CH-8048 Zurich
0041 (0) 43 446 55 06
jan.schacher@zhdk.ch
ABSTRACT contrasts inspired us to devise a location-aware site-specific

Moving out of doors with digital tools and electronic music and musical work that would cover a large area and would have to be
creating musically rich experiences is made possible by the explored by an audience ready to do some walking. We called the
increased availability of ever smaller and more powerful mobile piece very prosaically the davos soundscape [1] and decided to
computers. Composing music for and in a landscape instead of for make use of an experimental dedicated GPS-enabled computing
a closed architectural space offers new perspectives but also raises platform that would be purpose-built for the specific demands of
questions about interaction and composition of electronic music. this project.
The work we present here was commissioned by a festival and ran
on a daily basis over a period of three months. A GPS-enabled 2. CONTEXT
embedded Linux system is assembled to serve as a location-aware Locative media work [2] draws from a rich background of
sound platform. Several challenges have to be overcome both Dadaist, Situationist, and post '68 philosophy and media theory.
technically and artistically to achieve a seamless experience and The focus lies on the creation of situations of social interaction
provide a simple device to be handed to the public. By building where the intention is to bridge the gap between the virtual, online
this interactive experience, which relies as much on the user's and Internet based media and the physical world, be it through
willingness to explore the invisible sonic landscape as on the mapping, tagging and/or representing one domain in the other. In
ability to deploy the technology, a number of new avenues for exploring electronic music and interactivity in location-based media open up. New ways of composing music for and in a landscape and for creating audience interaction are explored.

Keywords
Location-based, electronic music, composition, embedded Linux, GPS, Pure Data, interaction, mapping, soundscape

1. INTRODUCTION
In the spring of 2007, we were commissioned by the organizers of the annual music festival in the Swiss mountain town of Davos to produce an interactive piece. Instead of proposing a more traditional interactive installation in one of the festival's venues, we opted for a new approach that would expose the audience more directly to the spectacular nature and landscape of the area. Davos is best known as the location of the annual World Economic Forum and as the setting of Thomas Mann's Magic Mountain. The town and its surroundings are characterized by richly varying natural and urban environments, ranging from the urban chic of the main shopping street to the alpine rock wilderness of the high peaks reaching up towards 3000 meters above sea level. These

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

our own approach the situationist Guy Debord's concept of dérive [3] is as much present as is the de/territorialization of sounds postulated by Deleuze/Guattari in their seminal work "Mille Plateaux" [4]. The former intends to generate an urban psychogeography by dissociating motion through urban space from its function of transport or reaching a destination. The latter focuses on the role of sound and music in defining a territory or mental space. Through the creation of a territory that exposes music in a real landscape, we establish a non-deterministic composition that owes its existence and perception as much to the presence of the audience as to our own construction. In a sense, we present a typical open work, as set out by Umberto Eco [5], who defines an open work as one that does not intend to convey a definite meaning to be comprehended by the audience. The creation of the work of art is rather an act in which the perceiving individual is directly involved by assigning it a personal meaning. The constituting elements of the work do not represent a static structure but establish a dynamic and fluctuating field of relationships.

In this context, the design of a musical expression takes on a different meaning. The relevant issues here are not, for example, the modeling of the action-perception coupling [6] and the enhancement of the specific affordances present in the technological platform deployed. The intended audience behavior and the expressive qualities given by the technology are very clear. The user moves through the landscape; the music heard depends on his topographical position and his temporal evolution through the given space. Sonic complexity arises out of the superposition of natural environmental sounds and composed materials triggered by the motion of the user. The real cognitive achievement is the ability to perceive one's movement in space as the acting element of the piece and to merge proprioception
with listening on at least two levels (natural sound and composed electronic music) and to engender from this a meaningful musical experience. This setting is representative of a number of everyday musical and sonic experiences that form part of our contact with mobile technologies. Ubiquitous or pervasive computing is a growing trend through the closer integration of an increasing number of technologies into our personal communication devices. Tapping into this potential for musical creation, and acquiring knowledge and experience in dealing with these issues, is one of the main motivations of this project.
The following projects serve as examples of other location-based concepts in the urban field. Both Akitsugu Maebayashi's Sonic Interface [7] and Lalya Gaye's Sonic City [8] use urban sounds and user interaction to create a mobile personal soundscape. Marc Shepard's Tactical Sound Garden [9] is about planting sounds in an urban context; it locates the user through triangulation of known wireless hotspots. The projects Mediascape by HP Labs [10] and net_dérive by Atau Tanaka for Sony/CSL [11] merge different types of media content and location technologies to create urban and social interaction.

3. COMPOSITION
The first task in our composition process was to define how the landscape should be subdivided. Eight routes were devised, each representing an essential aspect of the area: the two town centers (Davos is actually split in two); the lakeside promenade; the town's park; the famous two-kilometer-long hillside promenade; the walk downriver to the secluded forest cemetery; the high pastures and woods on the slopes above the Schatzalp sanatorium; and finally the alpine hiking trails high up towards the Weissfluhjoch. Each of these routes was treated differently, with the sonic structure or spatial placement of the music governed by a different principle. The longitudinal topography of the hillside promenade, for example, engendered sequential musical segments that connect differently according to the point of entry and the direction in which one walks along the promenade.

Figure 1. The lakeside promenade and its corresponding sound zones (image from Google Earth)

The circling of the lake by its promenade led to overlapping zones, some of which functioned like musical beacons across the water. The common structure that emerged and became the guiding principle for all areas was the use of circular zones centered on a point of interest in the landscape and extending spheres of influence of varying size. We collected the GPS coordinates for each spot and assigned each one its music, a playback mode, a radial amplitude envelope (which controls either the crossfade between zones or the increase in volume towards the point of interest) and, finally, the size of the sphere. The eight routes were mapped out with a total of 86 points of interest, each covering a large area and overlapping with one or several of the neighboring zones. Google Earth became an invaluable tool for planning and visualizing the spatial relationships of the sound zones and routes (Fig. 1).
While we approached the task of producing the actual music for the davos soundscape, several additional strategies emerged. Since one of the premises was to present a transparent sound overlaying natural and composed elements through open headphones, it was quite natural to think of the effects of an augmented acoustic reality achieved by using field recordings or recordings of natural sounds, as well as their polar opposites, purely synthetic sounds. It soon became apparent that a strict separation of the two would be difficult and not very desirable if one wants to maintain the sonic unity of the piece. The music contains brief sequences of field recordings made on site, sometimes deliberately displaced: for example, the cowbells from the alpine pastures make their appearance on the busy main thoroughfare, and the sound of waves washing on the lakeshore reappears in the middle of the mountain woods. Since most of the time no predetermined chronology is possible in the arrangement of sounds, the music rarely establishes a linear evolution. Most zones have several possible neighborhood relationships; the music can overlap and occur in a number of combinations, all depending on the itineraries chosen by the visitors. Our principal intention was to generate an indeterminate field of acoustic possibilities to be explored and experienced in an individual way.

4. TECHNOLOGY
Mobile devices equipped with GPS are now becoming very common, but they were less accessible when we evaluated possible solutions for our GPS-enabled music platform. Based on the options available at the time, we chose a semi-industrial platform running Linux.
4.1 Hardware
The prerequisites for the mobile device were guided by a number of personal choices. We wanted to be able to write custom code without having to develop the entire software from the ground up. We wanted a device that gives access to all low-level routines of the firmware or OS, so that we could set up daemons for automatic upkeep of the devices over extended periods of time. Coming from a background in electronic music, we were interested in using data-flow software for composition. The device needed to contain, or easily connect to, a GPS receiver and make the position data available on a software-accessible interface. It had to have an on-board solid-state medium that would store several hours of uncompressed stereo PCM audio, and an analog audio output that could be controlled from software. The device needed to be able to run for about eight hours on one battery charge. Most importantly, we wanted to avoid having to build any custom electronic components ourselves.
4.1.1 Choice of Platform
After evaluating all available options, ranging from commercially available GPS-equipped PDAs to open source hardware, we finally decided to use the Gumstix platform [12]. It fulfills many
of the prerequisites by offering a selection of expansion boards in addition to its ARM-based motherboards. The three determining factors were that the Gumstix run a Linux OS, that they offer an expansion board hosting both a GPS receiver and an audio I/O chip, and, as we were excited to learn, that a port of Pure Data called PDa (Pure Data anywhere) [13] is available for the ARM processors used on many embedded single-board devices. Finally, several expansion modules exist for the Gumstix that offer Compact Flash or MMC interfaces to connect large solid-state storage devices.

4.1.2 Device Assembly
In addition to the embedded computer with its daughter boards, three other components are necessary to complete the device: the battery, the GPS antenna and the headphones. Unsure about the actual power consumption, we opted for a large single-cell 3300 mAh lithium polymer battery, the same technology found in mobile phones and laptops. Preliminary tests had shown that the device drew roughly 250 to 350 mA, so in theory a full charge should last for a full eight hours. Active GPS antennas are readily available and have a form factor ideally suited for mounting on top of headphones. The entire device was assembled in a standard metal electronics shielding case and packed into a soft case for protection and user-friendlier packaging (see Fig. 3).

Figure 2. The Gumstix computer with a 4GB Compact Flash Card on top, the battery underneath and both the LiPo charger circuit and the active GPS antenna on the left.

4.2 Software
4.2.1 Linux OS
The Gumstix computers come with a factory-installed, fully functioning Linux OS that contains all necessary device drivers for the possible expansion boards. In our case there was a problem with the OSS audio layer that took a lot of time to correct. In-depth knowledge of Linux and experience in building and compiling were a definite plus. In theory, things should work right out of the box, but our experience was somewhat marred by the problems we had getting the audio to work. The buildroot environment used for cross-compiling for the ARM architecture and the available command-line tools on the Gumstix offer a limited but essential set of features, all geared towards autonomous functioning with a minimal footprint. The working methodology for embedded systems may take some getting used to, since the Gumstix typically provide neither screen nor keyboard. All operations are executed from the host computer on the command line through a secure shell.

4.2.2 PDa
An important step for our project was to port PDa to the Gumstix. Although it features a limited set of functions, PDa still contains all essential tools for audio on such a system. Originally ported to run on an iPAQ PDA, this downscaled version of Pure Data has been successfully applied to Apple iPods, Linksys routers and a variety of portable devices running Linux [14]. Since the ARM-type processor does not feature a dedicated floating point unit, and software processing of IEEE-754 32-bit floating point numbers is extremely slow, PDa has been rewritten to run all DSP code in 16-bit fixed point numbers. This makes extending the audio capabilities difficult; for that reason, for example, it is not possible to play compressed audio in the Ogg/Vorbis or mp3 formats. Apart from this limitation, normal patches can be written, system access is given through the shell external, and access to serial ports is possible after porting Pure Data's comport object. The most essential feature of PDa for our application is the ability to extend its functionality by writing dedicated objects in C.

4.2.3 Custom C Code
Because of the limited set of objects, and for the sake of efficiency, PDa is used as a kind of framework within which to run our own code. The first task is to obtain the coordinates from the GPS receiver. Thankfully, the data from the module used by the Gumstix is made accessible on a standard serial port. This stream of data is parsed for the standard NMEA GPS reports to obtain a new set of coordinates every second [15]. At startup, a database file is loaded into a simple data structure containing the map with the coordinates of all the points of interest, their associated sound files, and further information about global scaling factors, reference points and envelope tables. With each new GPS coordinate, this internal map is evaluated and the appropriate commands are generated to control a very simple patch consisting of four sound file players and a mixer.

4.3 Long Term Usage
A lot of care was taken when planning and assembling the devices to ensure completely unattended operation for the duration of three months. By avoiding any manipulation by either the users or the attending festival personnel, we achieved the goal of having the devices survive this extended period. Through a combination of shell scripts and daemons, the device was made to boot straight into the PDa patch and, in case of a crash, to relaunch it automatically. In order to reset the GPS receiver, the Gumstix was also automatically power-cycled and restarted at regular intervals. Whenever not in use, the devices needed to be charged, since a full charge can take as long as twelve hours.

5. USER EXPERIENCE
We built a series of ten devices for the festival. Against a deposit, the public borrowed them from the local tourist office for the duration of a day. The goal was to present a seamless experience that conveys as little technological complexity as possible. With the compact packaging and the clear documentation, this goal was clearly achieved.
The audience or visitors were handed an already running device in a closed pouch to which a set of semi-open light headphones is attached. The GPS antenna is placed on top of the headphones to ensure optimum satellite reception. There are no user-controllable elements on the device and no setup is required. Right after
moving outdoors with the device there is a brief waiting period, since the GPS receiver needs to locate and identify the satellites, download the almanac and obtain a stable position fix. A printed map of the landscape, including point descriptions and other information, is handed out together with the GPS device. The eight routes that make up the davos soundscape are clearly marked on it. In the real landscape, to facilitate orientation but more importantly to leave a physical mark, stakes painted bright orange and bearing the logo of the davos soundscape are planted at all 86 points of interest.

Feedback from members of the audience clearly indicates that a memorable sonic experience was presented. Of course, not all of the music can be heard in one day, and sometimes the participants have difficulty orienting themselves among the multitude of elements present in the acoustic domain. The terms “treasure hunt” and “exploring new territories” are often mentioned. The intention to enhance Davos' sonic reality by overlaying the natural acoustic environment with electronic sounds is not always recognized. This might be largely due to the fact that we have all been conditioned to filter out external sounds when wearing headphones. Individual experiences can also vary depending on the weather. Satellite signals are disturbed by certain atmospheric conditions; some people reported problems during thunderstorms and were clearly apprehensive about walking around in such conditions wearing an antenna on their head!

Figure 3. One of 86 markers in the landscape and a visitor with the GPS device and headphones.

6. CONCLUSION AND OUTLOOK
Davos soundscape taught us some valuable lessons. Complexity emerged as the constantly challenging factor. It appeared first at the technical level: many elements had to be assembled and problems sorted out before even raw sketches of the planned features could be made and evaluated. Once the prototype of the device was functioning, the challenge of imagining, structuring and composing music for a landscape arose. As musicians we are clearly not trained to think in loose aural or temporal relationships, and we need to learn how to deal with a real topographical space as the stage for our music. The final and most valuable lesson was never to underestimate the demands placed on a series of experimental devices that must run for an extended period of time without any attendance. That being said, it still seems an intriguing concept to be able to take a device capable of real-time interaction with intelligent electronic music generation out into nature, and to witness a musical expression and spatial sonic experience that would not be possible any other way. Due to the complexity of all elements involved, the compositional and topographical principles applied to the music in the davos soundscape had to remain quite simple. The platform's computing power and the flexibility of the software offer a much greater creative potential that remains to be explored. Generative, algorithmic music and a closer integration of the user through sensor technology are only some of the ideas that come to mind.

For future iterations of the piece, the software will be ported to one of the new commercially available GPS-enabled devices running Linux, such as the Nokia N810 Internet Tablet. With these devices the hardware constraints are resolved, and since the software has already been validated, location-based interactive music experiences can now be imagined in many other forms.

7. ACKNOWLEDGEMENTS
I'd like to thank Marcus Maeder for his partnership in this creative endeavor, Alejandro Duque for his expertise in all things Linux, and the organizers and sponsors of the Davos Festival 2007 for making this project possible.

8. REFERENCES
[1] http://www.davosoundscape.ch
[2] Galloway, A., Ward, M. Locative Media as Socialising and Spatialising Practices: Learning from Archaeology. Leonardo Electronic Almanac, Vol. 14, Issue 3/4, 2006.
[3] http://library.nothingness.org/articles/all/all/display/314
[4] Deleuze, G., Guattari, F. Mille Plateaux. Minuit, coll. « Critique », Paris, 1980, 645 p.
[5] Eco, U. Opera aperta. Forma e indeterminazione nelle poetiche contemporanee. Bompiani, 1962.
[6] Leman, M. Embodied Music Cognition and Mediation Technology. The MIT Press, 2008, pp. 52-53. ISBN 978-0-262-12293-1.
[7] http://www2.gol.com/users/m8/installation.html
[8] Gaye, L., Mazé, R., Holmquist, L. E. Sonic City: The Urban Environment as a Musical Interface. In Proceedings of NIME 2003, Montreal, Canada, May 2003.
[9] http://www.tacticalsoundgarden.net
[10] http://www.hpl.hp.com/mediascapes
[11] http://www.csl.sony.fr/items/2006/net_derive
[12] http://www.gumstix.com/
[13] Geiger, G. PDa: Real Time Signal Processing and Sound Generation on Handheld Devices. In Proceedings of the International Computer Music Conference (ICMC'03), Singapore, 29 Sept. - 4 Oct. 2003.
[14] http://gige.xdv.org/pda/
[15] http://gpsd.berlios.de/NMEA.txt
All URLs were accessed and verified in April 2008.
Posters

uOSC: The Open Sound Control Reference Platform for Embedded Devices

Andy Schmeder
Center for New Music and Audio Technologies
University of California, Berkeley
1750 Arch Street
Berkeley, CA 94720
+1 (510) 643-9990
andy@cnmat.berkeley.edu

Adrian Freed
Center for New Music and Audio Technologies
University of California, Berkeley
1750 Arch Street
Berkeley, CA 94720
+1 (510) 643-9990
adrian@cnmat.berkeley.edu
ABSTRACT
A general-purpose firmware for a low-cost microcontroller is described that employs the Open Sound Control protocol over USB. The firmware is designed with considerations for integration in new musical interfaces and embedded devices. Features of note include stateless design, efficient floating-point support, temporally correct data handling, and protocol completeness. A timing performance analysis is conducted.

Keywords
Open Sound Control, PIC microcontroller, USB, latency, jitter

1. INTRODUCTION
1.1 Motivation
The Open Sound Control (OSC) protocol [10] is widely adopted by the NIME community as a common means for the transmission of streaming musical gesture data. In the authors' observation, the success of OSC arises not from its technical features but rather from its simplicity (i.e., low conceptual overhead and human-readability) and the promise of interoperability with a diverse array of applications. For example, the need for conceptual simplicity and generalized interoperability has led developers to create OSC “wrappers” that translate other hardware protocols into OSC message sources and sinks, such as HID (CUI-OSC, oscjoy), P5 Glove (GlovePie), Nintendo Wii, SpaceNavigator, and MIDI (OSCulator). While these efforts achieve usable results, the indirection due to protocol translation introduces unnecessary latency, and such translations have no hope of achieving the timestamp and atomicity semantics present in OSC. It is natural, then, to ask why OSC is not implemented directly in the hardware, thereby obviating the problem.
A common assumption is that the features of OSC (floating point support, high-resolution timestamps, and a moderately verbose binary representation) are excessive for embedded targets. The work described in this paper demonstrates that this is no longer the case for contemporary microprocessors.
The freely available OSC-enabled firmware described here, uOSC (pronounced “micro-OSC”), improves on the performance, standards conformance, and cost of OSC implementations for new controllers and retrofits. uOSC uses the OSC protocol at the level of the embedded device itself, obviating the need for intervening applications to provide protocol translation and making possible more direct (and thereby higher-performance) access to the data. In addition, because high-speed manipulation of microcontroller pin functions is provided, users can develop applications in any programming environment with OSC support without learning microcontroller programming or a new specialized language such as Wiring [http://wiring.org.co].
A key aim of the uOSC effort is to provide the developer community with a solid reference implementation of OSC to extend and port to other embedded devices. Developers of other OSC clients and servers benefit from an affordable source and sink of OSC data that can be integrated into tangible human-computer interfaces.

1.2 Implementation Challenges
Since its introduction in 1997, the OSC protocol has been successfully integrated in dozens of hardware and software products and used in thousands of performances and installations. Unfortunately, nearly all implementations fail to implement the complete OSC 1.0 Specification¹. In spite of general consensus that temporal semantics are crucial for musical interfaces [9], meaningful support for OSC timestamps is extremely rare. Support for the full address pattern-matching syntax is often omitted, perhaps because of unfounded concern over the performance of the pattern-matching algorithm.
The implementation challenges have been noted by the core OSC creators [2], and this new implementation is carefully written to demonstrate efficient solutions and best practices.

1.3 Overview
The focus in this paper is on novel features, design concepts and especially on how the implementation was optimized to achieve the performance and completeness that OSC applications require within the programming and performance constraints of small microcontrollers.
To promote wider use of OSC, we initially targeted a physically small, readily available, extremely low-cost (USD $25) hardware platform, the PIC18F2455-based “bitwacker”.

¹ A database of OSC implementations and their features is online: http://opensoundcontrol.org/implementations.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).
The uOSC project source code, new developments, benchmarks and details beyond the scope of this paper are documented online at http://cnmat.berkeley.edu/research/uosc.

2. HARDWARE PLATFORM
2.1 Microchip PIC USB Full-Speed
uOSC runs on the popular and compact Microchip PIC18F USB Full-Speed family of microcontrollers. The product line spans chips with 20-80 pins, 10+ analog inputs, hardware modules for TTL serial, PWM, etc., 2-4 KB RAM, 8-128 KB ROM, and CPU speeds of 12 MIPS. Many prototyping boards for these devices are available for less than USD $100. The initial release of uOSC specifically supports the Sparkfun Bitwacker, the CREATE USB Interface (CUI) [4], and the Olimex PIC-USB-455x (pictured in Figure 1, ordered bottom-to-top). Microchip provides a free C compiler (C18), an implementation of the C standard library and a comprehensive IDE.

Figure 1: A collection of devices supported by uOSC

2.2 USB vs. Ethernet
Many OSC users are only familiar with OSC data transported in TCP/IP packets. Even though OSC is transport-independent, overall application performance does depend on the particular transport used, so it is worth examining the advantages of various transports. The key advantage of USB for the NIME community is that power is provided over the cable. Although there are now standards for sending power over Ethernet cables, these are not employed in current desktop or laptop computers.
Another advantage of USB concerns the timing aspects of OSC. USB provides a timing beacon (the start-of-frame packet) and supports the timing guarantees and bandwidth reservation of isochronous data streams for appropriate device classes [12].
USB provides point-to-point connections in a shared bus arrangement, whereas Ethernet has network-wide addressing and electrical capacity for long cable runs, and can leverage the performance benefits of a switched communication fabric. Currently 10-gigabit Ethernet is winning the throughput race over FireWire, USB and SATA, but at a cost point that is irrelevant to the affordable applications we have in mind.

3. FIRMWARE OVERVIEW
uOSC builds on the MCHPFSUSB firmware [13], an open-source implementation of the USB control endpoint and a USB class-compliant serial port. The uOSC core program is triggered by activity on the USB interface: receipt of the USB start-of-frame (SOF) packet from the host controller serves as an isochronous 1000 Hz timing beacon to which the firmware operations are synchronized.
3.1 Device Clock
The current time, relative to device initialization, is tracked with a precision of 1 ms. The clock is incremented by the SOF interrupt. Because this signal comes from the host controller, the clock is not subject to thermal drift or resonator imprecision in the device hardware. The clock is used for bundle timestamping and scheduling.
3.2 Pin Initialization
By default all pins are configured as inputs on power-up. The user may change the direction of any pin by sending the appropriate OSC message, and if this direction state is committed to flash memory, it will be restored on power-up.
3.3 Extensible Hardware Modules
Special features provided by hardware modules, such as PWM control and TTL serial, can be enabled on user request and, if desired, re-enabled on initialization.
3.4 Unique Identification
A non-volatile writable memory section is provided for the storage of a unique 64-bit identifier. On first startup, when this identifier is undefined, it is populated with a pseudo-random number derived from the non-deterministic USB host enumeration time.
3.5 Pin I/O
Double-buffering is used on digital and analog pin I/O to minimize possible skew in the timing of pin read and write operations. Skew is less than one microsecond for digital I/O and approximately 30 microseconds between analog inputs. Double-buffering also ensures that I/O operations always occur at regular and known time slices.
3.6 USB-Serial Interface
uOSC implements the descriptor and endpoint logic to appear to the host controller as a CDC-ACM device. Of the many possible USB device classes, this one has the advantage of being the most “plug and play”, as modern operating systems ship with drivers that support it. It also offers higher performance and more flexibility than the HID class.
CDC-ACM uses the USB bulk transfer type, which has a theoretical maximum bandwidth of 12 Mbit/s on a USB Full-Speed bus. We have measured rates of up to 3 Mbit/s of fully formatted OSC data.
3.6.1 SLIP for Serial Transport Framing
SLIP [11] is a simple and lightweight protocol, popular for microcontroller applications, that provides the framing necessary to mark the boundaries between OSC bundles on a serial transport. The double-ENDed variant of SLIP is recommended because it provides robust, state-free detection of the start of a packet.
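Double-ENDed SLIP framing can be sketched in C as follows. This is a minimal illustration that uses the byte values from RFC 1055 (END = 0xC0, ESC = 0xDB, ESC_END = 0xDC, ESC_ESC = 0xDD). It is not the uOSC source. The names slip_encode, slip_decode_byte and slip_decoder are hypothetical. The policy of dropping any frame that contains an invalid escape or overflows its buffer is also an assumption made for the sketch. Every frame both starts and ends with END, so a receiver that joins the stream mid-packet resynchronizes at the next END. The empty frames produced by back-to-back ENDs are simply ignored.

```c
/* Minimal double-ENDed SLIP framing sketch (RFC 1055 byte values).
 * Hypothetical names; an illustration, not the uOSC implementation. */
#include <stddef.h>
#include <stdint.h>

#define SLIP_END     0xC0u  /* 192: frame delimiter */
#define SLIP_ESC     0xDBu  /* 219: escape introducer */
#define SLIP_ESC_END 0xDCu  /* 220: stands for a literal END byte */
#define SLIP_ESC_ESC 0xDDu  /* 221: stands for a literal ESC byte */

/* Encode one packet with END before and after it.
 * Returns bytes written, or 0 if out_cap cannot hold the worst case (2*len + 2). */
size_t slip_encode(const uint8_t *in, size_t len, uint8_t *out, size_t out_cap)
{
    size_t n = 0;
    if (out_cap < 2 * len + 2) return 0;
    out[n++] = SLIP_END;                 /* leading END flushes any line noise */
    for (size_t i = 0; i < len; i++) {
        if (in[i] == SLIP_END) {
            out[n++] = SLIP_ESC; out[n++] = SLIP_ESC_END;
        } else if (in[i] == SLIP_ESC) {
            out[n++] = SLIP_ESC; out[n++] = SLIP_ESC_ESC;
        } else {
            out[n++] = in[i];            /* all other bytes pass through unchanged */
        }
    }
    out[n++] = SLIP_END;                 /* trailing END closes the frame */
    return n;
}

/* Byte-at-a-time decoder state. */
typedef struct {
    uint8_t *buf;   /* caller-supplied frame buffer */
    size_t cap;     /* capacity of buf */
    size_t len;     /* bytes decoded so far in the current frame */
    int escaped;    /* previous byte was SLIP_ESC */
    int error;      /* current frame is corrupt and will be dropped */
} slip_decoder;

/* Feed one received byte. Returns the length of a complete, error-free frame
 * when an END closes it; returns 0 otherwise (including for empty frames). */
size_t slip_decode_byte(slip_decoder *d, uint8_t b)
{
    if (b == SLIP_END) {
        size_t done = (d->error || d->escaped) ? 0 : d->len;
        d->len = 0; d->escaped = 0; d->error = 0;   /* ready for the next frame */
        return done;
    }
    if (d->escaped) {
        d->escaped = 0;
        if (b == SLIP_ESC_END)      b = SLIP_END;
        else if (b == SLIP_ESC_ESC) b = SLIP_ESC;
        else { d->error = 1; return 0; }          /* invalid escape: drop frame */
    } else if (b == SLIP_ESC) {
        d->escaped = 1;
        return 0;
    }
    if (d->error) return 0;                       /* discard rest of a bad frame */
    if (d->len >= d->cap) { d->error = 1; return 0; }  /* overflow: drop frame */
    d->buf[d->len++] = b;
    return 0;
}
```

The decoder consumes one byte per call and keeps only a few words of state. Its shape is therefore suited to being driven from a serial receive handler on a small microcontroller. The encoder's capacity check is deliberately conservative: it assumes every payload byte might need escaping.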
SLIP is the recommended framing method for OSC encoding over stream-oriented transports such as TCP, and it has already been used for this purpose in the popular Make Controller Kit by Making Things [http://makingthings.com].

4. ULTRA-LIGHT OSC PROGRAMMING
The small memory model, limited type support, and low clock rate of the microcontroller impose challenging limitations on the implementation of an OSC library that is both full-featured and easy to create and understand.
4.1.1 OSC as Binary Data Structure
OSC implementations typically translate from the OSC binary message structure to/from an appropriately typed data structure in the native format of the language, along with encoding metadata. With only a few thousand bytes of memory to work with, uOSC cannot accommodate this style, so the programmer works directly with C pointers to a statically allocated buffer. Only one incoming message and one outgoing message are processed at a time. This style was anticipated in the OSC specification with the mod-4 byte-alignment rule and conservative native type support.
4.1.2 Open-Ended Bundles
An important feature of an OSC bundle is that the total length of the frame is not encoded in the bundle header. This allows uOSC to format bundles with multiple messages while retaining only a single outgoing message in memory. In addition, the number of responses generated by an OSC pattern dispatch does not need to be known in advance.
4.1.3 Type Considerations
The PIC18 is an 8-bit processor, so for efficiency the use of 8-bit and 16-bit numbers is preferred. OSC uses numbers of at least 32 bits, so uOSC provides efficient routines to pack 8-bit and 16-bit numbers. uOSC also provides routines to pack low-bit-depth integers as normalized floating point fractions, and to pack automatically padded strings from ROM or RAM data. uOSC packs boolean data types using the ‘T’ and ‘F’ typetags, which do not consume any space in the data section of an OSC message.
4.1.4 Push-down of the SLIP encoder
The SLIP reserved characters have the two highest bits set (byte values >= 192). The bulk of the output data stream does not require SLIP encoding. For example, the SLIP encoder can remain inactive for OSC address patterns that are printable ASCII, bundle sub-message lengths, NULL-padding bytes, and other bytes known to be strictly less than 192.
4.1.5 Input decoding state machine
The SLIP decoder must be active at all times. To avoid the need to re-examine input bytes, the OSC parser is embedded inside the SLIP decoder. The SLIP decoder, in turn, is embedded in the USB serial input handler, resulting in a third-order nested state machine. The OSC parser performs bundle start detection and basic sanity checks on the packet format, and it retains pointers to the locations of the address, the typetags, and the start and end of the data section. Any SLIP decoding error causes the entire bundle to be discarded.
4.2 Code Example
oscBundleOpenTimestamped();     // sends SLIP_END and packs time
oscMessageOpen();               // reserves 4 bytes at start for length
oscPackROMString("/rb");
p_osc_tt = p_osc_message + 1;   // pointer to typetags
oscPackROMString(",NNNNNNNN");  // final types unknown
for (i = 0; i < 8; i++) {
    // invokes oscPackInt16ToFractionalFloat
    // returns 'T' or 'F' for digital pins
    *p_osc_tt++ = oscReportPin(i);  // overwrites the placeholder typetag for pin i
}
oscMessageClose();              // prepends length, invokes CDCTxRAM
// other messages are packed in here…
oscBundleClose();               // sends SLIP_END and finalizes CDCTxRAM

5. LOW-COST FLOATING POINT
A widely adopted OSC convention, also used by audio plug-ins, is to scale control parameters to floating point values using a conventional representation such as the unit interval. The benefit of this abstraction became obvious for the PIC18 family of microcontrollers when Microchip recently upgraded the ADC on some new variants from 10-bit to 12-bit: an integer encoding would require target-specific logic on the client side to accommodate both ranges.
Even though the PIC18 processor has no hardware FPU, Microchip provides an implementation of <math.h>, the C float type, and IEEE-754 compliant operations by software emulation. Profiling of this code revealed that the cost of int-to-float conversion (90 microseconds per conversion) was too great for use at the desired reporting rates.
We therefore created novel special-purpose code for floating point conversion that is exact for integers up to 23 bits and is approximately 3 times faster than the general-purpose library.
5.1 Theory
We take the normalized target range to be the closed interval [0.0, 1.0]. For an n-bit unsigned integer x, this results in the conversion formula:
$y = \frac{x}{2^n - 1}$
For simplicity, suppose that n = 8. x is given in binary digits as:
$x = x_8 x_7 x_6 x_5 x_4 x_3 x_2 x_1$
where $x_8$ is the most significant bit. Then y, written as a repeating binary fraction, is:
$y = (0.\,x_8 x_7 x_6 x_5 x_4 x_3 x_2 x_1\,x_8 x_7 x_6 x_5 x_4 x_3 x_2 x_1 \ldots)_2$
The conversion to y attains the same precision as x when the expansion is carried to the first repetition of the most significant bit of x (the 9th fractional digit above). This bit equals 1 when y >= 0.5, and 0 otherwise. Furthermore, a special case applies when $x = 2^n - 1$: then y = 1.0, since by definition of a real number the repeating binary fraction $(0.11111111\,11111111\ldots)_2 = 1.0$.
5.2 Conversion Algorithm
The calculation of y as an IEEE-754 single-precision floating point number proceeds as follows:
1. If x = 0, return 0.0. If $x = 2^n - 1$, return 1.0.
2. Scan the digits of x to find the index, i, of the most significant non-zero bit. This requires O(log n) comparisons. If $x > 2^{n-1} - 1$, then the least significant bit of y (the first repetition of the most significant bit of x) is 1; otherwise it is 0.
3. Compute the exponent as $e = 127 - (n - i)$.
4. Left-shift x by $(n - i) + 1$ places to obtain the mantissa.
5. Composite the exponent, mantissa and least
The following example illustrates the programming style on the significant bit together to realize IEEE-754 format.
microcontroller using the ultra-light OSC implementation to Requires O(n/8) shift and or operations.
create a port report with 8 data values of variable type:
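A portable C sketch of the conversion in Sections 5.1 and 5.2 follows. It is an illustration under stated assumptions, not the uOSC PIC18 routine: the function name `normalized_to_float` is hypothetical, float is assumed to be IEEE-754 binary32 with the same byte order as `uint32_t`, the leading one is found by a linear scan instead of an O(log n) search, bit positions are counted from 0 (so a leading one at position k gives the biased exponent 127 - (n - k)), and the whole 23-bit mantissa is filled by repeating the bit pattern of x and then truncated, rather than stopping at the first repeated bit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not uOSC's API): converts an n-bit unsigned integer x,
   1 <= n <= 23, to y = x / (2^n - 1) as an IEEE-754 single using only integer
   operations. Assumes float is IEEE-754 binary32 with the same byte order as
   uint32_t. The mantissa is truncated, not rounded. */
static float normalized_to_float(uint32_t x, unsigned n)
{
    const uint32_t full = (1u << n) - 1u;     /* 2^n - 1 */
    uint32_t mantissa = 0, exponent, bits;
    unsigned k = 0;                           /* 0-based index of the leading one */
    int filled = 0;                           /* mantissa bits produced so far */
    float y;

    /* Special cases; values above 2^n - 1 are clamped to 1.0. */
    if (x == 0) return 0.0f;
    if (x >= full) return 1.0f;

    /* Locate the most significant set bit (linear scan for clarity). */
    while ((x >> (k + 1)) != 0) k++;

    /* y = 0.(x)(x)(x)... in binary, so the leading one of y lies at
       fractional position n - k and the biased exponent is 127 - (n - k). */
    exponent = 127u - (n - k);

    /* The bits after the leading one are first the k low bits of x ... */
    for (int b = (int)k - 1; b >= 0 && filled < 23; b--, filled++)
        mantissa = (mantissa << 1) | ((x >> b) & 1u);
    /* ... followed by repetitions of the full n-bit pattern of x. */
    while (filled < 23)
        for (int b = (int)n - 1; b >= 0 && filled < 23; b--, filled++)
            mantissa = (mantissa << 1) | ((x >> b) & 1u);

    /* Assemble sign (always 0), exponent and mantissa. */
    bits = (exponent << 23) | mantissa;
    memcpy(&y, &bits, sizeof y);
    return y;
}
```

For example, `normalized_to_float(128, 8)` returns approximately 0.50196 (that is, 128/255), and `normalized_to_float(255, 8)` returns exactly 1.0. Because truncation discards only bits beyond the 23rd, the relative error stays below 2^-23.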
The first byte is never SLIP encoded (the sign bit is always zero). The last byte must pass through the SLIP encoder when y >= 0.5; otherwise the last byte is zero.
The inverse conversion algorithm is similar but also requires detection of denormal numbers and a rounding operation.

6. OSC REPORTING
uOSC sends OSC packets reporting the current state of all pins isochronously at intervals of two milliseconds. The reporting itself consumes only approximately one millisecond of processor time. The remaining time is used to handle other device functions, such as processing incoming OSC messages. Note that this does not mean that there is 2 msec of jitter in the measurements themselves: their timing relationship to the 1000 Hz USB start-of-frame (SOF) beacon is precisely known. An appropriately implemented host driver could achieve sub-millisecond timing precision.

6.1 Bundle Timestamps
The bundle timestamp conforms to the NTP fixed-point format described in the OSC specification. The fractional part is computed to a precision of 1 msec. This is approximately $2^{-10}$ seconds, so a 16-bit integer is sufficient for the calculation. The fractional part is exactly zero at intervals of 1000 SOF interrupts, i.e., roundoff error does not accumulate. The integer part is a long integer, which is unbounded for all practical purposes. Since the host and microcontroller have a point-to-point connection, the timestamp can in principle be conformed to the host computer's best UTC approximation [2].

6.1.1 Use of IMMEDIATE
Informational messages such as the device firmware version, pin capability reports, and profiling and debugging information are not time-sensitive and are encapsulated in bundles that use the IMMEDIATE timestamp (value: 0x0.0x1).

6.2 Efficient encoding of Port Reports
To save space in the data stream, sequentially numbered pins are grouped together in a single message called a port report. Each analog input pin is reported as a normalized floating-point number, OSC typetag 'f', requiring approximately 5 bytes (4 bytes of data plus its typetag). A pin configured as a digital input or output is reported as a boolean using OSC typetag 'T' or 'F', requiring only its 1-byte typetag. A pin that is not connected or is in a reserved state (e.g., in use by a hardware module) is reported as NULL using OSC typetag 'N', likewise consuming only 1 byte. The CNMAT OpenSoundControl object for MaxMSP² supports these types sensibly.
For a port of 8 pins, the total size of the OSC message is 12-60 bytes, depending on the current pin configuration. The same number of pins encoded as separately addressed messages would require 96-160 bytes.

7. OSC DISPATCH
An incoming OSC message is dispatched by matching its address pattern against a nested structure of path components and invoking the appropriate callback for each match. uOSC implements full support for OSC address pattern matching.

7.1 Dispatch Table Structure
The dispatch structure is a statically allocated tree built from the following data structure:

    /* One node of the dispatch tree. 'rom' is the Microchip C18 qualifier for
       data held in program memory, 'byte' is an 8-bit unsigned type, and
       oscCallback is the handler type used by uOSC (defined elsewhere). */
    typedef struct _oscSchemaNode {
        oscCallback target;                                  /* handler invoked when this node matches */
        byte num_children;                                   /* number of valid child entries */
        rom char* child_name[OSC_MAX_CHILDREN];              /* path component name of each child */
        rom struct _oscSchemaNode* child[OSC_MAX_CHILDREN];  /* pointer to each child node */
    } oscSchemaNode;

Adding new methods is simply a matter of inserting new nodes under the root node.

7.2 Efficiency of Pattern Matching
The purpose of the OSC pattern syntax is primarily to enable the client to compactly describe certain bulk and atomic operations, not to provide a sophisticated search mechanism. The OSC address pattern syntax is significantly less complex than typical general-purpose regular expression languages. Specifically: (1) patterns may not cross '/' boundaries in the address; (2) list matches do not support nesting or containment of other pattern operators; and (3) character-class matches and the wildcard operators '?' and '*' are always greedy, obviating the need for backtracking. Therefore a pattern match requires only O(1) memory.
The set of possible matching addresses is finite, and for patterns up to a set length, the total execution time to match is bounded. Furthermore, the dispatch process can leverage the nested structure, since child addresses cannot match if the parent fails to match.
Our profiling shows that the cost of matching in uOSC is not a cause for concern and, in particular, is no more expensive than a standard string comparison for the most common case of addresses that contain no wildcards.

7.3 Scheduled Dispatch
When a received bundle has a timestamp in the future relative to the device's internal clock, the action of the packet can be delayed until the requested time. A bundle with a timestamp in the past is discarded. This mechanism makes possible the forward synchronization method for jitter compensation [5].
The embedded processor has insufficient RAM to retain entire packets for future processing, so scheduling is limited to digital pin writes, which are stored in a fixed-length, insertion-sorted list.

7.4 Port Writes and Pin Aliasing
The client can write to groups of pins organized in ports using the same format described in Section 6.2. Individual pins can also be addressed using their specific addresses.

8. DEBUGGING AND PROFILING
Profiling is essential for code optimization. However, in-circuit serial debuggers are known to be problematic for USB devices because the associated interrupts are time-sensitive. Timing issues can also arise when using printf-style debugging over the TTL serial port.
uOSC includes a microsecond-accuracy profiling system. When it is enabled by a compile-time switch, the timing of various operations is measured and reported periodically in supplemental OSC messages. This solution has negligible impact on the timing performance of the system.
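The compile-time profiling switch described in Section 8 might look like the following host-side C sketch. The macro names `PROFILE_ENABLED`, `PROFILE_BEGIN` and `PROFILE_END`, the `profile_slot` structure, and the use of POSIX `clock_gettime` as the microsecond clock are all assumptions for illustration; on the PIC18 a hardware timer would stand in for `now_us()`, and the accumulated statistics could be packed into a supplemental OSC message rather than read directly.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Hypothetical compile-time switch: define as 0 to compile profiling out. */
#ifndef PROFILE_ENABLED
#define PROFILE_ENABLED 1
#endif

/* Accumulated timing statistics for one profiled operation. */
typedef struct {
    uint32_t count;     /* number of completed measurements */
    uint32_t min_us;    /* shortest observed duration, microseconds */
    uint32_t max_us;    /* longest observed duration, microseconds */
    uint64_t total_us;  /* sum of all durations, for computing a mean */
    uint64_t start_us;  /* start time of the measurement in progress */
} profile_slot;

/* Microsecond clock; a stand-in for a hardware timer on the device. */
static uint64_t now_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

/* Fold one completed measurement into the slot's statistics. */
static void profile_record(profile_slot *p, uint64_t end_us)
{
    uint32_t d = (uint32_t)(end_us - p->start_us);
    if (p->count == 0 || d < p->min_us) p->min_us = d;
    if (d > p->max_us) p->max_us = d;
    p->total_us += d;
    p->count++;
}

#if PROFILE_ENABLED
#define PROFILE_BEGIN(slot) ((slot).start_us = now_us())
#define PROFILE_END(slot)   profile_record(&(slot), now_us())
#else
/* With profiling disabled, both macros compile to nothing. */
#define PROFILE_BEGIN(slot) ((void)0)
#define PROFILE_END(slot)   ((void)0)
#endif
```

Wrapping an operation such as pattern dispatch in `PROFILE_BEGIN`/`PROFILE_END` accumulates its count, minimum, maximum and total time, from which a mean can be reported periodically. Setting `PROFILE_ENABLED` to 0 turns both macros into `((void)0)`, so the instrumented code carries no run-time cost when profiling is switched off.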
² This and other OSC-related software is available online at: http://cnmat.berkeley.edu/downloads.

9. DEVICE ADDRESS SPACE
The following lists the messages that uOSC generates and accepts for the case of the Sparkfun Bitwacker board. Minor
variations apply for other boards because of the "user-friendly" design choice that parameters are named according to the silkscreens on each development board.

9.1 Port and Pin Messages
/ra ffffFf : generates/accepts port-report format
    /0 : individual pin control for /ra/0
        /info : returns “dio”, “adc”, “pwm”, “ttl”
        /state : returns “input”, “output”, “reserved”
        /set : accepts “low”, 0, 0.0, or “high”, 1, 1.0
        /get : see /set
    /1-5 (same as /ra/0)
/rb fffffFFF : /rb port report (8 pins)
    /0-7 (same as /ra/0)
    /tx (same as /ra/0)
    /rx (same as /ra/0)
/status
    /0 : controls the yellow status LED
        /set : accepts “off”, 0, 0.0, “on”, 1, 1.0
        /get : return LED state
    /1 : controls the red status LED

9.2 Device Messages
/device
    /platform : returns “Bitwacker”, “CUI”, etc.
    /firmware : returns “uOSC 1.0”
    /processor : returns “PIC18F2455 Rev. B4”
    /ports : returns a list of port addresses
    /pins : returns a list of pin addresses
    /id : user-writeable 64-bit hex string
    /save 1 : commits pin and module state to flash
    /reset 1 : restore default state
/modules
    /list : list available modules
    /enable s : enable a module
    /disable s : disable module
/pwm
    /0 : control the first hardware PWM
        /rate f : rate in Hz
        /duty f : duty cycle in [0.0-1.0]
/ttl
    /0 : control first hardware TTL
        /open [baud, bits, stopbit]
        /read : return string data
        /write : write string data
        /close
/usb
    /stall : stall detected
    /error : error detected

10. HOST TIMING PERFORMANCE
In this section we analyze the timing performance (latency and jitter) of data transmitted from an input pin of a uOSC device attached to a simple switch circuit and received by the host. The method described in Wright et al. [11] is used, whereby the acoustic signal of the switch activation is recorded with a known latency simultaneously with a signal generated by the method under test. Conditions are repeated with and without background system load.
The data presented are intended to show typical performance attainable with the current configuration and should not be interpreted as the final target of this project.

10.1.1 CoreAudio
The CoreAudio path copies the sensor data into a dedicated audio channel, available directly as audio in Mac applications, in particular as a signal in MaxMSP. Since primary interrupts and CoreAudio threads are the highest-priority operations in OS X, no priority inversion occurs. This represents the most reliable operating-system path for musical applications, and has a consistent input latency of 4 msec and a peak jitter of 0.7 msec, corresponding to the gesture-input scan rate.

10.1.2 /dev/osc
The /dev/osc path writes the sensor data into a UNIX-style character-device file, which is read using standard file I/O operations via the devosc object for MaxMSP (see Footnote 2). Although preemption can delay packet delivery to Max, only a single context switch is required to read data.

10.2 uOSC via Serial Connectivity
The serial driver is known to contain some input buffering, so this data pathway is expected to be slower than the reference platform. Two variations on accessing the serial port data were tested:

10.2.1 MaxMSP serial -> slipOSC
The built-in MaxMSP serial object is used to perform high-rate non-blocking reads on the corresponding serial port. A custom object, slipOSC, decodes the SLIP framing into OSC "fullpacket" messages compatible with the CNMAT OpenSoundControl object.

10.2.2 py-serial to UDP
In this configuration, a Python program reads the serial port, decodes the SLIP framing, and relays the resulting datagram to MaxMSP via the network stack as a UDP datagram.

10.3 Discussion
The py-serial method is clearly the worst performer (Figures 2, 3), as expected given the extra layer of indirection.
It is clearly possible to attain timing performance within the desired latency bounds for musical performance (~6-8 msec); however, the observed jitter requires consideration. OSC bundle timestamping can be used to compensate for this jitter, and will be an interesting topic for future work.

Figure 2: Latency histogram on idle system
Figure 3: Latency histogram under system load

11. SAMPLE APPLICATIONS
The uOSC platform has been successfully integrated into several new music controllers developed at CNMAT [3]. The compact size of various hardware platforms has also allowed us to retrofit older MIDI and analog devices, such as Max Mathews' Radio Drum and various foot pedals.
A more sophisticated sensor platform was constructed using a custom hardware module extension to uOSC that uses the SPI port and other pins to communicate with a 3-axis magnetometer having a digital communication interface. Combined with the standard analog input capabilities of uOSC, this yielded a compact, high-speed inertial measurement module for research into spatial gesture tracking (Figure 4).
Figure 4: IMU+magnetometer hybrid sensor built on the Bitwacker running uOSC, mounted on Sennheiser HD650

12. CONCLUSION
This paper describes the implementation of OSC, including its advanced timing features and type support, for an embedded microprocessor.
By including deadline scheduling and timestamping, uOSC contributes to a large project now underway to implement solid deadline scheduling in future multi-core desktop and handheld device operating systems [5].
The inclusion of end-to-end latency and jitter benchmarking demonstrates the current results with USB-serial in relation to a best-case reference platform, an analysis that the authors consider essential for the discussion and careful analysis of any similar implementation.

13. FUTURE WORK
Measurements and tuning on a wide range of host platforms are ongoing. We are exploring other USB device classes such as the CDC-ECM (Ethernet Control Model) and USB-Audio classes, both of which can use isochronous endpoints having improved reliability for real-time applications.
As we release the source code, we will support the community's applications and participate in ports to other processors, with an initial focus on PIC controllers with integrated Ethernet.
The code structure of uOSC anticipates porting to new microprocessor targets by isolating platform-independent code components. We are exploring the implementation of uOSC on the ATmega controllers employed on the Arduino and Wiring platforms. These implementations rely on a separate USB serial controller instead of integrated USB; therefore they cannot implement different USB protocols. They are also more expensive and have slower performance than PIC18F systems for time-critical applications. The Wiring platform, for example, has three different unconnected clock domains. Many Arduino-compatible systems, such as the Lilypad, use a cheap integrated clock that is neither accurate nor precise. We have achieved some success using forward and backward synchronization on the host side to obviate these problems [8], but we strongly encourage designers of future physical computing platforms to carefully study these timing and performance issues.

14. ACKNOWLEDGEMENTS
We gratefully acknowledge the financial support of Sennheiser, the pioneering implementations of Making Things by Liam Staskawicz, and Dan Overholt, who brought the integration advantages of the PIC processors to our attention with his CUI board.

15. REFERENCES
[1] Brandt, Eli; Dannenberg, Roger, Time in Distributed Real-Time Systems, in Proceedings of the ICMC (San Francisco, CA, USA, 1998), 523-526.
[2] Freed, Adrian, Towards a More Effective OSC Time Tag Scheme, in Proceedings of the OSC Conference (Berkeley, CA, USA, June 30, 2004).
[3] Freed, Adrian, Application of new Fiber and Malleable Materials for Agile Development of Augmented Instruments and Controllers, in Proceedings of the NIME Conference (Genova, Italy, 2008).
[4] Freed, Adrian; Avizienis, Rimas; Wright, Matt, Beyond 0-5V: Expanding Sensor Integration Strategies, in Proceedings of the NIME Conference (Paris, France, 2006), 97-100.
[5] Hayes, Brian, Computing in a Parallel Universe, American Scientist, Volume 95, Issue 6, 2007, 476-480.
[6] Overholt, Dan, Musical Interaction Design with the CREATE USB Interface: Teaching HCI with CUIs instead of GUIs, in Proceedings of the ICMC (New Orleans, LA, USA, 2006).
[7] Romkey, J., A Nonstandard for Transmission of IP Datagrams over Serial Lines: SLIP, RFC 1055, http://rfc.net/rfc1055.html, 1988.
[8] Schmeder, Andy; Freed, Adrian, Implementation and Applications of Open Sound Control Timestamps, in Proceedings of the ICMC (Belfast, Ireland, 2008).
[9] Wessel, David; Wright, Matthew, Problems and Prospects for Intimate Musical Control of Computers, Computer Music Journal, Volume 26, Issue 3, 2002, 11-22.
[10] Wright, Matthew, The Open Sound Control 1.0 Specification, http://opensoundcontrol.org/spec-1_0.
[11] Wright, Matthew; Cassidy, Ryan J.; Zbyszynski, Michael F., Audio and Gesture Latency Measurements on Linux and OSX, in Proceedings of the ICMC (Miami, FL, USA, 2004), 423-429.
[12] The Universal Serial Bus Specification Revision 2.0, http://www.usb.org, April 27, 2000.
[13] MCHPFSUSB User's Guide, DS51679A, Microchip Technology Inc., 2007.
Addressing Classes by Differentiating Values and Properties in OSC

Timothy Place (a), Trond Lossius (b), Alexander Refsum Jensenius (c), Nils Peters (d), Pascal Baltazar (e)

(a) Electrotap, tim@electrotap.com
(b) BEK - Bergen Center for Electronic Arts, lossius@bek.no
(c) University of Oslo & Norwegian Academy of Music, a.r.jensenius@imv.uio.no
(d) CIRMMT, McGill University, Montréal, nils.peters@mcgill.ca
(e) GMEA, pb@gmea.net
ABSTRACT
An approach for creating structured Open Sound Control (OSC) messages by separating the addressing of node values and node properties is suggested. This includes a method for querying values and properties. As a result, it is possible to address complex nodes as classes inside of more complex tree structures using an OSC namespace. This is particularly useful for creating flexible communication in modular systems. A prototype implementation is presented and discussed.

Keywords
OSC, namespace, Jamoma, standardization

1. INTRODUCTION
Open Sound Control (OSC)¹ has evolved into the de facto standard in the computer music community for communication in and between controllers and sound engines [11]. OSC is a protocol for transmitting messages where the addressing of nodes is based on a "slash" notation similar to URLs. As such, OSC is focused on standardizing the communication of messages. There is, however, no prescribed standardization of the namespaces or the structure of these namespaces.
The authors are involved in developing OSC namespaces for the Jamoma project.² Jamoma is a modular system for the Max/MSP/Jitter environment. It uses OSC for internal and external communication [6]. As Jamoma's modular structures grow more complex, we find the bi-dimensional namespace conventions of OSC to be inadequate for addressing our constructs. OSC standardizes the addressing of a node, but it becomes increasingly unclear what to do once we reach the node. The problem is exacerbated when the node itself implements its own OSC namespace, as is the case with Jamoma.
The OSC 1.0 specification [8] considers nodes only in terms of their methods: "OSC Methods are the potential destinations of OSC messages received by the OSC server and correspond to each of the points of control that the application makes available. 'Invoking' an OSC method is analogous to a procedure call; it means supplying the method with arguments and causing the method's effect to take place."
Our proposal extends current OSC concepts by considering nodes to represent classes in an object-oriented sense, rather than simple methods. For the purpose of this discussion, we will consider only nodes that contain one or more methods and/or properties. Properties provide additional information concerning how the node behaves and responds to methods, e.g. by specifying how a parameter interpolates to a new value. A node might or might not have a value. If it does possess a value property, that value may be set directly, as it is considered an implicit property of the node. A node may branch out to additional nodes, as in existing OSC practice.
This paper starts by reviewing various approaches to creating more complex communication using OSC. This is followed by a presentation and discussion of our suggested approach, introducing the use of a colon for differentiating between values and properties. Finally, a prototype implementation built in Jamoma is presented and discussed.

2. COMPLEX STRUCTURES IN OSC
The original idea of OSC is that it is tree-structured into a hierarchy called the address space, where each of the nodes has a symbolic name and is a potential destination of OSC messages [10]. In contrast to the static schema of MIDI, the open nature of OSC means that the address space is defined and created by the "implementor's idea of how these features should be organized" [11, p. 153].
This open approach has made OSC useful in a broad range of applications, and adaptable to situations not foreseen by its developers [9]. However, this lack of standardization in namespace schemas is also likely a major reason that OSC has not gained more widespread use in commercial software applications.

2.1 OSC Namespace Standardization
A growing number of projects, including research on mapping between controllers and sound engines, require the ability to discover and query namespaces using a known and common syntax [4]. Other projects, such as LIBOSCQS³, attempt to solve the problem of disparate namespaces by providing a query system and service discovery for applications using the OSC protocol [2, 7].

¹ http://www.opensoundcontrol.org
² http://www.jamoma.org
³ http://liboscqs.sourceforge.net

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).
Several projects have undertaken a standardization of OSC messages, and OSC syntax, for different purposes. In actual practice, these independent efforts at standardizing namespaces incorporate syntactic elements whose meanings conflict with each other. However, there are some commonalities to these efforts and the problems that they try to address.

2.1.1 Querying Nodes
A primary concern in many of these efforts is the ability to query a node for its value. The Integra project⁴ uses a .get appended to the node's address. Meanwhile, Jazzmutant's OSC 2.0 Draft Proposal suggests repurposing the reserved ? to query for the value of a node [3]. We agree that this functionality is needed, and that a standardized way of doing it is essential. However, we propose that users are interested not only in querying the value of the node, but other properties of that node as well.

2.1.2 Specifying Additional Information
The standardization of OSC namespaces for interfacing with VST Plug-ins was suggested in [12], where units may be specified for the value that is being sent to or from a node. In this proposal the units are specified within the namespace. For example, /low/output and /low/dBoutput are two ways of controlling the same thing (gain) but specified using different units. This approach is similar to how we have previously addressed different units in Jamoma, e.g. for specifying gain in either MIDI units: /audio/gain/midi or dB: /audio/gain. While such an approach may be useful in some contexts, we find that a more structured approach is preferable in more complex setups. We therefore propose that the units should be specified as a property of that node, rather than contaminating the namespace itself.

2.1.3 Augmented Syntax
A review of the myriad attempts at creating standardized OSC schemas, and standardized means of discovering and querying namespaces, indicates that additional syntax is needed for clarifying function, address, or both when working with a complex OSC system. Integra, Jazzmutant, and Jamoma are all examples where additional symbols, such as the colon, have reserved (if different) meanings.
To investigate possible alternatives for sending this structured information, it is useful to observe how existing methodologies represent and send data over a network.

2.2 XML
Extensible Markup Language (XML)⁵ is a particularly relevant analogue to OSC. XML defines a means for formatting data, but not the data or anything specific to a schema or namespace [1].
A number of standardized namespaces using XML have gained wide adoption, including Scalable Vector Graphics (SVG)⁶, XHTML⁷ and SOAP⁸. SOAP is of particular interest because it is designed as a protocol for exchanging structured information.
Using XML, information is encapsulated into elements. These elements form a tree structure analogous to an OSC message. Using XML elements, one way to represent the previous audio gain example is thus:
<audio><gain> 0 </gain></audio>
This is clearly more cumbersome than the equivalent OSC message:
/audio/gain 0
It is also more work for the receiving processor to parse, and it uses more bandwidth.⁹ However, XML elements can not only express a value (content, in XML parlance) between the tags, but also provide properties (attributes) for the node. For example, we may provide the unit for specifying the gain:
<audio><gain unit='dB'> 0 </gain></audio>
We suggest a model where it is possible to fork an OSC address to access not only the value of the node, but also the properties of that node, much like what is possible in other existing models such as XML.¹⁰

⁴ http://www.integralive.org
⁵ http://www.w3.org/XML/
⁶ http://www.w3.org/Graphics/SVG/
⁷ http://www.w3.org/TR/xhtml1/
⁸ http://www.w3.org/TR/soap/
⁹ The Efficient XML Interchange (EXI) Format solves many of these concerns with XML, but at the expense of human readability because it is a binary format. http://www.w3.org/TR/2007/WD-exi-20071219/
¹⁰ A more concise option than XML, albeit with less clarity and interoperability than OSC, is JSON (http://www.json.org).

2.3 OSC Nodes as Classes
In the introduction we made reference to the OSC 1.0 Specification, which states that an OSC node represents a function call on a server. We propose that OSC nodes may represent more complex classes, and thus require a mechanism to address the members of these classes. A class member may be a property or a method.
In Figure 1, the OSC namespace from the previous examples is shown across the x-axis. Traversing the OSC namespace is an action making a horizontal traversal across the figure. In the next section, we will suggest a new syntax for traversing this diagram on the y-axis to address the members inside of these nodes.

Figure 1: An OSC address tree, with some nodes as classes. (Nodes /module, /audio, /gain and /mix run along the x-axis as node / node / node; below /gain are its members /unit {midi, dB, linear}, /data type {int, float}, /description and /value, and below /mix are /unit {percentage}, /data type {int, float}, /description and /value; further /name and /description members also appear.)

2.4 Introducing the Colon Separator
In addition to the ASCII symbols already reserved for specific purposes within the OSC protocol [10], we introduce the colon ":" as a separator between the OSC address of a node and the namespace for accessing the members of the node:
<node address> <value>
<node address>:<member address> <value>
The former message sets the value of the node just as it would using the existing OSC conventions. This is because a property named value is considered to be implicitly addressed if there is no specific member address given. The
latter form calls or sets a member of the node. The member itself is addressed using a fully-qualified OSC namespace. Again using gain as an example, we can send two messages: one for setting the unit property and one for setting the value.
/module/audio/gain:/unit midi
/module/audio/gain 120
Section 3 provides an illustration of the ideas suggested here. In the remainder of the discussion, the address of the node will be omitted for the sake of brevity; e.g. /computer/module/parameter:/member will be abbreviated as :/member.

2.5 Standardizing Members
To make working with classes in OSC practical, it is important to have some standard members in place. At present we recommend standardizing the following member methods, and reserving their syntax:
• :/get returns the value of the node.
• :/dump returns the state of the node, which is to say the values of all of the properties, including the value itself.
• :/namespace returns the namespace implemented at this node.
• :/catalog returns an enumeration of available options for a node, if relevant.

3. A PROTOTYPE IMPLEMENTATION
The general concepts introduced in this paper form the basis of the standardized namespace for node members used in Jamoma. The following uses selected aspects of the Jamoma node namespace to illustrate how class-oriented addressing in OSC can provide users with extended and structured control of available nodes.
Jamoma distinguishes between the parameters and messages of a module. Both parameters and messages are addressed as OSC nodes. The primary difference is that parameter nodes implement a value property. The remaining properties of these nodes are shared.

3.1 Node Type
The type of the node can be specified. Possible types are none, boolean, integer, float32, symbol and list. If one does not want to restrict the type of the node, it can be set to generic. The none type is only valid for messages. Some of the properties below will only be valid for certain types of nodes. The type property is accessed thus:
:/type    :/type:/get

3.2 Controlling the Node Itself
As the node value is considered an implicit property, it can be set and retrieved as such. If the node is of integer, float or list type, it can also be stepwise increased or decreased. If so, the size of the steps is itself a property:
:/value    :/value:/get
:/value/stepsize    :/value/stepsize:/get
:/value/inc
:/value/dec

3.3 Controlling the Range
For integer, float and list nodes a range can be specified. This can be useful for setting up auto-scaling mappings from one value to another, or for clipping the output range. The clipping property can be none, low, high or both. The range properties are accessed thus:
:/range/bound    :/range/bound:/get
:/range/clipmode    :/range/clipmode:/get

Figure 2: An example OSC address tree to nodes within another node. (The /gain node under /module/audio has the members /unit {midi, dB, linear}, /type {int, float}, /description and /value, plus a /ramp member containing the node classes /drive, with members /name and /granularity, and /function, with members /name and /coefficient; /name and /description members also appear at a higher level.)

3.4 Ramping to New Values
The ability to smoothly move from one value to another is fundamental to any kind of transition and transformation of musical or artistic material. Jamoma offers the possibility of interpolating from the current to a new value in a set amount of time. While the OSC message
/myComputer/myModule/myParameter 1.0
will set the parameter value to 1.0 immediately, the message
/myComputer/myModule/myParameter 1.0 ramp 2000
will cause the value to interpolate, or ramp, to 1.0 over 2000 milliseconds. Ramping in Jamoma works with messages and parameters of type integer, float and list.
Jamoma offers vastly extended possibilities in how ramping can be done as compared to Max. In Jamoma the process of ramping is made up of the combination of two components: a drive mechanism triggers calculations of new values at desired intervals during the ramp, while a set of functions offers a set of curves for the ramping. Libraries for both components are implemented as C++ APIs, and can easily be extended.

3.4.1 Ramp Drive
The ramp drive in Jamoma is implemented as a library of self-contained classes, coined RampUnits. The existing classes include a scheduler drive using the Max scheduler, a queue drive running in the Max queue, and an async drive which calculates output only when an update is requested. The ramp units internally perform normalized linear ramps. The values are then mapped using the appropriate FunctionUnit as discussed in Section 3.4.2 and scaled to the appropriate range.

3.4.2 Ramp Function
The ramp function in Jamoma is handled by the Jamoma FunctionLib. The FunctionLib provides normalized mappings of values $x \in [0, 1]$ to $y \in [0, 1]$ according to functions $y = f(x)$. Currently five FunctionUnits are implemented: linear, cosine, lowpass series, power function and hyperbolic tangent. There are plans to expand the FunctionLib with additional functions.

3.4.3 OSC Namespace for Ramping Properties
Ramping properties are addressed using the :/ramp/drive and :/ramp/function OSC name classes. The ramping case provides an example of a node class which contains other node classes, as illustrated in Figure 2. As discussed in Section 2.5, information on all available ramp units or functions can be requested with the standardized :/catalog method. If the current function or ramp unit
contains additional parameters, the namespace of the unit can be retrieved by the :/namespace method, while :/dump returns the state of the node:
:/ramp/drive :/ramp/drive:/get
:/ramp/drive:/catalog
:/ramp/drive:/dump
:/ramp/drive:/namespace
:/ramp/function :/ramp/function:/get
:/ramp/function:/catalog
:/ramp/function:/dump
:/ramp/function:/namespace
For instance, the user can control how often the scheduler RampUnit updates by setting the :/granularity property of the ramp:
:/ramp/drive:/granularity
:/ramp/drive:/granularity:/get
The same principles apply to the function units used for ramping.

3.5 DataspaceLib
In addition to the current RampLib and FunctionLib, work has started on the implementation of a DataspaceLib. The DataspaceLib will enable nodes to be addressed using one of several interchangeable measurement units. For example, a gain parameter can be set using MIDI, dB or linear amplitude, depending on the context and preferences of the user. The OSC representation of this will be implemented as a set of properties of the node. The DataspaceLib is also meant to offer mapping between more complex interrelated coordinate systems, so that, e.g., Cartesian and spherical coordinates can be used interchangeably to describe points in space, as proposed for SpatDIF [5].

4. DISCUSSION
As discussed in Section 2.1.1, other projects have also proposed standardizing the means of querying values of OSC nodes and the OSC namespace in general. Their proposed syntaxes differ from or conflict with the suggestions put forward in this paper, as well as with each other. The authors call on the OSC developer community to work towards a standardized query system that extends the current OSC 1.0 specification, resolving these conflicts in the process.
At the same time, we would like to point out that the proposal put forward in this paper broadens the scope of the Integra project and Jazzmutant OSC 2 proposals by integrating a querying system with the notion of nodes as classes. The proposal could thus be considered one step towards a more object-oriented approach to Open Sound Control.

5. ACKNOWLEDGMENTS
The authors would like to thank all Jamoma developers and users for valuable contributions, and iMAL Center for Digital Cultures and Technology for organizing a workshop where the issues presented in this paper were discussed.

6. REFERENCES
[1] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau. Extensible Markup Language (XML) 1.0 (Fourth Edition). Technical report, W3C, September 2006.
[2] M. Habets. OSCQS - Schema for Open Sound Control Query System. Version 0.0.1, 2005.
[3] Jazzmutant. Extension and enhancement of the OSC protocol. Draft, 25 July 2007.
[4] J. Malloch, S. Sinclair, and M. M. Wanderley. From controller to sound: Tools for collaborative development of digital musical instruments. In Proceedings of the International Computer Music Conference, Copenhagen, 2007.
[5] N. Peters, S. Ferguson, and S. McAdams. Towards a Spatial Sound Description Interchange Format (SpatDIF). Canadian Acoustics, 35(3):64–65, September 2007.
[6] T. Place and T. Lossius. Jamoma: A modular standard for structuring patches in Max. In Proceedings of the International Computer Music Conference, pages 143–146, New Orleans, LA, 2006.
[7] A. W. Schmeder and M. Wright. A query system for Open Sound Control. Draft Proposal, July 2004.
[8] M. Wright. The Open Sound Control 1.0 Specification. Version 1.0. Technical report, available: http://opensoundcontrol.org/spec-1_0, 2002.
[9] M. Wright. Open Sound Control: an enabling technology for musical networking. Organised Sound, 10(3):193–200, 2005.
[10] M. Wright and A. Freed. Open Sound Control: A new protocol for communicating with sound synthesizers. In Proceedings of the International Computer Music Conference, pages 101–104, Thessaloniki, 1997.
[11] M. Wright, A. Freed, and A. Momeni. Open Sound Control: State of the Art 2003. In Proceedings of NIME-03, Montreal, 2003.
[12] M. Zbyszynski and A. Freed. Control of VST plug-ins using OSC. In Proceedings of the International Computer Music Conference, pages 263–266, Barcelona, 2005.
Microphone as Sensor in Mobile Phone Performance
Ananya Misra, Computer Science, Princeton University, 35 Olden St., Princeton, NJ 08540, amisra@cs.princeton.edu
Georg Essl, Deutsche Telekom Laboratories, TU-Berlin, Ernst-Reuter Platz 7, Berlin, Germany 10587, georg.essl@telekom.de
Michael Rohs, Deutsche Telekom Laboratories, TU-Berlin, Ernst-Reuter Platz 7, Berlin, Germany 10587, michael.rohs@telekom.de
ABSTRACT
Many mobile devices, specifically mobile phones, come equipped with a microphone. Microphones are high-fidelity sensors that can pick up sounds relating to a range of physical phenomena. Using simple feature extraction methods, parameters can be found that map sensibly to synthesis algorithms to allow expressive and interactive performance. For example, blowing noise can be used as a wind instrument excitation source. Other types of interaction, such as striking, can also be detected via microphones. Hence the microphone, in addition to allowing literal recording, serves as an additional source of input to the developing field of mobile phone performance.

Keywords
mobile music making, microphone, mobile-stk

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
Many mobile devices come with the intrinsic ability to generate sound, which suggests their use as musical instruments. Hence there has been increasing interest in finding ways to make interactive music performance possible with these devices.
An important step in this process is the discovery of expressive ways to interact with mobile devices. In recent years, optical sensing, keys, touch-pads and various motion and location sensors have been explored for this purpose.
In this work we consider the built-in microphone of mobile phones as a generic sensor to be used for mobile performance. One reason for using microphones is that they are integral to any mobile phone, no matter how basic. It therefore seems natural to integrate microphones into mobile phone performance as well.
To this end we implemented stand-alone recording (half-duplex) as well as simultaneous recording and playback (full-duplex) for MobileSTK for Symbian OS mobile devices [4]. Using this implementation we explore ways to use the microphone beyond its traditional use for direct recording. Instead we want to use it as a generic sensor to drive sound synthesis algorithms in expressive ways. We discuss a few basic parameter detection methods that extract abstract representations from the microphone signal. These parameters are then used to drive various forms of parametric synthesis. We believe that the microphone of mobile devices is a useful addition to the palette of sensory interactions for musical expression.
In recent years there has been intensified work on sensor-based interaction and parametric playback on mobile devices. Tanaka presented an accelerometer-based custom-made augmented PDA that could control streaming audio [14], and ShaMus uses both accelerometers and magnetometers to allow varied interaction types [5]. Geiger designed a touch-screen based interaction paradigm with integrated synthesis on the mobile device using a port of Pure Data (PD) for Linux-enabled portable devices like iPaqs [8, 7]. CaMus uses the camera of mobile camera phones for tracking visual references; the motion data is then sent to an external computer for sound generation [12]. Various GPS-based interactions have also been proposed [13, 15]. A review of the general community was recently presented by Gaye and co-workers [6].
The microphone signal as a generic sensor signal has been used previously in the design of various new musical instruments. For example, PebbleBox uses a microphone to pick up collision sounds between coarse objects like stones, while CrumbleBag picks up sounds from brittle material to control granular synthesis [11]. Scrubber uses microphones to pick up friction sounds and sense motion direction [3]. Live sound manipulation based on microphone pickup is a known concept. It has, for instance, been used by Jehan, Machover and co-workers [9, 10], where the audio was driven by an ensemble mix of traditional acoustic musical instruments. Microphones are also used for control in non-musical settings. For example, they can be used to derive position via microphone arrays [2].

2. TURNING MICROPHONES INTO SENSORS
A goal of the project is the broad accessibility of microphone recording for engineers and musicians interested in mobile phone performance. Hence it was natural to consider extending MobileSTK to include microphone recording. For this, it was necessary to recover, for Symbian OS, the audio recording capability that already existed in the original STK [1], make it accessible in the MobileSTK context, and offer examples of its use. MobileSTK provides a range of digital filters and synthesis algorithms in C++ and the capability to interact with and play sound. Mobile operating systems are still in a process of maturation, which adds some complications to the development of applications for this platform.
The complete architecture can be seen in Figure 1. The core that makes this possible is recording audio from the microphone. The microphone data is then processed. The processed data can either be used directly as output or
as input for parametric synthesis algorithms, in our case instruments and unit generators of STK. Aside from the audio signal, other sensor data can also be used to derive control parameters. The next sections describe the details of each part of this architecture.

[Figure 1 diagram blocks: Mobile phone; Microphone; Speaker; Microphone input; Camera, accelerometer, other sensors; Digital signal processing blocks; Processed audio signal; STK unit generators; Control parameters or input; Synthesized audio signal]
Figure 1: Processing pipeline: Audio received by the microphone passes through digital signal processing units. The output of these can be sent directly to the speaker for playback, or act as control parameters or input to STK unit generators.

2.1 Recording under Symbian
Recording has been implemented for Series 60, 3rd edition devices running version 9.1 of Symbian OS¹. The recording class needs a CMdaAudioInputStream object for real-time audio input streaming, and must implement the MMdaAudioInputStreamCallback interface. This interface includes methods that are called when the input stream has been opened or closed, and when a buffer of samples has been copied from the recording hardware. Reading the next buffer of input audio from the hardware starts with a call to the ReadL() function of CMdaAudioInputStream.
With this framework, half-duplex audio works simply by ensuring that only one of the two audio streams (input or output) is open at any time. Full-duplex audio succeeds if the output stream has been opened and is in use before the input stream is started. Experiments with the Nokia 5500 Sport phone yielded a delay of up to half a second between audio recording and playback in full-duplex mode within MobileSTK. It is also worth noting that the audio input buffer size is fixed for each phone. The Nokia 5500 Sport and Nokia 3250 use audio input buffers of 1600 samples. Other Nokia S60 3rd edition phones use 4096-sample buffers, while S60 2nd edition phones use 320-sample buffers. The buffer size may differ further for other mobile phone models.
S60 2nd edition phones such as the Nokia 6630 use a different full-duplex framework. Both the recording and playback classes implement the MDevSoundObserver interface, which includes methods called when an audio buffer needs to be read or written. Both also have a CMMFDevSound object. The recording and playback classes run on different threads, and audio is passed between the two via a shared buffer. As the older phones tend to have much less processing power, we focus on S60 3rd edition phones here.

¹ Resources on audio programming for Symbian OS in C++ can be found at http://www.forum.nokia.com/main/resources/technologies/symbian/documentation/multimedia.html

2.2 Additions to MobileSTK
Microphone input has been integrated into MobileSTK via a new MicWvIn class, which inherits from the WvIn class already in STK. MicWvIn acts as the interface between the microphone and the rest of MobileSTK. As required, it implements the MMdaAudioInputStreamCallback interface and contains a CMdaAudioInputStream object. In addition, it holds two audio buffers. The CMdaAudioInputStream object copies input samples from the recording hardware into the first of these buffers. Samples from the first buffer are then copied into the second, typically larger, internal buffer, where they are stored until they are requested elsewhere in MobileSTK.
The interface provided by MicWvIn includes the following methods:
• OpenMic(): Opens the audio input stream and starts recording. The buffer sizes and input/output modes can be set via this method.
• CloseMic(): Closes the audio input stream.
• tick(): Returns the next sample of audio, as read from the internal storage buffer. This is inherited from WvIn along with other ticking methods.
• Rewind(): Resets the output index to 0 so that tick() starts returning samples from the beginning of the internal storage buffer.
With this framework, using microphone input within MobileSTK involves creating a MicWvIn object and calling OpenMic(). Individual samples from the microphone can then be obtained via tick() and directed into a playback buffer or used as input to processing and synthesis modules. A new version of MobileSTK, which contains these additions, is already available under an open software license².

² MobileSTK can be downloaded at http://sourceforge.net/projects/mobilestk

2.3 Deriving control parameters from audio signals
Audio signals are very rich in nature. They often contain more information than is needed to identify certain physical parameters. For example, the loudness of an impact sound is a good correlate of the impact strength, while spectral information contains cues about the impact position [17, 16]. Separating these parameters and removing fine detail that does not influence the desired control is an important step in using the microphone as an abstracted sensor.
In many of our usage examples we are not interested in the content of the audio signal per se, but in the more general physical properties that give rise to it. Some form of signal processing is therefore necessary, which one can think of as a simple version of feature extraction. Some relevant methods to do this for impact sounds (detecting the impact moment, impact strength and estimates of spectral content) have already been proposed in a slightly different context [11]. Similarly, [3] describes the separation of amplitude and spectral content for sustained friction sounds. We implemented the onset detection method from [11] to allow impact detection.
Another relevant and interesting use of the microphone is as a virtual mouthpiece of wind instruments. We make the heuristic assumption that audio signal amplitude is a good indicator of blow pressure; hence, arriving at an abstracted pressure measurement means keeping a windowed-average amplitude of the incoming waveform. This value is then rescaled to match the expected values for a physical model of a mouthpiece, as can be found in STK [4].
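The blow-pressure heuristic above reduces to a running mean of the rectified input, rescaled for a mouthpiece model. A minimal C++ sketch follows (C++ being the language of MobileSTK). The class name, the choice of rectifying with |x| before averaging, and the linear rescaling to [0, 1] via a hypothetical fullScale parameter are assumptions made for illustration; this is not the MobileSTK implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical blow-pressure estimator (illustrative, not MobileSTK code):
// a moving average of |x[n]| over a fixed window, rescaled by an assumed
// fullScale amplitude and clamped to [0, 1] for a mouthpiece model.
class BlowPressureEstimator {
public:
    BlowPressureEstimator(std::size_t windowSize, double fullScale)
        : window_(windowSize, 0.0), fullScale_(fullScale) {
        assert(windowSize > 0 && fullScale > 0.0);
    }

    // Feed one microphone sample; returns the current pressure estimate.
    double tick(double sample) {
        double a = std::fabs(sample);
        sum_ += a - window_[pos_];          // swap the oldest value out of the running sum
        window_[pos_] = a;
        pos_ = (pos_ + 1) % window_.size(); // advance the circular write position
        return pressure();
    }

    // Windowed-average amplitude, mapped linearly onto [0, 1].
    double pressure() const {
        double avg = sum_ / static_cast<double>(window_.size());
        double p = avg / fullScale_;
        if (p < 0.0) return 0.0;  // guard against tiny negative rounding drift
        if (p > 1.0) return 1.0;  // saturate at full pressure
        return p;
    }

private:
    std::vector<double> window_;
    std::size_t pos_ = 0;
    double sum_ = 0.0;
    double fullScale_;
};
```

For instance, with windowSize = 4 and fullScale = 0.5, four samples of magnitude 0.25 yield an estimate of 0.5. In practice the window would span many waveform periods, so that the estimate follows the breath rather than the individual oscillations of the blowing noise.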
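Section 2.3 relies on the onset detection method of [11], which is not detailed in this paper. As a simpler stand-in, the C++ sketch below shows a common envelope-threshold detector with hysteresis; it is not the method of [11], and the class name, parameters and re-arm ratio are hypothetical. The envelope value at the detection sample serves as a rough impact-strength cue, of the kind compared between successive onsets in the striking example of Section 3.2.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical onset detector (illustrative, not the method of [11]):
// a peak-hold envelope follower on |x[n]| with exponential release.
// An onset is reported when the envelope reaches `threshold` while armed;
// the detector re-arms only after the envelope falls below
// threshold * rearmRatio, which prevents retriggering on one sustained hit.
class OnsetDetector {
public:
    OnsetDetector(double threshold, double releaseCoeff, double rearmRatio = 0.5)
        : threshold_(threshold), release_(releaseCoeff), rearm_(threshold * rearmRatio) {
        assert(threshold > 0.0 && releaseCoeff >= 0.0 && releaseCoeff < 1.0);
    }

    // Returns true on the sample where an onset is detected.
    bool tick(double sample) {
        double a = std::fabs(sample);
        double decayed = release_ * env_;
        env_ = (a > decayed) ? a : decayed;  // instant attack, exponential release
        if (armed_ && env_ >= threshold_) {
            armed_ = false;
            lastOnsetLevel_ = env_;          // envelope at detection: strength cue
            return true;
        }
        if (!armed_ && env_ < rearm_) armed_ = true;
        return false;
    }

    double envelope() const { return env_; }
    double lastOnsetLevel() const { return lastOnsetLevel_; }

private:
    double threshold_, release_, rearm_;
    double env_ = 0.0;
    double lastOnsetLevel_ = 0.0;
    bool armed_ = true;
};
```

The hysteresis is a design choice: without it, a loud impact whose envelope hovers around the threshold would be reported several times.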
2.4 Mobile phone camera as tone-hole
A tone-hole in a conventional instrument is a hole that is covered or uncovered to control the produced sound. To allow tone-hole-like behavior, we read information from the mobile phone camera. We let the camera lens act as a tone-hole by computing the average grayscale value of the camera input image. When this value drops below a threshold, we estimate that the camera lens (or metaphorically, the tone-hole) is covered. We can also estimate degrees of covering by setting several different thresholds.
While this technique has drawbacks when used against a completely dark background, it succeeds in most normal, reasonably bright surroundings. It also sets the stage for further ways in which camera and microphone information can complement each other in creating a unified expressive musical instrument. For example, more dynamic input, like the strumming gesture of guitar-plucking, can be sensed by the camera and combined with microphone input.

3. EXAMPLES
We now present a number of examples using the microphone as a sensor in MobileSTK. Many of these are experiments rather than full-blown instruments, meant to demonstrate some possibilities and encourage further exploration. Some examples use the camera as an additional sensor to augment the instrument. Due to the limited processing power of current mobile phones, most of the examples use simple unit generators such as sine waves or a plucked-string model. As processing power grows, however, they can easily be extended to control more sophisticated unit generators.

3.1 Blowing
A simple breath-controlled instrument is the Sine-Wave-Whistle. The whistle tone is constructed from two sine waves at slightly different frequencies, producing a beat. When someone blows near the microphone so that the microphone input's windowed-average amplitude is above a threshold, the whistle tone is heard; otherwise, there is silence. While the tone is heard, the precise value of the microphone amplitude controls either the relative gains of the two sine waves or the beat frequency. Given more processing power, this paradigm can expressively control other wind instruments such as saxophone, flute and blow bottle, with the precise amplitude value mapped to vibrato frequency or gain, breath pressure, or other parameters.
Breath-controlled instruments using both microphone and camera include the Press-to-Play, where covering the camera lens produces a tone and uncovering it results in silence. Meanwhile, the microphone determines the pitch of the tone via a continuous mapping between the average amplitude of the microphone input and the frequency of the sine tone. With better processing capability, it would also be feasible to extract the pitch of the microphone input and apply the same pitch directly to the synthesized tone.
Alternatively, in the Mini-Flute, another microphone and camera instrument, covering the camera lens lowers the pitch of the synthesized tone by a small amount. In this case, the tone is played at the amplitude of the microphone input. Thus, blowing harder produces a louder sound, while covering or uncovering the tone-hole modifies the pitch. Hence the camera serves the function of the slider of a slider-flute or of a pitch bend wheel.

3.2 Striking
The ability to identify impact sounds in the microphone input allows the mobile phone to become a tapping or (gently) striking device. In one example, we pass the microphone input through a simple onset detector and play a sound file from memory each time an onset is detected. The file played contains either hi-hat samples or a more pitched sound. The value of the input signal's amplitude envelope at the time of onset detection, relative to its value at the previous onset, determines which file is played. Thus if the input grows louder from one onset to the next, we hear the hi-hat, while the pitched sound is played if it becomes softer.
This simple example can be extended in various ways. For instance, which audio is played, and how, could be determined by other features of the microphone input or by information from other sensors. It would be feasible to include several more drum samples and create a mini drum kit. The onset detection could also control other unit generators. However, the perceptible delay between cause and effect in full-duplex mode may prove more challenging for the striking paradigm than for other interaction models.

3.3 Other full-duplex instruments
Some examples do not readily match an existing interaction model, but provide a well-defined way for the mobile phone to respond to any sound from the user or the environment. One such instrument, the Warbler, maps the windowed-average input amplitude to a frequency quantized to a musical scale. This determines the frequency of a constantly playing sine tone. Thus, the sine tone's pitch changes whenever the microphone input varies beyond a small range. As the loudness of the microphone signal increases, the sine tone plays at higher frequencies. If the microphone signal is more perturbed, switching often between loud and soft, the sine tone similarly switches between high and low frequencies within the specified musical scale.
A similar instrument, the Plucked-Warbler, uses a plucked-string model instead of a sine wave. It maps the microphone amplitude to a quantized frequency for the plucked string, and excites (plucks) the string only when this frequency changes. Thus, the density of plucks reflects the stability of the microphone input and, by extension, of the surrounding audio environment.
We have also applied the amplitude of the microphone input to pure noise, so that only the amplitude envelope of the original signal is heard. Another instrument counts the zero crossings in a lowpass-filtered version of the microphone input to estimate its pitch. It then excites a plucked string at a corresponding frequency once the input signal's amplitude envelope has decreased after a temporary high (to minimize pitch confusion at a note onset). The zero-crossing rate offers a rough correlate of the spectral centroid.

3.4 Half-duplex
While full-duplex audio enables real-time control via the microphone, half-duplex allows a short audio segment to be recorded and saved in memory while the application runs. The audio segment can then be re-used in conjunction with information from other sensors, including the current full-duplex microphone input. We created some simple examples with two MicWvIn objects, one for full-duplex audio and the other for standalone recording followed by playback.
One example records a few seconds of audio in no-playback mode. The user can then switch to playback mode and repeatedly play the recorded segment. During playback, the amplitude of the current (full-duplex) microphone input is applied to the previously recorded (half-duplex) segment being replayed. This allows interactive control over which parts of the repeating audio loop stand out each time it repeats.
A second example continues to replay the recorded audio from the previous instrument, but at a fixed gain. To this
it adds the current microphone input, so that one hears the repeating loop as well as any noise one is currently making near the microphone. This gives performers a way to accompany themselves in creating interactive music. Such an instrument could also be modified to let the current microphone input drive one of the instruments described earlier, or be otherwise processed before reaching the output stage.
Another example, the Fast-Forward, uses the previously recorded audio samples along with camera input. In this case, the amount by which the camera lens is covered controls the speed at which the recorded loop is played. If the lens is not covered at all, the samples are played at normal speed. If it is slightly covered, every other sample is played, while if it is fully covered, every fourth sample of the prerecorded segment is played. This example does not use full-duplex audio at all, but allows the camera input to control playback in a way that would be difficult if the samples were not recorded in advance.

4. CONCLUSIONS
We presented the use of the microphone of mobile phones as a generic sensor for mobile phone performance. Its fidelity and dynamic range, along with the types of physical effects that can be picked up via acoustic signals, make it an interesting addition to the range of sensors available in mobile devices for mobile music making. The microphone is particularly interesting for picking up performance types that are not otherwise easily accessible to mobile devices. For example, the wind noise from blowing into the microphone can be used to simulate the behavior of a simple mouthpiece of a wind instrument, or just a police whistle.
At the same time, the sensor also allows other types of gestures, like striking, to be detected. In addition, it allows instant recording and manipulation of audio samples, letting the samples heard in performance be directly related to the venue.
The great advantage of microphone sensing in mobile devices is the broad availability of microphones. While accelerometers are only just emerging in contemporary high-end models of mobile devices (Nokia's 5500 and N95, Apple's iPhone), microphones are available in any programmable mobile phone and offer signals of considerable quality.
One current limitation for interactive performance is the limited floating-point performance of current devices. This means that currently either all signal processing has to be implemented in fixed-point arithmetic, or only somewhat limited computational complexity can be tolerated in processing algorithms. It is very likely that this will change with the evolution of smart mobile phones. Nokia's N95 already contains a vector floating-point unit, and its overall computational power is considerably higher than that of the earlier Nokia 5500 model. One can expect this trend to continue, eventually making this limitation obsolete.
Microphones offer yet another sensor capability that can be used for mobile music performance, allowing performers to whistle, blow and tap their devices as a vocabulary of musical expression.

5. REFERENCES
[1] P. Cook and G. Scavone. The Synthesis ToolKit (STK). In Proceedings of the International Computer Music Conference, Beijing, 1999.
[2] H. Do and F. Silverman. A method for locating multiple sources using a frame of a large-aperture microphone array data without tracking. In Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV, April 2008.
[3] G. Essl and S. O'Modhrain. Scrubber: An Interface for Friction-induced Sounds. In Proceedings of the Conference for New Interfaces for Musical Expression, pages 70–75, Vancouver, Canada, 2005.
[4] G. Essl and M. Rohs. Mobile STK for Symbian OS. In Proceedings of the International Computer Music Conference, pages 278–281, New Orleans, Nov. 2006.
[5] G. Essl and M. Rohs. ShaMus - A Sensor-Based Integrated Mobile Phone Instrument. In Proceedings of the International Computer Music Conference (ICMC), Copenhagen, Denmark, August 27-31 2007.
[6] L. Gaye, L. E. Holmquist, F. Behrendt, and A. Tanaka. Mobile music technology: Report on an emerging community. In NIME '06: Proceedings of the 2006 conference on New Interfaces for Musical Expression, pages 22–25, June 2006.
[7] G. Geiger. PDa: Real Time Signal Processing and Sound Generation on Handheld Devices. In Proceedings of the International Computer Music Conference, Singapore, 2003.
[8] G. Geiger. Using the Touch Screen as a Controller for Portable Computer Music Instruments. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), Paris, France, 2006.
[9] T. Jehan, T. Machover, and M. Fabio. Sparkler: An audio-driven interactive live computer performance for symphony orchestra. In Proceedings of the International Computer Music Conference, Göteborg, Sweden, September 16-21 2002.
[10] T. Jehan and B. Schoner. An audio-driven, spectral analysis-based, perceptual synthesis engine. In Proceedings of the 110th Convention of the Audio Engineering Society, Amsterdam, Netherlands, 2001. Audio Engineering Society.
[11] S. O'Modhrain and G. Essl. PebbleBox and CrumbleBag: Tactile Interfaces for Granular Synthesis. In Proceedings of the International Conference for New Interfaces for Musical Expression (NIME), Hamamatsu, Japan, 2004.
[12] M. Rohs, G. Essl, and M. Roth. CaMus: Live Music Performance using Camera Phones and Visual Grid Tracking. In Proceedings of the 6th International Conference on New Interfaces for Musical Expression (NIME), pages 31–36, June 2006.
[13] S. Strachan, P. Eslambolchilar, R. Murray-Smith, S. Hughes, and S. O'Modhrain. GpsTunes: Controlling Navigation via Audio Feedback. In Proceedings of the 7th International Conference on Human Computer Interaction with Mobile Devices & Services, Salzburg, Austria, September 19-22 2005.
[14] A. Tanaka. Mobile Music Making. In NIME '04: Proceedings of the 2004 conference on New Interfaces for Musical Expression, pages 154–156, June 2004.
[15] A. Tanaka, G. Valadon, and C. Berger. Social Mobile Music Navigation using the Compass. In Proceedings of the International Mobile Music Workshop, Amsterdam, May 6-8 2007.
[16] K. van den Doel and D. K. Pai. The sounds of physical shapes. Presence, 7(4):382–395, 1998.
[17] R. Wildes and W. Richards. Recovering material properties from sound. In W. Richards, editor, Natural Computation. MIT Press, Cambridge, Massachusetts, 1988.
A Mobile Wireless Augmented Guitar
N. Bouillot and M. Wozniewski, McGill University Centre for Intelligent Machines, Montréal, Québec, Canada, {nicolas,mikewoz}@cim.mcgill.ca
Z. Settel, Université de Montréal, Montréal, Québec, Canada, zs@sympatico.ca
J. R. Cooperstock, McGill University Centre for Intelligent Machines, Montréal, Québec, Canada, jer@cim.mcgill.ca
ABSTRACT
We present the design of a mobile augmented guitar based on traditional playing, combined with gesture-based continuous control of audio processing. Remote sound processing is enabled through our dynamically reconfigurable low-latency, high-fidelity audio streaming protocol, running on a mobile wearable platform. Initial results demonstrate the ability to stream the audio and sensor data over IEEE 802.11 to a server, which then processes this data and outputs the resulting sound, all with a delay low enough for mobile multimodal performance.

1. INTRODUCTION
Electronic instrument augmentation can allow an already skilled musician to exploit gestures to gain extra control over the resulting sound. Typically, additional sensors are attached to the instrument or body in order to acquire gesture-related data. Existing augmented instruments often transmit sensor data using serial [10], USB [6] or Ethernet interfaces [8]. Unfortunately, such wired interfaces can reduce artists' mobility during performance. Wireless sensor platforms have been proposed to overcome this situation. Implementations include the WiseBox [3] and Bluetooth Arduino [2], among others. However, these interfaces do not support wireless transmission of audio signals, instead requiring cables, often analog, to send audio to the processing machine.

In related work, we proposed methods for high quality wireless audio transmission [9], which alleviate the need for cables entirely. With the availability of a wireless capability for audio transmission, we were motivated to explore the potential of a mobile platform to support performance with an augmented musical instrument, in this case, a guitar. Using WiFi potentially lets the musician walk around a much larger space, take advantage of the processing capabilities of remote computers over IP rather than requiring all audio gear to be situated locally, and avoid interference from common devices such as cellular telephones. In terms of musical interaction, our intent was to demonstrate how such an instrument could provide for musical expression that would otherwise require an inordinate investment of training, or additional musicians. Examples include self-accompaniment and self-duo. The former is typical of Indian classical music instruments, which may provide harmonic resonance from a basic note, or of the more interesting gesture-based Satara double flute from Rajasthan, in which the absence of holes in the second flute results in a steady drone. Self-duo can be seen in the playing technique developed by jazz musician Roland Kirk, in which a single musician plays two (or more) reeds at once. However, simultaneous management of multiple instruments seriously reduces the range of gestures available on each one, and such playing techniques require considerable practice, for example, to regulate circular breathing or coordinate new hand and body positions.

Our particular implementation is based on a small form-factor computer, the Gumstix, that runs a fixed-point version of Pure Data [4]. Leveraging the Bluetooth communication capability of the Gumstix, we can read accelerometer values, position information (of IR sources), and button presses of a Nintendo Wii controller affixed to the guitar headstock. Interaction is based on traditional playing technique, augmented by a set of simple gestures for invoking added functionality, such as sample recording and looped playback. These features can be used to simulate simultaneous performance with multiple instruments, as described above. Gesture data and the resulting audio must be communicated with sufficiently low delay to allow the performer to understand the relationship between the gesture and the corresponding sound. We thus developed a dynamically reconfigurable low-latency streaming engine, which is used in this implementation.

The remainder of this paper is organized as follows. Section 2 describes hardware and software components of our platform, keeping latency issues in mind. Section 3 presents our mobile augmented guitar and its main feature: the gesture-based phase vocoder control using real-time sampling. Mapping is also described and discussed. Before concluding, various applications enabled by our platform are explored.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008. Copyright remains with the author(s).

2. WIRELESS PLATFORM
The mobile platform architecture is presented in Figure 1. The Gumstix captures audio data through its input jack and obtains gesture input and button-press information from the Wii controller over Bluetooth. Both the audio samples and sensor data are then transmitted by IEEE 802.11g (WiFi) to a laptop computer, where signal processing functions are performed. Open Sound Control (OSC) is used as the protocol for control data.

For wireless transmission of audio data, we developed a dynamically reconfigurable, low-latency, uncompressed audio streaming protocol¹ as a Pure Data external for both the Gumstix and the DSP machine (laptop).

Figure 1: Mobile wireless platform architecture

¹ It is worth noting that the industry is moving toward support for low-latency transmission of lossless audio over Bluetooth, as evidenced by Qualcomm's acquisition of the Open Interface Soundabout audio codec.

The remainder of this section discusses our architecture and the gestural interface, and explores issues of computational requirements and low-latency streaming as necessary for audio feedback.

2.1 Gumstix and “Pure Data anywhere”
Gumstix computers [5] can be extended with various expansion boards for communication, I/O, and memory. Our platform is composed of a Gumstix main board (Verdex XM4-bt with Bluetooth) with two expansion boards for audio and WiFi. Figure 2 shows the Gumstix, a portable USB power pack, and the two antennas needed for Bluetooth and WiFi. A soundcard provides mini-jack input and output.

Figure 2: Gumstix size relative to a quarter

Software development for the ARM-based 400MHz Gumstix processors (Marvell's PXA270) requires a cross-compilation tool-chain to generate binaries. Floating-point computations are possible but are performed through software emulation, which means that Gumstix DSP code must use fixed-point operations. For local Gumstix processing, we use Pure Data anywhere (PDa) [4], a rewritten fixed-point version of Pure Data (Pd).² This allows us to cross-compile existing externals such as OSC, but also to develop platform-related software such as real-time dynamically reconfigurable low-latency streaming (see Section 2.4) as well as sensor acquisition and filtering. Our software, along with PDa, easily fits within the base 15MB of memory supplied with the Gumstix motherboard.

² This is based on Pd v0.37 and was motivated for use on PDA devices.

2.2 Controller interface
The Wii remote provides a number of powerful capabilities in a single unit that greatly simplify prototyping where both accurate spatial positioning (e.g., temporal sweeping through a sound file) and mode selection (e.g., by button press or simple gesture) are required. While similar functionality could be obtained by a combination of other less “game-oriented” devices, we chose the Wii so that we could concentrate on the higher level issues involving musical expression rather than devote unnecessary resources to hardware concerns.

2.3 Remote sound processing
We first attempted to perform digital signal processing on the Gumstix as an autonomous unit. To provide an approximate reference of CPU demand for audio processing (beyond the stereo audio I/O overhead), we were able to run a maximum of one medium quality mono-in/stereo-out reverb module, zverb [1], consisting of the following units and their interconnections: nine 1-tap early reflection delay lines, one 4-tap feedback delay with quad circulator matrix, four low-pass filters, four interpolated multiple gain scalers, and several addition and multiplication units. Attempts to render multiple audio effects while performing low-latency streaming and sensor processing quickly overwhelmed the limits of the 400MHz Gumstix processor.

We have no doubt that future small form-factor computers will prove capable of more complex tasks. However, given current limitations, we opted instead to prototype using a remote sound processing server. The resulting implementation described here is thus a first step toward autonomous, augmented musical interaction.

2.4 Low-latency data streaming
Various delays are involved in the communication of audio during streaming. These can be broken down into the packetization delay, $d_p$, corresponding to the time required to fill a packet with data samples for transmission, and the network delay, $d_n$, which varies according to network load and results in jitter at the receiver. Received data is typically held in a playback buffer, designed to maintain a constant latency and thus mask the effects of jitter and late-arriving packets.

Packetization delay is calculated as:

$$d_p = \frac{n_{sp}}{f} \ \text{seconds} \qquad (1)$$

where $n_{sp}$ is the number of samples per packet and $f$ is the audio sampling frequency. Thus, we can affect $d_p$ by varying the number of samples per packet, or by changing the audio sampling frequency. During experiments, we set $f$ to 44.1kHz.

In order to minimize latency, we developed a prototype dynamically reconfigurable transmission protocol, nstream, which supports both unicast and multicast communications. It permits dynamic adjustment of sender throughput by switching between different levels of PCM quantization (8 bit, 16 bit and 32 bit) and number of samples per packet ($n_{sp}$), as well as receiver buffer size, as suitable for low-latency, high quality audio streaming over wireless networks. This protocol is developed as a Pd external and allows interoperability between (the fixed-point) PDa and (the floating-point) Pd.

Best results were obtained by streaming audio using $n_{sp} = 64$ with a receiver buffer of two packets, which corresponds to $(64/44.1) \times 3 \approx 4.35$ ms. We measured an average ping time of 2ms, suggesting a network delay of approximately 6ms.

The Wii controller transmits sensor data at a rate that could be detrimental to the performance of a WiFi network, especially when already loaded with uncompressed audio. As an example, raw camera data is transmitted at a rate of 197Hz. Therefore, the Gumstix is used to filter such data and transmit control events only when relevant to the augmented guitar.
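The latency and throughput trade-offs that nstream exposes can be checked with a few lines of arithmetic. The Python sketch below is illustrative only and is not part of nstream: it assumes that buffering latency is one packet being filled at the sender plus the packets held in the receiver's playback buffer (an assumption that reproduces the (64/44.1) × 3 figure above), it ignores the network delay d_n and all UDP/IP/802.11 header overhead, and it defaults to a mono stream. The function names are hypothetical.

```python
"""Illustrative latency/bandwidth arithmetic for uncompressed PCM streaming.

Assumptions (not taken from nstream): buffering latency = one packet being
filled at the sender + `buffer_packets` packets in the receiver's playback
buffer; network delay and protocol header overhead are ignored.
"""


def packetization_delay_ms(n_sp: int, f_hz: float) -> float:
    """Eq. (1): time to fill one packet of n_sp samples at rate f_hz, in ms."""
    return 1000.0 * n_sp / f_hz


def buffered_latency_ms(n_sp: int, f_hz: float, buffer_packets: int) -> float:
    """Packetization delay plus the receiver playback buffer (d_n excluded)."""
    return packetization_delay_ms(n_sp, f_hz) * (1 + buffer_packets)


def payload_kbps(f_hz: float, bits_per_sample: int, channels: int = 1) -> float:
    """Raw PCM payload rate in kbit/s, before any protocol headers."""
    return f_hz * bits_per_sample * channels / 1000.0


if __name__ == "__main__":
    f = 44100.0
    # Configuration reported in the text: 64 samples/packet, 2-packet receive buffer.
    print(f"d_p     = {packetization_delay_ms(64, f):.2f} ms")
    print(f"latency = {buffered_latency_ms(64, f, 2):.2f} ms")
    # Payload rate for each of the three PCM quantization levels (mono).
    for bits in (8, 16, 32):
        print(f"{bits:2d}-bit mono: {payload_kbps(f, bits):.1f} kbit/s")
```

Run as a script, it prints d_p = 1.45 ms and a buffering latency of 4.35 ms for the reported configuration, and raw payload rates of 352.8, 705.6 and 1411.2 kbit/s for 8-, 16- and 32-bit mono PCM. Under these assumptions, halving the quantization halves the payload rate without changing d_p, whereas changing n_sp trades latency against packet rate.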
3. AUGMENTED GUITAR AND MAPPING
The development of a mobile augmented guitar involves several issues. One challenge is the delay between acquisition of gestures and related feedback. Wireless audio streaming must be sufficiently fast to allow for real-time interaction and control. The 6ms of network delay, combined with minimal OSC overhead, was regarded as sufficient during experimentation.

Figure 3: Sensors and controls. (a) Wiimote fixed on headstock; (b) Controls.

To maximize performer mobility, all augmented functionality can be invoked through gestures made with the guitar and sensed by the attached Wii controller (see Figure 3(a)). These gestures include simple jerking motions and pointing with the neck of the guitar. Additionally, buttons are easily accessible for changing higher-level parameters. Musicians can start/stop the recording of an audio sample from the instrument using a quick flick of the guitar, avoiding the need for non-mobile controllers, such as a traditional pedal. Recorded samples can be played back using the infrared sensor, which controls the position of a play head. An FFT-based phase vocoder maintains the pitch of the original recorded material. An additional background loop can also be recorded in separate memory space. Together, these techniques allow for gesture-based processing to accompany traditional guitar playing.

3.1 Gesture-based phase vocoder
Accelerometers are used to detect fast movements performed by the musician, which are mapped to control the recording of musical samples. Once a sample has been stored, the infrared sensor on the Wii controller is used to capture the x and y coordinates of an IR light source (e.g., the sensor bar provided with the Nintendo Wii system). This is used to control a phase vocoder based on a Pd patch by Miller Puckette. The vocoder allows for selective continuous listening to a small portion of a sound [7]. As seen in Figure 4, the x-axis is mapped to a time index in the current stored sample. A small window around that index is analyzed with an FFT to determine the spectral content of the location, allowing an inverse FFT to generate a time-stretched sound with equivalent pitch. The y-axis controls the pitch of the resulting sound at any location.

When the pitch-shifting functionality is enabled, both x and y coordinates are used to manipulate sound, allowing the musician to control playback with two-dimensional movements, such as curved trajectories with the headstock or dance-like gestures. If only time-indexing is desired, the pitch-shifting feature can be disabled with a button press. Interaction is then constrained to the x-axis, allowing the performer to play desired sections of the material with horizontal motions. This is interesting when harmonic progressions have been recorded, allowing musicians to explore tonal or atonal accompaniment, as described in Section 4.

Figure 4: Gesture-based sample reading

Note that the gestures we support in the current prototype (jerk and pointing along the x or y axes) are extremely simple to recognize: we need only apply a peak detector to determine the jerk, and can use the blob detection functionality of the IR camera to provide accurate position information.

3.2 Controls
In addition to the phase vocoder described above, several additional features can be accessed using the Wii controller. The full list of mappings, including distortion, muting, and background loop control, is illustrated in Figure 3(b). The signal recorded into the background loop contains not only the direct guitar signal, but also the result of applied distortion and the output of the phase vocoder. Note also that there is an additional play control for looping the sample used by the phase vocoder instead of using IR time indexing.

The augmented guitar has four features that can be toggled on and off: background loop recording, distortion, phase-vocoder sample recording and phase-vocoder pitch shift. The states of these features are shown by the four LEDs on the Wii controller. This provides feedback that is easily visible when the musician glances at the headstock, and is particularly useful when the accelerometer-based activator is used, since detection errors occasionally occur. Activation might not be detected when performer movements are too slow, whereas long motions can cause two detections in sequence, causing the function to be toggled on and then off again.

While accelerometers and IR sensors can easily be used during play, buttons may require gestures that interrupt guitar playing while the button is being pushed. This leads us to consider the spatial layout, and the feature to which each button is mapped. As seen in Figure 3(a), buttons 1 and 2 can be pressed easily by the thumb when the guitarist's left hand is not using the fingerboard. This allows the guitarist to control background loop recording and distortion activation without changing hand position on the neck. To press other buttons, the hand must leave the fingerboard, so we assign those buttons to less frequently used functionality. Since the phase vocoder can be used without simultaneously playing the guitar, keeping the hand on the fingerboard is unnecessary, and buttons can become an integral part of the interaction. Future investigations will include other mappings and other kinds of sensors to enable or disable functionality. Important factors
are mobility and ease of use during play.

4. MUSICAL PRACTICE
Several body movements are easily achievable by musicians while playing the guitar. For example, the neck can be moved horizontally and vertically while the musician walks, runs or jumps in any direction. The Wii controller, attached to the guitar's headstock, easily allows for the detection of neck movements; it provides control of the augmented guitar features while the instrument is being played. This allows a musician to experiment with various forms of self-interaction within solo or group performance.³ Since performers are able to move around within the WiFi 802.11g range (100 meters), they may interact with the surrounding environment in additional ways. For example, a composer can arrange the space with ambient light or moving objects, with which the performer can interact. Multi-user interactions can also include IR LEDs or reflectors worn to augment group interaction.

³ Demo videos are available at http://cim.mcgill.ca/~nicolas/NIME2008/

In terms of musical dialogues, the augmented system provides some interesting techniques for an individual performer that usually cannot be accomplished with a single instrument. We consider the possibility of three such dialogues:

• Questions and answers: Samples can be recorded while playing; pointing the guitar in a particular direction then replays the melody or sound in a different order and at a different speed.

• Self-duo: The same technique as questions and answers, but pointing while simultaneously playing the guitar allows one to perform counterpoint-like interactions.

• Self-accompaniment: Recorded chords can be played back using dance-like movements while simultaneously playing guitar.

The dialogues that allow for playing two melodies simultaneously can be interesting in terms of tonal or atonal experimentation. In particular, when pitch shift is disabled, a musician can record a progression of chords or a melody that can be used as the sample for the phase vocoder. The musician can then experiment with various guitar-played harmonies while triggering sounds from the recording using body movements.

5. CONCLUSIONS
We presented the design and investigation of a mobile wireless augmented guitar, in which powerful control mechanisms allow the user to control sonic events through simple gestures while simultaneously playing the instrument. Augmentation includes sample recording, looping, distortion and a gesture-based phase vocoding sample reader. We described our experience with the initial prototype, as well as new musical practices that it supports, including gesture-based self-duo and self-accompaniment.

The benefit of mobility, such as that provided by our small form-factor wireless system, is that its features can be used in classrooms or during rehearsals and do not depend on the technology available at a particular venue. However, the computational demands for signal processing necessitate, at present, the use of a remote computer for generating the audio output. In such a context, remotely computed audio raises the crucial issue of feedback latency. Fortunately, experimental results demonstrate that our dynamically reconfigurable audio streaming protocol satisfies both the timing and fidelity requirements of musical performance.

The gesture-controlled phase vocoding features of our system suggest some interesting application possibilities, including note matching for counterpoint techniques and musical transcription, where sample-part selection is controlled by the user's movements.

Our experiences with this platform lead us to believe that in the near future, a more capable mobile platform will extend the range of pedagogical and artistic practice. This may open up a new range of interaction possibilities in such creative areas as music, theater and dance.

6. ACKNOWLEDGMENTS
The authors wish to acknowledge the generous support of NSERC and the Canada Council for the Arts, which have funded the research and artistic development described in this paper through their New Media Initiative.

7. REFERENCES
[1] nSlam: TOT [Territoires Ouverts - Open Territories] Website. http://tot.sat.qc.ca/logiciels nslam.html.
[2] Arduino. http://www.arduino.cc.
[3] F. Bevilacqua, F. Guédy, N. Schnell, E. Fléty, and N. Leroy. Wireless sensor interface and gesture-follower for music pedagogy. In NIME ’07: Proceedings of the 7th International Conference on New Interfaces for Musical Expression, pages 124–129, New York, NY, USA, 2007. ACM.
[4] G. Geiger. PDa: Real time signal processing and sound generation on handheld devices. In Proceedings of the International Computer Music Conference (ICMC), 2003.
[5] Gumstix. www.gumstix.com.
[6] S. Schiesser and C. Traube. On making and playing an electronically-augmented saxophone. In NIME ’06: Proceedings of the 2006 Conference on New Interfaces for Musical Expression, pages 308–313, Paris, France, 2006. IRCAM Centre Pompidou.
[7] Z. Settel and C. Lippe. Real-time musical applications using frequency-domain signal processing. In IEEE ASSP Workshop Proceedings, 1995.
[8] D. Wessel, M. Wright, and J. Schott. Situated trio: An interactive live performance for a hexaphonic guitarist and two computer musicians with expressive controllers. In International Conference on New Interfaces for Musical Expression (NIME), pages 171–173, Dublin, Ireland, 2002.
[9] M. Wozniewski, N. Bouillot, Z. Settel, and J. R. Cooperstock. Large-scale mobile audio environments for collaborative musical interaction. In International Conference on New Interfaces for Musical Expression, Genova, Italy, 2008.
[10] A. K. E. Yang and A. T. P. Driessen. Wearable sensors for real-time musical signal processing. In IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), Aug. 2005.
A Mobile Music Environment Using a PD Compiler and Wireless Sensors
Robert Jacobs, Mark Feldmeier, Joseph A. Paradiso

Responsive Environments Group
MIT Media Lab
20 Ames St., Cambridge MA 02139, USA
rnjacobs@alum.mit.edu, carboxyl@mit.edu, joep@media.mit.edu
ABSTRACT
Keywords
1. INTRODUCTION
2. COMPILER
NIME08, Genoa, Italy
3. PHYSICAL INSTANTIATION
[Figure: Pd patch (sound-file tables, netreceive 9999, footfall prediction, dac~ outputs) and sensor plots of a_x, a_y, a_z, gyro_x, gyro_y, |Jerk|, |angular acceleration|, loudness (linear), detected footfalls and predicted footfalls; horizontal axis 0 to 4000]
4. EXAMPLE MAPPING
5. FUTURE WORK
6. REFERENCES
Gesture ≈ Sound Experiments: Process and Mappings

Ross Bencina
Composer
Melbourne, Australia
rossb@audiomulch.com

Danielle Wilde
CSIRO Textile and Fibre Technology
Monash University Art & Design
d@daniellewilde.com

Somaya Langley
Sound and Media Artist
Berlin, Germany
somaya@criticalsenses.com
ABSTRACT
This paper reports on outcomes of a residency undertaken at STEIM, Amsterdam, in July 2007. Our goal was to explore methods for working with sound and whole body gesture, with an open experimental approach. In many ways this work can be characterised as prototype development. The sensor technology employed was three-axis accelerometers in consumer game-controllers. Outcomes were intentionally restrained to stripped-back experimental results. This paper discusses the processes and strategies for developing the experiments, as well as providing background and rationale for our approach. We describe “vocal prototyping” (a technique for developing new gesture-sound mappings) and the mapping techniques applied, and briefly present a selection of our experimental results.

Keywords
Gestural control, three-axis accelerometers, mapping, vocal prototyping, Wii Remote

1. INTRODUCTION
In July 2007 the authors undertook a residency at STEIM, Amsterdam with the goal of exploring and experimenting with new methods for control and performance of digital sound using whole-body gesture. More specifically, we aimed to develop systems which support kinaesthetic-auditory synchresis, where human body motion is mapped into sound in such a way that sound production becomes an inherent and unavoidable consequence of moving the body, with the intention of engaging both performer and audience in a fluid experience of the relation between performed sound and gesture.

Our approach was multifaceted and reflected the various interests of the collaborators. Considerations included: physicality in the space, sonic and compositional form, structure and aesthetics, conceptual semantics, sensor technologies and applications. These concerns were used as the basis for devising experiments, some of which were undertaken without interactive technology. For example, in the early phases of the residency we experimented with movement-only composition, and later, some sound mappings were prototyped by improvising movement to voice or pre-recorded sound.

From the outset our intention was to research and experiment with new techniques and to document our experiments, rather than produce a completed performance work. This was motivated by a number of factors, including our need to develop experience with new approaches and technologies; the desire not to be constrained by the requirements of structuring a coherent performance; and the goal to create circumstances where working in new ways was natural, rather than giving in to the tendency to fall back on tried and trusted solutions and techniques.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. BACKGROUND & RELATED WORK
The authors have been variously engaged in the creation of performances incorporating gestural sound control for some time. Of most relevance here are Bencina's work with Simulus incorporating the P5 Virtual Reality Glove [14]; Wilde's performance interfaces such as Face Clamps [20] and hipDisk [21]; and Langley's ID-i/o [8] and work with HyperSense Complex [12]. Our combined processes include consideration of the musical, physical and technical perspectives, as reflected in the discussion of related work that follows.

Within Wanderley's review [18] our interests are located proximally to the performance practices of Tanaka (EMG-based gestural control) and Waisvisz (The Hands), both cited therein. However, our concerns are largely independent of Wanderley's taxonomy of sensor-based input and gestural control interfaces. We seek to create performances which engage the whole body, and hence avoid dependence on hand-based sensor input or 'interface artefacts' which draw attention (of both performer and audience) away from the body towards the artefact. In this regard we relate perhaps more closely to Hahn and Bahn's “Pikapika” character [6], who embodies movements from the bunraku (Japanese puppet theater), where physical gesture not only corresponds to music, but dictates certain sound effects. With regard to our relation to the plethora of interactive dance systems (see discussion in [6]), we note that our goal is not to compose performances for dancers, but rather to give expressive sonic capabilities to the whole body in motion. In this respect we find strong resonance with Bahn et al.'s discussion concerning the integration of the dancing body and the musical body [2], and also with Winkler's idea of “allowing the physicality of [human] movement to impact on musical material and processes.” [22]

Paine's recent publication [11] presents an instrument design approach and example composition utilising the same
accelerometer-based game controllers used here. Two significant points of difference are, firstly, his application of a parameterisation strategy based on an analysis of traditional instrumental methods of sound production (as opposed to, say, the natural causality of Chion's synchretic footsteps [3], or a musical structuring model grounded in a more general sonic typology [15]) and, secondly, an interface-centric rather than body-centric orientation towards control affordances. Moody et al. [10] develop a mapping strategy for gestural control of an audio-visual system, which seeks to achieve synchresis between generated audio and video. Their argument is relevant to the present work, where we seek synchresis between observed performer gesture and generated synthetic sound.

The technical development of the sensor filtering and mappings described below has been informed by a range of literature concerning gestural control of music [18], pragmatic accelerometer-based motion analysis [4, 9] and, more specifically, accelerometer-controlled synthesis [13, 17]. Much of the inertial motion analysis literature is concerned with more elaborate sensing and filtering schemes than were applied here; however, we found the mathematical development in Ilmonen and Jalkanen's system for analysis of conductor gestures particularly helpful [7].

3. TECHNOLOGY OVERVIEW
Although we were interested in exploring a range of gestural input technologies, the Nintendo Wii Remote was settled on as a sensor platform for prototyping. The Wii Remote was chosen for pragmatic reasons: it provided a wireless 3-axis accelerometer in an off-the-shelf package. As we were primarily concerned with gestural input, only accelerometer data from the Wii Remotes were utilised. Masayuki Akamatsu's aka.wiiremote Max objects [1] were adapted to simultaneously convert accelerometer data from up to 6 Wii Remotes into an OSC data stream, which was used as input to AudioMulch, where mapping and sound synthesis were performed. Mappings were developed using an embedded Lua script interpreter running inside AudioMulch.

4. PROTOTYPING PROCESS
The schema for gesture≈sound prototyping arose out of a belief that interweaving the development of sound and movement would open up new ways of thinking about gestural sound performance and lead to gestural sound synchresis. We adopted a strategy of minimal development, pursuing development in each modality only far enough to allow or provoke advance of the work as a whole. This prevented us from falling back on known methods and solutions, or staying in our comfort zones. The different modalities (sound, movement and technology) were developed in tandem, and a new vocabulary was allowed to emerge from our existing skills and the area of inquiry. Our approach included ‘vocal prototyping’, discussed below, which, while neither extensively nor rigorously evaluated, resulted in each of us working in new and unexpected ways, with positive outcomes.

4.1 Moving Musicians
According to our criteria, a gesture-controlled sonic performance needs to engage the body of the performer in movement which incorporates a broad spectrum of physical expression. Successfully engaging musicians and technologists in physical exploration can prove challenging, as typically they do not focus on, nor have highly developed skills in, this area. How might musicians/technologists explore physical expressiveness in extended ways?

While there was no desire to privilege the physical, we felt it important to short-circuit the musician's and technologist's tendencies to de-prioritise their body's expressive range, and so to create a different mindset from which to launch our investigation.

Our working process began with free-form brain and body storming. We brainstormed possible uses of the sensor technology before plugging in the Wii Remote and playing with it, to avoid imaginations being tempered by knowledge of the device's limitations, and to encourage working directly with the body as a medium through which we could undertake our research. Similarly, we created short physical vignettes without setting a specific point of departure or other assistive limitations, thus forcing the imagination to engage openly and to be linked directly to the body from the outset. This process enabled the musician/composer collaborators to familiarise themselves with extended physical expression and with varying levels of physical proximity. It also formed a platform from which we could begin to talk about movement.

For the duration of our working process, we made a point of preceding discussions, brainstorming and ideation sessions with movement sessions. This allowed us to approach non-physical and physical tasks alike in a ‘physically ready’ gestural state.

4.2 The Approach
Over the course of the residency we engaged in a range of activities to develop our gesture sound mappings, continually striving to broaden the parameters within which we were thinking and working. We investigated ideas stimulated by the sensor technology, such as how and where the Wii Remote could be placed on the body and what kinds of gestures it might be able to sense and measure. We thought directly about sound without limiting our ideas to the constraints of the technology, and worked directly from a consideration of the body's affordances and dynamic capabilities. Throughout, we engaged in repeated ideation sessions, developed simple patches in response to ideas, and tried to understand what different choices afforded and what directions might be valuable for us to pursue. All of our experiments were captured on video to enable ongoing assessment and review.

Although we kept our attention on the technology, we remained cautious that its demands not draw our focus away from other areas of inquiry. One method we used to counter this tendency was to vocally prototype our ideas, so that we could discuss and explore links between sound and movement without being limited by the technical constraints of the mapping process.

4.3 Vocal Prototyping
The aim of vocal prototyping was to challenge our usual ways of thinking about movement and sound and to begin to understand the kinds of relationships we might make between them. Through this process we generated a substantial amount of material and made concrete steps towards formalising a gesture sound vocabulary. As outlined below, vocally prototyping ideas flowed naturally out of other approaches.
We began by exploring a range of processes to develop appropriate sounds. Working individually, we identified sounds from the Freesound Creative Commons database [5], which we used as a basis for discussing and understanding the qualities of sonic space we each desired to create. This was followed by free-form sound generation using the voice only; physical performance-making sessions during which we vocalised sounds that were suggested by movement; and free-form movement and sound generation using the voice and entire body.

Figure 1: sound to movement to sound

To expand our movement sound vocabulary we undertook a range of exercises, including those outlined in Figure 1:
A. The first person vocalises; the other finds a movement that corresponds [x10 sounds]. Swap and repeat.
B. The first person gestures; the other finds vocalisations that correspond [x10 gestures]. Swap and repeat.
C. Each makes their own movement/sound pairings [x10]. Work concurrently. When ready, perform one for the other.

We then physicalised the four elements (Earth, Air, Fire and Water) together in the space without vocalising or adding other sound, reviewed the video, then physicalised Thunder one by one, each alone in the space for five minutes with the others observing. This work, inspired by the unseasonally tempestuous weather experienced in Amsterdam at the time, served to extend our movement vocabulary by providing a familiar yet imaginary impetus for highly abstract movement. The challenge throughout was to extend our movement vocabulary and habits in the performance space by focusing on the physically expressive body and its relationship to sound.

Other approaches included:
FREESOUND SOUNDS: Work through the previously collated Freesound sounds to arrive at corresponding gestures/movements.
CONVERSATION (isolated voices): Pass gestures and sounds back and forth to create a kind of conversation between two (or more) people. E.g. the first person gestures while vocalising (or vocalises while gesturing); the other responds with a different vocalised gesture/gestured vocalisation. This leads to characterisation and helps to break vocal and gestural habits.
OVERLAPPING CONVERSATION: As above, but the improvisation is less structured and can overlap.
PREPARED VIGNETTES: Take 10 minutes to compose a short gestural/vocalised vignette that is then performed. Decide where the sensor technology would be placed and how the data would affect the sonic output. Experiment with the imagined placement of technology: identical to the other person, mirrored, completely unconnected. Experiment also with possible sound output and effects, performance relationships, etc.

While the above is not exhaustive, it gives an indication of our approach in what we hope is a repeatable manner. As mentioned previously, the challenge was to find new ways of working with and thinking about both sound and movement. ‘Vocal Prototyping’ was found to be ideally suited to this task; it also released us from the constraints of technology development. The methodology was both rich and fecund.

4.4 Incorporating Technology
In order to incorporate technology we reviewed the material generated during the ‘Vocal Prototyping’ sessions and considered further development. Questions included: What ideas or fragments did we consider worth pursuing? How could we implement them through the technology? We were interested in how particular vocalised sounds might be reinterpreted through synthesis. We didn't want simply to translate what we had discovered; we wanted the creative process to continue.

The outcomes of vocal prototyping formed one of a number of inputs to the process of generating sound-to-movement and movement-to-sound mappings. Other inputs included free-form ideation around categories of possible mappings, possible sonic responses to particular motion events, and other abstract sonic, musical and movement ideas.

From these varied sources we developed simple patches that would enable further exploration. From this point forward the development of technology, movement and sound was interwoven, and we continued to test the patches and rework them. The aim, ultimately, was to take individual outcomes to a trial performance level so that we could undertake further assessment.

5. MAPPING
Three-axis accelerometers are powerful sensors for gestural control applications. Although raw three-axis acceleration signals have only limited applicability for musical control, a range of useful signals may be derived from them. Since it is not generally possible to separate acceleration due to the earth's gravity from that applied by a performer wearing the device, nor rotational from linear acceleration, useful derived signals are (at best) pragmatic approximations of physical parameters such as orientation, velocity, etc.

This section describes the transformations applied to the raw three-axis accelerometer data in order to produce useful audio control signals. Here the focus is on the low-level mathematical processes applied, motivated by the idea that it is helpful to document such procedures. The hope is that an organised framework of such mapping strategies may emerge, in the spirit of work begun in [16]. Application of these control signals to specific sound generation strategies is discussed further in the outcomes section.
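As a reading aid for the calibration, derived quantities and approximations described in subsections 5.1 to 5.3, the Python sketch below computes them from three-axis samples. It is not the authors' implementation, which ran as Lua scripts inside AudioMulch. The ±1 g calibration scheme, the 100 Hz sample rate, the first-order high-pass filter design and every function and class name are assumptions made for illustration; only the formulas, the 0.99 leak coefficient and the 10 Hz cutoff (the value given for the performer-motion high-pass filter) come from the text.

```python
import math

SAMPLE_RATE_HZ = 100.0  # assumed sensor update rate; not stated in the text


def calibrate_axis(raw_up, raw_down):
    """Hypothetical calibration: offset and scale for one axis from raw
    readings taken with the axis pointing up (+1 g) and down (-1 g)."""
    offset = (raw_up + raw_down) / 2.0
    scale = (raw_up - raw_down) / 2.0
    return offset, scale


def normalise(raw_xyz, calibration):
    """Raw (x, y, z) counts -> acceleration in g, using per-axis (offset, scale)."""
    return tuple((r - off) / sc for r, (off, sc) in zip(raw_xyz, calibration))


def axis_angle(a_axis):
    """angle_axis = arcsin(a_axis), clamped since noisy readings can exceed 1 g."""
    return math.asin(max(-1.0, min(1.0, a_axis)))


def tilt(a_u, a_v):
    """Two-axis tilt magnitude sqrt(a_u^2 + a_v^2) and direction atan2(a_v, a_u)."""
    return math.hypot(a_u, a_v), math.atan2(a_v, a_u)


def magnitude(vec):
    """Euclidean magnitude, e.g. |a_xyz| = sqrt(a_x^2 + a_y^2 + a_z^2)."""
    return math.sqrt(sum(c * c for c in vec))


def performer_motion(a_xyz):
    """| |a_xyz| - 1 |: rough magnitude of performer acceleration without gravity."""
    return abs(magnitude(a_xyz) - 1.0)


class OnePoleHPF:
    """First-order high-pass filter computed as input minus a one-pole low-pass."""

    def __init__(self, cutoff_hz, sample_rate_hz=SAMPLE_RATE_HZ):
        self.alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
        self.low = None  # low-pass state, seeded from the first sample

    def process(self, x):
        if self.low is None:
            self.low = x  # start settled, so a constant input yields 0
        self.low += self.alpha * (x - self.low)
        return x - self.low


class LeakyVelocity:
    """Per-axis v = v * leak + HPF(a), returning |v_xyz| after each sample."""

    def __init__(self, cutoff_hz=10.0, leak=0.99):
        self.filters = [OnePoleHPF(cutoff_hz) for _ in range(3)]
        self.v = [0.0, 0.0, 0.0]
        self.leak = leak

    def process(self, a_xyz):
        for i, (hpf, a) in enumerate(zip(self.filters, a_xyz)):
            self.v[i] = self.v[i] * self.leak + hpf.process(a)
        return magnitude(self.v)


if __name__ == "__main__":
    # Hypothetical raw readings: each axis reads offset +/- 100 counts at +/-1 g.
    calib = [calibrate_axis(612, 412), calibrate_axis(600, 400), calibrate_axis(620, 420)]
    a = normalise((612, 500, 520), calib)  # x axis aligned with gravity
    print(a, magnitude(a), performer_motion(a))  # (1.0, 0.0, 0.0) 1.0 0.0
```

With the hypothetical readings of 612 and 412 for one axis pointing up and down, `calibrate_axis` returns an offset of 512 and a scale of 100, so a raw reading of 612 normalises to 1.0 g. Because the high-pass filter removes the constant gravity component and the 0.99 leak lets accumulated error decay, a stationary sensor settles at an estimated velocity of zero in this sketch.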
5.1 Calibration and Normalisation
A number of the applied transformations depend on the sensor data being normalised such that the computed 3D acceleration magnitude for a stationary sensor is constant at 1.0, irrespective of the sensor's orientation. The calibration was performed according to the procedure described in [19] and resulted in the computation of an offset and a scaling factor for each sensor axis. Calibration was performed once for each sensor, with the resulting values stored in a lookup table. During operation the calibration data were used to compute a normalised floating-point acceleration vector $(a_x, a_y, a_z)$ such that a sensor axis aligned to the direction of gravity would have a value of 1.0, corresponding to the acceleration induced by the earth's gravitational field. Similarly, the 3D vector magnitude of the stationary sensor in any orientation was also 1.0.

5.2 Basic Derived Quantities
For the benefit of the authors, and those readers less mathematically inclined, a few basic operations that can be performed on the normalised three-dimensional acceleration values $(a_x, a_y, a_z)$ are reviewed here.

5.2.1 Single axis angle to gravitational field
Under the assumption that the sensor is stationary, its single-axis acceleration will vary in magnitude between 0 and 1 depending on the axis's alignment to the direction of gravity. The angle (in radians) between the sensor axis and the horizontal plane, i.e. the complement of the axis's angle to gravity, is given by the arcsine of the single-axis acceleration:

$$\text{angle}_{axis} = \arcsin(a_{axis})$$

Assuming the sensor is mounted parallel to the spine (for example), this value can be useful for sensing how far off-centre the performer is leaning.

5.2.2 Two axis tilt amount and angle
Given a plane defined by any two sensor axes, say $a_x$ and $a_y$, we can apply the Pythagorean theorem to compute the amount of tilt:

$$\text{tilt magnitude}_{xy} = |a_{xy}| = \sqrt{a_x^2 + a_y^2}$$

Assuming a stationary sensor accelerated only by gravity, $|a_{xy}|$ will vary from 0.0 (when the plane is parallel to the ground) to 1.0 (when the plane is at right angles to the ground). We can compute the direction in which the plane is tilted using:

$$\text{tilt direction}_{xy} = \operatorname{atan2}(a_y, a_x)$$

where atan2 is the four-quadrant version of the arctangent function commonly found in modern programming languages, which gives an angle in the range $(-\pi, \pi]$. Two-axis tilt values may be used, for example, to respond to the orientation of the trunk of the body towards the ground.

5.2.3 Three dimensional acceleration magnitude
Applying the Pythagorean theorem in three dimensions, we can compute the absolute vector magnitude $|a_{xyz}|$, defined as:

$$|a_{xyz}| = \sqrt{a_x^2 + a_y^2 + a_z^2}$$

This is the total magnitude of acceleration affecting the sensor, including both the gravitational and gesture motion components. As noted above, $|a_{xyz}|$ will remain constant at 1.0 for a sensor at rest. When the sensor is put in motion by a performer the acceleration magnitude will usually increase, although under certain circumstances it may temporarily decrease (if the performer rapidly accelerates the sensor towards the ground, for example).

5.3 Approximations
Given the above quantities and some assumptions about the way a human body moves in performance, we computed additional signals useful for driving gesture-controlled sound synthesis. To approximate the magnitude of performer acceleration excluding gravity, we subtracted 1 from the total 3D acceleration magnitude and took the absolute value:

$$a_{\text{performer motion}} = \big|\,|a_{xyz}| - 1\,\big| = \left|\sqrt{a_x^2 + a_y^2 + a_z^2} - 1\right|$$

Even with fast gestures, the bandwidth of acceleration signals induced by muscular action alone is generally quite low (on the order of 10-20 Hz) when compared to acceleration induced by physical stops (footfalls transmitted through the skeleton, for example). To focus on muscular gestures we often applied low-pass filtering to the acceleration signals. When only rapid motion was of concern, a high-pass filter (usually 10 Hz) was employed to extract only sudden changes in acceleration:

$$a_{\text{HPF performer motion}} = \mathrm{HPF}(|a_{xyz}|)$$

In theory, when the sensor's orientation to gravity is fixed (i.e. when the sensor is not allowed to rotate), it is possible to completely remove the effects of gravity using a high-pass filter. Although this procedure was not practical in most situations we encountered, it led us to an approximation of velocity magnitude, computed by integrating the high-frequency acceleration of each axis independently using a leaky integrator and then computing the magnitude of the resultant 3D velocity vector:

$$v_{axis} = v_{axis} \cdot 0.99 + \mathrm{HPF}(a_{axis}), \qquad |v_{xyz}| = \sqrt{v_x^2 + v_y^2 + v_z^2}$$

This is the extent of the approximations utilised in the present work to date. The interested reader is advised to consult [7] for a more elaborate scheme that double-integrates acceleration to estimate displacement, using a second accelerometer to compensate for angular rotations.

5.4 General Mapping Primitives
In the mapping discussions below we refer to the following additional primitives: low-pass filters with various cutoff frequencies, denoted $\mathrm{LPF}_{cutoff}(x)$; envelope followers with separate attack and release times, denoted $\mathrm{ENV}_{attackT,\,releaseT}(x)$; gates, where sound (or some other behavior) is only triggered when a sensor value exceeds a threshold; and leaky counters, which “charge” when sensor signals exceed a threshold (often the charge amount is influenced by the sensor value) and “discharge” over time, with the counter value modulating the audio signal in some way. Unless noted, sensor data ranges were linearly scaled and clamped into synthesis control signal ranges.

6. OUTCOMES
This section describes a selection of experimental outcomes which were developed and presented to the public¹. An attempt is made to give an impression of the sonic, performative and mapping aspects of each experiment.

¹ STEIM wiiiiiiii concert, 24 September 2007. See: http://www.steim.org/steim/archief.php?id=209

HEAD SCRAPE (Figure 2): A hyperinstrument in which a sound generator is triggered by the motion of one performer's head. The resulting sound is processed by a bank of resonators whose
frequencies are modulated by the motion of a second performer. When the first performer's $a_{\text{HPF performer motion}}$ exceeds a threshold, a gate is opened which causes a granular glitching sound to be generated. The processing performer wears two sensors, each controlling an amplitude-modulated delay line and a bank of spaced resonators. The modulation rate and resonator frequencies are modulated by $\mathrm{LPF}_{5\,\mathrm{Hz}}(|v_{xyz}|)$, while an envelope follower $\mathrm{ENV}(|v_{xyz}|)$ controls the amount of signal entering the filter bank.

Figure 2: Head Scrape

MOTION SHATTER: A smooth continuous drone of Tibetan monks chanting is fed through a granulator. As the performer spins in a circle holding the sensor in an outstretched hand, the sound becomes less smooth. Spinning faster causes the sound to become gritty, and eventually to break up. The performer must spin in circles, in an increasingly desperate manner, in order to effect a complete cessation of sound. The controlling signal $\mathrm{LPF}_{0.6\,\mathrm{Hz}}(|a_{xyz}|)$ reduces grain durations (from approximately 500 ms down to 10 ms) while increasing the randomised inter-onset time from 2.6 to 500 ms, causing the sound to break up gradually with increased centripetal acceleration.

LEG RATCHETS: Sensors are attached to the performer's lower legs. Each leg controls a similar synthesis patch, which granulates a different sound. The patch iterates a pulse generated by gating a granular texture (pulse rate ranging from 5 to 40 Hz), with pulse rate, transposition and gain modulated by $\big|\,|a_{xyz}| - 1\,\big|$. When the sensor is at rest the pulse is slow, silent, and lower in pitch. The legs' movement results in accelerated pulses or rhythmic modulation. At some point an error was made with this patch and the acceleration value was offset by -0.35, which resulted in the performer having to move one leg to make sound, and the other leg to stop its corresponding sound. This opened up previously unconsidered possibilities, and provided a rich space for performer experimentation.

BLADES OF GRASS: Each performer wears a Wii Remote aligned to their spine, which is associated with a synthesis patch consisting of processed noise with a resonant filter swept according to the angle and direction in which they are leaning. $\text{tilt direction}_{xz}$ is processed by a triangular shaper, which produces a periodic sweep as the performer rotates the tilt of their spine. This is multiplied by the amount the performer is leaning ($\text{tilt magnitude}_{xz}$) and mapped to the resonant filter cutoff frequency.

SPEED HARMONICS (Figure 3): The performer wears a sensor on each forearm. The sound world consists of two resonant, harmonically tuned oscillator banks, one controlled by each arm. As the speed of the arms increases (sometimes requiring the whole body to spin), white noise and additional bass are faded in, and comb filters are swept across the spectrum, creating a swooshing sound. $\mathrm{LPF}_{4\,\mathrm{Hz}}(|v_{xyz}|)$ sweeps the comb filter between 400 and 4000 Hz with increased performer velocity. $\mathrm{LPF}_{1\,\mathrm{Hz}}(|v_{xyz}|)$ controls the introduction of the white noise and bass boost through a sweeping shelf filter. The filtered velocity signal is also quantised into 10 steps and used to select one of the harmonics of the oscillator bank: the velocity signal is applied to an envelope follower associated with the selected harmonic, which boosts or sustains the current harmonic level. When the velocity no longer excites a particular harmonic, that harmonic slowly fades to silence.

Figure 3: Speed Harmonics

TONE CHANGE: Two performers each perform with two Wii Remotes, one in hand and the other attached to the hip. Each Wii Remote is associated with two sine wave oscillators, one slightly detuned from the other, with the detune distance increasing by an offset of between 0.01 and 20 Hz as $\mathrm{LPF}_{1\,\mathrm{Hz}}(|v_{xyz}|)$ increases. The amplitude of each oscillator pair is modulated by $\mathrm{ENV}_{500\,\mathrm{ms},\,1500\,\mathrm{ms}}(|v_{xyz}|)$. The polarity of the filtered z velocity is tracked: when the sensor's $\mathrm{LPF}_{2\,\mathrm{Hz}}(v_z)$ has been at rest and starts moving again in the opposite direction, a new random note from a diatonic scale is chosen. Thus the performers start and stop to change notes, and move in various ways to articulate their tones, creating slowly modulating random chord sequences.

7. DISCUSSION & OPEN QUESTIONS
In each of the experimental outcomes outlined above, we strove to maintain a balance in the relationship between movement and resultant sound that was easy to perceive for audience and performer alike. The mappings discussed were intentionally simple. More complex mappings, while more satisfying from a performance perspective, require careful consideration and tuning in order for the relationship between movement and sound to attain synchretic coherence. The development of such mappings is a clear direction for further investigation.

Engaging the body in performance necessarily raises notions of the body as interface and, for the audience, of physical theatre, or theatre of the body. We feel that it is difficult to escape a theatrical mode of interpretation when confronted with a musical performer without an instrument, which of course also invites a dramaturgical mode of composition. We consider the dialogue between musical and theatrical creation to be a significant area for future development in whole-body gesture sound performance.
As previously observed by Bahn et al. [2], performing with the whole body involves skills not always possessed by musicians; some of the authors are now considering training in this area to continue the research.

Finally, the sensor technology employed so far has been adopted as a pragmatic prototyping aid. We are now considering options for smaller, wearable sensor platforms.

8. CONCLUSION
The gesture ≈ sound experiments outlined in this paper represent, for the authors, a solid foundation from which to continue our research. While many questions remain unanswered, the process has both provoked and supported new ways of grappling with the problem of mapping gesture and sound. The importance of getting musicians to think through their bodies has been highlighted. By consistently approaching non-physical and physical tasks alike in a ‘physically ready’ and gestural state, our way of working, thinking and creating shifted dramatically. Our clear intent to develop movement and sound mappings in tandem was central to our approach, and was integral to producing the outcomes presented here.

In our search for gesture sound synchresis, we have established clear directions for ongoing research and an approach which promises to support development of a diverse performance vocabulary.

9. ACKNOWLEDGMENTS
We gratefully acknowledge the support of STEIM¹ for hosting this residency. For their financial assistance we thank the Australia Council for the Arts, the Australian Network for Arts and Technology, Monash University Faculty of Art and Design and CSIRO Division of Textile and Fibre Technology.

¹ The Studio for Electro-Instrumental Music, Amsterdam, the Netherlands. http://www.steim.org

10. REFERENCES
[1] Akamatsu, M. aka.objects: aka.wiiremote. Website: http://www.iamas.ac.jp/~aka/max/ Accessed 31 January 2008.
[2] Bahn, C., Hahn, T. and Trueman, D. Physicality and Feedback: A Focus on the Body in the Performance of Electronic Music. In Proceedings of the 2001 International Computer Music Conference, Havana. ICMA, 2001.
[3] Chion, M. Audio-Vision: Sound on Screen. Columbia University Press, 1994.
[4] Davey, N. P. Acquisition and analysis of aquatic stroke data from an accelerometer based system. M.Phil. Thesis, Griffith University, Australia, 2004.
[5] The Freesound Project. Website: http://freesound.iua.upf.edu/ Accessed 31 January 2008.
[6] Hahn, T. and Bahn, C. Pikapika: The collaborative composition of an interactive sonic character. Organised Sound, 7, 3 (2002), Cambridge: Cambridge University Press, 229-238.
[7] Ilmonen, T. and Jalkanen, J. Accelerometer-Based Motion Tracking for Orchestra Conductor Following. In Proceedings of the 6th Eurographics Workshop on Virtual Environments, Amsterdam, 2000, 187-196.
[8] Langley, S. ID/i-o. Website: http://www.criticalsenses.com Accessed 31 January 2008.
[9] Mizell, D. Using Gravity to Estimate Accelerometer Orientation. In Proceedings of the 7th IEEE International Symposium on Wearable Computers (ISWC). IEEE Computer Society, Washington, DC, 2000, 252.
[10] Moody, N., Fells, N. and Bailey, N. Ashitaka: an audiovisual instrument. In Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, NY, USA, 2007.
[11] Paine, G. Interfacing for dynamic morphology in computer music performance. The inaugural International Conference on Music Communication Science, 5-7 December 2007, Sydney, Australia.
[12] Riddell, A. HyperSense Complex: An Interactive Ensemble. In Proceedings of the Australasian Computer Music Conference, Brisbane, Australia. Australasian Computer Music Association, 2005.
[13] Ryan, J. and Salter, C. TGarden: wearable instruments and augmented physicality. In Proceedings of the 2003 Conference on New Interfaces for Musical Expression. National University of Singapore, Singapore, 2003, 87-90.
[14] Simulus P5 Glove Developments. Website: http://www.simulus.org/p5glove/ Accessed 31 January 2008.
[15] Smalley, D. Spectromorphology and Structuring Processes. In Simon Emmerson (Ed.), The Language of Electroacoustic Music, London, 1986.
[16] Steiner, H. Towards a catalog and software library of mapping methods. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME06), Paris, France, 2006.
[17] Trueman, D. and Cook, P. BoSSA: The Deconstructed Violin Reconstructed. In Proceedings of the 1999 International Computer Music Conference, Beijing, China, 1999, 232-239.
[18] Wanderley, M. Gestural Control of Music. In Proceedings of the International Workshop Human Supervision and Control in Engineering and Music, Kassel, Germany, 2001.
[19] Motion Analysis at Wiili.org. Website: http://www.wiili.org/index.php/Motion_analysis Accessed 31 January 2008.
[20] Wilde, D. and Achituv, R. ‘faceClamps’ (1998). Website: http://www.daniellewilde.com/docs/faceclamps/faceClamps.htm Accessed 31 January 2008.
[21] Wilde, D. hipDisk: using sound to encourage physical extension, exploring humour in interface design. Special Ed. International Journal of Performing Arts and Digital Media (IJPADM). Intellect, 2008. Forthcoming.
[22] Winkler, T. Making Motion Musical: Gesture Mapping Strategies for Interactive Computer Music. In Proceedings of the 1995 International Computer Music Conference, San Francisco, CA. Computer Music Association, 1995.
“3rd. Pole” – a Composition Performed via Gestural Cues

Miha Ciglar
Sound Artist / Student
University of Music and Dramatic Arts Graz, Austria
++43 650 973 3947
miha.ciglar1@guest.arnes.si

ABSTRACT
“3rd. Pole” is a musical composition that is performed by a dancer on a specially designed interface (instrument) based on motion tracking technology. This paper introduces the technical and artistic ideas behind the composition and outlines some of the main conceptual tendencies of my earlier work [8]. Several independent components, such as choreography, instrument design, sound design and formal concepts, were developed in parallel over the last three years and came together in this piece. Some of these were already realised in individual projects and are now joined under a new concept of interdependence. A major component of this project, however, was the implementation of a real-time gesture follower / gesture recognition algorithm applied to full-body motion data. This should be considered an autonomous module, not tied exclusively to this particular project. It can therefore be treated and developed independently, as a multi-purpose data mapping strategy, since it has the potential to find further use in any kind of interactive (dance, music, theatre, etc.) performance scenario.

Keywords
Motion tracking, Gesture recognition, Haptic feedback

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

1. TECHNICAL DESCRIPTION
1.1 Infrastructure
The dancer is monitored by the “Vicon 8” motion capture system [15]. A brief description of the system and some common data mapping strategies for musical applications can be found in [3]. The “Vicon 8” system consists of 12 infrared cameras / sensors placed around the dancer (performance area) and is able to track and extract the Cartesian x/y/z coordinates of light-reflecting markers on his body in 3-dimensional space, at a sampling rate of up to 120 frames per second. In our case, the markers were arranged in groups, so that a characteristic constellation of 4 to 5 markers attached to the end of each limb (Fig.1) would represent one central point from which we received our spatial coordinates. The trajectories of those coordinates were then used as input to a gesture recognition algorithm inspired by the concept of the left-to-right Hidden Markov Model architecture [14], [11]. The algorithm was implemented in the real-time programming environment PD (Pure Data) [13], which receives the location data from the Vicon server through the OSC communication protocol [16].

Fig.1: markers attached to the dancer’s limbs

The dancer receives feedback from the system in two ways: in the form of music, and in the form of electricity that is applied directly to his body through a cable he holds in his mouth. Ideally the setup should be realised with a wireless unit, so that the dancer has more freedom to move and does not get tangled in the cable.

1.2 Gesture Recognition
1.2.1 Related work
There is a variety of approaches to gesture recognition in the performing arts, deploying all kinds of sensing devices. These are applied to the movement patterns of dancers / performers directly [1], [4] or, in the case of musicians, to their conventional instruments, as in [2], which shows an example of such an augmented instrument. Gesture recognition / classification techniques may also be of benefit in analytical applications, for example in music pedagogy [4], or in [7], where it was proposed to separate the style and structure of full-body gestures and to analyse stylistic differences between different gesture realisations. Further, there are also freely available tools for gesture recognition, such as the MNM and FTM libraries [5] developed at IRCAM for use within the MAX/MSP programming environment [12].

1.2.2 Initial directives
My goal was not to design a blind, multi-purpose pattern recognition system, which could operate with any kind of
203
multidimensional data, but to take into account the particular characteristics of human-body gestures, with special consideration of human perceptual capabilities and their tolerance of significant variations between different realizations. It is an attempt to simulate the human ability to read and recognize a gesture as an abstract sign, by drawing attention to the temporal progression of the relational statistics among selected body features. The key thesis I was working with is that a human body gesture can be sufficiently described, or abstracted, by the trajectories of the inter-point (marker) distance variations.
1.2.3 Inter-point distance variation
A first version with 4 points (markers), which mark the ends of the extremities (arms, legs), was completed, in which the variation in distance between each pair of markers is taken as a feature. By choosing this approach, we immediately get rid of the absolute coordinates in space and are not bound to a specific location or orientation of the performer inside the tracking area. Four markers generate 6 distances, i.e. a 6-dimensional vector space for modeling the state sequence that classifies a particular gesture. With 4 markers we thus achieve a 50% dimensionality reduction, from 12 (the x, y, z coordinates of 4 markers) to 6 (inter-point distances). However, this approach becomes less effective as the number of markers increases, since n markers yield n(n-1)/2 distances compared with 3n coordinates, so that from 7 markers on there is no reduction at all.
1.2.4 Spatiotemporal quantization
The gesture recognition system was designed not to distinguish between gesture realizations that differ in intensity or temporal evolution, which allows for some degree of variation in the interpretation. The inter-point distance variation parameter set is a time-dependent vector, defined by the sign of the first derivative (velocity) of each dimension of the incoming signal. Its individual dimensions are thereby confined to 3 discrete states (constant “0”, increasing “+1” and decreasing “-1” distance), making the system insensitive to gesture intensity. With 6 dimensions (distances) and 3 possible states each, we have a set of 3^6 = 729 different state vectors to distinguish between. Whenever one dimension changes its state, the system generates a new state vector, keeping the unchanged dimensions as they were, thereby allowing an arbitrary and even nonlinear temporal evolution of a gesture. After the algorithm has been trained, a gesture can be identified in real time as it is being performed, and we get a continuous parameter describing the degree of completion of a particular gesture.
1.2.5 Gesture segmentation and state clustering
There is no perceptually relevant gesture segmentation taking place in this algorithm, because the state vectors described above are mostly generated in bursts. On the other hand, those bursts could be interpreted as indicators of transition points between perceptually relevant segments. However, only the succession order of the incoming state vectors is of interest here; their exact timing (duration) is not captured by the algorithm and is not used for gesture classification, since we want to achieve freedom in the temporal interpretation of the gesture. The bursts, or temporal clusters, of different states occur due to large distance jumps through the vector space and result from a change in the movement direction of individual limbs in relation to the others. If only one limb is active (changing location) and suddenly alters its course, the usual consequence is a change of value in three dimensions (its relation to the other three, static limbs). Ideally, in this case we would generate a single location change in the dimension space. In practice, however, the individual dimension values do not change exactly synchronously, resulting in a trace through the vector space which is composed of unstable (transient) states and points from one stable state to the other. This phenomenon is even more pronounced in the case of complex, full-body gestures. Both the sequence and the actual presence of those unstable states vary slightly between successive realizations of the same gesture, which is why we need to define a radius of tolerance (a cluster in the vector space) for each incoming state of the probe sequence. This radius was designed to exhibit a dynamic behavior, namely to allow a specific degree of deviation from the currently compared state in the reference (exemplar) sequence, while being indifferent to the specific “location” (the exact dimension) in which the deviation might occur.
Fig. 2: A sequence of three temporal clusters of state vectors (indicated by the small circles) with a dynamic tolerance radius (indicated by the ellipsoids), representing the spatial clusters.
1.2.6 Adaptive filtering
Before the state vectors are determined, the incoming signal is low-pass filtered. The exemplar sequences that serve as the reference for later recognition are recorded with fixed filter parameters and should be performed as clearly and evenly as possible. Later, in the recognition phase, the length of the integration window of the low-pass filter is adapted in real time according to the overall acceleration of the incoming signal. This approach is related to the idea of gesture segmentation described in [6], where the parsing of body motion into different gestural segments is based on the interpretation of acceleration values of the incoming signal. Although segment lengths and durations are not relevant in our algorithm, the location of transition points was found to be a useful parameter for the adaptive filtering of incoming data. Here, the sum of the absolute values of the second derivatives of the individual dimensions of the incoming signal is the criterion for choosing the number of integrands in the filter. The number of states generated by the system depends on the size of the integration window processing the incoming signal. To ensure satisfactory inter-gestural discrimination and sufficient intra-gestural (variation) tolerance, we need to “code” the incoming data with redundant information where the nature of the signal generates less of it (slow movements, few coarse changes). On the other hand, we need to reduce the amount of data generated at high signal acceleration values (transition points), in order not to lose track of the gesture progression due to an excess of data. Experimental results have shown this strategy to outperform a system with a static filter design.
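As a rough illustration of the feature pipeline in sections 1.2.3, 1.2.4 and 1.2.6, the following Python sketch turns four marker positions into six inter-point distances, smooths them with a moving average whose window length grows with the summed absolute second differences, and emits a new ternary state vector (-1, 0, +1 per distance) only when at least one dimension changes. It is a minimal sketch under stated assumptions, not the author's implementation: the paper does not specify the filter type, and the dead zone `eps`, the window bounds `min_win`/`max_win` and the `gain` mapping acceleration to window length are hypothetical parameters.

```python
from __future__ import annotations

import itertools
import math
from collections import deque

Point = tuple[float, float, float]


def inter_point_distances(markers: list[Point]) -> list[float]:
    """All pairwise marker distances: 4 markers give 6 dimensions."""
    return [math.dist(a, b) for a, b in itertools.combinations(markers, 2)]


def sign_state(delta: float, eps: float) -> int:
    """Quantize a change to -1 (decreasing), 0 (constant) or +1 (increasing)."""
    if abs(delta) <= eps:
        return 0
    return 1 if delta > 0 else -1


def _mean(rows: list[list[float]]) -> list[float]:
    """Column-wise mean of a list of equally long rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


class StateQuantizer:
    """Adaptive moving-average filter followed by ternary quantization.

    min_win, max_win, gain and eps are hypothetical tuning parameters.
    """

    def __init__(self, min_win: int = 1, max_win: int = 8,
                 gain: float = 10.0, eps: float = 1e-3) -> None:
        self.min_win, self.max_win = min_win, max_win
        self.gain, self.eps = gain, eps
        # One extra frame so that two windows of max_win frames can be compared.
        self.history: deque[list[float]] = deque(maxlen=max_win + 1)
        self.state: tuple[int, ...] | None = None

    def _window_length(self) -> int:
        """More acceleration -> longer window -> fewer state changes."""
        if len(self.history) < 3:
            return self.min_win
        a, b, c = self.history[-3], self.history[-2], self.history[-1]
        # Sum of |second differences| over all distance dimensions.
        accel = sum(abs(z - 2 * y + x) for x, y, z in zip(a, b, c))
        return max(self.min_win,
                   min(self.max_win, self.min_win + int(self.gain * accel)))

    def push(self, markers: list[Point]) -> tuple[int, ...] | None:
        """Feed one frame; return a state vector only when it has changed."""
        self.history.append(inter_point_distances(markers))
        if len(self.history) < 2:
            return None
        n = min(self._window_length(), len(self.history) - 1)
        frames = list(self.history)
        now = _mean(frames[-n:])           # filtered value at time t
        before = _mean(frames[-n - 1:-1])  # same filter, one frame earlier
        new_state = tuple(sign_state(c - p, self.eps)
                          for p, c in zip(before, now))
        if new_state != self.state:
            self.state = new_state
            return new_state
        return None
```

Comparing two windows of the same length keeps the derivative consistent when the window length changes between frames. To record an exemplar with fixed filter parameters, `gain=0` keeps the window at `min_win`; the exemplar is then the list of non-`None` vectors returned by `push`.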
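The matching logic of section 1.2.5 above and sections 1.2.7 and 1.2.8 below (a tolerance around each reference state, a time warping search over neighboring states, and one module per gesture with a break condition and a continuous completion value) can be sketched as follows. This is a simplified illustration, not the author's code: the tolerance is modeled as a fixed maximum number of deviating dimensions, which is indifferent to *which* dimension deviates but is not dynamic as in the paper, and `radius`, `warp_window` and `max_misses` are hypothetical parameters. Input is a stream of ternary state vectors as defined in section 1.2.4.

```python
from __future__ import annotations

from dataclasses import dataclass

State = tuple[int, ...]


def mismatches(a: State, b: State) -> int:
    """Number of dimensions in which two ternary state vectors differ."""
    return sum(x != y for x, y in zip(a, b))


@dataclass
class GestureFollower:
    """Follows one prerecorded gesture; one instance per reference gesture.

    radius, warp_window and max_misses are hypothetical parameters.
    """

    reference: list[State]
    radius: int = 1       # tolerated number of deviating dimensions
    warp_window: int = 2  # reference states searched on either side
    max_misses: int = 3   # break condition: unmatched probe states in a row
    index: int = 0        # next reference state to compare against
    misses: int = 0

    def __post_init__(self) -> None:
        if not self.reference:
            raise ValueError("reference sequence must not be empty")

    @property
    def completion(self) -> float:
        """Continuous degree of completion in [0, 1]."""
        return self.index / len(self.reference)

    def reset(self) -> None:
        self.index, self.misses = 0, 0

    def _matches(self, probe: State, i: int) -> bool:
        return mismatches(probe, self.reference[i]) <= self.radius

    def _warp(self, probe: State) -> int | None:
        """Search neighboring reference states, nearest first."""
        lo = max(0, self.index - self.warp_window)
        hi = min(len(self.reference), self.index + self.warp_window + 1)
        for i in sorted(range(lo, hi), key=lambda j: abs(j - self.index)):
            if i != self.index and self._matches(probe, i):
                return i
        return None

    def push(self, probe: State) -> float:
        """Feed one probe state vector; return the current completion."""
        if self.index == len(self.reference):
            self.reset()  # gesture was completed: look for a new candidate
        if self._matches(probe, self.index):
            self.index += 1
            self.misses = 0
        elif self.index > 0:
            warped = self._warp(probe)
            if warped is not None:
                self.index = warped + 1  # time-warp the probe onto the reference
                self.misses = 0
            else:
                self.misses += 1
                if self.misses > self.max_misses:
                    self.reset()  # break condition met: start over
        return self.completion
```

Following section 1.2.8, one follower would be created per gesture in the gallery and all of them fed the same state stream; no winner is chosen by likelihood, so each completion value can drive the sound independently, and input that matches no gesture simply leaves every follower near zero.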
1.2.7 Time warping
Although the adaptive filtering component should foster the disaggregation of temporal clusters and an even state density along the gesture progression, there are still situations where the proportional variations of individual gesture segments exceed the threshold of correct recognition. If, for example, the current state of the probe signal neither matches the currently compared state of the recorded reference sequence nor fits inside that state's cluster tolerance radius, the incoming state vector is passed on to a time warping function, which compares it against a certain neighborhood of states. If this function finds a match among the neighboring states of the reference sequence, it time-warps the probe signal to it and updates the index of the state that is to be compared next.
Fig. 3: Time warping the probe gesture to the reference gesture (image taken from [4]).
1.2.8 Identification process
One of the intentions of this project was to blur the causal relationship between movement and sound that usually arises when we apply a direct mapping between sensor data and musical parameters. However, generating musical parameters via gestural cues should not restrict the control data to discrete values emerging only at the successful completion of a predefined gesture. The goal was rather to keep the possibility of generating continuous output data, but to restrict it to accompany only specific choreographic material. We therefore work with a continuous output parameter describing the degree of completion of a particular gesture in real time. The algorithm does not need to output probability values or to show a degree of deviation from the currently observed state, since the acceptable deviation limits are already integrated in the clustering radius, the time warping function, etc. described above. We are not interested in how strong a deviation really is, as long as it stays inside a carefully chosen tolerance radius that provides adequate inter-gestural discrimination and intra-gestural tolerance. Each gesture in our prerecorded gallery has its own module, continuously monitoring the input feed. If the initial state of a gesture is detected, attention is put on the next state, and so on, for as long as the break condition is not met. Once it is met, the algorithm stops tracking the gesture and returns to its initial state to continue looking for the beginning of the gesture again. As soon as it turns out that a gesture is not the one we are looking for, the algorithm needs to be ready to accept a new “candidate” sequence. Not all the incoming data needs to be assigned to a particular prerecorded gesture, and therefore we do not select the reference sequence with the highest likelihood of matching the probe sequence. Thus the dancer is able to provoke an expected sonic result by selecting his choreographic material in real time, and to avoid sonification of his actions (different from the recorded gestures) if not desired.
1.2.9 Results and observations
It should be noted that the algorithm is still in development, and not all constellations of parameters have been extensively tested yet. The tests made so far showed the following results. Through careful tuning of the algorithm parameters, it was possible to achieve around 80% correct identifications: 4 out of 5 realizations of the same gesture (including variations) were recognized to 100% completion. At the same time, the inter-gesture discrimination was kept at or above 70%, i.e. no more than 30% of a “false” (arbitrary) gesture was identified as one of the reference gestures.
It is obvious that approximating a gesture through four points on a human body is neither very accurate nor satisfactory. Further, the concept of inter-point distance variation usually does not discriminate between mirror-inverted gestures, etc. We also discovered that it is possible to work with gestures of varying complexity (from robotic to more fluent and natural choreographies), but it is very important to maintain an equal degree of complexity in all the gestures that we want to identify, since the algorithm tuning parameters depend strongly on gesture complexity. The selected choreographic vocabulary has to exhibit as much diversity between its single elements (gestures) as possible, and the algorithm parameters need to be tuned accordingly. However, if we take into account the specific conditions and limitations of such an approach, we can still develop a clearly distinctive choreographic language/vocabulary that might even set off a new and unique, system-conditioned aesthetic of movement.
1.2.10 Future work
For now there is still a lot of testing and tuning work to be done with this particular approach. In the further development of the gesture recognition system, I would still like to stick to the basic principles of spatiotemporal quantization described in this paper, but to put more focus on the state bursts (the temporal clusters described above) in the recognition process. Perhaps more reliable information could be gained by disregarding the exact temporal progression of the state vectors and analyzing the temporal progression of state clusters instead. The statistics of state occurrences in such a cluster would then be compared across different gesture realizations. Since it was found that the clusters mark the transition points of gesture segments, they consequently include all the directional information of the preceding as well as the following segment.
1.3 Feedback
By moving through space, the dancer performs actions in three spatial dimensions plus one temporal dimension. A fundamental part of the musical composition is the function that translates those actions to a two-dimensional space (a time-varying amplitude, i.e. the audio signal), which will be discussed in detail later in the text. The dimension of amplitude refers to the (fast changing) electronic signal waveform corresponding to the sound being generated and projected. In addition to the sonification of the electronic waveform, which produces auditory feedback, the dancer is also exposed to an alternative instance of the same signal. This instance is the (amplified) signal itself, in its primary (electronic) domain. The connection with the dancer is established by a cable which he holds in his mouth. This concept of direct electronic
signal feedback was already applied and discussed in my earlier compositions and interface designs [9], [10]. It enables the dancer/performer to experience an alternative impression of the induced sound. Since it is electricity we are dealing with here, the dancer feels pain with an intensity that depends on the waveform (sound amplitude). Therefore, we need to be very careful with the amplification of the signal in order not to seriously harm the dancer.
Fig. 4: The dancer with the audio-output cable in her mouth.
2. ARTISTIC CONCEPTION
In a dance performance, there are usually 2 elements (visual and audible) that need to be arranged and put into a contrasting, harmonizing or other context. The title “3rd. Pole” indicates the inclusion of a third, haptic component contributed by the electric current running through the dancer's body. He is exposed to a situation where he has absolute decision power and needs to consider and balance all three elements (poles). As already mentioned, we have the induced sound, or rather its electronic abstraction, which is in direct contact with the performer's body. This enables a different corporal perception and interpretation of the caused sound, since the performer now has not only an audible but also a haptic reference (i.e. pain caused by the electric current) for the choice of his following actions. Therefore, the process of composition, or rather the final arrangement of pre-composed material, is only possible in real time, since we are interested in an alternative arrangement of the choreographic and musical progression, inspired by all three “poles” together. A pre-composed form or sequence of events would not make any sense, apart from satisfying possible sadistic tendencies of the composer.
3. CONCLUSION AND FUTURE WORK
The focus of this paper was on the interfacing concept and the interactivity of the system. A second major component of this project besides the gesture recognition system was sound design, which was not discussed here at all. Those components, however, are not bound to each other, so the project presented here is not meant to be considered a sealed (finished) entity. It can be developed further independently in the artistic (musical, choreographic) and/or the technological domain. “3rd. Pole” is only a first manifestation of an artwork and stands for one of many possible results that can be achieved in the future.
4. ACKNOWLEDGMENTS
This project has been supported by STEIM (www.steim.org) in Amsterdam (NL), which offered me a residency during which I completed most of the sound design for this composition. The most important contribution, however, came from the Institute for Electronic Music (IEM, www.iem.at) in Graz (A), which provided the facilities and technical infrastructure. Special thanks go to Dr. Gerhard Eckel and David Pirro for providing theoretical opinions and to IOhannes Zmölnig for technical assistance.
5. REFERENCES
[1] Aylward, R. and Paradiso, J. “Sensemble: A wireless, compact, multi-user sensor system for interactive dance”, Proc. of the International Conference on New Interfaces for Musical Expression (NIME 06), Paris, France, 2006.
[2] Bevilacqua, F., Fléty, E., Lemouton, S., Rasamimanana, N., Baschet, F. “The augmented violin project: research, composition and performance report”, Proc. of the International Conference on New Interfaces for Musical Expression (NIME 06), Paris, France, 2006.
[3] Bevilacqua, F., Dobrian, C. “Gestural Control of Music Using the Vicon 8 Motion Capture System”, Proc. of the International Conference on New Interfaces for Musical Expression (NIME 03), Montreal, Canada, 2003.
[4] Bevilacqua, F., Fléty, E., Guédy, F., Leroy, N., Schnell, N. “Wireless sensor interface and gesture-follower for music pedagogy”, Proc. of the International Conference on New Interfaces for Musical Expression (NIME 07), New York, NY, USA, 2007.
[5] Bevilacqua, F., Muller, R., Schnell, N. “MnM: a Max/MSP mapping toolbox”, Proc. of the International Conference on New Interfaces for Musical Expression (NIME 05), Vancouver, Canada, 2005.
[6] Bevilacqua, F., Cuccia, D., Ridenour, J. “3D motion capture data: motion analysis and mapping to music”, Proc. of the Workshop/Symposium on Sensing and Input for Media-centric Systems, Santa Barbara, CA, 2002.
[7] Brand, M. “Style machines”, Proc. of SIGGRAPH, New Orleans, Louisiana, USA, 2000.
[8] Ciglar, M. Homepage: http://www.ciglar.mur.at
[9] Ciglar, M. “I.B.R. Variation III.”, Proc. of the EMS Electroacoustic Music Studies Network Conference, Beijing, China, October 2006.
[10] Ciglar, M. “Tastes Like…”, Proc. of the ACM Multimedia Conference, Singapore, November 2005.
[11] Yang, J., Xu, Y., Chen, C. S. “Human action learning via hidden Markov model”, IEEE Transactions on Systems, Man and Cybernetics, Part A, January 1997.
[12] Max/MSP programming environment: http://www.cycling74.com/products/maxmsp.html
[13] Puckette, M. “Pure Data”, Proc. of the ICMC, 1996.
[14] Rabiner, L. R. and Juang, B. H. “An introduction to hidden Markov models”, IEEE Acoust. Speech Sign. Process. Mag. 3 (1986), pp. 4-16.
[15] Vicon 8 motion capture system: http://www.vicon.com/entertainment/technology/v8
[16] Wright, M. “Open Sound Control: an enabling technology for musical networking”, Organised Sound, Volume 10, Issue 3, pp. 193-200, December 2005.
More DJ techniques on the reactable
Kjetil Falkenberg Hansen, Speech, Music and Hearing, KTH - Royal Institute of Technology, Stockholm, Sweden, kjetil@kth.se
Marcos Alonso, Music Technology Group, UPF - Universitat Pompeu Fabra, Barcelona, Spain, malonso@iua.upf.edu
ABSTRACT
This paper describes a project started for implementing DJ scratching techniques on the reactable. By interacting with objects representing scratch patterns commonly performed on the turntable and the crossfader, the musician can play with DJ techniques and manipulate how they are executed in a performance. This is a novel approach to digital DJ applications and hardware. Two expert musicians practised and performed on the reactable in order both to evaluate the playability and to improve the design of the DJ techniques.
Keywords
reactable, DJ scratch techniques, interfaces, playability
1. INTRODUCTION AND BACKGROUND
It is well known that scratch DJs acquire very specific skills and learn a more or less defined set of playing techniques. One recent example of formalizing the techniques can be found in the DVD by DJ Q-bert [6], one of the leading musicians in the field. In the DVD, about one hundred different “scratches”, or techniques, are demonstrated. These techniques are interesting for several reasons: they represent a natural starting point for studying how turntable musicians, or turntablists, play expressively; they define what a new (non-vinyl) DJ interface should manage; and they offer an approach to performing complicated playing gestures with simple actions.
Since turntablism peaked in popularity in the late nineties, many solutions for scratching and DJing without vinyl records and a turntable have surfaced. These are mentioned in several earlier papers, see e.g. [2, 10, 13]. Such hardware includes, among others, CD scratch decks (e.g. Pioneer CDJ1000), time-coded vinyl controlling sound files stored on a computer (e.g. Final Scratch), software simulations (e.g. TerminatorX and FruityLoops), various “scratch pads” and jog wheels, and also controllers found on keyboards.1 Common to all these directions is that they mimic or model the speed manipulation of a turntable. To our knowledge, there are no commercial products that directly take advantage of the above-mentioned DJ playing techniques.
On the software side, we have seen some attempts at using scratch techniques to simplify the process of sounding like a real DJ. For example, with Scratcher2 the user can manually draw speed and amplitude envelopes and play them back, making scratch patterns on audio files. This opens the possibility of coming up with new techniques, experimenting with the sounds or composing music for the turntable. The disadvantage of drawing envelopes is the lack of real-time control for performance situations.
Another path is seen in Skipproof,3 where scratch techniques can be assigned to hardware or software controllers. Here, all the techniques are based on models derived from analysis of real DJs' movements [7]. The user affects the playback of the techniques through the assigned action and gesture; for instance, the speed of the scratch can be controlled by the effort of the player. Skipproof has been used in combination with the Radio Baton, gesture sensors, MIDI devices and computer input such as a Wacom tablet. However, it has been desirable to treat the techniques as individual building blocks in a scratch performance.
The presented work builds on the Skipproof application in combination with the reactable.
The reactable instrument
The reactable is by now a well-known novel electronic musical instrument, with recent massive exposure in all kinds of media, especially since the artist Björk gave it a prominent position in her stage shows and compositions. It is a versatile instrument that works in a similar way to tools like Pure Data, Max/MSP or Reaktor. It was designed to meet artistic and musical demands not catered for by other interfaces,4 and it follows a well-defined principle for developing the behavior of its physical objects that are handled on the table top [11, 12].
1 These are examples from the many emerging products. For instance, CDJ1000 was not the first CD scratch player, but it represented a market breakthrough.
2 http://web.ics.purdue.edu/~faulsti/skrasms/
3 http://www.csc.kth.se/~kjetil/software.html
4 There are, however, a number of interfaces with a similar approach; most of those are listed on the reactable project website: http://reactable.iua.upf.edu/?related
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008. Copyright remains with the author(s).
Integration of DJ techniques on the reactable was started a while back with the development of a few objects that could provide some of the functions from Skipproof [1]. These
were later combined with a different approach to scratching by Dimitrov [4, 9]. Although these objects worked, some improvements remained to be done, and a formal evaluation of the scratch functionality was needed.
Within the SID initiative5 the development that started in 2006 could continue. The main aims were to get better scratch possibilities on the reactable, and to investigate how a trained DJ could interact with familiar techniques in new ways. Our method included letting musicians into the design loop, receiving their feedback and expertise, and letting them evaluate the playability. Given the latency introduced by a system based on video recognition, we never expected the system to be responsive enough for performing real scratch gestures comfortably. For this reason, our main focus was on the higher level of control, letting the users control scratch models instead of scratching directly.
2. METHOD
For the reactable, the development of the objects is done in Pure Data (Pd) patches. This allows for fast prototyping and can even be done at run-time. The video projection that is used to provide visual feedback on the table is also the user environment that is displayed on the computer screen, called the virtual environment. Working with the objects in real life (moving, twisting and turning them) is very different from working with the objects in the virtual environment. During the design phase, the virtual environment was used for editing and simple testing. The new patches were then tested on the real table by the developers, and parameters were adjusted to correspond to the objects.
The underlying concept of the scratch objects on the reactable is that some of the patterns that DJs play on their turntables and crossfaders are used as control models that are triggered and manipulated by new gestures and actions. The mapping between gestures and control is the most critical part of the design process. By assigning control properties and behavior to physical objects and by making connections to sample playback functions, we came up with a totally new method of “scratching” and, as a consequence, with new conventions for playing. We made an effort to respect the reactable principles in designing the objects, although we needed to make some compromises. Given a virtually endless number of possible functionalities and mappings to test, a few were settled on.
Since there are only a handful of reactable instruments and the number of performers is accordingly very limited, we decided to use two experts for testing. One was a professional reactable musician, and the other an experienced scratch DJ. By using experts from different fields, we aimed at highlighting important aspects of the DJ and reactable domains respectively.
Two sessions were arranged for each subject. In the first session, a 30 min rehearsal was followed by a 45 min performance, while the second session only had a 45 min performance. Some tasks were given, for instance to explore all objects, to perform scratching with and without backing music and to try beat juggling (another common DJ technique), but the subjects used the time as they wished.
5 SID is the acronym for COST IC0601 Action on Sonic Interaction Design, http://www.cost-sid.org. The presented work is the result of a two-week Short Term Scientific Mission granted by SID, reported in [8].
For the performance sessions, the musicians were left alone and undisturbed in a rehearsal room, and listened to their own performance through loudspeakers. The second performance was videotaped. While practising, they had the possibility to get help and ask questions (however, they both preferred to practise without help).
For evaluating playability, the DJ and the reactable player answered a questionnaire and took part in an open interview following each of the performances, including the rehearsal.
The questionnaire was a modified version of the TAI-CHI evaluation questionnaire proposed by Bornand et al. [3] for testing two different interfaces. Our modified version, which added a few questions directly concerning the reactable scratch objects, was not used to make comparisons between interfaces (the reactable and standard scratch tools, for instance), but rather to assess within-subject improvement across the performance sessions and between-subject differences.
The interviews tried to isolate specific problems the subjects faced while playing, or any other comment not accounted for in the questionnaire. As a last part of the interview, the subjects suggested possible improvements to the objects and their behavior. Between the two sessions, most of their suggested improvements could be addressed and implemented.
3. RESULTS
The reactable objects developed in a previous phase of the project [1] were only slightly modified for the first session. In addition to the existing sample player, the vinyl movement models object and the crossfader movement models object, we introduced a “manual crossfader” that changed from on to off when moved, and a second sample player that used a different playback function. After the first session, the functionality of these two new objects was integrated into the crossfader object and the sample player, respectively.
As the development followed an iterative process, the results from the first session of evaluation naturally affected the objects used in the second session. Not only were the parameters adjusted between the sessions; even the functionality and mapping were changed and improved. The following section describes the state of the objects at the final stage. One important improvement between sessions was achieved by moving to a new reactable software version that increased the time resolution of the video recognition from around 30 fps to 60 fps.
6 There is a global bpm object in the normal reactable setup that is not included in Figure 1. The metronome is visualized with a wave (a) propagating from the Out.
3.1 New reactable objects
Figure 1 shows the three different objects active in the virtual environment. The Loop player object plays back an audio file and has visual representations of the track progression (ii) and sound level (iii). A waveform (i) is “travelling” from the object towards the Out. The Crossfader object applies a crossfader movement pattern (B) to the sound, resulting in the chopped-up sounds typical of scratching. The sound level the Loop player will get (graphically represented by A) moves out from the Crossfader object. The Movement speed object changes the sample player's speed in some defined patterns (3) and beat subdivisions (2) synchronized with the current bpm of the table.6 The speed alteration enforced on the audio playback is shown with (1)
which travels toward the Loop player.
For all objects, parameters can be adjusted through, for instance, movement, rotation or distance to other objects. The final implementation is described in more detail in [8].
Figure 1: Screenshot from the reactable virtual engine. The three finalized scratch objects (Loop player, Crossfader and Movement speed) and the table's “output” (Out) are shown. All labels, letters and numbers are added for explanation. Some visuals are continuously changing (1, A, i, ii, a), while others show the state or setting of an object (2, 3, B, iii).
For the loop player, short sampled sounds are easier to use. When the movement model object is connected, the playback speed typically oscillates between ±150% for normal scratch sounds and up to several times that for fast scratches. How large a portion of the sound is scratched (similar to the extent of the hand movement) is decided by the distance between the movement speed object and the sample object. The subjects pointed out that for DJs it is important to have control over “where” the sound is at all times, and especially over the start of a sound. Therefore, rotation of the loop player should change the sound position when connected to the movement models.
Additionally, the sampler's playback speed could be altered manually, just like pushing and pulling the vinyl.
The movement speed object makes the playback speed of the
ject, it was found that having too many patterns was confusing, and a few were chosen, including the chirp, the flare and a muting of the vinyl's return movement.
Also for the crossfader object, a manual mode was included where the sound was constantly on or off. Moving the object would produce a very short silence or burst of sound, like moving the crossfader between the fingers.
3.2 Evaluation
Figure 2: One of the test subjects playing with the objects.
There was a difference between the subjects in their initial attitude toward the interface. The reactable expert (RE) was, as expected, trying out the objects systematically and quickly experiencing the limitations that differentiate these from the regular reactable setup. The DJ, on the other hand, tried to find how the connection between his scratching techniques and these experimental objects could be made, and experienced that it was not very easy to scratch with models.
In general, their attitudes changed over time. The RE became more positive about the interaction with the objects, while the DJ became more pessimistic, especially towards the control of the models.
For the RE, the biggest problem was that playing with the models was not like playing with other reactable objects. He learnt the new objects fast enough, but got frustrated, and the enjoyment of playing did not increase. Somewhat contradictorily, he felt more and more that he would be willing to use it in a concert situation, and that the possibility to express himself musically grew with practise. For the DJ, most aspects showed a negative trend. Still, he maintained the impression that with practise, it was possible to play
the loop oscillate when connected. Although the DJ nor- something musically meaningful, and he felt overall that the
mally can vary the speed freely with the hand, the most outcome of his actions were predictable.
common action is to move it steadily back and forth to the The DJ was at first very positive towards playing with
rhythm, resulting in the so-called baby-scratch. The second models and patterns, but when he got to learn them better,
most common movement is the tear -scratch, where either he wanted a specific behavior. This was made clear only af-
the push or the pull is divided in two strokes. Here, the ter the last performance, so no improvements could be made
player can move gradually from baby to tear scratch by ad- to the objects before evaluation. The RE liked the crossfader
justing the object’s parameters. model, but not so much the movement model. Compared to
Initially, the crossfader and movement objects were syn- the reactable environment, the crossfader object fitted more
chronized for always guaranteing a ‘perfectly’ performed tech- closely to any expected behavior than the movement model.
nique. However, the testers would rather like to have both To the enthusiastic ears of the authors, performances by
objects synchronized to the global beat and to note dura- both testers sounded quite nice.7 It is perhaps not overly
tions derived from that bpm, but independent from each convincing as real scratching, but it might very well match
other. This makes sense compared to how many techniques and exceed the outcome of a total novice’s first few attempts
are varied by DJs. In normal performances, there is a large 7
And it sounded even better in an unplanned jam session
number of different crossfader patterns, and those patterns after the experiment that also demonstrated the reactable’s
also defines most of the techniques. For the crossfader ob- capacity as a collaborative instrument.
209
with turntables. With proper training, as seen in the evalu- Martin, Günter) and the MTG for all help.
ation, performances will improve greatly. Also thanks to Smilen Dimitrov, who worked alongside
Musically, the most interesting result could probably be this project with the implementation of physics-based mod-
found in the meeting between a skilled DJ and the easy els of friction sounds for both reactable and scratching.
access to the techniques normally used, with the added di- This work was sponsored as a Short Term Scientific Mis-
mension of manipulating their parameters in real-time with sion by COST IC0601 Action on Sonic Interaction Design
unfamiliar means. (SID), http://www.cost-sid.org, and by BrainTuning FP6-
2004-NEST-PATH-028570.
4. DISCUSSION
This is the first major test of performing with scratch
6. REFERENCES
techniques, or the combination of synchronized crossfader [1] M. Alonso. Scientific Report from ConGAS Short
patterns and oscillating movement patterns. The advantage Term Scientific Mission (STSM) to Stockholm.
of this approach, as compared to modeling the turntable and Technical report, ConGAS Cost action 287,
mixer, is that even non-experts can perform rather intricate http://www.cost287.org/documentation/stsms,
and correct techniques without much practise. Results from October 2006.
the evaluation show that the non-expert felt confident with [2] T. H. Andersen. In the Mixxx: Novel digital DJ
playing with the models. Given some more development, interfaces. In Proc. of CHI, pages 1136–1137, 2005.
this approach can provide realistic-sounding scratching for [3] C. Bornand, A. Camurri, G. Castellano, S. Catheline,
various types of interfaces. A. Crevoisier, E. Roesch, K. Scherer, and G. Volpe.
Today, the reactable is mostly used for either beat-based Usability evaluation and comparison of prototypes of
or more freely structured experimental electronic music. Vi- tangible acoustic interfaces. In Proceedings of
sually, the performances are very exiting as the blocks on the ENACTIVE05, 2005.
table creates dynamically changing images. To see it being [4] S. Dimitrov. Scientific Report from ConGAS Short
played in traditional DJ-style demonstrates another side of Term Scientific Mission (STSM) to Stockholm.
the graphical feedback, where the visualizations aid the mu- Technical report, ConGAS Cost action 287, March
sician in performance. Traditionally, turntablists mark their 2007.
records with stickers or notice the position of the center label [5] S. Dimitrov. Scientic report from SID Short Term
to find the right spot in the music. Here, we experimented Scientic Mission (STSM) to Barcelona. Technical
with other representations, and both subjects were able to report, MTG, UPF, 2008. Online: http://www.cost-
use them to their advantage. sid.org/browser/action/stsm/reports/.
For real virtuoso playing, the reactable implementation of [6] DJ Q-bert. Scratchlopedia Breaktannica: 100 secret
scratching cannot match real turntables, but by manipulat- skratches. DVD: SECRT001-DVD, 2007.
ing the models the musician can on the other hand effort- [7] K. F. Hansen. The Basics of Scratching. Journal Of
lessly go beyond what is normally possible to accomplish, New Music Research, 31(4):357–365, 2002.
for example very fast scratches.
[8] K. F. Hansen. Scientic report from SID Short Term
After the evaluation (and unscheduled jam session), the
Scientic Mission (STSM) to Barcelona. Technical
two musicians and the reactable team suggested a number
report, MTG, UPF, 2008. Online: http://www.cost-
of improvements to the objects. For the main part, the sug-
sid.org/browser/action/stsm/reports/.
gestions involve making the interaction smoother and easier,
[9] K. F. Hansen, M. Alonso, and S. Dimitrov. Combining
not changing the way they are designed.
DJ scratching, tangible interfaces and a physics-based
Working with the reactable has proved to be a helpful op-
model of friction sounds. In Proc. of the International
portunity for testing how specific playing styles and musical
Computer Music Conference, 2007.
ideas can be transferred to unfamiliar interfaces. The fast
and easy means for prototyping and testing have many ad- [10] K. F. Hansen and R. Bresin. Mapping strategies in DJ
vantages. High latency and slow response time, determined scratching. In Proc. of the Conference on New
by the frame rate and processing of the video image, might Interfaces for Musical Expression, 2006.
pose a problem. For manipulating techniques and patterns [11] S. Jordá, G. Geiger, M. Alonso, and
this was not troublesome, but for more direct manipula- M. Kaltenbrunner. The reacTable: Exploring the
tion of playback speed and amplitude, the instrument was synergy between live music performance and tabletop
as foreseen far too slow for expert performances with our tangible interfaces. In Proc. of the first international
implementation. conference on ”Tangible and Embedded Interaction”,
As mentioned in the Introduction, there was also a related Baton Rouge, Louisiana, 2007.
project by Dimitrov, who connected the reactable scratch [12] M. Kaltenbrunner, S. Jordá, G. Geiger, and
objects to physics-based models of friction sounds [9, 5]. M. Alonso. The reacTable*: A collaborative musical
Although not tested extensively, it was clearly possible to instrument. In Proc. of the Workshop on ”Tangible
use friction models instead of sampled audio as sound source Interaction in Collaborative Environments” (TICE),
for scratch patterns. at the 15th International IEEE Workshops on
Enabling Technologies, 2006.
[13] T. M. Lippit. Turntable music in the digital era:
5. ACKNOWLEDGMENTS Designing alternative tools for new turntable
We are very grateful for all the time and effort that the two expression. In Proc. of the Conference on New
dexterous musicians Lele and Carles put into the project. Interfaces for Musical Expression, 2006.
Thanks to the whole reactable development team (Sergi,
210
Developing block-movement, physical-model based objects for the Reactable

Smilen Dimitrov
Aalborg University Copenhagen, Medialogy
sd@imi.aau.dk

Marcos Alonso
Universitat Pompeu Fabra, Music Technology Group
malonso@iua.upf.edu

Stefania Serafin
Aalborg University Copenhagen, Medialogy
sts@imi.aau.dk

ABSTRACT

This paper reports on a Short-Term Scientific Mission (STSM) sponsored by the Sonic Interaction Design (SID) European COST Action IC0601. Prototypes of objects for the novel instrument Reactable were developed, with the goal of studying sonification of movements on this platform using physical models. A physical model of frictional interactions between rubbed dry surfaces was used as an audio generation engine, which allowed development in two directions: a set of objects that affords motions similar to sliding, and a single object aiming to sonify contact friction sound. Informal evaluation of these sets of objects was obtained from an expert Reactable user. Experiments with the objects were also performed, related to both audio filtering and interfacing with other objects for the Reactable.

Keywords

Reactable, physical model, motion sonification, contact friction

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION

The Reactable [6, 7, 8], developed by the Music Technology Group at Universitat Pompeu Fabra (UPF) in Barcelona, is a novel electronic instrument whose user interface is projected on a tabletop surface. Users interact by moving and rotating objects placed on the table. In that sense the Reactable features a rich and specific interaction language, both from a tactile and a visual perspective. The Reactable is intended mostly for control of auditory devices typical in electronic music, such as sequencers, oscillators and LFOs. Each audio effect is represented by a physical object, marked with a unique fiducial pattern.

In this paper, we describe a research project performed during a short term scientific mission (STSM) visit of the first author to Universitat Pompeu Fabra (UPF) in Barcelona, which took place in January 2008. During this visit, auditory feedback using physical models, which aims to add an 'acoustic' behavior to the motions performed during interaction with a Reactable, was investigated.

1.1 Reactable interaction language

Figure 1: Interacting with a Reactable (from [9])

Typically, a Reactable user interacts and creates sounds by moving and rotating different types of objects on the table. Depending on the object types and their proximity to one another, so-called "links" are established between them, which determine the audio flow. Reactable objects have a physical representation (consisting of the object and the fiducial marker it carries) and a corresponding visual representation (generated by the Reactable engine), projected on the table top itself, based on the position and orientation of the physical object's marker. There are many types of Reactable objects: some represent sequencers, others represent samplers, envelopes, LFOs and other typical electronic music sound creation devices. In essence, each object can be manipulated through its rotation, through a 'finger' parameter (which is set by pressing a finger on the table close to an object), and through its position on the table; this represents a main part of the interaction language of the Reactable. Although the "links" between objects are automatically established based on the objects' proximity, by briefly touching two objects it is possible to establish a 'hard link' between them, which does not break if another object is brought close.

As these motions usually change some parameter of an electronic music instrument device (like filter amount or a sequence number), slow but precise motions are typically required from the player. This is also the case in other modes of electronic music performance, where sound is continuously generated and the player only changes certain parameters of the individual instrument devices. In that
sense, the Reactable does not exhibit 'acoustic' behaviour in a physical sense; however, due to the tactile nature of the interaction, one can easily imagine a related physical set of table and objects made of a rough material. In this 'rough physical' case, gliding the objects over the table surface would produce a contact friction sound, which lasts as long as the objects are in motion. To create an analogy with the real world: assuming that objects and table are made of, say, wood, it is easy to see that producing a significant amount of sound from this system would require both a significant amount of force and specific motions from the player. For the purposes of this paper, we will name such motions 'block movements'.

Figure 2: Reactable simulator showing the first proposal for a mapping strategy to connect the Reactable to the friction model.

2. METHODOLOGY

The work of developing objects that would demonstrate block motions on a Reactable was made much easier by the efforts of the Reactable team, who provided a working and fully compatible standalone Reactable simulator for Windows, with an audio engine based on Pure Data (Pd) [10]. The simulator renders the visual representation of the Reactable objects on screen, and allows these representations to be manipulated through the GUI, as on a real Reactable. Patches developed on the simulator can then be ported to and tested on a real Reactable. As it was relatively easy to build upon existing objects to inherit the user interaction, most of the work consisted of audio programming in Pd.

Since one of the defining high-level characteristics of block motion sound seems to be the relationship between sound volume and the velocity of the objects on the table, it was decided that the main parameter obtained from the objects (besides the standard parameters) would be the velocity of the objects, which could then be mapped to a sound parameter. In principle, a contact friction sound is perceptually noisy, and thus it could be generated through various sources [1].

A Pure Data real-time implementation of a physical model of frictional interaction between dry surfaces was available, which has already been described in [12, 13, 11]. It was decided that this friction model could be used as a sound generator for contact friction, especially in those ranges where high forces and low velocities would be involved. Due to the limited duration of the STSM visit, only the design and implementation of a prototype object was initially planned, to be followed by an expert user evaluation.

2.1 Mapping between the Reactable and the friction model

Figure 2 shows the first proposal for the mapping strategy between the Reactable and the friction model, called 'kviolin' in this simulation. The screenshot was extracted from the Reactable simulator. In the first proposal, four parameters of the friction model (velocity and position of the excitation, amplitude and frequency of the resonator) were selected and controlled by objects of the Reactable. The object called "source" controls the amplitude of the sound using a finger, and the excitation position using the rotation of the object, while the distance of the object from the table center is mapped to the fundamental frequency of the friction model. Excitation velocity and force are kept constant. Adding the exciter object makes it possible to also control the exciter force and velocity. The exciter force is controlled through a finger parameter, whereas rotation of the object maps to a constant velocity.

Figure 3: Reactable simulator showing the second proposal for a mapping strategy to connect the Reactable to the friction model.

This first proposal conflicted with common mapping strategies used in the Reactable, where frequency is related to the rotation of an object. Moreover, the relatively slow camera used for tracking created some differences in response between the simulator and the tangible interface. Therefore a second mapping strategy was investigated, as shown in Figure 3. In this second strategy, the parameters mapped in the exciter object were switched: the constant velocity was mapped to a finger parameter and force was mapped to rotation. The second prototype was further improved to produce the strategy shown in Figure 4.

Additionally, video recordings were taken from some of the development tests; these, along with a development log, were posted online [4].
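The methodology singles out object velocity as the main parameter read from the objects, and Section 4.1 later suggests that averaging and low-pass filtering would probably cope with 60 fps tracking dropouts better than the accumulation and undersampling that were tried. The following is a minimal Python sketch of both ideas together with the second mapping proposal. It is not the authors' Pd patch: the `Frame` structure, normalized table coordinates with the centre at (0.5, 0.5), the 5 Hz cutoff, the 110 to 880 Hz frequency range and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

FRAME_RATE_HZ = 60.0  # tracking rate reported for the vision system


@dataclass
class Frame:
    """One tracked frame of a single object (hypothetical structure)."""
    x: float              # table position, normalized 0..1 (assumption)
    y: float
    visible: bool = True  # False when motion blur hides the fiducial


class SpeedEstimator:
    """Object speed from tracked frames, smoothed by a one-pole low-pass.

    Hypothetical helper. The time step is the real elapsed time since the
    last visible frame, so a dropped frame does not cause a speed spike.
    """

    def __init__(self, cutoff_hz: float = 5.0, frame_rate: float = FRAME_RATE_HZ):
        self.dt = 1.0 / frame_rate
        # Exponential smoothing y += a * (x - y) acts as a one-pole low-pass
        self.a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz * self.dt)
        self.prev = None       # last visible frame
        self.frames_since = 0  # frames elapsed since self.prev
        self.speed = 0.0

    def update(self, frame: Frame) -> float:
        self.frames_since += 1
        if not frame.visible:
            return self.speed  # hold the last estimate while tracking is lost
        if self.prev is not None:
            dist = math.hypot(frame.x - self.prev.x, frame.y - self.prev.y)
            raw = dist / (self.frames_since * self.dt)
            self.speed += self.a * (raw - self.speed)
        self.prev = frame
        self.frames_since = 0
        return self.speed


def clamp01(value: float) -> float:
    return max(0.0, min(1.0, value))


def rotation_to_unit(angle_rad: float) -> float:
    """Map an object's rotation to 0..1 over one full turn."""
    return (angle_rad % (2.0 * math.pi)) / (2.0 * math.pi)


def map_source(angle_rad, finger, x, y,
               f_low=110.0, f_high=880.0, max_dist=0.5):
    """'source' object: finger -> amplitude, rotation -> excitation
    position, distance from the table centre (0.5, 0.5) -> f0."""
    d = min(math.hypot(x - 0.5, y - 0.5), max_dist) / max_dist
    return {
        "amplitude": clamp01(finger),
        "excitation_position": rotation_to_unit(angle_rad),
        "frequency_hz": f_low * (f_high / f_low) ** d,  # exponential pitch map
    }


def map_exciter_second_proposal(angle_rad, finger, max_velocity=1.0):
    """Second proposal: rotation -> exciter force, finger -> constant velocity."""
    return {
        "force": rotation_to_unit(angle_rad),
        "velocity": clamp01(finger) * max_velocity,
    }
```

Calling `SpeedEstimator.update()` once per camera frame gives a smoothed velocity that could drive the bow velocity of the single friction object. Raising `cutoff_hz` trades smoothness for lag, which matters given the frame rate limits discussed in Section 4.1.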
Figure 4: Reactable simulator showing the third proposal for a mapping strategy to connect the Reactable to the friction model.

The implementation of the SkipProof engine (a DJ scratching application and virtual turntable developed by Hansen and others at KTH [5]) as a set of DJ scratching objects for the Reactable [2] was also developed further. Since interfacing between the friction physical model and the SkipProof engine had been attempted during a previous STSM visit [3], it was attempted again, this time as an experiment in the context of Reactable objects.

The original intent to develop a single block-motion object changed soon after deciding to take up the friction model as the base sound engine. In frictional interactions in the real world, we can observe situations where several objects interact, but only one of them is the primary sound source.

Hence, the goal was extended to developing a prototype set of objects for the Reactable, where one object would represent the interaction control and the other would represent the source. As an analogy to bowed string instruments, we can consider these objects as a 'bow' interacting with a 'string'.

3. RESULTS

The main results of the study visit are prototypes of two sets of Reactable objects, and their preliminary (and informal) evaluation by an expert Reactable user.

As a first set of experiments, we tried to emulate the sound of a bow exciting a violin string. The second set is a single object intended to simulate the sound of surface friction of moving objects in contact.

Figure 5: Friction objects for Reactable

4. DISCUSSION

4.1 Reactable as a development platform

As mentioned previously, since the Reactable (both real and simulator) has an interface to Pd, the easiest way to add audio capabilities to it is by creating plugins for Pd. From the perspective of a new Reactable object developer, possibly the only limitation of the engine is that the so-called 'finger' parameters of Reactable objects currently cannot be set directly from Pd (as is possible with the 'rotation' parameter of the objects, for instance). Otherwise, it is relatively easy to develop the auditory behaviour of new objects using a Reactable simulator locally.

In our experiments we used a vision tracking system working at 60 fps. This created some problems with motion blur during fast motions. For future experiments, the motion blurring and the low (in audio terms) 60 fps frame rate must be taken into account, especially for objects that are to be moved quickly and linearly across the table. This proved to be a major difficulty in implementing a motion-based object: for faster linear motions the system failed to detect the object, and the corresponding control signal used in audio was interrupted. Some measures were attempted to overcome this, but they were not successful, which ultimately resulted in the not-so-ecstatic evaluation of these Reactable object prototypes.

Here, the frame rate issue had to be taken into account both for the exciter object and for the single friction object, whose average velocity of motion across the table was used to derive a bow velocity signal. For these objects, accumulation of signal values, undersampling and linear smoothing were attempted to overcome the sudden changes of values (during video tracking blurring). This, however, did not prove to be effective; averaging and low-pass filtering in the audio signal domain would possibly be a much better approach to overcome these problems. On the other hand, one can try to avoid linear motions when designing interaction, and replace them with rotary ones, as was suggested by the expert user. However, it is important to note that the Reactable team is currently working on overcoming these problems with the vision input, and at some point in the future, such problems could become minor.

4.2 Problems with the friction model

The first experiment was performed in order to tune the friction model to simulate the most common musical instrument driven by friction, i.e., a bow interacting with a string. The same can be extended to applying independent control signals to change the friction model parameters. For instance, in the case of the controller/source set of objects, the friction model can be understood as a crude bowed string model, and the corresponding parameters involved (such as force or velocity) can be understood as "bow force" or "bow velocity". Initially, one can apply independent signal sources for "bow force" and "bow velocity", but this will not necessarily produce a physically realistic (or pleasant) sound, as these variables may be coupled in reality (for instance, a change of bow force may be coupled to a change of bow velocity, as pointed out both by research [12] and by the expert user evaluation). So, regarding the issue of finding auditorily pleasant parameter values for the friction model, understood as a bowed string (i.e. a violin bow) model, it may be a better approach to first acquire recordings of control signals from a real-life source (like a violin), which are certain to drive the physical model within a predictable range of attractive sonic output, and then use these as a base for further development of both a friction sound source object and an independent controller object for a Reactable. This approach could be extended to any kind of interface that might be applied to a friction physical model, and with the usage of a proper physical model, one could ultimately extend the set of friction Reactable objects to become a proper set of physical violin controller objects. For example, results described in [13] show that a precise augmented bow interface is an ideal controller for a bowed string physical model.

5. EXPERT USER EVALUATION

The comments collected during the informal expert user interview can be summarized as follows. Using a friction model in the Reactable could be interesting, but before that, some improvements to the sound quality and control of the friction model are needed. The concept of dual objects was found interesting beyond the notion of a controller: it was suggested that an exciter object be made that could similarly change the parameters of any Reactable object. The concept of a single, movement-driven friction object was not found particularly interesting, and it does not necessarily require a physical model as an underlying sound engine.

So, in spite of some technical problems experienced during prototyping, there are indications that objects that provide sonification of block movement in the context of the Reactable system could potentially be a usable tool for musical expression (provided the technical problems are overcome).

6. ACKNOWLEDGMENTS

The first author would like to thank researchers at the Music Technology Group at UPF for welcoming him for the STSM and for providing useful input in the development of the application. The STSMs were sponsored by ConGAS, European COST Action 287, and SID, European COST Action IC0601, respectively.

7. REFERENCES

[1] A. Akay. Acoustics of friction. The Journal of the Acoustical Society of America, 111:1525, 2002.
[2] M. Alonso. Scientific Report from ConGAS Short Term Scientific Mission (STSM) to Stockholm. Technical report, ConGAS COST action 287, http://www.cost287.org/documentation/stsms, October 2006.
[3] S. Dimitrov. Scientific Report from ConGAS Short Term Scientific Mission (STSM) to Stockholm. Technical report, ConGAS COST action 287, http://www.cost287.org/documentation/stsms, March 2007.
[4] S. Dimitrov. STSM development log - Barcelona 2008, http://media.aau.dk/~sd/barca08/barcelona stsm 08.html. World Wide Web electronic publication.
[5] K. Hansen. The Basics of Scratching. Journal of New Music Research, 31(4):357–365, 2002.
[6] S. Jordà, G. Geiger, M. Alonso, and M. Kaltenbrunner. The reacTable: exploring the synergy between live music performance and tabletop tangible interfaces. In Proceedings of the 1st International Conference on Tangible and Embedded Interaction, pages 139–146, 2007.
[7] S. Jordà, M. Kaltenbrunner, G. Geiger, and R. Bencina. The reacTable*. In Proceedings of the International Computer Music Conference (ICMC 2005), Barcelona, Spain, 2005.
[8] M. Kaltenbrunner, S. Jordà, G. Geiger, and M. Alonso. The reacTable*: A Collaborative Musical Instrument. In Proc. of the TICE Workshop at WETICE 2006.
[9] MTG-UPF. Reactable homepage, http://mtg.upf.es/reactable/. World Wide Web electronic publication.
[10] M. Puckette. Pure Data: another integrated computer music environment. In Proc. of the Second Intercollege Computer Music Concerts, Tachikawa, pages 37–41, 1996.
[11] S. Serafin. Toward a generalized friction controller: from the bowed string to unusual musical instruments. In Proceedings of the 2004 Conference on New Interfaces for Musical Expression, pages 108–111, 2004.
[12] S. Serafin, F. Avanzini, and D. Rocchesso. Bowed string simulation using an elasto-plastic friction model. In Proc. Stockholm Music Acoustics Conference (SMAC 2003), pages 95–98, 2003.
[13] S. Serafin and D. Young. Bowed string physical model validation through use of a bow controller and examination of bow strokes. In Proc. Stockholm Music Acoustics Conference (SMAC), 2003.
Real Time Gesture Learning and Recognition: Towards

Automatic Categorization
Jean-Baptiste Thiebaut† , Samer Abdallah‡ , Andrew Robertson‡ ,

Nick Bryan Kinns† , Mark Plumbley‡
†Interaction, Media and Communication Group, {jbt,nickbk}@dcs.qmul.ac.uk
‡Centre for Digital Music, {samer.abdallah,andrew.robertson,mark.plumbley}@elec.qmul.ac.uk
Queen Mary, University of London
ABSTRACT (2)
This research focuses on real-time gesture learning and recog-

nition. Events arrive in a continuous stream without ex- 3
plicitly given boundaries. To obtain temporal accuracy, we 2
1
need to consider the lag between the detection of an event
and any effects we wish to trigger with it. Two methods 0 50 100 150 200
for real time gesture recognition using a Nintendo Wii con-
troller are presented. The first detects gestures similar to a Figure 1: The three accelerometer signals captured
given template using either a Euclidean distance or a cosine from a Wii controller while making repeated gestures.
similarity measure. The second method uses novel informa-
tion theoretic methods to detect and categorize gestures in
an unsupervised way. The role of supervision, detection lag on pre-defined classification, we are interested in real-time
and the importance of haptic feedback are discussed. classification for use in music performances. A starting point
of our research was to develop an on-the-fly learning of spe-
cific gestures, in order to create a database of recognizable
Keywords gestures that could be shared between performers. The first
Gesture recognition, supervised and unsupervised learning, part of this paper describes two algorithms used to recognize
interaction, haptic feedback, information dynamics, HMMs

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
Gesture forms an integral part of music performance. Traditional instrumentalists develop a virtuosity for the gestures related to their instruments. In a similar manner, performers who use digital interfaces develop a virtuosity adapted to their devices, and an important issue to address is how to categorize and recognize these gestures. Research by Cadoz and Wanderley [3] has stressed the importance of gesture classification and recognition. Previous research by Cadoz [2] also emphasized the importance of haptic feedback for the design of interactive interfaces for sound production: the physical feedback given by the intermediary device (such as a Wii remote in our case) contributes to creating memorizable gestures, and completes the audio feedback rendered by the interface. Kela et al. [5] studied the use of accelerometers for multimodal activities; applications in music have also been studied (see e.g. [8]). However, the algorithms presented for this research can be used with other gesture controllers, such as motion capture or any sensor-based technology. Whilst other approaches have focussed […] a fixed length gesture. The second part presents a dynamic and unsupervised recognition model that is able to handle gestures of various lengths. The two methods are discussed and future work is presented.

2. SUPERVISED METHOD WITH HAPTIC FEEDBACK
The Wii remote controller is a popular and pervasive device that detects 3-dimensional movements via three accelerometers, one for each dimension (relative to the controller). The signals produced by the accelerometers are transmitted via Bluetooth to a laptop computer. We used an external object within Max/MSP developed by Masayuki Akamatsu to decode the transmissions from the controller. The three signals sent by the controller are sampled at a rate of 50 Hz with an accuracy determined by the Max/MSP internal timing system. The latency produced by Bluetooth devices has been estimated at approximately 50 ms [9]. However, more precise measurements of both latency and sampling jitter still need to be made. Fig. 1 shows an example of how the data evolve over a fixed period of time.
The Wii device can produce a vibration that we use as feedback to the user when a gesture is recognized. In addition, a visual cue is produced. We now turn to a method implemented to categorize a gesture in real time with supervision.

2.1 First method: Euclidean distance
In this method, a window of controller signals is stored. The length of the window is determined by the duration of the gesture to be recognised, so that the length in samples,
L, will depend on the sampling rate, e.g. 6 (x, y, z) triplets at 50 Hz for a gesture that lasts 120 ms. The user triggers the capture of a template or reference gesture by pressing a button ('A') on the controller at the end of the movement¹. At this stage, the system is ready to compare fragments of the incoming data with the reference gesture.
If the reference gesture $V_r$ is considered as a 3L-dimensional vector, and $V_i$ is a similar vector constructed from the last L samples of the input signal, then the Euclidean distance between the reference and the input is

$$D = \sqrt{(V_i - V_r) \cdot (V_i - V_r)}, \qquad (1)$$

where for our purposes the dot product is defined as

$$A \cdot B = \sum_{i=1}^{L} \left[ A_x(i)B_x(i) + A_y(i)B_y(i) + A_z(i)B_z(i) \right], \qquad (2)$$

that is, a sum over the L samples and the three dimensions.
The gesture is detected when the distance drops below, or reaches a minimum below, a given threshold, as shown in fig. 2(b).

[Figure 2: Analysis of the signal for a repeated gesture. Panels: (a) incoming signals from the Wii controller; (b) Euclidean distance between reference vector and incoming vector; (c) cosine of angle between reference vector and incoming vector. The reference gesture was taken from the beginning and is visible where the distance drops to zero. Suitable thresholds for detection are shown as black horizontal lines.]
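The two matching measures can be illustrated with a short sketch. The Python code below is not the authors' Java/Max/MSP implementation; it is a minimal illustration under stated assumptions: the signal is a list of (x, y, z) tuples, a window of L samples plays the role of the 3L-dimensional vector in eq. (1), `dot` implements eq. (2), and the cosine measure of eq. (3) (section 2.2) is included for comparison. The helper names and the simple local-extremum rule used by `detect` (the "minimum below the threshold" variant for the distance, a peak above the threshold for the cosine) are hypothetical choices.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[float, float, float]  # one (x, y, z) accelerometer reading


def dot(a: Sequence[Sample], b: Sequence[Sample]) -> float:
    """Eq. (2): sum over the L samples and the three axes."""
    return sum(ax * bx + ay * by + az * bz
               for (ax, ay, az), (bx, by, bz) in zip(a, b))


def euclidean_distance(vi: Sequence[Sample], vr: Sequence[Sample]) -> float:
    """Eq. (1): D = sqrt((Vi - Vr) . (Vi - Vr))."""
    diff = [(xi - xr, yi - yr, zi - zr)
            for (xi, yi, zi), (xr, yr, zr) in zip(vi, vr)]
    return math.sqrt(dot(diff, diff))


def cosine_similarity(vi: Sequence[Sample], vr: Sequence[Sample]) -> float:
    """Eq. (3): C = (Vr . Vi) / (|Vr| |Vi|); taken as 0 for an all-zero window."""
    denom = math.sqrt(dot(vr, vr)) * math.sqrt(dot(vi, vi))
    return dot(vr, vi) / denom if denom > 0 else 0.0


def detect(signal: Sequence[Sample], reference: Sequence[Sample],
           threshold: float, method: str = "cosine") -> List[int]:
    """Slide an L-sample window (L = len(reference)) over `signal` and return
    the index of the last sample of every window declared a match.

    euclidean: local minimum of D lying at or below `threshold`
    cosine:    local maximum of C lying at or above `threshold`
    """
    if method not in ("cosine", "euclidean"):
        raise ValueError("method must be 'cosine' or 'euclidean'")
    L = len(reference)
    scores = []  # higher score means more similar, for both methods
    for start in range(len(signal) - L + 1):
        window = signal[start:start + L]
        if method == "cosine":
            scores.append(cosine_similarity(window, reference))
        else:
            scores.append(-euclidean_distance(window, reference))
    limit = threshold if method == "cosine" else -threshold
    hits = []
    for k in range(1, len(scores) - 1):
        is_peak = scores[k] >= scores[k - 1] and scores[k] > scores[k + 1]
        if is_peak and scores[k] >= limit:
            hits.append(k + L - 1)  # window scores[k] covers samples k .. k+L-1
    return hits


if __name__ == "__main__":
    # Hypothetical 6-sample (120 ms at 50 Hz) reference gesture.
    ref = [(1, 0, 0), (2, 0, 0), (3, 0, 0), (2, 0, 0), (1, 0, 0), (0, 1, 0)]
    sig = [(0, 0, 0)] * 10 + ref + [(0, 0, 0)] * 10
    print(detect(sig, ref, threshold=0.5, method="euclidean"))  # [15]
    print(detect(sig, ref, threshold=0.9, method="cosine"))     # [15]
```

With the reference embedded once in an otherwise silent signal, both measures report the window ending at sample 15, the last sample of the embedded copy. The threshold values are arbitrary here and would need tuning on real controller data; scaling the embedded copy would leave the cosine result unchanged but move the Euclidean distance away from zero.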
2.2 Second method: cosine similarity
The cosine of the angle between the reference vector and the input vector can be computed by taking the dot product and dividing by the norms of the two vectors:

$$C = \frac{V_r \cdot V_i}{\sqrt{V_r \cdot V_r}\,\sqrt{V_i \cdot V_i}}, \qquad (3)$$

using the same definition of the dot product as before. It is 1 when the vectors are parallel, i.e. the gestures are identical up to a positive scaling factor. Thus, we can detect gestures similar to the reference by looking for peaks in the cosine above a certain threshold, as shown in fig. 2(c).

2.3 Discussion
Supervised recognition, in both cases presented above, seems to be an appropriate method for the definition of precise gestures. By focusing on one gesture at a time, we are able to repeat a movement several times until the vibration produced (as a result of the recognition) arrives at the moment it is expected. Moreover, the issue of latency due to the various processing steps can be addressed. A gesture can be recognized before it is finished as long as its initial fragment can reliably be recognized in advance. In our case, we observed that initial fragments of more than 80 ms are usually distinct enough not to be confused with other gestures. If we increase the 'anticipatory lag' by choosing a gesture template from an initial fragment that ends well before the end of the gesture, the haptic feedback can be triggered at the time the performer expects, but, on the other hand, the detection is less reliable. The number of entries in the database is also an important factor in the overall error rate.
We chose to analyse a regular, repeated movement, consisting of cycling through three hand movements, visible as the large peaks in fig. 2(a). One of these movements, extracted from near the beginning of the signal, was taken to be the reference gesture; it is visible in fig. 2(b) at the point where the Euclidean distance drops to zero. As shown in figure 2, repetitions of the same gesture are not identical; therefore the detection threshold must be larger than 0 for the Euclidean method and less than 1 for the cosine method. The cosine method, being invariant to the overall magnitude of the accelerometer signals, is able to recognize the reference gesture even if it is performed at a larger scale, as long as it has the same duration.
Both methods are quite sensitive to the choice of reference gesture and thresholds, but in this case we were able to find parameters that gave successful detection of all 45 instances of the reference gesture using the cosine method, and 44/45 using the Euclidean distance measure, with no false positives. We were also able to use the results of the initial run to construct a better reference gesture by averaging all the previously detected instances. This gave perfect results using both methods.

3. UNSUPERVISED METHOD USING INFORMATION DYNAMICS
The above supervised method requires two distinct pieces of information to recognise a gesture in a timely way: one is the reference gesture with its label and the other is the indication of the particular time point, relative to the reference, at which to respond to the gesture. This can be thought of as a mark indicating the 'perceptual centre' of the gesture (see fig. 3).
Though in some applications it may be possible to interleave the training phases with the performance phases, as we did in the system described above, in other applications it may not be possible for the person or system creating the gestures to provide this extra stream of information stating that 'this is gesture A', 'this is gesture B', and so on. For example, a dancer's movements might be improvised and the dancer too occupied with executing them to be able to mark and label them as well. However, human observers are capable of recognising a repeated gesture and inferring a series of relatively precise timings from what is, on the face of it, an unstructured continuous movement.

¹ Pressing the button while doing the gesture is not an appropriate solution in the long term, as it affects the gesture itself. This problem is addressed in the unsupervised version (see section 3).
[Figure 3 (axes: position against time, with the 'perceptual centre' marked): A one-dimensional gesture (e.g. a hand moving up and down) where the implied punctual event or beat is marked as the perceptual onset and occurs some time after the initial onset.]

[Figure 4 (states S1 to S7 along a time line, divided into Z: Past, X: Present, Y: Future): Representation of a sequential perceptual process: at any given time, there will be a context of known previous observations, a 'current' observation, and an unobserved future.]

3.1 Predictive information
The question at the heart of gesture recognition is: how do we perceive discrete and punctual (that is, associated with a particular point in time) events in a continuous signal? Our approach to this is to consider the predictive information rate of the signal as processed by the observer. Essentially, we consider our hypothetical observer to be engaged in a continuous (and largely unconscious) process of trying to predict the future evolution of a signal as it unfolds. These predictions are probabilistic in nature; that is, they entail the assignment of probabilities to the various possible future developments.
A sufficiently adaptive perceptual system will internalise any statistical regularities in the signal, such as smoothness or any typical or repeated behaviour, in order to make better predictions. If a particular observation, which in practical terms might consist of a few samples of motion capture data, brings about a large change in the predictive probability distribution, then we associate with it a large predictive information. In this way, we can plot the predictive information rate against time. Referring to fig. 4, the predictive information is the Kullback-Leibler divergence (a measure of dissimilarity between probability distributions) between P(Y|Z=z) and P(Y|Z=z, X=x), where Z=z and X=x denote the propositions that past and present variables respectively were observed to have particular values z and x.
Now, depending on both the signal and the observer's predictive model, the predictive information rate can take many forms; in particular, it may in some cases be relatively flat, while in others it is more peaky or bursty, in the sense that the predictive information arrives in concentrated 'packets' interspersed with longer periods of relatively low predictive information. It is in this latter case that we identify the 'packets' of information as the 'events'.

3.2 HMM-based implementation
We have implemented a version of this hypothetical observer using a relatively simple predictive model (a Markov chain) in which the predictive information associated with each observation can be computed quite straightforwardly².
The analysis proceeds as follows. The three sampled signals are windowed, taking L consecutive samples, and represented as a vector with N = 3L components. At each time step the window is shifted along by one sample. The resulting sequence of vectors is taken as the continuous-valued observation sequence from a hidden Markov model (HMM) with Gaussian state-conditional distributions and K possible states. The parameters of this HMM (the transition matrix and the mean and covariance for each of the K states) are trained using a variant of the Baum-Welch algorithm [7]. Once the HMM is trained, the most likely sequence of hidden states is inferred using the Viterbi algorithm and the information dynamic analysis is applied to the resulting Markov chain.
Many instances of the system were trained with different random initialisations. Fig. 5 shows the underlying Markov chain found in one such instance with L = 9 and K = 20. The transition structure shows that there are a small number of typical paths through the state space, corresponding to different gestures. Our information dynamic analysis automatically picks out the states which most effectively signal that a particular path is being traversed; in the figure, the most informative states are 17, 8, and 15. Note that state 3 is not as informative as state 8, as state 3 has a high self-transition probability.
In fig. 6, the variation in predictive information rate over time is shown (this example actually uses a different HMM from that shown in fig. 5). Event detection then proceeds by picking all transitions with a predictive information greater than a fixed threshold, and the identity of the target state is used to categorise the event. In our experiments, we sonified these events using a different pitch for each event type. In most cases, all the gestural events (approximately 150 in total) are detected and categorised into 2–4 classes, with 1–3 false positives.

[Figure 5 (state-transition graph over the 20 numbered states): The state space of one of the HMMs trained on the recorded data. The directed edges represent the transitions; self-transitions and transitions with very low probability have been hidden. The darkness of the edges shows the probability of the corresponding transition.]

² However, the Markov chain is not observed but inferred using a hidden Markov model (HMM), so there is an element of approximation involved.
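For a first-order Markov chain the predictive information of section 3.1 reduces to a small computation, sketched below in Python as an illustration rather than the authors' Matlab code. The sketch assumes the divergence is taken in the direction D(P(Y|Z=z,X=x) || P(Y|Z=z)). With transition matrix A, past state i and present state j, the more distant future is conditionally independent of the past and present once the next state is known, so only the next state contributes, giving I(i → j) = Σ_k A_jk log2(A_jk / (A²)_ik). The function names and the three-state toy matrix are hypothetical; in the system described above, A would be the trained HMM's transition matrix and the state sequence would come from Viterbi decoding. `detect_events` follows the thresholding rule of section 3.2, labelling each event by its target state.

```python
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # row-stochastic transition matrix A[i][k]


def predictive_information(A: Matrix, i: int, j: int) -> float:
    """Predictive information (in bits) of the transition i -> j.

    Computes KL(A[j] || (A @ A)[i]): the next-state distribution once the
    present state j is known, against the prediction from the past state i.
    """
    if A[i][j] <= 0:
        raise ValueError("transition i -> j has zero probability")
    n = len(A)
    # P(next = k | past = i), marginalising over the unseen present state
    two_step = [sum(A[i][m] * A[m][k] for m in range(n)) for k in range(n)]
    info = 0.0
    for k in range(n):
        p = A[j][k]  # P(next = k | present = j)
        if p > 0:  # two_step[k] >= A[i][j] * p > 0, so no division by zero
            info += p * math.log2(p / two_step[k])
    return info


def detect_events(A: Matrix, states: Sequence[int],
                  threshold: float) -> List[Tuple[int, int]]:
    """Return (time, target_state) for every transition whose predictive
    information exceeds `threshold`; the target state labels the event."""
    events = []
    for t in range(1, len(states)):
        if predictive_information(A, states[t - 1], states[t]) > threshold:
            events.append((t, states[t]))
    return events


if __name__ == "__main__":
    # Toy chain: state 0 branches to 1 or 2; state 1 returns to 0; 2 is absorbing.
    A = [[0.0, 0.5, 0.5],
         [1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0]]
    print(predictive_information(A, 0, 1))            # 1.0
    print(detect_events(A, [0, 1, 0, 2, 2, 2], 0.5))  # [(1, 1), (3, 2)]
```

In the toy chain, each branch out of state 0 carries one bit because it resolves a 50/50 uncertainty about what follows, while the return 1 → 0 and the absorbing self-transition carry none. This loosely mirrors the remark above that a state with a high self-transition probability is less informative.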
[Figure 6 (three panels over time steps 0 to 200; panel titles: accelerometer signals, 'ave predinfo for states', 'predinfo for transitions'): Information dynamic analysis of the accelerometer signals shown in the top panel. The middle panel shows the state sequence inferred from the HMM in a way that highlights the average informativeness of each state in the sequence: the shading of each marker encodes which of the 20 states is active, while the y-axis represents the average predictive information associated with that state. In the bottom panel, the shading encodes the state as before, but the y-axis encodes the predictive information associated with that particular transition in context.]

3.3 Related work
Hidden Markov models have been applied to gesture recognition by many researchers [4, 6]. In the terminology used in this field, our system performs continuous gesture recognition because there are no given boundaries between gestures. Our current HMM-based system is not online but could easily be made so using fixed-lag decoding of the HMM instead of the current off-line Viterbi algorithm.
Unlike other HMM-based systems of which we are aware, our system uses a single HMM to model all gestures instead of a separate HMM for each one. Thus the categorisation of input signals as one gesture or another is made through the normal operation of the forwards-backwards or Viterbi algorithms.
In fact, our system is more closely related to the audio onset detection system described in [1]. The difference is that in the earlier system, the choice of which states were to be taken as indicators of significant events had to be made manually, whereas the current system uses information dynamic principles to do this automatically.

4. CONCLUSION
In this work, we have investigated the development of efficient tools for real-time gesture recognition. The Nintendo Wii remote was chosen to provide data to our methods; however, both the supervised and unsupervised algorithms are adaptive enough to deal with signals from different controllers. The template matching system is based on well-known template matching methods, while the HMM-based system uses novel information-theoretic criteria to enable unsupervised identification of an initially unknown number of gestures. At this stage, the recognition part of the HMM method is implemented in Matlab, but could be implemented in real time fairly straightforwardly using a standard fixed-lag smoothing algorithm for the HMM [7]. The explicit probabilistic formulation of the model makes it well suited to handling the detection latency problem by predicting the future motion of the controller and estimating how accurate this prediction might be. The supervised method, however, is implemented in Java as a plug-in for Max/MSP and works in real time. An external to calculate the Euclidean and cosine matching methods for any signal will soon be released. Online training of HMMs is possible but is an inherently more difficult problem, which we are currently researching.
Part of the motivation behind this work is that multiple performers could use the system and thereby share information about the gestures made. For example, when a gesture triggers or schedules a sonic or visual event, it could also cause a vibration signal to be sent to the other performers' controllers. This extra level of haptic communication could enhance the sonic and visual interaction without interfering with the performance as seen and heard by the audience. Future work will explore the importance of shared cues between performers and the development of haptic solutions to communicate these cues.

5. ACKNOWLEDGMENTS
This work was partly supported by two EPSRC grants: GR/S82213/01 and EP/E045235/1. A. Robertson and J.-B. Thiebaut are supported by EPSRC Research studentships.

6. REFERENCES
[1] S. Abdallah and M. Plumbley. Unsupervised onset detection: a probabilistic approach using ICA and a hidden Markov classifier. In Cambridge Music Processing Colloquium, Cambridge, UK, 2003.
[2] C. Cadoz, A. Luciani, J. Florens, C. Roads, and F. Chadabe. Responsive input devices and sound synthesis by stimulation of instrumental mechanisms: The Cordis system. Computer Music Journal, 1984.
[3] C. Cadoz and M. Wanderley. Gesture - music. In M. M. Wanderley and M. Battier, editors, Trends in Gestural Control of Music. Ircam - Centre Pompidou, 2000.
[4] Y. Iwai, H. Shimizu, and M. Yachida. Real-time context-based gesture recognition using HMM and automaton. In Proc. of the Int. Workshop RATFG-RTS '99, Washington, 1999. IEEE Computer Society.
[5] J. Kela, P. Korpipaa, J. Mantyjarvi, S. Kallio, G. Savino, L. Jozzo, and S. Di Marca. Accelerometer-based gesture control for a design environment. Personal and Ubiquitous Computing, pages 285–299, 2006.
[6] C. Lee and Y. Xu. Online, interactive learning of gestures for human/robot interfaces. In IEEE Int. Conf. on Robotics and Automation, pages 2982–2987, 1996.
[7] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. of the IEEE, 77(2):257–286, 1989.
[8] H. Sawada and S. Hashimoto. Gesture recognition using an accelerometer sensor and its application to musical performance control. Electron Commun Jpn, Part 3:9–17, 2000.
[9] Sena Technologies. White paper: Latency/throughput test of device servers/bluetooth-serial adapters. Technical report, Sena Technologies, 2007.
Making of VITESSIMO for Augmented Violin:
Compositional Process and Performance
Mari Kimura
The Juilliard School
80 Lincoln Center Plaza
New York, NY USA
marikimura@mac.com

ABSTRACT
This paper describes the compositional process for creating the interactive work for violin entitled VITESSIMO using the Augmented Violin [1].
Keywords
Augmented Violin, gesture tracking, interactive performance

1. INTRODUCTION
In June 2006, after meeting at NIME 2006 at IRCAM, Dr. Frédéric Bevilacqua (who had previously collaborated with composer Florence Baschet [2]) and I decided to collaborate; this became my project of creating a new work entitled VITESSIMO for Violin and the Augmented Violin, commissioned by Harvestworks. Nicolas Rasamimanana, one of the designers of the Augmented Violin, said, "It is important for us to stress that IRCAM's ultimate goal is to make such a device affordable and easy for any acoustic instrument to be 'augmented'" [3].

2. How to use the Augmented Violin
I understood very early that the Augmented Violin could become just a fancy device that ends up as an alternative to a simple footswitch, only creating what George Lewis would call the "Command and Obey" mechanism, and not a true interaction [4]. Dr. Andrew Schloss, one of the foremost composer/percussionists working with the Radio Drum, a 3-dimensional computerized gesture controller [5], mentions that he also differentiates two kinds of information coming out of his device: "meta-information" and "information", where the meta-information is information about the event, but not the event itself [6]. Dr. Schloss's comment corresponds to my own observation of violin bowing described below.

2.1 Observing Bowings
For composing VITESSIMO using the Augmented Violin, I started making observations of my bowings. My findings so far can be described in two main points:
A. Bowing is a functional movement to create sounds. But the musical expressions created by the sounds do not have an exact, direct correlation with bowing. [Figure 1] shows a short phrase where the highest amplitude occurs while the bowing is still in the process of making the crescendo. Bowing movements do not physically illustrate the curves or designs of musical expression and the perceptual effect.

[Figure 1. Crescendo and amplitude discrepancy]

B. However, bowing movements before and after a functional bowing (how you prepare before starting a stroke, and how you release the bow after ending a stroke) directly affect the expression that the bow arm must make (or has just made), in order to create a 'correct' or desired movement and musical expression. I personally recognize these 'non-sound-producing' movements as a kind of gold mine of musical expression, as such information is not transmittable without the Augmented Violin; it 'augments' the expression of the violin.

3. Building a 'palette'
It was essential for me to first acquire an entirely new 'palette' of expressions using the Augmented Violin in order to start composing VITESSIMO. Interactive installation artist David Rokeby wrote, "Rather than creating finished works, the interactive artist creates relationships. The ability to represent relationships in a functional way adds significantly to the expressive palette available to artists." [7]
I imagined performance scenarios that can only be made using the Augmented Violin, such as the following.

3.1 'Silent' violin
[Example 1] These low 'echo' pizzicatos are generated by the Augmented Violin, which detects a 'mock' pizzicato movement of my right arm.
[Example 1] 'Silent Pizzicato'

NIME08, June 4-8, 2008, Genova, Italy
3.2 Control Sounds or Rhythm
[Example 2] shows a transition between two phrases in VITESSIMO. The violin plays a soft phrase, decreasing in both speed ("molto rit.") and dynamics (decrescendo to pianissimo). As the bow slows down to a halt, the scaled output from the Augmented Violin sends a rhythm that slows down as well, corresponding to the decreasing bow speed.
[Example 2] Tracking 'molto rit.'

3.3 Control without playing
I use the 'retake' bowing gesture for creating expressions; the movements right before the second stroke in particular must, I believe, be consistent with the expression of the musical context of the second stroke. [Example 3] shows the non-sound-making up-bow 'retake' movement, controlling the glissando rate of the pitch-shifted, delayed chord.
[Example 3] 'Retake' tracking

4. Making the Augmented Violin Glove
When Dr. Bevilacqua loaned me the Augmented Violin, the device was made of two small parts connected with short wires. The sensor portion attaches to the bow, and a small circuit board containing a battery and the wireless portion attaches to the bow arm with a Velcro band. I therefore created my own Augmented Violin Glove, a lace glove containing both the sensor and the battery portion of the Augmented Violin. The glove is made of Velcro strips and balloons attached to a lace glove for elasticity. The Velcro strips allow quick experimentation with different placements and angles of the accelerometers (see [Figure 2]).

[Figure 2] Augmented Violin Glove

5. Conclusion
This paper describes a 'palette' of expression using the Augmented Violin. I believe that a gesture-tracking device such as the Augmented Violin should be musically coherent and effective, even without visual effect. There is also a danger that a gesture-tracking interface could make a performer unknowingly calibrate his/her gestures for the device. At the same time, I believe that using the Augmented Violin and creating a new 'palette' of expression is an extraordinary learning process of human-machine interaction, developing new kinds of expression of our time.

ACKNOWLEDGMENTS
Special thanks to: Nicolas Leroy and Emmanuel Fléty and the Real Time Musical Interactions Team at IRCAM, Harvestworks, and Hervé Brönnimann.

REFERENCES
[1] F. Bevilacqua, F. Guédy, N. Schnell, E. Fléty, N. Leroy. "Wireless sensor interface and gesture-follower for music pedagogy", In Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, NY, USA.
[2] F. Bevilacqua, N. Rasamimanana, E. Fléty, S. Lemouton and F. Baschet. "The augmented violin project: research, composition and performance report", In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME06), Paris, France.
[3] N. Rasamimanana: email correspondence with the author.
[4] G. Lewis. "Interacting with latter-day musical automata", Aesthetics of Live Electronic Music: Contemporary Music Review, 18(3): 99–122, 1999.
[5] R. Jones, A. Schloss. "Controlling a physical model with a 2D force matrix", In Proceedings of the 7th Conference on New Interfaces for Musical Expression, 2007, p. 27-30.
[6] A. Schloss: email correspondence with the author.
[7] D. Rokeby. "Transforming Mirrors: Subjectivity and Control in Interactive Media", Critical Issues in Interactive Media, ed. S. Penny, SUNY Press, 1996, p. 133-158.
Programming a Music Synthesizer through Data Mining
Jörn Loviscach
Hochschule Bremen (University of Applied Sciences)
Flughafenallee 10
28199 Bremen, Germany
joern.loviscach@hs-bremen.de

ABSTRACT
Sound libraries for music synthesizers easily comprise one thousand or more programs ("patches"). Thus, there are enough raw data to apply data mining to reveal typical settings and to extract dependencies. Intelligent user interfaces for music synthesizers can be based on such statistics. This paper proposes two approaches: First, the user sets any number of parameters and then lets the system find the nearest sounds in the database, a kind of patch autocompletion. Second, all parameters are "live" as usual, but turning one knob or setting a switch will also change the settings of other, statistically related controls. Both approaches can be used with the standard interface of the synthesizer. On top of that, this paper introduces alternative or additional interfaces based on data visualization.

Keywords
Information visualization, mutual information, intelligent user interfaces

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
Most software-based synthesizers adhere to standard programming interfaces to retrieve program data, to set parameters, and to notify other software if a parameter is changed through the synthesizer's graphical user interface. Thus, by creating appropriate host software, one can not only collect the sound programs for data mining, but also intervene in the actions triggered by the synthesizer's switches, knobs and sliders. The novel interfaces proposed in the paper employ this possibility to enhance a synthesizer's standard user interface through statistical data:
Patch Autocompletion. One may set only a selection of parameters, such as the oscillators' pitch and waveforms and the envelope, but leave all other parameters untouched. Then the system can search for sound programs in the database in which the touched parameters have similar settings. This process is similar to word autocompletion in text-processing software. In this mode, the standard user interface of the synthesizer is augmented by an interactive parallel coordinates visualization.
Co-Variation. Whenever the user edits a parameter, other parameters are varied along with the first parameter according to their statistical relation with it. For instance, increasing the attack time for the amplitude envelope may also increase the attack time of the filter envelope. Setting an oscillator to a pulse wave may configure the LFO for pulse width modulation. In this mode, the standard user interface of the synthesizer is augmented by an arrangement of the parameters as dots on a 2D field. Statistically related parameters are placed next to each other. Every parameter influences its neighbors according to their distance and the joint statistics.
The presented methods work with synthesizers of the classic Moog type. Due to the large variations in their waveforms, sampling synthesizers may not benefit from the presented methods. Also, synthesizers of the FM type cannot be treated well, since the acoustical meaning of their parameters changes drastically with the choice of the FM algorithm, that is, the interconnection of the operators.
The freely available software synthesizer Synth1 (http://www.geocities.jp/daichi1969/softsynth/) serves as a model for the experiments. It is implemented as a VST plug-in (http://www.steinberg.de/324_1.html) and offers 87 patch parameters; its sound library, as collected from different sources on the Internet, comprises 1250 patches including lead synth, pad, and effect sounds. The prototype software employs Hermann Seib's publicly available C++ source code for a VST host program (http://www.hermannseib.com/english/vsthost.htm). The host code has been extended to communicate via Internet Protocol (IP) and to automatically extract all available patch data of the synthesizer and write them to a text file on startup. For easier visualization and debugging, the statistical computations and the augmented user interfaces are created in a C#-based application that reads in the text file with the patch data and sends and receives parameter change commands to and from the VST host through a local IP connection on the same computer, see Figure 1.

2. RELATED WORK
Statistical methods have a long history in sound and music computing, in particular concerning information extraction [5]. Attempts to learn about the human perception of timbre [3] are psychoacoustic studies and thus are inherently of a statistical nature. The approach proposed in this paper bypasses psychoacoustics, however, and directly evaluates the statistics of a sound library. One may liken this to the technical analysis of share prices, through which analysts try to learn about a company without taking a look at its fundamental data.
Several works that aim at handling the vast parameter spaces of synthesizers employ evolutionary methods with a human in the loop. Neither Genophone [9] nor MutaSynth [4] relies on detailed knowledge of the inner workings of the synthesis unit, and thus both can be used with a large
Figure 2: Most parameters possess a clustered dis-

tribution. Every parameter is normalized to the
interval [0, 1] and smoothed with a Parzen window
Figure 1: The prototype comprises a modified VST of width 0.02.
host, additional graphical user interfaces, and a
statistics engine.
range of sound generators, an aspect in which they resemble the approach presented in this paper. Hoffman and Cook [7] discuss a database model for feature-based synthesis. They employ an $L_n$ distance in the low-dimensional feature space. The database retrieval is related to the autocompletion mode presented in this work, even though the databases' contents are different in the two works.
Co-variation, the second interaction mode described in this paper, can be interpreted as equipping a synthesizer with "embedded" metaparameters: Every one of the usual controls acts like a metaparameter in that it controls a number of related parameters. There is no dedicated control for the metaparameter's value, as opposed to standard approaches to metaparameters such as in VirSyn miniTERA (http://www.virsyn.de/), in which all technical details are subsumed under a handful of general controls. Johnson and Gounaropoulos [8] train a neural net to map metaparameters to low-level controls.
Bencina [1] maps an arbitrary number of control dimensions to a 2D surface. This seems to be related to the co-variation interaction mode described in this paper. Bencina, however, places parameter settings manually in 2D, whereas this work places the parameters themselves automatically.

3. PATCH AUTOCOMPLETION
Autocompletion is a standard feature in text-based software. While the user types a word, the software queries a dictionary of common terms and, if successful, either offers a context menu containing all possible completions or offers the shortest found completion for immediate insertion. This interaction mode can be carried over to music synthesizers: The user sets as few or as many parameters as he or she likes. Then the system searches the patch database for sounds with corresponding values of these parameters. The synthesizer is set to the patch forming the closest match. If the user is not satisfied with the result, he or she can also retrieve the next best matches or simply adjust the parameters, both the ones set before and additional ones, and again ask for the best match.
Patch autocompletion presents one major issue as compared to word autocompletion: Only rarely will the matches be exact, since most of the parameters are continuous. Thus, the method also shares some aspects with a text spell-checker, which looks for close but not exact matches. To implement this, one needs a method to measure the "distance" between two patches. In principle, one could base such a distance function on psychoacoustic measurements, for instance by dividing the parameter space into just noticeable differences. However, the parameter space of a standard synthesizer may easily comprise one hundred dimensions; it is hard to see how this could be exhausted by experiments with human listeners. A different option could be to set up a computational model of timbre perception that takes audio files as input; this could be used to evaluate the parameter space fully automatically. Such an approach would still face the curse of dimensionality. In addition, psychoacoustic timbre models are still in their infancy [3]. Thus, this work resorts to the data that are at hand: the patch statistics. Based on these data, one can, for instance, determine which values are more probable (see Figure 2) and define a per-parameter distance based on the rank statistics.
To determine a data-based distance between a certain parameter's values 0.4 and 0.8, we count the percentage of patches in which this parameter is larger than or equal to 0.4 but smaller than 0.8. This distance measure is more sensitive at points where the values cluster. For instance, the detune values cluster around zero, so that the distance measure weights a slight detuning as strongly as a large tuning difference far away from zero.
It is debatable whether this approach is reasonable for nominal, discrete-valued parameters, too. For instance, a control for a waveform may offer sawtooth, rectangle, triangle, and sine. Then, the parameter distance between the latter two is the same as between the rectangle and triangle wave, which does not correspond to the perceived degree of similarity. In many instances, however, the assignment of discrete parameters to switch positions conforms to perception. This is true for a frequency range switch with the settings 16', 8', 4', 2' as well as for a filter type switch with the settings LPF, BPF, HPF. Thus, this work uses the same simple definition of per-parameter distance for all types of parameters, even though this may not be the optimum choice in all situations.
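The rank-based per-parameter distance described above can be sketched in a few lines of Python. This is a minimal illustration rather than the prototype's code; the function name `rank_distance` and the detune values are hypothetical, and parameter values are assumed to be floats.

```python
from bisect import bisect_left


def rank_distance(library_values, a, b):
    """Fraction of patches whose value v satisfies min(a, b) <= v < max(a, b).

    library_values: the values of one parameter across all patches in the library.
    """
    lo, hi = min(a, b), max(a, b)
    values = sorted(library_values)
    # bisect_left returns how many values are strictly smaller than its argument
    count = bisect_left(values, hi) - bisect_left(values, lo)
    return count / len(values)


# Hypothetical detune values that cluster around zero
detune = [-0.02, -0.01, 0.0, 0.0, 0.01, 0.02, 0.03, 0.5, 0.8, -0.7]
print(rank_distance(detune, 0.0, 0.02))  # 0.3: small step inside the cluster
print(rank_distance(detune, 0.1, 0.6))   # 0.1: large step far away from zero
```

The small detuning step inside the cluster yields a larger distance than the much larger step outside it, which is exactly the sensitivity to clustering described in the text.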
Ties in rank order occur when a parameter has exactly the same value in two patches. To resolve this, every parameter value is shifted by a small random number that is negligible with respect to the actual sound.
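The tie-breaking step can be illustrated with a short Python sketch. The offset magnitude `scale` and the function name are hypothetical choices, assumed to be inaudible for parameters normalized to [0, 1].

```python
import random


def break_ties(values, scale=1e-6, seed=None):
    """Shift every value by a small uniform random offset so that equal values get distinct ranks.

    scale is a hypothetical magnitude, assumed to be inaudible for parameters normalized to [0, 1].
    """
    rng = random.Random(seed)
    return [v + rng.uniform(-scale, scale) for v in values]


waveform = [0.5, 0.5, 0.5, 0.25]  # three patches share exactly the same setting
jittered = break_ties(waveform, seed=1)
ranks = sorted(range(len(jittered)), key=jittered.__getitem__)
print(ranks[0])  # 3: the clearly smaller value still ranks first
```

The order among the three tied patches is now random but well defined, while the order of genuinely different values is preserved as long as they differ by more than twice `scale`.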
To find the best-matching patches in the library, the overall distance between the settings made by the user and a patch from the library has to be computed. To this end, the per-parameter distances for the parameters set by the user are combined in a Euclidean manner: The total distance is the square root of the sum of the squares of the per-parameter distances. This computation ignores that some parameters may be meaningless. For instance, the LFO's frequency does not matter if the amount of LFO modulation is set to zero. Thus, the system relies on the user providing sensible input: He or she should set the LFO modulation amount to zero and provide no further LFO adjustments.
Figure 4: The joint probability density of all pairs of parameters can be determined from the patches in the database (Parzen window of width 0.1).
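The Euclidean combination and the retrieval of the best and next-best matches can be sketched as follows. This is a hedged illustration, not the prototype's implementation: patches are assumed to be dictionaries keyed by hypothetical parameter names, the tie-breaking jitter is omitted, and only the parameters the user has set enter the distance.

```python
import math
from bisect import bisect_left


def rank_distance(sorted_values, a, b):
    """Fraction of patches whose value lies in [min(a, b), max(a, b))."""
    lo, hi = min(a, b), max(a, b)
    return (bisect_left(sorted_values, hi) - bisect_left(sorted_values, lo)) / len(sorted_values)


def rank_matches(library, user_settings):
    """Indices of library patches, best match first.

    library: list of dicts, one per patch, mapping parameter name -> value.
    user_settings: dict containing only the parameters the user has set.
    """
    # Only the parameters set by the user enter the distance.
    columns = {p: sorted(patch[p] for patch in library) for p in user_settings}

    def total_distance(patch):
        # Euclidean combination: square root of the sum of squared per-parameter distances.
        return math.sqrt(sum(rank_distance(columns[p], v, patch[p]) ** 2
                             for p, v in user_settings.items()))

    return sorted(range(len(library)), key=lambda i: total_distance(library[i]))


library = [
    {"cutoff": 0.1, "resonance": 0.9, "lfo_amount": 0.0},
    {"cutoff": 0.4, "resonance": 0.25, "lfo_amount": 0.3},
    {"cutoff": 0.7, "resonance": 0.2, "lfo_amount": 0.0},
    {"cutoff": 0.9, "resonance": 0.5, "lfo_amount": 0.8},
]
order = rank_matches(library, {"cutoff": 0.7, "resonance": 0.2})
print(order)  # [2, 1, 3, 0]: best match first, then the next-best matches
```

Stepping through `order` corresponds to retrieving the n-th best match; the unset `lfo_amount` plays no role in the ranking.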
The graphical user interface for this interaction mode con-
sists of the original synthesizer panel plus a parallel coordi-
nates plot of the patch collection, see Figure 3. The parallel
coordinates plot suggests typical settings: It allows the user to
read off the value distribution of any single parameter and
the correlations this parameter may have with its neigh-
bors. This includes, for instance, dependencies between the
ADSR parameters of an envelope generator.
Parameters can be set both in the original interface and in
the parallel coordinates plot; they show up as red dots. The
parameter values of the best or n-th best match in the sound
library are indicated by a polyline. In the prototype, the
parameters set by the user retain the precise values the user
has set them to and do not reflect the values belonging to the
patch retrieved from the library. This facilitates tweaking
the sound: Small edits still lead to the same nearest patch.
On the other hand, however, this approach means that no
sound from the library will be reproduced perfectly.
4. CO-VARIATION OF PARAMETERS
The mathematical construct of mutual information between two random variables allows us to learn which parameters bear any kind of statistical relation. In contrast to the usual measure, Pearson's correlation coefficient, mutual information deals well with nominal, discrete-valued parameters such as waveform settings. In addition, it also detects relations such as $y = x^2$ for $x \in [-1, 1]$. Mutual information is measured in bits, giving the typical gain in knowledge one gets about one of the two random variables from knowing the value of the other. If there is no statistical relation, this gain is zero bits.
This work considers only the statistical relations between pairs of parameters. In principle, one could also try to infer the best value for a parameter (such as the filter's cutoff frequency) given two others (such as one oscillator's waveform and the amplitude envelope's sustain level). But this faces another curse of dimensionality: There may not be enough patches in the library with this waveform setting and a similar sustain level to warrant a meaningful decision.
Mutual information is not the only mathematical tool to measure the degree of dependency between two random variables. Another prominent choice for this task is conditional entropy. It is not symmetric, however, which makes it hard to visualize as a layout: The geometric distance from A to B always equals the distance from B to A.
Depending on the overall settings, some parameters may have little or no influence on the sound. For instance, the speed of an LFO is of less interest when the modulation amount is low. This degree of relevance is taken into account by weights introduced into the computation of the mutual information. These weights stem from a simple model created specifically for the current synthesizer. For instance, it assigns a weight $w$ between zero and one to the LFO speed depending on the value of its modulation amount setting. The weighted mutual information $I(X;Y)$ thus amounts to
$$I(X;Y) = \sum_{x} \sum_{y} w(x)\, w(y)\, p(x,y) \log_2 \frac{p(x,y)}{p(x)\, p(y)},$$
where the $x$ are the values of the first parameter $X$, $p(x)$ is the probability of the first parameter being $x$, similarly for the second parameter $Y$, and $p(x,y)$ is the probability of the first parameter being $x$ and the second being $y$ at the same time. Since most of a synthesizer's parameters are continuous, we divide every parameter's range into 16 bins to compute probabilities. This method is a basic approach to estimating probability distributions and needs to be replaced by more sophisticated estimators if the number of samples (that is, sound programs) is low. Figure 4 shows the joint distribution for several pairs of parameters with high mutual information, based on the 1250 patches from the sound library. Figure 5 displays all pairwise results.
Figure 5: A substantial number of the parameters exhibit clear statistical dependencies in terms of mutual information.
On startup, the prototype software computes the mutual information between all pairs of parameters. It creates a 2D layout of the parameter set that visualizes the statistical relationship through nearness or distance, see Figure 6. To this end, a force-directed graph-layout algorithm [6] is applied that attempts to minimize the energy
$$E = \sum_{X \neq Y} \left(1 - \frac{d^{\mathrm{actual}}_{X;Y}}{d^{\mathrm{target}}_{X;Y}}\right)^2$$
while staying in a square of 400 × 400 pixels. Here, $d^{\mathrm{actual}}_{X;Y}$ denotes the distance of the markers representing the parameters $X$ and $Y$ on the screen, and the targeted distance is given by
$$d^{\mathrm{target}}_{X;Y} = \left(I(X;Y)^3 + \frac{1}{200}\right)^{-1},$$
so that unrelated parameters are pushed 200 pixels apart. The third power lets related parameters exert a strong pull on each other.
Figure 6: The parameters (represented by dots) are arranged according to their statistical relation, with their colors representing functional groups. The disk indicates the influence radius. The window title names the parameter below the cursor.
The user can specify an influence radius in this 2D representation to control how many other parameters a change in one parameter will affect. The new value of each influenced parameter is computed through a weighted average of its value in every patch. The relative weight of a patch is $\exp\left(-r^2 / (2 \cdot 0.01^2)\right)$, where $r$ denotes the difference between the parameter value set by the user and its value in the patch.
Figure 3: The "autocompletion" interaction mode augments the synthesizer's interface by a parallel coordinates plot of the library.

5. CONCLUSION AND OUTLOOK
This work presented two interaction modes that give new meaning to the classic controls of a synthesizer, no matter if they are actual knobs or if they are drawn on a computer's screen. This allows sticking to existing hardware or to existing screen interfaces. Reusing the standard knobs and switches also presents some issues, however. For instance, the standard user interface does not reveal which controls have been set and which have not. On the screen, this could be solved through a semitransparent overlay.
The presented approach may only be the first step toward a statistical evaluation of sound libraries: Can one correlate three or more parameters, possibly through dimensional reduction [2]? Can one create a perception-oriented layout of the parameter controls on the screen? What is the appropriate weighting for sound parameters when computing the "distance" between patches: Is the filter frequency more important than the LFO speed? How can one improve the statistical analysis of the sound library with manageable psychoacoustic tests?

6. REFERENCES
[1] R. Bencina. The metasurface: applying natural neighbour interpolation to two-to-many mapping. In NIME '05, pages 101–104, 2005.
[2] C. J. C. Burges. Geometric methods for feature extraction and dimensional reduction. In Data Mining and Knowledge Discovery Handbook, pages 59–91. Springer, 2005.
[3] J. A. Burgoyne and S. McAdams. Non-linear scaling techniques for uncovering the perceptual dimensions of timbre. In ICMC 2007, pages 73–76, 2007.
[4] P. Dahlstedt. A MutaSynth in parameter space: interactive composition through evolution. Org. Sound, 6(2):121–124, 2001.
[5] D. P. Ellis. Extracting information from music audio. Commun. ACM, 49(8):32–37, 2006.
[6] I. Herman, G. Melançon, and M. S. Marshall. Graph visualization and navigation in information visualization: a survey. IEEE Transactions on Visualization and Computer Graphics, 6:24–43, 2000.
[7] M. Hoffman and P. R. Cook. Real-time feature-based synthesis for live musical performance. In NIME '07, pages 309–312, 2007.
[8] C. G. Johnson and A. Gounaropoulos. Timbre interfaces using adjectives and adverbs. In NIME '06, pages 101–102, 2006.
[9] J. Mandelis and P. Husbands. Don't just play it, grow it!: breeding sound synthesis and performance mappings. In NIME '04, pages 47–50, 2004.
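The statistics behind the co-variation mode of Section 4 above can be sketched in Python. This is a hedged reconstruction from the formulas in the text, not the prototype's code: parameter values are assumed to be normalized to [0, 1], patches are assumed to be dicts keyed by parameter name, all function names are hypothetical, the relevance weights w default to 1 (the synthesizer-specific model is not reproduced), and only the energy of the force-directed layout is computed, not the optimizer that minimizes it.

```python
import math
from itertools import combinations

N_BINS = 16   # each parameter range is divided into 16 bins (from the text)
SIGMA = 0.01  # patch weight exp(-r^2 / (2 * 0.01^2)) (from the text)


def _bin(v):
    """Bin index for a value assumed to be normalized to [0, 1]."""
    return min(int(v * N_BINS), N_BINS - 1)


def mutual_information(xs, ys, wx=None, wy=None):
    """Weighted, binned mutual information I(X;Y) in bits.

    wx, wy: optional per-bin relevance weights; they default to 1 here.
    """
    wx = wx if wx is not None else [1.0] * N_BINS
    wy = wy if wy is not None else [1.0] * N_BINS
    n = len(xs)
    joint, px, py = {}, [0] * N_BINS, [0] * N_BINS
    for x, y in zip(xs, ys):
        bx, by = _bin(x), _bin(y)
        joint[(bx, by)] = joint.get((bx, by), 0) + 1
        px[bx] += 1
        py[by] += 1
    info = 0.0
    for (bx, by), count in joint.items():  # empty bins contribute zero
        pxy = count / n
        info += wx[bx] * wy[by] * pxy * math.log2(pxy / ((px[bx] / n) * (py[by] / n)))
    return info


def target_distance(mi_bits):
    """(I^3 + 1/200)^-1 pixels: unrelated parameters end up 200 px apart."""
    return 1.0 / (mi_bits ** 3 + 1.0 / 200.0)


def layout_energy(positions, mi):
    """E = sum over X != Y of (1 - d_actual / d_target)^2.

    positions: name -> (x, y) in pixels; mi: frozenset({X, Y}) -> I(X;Y) in bits.
    Each unordered pair appears twice in the sum over ordered pairs, hence the factor 2.
    """
    energy = 0.0
    for a, b in combinations(positions, 2):
        (ax, ay), (bx, by) = positions[a], positions[b]
        ratio = math.hypot(ax - bx, ay - by) / target_distance(mi[frozenset((a, b))])
        energy += 2.0 * (1.0 - ratio) ** 2
    return energy


def influenced(positions, source, radius):
    """Parameters whose markers lie within the influence radius around `source`."""
    sx, sy = positions[source]
    return [name for name, (x, y) in positions.items()
            if name != source and math.hypot(x - sx, y - sy) <= radius]


def co_vary(library, set_param, set_value, target_param):
    """Gaussian-weighted average of target_param over all patches.

    Log-weights are shifted by their maximum before exponentiating; this leaves the
    average unchanged but avoids 0/0 when every raw weight underflows.
    """
    log_w = [-(set_value - patch[set_param]) ** 2 / (2 * SIGMA ** 2) for patch in library]
    top = max(log_w)
    weights = [math.exp(lw - top) for lw in log_w]
    return sum(w * patch[target_param] for w, patch in zip(weights, library)) / sum(weights)


xs = [(k + 0.5) / 16 for k in range(16)]
print(mutual_information(xs, xs))          # 4.0: knowing X fixes Y
print(mutual_information(xs, [0.3] * 16))  # 0.0: Y is constant
```

With a width of only 0.01, patches whose value of the edited parameter differs by more than a few hundredths from the user's setting contribute almost nothing to the co-varied parameters, so the result follows the closest patches in that dimension.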
i-Maestro: Technology-Enhanced Learning and Teaching for Music
Kia Ng
ICSRiM - University of Leeds
School of Computing & School of Music
Leeds LS2 9JT, UK
kia@icsrim.org.uk, www.kcng.org
Paolo Nesi
DISIT-DSI – University of Florence
Via S. Marta 3
50139 Firenze, Italy
nesi@dsi.unifi.it, www.dsi.unifi.it/~nesi
info@i-maestro.org, www.i-maestro.org
ABSTRACT
This paper presents a project called i-Maestro (www.i-maestro.org), which develops interactive multimedia environments for technology-enhanced music education. The project explores novel solutions for music training in both theory and performance, building on recent innovations resulting from the development of computer and information technologies, by exploiting new pedagogical paradigms with cooperative and interactive self-learning environments, gesture interfaces, and augmented instruments. This paper discusses the general context along with the background and current developments of the project, together with an overview of the framework and discussions on a number of selected tools to support technology-enhanced music learning and teaching.
Keywords
Music, education, technology-enhanced learning, motion, gesture, notation, sensor, augmented instrument, multimedia, interactive, interface, visualisation, sonification.
1. INTRODUCTION
The i-Maestro project [1, 12, 18, 19] aims to explore novel solutions for music training in both theory and performance, building on recent innovations in computer and information technologies. New pedagogical approaches are being studied with interactive cooperative and self-learning environments, and computer-assisted tuition in classrooms including gesture interfaces and augmented instruments. The project develops a technology-enhanced environment for aural and instrumental training both for individuals and ensembles, as well as tuition in musical analysis, theory, and composition.
Among the many challenging aspects of music education, the project specifically addresses training support for string instruments. The project is particularly interested in linking music practice and theory training.
2. I-MAESTRO
Based on an analysis of pedagogical needs, the project develops enabling technologies to support music performance and theory training, including tools based on augmented instruments, gesture analysis, audio analysis and processing, score following, symbolic music representation, cooperative support and exercise generation. The resulting i-Maestro framework for technology-enhanced music learning is designed to support the creation of flexible and personalisable e-learning courses, and aims to offer pedagogic solutions and tools to maximise efficiency, motivation, and interest in the learning processes and to improve accessibility to musical knowledge.
A continuous user-requirements analysis, started at the beginning of the project, forms the basis of the specification of a framework which includes enabling technologies, pedagogic tools and the production of content, and supportive pedagogical aspects, such as modelling and formalising educational models for music, and courseware production tools. These include innovative aspects, such as models and support for cooperative training, interactive and creative interfaces with sensors and gesture tracking, client tools for theory and play training, distribution and management tools for music lessons, and music exercise generation.
The outcomes are being validated by several European institutions including Accademia Nazionale di Santa Cecilia (Rome), the Fundación Albéniz (Madrid) and IRCAM (Paris).
2.1 Framework and Tools
The user requirements and a set of pedagogical scenarios with use cases and test cases have been translated into specifications of the framework and tools. This section presents a diagram which depicts the overall framework (see Figure 1) and briefly describes a set of selected tools (Sections 2.2 to 2.5).
Figure 1. An overview of the i-Maestro architecture.
2.2 Music Training Supports
Prototypes for music training supports such as the score follower (see Figure 2) [6, 7, 10], notation editor, client tools, gesture support and visualisation, and exercise generation are now available and are continuously being enhanced and tested in pedagogical practice.
Figure 2. The Score Follower listens to the player and provides automated "page turning" and accompaniment.
A sensor interface and several pedagogical contexts have also been developed to support students in internalising key musical concepts with the interactive tools. Figure 3 shows the wireless module and sensor setup used to provide various kinds of feedback on the playing (augmented violin [7]).
Figure 3. Wireless module and sensor.
3D motion capture technology is also being utilised. Figure 4 shows the 3D Augmented Mirror (AMIR) [11, 13, 14, 17, 18], which captures and visualises the performance in 3D. It offers a number of different analyses to support the teaching and learning of bowing technique and body posture. Figure 5 shows the bowing tracking with automated bowing annotation on SMR.
Figure 4. AMIR for 3D visualisation and sonification of a bowing exercise.
Figure 5. AMIR with automated bowing annotation.
2.3 Symbolic Music Representation
Music notation is one of the fundamentals of music education. i-Maestro is promoting MPEG Symbolic Music Representation (SMR), an ISO standard for the representation of music notation with enhanced multimedia features [3, 4, 8, 9, 15].
Cooperative work is another key area of music education. It allows different components of the i-Maestro framework to be used across a network. Other tools include the Exercise Generator, which supports (semi-)automated creation of exercises, and the School Server, which offers online access to stored lesson material for sharing learning material at home and in the classroom.
Figure 6 shows an MPEG SMR player/decoder within the IM1 MPEG-4 reference software. MPEG-SMR has been accepted as an ISO standard under MPEG-4.
Figure 6. An MPEG SMR player/decoder.
2.4 Notation, Annotation and Representation
For annotation of audio recordings and notation, an adaptation of the SDIF format has been developed to represent sensor, motion, and analysis data, raising interest from practitioners in the domain. The partners continued research and development, extending and refining models and tools in light of the responses and feedback from the User Group and contacts. The individual tools are brought together for integration, linking them into the overall framework.
2.5 Integration
The combination of tools leads to new functionality, e.g. the automatic annotation of a score with bowing symbols in real time while a musician is playing, which is achieved by combining the score follower, motion capture and SMR support. An application (called i-Maestro Start) has been created to offer students and teachers a single tool to start all the tools offered by i-Maestro. With the tools and results available, validation activities have been started with teachers in music schools and conservatories.
3. Reflections and Next Steps
The project has worked on pedagogical aspects, enabling technologies and i-Maestro software components, and has started validation activities. In addition, guidelines for accessibility in technology-enhanced music training have been developed [2, 5]. An overarching pedagogical approach and model [16] for technology-enhanced teaching and learning has been developed. On this basis, a set of detailed pedagogical scenarios related to the use of the i-Maestro tools has been created.
This paper presented a brief overview of the i-Maestro project. After the introduction, the paper presented the overall framework design and introduced several tools to support music learning and teaching, including MPEG SMR for theory training and gesture analysis for performance training.
The final results are expected to consist of a framework for technology-enhanced music training that combines proven and novel pedagogical models with technological tools such as collaborative work support, symbolic music processing, audio processing, and gesture interfaces. Offering accessible tools for music performance and theory training as well as for authoring lessons and exercises will ensure wide participation.
Many of the available prototype tools are expected to be incorporated in various new products and services, which will be made available to both the general public and educational establishments. These are in the process of being validated and refined, and the project is inviting music teachers and students to take part in the testing phase of the i-Maestro software. We are particularly interested in testing the system in real pedagogical situations to see how teachers and students interact with the technology. At ICSRiM - University of Leeds (UK), open lab sessions are being organised for people to come and try out the i-Maestro 3D augmented mirror system with a 12-camera motion capture system.
4. ACKNOWLEDGMENTS
The research is supported in part by the European Commission under Contract IST-026883 I-MAESTRO. The authors would like to acknowledge the EC IST FP6 for the partial funding of the I-MAESTRO project (www.i-maestro.org), and to express gratitude to all I-MAESTRO project partners and participants for their interests, contributions and collaborations.
5. REFERENCES
[1] i-Maestro project website: www.i-maestro.org
[2] i-Maestro project Deliverable DE4.5.1 on Accessibility aspects in Music Tuition, available online via http://www.i-maestro.org/documenti/view_documenti.php?doc_id=629
[3] P. Bellini, P. Nesi, G. Zoia. MPEG Symbolic Music Representation: A Solution for Multimedia Music Applications. Published in Interactive Multimedia Music Technologies, Copyright © 2008 by IGI Global. ISBN 978-1-59904-150-6 (hardcover) - ISBN 978-1-59904-152-0 (ebook).
[4] P. Bellini. XML Music Notation Modelling for Multimedia: MPEG-SMR. Published in Interactive Multimedia Music Technologies, Copyright © 2008 by IGI Global. ISBN 978-1-59904-150-6 (hardcover) - ISBN 978-1-59904-152-0 (ebook).
[5] Neil Mckenzie and David Crombie, Creating Accessible Interfaces for i-Maestro Learning Objects, in Proceedings of the Second International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2006), www.axmedis.org/axmedis2006, Volume for Workshops, Tutorials, Applications and Industrial, pp. 87-91, 13th – 15th December 2006, Leeds, UK, Firenze University Press (FUP), ISBN: 88-8453-526-3, http://digital.casalini.it/8884535255
[6] Cont, A., Schwarz, D. (2006), Score Following at IRCAM, MIREX'06 (Music Information Retrieval Evaluation eXchange), The Second Annual Music Information Retrieval Evaluation eXchange Abstract Collection, Edited by The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL), Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign, http://www.music-ir.org/evaluation/MIREX/2006_abstracts/MIREX2006Abstracts.pdf, p. 94, October 2006, Victoria, Canada (http://ismir2006.ismir.net/)
[7] F. Bevilacqua, N. Rasamimanana, E. Fléty, S. Lemouton, F. Baschet (2006) The augmented violin project: research, composition and performance report, 6th International Conference on New Interfaces for Musical Expression (NIME 06), Paris, 2006
[8] Pierfrancesco Bellini, Paolo Nesi, Maurizio Campanai, Giorgio Zoia, FCD version of the symbolic music representation standard, MPEG2006/N8632, October 2006, Hangzhou, China
[9] P. Bellini, F. Frosini, G. Liguori, N. Mitolo, and P. Nesi, MPEG-Symbolic Music Representation Editor and Viewer for Max/MSP, in Proceedings of the Second International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2006), www.axmedis.org/axmedis2006, Volume for Workshops, Tutorials, Applications and Industrial, pp. 87-91, 13th – 15th December 2006, Leeds, UK, Firenze University Press (FUP), ISBN: 88-8453-526-3, http://digital.casalini.it/8884535255
[10] Norbert Schnell, Frederic Bevilacqua, Diemo Schwarz, Nicolas Rasamimanana, and Fabrice Guedy, Technology and Paradigms to Support the Learning of Music Performance, in Proceedings of the Second International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2006), www.axmedis.org/axmedis2006, Volume for Workshops, Tutorials, Applications and Industrial, pp. 87-91, 13th – 15th December 2006, Leeds, UK, Firenze University Press (FUP), ISBN: 88-8453-526-3, http://digital.casalini.it/8884535255
[11] Ong, B., Khan, A., Ng, K., Nesi, P., Mitolo, N. (2006), Gesture-based Support for Technology-Enhanced String Instrument Playing and Learning. International Computer Music Conference (ICMC) 6-11 November 2006, New Orleans, Louisiana, USA, www.icmc2006.org, ISBN: 0-9713192-4-3
[12] Bee Ong, Kia Ng, Nicola Mitolo, and Paolo Nesi, i-Maestro: Interactive Multimedia Environments for Music Education, in Proceedings of the Second International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2006), www.axmedis.org/axmedis2006, Volume for Workshops, Tutorials, Applications and Industrial, pp. 87-91, 13th – 15th December 2006, Leeds, UK, Firenze University Press (FUP), ISBN: 88-8453-526-3, http://digital.casalini.it/8884535255
[13] Kia Ng, Oliver Larkin, Thijs Koerselman, and Bee Ong, i-Maestro Gesture and Posture Support: 3D Motion Data Visualisation for Music Learning and Playing, in Proceedings of EVA 2007 London International Conference, Eds: Jonathan P. Bowen, Suzanne Keene, Lindsay MacDonald, London College of Communication, University of the Arts London, UK, 11-13 July 2007, pp. 20.1-20.8.
[14] Kia Ng, Oliver Larkin, Thijs Koerselman, Bee Ong, Diemo Schwarz, Frederic Bevilacqua, The 3D Augmented Mirror: Motion Analysis for String Practice Training, in Proceedings of the International Computer Music Conference, ICMC 2007 – Immersed Music, Volume II, pp. 53-56, 27-31 August 2007, Copenhagen, Denmark, ISBN: 0-9713192-5-1
[15] Kia Ng and Paolo Nesi (eds), Interactive Multimedia Music Technologies, ISBN: 978-1-59904-150-6 (hardcover) 978-1-59904-152-0 (ebook), 394 pages, IGI Global, Information Science Reference, Library of Congress 2007023452, 2008.
[16] Tillman Weyde, Kia Ng, Kerstin Neubarth, Oliver Larkin, Thijs Koerselman, and Bee Ong, A Systemic Approach to Music Performance Learning with Multimodal Technology Support, in Theo Bastiaens and Saul Carliner (eds.), Proceedings of E-Learn 2007, World Conference on E-Learning in Corporate, Government, Healthcare, & Higher Education, Québec City, Québec, Canada, Association for the Advancement of Computing in Education (AACE), October 15-19, 2007.
[17] Thijs Koerselman, Oliver Larkin, and Kia Ng, The MAV Framework: Working with 3D Motion Data in Max MSP / Jitter, in Proceedings of the 3rd International Conference on Automated Production of Cross Media Content for Multi-channel Distribution (AXMEDIS 2007). Volume for Workshops, Tutorials, Applications and Industrial, i-Maestro 3rd Workshop, Barcelona, Spain, ISBN: 978-88-8453-677-8, 28-30 November 2007.
[18] Kia Ng, Tillman Weyde, Oliver Larkin, Kerstin Neubarth, Thijs Koerselman, and Bee Ong, 3D Augmented Mirror: A Multimodal Interface for String Instrument Learning and Teaching with Gesture Support, in Proceedings of the 9th International Conference on Multimodal Interfaces, Nagoya, Japan, pp. 339-345, ISBN: 978-1-59593-817-6, ACM, SIGCHI, DOI: http://doi.acm.org/10.1145/1322192.1322252, 2007
[19] Kia Ng, 4th i-Maestro Workshop on Technology-Enhanced Music Education, in Proceedings of the 8th International Conference on New Interfaces for Musical Expression (NIME 2008), Genova, Italy, 5-7 June 2008.
The HOP sensor: Wireless Motion Sensor
Bart Kuyken, Wouter Verstichel, Frederick Bossuyt, Jan Vanfleteren
TFCG Microsystems Lab, Ghent University
Technologiepark Zwijnaarde 914
9052 Zwijnaarde
+32 (0)9 264 53 54
bart.kuyken@ugent.be
wouter.verstichel@ugent.be
Michiel Demey, Marc Leman
IPEM – Department of Musicology, Ghent University
Blandijnberg 2
9000 Ghent, Belgium
+32 (0)9 264 41 26
michiel.demey@ugent.be
ABSTRACT
This paper describes the HOP system. It consists of a wireless module built up of multiple nodes and a base station. The nodes detect acceleration, e.g. of human movement. The base station collects the acceleration samples at a rate of 100 Hz. The data can be acquired in real-time software such as Pure Data and Max/MSP and can be used to analyze and/or sonify movement.
Keywords
Digital Musical Instrument, Wireless Sensors, Inertial Sensing, Hop Sensor
1. INTRODUCTION
This paper presents wireless motion sensors. The application is a multipoint-to-one system: Three people can attach a sensor to their body; their acceleration is measured in three dimensions and transmitted to a central base station at a rate of 100 Hz. The collected data can be used to analyze or sonify movement [1]. Distances of up to 30 meters between transmitter and receiver are supported. The goal of the project is to increase the number of users to more than 10 people at the same data rate.
2. HARDWARE
The system consists of wireless nodes and a base station. The wireless nodes collect the acceleration data and send these samples to the base station at a rate of 100 Hz.
2.1 Node
The node consists of an accelerometer, a transceiver chip with two antennas, three pushbuttons and a battery.
Earlier experiments with Xsens Technologies [2] showed that human movements can reach peak accelerations of up to 6 g. Information about the acceleration in all three dimensions is needed. This is why the LIS3LV02DQ [3] of STMicroelectronics, a three-axis digital-output linear accelerometer, is used. This component provides an I²C [4] serial interface to communicate with the external world.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the authors.
The radio chip used to establish the wireless link is the CYWM6935 of Cypress Semiconductor [5]. The chip runs the WirelessUSB protocol, a short-range, high-bandwidth wireless radio communication protocol that uses the 2.4 GHz band, which makes it well suited for our application.
Figure 1. The node
The accelerometer data are acquired by the microcontroller through the I²C interface. The microcontroller processes this information and sends the data to the transceiver chip through the SPI [6] interface, and the transceiver chip transmits the data to the base station. For this project, the ATmega168V of Atmel is used as the microcontroller [7].
Figure 2. Dataflow at the node
The node contains three pushbuttons, which the user can use to interact with software applications. To restrict the total number of bytes sent by the node, the button information is encapsulated in the bytes carrying the acceleration data. Fewer data bytes mean less time to complete the transmission protocol. The faster a node can send its data, the more nodes can be attached to one base station while keeping the timeframe constant, e.g. at 10 ms.
The accelerometer represents the acceleration along one dimension with 12 bits. The microcontroller ignores the 5 least significant bits and adds the state of one pushbutton, which is represented by one bit (0 or 1). The 7 remaining accelerometer bits and the bit of one pushbutton can be bundled into one byte. With three dimensions and three pushbuttons, only 3 bytes need to be sent to the base station.
Ignoring the 5 least significant bits of the accelerometer does not
harm the accuracy, since acceleration noise can extend up to
the 7th bit. The acceleration in the x direction of a node at rest is
shown in Figure 3. These acceleration values are still affected by
noise.
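The packing step described in Section 2.1 (drop the 5 least significant bits, add one button bit, send 3 bytes) can be sketched in Python. This is a minimal sketch under assumptions the paper does not state: the accelerometer value is taken as a right-justified 12-bit two's-complement number, the 7 retained bits are placed in bits 7..1 of each byte and the button in bit 0. The real firmware layout may differ.

```python
from __future__ import annotations


def pack_axis(raw12: int, button: bool) -> int:
    """Pack one 12-bit signed sample and one button bit into a single byte."""
    if not -2048 <= raw12 <= 2047:
        raise ValueError("raw12 must fit in 12-bit two's complement")
    top7 = (raw12 >> 5) & 0x7F        # drop the 5 LSBs, keep the 7 MSBs
    return (top7 << 1) | int(button)  # assumed layout: bits 7..1 data, bit 0 button


def unpack_axis(byte: int) -> tuple[int, bool]:
    """Recover the signed 7-bit acceleration value and the button state."""
    top7 = byte >> 1
    if top7 & 0x40:                   # sign-extend from 7 bits
        top7 -= 0x80
    return top7, bool(byte & 1)


def pack_sample(xyz: list[int], buttons: list[bool]) -> bytes:
    """Three axes plus three buttons -> the 3-byte payload sent per timeframe."""
    if len(xyz) != 3 or len(buttons) != 3:
        raise ValueError("expected three axes and three buttons")
    return bytes(pack_axis(a, b) for a, b in zip(xyz, buttons))


def unpack_sample(payload: bytes) -> tuple[list[int], list[bool]]:
    """Split a 3-byte payload back into acceleration values and button states."""
    axes, buttons = zip(*(unpack_axis(b) for b in payload))
    return list(axes), list(buttons)
```

Because each byte is self-contained, separating acceleration and button information on the receiving side only needs a shift and a mask per byte.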
[Plot: Acceleration Noise x-Dimension, values from -0.25 to 0.25 versus Time]
Figure 3. Acceleration in the x dimension of a node at rest

2.2 Base station
The base station consists of an RF chip (the same chip as on the
nodes), a microcontroller and a ‘USB to UART’ chip (CP2102)
[8], which allows communication with a host computer by USB.
The RF chip receives data from different nodes and sends them to
the microcontroller through an SPI interface. From the timing, the
microcontroller on the base station knows which node it has just
received acceleration data from. It extracts the data (i.e. splits
the acceleration and button information) and makes a new
data packet.

Figure 4. Dataflow at the base station

This packet contains the node address (one byte), a time stamp
(two bytes), acceleration information (three bytes) and button
information (three bytes). The microcontroller sends this packet to
the ‘USB to UART’ chip, which sends the packet to a host
computer by USB.

Figure 5. Data packet received by host computer

2.3 Transmission Protocol
The nodes communicate with the base station through the
CYWM6935 RF chip with a 16 kB/s data rate. The protocol is a
simple time division multiple access (TDMA) scheme. At the start
of each timeframe the base station sends a start signal. On
receiving this start signal, the nodes send their data to the base
station one after another. Hence the base station controls the
data rate. By using TDMA it is not necessary to send address
information from the node to the base station.
With our Cypress radio chip using WirelessUSB, 78 different
frequency channels are at our disposal. Every timeframe, the base
station chooses another channel and includes its number in the
start signal, so the nodes know on which channel to respond.
Different timeframes use different channels (different
frequencies). This basic kind of ‘frequency hopping’ can
considerably reduce bursts of packet loss due to other devices
(e.g. Bluetooth, WLAN).
Two nodes can be connected to the same base station. It is
possible to use several base stations, each with its own sensors, at
the same time. This is achieved by giving the different base
stations (and thus also the nodes connected to them) their own
frequency band.

2.4 Measurements
2.4.1 Properties
Each node has a LiPo battery. The weight of a node is 35 g,
including the battery. The dimensions are 8 mm x 50 mm x 30 mm.
The battery life is approximately 18 hours.
Distances of up to 30 meters are allowed between transmitter and
receiver.
2.4.2 Noise
Figure 3 shows the noise of the accelerometer. The resolution of
one acceleration sample here is 7 bits. The standard deviation of
the noise is roughly 0.054g. If this value proves to be too large,
the noise can be reduced by taking more acceleration samples
while keeping the data rate constant. The microcontroller now
reads the value of the accelerometer 4 times in one timeframe and
takes the average of these samples before it transmits the
acceleration data to the base station. By doing this, Figure 6 is
obtained. The standard deviation is now decreased to
approximately 0.024g.
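If the noise on successive reads were independent and zero-mean, averaging N = 4 reads per timeframe would shrink its standard deviation by √4 = 2, from about 0.054g to about 0.027g, the same order as the measured 0.024g. The sketch below checks that rule of thumb on simulated Gaussian noise. The white-noise model is an assumption; the simulation ignores the 7-bit quantization and any correlation between consecutive reads.

```python
from __future__ import annotations

import random
import statistics


def average_per_timeframe(readings: list[float], n: int = 4) -> list[float]:
    """Average consecutive groups of n readings; an incomplete final group is dropped."""
    if n < 1:
        raise ValueError("n must be at least 1")
    usable = len(readings) - len(readings) % n
    return [sum(readings[i:i + n]) / n for i in range(0, usable, n)]


def noise_std(sigma: float = 0.054, frames: int = 5000, n: int = 4, seed: int = 1):
    """Std. dev. of single reads vs. n-read averages under simulated white noise (in g)."""
    rng = random.Random(seed)
    single = [rng.gauss(0.0, sigma) for _ in range(frames)]
    raw = [rng.gauss(0.0, sigma) for _ in range(frames * n)]
    averaged = average_per_timeframe(raw, n)
    return statistics.pstdev(single), statistics.pstdev(averaged)
```

A measured reduction slightly larger or smaller than a factor of two simply indicates that the real noise is not perfectly white.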
[Plot: Acceleration Noise x-Dimension (Averaged), Acceleration (g) from -0.25 to 0.25 versus Time]
Figure 6. Acceleration in the x dimension of a node at rest, averaged over 4 samples

3. APPLICATIONS
The data can be obtained on the computer through a virtual COM
port. Applications are being developed by the Institute for
Psychoacoustics and Electronic Music (IPEM) at Ghent
University. Several applications in Pure Data and Max/MSP are
already in use. These applications are being used for sonification
of human movement and rhythm analysis [9].

Figure 7. Visualizing the acceleration in Max/MSP

4. WEARABILITY
The nodes are mounted on a Velcro strip, which makes it very easy
to attach a node around one’s arm or leg. This way, the nodes do
not interrupt the user’s movement (Figure 8).

Figure 8. Node on Velcro strip

5. FUTURE DIRECTIONS
A new PCB design is currently being tested, in which the
CYRF6936 [10] of Cypress Semiconductor replaces the radio chip.
This chip has a maximum data rate of 1 Mbps, so more nodes can
be connected to one base station. A recharging circuit has been
added to the design so that the LiPo batteries can be recharged by
plugging an adapter into the node board. Later on, extra
functionality can be added to the nodes by adding a gyroscope
and a magnetometer. With Kalman filtering it is then possible to
track the orientation of the node.

6. REFERENCES
[1] Demey, M., Leman, M., De Bruyn, L., Bossuyt, F., Vanfleteren, J. The Musical Synchrotron: using wireless motion sensors to study how social interaction affects synchronization with musical tempo. NIME ’08.
[2] Xsens Technologies. Inertial Motion Sensors. http://www.xsens.com
[3] STMicroelectronics. LIS3LV02DQ Product Information. http://www.st.com/stonline/products/literature/ds/11115.htm
[4] NXP (formerly Philips). I²C 2.1 specification (January 2000). http://www.nxp.com/acrobat_download/literature/9398/39340011.pdf
[5] Cypress Semiconductor. CYWM6935 Product Information. http://www.cypress.com/products/?gid=14&fid=65&category=All&
[6] Schwerdtfeger, M. SPI - Serial Peripheral Interface. 06/2000.
[7] Atmel. ATmega168 Product Information. http://www.atmel.com/dyn/products/product_card.asp?part_id=3303
[8] Silicon Labs. CP2102 Product Information. http://www2.silabs.com/public/documents/tpub_doc/dsheet/Microcontrollers/Interface/en/cp2102.pdf
[9] Bevilacqua et al. Wireless sensor interface and gesture-follower for music pedagogy. NIME ’07.
[10] Aylward and Paradiso. Sensemble: A wireless, compact, multi-user sensor system for interactive dance. NIME ’06.
[11] Flety, E. The WiSe Box: a Multi-performer Wireless Sensor Interface using WiFi and OSC. NIME ’05.
[12] Cypress Semiconductor. CYWM6936 Product Information. http://www.cypress.com/products/?gid=14&fid=65&category=All&
Sensory Chairs: A System for Biosignal Research and Performance

Niall Coghlan
Sonic Arts Research Centre
Queen’s University
Belfast, Northern Ireland BT7 1NN
ncoghlan02@qub.ac.uk

R. Benjamin Knapp
Sonic Arts Research Centre
Queen’s University
Belfast, Northern Ireland BT7 1NN
b.knapp@qub.ac.uk
ABSTRACT
Music and sound have the power to provoke strong emotional
and physical responses within us. Although concepts such as
emotion can be hard to quantify in a scientific manner, there
has been significant research into how the brain and body
respond to music. However, much of this research has been
carried out in clinical, laboratory-type conditions with intrusive
or cumbersome monitoring devices.
Technological augmentation of low-tech objects can increase
their functionality, but may necessitate a form of context
awareness from those objects. Biosignal monitoring allows
these enhanced artefacts to gauge physical responses and from
these extrapolate our emotions. In this paper a system is
outlined in which a number of chairs in a concert hall
environment were embedded with biosignal sensors, allowing
monitoring of audience reaction to a performance, or control of
electronic equipment to create a biosignal-driven performance.
This type of affective computing represents an exciting area of
growth for interactive technology, and potential applications for
‘affect aware’ devices are proposed.

Keywords
Ubiquitous computing, context-awareness, networking,
embedded systems, chairs, digital artefacts, emotional state
sensing, affective computing, biosignals.

1. INTRODUCTION
Increasingly, computers are becoming part of our everyday
lives, not only as our familiar laptops or towers, but built into
less inherently digital items, from fridges to water-filters, cars
to door-locks. These ‘embedded’ computers allow these objects
to expand their functionality, to recognise individuals or
contexts and behave appropriately. These computers require
no guidance from us to fulfil their function; indeed, in many
cases we may be completely unaware of their presence,
interacting with them implicitly through our use of the objects
in which they live.
One of the challenges involved in creating objects such as
these is to make them behave appropriately when used in a
variety of situations. One approach to creating such context-aware
objects is to give them the ability to sense the user’s
emotions or feelings. Emotions are a key part of what it means
to be human and frequently influence our decisions and actions
more strongly than our capabilities for intellect and reason [1].
They are also one of the most difficult areas of our
psycho-physiology to understand, especially when trying to
program an ‘unfeeling’ machine to recognise them.
This paper presents an overview of the ‘Sensory Chairs’ project,
in which four audience chairs used in the Sonic Lab
performance hall at the Sonic Arts Research Centre, Queen’s
University Belfast, were augmented with a variety of sensors
to provide information about their occupants. The chairs
require no user interaction beyond their normal function, yet
provide physical and biometric data, over a network, to a
computer for visualisation, data recording and control of
external devices. Basic emotional state inference from the
sensor data was also built into the chair software. The chairs
were used as an indicator of audience attentiveness or
enjoyment and were incorporated into a performance.

2. PROBLEMS OF ESS
Emotional state sensing (ESS) is currently far from an exact
science. There are three core problems with accurately judging
the emotions of a human being using indicators from the
autonomic nervous system (ANS) [2].
Firstly there is what is referred to as the ‘Baseline Problem’:
finding a condition against which changes in the ANS can be
measured. How does one induce a state of emotional
‘neutrality’ in a subject for study? Individual physiological
characteristics also mean that readings may be at the high end
of the scale for one person, with lower readings for another,
while both are experiencing the ‘same’ emotion.
Environmental factors such as ambient temperature can also
play a part.
Secondly there is the ‘Timing of Data Assessment Problem’.
Emotions can be fleeting, arising and disappearing in a matter
of seconds. Levenson [3] suggests that they may be as short as
0.5 seconds and last up to 4 seconds or beyond. This means
that by measuring at the wrong time an emotion might be
missed. Some emotions, such as anger, may have a long initial
onset, whilst others, such as surprise, may be much shorter.
Thirdly there is the ‘Intensity of Emotion Problem’, which
addresses the correlation between the magnitude of the
physiological response and the ‘intensity’ of the emotion felt.
At low levels of emotion there may be little response from the
ANS, whilst at higher levels the pattern of ANS activity
associated with a particular emotion may be destroyed.
Other issues complicate the graphing and reporting of
emotions, such as how the emotion was induced, how the
subject was encouraged (or not) to ‘express’ the emotion, and
complications from physiological responses not connected to
emotional state [4].
Systems which rely on data from audience members also raise
questions relating to what we shall call ‘sensor ethics’. An
audience member may feel uncomfortable being monitored in
this way. Perhaps they wish to pretend to be enjoying the
piece for motives of their own. Perhaps they are
uncomfortable as ‘performers’ or have a medical condition that
the sensors might reveal. For reasons such as these we
must approach audience monitoring/participation in the same
way as we would a physiological or psychological experiment.

3. THE SENSORY CHAIRS
3.1 Sensors
Each chair was augmented with four sensors to provide
information about its occupant. A key factor in the choice
of sensors and their placement was that they should be
passive, i.e. require no special interaction from the chair’s
occupant: simply sitting in a chair is enough to activate some
or all of the sensor package. It is unrealistic to expect a
potential audience member to learn to ‘play’ the chair, or indeed
to have to ‘perform’ in order to enjoy an event which they have
attended, possibly without prior knowledge of the interactive
nature of their seating. This is also important in contexts
where the chairs are being used to gather data about audience
reaction to a performance, as one cannot expect ‘natural’
reactions if the subject first has to be ‘wired up’ with electrodes
or the research is carried out in an ‘unnatural’ laboratory
environment.
Each chair was equipped with an Arduino [5] microcontroller
data acquisition board, mounted under the seat and connected
to the computer via a USB hub (serial over USB). Max/MSP
4.6 [6] was used to capture and visualise the data from each
chair, both as independent streams and as an interpolated view
of data from all four chairs.
The chairs were fitted with a sensor package consisting of a
light dependent resistor (LDR) mounted in the back of the
chair, two pressure sensors under the legs (one left, one right,
composed of Quantum Tunnelling Composite (QTC)) and a
galvanic skin response (GSR) sensor mounted on the arm of
the chair. These allowed the system to capture various
physical movements (posture in the chair, measured with the
LDR; left/right movement in the chair, measured with the
QTC sensors) as well as biometric data (weight and galvanic
skin response, measured with the QTC and GSR sensors).

3.2 Gauging Emotion
In order to create an ‘emotionally-aware’ system, the data from
the four sensors was graphed according to the affect scale in
common use, per Russell [7]. The X-axis was labelled
‘Valence’; its output was a combined product of the
pressure and LDR sensor data and was used as an indicator of
the occupant’s ‘enjoyment’ level. The Y-axis was labelled
‘Arousal’ and is a product of the GSR readings. The GSR is an
indicator of skin conductance, measured across the hand, and
increases linearly with a person’s level of overall arousal [2].
This was used as an indicator of the chair occupant’s
intellectual engagement. These two axes divide the graphing
window into four distinct sectors or ‘affect spaces’, as may be
seen below, with some common emotions indicated.

Figure 1: Affect Spaces and Emotions

In order to facilitate off-line research, the data streams from
each chair may be recorded into a named text file for analysis
at a later juncture. Sensors are sampled into the text file at a
rate definable by the user, and each sample is labelled with the
ID of the relevant chair (Table 1).

Table 1. Example of Recorded Data
Index   QTC 1   QTC 2   LDR    GSR   ID
134     909     998     1023   254   ChairB
135     907     998     1023   250   ChairB
136     906     997     1023   249   ChairB
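Section 3.2 places each occupant on Russell’s valence/arousal plane but does not give the formulas used. The sketch below shows one possible pipeline from a recorded row to an affect space, and everything numeric in it is a hypothetical assumption, not the authors’ method: the column order (Index, QTC 1, QTC 2, LDR, GSR, ID), the 10-bit reading range, the use of a resting per-occupant baseline (one possible answer to the Baseline Problem of Section 2), and averaging the pressure and LDR deviations into a single valence value.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical: 10-bit Arduino readings (0..1023), so half scale is 511.5.
HALF_SCALE = 1023 / 2


@dataclass(frozen=True)
class ChairSample:
    index: int
    qtc1: int
    qtc2: int
    ldr: int
    gsr: int
    chair_id: str


def parse_row(line: str) -> ChairSample:
    """Parse one whitespace-separated row of the recorded text file (assumed column order)."""
    idx, q1, q2, ldr, gsr, chair_id = line.split()
    return ChairSample(int(idx), int(q1), int(q2), int(ldr), int(gsr), chair_id)


def _clamp(x: float) -> float:
    return max(-1.0, min(1.0, x))


def affect_coordinates(s: ChairSample, base: ChairSample) -> tuple[float, float]:
    """Hypothetical (valence, arousal), each in [-1, 1], relative to a resting baseline."""
    # Valence: mean deviation of the two pressure channels and the LDR from rest.
    movement = ((s.qtc1 - base.qtc1) + (s.qtc2 - base.qtc2) + (s.ldr - base.ldr)) / 3
    valence = _clamp(movement / HALF_SCALE)
    # Arousal: GSR deviation from rest (a higher reading is assumed to mean higher arousal).
    arousal = _clamp((s.gsr - base.gsr) / HALF_SCALE)
    return valence, arousal


def affect_space(valence: float, arousal: float) -> str:
    """Name the quadrant ('affect space') of the valence/arousal plane."""
    a = "high arousal" if arousal >= 0 else "low arousal"
    v = "positive valence" if valence >= 0 else "negative valence"
    return f"{a}, {v}"
```

Centring on a per-occupant baseline rather than on fixed thresholds is what lets two people with very different raw GSR levels land in the same quadrant.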
This file may then be imported into a graphing program such
as Matlab or Excel for visualisation. The operator may also
select the option of recording an audio file in parallel with the
data, again for later study in conjunction with the sensor data.

3.3 Experimental Results
The ‘Sensory Chairs’ system provides a way of monitoring
multiple audience or performer biosignals at the same time and
of using this data to make basic emotional state judgements. It is
also possible to compare this data against the performance
which generated it, either as an audio recording or as a recording
of the performers’ own biosignal data.
The following are three experiments that demonstrate the
capabilities of the system, together with a preliminary
examination of the data:
3.3.1 Experiment A
During a performance of experimental electronic music, data
was recorded from four volunteers seated in the Sensory
Chairs system. Audio from the three performances was
recorded simultaneously. The participants were unfamiliar
with the pieces played and were seated centrally in the
performance space.
Subsequent analysis of the data showed notable differences
between the magnitudes of individual participants’ responses
to the performance. Some proved very sensitive to GSR
monitoring while others showed more muted responses.
Comparison of the data with the accompanying audio indicated
fluctuations in the sensor readings that appeared linked to
audio events. During ‘calm’ or ‘soothing’ portions of the
pieces we noted a lowering in the GSR reading, indicating a
relaxed state. Sudden loud sonic events following such
portions of audio showed a rise in the GSR, indicating a state
of alertness. Accompanying these events were spikes in the
pressure sensors and LDR sensor indicating movement in the
chair, probably in response to the sudden sound.
3.3.2 Experiment B
An interactive audio piece was created specifically for the
Sensory Chairs system. This was an ‘enactive’ composition in
which the volunteers seated in the chairs generated sonic
events based on the biometric data sent from the system. Each
chair/volunteer was assigned a specific ‘voice’ in the piece,
with rhythmic and melodic events as well as processing
controlled by their emotions and movements.
A short questionnaire and informal debriefing afterwards
revealed that participants found it difficult to connect a sense
of control or ownership to their sounds. This illustrates a
mapping issue pertaining to emotional state sensing and
biosignals: how does one sonify an emotion or an affective
state?
3.3.3 Experiment C
A short binaural audio play was created to test the system,
played over headphones, comprising a recording of footsteps
running towards the listener from behind, followed by a loud
gunshot very close by, with a high degree of realism.
This produced both a physical reaction in the listener (a jump
in their seat) and a spike in the GSR reading, although
this was very brief and difficult to detect without more
sensitive equipment.

Figure 2: Example Output A - Sensor Data Over Time

Figure 3: Example Output B - Sensor Data Over Time

Figures 2 and 3 show the recorded sensor data for two
individuals during the same performance, graphing the output
of each sensor (vertical axis) over 0.5 second intervals
(horizontal axis). The Red and Blue plots show the readings
from the pressure (QTC) sensors, the Yellow the LDR, and the
Green line is the GSR. Note that the values from the LDR remain
at maximum for most of the performance, indicating that both
participants leant heavily against the back of the chairs. In
Fig. 4 we see the participant’s GSR reading suddenly drop; this
is a result of the participant having removed their hand from
the sensor and then replaced it during the performance.
We can clearly see the difference in magnitude of the sensor
readings for each individual across all the sensors. If we
compare the pressure (QTC) sensor data (Blue and Red) for
both graphs, we can see that the individual in Fig. 2 remained
relatively still in their chair, whereas the individual in Fig. 3
shifted in their seat much more.
Closer observation also reveals similar rise/fall cycles in the
participants’ GSR, corresponding with relaxing or sudden
events in the audio performance.
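The experiments above describe GSR rises after sudden loud events, a brief GSR spike in Experiment C, and an abrupt drop when a participant lifted their hand from the sensor. A first-difference threshold on the 0.5 second series is one simple way to flag such moments for later comparison with the audio recording. This is a hypothetical sketch, not part of the described system; the threshold is an assumption and would need to be set per participant, since response magnitudes differed markedly between individuals.

```python
from __future__ import annotations


def gsr_events(gsr: list[int], threshold: int, interval_s: float = 0.5):
    """Flag sample-to-sample GSR changes whose size reaches `threshold`.

    Returns (time_in_seconds, change, label) tuples, where label is "rise" or "drop".
    """
    events = []
    for i in range(1, len(gsr)):
        change = gsr[i] - gsr[i - 1]
        if abs(change) >= threshold:
            label = "rise" if change > 0 else "drop"
            events.append((i * interval_s, change, label))
    return events
```

A large drop followed shortly by a similar rise is a hint to check for a sensor artefact, such as the hand removal noted above, before interpreting it emotionally.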
4. FUTURE IMPROVEMENTS AND APPLICATIONS
4.1 Possible Expansions
Future generations of the system will see the chairs augmented
with other sensors.
An EKG sensor was part of the initial design, but it proved
impossible to get a reliable signal from electrodes positioned
on the arms of the chair. It may be possible to implement an
EKG sensor using a more sophisticated circuit design and
electrodes than were available during this design phase.
An electromyogram (EMG) sensor would allow for detection
of muscle activity, and electrodes could be unobtrusively placed
on the armrests of the chair, in a similar fashion to the GSR
sensor.
Electric Field Sensing (EFS) would allow for some gesture
recognition capabilities and information about limb and head
movements. This would involve placing the EFS under the
seat of the chair (to capture leg movements, foot tapping etc.)
or on the back of the chair (to capture torso and head
movements).
4.2 Future Applications
Beyond the applications outlined so far in the fields of research
and performance, there are a number of areas where objects
embedded with sensors or emotional state sensing capabilities
will prove advantageous.
• Affective Gaming - physiological monitoring of gamers allows for adaptive game environments, which can adjust difficulty levels, reward players for completing sections they find particularly difficult, or reward ‘courageous’ action.
• Communication - increasingly, much of our communication is done virtually, via email, videoconferencing etc. Sensor-enhanced communication could include some form of visualisation of the other person’s emotional state, though this has obvious privacy implications.
• e-Learning - by examining physiological signals the program can determine a user’s level of interest, disinterest, frustration etc. and modify its teaching rate or take a new approach in order to keep the user engaged.
• Psychological/Neurological research - most research into the psychophysiology of emotions so far has taken place in lab conditions; the sensor-enhanced chairs of this paper provide a tool for research in alternative, non-clinical environments, more similar to those experienced in ‘everyday’ life.
• Information retrieval - ‘intelligent’ virtual agents that are able to refine search results/data supply based on emotional state, e.g. the user is ‘sad’, therefore play uplifting music.
• New Musical Instruments - emotionally aware instruments may allow for richer interaction and performance potential in the next generation of electronic instruments.

5. CONCLUSION
In order to facilitate deeper levels of communication
between technological devices and their human users, systems
must be developed that can interpret their users’ emotions and
moods.
The study of biological and psychological reactions to sound
allows us to explore and interpret abstract concepts within
psychoacoustics, such as the pleasant/unpleasant effects of
sound and emotional reactions to music and performance.
Currently little is understood outside of direct neurological and
neurochemical effects observed under laboratory conditions.
Biosignals can provide further depth of interaction between
performers and their instruments and compositions, taking the
current generation of ‘hyper’ instruments to the next level.
In the Sensory Chairs system we have presented a potential
tool for studying emotional reactions to music that may also be
used as a biologically driven instrument. We have also
presented some preliminary findings of our research into the
emotional and physical effects of performance.

6. ACKNOWLEDGMENTS
Many thanks to Dr. Sile O’Modhrain and Nick Ward.

7. REFERENCES
[1] Scheutz, M. Surviving in a Hostile Multi-Agent Environment: How Simple Affective States Can Aid in the Competition for Resources. In Proc. 13th Canadian Conference on Artificial Intelligence (Montreal, Canada, 2000), pp. 389-399.
[2] Nakasone, A., Prendinger, H., Ishizuka, M. Emotion Recognition from Electromyography and Skin Conductance. In Proc. of 5th International Workshop on Biosignal Interpretation (BSI-05) (Tokyo, Japan, 2005), pp. 219-222.
[3] Levenson, R. Emotion and the Autonomic Nervous System: A Prospectus for Research on Autonomic Specificity. In H.L. Wagner (ed.), Social Psychophysiology and Emotion: Theory and Clinical Applications, John Wiley and Sons, 1988, pp. 17-42.
[4] Picard, R. Affective Computing. MIT Media Laboratory Perceptual Computing Section Technical Report No. 321, MIT, Cambridge, MA, USA, 1995.
[5] http://www.arduino.cc
[6] http://www.cycling74.com
[7] Russell, J.A. A Circumplex Model of Affect. Journal of Personality and Social Psychology, 39(6), 1161-1178 (1980).
Wearable Interfaces for Cyberphysical Musical Expression

Andrew B. Godbehere
Cornell University
Ithaca, NY, USA
abg34@cornell.edu

Nathan J. Ward
Cornell University
Ithaca, NY, USA
njw23@cornell.edu
ABSTRACT
We present examples of a wireless sensor network as applied
to wearable digital music controllers. Recent advances in
wireless Personal Area Networks (PANs) have precipitated the
IEEE 802.15.4 standard for low-power, low-cost wireless
sensor networks. We have applied this new technology to
create a fully wireless, wearable network of accelerometers
which are small enough to be hidden under clothing. Various
motion analysis and machine learning techniques are applied
to the raw accelerometer data in real time to generate and
control music on the fly.

Keywords
Wearable computing, personal area networks, accelerometers,
802.15.4, motion analysis, human-computer interaction, live
performance, digital musical controllers, gestural control

1. INTRODUCTION
Music and dance are rarely separated, as they complement
each other so fully. The rhythms of music echo in the
movements of the bodies of performers and audience alike.
We describe a digital interface which seeks to fully integrate
music and dance by transforming the human body itself into a
musical instrument.
The system described in this paper allows the user to create
and manipulate music with motion and dance. To offer the
maximum flexibility for the musician, dancer, performing
artist, or DJ, the system is fully programmable and
configurable for a wide variety of musical scenarios. Machine
learning techniques offer robust, customizable gesture support
to create motion-based control commands. When coupled with
choreography, electroacoustic compositions can be performed
with organic input introduced by the motions of a live
performer.

2. HARDWARE
The hardware components of the system, in essence, comprise
a basic Motion Capture (MC) system. Accelerometers placed
at different points on the arms, legs, and head track the
motion of the user. This data, collected at different points
around the body, must be transmitted to a computer for
analysis and translation into music. To minimize hindrance to
the user, our MC system completely eliminates wires. Data is
transmitted wirelessly and independently from each
accelerometer to a base station, which is attached to a
computer.
This constitutes a wearable wireless sensor network, made
possible by the emergence of the IEEE 802.15.4 standard [4].
Although each node in the network is independently battery
powered, each uses so little power that a small, light battery
suffices, which can last for tens of hours of continuous use.
The robustness of the IEEE 802.15.4 protocol enables reliable
communication within 100 feet of the base station, suitable for
a typical performance environment.

2.1 Sensor Node Design Background
While sensor networks are relatively new, several have
previously been implemented [3] [7]. In one instance, sensor
networks composed of Eco motes have been applied to dance
[8]. Sensor networks used in live performance situations have
strict design requirements. Our system, focusing on real-time
music creation, is subject to these constraints and requires a
high level of perceived interactivity with minimal latency.
The system in [8] utilizes a mix of low-data-rate wireless
nodes in the 2.4 GHz band (with characteristics similar to
802.15.4 networks) co-located with 802.11 transceivers. The
802.11 transceivers were responsible for communication
across the performance environment. However, 802.11
transceivers are bulky and consume a lot of power.
Additionally, it has been reported that 802.11 networks
co-located with 802.15.4 networks significantly interfere with
the communication of the 802.15.4 networks [9]. Because of
these concerns, our design relies solely on 802.15.4 nodes.
These nodes are still capable of communicating across a
performance environment. A basic interference prediction
technique, similar to the more sophisticated one in [6], is
applied to minimize incidental 802.11 interference and allow
for fast and reliable data throughput.

2.2 Sensor Node Design
Each sensor node, or mote, consists of three main components:
the accelerometer, the radio, and the microcontroller. The
microcontroller and the radio (see Figure 1) are available
together in Atmel’s Z-Link series, designed for ZigBee and
802.15.4 networks. The Atmel radio, the AT86RF230, offers a
digital radio solution that requires a bare minimum of external
components, allowing for low cost and a physically small
footprint. A Linx chip antenna is used to minimize the form
factor of the devices. The three-axis accelerometers from
Kionix offer 6-g sensitivity and 12 bits of resolution, allowing
the sensors to detect fluctuations in acceleration as small as
0.003 g in any direction. The radio and the accelerometer
both interface with the microcontroller through an SPI (Serial
Peripheral Interface) link with speeds of up to 2 Mbps, as the
ATmega644 microcontroller is operated at 4 MHz.
The radio operates in the 2.4 GHz band, although the IEEE
specification defines two other bands, around 800 and 900
MHz, which may be used when there is too much noise in the
2.4 GHz band. When operating at 2.4 GHz, the radio is
capable of a raw throughput of 250 kbps. As each sample from
the accelerometer contains approximately 50 bits (12 bits × 3
axes plus protocol overhead bits), each node is itself
theoretically capable of transmitting around 5000 samples per
second. With a 5-sensor-node system, the theoretical limit on
the rate at which samples may be collected from the entire
system is around 1000 complete samples (one from every node)
per second. This time resolution is more than sufficient for a
responsive system without noticeable latency. Our experiments
have used as few as 60 samples per second with excellent
results and no noticeable latency. This wide range allows for
successful operation of the system even in electrically noisy
environments where the communications rate is forced to
drop.

Our system requires several independent sensor nodes to
communicate with a single base station. Communication
latencies must be kept to a minimum, samples should be
collected from each node at regular intervals, and power
consumption should be minimized. The 802.15.4 standard
describes the Guaranteed Time Slot (GTS) feature that allows
rigid, reliable data transmission rates between network slaves
and a network master. However, the GTS feature requires the
slaves to be either persistently listening, which wastes power,
or time-synchronized, which requires extra communication.
To solve this problem, our system utilizes a collaborative
virtual time slot allocation technique, which takes advantage
of the Carrier Sense Multiple Access with Collision Avoidance
(CSMA-CA) feature. In essence, when a node wants to
transmit, it listens to see if the channel is busy. If it is not
busy, it will wait a random interval before transmitting. After a
successful transmission, the node starts a deterministic timer,
corresponding with the desired sampling rate, which indicates
when the node should transmit its next sample. In the steady
state, the node will transmit the next message after this
predetermined interval and will settle into a regular transmission
schedule. If the node listens and finds the channel busy, it will
wait a random interval before attempting to transmit again. It
will continue to wait and check the channel until it finds the
channel is not busy. At this point, the node will transmit its
message.
With every node following this behavior, and using the same
sampling rate, they will eventually settle into a schedule that
fits for every node, where no message overlaps, assuming the
message lengths are short enough given the sampling rate that
is used. In addition, between each sample, the node can enter a
standby mode to reduce power consumption and extend
battery life. This scheme works well in a system such as this
sensor network where each data frame to be transmitted will
be of exactly the same length, and each node is taking samples
at exactly the same rate. Since the clocks are not
synchronized, however, and may actually run at slightly
different rates, the "set" schedule for each node is not actually
fixed. This scheme is flexible: as each sample timer is started
only after a successful transmission, the schedule is readjusted
such that no messages overlap. To minimize the latency jitter
this may introduce, a reasonably low sampling rate is required,
to allow some room in the transmission schedule for
readjustments.
In short, this transmission scheme allows for high throughput
without the communication overhead that would be required
with other schemes. Samples are transmitted on reasonably
Figure 1. A wireless node tight schedules that allow for little random jitter in the time
intervals between them, and is done without the use of
timestamps and the overhead of clock coordination.
2.3 Network Layer Design
The software that runs on each node in the network is built on
top of a custom library, designed according to an AT86RF230 3. SOFTWARE
software programming document [1], which encapsulates the The base station is connected to the computer via a USB
physical layer of the network. The network layer is kept very connection. FTDI's D2XX drivers1 allow direct access to the
simple to allow for fast implementation of new techniques, USB device through a DLL so our software can access it
which are not incorporated into a typical 802.15.4 Medium through a series of DLL function calls. We wrote this software
Access Control (MAC) layer. In addition, we are interested in using flext2, a C++ layer for cross-platform development of
a single-hop network and do not need many of the features the
full 802.15.4 specification provides. The networking layer we
have implemented is not 802.15.4 compatible, although the
1
physical layer is. http://www.ftdichip.com/Drivers/D2XX.htm
2
http://grrrr.org/ext/flext/
238
Max/MSP³ and Pure Data (Pd).⁴ This gives us an object, or external, for either of these graphical programming languages. The object interfaces directly with the base station through the USB connection and streams the accelerometer data into our Max/MSP or Pd programs, or patches.

We then designed a suite of patches that enable use of the sensor network with direct and indirect mappings and allow the user to create or manipulate music in real time. The accelerometer data can be processed in various ways to extract inclination and orientation when the accelerometers are not moving (i.e., when the overall acceleration is about 1 g) and to detect movements and gestures when they are in motion. By creating a library of low-level data processing patches that analyze the raw accelerometer data and extract meaningful parameters about the sensor nodes, we were able to provide functional components for use in higher-level designs.

3.1 Data processing
The low-level library includes patches for calibration and for converting ADC values to real measures of acceleration in g, for calculating total acceleration, jerk, frequency, and overall activity, and for determining orientation and inclination. The total acceleration patch can be used to detect the overall acceleration of a sensor, but it is also important for inclination error control. If the total acceleration of a sensor goes above 1 g, forces other than gravity are acting on it and the inclination calculations are no longer valid.

One simple orientation patch takes the raw acceleration of the three axes as input and essentially outputs which axis is facing upward. It includes a check that the accelerometer is not in motion and a small bias toward the current orientation. For a more accurate indication of the accelerometer's position in three-dimensional space, we created an inclination patch to use on a per-sensor basis. It uses trigonometric calculations based on gravity to determine the angles referred to as pitch, roll, and yaw, for rotation about the accelerometer's x, y, and z axes. The method maintains constant sensitivity and allows tilt angles greater than 45° to be sensed accurately and precisely by using the acceleration of all three axes in each calculation of pitch, roll, and yaw [5]. For example, the pitch (X-tilt) calculation is given by ρ in Equation 1.

$$\rho = \arctan\left(\frac{a_x}{\sqrt{a_y^2 + a_z^2}}\right) \qquad (1)$$

After performing the three inclination calculations, making further corrections with sign recognition, and testing whether the sensor is moving and its data is valid, the patch outputs accurate measures of pitch, roll, and yaw in degrees.

These patches were designed for our sensor system, but they also work with popular accelerometer-based input devices such as the Nintendo Wii Remote and Apple iPhone.

3.2 Motion analysis
Patches were also written for movement and gesture recognition. Patches were created to determine the magnitude and direction of movements. Direction is determined by using the last known orientation of the sensor at rest as the initial state and comparing it to the detected vector of movement. While we often used a simple measure of acceleration for the magnitude of a movement, we also found the duration of a movement to be an important basic parameter to track.

We considered two forms of gesture recognition, essentially separating them into programmed and learned gestures. The programmed gesture schemes used a simple patch that detects when one specified action follows another within a specified time frame. This enabled us to combine multiple movements so that the overall gesture occurs when one movement is followed by another within a certain time period. A useful instance of these manually programmed gestures was recognizing a specified orientation followed by motion in a certain direction. We designed this example with an accelerometer attached to the wrist to detect 6 orientations (palm up, palm down, thumb up, thumb down, fingers up, fingers down) and 6 directions of movement (up, down, left, right, forward, backward), which provide 36 different orientation/movement combinations. When combined with a second sensor on the other hand, the number of orientation/movement combinations exceeds a thousand (36 × 36 = 1296). This example illustrates the ability to use the system to issue commands with an "alphabet" of gestures, much like flag semaphore signaling uses two flags held in specific positions to signify letters.

The second form of gesture recognition uses machine learning techniques to teach the computer a set of gestures. An arbitrary motion can then be recognized from that set in real time. We explored gesture recognition with hidden Markov models (HMM) by using the FTM and MnM libraries [2]. The system can learn gestures (e.g., drawing shapes or numbers in the air), perform gesture following, and detect gestures with accompanying degrees of certainty.
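The orientation patch of Section 3.1 and the programmed "orientation followed by motion" gesture of Section 3.2 can be illustrated with a short Python sketch. It is not the authors' Max/MSP or Pd patch. The sketch assumes readings that have already been calibrated to g and treats a sample as valid only when the total acceleration lies within a hypothetical tolerance of 1 g. It picks the axis facing upward with a small hysteresis bias, computes pitch with Equation 1, and fires when a second event follows a first within a time window. The tolerance and bias values, the axis-to-posture table, and event names such as "move up" are illustrative assumptions. "move up" stands in for the output of a direction patch.

```python
import math
from typing import Dict, Optional, Tuple

Vec3 = Tuple[float, float, float]

# Hypothetical tuning values; the paper gives no numbers.
REST_TOLERANCE_G = 0.1    # |a| within 1 g +/- this counts as "not moving"
ORIENTATION_BIAS_G = 0.1  # hysteresis in favor of the current orientation

# Hypothetical mapping from the upward-facing axis (and its sign) to a wrist posture.
ORIENTATIONS: Dict[Tuple[str, int], str] = {
    ("z", 1): "palm up", ("z", -1): "palm down",
    ("y", 1): "thumb up", ("y", -1): "thumb down",
    ("x", 1): "fingers up", ("x", -1): "fingers down",
}
AXIS_OF = {label: key for key, label in ORIENTATIONS.items()}


def total_acceleration(a: Vec3) -> float:
    """Magnitude of the acceleration vector, in g."""
    return math.sqrt(sum(c * c for c in a))


def pitch_deg(a: Vec3) -> float:
    """Equation 1, rho = arctan(ax / sqrt(ay^2 + az^2)), returned in degrees."""
    ax, ay, az = a
    # atan2 with a non-negative second argument equals arctan of the ratio,
    # but also handles ay = az = 0 (x axis pointing straight up or down).
    return math.degrees(math.atan2(ax, math.hypot(ay, az)))


def orientation(a: Vec3, current: Optional[str] = None) -> Optional[str]:
    """Posture whose axis points upward, or None while the sensor is moving."""
    if abs(total_acceleration(a) - 1.0) > REST_TOLERANCE_G:
        return None  # forces other than gravity act on the sensor
    comps = dict(zip("xyz", a))

    def score(label: str) -> float:
        axis, sign = AXIS_OF[label]
        return sign * comps[axis]  # +1 g when that axis points straight up

    best = max(ORIENTATIONS.values(), key=score)
    if current is not None and score(current) >= score(best) - ORIENTATION_BIAS_G:
        return current  # small bias toward the current orientation
    return best


class SequenceDetector:
    """Fires when event `second` arrives within `window_s` seconds after `first`."""

    def __init__(self, first: str, second: str, window_s: float) -> None:
        self.first, self.second, self.window_s = first, second, window_s
        self.first_time: Optional[float] = None

    def feed(self, event: str, t: float) -> bool:
        if event == self.first:
            self.first_time = t  # (re)start the time window
            return False
        if (event == self.second and self.first_time is not None
                and t - self.first_time <= self.window_s):
            self.first_time = None  # consume the pair so it fires only once
            return True
        return False


if __name__ == "__main__":
    print(orientation((0.0, 0.0, 1.0)))                          # palm up
    print(round(pitch_deg((0.5, 0.0, 0.8660254037844386)), 1))  # 30.0
    gesture = SequenceDetector("palm down", "move up", window_s=0.5)
    print(gesture.feed("palm down", 0.0), gesture.feed("move up", 0.3))  # False True
```

A `SequenceDetector` can chain two movements as easily as an orientation and a movement. This matches how the paper describes building compound gestures from simpler actions.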
³ http://www.cycling74.com/products/maxmsp
⁴ http://puredata.info

Figure 2. Dancer performing with sensor system

4. APPLICATIONS
Our hardware and software infrastructure was applied to a number of scenarios with success. One of the most valuable was using the system on trained dancers (see Figure 2) with
the intention of not requiring them to learn any specific motions or gestures. In this situation, we wanted the design of the piece to allow for the creativity and freedom of expression of the dancer. We attached four sensors to the performer's hands and feet, mapping continuous parameters of the dancer's motion onto algorithmic compositions. In a typical example, the movement of each sensor would influence particular instruments. For each sensor, subtle movements could generate quieter sounds, while quicker or longer motions triggered louder sounds that could come from different sets of instruments. Although the performer does not control the particular notes being played in this scenario, the type of movement influences the harmonic direction of the piece. We were able to communicate these types of mappings effectively to the choreographer, who was free to focus on the dance without requiring the dancer to perform specific gestures correctly. This scheme worked well because the responsibility for musical content is shifted to the programmer.

In a contrasting scenario, the performer has a more direct influence over the music. A DJ or other musician needs functionality for precise control, so in these instances we depended more heavily on direct mappings and gesture recognition. For example, in one case we put a sensor on one hand that allows the performer to issue commands and "push buttons" via gesture recognition, and a second sensor on the other hand to control continuous parameters via multi-dimensional inclination and "twist knobs." This case was successful because the first hand was dedicated to performing discrete actions with recognized gestures, while the second could be used for continuous parameters. For instance, the gestures of the first hand could trigger the next part of a song, control loops, or switch instruments, while the second hand could control the levels of multiple effects or act as a theremin-like instrument.

The system has also been applied in other interactive media settings, including use as an alternative gaming controller and as a human-computer interface for navigating operating systems and controlling computer applications with gestures.

5. DISCUSSION
This sensor system has been a powerful tool for musical expression in translating human movement to music. Although the sensors and auditory output are external processes, they are based on internal human motivations. The system was able to capture one's natural motion and materialize the intangible processes of the performer.

Further work will include increasing the reliability of the hardware system as well as decreasing its size and power consumption. We also plan to increase the robustness and flexibility of the software patches, hope to improve the usability of the gesture recognition system, and will test scenarios using a greater number of sensor nodes.

6. ACKNOWLEDGMENTS
The authors would like to thank Bruce Land and Kevin Ernste for providing an environment conducive to this work. They would also like to thank Cisco Systems, Inc., Kionix, Inc., and Atmel Corporation for their support in developing this system.

This work was performed while Andrew Godbehere and Nathan Ward were students at Cornell University, where Godbehere was studying Electrical and Computer Engineering and Ward was studying Computer Engineering and Music. Godbehere focused on the hardware for sensor data acquisition, while Ward focused on the software for data interpretation. Both authors contributed equally to this paper.

7. REFERENCES
[1] Atmel Corporation. AVR2001: AT86RF230 Software Programmer's Guide, 2007. http://www.atmel.com/dyn/resources/prod_documents/doc8087.pdf.
[2] Bevilacqua, F., Muller, M., and Schnell, N. MnM: a Max/MSP mapping toolbox. In Proceedings of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), Vancouver, Canada, 2005.
[3] Gao, T., Massey, T., Selavo, L., Crawford, D., Chen, B., Lorincz, K., Shnayder, V., Hauenstein, L., Dabiri, F., Jeng, J., Chanmugam, A., White, D., Sarrafzadeh, M., and Welsh, M. The Advanced Health and Disaster Aid Network: A Light-weight Wireless Medical System for Triage. IEEE Transactions on Biomedical Circuits and Systems, in press, 2007.
[4] Gutierrez, J., Callaway, E., and Barrett, R. Low-Rate Wireless Personal Area Networks: Enabling Wireless Sensors with IEEE 802.15.4, Second Edition. IEEE Press, New York, NY, 2007.
[5] Kionix, Inc. Tilt-Sensing with Kionix MEMS Accelerometers (Nov. 30, 2007). http://kionix.com/App-Notes/AN005%20Tilt%20Sensing.pdf
[6] Musăloiu-E., R. and Terzis, A. Minimizing the effect of WiFi interference in 802.15.4 wireless sensor networks. Int. J. Sensor Networks, Vol. 3, No. 1, 2008.
[7] Park, C., and Chou, P. Eco: ultra-wearable and expandable wireless sensor platform. International Workshop on Wearable and Implantable Body Sensor Networks, 2006.
[8] Park, C., Chou, P., and Sun, Y. A Wearable Wireless Sensor Platform for Interactive Dance Performances. Proceedings of the Fourth Annual IEEE International Conference on Pervasive Computing and Communications, Pisa, Italy, 2006.
[9] Petrova, M., Wu, L., Mahonen, P., and Riihijarvi, J. Interference Measurements on Performance Degradation between Colocated IEEE 802.11g/n and IEEE 802.15.4 Networks. Sixth International Conference on Networking, 2007.
MusicGlove: A Wearable Musical Controller for Massive Media Library

Kouki Hayafuchi
Dept. of Intelligent Interaction Technologies
University of Tsukuba, Japan
kouki@ai.iit.tsukuba.ac.jp

Kenji Suzuki
Dept. of Intelligent Interaction Technologies
University of Tsukuba, Japan
kenji@ieee.org
ABSTRACT
This research aims to develop a wearable musical interface that enables users to control audio and video signals with hand gestures and body motions. We have been developing an audio-visual manipulation system that provides track control, time-based operations, and search for tracks in a massive music library. The system aims to build an emotional and affecting musical interaction and to provide people with a better way of listening to music. A sophisticated glove-like device with an acceleration sensor and several strain sensors has been developed. Real-time signal processing and musical control are executed as a result of gesture recognition. We also developed a stand-alone device that acts as a musical controller and player at the same time. In this paper, we describe the development of a compact and sophisticated sensor device and demonstrate its performance in controlling audio and video signals.

Keywords
Embodied Sound Media, Music Controller, Gestures, Body Motion, Musical Interface

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
The listening habits of people have been changing dramatically in recent years, because people can carry massive libraries of digital music on small portable music players like the iPod. In this situation, a new system is needed to help people find the music they want among enormous amounts of digital media. Many methods have been proposed to address this problem. For example, a graphical visualization method for organizing music libraries clearly was suggested in [1].

To give users more degrees of freedom, a variety of physical input devices such as pen tablets, dials, and glove-shaped interfaces have been commercialized and are widely used in various fields. In addition, intuitive input devices based on touch and haptics have been attracting public attention. To date, there has been a considerable amount of research on gestural interfaces for music [2, 3]. For example, controllers that can operate electronic devices with simple finger gestures were proposed in several studies, such as FreeDigiter [4] and Ubi-Finger [5]. Recently, a system that controls tracks like a DJ has also been proposed in [6].

In this study, we focused on a sophisticated interface that enables users to control sound and music in an intuitive and efficient manner. The glove-like input device is one of the conventional interfaces for human-computer interaction. The developed system, MusicGlove, acts as an interactive music player and explorer. Based on the recognition of hand gestures and body motion, it performs track control, time-stretching of audio and video signals, and information retrieval from a massive music library. In particular, time-based multimedia interaction involving audio and video signals has become popular in recent years, and a number of rich time-based operations have been proposed for different applications [7].

Gestural control allows real-time control and has a high affinity with expressive performance. The goals of the MusicGlove project include an all-in-one device that can control music and generate audio by itself. In this case, the user listens to music produced by the wearable device itself. Therefore, people can enjoy musical control with MusicGlove at any time and place, even in transit or while walking. The developed device and system can contribute to a new style of listening to music from massive music libraries.

2. SYSTEM OVERVIEW
In this study, musical control is divided into two main functions: track control and audio/video time-based control. Track control covers the common operations of a music player, such as play, stop, and skip to the next track. In addition, a function for searching tracks in the music library is implemented, which is similar to the manipulation performed by a DJ. Audio/video time-based control, on the other hand, is signal processing that directly modifies the sound waveform, such as changing the tempo or adding tonal effects to the audio signal.

2.1 Hardware Overview
An overview of the developed glove-like sensing device is shown in Figure 1. The device consists of one 3-axis acceleration sensor, 4 strain sensors, 1 microprocessor for signal processing and control, a Bluetooth wireless module, a portable music player used in the Wearable Music application (the all-in-one application), and a battery. The measurement range of the acceleration sensor is ±10 g. Because the sensor is fixed on the outer side of the wrist, its X, Y, and Z axes keep a fixed orientation with respect to the wrist. Four strain sensors are mounted on the upper side of the index and middle fingers and on the inner and outer sides of the wrist, as illustrated in Figure 2. The strain sensors provide analog values of the bending at each position. The glove-like device is lightweight and
is designed to satisfy the minimum requirements for musical control. The arrangement of sensors illustrated in Figure 2 was determined through preliminary experiments. The microprocessor obtains the sensor data, performs gesture recognition, and then transmits the processed data to the wireless module. The device communicates with the host computer via Bluetooth. In addition, the music player can be connected to the microprocessor via its dock connector port. In the all-in-one application, the microprocessor generates control signals for the music player.

Figure 1: The overview of the MusicGlove input/output device

Figure 2: Arrangement of sensors

Table 1: Gestures and Functions
Mode              Function              Gesture
Tracks control    Play                  Bend index finger
                  Pause                 Make a fist
                  Volume up/down        Wrist up/down
                  Next/Previous music   Pointing right/left
Sound control     Fast forward          Wrist rotation
                  Fast rewind           Wrist rotation
                  Tempo up/down         Wrist up/down
                  Scratch               Scratch motion
Searching tracks  Searching             Hand motion
                  Shuffle               Acceleration*
* Acceleration is used to move forward to the next track.

3. GESTURES AND FUNCTIONS
Figure 3 shows the flow of data and signal processing. The sensor data acquired from the acceleration sensor and strain sensors can be transmitted to the host computer at 60 Hz. The host computer controls sound effects, tracks, and video according to predetermined gestures. The audio signals are presented through the loudspeakers or attached headphones.

Figure 3: Flow diagram

The classification of performed gestures is described as follows. In particular, we focus on hand postures based on the index finger and wrist. All hand gestures are classified into 3 main categories according to the data measured by the equipped sensors, and each category of gestures has a different role. When the user takes a particular posture, a predetermined style of musical control is activated. While the user holds that initial hand posture, different control methods become available. When one control method is enabled, the other control methods are disabled at the same time. This processing prevents undesired behavior of the system, and it also helps to increase the number of usable gestures. Because hand gestures for musical control are analyzed in this way, the user needs no training or conscious effort to switch between methods.

3.1 Classification of the performed gestures
Max/MSP and EyesWeb [8] are used to control the audio/video signals and to receive data. In this study, EyesWeb is used solely as a data receiver and transmitter to Max/MSP. However, the system can be extended to incorporate camera-based recognition and other performance installations. The media library is imported into a buffer in advance, and Max/MSP then continuously receives sensor data from the glove device. The hand shape is recognized successively by means of filtering techniques based on the sensor model.

The interaction is classified into 3 main styles: (1) Air disc jockey (DJ), (2) gestural conducting, and (3) wearable music. These styles share three common modes of musical control: i) tracks control mode, ii) sound control mode, and iii) search for audio tracks mode. Each mode continues until the user changes mode by changing his/her hand posture.

3.2 Three styles of interaction
(1) Music Player control - Air DJ: As shown in Figure 4, the system allows users to control musical features on an external computer, and audio is produced by loudspeakers installed in the surrounding environment. The user wears a wireless sensor glove, and the sensor data are transmitted to the computer. When the user performs hand gestures or body motions, the computer translates them into musical control and produces audio.

(2) Gestural conducting - Air Conductor: Gestural conducting is a time-based interaction that uses audio/video signal rendering. As illustrated in Figure 5, the user's gestures control not only the audio signals but also videos associated with the music, in the manner of a conductor. A performance video, such as an orchestra playing or a musical dance, is controlled in accordance with the user's conducting behavior. The audio/visual performance provides a highly immersive environment.

A video sequence with polyphonic audio, such as an orchestra or quartet performance, is used in this study, because the music quality is less sensitive to processing than in video sequences with song or speech. By using the acceleration sensor values, the audio volume is controlled and the audio/video signals are time-stretched in real time. A modified phase vocoder algorithm with a noise reduction technique is applied to obtain the time-stretched audio signals. For the visual signal processing, a simple playback speed control is used. The audio volume is controlled by the accumulated value from the triaxial acceleration sensor. The tempo control is based on extracting beats as inflection points of the Z-axis acceleration value.
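The beat-based tempo control of the Air Conductor style can be sketched in Python as follows. This is a hedged illustration, not the authors' Max/MSP implementation, and it rests on three assumptions. First, it reads "inflection point" as a turning point where the Z-axis signal stops falling and starts rising. Second, it assumes the resting offset has already been removed from that signal. Third, it treats conducting as started once three consecutive beats are evenly spaced within a tolerance, following the paper's initiation rule for this style. The dip depth, minimum beat spacing, tolerance, and reference tempo are hypothetical parameters.

```python
import math
from typing import List, Optional, Sequence


def detect_beats(z: Sequence[float], fs: float, depth: float = 0.2,
                 min_gap_s: float = 0.25) -> List[float]:
    """Beat times in seconds at turning points where z stops falling and starts rising.

    Assumes z is the Z-axis acceleration in g with its resting offset removed;
    `depth` (ignore shallow dips) and `min_gap_s` (debounce) are hypothetical.
    """
    beats: List[float] = []
    for i in range(1, len(z) - 1):
        if z[i - 1] > z[i] <= z[i + 1] and z[i] < -depth:
            t = i / fs
            if not beats or t - beats[-1] >= min_gap_s:
                beats.append(t)
    return beats


def conducting_tempo(beats: Sequence[float], n_beats: int = 3,
                     tolerance: float = 0.15) -> Optional[float]:
    """Tempo in BPM once the last n_beats beats are evenly spaced, else None.

    Every beat interval must lie within `tolerance` (a fraction) of their mean.
    """
    if n_beats < 2 or len(beats) < n_beats:
        return None
    recent = list(beats[-n_beats:])
    intervals = [b - a for a, b in zip(recent, recent[1:])]
    mean = sum(intervals) / len(intervals)
    if mean <= 0 or any(abs(iv - mean) > tolerance * mean for iv in intervals):
        return None
    return 60.0 / mean


def playback_rate(bpm: float, reference_bpm: float) -> float:
    """Speed factor for time-stretching; above 1.0 plays faster than the recording."""
    return bpm / reference_bpm


if __name__ == "__main__":
    fs = 60.0  # the glove streams its sensor data at 60 Hz
    # Synthetic 2 Hz swing (120 BPM) lasting 2 s; its minima fall at 0.25 s, 0.75 s, ...
    z = [math.cos(2 * math.pi * 2.0 * i / fs) for i in range(120)]
    beats = detect_beats(z, fs)
    bpm = conducting_tempo(beats)
    if bpm is not None:
        print(beats, bpm, playback_rate(bpm, reference_bpm=100.0))
        # [0.25, 0.75, 1.25, 1.75] 120.0 1.2
```

The resulting playback rate would drive the time-stretching stage. The paper uses a modified phase vocoder for audio and simple speed control for video, and neither is reproduced here.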
Figure 4: Air Disc Jockey (DJ)
Figure 5: Air Conductor
Figure 6: Wearable Music
When a user begins to swing his/her arm at a constant tempo and repeats the swing three times within a certain tolerance of tempo change, this conducting interaction is initiated.

(3) Wearable music: This is another application of the developed MusicGlove. A portable music player can be attached to the glove device, and users can control the music player with their gestures, as illustrated in Figure 6. This makes the system stand-alone. Users can listen to music via headphones or earphones connected directly to the device, without any other equipment. The embedded microprocessor uses the acceleration and strain sensors to send control signals to the player: play or stop tracks, skip to the next or previous track, fast forward and rewind, and volume control. The predetermined gestures are the same as those in the tracks control mode.

3.3 Common Control Mode
i) Tracks control mode: Tracks are controlled according to the hand posture. The mode provides the following functions: play, stop, skip to the next track, go back to the previous track, and volume control. This mode is initiated when the user stretches the index finger so that the hand is shaped like pointing into the air. For example, music is played by bending the index finger down a little. To skip to the next or previous track, the user points to the right or left, respectively, with the index finger. Volume is controlled using the acceleration sensor data, in particular the Z-axis value. When this value exceeds a predetermined threshold level, the volume is increased or decreased.

ii) Sound control mode: This mode is initiated by bending the wrist toward the palm. In this mode, the user can control features of the audio. The predetermined controls include fast forward and rewind, changing the tempo without time-stretching, and scratching. The X- and Y-axis accelerations caused by a circular motion of the hand are used for the fast-forward and fast-rewind operations. A clockwise rotation corresponds to fast-forward, and a counterclockwise rotation corresponds to fast-rewind. For the tempo change, the audio is simply resampled by changing the playback speed, which also results in a pitch-shifting effect. DJ-like scratch play is produced by filtering the X-axis acceleration value. To prevent false operation, the user must not generate acceleration on the Y and Z axes during the scratch motion.

iii) Search for audio tracks mode: This mode is initiated by bending the wrist toward the back of the hand. In this mode, the user can search audio tracks with a grasping gesture in the air, and audio tracks are presented to the user one after another. Searching for audio tracks is a repeated process of trial and error to choose a piece of music (or an album) from massive libraries. At each step of the search, the system presents the first part of a track in the music library according to the user's searching motion. The searching motion is a simple hand motion, detected from the accumulated value of the acceleration sensors on all axes. The user listens to the audio and makes a grasping motion to play the music when he/she finds a desired track or library. The grasping motion is detected by the strain sensors. In addition, waving the hand is treated as a "shuffle search," which lets the user choose a media library or tracks at random. These manipulations give users an intuitive way to search, like grasping music in the air.

4. PERFORMANCE DEMONSTRATIONS
In this section, we show some performance demonstrations with the developed device. We first describe time-series examples of sensor data and gesture classification. Next, an example waveform for the scratch motion is shown together with its spectrogram.

Behavior of Sensors: We carefully arranged the sensor locations and the filtering algorithm to achieve the gesture classification. Figure 7 shows an example of time-series data from the strain sensors during a tracks control gesture. The tracks control mode begins at point A, when the user takes the predetermined posture. At points C and D, the user makes gestures to skip forward to the next track and back to the previous track, respectively. At points B and E, the user plays and stops the audio track. Figure 8 shows gestures of sound control. This mode begins at point A, and the user makes a scratch motion at B, C, and D. At the two points marked E, however, no scratch effect occurs. There the acceleration value on the Y axis exceeds the threshold, because the user made gestures different from the previous ones. The system successfully distinguished the intended gestures for musical control. Figure 9 shows time-series sensor data during the gestural control "search for audio tracks." This control is initiated when the user extends the wrist and the strain sensor value exceeds the threshold for the predetermined posture, indicated by A in the figure. The search of audio tracks proceeds according to the accumulated sum of the triaxial acceleration, as indicated by points B, C, and D. The user then makes the play gesture at point E, and the chosen track begins to play.

Waveform of audio output signals: We examined the quality of the audio output signals modified by the user's gestural control. In this section, we focus on the Air DJ style and present the audio output waveform during scratch motions. Figure 10(a) shows the spectrogram during a scratch motion performed with the developed device. Figure 10(b) shows the spectrogram during a scratch motion performed on a commercially available digital turntable system. The region indicated by the dotted line represents the period of the scratch motions. Particular spectral features appear in this region, because the scratch acts as a short fast-rewind.
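The search for audio tracks mode (Section 3.3 iii and Figure 9) can be sketched as a small Python state machine. The sketch assumes gravity-compensated accelerations. The absolute values of the three axes are summed and accumulated, and each time the total crosses a hypothetical threshold the search advances to the next track, which a real player would then preview. A finger-strain value above another hypothetical threshold stands in for the grasp that selects the current track. The `shuffle` method models the waving "shuffle search" only by jumping to a random track, and detection of the wave itself is not shown. None of the names or thresholds come from the paper.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class TrackSearch:
    """Sketch of the 'search for audio tracks' mode (all thresholds hypothetical)."""
    library: List[str]
    step_threshold: float = 3.0    # accumulated |ax|+|ay|+|az| (g) per search step
    grasp_threshold: float = 1.5   # finger strain-sensor voltage treated as a grasp
    index: int = 0
    accumulated: float = 0.0
    previewed: List[str] = field(default_factory=list)

    def update(self, accel: Vec3, finger_strain: float) -> Optional[str]:
        """Feed one sample; return the selected track when a grasp is detected."""
        if finger_strain > self.grasp_threshold:
            return self.library[self.index]  # grasp: play the track being previewed
        # Accumulate the summed magnitude of the gravity-compensated triaxial acceleration.
        self.accumulated += sum(abs(c) for c in accel)
        if self.accumulated >= self.step_threshold:
            self.accumulated = 0.0
            self.index = (self.index + 1) % len(self.library)
            # A real player would now start playing the first part of this track.
            self.previewed.append(self.library[self.index])
        return None

    def shuffle(self, rng: random.Random) -> str:
        """'Shuffle search': jump to a random track (wave detection not shown)."""
        self.index = rng.randrange(len(self.library))
        self.accumulated = 0.0
        self.previewed.append(self.library[self.index])
        return self.library[self.index]


if __name__ == "__main__":
    search = TrackSearch(["track A", "track B", "track C"], step_threshold=1.0)
    for sample in [(0.5, 0.25, 0.25), (0.25, 0.0, 0.0), (0.5, 0.5, 0.0)]:
        search.update(sample, finger_strain=0.2)
    print(search.previewed)                                   # ['track B', 'track C']
    print(search.update((0.0, 0.0, 0.0), finger_strain=2.0))  # track C
```

Resetting the accumulator after each step means that a single long sweep advances one track at a time rather than skipping several. This is one plausible reading of the "points B, C and D" steps in Figure 9.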
Figure 10: Spectrogram comparison, (a) MusicGlove, (b) Turntable (axes: Time [s], Frequency [Hz]; regions A and B marked)

Figure 7: Tracks control mode

Figure 8: Sound control mode (wrist flexion; accelerations on the X, Y and Z axes; Acceleration [G] and Voltage [V] vs. Time [Sec])

Figure 9: Search for audio tracks mode (wrist extension; accelerations of X, Y, Z and strain of index finger; Acceleration [G] and Voltage [V] vs. Time [Sec])

In addition, a distinctive spectral feature can be seen in region A of Figure 10(a). That wave pattern is similar to region B in Figure 10(b). The pattern is generated by a particular rotation of the turntable after the scratching action, which suggests that we have reproduced the behavior of the digital turntable system. In terms of temporal transition, the waveform of the audio output produced by the developed system has characteristics quite similar to those of the digital turntable system. The response to gestural control is fast enough to produce natural scratch sound effects.

5. DISCUSSION AND CONCLUSIONS
In this paper, we proposed a method of active music listening for massive media libraries. Different styles of musical interaction are realized by using the sophisticated glove-like input device. The developed system allows humans not only to control audio and video signals but also to search audio tracks from a massive media library by hand gestures and body motion. In addition, it is possible to listen to music from the device itself. In recent years, it has become possible to access enormous amounts of media data with a portable device. A better style of interaction than conventional control by means of buttons or cursor keys is definitely required. Some media players that accept haptic control on the LCD are already commercially available.
Attempts to associate human body motion with musical control have a long history and will continue with the advancement of sensor and wireless technologies. We have been investigating embodied musical interaction by means of expressive behavior and gestures [9], namely Embodied Sound Media. The next stage of this research includes the implementation of physiological sensors in order to extend users' control capability. The effort to achieve a non-intrusive and transparent interface continues, with the aim of allowing humans to give more natural, exciting and artistic performances with machines.

Acknowledgement
A part of this work is supported by Japan Science and Technology Agency (JST), CREST “Generation and Control Technology of Human-entrained embodied Media.”

6. REFERENCES
[1] M. Torrens, P. Hertzog and JL Arcos. Visualizing and Exploring Personal Music Libraries. Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR2004), 2004. Available online.
[2] M. Wanderley, Gestural Control of Sound Synthesis. Proc. of the IEEE, 92(4), pp. 632-644, 2004.
[3] K. Ng, Music via Motion: transdomain mapping of motion and sound for interactive performances, Proc. of the IEEE, 92(4), pp. 645-655, 2004.
[4] C. Metzger, M. Anderson and T. Starner, “FreeDigiter: A Contact-free Device for Gesture Control,” Proc. of the Intl. Symp. on Wearable Comput., pp. 18-21, 2004.
[5] K. Tsukada and M. Yasumura, “Ubi-Finger: a Simple Gesture Input Device for Mobile and Ubiquitous Environment,” Journal of Asian Information, Science and Life (AISL), Vol.2, No.2, pp. 111-120, 2004.
[6] K. F. Hansen and R. Bresin. DJ Scratching Performance Techniques: Analysis and synthesis, Proc. Stockholm Music Acoust. Conf., pp. 693-696, 2003.
[7] E. Lee, et al., Toward a Framework for Interactive Systems to Conduct Digital Audio and Video Streams, Comput. Music J., 30(1), pp. 21-36, 2006.
[8] Eyesweb, InfoMus Lab, [web] http://www.infomus.dist.unige.it/EywMain.html
[9] K. Suzuki et al., Robotic Interface for Embodied Interaction via Dance And Musical Performance, Proc. of the IEEE, 92(4), pp. 656-671, 2004.
An Elementary Method for Tablet

Michael Zbyszyski
Center for New Music and Audio
Technologies, UC Berkeley
1750 Arch Street
Berkeley, CA 94709 USA
+1.510.643.9990
mzed@cnmat.berkeley.edu
ABSTRACT
This paper proposes the creation of a method book for tablet-based instruments, evaluating pedagogical materials for traditional instruments as well as research in human-computer interaction and tablet interfaces.

Keywords
Wacom tablet, digitizing tablet, expressivity, gesture, mapping, pedagogy, practice

1. INTRODUCTION
In 2006, Christopher Dobrian asked whether the ‘e’ in NIME (expression) was being adequately addressed by researchers and performers of real-time computer music. [10] He went on to define musical expression as “the nuance that a live performer adds to the available materials.” Examining whether or not machines can be expressive is beyond the scope of this work. In the case of a live performer, the possibility for expressive nuance is constrained by the sensitivity of the interface/instrument and the performer’s ability to take advantage of that sensitivity.
It is difficult to be expressive on a new instrument because its very newness means that the performer has not had the time to learn it1. The activity of instrument building is very involving, and it is hard to resist the temptation to keep building until the last possible second. It is important to schedule time to learn to play our instruments, but even with days or weeks of practice we are still beginners. It has been suggested that it takes more than a decade to learn a musical instrument. [16] Furthermore, traditional instrumentalists are aided by centuries of pedagogical materials and methods, which demonstrate that there is more to practicing and learning than just playing repertoire. With a newly developed instrument it is hard to know how to practice; nobody else has ever learned to play this instrument and there is no body of repertoire to suggest how it could be played.
This paper addresses the question of how to practice by proposing a method book for tablet-based instruments. Suggesting that more emphasis be placed on expression was the first step. With the focus now on expression, we can inspire performers to practice by showing them how. The strengths and weaknesses of the pedagogical methods of traditional instruments are examined first. These methods are then compared to research on human-computer interfaces, with the goal of combining the best of both fields. Finally, the creation of a new method book for tablet-based interfaces is proposed, and the details of this book are described.

1 There are performers in the NIME Community – Michel Waisvisz with The Hands (http://www.crackle.org/TheHands.htm) and Laetitia Sonami with the Lady’s Glove (http://www.sonami.net/lady_glove2.htm) are two examples – who have devoted substantial time to developing their own personal performance practice, and to great effect.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. INSTRUMENT PEDAGOGY
Many musicians choose new instruments specifically to escape the baggage that comes with a “classical” instrument. As someone who plays both, I appreciate the freedom of being in uncharted territory with an interface I have designed. I am not worried about playing in all twelve keys, memorizing excerpts, or learning riffs. However, I also miss some of the discipline: the focus of playing the same warm-up exercise or the same bit of repertoire over many years and coming to understand how my own technique develops. Traditional method books tend to have a few different kinds of material: études (short musical pieces focusing on a particular musical skill), exercises (scales, patterns, etc.), and practical advice (how to practice, how to hold the instrument, etc.).

2.1 Études
Études are the quintessential pedagogical material, and represent both the best and the worst of learning an instrument. Études range from music that is extremely expressive and wonderful, such as Chopin Opp. 10 and 25 or Bartók’s Mikrokosmos, to pieces that are, at best, mechanical and utilitarian. Students of the piano are probably familiar with method books by Hanon [14] and Czerny [6,7]. These methods are explicitly designed to develop the physical skills of playing the piano. Although widely used, they are often criticized for their emphasis on repetition and lack of musicality. [27] It is clear that physical skills are required to interpret virtuosic repertoire, but teachers [23] question the need to develop these skills in a context that is not expressive and may even cause physical damage.
If there is any element that pushes students away from classical training, it is here. This kind of étude, as part of an overall curriculum, can be successful in developing a certain dexterity, but that dexterity is useless without musicality. It is important for students to tackle challenges that are beyond their current abilities, and to work. But it is also important for students to be mentally, as well as physically, engaged [11]. If one must learn to play octaves, for instance, it would be better
to learn by playing Chopin Op. 25 no. 10, rather than just scales in octaves.

2.2 Basic techniques
The most successful methods address the development of executive skills in a larger context of attentive practice and musical development. When physical skills need to be practiced, they should be focused on specifically and with the same intensity as music making. This describes the approach of Joe Allard [19], who strongly influenced David Liebman’s method (see Table 1). [17] Liebman does not offer the student any musical études; instead, he devotes the first seven chapters of his book to the act of making a sound with the saxophone, covering the mechanism part by part and offering visualizations and physical exercises. His discussion of expressive techniques covers devices such as pitch bends, portamento, and vibrato; it does not address how to be expressive, as defined above, but presents techniques that could be used for “furthering one’s personal expression, so long as it is within the bounds of artistic and musical taste.” Finally, he offers advice on practicing, which makes it clear that Liebman expects the student (or teacher) to find other sources for études (e.g. [21]) and repertoire that will round out a whole curriculum.

Table 1. Chapter headings from Developing a Personal Saxophone Sound [17]
Chapter One      Overview of The Playing Mechanism
Chapter Two      Breathing
Chapter Three    The Larynx
Chapter Four     The Overtone Exercises
Chapter Five     The Tongue Position and Articulation
Chapter Six      The Embouchure
Chapter Seven    Reeds and Mouthpieces
Chapter Eight    Expressive Techniques
Chapter Nine     Practicing

In a two-hour practice session, one hour is devoted to the different categories of tone exercises, 20 minutes to sight-reading, and 40 minutes to “scales, arpeggios, and intervals … in order to learn the alphabet of music.” For a method book to function in the context of “new” music and new interfaces, the possible alphabet(s) of music would need to expand beyond these patterns. Also, there is no expressive music making in this practice session – that happens at some other point. The point of practicing is “to insure that the needed and physical and technical manipulations occur quickly and efficiently, so that a musical idea is immediately transferable from ear to mind with the soul (emotions) monitoring the entire process.”

2.3 Practical Information and Advice
A third type of material in a pedagogical method is practical information and advice. Steve Lacy offers a wealth of information in his book Findings [15]. In addition to standard fare, such as fingering charts, he advises against smoking and poetically describes the rigors of life as an improvising musician. This book also has exercises and études.
The Inner Game of Music [13] moves away from the category of method book entirely, offering exclusively advice in prose. Even with no musical examples, it is still an important addition to instrument pedagogy. Like the methods above, the authors understand that other texts will provide the missing pieces of the curriculum. In the case of the tablet method, there are no other methods to fill in the gaps. It will be important that all three of these categories are represented.

3. STYLUS and TABLET RESEARCH
3.1 An extremely short history
Figure 1: Bert Sutherland at the TX-2, with light pen [1]
The first appearance of a pen-computer interface is the Lincoln TX-0 computer from the MIT Lincoln Laboratory in 1957 [31]. There are many music-specific implementations of tablet and spatial interfaces, including the Fairlight CMI2 (although not for real-time performance), Xenakis’ UPIC [18], Buxton’s SSSP [5], and the Boie/Mathews/Schloss Radio Drum [3].

2 http://en.wikipedia.org/wiki/Fairlight_CMI

3.2 HCI
Much can be learned about tablet and stylus interfaces from the literature of human-computer interaction. [22] An important early study of pointing technologies was done by Paul Fitts in 1954. [12] His formulation, now referred to as Fitts’ Law, predicts the time required to rapidly move to a target area as a function of the distance to the target and the size of the target. This work has been expanded with the Steering Law [1], which deals not just with targets but also with trajectories. This work shows that tablets outperform other input devices (mouse, trackpoint, touchpad, and trackball). While both laws have wide-reaching implications for designers of interfaces, their focus on untrained movements limits their applicability for authors of method books. However, the underlying metrics for evaluating interfaces (indexes of performance) could be applied to evaluating performers and their progress. Also, the selection of mappings and gestural situations is especially critical when preparing an instrument for students. [25]

4. A TABLET METHOD BOOK
4.1 Why Tablet?
It would be impossible to write a method book that addressed the entire range of instruments that arrive at NIME. While practicing and learning can be addressed in a general context, the details of implementation and developing performance skills are specific to an instrument. Since specific skills are critical to understanding performance practice, it is desirable to develop a whole method, from basics to real music, around one instrument as an example for other instruments. The Method for Tablet could potentially spawn a whole series: Method for Wii Remote, Method for Footswitch, etc.
Previous work [29] surveyed musical work with tablets, and presented reasons why digitizing tablets make good interfaces. Briefly, the tablet interface offers:
• Low cost
• Easy availability
• High resolution output data
• Fine temporal resolution
• Multiple axes of control
These qualities are even more important in choosing the focus for a method book than they are for choosing one’s personal instrument. It would be impractical to write a method for a unique interface, no matter how good it is. The desire is for people to use this text, either for individual practice, in groups, or in the classroom.
Tablet interfaces offer other benefits as an instrument for beginners. Stylus-based interfaces outperform other pointing devices, such as joysticks, because they leverage the high bandwidth of the thumb and finger in combination [2]. Most performers come to the tablet with pre-existing pen skills, and the physical demands of the instrument are attainable by a large number of users. (There are no issues with handedness, for instance.) Tablet interfaces have been part of the NIME community since the beginning [26] and are now well established, appearing both in performance and in print [8, 24, 29].

Figure 2: An Elementary Method for Tablet

4.2 The Method
The method book will have three basic sections: Practical Issues, Basic Exercises, and Études.
The practical issues section covers topics of getting situated with a tablet interface, including a discussion of which tablet to acquire (sizes and models, strengths and weaknesses), the use of alternate pens, etc. Setting and adjusting the driver and sensitivities for musical performance follow, then recommended software implementations and conventions specific to the method book. The exercises and études in the method use Max/MSP3 and Jean-Marc Couturier's Wacom Object4. They are programmed so that students can use the free, runtime version of Max, and are distributed under a Creative Commons License5 that allows sharing in a non-commercial context. (It is also worth considering implementing some of the method in Pd6, to be compatible with the largest number of possible users.) Incoming tablet data is mapped using an Open Sound Control [28] wrapper, which is part of the CNMAT Max/MSP/Jitter Depot. [30]

3 http://www.cycling74.com/
4 http://cnmat.berkeley.edu/
5 http://creativecommons.org/licenses/by-nc/3.0/
6 http://crca.ucsd.edu/~msp/software.html

Figure 3: An exercise based on Engraver Script by Willis A. Baird (http://www.zanerian.com/BairdLessons.html)

The second section consists of basic exercises, analogous to the scales and arpeggios of classical instrumental technique. Their nature as interactive software means that some of the pitfalls of exercises (e.g. mindless repetition) are avoided. While the instrument mapping should stay the same, the content and difficulty of an exercise adapt to the level of the student. An alternate model for these exercises is a computer game [9].
The third section is the largest and most musically interesting. It consists of études by a number of composers. For example:
• M. Zbyszyski’s News Cycle #2 [29] requires the player to pull lines from a video stream to generate sound.7 A Fitts-esque exercise involves quickly and accurately putting the pen down in a zone on the tablet surface.
• News Cycle #2 also uses the buttons and sliders on an Intuos3 tablet, and requires the user to switch pens.
• N. D’Alessandro’s HandSketch [8] controller uses a polar coordinate system, calibrated to the ergonomics of a performer’s arm. This mapping is presented, and calibrated for individual users. Individual gestures (forearm for pitch, fingers for intensity) are practiced in isolation and in combination.
• Ali Momeni [29, 20] uses multiple interpolation spaces: one controlled by the tip of the pen and one by the tilt. While initially difficult, this complex spatial navigation scheme has huge expressive potential.
• Matthew Wright [29] employs a scrubbing metaphor, where a click on the tablet defines the material to which a long trajectory is applied.8 This method also generates multiple spaces and navigation challenges.

7 video at: http://www.mikezed.com/music/nc2.html
8 video at: http://www.youtube.com/watch?v=4dTcSeDTq84

In addition to myself, I have already invited other members of the NIME Community to contribute, and I anticipate involving additional composers in response to this paper. Études should be short, focused pieces that deal with a technical issue from the composer’s musical practice. Hopefully, the pieces will be more in the model of Chopin than Czerny: fully formed pieces of music and not simply exercises. Études and exercises that are intended for use in pairs or larger groups are also desirable in this section.
Further important topics will be addressed in an appendix or the Advanced Method. These include:
• Extensions to the tablet interface by employing an alternate controller in the other hand, including the Qwerty Keyboard, fader boxes, and FSRs.
• Material that has a more explicit connection to the use of the stylus in other arts, such as writing, drawing, and painting – Calligraphy or sumi-e inspired études.

5. ACKNOWLEDGMENTS
Thanks to my colleagues Richard Andrews, Adrian Freed, David Wessel, and Matthew Wright and to Wacom Co., Ltd.

6. REFERENCES
[1] Accot, J. and S. Zhai “Performance evaluation of input devices in trajectory-based tasks: An application of the steering law.” ACM Conference on Human Factors in Computing Systems. Pittsburgh, PA, 1999, pp.466-472.
[2] Balakrishnan, R. and MacKenzie, I. S. “Performance differences in the fingers, wrist, and forearm in computer input control.” Proc. of the SIGCHI Conference on Human Factors in Computing Systems Atlanta, Georgia, United States, March 22 - 27, 1997. S. Pemberton, Ed.
[3] Boie, B, M. Mathews, and A. Schloss “The Radio Drum as a Synthesizer Controller,” In Proc. of the International Computer Music Conference Columbus, OH, 1989, pp.42-45.
[4] Buxton, B. Sketching User Experiences: getting the design right and the right design. Morgan Kaufmann, San Francisco, 2007.
[5] Buxton, W, R. Sniderman, W. Reeves, S. Patel and R. Baecker “The Evolution of the SSSP Score Editing Tools” In Computer Music Journal (The MIT Press: Cambridge, MA, Volume 3.4, Winter 1979), pp. 14-25.
[6] Czerny, C. The Art of Finger Dexterity. G. Schirmer, New York, 1986.
[7] Czerny, C. The School of Velocity. G. Schirmer, New York, 1986.
[8] D’Alessandro, N. and T. Dutoit “Handsketch Bi-Manual Controller” Proc. of the New Interfaces for Musical Expression Conference. New York, June 2007, pp.78-81.
[9] Denis, G. and P. Jouvelot “Motivation-driven educational game design: applying best practices to music education” Proc. of the 2005 ACM SIGCHI International Conference on Advances in computer entertainment technology, June 15-17, 2005, Valencia, Spain, pp.462-465.
[10] Dobrian, C. and D. Koppelman “The ‘E’ in NIME: Musical Expression with New Computer Interfaces” Conference. Paris, France, June 2006, pp.277-281.
[11] Ericsson, K. A., R. Th. Krampe, and C. Tesch-Römer “The role of deliberate practice in the acquisition of expert performance” Psychological Review, 100, 1993, pp.363-406.
[12] Fitts, P. “The information capacity of the human motor system in controlling the amplitude of movement” Journal of Experimental Psychology, 47, 103-112, 1954.
[13] Green, B. and T. Gallwey The Inner Game of Music. New York, Doubleday, 1986.
[14] Hanon, C. L. The Virtuoso Pianist in 60 Exercises. G. Schirmer, New York, 1986.
[15] Lacy, S. Findings: My Experience with the Soprano Saxophone. Paris, Outre Mesure, 1994.
[16] Lehman, A. A. “Efficiency of deliberate practice as a moderating variable in accounting for sub-expert performance” in Deliege and Sloboda (eds.) Perception and Cognition of Music. Hove, East Sussex, Psychology Press 1997.
[17] Liebman, D. Developing a Personal Saxophone Sound. Medfield, MA, Dorn Publications, 1994.
[18] Marino, G, M. Serra, and J. Raczinski “The UPIC System: Origins and Innovations” In Perspectives of New Music Seattle, WA, Volume 31.1 1993, pp. 258-269.
[19] McKim, D. J. Joseph Allard: His Contributions to Saxophone Pedagogy and Performance. Published Doctor of Arts Dissertation, University of Colorado, 2000.
[20] Momeni, A. and D. Wessel “Characterizing and Controlling Musical Material Intuitively with Geometric Models” Proc. of the New Interfaces for Musical Expression Conference, Montreal, Canada, 2003, pp.54-62.
[21] Niehaus, L. Jazz Conception for Saxophone. Hollywood, Try Publishing Company, 1965.
[22] Orio, N., N. Schnell, and M. Wanderley “Input Devices for Musical Expression: Borrowing Tools from HCI” Proc. of the New Interfaces for Musical Expression Conference. Seattle, WA, 2001.
[23] Sand, B. L. Teaching Genius: Dorothy DeLay and the Making of a Musician. New York, Amadeus Press, 2000.
[24] Schacher, J. “Gestural Control of Sounds in 3D Space” Conference. New York, USA, June 2007, pp.358-362.
[25] Wanderley, M. “Gestural Control of Music” Proc. of the International Workshop on Human Supervision and Control in Engineering and Music. Kassel, Germany, 2001.
[26] Wessel, D. and M. Wright “Problems and Prospects for Intimate Musical Control of Computers” ACM Computer-Human Interaction Workshop on New Interfaces for Musical Expression, Seattle, WA, 2001.
[27] Whiteside, A. On piano playing. Amadeus Press, Portland, Or., 1997.
[28] Wright, M. and A. Freed “Open Sound Control: A New Protocol for Communicating with Sound Synthesizers” Proc. of the International Computer Music Conference. Thessaloniki, Hellas, 1997, pp.101-104.
[29] Zbyszyski, M., M. Wright, A. Momeni, and D. Cullen “Ten Years of Tablet Musical Interfaces at CNMAT” Conference. New York, USA, June 2007, pp.100-105.
[30] Zbyszyski, M., M. Wright, and E. Campion “Design and Implementation of CNMAT's Pedagogical Software” Proc. of the International Computer Music Conference, Volume 2, Copenhagen, Denmark, 2007, pp.57-60.
[31] “The TX-0: Its Past and Present” The Computer Museum Reports, Vol. 8. Boston, Computer History Museum, 1984.
A tabletop waveform editor for live performance
Gerard Roma
Music Technology Group
Universitat Pompeu Fabra
groma@iua.upf.edu

Anna Xambó
Music Technology Group
Universitat Pompeu Fabra
axambo@iua.upf.edu
ABSTRACT
We present an audio waveform editor that can be operated in real time through a tabletop interface. The system combines multi-touch and tangible interaction techniques in order to implement the metaphor of a toolkit that allows direct manipulation of a sound sample. The resulting instrument is well suited for live performance based on evolving loops.
Keywords
tangible interface, tabletop interface, musical performance,
interaction techniques
1. INTRODUCTION
The user interface of audio editors has changed relatively little over time. The standard interaction model is centered on the waveform display, allowing the user to select portions of the waveform along the horizontal axis and execute commands that operate on those selections. This model is not very different from that of the word processor, and its basics are usually understood by computer users even with little or no experience in specialized audio tools. As a graphical representation of sound, the waveform is already familiar to many people approaching computers for audio and music composition. Thus, audio editors have become general tools used for many different applications.
Particularly interesting are creative uses of these programs that go beyond their originally devised functionality. In describing ‘glitch’ music, Kim Cascone wrote:

“In this new music, the tools themselves have become instruments, and the resulting sound is born of their use in ways unintended by their designers.” [4]

In this sense, sound editors have proven especially useful in the production of errors and glitches, and more generally for experimental sound design. Thus, it may not be surprising that, despite the essentially non-realtime interaction model that typically governs these programs, many musicians have used them in live performances. One example of this was the Sound Waves live set by Saverio Evagelista and Federico Spini (9th LEM International Experimental Music Festival, Barcelona 2005), which inspired this project. The use of a standard audio editor on a laptop computer as a sophisticated looper served the performers’ minimalist glitch aesthetics. Perhaps more importantly, the projected waveform provided a visual cue that helped the audience follow the evolution of the concert, a simple solution to one of the most criticized problems of laptop-based performance.
Tabletop tangible interfaces have gained popularity in recent years by allowing intuitive interaction with computers. In music performance, they restore the visual contact with the audience that is missing in laptop music by making interaction readable [17]. Thus, the availability of low-cost means for building multi-touch and tangible interfaces opens the door to a new revision of the possibilities of direct interaction with waveforms.
In this article we describe the waveTable, a tangible sound editor that may be used as a sophisticated looping and sample manipulation device for live performance, with an intuitive interface that provides feedback to both the performer and the audience.

Figure 1: Close view of the waveTable prototype.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008 Copyright remains with the author(s).

2. RELATED WORK
The idea of sound generation from hand-made waveforms was already envisioned in the 1920s by László Moholy-Nagy, who proposed that the incisions in wax played by the phonograph could be created by hand (quoted in [9]). In the 1970s, Iannis Xenakis’ UPIC explored many different techniques for sound generation from drawings, including waveforms and envelopes [8]. Also, one of the first commercial samplers, the Fairlight CMI, included a light pen for the purpose of waveform and envelope editing.
In recent years a number of systems have been developed
that exploit tangible interfaces based on commodity components for music composition and performance. Most of them are focused on synthetic sound generation or realtime sequencing, but do not directly address the problem of tangible manipulation of waveform data.
One of the applications of Enrico Costanza’s d-touch [5] library, the physical sequencer, represents sound samples as tangible objects in a cyclic timeline. Sounds can be metaphorically loaded into objects through a microphone, and several effects can be applied by overdub recording. The Music Table [19] allows users to compose music patterns by placing cards on a table. Cards are tracked through a camera and displayed on a separate screen with an overlaid augmented reality layer. Copying patterns is supported by a copy card, which stores patterns in phrase cards to be reused or edited at any time without requiring the presence of their note cards. The reactable [12] has become one of the most popular multi-touch and tangible tabletop instruments. This collaborative instrument implements dynamic patching in the tradition of modular synthesizers. Among many other features, the reactable allows users to draw the waveform of a wavetable oscillator using one finger. Looping samples is also supported with sampler objects [11]. Using the reactable technology, the scoreTable* [10] explores realtime symbolic composition on a circular stave. Being more focused on the higher-level compositional aspects of music performance, this project takes advantage of the reactable’s round shape to represent a cyclic timeline, allowing users to collaborate in moving physical notes along the stave. Golan Levin’s Scrapple [15] allows the generation of a spectrographic score using tangible objects laid on a long table, with an augmented reality overlay for visual feedback. This approach seeks a compromise between compositional precision and flexibility in the tradition of spectrogram-based composition.
One common aspect of these projects is that tangible objects are used as physical representations of data. Thus, their interfaces imply that manipulating a tangible object is analogous to performing modifications on the underlying model. The main drawback of this approach is that physical objects cannot be created from scratch, nor can they be duplicated. As seen with the Music Table, this may break the relationship between the tangible object and the digital model. We propose the utilization of tangibles as tools, which represent functions that operate on data. We show how this approach enables the implementation of basic concepts of data editing available in desktop computers for the case of a tabletop sound editor. The result is a tool that allows the user to sculpt sound in a convenient way, so that sound design becomes a realtime composition process.

Figure 2: waveTable Tools: icons, gestures and functions.

3. INTERACTION DESIGN
3.1 Toolkit Metaphor
The relevance of metaphor is traditionally recognized in the field of human-computer interaction and interface design, and it is also applicable to Tangible User Interfaces (TUI) [7]. Interface metaphors are able to communicate the way users can interact with the system, suggesting or simplifying possible actions [6]. Within tangible interfaces, it has been identified that real-world objects can be used in computer systems to couple physical and digital representations [18]. Thus, metaphor and coupling should provide meaning by helping to establish a continuous dialogue between the physical and the virtual [6]. This is accomplished in our system by metaphorically mapping tangible pucks to tools [7].
The principal metaphor chosen for interacting with the waveTable system is closely inspired by the widely used concept of a tools palette found among graphical desktop applications since the 1980s (e.g. in MacPaint or HyperCard), including some sound editors. This approach may be useful for shaping the waveform graphically employing tangible and iconic tools, establishing an interactive dialogue that uses familiar verbs and nouns (in the sense proposed in [7]). Thus, an effective toolkit is provided that can be easily exploited by musicians, experts or beginners, facilitating the act of editing sound.

3.2 Tools and Gestures
According to Bill Buxton, the natural language of interaction should deal with non-verbal dialogues and highlight gestures as phrases with their own meaning [3]. The interaction elements used in the waveTable are both physical artifacts and fingers, and the properties detected by the system are the 2D position, rotation and presence of the objects, as well as one- or two-finger movements. The toolkit is composed of tools representing basic operations such as copy, paste or erase, as well as tools that represent audio effects applied in real time. The chosen mapping is one object
per tool. There are four main groups of gestures and tools, namely Editing, Effects, File and Visualization/Navigation operations, as shown in Figure 2.
Editing tools represent operations used for basic modification of the sound: Eraser, Pencil, Copy, Paste and Gain. Eraser deletes part of the sample when moved along the x axis. When present, Pencil allows freely drawing waveforms with one finger. Copy stores a fragment selected by dragging over the waveform along the x axis. Paste stamps that fragment at the object position and repeats it when moved along the x axis. Gain increases or decreases the overall amplitude when the object is turned clockwise or counterclockwise, respectively.
Effects tools represent common audio effects applied to the sound in real time: Delay, Resonant low pass filter, Tremolo, Reverb and Bit crush. In all cases position and rotation are detected, modifying respectively the position and the shape of an envelope. Each envelope controls the most relevant parameter of its respective effect.
File tools follow the VCR metaphor (the common buttons found on VCRs or CD players [6]): Open file, Play and Record. Open file allows previewing samples from a collection displayed in a radial menu, and loading one by pointing at it with a finger. Play reproduces the sound in a loop while the tool is present. Turning the object clockwise or counterclockwise increases or decreases the playback rate, respectively. Record captures the output of the system in real time while present, and then swaps the playback sample for the result.
Visualization/navigation gestures and tools are concerned with displacement and zoom level: Two finger zoom, One finger scroll and Grid. Two finger zoom allows navigation between the closest detail and the most general overview of the waveform, depending on the direction of movement and the distance between the fingers. One finger scroll moves the view between the start and the end of the sample by scrolling right or left along the x axis. Grid shows a pattern of vertical lines to facilitate the use of some editing tools, such as Eraser, Copy or Paste.

4. IMPLEMENTATION
Using computer vision software like reacTIVision [13], it is now possible to build tangible and multi-touch interfaces with low cost components. In order to implement the concept of realtime direct manipulation of a looping sample, the waveTable system was developed as a program that runs on reactable-class hardware using the reacTIVision framework.
Tools for the described operations are simple acrylic plastic pieces with fiducial markers attached on one side and descriptive icons on the other. Tools and fingers are illuminated using infrared LEDs and captured by a webcam with a visible light blocking filter. Captured video is processed by reacTIVision, which tracks the position and rotation of fiducial markers as well as the position of fingers. This information is encoded using the Tangible User Interface Objects (TUIO) protocol, based on Open Sound Control (OSC) [14], and sent over UDP to the waveTable software. Visual feedback is provided through rear projection on the table surface.

Figure 4: A sample of waveTable tools.

The software is written in the SuperCollider 3 [16] language, which is based on a distributed architecture where audio synthesis is carried out by a specialized server process that is controlled using OSC. This environment allows rapid development and very easy implementation and evaluation of all kinds of effects and operations over audio buffers that can be used in real time. Moreover, the Mac OS X version provides a set of graphics primitives and a ready-made waveform display. On the other hand, the distributed nature of the system involves some complications in the synchronization of data between the server and the client. In the current prototype this limitation is overcome by using a RAM disk. Integration with reacTIVision is done through Till Bovermann’s implementation of the TUIO protocol [1].
The software is logically divided into control, model and view modules. The control module is composed of a hierarchy of TUIO objects that handle each of the tools, and a class that handles TUIO cursors (fingers). The model is implemented as a SuperCollider server node tree that runs synth definitions for playing the sound buffer and dynamically applying effects. An overdub record synth definition allows swapping the playback buffer with a recording of the output. The view module is also composed of a hierarchy of objects that implement the graphic representation of tools and envelopes, and a container view that manages the main waveform display.
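The editing tools of Section 3.2 act directly on the looping sample. As a minimal sketch, assuming the sample is held as a plain Python list of floats (the prototype itself works on SuperCollider server buffers), Eraser, Copy, Paste and Gain could be modelled as below. The function names, the in-place style and the 6 dB-per-turn gain law are hypothetical choices for illustration, not the waveTable implementation.

```python
from typing import List

Buffer = List[float]  # hypothetical in-memory stand-in for a server buffer


def erase(buf: Buffer, start: int, end: int) -> None:
    """Eraser: silence the samples swept along the x axis, in place."""
    for i in range(max(0, start), min(len(buf), end)):
        buf[i] = 0.0


def copy_fragment(buf: Buffer, start: int, end: int) -> Buffer:
    """Copy: return (a copy of) the fragment selected along the x axis."""
    return buf[max(0, start):min(len(buf), end)]


def paste(buf: Buffer, fragment: Buffer, position: int, repeats: int = 1) -> None:
    """Paste: stamp the fragment at the object position, repeated end to end.

    Samples that would fall outside the loop are dropped, so the loop keeps
    its original length while it continues to play.
    """
    offset = position
    for _ in range(repeats):
        for sample in fragment:
            if 0 <= offset < len(buf):
                buf[offset] = sample
            offset += 1


def apply_gain(buf: Buffer, rotation_deg: float, db_per_turn: float = 6.0) -> None:
    """Gain: scale the overall amplitude from the object's rotation.

    Assumption: positive degrees mean clockwise (louder) and negative degrees
    counterclockwise (softer); one full turn changes the level by db_per_turn.
    """
    factor = 10.0 ** (db_per_turn * rotation_deg / 360.0 / 20.0)
    for i, sample in enumerate(buf):
        buf[i] = sample * factor
```

For example, copying the first two samples of an eight-sample loop and pasting them twice at index 4 overwrites the last four samples with the repeated fragment. Keeping the loop length fixed reflects the fact that these tools edit a sample that never stops playing.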
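Section 4 states that reacTIVision encodes tool positions and rotations with the TUIO protocol on top of OSC and sends them over UDP. The sketch below hand-encodes a single TUIO 1.0 `/tuio/2Dobj set` message (session id, fiducial id, position, angle, velocities and accelerations) as an OSC 1.0 packet, using only the Python standard library, to show what travels on the wire. It is a simplification under stated assumptions: real TUIO trackers wrap `set` messages in an OSC bundle together with `alive` and `fseq` messages, which are omitted here; the helper names are hypothetical; and the waveTable software receives these messages through SuperCollider rather than Python.

```python
import socket
import struct


def osc_string(text: str) -> bytes:
    """OSC-string: ASCII bytes plus a terminating null, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *args) -> bytes:
    """Encode an OSC 1.0 message with str, int (int32) and float (float32) args."""
    tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, str):
            tags += "s"
            payload += osc_string(arg)
        elif isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # big-endian int32
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # big-endian float32
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return osc_string(address) + osc_string(tags) + payload


def tuio_2dobj_set(session_id: int, fiducial_id: int, x: float, y: float,
                   angle: float, vx: float = 0.0, vy: float = 0.0,
                   v_angle: float = 0.0, accel: float = 0.0,
                   rot_accel: float = 0.0) -> bytes:
    """TUIO 1.0 profile message '/tuio/2Dobj set s i x y a X Y A m r'.

    x and y are normalized table coordinates (0..1); angle is in radians.
    """
    floats = (x, y, angle, vx, vy, v_angle, accel, rot_accel)
    return osc_message("/tuio/2Dobj", "set", int(session_id), int(fiducial_id),
                       *(float(v) for v in floats))


def send_udp(packet: bytes, host: str = "127.0.0.1", port: int = 3333) -> None:
    """Send one datagram; 3333 is assumed as the conventional TUIO port."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

A tracker reporting fiducial 3 at normalized position (0.5, 0.25) would produce `tuio_2dobj_set(12, 3, 0.5, 0.25, 1.0)`, a 72-byte packet: a 12-byte address, a 16-byte type-tag string, then ten 4-byte arguments. `send_udp` is defined but not called, since nothing needs to listen for the example to make sense.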
Figure 3: System overview.

5. USAGE
The resulting prototype makes it possible to modify a sound sample using fingers and tangible artifacts while it is being played in a loop. This is accomplished by loading a sample with the Open file tool. To start from scratch by drawing a waveform, a sample filled with silence of the desired length may be loaded. The waveform of the sample is then projected onto the table, and can be zoomed and scrolled using finger gestures. Placing the Play tool on the table starts looping the sound, and rotating it modifies
the playback rate.
At this point edit operations can be used to copy, paste or erase portions of the sound, modify its gain, or draw on it at the desired zoom level. The Grid tool may be employed to improve precision if a rhythmic pattern is wanted.
Effects tools add several realtime audio processes nondestructively. When placed on the table, each of these tools displays an envelope that controls the variation of the most important parameter of that effect over time. Envelopes are composed of two sinusoidal segments forming a smooth curve. The central point is controlled by the object position, while the height of the extreme points is controlled by the object rotation. Each envelope is drawn in a different color as a curve superimposed on the waveform, according to the zoom level. This system allows a wide range of envelope shapes to be controlled with very simple gestures.
In order to apply the effects permanently, the Record tool can be used. Once the tool is on the table, the system waits for the next loop and records it. When the loop is finished, all effect tools on the table are deactivated together with the Record tool. Their virtual representation disappears to invite the user to collect them. This process may be repeated as desired.

6. CONCLUSIONS AND FUTURE WORK
We have presented a set of interaction techniques and an implementation for a live performance instrument based on realtime manipulation of a sound sample using tangible and multi-touch techniques.
Informal testing of the prototype confirmed our expectations regarding the potential of using a tabletop interface for waveform manipulation as a live instrument. Moreover, the system can also prove useful as an intuitive tool for creative sound design, although with different ergonomic implications than traditional desktop editors. In order to fully support this application, some tools need to be added, in such a way that the system remains free from verbal interaction. Such new tools include a Snapshot tool that saves the sound buffer to the hard disk, and a Crop tool that isolates specific segments of larger sound recordings. Another interesting direction is the possibility of implementing a low cost light pen, for example using the Nintendo Wii Remote [2].

7. ACKNOWLEDGMENTS
The waveTable prototype started as a classroom project developed in the Interactive Systems Workshop and Electronic Music Workshop courses at the Computer Science Faculty of the Universitat Pompeu Fabra in Barcelona. We would like to thank professors Sergi Jordà, Martin Kaltenbrunner and Günter Geiger as well as Marcos Alonso for their guidance and advice.

8. REFERENCES
[1] SuperCollider TUIO. http://tuio.lfsaw.de.
[2] Low-Cost Multi-point Interactive Whiteboards. http://www.cs.cmu.edu/~johnny/projects/wii/.
[3] B. Buxton. The “natural” language of interaction: a perspective on non-verbal dialogues. INFOR: Canadian Journal of Operations Research and Information Processing, 26(4):428–438, 1988.
[4] K. Cascone. The aesthetics of failure: “post-digital” tendencies in contemporary computer music. Computer Music Journal, 24(4):12–18, 2000.
[5] E. Costanza, S. Shelley, and J. Robinson. Introducing Audio D-Touch: A Tangible User Interface for Music Composition and Performance. In Proceedings of the 6th International Conference on Digital Audio Effects (DAFX03), London, UK, 2003.
[6] P. Dourish. Where the action is: the foundations of embodied interaction. MIT Press, 2001.
[7] K. P. Fishkin. A taxonomy for and analysis of tangible interfaces. Personal Ubiquitous Computing, 8(5):347–358, 2004.
[8] J.-M. R. Gérard Marino, Marie-Hélène Serra. The UPIC system: origins and innovations. Perspectives of New Music, 31(1):258–269, 1993.
[9] S. Jordà. Digital Lutherie: Crafting musical computers for new musics’ performance and improvisation. PhD thesis, 2005.
[10] S. Jordà and M. Alonso. Mary had a little scoreTable* or the reacTable* goes melodic. In Proceedings of the 6th International Conference on New Interfaces for Musical Expression, Paris, France, 2006.
[11] S. Jordà, G. Geiger, M. Alonso, and M. Kaltenbrunner. The reacTable: exploring the synergy between live music performance and tabletop tangible interfaces. In TEI ’07: Proceedings of the 1st international conference on Tangible and embedded interaction, pages 139–146, New York, NY, USA, 2007. ACM Press.
[12] S. Jordà, M. Kaltenbrunner, G. Geiger, and R. Bencina. The reacTable*. In Proceedings of the International Computer Music Conference (ICMC 2005), Barcelona, Spain, 2005.
[13] M. Kaltenbrunner and R. Bencina. reacTIVision: A computer-vision framework for table-based tangible interaction. In Proceedings of the First International Conference on Tangible and Embedded Interaction, 2007.
[14] M. Kaltenbrunner, T. Bovermann, R. Bencina, and E. Costanza. TUIO: a protocol for table based tangible user interfaces. In Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France, 2005.
[15] G. Levin. The table is the score: An augmented-reality interface for realtime, tangible, spectrographic performance. In Proceedings of the International Conference on Computer Music 2006 (ICMC’06), New Orleans, November 6-11, 2006.
[16] J. McCartney. Rethinking the computer music language: SuperCollider. Computer Music Journal, 26(4):61–68, 2002.
[17] J. Patten, B. Recht, and H. Ishii. Interaction techniques for musical performance with tabletop tangible interfaces. In ACE ’06: Proceedings of the 2006 ACM SIGCHI international conference on Advances in computer entertainment technology, page 27, New York, NY, USA, 2006. ACM Press.
[18] B. Ullmer and H. Ishii. Emerging frameworks for tangible user interfaces. IBM Syst. J., 39(3-4):915–931, 2000.
[19] R. Berry, M. Makino, N. Hikawa, and M. Suzuki. The augmented composer project: The Music Table. In ISMAR ’03: Proceedings of the 2nd IEEE/ACM International Symposium on Mixed and Augmented Reality, page 338, Washington, DC, USA, 2003. IEEE Computer Society.
Integrated Algorithmic Composition
Fluid systems for including notation in music composition cycle
Andrea Valle
CIRMA, Università di Torino
via Sant’Ottavio, 20 - 10124
Torino, Italy
andrea.valle@unito.it

ABSTRACT
This paper describes a new algorithmic approach to instrumental musical composition that allows composers to explore, in a flexible way, algorithmic solutions for different compositional tasks. Even though the use of computational tools is a well-established practice in contemporary instrumental composition, the notation of such compositions is still substantially a labour-intensive process for the composer. Integrated Algorithmic Composition (IAC) uses a fluid system architecture where algorithmic generation of notation is an integral part of the composition process.

Keywords
Algorithmic composition, automatic notation

Figure 1: Rackbox vs. fluid architecture. [Diagram: left, a CAC rackbox; right, an IAC fluid system. Module labels include Audio In, FFT, Analysis, GUI/UI, Compositional procedures, Stochastic generator and Notation, grouped into Input and Output.]
1. INTRODUCTION
Algorithmic composition can be defined as a compositional practice that employs formalized procedures (algorithms) to generate the representation of a musical piece. Apart from the many ante litteram examples, algorithmic musical composition has been proposed and practiced widely since the ’50s. In particular, from the late ’50s a computational perspective started spreading across the two Western continents (see [1] for a detailed discussion). An interesting shift in perspective has occurred roughly from the ’60s up to the present day. The first approaches to algorithmic composition were driven by instrumental scoring. But, even though computer tools are widespread in contemporary instrumental scoring through computer-assisted composition systems (hereafter CAC; e.g. PatchWork and Open Music [2], PWGL [9], but also Common Music [11]), the idea of a purely algorithmic approach, in which a strict formalization rules the whole composition process, is no longer pursued in its entirety and has migrated from the instrumental domain to the electroacoustic one. In fact, considering the final output of the composition process, while in the electroacoustic domain the synthesis of the audio signal is a trivial task per se, in the instrumental domain the generation of musical notation still remains a very difficult task ([4], [10]). This notational issue has prevented the diffusion of a truly algorithmic practice in instrumental composition. Such an approach, in which the composition process is turned into a completely algorithmic workflow (from the first idea to the final score), can be defined as Integrated Algorithmic Composition (IAC). An IAC approach pursues the integration of notation generation with musical data manipulation, so that every manual process can be removed from the composition pipeline.
The paper is organized as follows: first, IAC/CAC approaches are discussed in relation to different software architectures; then, the need for a specific architecture is motivated in relation to the automatic generation of music notation; finally, two cases are presented.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

2. RACKBOX VS. GLUE ARCHITECTURES
CAC systems are intended to aid the composer in the computational manipulation of musical data: these data can ultimately be exploited in traditional score writing. Typically based on Lisp, CAC systems offer a large body of functionalities: pitch/rhythm operations remain the core of the system, with the inclusion of input modules for audio analysis and output modules for sound synthesis. All these functionalities are typically accessible through a GUI environment. While the GUI is the main interface to the system, offering easier access to the less programming-oriented composer, a high degree of flexibility is offered by enabling the user to extend the program via the Lisp language. Still, CAC architectures are based on the assumption that new functionalities must in some way be adapted to the hosting environment. A CAC application architecture can be thought of as a rackbox containing a certain number of modules (Figure 1, left): the box can leave ample room for other modules to be inserted in it. Still, the container is solid and consequently rigid, its capacity is finite, and the modules, in order to be inserted, must meet the requirements of the box geometry. By reversing the perspective, a different approach to computer-based composition environments can
be conceived. It can be noted that each of the elements of the diagram in Figure 1 (left) may be replaced by an undetermined plethora of components. As a consequence, instead of starting from a solid framework in which to insert modules, it is possible to start from an indefinite variety of available modules to be plunged, when necessary, into an open environment (Figure 1, right). Such an environment is fluid because it is intended as a glue capable of attaching different modules together. This reversed perspective implies a different approach by the user. By definition, a fluid system such as the one discussed here cannot be implemented as a closed application. On the contrary, a programming language is the most flexible way to glue together different software modules. Thus, in a fluid system, the communication among the modules can be handled by a gluing language: the language writes scripts addressing each module’s specific interface and executes them by calling the related program from the OS. The language is responsible for the symbolic manipulation representing the selected music features and for the communication with the modules in input/output. Not every programming language is suitable for such a gluing task. The requirements can be summarized as follows: high-level, dynamic typing, richness in dynamic data types, interactivity, string processing, interfacing to many system calls and libraries. The first two requirements are needed to let the composer concentrate on composition algorithms without dealing with, e.g., compilation or complex project structure issues, and to allow for continuous feedback while experimenting in composition. The last two are necessary to ensure the gluing mechanism. The software modules, in turn, must offer programmability and a command line interface. Indeed, there is a strong coupling between CAC systems and rackbox architectures: CAC applications are oriented toward composers who are interested in a computational approach to symbol manipulation but are not programmers and are not interested in automatic composition. On the other hand, an IAC approach, i.e. a fully-integrated computational approach to music composition including notation, can evidently benefit from the flexibility of a fluid system. More specifically, the case of automatic notation generation demonstrates that a completely algorithmic approach to composition can be achieved only through a fluid architecture.

3. AUTOMATIC MUSICAL NOTATION
Algorithmic composition requires defining a mapping from data structures (the output of composition algorithms) to a subset of notation symbols (the final output of instrumental composition practice). In the “classic” approach to algorithmic composition (Figure 2), this sensitive step is performed personally by the composer, whose work defines the two extremes of the workflow: s/he provides composition parameters as input and defines algorithms; from a certain data structure the computer generates a data list in textual form; the composer checks the adequacy of the output, possibly modifying his/her composition strategy; then, s/he proceeds to transcribe the data list into musical notation. Finally, s/he evaluates the notational result, possibly modifying some steps of the process. To summarize, with respect to the computer output, the composer works both as a compositional/notational controller and as a transcriber.

Figure 2: Algorithmic composition cycle. [Flowchart from Start to Stop. Numbered nodes: 1 compositional parameters; 2 compositional algorithms; 3 data structure; 4 compositional control; 5 notational control; 6 data list; 7 transcription; 8 musical score; 9 transcription algorithms; two “Ok?” decision points; A and B mark the two ends of the transcription step. Legend: 1: input data; 2, 9: computational processes; 3: data structure; 4, 5, 7: manual operations; 6, 8: documents.]

The composition cycle thus requires two controls, which are iterated at two different moments. Before the transcription, the composer can evaluate the generated data and foresee specific issues that would arise during transcription. After the transcription, the same control can be carried out by looking at the resulting notation. The control processes are high-level musical tasks, and potentially they can be performed very quickly, even within the few seconds needed to glance at the resulting notes. On the other hand, the transcription step is a low-level musical task, which is always very time-consuming: its timescale is typically measured in hours. This slowness depends both on the complexity of notation per se and on the difficulty of clearly anticipating, from a data list, the peculiarities of the resulting notation. The crucial move towards an IAC approach consequently requires automating the transcription step (Figure 2, from A to B). In this way, the composer can focus on the higher-level aspects of control: this speeds up the composition cycle (1-8), which can be executed until a satisfying result is obtained, thus leading to an interactive, trial-and-error methodology. The task of automatically generating musical notation is a privileged example of the power and, at the same time, of the necessity of a glue-connected, fluid system. CAC systems propose generic transcription algorithms, intended to provide a draft of a possible notational output. Consequently, even if the resulting scores can be quite sophisticated, they are still drafts to be reworked manually by the composer. In an IAC approach, this handmade notational work is converted into the definition of an algorithmic procedure for the control of a notation module. It must be stated clearly that music notation cannot be derived exclusively from musical data structures, because notation information involves graphic data which are autonomous from musical data but equally important for the final composition output: in short, music notation is not only music representation ([7]), and the composer must take both into account. Moreover, in real practice composition and notation are related by a feedback loop, so that any decision on one side always has to be verified on the other. A fluid architecture is indeed needed to automate such a task, so that transcription modules can be suitably defined and fine-tuned. Furthermore, specific modules can be “plunged into the fluid” to meet the requirements of different notations: for example, in the case of graphic notation, a drawing module may fit better than a musical notation one, even if the latter is
provided with some drawing capabilities. Examples of fluid architectures implementing IAC systems are described in [13] (where they are referred to as Automatic Notation Generators), [5], and [3]. In the rest of the paper, we discuss two cases of IAC where different fluid systems are designed to fit different needs, allowing for complete algorithmic control over the final score.

4. GRAPHIC NOTATION
In the first project, the final score (for solo piano) consists of a single very large page (A0 format) containing graphic notation. The formal composition model is a graph, and the notation visually mirrors the graph structure (Figure 3). All information associated with the graph data structure in the model has to be mapped into music/notation information, so that notation can be generated automatically. The score is made up of musical notation (vertices and edge labels), graphics (graph drawing) and text (performance annotations) (Figure 3, right): all these components must be provided by programmable modules and their output integrated into a single document. A strong constraint is that musical tradition requires high typographic quality both for the overall document and for the specific musical notation elements. As all the involved components are alphabetic or geometric, vector graphic solutions are needed. In general, since standard GUI applications are not relevant here, the possible candidates share a TeX-based approach ([8]), i.e. they are command languages, whose input is given via a textual interface and compiled to generate vector output. Concerning musical notation, among the possible candidates (for a review see [10]), LilyPond, while still sharing a TeX-oriented approach, ensures very high typesetting quality and at the same time can be tailored for advanced uses; it has a simple, human-readable syntax, has undergone fast development, and is now the most common text-based music notation application. LilyPond scripting solves the problem of generating standard notation for the vertices of the graph, but the resulting files (one for each vertex, in pdf/ps format) must then be included in the drawing of the graphic notation. Interestingly, many candidates fit this case. LaTeX and ConTeXt are two typesetting systems for document preparation implemented as sets of TeX macros. Both can work together with advanced graphic packages. ConTeXt ([6]) has been chosen as it provides direct support for the MetaPost graphic language and extends it with a superset of macros (named “Metafun”) explicitly oriented towards design drawing (e.g. allowing pdf inclusion). For this particular project, Python has been chosen as the gluing language: it has a remarkably clear syntax and meets all the previously discussed requirements for an IAC language. Python handles all composition data processing, i.e. graph generation and manipulation algorithms, as well as the gluing and scripting process. The Python module, named Scalptor (“engraver”), generates the score by writing text files containing code for each of the involved modules and calling each module in order to render it.

Figure 3: A graph model is fed into Scalptor, gluing LilyPond and ConTeXt to generate a graphic notation. [Diagram: the Python module Scalptor maps the graph “Graph I” to notation for vertices and for vertex and edge labels (LilyPond), graphics for edges (MetaPost) and text for annotations (TeX), assembled with ConTeXt. The score excerpt shows vertices numbered 1 to 4, edge labels such as .025, .070 and .141, the dynamic ppp, the tempo marking “Prestissimo possibile, ma preciso” (as fast as possible, but precise), and the performance notes, translated from Italian: “On each edge, the label indicates the value to which the last note of the vertex where the edge starts must be tied. Everything is to be played two octaves higher (15ma).”]

5. SPECTRAL COMPOSITION
As previously noted, an IAC system should provide room for inserting modules specialized in audio analysis. Analysis parameters can then be processed and used as starting material for musical composition. Figure 4 represents an implementation of an IAC fluid system for a composition project involving parameter extraction from audio signals. In particular, the commission was to use as starting material an excerpt from Sophocles’ Antigone, which was read by a philologist so as to respect, as far as possible, the reconstructed pronunciation of classical Greek. Three voices sing melodies generated from data resulting from the analysis of the original audio file, in particular from the fundamental frequency and the first two formants. The Praat software has been chosen for the analysis task, as it is specialized in phonetic processing. The SuperCollider application ([12], hereafter SC) has been chosen both as system glue and as an audio module: as a language, SuperCollider is rich in data structures, highly expressive, provides an interface to the OS environment and allows for string manipulation; as an audio server, it provides state-of-the-art sound processing. Most importantly, from a UI perspective, SC allows for interactive sessions and also provides programmable GUIs.

Figure 4: From audio to notation: modules. [Diagram labels include audio data (input), compositional data processing, audio analysis, transcription, audio synthesis, UI displays, Unicode conversion, graphic data (output) and music notation, handled by SuperCollider, Praat, Python, PyX and LilyPond. Inset: a Praat plot of formant frequency (Hz, 0 to 5000) against time (s, 0 to 0.71723).]

Figure 5: Different outputs: SC GUI, Praat graphics, LilyPond notation. [The notation excerpt shows parts for Soprano, Tenor (Tenore), Bass (Basso), Voice 1 and Voice 2 (Voce 1, Voce 2), with dynamics from pp to ff and phonetic syllables as text.]

Communications between SC and Praat have been carried out via text files: Praat can easily be scripted by passing it text files and can, in turn, export text files, which can be read back from inside SuperCollider. The whole composition cycle can then be executed interactively from inside SC. As before, a transcription module is responsible for generating LilyPond files, which can then be rendered to the final pdf score file. The transcription algorithm also performs a melodic contour evaluation on the input data, so that con-
tinuous pitch increases/decreases are converted into ascend-
7. REFERENCES sf
s r h st' t' - - - - -
[1] C. Ames. Automated composition in retrospect: !,

( !) ! ! ! !
ing/descending glissandos (see Figure 5, bottom). For each 4 2 " " " " " " " " ! ! !& !
/1956-1986. Leonardo, 20(2):169–185, 1987. '
Voce 3
note of a voice, a vowel symbol is assigned, as a result of a sf
global evaluation of the two formants. As SuperCollider ac- [2] G. Assayag, C. Rueda, M. Laurson, C. Agon,sand "$ # m! f s
tually does not support Unicode, the LilyPond file has been O. Delerue. Computer-assisted composition at
post-processed by a Python module replacing special ASCII IRCAM: From PatchWork to OpenMusic. Computer
strings sequences with necessary Unicode glyphs. SC pro- Music Journal, 23(3):59–72, 1999.
vides facilities to sonify in real time all the data, i.e. before. [3] T. Baça. Re:8 Lilypond for# serial music? LilyPond
&
7 $#
( # $# # (& ( # # # ( " " " " " " " " (& *
and after processing and, through GUI packages, the same1 * '
S
pp
mailing list (lilypond-user@gnu.org), Nov. 28 2007.
pp p mf
data can be displayed on screen. For purposes of documen- # [4]- D. Byrd. - ø Music # % notation
- - ø# software and intelligence. e -
tation, high quality, vector graphics has been generated by - Computer % &
Music - $ # Journal, 18(1):17–20, 1994. 7 %
writing in SC the opportune modules. Such modules allow ( $#
1 # #( ( # # # # # $ # # (Specification
" " " " " " " " (& # # $#
pp [5] N. Didkovsky. pp Java Music Language,
T
8 p mf
to interface Praat, which is able to create graphics from # o v103 øupdate. # In % Proceedings
o - - - - ø# of the International e æ
all its data, and the PyX Python graphics package, which Computer Music Conference 2004, Miami, 2004.
- 7 %
has been used to plot compositonal data structure. FigureB /53 ( $ # [6] H. Hagen. ConTeXt the manual. #( " " PRAGMA
" " " " " " (& #
mf #& p
Advanced
shows (from top to bottom) a GUI from SC plotting formant # -
Document - -
Engineering,
- o -
Hasselt NL, 2001.
- - #
f
e -
data, the same data exported by Praat into an eps file, and [7]!5 K.
. 0 H. Hamel. A design for music editing 0and
) 0 printing 7 !
an excerpt from the final score by LilyPond. This rich sys- + !) !
4 ! ( "
software "
based " ( & ! !syntax.
" on "notational ! & ! ! & Perspectives
!!! ! + " "of " " ( !) !
!6 * !6
V1
tem output provides a constant feedback to the composer, sf

New Music, 27(1):70–83, 1989. sf sf
x(% ( tum t&k t' s r h st' t' - ( tum ' g
allowing her/him to control interactively the composition [8] D. Knuth. The TeXbook. Addison Wesley, Reading,
!& !& ! 0 0 0 7 !
" ( & !) ! ! & ! & ! ! ! ! ! ! ! ! ! & ! ! ! ! + " "
process. 4
+ !) ! ! Mass., 1984. !! ( " " ( !) !
!6 * !6
V2
sf [9] M. Laurson, V. Norilo, andsfM. Kuuskankare. sf

x(% ( tum ' gæ h st' ti s r h st' t' - " ps ks r h st' t' ( tum ' m
6. CONCLUSIONS PWGLSynth: A visual synthesis language for virtual
!5 instrument!5 0 0 0 0 0 0 7
4 + !) ! ! ! ! ! ! ! design ! !and ( & !) ! & !Computer
( " control. ! ! & ! ! ! Music ! ! ! !! ! ! ! & + ( !) !
/ !6 !6 ' !
The case of musical notation is particularly relevantV3in
sf Journal, 29(3):29–41, 2005.sf* ' sf 6
demonstrating the need (and the strength) of a fluid archi- x(% ( tum t&k tum t&k t' - - " ps r
[10] H.-W. Nienhuys and J. Nieuwenhuizen. s "r h st'LilyPond,
t' - - " ps r a ( tum '
tecture for an integrated algorithmic composition system.
system for music engraving. In Proceeding of the XIV
In itself, notation is not a simple mapping from musical data
CIM 2003, pages 167–172, Firenze, 2003.
to notation symbols, as it requires the composer to provide
[11] H. Taube. An introduction to Common Music.
specific typographic information. An IAC approach allows . & # &
16
# #+"
20
7
the composer to develop case-specific solutions to suchS a1 " ( # (
Computer " " Music " Journal,
" " "21(1):29–34,
" " " ( '1997. " " " " "
**
problem, by plunging the selected modules into a fluid sys- [12] S. Wilson, D. Cottle, and N. Collins, editors. The
pp p pp
#o y - - ø
tem: indeed, these can include not only notation but e.g. SuperCollider Book. The MIT Press, Cambridge,
7
many different UIs. The maximum flexibility is evidently 1 " ( & #Mass.,
# (& " 2008. " " " " " " " " ( # # #$ # +" " " " " "
** # # #
T
gained by using an interactive language as a system glue. [13] ppH. Wulfson, G. D. Barrett, and M. Winter.
8
p Automatic
pp
Some examples of automatic generated notation can be

#
notation generators. In Proceedings of the y eæa # -
7th ø
3 international conference on New interfaces - for musical - + " " "7 "
found at ( (& # # + #+" ( # # # # #&
B
/ #$ # * $pp
*expression,## pages #
346–351, New York, 2007. ACM.
http://www.cirma.unito.it/andrea/compositionNotation/. mp p p
mp
mp
- # - - - - - - - - - - - - y
. ! 7
4 ! ! + " + !) ! !( " " " " " " " " + ! ( " " " " " " (
!! !) &
sf 6
V1
256 sf
st' ti d# to tum ' k% h
0 0 !, 0 0 0 0 0 0 0 0 0 7
4 ! !& ! ! ! ! ! ! ! ! ! ! ! ( " " " " " " + ! !!!! ! ! !!!!( " " " " (
! !)
V2
sf
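The Unicode post-processing step described in Section 5 of the preceding paper (SC writes ASCII-only text, a Python module swaps in the glyphs before LilyPond compiles the file) could look roughly like the sketch below. The placeholder spellings ("@ae@" and so on), the glyph table and the function names are hypothetical, since the paper does not list the sequences it actually used.

```python
import re

# Hypothetical placeholder -> glyph table. The paper does not say which ASCII
# sequences its SC transcription module emitted; "@...@" delimiters are an
# assumption, chosen so they cannot collide with LilyPond's backslash commands.
GLYPHS = {
    "@ae@": "æ",
    "@oe@": "ø",
    "@eps@": "ɛ",
}

# A single alternation regex, so the source text is scanned only once.
_PATTERN = re.compile("|".join(re.escape(key) for key in GLYPHS))


def unicode_postprocess(lilypond_source: str) -> str:
    """Return the source with every placeholder replaced by its Unicode glyph."""
    return _PATTERN.sub(lambda match: GLYPHS[match.group(0)], lilypond_source)


def convert_file(src_path: str, dst_path: str) -> None:
    """Read the ASCII .ly file written by SC and save a UTF-8 copy for LilyPond."""
    with open(src_path, encoding="ascii") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(unicode_postprocess(text))
```

Keeping the placeholders ASCII-only means the SC side never has to handle non-ASCII strings; only the final file written for LilyPond is UTF-8.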
GeoGraphy: a real-time, graph-based composition

environment
Andrea Valle
CIRMA, Università di Torino
via Sant’Ottavio, 20
10124 - Torino, Italy
andrea.valle@unito.it
ABSTRACT
This paper is about GeoGraphy, a graph-based system for the control of both musical composition and interactive performance, and its implementation in a real-time, interactive application. The implementation includes a flexible user interface system.
Keywords
Musical algorithmic composition, composition/performance interfaces, live coding

Figure 1: A graph (left; vertex durations and coordinates omitted) and a resulting sequence (right).
1. INTRODUCTION
During the last fifty years, the use of computers for creating music has replicated a typical Western opposition between composition and performance. On one side, the computer allows an in-depth exploration of algorithmic composition techniques. Algorithmic composition is typically a non-real-time process which strictly follows the two-step model of traditional composition, in which the composer creates a score to be played by the musician. On the other side, computers have been used for live performance since the late ’60s, but only since the ’90s has the increasing computational power of processors allowed the widespread adoption of real-time audio software that the player can use in a live concert: as a consequence, an “instrumental” use of audio software has grown, requiring the definition of interfaces for human-machine interaction, which have mainly taken the form of GUIs. Commercial software is based on various restrictive GUIs that close off any access to the software apart from the built-in functionalities; moreover, they typically focus on the textural level, thus making work at the note level, which is typical of live instrumental performance, difficult ([4]). In order to get around the rigidity of musical software, two solutions have been devised: the development of new musical interfaces, escaping the limits of commercial/traditional ones ([4]), and live coding, where the UI is the code window and direct access to the chosen language allows for a fluid and flexible approach to the definition of composition algorithms as a performing act ([5]). Live coding emerges as a very innovative perspective, as it merges aspects of composition-based and instrumental-based approaches. Nevertheless, it has been noted that live coding may share some of the problematic issues of both. A general observation ([5]) is that the real-time, improvisational approach per se gains in instrumental expressivity but often lacks structure: structure is what a non-real-time approach to composition aims at, typically focusing on large-time-scale definition and providing an overall architectural framework.
This paper describes GeoGraphy ([11]), a real-time environment for graph-based algorithmic composition, i.e. a formal system for the algorithmic control of sound organization, and a real-time implementation that takes advantage of both GUI and live coding possibilities in terms of composition and interaction.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

2. GEOGRAPHY: THE SEQUENCING MODEL
In the most general sense, GeoGraphy generates sequences of sound objects. The generation process relies on graphs. Graphs have proven to be powerful structures for describing musical structures ([7]): they have been widely used to model sequencing relations between musical elements belonging to a finite set. A common feature of all these graph representations is that they do not model temporal information: on the contrary, graphs as defined by GeoGraphy are intended to include it, so that time-stamped sequences of sound objects can be generated. The sequencing model is based on directed graphs (Figure 1) where each vertex represents a sound object and each edge represents a possible sequencing relation on pairs of sound objects. This directed graph is actually a multigraph, as it is possible to have more than one edge between two vertices; it can also include loops (see Figure 1, vertex 2). Each vertex is given a label representing the sound object duration, and each edge a label representing the temporal distance between the onset times of the two sound objects connected by the edge itself. The vertices are given an explicit position in terms of coordinates in a Euclidean space. This metric information makes it possible to distinguish between two graphs that are identical from a topological point of view, as they can have different positions in the space. Other information can optionally be associated with vertices and edges. The graph defines all the
possible sequencing relations between adjacent vertices. A sequence of sound objects is achieved through the insertion of dynamic elements into the graph, called “graph actants”. A graph actant is initially associated with a vertex (which becomes the origin of a path); then the actant navigates the graph by following the directed edges according to some probability distribution. Each vertex emits a sound object when a graph actant passes through it. Multiple independent graph actants can navigate a graph structure at the same time, thus producing more than one sequence. If a graph contains loops, sequences can also be infinite. As modeled by the graph, the sound object's duration and the delay between attack times are independent: as a consequence, sound objects may overlap.¹
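The sequencing model above can be illustrated with a small Python sketch. It is not GeoGraphy's SuperCollider implementation: all class and function names are hypothetical, and the uniform random choice among outgoing edges is an assumption standing in for "some probability distribution". The helper figure1_graph rebuilds the graph of Figure 1 (two 0.7 s vertices; edge 1: 2→1, 0.7 s; edge 2: loop on 2, 1.2 s; edge 3: 1→2, 0.5 s; edge 4: 1→2, 1 s).

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, NamedTuple


@dataclass
class Edge:
    eid: int
    start: int
    end: int
    dur: float  # delay between the onsets of the two connected sound objects (s)


class Event(NamedTuple):
    onset: float  # time stamp of the emitted sound object (s)
    vid: int
    label: str
    dur: float    # sound object duration, taken from the vertex (s)


@dataclass
class Graph:
    # vID -> (label, vertex duration); edges form a multigraph, so loops and
    # parallel edges between the same pair of vertices are allowed.
    vertices: dict = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, label: str, dur: float) -> int:
        vid = len(self.vertices) + 1
        self.vertices[vid] = (label, dur)
        return vid

    def add_edge(self, start: int, end: int, dur: float) -> int:
        eid = len(self.edges) + 1
        self.edges.append(Edge(eid, start, end, dur))
        return eid

    def outgoing(self, vid: int) -> List[Edge]:
        return [e for e in self.edges if e.start == vid]


def run_actant(graph: Graph, start: int, steps: int,
               choose: Callable[[List[Edge]], Edge] = random.choice) -> List[Event]:
    """Let one actant walk the graph from `start`, emitting at most `steps` events."""
    events, onset, vid = [], 0.0, start
    for i in range(steps):
        label, vdur = graph.vertices[vid]
        events.append(Event(onset, vid, label, vdur))  # the vertex emits its sound object
        out = graph.outgoing(vid)
        if i == steps - 1 or not out:  # last requested event, or a dead end
            break
        edge = choose(out)             # default: uniform choice (an assumption)
        onset += edge.dur              # the edge, not the vertex, sets the next onset
        vid = edge.end
    return events


def overlaps(events: List[Event]):
    """Consecutive pairs where a sound object is still sounding at the next onset."""
    return [(a, b) for a, b in zip(events, events[1:]) if a.onset + a.dur > b.onset]


def scripted(eids):
    """Chooser that follows a fixed list of edge IDs, to reproduce a given path."""
    it = iter(eids)

    def choose(edges):
        wanted = next(it)
        return next(e for e in edges if e.eid == wanted)
    return choose


def figure1_graph() -> Graph:
    g = Graph()
    g.add_vertex("woodLow", 0.7)  # vertex 1
    g.add_vertex("woodHi", 0.7)   # vertex 2
    g.add_edge(2, 1, 0.7)         # edge 1
    g.add_edge(2, 2, 1.2)         # edge 2 (loop on vertex 2)
    g.add_edge(1, 2, 0.5)         # edge 3
    g.add_edge(1, 2, 1.0)         # edge 4
    return g
```

Calling run_actant(figure1_graph(), 1, 5, scripted([4, 2, 1, 3])) follows the path described for Figure 1: vertices 1, 2, 2, 1, 2 at onsets 0, 1, 2.2, 2.9 and 3.4 s (up to floating-point rounding), and overlaps() reports only the pair joined by edge 3, where the 0.7 s vertex duration exceeds the 0.5 s edge duration. Several actants correspond to several independent calls whose event lists are merged by onset time.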
A composition is a set of sequences, like voices in polyphonic music: in a composition there are as many sequences as graph actants. The music generation process can be summarized as follows. Graph actants circulate on the graph: there are as many simultaneous sequences of sound objects as active graph actants. An example is provided in Figure 1. The graph (left) is defined by two vertices and four edges. The duration of both vertices is set to 0.7 seconds (this is not shown in the graph, only in the resulting sequence, Figure 1, right²). In Figure 1 (left), vertices are labeled with an identifier (“1”, “2”). Moreover, each vertex is given a string as optional information (“woodLow”, “woodHi”): this string can represent a meaningful property of the referred sound object (e.g. “wood” can stand for woodblock audio samples), to be used in sound synthesis (see later). A composition starts when an actant begins to navigate the graph, thus generating a sequence. Figure 1 (right) represents a sequence obtained by inserting a graph actant on vertex 1. The actant activates vertex 1 (“woodLow”), then travels along edge 4 and after 1 second reaches vertex 2 (“woodHi”), activates it, randomly chooses edge 2 among the available ones (edges 1 and 2), re-activates vertex 2 after 1.2 seconds (edge 2 is a loop), then chooses edge 1, and so on. While going from vertex 1 to vertex 2 by edge 3, the vertex duration (0.7) is greater than the edge duration (0.5), so the sound objects overlap.
¹ This happens when the vertex duration is longer than the chosen edge duration.
² Vertex coordinates are omitted too.

3. THE REAL-TIME COMPOSITION ENVIRONMENT
In the current implementation the challenge has been to use GeoGraphy as a real-time composition environment. In order to bring together these diverging issues, the SuperCollider application (SC) has been chosen among other possible candidates (e.g. ChucK [3], Impromptu [9]), as it features a high-level, object-oriented, interactive language together with a real-time, efficient audio server. The SuperCollider language summarizes features common to other general and audio-specific programming languages (e.g. Smalltalk and Csound, respectively), but at the same time allows complex GUIs to be generated programmatically ([12]). The whole system relies heavily on the Observer design pattern ([1]) for event handling. The pattern allows loose coupling between, in SC terms, a “model” (the observed) and its “dependants” (the observers). The dependency mechanism is fundamental in allowing maximum flexibility in the interaction with the system, as dependants, whatever their nature, can be interactively added or removed on the fly during a performance. Due to the interpreted, interactive nature of SC, the following discussion of the architecture, by introducing code examples, is at the same time a presentation of one of the possible ways of interacting with the system in a live situation. The overall architecture is represented in Figure 2: the components will be discussed progressively in the rest of the paper.

Figure 2: Overall architecture of the SC implementation.

The core of the system consists of the two classes Graph and Runner. The Graph is responsible for all (static) graph manipulations (adding/removing vertices/edges, etc.). A graph stores information about its structure in a dictionary associating a vertex ID with the vertex definition: this includes coordinates, vertex duration, label, options and also the definition of all the edges starting from it. The graph in Figure 1 can be created by the following code, where a Graph instance g is created and some elements (vertices, edges) are added:

g = Graph.new ;                                          // empty graph
g.addVertex(x: 10, y: 20, dur:0.7, label: "woodLow") ;  // vertex 1
g.addVertex(x: 10, y: 10, dur:0.7, label: "woodHi") ;   // vertex 2
g.addEdge(start:2, end:1, dur:0.7) ;  // edge 1 in Figure 1: 2 -> 1
g.addEdge(start:2, end:2, dur:1.2) ;  // edge 2: loop on vertex 2
g.addEdge(start:1, end:2, dur:0.5) ;  // edge 3: 1 -> 2
g.addEdge(start:1, end:2, dur:1) ;    // edge 4: 1 -> 2

The Runner is a dependant of a Graph instance, i.e. each time a graph changes, its runner is updated. A runner is an interface towards instances of the Actant class (representing the graph actants) and manages all the parallel real-time sequencing processes. An Actant instance is a wrapper for a routine which schedules events by traversing the graph and reading edge temporal information. The sequence depicted in Figure 1 can be generated by executing this code:

r = Runner.new(g) ;                        // runner observing graph g
r.addAndSetup(aStartingVertexID: 1) ;      // place an actant on vertex 1
r.start(aID:1) ;                           // start the actant

First, a Runner instance r is created; then, an actant is placed on vertex 1; finally the actant is started. By instantiating an actant, the Runner becomes its dependant. Thus, each time a vertex is activated, the Actant instance sends a message to the dependent Runner. The runner adds other information to the message and, in turn, forwards it to all its dependants. The whole message-passing process can be exemplified by discussing audio synthesis (Figure 3).
It is apparent that GeoGraphy is intended as a general sequencing system, as it does not make any assumption about sound objects, whose generation is delegated to an external component. Properly, it defines a mechanism to generate sequences of referred sound objects. Audio synthesis can be achieved by subclassing the GeoAudio abstract class (see Figure 2). GeoAudio internally handles the relations with the system, while each subclass,
to be provided by the user, requires the definition of three methods, respectively for initialization, synthesis definition, and mapping from the data passed by the messages sent by the Runner. By subclassing GeoAudio, the user can create his/her own audio module library. The user can instantiate (and remove) in real time as many audio instances as desired (from different GeoAudio subclasses), as each of them will be registered to a runner and react to specific messages, as in a plug-in mechanism (Figure 2). Each time an actant activates a vertex, a message is sent from the runner to the audio device, which reacts by spawning an audio event. The audio mapping is left to the user but typically relies heavily on the vertex string label. From a user-centered perspective, giving a name to a sound object (e.g. “woodLow” and “woodHi”) is much more meaningful than assigning a numerical identifier to a vertex.

Figure 3: Dependency mechanism and message dispatching for a GeoAudio subclass (“Sinusoider”).

The discussed architecture is crucial not only for decoupling audio synthesis from sequencing but, more generally, for allowing multiple approaches to real-time interaction within the GeoGraphy environment: user interaction, in fact, can be accomplished through a three-layer interface system.

4. THE THREE-LAYER INTERFACE SYSTEM
As the whole system is written in SC, the interaction can be accomplished by the same SC application, by using its textual interface to write and evaluate code as input, and to receive feedback from SC as output through its post window (Figure 4, no. 3). The previous section has, in fact, already demonstrated an interactive SC session with GeoGraphy. Also, the textual code interface is the most powerful way to control algorithmic processes over time, e.g. directly manipulating graphs or instantiating actants. As is common in live coding practice, code snippets can be evaluated in real time.
Even if a live coding approach can be extremely powerful, GUIs enhance real-time interaction by providing visualization and gestural control: as a consequence, GeoGraphy includes many GUI classes. All the GUIs are “throw-away user interfaces” ([8]), as they share the discussed dependency mechanism: thus they can be created/deleted in real time without affecting the system state. As an example (Figure 4), the Runner can be given a GUI representing all the actants: in Figure 4, no. 1, there are four active actants traversing the graph. For each of them (labeled through the ActID field), the GUI provides a button to start/stop the generating process, plus a slider for continuous control of amplitude. Another fundamental GUI is provided by the Painter class, which is responsible for the visual representation of the graph (Figure 4, no. 2): it draws the vertices with their IDs and strings, and the edges with their IDs and durations. The Painter also provides information on running actants by visualizing activated vertices in red. The Painter works not only as a viewer (which is crucial in letting the user explore the graph structure), but also as a controller: vertices' positions can be modified by dragging their corresponding GUI elements. As the coordinates of the activated vertex are sent by the Runner to its dependants (GeoAudio included), they can be mapped to audio parameters: if so, by dragging the vertices in the Painter's space, the user can modify the graph metrics (i.e. the vertices' coordinates) and thus control the synthesis gesturally. A different, mixed GUI/code approach is implemented in the visual interface provided by the Grapher class. Thanks to SC's operating system interfacing capabilities, the Grapher class can interface GeoGraphy with the GraphViz command-line utilities for graph drawing (see Figure 2, [2]). The Grapher class creates, while working in real time, a description of the graph in the dot language. This allows the user to explore the graph interactively with a specialized viewer such as ZGRViewer ([6]) or to render it to an image file. In this case, the result is Figure 1, left.
Even if mixed, code and GUI still remain two quite distinct interface layers. On the other hand, Taube has proposed for Common Music a three-layer interface system, based on “procedural, textual and gestural modes” ([10]). Common Music is written in Lisp, and the procedural mode indicates the control of the system directly by programming in Lisp. Gestural mode refers in Common Music to its dedicated GUI system Capella. Moreover, an intermediate layer between code and GUI is provided by means of a “textual” interface: a list of commands is defined that can be interpreted by Stella, a shell-like interface. In GeoGraphy, an analogous approach has led to the definition of the “iXno” scripting language (ichnos, Greek for “trace”). The main purpose of iXno is to allow a simplified control of GeoGraphy for real-time usage. iXno commands take the form c@ p1 ... pn, where c is a single-letter identifier for a command, @ can be replaced by + or -, and the following symbols represent command-specific parameters. The following line

e+ woodLow 0.4 woodHi 0.7 woodLow 1 woodHi 1.2 woodHi

contains the iXno command necessary to create the graph of Figure 1: e+ is used to create edges by specifying a series of (vertex, edge duration, vertex) triples which can be concatenated (as in the example). iXno code is interpreted by the GraphParser class (Figure 2) and translated into SC code. iXno is intended as a fast language, to be typed in real time: to overcome the slowness of typing, it favors terseness over clarity and is based on the principle “the less you type the faster you play”. Moreover, many parameters receive a preset value, and several commands can be concatenated on the same line. As iXno does not provide full access to the SC classes, its behavior can be described as “live scripting”, less powerful than SC but more immediate. The iXno language defines an intermediate layer between code and GUI and, not by chance, can be controlled by the user through two distinct interfaces. On one side, the commands can be written in a GUI text field associated with the GraphParser (Figure 4, no. 4). Here, so to speak, the user is placed at the GUI level. On the other side, for a more code-oriented interface, a special SC document class (GeoDocument) is available. GeoDocument provides a unified interactive code window interface for SC and iXno and benefits from all the text editing capabilities of the SC application (copy/paste, find, etc.).
To sum up, the three layers of the interface system are in practice extremely permeable. Real-time control fully benefits from the intermingling of the different possibilities. Interactive graph manipulation can mix iXno scripting with GUI control through the Painter space. GUI creation/deletion can be scheduled by SC programming. As iXno is defined on top of the SC GeoGraphy classes, it represents a simplified interface to SC classes which can be used directly in SC programming, by constructing strings and passing them to GraphParser instances. Through iXno, the readability of code snippets is greatly improved with respect to SC-only code. This is useful for live coding both in terms of typing speed and of code management.

Figure 4: GeoGraphy GUIs. 1. Runner GUI, 2. Painter, 3. SC Post Window, 4. GraphParser GUI.

5. CONCLUSIONS
The real-time, interactive implementation of the GeoGraphy system aims at merging different attitudes towards composition. Real-time usage has required the development of different interfacing systems: GUI, code and script. This has been possible through a strictly modular architecture, which favors the insertion of modules specialized for different tasks. The resulting three-layer structure has proven useful in allowing the user maximum control flexibility, providing a smooth transition between the two extremes of code typing and GUI. In this way, it is possible for the musician to merge features typical of “out-of-time” algorithmic composition with real-time control over performance.
GeoGraphy is distributed as a “quark”, a public SC extension (see http://quarks.sourceforge.net/). See also http://www.cirma.unito.it/andrea/notation/.

6. ACKNOWLEDGMENTS
Thanks are due to Vincenzo Lombardo for his continuous support.

7. REFERENCES
[1] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[2] E. Gansner, E. Koutsofios, and S. North. Drawing graphs with dot, 2006.
[3] G. Wang and P. Cook. ChucK: A concurrent, on-the-fly audio programming language. In Proceedings of the International Computer Music Conference (ICMC), Singapore, 2003.
[4] T. Magnusson. The ixiQuarks: Merging code and GUI in one creative space. In Proceedings of the International Computer Music Conference 2007, Copenhagen, August 27–31, 2007.
[5] C. Nilson. Live coding practice. In NIME ’07: Proceedings of the 7th International Conference on New Interfaces for Musical Expression, pages 112–117, New York, NY, USA, 2007. ACM.
[6] E. Pietriga. A toolkit for addressing HCI issues in visual language environments. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 00:145–152, 2005.
[7] C. Roads. The Computer Music Tutorial. The MIT Press, Cambridge, Mass., 1996.
[8] J. Rohrhuber, A. de Campo, R. Wieser, J.-K. van Kampen, E. Ho, and H. Hölzl. Purloined letters and distributed persons. In Music in the Global Village Conference, Budapest, December 2007.
[9] A. Sorensen. Impromptu: An interactive programming environment for composition and performance. In Proceedings of the Australasian Computer Music Conference 2005, pages 149–153. ACMA, 2005.
[10] H. Taube. An introduction to Common Music. Computer Music Journal, 21(1):29–34, 1997.
[11] A. Valle and V. Lombardo. A two-level method to control granular synthesis. In Proceedings of the XIV CIM 2003, pages 136–140, Firenze, 2003.
[12] S. Wilson, D. Cottle, and N. Collins, editors. The SuperCollider Book. The MIT Press, Cambridge, Mass., 2008.
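As a rough illustration of how the e+ line in Section 4 of the GeoGraphy paper could be expanded into edge definitions, the Python sketch below assumes that concatenated triples share their endpoint vertices, so "woodLow 0.4 woodHi 0.7 woodLow ..." reads as woodLow→woodHi, then woodHi→woodLow, and so on. It is not the GraphParser class (which translates iXno into SC code); the function names, the returned tuple format and the first-appearance vertex numbering are hypothetical, and only the e+ command is handled.

```python
from typing import Dict, List, Tuple

# (start vertex label, edge duration in seconds, end vertex label)
EdgeSpec = Tuple[str, float, str]


def parse_edge_command(line: str) -> List[EdgeSpec]:
    """Expand a hypothetical iXno-style 'e+' line into edge triples.

    Assumption: tokens alternate vertex, duration, vertex, duration, ...,
    and concatenated triples share endpoints, so each end vertex is also
    the start vertex of the next triple.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "e+":
        raise ValueError(f"not an e+ command: {line!r}")
    args = tokens[1:]
    # A valid chain has an odd number of tokens: n + 1 vertices, n durations.
    if len(args) < 3 or len(args) % 2 == 0:
        raise ValueError("expected: vertex duration vertex [duration vertex ...]")
    labels = args[0::2]
    durations = [float(tok) for tok in args[1::2]]
    return list(zip(labels[:-1], durations, labels[1:]))


def number_vertices(edges: List[EdgeSpec]) -> Tuple[Dict[str, int], List[Tuple[int, float, int]]]:
    """Give each label an ID in order of first appearance (an assumed convention)."""
    ids: Dict[str, int] = {}
    numbered = []
    for start, dur, end in edges:
        for label in (start, end):
            ids.setdefault(label, len(ids) + 1)
        numbered.append((ids[start], dur, ids[end]))
    return ids, numbered
```

For the example line, this yields four edges (woodLow→woodHi, woodHi→woodLow, woodLow→woodHi and the woodHi loop), i.e. the topology of Figure 1. Referring to vertices by label rather than by ID is what keeps the command short enough to type during a performance.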
Multi-Platform Development of Audiovisual and Kinetic

Installations
Iannis Zannos
Ionian University, Dept. of Audiovisual Arts
Plateia Tsirigoti 7
Kerkyra, 49100 Greece
+30 6977280656
zannos@gmail.com
Jean-Pierre Hébert
UCSB, Kavli Institute for Theoretical Physics
hebert@kitp.ucsb.edu
ABSTRACT
In this paper, we describe the development of multi-platform tools for audiovisual and kinetic installations. These involve the connection of three development environments, Python, SuperCollider and Processing, in order to drive kinetic art installations and to combine these with digital synthesis of sound and image in real time. By connecting these three platforms via the OSC protocol, we enable the real-time control of analog physical media (a device that draws figures on sand), sound synthesis and image synthesis. We worked on the development of algorithms for drawing figures and synthesizing images and sound on all three platforms and experimented with various mechanisms for coordinating synthesis and rendering in different media. Several problems were addressed: How to coordinate the timing between different platforms? What configuration to use: client-server (and who is the client, who the server?), equal partners, or mixed configurations? A library was developed in SuperCollider to enable the packaging of algorithms into modules with automatic generation of GUIs from specifications, and the saving of configurations of modules into session files as scripts in SuperCollider code. The application of this library as a framework for both driving graphic synthesis in Processing and receiving control data from it resulted in an environment for experimentation that is also being used successfully in teaching interactive audiovisual media.
Keywords
kinetic art, audiovisual installations, python, SuperCollider, Processing, algorithmic art, tools for multi-platform development

1. INTRODUCTION
1.1 Combining Tools to Span Different Media
The integration of different media both technically and aesthetically is one of the main challenges in art. This is especially true in art forms that involve different modes of expression and sensing, such as sound, still or moving image, still or moving sculpture, text, etc., on equal terms. Digital technology provides new and powerful tools for addressing this challenge. However, the tools and development environments available are rarely if ever capable of spanning several media with equal ability. Most tools are specialized in one medium, or are generic programming environments that must be extended through libraries or plug-ins to work with specific media. Moreover, when working with very specific and experimental technologies such as particular types of sensors or actuators, web-based environments, etc., it is hardly possible to integrate all aspects of the work in one programming tool. Thus, the ability to combine several different tools or environments becomes an important asset, if not a necessary condition, for integrating different media in works that address several modes of expression.
1.2 OSC and Communication between Applications
With the appearance of the Open Sound Control standard (OSC) [1] many applications have become able to communicate with each other. OSC has the advantage of being medium-neutral and easily configurable to meet the needs of each application independently of its specific internal mode of communication and control. Thus, OSC is now often used to connect an application to input devices as well as to output and actuator devices. Less common, however, is the interconnection of several applications of different types. Even though this way of working is becoming popular, it has hardly been treated as a research topic in itself. The present paper addresses precisely this issue. It is based on work done on three parallel tracks: the development of a general framework for resource management in SuperCollider, called "Lilt"; the application of this framework to connect SuperCollider as a sound synthesis engine with Python to provide sound for Jean-Pierre Hébert's art projects with sand; and finally the educational use of this framework in teaching the programming of interactive audiovisual applications.
1.3 Initial Work: Python and SuperCollider in the "Sand" Project
The present paper reports on work that started in 2004 as an experiment to add sound to a series of kinetic installations by Jean-Pierre Hébert, which draw figures on sand by means of a ball moved on a flat surface by a magnet. The magnet is controlled by a program written in Python which calculates the trajectory as a sequence of line segments of specified length and direction. The
objective was to derive the sound synthesis parameters in real-
time based on the data representing the position and trajectory of
Permission to make digital or hard copies of all or part of this work for the ball on the sand. The data were sent from Python to
personal or classroom use is granted without fee provided that copies are SuperCollider via OSC. To facilitate development and enable
not made or distributed for profit or commercial advantage and that experimentation with different ways of matching sound physical
copies bear this notice and the full citation on the first page. To copy to movement, we developed an instrument-orchestra-score model.
otherwise, or republish, to post on servers or to redistribute to lists, While such a paradigm is known from sound synthesis
environments such as Csound, the present implementation differs
261
from it in fundamental ways because it requires real-time

synthesis (or "inference") of the score from the parameters of the
ball movement.
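The paper states only that position and trajectory data were sent from Python to SuperCollider via OSC. The following Python sketch shows what one such message involves. It encodes an OSC 1.0 message by hand, using only the standard library. The address `/ball`, the argument layout (x, y, heading) and the function names are hypothetical; they are not the project's actual protocol. Port 57120 is the default port of the SuperCollider language (sclang).

```python
import socket
import struct


def _osc_string(s: str) -> bytes:
    """Encode an OSC-string: ASCII bytes, at least one NUL, padded to a multiple of 4."""
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def encode_osc_message(address: str, *args) -> bytes:
    """Encode an OSC 1.0 message with int32 ('i'), float32 ('f') and string ('s') arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, bool):  # bool is a subclass of int; not handled in this sketch
            raise TypeError("booleans are not supported in this sketch")
        if isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)  # big-endian 32-bit integer
        elif isinstance(arg, float):
            type_tags += "f"
            payload += struct.pack(">f", arg)  # big-endian 32-bit float
        elif isinstance(arg, str):
            type_tags += "s"
            payload += _osc_string(arg)
        else:
            raise TypeError(f"unsupported OSC argument type: {type(arg).__name__}")
    return _osc_string(address) + _osc_string(type_tags) + payload


def send_ball_state(x, y, heading, host="127.0.0.1", port=57120):
    """Send the ball's position and heading over UDP (hypothetical /ball address)."""
    message = encode_osc_message("/ball", float(x), float(y), float(heading))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

In practice a maintained OSC library such as python-osc would normally replace the hand encoding. The sketch only makes explicit the 4-byte alignment rule that every OSC string and argument follows.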
Figure 1: Example of a Sand Piece by Jean-Pierre Hébert
2. CONFIGURABILITY IN OPEN DEVELOPMENT ENVIRONMENTS
A basic motivation for the development of the library was the need to organize code so as to maximize reuse without limiting the programmer's access to any aspect of the system. The objective of the library was therefore not to require the programmer to master and use exclusively the API of the given tool, but rather to offer the option of wrapping any code in a construct that provides essential features of control and interconnection. The fundamental concept born to address this need was that of a "Script": a unit of code with uniform but configurable features.
2.1 The "Script" Concept
The script concept was born out of the need to create and manage a library of code snippets that realize ideas in SuperCollider. It provides a uniform interface for starting, stopping, controlling and interconnecting scripts.
Figure 2: The Script Browser
2.2 GUI Generation
Another feature of the script concept is the ability of a script to create its own control GUI from a list of simple specifications that determine the names and ranges of its parameters. To provide maximum flexibility, the default action of any GUI element generated by the script can be overridden by a user-defined function.
Figure 3: Basic GUI example
2.3 Connecting Scripts
Connecting scripts means making one input of a script read from the output of another script. For example, a script containing a synth f that adds reverberation may read its audio input from another synth s that produces an audio output. The reverberation effect of script f is thus added to the signal produced by synth s. A single script may simultaneously receive input from several other scripts, on one or several of its inputs. Similarly, a single script may send its output to one or more other scripts. In most cases, a script will have several inputs but only one output. The inputs of a script that runs a single synth are the synth's inputs, and the output of that script is the synth's output.
Implementing the dynamic interconnection of scripts proved to be a major task. A number of constraints and conditions at different levels must be met. Synths have to be able to start and stop independently of each other. They must be placed in the right order of computation in the synth graph of the server. They must also employ the right configuration of busses for writing and reading signals. Automatically computing the right configuration of busses was perhaps the most complicated part of the work. Figure 3 shows a case with two sources (w1 and w2) writing to two effect processes (r1 and r2), where w1 writes only to r1 and w2 writes to both r1 and r2. To make this work, it is necessary to copy the output signal of w2 to the bus from which r1 reads the separate output of w1.
Figure 3: Signal Copying For n-to-n Configurations
An algorithm was devised that can compute the necessary bus and copying-synth structures for any configuration of synth interconnections and realize it dynamically, even while the synths are running. Figure 4 shows one of several cases that were analyzed in the process of developing the algorithm.
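The paper does not publish the routing algorithm itself. The following Python sketch shows one possible strategy that satisfies the constraints listed above. It is an illustration, not Lilt's implementation. It relies on two properties of the SuperCollider server: signals written to the same bus with Out are summed, and nodes earlier in the execution order are computed first. Each reader gets its own input bus. A writer with a single reader writes straight into that bus. A writer with several readers writes to a private bus, and copy synths distribute that bus to each reader. Bus numbers are abstract indices and copy-synth names are hypothetical. The graphlib module requires Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter


def plan_routing(connections):
    """Plan buses, copy synths and execution order for (writer, reader) pairs.

    Returns (output_bus, input_bus, copies, order):
      output_bus[w] -- bus index that writer w writes to
      input_bus[r]  -- bus index that reader r reads from
      copies        -- list of (copy_name, from_bus, to_bus)
      order         -- node names in a valid execution order
    """
    readers_of = defaultdict(list)  # writer -> its readers, in connection order
    for writer, reader in connections:
        if reader not in readers_of[writer]:
            readers_of[writer].append(reader)

    bus_count = 0

    def new_bus():
        nonlocal bus_count
        bus_count += 1
        return bus_count - 1

    # Every reader gets its own input bus; everything written there is summed.
    input_bus = {}
    for readers in readers_of.values():
        for reader in readers:
            if reader not in input_bus:
                input_bus[reader] = new_bus()

    output_bus, copies = {}, []
    predecessors = {}  # node -> nodes that must be computed before it
    for writer, readers in readers_of.items():
        predecessors.setdefault(writer, set())
        if len(readers) == 1:
            # A single reader: write straight into that reader's input bus.
            output_bus[writer] = input_bus[readers[0]]
            predecessors.setdefault(readers[0], set()).add(writer)
        else:
            # Several readers: write to a private bus, then copy it to each reader.
            private_bus = new_bus()
            output_bus[writer] = private_bus
            for reader in readers:
                copy_name = f"copy_{writer}_to_{reader}"
                copies.append((copy_name, private_bus, input_bus[reader]))
                predecessors[copy_name] = {writer}
                predecessors.setdefault(reader, set()).add(copy_name)

    # Writers before copy synths before readers; raises CycleError on feedback loops.
    order = list(TopologicalSorter(predecessors).static_order())
    return output_bus, input_bus, copies, order
```

Take the Figure 3 case, where w1 feeds r1 and w2 feeds both r1 and r2. Here w1 writes directly to r1's bus, while w2's private bus is copied to the input buses of both r1 and r2. Routing w2 through its own private bus, rather than through r2's bus, keeps signals from other writers to r2 out of r1. A feedback loop in the connections makes graphlib raise CycleError. Reconfiguring synths that are already running is not covered by this sketch.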
Figure 4: Generalized n-to-n Configuration Case
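Section 2.2 describes GUIs generated from a list of simple specifications of parameter names and ranges, with default actions that user-defined functions can override. Lilt itself is written in SuperCollider and its API is not shown here. The following toolkit-independent Python sketch therefore only illustrates the idea. Each specification yields a slider callback that maps a normalized position onto the parameter range. The callback then either performs the default action or calls an override. `ParamSpec`, `build_controls` and the optional exponential warp are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class ParamSpec:
    """Hypothetical specification of one control: a name and a value range."""
    name: str
    lo: float
    hi: float
    warp: str = "lin"  # "lin" or "exp"; "exp" assumes lo and hi are both > 0

    def value_at(self, pos: float) -> float:
        """Map a normalized slider position (clamped to [0, 1]) onto the range."""
        pos = min(max(pos, 0.0), 1.0)
        if self.warp == "exp":
            return self.lo * (self.hi / self.lo) ** pos
        return self.lo + (self.hi - self.lo) * pos


def build_controls(
    specs: Iterable[ParamSpec],
    set_param: Callable[[str, float], None],
    overrides: Optional[Dict[str, Callable[[float], None]]] = None,
) -> Dict[str, Callable[[float], float]]:
    """Create one slider callback per spec, keyed by parameter name.

    A callback maps the slider position to a value and, by default, passes it to
    set_param(name, value); a function registered in `overrides` under the same
    name replaces that default action. The callback returns the mapped value.
    """
    overrides = overrides or {}
    controls: Dict[str, Callable[[float], float]] = {}
    for spec in specs:
        def on_slider(pos: float, spec: ParamSpec = spec) -> float:
            value = spec.value_at(pos)
            custom_action = overrides.get(spec.name)
            if custom_action is not None:
                custom_action(value)
            else:
                set_param(spec.name, value)
            return value
        controls[spec.name] = on_slider
    return controls
```

Connecting the returned callbacks to actual slider widgets is left to whatever GUI toolkit is in use. The default argument `spec=spec` binds each callback to its own specification inside the loop.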
Figure 5: Inputs and Outputs in a Script GUI
2.3.1 Extensions of the Interconnection Scheme
Interconnections are not limited to audio signals but can also be created for control signals. Additionally, there is a similar scheme for linking scripts so that they can exchange messages or function calls. This is implemented by attaching editable pieces of code, called "snippets", to scripts, which can be used to further control or automate the script's behavior.
3. RESOURCE MANAGEMENT
A characteristic difference between experimental, programmable development environments and commercial tools for image or sound processing is the relative lack of management facilities in the former. Applications such as Final Cut Pro, DVD Studio, Logic Audio and Cubase use their own file formats for saving "project data", which include settings such as the paths of the audio files used and processing data on the files. One of the objectives of the present work is to provide such resource management facilities to SuperCollider. The usefulness of such facilities is easy to demonstrate. When experimenting with several scripts that require synthesis algorithms, buffers and bus interconnections, it is convenient to be able to save the configuration of scripts, buffers, synthesis algorithms and interconnections to a file with a single mouse click. This is implemented in Lilt by the concept of a Session. A session saves all of the above data as a Script that can recreate the session's elements. The Script is generated as SuperCollider code and can therefore be inspected by the user.
Figure 6: The Sessions Pane
Figure 7: The Resource Pane Window
4. APPLICATION EXAMPLE: AN AUDIOVISUAL SEQUENCE
The tools described above are currently being evaluated for application in mixed media for artistic production and for education. Figures 8a and 8b show the results of work done by two students, Alexandros Synodinos and Christos Mousas, at the Department of Audiovisual Arts at the Ionian University as part of their 4th-year undergraduate coursework. These students had no prior experience in programming at all.
Figure 8a: Initial Sections of an Algorithmic Audiovisual Piece
Figure 8b: Further Stages of an Algorithmic Audiovisual Piece
The examples of Figures 8a and 8b show several phases in the unfolding of an algorithmically composed audiovisual piece running on Processing and SuperCollider. They show how the students created a work with several distinct sections, starting from a given initial spiral and adding their own variations with increasing freedom. The starting point was an example provided by Jean-Pierre Hébert. This was first modified radically to reduce it to its basic functioning principles. Then an interface to SuperCollider was provided using the Lilt library. Two versions were prepared. In the first, sound synthesis in SuperCollider is driven from Processing; conversely, in the second, SuperCollider drives graphic synthesis in Processing. The second approach has the advantage that timing can be controlled accurately and independently of the frame rate of the draw function in Processing.
5. CONCLUSION
In this paper, we presented a framework for mixed-media interactive installations that run distributed across three development environments: SuperCollider, Python and Processing. The central part of the framework is the Lilt library, written in SuperCollider, which enables the modularization and reuse of code, the easy configuration and interconnection of modules, and the saving of configurations as scripts in SuperCollider code. We showed applications of this environment in both an artistic and an educational setting. The initial stages of work on this project were hard because the design solutions were not yet mature, but more recent results are encouraging. Besides the undergraduate work shown here, several graduate projects employ Lilt for multimedia work in connection with Max/MSP and Jitter as well as vvvv (see http://vvvv.org).
Certainly, the graphic elements shown in the present example are simple and recall early phases in the development of the Logo environment for programmable graphics [3]. However, there is a big difference here: both timing and sound are involved, and further independent tools can be connected to the framework via OSC. The advantage of the present approach is that it supports the combination of software specialized in different domains, thereby helping to exploit the full potential of these applications in work that involves several different media.
6. ACKNOWLEDGMENTS
Thanks are due to 4th-year undergraduate students Alexandros Synodinos and Christos Mousas for providing the visual examples in this paper.
7. REFERENCES
[1] Wright, M. and Freed, A. Open Sound Control: A New Protocol for Communicating with Sound Synthesizers. Proceedings of the 1997 International Computer Music Conference, Thessaloniki, Hellas (Greece), 1997, 101-104.
[2] Alvaro, J., Miranda, E. and Barros, B. EV Ontology: Multilevel Knowledge Representation and Programming. Proceedings of the 10th Brazilian Symposium on Computer Music (SBCM), Belo Horizonte (Brazil), 2005.
[3] Papert, S. Mindstorms: Children, Computers, and Powerful Ideas. Basic Books, N.Y., 1980.
Performer Model: Towards a Framework for Interactive Performance Based on Perceived Intention
Greg Corness
Simon Fraser University
1345 King Edward Highway
Surrey B.C. Canada
011 (604) 505-1108
gcorness@sfu.ca
ABSTRACT
Through the development of tools for analyzing the performer’s sonic and movement-based gestures, research into system-performer interaction has focused on the computer’s ability to respond to the performer. Such work shows interest within the community in developing an interaction paradigm modeled on the player. However, by focusing on the perception and reasoning of the system, this research assumes that the performer’s manner of interaction agrees with this computational model. My study presents an alternative model of interaction, designed for improvisatory performance and centered on the perception of the performer as understood through theories taken from performance practice and cognitive science.
Keywords
Interactive performance, Perception, HCI
1. INTRODUCTION
For the past two decades, composers have been designing interactive music systems that are often viewed as new musical instruments or as emulations of a player or conductor [10][3]. As processor speeds increase, the systems being designed not only produce more complex sounds but also generate responses and analyze performers’ gestures with increasing sophistication. In conjunction with these developments, increasing amounts of intelligence and autonomy are being built into systems for use in a variety of performance situations, including improvisation. But as the autonomy of these systems increases, it may be necessary to reconsider the models used for designing the interaction. Research into performer-system interaction has focused largely on the computer’s ability to respond. As composers explore giving agency to the computer, the performer is being required to be responsive. This study addresses a number of issues that lead towards constructing a framework for developing a performer-based model for improvisatory interaction.
2. TRADITIONAL MODELS
In the early years of interactive music, Robert Rowe proposed a distinction between an instrument paradigm and a player paradigm as one axis along which different interactive systems could be placed [7]. Rowe suggests, “Instrument paradigm systems are often more concerned with timbral generation, while the player paradigm requires the use of some meta-compositional generation method to produce musical output” [10]. His taxonomy focuses mainly on the process used to generate the system’s response. Even so, it implies a fundamental difference in the manner of interaction. The instrument paradigm suggests devices used for direct control of synthesis and low-level parameters (pitch, volume, on/off), while the player paradigm generally involves sensors that allow larger performative gestures to be mapped to global parameters.
Improvisational music systems often implement elaborate sensors and algorithms for analyzing the physical and sonic gestures of the performer [7][14][11]. The assumption underlying this approach is that much of the communication between performers, and in particular musicians, takes place through the context and syntax of their sonic response. This argument is not wholly untrue and has produced some very accomplished systems. However, musicians tend to play within what might be termed social contact with each other. By this term I refer to communication modes, such as eyesight, that are separate from the act of playing. I also intend to draw attention to the social aspects of music that give it common ground with other performance disciplines.
Other interactive music projects have expanded the mode of communication to explore other cues, such as visual movement cues [14], acoustic variation [7][14] and multisensory multimedia [3], often in the context of interdisciplinary performance. Exploration of multimedia and interdisciplinary interaction has found that system-performer communication cannot rely on the syntax of a particular performance domain but must instead be expanded to general expressive gestures.
This research into performer-system communication shows an interest in developing a player paradigm for interaction. However, the research has focused on the perception of the system, thus ignoring aspects of human communication. This study presents an alternative model of interaction appropriate for improvisatory performance. It examines theories taken from performance practice and cognitive science in order to focus on the performer’s ability to perceive intention.
3. BACKGROUND THEORY
Communication in performance is an inter-subjective phenomenon in which understanding is agreed upon by the agents involved in the moment. As Lockford and Pelias explain:
“Even when faced with the challenge to perform in an unscripted moment, performers understand that they are engaged in an ongoing communicative exchange. This exchange is a process best conceived, not as an act of information transmission or shared understanding, but as communication scholar H. L. Goodall, Jr. would have it, as an act of ‘boundary negotiation’.” [8, p433]
Here “boundary negotiation” refers to the process by which the self of the performer is incrementally built within the context of the performance. In a theatrical sense, this is the build-up of character as new information is revealed in the scene. In a musical sense, the negotiation is between soloist and accompanist over the harmonic extensions and rhythms that occur during a particular solo. Such a negotiation implies that each agent must be able to respond to new information while simultaneously presenting information that contributes to the self of the other agents. Negotiation in these terms is a coordination of the interaction between agents [8]. It becomes imperative that all agents are able to negotiate the coordination of their intentions and are therefore able to track the intentions of the others.
The importance of the agent’s ability to track intention can be seen clearly when considering the notion of trust. The agents constitute themselves and each other through the negotiation of boundaries [8], so this inter-subjective communication requires a sense of trust. For performers to be open to constituting their performance identity anew in negotiation with others on the stage, they must trust the environment. Furthermore, a sense of support is established when their actions both affect and support other agents. Again, this support comes from trust in the inter-subjective understanding of the moment. This understanding keeps the ensemble synchronized, but it requires that all the performer-agents are able to track the intentions of the others. Therefore, it becomes imperative that all agents be able to project their own intentions.
3.2 Agencies and State Knowledge
Bogart and Landau coach students of improvisation to “trust in letting something occur onstage, rather than making it occur” [1]. Their statement applies to both sonic and physical gestures. It does not mean that nothing should be started, but rather that one should avoid forcing a start. We might call this an additive approach, where additive suggests that agency is added to the state of the system, whether it is in a steady state or a dynamic state. The implications of this view can be seen when considering the response of the performer rather than that of the system. To trust in something that will happen is to coordinate one’s actions, adding to the action of the system. This cannot be done in response. The improviser must move beyond the cognitive and trust in the intuitive [8].
3.3 Intuition and Intention
Research in the field of neuroscience has recently suggested links between intuition and intention. Neurons found in pre-motor areas of the brain have been shown to fire not only when the subject produces a sound or action, but also when the subject hears the sound or observes others performing the action [5][9][6]. The firing of these neurons allows the subject to predict the outcome of their own actions as well as the actions of others. “This implicit, automatic, and unconscious process of motor simulation enables the observer to use his/her own resources to penetrate the world of the other without the need of theorizing about it” [4]. What is crucial to this phenomenon is that the observed action must be goal-oriented; that is, it must have intention [5][9][6].
However, there is some question as to the usefulness of mirror neurons in human-computer interaction. Findings to date on perceiving intention in others suggest that this ability diminishes as physical similarity with the other decreases. A human subject perceives the intentions of other humans, those of apes less so, those of other animals only slightly, and those of machines not at all [4][5]. The prevalent reason given for this distinction is a perceived similarity of motion [5]. It is therefore unclear whether a system’s response actions, even if accurately modeled on human action, would affect the pre-cognitive process of a subject.
Still, the presence of this pre-cognitive function implies that the human cognitive system as a whole works in connection with this mechanism. Even at a cognitive level, interaction is governed by the prediction of events as much as, or more than, by reaction to events. This interpretation is supported by the theories on improvisational performance presented above. These findings suggest that as social beings we have developed the ability to intuitively predict the actions and sounds of those around us.
The idea that human action and intention originate before the act itself has been shown in other experiments as well. In his book “The Illusion of Conscious Will”, Wegner presents the work of Kornhuber and Deecke (1965) as well as Libet (1983). These researchers measured a rise in brain activity up to 800 ms before an action took place. In Libet’s experiments, brain activity was recorded over 300 ms before the subject was even aware of wanting to act [13].
These findings further indicate that humans do not live in a static present moment but rather in a moment becoming the next. Our social engagements are informed by an embodied empathy that allows minor predictions about those around us. We react not in the moment but in the next moment, over half a second late.
4. PERFORMER MODEL
The theories presented give an understanding of the role of the perception of intention in human interaction. Based on these theories, I suggest that a framework for interaction between autonomous agents should address:
1) The need to negotiate boundaries and build trust with others.
2) The development of an inter-subjective understanding of the moment.
3) The need to feel supported through one’s agency and acceptance in the environment.
I propose that these criteria may be addressed by incorporating into the system a mechanism that allows the performer to perceive the system’s intention. Therefore, I have started a series of studies looking at the experience of the performer working in a system designed to project its intention.
5. SYSTEM DESIGN
The system used to conduct the study took two forms, visual and sonic. Both systems were constructed through an iterative design
process using a first-person methodology. The study focused on methods for modeling an embodied projection of the system’s intention. To keep that focus, gestures in both systems were generated with simple random processes. These processes avoided any signifiers that might come from structure or syntax and allowed the system to enact its own “intention” with no sense of the performer. The response paradigms chosen for both test systems were informed by human response and perception behaviors but were not meant to mimic them. Finally, the research was set up as a study of the experience of a subject afforded the ability to move with the system. No expectation of creation or performance was imposed.
5.1 Visual System
The response gestures in the visual system were realized using an image of two concentric circles generated in MAX/Jitter. This image was projected onto the floor of the performance space using an I-CUE DMX-controllable mirror. The behavior of the system was set so that the inside circle needed to move off center for the entire image to move in the space. Stopping required the circle to return to the center. The direction and amount by which the circle moved off center corresponded to the direction and speed at which the image was about to move. The time required for the inner circle to reach its maximum point was set at 200 ms, in line with the research presented by Wegner. The movement of the light object was constrained using a dynamic weighted random algorithm: the probability of the light moving in any direction was a function of its position in the space.
5.2 Sonic System
The sonic version of the study was modeled on the common idea that breath can be used to synchronize a group. The system used a physical model of a flute constructed in the PeRColate synthesis library for MAX/MSP [12]. Each session explored different approaches to perceiving information embedded in different parts of the breath sound. The information was embedded by manipulating the parameters of the flute model to obtain breath sounds of different qualities before and after the tone. The timings of these different breath qualities in each session were functions of the generated gesture’s length, density and speed.
6. QUALITATIVE DATA
6.1 Visual System
I spent a number of sessions working in the system to feel the experience of being in the space with it. As might be expected, it was easy to anthropomorphize the light. I perceived its motion as a nervous, exploring intention, even though I knew the movements were random. Still, it quickly became apparent that the system had no sense of my presence. This had been part of the design; however, it was interesting to note how easily I perceived the design as experience. Furthermore, this perception profoundly changed the quality of the interaction. The intended design model of tag became one of playing in ocean waves or taunting a blindfolded partner. My perception of the system’s movement intention, stalking and lunging with no focus on me, inspired a sense of teasing. I noticed myself considering which way the system was “thinking of moving” and circling to the other side, just out of “reach”. The random process used for starting and stopping also produced occasional motions perceived as “fakes”, in which the Light Actor moved its “weight” in one direction, then immediately moved it back to a center position. This emergent behavior was of special interest. The perception that I could tell where it was “thinking” of moving encouraged me to get close, but the impression that it could “change its mind” kept up my interest in the engagement.
6.1.1 Test with non-projecting system
Some time was spent comparing the system with and without the center circle active. Without the center active, I noticed I was not inspired to get close to the light, and my willingness to engage with the system was shorter-lived. Similarly, I noticed that when the response behavior was tuned to give fewer fakes, the movements became easier to predict, but the interaction became less engaging in the context of a tag paradigm.
6.1.2 Moving with the light
During a second session I focused on moving with the light rather than avoiding it. At first I changed only my own behavior; the system’s behavior pattern remained the same as before. However, I found this interaction very unsatisfying. Although I could tell where the light was going, I had very little time to coordinate my own movements. The interaction quickly became a dodging rather than a moving with.
The behavior settings of the system were then changed to generate movements that tended to be longer, with fewer “fake” motions. These changes were modeled after mirroring exercises in which human partners try to mimic each other’s motion without a sense of leading. In these exercises, fluid, often slow, predictable motions are emphasized. With the system’s behavior modeled on mirror exercises, I found the interaction with the light more of a moving-with experience. However, the quality of my movement remained at a “proof of concept” level. The interaction did not inspire flow or exploration in my own movement.
6.1.3 Shape
As a final note, I noticed that the circle-inside-a-circle design had more the top-down look of a joystick than of a human. I tried giving it a more human shape by using ovals rather than circles, but found the oval shape less engaging than the circles. This can be explained by the fact that an oval implies a direction, and the system was not programmed to take the direction of the image into account. Even so, my experience suggests that the circle configuration, though endowed with behavioral characteristics, remained a spot of light. My perception of the object combined “lightness” with behavior and did not need to construct a new humanoid entity.
6.2 Sonic System
The audio-based system had a different initial impact. The visual system had inspired an avoidance response and produced a moving-with response only after being remodeled. By contrast, the breath model in the audio-based system immediately inspired a moving-with response. The randomness of the gestures had less of an effect, perhaps because the sonic system produced no fake gestures. The breath sound in the first session was linked to the duration of the generated phrase and produced a feeling of lift into the tone of the sound. This feeling of lift encouraged my motion with the onset of the sound even though I had no knowledge of when it would happen.
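Section 5.1 describes the visual system's behavior only in outline. The inner circle's offset announces the direction and speed of the coming move and reaches its maximum in 200 ms. A dynamic weighted random choice makes the direction depend on the light's position. The original was built in MAX/Jitter. The following Python sketch is one possible reading of that outline. The eight candidate directions, the inward bias toward the center of the space, the speed range, the offset gain and the explicit fake probability are all assumptions. In the study, the “fakes” emerged from the random start/stop process rather than from a dedicated parameter.

```python
import math
import random
from typing import List, Tuple

Vec = Tuple[float, float]

ANTICIPATION_S = 0.2  # seconds for the inner circle to reach its maximum offset (Section 5.1)

# Hypothetical set of candidate movement directions: eight unit vectors.
DIRECTIONS: List[Vec] = [
    (math.cos(k * math.pi / 4), math.sin(k * math.pi / 4)) for k in range(8)
]


def direction_weights(pos: Vec, half_size: float, bias: float = 2.0) -> List[float]:
    """Weight each candidate direction by how far it points back toward the center.

    pos is measured from the center of the space and half_size is the distance
    from the center to the edge, so inward directions gain weight near the edges.
    """
    dist = math.hypot(pos[0], pos[1])
    if dist == 0.0:
        return [1.0] * len(DIRECTIONS)
    inward = (-pos[0] / dist, -pos[1] / dist)
    edge_factor = min(dist / half_size, 1.0)
    return [
        max(0.05, 1.0 + bias * edge_factor * (dx * inward[0] + dy * inward[1]))
        for dx, dy in DIRECTIONS
    ]


def plan_gesture(pos: Vec, half_size: float, rng: random.Random,
                 max_speed: float = 1.0, fake_prob: float = 0.2):
    """Pick the next gesture: a direction, a speed, and whether it is a "fake"."""
    direction = rng.choices(DIRECTIONS, weights=direction_weights(pos, half_size))[0]
    speed = rng.uniform(0.2, 1.0) * max_speed
    is_fake = rng.random() < fake_prob  # a fake returns to center without moving
    return direction, speed, is_fake


def inner_offset(direction: Vec, speed: float, t: float, gain: float = 0.1) -> Vec:
    """Inner-circle offset t seconds after a gesture is planned.

    The offset grows linearly over ANTICIPATION_S, so its direction and size
    announce the direction and speed of the coming move before the image moves.
    """
    ramp = min(max(t / ANTICIPATION_S, 0.0), 1.0)
    scale = speed * gain * ramp
    return (direction[0] * scale, direction[1] * scale)
```

When a gesture is a fake, a caller would ramp the offset back to zero instead of moving the image. Lowering `fake_prob` corresponds roughly to the retuning toward fewer fakes reported in Sections 6.1.1 and 6.1.2.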
Through reflecting on my response, I noticed two parts to the breath generated by the physical model: the inhale and the focusing of the air stream. I was lifting on the inhale but moving on the focusing change of breath just before the flute tone. This discovery inspired a series of sessions exploring the breaking of the breath sound into three parts: inhale, focused airstream, and breath trail-off. A breath into a beat is often used to signal a downbeat, and more air is needed to play longer phrases. On that basis, I mapped inhale duration to tempo and inhale volume to phrase duration. This mapping frequently allowed me to anticipate the tempo of the phrase and move with it, but only within a small range of values. However, when inhale duration was a function of phrase length, I found that I moved with the sound without much thought. The mapping of duration to tempo induced in me a more rational approach to moving.
7. DISCUSSION
The literature and theories presented in this paper suggest that human interaction is not restricted to reacting to enacted events. Instead, as social beings, our interactions include the understanding and prediction of events through the perception of the intentions of others in the environment. From these theories, I have suggested a framework for interaction, modeled around the abilities and needs of a performer pertaining to the perception of intention. The crucial point is that all agents in the environment need to be able to perceive the intentions of the other agents. The framework that I am constructing overlaps with the “Player” paradigm of interaction first suggested by Rowe, in that agency is being given to the system. However, the proposed framework differs from Rowe’s paradigm by focusing on interaction through the perception of intention rather than through a process for responding.
In order to demonstrate the implications of this approach in the context of both sonic and physical interaction, I have discussed two example systems: a visual system and a sonic system. Both systems were designed around the claim that the performer needs to be able to perceive the intention of the system in anticipation of any action. The results of my studio work indicate that both visual and sonic systems provide the opportunity to embed information in the system’s response media, projecting the system’s general intention. Analysis of the results further indicates that the two systems share many of the same issues. Trying to predict the action of the system imposed a cognitive load on the performer. This load emerged as an issue when analytical models were used to indicate intentions. These models were most prevalent in the sonic system, yet a similar effect was observed in the visual system. Both systems indicated an experiential difference between “natural” and analytical interactions; however, the parameters separating these qualities have not been isolated. What was made clear by the studio work was that the manner in which the system
[...]
alignment of both the performer’s and the system’s intentions for a more unified and balanced interaction.
9. REFERENCES
[1] Bogart, A., and Landau, T. The Viewpoints Book: A Practical Guide to Viewpoints and Composition. New York: Theatre Communications Group, 2005.
[2] Camurri, A., and Ferrentino, P. Interactive Environments for Music and Multimedia. Multimedia Systems 7 (1999), 32-47.
[3] Camurri, A., et al. The MEGA Project: Analysis and Synthesis of Multisensory Expressive Gesture in Performing Art Applications. Journal of New Music Research 34:1, 5-21.
[4] Gallese, V. The “Shared Manifold” Hypothesis: From Mirror Neurons to Empathy. Journal of Consciousness Studies 8:5-7 (2001), 33-50.
[5] Gallese, V. The Intentional Attunement Hypothesis: The Mirror Neuron System and Its Role in Interpersonal Relations. Biomimetic Neural Learning (2005), 19-30.
[6] Iacoboni, M., et al. Grasping the Intention of Others with One’s Own Mirror Neuron System. PLoS Biology 3:3 (2005), 529-35.
[7] Lewis, G. E. Interacting with Latter-Day Musical Automata. Contemporary Music Review 18:3 (1999), 99-112.
[8] Lockford, L., and Pelias, R. Bodily Poeticizing in Theatrical Improvisation: A Typology of Performative Knowledge. Theatre Topics 14:2 (2004), 431-43.
[9] Kohler, E., et al. Hearing Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science 297 (2002), 846-8.
[10] Rowe, R. Incrementally Improving Interactive Music Systems. Contemporary Music Review 13:2 (1996), 47-62.
[11] Thom, B. Artificial Intelligence and Real-Time Interactive Improvisation. Proceedings of the Seventeenth Conference on Artificial Intelligence, Austin, Texas, August 2000.
[12] Trueman, D., and DuBois, R. L. PeRColate: A Collection of Synthesis, Signal Processing, and Video Objects for MAX/MSP/Nato. V 1.0b3
expressed its intention did not need to be “true”, modeled on a [13] Wegner, D. M., The Illusion of the Conscious Will,
human gesture. However, there is some indication that a stronger Cambridge MA. USA: MIT Press (2002)
reference to signifiers that are already part of the performer’s
body knowledge reduced their need to rationally analyze the [14] Weinberg, G., and Driscoll, S., The Perceptual Robotic
intention of the system. Of prime importance was the observation Percussionist- New Developments in Form, Mechanics,
that a feeling of trust and sharing of space was created in the Perception and Interaction Design 2nd ACM/IEEE
system projecting its intention that was not present in the International Conference on Human-Robot Interaction
response only system. With more investigation it is hoped that a Washington DC, USA, March 9-11 (2007)
system may be developed, that enables the integration and
268

A Directable Performance Rendering System: Itopul
∗
Mitsuyo Hashida, Yosuke Ito, Haruhiro Katayose
School of Science & Technology, Kwansei Gakuin University, Sanda, 669-1337 JAPAN
hashida@kwansei.ac.jp, katayose@kwansei.ac.jp
ABSTRACT
One of the advantages of case-based systems is that they can generate expressions even if the user does not know how the system applies expression rules. However, such systems cannot avoid the problem of data sparseness, and they do not permit a user to directly improve the expression of a particular part of a melody. After discussing the functions required of a user-oriented interface for performance rendering systems, this paper proposes a directable case-based performance rendering system called Itopul. Itopul is characterized by 1) a combination of the phrasing model and the pulse model, 2) the use of a hierarchical music structure to avoid the data sparseness problem, 3) visualization of the processing progress, and 4) music structures that the user can modify directly.

Keywords
Performance Rendering, User Interface, Case-based Approach

1. INTRODUCTION
Music production has been increasing with the popularity of Web 2.0 services. Performance rendering is seen as a way to meet the needs of the new forms of production that these developments support. A performance rendering system renders a given piece of music expressively by changing its dynamics, rhythm, and tempo, as a human virtuoso would.
Research on performance rendering systems dates back to the 1980s [2]. Since the 1990s, approaches involving music cognition theories such as the generative theory of tonal music [7] and the implication-realization model [8], learning systems, and example-based reasoning [6, 10] have been proposed. In addition, a listening competition for system-rendered performances called Rencon¹ has been held since 2002 [5]. Moreover, a great deal of commercial software for desktop music and digital audio workstations has been released.
Automated performance rendering systems are classified into rule-based and case-based systems. Many commercial music software systems adopt a rule-based approach, which enables a user to generate expressions relatively easily. However, these systems have the problem that their expression rules are limited.
Meanwhile, one of the advantages of case-based systems is that they can generate expressions even if the user does not know how the system applies expression rules. However, such systems cannot avoid the problem of data sparseness. Furthermore, rule-based and case-based systems share a common problem: the design of an interface for user-oriented performance rendering.
We discuss the functions required for such a user-oriented interface and propose a directable case-based system for performance rendering.

2. INTERFACE OF A CASE-BASED PERFORMANCE RENDERING SYSTEM
One of the strengths of a case-based performance rendering system is that it enables an inexperienced listener to give a piece of music an expression with little manual operation. However, a system that automates every performance rendering function may seem inconvenient to the listener. An interface with enough directability to reflect a user's operations directly and immediately would solve these problems. Devising one requires the following four kinds of support.

2.1 Supporting Searches of Referred Cases
A case-based performance rendering system must have many pieces to use as cases for giving an expression. This requires a large-scale database and a search engine. Furthermore, the pieces in the database need to be annotated to improve the usability of the system's search.
As an approach to annotation, Suzuki et al. use a variable for the playing situation in their system “Kagurame” [9]. They try to deal with the data-sparseness problem by analyzing the similarities of the extracted melody candidates to the target melody and by using them as weights to transfer the features of the expression.

2.2 Enabling the User to Choose and Control the Rendering Procedure
The usual function of a case-based performance rendering system is the automation of every rendering procedure, which hides the activity from the user.
Some rule-based rendering systems such as SuperConductor [1], Finale² and jPop-E [4] enable a user to indicate the region of the score to which rules apply and the values of the rule parameters. Such functions increase the variety of expressions of the target piece. To control expression, case-based systems need an interface framework that enables a user to

∗ Currently, Hewlett-Packard Development Company, L.P.
¹ http://www.renconmusic.org/
² http://www.cameo.co.jp/finale/

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008 Copyright remains with the author(s).
277
[Figure 1: Overview of Itopul. The user inputs the target score (Structure Analysis), indicates phrases (melody fragments) and optionally modifies the structure, chooses a phrase (Similarity Search over the example cases, which may be grouped into super-class models such as waltzes, sonatas, or Chopin), indicates the strategy for copying expression parameters (1. usage of the referred melody fragment, 2. usage of the parameters, 3. use of super-class parameters) and, for 2 and 3, a threshold, optionally edits an approximate melody line, and obtains the performance (Expression Copy).]

[Figure 2: Examples extracted by threshold (threshold = 1.0, 0.5, 0.0). The user can listen to and measure the referred melody fragments on the list.]
indicate the criterion of the similarity search and the expression transfer procedures.

2.3 Showing the User the Situation and Progression of Each Procedure
As noted above, because they do not show the result of each rendering procedure, present case-based systems do not provide enough feedback to the user. In jPop-E [4], the function that shows the user the resulting expression of the target phrases and the rule parameters works in real time. This function is useful for design support and education. The interfaces of case-based systems must also show the progress of each procedure.

2.4 Seamless Function to Modify the Rendered Expression
The present performance rendering systems developed by academic researchers use automatic rendering and do not support subsequent editing of the details of the expression. These systems have superior automatic rendering compared with commercial DTM systems, whereas DTM systems, with which users elaborate expressions by hand, achieve higher quality and productivity in their expressions than the automatic systems. Automatic systems should have editing functions like those of a DTM system and a seamless combination of expression interfaces.

3. DESIGN OF DIRECTABLE CASE-BASED PERFORMANCE RENDERING SYSTEM
We developed a case-based performance rendering system, Itopul, based on the considerations stated above, as shown in Figure 1. Itopul runs in a Java environment.
Itopul is characterized by 1) a combination of the phrasing model and the pulse model, 2) the use of a hierarchical music structure to avoid the data sparseness problem, 3) visualization of the processing progress, and 4) music structures that the user can modify directly. In this section, we describe the design of Itopul, focusing on the search support for reference cases and on the user's choice and control of processing.

3.1 Architecture of the Phrasing Model and the Pulse Model
The fundamental function of Itopul is to copy the characteristics of a performance expression in existing music samples. There are two possible methods to do this: one is to copy the tempo and dynamics of a sequential melody line directly, and the other is to analyze (decompose) the tempo and dynamics hierarchically along the melody structure and then recompose them. Itopul takes the latter approach in order to emphasize the expression of each phrase. This approach requires a hierarchical description of the performance expression.
A popular method to analyze music structure hierarchically is Clynes' pulse model, on which SuperConductor [1] is based. The pulse model focuses on the analysis of metrical structure (strong beats, weak beats). Clynes analyzed the characteristics of musical expression by composer, based on the idea that these characteristics appear in the balance of tempo and dynamics in each metrical structure. The pulse model is very useful, but it is difficult to apply to phrasing expression, and SuperConductor does not support transferring the features of a given expression.
In Itopul, the performance parameters are expressed as the shape of line segments given to each phrase, which is divided into two subphrases. This approach can deal with both the metrical structure and the phrase expression within the same framework.

3.2 Hierarchical Performance to Avoid the Sparseness Problem and Suggesting the Rendering Process to the User
We think Itopul should copy the characteristics of the performance examples that the user indicates as faithfully as possible. However, these examples may not always apply to all melody fragments of the target score. Some examples may have low similarity, so is it appropriate to apply such rare examples to the target score? Itopul provides a function to listen to the generated performance while showing the extracted performances whose similarity exceeds the threshold the user indicates (see Figure 2).
The user can choose the strategy for copying the expression parameters (1. usage of the referred melody fragments: the most similar melody fragment, all of the extracted melodic fragments over the threshold, or the melodic fragments that the user selects; 2. usage of the parameters: parameters weighted by the similarity of the referred melody fragments, or the simple average of the parameters of the referred melodic fragments). If no melody fragments are extracted for a certain threshold, no expression might be given to the melody. In this case, Itopul notifies the user that it could not find melodies and then asks if the user wants it to apply a general (average) expression
278
model matching the style of the target score.

[Figure 3: Parameters of a melody fragment. A phrase X is divided into subphrase A and subphrase B; ct3 = length(A) / (length(A) + length(B)) and ct5 = length of X.]

3.3 User-oriented Hierarchical Music Structure Analysis
To copy the features of a melody fragment to the target music, we need to describe the fragment hierarchically and to balance automatic and manual procedures. In Itopul, the user first indicates the extent of each melody fragment (phrase) while analyzing the fundamental phrase structure. The system then automatically analyzes the upper and lower structures based on this phrase structure. The upper structures are analyzed using ATTA [3]. The lower structures are analyzed in terms of the metrical structure, with each melody forming a binary tree; long durations and rests are taken into account.
If the user is not satisfied with the analyzed structure, Itopul provides an editing function to modify it manually.

4. COPYING THE PERFORMANCE EXPRESSION
Itopul decomposes the target and referred melodies into multi-layered melody fragments (groups), as shown in section 3.3. Melody fragments serve both as the units of the similarity calculation and as the units of expression feature copying. The expression of a target note is calculated by multiplying the parameters obtained from the examples extracted as similar to each of the hierarchical melody fragments that contain the note.

4.1 Similarity between Melody Fragments
The similarity between the target and referred melody segments is calculated using the melodic contour and the surface rhythm. First, Itopul calculates the similarity between the target and referred melody segments in the same layer. Then, the similarity is recalculated to take the similarity of the lower layer into account.

4.1.1 Melodic contour
There are five basic parameters of melodic contour ($ct_1, \dots, ct_5$), shown in Figure 3. $ct_1$ and $ct_2$ are the slopes of the two line segments, and $ct_3$ is the ratio of the length of the former line segment to the latter. $ct_4$ is the pitch difference between the last note of the former line segment and the first note of the latter line segment. $ct_5$ is the length of the whole melody fragment. Parameters ($ct_1, \dots, ct_4$) are calculated automatically using least-squares fitting. If the user regards the boundary suggested by the automatic fitting as unsatisfactory, he/she can edit its position manually using the GUI.

4.1.2 Rhythm
Itopul describes the surface rhythm as a vector whose elements (the unit is the shortest note) are 1 for an onset and 0 otherwise. For example, the score of Figure 3 is described as {1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0}.
Itopul can evaluate the similarity of the target melody fragment and the reference melody fragment even if their lengths differ. In that case, 0 elements are added to the front or end of the shorter vector until its length equals that of the longer one, and the cosine similarity is calculated exhaustively over all such paddings. The system takes the largest cosine similarity obtained by this procedure as the similarity of the surface rhythms, $\mathrm{sim}_r$.

4.1.3 Similarity measure
The similarity of contours $\mathrm{sim}_{ct}$ is calculated with the following equation (a weighted cosine similarity):

$$\mathrm{sim}_{ct} = \frac{\mathbf{w}_{ct}\,(\mathbf{u}_{ct} \cdot \mathbf{v}_{ct})^{t}}{|\mathbf{w}_{ct}|\,|\mathbf{u}_{ct}|\,|\mathbf{v}_{ct}|}$$

where $\mathbf{u}_{ct}$ and $\mathbf{v}_{ct}$ denote vectors consisting of the parameters described in section 4.1.1, and $\mathbf{w}_{ct}$ is a weight vector whose components multiply $ct_1, \dots, ct_5$, respectively.
The similarity $\mathrm{sim\_sl}_{u,v}$ between the target melody fragment $u$ and the referred melody fragment $v$ is calculated from $\mathrm{sim}_r$ and $\mathrm{sim}_{ct}$, the weight parameter $w_{cr}$, the ratio of the lengths of the target and referred melody fragments, and its weight $w_{ld}$:

$$\mathrm{sim\_sl}_{u,v} = \left(w_{cr}\,\mathrm{sim}_{ct} + (1 - w_{cr})\,\mathrm{sim}_{r}\right)\left(\frac{\min(\mathrm{len}(u,v))}{\max(\mathrm{len}(u,v))}\right)^{w_{ld}}$$

$\mathrm{sim\_sl}_{u,v}$ reflects the similarity within the same layer only. The final similarity between the target and referred melody fragments should also reflect the similarity at the lower layers, so the final similarity $\mathrm{sim}_{u,v}$ is revised recursively, from the lower layer to the higher layer, using the following equation:

$$\mathrm{sim}_{u,v} = 0.5\,\mathrm{sim\_sl}_{u,v} + 0.25\left(\mathrm{sim\_sl}_{u_1,v_1} + \mathrm{sim\_sl}_{u_2,v_2}\right)$$

Here, $\{u_1, u_2\}$ and $\{v_1, v_2\}$ are the lower melody fragments of $u$ and $v$, respectively.

4.2 Getting the Expression Parameters
Itopul employs a model that handles the pulse model and phrasing in the same architecture. Here, we describe a procedure to extract the expression parameters regarding tempo and dynamics of a melody fragment of the source example.
Step 1 Select the melody fragment (starting from a higher layer).
Step 2 Fit two line segments (A, B) to the expression data. (The coefficients of the two line segments (A, B), relative to a reference value given by the average of the expression data of the melodic fragment, are taken as the expression parameters.)
Step 3 Replace the expression data with the residue (subtracting the fit obtained in Step 2), then go to Step 1 with the melody fragments of the lower layer.
This procedure is a modification of the phrase fitting with a quadratic equation proposed by Widmer et al. [10].

4.3 Calculation of Target Expression
The power of each note $I(N_i)$ is calculated using the expression parameter $I_{ratio}(N_i, l)$ of note $N_i$ at layer $l$:

$$I(N_i) = \prod_{l} I_{ratio}(N_i, l)$$
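The similarity measures of Section 4.1, which determine the referred fragments whose ratios enter this product, can be sketched in Python. This is a minimal sketch under stated assumptions, not Itopul's implementation: all function names are hypothetical, the numerator of $\mathrm{sim}_{ct}$ is read as $\sum_i w_i u_i v_i$ with the $|\mathbf{w}_{ct}||\mathbf{u}_{ct}||\mathbf{v}_{ct}|$ normalization kept exactly as printed, and the lower-layer combination uses the children's same-layer similarities, as in the printed equation.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rhythm_similarity(u: Sequence[int], v: Sequence[int]) -> float:
    """sim_r: pad the shorter onset vector with zeros at the front and/or end
    in every possible way and keep the largest cosine similarity."""
    short, longer = (u, v) if len(u) <= len(v) else (v, u)
    gap = len(longer) - len(short)
    best = 0.0
    for front in range(gap + 1):  # exhaustive over all zero-padding offsets
        padded = [0] * front + list(short) + [0] * (gap - front)
        best = max(best, cosine(padded, longer))
    return best


def contour_similarity(u_ct: Sequence[float], v_ct: Sequence[float],
                       w_ct: Sequence[float]) -> float:
    """sim_ct: weighted products over ct1..ct5, divided by |w||u||v| as printed."""
    num = sum(w * a * b for w, a, b in zip(w_ct, u_ct, v_ct))
    den = math.prod(math.sqrt(sum(x * x for x in vec)) for vec in (w_ct, u_ct, v_ct))
    return num / den if den else 0.0


def same_layer_similarity(sim_ct: float, sim_r: float, len_u: float, len_v: float,
                          w_cr: float, w_ld: float) -> float:
    """sim_sl_{u,v}: blend contour and rhythm, then penalize a length mismatch."""
    length_ratio = min(len_u, len_v) / max(len_u, len_v)
    return (w_cr * sim_ct + (1 - w_cr) * sim_r) * length_ratio ** w_ld


def final_similarity(sl_uv: float, sl_u1v1: float, sl_u2v2: float) -> float:
    """sim_{u,v} = 0.5 * sim_sl_{u,v} + 0.25 * (sim_sl_{u1,v1} + sim_sl_{u2,v2})."""
    return 0.5 * sl_uv + 0.25 * (sl_u1v1 + sl_u2v2)


if __name__ == "__main__":
    fig3 = [1, 1, 1, 1, 1, 1, 1, 0] * 2  # onset vector of the Figure 3 score
    print(rhythm_similarity(fig3, fig3))               # identical rhythms
    print(rhythm_similarity([1, 0, 1], [0, 1, 0, 1]))  # best padding puts one zero in front
    print(same_layer_similarity(1.0, 1.0, 8, 16, w_cr=0.5, w_ld=1.0))
```

With the printed normalization, identical contours do not in general score exactly 1 (for example, all-ones weights and parameters give $1/\sqrt{5}$); the sketch keeps the formula as printed rather than guessing at a different normalization.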
279
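The product $I(N_i) = \prod_l I_{ratio}(N_i, l)$ can be illustrated with a small fragment hierarchy. In this hypothetical Python sketch, each `Fragment` is assumed to already hold the per-note ratios copied from its referred fragment, and the numbers are made up. The sketch does not model how those ratios are derived from the two-line-segment parameters of section 4.2. A note's power is the product of the ratios contributed by every fragment, at every layer, that contains it. The tempo equation for $T(T_k)$ has the same product form.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fragment:
    """One melody fragment (group) at some layer of the hierarchy.

    ratios: hypothetical per-note expression ratios I_ratio(N_i, l), keyed by
    note index, assumed to be already copied from the referred fragment.
    children: the lower-layer fragments this fragment is divided into.
    """
    ratios: Dict[int, float]
    children: List["Fragment"] = field(default_factory=list)


def note_powers(root: Fragment) -> Dict[int, float]:
    """Compute I(N_i) = prod over layers l of I_ratio(N_i, l) for every note."""
    powers: Dict[int, float] = {}
    stack = [root]
    while stack:  # visit every fragment at every layer
        frag = stack.pop()
        for note, ratio in frag.ratios.items():
            # each layer whose fragment contains the note contributes one factor
            powers[note] = powers.get(note, 1.0) * ratio
        stack.extend(frag.children)
    return powers


# A four-note phrase divided into two subphrases (illustrative values only).
phrase = Fragment(
    ratios={0: 1.2, 1: 1.0, 2: 0.9, 3: 0.8},
    children=[Fragment({0: 1.1, 1: 0.9}), Fragment({2: 1.0, 3: 1.05})],
)
print(note_powers(phrase))
```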
The duration of each note is calculated in the same manner as the power. The only difference is that, when calculating the duration, only the parameters of notes whose position and time value are the same in the target and the referred melody fragment are used.
As for timing control, tempo is controlled by sending the local score beat-time, given by the following equation, at every infinitesimal score time $S_k$:

$$T(T_k) = \prod_{l} T_{ratio}(T_{\tilde{k}}, N_i, l)$$

where $N_i$ is the note whose control span contains $T_k$, and $T_{ratio}(T_{\tilde{k}}, N_i, l)$ is the score beat-time ratio at time $T_{\tilde{k}}$ of the melody fragment referred to by the target melody fragment at level $l$. $I_{ratio}(N_i, l)$ and $T_{ratio}(T_{\tilde{k}}, N_i, l)$ are calculated according to the user's preferences (see section 3.2).

5. PERFORMANCE GENERATION AND DISCUSSION

5.1 Performance Generation
The input files of Itopul are the target score (MusicXML format) and pairs of score (MusicXML format) and performance data (DeviationInstanceXML format³) for the examples (cases) to be referred to.
Users of Itopul can generate expressive performances easily with the following steps:
1. Suggest melodic fragments (boundaries) at a single layer only of the target and the source music.
2. Adjust the threshold for the similar melody fragment search.
3. Select the strategy for copying expression parameters (1. usage of the referred melody fragments: the most similar melody fragment, all of the extracted melodic fragments over the threshold, or the melodic fragments that the user selects; 2. usage of the parameters: parameters weighted by the similarity of the referred melody fragments, or the simple average of the parameters of the referred melodic fragments).
4. Give directions on using the expression parameters of a meta-class (in case no similar melodic fragments are extracted from the given source music).

5.2 Discussion

5.2.1 User Interface
The following points are crucial for realizing directability: 1) assistance in searching for examples, 2) user operation and processing selection, and 3) notification of the menu and processing status to the user through the GUI or a sound device.
For 2) and 3), we proposed a model that handles the phrasing and pulse models in the same architecture, together with procedures for solving the sparseness problem. Combined with some practical UI designs, these proposals provide users with directability.
As for 1), Itopul can analyze the melodic structure and obtain parametric features of melodic fragments. These functions are the basis for implementing search engines on a large-scale music database. We still have to carry out a usability test.

5.2.2 Musical ability of the system
Itopul allows users to generate expressions of metric structure and phrasing based on the results of a similar melodic fragment search. At present, the functions to copy expressions regarding tempo, dynamics, and note duration have been implemented. One of our future tasks will be to provide functions to deal with the articulation of each note within chords.
The current version of Itopul is designed for monophonic expression. The expression of the accompaniment part is generated by simply copying the expression of the corresponding melody part. We should improve the system so that it can deal with polyphony. We are planning to transplant functions from jPop-E [4], which we have been developing.
In addition, expression marks explicitly described in scores, such as staccato or legato, need to be handled.

6. CONCLUDING REMARKS
The main goal of the performance rendering systems that academic researchers have been developing is the realization of autonomous functions for performance generation. This paper discussed the requirements of performance rendering systems as tools for human designers and introduced a case-based performance rendering system called Itopul. We believe that it is indispensable to evaluate musical competence as well as the usability of performance rendering systems. We are going to bring Itopul to ICMPC-Rencon⁴ (an international event), where we will address this point.

7. REFERENCES
[1] M. Clynes. Superconductor: The global music interpretation and performance program. http://www.superconductor.com/clynes/superc.htm, 1998.
[2] L. Fryden and J. Sundberg. Performance rules for melodies: Origin, functions, purposes. In Proc. of ICMC, pages 221–225, 1984.
[3] M. Hamanaka, K. Hirata, and S. Tojo. ATTA: Automatic time-span tree analyzer based on extended GTTM. In Proc. of ISMIR, pages 358–365, 2005.
[4] M. Hashida, N. Nagata, and H. Katayose. jPop-E: An assistant system for performance rendering of ensemble music. In Proc. of NIME, pages 313–316, 2007.
[5] R. Hiraga, M. Hashida, K. Hirata, H. Katayose, and K. Noike. Rencon: Toward a new evaluation method for performance. In Proc. of ICMC, pages 357–360, 2002.
[6] O. Ishikawa, A. Aono, H. Katayose, and S. Inokuchi. Extraction of musical performance rules using a modified algorithm of multiple regression analysis. In Proc. of ICMC, pages 348–351, 2000.
[7] F. Lerdahl and R. Jackendoff. A Generative Theory of Tonal Music. MIT Press, 1983.
[8] E. Narmour. The Analysis and Cognition of Basic Melodic Structures. The University of Chicago Press, 1977.
[9] T. Suzuki, T. Tokunaga, and H. Tanaka. A case based approach to the generation of musical expression. In Proc. of IJCAI, pages 642–648, 1999.
[10] G. Widmer and A. Tobudic. Playing Mozart by analogy: Learning phrase-level timing and dynamics strategies. In Proc. of ICAD, pages 28–35, 2002.

³ http://www.crestmuse.jp/cmx/
⁴ http://www.renconmusic.org/icmpc2008/
280
Designing Ambient Musical Information Systems
William R. Hazlewood, School of Informatics, Indiana University, Bloomington, Indiana, whazlewo@indiana.edu
Ian Knopke, School of Informatics, Indiana University, Bloomington, Indiana, ian.knopke@gmail.com
ABSTRACT
In this work we describe our initial explorations in building a musical instrument specifically for providing listeners with simple but useful ambient information. The term Ambient Musical Information Systems (AMIS) is proposed to describe this kind of research. Instruments like these differ from standard musical instruments in that they are to be perceived indirectly, from outside one's primary focus of attention. We describe our rationale for creating such a device, discuss the appropriate qualities of sound for delivering ambient information, and describe an instrument created for use in a series of experiments that we will use to test our ideas. We conclude with a discussion of our initial findings and some further directions we wish to explore.

Keywords
Ambient Musical Information Systems, musical instruments, human computer interaction, Markov chain, probability, algorithmic composition

1. INTRODUCTION
In 1996, research conducted by Mark Weiser at Xerox PARC gave a new perspective on human-computer interaction. Weiser argued that the best technologies are those which are not experienced as technology at all, and pointed out that designs that "encalm" and inform meet two human needs that are not usually met together [9]. For example, cell phones, news services, television, and pagers provide useful information, but at the same time they control our attention, causing unnatural distractions and ultimately leading to a state of informational overload. To address this conflict between people's need to be aware of a growing amount of information and their need not to be overwhelmed by it, Weiser proposed the development of calm technologies. In Weiser's vision, information can exist both at the center and the periphery of our awareness, moving smoothly back and forth between the two, so that information can be provided in such a way that it can be easily ignored when the observer has more pressing issues to address.
Over time, Weiser's conception of calm technology has compelled researchers to explore several novel methods of observing and interacting with information. One outcome of these explorations is a specific class of alternative displays that can provide useful information while blending naturally into the surrounding environment. These devices are distinguished from more common informational displays in that they are primarily intended to be perceivable from outside a person's direct focus of attention, providing pre-attentive processing without being distracting. Technologies such as these are often embedded in existing environments, making use of unused physical and visual aspects of everyday objects to provide an information channel that can be easily ignored when more important matters require one's attention [6].

2. AMBIENT INFORMATION SYSTEMS
While exploring these concepts, researchers have used a variety of different terms to describe their own implementations of this form of representing information. These terms include peripheral displays, ubiquitous technology, informative art, everyday computing, glanceable displays, user notification systems, and slow technology. Partially because of this proliferation of terminology, Pousman and Stasko [7] have proposed the term ambient information system (AIS) to describe the collection of properties inherent in all such implementations. According to their definition, AIS describes all technologies that display non-critical information, move easily in and out of the periphery, focus on tangible representations, change subtly to reflect changes, and have an emphasis on aesthetics.
Almost all existing AIS implementations use visual elements such as color, texture, shape, and motion to produce subtle changes in their appearance and convey a specific piece of information. For example, one commercially available device is the Ambient Orb, from AmbientDevices.com. This device consists of a simple frosted glass sphere containing an array of colored LEDs that can be powered so as to allow the orb to display thousands of different colors (see Figure ??, left). Using these changes in color, the orb can be configured to display several different channels of information, such as stock prices, weather forecasts, traffic conditions, and local air quality. For example, with stock pricing the orb

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).
281
may shift from green to red when a particular stock drops significantly in price. Another example of a visual ambient display is the DataFountain by Koert van Mensvoort. This display consists of three water jets that project water to different heights based upon the relative values of the yen, the euro, and the US dollar.

One of the problems with the visual cues normally used in AIS is that they must be within the viewer's field of vision. However, one of the primary advantages of this class of information technology is that one may perceive it while focusing directly on other tasks. This has led us to believe that sound may be a preferable medium for developing this sort of system, although sound-based AIS has been largely left unexplored.

Unlike more traditional musical interfaces, which are designed to be directly observed in public performances, our instrument needs to be subtle and to perform in the periphery of the listener's awareness.

This paper describes an instrument designed to study music as a medium for delivering ambient information in the style of AIS. We begin by discussing our design rationale for the initial instrument, and follow with a description of its actual development and construction. Finally, we discuss some evaluations and the next stages in our research.

3. DESIGNING AN AMIS

Because of the lack of existing sound-based implementations, we sought to construct our own AIS that can convey a simple stream of information within a public setting. We propose the term Ambient Musical Information System (AMIS) for this type of research, based on Pousman and Stasko's definition of an Ambient Information System but focused on audio-based delivery.

One type of information that is both important and useful, but not necessarily critical, concerns how and when people are making use of public spaces (e.g., lounges, conference rooms, study halls). The particular situation we wished to experiment with is a system that can inform people in one location how much activity is taking place in a remote location. For example, if there are two separate lounges in a building, it could be useful to know whether a high level of activity is taking place in the other lounge, so that one could choose to relocate to the area where most of one's colleagues are. Alternatively, one could choose to relocate to the other lounge if less activity was taking place there and one needed a place to study or to have a private meeting.

This is one of the most important and difficult features of an AIS to design. We had to carefully consider different qualities of sound that might be appropriate for this sort of informational interface. The sound produced needs to situate the information being conveyed at the edge of the listener's perception, and to fade in and out of their awareness depending on the level of information being presented. The idea that there could be an optimal kind of music for this sort of display has been the source of some debate within our group.

The need to focus on tangible representations meant that we could not simply place audio speakers into the public space and provide the information as disembodied sound. Instead, we had to build a physical musical instrument capable of conveying musical information in a subtle manner. We decided to take our inspiration from Eric Singer's "League of Electronic Musical Urban Robots" and create a semi-autonomous automated musical instrument [8], which can be fed information regarding the activities in a remote location and change its state accordingly.

Our musical interface had to be able to alter the music it was producing to inform listeners about different levels of activity in remote locations, but had to do so without forcing itself into the listener's primary focus of attention. We had to come up with a way to change the type of music being played so that it mapped to the remote level of activity, but without being distracting. To achieve this, our system makes use of Markov chains in a generative context. This is discussed further in Section 6.1.

To build a proper AIS we had to take the aesthetics of our instrument into careful consideration. Maintaining the instrument's perceptual subtlety requires that it blend smoothly into the environment so that people would not be overly engaged with it. This means that it could be neither overly attractive nor overly unattractive. In this case, we chose to make the instrument resemble a piece of generic artwork, akin to those that might be found in common waiting rooms or lobbies. For our sound-emitting material, we made use of thin bars of slate stone, which were tuned and could be played similarly to a xylophone or marimba. To keep the motion and electronics from drawing attention to the instrument, we hid all of the mechanisms operating the instrument inside it. To the casual onlooker, the instrument would appear to be nothing more than a simple, somewhat bland, wall-hanging piece of artwork.

4. MUSIC AS INFORMATION

As we have mentioned, there have been few explorations in AIS that make use of sound as an information channel, but the underlying concept of providing music as an additional information layer has some precedents. Perhaps the most pervasive example of music as information is Muzak. Used by 90 million people each day, this company's traditional products are designed specifically to be non-invasive, yet are subversively "made and programmed for business environments to reduce stress, combat fatigue, and enhance sales" [4, p. 4], and in some cases they have been used to increase worker productivity in factories by arranging songs in cycles of increasing tempo [4, pp. 43–5]. Muzak is a good example of music that was developed to be perceived outside the direct focus of attention. Muzak is produced in a deliberately tame style ("easy listening") that does not give listeners cause to become overly interested in what is being played.

One of the continuing topics of discussion concerning our instrument is which qualities of sound are best suited for conveying ambient information. Of course, we know which sound qualities are probably inappropriate (e.g., a foghorn or a drum set), but we believe that other sound qualities (e.g., tone, timbre, resonance) may be best for delivering information in an ambient manner. In considering the construction of our instrument, we felt that the right place to start was with a highly resonant, slowly changing sound source.

5. IMPLEMENTATION

Our instrument consists of five tuned bars (sound elements) of slate stone mounted to a hollow wooden box with
a hole beneath each bar to amplify the sound. Beneath each of the bars is a single solenoid that can be activated to strike the bar, producing sound. The solenoids are controlled programmatically using an existing hardware platform called "Phidgets" [2]. This setup proved to be more complicated than we had anticipated. Individual components of our instrument are discussed below.

5.1 Physical Aspects

Upon acquiring our sound elements, we tested them on a standard xylophone mounting and found that the sound produced had a mellow quality and a decay similar to that of a marimba. To amplify the sound and contain the electronic components, a resonant box was constructed out of 1/4-inch thick particle board with 3-inch holes below each sound element. After mounting the sound elements, we found that the density of the wood, and the mechanism used to mount the sound elements vertically, had an effect on the decay of the sound elements. We assumed that part of the problem was the thickness of the wood, so a second box was constructed from 1/8-inch wood stock. This improved the sound by lengthening the overall decay, but mounting the sound elements vertically still caused them to lose some resonance. In our second iteration of the instrument's construction, we designed a new mounting bracket that pinched the drilled mounting holes in the sound elements between two small pieces of foam not much larger than the holes themselves. This provided some improvement over the initial mounting. We are still experimenting with better ways to mount these sorts of sound elements vertically, so that they produce the same sound as when they are mounted horizontally.

5.2 Working with Solenoids and Phidgets

The primary difficulty in automating our instrument was in acquiring solenoids that would best suit our purposes. A solenoid is a device that converts electrical energy into linear motion by means of a simple electromagnet. Inside the electromagnet is a simple piston that is drawn in when current is passed through the magnetic coil. Solenoids can be categorized as either pull-type or push-type (sometimes called thrust-type), depending on the motion they create. Push-type solenoids differ from pull-type ones only in that another, smaller piston is attached to the primary one, so that when the primary is drawn in, the other pushes out in the opposite direction.

Solenoids like these are used for everything from controlling automated car locks to operating soda vending machines. The companies that produce these devices will make custom orders to match the needs of a particular project, but they normally expect very large orders in order to do so. For a project like ours, where we need only 5 to 10 solenoids, purchases are likely to be made through surplus retailers, and only the basic model will be available. A basic solenoid consists of only the plunger and the frame. If the plunger is placed halfway into the frame and power is applied, the plunger will quickly force itself into the stop position. Making this mechanism useful requires the additional construction of a return spring to move the plunger back to the start position when the power is turned off. Without access to specialized drilling machines, we attempted several less-than-optimal ways to create a functioning return spring for each of our solenoids. The final solution involved welding a small spring to the base of the plunger and attaching the opposite end of the spring to a cap fitted over the back of the solenoid. This solution worked well enough to make our solenoids functional, but the resulting assembly produces a bit of extra noise that lessens its overall effectiveness as an AIS device. Other researchers have proposed different strategies for dealing with solenoid problems, although some of these were not applicable in our case [3].

The Phidgets platform works very well as a means to control the solenoids we manufactured for our instrument. We were able to use the Phidget Interface Kit 8/8/8 [2] to control three Dual Relay Boards in combination with a 24 V power supply. The only drawback to using the Phidget Dual Relay Boards is that they produce some extra noise. Specifically, a "click" can be heard when a board switches between the on and off positions. This gives our instrument a sound with some similarities to a pinball machine, which we are not completely certain is appropriate for a sound-based ambient information source. The company that developed the Phidgets platform has just released a new component that is similar to the Dual Relay Board, but this model uses a solid-state switch that operates silently, and we believe it will solve this problem.

6. MUSICAL CHARACTERISTICS

There are a number of musical factors that must be carefully considered when designing an ambient musical instrument. One strategy for building an ambient instrument could be to use specific melodies or well-known songs. However, the difficulty with this approach is that these kinds of musical materials bring in a variety of distracting cultural and semantic associations (as can be seen in the current "ringtone" phenomenon with cellular phones), which we feel would bias our results. Instead, the approach used here has been to use somewhat nondescript musical materials and to transmit information to the user through changes in global musical characteristics. For instance, changes in tempo, activity levels, repetitiveness of pitches, or regularity of rhythm can be used to indicate changes in some aspect of the information being represented. Additionally, musical materials must be supplied over long periods of time, such as entire days or perhaps even weeks at a time.

However, music that is completely redundant, such as an endlessly repeating rising scale, can be extremely tiring for a listener. A chorus that is repeated multiple times may be acceptable within the confines of a three-minute pop song, but is likely to be unacceptable for longer periods of time. On the other hand, and as has been pointed out by other researchers [5], completely random music with no predictable attributes also tends to be extremely tiring for a listener. This is perhaps in part because its only predictable attribute is that it is unpredictable. The long-term characteristics of the instrument must sit somewhere between these two extremes. The ideal music, for our purposes, should not carry recognizable cultural attributes, should be able to transmit information through large changes in style, should have a partial degree of redundancy, and should be able to run continuously for extremely long periods of time.

6.1 Markov Chains

Discrete Markov chains have a long history in computer
music and algorithmic composition [1], and are ideally suited to our purposes.

A discrete Markov chain is a discrete-time stochastic process that can be used to model a series of events, where each event is assumed to belong to one of a finite set of unique states. The process is always in exactly one state at any time, and at each step it changes state according to fixed transition probabilities. One interesting aspect of Markov chains is the underlying Markov property, which assumes that future states depend only on the present state, and not on previous states:

$$P(X_{n+1} \mid X_0, X_1, X_2, \ldots, X_n) = P(X_{n+1} \mid X_n)$$

Each transition between states has a fixed probability, ranging from 0 to 1, and all outgoing probabilities from any state must sum to 1, forming a probability distribution. These probabilities can be learned from data or assigned directly by some other means.

The transition probabilities for a Markov chain, for example a random walk procedure, can be indicated using a transition matrix, as in the following example:

$$S = \begin{array}{c|cccc} & a & b & c & d \\ \hline a & 0 & 1 & 0 & 0 \\ b & .5 & 0 & .5 & 0 \\ c & 0 & .5 & 0 & .5 \\ d & 0 & 0 & 0 & 1 \end{array}$$

Here, each element $S_{i,j}$ represents a particular transition probability, where i, the row, is the current state and j, the column, is a possible future state. One of the advantages of the matrix approach is the ease with which probabilities can be altered; for instance, adding a link from d to a is as simple as filling in a value at $S_{4,1}$ and normalizing the row.

Markov chains are ideal for our purposes for two main reasons. First, they can be used to simulate various degrees of "stochasticity", ranging from complete randomness (by setting all outgoing transitions of a state to the same probability) to complete predictability, and any degree in between. Second, Markov chains can continuously generate a stream of data tokens. Normally, termination of the chain is set by either specifying an end state or a fixed number of tokens to generate. If neither of those conditions is set, the chain will continue to generate tokens without stopping. This is equivalent to simply repeating the state-selection process inside an infinite loop.

For this experiment we have used two Markov chains, one for controlling the generation of new pitches and one for controlling durations, encoding each separate pitch or duration as a separate state in the respective chain. Changes in the musical behavior are accomplished by loading new probabilities into the transition matrix, which immediately results in a change in instrument behavior. Additionally, a standard time unit (an eighth note) is specified in milliseconds, and duration states are specified as multiples or divisions of that unit. This provides a way to globally alter tempo without disturbing the other musical characteristics.

7. EVALUATION

Our current implemented device is shown in Figure 1. While more extensive listener studies are planned, we have already tested the instrument casually with a number of listeners. We have received a range of opinions that have proven valuable both in planning the experiment and in designing the next version of the instrument. The most prevalent feature that has been mentioned, and that we would like to introduce (besides fixing the aforementioned click problem), is better control over the volume of individual notes, which is difficult with our present solenoid-based system. The present version has convinced us that we are moving in the right direction and that the AMIS concept shows much promise.

Figure 1: Current implementation, front and back.

8. REFERENCES

[1] C. Ames. The Markov process as a compositional model: A survey and tutorial. Leonardo, 22(2):175–87, 1989.
[2] Phidgets Inc. Unique and easy to use USB interfaces, Jan 2008.
[3] A. Kapur. A comparison of solenoid-based strategies for robotic drumming. In Proceedings of the International Computer Music Conference, pages 393–6. International Computer Music Association, 2007.
[4] J. Lanza. Elevator Music. University of Michigan Press, Ann Arbor, 2004.
[5] G. Ligeti. Metamorphoses of musical form. Die Reihe, 7:5–19, 1964.
[6] P. Olivier, H. Cao, S. W. Gilroy, and D. G. Jackson. Crossmodal Ambient Displays, pages 3–16. August 2007.
[7] Z. Pousman and J. Stasko. A taxonomy of ambient information systems: four patterns of design. In AVI '06: Proceedings of the working conference on Advanced visual interfaces, pages 67–74, New York, NY, USA, 2006. ACM.
[8] E. Singer. League of electronic musical urban robots, Jan 2008.
[9] M. Weiser. The computer for the 21st century. Scientific American, 265(3):94–104, 1991.
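The generative scheme of Section 6.1 can be illustrated with a short Python sketch. This is not the authors' implementation. It encodes the example random-walk transition matrix S from Section 6.1, with indices 0 to 3 standing for states a to d, and draws each next state from the current state's row only. The following are hypothetical choices made for illustration: the `blend` helper, which moves a chain between a predictable matrix and a uniform (completely random) one; the 250 ms eighth-note unit; and the duration multiples.

```python
import random

# Example transition matrix S from Section 6.1 (rows: current state,
# columns: next state). Indices 0..3 stand for the states a..d.
STATES = ["a", "b", "c", "d"]
S = [
    [0.0, 1.0, 0.0, 0.0],  # a always moves to b
    [0.5, 0.0, 0.5, 0.0],  # b moves to a or c with equal probability
    [0.0, 0.5, 0.0, 0.5],  # c moves to b or d with equal probability
    [0.0, 0.0, 0.0, 1.0],  # d stays in d (absorbing state)
]


def normalize_row(row):
    """Rescale one row so that its outgoing probabilities sum to 1."""
    total = sum(row)
    if total <= 0:
        raise ValueError("every state needs at least one outgoing transition")
    return [p / total for p in row]


def next_state(matrix, current, rng):
    """Choose the next state from the current state's row only (Markov property)."""
    return rng.choices(range(len(matrix)), weights=matrix[current], k=1)[0]


def generate(matrix, start, n, rng):
    """Emit n successive states; without the bound n this would be an endless stream."""
    state, out = start, []
    for _ in range(n):
        state = next_state(matrix, state, rng)
        out.append(state)
    return out


def blend(predictable, uniform, t):
    """Hypothetical helper: interpolate between two transition matrices (0 <= t <= 1).

    Every row stays a probability distribution, so t moves the chain from the
    predictable matrix (t = 0) towards complete randomness (t = 1).
    """
    return [[(1 - t) * p + t * q for p, q in zip(r1, r2)]
            for r1, r2 in zip(predictable, uniform)]


# Hypothetical duration chain: each state is a multiple of an eighth-note unit.
EIGHTH_MS = 250                      # 250 ms per eighth note, i.e. 120 quarter notes per minute
DURATION_MULTIPLES = [0.5, 1, 2, 4]  # sixteenth, eighth, quarter, half note


def duration_ms(state, eighth_ms=EIGHTH_MS):
    """Changing eighth_ms alone rescales the tempo of every duration state."""
    return DURATION_MULTIPLES[state] * eighth_ms


if __name__ == "__main__":
    rng = random.Random(1)
    print("".join(STATES[s] for s in generate(S, 0, 12, rng)))
    # Adding a link from d to a: fill in S[3][0] (S_{4,1} in 1-based notation)
    # and renormalize the row.
    linked = [row[:] for row in S]
    linked[3][0] = 1.0
    linked[3] = normalize_row(linked[3])
    print(linked[3])
```

In this design, loading new probabilities only means swapping the list that is passed as `matrix`. Changing tempo only means changing `eighth_ms`. The two controls stay independent, as the text describes.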
The Elbow Piano: Sonification of Piano Playing Movements

Aristotelis Hadjakos, TU Darmstadt, Hochschulstr. 10, 64289 Darmstadt, +49 6151 16 6670, telis@tk.informatik.tu-darmstadt.de
Erwin Aitenbichler, TU Darmstadt, Hochschulstr. 10, 64289 Darmstadt, +49 6151 16 2259, erwin@tk.informatik.tu-darmstadt.de
Max Mühlhäuser, TU Darmstadt, Hochschulstr. 10, 64289 Darmstadt, +49 6151 16 4557, max@tk.informatik.tu-darmstadt.de
ABSTRACT

The Elbow Piano distinguishes two types of piano touch: a touch with movement in the elbow joint and a touch without. A played note is first mapped to the left or right hand by visual tracking. Custom-built goniometers attached to the player's arms are used to detect the type of touch. The two different types of touches are sonified by different instrument sounds. This gives the player an increased awareness of his elbow movements, which is considered valuable for piano education. We have implemented the system and evaluated it with a group of music students.

Keywords

Piano, education, sonification, feedback, gesture.

1. INTRODUCTION

A pianist can use different combinations of movements in different joints to perform a touch [4]. A touch can be performed by the isolated movement of a finger while the wrist, elbow, shoulder, and body support the finger without active movement. Alternatively, a pianist can execute a touch using movement in one of the other joints mentioned. For example, a pianist can slightly fix the fingers and use the movement of the wrist, while the elbow, shoulder, and body do not move. A pianist can also use a combination of activity in the different joints.

It has been argued in the field of piano pedagogy that awareness of the playing movements can be beneficial. S. Bernstein, for example, states that becoming aware of the playing movements is one of the key goals a pianist should pursue [1]. By becoming more conscious of her playing movements, a pianist can find movements that better fit the musical and technical demands, and gain more confidence in mastering stressful situations such as public performance.

This paper is structured as follows. First, different piano teaching systems related to this work are discussed in Section 2. The architecture, the main features, and the components of the Elbow Piano system are described in Section 3. Next, the results of our evaluation are presented in Section 4 and conclusions in Section 5. Finally, we discuss opportunities for future work in Section 6.

NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

2. RELATED WORK

Systems for music practice education that use sensor data to sonify playing movements have been developed for different instruments. The 3D Augmented Mirror [7] is an example of such a system for a bowed string instrument. The 3D Augmented Mirror provides the user with visual and auditory feedback based on the input of a visual tracking system.

Piano teaching systems can be classified by the type of input they receive from the user. Recent piano teaching systems can be classified into pure MIDI systems and systems that receive physiological or physical data.

2.1 Pure MIDI Piano Teaching Systems

The Piano Tutor [3] uses score following to find errors, to provide accompaniments, and to turn pages. The system gives the user feedback using a combination of video, notation, voice, music, and graphics. An expert system module monitors the success of the student and suggests easier tasks if necessary.

The pianoFORTE system [13] visualizes the tempo, articulation, and dynamics of a performance. Tempo is visualized by a speedometer, and dynamics by the color of the notes. To visualize the articulation, the lengths of the played notes are marked in the score. The MIDIATOR [12] compares a student's performance to the score or to a previous performance of the piece and visualizes differences in tempo, note volume, note duration, and articulation.

The practice tool for pianists by Goebl and Widmer [6] generates visual feedback from MIDI input in real time. The practice tool finds recurring patterns by autocorrelation. The student can see timing deviations between successive patterns that indicate uneven playing. Other visualizations display beats, time deviations between chord notes, and a piano roll overview.

The Intelligent Virtual Piano Tutor [8] is a system that suggests fingerings for received MIDI sequences. The suggested fingering is animated by a 3D virtual pianist.

2.2 Physiological- and Physical-Data Based Piano Teaching Systems

Montes et al. used EMG biofeedback to teach thumb touches [9]. Electrodes were placed on the abductor pollicis brevis
muscle. While a student performs thumb touches, the amount of muscular activity is shown on a screen. Compared with a control group that received traditional training of the thumb attack, a biofeedback group was better able to match the muscle activity pattern of professional pianists.

Riley uses a multimodal feedback system to improve piano lessons [11]. The system can record and replay MIDI, EMG, and video. The video and MIDI output is synchronized with a piano roll of the performance.

Mora et al. developed a system that overlays a 3D mesh of a suggested posture over a video of the student's performance [10]. The student can see the differences and adopt the suggested posture. To generate the 3D mesh, the posture of a professional pianist was recorded using motion capture.

3. ELBOW PIANO

A user who wants to practice with the Elbow Piano sits at the keyboard and attaches the goniometers to her arms. She then plays a tuning chord, which is necessary to initialize the visual tracking system. Visual tracking of the hands is used to assign the goniometer measurements to the notes played. In the following, we describe a typical use case of our system.

The user starts playing. Sometimes the user performs elbow-touches, sometimes the user avoids them. Whenever the user plays an elbow-touch, the system plays the elbow-touch sound. If the user unintentionally plays an elbow-touch, the system will also produce the elbow-touch sound. The user can stop playing at this point and use a graphical visualization to analyze this condition. After some time, the user continues playing.

3.1 Hardware Setup and Software Architecture Overview

The Elbow Piano consists of sensors connected to a computer, and software that analyzes the incoming data stream and controls an attached synthesizer. The sensor hardware of the Elbow Piano consists of a MIDI keyboard, a webcam placed above the keyboard, and a pair of self-built goniometers. The webcam is used to visually track the hands. The goniometers provide data about the angles in the elbow joints.

When the system receives a note-on event from the keyboard, it assigns the note to the left or right hand. This is done by means of visual tracking (Section 3.2). The history of goniometer data of the identified hand is examined to determine whether or not the key was pressed with activity of the elbow joint (Section 3.3).

3.2 Visual Tracking

Mapping MIDI data to hands without additional information is only possible to some extent, as the mapping is ambiguous. Therefore, visual tracking was used.

A webcam (Genius Slim 1322AF) mounted above the keyboard monitors the entire claviature. Before the user can start playing, she has to lock the visual tracking on her hands. This is done by playing a predefined tuning chord. To obtain the image coordinates of the hands, the system has to map from keys to horizontal image coordinates. For that reason, it needs to know the horizontal positions of the left and right ends of the claviature, as well as the vertical coordinate of its front. The user has to configure these positions once using a GUI.

The keys are mapped to horizontal positions by linear interpolation. Each key corresponds to an area 1/88 of the width of the claviature. Although this approximation does not reflect the structure of the claviature, it is sufficiently accurate for the Elbow Piano.

The visual tracking is locked on the hands when the user plays the tuning chord. This chord consists of two black keys per hand (each hand plays an F#–C# chord, in different octaves). When the tuning chord is played, the system estimates the approximate positions of the hands. Two areas, located in front of the claviature at the horizontal positions of the hands, are used to calculate the histograms of the skin colors (for each hand separately) and serve as initial search windows for the tracking algorithm.

The visual tracking of the hands is done with the OpenCV implementation of the CAMSHIFT algorithm. CAMSHIFT [2] climbs the gradient of the probability distribution, which is computed using a histogram, to adjust the position of the search window. CAMSHIFT also continuously changes the size of the search window; therefore, the entire hand of the Elbow Piano user is tracked after some iterations of the algorithm. Because of the way CAMSHIFT operates, the colors of the floor and the clothing have to be sufficiently distinct from skin color, and the user has to wear long-sleeved clothing.

The Elbow Piano segments the claviature into a part for the left hand and a part for the right hand. For this purpose, the system examines the rightmost pixel assigned to the left hand and the leftmost pixel assigned to the right hand and determines the middle between them. Each note is assigned to the left or right hand by comparing its position to the middle.

The user receives visual feedback from the hand tracking module (Figure 1): the hands are surrounded by a circle in the image from the webcam.

Figure 1. Hands Are Tracked
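The key-to-pixel mapping and hand segmentation of Section 3.2 can be illustrated with a minimal Python sketch. It is not the Elbow Piano code, and all names are hypothetical. The sketch makes the following assumptions:

- The keyboard has 88 keys, and the lowest key is MIDI note 21 (A0).
- The claviature end positions are pixel x-coordinates from the one-time GUI calibration.
- Each hand's pixel x-coordinates come from the tracker.
- In the tuning chord, each hand plays F# with the C# a fifth above it.

```python
LOWEST_NOTE = 21  # assumption: MIDI note of the lowest key (A0) on an 88-key keyboard
NUM_KEYS = 88


def key_to_x(note, left_x, right_x):
    """Horizontal pixel position of a key's center, by linear interpolation.

    Each key gets 1/88 of the claviature width; the real layout of black and
    white keys is ignored, as in the approximation described above.
    """
    k = note - LOWEST_NOTE
    if not 0 <= k < NUM_KEYS:
        raise ValueError(f"note {note} is outside the 88-key range")
    key_width = (right_x - left_x) / NUM_KEYS
    return left_x + (k + 0.5) * key_width


def tuning_chord_hand_positions(notes, left_x, right_x):
    """Estimate the x position of each hand from the tuning chord.

    Assumption: each hand plays F# with the C# a fifth above (two black keys),
    the two hands in different octaves. Returns None if the notes do not match.
    """
    notes = sorted(notes)
    if len(notes) != 4:
        return None
    pairs = [notes[:2], notes[2:]]
    for low, high in pairs:
        if low % 12 != 6 or high - low != 7:  # pitch class 6 is F#
            return None
    return tuple((key_to_x(low, left_x, right_x) + key_to_x(high, left_x, right_x)) / 2
                 for low, high in pairs)


def split_x(left_hand_xs, right_hand_xs):
    """Middle between the rightmost left-hand pixel and the leftmost right-hand pixel."""
    return (max(left_hand_xs) + min(right_hand_xs)) / 2


def assign_hand(note, left_x, right_x, middle):
    """Assign a note to a hand by comparing its key position with the middle."""
    return "left" if key_to_x(note, left_x, right_x) < middle else "right"
```

With a claviature spanning pixels 0 to 880, each key is 10 pixels wide. In that case middle C (MIDI 60) lands at x = 395. Whether it is assigned to the left or the right hand depends only on where the tracked hands put the middle.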
3.3 Recognition of Elbow Activity

Based on the data from the goniometers, the Elbow Piano
decides whether a touch was performed with movement in the
elbow joint, or not.
3.3.1 Goniometers
A pair of custom-built goniometers (Figure 3) provide the
computer with data about the angles in the elbow joints. Each
goniometer consists of a potentiometer and two plastic strips.
The plastic strips are connected to the axis of the potentiometer Figure 3. Visualization of Figure 4. Visualization of a
so that the motion of the plastic strips is transferred to the an elbow touch non-elbow touch
potentiometer. The potentiometer has a aluminum case and can
therefore sustain the occurring physical forces. Velcro fasteners visualizations. To assist the user to navigate, a separate view is
are attached to the plastic strips and are used to mount them on a provided for each hand. Furthermore, the graphs are stacked
suited pullover. The goniometers are additionally fixated by when a hand plays a chord, i.e., if two notes are received within
rubber bands. The potentiometers are wired up as voltage 0.1 seconds.
dividers and are connected to A/D converters. The digital signal
is transmitted to the computer via USB with a rate of up to 100
3.4 Sound Generation
Hz (values are transmitted on change only). The A/D converter When an elbow-touch is recognized, the system passes the note-
used is a CreateUSB board. on MIDI message that was received from the keyboard to the
connected (software or hardware) synthesizer. The system
3.3.2 Decision changes the channel of the MIDI message to reflect which arm
The goniometer data is continuously stored with the executed the elbow-touch. By configuring the synthesizer to
corresponding timestamps. When the keyboard sends a note-on play different instruments on these channels, the Elbow Piano
message, the sensor data log of the arm that produced the tone is examined. The latest 0.2 seconds of the sensor data is analyzed. Considering the rapid sequences of movements that can occur in piano playing, the choice of this rather large time interval is reasonable, because elbow-touches cannot be done (much) more rapidly. The lowest angle in the elbow during that time interval and the last measured angle are compared. If the difference between these angles exceeds a predefined threshold, the touch is classified as a touch with elbow activity.

3.3.3 Decision Visualization
To provide feedback to the user, a visualization of the decision process was developed. The Elbow View (Figure 4) shows the goniometer data that was used to decide whether or not the touch was executed with activity in the elbow joint. The graph is inverted along the y-axis to provide a more intuitive visualization. The graph of an elbow touch begins with high values and slopes downward to the right; this corresponds to the movement of the forearm performing an elbow-touch, which starts at a high position and then moves downwards. Relevant information about the graph and the decision is provided to the user: the lowest angle, the last angle, the difference between the two, and the result of the decision. The user can access past

Figure 2. Goniometer

can play different sounds for the left and right arm. The generated sound effects can be prolonged by using the sustain pedal of the MIDI keyboard.

4. EVALUATION
The Elbow Piano was evaluated with students of the HfMDK Frankfurt (University of Music and Performing Arts Frankfurt). Four pianists (professional level), one composer (advanced level), and one singer (intermediate level) participated in the user study. The system was briefly explained to each participant. Each participant then played pieces of her/his own choice with the Elbow Piano. Afterwards, the participant filled out a questionnaire and was interviewed. The questionnaire contained different statements, which were rated by the participants on a scale from 1 (disagree) to 5 (agree) (Table 1).

It was evident that the participants improved their ability to control the system during the sessions. At the beginning, the participants tended to imitate the mere appearance of the motion that had been shown to them. The participants often did not consistently use the elbow joint to move the fingers, but used the wrist, the shoulder and (on one occasion) the back instead. During the sessions, the participants moved more consistently and could therefore control the system better. The participants expected that they could learn to control the system better if they had more time to practice with it, and they stated that the system increased their awareness of arm movements. Despite all that, all but one participant preferred not to use the system for practice or teaching, because the system focuses on only one specific aspect of piano technique and therefore cannot (yet) be integrated into a piano syllabus. Overall, we received very positive feedback and were encouraged to continue with our approach.
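The elbow-touch decision described above (compare the lowest and the last elbow angle within the latest 0.2 seconds of the matched arm's goniometer log) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Sample` record, the 5-degree `threshold_deg`, the angle sign convention (angle assumed to rise as the forearm drops), and the nearest-hand rule in `pick_arm`, which stands in for the webcam tracking, are all assumptions.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple

WINDOW_S = 0.2  # length of the analysed window, as stated in the text


@dataclass
class Sample:
    t: float      # timestamp in seconds
    angle: float  # elbow angle in degrees (hypothetical sign convention)


def pick_arm(key_x: float, left_hand_x: float, right_hand_x: float) -> str:
    """Hypothetical stand-in for the visual tracking: the nearer hand played the key."""
    return "left" if abs(key_x - left_hand_x) <= abs(key_x - right_hand_x) else "right"


def classify_touch(log: Sequence[Sample], t_key: float,
                   threshold_deg: float = 5.0) -> Tuple[bool, float, float, float]:
    """Compare the lowest and the last elbow angle in the window before a keypress.

    Returns (is_elbow_touch, lowest, last, difference). `log` must be sorted
    by time; threshold_deg is an assumed value, not one given in the paper.
    """
    window = [s for s in log if t_key - WINDOW_S <= s.t <= t_key]
    if not window:
        raise ValueError("no goniometer samples in the analysis window")
    lowest = min(s.angle for s in window)
    last = window[-1].angle  # most recent sample at or before the keypress
    difference = last - lowest
    return difference > threshold_deg, lowest, last, difference


if __name__ == "__main__":
    arm_logs = {
        "left": [Sample(0.00, 100.0), Sample(0.10, 101.0), Sample(0.20, 102.0)],
        "right": [Sample(0.00, 120.0), Sample(0.05, 112.0), Sample(0.10, 110.0),
                  Sample(0.15, 116.0), Sample(0.20, 124.0)],
    }
    arm = pick_arm(key_x=30.0, left_hand_x=12.0, right_hand_x=28.0)
    print(arm, classify_touch(arm_logs[arm], t_key=0.20))
    # prints: right (True, 110.0, 124.0, 14.0)
```

Returning the lowest angle, the last angle and their difference together with the decision mirrors the values the Elbow View presents to the user, so the same tuple could drive both the sound selection and the visualization.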
Table 1. Questionnaire
Statement | Score (avg.)
I have good control of the sound. | 3.5 of 5
I would learn to control the system better if I had more time to practice with it. | 4.8 of 5
I am more aware of the movement of my arm when using the system. | 4.5 of 5
Using the system is fun. | 4.2 of 5
I would use the system to practice or teach the piano. | 2.5 of 5

5. CONCLUSIONS
Awareness of playing movements can be beneficial for instrumental performance. The Elbow Piano distinguishes two types of touch: a touch with movement in the elbow joint and a touch without. Therefore, the Elbow Piano can increase awareness of these movements, with possible beneficial effects on normal piano performance.
The Elbow Piano consists of a MIDI keyboard, a webcam, a pair of custom-built goniometers, a computer to which these sensors are connected, and software that analyzes the incoming data stream and controls an attached synthesizer. The system uses visual tracking to find the positions of the hands on the keyboard. On each keypress, the goniometer data of the matched arm is evaluated and the system decides what type of touch the user executed. The user gets visual feedback about the decision process and can evaluate the decision of the system.
A user study with music students of the HfMDK Frankfurt was conducted. The participants learned to better control their arm movements during the sessions. Despite that, most participants did not want to use the system to practice or teach the piano. Although a conservative attitude might be a minor factor in this result, we think that our approach needs to be integrated into a systematic piano syllabus to make it more convincing. To this end, we are currently applying the presented approach to other playing movements.

6. FUTURE WORK
Gorodnichy and Yogeswaran developed a system that can track the hands, the fingers and the position of the keyboard in the images of a camera placed above the keyboard [5]. The visual tracking of the Elbow Piano could be improved using this approach, and the user would no longer need to configure the position of the keyboard.
Movements in the elbow joints are not only performed to press a key downwards. They also occur when the player moves the hand forwards, backwards or sideways. These changes of elbow angles could be estimated using information gained by visual tracking of the hands. The goniometer data could then be cleaned of this effect before the recognition method is applied. The posture of the player also has an effect on the angles in the elbows. Measured changes of posture could likewise be used to clean the goniometer data before the recognition method is applied.
We are currently exploring the use of different sensors to generalize the presented approach to other playing movements.

7. REFERENCES
[1] Bernstein, S. Twenty Lessons In Keyboard Choreography. Seymour Bernstein Music, 1991.
[2] Bradski, G. R. Real Time Face and Object Tracking as a Component of a Perceptual User Interface. In Fourth IEEE Workshop on Applications of Computer Vision (WACV '98), 1998.
[3] Dannenberg, R. B., Sanchez, M., Joseph, A., Joseph, R., Saul, R., and Capell, P. Results from the Piano Tutor Project. In Proceedings of the Fourth Biennial Arts and Technology Symposium, 1993.
[4] Gat, J. The Technique of Piano Playing. Collet's Holding, London, 1965.
[5] Gorodnichy, D. O., and Yogeswaran, A. Detection and tracking of pianist hands and fingers. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision (CRV'06), 2006.
[6] Goebl, W., and Widmer, G. Unobtrusive Practice Tools for Pianists. In Proceedings of the 9th International Conference on Music Perception and Cognition (ICMPC9), 2006.
[7] Ng, K., Weyde, T., Larkin, O., Neubarth, K., Koerselman, T., and Ong, B. 3D Augmented Mirror: A Multimodal Interface for String Instrument Learning and Teaching with Gesture Support. In ICMI '07: Proceedings of the 9th International Conference on Multimodal Interfaces, 2007.
[8] Lin, C., and Liu, D. S. An Intelligent Virtual Piano Tutor. In Proceedings of the 2006 ACM International Conference on Virtual Reality Continuum and its Applications, 2006.
[9] Montes, R., Bedmar, M., and Martin, M. S. EMG Biofeedback of the Abductor Pollicis Brevis in Piano Performance. In Biofeedback and Self-Regulation, 2, 18, 1993.
[10] Mora, J., Lee, W., Comeau, G., Shirmohammadi, S., and Saddik, A. E. Assisted Piano Pedagogy through 3D Visualization of Piano Playing. In HAVE 2006 - IEEE International Workshop on Haptic Audio Visual Environments and their Application, 2006.
[11] Riley, K., Coons, E. E., and Marcarian, D. The Use of Multimodal Feedback in Retraining Complex Technical Skills of Piano Performance. Medical Problems of Performing Artists, 20, 2, 2005.
[12] Shirmohammadi, S., Khanafar, A., and Comeau, G. MIDIATOR: A Tool for Analyzing Students' Piano Performance. In Revue de recherche en éducation musicale, 2, 2006.
[13] Smoliar, S. W., Waterworth, J. A., and Kellock, P. R. pianoFORTE: A System for Piano Education Beyond Notation Literacy. In MULTIMEDIA '95: Proceedings of the Third ACM International Conference on Multimedia, 1995.
UnitKeyboard: An Easily Configurable Compact Clavier
Yoshinari TAKEGAWA, Tsutomu TERADA, Masahiko TSUKAMOTO
Kobe University, Japan
take@eedept.kobe-u.ac.jp, tsutomu@eedept.kobe-u.ac.jp, tuka@kobe-u.ac.jp
ABSTRACT
Musical keyboard instruments have a long history, which has resulted in the many kinds of keyboards (claviers) in use today. Since the hardware of conventional musical keyboards, such as the number of keys, cannot be changed, musicians have to carry these large keyboards even to play music that requires only a small diapason. To solve this problem, the goal of our study is to construct the UnitKeyboard, which has only 12 keys (7 white keys and 5 black keys) and connectors for docking with other UnitKeyboards. We can build various kinds of musical keyboard configurations by connecting one UnitKeyboard to others, since they have automatic settings for multiple keyboard instruments. We discuss the usability of the UnitKeyboard based on reviews by several amateur and professional pianists who used it.

Keywords
Portable keyboard instruments, block interface, automatic settings

1. INTRODUCTION
Musical keyboard instruments have a long history, resulting in many kinds of keyboards today (e.g., piano, choir organ, and accordion). Moreover, there are many kinds of musical forms in classical piano performance: solo, which is played by one performer; piano duet, which is a performance by two performers on a single piano; piano duo, which is a performance by two performers on two pianos; and ensemble, which is a performance by multiple groups that consist of two or more musicians.
At the same time, various kinds of electronic musical instruments have been developed. These instruments have many kinds of functions, such as diapason change and tone change. Since conventional musical keyboards cannot change their hardware configuration, such as the number of keys, musicians have to carry large keyboards even to play music that requires only a small diapason. Moreover, it is difficult for one keyboard to adapt to the requirements of various kinds of keyboard instruments. For example, musicians cannot play music for the organ on a digital piano with 88 keys.
The goal of our study is to construct the UnitKeyboard, which has only 12 keys (7 white keys and 5 black keys) and 4 connectors for docking with other UnitKeyboards. With these keyboards, we can build various kinds of keyboard configurations by connecting a UnitKeyboard to other UnitKeyboards. They have automatic settings that consider the relationship among UnitKeyboards, as well as intuitive controls using sensors and actuators. Because of these special functions, the UnitKeyboard is a flexible instrument for playing music.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-8, 2008, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

2. DESIGN
A UnitKeyboard is a keyboard equipped with 12 keys and 4 connectors for connecting to other UnitKeyboards. Various kinds of keyboards can be simulated with them. For example, we can construct a keyboard of two octaves by connecting two UnitKeyboards horizontally, as shown in Figure 1-(a). Moreover, we can construct an organ that has two manuals by connecting two UnitKeyboards vertically, as shown in Figure 1-(b). We can also increase the diapason by connecting an EnhancedUnit, which has various kinds of functions, between UnitKeyboards, as shown in Figure 1-(c).

Figure 1: Combination examples of UnitKeyboard (panels (a)-(c) show a BaseUnit combined with units labeled "1 octave higher diapason than that of the BaseUnit" and "2 octaves higher diapason than that of the BaseUnit"; panel (c) also includes an EnhancedUnit)

2.1 Characteristics of UnitKeyboard
2.1.1 Automatic Settings
We can build various kinds of keyboard instruments by docking multiple UnitKeyboards. However, users would then need to configure various kinds of settings for each UnitKeyboard. To reduce the setting time, we propose an automatic setting algorithm.
Connection position: A UnitKeyboard is equipped with one connector on each side (left, right, top, and bottom) for connecting to other UnitKeyboards. Assignments of the
tone and the diapason for each UnitKeyboard depend on the configuration of the connections. Generally, single-manual keyboards like the piano have the characteristic that the further left/right a key is positioned, the lower/higher its pitch, and all of the keys have the same tone. Therefore, a UnitKeyboard horizontally connected to a BaseUnit, which controls the base settings such as the tone and the diapason, inherits the tone of the BaseUnit, and its diapason is one octave higher than that of the BaseUnit, as shown in Figure 1. On the other hand, a UnitKeyboard vertically connected to the BaseUnit has the same diapason as the BaseUnit, and its tone is independent of that of the BaseUnit.
Priority: Between a BaseUnit and a non-BaseUnit there is a hierarchical relationship; that is, the settings of the non-BaseUnit inherit those of the BaseUnit. We define this as priority. This is similar to an ensemble, where the sections have section leaders or there is a conductor of the entire ensemble. Our system automatically assigns the settings of a low-priority UnitKeyboard based on the settings of a high-priority UnitKeyboard.

2.1.2 Real-Time Reconfiguration
Since there may be cases where the configurations and connection statuses of the UnitKeyboards are changed during the performance, the system needs to detect these changes and reconfigure the settings of the UnitKeyboards in real time.
We discuss the system design for fast real-time processing from the viewpoint of data management.
Data management: In a UnitKeyboard system, there are various kinds of system data: connection data to manage the connection relationships among UnitKeyboards, setting data for setting the diapason and the tone of each UnitKeyboard, and keying data that is generated when keys of a UnitKeyboard are pressed/released.
If each UnitKeyboard managed its own settings, each UnitKeyboard would have to send a connection change message to all the other UnitKeyboards. Because the CPU and memory in a UnitKeyboard are limited, it is difficult to do this in real time.
Therefore, we use a computer as the "host" to calculate the connection statuses and setting statuses for all UnitKeyboards in the system.

3. PROTOTYPE SYSTEM
Figure 2 shows the structure of the prototype system. It consists of a host, UnitKeyboards, and EnhancedUnits.

Figure 2: System structure (labels: Host, Sound Generator, Wireless Module; "Requirements of Unit ID etc."; "Data of Unit ID, Keying data etc."; sensors; high-end EnhancedUnit; simple EnhancedUnit)

Figure 3: A snapshot of UnitKeyboard

Figure 3 shows a snapshot of a UnitKeyboard. We implemented the system using Microsoft Visual C++ .NET 2003, and we use a Sony Vaio VGN-S92PS with the Windows XP platform as the host, an Allow7 UM-100 as the wireless module, a Roland SC-8820 as the MIDI sound generator, and an M-Audio OXYGEN8 as the keyboard. The OXYGEN8 has 25 keys, but we cut it in half to make the 12-key keyboard. We use a programmable integrated circuit (PIC) microcomputer (PIC16F873) to control the UnitKeyboard and the EnhancedUnit. The software on the PIC is programmed in the C language with Microchip Technology's MPLAB.

3.1 Host
In the prototype, we used a PC as the host. The functions of the host are as follows.
Management of setting data: The host manages the setting data of each Unit. Note that "Unit" refers to both the UnitKeyboard and the EnhancedUnit.
Management of connection statuses: The host directly manages the connection statuses of all the Units. Moreover, the host calculates the setting data for each Unit's configuration from the connection data of all the Units.
Process of sound generation: The host generates MIDI Note On/Off messages based on the setting data of the Units and the keying data sent from a UnitKeyboard.

3.2 UnitKeyboard
The hardware structure of a Unit is shown in Figure 4. A UnitKeyboard consists of a PIC, a 12-key keyboard, connectors on all four sides, and a wireless module to communicate with the host. A UnitKeyboard has the following functions.
Figure 4: The hardware of Unit (keyboard, UnitKeyboard only; wireless module; microcomputer; connector; input/output devices, EnhancedUnit only)

Establishing connection to the host: A UnitKeyboard broadcasts a "New Entry" command after it is turned on, and when the UnitKeyboard receives an acknowledgement from the host, it sends an "ID" and "connector data", such as the number of connectors, to the host.
Sending keying data: A UnitKeyboard sends keying data to the host when the status of its keys changes.
Sending connection data: A UnitKeyboard sends a "Connection Status" command to the host when the status of its connectors changes.

3.3 EnhancedUnit
The EnhancedUnit has two models: a simple model that only controls the diapason of a UnitKeyboard, and a high-end model that is equipped with sensors, actuators, and a wireless module to operate the settings of the UnitKeyboards. The former is inserted between UnitKeyboards to increase the diapason. It has a simple structure that consists of two connectors and a variable resistor. Since the connectors of a UnitKeyboard can measure a voltage that changes with the value of the variable resistor, UnitKeyboards interleaved with simple EnhancedUnits convert the measured voltage into a change of diapason.
Figure 4 shows the hardware of the high-end EnhancedUnit. The main differences between the EnhancedUnit and the UnitKeyboard are that the EnhancedUnit does not have a keyboard and has various input/output devices. The high-end EnhancedUnit has the following functions.
Connection to the host: The EnhancedUnit broadcasts a "New Entry" command after the power is turned on and establishes a connection with the host, just like a UnitKeyboard.
Sending connection data: The EnhancedUnit monitors the status of its own connectors, and it sends a "Connection Status" command to the host when it detects a change of connection, just like the UnitKeyboard.
Sending input data from input devices: The EnhancedUnit collects data from its input devices and sends it to the host as requested by the host.
Control of output devices: The EnhancedUnit controls its output devices according to commands sent from the host.

3.3.1 Input/Output devices
We developed a high-end EnhancedUnit prototype equipped with various kinds of input/output devices.
Distance sensor: Users can control the diapason of a UnitKeyboard neighboring an EnhancedUnit equipped with distance sensors. For example, the longer the distance between the UnitKeyboard and the EnhancedUnit, the higher the diapason of the UnitKeyboard.
Acceleration sensor: Users control the tone of UnitKeyboards with their posture, which is calculated from the data of the acceleration sensor.
Motor: Users can move UnitKeyboards automatically by using an EnhancedUnit equipped with motors attached to a propeller and wheels. For example, if musicians use an EnhancedUnit equipped with a motor and wheels, they can add/subtract a diapason by automatically moving a UnitKeyboard, as shown in Figure 5.

Figure 5: An EnhancedUnit with electric motor (the units are labeled with their octave numbers, e.g. [3] [4] [5]; legend: "A number of octave: [*]")

4. CONSIDERATIONS
We discuss the usability of the proposed UnitKeyboard based on reviews by 5 amateur pianists and 5 professional pianists who actually used the UnitKeyboard. We have demonstrated the UnitKeyboard at various kinds of events, such as the Kobe Luminarie Live Stage on December 8th and 9th, 2007. The Kobe Luminarie began in 1995 and commemorates the Great Hanshin earthquake of that year; about 4 million people attended it last year.

4.1 Performance Evaluation
Visibility: We confirmed that the function that automatically assigns the settings of each UnitKeyboard based on the relationships among all the UnitKeyboards worked well. The host resolved conflicting settings among the UnitKeyboards. Moreover, participants' reviews indicated that the proposed automatic-assignment algorithm was intuitive.
Because the participants could see the connection relationships between the UnitKeyboards, it was easy to recognize the relative diapason of each UnitKeyboard. However, it was difficult to recognize the absolute diapason of each UnitKeyboard. In the present implementation, participants could not see which unit was the BaseUnit or what its diapason was. Therefore, participants had to press the keys of each UnitKeyboard to check the diapason.
For future work, we plan to develop an EnhancedUnit with LEDs and a display for checking the settings of the UnitKeyboard.
Wireless vs. wired connections: We adopted a wireless connection for communication between the host and the Units.
With the wireless connection, there was some delay between keying and the output sound, although the delay was not very noticeable in the music. However, the more UnitKeyboards were used, the higher the possibility of packet loss and longer delays.
On the other hand, the delay with a wired connection was shorter than with the wireless connection.
Because both methods have advantages and disadvantages, we will conduct a more detailed evaluation of each method in future work.
One-octave UnitKeyboard: In this study, a UnitKeyboard had only one octave, from C to B. This diapason is effective only in music in C major or C minor. We can solve this problem by using the Mobile Clavier [7], which enables a smooth change in diapason.

4.2 New performance
We conducted performances with UnitKeyboards and EnhancedUnits.
As shown in Figure 6, when the diapason was insufficient during a performance, a musician solved the problem by borrowing a UnitKeyboard from another performer. Moreover, as shown in Figure 5, a keyboard moving automatically to a commanded location was visually interesting. These performances are not only musically entertaining but also visually attractive.

Figure 6: Snapshots of collaborative performance

4.3 RELATED WORK
There has been a large amount of research whose main goal was improving a function by combining simple functional units. For example, users can control an object in a game by combining LEGO blocks [1], control website browsing by combining triangle boards [2], or program by combining blocks [3]. Moreover, there are block interfaces equipped with input/output devices [4]. Unlike our study, these systems did not target music.
On the other hand, there is a music composition system that works by combining blocks assigned to mood music [5]. Moreover, there are systems, DoublePad/Bass [6] and Mobile Clavier [7], which were developed to improve the portability of acoustic instruments. DoublePad/Bass is a bass instrument using two PDAs, designed so that musicians who play the electric bass can easily play it. Mobile Clavier enables smooth changes of diapason by allowing additional black keys to be inserted. These instruments were not designed around the concept of combining units, nor for various kinds of keyboard/string instruments.

5. CONCLUSIONS
We proposed the UnitKeyboard, which can serve as various kinds of keyboard instruments by connecting one-octave keyboards together. Moreover, the UnitKeyboard has various functions, such as automatic settings that consider the relationship among multiple UnitKeyboards, intuitive controls, and new performances using an EnhancedUnit.
We intend to evaluate the hardware and the usability of the system in the future.

6. ACKNOWLEDGMENTS
This research was supported in part by a Grant-in-Aid for Scientific Research (A) (17200006) from the Japanese Ministry of Education, Culture, Sports, Science and Technology, a Grant-in-Aid for Scientific Research from the JSPS Research Fellowship, and by the Hayao Nakayama Foundation for Science & Technology and Culture.

7. REFERENCES
[1] Anderson, D., Frankel, J., Marks, J., Agarwala, A., Beardsley, P., Hodgins, J., Leigh, D., Ryall, K., Sullivan, E. and Yedidia, J.: "Tangible Interaction + Graphical Interpretation: A New Approach to 3D Modeling", In Proceedings of SIGGRAPH 2000, pp. 393–402, 2000.
[2] Gorbet, G. M., Orth, M. and Ishii, H.: "Triangles: Tangible Interface for Manipulation and Exploration of Digital Information Topography", In Proceedings of CHI 1998, pp. 49–56, 1998.
[3] Suzuki, H. and Kato, H.: "Interaction-level support for collaborative learning: AlgoBlock an open programming language", In Proceedings of CSCL2002, pp. 349–355, 2002.
[4] Watanabe, R., Itoh, Y., Asai, M., Kitamura, Y., Kishino, F. and Kikuchi, H.: "The Soul of ActiveCube - Implementing a Flexible, Multimodal, Three-Dimensional Spatial Tangible Interface", In Proceedings of ACE 2004, pp. 173–180, 2004.
[5] Henry, D. N., Nakano, H. and Gibson, J.: "Block Jam", In Proceedings of SIGGRAPH 2002, p. 67, 2002.
[6] Terada, T., Tsukamoto, M. and Nishio, S.: "A Portable Electric Bass Using Two PDAs", In Proceedings of IWEC 2002, pp. 286–293, 2002.
[7] Takegawa, Y., Terada, T., Tsukamoto, M. and Nishio, S.: "Mobile Clavier: New Music Keyboard for Flexible Key Transpose", In Proceedings of NIME 2007, pp. 82–87, 2007.
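The setting rules of Section 2.1.1 (a horizontally docked unit inherits the tone and is shifted by one octave, a vertically docked unit keeps the diapason but has its own tone, and settings flow from higher-priority to lower-priority units) can be pictured as a breadth-first traversal outward from the BaseUnit. The following is a minimal Python sketch under stated assumptions, not the authors' host software: the grid coordinates, the `Unit` fields, the rule that a unit docked on the left is one octave lower, and modelling a simple EnhancedUnit as a pass-through node whose hypothetical `extra_octaves` value stands in for its variable resistor are all illustrative choices.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Pos = Tuple[int, int]  # hypothetical grid coordinates: (column, row)


@dataclass
class Unit:
    kind: str = "keyboard"          # "keyboard" or "enhanced" (a simple EnhancedUnit)
    extra_octaves: int = 0          # hypothetical: shift read from an EnhancedUnit's resistor
    own_tone: Optional[int] = None  # hypothetical: tone chosen for a vertically docked keyboard
    octave: Optional[int] = None    # filled in by assign_settings
    tone: Optional[int] = None      # filled in by assign_settings


def assign_settings(units: Dict[Pos, Unit], base: Pos,
                    base_octave: int, base_tone: int) -> None:
    """Propagate settings outward from the BaseUnit in breadth-first order.

    Units closer to the BaseUnit are assigned first, so they act as the
    higher-priority parents of the units docked to them.
    """
    for unit in units.values():
        unit.octave = unit.tone = None  # recompute from scratch after a reconnection
    root = units[base]
    root.octave, root.tone = base_octave, base_tone
    queue = deque([base])
    while queue:
        col, row = queue.popleft()
        parent = units[(col, row)]
        for dcol, drow in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            pos = (col + dcol, row + drow)
            child = units.get(pos)
            if child is None or child.octave is not None:
                continue  # empty connector, or already set by a higher-priority unit
            if dcol != 0:
                # Horizontal docking: same tone; right is higher, left is lower (assumed).
                shift = child.extra_octaves if child.kind == "enhanced" else 1
                child.octave = parent.octave + dcol * shift
                child.tone = parent.tone
            else:
                # Vertical docking (second manual): same diapason, independent tone.
                child.octave = parent.octave
                child.tone = child.own_tone if child.own_tone is not None else parent.tone
            queue.append(pos)


if __name__ == "__main__":
    rig = {
        (0, 0): Unit(),                                # BaseUnit
        (1, 0): Unit(kind="enhanced", extra_octaves=1),
        (2, 0): Unit(),                                # docked right of the EnhancedUnit
        (0, 1): Unit(own_tone=19),                     # second manual docked on top
    }
    assign_settings(rig, base=(0, 0), base_octave=4, base_tone=0)
    for pos, unit in sorted(rig.items()):
        print(pos, unit.kind, unit.octave, unit.tone)
    # (0, 0) keyboard 4 0
    # (0, 1) keyboard 4 19
    # (1, 0) enhanced 5 0
    # (2, 0) keyboard 6 0
```

With these assumed values, the keyboard docked behind the EnhancedUnit ends up two octaves above the BaseUnit, consistent with the label in Figure 1-(c). Because the host holds all connection data, simply rerunning `assign_settings` after every "Connection Status" message is one straightforward way to obtain the real-time reconfiguration of Section 2.1.2.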
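On the host side, the "process of sound generation" step of Section 3.1 (turning a Unit's setting data and the keying data it sends into MIDI Note On/Off messages) might look like the following Python sketch. The status bytes 0x90, 0x80 and 0xC0 are the standard MIDI Note On, Note Off and Program Change messages; everything else is an assumption for illustration only: the `UnitSettings` record, the numbering of the 12 keys as 0 (C) to 11 (B), the convention that octave 4 starts at MIDI note 60, and the use of one MIDI channel per tone.

```python
from dataclasses import dataclass

NOTE_ON = 0x90         # MIDI channel-voice status: Note On (low nibble = channel)
NOTE_OFF = 0x80        # MIDI channel-voice status: Note Off
PROGRAM_CHANGE = 0xC0  # MIDI channel-voice status: Program Change


@dataclass
class UnitSettings:
    octave: int   # octave assigned by the host; octave 4 starts at MIDI note 60 (assumed)
    channel: int  # hypothetical: one MIDI channel (0-15) per tone

    def __post_init__(self) -> None:
        if not 0 <= self.channel <= 15:
            raise ValueError("MIDI channel must be 0-15")


def tone_message(settings: UnitSettings, program: int) -> bytes:
    """Program Change selecting a tone (0-127) on the unit's channel."""
    if not 0 <= program <= 127:
        raise ValueError("program must be 0-127")
    return bytes([PROGRAM_CHANGE | settings.channel, program])


def keying_to_midi(settings: UnitSettings, key_index: int, pressed: bool,
                   velocity: int = 100) -> bytes:
    """Turn one keying event (key 0 = C ... 11 = B) into a Note On/Off message."""
    if not 0 <= key_index < 12:
        raise ValueError("a UnitKeyboard has 12 keys, C to B")
    if not 1 <= velocity <= 127:
        raise ValueError("velocity must be 1-127")
    note = 12 * (settings.octave + 1) + key_index
    if not 0 <= note <= 127:
        raise ValueError("note outside the MIDI range")
    status = (NOTE_ON if pressed else NOTE_OFF) | settings.channel
    return bytes([status, note, velocity if pressed else 0])


if __name__ == "__main__":
    upper = UnitSettings(octave=5, channel=1)
    print(tone_message(upper, 19).hex())            # c113
    print(keying_to_midi(upper, 0, True).hex())     # 914864
    print(keying_to_midi(upper, 0, False).hex())    # 814800
```

Keeping the note computation on the host, rather than on the PIC, matches the paper's rationale for centralizing settings: a UnitKeyboard only reports which of its 12 keys changed, and the host applies whatever octave and tone the current configuration assigns.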
Eight Years of Practice on the Hyper-Flute: Technological and Musical Perspectives
Cléo Palacio-Quintin
LIAM - Université de Montréal - Montreal, QC, Canada
IDMIL - Input Devices and Music Interaction Laboratory
CIRMMT - Centre for Interdisciplinary Research in Music Media and Technology
McGill University - Montreal, QC, Canada
cleo.palacio-quintin@umontreal.ca
ABSTRACT
After eight years of practice on the first hyper-flute prototype (a flute extended with sensors), this article presents a retrospective of its instrumental practice and the new developments planned from both technological and musical perspectives. Design, performance skills, and mapping strategies are discussed, as well as interactive composition and improvisation.
Keywords
hyper-instruments, hyper-flute, sensors, gestural control, mapping, interactive music, composition, improvisation
1. INTRODUCTION
Since 1999, I have been performing on the hyper-flute [13]. Interfaced to a computer by means of electronic sensors and Max-MSP software, the extended flute enables me to directly control the digital processing parameters as they affect the flute's sound while performing, and allows me to compose unusual electroacoustic soundscapes.
Until now, I have mostly used the hyper-flute to perform improvised music. Wishing to expand the repertoire for the hyper-flute, I began doctoral studies in January 2007 to work on written compositions. Before developing a core repertoire, I decided to review my experience with the instrument.
This article presents the original design of the hyper-flute and the learning experience of eight years of practice on it. The performance skills and mapping strategies developed over time now suggest new enhancements of the instrument. Technological and musical issues in the development of a new prototype of the hyper-flute, as well as a hyper-bass-flute, will be discussed.

Figure 1: The hyper-flute played by Cléo Palacio-Quintin. Photograph by Carl Valiquet.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

2. BACKGROUND
2.1 Why, Where and When
By the end of my studies in contemporary flute performance (Université de Montréal – 1997), I was heavily involved in improvised music and had started looking for new sonorities for the flute in my own compositions. As I was already familiar with electroacoustic music and with the use of the computer, it was an obvious step for me to start playing flute with live electronics. My goal was to keep the acoustic richness of the flute and my way of playing it. The computer would then become a virtual extension of the instrument.
During post-graduate studies in Amsterdam, I had the chance to meet the experienced instrument designer Bert Bongers [3]. In 1999, I participated in the Interactive Electronic Music Composition/Performance course with him and the meta-trumpeter Jonathan Impett [9] at the Dartington International Summer School of Music (U.K.). There, I made my first attempt at putting several sensors on my flute, programming a Max interface, and performing music with it. Several months later, I registered as a student at the Institute of Sonology in The Hague (The Netherlands) in order to build my hyper-flute. The prototype of the hyper-flute was mainly built during the Fall of 1999 with the help of Lex van den Broek. Bert Bongers was a valuable consultant for the design. He also made the main connector from the sensors to the Microlab interface.

2.2 Original Design
2.2.1 Interface
The Microlab is an electronic interface that converts the voltage variations from various analog sensors (between 0 and 5 volts) into standard MIDI data. It offers 32 analog inputs, a keyboard matrix of 16 keys and an integrated ultrasonic distance measuring device. This interface was
originally designed and developed by J. Scherpenisse and A.J. van den Broek at the Institute of Sonology. As a student there, I had access to the schematics and was able to build it myself.

2.2.2 Sensors
There is little free space to put hardware on a flute because of the complexity and small size of its key mechanism. Nevertheless, it was possible to install sensors at specific strategic locations. Table 1 shows an overview of all the sensors installed on the hyper-flute.

Table 1: Sensors installed on the hyper-flute
Sensors | Parameter
1 Ultrasound sensor | flute's distance to computer
3 Pressure sensors (FSRs) | pressure: left hand and thumbs
2 Magnetic field sensors | motion of G# and low C# keys
1 Light-dependent resistor | ambient light
2 Mercury tilt switches | tilt and rotation of the flute
6 Button switches | discrete cues

Inspired by Jonathan Impett's meta-trumpet, I chose to put different types of electronic sensors on my flute. "As far as possible, this is implemented without compromising the richness of the instrument and its technique, or adding extraneous techniques for the performer – most of the actions already form part of conventional performance." (page 148) [9]
The most important energy captors are proprioceptive sensors. These relate directly to instrumental playing. A performer is always aware of the action of her muscles on the instrument and of her physical position. Of course, a well-trained musician is not really conscious of these parameters while performing: they become unconscious gestures, though always under her control. To collect gestural data, a number of proprioceptive sensors have been installed on the flute.
Several analog sensors send continuous voltage variations to the Microlab, which converts them into MIDI Continuous Controller messages. Ultrasound transducers are used to track the distance of the flute from the computer. An ultrasonic pulsed signal is sent by a transmitter attached to the computer and is captured by the receiver attached to the flute's footjoint. The Microlab calculates the distance based on the speed of sound. Pressure sensors (Force Sensing Resistors) are installed on the principal holding points of the flute (under the left hand and the two thumbs). Two magnetic field sensors (Hall effect) give the exact position of the G# and low C# keys, both operated by the little fingers. A light-dependent resistor is positioned on the headjoint of the flute. This photoresistor detects variations in ambient light.
Other controllers used on the hyper-flute send discrete values: on/off MIDI Note messages. Two mercury tilt switches are activated by the inclination (moving the footjoint up) and the rotation (turning the headjoint outwards) of the instrument. There are also six little button switches, which can also be considered pressure sensors but which send two discrete values (on/off) instead of continuous measurements.

thus made because of technical considerations. Some of these choices were arbitrary and made without overt musical considerations. However, most decisions turned out to be quite pertinent. I will discuss design details and the use of sensors in relation to the physicality of flute playing. Finally, I will present some of my ideas on performance skills and mapping strategies developed over the years.

3.1 Design & Sensors
When designing the hyper-flute, some sensors were chosen simply because they were available; I just had to find a place to put them on the flute. This was the case for the ultrasound transducer and the light sensor. I also studied the free space available on the instrument and looked for what sort of sensor I could put there. Since the G# and low C# keys are the only levers on the flute with space available under them, I installed the magnet sensors in those two places.
Because it does not compromise the natural movements of the fingers and hands in instrumental playing, the ultrasonic range finder integrated into the Microlab interface turned out to be one of the most useful controllers. The same benefit comes from the tilt switches, which are activated without any interaction of the fingers.
As there is no movement involved, pressure sensors (FSRs) are considered isometric: these sensors only capture muscle tension. This made it easier to get used to performing with them. A large FSR is installed under the left hand, which holds the flute by pressing it towards the chin. There is constant contact and a continual variation of pressure on this point of the instrument while playing, though the pressure is quite controllable.
Under the left thumb, a small FSR is placed on the B key. As this key is used in playing, it moves often and is sometimes completely released, which limits control of the sensor. A third FSR is located under the right thumb, which holds the flute. The pressure on the three sensors varies constantly depending on which fingering is being played and how the instrument's balance is kept (for example, if a thumb is lifted, the two other holding points bear more of the weight of the instrument). These pressure sensors cannot be controlled without interacting with the playing, but they do not interfere with the normal motion of the fingers and hands. They capture natural gestures related to the musical content performed.
The pressure sensors also interact directly with the button switches. Four of them are located close to the thumbs and can be reached while playing. The respective thumb's pressure sensor is thus released when a button is used. The left thumb cannot reach the buttons without compromising the fingering, while the right thumb is freer. Like the two mercury tilt switches, those buttons turned out to be very practical, even essential, to activate/deactivate various computer processes and to scroll through menus during performances. Two extra button switches, not easily reachable while playing, are located next to the headjoint. In order to perform without touching the computer, those switches are often used to start and end a piece.
The magnet sensors give the exact position of the lever of
Two of them are located on the headjoint, and two are the G# and low C# keys. The small distance of the action
placed close to each of the thumbs and can be reached while of the key is precisely mesured in 95 steps. It is possible to
playing. play with the motion range of the keys and make different
curves for the midi output with quite accurate control. This
is not a standard technique on the flute and it affects the
3. LEARNING EXPERIENCE acoustics of the instrument.
When I built the hyper-flute, I had little knowledge about Because it happened to be around at the time, a light
augmented instruments, and hardly any experience with sensor was installed on the instrument. I expected to use
human-computer interaction. Several choices of design were it with stage lighting. However, staging with a lighting rig
294
is quite uncommun when performing improvised electronic

music. I have used it only once in 8 years. Realistically,
I cannot control the ambient light myself, so this sensor is
not really relevant.
Over the years, the entire design of the hyper-flute proved
to be quite robust. Everything still works as well as on the
first day. The force sensing resistors need to be replaced
(more or less every 2 years) but all the other parts are still
the original ones. The Microlab interface is also very sta-
ble and reliable. Even as the MIDI protocol is becoming
obsolete and slow compared to new standards, the stability
of the interface has been a good help in developing perfor-
mance skills for the long term.
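Two of the conversions described above can be illustrated in a few lines: computing the flute's distance from an ultrasonic pulse's travel time, and turning the 95-step key-lever reading into a MIDI controller value through a response curve. This is a minimal sketch, not the Microlab's firmware. The speed-of-sound constant, the 0.2 m to 2.0 m playing range, the power-law curve family and all function names are assumptions chosen for the example. The final scaling step is limited to 128 values (0-127), the 7-bit resolution of a MIDI Continuous Controller.

```python
# Hypothetical sketch: turning raw sensor readings into 7-bit MIDI CC values.
# Constants, ranges and the curve family are assumptions, not Microlab internals.

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C
KEY_STEPS = 95              # key-lever travel resolved in 95 steps (from the text)


def distance_m(time_of_flight_s: float) -> float:
    """Distance covered by a one-way ultrasonic pulse (transmitter to receiver)."""
    return SPEED_OF_SOUND_M_S * time_of_flight_s


def to_midi_cc(value: float, lo: float, hi: float) -> int:
    """Clamp value to [lo, hi] and quantize it to a MIDI CC value (0-127)."""
    clamped = min(max(value, lo), hi)
    return round((clamped - lo) / (hi - lo) * 127)


def key_curve_cc(step: int, exponent: float = 2.0) -> int:
    """Map a key-lever step (0..94) to a MIDI CC value through a power curve.

    exponent > 1: output changes slowly near the resting position (finer control there),
    exponent < 1: output changes slowly near the fully pressed position,
    exponent == 1: plain linear mapping.
    """
    step = min(max(step, 0), KEY_STEPS - 1)
    normalized = step / (KEY_STEPS - 1)
    return round(normalized ** exponent * 127)


if __name__ == "__main__":
    d = distance_m(0.0025)                 # a 2.5 ms pulse travels about 0.86 m
    print(d, to_midi_cc(d, 0.2, 2.0))      # assumed playing range 0.2 m to 2.0 m
    print([key_curve_cc(s) for s in (0, 47, 94)])
```

Changing `exponent` is one simple way to obtain the "different curves" the text mentions. A lookup table would work just as well.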
3.2 Performance Skills
The detailed physical control required to perform on traditional acoustic instruments takes time to learn. I spent more than 15 years developing my instrumental skills. While playing an acoustic instrument, all performers receive mechanical feedback cues via a variety of physiological and perceptual signals. Haptic sensations include tactile and kinaesthetic perception. Kinaesthetic perception is the awareness of the body state, including position, velocity and the forces supplied by the muscles. Auditory feedback is obviously very important, but the physical sensation of playing comes before the perception of the sound.
While extending my flute sound with computer processing, I wanted to keep the same subtle control. It was obvious that I should use my already refined instrumental skills to control the sound processing parameters. However, in order to perform proficiently on the hyper-flute, I needed to develop many extra techniques.
Earlier I mentioned that the ultrasonic device and the tilt switches were very useful because they do not compromise natural movements. However, the movements they capture are not normally necessary for flute playing, and the performer is not trained to consciously notice them. Once these sensors were linked to sound processing parameters, it was very difficult to avoid triggering something unintentionally. I had to learn to play completely motionless (which is very unnatural for a performer) in order to attain the necessary control.
The pressure sensors always react according to the fingerings played. It is almost impossible to keep them completely stable, but they are very flexible and the motion of pressing them is natural. The maximum values are reachable only with extreme pressure, which does not occur in normal playing although it can be used expressively. Learning to use these sensors has not been too difficult, as they respond to normal playing gestures that simply need, at times, to be exaggerated.
The little fingers’ magnetic sensors were much more difficult to learn to control. Flutists are trained to push or lift a key very fast rather than to move it slowly within its motion range. After hours of practice, I trained my little fingers and can now control these sensors quite accurately.
Performing with some of the sensors installed on the hyper-flute was not always compatible with standard flute technique and entailed a long learning process. Playing an extended instrument requires a new way of performing. This should be kept in mind by designers of new interfaces: few performers are willing to put a large amount of energy and time into learning to perform on a new instrument.
Experience showed me how intimately acoustic playing techniques and the motion captured by the sensors are connected. Musical gestures need to be thought of as a whole. You cannot simply ask a flutist to play normally and add extra motions to be captured by the sensors. All gestures need to be integrated in order to achieve expressive performances.
Just like learning an acoustic instrument, it is necessary to play an electroacoustic instrument for a long period of time before achieving natural control of the sound. As on any musical instrument, expressivity is directly linked to virtuosity [7]. But for this to happen on an electroacoustic instrument, the mappings of gesture to sound must also remain stable.

3.3 Mapping Strategies
My first attempts at controlling sound processing parameters with the hyper-flute were made by directly coupling each sensor to a specific parameter of sound processing. This simple direct mapping approach soon had to be revised. It is almost impossible for a performer to think about many different parameters, each controlled separately but simultaneously. This implies an analytical cognitive mode of thinking, which is confusing for human beings while performing a complex task. Thinking in sequential order is very hard for a player who is already busy playing an acoustic instrument.
Axel Mulder came to the same conclusion using a bodysuit with sensors, trying to map each joint of the body to control a single synthesis parameter. “This mapping appeared to be very difficult to learn. First of all, human movements often involve the simultaneous movement of multiple limbs. So, when the intent was to change one or more specific parameter(s), often other synthesis parameters were co-articulated, i.e. also changed unintentionally.” (page 325) [12]
Researchers Hunt and Kirk have done experimental work comparing different types of interface mapping for real-time musical control tasks. This research revealed that “complex tasks may need complex interfaces” (page 254) [8], so a multiparametric interface seems to be the best long-term choice for developing an interesting interactive system. The holistic mode of thinking involves looking at a perceived object as a whole. It relates to spatial thinking and is much more appropriate for multi-dimensional gestural control.
An acoustic instrument is played in such a multiparametric way. “The resulting mapping of input parameters to sound parameters in a traditional acoustic instrument resembles a web of interconnections.” (page 235) [8] As illustrated in Figure 2, the air pressure blown into a flute, which contributes to the pitch, also affects the amplitude and timbre of the sound. The pitch is also affected by other inputs (fingerings, lip position). Each parameter of the sound is affected by different inputs simultaneously.
Figure 2: Example of multiparametric mapping of inputs and parameters to control the acoustic flute sound
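The "web of interconnections" can be made concrete as a weighted many-to-many mapping. One input can feed several outputs (a divergent mapping), and one output can sum several inputs (a convergent mapping). The sketch below is hypothetical. The input and output names follow the flute example in the text, but the weights are invented for illustration and do not describe real flute acoustics or any hyper-flute patch. For simplicity the sketch is linear, whereas the text stresses that real instrumental mappings are highly non-linear.

```python
# Hypothetical many-to-many mapping: the weights are illustrative, not measured data.

INPUTS = ["air_pressure", "fingering", "lip_position"]
OUTPUTS = ["pitch", "amplitude", "timbre"]

WEIGHTS = {
    "pitch":     {"air_pressure": 0.2, "fingering": 0.7, "lip_position": 0.1},
    "amplitude": {"air_pressure": 0.9, "fingering": 0.0, "lip_position": 0.1},
    "timbre":    {"air_pressure": 0.5, "fingering": 0.2, "lip_position": 0.3},
}


def apply_mapping(inputs: dict[str, float]) -> dict[str, float]:
    """Each output is a weighted sum of all inputs (inputs normalized to 0..1)."""
    return {
        out: sum(WEIGHTS[out].get(name, 0.0) * inputs.get(name, 0.0) for name in INPUTS)
        for out in OUTPUTS
    }


def fan_out(input_name: str) -> list[str]:
    """Divergent view: every output that a single input influences."""
    return [out for out in OUTPUTS if WEIGHTS[out].get(input_name, 0.0) != 0.0]


def fan_in(output_name: str) -> list[str]:
    """Convergent view: every input that influences a single output."""
    return [name for name in INPUTS if WEIGHTS[output_name].get(name, 0.0) != 0.0]


if __name__ == "__main__":
    soft = apply_mapping({"air_pressure": 0.2, "fingering": 0.5, "lip_position": 0.5})
    hard = apply_mapping({"air_pressure": 0.8, "fingering": 0.5, "lip_position": 0.5})
    # Changing only the air pressure moves pitch, amplitude and timbre together.
    print(soft)
    print(hard)
    print(fan_out("air_pressure"), fan_in("pitch"))
```

A direct one-to-one mapping would be the special case where each row of `WEIGHTS` has exactly one non-zero entry.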
Combinations of convergent and divergent mappings are always experienced while playing an acoustic instrument. It seems much more appropriate to control complex sound processing parameters according to the same principles. These highly non-linear mappings take substantial time to learn, but further practice improves control intimacy and competence of operation.
Different sound processing methods demand different ways of controlling them. Mappings must be adapted for each specific situation, and a lot of fine-tuning is necessary. I experimented with different combinations of direct, convergent and divergent mapping, some being more suitable for controlling specific sound processing patches. As my software evolves for each new piece, no definitive mapping is possible. However, I try to keep as much consistency as possible in the use of sensors, so that the precision of the control is maintained for each performance.

4. INTERACTIVE COMPOSITION, IMPROVISATION & PERFORMANCE
Joel Chadabe is one of the pioneers of real-time computer music systems. In 1983, he proposed a new method of composition called interactive composing, which he defined in the following terms: “An interactive composing system operates as an intelligent instrument – intelligent in the sense that it responds to a performer in a complex, not entirely predictable way, adding information to what a performer specifies and providing cues to the performer for further actions. The performer, in other words, shares control of the music with information that is automatically generated by the computer, and that information contains unpredictable elements to which the performer reacts while performing. The computer responds to the performer and the performer reacts to the computer, and the music takes its form through that mutually influential, interactive relationship.” (page 144) [5]
From this point of view, the performer also becomes an improviser, structuring his way of playing according to what he hears and feels while interacting with the computer.
In most cases, users of interactive computer systems are at once composer, performer and improviser. Due mostly to the novelty of the technology, few experimental hyper-instruments are built by artists, and these artists mostly use the instruments themselves. There is no standardized hyper-instrument yet for which a composer could write. It is difficult to draw the line between the composer and the performer when using such systems. The majority of performers using such instruments are concerned with improvisation, as a way of making musical expression as free as possible. Jonathan Impett also argues that the use of computers to create real-time music has profoundly changed traditional music practices. “In such a mode of production, the subdivisions of conventional music are folded together: composer, composition, performer, performance, instrument and environment. Subject becomes object, material becomes process.” (page 24) [10]
Using an interactive computer system, the performer has to develop a relationship with different types of electroacoustic sound objects and structures. These relationships constitute the fundamentals of musical interaction. The computer part can be supportive, accompanying, antagonistic, alienated, contrasting, responsive, developmental, extended, etc. All the musical structures included in a piece have different roles. Some affect the micro-structure of a musical performance, others affect the macro-structure, and many are in between. The interaction between the performer and these musical structures varies. The structures can also support different levels of interactivity between each other. We can divide these structures into three distinct types:
• sound processing transforming the acoustic sound,
• sound synthesis,
• pre-recorded sound material.
On the hyper-flute, I have focused on the development of the first type: transforming the flute sound with live digital processing. However, when looking for new extended flute sonorities, the process also leads to the integration of sound synthesis.
In an improvisational context, the interactive computer environment is designed to maximize flexibility in performance. The environment must provide the opportunity to generate, layer and route musical material within a flexible structure, like an open-form composition. Ideally, the computer environment would give the same improvisational freedom the performer has developed with his acoustic instrument. Each performer has his personal repertoire of instrumental sounds and playing techniques from which he can choose while performing. This sound palette can be very wide, and switching from one type of sound to another is done within milliseconds. Of course, any interactive gestural interface has a limited number of controllers. The sound processing patches can only generate the sounds that have been programmed (even if they include some random processes). The freedom of the performer is thus somewhat limited by the computer environment.
My long-term goal is to develop an interactive sound processing palette that is as rich and complex as my instrumental one. I want to improvise freely and to be able to trigger many different processes at any time, without disturbing my flute playing. Though there are still programming issues to be addressed before achieving an ideal environment, I have always felt more limited by the number of controllers and buttons on the hyper-flute. This has led me to new developments on the instrument itself.

5. NEW DEVELOPMENTS
After eight years of practice, I am now very comfortable playing the hyper-flute. I have also developed a very good knowledge of my musical needs for controlling the live electronics while performing. Over the years, I found what works best and what is missing on the instrument, so I decided to make a new prototype featuring some new sensors. As I also perform on the bass flute, a hyper-bass-flute is in development. The following sections briefly present the planned design of these new hyper-instruments.

5.1 Hyper-Flute
To maintain the playing expertise I have developed over the years, most sensors used since 1999 will be kept in the same physical configuration, but with technical improvements (ultrasound transmitter, magnetic field sensors on the little fingers, and force sensing resistors under the left hand and thumbs). There will be several more buttons on the new prototype, located close to the right thumb, which is freer while playing.
Earlier I mentioned the need for more sensors that do not disturb the hands and fingers while playing. The new prototype is thus designed with a two-axis accelerometer placed on the foot-joint of the instrument. This accelerometer gives information about the position of the flute (inclination and tilt of the instrument) as a continuous data stream instead of the simple on/off switches used previously.
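The following sketch shows what continuous inclination data makes possible. It converts a static two-axis accelerometer reading into angles and divides an angular range into menu zones, following the text's proposal to scroll through menus of processing modules by inclination. It then packs the readings into an Open Sound Control 1.0 message, the protocol the new interface is planned to use. This is an illustration under stated assumptions only. The readings are in units of g and taken while the flute is roughly still, so each axis sees only gravity. The ±30° range, the eight menu items and the OSC address `/hyperflute/accel` are hypothetical, and none of this describes the actual prototype.

```python
import math
import struct


def inclination_deg(axis_g: float) -> float:
    """Angle of one accelerometer axis relative to horizontal, in degrees.

    Assumes the flute is nearly still, so the axis measures only the
    component of gravity along it (in units of g).
    """
    return math.degrees(math.asin(max(-1.0, min(1.0, axis_g))))


def menu_index(angle_deg: float, n_items: int,
               lo_deg: float = -30.0, hi_deg: float = 30.0) -> int:
    """Divide [lo_deg, hi_deg] into n_items equal zones; return the current zone."""
    clamped = max(lo_deg, min(hi_deg, angle_deg))
    zone = int((clamped - lo_deg) / (hi_deg - lo_deg) * n_items)
    return min(zone, n_items - 1)  # the top edge belongs to the last zone


def _osc_string(text: str) -> bytes:
    """OSC-string: ASCII bytes plus a NUL terminator, padded to a multiple of 4."""
    raw = text.encode("ascii") + b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def osc_message(address: str, *values: float) -> bytes:
    """Encode an OSC 1.0 message whose arguments are all big-endian float32."""
    type_tags = "," + "f" * len(values)
    args = b"".join(struct.pack(">f", v) for v in values)
    return _osc_string(address) + _osc_string(type_tags) + args


if __name__ == "__main__":
    ax, ay = 0.26, -0.05                       # hypothetical static reading in g
    incl, tilt = inclination_deg(ax), inclination_deg(ay)
    print(round(incl, 1), menu_index(incl, n_items=8))
    packet = osc_message("/hyperflute/accel", incl, tilt)  # hypothetical address
    print(len(packet), packet[:20])
```

Unlike a 7-bit MIDI controller, each float32 argument carries the angle at full precision. A real instrument would also need to filter out motion acceleration before treating the reading as pure gravity.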
Figure 3: Accelerometer and ultrasound transducer mounted on a Bo-Pep
The present proprioceptive sensors on the hyper-flute give information about muscle actions that are not visible to the audience (except for the ultrasound sensor and the tilt switches working with the inclination of the instrument). An accelerometer will provide more multidimensional data about movements and positions that are visible to the audience. This will help to correlate the amount of activity of the computer with the physical activity of the performer. The amount of data produced by the accelerometer greatly increases the possibilities for multiparametric mapping and permits the development of more complex musical structures. It will also make it possible to carry out more tasks while playing. For example, one can use the inclination to scroll through long menus of different sound processing modules or to choose between several buffers to record into. This way, only one button is necessary to trigger many different tasks. As I am already aware of the instrument’s inclination while playing (because of the tilt switches), it is now easier to remember the physical position at various angles.
Fastening the sensors on the flute has always been problematic. I own only one (expensive) flute and I do not wish to solder anything onto it. Therefore I have been using double-sided tape to attach the sensors to the flute. This way, the sensors can be taken off when the instrument needs to be cleaned or repaired. But this is a tedious exercise and there is always a risk of breaking them. I am now trying to build the sensors on clips that can easily be attached and removed. This will make it easier to transform any flute into a hyper-flute, and will eventually give other performers the opportunity to play my music.
A first test was to use a Bo-Pep for the accelerometer and ultrasound transducer (as shown in Figure 3). These plastic hand supports for the flute simply clip onto the body of the instrument and can be put on and taken off in a second. Some sensors can simply be applied on a Bo-Pep, while others will need a custom-made clip.

5.2 Hyper-Bass-Flute
I am also developing a hyper-bass-flute, a noticeably different instrument from the hyper-flute. The bass flute has the advantage of being much bigger, so there is more space to attach sensors. Nevertheless, the weight of the instrument limits the capacity of the thumbs to reach different sensors while playing, so the sensor design needs to differ from that of the hyper-flute. Only the accelerometer and ultrasound transducer can be installed on the bass flute in the same way as on the flute. Compositional strategies will need to be adapted for this instrument, and a new period of learning will be necessary to perform with it. Even if many controllers will be different, I expect the learning process to be much faster thanks to my experience with the hyper-flute.

5.3 Interface
For both hyper-flutes, I will replace the Microlab device with a new interface using the Open Sound Control protocol. “OSC is a protocol for communication among computers, sound synthesizers, and other multimedia devices that is optimized for modern networking technology. Bringing the benefits of modern networking technology to the world of electronic musical instruments, OSC’s advantages include interoperability, accuracy, flexibility, and enhanced organization and documentation. This simple yet powerful protocol provides everything needed for real-time control of sound and other media processing while remaining flexible and easy to implement.” [2]
This protocol will allow the transmission of different types of parameters with higher resolution and at higher speed, since fewer intermediary interfaces are involved: data will go directly from one interface to the computer through a USB connection. Previously, the Microlab was plugged into a MIDI interface, which was in turn connected to the computer.
A new ultrasonic range finder is being implemented on a PSoC chip by Avrum Holliger at IDMIL. It has a much finer resolution than the one used on the Microlab, which was limited to 128 values by the MIDI protocol. This new range finder will be directly linked to the main interface.
For the bass flute, it is possible to install the complete interface on the instrument. The hyper-bass-flute will be connected to the computer with a single USB cable. A prototype is now in development using an Arduino-Mini interface [1], which is small enough to fit on the instrument. A wireless connection is not desirable because of its need for power: a 9-volt battery would be too heavy to install on the flute.

5.4 Mapping Research Project
For my doctoral project, my compositions will aim to optimize the mappings of my extended instruments in the context of new computer music pieces. My first intention when building the hyper-flute was to use the natural gestures of the flutist to control sound processing parameters. However, as stated above, I was obliged to develop new playing techniques to control some of the sensors.
In the Performance Skills section, I mentioned that the ultrasound transducer, pressure sensors and magnet sensors continually capture the natural movement of a performer. The situation is similar with the new accelerometer. Those gestures are directly related to the musical material being performed.
With the new prototype of the hyper-flute, more information from the natural gestures of the performer will be usable. I would like to use these gestures to control the computer so that the performer will not need to add too many extra movements. To achieve this, I will study the gestural data captured by the new hyper-flute (and hyper-bass-flute) [15].
Instrumental music material will be written first, then performed on the hyper-flutes. The performer will play without taking notice of the sensors. All the gestural data will be recorded together with the flute sound. I will then be able to analyse the gestural data in a specific musical context. This analysis will guide the choice of mappings between the sensors and the computer’s live processing parameters. The use of sensors will be precisely specified in a musical context and will be directly related to the performer’s natural gestures. This should allow more subtle and expressive control of the sound processing than is possible in an improvised music context.
To explore differences in motion between performers,
I will record other flutists as well as myself. I expect other flutists will move more naturally than I do, as I am used to playing with sensors that react to any movement I make.

6. MUSICAL PERSPECTIVES
After 8 years of practice, I consider the hyper-flute a musical instrument in its own right. New technologies offer opportunities to enhance it, but even with these improvements, it will remain the same instrument. In addition to developing my improvisational environment, I want to compose more written repertoire, and I hope that other composers will do so as well. My most sincere wish is that other performers will eventually play the hyper-flute. The musical perspectives for the hyper-flute are open-ended: it is truly a new instrument for the twenty-first century.

7. ACKNOWLEDGMENTS
I would like to thank Marcelo Wanderley for his invaluable advice in my research and the whole IDMIL team for their great technical help. Sincere thanks to Elin Söderström and Jean Piché for their help in writing this paper. My doctoral studies are supported by the FQRSC (Fonds québécois de la recherche sur la société et la culture).

8. REFERENCES
[1] Arduino-Mini. http://www.arduino.cc/en/Main/ArduinoBoardMini, visited January 2008.
[2] Open Sound Control. http://opensoundcontrol.org/introduction-osc, visited January 2008.
[3] B. Bongers. Physical interfaces in the electronic arts: Interaction theory and interfacing techniques for real-time performance. In M. Wanderley and M. Battier, editors, Trends in Gestural Control of Music. IRCAM - Centre Pompidou, Paris, 2000.
[4] M. Burtner. The metasaxophone: Concept, implementation, and mapping strategies for a new computer music instrument. Organised Sound, 7:201–213, 2002.
[5] J. Chadabe. Interactive composing: An overview. In C. Roads, editor, The Music Machine: Selected Readings from Computer Music Journal, pages 143–148. MIT Press, Cambridge-London, 1989.
[6] R. Dean. Hyperimprovisation: Computer-Interactive Sound Improvisations. AR Editions, Middleton, Wisconsin, 2003.
[7] C. Dobrian and D. Koppelman. The ‘E’ in NIME: Musical expression with new computer interfaces. In N. Schnell, F. Bevilacqua, M. J. Lyons, and A. Tanaka, editors, NIME, pages 277–282. IRCAM - Centre Pompidou in collaboration with Sorbonne University, 2006.
[8] A. Hunt and R. Kirk. Mapping strategies for musical performance. In M. Wanderley and M. Battier, editors, Trends in Gestural Control of Music. IRCAM - Centre Pompidou, Paris, 2000.
[9] J. Impett. A meta-trumpet(er). In Proceedings of the International Computer Music Conference, pages 147–149, San Francisco, 1994. International Computer Music Association.
[10] J. Impett. The identification and transposition of authentic instruments: Musical practice and technology. Leonardo Music Journal, 8:21–26, 1998.
[11] E. Miranda and M. Wanderley. New Digital Musical Instruments: Control and Interaction Beyond the Keyboard. AR Editions, Middleton, Wisconsin, 2006.
[12] A. Mulder. Towards a choice of gestural constraints for instrumental performers. In M. Wanderley and M. Battier, editors, Trends in Gestural Control of Music. IRCAM - Centre Pompidou, Paris, 2000.
[13] C. Palacio-Quintin. The hyper-flute. In F. Thibault, editor, NIME, pages 206–207. Faculty of Music, McGill University, 2003.
[14] M. Waisvisz. The Hands, a set of remote MIDI-controllers. In Proceedings of the International Computer Music Conference, pages 313–318, San Francisco, 1985. International Computer Music Association.
[15] M. Wanderley. Quantitative analysis of non-obvious performer gestures. In Gesture and Sign Language in Human-Computer Interaction: International Gesture Workshop, pages 241–253, 2003.
A Tangible Virtual Vibrating String
A Physically Motivated Virtual Musical Instrument Interface
Edgar Berdahl
CCRMA, Stanford University
Stanford, CA, USA
eberdahl@ccrma.stanford.edu
Julius O. Smith III
CCRMA, Stanford University
Stanford, CA, USA
jos@ccrma.stanford.edu
Permission to make digital or hard copies of all or part of this work for any purpose is granted under a Creative Commons Attribution 3.0 license: http://creativecommons.org/licenses/by/3.0/
Copyright 2008. Copyright remains with the author(s).

ABSTRACT
We introduce physically motivated interfaces for playing virtual musical instruments, and we suggest that they lie somewhere in between commonplace interfaces and haptic interfaces in terms of their complexity. Next, we review guitar-like interfaces, and we design an interface to a virtual string. The excitation signal and pitch are sensed separately using two independent string segments. These parameters control a two-axis digital waveguide virtual string, which models vibrations in the horizontal and vertical transverse axes as well as the coupling between them. Finally, we consider the advantages of using a multi-axis pickup for measuring the excitation signal.
Keywords
physically motivated, physical, models, modeling, vibrating string, guitar, pitch detection, interface, excitation, coupled strings, haptic

1. INTRODUCTION
1.1 Physical Models
Virtual models of acoustic musical instruments have been available to the music community for decades [7] [18] [13]. The models are useful for studying the physical behavior of acoustic musical instruments, and they can also synthesize sound output. Given an appropriate interface, many of the models can be played in real time by performers.
1.1.1 Commonplace Interfaces
Often it is most convenient and simplest to control a physical model with a commonplace interface, such as a computer keyboard, musical keyboard, or mouse. This approach is most palatable if the interface matches the physical model. For instance, playing a virtual piano with a musical keyboard interface is physically intuitive, so it is easy for a pianist to transfer real-life skills to the virtual domain. However, many performers play traditional acoustic instruments lacking commonplace interface counterparts, so skill transfer to the virtual domain is not as immediate [17].
1.1.2 Haptic Interfaces
Haptic interfaces lie at the opposite end of the complexity spectrum. They apply force feedback to the performer, so that he or she feels and interacts with the vibrations of the virtual instrument as if the virtual instrument were real. In this sense, haptic interfaces can be seen as the ideal interface for interacting with virtual instruments. For instance, a carefully designed haptic bowed string should promote better skill transfer to the virtual domain because it exerts forces on the instrument interface, causing it to behave as if it were a real bow bowing a string.
When Luciani et al. implemented their haptic bowed string, they found that users strongly preferred that haptic feedback be rendered at the audio rate of 44 kHz rather than at the usual 3 kHz. Users made comments regarding the “strong presence of the string in the hand,” “the string in the fingers,” and “the string is really here” [16]. Related kinds of instruments, such as actively controlled acoustic musical instruments, are essentially the same as haptic musical instruments except that the whole acoustical medium becomes the interface [6]. Haptic technologies are becoming increasingly available to the music community, but they are currently still complex enough that it is worth considering alternatives.
1.1.3 Physically Motivated Interfaces
In this paper, we investigate the middle ground in between commonplace interfaces and haptic interfaces for controlling physical models. We term such interfaces physically motivated interfaces. Rather than applying haptic feedback, we attempt to otherwise preserve the physical interaction between the performer and the virtual instrument as much as possible. Such interfaces are similar to Wanderley’s categorization of instrument-like controllers with one important exception [17]: we state that the input quantities should be sensed so accurately that an audio-rate feedback loop could be closed around the sensor if the interface were equipped with an actuator. It follows that ideally all quantities applied to the physical model should:
• correspond to the correct quantity for controlling the physical model (e.g. displacement, velocity, acceleration, etc.)
• be linear and low-noise
• be delayed and filtered as little as possible
• be sampled at the audio sampling rate Test signal

Pitch
detection
These requirements were difficult to meet in the past due to signal
limitations in computational power, A/D conversion, and
sensing; however, today we may achieve or approximate
them as we see fit. To succeed in our endeavor, we need
to carefully apply knowledge from acoustics, mechanical en-
gineering, and electrical engineering to the field of human
Frets Excitation
computer interaction. In the following, we develop a physi- signal
cally motivated interface for a virtual vibrating guitar string.
2. PRIOR GUITAR-LIKE INTERFACES

A number of musical instrument interfaces suggest the metaphor of a guitar. While they have followed different design goals, we should at least consider how they estimate the desired pitch. For instance, the virtual air guitar uses the distance between the hands to control the pitch. Different versions of the virtual air guitar make use of magnetic motion capture, camera tracking, and acoustic delay estimation systems for this measurement [12]. The makers of the GXtar prefer to place a force-sensing resistor strip beneath a real string to measure both the position and pressure [14]. The Ztar [2] and the Yamaha EZ-GE [4] detect pitch with a matrix of sensors in the neck. One sensing element is used for each fret and string. The SynthAxe sports normal strings placed above matrix-type sensors [5]. The SynthAxe has additional sensors to detect string bending: a current flows down each string, and small electric coils placed near the string measure the lateral string displacement.

Roland provides a six-channel electromagnetic pickup for electric guitar and accompanying DSP [3]. In the MIDI mode of operation, a DSP estimates when new notes are played, with what velocity, and with what pitch.¹ One drawback of this approach in general is that most detectors have noticeable delay for lower pitches because they wait at least one period (for the open low E string of a guitar, at about 82 Hz, one period is already roughly 12 ms). It is also difficult to construct a perfectly reliable pitch detector of this type. Consequently, performers must learn to play carefully to avoid confusing the pitch detector.

3. TANGIBLE GUITAR STRING

3.1 Separate Excitation Sensing and Pitch Detection
To ensure that the interface is physically motivated, we follow the guidelines outlined in Section 1.1.3. We sense the relevant portions of the performer's gestures with as much precision as possible to preserve the guitar-like physical interaction between the performer and the virtual instrument. We would also like to preserve the physical presence of a string. To these ends, an independent string segment associated with each hand separates the problems of estimating the desired pitch and measuring the plucking excitation signal [5].

Figure 1: Tangible virtual string signal flow diagram (upper path: string actuator, pitch string segment, pitch string sensor, pitch detector, test signal generator; lower path: excitation string segment, excitation string sensor, virtual string, synthesis output)

3.2 Signal Flow
The upper half of Figure 1 shows the signal flow for the pitch string segment; the purely acoustic components and paths are drawn in dashed lines. We detect the desired pitch of the string acoustically to help avoid incorrectly capturing higher-order effects such as string bending, slightly misplaced frets, etc. We actuate the string and measure its response. Since we know how the string is being actuated, we should be able to estimate more accurately the time it takes for a pulse to leave the actuator, reflect off of a fret, and arrive back at the sensor (see Figure 2, top).

Any picking, plucking, scraping, or bowing excitation is sensed via the excitation string segment and fed directly to the virtual string.² The lower half of Figure 1 shows the signal flow for the excitation string segment. One end of the excitation string segment should be damped passively to prevent physical resonances from interfering with resonances in the virtual model (see the damping material in Figure 2, bottom).

Figure 2: Two string segment approach (labels: test signal, pitch detection signal, frets, excitation signal, damping material)

3.3 Two-Axis Digital Waveguide Virtual String
We model the virtual string using a simple two-axis model that takes into account the vertical and horizontal transverse modes of vibration. The i-th axis is modeled using a delay line of length N_i samples and a lowpass filter LPF_i, which is a 3-tap linear-phase FIR filter causing the higher partials to decay faster (see Figure 3). This portion is the basic digital waveguide model used for elementary teaching purposes at CCRMA. For additional realism, the excitation signals can be comb filtered with the notch frequencies chosen as a function of the excitation's distance from the bridge [18].

While the nut is assumed to be rigid, the bridge is in general not quite rigid, hence it couples the axes together at this point. The coupling implemented in Figure 3 is actually more appropriate for modeling the coupling of the vertical axes of two neighboring piano strings, but it still results in qualitatively correct behavior. For instance, by choosing N_1 ≈ N_2 with N_1 ≠ N_2 and g ≈ 0.1, one obtains behavior where the energy slowly rotates back and forth between the axes of vibration [18] [15].

¹ In another mode of operation, the Roland system synthesizes audio more directly from the sensed signals using “Composite Object Sound Modeling”. Here the pitch detector is not needed explicitly, so tracking is much improved. Since the model is not entirely virtual and its details are a trade secret, we do not consider it further here.
² There are surprisingly few examples in the literature where some filtered form of an excitation signal measured at the audio sampling rate appears at the output. One example is the digital flute, which allows the excitation signal as measured by a microphone to be mixed with the sound synthesis output; however, in contrast with the current work, sound was not synthesized with a physical model [19].
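The signal flow of Figure 1 and the two-axis model of Figure 3 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The pitch path estimates the actuator-to-fret-to-sensor round-trip delay by cross-correlating a known test signal with the sensed response, assuming the actuator and sensor sit near the bridge so that the round trip approximately equals one period of the fretted string. The excitation path applies an optional pluck-position comb filter and drives two delay-line loops, each with a symmetric (linear-phase) 3-tap FIR lowpass. The bridge coupling is a simple symmetric mix with gain g, a hypothetical stand-in for the exact topology of Figure 3; all function names, the loss factor rho, and the filter taps are illustrative choices.

```python
def round_trip_lag(test_signal, response, min_lag, max_lag):
    """Lag (samples) at which the sensed response best matches the test signal.

    Lags below min_lag are skipped so that direct actuator-to-sensor
    feedthrough is ignored (an assumption of this sketch).
    """
    best_lag, best_score = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(s * response[i + lag]
                    for i, s in enumerate(test_signal)
                    if i + lag < len(response))
        # Compare magnitudes: a reflection from a fixed point can invert the pulse.
        if abs(score) > best_score:
            best_lag, best_score = lag, abs(score)
    return best_lag


def pluck_comb(excitation, m):
    """Feedforward comb y[n] = x[n] - x[n - m], output padded by m samples.

    With m = round(beta * N), beta being the relative pluck distance from
    the bridge, the notches fall at multiples of fs / m.
    """
    padded = list(excitation) + [0.0] * m
    return [x - (padded[n - m] if n >= m else 0.0) for n, x in enumerate(padded)]


def fir3(y, idx, taps):
    """3-tap FIR over y[idx], y[idx-1], y[idx-2]; samples before 0 count as 0."""
    return sum(h * y[idx - k] for k, h in enumerate(taps) if idx - k >= 0)


def two_axis_string(exc1, exc2, n1, n2, g=0.1, rho=0.996, length=None):
    """Two coupled delay-line loops, one per transverse axis."""
    if length is None:
        length = max(len(exc1), len(exc2))
    # Symmetric taps give linear phase (one sample of delay); rho < 1 sets the decay.
    taps = (0.25 * rho, 0.5 * rho, 0.25 * rho)
    y1, y2 = [0.0] * length, [0.0] * length
    for n in range(length):
        b1 = fir3(y1, n - n1, taps)  # wave returning to the bridge on axis 1
        b2 = fir3(y2, n - n2, taps)  # wave returning to the bridge on axis 2
        # Hypothetical bridge coupling: each axis leaks a fraction g into the other.
        f1 = (1.0 - g) * b1 + g * b2
        f2 = g * b1 + (1.0 - g) * b2
        y1[n] = (exc1[n] if n < len(exc1) else 0.0) + f1
        y2[n] = (exc2[n] if n < len(exc2) else 0.0) + f2
    return y1, y2


if __name__ == "__main__":
    fs = 44100
    pulse = [1.0, 0.5, 0.25]
    sensed = [0.0] * 1000
    for i, s in enumerate(pulse):
        sensed[i] += s              # direct feedthrough
        sensed[200 + i] -= 0.8 * s  # inverted reflection from the fret
    lag = round_trip_lag(pulse, sensed, min_lag=20, max_lag=800)
    print(lag, fs / lag)
    n = lag - 1  # the 3-tap filter adds one sample of loop delay
    exc = pluck_comb([1.0, 0.6, 0.3, 0.1], 40)
    out1, out2 = two_axis_string(exc, [0.0], n, n + 1, length=fs)
```

Choosing N_2 = N_1 + 1 gives the slight detuning (N_1 ≈ N_2 but N_1 ≠ N_2) under which energy drifts between the two axes. Because the taps are non-negative and sum to rho < 1 and the coupling is a convex mix, the loop cannot gain energy, so the sketch stays stable for any 0 ≤ g ≤ 1.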
3.4 Prototype
A prototype of the tangible guitar string interface is shown in Figure 5. For convenience given the default hardware on a Fender Stratocaster, the two string segments are spaced horizontally instead of vertically in relation to one another. In a six-stringed embodiment, each pair of string segments would instead be placed axially aligned with each other. In the prototype, each string's vibration is sensed using a Graphtech saddle piezoelectric pickup, as shown in Figure 4 [1].

Figure 4: Graphtech piezoelectric pickup

The magnetic actuator can be obtained by ordering the Sustainiac [11]. It conveniently replaces any one of the pickups; however, other kinds of actuators can be used instead. To prevent the actuator from affecting the excitation string segment, we choose the excitation string segment to be a regular, non-ferrous solid electrical wire. The excitation string is passively damped using felt, wrapped in such a manner as to approximate a gradual increase of the string's wave impedance to infinity, eliminating reflections as much as possible. In other words, the strip of felt is wrapped more and more tightly approaching the nut (see Figure 5).

Figure 5: Tangible Guitar String Interface

In some cases, it is better to damp the string less effectively. The resulting less damped reflections from the felt material cause a comb-filtering effect, which changes the timbre of the instrument as a function of excitation position, as with a normal vibrating string. Note that this desirable attribute comes for free since the interface is physically motivated: the comb filter can be implemented either mechanically on the interface or virtually in the instrument model.

3.4.1 Multi-Axis Pickups
To most accurately excite the physical model, we should ideally measure the excitation in both the horizontal and vertical transverse axes of the string. This can be done using optical [10], electromagnetic, or piezoelectric sensors [8]. We have verified informally that the tangible virtual string sounds more realistic given the two-dimensional excitation.

4. WEBSITE
We have authored a website providing sound examples of the instrument in a few different configurations.³ For example, the website includes comparisons of model output given randomly synthesized excitations, single-axis measured excitations, and two-axis measured excitations. It also includes model output given various excitation sources such as plucking, picking, bowing, and scraping. To enable others to excite their physical models with quality excitation signals, we provide the corresponding non-resonant excitation signals themselves. Finally, an example melody played on the tangible virtual string demonstrates the viability of physically motivated instrument design.

³ http://ccrma.stanford.edu/~eberdahl/Projects/TS

Figure 3: Two axis digital waveguide string model (labels: excitation axis 1, excitation axis 2; N_1 and N_2 samples of delay; LPF_1, LPF_2; bridge; coupling gain g; outputs Out 1 and Out 2)

5. FUTURE WORK
The behavior of the interface could be further refined with force-feedback. For example, the excitation string segment could be made into a haptic device by adding an actuator. Then the piece of physical string could be joined to a portion of the waveguide using teleoperator techniques [9]. It would be essential for the string segment to have as little mass as possible to avoid loading down the virtual waveguide at the point of connection. We would also like to eventually construct a six-string version to promote the maximum transfer of guitarists' skills from real guitars to virtual guitars.

6. CONCLUSION
We have presented a physically motivated interface for controlling a virtual digital waveguide string. The excitation and pitch are sensed separately using two independent string segments. In contrast with prior interfaces, the excitation to the physical model is measured according to the principles of physically motivated interfaces. In particular, we measure the excitation signals with high-quality, linear, low-noise sensors at the audio sampling rate. We hope that interfaces such as this one will continue to promote skill transfer from traditional acoustic musical instruments to the virtual domain.

7. ACKNOWLEDGMENTS
We thank the Stanford Graduate Fellowship program for supporting this work.

8. REFERENCES
[1] http://www.graphtech.com/. April 14, 2008.
[2] http://www.starrlabs.com/. April 14, 2008.
[3] http://www.vg-8.com/. April 14, 2008.
[4] http://www.yamaha.com/. April 14, 2008.
[5] W. Aitken, A. Sedivy, and M. Dixon. Electronic musical instrument. International Patent, WO/1984/004619, 1984.
[6] E. Berdahl, G. Niemeyer, and J. O. Smith. Applications of passivity theory to the active control of acoustic musical instruments. In Proc. of the Acoustics '08 Conference, June 2008.
[7] C. Cadoz, A. Luciani, and J.-L. Florens. Synthèse musicale par simulation des mécanismes instrumentaux. Revue d'Acoustique, 59:279–292, 1981.
[8] A. Freed and O. Isvan. Musical applications of new, multi-axis guitar string sensors. In Proc. of the Int. Computer Music Conf., Aug. 27–Sept. 1, 2000.
[9] B. Hannaford. A design framework for teleoperators with kinesthetic feedback. IEEE Transactions on Robotics and Automation, 5(4):426–434, August 1989.
[10] R. Hanson. Optoelectronic detection of string vibration. The Physics Teacher, 25:165–166, 1987.
[11] A. Hoover. Controls for musical instrument sustainers. U.S. Patent No. 6034316, 2000.
[12] M. Karjalainen, T. Mäki-Patola, A. Kanerva, and A. Huovilainen. Virtual air guitar. Journal of the Audio Engineering Society, 54(10):964–980, October 2006.
[13] K. Karplus and A. Strong. Digital synthesis of plucked string and drum timbres. Computer Music Journal, 7(2):43–55, 1983.
[14] L. Kessous, J. Castet, and D. Arfib. 'GXtar', an interface using guitar techniques. In Proc. of the International Conf. on New Interfaces for Musical Expression, pages 192–195, 2006.
[15] N. Lee and J. O. Smith. Vibrating-string coupling estimation from recorded tones. In Proc. of the Acoustics '08 Conference, June 2008.
[16] A. Luciani, J.-L. Florens, D. Couroussé, and C. Cadoz. Ergotic sounds. In Proc. of the 4th Int. Conf. on Enactive Interfaces, November 2007.
[17] E. Miranda and M. Wanderley. New Digital Musical Instruments. A-R Editions, Middleton, WI, 2006.
[18] J. O. Smith. Physical Audio Signal Processing: For Virtual Musical Instruments and Audio Effects. http://ccrma.stanford.edu/~jos/pasp/, 2007.
[19] M. Yunik, M. Borys, and G. Swift. A digital flute. Computer Music Journal, 9(2):49–52, Summer 1985.
Towards Participatory Design and Evaluation of Theremin-based Musical Interfaces

C. Geiger
FH Düsseldorf
Josef-Gockeln Str 9
40474 Düsseldorf, Germany
geiger@fh-duesseldorf.de

H. Reckter, D. Paschke, F. Schulz
Hochschule Harz
Am Eichberg 1
38855 Wernigerode, Germany
hreckter@hs-harz.de

C. Poepel
FH Ansbach
Residenzstr. 8
91522 Ansbach, Germany
c.p@fh-ansbach.de
ABSTRACT
The Theremin is one of the earliest electronic instruments, and its basic principles have often been used to design new musical interfaces. We present the structured design and evaluation of a set of 3D interfaces for a virtual Theremin, the VRemin. The variants differ in the size of the interaction space, the interface complexity, and the applied I/O devices. We conducted a formal evaluation based on the well-known AttrakDiff questionnaire for evaluating the hedonic and pragmatic quality of interactive products. The presented work is a first approach towards a participatory design process for musical interfaces that includes user evaluation at early design phases.

Keywords
3D interaction techniques, Theremin-based interfaces, Evaluation.

1. INTRODUCTION
The challenge of designing new interfaces for musical expression is to identify a suitable mapping of interaction elements to sound generation attributes. An arbitrary mapping of sound parameters to interface device properties, e.g. X-orientation mapped to pitch control, may not directly lead to an intuitive musical interface for casual players or even advanced artists. It may also be less attractive for the audience, a property that is important for audio-visual performances. Due to missing interface standards and little design experience in this domain, a trial-and-error approach is currently the design method of choice. Unfortunately, little attention has been given to a structured design approach that allows for design reviews at early design stages, including end user participation. These problems are well known in the HCI area of post-WIMP interface design, where the focus of attention has recently shifted towards authoring concepts and evaluation techniques. In this project we propose to directly apply methods and techniques from advanced HCI fields like tangible and embedded interaction and 3D user interfaces to the design of NIMEs. In the “VRemin” project presented in this paper we built a simple music synthesizer simulation based on the Theremin concept. The Theremin was one of the earliest electronic instruments and unique in that it was the first instrument that was played without being touched. The player stands in front of the instrument and moves her hands to control pitch and volume. With the work presented here we want to propose both a set of alternate approaches to a virtual Theremin and an empirical evaluation that supports the theoretical basis as well as new methods to compare musical interfaces.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are [...] otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-8, 2008, Genova, Italy.
Copyright remains with the author(s).

2. UI DESIGN BACKGROUND
Few WIMP interface concepts make efficient use of both hands. In contrast, the use of both hands is an important concept for musical interfaces. At present there is no widely accepted design methodology that could guide designers of musical interfaces. We are convinced that the development of successful NIMEs requires intensive testing of many application concepts as well as active user participation in the design and refinement of promising designs. A simple implement-and-test approach, however, is not viable because the implementation of working prototypes is expensive and time consuming, and limits the number of concepts and designs that can be explored. Therefore, we propose to evaluate the concepts under controlled conditions with potential end users. Recent developments in HCI suggest a paradigm shift for the usability evaluation of interactive products. The most recognized definition of usability is provided by ISO 9241, which defines it as “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction”. In recent years the majority of research activities have been based upon this definition, but more recently a broader perspective has been suggested that considers the motivation of the user in more detail. The traditional key element of user satisfaction provides only a limited description of the user’s experience with an interactive product. Attempts to evaluate the complete spectrum of experience a person has when interacting with a specific design have led to a number of new quality measurement approaches. A common distinction is between hedonic/aesthetic and utilitarian qualities of a computer system interface [9]. Hassenzahl's model distinguishes between hedonic quality – identification (HQI), hedonic quality – stimulation (HQS) and pragmatic quality (PQ) [9]. HQI measures how well a user identifies with a product and HQS measures to what extent a product stimulates the user by
offering novel and interesting functions, contents, interactions and styles of presentation. PQ measures the traditional concept of usability, i.e., how well the user achieves her goals with the product. This model has been implemented as AttrakDiff2™, a web-based instrument for measuring the attractiveness of interactive products. With the aid of pairs of opposite adjectives, users indicate how they experience the design [5]. Innovative interfaces for intuitive musical expression should also have pragmatic and hedonic qualities. Creating musical expressions with new interface concepts should be easier and more joyful for a performer and attractive for the audience. Thus, measurement approaches for hedonic and pragmatic qualities, like the ones developed by the HCI community, can be very helpful to design and evaluate new musical interfaces. In particular, we used the AttrakDiff2™ approach to evaluate our designs.

3. RELATED WORK
As we see the work presented here falling into two categories, we will refer both to Theremin-inspired controllers and to the evaluation of new musical interfaces. Due to its successful use over decades, the Theremin has had a sustained influence on researchers working in the field of musical expression. Regarding the idea of using two hands moving freely in the air to play electronic sounds, "The Hands" [1] are an early example of a highly flexible and expressive musical interface, offering a set of sensors and keys to be played by hands and fingers. For his virtual musical instruments, Mulder [3] used data gloves and a Polhemus 3-D tracking system to shape and play sounds, including a visual representation of sounding objects and virtual hands. A Swedish project used optical tracking to develop virtual instruments that are controlled by gestures [4]. Four virtual instruments, including a virtual xylophone and an air guitar, have been developed using a Cave-like virtual room and have been evaluated concerning their efficiency and learning curve. As we are following the interface paradigm "low threshold, high ceiling", the controllers involved in this research are low-cost, commercially available ones. We use the Nintendo Wii controllers, and others have likewise been using them for the control of sound. Paine [6] developed a method for dynamic sound morphology using the Wiimote and the Nunchuck controllers, seeking two goals: to increase the performance and the ability to communicate with the audience. Although the system has been used successfully in several concerts, he points out the potential for further investigation. While a method for evaluation using tools from HCI has been provided by Wanderley and Orio [7], it has not yet been used often. Isaacs [10] presents a study comparing a 3-D accelerometer with a Korg Kaosspad KP2, looking at participants learning to play with them. In addition, a method to compare digital instruments based on findings of music psychology on musical expression has been provided [8]. Since basic evaluation methods do not yet seem to be established, we consider providing and testing frameworks for the evaluation of newly designed interfaces an important issue.

4. DESIGN OF VRemin VARIANTS
The design of NIMEs is a challenging task, and we propose a participatory design approach that evaluates design prototypes through end users and iteratively refines the designs according to test results. Our design approach for NIMEs consists of the following steps: 1) the classification of NIMEs with regard to interface complexity and interaction space, 2) the development of interface prototypes with high-level tools, 3) user testing and analysis, and 4) cycles of refinement and final implementation. In this study we classify NIMEs along two dimensions: interaction space and interface complexity. The interaction space is defined as the spatial extent occupied by the user during interaction with the NIME. Figure 1 denotes the different values for interaction space along with the second dimension, interface complexity. A detailed description of the two dimensions and how to define and measure interface complexity for NIMEs will be presented in [15]. The prototype development is divided into a music generation back-end and an interface front-end. The presented interface variants were assembled using the MIDI-based sensor kits from I-CubeX [2], optical tracking using IR lighting and/or fiducials [13], and the Wiimote game controller.

Figure 1. Classification

The prototyping of the back-end was straightforward. We implemented a small synthesizer application using Native Instruments Reaktor 5. We implemented a lean MIDI interface that allows selecting a small set of predefined instruments or effects and controlling effect parameters, pitch and volume by MIDI commands. This allows us to develop different variants of input devices and techniques without altering the music generation back-end. The PDA-based version and the massive multi-user version will be described in [15].

4.1 VRemin I – Wii Controller
The initial approach for a Theremin-based interaction scenario uses the Wii game controllers for interaction. For the VRemin I, the Wiimote is controlled by the dominant hand (DH, usually right) and the Nunchuck is used by the non-dominant hand (NDH, usually left). Pitch and volume are controlled by the Wiimote's acceleration sensor. The buttons are used for switching the current note on and off and allow the sound generation to be interrupted. The NDH is used to select the predefined effect with the Nunchuck buttons and to control the effect parameter with the acceleration sensor (see figure 2). The Wiimote / Nunchuck values are recorded and transformed to MIDI notes using the DarwiinRemote software and a virtual MIDI device (IAC device driver). The interaction space is determined by each hand’s rotation and thus is the smallest of all variants. The sound generation is designed as an asymmetrical two-handed interaction [11] because the two hands perform different tasks with different gestures / interaction techniques.
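The VRemin I front end essentially turns controller readings into the MIDI messages understood by the Reaktor back-end. The sketch below shows one possible mapping under explicit assumptions: two accelerometer axes are taken as already normalized to [-1, 1], one axis is mapped linearly onto a hypothetical note range and the other onto MIDI Control Change 7 (Channel Volume), and the button gates the note. Only the byte layouts follow the MIDI 1.0 specification (Note On 0x9n, Note Off 0x8n, Control Change 0xBn); the ranges, channel and function names are illustrative and are not taken from DarwiinRemote or from the authors' patch.

```python
def clamp(value, lo, hi):
    """Limit value to the closed interval [lo, hi]."""
    return max(lo, min(hi, value))


def to_7bit(x):
    """Map a normalized reading in [-1, 1] to a MIDI data value 0..127."""
    return round((clamp(x, -1.0, 1.0) + 1.0) / 2.0 * 127)


def tilt_to_note(x, low_note=48, high_note=72):
    """Hypothetical pitch mapping: tilt in [-1, 1] -> semitone in [low_note, high_note]."""
    t = (clamp(x, -1.0, 1.0) + 1.0) / 2.0
    return low_note + round(t * (high_note - low_note))


def note_on(channel, note, velocity=100):
    return bytes([0x90 | (channel & 0x0F), note & 0x7F, velocity & 0x7F])


def note_off(channel, note):
    return bytes([0x80 | (channel & 0x0F), note & 0x7F, 0])


def volume_cc(channel, value):
    # Controller number 7 is Channel Volume.
    return bytes([0xB0 | (channel & 0x0F), 7, value & 0x7F])


def wiimote_frame(pitch_tilt, volume_tilt, button_down, current_note, channel=0):
    """Translate one (hypothetical) Wiimote reading into MIDI messages.

    Returns (messages, sounding_note). While the button is held, the mapped
    note sounds and is re-triggered when the tilt selects a new semitone;
    releasing the button stops the note.
    """
    messages = [volume_cc(channel, to_7bit(volume_tilt))]
    if button_down:
        target = tilt_to_note(pitch_tilt)
        if target != current_note:
            if current_note is not None:
                messages.append(note_off(channel, current_note))
            messages.append(note_on(channel, target))
        return messages, target
    if current_note is not None:
        messages.append(note_off(channel, current_note))
    return messages, None
```

A caller would feed successive controller frames through `wiimote_frame`, keep the returned note as state, and forward the bytes to a MIDI output such as a virtual port; the port handling depends on the platform and MIDI library and is not shown.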
Figure 2. VRemin I – Wii controller

The second variant interacts within an arm-based space. The VRemin II tracks the X and Y position of the dominant hand and assigns volume level and pitch level based on these position values. We selected optical tracking for monitoring the hand movement. In the current prototype we attached a small fiducial marker to the DH's wrist and use a web cam to capture the image. The reacTIVision software package [13] is used to analyze the image and calculate the position values, and a MIDI value that controls the pitch of the sound is sent to the back-end interface. The selection and adjustment of effects is also controlled with the DH. We built a small custom glove-based input device that uses a bend sensor for each finger. The sensors are connected to an I-CubeX MIDI converter that generates MIDI signals. The poses of all fingers determine an individual combination of effects and their strength. This approach realizes a unimanual interaction technique.

Figure 3. VRemin II – Hand tracking and Glove (prototype)

5. EVALUATION
We are interested in the usability and manageability of our interfaces as well as the potential they offer for interaction variations. Since the AttrakDiff test is new in the evaluation of musical interfaces, we decided to start with a pre-test. Controlled laboratory tests of the variations »Theremin«, »VRemin I« and »VRemin II« were performed for deployment to the usability evaluator. Three questionnaires surveyed the subjects’ evaluations after each test. Video and audio recordings added qualitative data to the collected materials. However, the pre-test focuses on the quantitative data of the questionnaires, since the surveyed subjects showed high computer affinity but little experience with musical instruments. Twelve persons (9 male, 3 female) with an average age of 26.75 years (min=20, max=47) participated in the study. Three of them had attended a music school long ago and played a musical instrument (0.2 – 2 years of experience). The questionnaire allowed assigning integer values from one to five to each question. Here a one stood for »very little« or »very few« and five indicated »very much« or »very many« (see figure 4).

Participant questionnaire              Mean value [1..5]
Familiarity with computers             4.25
Ear for music                          2.83
Familiarity with sequencers            3.00
Familiarity with game controllers      2.17
Figure 4. Participant questionnaire

Subjects were assigned the test variants in permuted order to prevent learning effects or changes of perception caused by preceding tasks from biasing the results. Each subject received a short introduction to the instrument immediately before each test. After a short familiarization period of five minutes the subject was given two tasks. The first task consisted of playing a musical scale. The second task allowed the subject to improvise to a played-back drum beat. Subsequently the subject answered two questionnaires. The first form is based on the AttrakDiff2 described in section 2. A second form amended the AttrakDiff2 questionnaire. This form queries facts which are also surveyed in a similar way in AttrakDiff2 (complexity, precision, comfort, etc.) and thus increases result validity. In addition, the subjects evaluate musical qualities to allow for further indications of the adequacy of the interface as a musical instrument. Due to the limited training of the subjects as musicians, this questionnaire was considered only in a reduced fashion. The tests were concluded for each subject with a comparison questionnaire, which allowed the results of the previous questionnaires to be cross-checked once more.

The result diagram of the AttrakDiff2 questionnaire (figure 5) shows that the Theremin (number 3) is valued neutrally in pragmatic (PQ) and hedonic (HQ) quality. The Theremin was deemed suitable for playing music, but it only achieved average evaluations. Its hedonic quality was also within the average range: the Theremin was only moderately interesting. The VRemin II (number 1) has PQ values similar to the Theremin. It supports the user in fulfilling his tasks, but obtains only average values. However, compared to the Theremin it is rated as more exciting and it generates curiosity (HQ). Both instruments show a large variance in user rating (confidence rectangle). The VRemin I (number 2) convinced the subjects with its pragmatic and hedonic qualities (PQ and HQ). The users had the feeling of being able to play music more easily, and it motivated and stimulated the users more than the other two variants. The Theremin and the VRemin II show potential for improvement with respect to usability, though the VRemin II, with a value of 0.6, was deemed more attractive than the Theremin at 0.05. Figure 5, right, shows the averages of some important values of the second questionnaire. The VRemin II is deemed more complex and as having more capabilities than the Theremin and the VRemin I. The VRemin I is superior to the other two variants with respect to handling, usability, precision, comfort, and controllability.
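The PQ/HQ portfolio with its confidence rectangles can be reproduced from raw ratings with simple statistics. The sketch below assumes, as in the published AttrakDiff2 instrument, that each word pair is rated on a seven-point scale coded from -3 to +3 and that a scale score is the mean of its items; it further assumes that HQ is the mean of the HQ-I and HQ-S scale scores. The item-to-scale assignment, the example ratings and the function names are hypothetical and not the study's data. Each side of the confidence rectangle is taken as a critical value times the standard error of the mean; 1.96 is the large-sample normal value, whereas for twelve subjects a Student-t value (about 2.20 for 11 degrees of freedom) would be more appropriate.

```python
from math import sqrt
from statistics import mean, stdev


def scale_score(ratings, items):
    """Mean of one subject's ratings (coded -3..+3) over the items of one scale."""
    return mean(ratings[i] for i in items)


def confidence_interval(values, crit=1.96):
    """Return (mean, half_width), where half_width = crit * standard error."""
    return mean(values), crit * stdev(values) / sqrt(len(values))


def portfolio(subjects, pq_items, hqi_items, hqs_items, crit=1.96):
    """PQ and HQ centre points with confidence half-widths for one variant.

    HQ is taken as the mean of the HQ-I and HQ-S scale scores (an assumption).
    """
    pq = [scale_score(r, pq_items) for r in subjects]
    hq = [(scale_score(r, hqi_items) + scale_score(r, hqs_items)) / 2
          for r in subjects]
    return {"PQ": confidence_interval(pq, crit), "HQ": confidence_interval(hq, crit)}


# Hypothetical ratings: 3 subjects x 6 items, two items per scale.
ratings = [
    [1, 2, 0, 1, 2, 2],
    [0, 1, 1, 1, 3, 1],
    [2, 2, 1, 0, 2, 2],
]
result = portfolio(ratings, pq_items=(0, 1), hqi_items=(2, 3), hqs_items=(4, 5))
for scale, (centre, half) in result.items():
    print(f"{scale}: {centre:.2f} +/- {half:.2f}")
```

Running `portfolio` once per variant (Theremin, VRemin I, VRemin II) gives one point and one rectangle per instrument; a wide rectangle corresponds to the large rating variance reported for the Theremin and the VRemin II.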
Ease of handling
variance. The planned test for the analysis of the potential of
Instrument operation
musical expression of the evolutionary VRemin series should give
Precision more information on the pragmatic quality and further the
Complexity advancement of the VRemins as musical instrument. We plan to
Range of capabilities
use this research to develop Interface Design Patterns for Musical
Your progress
Instruments (IDP-FMI).
Comfort
Reached goals
Interface optic 7. REFERENCES
Sound quality [1] M. Waisvisz. The hands, a set of remote midi-controllers. In
Measurment of control B. Truax, editor, Proceedings of the 1985 International
Computer Music Conference, pages 313-318, 1985.
Theremin VRemin I VRemin II
[2] www.infusionsystems.com
Figure 5. AttracDiff Results and second questionnaire
[3] A. Mulder. Design of three-dimensional instruments for
Progress with all instruments is deemed similar in the neutral sound control. PhD thesis, Simon Fraser University,
range (value 3). With respect to appearance the VRemin I is Vancouver, Canada, 1998
placed before VRemin II and the Theremin in the positive range.
[4] Mäki-Patola, T., Laitinen, J., Kanerva, A., Takala, T.
This confirms the findings of the attrakDiff2 questionnaire. Both
Experiments with Virtual Reality Instruments, In Proc. NIME
VRemins appeal more to the subjects. Both generate curiosity and
2005, pages 11-16, Vancouver, Canada, 2005.
stimulate the participants. Due to the complexity and prototypical
character of the VRemin II the subjects were able to attain the [5] www.attrakdiff.de
task goals (pragmatic values), but they are valued slightly below the values for the Theremin and much further below those for the VRemin I. On the other hand, the VRemin II is rated as more complex and as having more capabilities. The VRemin I obtains neutral to very good values throughout and hence also shows its qualities in the specialized questionnaire. While the Theremin is viewed as mostly neutral with respect to its usability as an interactive device for playing music, the VRemin I is seen as superior in all aspects. The VRemin II comes across as interesting, stimulating, and fascinating, at the cost of increased complexity and, accordingly, greater operational difficulty. Further development of the VRemin II will require either a reduction in its capabilities or a longer training phase, as well as the transformation of the prototype into a more stable version. A further test planned for the spring, with a group of musicians (sound editors) and a longer training period, will analyze the identified points of criticism and the harmonic capabilities of the interactive devices.

6. Conclusion
The presented approach methodically analyzed digital input devices as computer-supported musical instruments. The presented evaluation steps and the pre-test suggest that digital developments based on the Theremin appear to the casual player to be equivalent. At the same time, the attractiveness and use of the new input modes are viewed as more positive and more appealing, and the subjects rate their capabilities higher than those of the original instrument introduced by Lev Theremin. For our subjects, the VRemin I seems to be superior to the VRemin II for playing. However, the VRemin II is still at a prototype stage and has a higher degree of complexity, which results in greater difficulty of use. The approach combining AttrakDiff2, the specialized questionnaire with explicit music-related questions, and a validity check by means of a comparison questionnaire has proven itself: the claims were congruent even with a small number of subjects. The variance fluctuations are defensible; a larger number of subjects or a firmer selection of subjects should result in a reduced

[6] G. Paine. Interfacing for dynamic morphology in computer music performance. In Proceedings of the 2007 International Conference on Music Communication Science, pages 115-118, Sydney, Australia, December 2007.
[7] M. M. Wanderley and N. Orio. Evaluation of input devices for musical expression: borrowing tools from HCI. Computer Music Journal, 26(3):62-76, 2002.
[8] C. Poepel. On interface expressivity: A player-based study. In Proc. NIME 2005, pages 228-231, Vancouver, Canada, 2005.
[9] M. Hassenzahl, A. Platz, M. Burmester, K. Lerner. Hedonic and ergonomic quality aspects determine a software's appeal. CHI Letters, 2(1):201-208, 2000.
[10] D. Isaacs. Evaluating input devices for musical expression. Master's thesis, University of Queensland, Brisbane, Australia, 2003.
[11] Y. Guiard. Asymmetric division of labor in human skilled bimanual action: The kinematic chain as a model. The Journal of Motor Behavior, 19(4), 1987.
[12] C. P. Mason. Theremin "Terpsitone": A New Electronic Novelty. Radio Craft, Dec. 1936, p. 365.
[13] M. Kaltenbrunner, R. Bencina. reacTIVision: A Computer-Vision Framework for Table-Based Tangible Interaction. 1st Conf. on Tangible and Embedded Interaction, Baton Rouge, Louisiana, 2007.
[14] C. Geiger, H. Reckter, D. Paschke, F. Schulz. Poster: Evolution of a Theremin-based 3D Interface for Music Synthesis. Conf. 3D User Interfaces, Reno, Nevada, 2008.
[15] D. Paschke. A Method to Design and Implement new interfaces for musical expressions. Bachelor Thesis, University of Applied Science Harz (in German, forthcoming).
META-EVI
Innovative Performance Paths with a Wind Controller
Tomás Henriques
Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa
Av. de Berna, 26-C 1069-061 Lisboa, Portugal
+351 96 991 1001
tomas.henriques@fcsh.unl.pt
ABSTRACT
The META-EVI is a novel interface created from an extensive augmentation of the Steiner MIDI EVI, a brass-style 'electric valve instrument' (EVI), which was expanded with 11 analog sensors and 10 digital switches. The main goal of this project is to achieve a performance tool that offers truly innovative levels of musical expression, well beyond what is possible with the original instrument. By allowing the control of a great variety of musical parameters, the META-EVI pushes the musicianship associated with the player of a monophonic wind instrument to a new plateau of musical complexity.

Keywords
Musical Instrument, Sensor technologies, Computer Music, Software Design.

1. THE EVI and MIDI EVI
The idea of a wind-driven electronic synthesizer, known today as a wind controller, was conceived by the American engineer and trumpeter Nyle Steiner, who completed his first playable EVI, the 'Steiner Horn' [1], in 1975. This instrument was a monophonic electronic controller that used the valve technique found in brass instruments as its main control interface. A woodwind version, the EWI, implementing the Boehm system, was developed soon after. The success and musical prowess of these instruments are easily understood when one considers that jazz giants such as the saxophonists Bob Mintzer and the late Michael Brecker, among others, have regularly played them. From the 1980s through the mid-1990s, these instruments were voltage-driven analog controllers that needed special sound modules to translate those voltages into sound. In the late 1990s Nyle Steiner created a MIDI version of the EVI, the MIDI EVI [1], which enabled the wind controller to be connected directly to MIDI gear.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. THE META-EVI
The META-EVI keeps all of the original MIDI EVI's sensors, such as the breath pressure sensor and the bite sensor found on its mouthpiece, the touch/capacitance sensors for the fingering/key system, and the capacitance sensors that generate pitch bend, glissando and vibrato.
The META-EVI modification adds 11 analog sensors and 10 digital switches. These are as follows:
• 1 Accelerometer with Tilt detection
• 1 Gyroscope
• 1 Force Sensing Resistor (FSR)
• 3 Position membrane sensors
• 1 Joystick
• 2 Linear Potentiometers
• 10 On/Off tactile switch buttons

Figure 1. Top view: virtual valves, accelerometer, gyro, FSR

3. FUNDAMENTAL GOALS
The sensors used to extend the capabilities of the MIDI EVI were chosen with two main objectives in mind. The first is an attempt to change the paradigm of the player of a monophonic wind/brass instrument. For over half a century, 'extended techniques' such as multiphonics, or playing and singing simultaneously, have been created for both brass and woodwind acoustic instruments. These techniques were created with the purpose of surpassing the "limitations" of those instruments, and they have clearly widened the musical palette available to the musician. Nonetheless,
because they are sonically unsteady, technically difficult, or at times insufficiently expressive, they fail to create a new performance model. By mastering specific new levels of performance technique, the META-EVI offers innovative solutions for the player of a monophonic instrument. These include, among others, playing complex harmonic structures while simultaneously playing a lead melody, with the two strands played in real time and completely asynchronously from each other.
Attempts to tackle this particular issue have been made with some commercial electronic instruments, but with very limited results, specifically with the AKAI EWI 3020 [2] (an offspring of the early EWI) and the Synthophone [3]. These instruments can trigger a previously stored stack of notes in a sound module when a specific pitch is played. This approach, although it allows the playback of some harmonic material, offers very little flexibility.

Figure 2. Side view: joystick and 3-membrane sensor

The second goal for the creation of the META-EVI was to capture, and take full advantage of, the gestures and body motions that a performer of a brass/wind instrument naturally makes. These gestures are always present and highly visible, and they are used to directly influence various parameters of the software application that controls the ongoing performance.

4. PLAYING THE META-EVI
The META-EVI is played by blowing through its mouthpiece and fingering four touch sensors, or virtual "valves", which allow the production of the 12 chromatic steps. Three of those valves, or keys, sit on top of the instrument and are the equivalent of the three valves of a regular trumpet, played with the index, middle and ring fingers of the musician's right hand. The fourth valve (which lowers the pitch by a fourth) consists of a metal ring snug against the lower edge of the instrument, and it is accessed by the index finger of the player's left hand.
The instrument is supported mainly by the left hand, which holds the canister, a cylinder-shaped component located at its bottom edge. The controller has a pitch span of seven full octaves, with octave switching done by sliding the left thumb over a set of six metal rollers (also touch sensors) housed inside the canister.

4.1 The extra analog sensors
While most of the extra sensors and switches are placed so that they are easily accessed by the musician's fingers, the placement of both the accelerometer and the gyroscope was chosen to optimize their readings as a function of the performer's motions. The accelerometer sits at the end of the top part of the instrument, where it detects the amplitude of motion within the vertical plane. It essentially detects how high the musician is holding the instrument, a measurement that can vary between 0º and 180º. Similarly, the tilt component of the accelerometer detects roll, determining when and how much the musician bends his/her body sideways. Here the amplitude of motion can also be made to vary between 0º and 180º.
The gyroscope, a sensor that detects rotational (angular) velocity, is used in the META-EVI to detect fast movements of the upper torso, specifically when the musician swings his/her upper body while playing the instrument.
The force sensing resistor (FSR) was mounted right by the '3rd valve' of the MIDI EVI, where it is comfortably accessed by the musician's right pinky finger, while the 3-membrane position sensor, which consists of three small parallel strips that each independently detect the exact location where they are touched, was placed along the left side of the instrument. The strips can be touched by the right hand with either the pinky finger or the thumb while playing the three main MIDI EVI keys, or simultaneously with up to three fingers of the right hand when the musician is not using the virtual valves.
The joystick was placed under the body of the instrument and is controlled by the right thumb, while the two linear potentiometers are placed inside the canister of the MIDI EVI. The two potentiometers are accessed by the index and middle fingers of the musician's left hand, the same hand that controls the octave the instrument is playing in as well as the half-octave of the MIDI EVI. The two potentiometers are used heavily in the extended instrument, since they are controlled by the performer's left hand, which is freer from the keying scheme.
Because the force sensing resistor, the 3-membrane position sensor, the joystick and the two linear potentiometers are independent of the player's gestures, they could be given a more extensive set of functions and respond to multiple mapping options.

Figure 3. Canister view: 2 potentiometers and 3 switches
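The elevation and roll readings described in 4.1 can be illustrated with a short sketch. It assumes, hypothetically, that each accelerometer axis reports the static gravity component along that axis in units of g (from -1 to +1), so the 0º-180º angle from straight down follows from an arccosine. The function names and the scaling to a 7-bit MIDI controller value are illustrative assumptions, not the META-EVI's actual firmware or Max/MSP patch.

```python
import math


def axis_angle_deg(gravity_g: float) -> float:
    """Angle (0-180 degrees) between an accelerometer axis and straight down.

    Assumes the reading is the static gravity component along the axis in g:
    -1.0 when the axis points straight down, 0.0 when horizontal,
    +1.0 when it points straight up.
    """
    # Clamp to [-1, 1]: sensor noise or motion can push readings past 1 g.
    g = max(-1.0, min(1.0, gravity_g))
    return math.degrees(math.acos(-g))


def angle_to_cc(angle_deg: float) -> int:
    """Scale a 0-180 degree angle to a 7-bit MIDI controller value (0-127)."""
    angle = max(0.0, min(180.0, angle_deg))
    return round(angle / 180.0 * 127)


if __name__ == "__main__":
    # Hypothetical readings along the instrument's long axis (elevation).
    for reading in (-1.0, 0.0, 0.5, 1.0):
        angle = axis_angle_deg(reading)
        print(f"{reading:+.1f} g -> {angle:6.1f} deg -> CC {angle_to_cc(angle)}")
```

The same function would serve for roll if fed the sideways axis instead; only static tilt is captured, since fast swings add acceleration on top of gravity, which is one reason a separate gyroscope is useful.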
4.2 The digital switches
The META-EVI has a total of ten tactile switch buttons. Seven of these buttons are placed on top of the instrument, easily accessible to the right-hand fingers, and the remaining three switches are embedded in the canister wall. The three buttons were placed within the canister in an attempt to make full use of all the free fingers of the player's left hand. Many programming features had to be implemented to allow for the flexibility that the instrument offers. Each switch therefore responds to a single click; a double click; a single click held down for either a short or a longer time (two different readings); and a double click with the second click held down, again for either a short or a longer time. This range of possibilities lets the relatively small number of switches accomplish a high number of tasks.
When programmed to respond to simple clicks, the 10 switches can be further exploited through switch combinations. In this way they can function as a 10-bit number generator, theoretically able to generate 2^10 = 1024 different values. However, only a small fraction of these values is used (fewer than one hundred). The number of switch combinations is narrowed by ergonomic constraints and also by the practical need to memorize those combinations.
Additionally, the three switches placed right across from the three main virtual valves of the MIDI EVI can be made to function as a secondary brass keying scheme. This allows the musician to play the normal melodic material with the three touch-sensor keys while simultaneously triggering other events with alternative switch combinations.
The extra sensors and switches of the META-EVI output small voltages within the range of 0-5 volts. They are housed within an acrylic structure that also holds the sensor interface, a MIDITRON [4]. The analog signals captured by the interface are sent via MIDI to a computer running the Max/MSP programming language.

Figure 4. View of the Sensor Interface MIDITRON

5. PERFORMANCE MODES
The META-EVI was built to perform in a variety of different performance modes, separately or concurrently. These include:
1) Harmony mode
2) Counterpoint mode
3) Sample trigger mode
4) Looper mode
5) DSP mode(s)
6) Mix mode

Figure 5. Real-time harmonization of a lead melody

1) Harmony mode
In 'Harmony mode' the META-EVI is programmed to play truly complex and flexible harmonies. These include the real-time choice of 34 different chord types in any key, their position (root, 1st inversion, etc.), their voicing or internal ordering, and their tessitura. Combining these chord attributes allows a great variety of chord configurations to be created. The harmonies are played back alongside the melodic material traditionally generated by the monophonic instrument, so that the musician can accompany him/herself.
In this performance mode the sensors needed include the two potentiometers, the joystick, the 3-membrane sensor, the seven top switches and the accelerometer. They are mapped so that the potentiometers control, respectively, the number of notes in the chord and the chord inversion/position. The 3-membrane sensor chooses the "voicing", the joystick, acting as an 8-way switch, triggers different chord types, and the accelerometer chooses the chord's octave. The 7 switches have multiple functions, the most important of which is the alteration of specific notes within the chords, allowing great flexibility in configuring the harmonic structures to be generated.

2) Counterpoint mode
The META-EVI is able to generate two or even three simultaneous strands of melodic material in real time. Any of these linear phrases can be further processed by having their note information trigger stacks of parallel chords. These features allow the creation of a very complex web of musical expression.

3) Sample trigger mode
Besides the two main performance tasks mentioned above, the META-EVI has been programmed to access a large number of recorded sound files. It can control the playback of 64 samples at a time and process them in real time as well. Current digital processes include ring modulation, vocoding, multi-delays, filtering and granular sampling effects.
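The six switch readings and the 10-bit combination scheme described in 4.2 can be sketched as follows. This is a minimal illustration, not the META-EVI's Max/MSP implementation: the timing thresholds (CLICK_MAX, LONG_HOLD_MIN, DOUBLE_GAP_MAX), the function names, and the rule that the duration of the last press decides between click and hold are all assumptions.

```python
from typing import List, Sequence, Tuple

# Hypothetical timing thresholds in seconds; the paper gives no values.
CLICK_MAX = 0.20       # presses shorter than this count as a plain click
LONG_HOLD_MIN = 0.80   # holds at least this long count as a long hold
DOUBLE_GAP_MAX = 0.30  # max release-to-press gap for a double click

Press = Tuple[float, float]  # (press_time, release_time) in seconds


def group_presses(presses: Sequence[Press]) -> List[List[Press]]:
    """Group one switch's time-ordered presses into gestures of 1 or 2 presses."""
    gestures: List[List[Press]] = []
    current: List[Press] = []
    for down, up in presses:
        if current and len(current) < 2 and down - current[-1][1] <= DOUBLE_GAP_MAX:
            current.append((down, up))  # second press of a double click
        else:
            if current:
                gestures.append(current)
            current = [(down, up)]
    if current:
        gestures.append(current)
    return gestures


def classify(gesture: Sequence[Press]) -> str:
    """Name one of the six gestures, judged by how long the last press was held."""
    prefix = "single" if len(gesture) == 1 else "double"
    down, up = gesture[-1]
    held = up - down
    if held < CLICK_MAX:
        kind = "click"
    elif held < LONG_HOLD_MIN:
        kind = "hold_short"
    else:
        kind = "hold_long"
    return f"{prefix}_{kind}"


def switch_word(states: Sequence[bool]) -> int:
    """Pack the on/off state of the 10 switches into one 10-bit number (switch 0 = bit 0)."""
    if len(states) != 10:
        raise ValueError("expected exactly 10 switch states")
    return sum(1 << i for i, pressed in enumerate(states) if pressed)
```

For example, a quick double tap followed later by a long press on the same switch is read as two gestures, "double_click" and "single_hold_long". In practice only a small, memorable subset of the 1024 possible switch_word values would be bound to actions, as the text notes.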
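A minimal sketch of the Harmony-mode mapping described above, assuming controller values have already been converted to numbers: the joystick is quantized into an 8-way switch that selects a chord type, one potentiometer sets the number of chord tones, the other the inversion, and an octave offset stands in for the accelerometer. The chord table (8 entries rather than the instrument's 34), the dead-zone value and the function names are hypothetical; voicing (3-membrane sensor) and the switch-driven note alterations are not modelled.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Hypothetical table: one chord type per joystick direction
# (0 = right, counting counter-clockwise in 45-degree steps).
CHORD_TYPES: List[Tuple[str, Tuple[int, ...]]] = [
    ("maj7", (0, 4, 7, 11)),
    ("min7", (0, 3, 7, 10)),
    ("dom7", (0, 4, 7, 10)),
    ("m7b5", (0, 3, 6, 10)),
    ("dim7", (0, 3, 6, 9)),
    ("7sus4", (0, 5, 7, 10)),
    ("maj6", (0, 4, 7, 9)),
    ("aug", (0, 4, 8)),
]


def joystick_direction(x: float, y: float, dead_zone: float = 0.3) -> Optional[int]:
    """Quantize a joystick position (each axis in -1..1) to one of 8 directions."""
    if math.hypot(x, y) < dead_zone:
        return None  # stick near centre: no chord-type change
    angle = math.degrees(math.atan2(y, x)) % 360.0
    # Offset by half a sector so each direction covers +/-22.5 degrees.
    return int(((angle + 22.5) % 360.0) // 45.0)


def build_chord(root: int, intervals: Sequence[int], num_notes: int,
                inversion: int, octave_shift: int) -> List[int]:
    """Return MIDI note numbers for a chord built on `root`.

    num_notes    -- how many chord tones to use (first potentiometer)
    inversion    -- 0 = root position, 1 = first inversion, ... (second potentiometer)
    octave_shift -- whole octaves up or down (accelerometer)
    """
    n = max(1, min(num_notes, len(intervals)))
    notes = [root + iv for iv in intervals[:n]]
    for _ in range(inversion % n):
        # Each inversion step moves the current lowest note up an octave.
        notes.append(notes.pop(0) + 12)
    return [note + 12 * octave_shift for note in notes]
```

For instance, pushing the stick straight up selects direction 2 ("dom7" in this hypothetical table), and a first-inversion, four-note chord on middle C (60) comes out as [64, 67, 70, 72]. Keeping chord-type selection, inversion and register as separate parameters mirrors the text's point that combining independent chord attributes yields many configurations from a few controls.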
4) Looper mode
A live sampling technique was also implemented, allowing the musician to record a musical phrase in real time, store it, play it back, and record new layers on top of it. The whole process can be repeated several times, enabling the performer to create up to 8 independent layers of musical material. This performance mode relies exclusively on the seven top switch buttons.

Figure 6. MSP patch for a DSP performance mode

5) DSP mode(s)
The META-EVI was also designed to control several different synthesis parameters within a given synthesis program. This is accomplished through both the extra sensors and the native sensors. In this mode the sensors that capture the musician's body motions have the sole function of closely translating the performance gestures into specific sonic changes. This mode was developed specifically with the second goal of the project in mind.

6) Mix mode
In this mode the instrument plays in two or more of the previously described modes at the same time. This is by far the most challenging way to perform with the instrument, both technically and musically, and it is still under development.

6. CONCLUSION
The novelty of the META-EVI lies in the creation of an instrument with a high number of sensors capable of sending continuous and reliable gesture and control information in real time, while any sensor's information can easily be reassigned to any desired parameter, also in real time. These extra capabilities turn the already versatile expressiveness of the original MIDI EVI into a more powerful instrument that gives the performer wider control of the sound (s)he creates, as well as the ability to generate more complex musical structures that go far beyond simple monophonic playing.

7. ACKNOWLEDGMENTS
This project was funded by a 'sabbatical research grant' from both the Fundação para a Ciência e Tecnologia (FCT) and the Fundação Luso-Americana para o Desenvolvimento (FLAD). I would also like to thank my father for his invaluable work on the hardware design.

8. REFERENCES
[1] www.patchmanmusic.com/NyleSteinerHomepage.html
[2] Detailed description in www.ewi-evi.com
[3] www.softwind.ch/Synthophone.asp
[4] www.eroktronix.com/miditron_manual.pdf

9. SOURCES
1. Burtner, M. Noise Gate 67 for MetaSaxophone: Composition and Performance Considerations of a New Computer Music Controller. In Proceedings of the 2nd International Conference on New Interfaces for Musical Expression (NIME 2002), Dublin, Ireland, May 24-26, pp. 71-76.
2. Scavone, G. P. (2003). The PIPE: Explorations with Breath Control. In Proceedings of the 3rd International Conference on New Interfaces for Musical Expression (NIME 2003), Montreal, Canada, May 22-24, pp. 15-18.
3. Cook, P. Principles for Designing Computer Music Controllers. In Proceedings of the 1st International Conference on New Interfaces for Musical Expression (NIME 2001), Seattle, USA, April 1-2.
4. Burtner, M. A theory of modulated objects for new shamanic controller design. In Proceedings of the 4th International Conference on New Interfaces for Musical Expression (NIME 2004), Hamamatsu, Japan, June 3-5, pp. 93-196.
5. Essl, G., O'Modhrain, S. PebbleBox and Crumble Bag: Tactile Interfaces for Granular Synthesis. In Proceedings of the 4th International Conference on New Interfaces for Musical Expression (NIME 2004), Hamamatsu, Japan, June 3-5, pp. 74-79.
6. McCaig, Fels, Kaastra, Takahashi. Evoking Tooka: from experiment to instrument. In Proceedings of the 4th International Conference on New Interfaces for Musical Expression (NIME 2004), Hamamatsu, Japan, June 3-5, pp. 1-6.
7. Paradiso, J. Dual-Use Technologies for Electronic Music Controllers: A Personal Perspective. In Proceedings of the 3rd International Conference on New Interfaces for Musical Expression (NIME 2003), Montreal, Canada, May 22-24, pp. 228-234.
8. Trueman, D., Bahn, C., Cook, P. Alternative Voices for Electronic Sound. In http://silvertone.princeton.edu/~dan/
Database and mapping design for audiovisual prepared radio set installation
Robin Price, Pedro Rebelo
Sonic Arts Research Centre
Queen’s University Belfast
BT7 1NN, Northern Ireland
+44 (0)28 90974829
{rprice02, p.rebelo}@qub.ac.uk
ABSTRACT
This paper describes both the design strategies behind a database of percussive sounds gathered from the radio and the interaction scheme used to embed this database in an audiovisual installation. In particular we describe the implementation of a complex mapping system that uses a model of heat energy and 'one-to-many' mappings to encourage a holistic cognitive mode in the user.

Keywords
Mapping, database, audiovisual, radio, installation art.

1. INTRODUCTION
As part of the FIX 07 Festival [12], an audiovisual installation featuring a prepared (hacked) radio set and accompanying visuals was created. The objective of the piece was to engage users in a musical system where they felt they had a degree of control over its output, without confronting them directly with instructions for its use. Instead, the piece relied on the audiovisual feedback and on the pre-existing experience people have of browsing for signals on radios. Rather than having users concentrate too hard on the individual effect of each dial and switch, the objective was to place them in what has been described as a 'holistic' mode [5], where they concentrate more on the overall effect of the installation.
Aesthetically, the piece hinged on the re-appropriation of an existing technology, the radio set, for re-use in an installation context without altering its appearance. This opened up the interesting possibility of remapping and subverting the existing relationship people have with the object into the parameters of a constructed, database-driven musical engine that appeared to be hidden within it.

2. BACKGROUND
John Cage, in his piece "Imaginary Landscape N° 4", set out with the idea of "erasing all will and the very idea of success" [13] by scoring for twelve radio sets, each with two players. The piece was composed using Cage's I-Ching method to create tables referring to tempi, durations, sounds and dynamics. In contrast, the installation described here aimed to encourage listeners to perceive that their will, expressed through interaction with a single radio's dials and buttons, was imposed on an unfolding sonic landscape, reconfiguring and exploring it. The re-appropriation of existing objects for artistic examination is not in itself a new concept; it can trace its roots from Duchamp through the appropriation art movement. More recently, in the sonic arts, it has been used in the form of 'hardware hacking' [3]. Good examples are Phil Archer's carefully broken and re-appropriated CD players and printers [1] and Nicolas Villar's ColorDex DJ system [9], which uses re-appropriated hard drives as interfaces for track selection, cueing and speed control. The key difference between these examples and the work outlined here is the subtlety with which the appropriated object is tampered with. The author would argue that simply breaking the radio in a creative fashion and presenting it for use in a 'circuit bending' style is too nihilistic an approach, and that Villar's hard-drive turntable emulators are far removed from the real media they control and are more a visual comment on where the media is stored. The author's piece was an attempt to engage the public in a musical style often represented as inaccessible, challenging this assertion by housing it in a comfortable style icon and allowing the public to reconfigure it interactively. This allowed for the investigation of mapping the controls of a functioning, yet altered, radio to the variables of a musical system, and for the development of a large, expanding database of sounds that formed the raw input for the audiovisual output.
Parameter mapping and its importance in defining the experience of an instrument/installation have been the subject of recent investigation [4][5][6]. Among the conclusions that can be drawn are that mappings which are not one-to-one are more engaging for users, and that the use of metaphor can be a way of building transparency into interfaces. This encouraged the piece's use of a heat energy model as a metaphor for interaction. One of the key aims of the piece was to encourage a holistic cognitive mode. This can be thought of as the opposite of an analytical mode, in which the user is focused on the logical, sequential details and the cause and effect of an interaction; a holistic mode instead places the emphasis on the overall effect of interaction and attempts to refocus the mind on the bigger picture of the experience. This was felt to be important in the context of an art installation, where the public were unlikely to be minded to learn a system and were more interested in playful interaction, seeing and hearing the effects of their explorations with the radio set's knobs and dials.
3. CONSTRUCTION

3.1 Physical Construction
A re-issued TR-82A Bush radio was selected by the author for its aesthetic beauty, its fifties styling and the large amount of empty space within it. Several key modifications were made to the functioning of the radio. Firstly, the set was disassembled and rotary potentiometers were fitted to one side of the volume, tone and frequency dials. These were fixed in place such that each dial's rotation could be measured as a resistance across the potentiometer. Custom-built pressure sensors consisting of quantum tunnelling composite squares and springs were placed under the switches, allowing the pressure from a user's fingers to be gauged. All of these were wired to a USB Arduino unit housed in the battery compartment of the radio. The op-amp feeding the radio's speaker was located and rewired so that the set's received signal could be interrupted and diverted, and a new signal supplied to the speaker. A small section of the back of the radio was neatly cut and filed out to allow the attachment of two RCA phono sockets, a USB socket and a small switch. The phono sockets were connected to the rewired op-amp via the switch, such that the radio could either function normally or have its 'normal' output diverted to one phono socket and play a signal supplied by the second. The USB socket was connected to that of the Arduino. When installed, the set was connected by USB and audio cables to a laptop hidden in a plinth. This was running a Max/MSP patch that received the natural output of the radio, supplied a replacement and interpreted the movements of the controls. It was within Max that a database of sounds from the tuned radio signal was constructed.

Figure 1. TR-82A Bush radio set installation.

3.2 Database Construction
The construction of a large, browsable and sortable database of sounds was an important idea behind the piece, inspired in part by Lev Manovich's idea of the database as an art form [7]. This idea has gained currency within sonic arts/technology in Casey et al.'s research [2], e.g. SoundSpotter and the Semantic HIFI project [15]. The approach implemented within the piece was to construct a database of percussive sounds taken from the unaltered radio signal. These were categorized by their spectral information and automatically added to an HSQL database [14] inside Max/MSP. The onsets of percussive sounds and their spectral information were obtained from the analyzer~ [11] object, which triggered the recording and subsequent saving of a 250 ms snippet of audio from a delay line, delayed to account for the time analyzer~ needs to detect the attack. The brightness (spectral centroid), noisiness and loudness of the first 50 ms of the recorded audio were averaged and then stored, along with a unique filename, in the database. Before entry, the properties of the sound were compared to those already in the database; if more than four existing recordings were found with properties closer than a specified amount, the sound was rejected. Each database entry also had a play count recording the number of times it had been played as part of the installation.

3.3 Mapping
The musical output of the installation fell into two modes, depending on whether or not the radio was tuned. Within these two modes the behaviour was determined by a combination of the real-time positions of the volume, tone and mode selector dials and buttons, and an extra hidden inner variable the author defined as the 'heat' of the system, which took into account the recent activity in the installation. As the knobs and dials of the set were twisted, 'energy' was added to a portion of the Max/MSP patch that modeled the first-order differential equation governing the cooling of heated bodies. The amount of 'energy' added to the system was proportional to the rotational velocity of the dial. This allowed the piece to internally measure the degree of interaction that was going on with the radio. As the dials and knobs were twisted and used, the 'heat' value stored in the patch rose for as long as the radio was being played with, and in proportion to the degree of interaction; i.e. light stroking movements of the dials produced less of an increase than frenzied twiddles. As soon as the set was left alone, it began to 'cool' and the 'heat' value dropped. Along with this idea of heat, an idea of 'biasing' was included in the mapping strategy. Biasing occurs in many acoustic instruments [8], where an effect only occurs, or suddenly changes, when enough energy is put into the system. Biasing as implemented here means that a heat-controlled effect changed when the heat parameter was above a threshold value. The inclusion of this extra heat parameter, coupled with biasing, allowed for an interesting non-linear mapping of the controls to multiple parameters of the system.

Figure 2. Heat Model and biasing to provide non-linear mapping.

Max/MSP monitored the noisiness of the unmodified radio signal, and when it fell below a threshold value the radio was deemed to be tuned; this method was quite effective in differentiating stations from static. Tuning triggered the fading in of the live radio signal, which had first been processed through two signal processors: a sample and bit rate reduction unit, and a filter and dub delay unit. The signal processing varied with the heat. Below a threshold level, the sound was passed through the sample and bit rate reduction unit, and the heat parameter controlled the amount of reduction, so that if the set was left tuned the sound slowly degraded over time with a tearing effect. If a gentle motion was applied to any of the dials the sound recovered; a harder motion would supply enough energy to remove the degradation unit from the DSP chain, leaving the sound arriving through the filter and dub delay unit. Here the heat parameter controlled the length of the delay line, with higher heat leading to a shorter delay line (down to a minimum of 5 ms), and the frequency of a low-frequency oscillator (LFO). This LFO was added to the position of the volume dial to control the frequency of a peak equalizer filter that sat in the feedback path of the delay unit, giving it the dub sound. The heat similarly controlled the frequency of another LFO that controlled the smooth interpolation between a number of different filters, whose cutoff frequency was controlled by the tone dial.
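The heat model of Section 3.3 is, in essence, Newton's law of cooling with a driving term. The following is a minimal sketch, not the author's Max/MSP patch. It assumes dial speed is available in degrees per second and writes H for heat, H_amb for the resting level, k for the cooling rate and P = gain × Σ|ω| for the input power, giving dH/dt = -k(H - H_amb) + P. With P held constant between control updates, each step can use the exact solution H ← H_eq + (H - H_eq)·e^(-k·dt), where H_eq = H_amb + P/k. The constants, the 500 ms delay just above the threshold, the noisiness threshold and the linear/hyperbolic curves in tuned_chain are assumptions. Only the threshold (biasing) behaviour, the direction of each mapping and the 5 ms floor come from the text.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Tuple

# Hypothetical constants; the paper gives no numeric values except the 5 ms floor.
HEAT_THRESHOLD = 5.0   # biasing threshold between the two tuned-mode chains
MAX_DELAY_MS = 500.0   # delay length just above the threshold (assumed)
MIN_DELAY_MS = 5.0     # minimum delay length stated in the text


@dataclass
class HeatModel:
    cooling_rate: float = 0.5  # k, in 1/s
    gain: float = 0.02         # heat input per degree of dial rotation
    ambient: float = 0.0       # resting heat level H_amb
    heat: float = 0.0

    def step(self, dial_speeds: Iterable[float], dt: float) -> float:
        """Advance the model by dt seconds, given each dial's speed in deg/s."""
        power = self.gain * sum(abs(v) for v in dial_speeds)
        # Equilibrium for constant input: dH/dt = 0  =>  H_eq = H_amb + P / k.
        equilibrium = self.ambient + power / self.cooling_rate
        # Exact solution of dH/dt = -k (H - H_amb) + P over one step.
        decay = math.exp(-self.cooling_rate * dt)
        self.heat = equilibrium + (self.heat - equilibrium) * decay
        return self.heat


def tuned_chain(heat: float) -> Tuple[str, float]:
    """Pick the tuned-mode processor and its parameter from the heat (biasing)."""
    if heat < HEAT_THRESHOLD:
        # Cooler set -> more sample/bit-rate reduction (1.0 = fully degraded).
        amount = 1.0 - max(0.0, heat) / HEAT_THRESHOLD
        return ("bitcrush", amount)
    # Above the threshold: dub delay, shorter as the set gets hotter.
    delay_ms = MAX_DELAY_MS / (1.0 + heat - HEAT_THRESHOLD)
    return ("dub_delay", max(MIN_DELAY_MS, delay_ms))


def is_tuned(noisiness: float, threshold: float = 0.3) -> bool:
    """Treat the radio as tuned when noisiness drops below a (hypothetical) threshold."""
    return noisiness < threshold
```

Called once per control tick (say every 50 ms), the model rises while the dials are being turned, rises faster for frenzied twists than for gentle ones, and relaxes back towards H_amb once the set is left alone. Feeding its output to tuned_chain reproduces the biased behaviour in the text: a gentle twist reduces the degradation, and a hard one pushes the heat past the threshold and switches to the dub delay.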
Figure 4. Mapping strategy.
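The admission rule from Section 3.2 (reject a new snippet when more than four stored snippets lie closer than a specified amount) can be sketched in a few lines. This is an illustration only. It assumes the three averaged descriptors are normalized to comparable ranges and compared by Euclidean distance; the paper does not say which distance or tolerance was used. The names Entry, average_descriptors and admit are hypothetical stand-ins for the HSQL-backed database inside Max/MSP.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Descriptors = Tuple[float, float, float]  # (brightness, noisiness, loudness)


@dataclass
class Entry:
    filename: str
    descriptors: Descriptors
    play_count: int = 0  # incremented each time the snippet is voiced


def average_descriptors(frames: Sequence[Descriptors]) -> Descriptors:
    """Average per-frame descriptors over the first 50 ms of a snippet."""
    if not frames:
        raise ValueError("need at least one analysis frame")
    n = len(frames)
    return (
        sum(f[0] for f in frames) / n,
        sum(f[1] for f in frames) / n,
        sum(f[2] for f in frames) / n,
    )


def admit(db: List[Entry], candidate: Entry,
          tolerance: float = 0.05, max_close: int = 4) -> bool:
    """Add `candidate` unless more than `max_close` entries lie within `tolerance`."""
    close = sum(
        1 for entry in db
        if math.dist(entry.descriptors, candidate.descriptors) < tolerance
    )
    if close > max_close:
        return False  # too many near-duplicates already stored
    db.append(candidate)
    return True
```

With this rule, up to five near-identical snippets can accumulate before further look-alikes are rejected, which keeps a long-running installation from filling the database with repeats of the same station ident or click while still admitting timbrally new material.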
When the set was detuned the altered signal was faded and
replaced by the output of a phase vocoder that was a loop of the
last ten seconds of recorded radio station input. The speed of this
output would slowly drop to give the effect of the real signal
shifting out of place. This provided a sound bed for the key
feature of the untuned set, the sonified, sorted and browsable
contents of the database of detected radio sounds. This was
achieved through the play back of the 250 ms samples, looped,
filtered and enveloped by amplitude and low-pass filter envelopes.
The shape of these envelopes was set to track the heat of the
system, with higher temperatures corresponding to shorter, sharper
envelopes and a brighter filter sound. As in the installation's
tuned mode, the behaviour changed when the heat was above a
threshold value. At this point the tone dial controlled the length
of the loops: at either extremity of the dial the end point of the
loop matched that of the sample, but as the dial was rotated
towards the centre the loop shrank towards a 10 ms minimum,
giving a glitching effect. The samples to voice were selected from
the database by sorting them in order of loudness, brightness or
noisiness, depending on which of the FM, MW or LW buttons was
depressed, and mapping the position of the frequency dial to a
distance down the sorted list. Up to eight consecutively ordered
samples (thus timbrally similar in some fashion) were voiced at a
time. The number and rhythmic timing of the samples were
controlled by a collection of first-order Markov chains, where each
entry in a thirty-two step sequence table corresponded to the
probability of a sound being triggered at that point. As the heat of
the system increased, the probabilities changed and interpolated to
create more frenetic probability-based beats, with more samples
voiced.
Figure 3. Signal flow diagram.
Figure 5. Example of database visualization.
3.4 Database visualization
In tandem with the sonic output of the radio set, the installation
included a projected visualization of the sound database, showing
the currently selected sounds and a representation of the unfolding
audio output. This was achieved through a custom-programmed
Processing sketch running on a separate laptop, wirelessly
networked to the one running Max/MSP and communicating via
OSC. The radii of fifty-one rings were symmetrically mapped to
the energy in twenty-five frequency bands, with the radius of the
thicker central ring controlled by the loudest band at any time.
The length of the whole graphic equalizer bar extended and
contracted with the volume of the output. The degree of damping
applied both to this motion and to the changing ring radii was
controlled by the heat of the system, such that at higher
temperatures low damping afforded rapid, jerky movement closely
following the output, while lower temperatures gave a slow, glacial
movement. The database of samples was visualized around this
central equalizer bar, with each sample represented by a rectangle
parallel to the bar and rotated to face outwards. The position of
each sample was mapped in cylindrical polar co-ordinates, with
the mapping dependent on the current sort mode. In each sort
mode, chosen with the band selector buttons, the samples arranged
themselves along the length of the graphic equalizer bar in the
same order they were sorted by the database, i.e. in order of
loudness, brightness or noisiness. In each case the two remaining
unused parameters were used to order the radial distance from the
central bar and the angle the samples made with it. As the
frequency dial was rotated and different samples were selected
from the database, their visual representations shot out, and as
each sample was played its square expanded and returned to its
original shape. The opacity of each square was controlled by the
play count of its sample, with more frequently played samples
producing more opaque, bluer rectangles.

Again, as in the two audio modes of interaction, an idea of biasing
was included, such that when the heat passed above a threshold
the visualization would begin to warp and shimmer in a chaotic
fashion.

The role of the visualization was partly to augment the idea of the
hidden heat variable and to give an idea of how the sounds were
sorted, but also to encourage the ‘holistic’ mode of cognition by
distracting the user from focusing too strongly on the effect of
each dial on the audio output.

4. REFLECTION
The radio set had its inaugural installation in Belfast's Catalyst
Arts centre between 21st and 30th November 2007. In conversation
and correspondence with some of the visitors, the author noted
that, in response to the experience, they felt their movements were
translated into musical outcomes, but not in a manner they felt
they wholly controlled or could exactly repeat. Should we treat
repeatability as positive? The author would argue not, as complete
repeatability would defer all choice to the listener, require them to
engage more deeply and in a more prolonged fashion with the
piece to provide pleasing, varied output, and defeat the purpose of
the activity: namely, encouraging the feeling in the listener that
they were engaging with and customizing a preexisting sound
ecology rather than composing their own. Users also remarked
that the installation provided a clear distinction between when no
one was using it and when it was in use, which could be attributed
to the use of the heat model as a metaphor for interaction in the
piece. The ease of access to the piece without prior understanding
was also mentioned and could be mooted as a success for
encouraging playful ‘holistic’ cognition.

The indirect mapping of user interaction with dials through an
extra middle layer of mapping has been pointed to in the past [10],
and the author would argue that the installation successfully
demonstrated heat energy as a novel metaphor for interaction. The
use of biasing, which occurs in acoustic instruments where a
certain effect only comes into play after an amount of energy has
been expended, was identified as a key way of building depth into
the mapping strategy and thus the user's experience. The
combination of the hidden mapping layer with direct mapping was
a successful strategy for encouraging visitors to engage in a more
playful ‘holistic’ cognitive mode rather than approaching the piece
as a learnable instrument.

5. CONCLUSION
This paper outlined the design and implementation of a sonic
database with novel data visualization, and the mapping of a
re-appropriated object's controls for its use. Some issues of
mapping were discussed, in particular the idea of using a model of
heat as a metaphor for the amount of interaction with an
installation, and the use of biasing to give depth to a mapping
strategy.

6. ACKNOWLEDGMENTS
Thanks to my supervisor Dr Rebelo, Colm Clarke at Belfast
Catalyst Arts and Peter Bennett for their help.

7. REFERENCES
[1] Archer, P. Intervention and Appropriation: studies in the
aesthetics of the homemade in real-time electroacoustic
composition. Ph.D. Thesis, University of East Anglia,
Norwich, 2004.
[2] Casey, M. and Grierson, M. Soundspotter and Remix-TV:
Fast Approximate Matching for Audio-Visual Performance.
In Proceedings of the International Computer Music
Conference (Copenhagen, Denmark, 2007).
[3] Collins, N. Handmade Electronic Music: The Art of
Hardware Hacking. Routledge, UK, 2006.
[4] Fels, S., Gadd, A. and Mulder, A. Mapping transparency
through metaphor: towards more expressive musical
instruments. Organised Sound, 7, 2 (August 2002), 109–126.
[5] Hunt, A. and Kirk, R. Mapping Strategies for Musical
Performance. In Trends in Gestural Control of Music. Ircam,
Centre Pompidou, 2000.
[6] Hunt, A., Wanderley, M. and Paradis, M. The importance of
parameter mapping in electronic instrument design. In
Proceedings of NIME (NIME '02) (Dublin, Ireland, 2002).
[7] Manovich, L. The Language of New Media. MIT Press,
Cambridge, MA, 2002.
[8] Rovan, J., Wanderley, M., Dubnov, S. and Depalle, P.
Instrumental Gestural Mapping Strategies as Expressivity
Determinants in Computer Music Performance. In Kansei,
The Technology of Emotion, Proceedings of the AIMI
International Workshop (Genoa, Italy, October 1997), 68–73.
[9] Villar, N., Gellersen, H., Jervis, M. and Lang, A. The
ColorDex DJ System: A New Interface for Live Music
Mixing. In Proceedings of NIME (NIME '06) (IRCAM,
France, 2006).
[10] Wanderley, M. and Depalle, P. Gestural Control of Sound
Synthesis. Proceedings of the IEEE, 92, 4 (April 2004),
632–644.
[11] Analyzer~. http://web.media.mit.edu/~tristan/maxmsp.html
[12] FIX Festival. http://www.fixcatalyst.org.uk/
[13] The John Cage Database.
http://www.johncage.info/workscage/landscape4.html
[14] net.loadbang-SQL.
http://www.loadbang.net/space/Software/net.loadbang-SQL
[15] Semantic HIFI. http://shf.ircam.fr/
Monalisa: “see the sound, hear the image”

Kazuhiro Jo, RCAST, University of Tokyo, 4-6-1, Komaba, Meguro-ku, Tokyo, jo@jp.org
Norihisa Nagano, Studio 2, IAMAS, 3-95, Ryoke-cho, Ogaki City, Gifu, nagano@monalisa-au.org
ABSTRACT
Monalisa is a software platform that enables people to "see the
sound, hear the image". It consists of three pieces of software,
Monalisa Application, Monalisa-Audio Unit, and Monalisa-Image
Unit, and an installation, Monalisa "shadow of the sound". In this
paper, we describe the implementation of each piece of software
and of the installation, and explain the basic algorithms that treat
image data and sound data transparently.

Keywords
Sound and Image Processing Software, Plug-in, Installation

1. INTRODUCTION
Many artists and researchers have tried to "see the sound, hear the
image". Kandinsky drew correspondences between the timbres of
musical instruments and colors; based on these correspondences,
he produced many paintings borrowing motifs from traditional
European music [7]. John Whitney created a number of motion
graphics based on the theory of musical harmony [16]. With Slow
Scan, Laurie Anderson recorded visual information as sounds and
reconstructed the images by playing the Tape Bow Violin [1].
VinylVideo also recorded images as sounds onto analog LP
records; it replays the moving image from the LP record with an
ordinary record player and custom hardware [12]. Xenakis
developed the computer system UPIC to allow the composer to
draw music with a graphics tablet; the drawing is immediately
calculated and transformed into sound by the computer [17].
MetaSynth translates static images into sound. It enables people
to draw sound by graphically editing the spectral structure of
sound [13].
The software platform Monalisa also tries to "see the sound, hear
the image", by treating all image and sound as sequences of
numbers. In contrast to such previous works, it limits its target to
image data and sound data represented by binary codes. It consists
of three pieces of software, Monalisa Application, Monalisa-Audio
Unit, and Monalisa-Image Unit, and an installation, Monalisa
"shadow of the sound". Monalisa Application is standalone
software that transparently treats image data and sound data. With
this software, people can open sound data as image data and vice
versa, and sound and image effects can be added to the data.
Monalisa-Audio Unit is plug-in software for sound processing
applications. It wraps existing image effect plug-ins as sound
effect plug-ins. Monalisa-Image Unit is plug-in software for image
processing applications. It wraps existing sound effect plug-ins as
image effect plug-ins. Currently this software runs on Mac OS X.
Mac OS X offers basic APIs (Application Programming Interfaces)
for image processing and sound processing in Core Image and
Core Audio [2]. Monalisa Application, Monalisa-Audio Unit, and
Monalisa-Image Unit use these APIs as the basis of their image
and sound processing. Monalisa "shadow of the sound" is an
installation. It consists of custom versions of Monalisa Application
together with a projector, a switch, a camera, a speaker, and a
microphone situated in a room. In this installation, people hear the
sound of their image and see the sound.

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. RELATED WORKS
Monalisa treats image and sound as sequences of numbers
represented by binary codes. SonART [18], a framework for image
and data sonification, also allows data from images to be used for
sound synthesis or audio signal processing and vice versa. It is
designed as a standalone application and can communicate with
other image and sound applications over a network. SoundHack is
a sound-file processing program [5]. Although it limits its target to
sound processing, its "open any" function allows image data to be
read as sound data in the same way as Monalisa Application.
Because they use the Core Image API, Monalisa Application and
Monalisa-Audio Unit employ the GPU (Graphics Processing Unit)
for their sound processing. Several works have been done in this
area. Gallo and Tsingos investigate the use of the GPU for 3D
audio geometric calculation [6]. Whalen examines using pixel
shaders to execute audio algorithms [14]. Zhang et al. implemented
modal synthesis using the parallelism and programmability of the
graphics pipeline [19]. Just as the Monalisa software indicates an
alternative use of existing effects for image and sound processing,
these instances indicate an alternative use of existing hardware for
sound processing.

3. ALGORITHM
Monalisa treats image and sound as sequences of numbers
represented by binary codes. The target image is bitmap data that
consists of pixels, and the target sound is linear monaural PCM
(Pulse Code Modulation) data that consists of samples. In order to
treat image data and sound data transparently, we have developed
the following two algorithms, 8-bit and 24-bit, which sequentially
transform pixels of image data into samples of sound data and
vice versa.
3.1 8-bit
8-bit is an algorithm for Monalisa Application, Monalisa-Image
Unit and Monalisa "shadow of the sound". It treats each RGB
color value of a pixel of bitmap data as an 8-bit sample of linear
PCM data. Bitmap image data consists of pixels. Each pixel has
three color values, R (red), G (green) and B (blue), and each value
is defined within a certain range. We sequentially treat each color
value (in the order R, G, B) as a separate 8-bit value, from the
top-left pixel to the bottom-right pixel of the image data (see
Figure 1).
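The raster-order treatment of the 8-bit algorithm can be sketched in Python as below. The sketch also covers the reverse direction described in the following paragraphs, where every three consecutive samples become one pixel. Representing an image as rows of (R, G, B) tuples and padding an incomplete final row with zeros are assumptions of this sketch, not details from the paper; Monalisa itself works through Core Image and Core Audio.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def image_to_samples(rows: List[List[Pixel]]) -> bytes:
    """8-bit treatment of an image: read pixels row by row from the top-left
    to the bottom-right and emit each pixel's R, G, B values as three
    consecutive 8-bit samples."""
    out = bytearray()
    for row in rows:
        for r, g, b in row:
            out.extend((r, g, b))
    return bytes(out)


def samples_to_image(samples: bytes, width: int) -> List[List[Pixel]]:
    """Reverse direction: every three consecutive 8-bit samples become the
    (R, G, B) values of one pixel, filling rows of `width` pixels.
    Padding an incomplete last row with zeros is an assumption of this sketch."""
    per_row = 3 * width
    padded_len = -(-len(samples) // per_row) * per_row  # round up to whole rows
    data = samples + bytes(padded_len - len(samples))
    rows: List[List[Pixel]] = []
    for start in range(0, padded_len, per_row):
        chunk = data[start:start + per_row]
        rows.append([tuple(chunk[i:i + 3]) for i in range(0, per_row, 3)])
    return rows
```

A 2x2 image therefore yields 12 samples, and converting those samples back with a width of 2 restores the original rows.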
Figure 1. Treatment of image data
Linear PCM sound data consists of samples. Each sample is placed
in a line at uniform intervals (i.e. the sampling rate) and is defined
within a certain range. We sequentially treat each sample as a
separate 8-bit value from the start to the end of the sound data (see
Figure 2).
Figure 2. Treatment of sound data
In this algorithm, we exchange the 8-bit color value for an 8-bit
sample and vice versa. Each color value of a pixel is separately
treated as a sample of sound data, and each sequence of three
samples is treated as the three color values (R, G, B) of a pixel
(see Figure 3).
Figure 3. 8-bit algorithm
3.2 24-bit
24-bit is an algorithm for Monalisa-Audio Unit. It treats the three
color values of a pixel of bitmap data as a 24-bit sample of linear
PCM data. We sequentially treat the color values (in the order R,
G, B) of a pixel as a combined 24-bit value, from the top-left pixel
to the bottom-right pixel of the image data. For the sound data, we
sequentially treat each sample as a separate 24-bit value from the
start to the end of the sound data. In this algorithm, we exchange
the 24-bit color value for a 24-bit sample and vice versa: the three
color values of a pixel are treated as a composite sample of sound
data, and a sample of sound data is treated as three decomposed
color values of a pixel (see Figure 4).
Figure 4. 24-bit algorithm

4. Monalisa: "see the sound, hear the image"
4.1 Monalisa Application
Monalisa Application is standalone software that transparently
treats image data and sound data. Monalisa Application uses the
Core Image and Core Audio APIs as the basis of its image and
sound processing. The application enables people to open any type
of image data and sound data that the APIs support. The data is
treated as both image and sound. Based on the 8-bit algorithm,
image data can be played as sound and sound data can be shown
as an image. These APIs also provide access to the system-level
plug-in architectures Image Unit and Audio Unit. Using these
plug-ins, Monalisa Application lets people add image and sound
effects to the data. We briefly describe two examples of effects
applied to the data: invert the sound, and delay the image.
Invert the sound: Invert is an image effect that inverts each color
value of pixels. In Monalisa Application, each sample of the sound
data is treated as a separate 8-bit color value. By adding invert to
the sound, each sample of the sound data is inverted within a given
range (8-bit), which forms a phase-reversed version of the original
sound (see Figure 5).
Figure 5. Invert the sound
Delay the image: Delay is a sound effect that delays each sample
of the sound data by a given interval and adds it to the original
data. In Monalisa Application, each color value of pixels is treated
as a separate 8-bit sample. By adding delay to the image, each
color value of pixels is delayed by a given interval and added to
the original image (see Figure 6).
Figure 6. Delay the image
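The 24-bit exchange and the two 8-bit effect examples can be sketched in Python as follows. This sketch makes three assumptions that the paper does not state. R is taken as the most significant byte of the 24-bit value. Samples are treated as unsigned integers, although 24-bit PCM is usually signed in practice. The delay effect uses a single feedforward tap with a hypothetical `gain` and clipping to 0..255. With unsigned 8-bit values, `255 - v` mirrors each value about 127.5, which is why inverting the "image" of a sound yields a phase-reversed sound.

```python
from typing import Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def pixel_to_sample24(p: Pixel) -> int:
    """24-bit algorithm: combine R, G, B into one unsigned 24-bit value.
    Assumption: R is the most significant byte."""
    r, g, b = p
    return (r << 16) | (g << 8) | b


def sample24_to_pixel(s: int) -> Pixel:
    """Decompose an unsigned 24-bit sample back into (R, G, B)."""
    return ((s >> 16) & 0xFF, (s >> 8) & 0xFF, s & 0xFF)


def invert_8bit(values: bytes) -> bytes:
    """Image-style 'invert' applied to 8-bit samples: v -> 255 - v.
    For unsigned 8-bit PCM this mirrors the waveform about 127.5,
    i.e. reverses its polarity."""
    return bytes(255 - v for v in values)


def delay_8bit(values: bytes, interval: int, gain: float = 0.5) -> bytes:
    """Sound-style delay applied to 8-bit color values:
    y[n] = x[n] + gain * x[n - interval], rounded and clipped to 0..255.
    The single feedforward tap, `gain` and clipping are assumptions."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    out = bytearray()
    for n, v in enumerate(values):
        delayed = values[n - interval] if n >= interval else 0
        out.append(min(255, max(0, round(v + gain * delayed))))
    return bytes(out)
```

Applied to the flattened color values of an image, `delay_8bit` produces the echoed duplicates described for "delay the image". Applied to 8-bit sound samples, `invert_8bit` gives the phase-reversed result described for "invert the sound".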
4.2 Monalisa-Audio Unit
Monalisa-Audio Unit is plug-in software for sound processing
applications. Currently it works in Audio Unit host applications on
Mac OS X (e.g. Apple GarageBand, Apple Logic Pro). It enables
people to add several kinds of image effects to sound data in real
time by wrapping existing Image Unit plug-ins as Audio Unit
plug-ins. In Audio Unit host applications, the plug-in behaves as a
single Audio Unit plug-in. With this plug-in, the sound data is split
along the timeline into separate bitmap images of a given buffer
size. Based on the 24-bit algorithm, each sample of the sound data
is treated as a pixel of 24-bit image data. Image effects (i.e. Image
Unit plug-ins) can be added to the sound data through
Monalisa-Audio Unit; several kinds of Image Unit plug-ins are
pre-installed on Mac OS X. Each pixel of the processed image is
re-treated as a sample, forming new sound data in real time.
4.3 Monalisa-Image Unit
Monalisa-Image Unit is plug-in software for image processing
applications. Currently it works in Image Unit host applications on
Mac OS X (e.g. Pixelmator, Apple Motion). It enables people to
add existing sound effects to image data within several
applications by wrapping existing Audio Unit plug-ins as Image
Unit plug-ins. It works not only for static image data but also for
motion graphics data, by treating the latter as a collection of static
images. The plug-in works with the standalone software
Monalisa-Image Unit Generator, which separately wraps each
existing Audio Unit plug-in as an Image Unit plug-in. In Image
Unit host applications, each wrapped Audio Unit plug-in behaves
as a single Image Unit plug-in. Based on the 8-bit algorithm, each
color value of a pixel is treated as an 8-bit sample of sound data.
Audio effects (i.e. Audio Unit plug-ins) can be added to the image
data as standard Image Unit plug-ins; several kinds of Audio Unit
plug-ins are pre-installed on Mac OS X. Each sample of the
processed sound is re-treated as a color value of a pixel, forming
new image data in real time.
4.4 Monalisa "shadow of the sound"
Monalisa "shadow of the sound" is an installation that represents
the essence of Monalisa Application. It premiered at Open Space,
NTT InterCommunication Center, where it ran from 9th June 2006
to 11th March 2007. A total of about 10,000 people participated in
the work during the exhibition period. It consists of custom
versions of Monalisa Application together with a projector, a
camera, a microphone, a switch, and a speaker situated in a room
(see Figure 7).
Figure 7. Setting of Monalisa "shadow of the sound".
When entering the room, each participant saw his/her image
projected on the screen. When he/she pushed the switch, the
image was captured as static bitmap image data and the room light
was gradually dimmed to darkness. The image data was
transferred to the application and automatically played as a stream
of sound through the speaker. Then, the microphone captured the
stream of sound and transferred it to the application. The
application re-projected the incoming sound data as an image on
the screen, from the top left to the bottom right of the screen. In
this installation, the re-projected image reflects the reverberation
of the room as duplications of the image (see Figure 8).
Figure 8. Original image and re-projected image.
We developed two custom versions of Monalisa Application: one
for image capture and sound production, and another for sound
capture and image production. The two applications were installed
on different PCs. The first (PC1) was connected to the camera, the
projector, and the speaker; it captured the image and played the
image as sound. The other (PC2) was connected to the microphone
and the speaker; it captured the sound and projected the sound as
an image. To control the sequence of events in the installation, we
used MaxMSP [4] on a third PC (PC3), which was connected to
the switch, the light, and a video switcher. When PC3 received a
signal from the switch, it sent a capture message to PC1 via Open
Sound Control (OSC) [11]. The light was controlled by a DMX
controller via MIDI. When the room became dark, PC3 sent a play
message to PC1 and a capture message to PC2, and switched the
video switcher from PC1 to PC2. After PC2 had projected the
whole captured sound, PC3 raised the light to its normal level (see
Figure 9).
Figure 9. System of Monalisa "shadow of the sound".
5. DISCUSSIONS
In Monalisa, image data become sound data and vice versa. As
Whitelaw quotes from an email by Christopher Sorg, "all data
inside the computer is essentially the same, ... either with ears or
eyes, or whatever senses we care to translate the switching of 1s
and 0s into..." [15]. While sonification and visualization
emphasize the use of sound/image to help a user monitor and
comprehend whatever it is that the sound/image output represents
[8], our software platform gives people new modes of
manipulation that employ sound data and image data as
materials to produce their own creative works. Sometimes such
unintended use may result in horrible noise, while at other times it
can produce wondrous tapestries of sound [3]. The software
platform also enables people to use existing image plug-ins in
existing sound processing applications and vice versa. This
experience suggests how old media objects can be accessed in new
ways congruent with the information interfaces we use in our
everyday life [10].

We have developed two algorithms, 8-bit and 24-bit. While 8-bit
provides a one-to-one relationship between each color value and a
sound sample, 24-bit provides a one-to-one relationship between
each pixel and a sample. Therefore, adding effects to the data gives
different results under each algorithm. For instance, with 8-bit, the
resulting sound is low-resolution, like an old hip-hop sample. In
contrast, 24-bit retains the sound resolution that most existing
sound processing applications provide. We consider their distinct
characters and the variety of expression they allow to be a
non-trivial feature. Therefore, we plan to offer a choice between
the two algorithms in a future release.

Due to a technical limitation, the buffer size of Monalisa-Audio
Unit is currently fixed at 4096 samples. This limits the range of
transformation and prevents the plug-in from treating the whole
sound data unless it consists of 4096 samples. Therefore, we also
plan to add a mechanism for adjusting the buffer size in a future
release.
While exhibiting the installation Monalisa "shadow of the sound",
we made several observations. Due to limited space, we briefly
introduce the following two.

Figure changes the sound: The resulting sound was produced from
the image of the participant. Therefore, the participant's figure
affects the character of the sound; for example, a white T-shirt
produced higher frequencies, and the borders of striped shirts
made a kind of rhythmic sound.

Sound / Image equipment affects image / sound: We tested several
video cameras, microphones, speakers and lights. When we
changed this equipment, the quality of the sound equipment
affected the image and vice versa. For instance, the sensitivity of
the video camera affects the spectrum of the sound, and the
directivity of the microphone affects the clarity of the image.
These characteristics of Monalisa show potential for alternative
tools to check the quality of sound and image equipment.
The possibility of computational processes as a material for
artistic creation [9] has not yet been fully investigated. We are
interested in exploring alternative ways of image and sound
processing. We anticipate that our initial explorations of the
software platform will stimulate new ideas for instruments for
image and sound production.

6. ACKNOWLEDGEMENTS
This work was developed with the support of the FY2005 IPA
Exploratory Software Project (Project Manager: KITANO
Hiroaki) provided by the Information-Technology Promotion
Agency (IPA), and of NTT InterCommunication Center. We would
also like to thank Nao Tokui and Kazuo Ohno for their valuable
comments and Karl D.D. Wills for his excellent graphic design.

7. REFERENCES
[1] Anderson, L. THE RECORD OF THE TIME: Sound in the
Work of Laurie Anderson, NTT Publishing Co., Ltd., 2005.
[2] Apple, Core Audio, Core Image, http://www.apple.com
[3] Cascone, K. The Aesthetics of Failure: "Post-Digital"
Tendencies in Contemporary Computer Music. Computer
Music Journal, Vol. 24, No. 4, December 2000, pp. 12-18.
[4] CYCLING74. MaxMSP,
http://www.cycling74.com/products/maxmsp
[5] Erbe, T. SoundHack: A Brief Overview, Computer Music
Journal, Vol. 21, No. 1, Spring 1997, pp. 35-38.
[6] Gallo, E. and Tsingos, N. Efficient 3D Audio Processing
with the GPU. In GP2, ACM Workshop on General Purpose
Computing on Graphics Processors, 2004.
[7] Kandinsky, W. Concerning the Spiritual in Art, New York:
Dover Publications Inc, 1977.
[8] Kramer, G. (ed.) Auditory Display - Sonification,
Audification, and Auditory Interfaces. Addison-Wesley,
1994.
[9] Maeda, J. Design by Numbers, Cambridge, MA: MIT Press,
1999.
[10] Manovich, L. The anti-sublime ideal in data art.
http://www.manovich.net/DOCS/data_art.doc, 2002.
[11] OSC, Open Sound Control,
http://cnmat.cnmat.berkeley.edu/OSC/
[12] Sengmuller, G. VinylVideo TM, Leonardo, Volume 35,
Number 5, October 2002, pp. 504-504.
[13] Wenger, E. MetaSynth, http://metasynth.com
[14] Whalen, S. Audio and the Graphics Processing Unit.
http://www.node99.org/projects/gpuaudio/gpuaudio.pdf,
2005.
[15] Whitelaw, M. Hearing Pure Data: Aesthetics and Ideals of
Data-Sound, in Arie Altena (ed.) Unsorted: Thoughts on the
Information Arts: An A to Z for Sonic Acts X, Amsterdam:
Sonic Acts/De Balie, 2004.
[16] Whitney, J. Digital Harmony: On the Complementarity of
Music and Visual Art, McGraw-Hill, Inc., New York, NY,
1981.
[17] Xenakis, I. Formalized Music, rev. ed. Stuyvesant, New
York: Pendragon Press, 1992.
[18] Yeo, W. S., Berger, J., and Lee, Z. SonART: A framework
for data sonification, visualization and networked multimedia
applications, in Proceedings of the International Computer
Music Conference, Miami, FL, USA, November 2004.
[19] Zhang, Q., Ye, L., and Pan, Z. Physically-Based Sound
Synthesis on GPUs, Entertainment Computing - ICEC 2005,
Springer Berlin / Heidelberg, Volume 3711/2005, pp. 328-333.

8. APPENDIX
Monalisa Application is under development for the next release.
Monalisa-Audio Unit and Monalisa-Image Unit can be downloaded
from the following URL.
http://nagano.monalisa-au.org/?page_id=351
The Japanese techno musician Junichi Watanabe used
Monalisa-Audio Unit to produce his latest album "LITTLE
SQUEEZE PROPAGANDA" (ADDL-004, AsianDynasty, 2007).
A Turing Test for B-Keeper: Evaluating an Interactive Real-Time Beat-Tracker
Andrew Robertson, Centre for Digital Music, Department of Electronic Engineering, Queen Mary, University of London, andrew.robertson@elec.qmul.ac.uk
Mark D. Plumbley, Centre for Digital Music, Department of Electronic Engineering, Queen Mary, University of London, mark.plumbley@elec.qmul.ac.uk
Nick Bryan-Kinns, Department of Computer Science, Queen Mary, University of London, nickbk@dcs.qmul.ac.uk
ABSTRACT
Off-line beat trackers are often compared to human tappers who
provide a ground truth against which they can be judged. In order
to evaluate a real-time beat tracker, we have taken the paradigm of
the ‘Turing Test’, in which an interrogator is asked to distinguish
between human and machine. A drummer plays in succession with
an interactive accompaniment that has one of three possible
tempo-controllers (the beat tracker, a human tapper and a
steady-tempo metronome). The test is double-blind, since the
researchers do not know which controller is currently functioning.
All participants are asked to rate the accompaniment and to judge
which controller they believe was responsible. This method of
evaluation enables the controllers to be contrasted in a more
quantifiable way than the subjective testimony we have used in the
past to evaluate the system. The results of the experiment suggest
that the beat tracker and a human tapper are both distinguishable
from a steady-tempo accompaniment, and that both are preferred
according to the ratings given by the participants. Also, the beat
tracker and a human tapper are not sufficiently distinguishable by
any of the participants in the experiment, which suggests that the
system is comparable in performance to a human tapper.

Keywords
Automatic Accompaniment, Beat Tracking, Human-Computer
Interaction, Musical Interface Evaluation

NIME08, Genoa, Italy. Copyright 2008. Copyright remains with the author(s).

1. INTRODUCTION
Our research concerns the task of real-time beat tracking with a
live drummer. In a paper at last year's NIME Conference [6], we
introduced a software program, B-Keeper, and described the
algorithm used. However, the evaluation of the algorithm was
mainly qualitative, relying on testimony from drummers who had
tried using the software in performances and rehearsal.
In trying to find a scientific method for testing the program, we
could not use previously established beat tracking tests, such as
the MIREX Competition [4], since these did not involve the
necessary component of interaction and our beat tracker was
highly specialised for performance with input from drums. In
MIREX, the beat trackers are compared to data collected from
forty human tappers who collectively provide a ground truth
annotation [5].
In order to test the real-time beat tracker, we wanted to make a
comparison with a human tapper and to do so within a live
performance environment, yet in a way that would be both
scientifically valid and also provide quantitative as well as
qualitative data for analysis.
In Alan Turing's 1950 paper, ‘Computing Machinery and
Intelligence’ [9], he proposes replacing the question ‘can a
computer think?’ by an Imitation Game, popularly known as the
“Turing Test”, in which the computer is required to imitate a
human being¹ in an interrogation. If the computer is able to fool a
human interrogator a substantial amount of the time, then the
computer can be credited with ‘intelligence’. Turing considered
many objections to this philosophical position within the original
paper, and there has been considerable debate as to its legitimacy,
particularly the position referred to as ‘Strong A.I.’. Famously,
John Searle [7] put forward the Chinese room argument, which
proposes a situation in which a computer might be able to pass the
test without ever understanding what it is doing.
¹ As Turing formulates the problem, the computer imitates a man
pretending to be a woman, so as to negate the element of bias due
to the imitation process from the test.
The Imitation Game might prove to be an interesting model for
constructing an experiment to evaluate an interactive musical
system. Whilst we do not wish to claim the system possesses
‘intelligence’, its ability to behave as if it had some form of
‘musical intelligence’ is vital to its ability to function as an
interactive beat tracker.
B-Keeper controls the tempo by processing onsets detected by a
microphone placed in the kick drum, with additional tempo
information from a microphone on the snare drum. The beat
tracker is event-based and uses a method related to the oscillator
models used by Large [3] and Toiviainen [8]. Rather than
processing a continuous audio signal, it processes events from an
onset detector and modifies its tempo output accordingly.
B-Keeper interprets the onsets with respect to bar position using an
internal weighting mechanism and uses Gaussian windows around
the expected beat locations to quantify the accuracy and relevance
of each onset for beat tracking. A tempo tracking process to
determine the best inter-onset interval operates in parallel with a
synchronisation process which makes extra adjustments to remain
in phase with the drums. The parameters defining
‘not like a metronome’ and hence, distinguishable from the
Steady Tempo trials. These expectations will form the basis of
our hypotheses that are to be tested, and we collected
quantitative and qualitative data in order to do so.
Figure 1: AR taps on the keyboard in time with

drummer, Joe Caddy, during one of the tests
the algorithm’s behaviour automatically adapt to suit the

playing style of the drummer.
B-Keeper is programmed as Java external within the
Max/MSP environment. More details are given in our pa-
per, ‘B-Keeper: a real time beat tracker for live perfor-
mance’ [6], published at NIME2007.
2. EXPERIMENTAL DESIGN
The computer’s role in controlling the tempo of an ac- Figure 2: Design set-up for the experiment. Three
companiment might also be undertaken by a human con- possibilities: (a) Computer controls tempo from
troller. This, therefore, suggests that we can compare the drum input; (b) Steady Tempo; (c) Human controls
two within the context of a “Turing Test” or Imitation tempo by tapping beat on keyboard
Game. We also extend the test by including a control -
a steady accompaniment which remains at a fixed tempo After each trial, we asked each drummer to mark an ‘X’
dictated by the drummer. For each test, the drummer gives on an equilateral triangle which would indicate the strength
four steady beats of the kick drum to start and this tempo of their belief as to which of the three systems was respon-
is used as the fixed tempo. sible. The three corners corresponded to the three choices
The test involves a drummer playing along to the same and the nearer to a particular corner they placed the ‘X’, the
accompaniment track three times. Each time, a human tap- stronger their belief that that was the tempo-controller for
per (AR) taps the tempo on the keyboard, keeping time that particular trial. Hence, if an ‘X’ was placed on a cor-
with the drummer, but only one of the three times will this ner, it would indicate certainty that that was the scenario
be altering the tempo of the accompaniment. For these tri- responsible. An ‘X’ on an edge would indicate confusion
als, controlled by the human tapper, we applied a Gaussian between the two nearest corners, whilst an ‘X’ in the mid-
window to the intervals between taps in order to smooth dle indicates confusion between all three. This allowed us
the tempo fluctuation, so that it would still be musical in to quantify an opinion measure for identification over all
character. Of the other two, one will be an accompaniment the trials. The human tapper (AR) and an independent
controlled by the B-Keeper system and the other the same observer also marked their interpretation of the trial in the
accompaniment but at a fixed tempo (see Figure 2). The same manner.
sequence in which these three trials happen is randomly In addition, each participant marked the trial on a scale
chosen by the computer and only revealed to the partic- of one to ten as an indication of how well they believed
ipants after the test so that the experiment accords with that test worked as ‘an interactive system’. They were also
the principle of being ‘double-blind’: i.e. neither the re- asked to make comments and give reasons for their choice.
searchers nor the drummer know which accompaniment is A sample sheet from one of the drummers is shown in Figure
which. Hence, the quantitative results gained by asking for 3.
opinion measures and performance ratings should be free We carried out the experiment with eleven professional
from any bias. and semi-professional drummers. All tests took place at
We are interested in the interaction between the drum- the Listening Room of the Centre for Digital Music, Queen
mer and the acommpaniment which takes place through the Mary, University of London, which is an acoustically iso-
machine. In particular, we wish to know how this differs lated studio space. Each drummer took the test (consisting
from the interaction that might take place with a person, of the three randomly-selected trials) twice, playing to two
or in this case, a human beat tracker. We might expect different accompaniments. The first was based on a dance-
that, if our beat tracker is functioning well, the B-Keeper rock piece first performed at Live Algorithms for Music Con-
trials would be ‘better’ or ‘reasonably like’ those controlled ference, 2006, which can be viewed on the internet [1]. The
by the human tapper. We would also expect them to be second piece was a simple chord progression on a software
320
version of a Fender Rhodes keyboard with some additional percussive sounds. The sequencer used was Ableton Live [2], chosen for its time-stretching capabilities.
We recorded all performances on video and audio and stored data from the B-Keeper algorithm. This allowed us to see how the algorithm processed the data, to look in detail at how it behaved, and to monitor how the system changed the tempo of the accompaniment.

Figure 3: Sample sheet filled in by drummer Adam Betts.

3. RESULTS
We shall contrast the results between all three tests, particularly with regard to establishing the difference between the B-Keeper trials and the Human Tapper trials, and compare this to the difference between the Steady Tempo and Human Tapper trials. In Figure 4, we can see the opinion measures for all drummers placed together on a single triangle. The corners represent the three possible scenarios, B-Keeper, Human Tapper and Steady Tempo, with their respective symbols. Each ‘X’ has been replaced with a symbol corresponding to the actual scenario in that trial. In the diagram we can clearly observe two things:
First, the Steady Tempo trials are more visually separated than the other two. With the exception of a relatively small number of outliers, many of the Steady Tempo trials were correctly placed near the appropriate corner. Hence, if a trial is actually steady, it will probably be identified as such.
Second, the B-Keeper and Human Tapper trials tend to be spread over an area centered around the edge between their respective corners. At best, approximately half of these trials have been correctly identified. The distribution does not show the kind of separation seen for the Steady Tempo trials, suggesting that the drummers had difficulty telling the two controllers apart, although they could tell that the tempo had varied.

Figure 4: Results where the eleven different drummers judged the three different accompaniments (B-Keeper, Human Tapper and Steady Tempo) in the test. The symbol used indicates which accompaniment it actually was (see corners).

The deduction process used by participants generally worked by first trying to determine whether the tempo had been steady or not. In the majority of cases, this was successful, but some misidentifications were made, particularly if the drummer had played to the accompaniment without making much attempt to influence the tempo. In these cases, the distinction between an interactive accompaniment, which will adapt to you, and one at a fixed tempo is harder to judge.
The second deduction, in the case where the tempo varied or the music appeared responsive, was to discern whether the controller had been B-Keeper or the Human Tapper. In order to do so, there needs to be some assumption as to the characteristics that might be expected of each. From interviews, we recognised that drummers expect the human to be more adaptable to changes in rhythm such as syncopation, and they may also have felt that a human would respond better to changes within their playing. For instance, as drummer Tom Oldfield commented: “I felt that was the human, because it responded very quickly to me changing tempo.”

3.1 Case Study: Joe Caddy
One dialogue exchange shows this kind of logical debate in action.²

² JC refers to Joe Caddy, session drummer and drummer for hip-hop band Captive State; AR refers to the first author, who acted as the Human Tapper in all experiments.

JC: [talking about the trials]: “The first one I gave 8 and I put actually closer to human response. I played pretty simply and it followed it quite nicely. The second one had no response at all to tempo on the drums. The last one I gave 9 - great response to tempo change, I slowed it up, I slowed it down. It took a couple of beats to resolve, but I
think I put it nearer the B-Keeper.”
AR: “Is that because you have some experience of the system?”
JC: “If it was human, I would have expected it to catch up more quickly. I think because it took two or three beats to come in at the new tempo, it was the B-Keeper.”
AR: “Same. I think it’s an 80 per cent chance that that was B-Keeper.”
[Result is revealed: The first was B-Keeper; the last the Human Tapper, i.e. controlled by AR: the opposite of what both JC and AR had identified.]
AR: “I just didn’t think it was that though. I guess it must have been.”
JC: “The last test we did, I changed the tempo much more. Do they surprise you those results?”
AR: “The first I felt was me and I felt that the last wasn’t me.”
This exchange demonstrates how both a drummer and even the person controlling the tempo can be fooled by the test. From the point of view of the key tapper, AR suggests that there is a musical illusion in which, by tapping along to the drummer playing, it can appear to be having an effect when in fact there is none. The illusion is strongest when the B-Keeper system is in operation, as the music then responds to changes in tempo. This effect is reflected in the opinion measures reported by AR, which we initially expected to be higher for the Human Tapper trials than for the others, but which had a mean of only 45% (see Table 1).

Table 1: Mean identification measure results for all judges involved in the experiment. Entries where the judged controller matches the actual accompaniment correspond to correct identification.
                                  Judged as:
Judge          Accompaniment   B-Keeper   Human   Steady
Drummer        B-Keeper        44%        37%     18%
               Human           38%        44%     17%
               Steady          12%        23%     64%
Human Tapper   B-Keeper        59%        31%     13%
               Human           36%        45%     23%
               Steady          15%        17%     68%
Observer       B-Keeper        55%        39%      6%
               Human           33%        42%     24%
               Steady          17%        11%     73%

3.2 Case Study: Adam Betts
The above study shows a scenario in which B-Keeper fooled the drummer into guessing it was a human-controlled accompaniment. In one trial with James Taylor Quartet drummer Adam Betts, the machine had been calibrated (to its usual setting) so as to be fairly responsive to tempo changes. However, when he played a succession of highly syncopated beats, the algorithm responded by making the synchronisation window so wide that the machine was thrown out of sync. In Figure 5, this can be seen happening after about fifty seconds, where the pattern has changed so that the onsets are no longer used by the tracker to synchronise (dotted errors in the second graph). When it eventually does synchronise, at sixty to seventy seconds, an erroneous adjustment easily occurs due to the size of the window and the low threshold.

Figure 5: Data from B-Keeper’s interaction with drummer Adam Betts. The top graph shows the tempo variation. The second graph shows the errors recorded by B-Keeper between the expected and observed beats. The final two graphs show how the synchronisation threshold and window automatically adapt, becoming more generous when onsets fail to occur in expected locations.

In this case, it was immediately apparent that it was B-Keeper, since the tempo had varied and had done so in a non-human manner. It had made an apparent mistake, and all three involved in the experiment (the drummer, the human tapper and our independent observer) immediately concluded that this was B-Keeper. On the trial sheet, Adam commented:
“Scary. Okay at beginning, but got confused and guessed tempo incorrectly with 16ths etc. When it worked, it felt good.”
Such an event happened only once in the twenty-two tests³, but it is interesting since it suggests that the form of the experiment is viable for similar reasons to those suggested by Turing. In the scenario of the imitation game, if the machine did exhibit abnormal behaviour (for instance, as he suggests, the ability to perform very quick arithmetical calculations) or, as implied throughout Turing’s paper, the inability to answer straightforward questions such as the length of one’s hair, then one could easily deduce it was the machine. In this case, the absence of human tolerance of extreme syncopation is the kind of ‘machine-like’ characteristic that made it easily identifiable.

³ This was due to incorrect parameter settings for the drumming style in question.

3.3 Analysis and Interpretation
The mean scores recorded by the drummers are given at the top of Table 1. They show similar measures for correctly identifying the B-Keeper and Human Tapper trials, both of which have mean scores of 44%, with the confusion being predominantly over which of the two variable tempo controllers is operating. The Steady Tempo trials have a mean confidence score of 64% on the triangle.
Each participant in the experiment had a higher score for identifying the Steady Tempo trials than the other two. It appears that the Human Tapper trials are the least identifiable of the three and that the confusion tends to be between the B-Keeper and the Human Tapper.
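The paper does not spell out how an ‘X’ on the triangle is turned into the percentages of Table 1. A minimal sketch, assuming (our assumption, not the authors’ stated method) that the opinion measure is the barycentric coordinates of the mark with respect to the three corners, is given below in Python. Under this reading a mark on a corner gives 100% to that controller, a mark on an edge splits belief between the two adjacent corners, and a mark in the middle gives one third to each, which matches the interpretation of the triangle described in Section 2. The corner coordinates, function names and example marks are hypothetical.

```python
import math

# Hypothetical layout: an equilateral triangle with unit side, one corner per controller.
CORNERS = {
    "B-Keeper": (0.0, 0.0),
    "Human Tapper": (1.0, 0.0),
    "Steady Tempo": (0.5, math.sqrt(3) / 2),
}


def opinion_measure(x, y):
    """Barycentric weights of the mark (x, y) with respect to the three corners.

    The weights always sum to 1 and are all non-negative for marks inside the triangle.
    """
    (x1, y1), (x2, y2), (x3, y3) = CORNERS.values()
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    w3 = 1.0 - w1 - w2
    return dict(zip(CORNERS, (w1, w2, w3)))


def mean_identification(marks):
    """Average the per-trial weights over several marks (cf. the means in Table 1)."""
    totals = dict.fromkeys(CORNERS, 0.0)
    for x, y in marks:
        for name, weight in opinion_measure(x, y).items():
            totals[name] += weight
    return {name: total / len(marks) for name, total in totals.items()}


if __name__ == "__main__":
    # Illustrative marks only, not data from the experiment: one on the B-Keeper
    # corner, one halfway along the B-Keeper / Human Tapper edge.
    example = [(0.0, 0.0), (0.5, 0.0)]
    for name, share in mean_identification(example).items():
        print(f"{name}: {100 * share:.0f}%")
```

Grouping the marks by judge and by actual accompaniment and averaging each group would give a table with the same shape as Table 1. Marks placed slightly outside the triangle would yield negative weights; this sketch does not clamp them.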
For the B-Keeper trials themselves, the drummers were the least confident of the three judges in identifying it as the controller. The researchers, who acted as the independent observer and the tapper, were more confident. Analogously, we might expect the human tapper, the first author, to be able to distinguish the trials in which he controlled the tempo; however, this did not appear to be the case. He was more successful at discerning the other two trials.
We can polarise the decisions made by the drummers by taking their highest score to be their decision for that trial. In the case of a tie, we split the decision equally. The advantage of this method is that we can make pair-wise comparisons between any of the controllers, whilst also allowing the participants the flexibility to remain undecided between two possibilities. Table 2 shows the polarised decisions made by drummers over the trials. There is confusion between the B-Keeper and Human Tapper trials, whereas the Steady Tempo trials were identified over 70% of the time. The B-Keeper and Human Tapper trials were identified 43% and 45% of the time respectively, little better than chance.

Table 2: Polarised decisions made by the drummers for the different trials.
                    Judged as:
Controller       B-Keeper   Human   Steady
B-Keeper         9.5        8.5     4
Human Tapper     8          10      4
Steady Tempo     2          4       16

3.4 Comparative Tests
In order to test the distinguishability of one controller from the other, we can use a Chi-Square Test, calculated over all trials with either of the two controllers. If there is a difference in scores such that one controller is preferred to the other (above a suitable low threshold), then that controller is considered to be chosen for that trial. Where there was no clear preference, such as in the case of a tie or neither controller having a high score, we discard the trial for the purposes of the test.
Thus, for any two controllers, we can construct a table showing which decisions were correct. The table for comparisons between the Steady Tempo and the Human Tapper trials is shown in Table 3. We test the hypothesis that the distribution is the same for either controller, corresponding to the premise that the controllers are indistinguishable.

Table 3: Polarised decisions made by the drummers over the Steady Tempo and Human Tapper trials.
                    Judged as:
Controller       Human Tapper   Steady Tempo
Human Tapper     12             4
Steady Tempo     5              14

The Chi-Square Test statistic for this table is 8.24, which means that we reject the test hypothesis at the 5% significance level. This indicates a significant separation between the controllers. Partly this can be explained by the fact that drummers could vary the tempo with the Human Tapper controller, whereas the Steady Tempo trials had the characteristic of being metronomic.
Comparing the B-Keeper trials and the Human Tapper trials, we get the results shown in Table 4. The Chi-Square test statistic is 0.03, which is extremely low, suggesting no significant difference in the drummers’ identification of the controller for either trial. Whilst B-Keeper shares the characteristic of having variable tempo, and thus is not identifiable simply by trying to detect a tempo change, we would expect that if there were a machine-like characteristic to B-Keeper’s response, such as an unnatural response or unreliability in following tempo fluctuation, syncopation and drum fills, then the drummer would be able to identify the machine. It appeared that, generally, there was no such characteristic, and drummers had difficulty deciding between the two controllers. It may appear that having the Human Tapper visible to them would give them an advantage; however, this did not prove to be the case, as the computer’s response was close enough to that of a human tapping along that the observer and the human tapper were often also unsure of the controller.

Table 4: Decisions made by the drummers over the B-Keeper and Human Tapper trials.
                    Judged as:
Controller       Human Tapper   B-Keeper
Human Tapper     9              8
B-Keeper         8              8

Figure 6: Bar graph indicating the cumulative frequency of ratings for the three scenarios: B-Keeper (black), Human Tapper (grey) and Steady Tempo (white).

The difficulty of distinguishing between controllers was a common feature of many tests, and whilst the test had been designed expecting that this might be the case, the results were often surprising when revealed. In addition, we did not expect drummers to believe that steady accompaniments had sped up or slowed down with them, nor the human tapper to believe that he had controlled the tempo when he had not. This indicates that the perception of time is subjective. It seems that some drummers had an enhanced ability to spot a fixed tempo without even varying their own tempo much, perhaps gained through extensive experience. Matt Ingram, session drummer, who professed to have been “playing to click for the last ten days, all day every day”, remarked of the Steady Tempo trial: “It felt like playing to a metronome, cause it was just there. Either that or your time’s great, cause I was trying to push it and it wasn’t letting me.”

3.5 Ratings
In addition to the identification of the controller for each trial, we also asked each participant to rate each trial with respect to how well it had worked as an interactive
accompaniment to the drums. Our reasoning in obtaining ratings for the accompaniments is that, in addition to trying to establish whether the beat tracker is distinguishable from the human tapper controller, it is also desirable to compare the controllers through a rating system. Partly we are interested in how the drums and accompaniment sounded together, but we are also interested in how the accompaniment responded to the drums.
The cumulative frequency of these ratings over all participants (drummers, human tapper and independent observer) is shown in Figure 6. The Steady Tempo accompaniment was consistently rated worse than the other two. The median values for each accompaniment are shown in Table 5. The B-Keeper system has consistently been rated higher than both the Steady Tempo and the Human Tapper accompaniments.

Table 5: Median ratings given by all participants for the different scenarios. The combined total median is given in the last row.
Judge          B-Keeper   Human Tapper   Steady Tempo
Drummer        7.5        5.5            5
Human Tapper   8          6.5            4
Observer       8          7              5
Combined       8          6              5

The overall median ratings, calculated over all participants, were: B-Keeper: 8, Human Tapper: 6 and Steady Tempo: 5. It is important not only that the beat tracker was not significantly distinguishable from the human tapper, but also that it performed as well when judged by both the drummer and an independent observer. The fact that the median rating is towards the top end of the scale suggests that, musically, the beat tracker is performing its task well. As the experiment was double-blind, there was no bias in how the different controllers were rated.
If we look at pair-wise rankings, we can assess the significance of this difference between ratings. Firstly, we convert the rating out of ten into a strict ordinal rating (allowing equality where necessary). The Wilcoxon signed-rank test is a non-parametric statistical test that can be used to test the hypothesis that the controllers’ rankings have the same distribution. For more than twenty trials, the distribution of this test statistic is approximately normal. When contrasting the rankings given by drummers to B-Keeper with those given to the Steady Tempo and Human Tapper trials, the approximate Z ratios⁴ are 2.97 and 2.32 respectively. Thus, we would reject the hypothesis that the controllers are equally preferable at the 5% significance level in both cases. The fact that the ratings are significantly higher for B-Keeper is highly important, as the primary aim is to create a musically successful beat tracker for live drums.

⁴ Normal with zero mean and unit variance.

4. CONCLUSIONS AND FUTURE WORK
In this experiment, we contrasted a computer-based beat tracker with a human tapper and a metronome for the purposes of providing interactive accompaniment to drums. The Turing Test has proved an interesting scenario for a scientific evaluation of the beat tracker. By contrasting it with a human tapper in an experiment analogous to that described by Turing for language imitation, we were able to assess its performance against human abilities, which are the standard against which beat trackers are best judged. This provides a more informative comparison for evaluation than subjective interviews.
The beat tracker has proved to be comparable in performance to the human tapper and is not distinguishable from it in any statistically significant way. The Steady Tempo accompaniment was perceived as a less successful accompanist and was considerably more distinguishable from the variable tempo accompaniments. In addition, the resulting accompaniment was judged as being aesthetically comparable with that resulting from using a human tapper.
We are currently working on incorporating the beat tracker into a live rock music band. By interfacing with Ableton Live, the beat tracker provides a framework for the triggering of loops, samples and electronic parts within a rock performance without recourse to click tracks or other compromises. We aim to evaluate its efficiency by case studies with users of the system. We are also concentrating on improving the ability of the system to correctly interpret extended syncopation and expressive timing within drum patterns in its analysis of onsets.

5. ACKNOWLEDGMENTS
The authors would like to thank Adam Stark, Enrique Perez Gonzales and Robert Macrae for acting as independent observers during the tests. We would also like to thank all drummers who kindly participated in the experiment: Joe Yoshida, Rod Webb, Joe Caddy, Matt Ingram, Jem Doulton, Greg Hadley, Adam Betts, Tom Oldfield, David Nock, Hugo Wilkinson and Mark Heaney.
AR is supported by a studentship from the EPSRC.

6. REFERENCES
[1] http://www.elec.qmul.ac.uk/digitalmusic/b-keeper.
[2] http://www.ableton.com, viewed on 4th April 2008.
[3] E. W. Large. Beat Tracking with a Nonlinear Oscillator. In Working Notes of the IJCAI-95 Workshop on Artificial Intelligence and Music, Montreal, pages 24–31, 1995.
[4] M. F. McKinney. Audio beat tracking contest description, 2006. http://www.music-ir.org/mirex2006/index.php/Audio Beat Tracking, viewed on 4th April 2008.
[5] M. F. McKinney, D. Moelants, M. E. P. Davies, and A. Klapuri. Evaluation of audio beat tracking and music tempo extraction algorithms. Journal of New Music Research, 36(1):1–16, 2007.
[6] A. Robertson and M. Plumbley. B-Keeper: A beat-tracker for live performance. In Proc. International Conference on New Interfaces for Musical Expression (NIME), New York, USA, 2007.
[7] J. Searle. Minds, Brains and Programs. Behavioural and Brain Sciences, 3:417–457, 1980.
[8] P. Toiviainen. An interactive MIDI accompanist. Computer Music Journal, 22(4):63–75, 1998.
[9] A. Turing. Computing Machinery and Intelligence. Mind, 59:433–460, 1950.
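The Chi-Square statistics reported in Section 3.4 of the B-Keeper paper can be recomputed from the polarised counts in Tables 3 and 4. The short Python sketch below does this, assuming (the paper does not say so) that Pearson’s statistic was used without Yates’ continuity correction, with one degree of freedom for each 2×2 table. The names are illustrative; 3.841 is the standard upper 5% point of the chi-square distribution with one degree of freedom.

```python
# Observed counts copied from Tables 3 and 4 (rows: actual controller, columns: judged as).
TABLE_3 = [[12, 4], [5, 14]]  # Human Tapper vs Steady Tempo
TABLE_4 = [[9, 8], [8, 8]]    # Human Tapper vs B-Keeper

# Upper 5% point of the chi-square distribution with 1 degree of freedom.
CRITICAL_VALUE_5PC = 3.841


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that the controllers are indistinguishable.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    for name, table in (("Table 3", TABLE_3), ("Table 4", TABLE_4)):
        stat = chi_square_statistic(table)
        verdict = "reject" if stat > CRITICAL_VALUE_5PC else "do not reject"
        print(f"{name}: chi-square = {stat:.2f} -> {verdict} at the 5% level")
```

Under these assumptions the sketch reproduces the reported values of 8.24 and 0.03 to two decimal places. Applying Yates’ correction to Table 3 would instead give about 6.41, which suggests that the uncorrected statistic was used.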
Interaction with tonal pitch spaces

Gabriel Gatzsche
Fraunhofer IDMT Ilmenau
Ernst-Abbe-Zentrum
D-98693 Ilmenau, Germany
gze@idmt.fraunhofer.de

Markus Mehnert
Technische Universität Ilmenau
Institut für Medientechnik
D-98693 Ilmenau, Germany
markus.mehnert@tu-ilmenau.de

Christian Stöcklmeier
Fraunhofer IDMT Ilmenau
Ernst-Abbe-Zentrum
D-98693 Ilmenau, Germany
stoecn@idmt.fraunhofer.de
ABSTRACT
In this paper, we present a musical interface approach based on pitch spaces. A pitch space arranges tones in such a way that meaningful tone combinations can be easily generated. Using a touch-sensitive surface or a 3D joystick, a player can move through the pitch space and create the desired sound by selecting tones. The better the tones are arranged geometrically, the fewer control parameters are required to move through the space and to select the desired pitches. Hence, the quality of pitch-space-based musical interfaces depends on two factors: 1. how the tones are organized within the pitch space, and 2. how the parameters of a given controller are used to move through the space and to select pitches. This paper presents a musical interface based on a tonal pitch space derived from a four-dimensional model found by music psychologists [11], [2]. The proposed pitch space particularly eases the creation of tonal harmonic music. At the same time, it illustrates psychological and theoretical principles of music.

Keywords
Pitch space, musical interface, Carol L. Krumhansl, music psychology, music theory, western tonal music, 3D tonality model, spiral of thirds, 3D, Hardware controller, Symmetry model

1. INTRODUCTION
To make the process of music creation more intuitive, it is necessary to find interfaces that are easy to understand and intuitive to play. Such interfaces should provide many musical possibilities that can be accessed through a small and expressive parameter set. They should also reflect psychological, physical and theoretical principles of music. The goal of the musical interface presented here is to bring together the results of pitch space research on the one hand and the possibilities of recent developments in the area of controllers on the other. To this end, we first give an overview of the state of the art in pitch space research and then show how such pitch spaces can be combined with hardware controllers to form new musical interfaces.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that
requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. TONAL PITCH SPACE
The description of musical tonality with geometrical models has a long tradition. Early approaches include Heinichen’s (1728) and Kellner’s (1737) regional circles, the harmonic network proposed by Leonhard Euler (1739), and Weber’s (1767) regional chart [3]. Known today as the circle of fifths (Kellner’s regional circle), Riemann’s “Tonnetz” (Euler’s harmonic network) and Schönberg’s chart of key regions (Weber’s regional chart), these models are still of great interest. Since then, more advanced geometric tonality models have been developed. Roger Shepard [4] proposes several helix models, which primarily describe aspects of octave equivalence and of fifth and chroma proximity. Elaine Chew [1] proposes the so-called Spiral Array, whose core is a geometric arrangement of pitches on a spiral inspired by the harmonic network. The great breakthrough of Chew’s model is a unified description of the relationships between tones, chords and keys within one model, together with the observation of functional relationships that build a tonal center. Fred Lerdahl’s [3] “diatonic space” consists of “basic space”, “chordal space” and “regional space”. These spaces help to model different aspects of tonality. There are many other ideas for representing musical aspects geometrically, e.g. Dmitri Tymoczko’s “orbifold space” [5] or Aline Honingh’s idea to describe and visualize the principle of shapeliness of tonal pitch structures.

Figure 1. The spiral of thirds: a) The direct derivation from Krumhansl’s 4D MDS Space [6]; b) An adapted version as basis of the proposed musical interface approach

3. THE APPLIED PITCH SPACE
3.1 Derivation
The musical interface approach presented here uses a geometric pitch space which has been derived from Carol L. Krumhansl and E. J. Kessler’s four-dimensional multidimensional scaling solution [6]. The pitch space is part of a larger framework of pitch spaces called the symmetry model [12][13]. The space used here is a three-dimensional spiral whose ends close in the fourth dimension (Figure 1a). Cutting out one spiral winding results in a subspace which contains 8 tones that form a diatonic set (Figure 2a+b). The
start and the end of the spiral winding are formed by the same tone. If the pitch space is modified such that these two tones occupy the same XY location (see Figure 2c, Figure 1b), it is possible to represent many basic tonal relationships by simple geometric structures or simple geometric ratios (Table 1). This simplification in turn makes it much easier to navigate and to find desired tonal structures within the pitch space.

Figure 2. The extraction of one spiral winding results in a 2D subspace that represents a diatonic key. This 2D space is the basis of the proposed interface [6]

Representing one extracted and modified spiral winding in a 2D plane, as shown in Figure 2, results in a new geometric pitch subspace which particularly expresses key-related tonal structures such as functional relationships, aspects of tension and resolution, as well as relationships between tones, intervals and chords [6]. This expressivity led to the decision to develop a musical interface based on the adapted Krumhansl space.

Table 1. Frequently used tone combinations and their positions
Diatonic key                 One complete spiral winding
Major/minor chord            Three neighboring tones
Relative minor               Direct neighbor counter-clockwise of the major chord
Relative major               Direct neighbor clockwise of the minor chord
Parallel major               The chord located directly one spiral winding above a given minor chord
Parallel minor               The chord located directly one spiral winding below a given major chord
Diminished chord             The tones forming the start and the end of the selected spiral winding
Major/minor seventh chord    Four neighboring tones (G-b-D-F)
Subdominant chords           All tones to the left of the symmetry axis (e.g. d-F-a, F-a-C) [8]
Tonic chords                 Tones centered around the geometric center of the selected spiral winding (e.g. a-C-e, C-e-G) [8]
Dominant chords              All tones to the right of the symmetry axis (e.g. e-G-b, G-b-D) [8]

3.2 Geometric versus cognitive center
It is important to distinguish between the geometric center and the

keys (major, minor, …). Awareness of the geometric center helps to recognize redundancies in western tonal music. For example, the chord progression C-f-C has exactly the same geometric structure as the chord progression a-E-a, but the two structures are mirrored around the geometric center of the key. Many more such major/minor mirror relationships become apparent when the pitch space is aligned to the geometric center. From a music-psychological point of view, however, the geometric center is not the tonic of a key. The tonic is represented by the root of the given key, which we call the cognitive center; this root note is the most restful tone in the key. While the geometric center of a key is independent of mode, the cognitive center changes when the mode changes. It could therefore be better to align the pitch space not to the geometric center but to the appropriate root note. This is shown in Figure 3. Figure 3a shows the system aligned to the geometric center: major and minor chords together form a perfectly symmetric structure. Note that the geometric distances along the spiral correspond to the distances between tones on a semitone scale. Figure 3b shows the system aligned to the cognitive center of C major: the root “C” is at the top of the circle, and the subdominant (“F”) and the dominant (“G”) are symmetrically arranged around the tonic (“C”). Figure 3c shows the same for a minor: the root note “a” is at the top of the circle, and the root of the subdominant (“d”) and the root of the dominant (“e”) are now symmetrically arranged around the tonic. To make a musical interface intuitive, it should allow switching between the two alignment types: it should be possible to change the alignment between the major tonic, the minor tonic and the key’s geometric center.

Figure 3. The difference between a key’s geometric center and a key’s cognitive center: a) the geometric center, b) the major root and c) the minor root are represented at the top

4. NAVIGATION IN PITCH SPACE
We now derive an interface that makes it possible to select parts of the pitch space such as those described in Table 1. First, a simple set of parameters is chosen to define the desired sound. The interface proposed here has to support the following movements and tasks in pitch space:
a) Moving within one spiral winding to play tones and chords of one key. If the selected spiral winding is projected onto a 2D plane as shown in Figure 2, this results in a 2D navigation.
cognitive center of a key. The geometric center is located exactly b) Define what pitches are selected if a certain spatial
in the middle of the selected spiral winding (Figure 2, represented position has been reached. This requires parameters that
by the tilted d). The geometric center is the same for all diatonic define the dimensions of the selected part in the space.
c) Changing the current spiral winding to change the key or to temporarily play chords from other keys. This results in a movement in the third dimension.
d) According to Figure 3, it must be possible to align the pitch space to the major tonic, to the minor tonic or to the geometric center of a given key. The alignment should be a task that is executed consciously.

4.1 Playing Tones and Chords - Navigating within one spiral winding
The most basic task in creating music is to navigate within one spiral winding and to select tones of one key. Figure 4 shows that concrete pitch classes are represented at discrete angles. To make pitch classes audible, however, we also have to assign a pitch height to every pitch class. For this, the authors propose to use the radial dimension to assign different root positions to every pitch class. According to Figure 4, this results in four control parameters for playing tones, intervals and chords:
1.) a start angle,
2.) an apex angle,
3.) a start radius and
4.) an apex radius.
The start angle defines the root of the chord that is to be played (Figure 5). The apex angle defines how many pitch classes neighboring the root are played (Figure 6). The start radius is used to define the pitch height of the played pitch class, such that the pitch height increases continuously with the radial position: the higher a tone's pitch, the greater the tone's radial position. Because chords can also be composed of tones spanning more than one octave, the apex radius can be used to increase or decrease the number of octaves that are used to generate the tone combination.

[Figure 4: diagram labeled start angle, apex angle, start radius r1 and apex radius r2]
Figure 4. The selection of tones in the pitch space is done using only 4 parameters.

A spatial tone distribution function is added to every tone. It makes it possible to fade between tones, intervals and chords, and to fade chords continuously from one into another. This distribution function describes what happens if a spatial region between two tones is selected. By moving the pitch selection in the radial direction, the pitch height of the chord can be transformed (Figure 7). This in turn allows the creation of inversions of a given chord. By moving the selection in the tangential direction, neighboring chords can be crossfaded. A continuous increase of the apex radius results in brighter chords. A continuous increase of the apex angle results in a continuous fading between single tones, third intervals, major and minor chords, or major or minor seventh chords.

4.2 Changing the key - Changing the spiral winding
Two modes of spiral winding change have to be distinguished:
1.) The permanent key change is used if the key is to be modified permanently, for example if another song is to be accompanied or the current musical piece is to be transposed. In that case it is necessary to switch to another spiral winding. The spiral winding must also be rotated so that its geometric or cognitive center (Section 3.2) is represented at the same position as the former key's center. In Figure 3 this position is the circle's top.
2.) The second mode is the temporary key change. In many cases there is a fixed key, but we need to play chords that belong to another diatonic set. For example, it is often required to play the dominant major chord in harmonic minor. In that case it is necessary to jump to another spiral winding, but the tonal center shall remain at the same spatial location.

Table 2. Key changes and the corresponding spiral winding changes (key change: spiral winding change)
1 Jump to the parallel major key: select the spiral winding directly above the current one
2 Jump to the parallel minor key: select the spiral winding directly below the current one
3 Shift the key by one semitone: shift the spiral by five fifths
4 Jump to the next key in the circle of fifths: shift the spiral by one fifth to the left or to the right

Table 2 shows characteristic key changes. The two most important spiral winding changes are the parallel major/minor ones (Table 2: 1, 2). These changes convert a given major chord to a minor chord and a given minor chord to a major chord (see also Figure 1). It can be seen that these changes are simple operations. Other operations are more complicated and have to be examined in more detail, for example the transformation of a given major chord into the dominant 7th chord.

Figure 5. Example 1: Changing the start angle allows different chords to be played. From left to right: the chords F-Major, C-Major and G-Major. Note: to identify the correct pitch class labels, refer to Figure 2c.
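The four selection parameters of Section 4.1, together with a spatial tone distribution function, can be illustrated with a small Python sketch. The sketch makes these assumptions, none of which come from the authors:
- pitch classes are integers (C = 0);
- one diatonic winding is built by stacking alternating major and minor thirds;
- tones sit at equal angular steps of 360/7 degrees;
- the distribution function is rectangular, so each tone occupies exactly one angular step;
- the start radius selects a MIDI octave and the apex radius the number of octaves.

The names `winding`, `overlap`, `select` and `STEP` are hypothetical. This is not the authors' implementation.

```python
import math

STEP = 360.0 / 7  # assumed angular distance between neighboring tones of one winding


def winding(start_pc: int, length: int = 7) -> list:
    """One diatonic winding: stack alternating major (4) and minor (3) thirds."""
    pcs = [start_pc % 12]
    step = 4
    for _ in range(length - 1):
        pcs.append((pcs[-1] + step) % 12)
        step = 7 - step  # alternate 4, 3, 4, 3, ...
    return pcs


def overlap(a0: float, a1: float, b0: float, b1: float) -> float:
    """Length of the intersection of the intervals [a0, a1] and [b0, b1]."""
    return max(0.0, min(a1, b1) - max(a0, b0))


def select(pcs, start_angle, apex_angle, start_radius, apex_radius, base_octave=3):
    """Map start/apex angle (degrees) and start/apex radius to weighted MIDI notes.

    Tone k occupies the angular box [k*STEP - STEP/2, k*STEP + STEP/2]; its
    weight is the covered fraction of that box (a rectangular distribution
    function), so moving the selection tangentially crossfades neighbors.
    """
    end_angle = start_angle + apex_angle
    first_octave = base_octave + math.floor(start_radius)  # radius -> pitch height
    n_octaves = max(1, math.ceil(apex_radius))  # apex radius -> number of octaves
    notes = {}
    for k, pc in enumerate(pcs):
        center = k * STEP
        covered = sum(
            overlap(start_angle, end_angle, c - STEP / 2, c + STEP / 2)
            for c in (center - 360.0, center, center + 360.0)  # wrap-around
        )
        weight = round(covered / STEP, 3)
        if weight == 0:
            continue
        for o in range(n_octaves):
            notes[12 * (first_octave + o + 1) + pc] = weight  # MIDI numbering, C4 = 60
    return notes


if __name__ == "__main__":
    c_major = winding(5)  # pitch classes of F, a, C, e, G, b, d
    # Apex angle of three steps covering C-e-G exactly: a plain C major triad.
    print(select(c_major, 1.5 * STEP, 3 * STEP, 1.0, 1.0))  # {60: 1.0, 64: 1.0, 67: 1.0}
    # Start angle moved by half a step: C and b at half weight, a mixed chord.
    print(select(c_major, 2.0 * STEP, 3 * STEP, 1.0, 1.0))  # {60: 0.5, 64: 1.0, 67: 1.0, 71: 0.5}
```

The rectangular box is the simplest possible distribution function. A smoother window, such as a raised cosine, would give softer crossfades while keeping the same four parameters.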
Figure 6. Example 2: A continuous increase of the apex angle makes it possible to fade between single tones, third intervals and major or minor chords.

Figure 7. Example 3: A continuous movement of the selected area allows the generation of mixed chords like the dominant 7th.

5. THE HARDWARE CONTROLLER
To execute the tasks described above, several hardware controllers have been tested. This section describes the use of a 3DConnexion SpaceNavigator. The 3DConnexion SpaceNavigator is a 3D joystick that provides 6 degrees of freedom (Figure 8):
a) move left/right,
b) push/pull,
c) tilt forward/backward,
d) rotate,
e) tilt left/right,
f) move forward/backward.

Figure 8. The six degrees of freedom of the 3DConnexion SpaceNavigator [9].

The question now is how to assign these degrees of freedom to the spiral navigation tasks described above. Table 3 proposes a mapping of the different tasks to the parameters of the controller.

Table 3. Navigating within the pitch space using the 3DConnexion SpaceNavigator (task: SpaceNavigator action)
1 Select the start angle: move the cap towards the appropriate angle
2 Select the start radius: move the cap to the appropriate radius
3 Start playing the selected tones: stop the movement
4 Define the velocity of the tones: acceleration of the stop
5 Jump to the spiral winding above/below: push/pull the cap
6 Shift the spiral winding by one fifth to the left or to the right: twist the cap left/right
7 Shift the spiral winding by five fifths (±1 semitone): tilt the cap
8 Align the pitch space to a given root note (psychological center): physically rotate the whole SpaceNavigator
9 Change to a selected key permanently: select the key (Tasks 5-8) and press Button B
10 Define the apex radius: Button A + twist the cap
11 Define the apex angle: Button A + rotate the cap

Tasks 1-4 of Table 3 describe how tones are activated. The start radius and the start angle are defined by moving the SpaceNavigator towards the appropriate point. Stopping this movement triggers the playing of the tones. The velocity of the played tones depends on the (negative) acceleration when the movement is stopped.

Tasks 5-9 of Table 3 realize key changes, i.e. changes of the selected spiral winding. In western tonal music the most frequently used spiral winding change will be the jump to the spiral winding directly above or directly below the currently selected one. According to Table 1, this spiral winding change transforms a given minor chord into its major chord and a given major chord into its minor chord. Analogous to the geometric movement within the spiral, the pull and push function of the SpaceNavigator is proposed to perform this task. In this way it is possible, for example, to transform the e-Minor chord into E-Major, or the F-Major chord into an f-Minor chord, by pulling and pushing the cap.

Tasks 6 and 7 propose a mapping of SpaceNavigator functions to other types of key changes. Key changes other than those described in Table 3 can be generated through combinations of the presented movements. For example, a key change of one whole tone, from C to D, can easily be performed by a double tilt or a double twist. With the assignment of spiral movement operations as shown in Table 1, it is possible to execute all of the key changes described in Table 2.

It should be noted that all key changes performed by Tasks 5 to 8 are temporary key changes (Section 4.2). If the cap is pushed, the parallel minor key is immediately selected. If the cap is released after that, the selection returns to the previous key. This makes it easy to "borrow" chords from other keys. To switch to a key permanently, the appropriate task (Tasks 5 to 8) has to be performed, and then Button B has to be pressed.

Task 8 shows that it is easy to align the system to the root of a given major or minor key. This is done simply by rotating the whole SpaceNavigator so that the desired root note moves to the SpaceNavigator's top. The required visual feedback can easily be realized by positioning the controller on a printout of the pitch space (Figure 9).

Figure 9. By rotating the whole SpaceNavigator the user can align the system to major or minor: a) C-Major, b) a-Minor.
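The fifth-based key changes of Table 2 and the temporary/permanent distinction of Tasks 5-9 reduce to modular arithmetic on pitch classes. The sketch below makes the following assumptions:
- pitch classes are integers (C = 0);
- only the fifth-based shifts are modeled, namely the twist of Task 6 and the tilt of Task 7;
- the push/pull jump to the parallel winding is left out, because it depends on the 3D spiral geometry.

The names `fifths_to_semitones` and `KeySelection` are hypothetical and only illustrate the described behavior.

```python
NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def fifths_to_semitones(n_fifths: int) -> int:
    """Net transposition (semitones, mod 12) of shifting the spiral by n fifths."""
    return (7 * n_fifths) % 12


class KeySelection:
    """Hypothetical model of temporary vs. permanent key changes.

    A deflected cap (twist = one fifth, tilt = five fifths) selects a key only
    while it is held; pressing Button B makes the selected key permanent.
    """

    def __init__(self, tonic_pc: int) -> None:
        self.permanent = tonic_pc % 12
        self.temporary = None  # set only while the cap is deflected

    @property
    def current(self) -> int:
        return self.permanent if self.temporary is None else self.temporary

    def deflect(self, n_fifths: int) -> None:
        """Borrow chords from the key n fifths away from the current one."""
        self.temporary = (self.current + fifths_to_semitones(n_fifths)) % 12

    def release(self) -> None:
        """Cap released: fall back to the permanent key."""
        self.temporary = None

    def press_button_b(self) -> None:
        """Commit the temporarily selected key."""
        if self.temporary is not None:
            self.permanent, self.temporary = self.temporary, None


if __name__ == "__main__":
    print(fifths_to_semitones(1))   # 7: one fifth up, e.g. C -> G
    print(fifths_to_semitones(5))   # 11: five fifths = one semitone down
    print(fifths_to_semitones(-5))  # 1: five fifths the other way = one semitone up
    key = KeySelection(0)           # start in C
    key.deflect(1)                  # twist: borrow chords from G
    print(NAMES[key.current])       # G
    key.release()                   # back to C
    key.deflect(1)
    key.deflect(1)                  # double twist: C -> D
    key.press_button_b()            # make D permanent
    key.release()
    print(NAMES[key.current])       # D
```

In this model a whole-tone change from C to D gives the same result whether it is made by two tilts (`deflect(-5)` twice) or by two twists (`deflect(1)` twice), which matches the text.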
6. SOFTWARE ARCHITECTURE
Figure 10 shows the software architecture that realizes the presented musical interface approach. The core module is "Geometric Pitch Distribution". This module defines where the tones are geometrically positioned within the pitch space. In our case this module realizes the spiral of thirds. It can be replaced by other geometric pitch distributions, e.g. the circle of fifths or a Riemann network.

The module "Pitch Selection" defines how pitches can be selected for playout. It depends on the geometric pitch distribution and provides a high-level interface to change, for example, the start and apex angle, the start and apex radius (Figure 4), or the velocity of the tones to be played.

The control parameters provided by the pitch selection module must be mapped to the parameters of a given hardware controller, e.g. the SpaceNavigator. This is done by the "Parameter Mapper". This module receives the events from the hardware controller (SpaceNavigator), transforms those events if required, and maps them to the control parameters provided by the module "Pitch Selection".

[Figure 10: block diagram with the modules Parameter Mapper, Pitch Selection, Midi Note Generator, Pitch Weighting, Geometric Pitch Distribution and Synthesizer]
Figure 10. The software architecture of the presented interface approach.

The module "Pitch Weighting" takes the current Pitch Selection and the given Geometric Pitch Distribution and derives the weights of the currently selected pitches. The weighted pitches are then forwarded to the "Midi Note Generator", which generates a MIDI signal that is fed to a synthesizer.

7. EVALUATION
To evaluate the parameter mapping proposed in Table 3, several informal tests, a focus group and a usability test were conducted. The focus group consisted of five participants of varying musical background. The usability test featured 20 participants (10 musicians, 10 non-musicians).

Task 1 (as proposed in Table 3) was perceived as consistent with the model in the software and was generally accepted.

Due to the self-centering property of the SpaceNavigator, Tasks 2-4 were evaluated as difficult and only suitable for experimental musical settings. The start radius has therefore provisionally been set to a fixed position. This led to a change in the controller assignment of Tasks 3 and 4: a note is now played when a certain radius margin is crossed. The velocity of the played note is now derived from the velocity of the movement while crossing that margin.

Tasks 5-7 were perceived as very problematic and unusable, due to physical limitations of the SpaceNavigator. It was found that the 6 degrees of freedom of the SpaceNavigator cannot always be handled simultaneously. For example, a full rotation locks the controller cap and prevents movement to the desired angle (as in Task 1, Table 3). Handling two independent parameters on two different axes also led to problems, because manipulating a parameter A involuntarily changed another parameter B as well. In addition, the simultaneous manipulation of two or more independent parameters was perceived as very complicated.

More than half of the test persons perceived the manipulation of the apex angle (Table 3, Task 11) as totally unusable. This was due to the high sensitivity of the SpaceNavigator. The time it takes to set the angle was also rated as too long for real musical applications. Some participants stated that they liked the perceptual link to the pitch space and the way notes are selected and played (Task 1, Table 3).

8. APPLICATIONS
The proposed pitch-space-based musical interface is interesting for different target groups.

Children often have extensive skills in computer games and in the use of new hardware controllers. Such an interface could help them to learn tonal western music step by step. Combined with an appropriate visualization, it could help them quickly learn many theoretical relationships, such as:
- the composition of chords;
- functional relationships between chords (subdominant, tonic, dominant);
- relationships between different keys;
- more psychophysical relationships like the distinction between pitch chroma (assigned to angle) and pitch height (assigned to radius), which have been shown to be processed in different brain regions [10].

Music students often have to learn many music-theoretical terms. For them it is important to keep an overview. The challenge here is to bridge the gap between theoretical knowledge and its practical application. A pitch-space-based instrument could help them to organize many theoretical terms by linking them to a spatial model. Interacting directly with such a geometric representation of tones, and hearing the result immediately, will additionally support their learning progress. For this reason the proposed musical interface could become part of standard school education.

Older people are often willing to learn a new instrument, but classical instruments like piano or violin are too complicated and require the development of extensive motor skills. A musical instrument with a simple set of parameters could motivate them to take up a new challenge, i.e. to start learning a new instrument.

Musicians, DJs and composers: combined with advanced sound synthesis (Figure 10), the pitch-space-based instrument can become a creative tool. It can support finding new chord progressions throughout all keys and developing advanced sound textures.

9. SUMMARY AND RESULT
A new musical interface approach based on tonal pitch spaces was presented. The approach targets both musicians and non-musicians. The combination of tonal pitch spaces with 3D navigation tasks and real-time auralisation provides many new musical possibilities. The model used was derived from a music-psychological model, which guarantees a strong relationship between the model of tonality and cognitive principles. Extracting one spiral winding and projecting it onto an XY plane strongly simplified the tonal space and reduced the complexity of the navigation tasks (Figure 2). The alignment of that diatonic subspace to the symmetry axis
or, respectively, to the geometric center of the extracted key helped to reveal structural redundancies between major and minor, which in turn reduces the amount of material to be learned.

A simple set of parameters to navigate within the pitch space and to interact with pitches has been provided. This interface allows fading continuously between tones, intervals, chords and inversions of chords, as well as defining chord progressions throughout all keys, modulations, etc. The spiral of thirds partitions the 12 major and the 12 minor keys into easily understandable diatonic subspaces that can be easily accessed. The spiral has been designed such that important tonal relationships, like parallel and relative major and minor chords, are neighbors.

To create music out of the pitch space, several navigation tasks have been defined and mapped to the control parameters of the 3DConnexion SpaceNavigator. A subsequent usability test, consisting of a focus group and 20 individually tested participants, showed that a controller is required that allows independent, simultaneous control of different parameters (push/pull, rotate, …). The tested controller did not meet this requirement.

The next steps in the development of the interface will be:
1.) to evaluate controller alternatives that meet the requirements stated above,
2.) to add the possibility of navigating within different parts of the space independently,
3.) to introduce other pitch spaces to play other and more complex chords and scales, and
4.) to develop an advanced visualization.

10. REFERENCES
[1] Chew, Elaine: Towards a Mathematical Model of Tonality. Dissertation, Massachusetts Institute of Technology, 2000.
[2] Krumhansl, Carol L.: Cognitive Foundations of Musical Pitch. Oxford Psychology Series, no. 17. Oxford University Press, 1990. ISBN 0-19-505475-X.
[3] Lerdahl, Fred: Tonal Pitch Space. Oxford: Oxford University Press, 2001. ISBN 0-19-505834-8.
[4] Shepard, Roger N.: Geometrical approximations to the structure of musical pitch. In: Psychological Review 89 (1982), pp. 305–333.
[5] Tymoczko, Dmitri: The Geometry of Musical Chords. In: Science 313 (2006), pp. 72–74.
[6] Gatzsche, G.; Mehnert, M.; Gatzsche, D.; Brandenburg, K.: A symmetry based approach for musical tonality analysis. In: Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR 2007), Vienna, 2007.
[7] Gatzsche, G.; Mehnert, M.; Gatzsche, D.; Brandenburg, K.: Mathematical optimization of a toroidal tonality model. In: 8th Conference of the Society for Music Perception and Cognition, Concordia University, Montreal, QC, Canada, 2007.
[8] Mehnert, M.; Gatzsche, G.; Gatzsche, K.; Brandenburg, K.: The analysis of tonal symmetries in musical audio signals. In: International Symposium on Musical Acoustics (ISMA 2007), Institut d'Estudis Catalans, Barcelona, 2007.
[9] 3DConnexion, a Logitech company: SpaceNavigator. http://www.3dconnexion.com/, January 2008.
[10] Warren, J. D., et al.: Separating pitch chroma and pitch height in the human brain. In: PNAS (www.pnas.org) 100(17) (2003), pp. 10038–10042.
[11] Krumhansl, C.; Kessler, E.: Tracing the Dynamic Changes in Perceived Tonal Organization in a Spatial Representation of Musical Keys. In: Psychological Review 89 (1982), pp. 334–368.
[12] Gatzsche, G.; Mehnert, M.; Arndt, D.; Brandenburg, K.: Circular pitch space based musical tonality analysis. In: 124th AES Convention, Amsterdam, Netherlands, 2008.
[13] Mehnert, M.; Gatzsche, G.; Brandenburg, K.; Arndt, D.: Circular Pitch Space based Harmonic Change Detection. In: 124th AES Convention, 2008.
Real-time Raag Recognition for Interactive Music

Parag Chordia
Georgia Institute of Technology, Music Technology Group
840 McMillan St, Atlanta, GA
ppc@gatech.edu

Alex Rae
Georgia Institute of Technology, Music Technology Group
840 McMillan St, Atlanta, GA
arae3@gatech.edu
ABSTRACT
We describe a system that can listen to a performance of Indian music and recognize the raag, the fundamental melodic framework within which Indian classical musicians improvise. In addition to determining the most likely raag being performed, the system displays the estimated likelihood of each of the other possible raags, visualizing the changes over time. The system computes the pitch-class distribution and uses a Bayesian decision rule to classify the resulting twelve-dimensional feature vector, where each feature represents the relative use of each pitch class. We show that the system achieves high performance on a variety of sources, making it a viable tool for interactive performance.

Keywords
raag, raga, Indian music, automatic recognition

1. BACKGROUND AND MOTIVATION
Raag classification has been a central preoccupation of Indian music theorists for centuries [1], reflecting the importance of the concept in the musical system. The ability to recognize raags is an essential skill for musicians and listeners. The raag provides a framework for improvisation, and thus generates specific melodic expectations for listeners, crucially shaping their experience of the music [5]. Recently, MIR researchers have attempted to create systems that can accurately classify short raag excerpts [3, 2].

Raag recognition is a difficult problem for humans, and it takes years for listeners to acquire these skills for a large corpus. It can be very difficult to precisely explain the essential qualities of a raag, and most common descriptions of raags are not sufficient to distinguish them. For example, many raags share the same notes, and even similar characteristic phrases. Ascending and descending scales are sometimes used as fairly unique descriptors, but they are highly abstracted from the real melodic sequences found in performances. It takes a performer lengthy practice to fully internalize the raag and be able to reproduce it. This is despite the fact that humans are highly adept at pattern recognition and have little problem with pitch recognition, even in harmonically and timbrally dense settings. Clearly, there are several difficulties for an automatic system.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

The most obvious method for automatic raag recognition would be to attempt some kind of discrete note transcription and then look for known phrases [6]. Unfortunately, such an approach is highly sensitive to inaccuracies in pitch tracking that lead to the insertion or deletion of notes. Pitch tracking in real contexts is difficult because of accompanying instruments and the prevalence of fast, continuous motion between notes. For plucked instruments such as the sitar and sarod, strumming patterns on drone strings are interspersed with the main melodic line. All these factors require either a much more sophisticated pitch tracking algorithm or the use of more advanced machine learning techniques.

Previous work by Chordia [3] showed that pitch-class distributions (PCDs) and, to a lesser extent, pitch-class dyad distributions are relatively robust and easily computed features that can be used for accurate raag classification. Having established the effectiveness of such techniques, we wanted to adapt this work for real-time situations involving the interplay between live performers and computer systems.

The benefits of adopting a real-time approach are severalfold. First, there are greater opportunities for visualizing the activity of the system, as one can watch classification decisions evolve in response to audio input, whether live or prerecorded. This in turn provides an opportunity to better understand the behavior of the algorithm, particularly its temporal dynamics. In previous work, decisions were made based on isolated segments of fixed length. A continuously updated output lets us see which events perturb the system. It also shows whether ambiguity and clarity accurately reflect the source material or result from inherent limitations of the classification system. Finally, a real-time implementation opens up a wide realm of interactive music applications, in which the system can flexibly respond to a performer at a level that is highly abstracted but nonetheless musically relevant.

2. METHOD
Figure 1 shows the block diagram for the system. A sound source is selected, and the system begins to listen to and pitch-track the performance. The pitch estimates are aggregated, weighted by peak amplitude, and counted by pitch class to create the PCD. The PCD is continuously updated (at the frame rate) and sent to the classifier, which returns the posterior probability of each raag. The strength of the posteriors is visualized using the graph shown in Figure 2. Depending on the performance context, the information about the strength of activation of the different raag classes would be used to control the musical output of the system.

The system is coded in Java and is currently implemented as a Java external for Max/MSP. The Max environment was chosen primarily for convenience in handling the audio environment (selecting various sound sources, tuning an oscillator, etc.). With few modifications, a stand-alone application will be developed.

Figure 1: Block diagram of raag recognition system

The YIN algorithm is used for pitch tracking [4]. This method finds periodicities in the time-domain signal through a method similar to autocorrelation. Autocorrelation multiplies time-lagged versions of the signal with the original. YIN is instead built on the squared difference function, along with several other optimizations that have been shown to collectively increase the accuracy of the estimate [4].

As the signal enters the system it is divided into frames, typically of 1024 samples at a sample rate of 44.1 kHz. A hop parameter determines how frequently new pitch estimates are generated; frames usually overlap by 50%. The raw pitch track is smoothed by disallowing large jumps that immediately return to the approximate previous pitch value. The smoothing uses a configurable threshold, expressed as a ratio, for what is considered an unacceptably large jump. The peak amplitude of the frame is used to weight the pitch estimate in the next step.

To create the PCD, pitch estimates are fed into a block that converts frequency values to pitch classes. In this context, pitch classes are treated as scale degrees, since the tonic can vary from performance to performance. It is therefore necessary to provide the frequency of the tonic to the system. This is done manually. Alternatively, the performer can simply play a note to the system, which is pitch-tracked and used as the root. The positions of the scale degrees are derived from a just-intoned scale. Just intonation was chosen over equal temperament because of prevailing practice and the lack of harmonic modulation in Indian music. It should be noted that there are twelve fixed positions within the octave in NICM, despite microtonal inflections.

The bin boundaries are set by a configurable percentage, so that frequencies within some distance of the center frequency for a given scale degree are aggregated (e.g. a value of .5 creates contiguous bins, .25 quarter-tone bins, and so on). The frequency values for the bin boundaries are calculated in the log domain. Each incoming frequency value is adjusted so that it falls within a central pitch range by multiplying or dividing by a power of two. The binned frequency values then give a twelve-dimensional vector representing the weighted frequency with which each pitch class is used. Because the vector is calculated at the frame level, duration is automatically taken into account: a held tone occurs in many frames, which gives it more weight.

The classification is performed using a Bayesian classifier. WEKA, a commonly used general-purpose machine learning package, was used for this purpose [8]. PCD vectors from each raag are used to learn the multivariate Gaussian class-conditional probability distributions as well as the prior probabilities. Training was performed on a database of raags containing thirty-one different targets, totaling 20 hours of recorded material. The PCDs had been previously calculated using a non-real-time method with a slightly different pitch detection algorithm [7].

When a new case is presented, e.g. by playing a soundfile, the stored models are used to classify the given instance. We were interested primarily in the posterior distribution, and not just the most likely raag. For this reason we used a Bayesian classifier rather than a Support Vector Machine, even though the SVM gave very high accuracy rates in previous work. The posterior distribution was displayed as a bar graph and was updated at the hop rate. This gave a smoothly changing picture of the classifier's best estimates of the target raag. The PCD was displayed in a similar manner within the same window, allowing for immediate comparison.

3. RESULTS
In previous work we formally evaluated the system using a cross-validation procedure. Classification accuracy for PCDs with thirty-one targets on thirty-second segments was 76%. In this work we were more interested in a qualitative evaluation of the system, particularly its suitability for live performance settings. We therefore examined how stable the estimates were, how much they fluctuated over time, and the conditions under which the system failed.

We evaluated these questions in two parts:
- In the first, the first author, a trained Indian classical musician, performed twelve of the thirty-one raags on sarod for several minutes each.
- The second used eight vocal recordings from the raag database, with an average length of ten minutes.
In both cases the system was presented with unaccompanied performance, although in the case of the sarod, as noted above, drone and sympathetic strings also sounded.

The majority of raags presented in the first test were detected quickly and accurately. The system usually found the correct label within fifteen seconds and became stable within thirty to forty seconds. There were a number of confusions, mostly of two distinct types:
- Raags with similar scale types frequently led to a posterior distribution that did not converge completely onto one choice. Often the probability mass, rather than being evenly distributed, would fluctuate between different choices.
- Raags that were quite dissimilar in scale type but shared one or two prominent notes were also confused, which one might call the dominant note effect.
To illustrate, we describe three examples.
Figure 2: The left panel shows a screen shot of the Java external embedded in the Max/MSP environment. The user is given access to various parameters that control the algorithm. The pitch track and amplitude of the signal are displayed here. The right panel shows the display of the posterior distribution over the raag targets and the pitch class distribution.
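Figure 2 displays the smoothed pitch track and the resulting pitch-class distribution. The two front-end steps described above are rejecting short pitch jumps and mapping amplitude-weighted pitch estimates onto tonic-relative scale degrees. They can be sketched in Python as follows.

This is a minimal sketch under stated assumptions, not the system's own Java external. The following are all hypothetical, because this section does not give the values the system uses:

* the function names;
* the default thresholds (`jump_ratio=1.2`, `return_ratio=1.03`);
* the particular 5-limit just-intonation ratios.

```python
import math

# Hypothetical 5-limit just-intonation ratios for the twelve scale degrees;
# the ratios actually used by the system are not given in this section.
JUST_RATIOS = [1/1, 16/15, 9/8, 6/5, 5/4, 4/3, 45/32, 3/2, 8/5, 5/3, 9/5, 15/8]
DEGREE_CENTS = [1200 * math.log2(r) for r in JUST_RATIOS]


def ratio(a, b):
    """Frequency ratio between two positive values, always >= 1."""
    return max(a, b) / min(a, b)


def smooth_pitch_track(freqs, jump_ratio=1.2, return_ratio=1.03):
    """Replace one-frame spikes: a jump larger than jump_ratio that is
    immediately followed by a return to within return_ratio of the
    previous value is overwritten with the previous value."""
    out = list(freqs)
    for i in range(1, len(out) - 1):
        prev, cur, nxt = out[i - 1], out[i], out[i + 1]
        if min(prev, cur, nxt) <= 0:
            continue  # unvoiced frames (assumed to be reported as 0 Hz) are left alone
        if ratio(cur, prev) > jump_ratio and ratio(nxt, prev) <= return_ratio:
            out[i] = prev
    return out


def scale_degree(freq, tonic):
    """Index (0-11) of the nearest just-intoned scale degree, folding octaves."""
    cents = (1200 * math.log2(freq / tonic)) % 1200

    def circular_distance(k):
        d = abs(cents - DEGREE_CENTS[k])
        return min(d, 1200 - d)  # wrap around at the octave

    return min(range(12), key=circular_distance)


def pitch_class_distribution(freqs, amps, tonic):
    """Amplitude-weighted histogram over the twelve scale degrees, summing to 1."""
    pcd = [0.0] * 12
    for f, a in zip(freqs, amps):
        if f > 0:
            pcd[scale_degree(f, tonic)] += a
    total = sum(pcd)
    return [w / total for w in pcd] if total > 0 else pcd


# Example: tonic 220 Hz, one spurious jump to 330 Hz in the third frame.
track = smooth_pitch_track([220.0, 220.0, 330.0, 220.0, 247.5, 275.0])
print(track)  # [220.0, 220.0, 220.0, 220.0, 247.5, 275.0]
print(pitch_class_distribution(track, [1.0] * len(track), tonic=220.0))
```

Nearest-degree matching is done in cents with wrap-around at the octave. As a result, a pitch slightly flat of the upper tonic is still counted as scale degree 0.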
The system correctly identified raag Desh as performed by the author in less than fifteen seconds, and the posterior became locked within thirty. There was some slight confusion with Jaijaiwante, which is similar in both scale and phraseology. In the first few seconds, when the performer had played only two or three notes, with a prominent major seventh scale degree, raag Yaman had the highest posterior; however, with the introduction of more material, the estimate quickly shifted to Desh. As a special case we were interested to see whether the introduction of raag Jaijaiwante's signature phrase, which includes the minor third scale degree not present in raag Desh, would be immediately detected. In fact, the change in the PCD brought on by this phrase did indeed produce a nearly immediate shift.
Raag Kedar (also performed live) provided a challenge: it is a raag that is primarily distinguished by its zig-zagging phrases, and has a major scale similar to several other raags. Not surprisingly, the system took longer to converge, and for the first two to three minutes Kedar was confused with Maru Bihag, Jaijaiwante, and Darbari. The first two were unsurprising due to their similar scales; however, Darbari contains a minor third and minor sixth, neither of which is present in Kedar. We observed that this confusion occurred when the fifth scale degree was the most prominent, an example of the dominant note effect.
Raag Darbari, as sung by Amit Mukherjee, provided an interesting example of tetrachordal ambiguity being resolved. Often a performer will focus on half of the scale for a period of time. In this case, the singer lingered on the upper four notes of the scale for nearly forty seconds before including the other notes. During this time, Malkauns had by far the highest posterior, an unsurprising confusion given the two raags' similarity in the upper tetrachord. However, the system immediately and unambiguously switched to Darbari upon hearing notes from the lower tetrachord.
It should be noted that in some cases the system was unable to converge to the correct raag, or only briefly touched upon it before converging elsewhere. For example, Bhimpalasi was confused with Darbari and Bageshri, and at times with Malkauns, and was never correctly identified.
4. CONCLUSIONS
We have demonstrated that real-time raag recognition is possible in realistic performance situations with minimal adjustments needed for different performers. In terms of accuracy we noted that as additional information is collected, initial ambiguities are often resolved, leading to correct classification. In some cases, however, as the performer begins to focus on certain phrases that overlap with nearby raags, the estimate begins to fluctuate between different possibilities. In the latter case, this error is in part attributable to our feature choice, which does not include any sequence information. This makes the system biased towards simply using the most commonly occurring pitches for the classification decisions, without consideration of the melodic context. This suggests that if sequence were taken into account, such as by counting pitch dyads, we would be able to more clearly disambiguate raags in the same neighborhood. This is consistent with what we found earlier in our non-real-time experiments, where pitch-dyads were also used as features.
Another point that becomes clear is that the algorithm's lack of a note model makes it very difficult for it to estimate the perceptual salience of a pitch. For example, for a human listener, the fleeting introduction of a new note can dramatically change the tonal landscape even though its effect on the PCD is minimal, perhaps indistinguishable from noise. Another example is the use of gliding approaches to notes, which are very common in Indian music. For our system, such glides introduce energy at all the frequency bands between the starting and ending point, and for a plucked instrument that decays quickly, may emphasize the starting point. This leads to a noisier PCD estimate. However, for a human the opposite is often true: glissandi serve to emphasize the target note. In both these cases, the crucial
difference is that the system has no concept of what constitutes a note. Humans are able to integrate many different types of information to resolve notes from complex time-varying pitch tracks, from the basic perception of vibrato as a modulation of a central frequency to understanding the importance of a pitch within the musical structure.
These insights were suggested concretely by our ability to visualize the inner workings of the algorithm, both the data presented to it and its hypotheses, while simultaneously listening.
We have created a framework for high-level musical interaction for Indian electroacoustic music. We envision a number of extensions to this system to allow it to be used effectively in a performance setting. Aside from tracking more features to yield different and more robust classification estimates, the momentary maximum posterior can be used, for example, to introduce phrases from the same or related raags. The time-varying posteriors can also be treated as a modulating control signal that tracks a very high-level aspect of the music. The time-varying PCDs can be used in a similar fashion. These signals can be used to generate tonally relevant material.
5. FUTURE WORK
Improving the pitch estimation algorithm would likely yield better results. The primary difficulties faced are tracking pitches that vary rapidly over time, which is typical of Indian music, eliminating pitches that are not part of the main melodic line, and source separation from accompaniment and resonating strings.
Future systems will attempt to use simple sequential information such as pitch-class dyad distributions as well, which would require segmenting the signal into notes using an onset detector. This is particularly difficult in Indian music, where notes do not often correspond to clear onsets in the time domain. If solved, this would partially address the lack of a note model referred to above, and would likely lead to a substantial increase in accuracy. However, creating a perceptual note model remains a fundamental and interesting problem.
6. ACKNOWLEDGMENTS
The authors would like to acknowledge the following musicians who made substantial contributions to the raag database: Prattyush Banerjee (sarod), Nayan Ghosh (sitar), Amit Mukherjee (vocal), Sugato Nag (sitar), Falguni Mitra (vocal), Manilal Nag (sitar).
7. REFERENCES
[1] V. Bhatkande. Hindusthani Sangeet Paddhati. Sangeet Karyalaya, 1934.
[2] P. Chordia. Automatic raag classification of pitch-tracked performances using pitch-class and pitch-class dyad distributions. In Proceedings of International Computer Music Conference, 2006.
[3] P. Chordia and A. Rae. Raag recognition using pitch-class and pitch-class dyad distributions. In Proceedings of International Conference on Music Information Retrieval, 2007.
[4] A. de Cheveigné and H. Kawahara. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111(4):1917–1930, 2002.
[5] D. Huron. Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, 2006.
[6] H. Sahasrabuddhe and R. Upadhy. On the computational model of raag music of India. In Proc. Indian Music and Computers: Can Mindware and Software Meet?, 1994.
[7] X. Sun. A pitch determination algorithm based on subharmonic-to-harmonic ratio. In Proc. of International Conference of Speech and Language Processing, 2000.
[8] I. H. Witten and E. Frank. Data Mining: Practical machine learning tools and techniques. Morgan Kaufmann, 2005.
Bending Common Music with Physical models
Anders Vinjar
Institute for Musicology
University of Oslo
Oslo, Norway
andersvi@extern.uio.no
ABSTRACT
A general CAC¹-environment charged with physical-modelling capabilities is described. It combines Common Music, ODE and Fluxus in a modular way, making a powerful and flexible environment for experimenting with physical models in composition.
Composition in this respect refers to the generation and manipulation of structure typically on or above a note-, phrase- or voice-level. Compared to efforts in synthesis and performance, little work has gone into applying physical models to composition. Potentials in composition-applications are presumably large.
The implementation of the physically equipped CAC-environment is described in detail.
Keywords
Physical Models in composition, Common Music, Musical mapping
1. INTRODUCTION
When analyzing the amount of cpu-power and cpu-time spent on music-related computing at places like IRCAM over the years, a trend stands out[1]. During the early years most cpu-resources were spent on traditional composition-tasks — structuring sets of notes and rhythms and other isolated musical parameters. This is not very strange. For the early computer-composer, coding restricted algorithms to calculate a limited amount of musical parameters and high-level events, with a fair chance of achieving satisfying musical results, was more feasible than attempting to develop effective algorithms to calculate the samples of musically useful sound-waves, often ending up with dull sounding results. When knowledge and technology had developed adequately, more focus shifted towards research and development of hardware for real-time DSP, software to control it effectively in performance, and analysis/synthesis techniques to gain more interesting synthesis or processing of sound.
In recent years however, after realtime-processing and playing of multiple channels of hifi-sound has become obtainable in consumer-level equipment, much more resources — both human research and cpu-time — are again focused on problems concerning musical structure.
¹ Computer-Assisted Composition
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author.
Figure 1: Intuitive response in physical model
1.1 Physical models
A physical model in this context is a computer-program made up of methods simulating nature. It simulates objects with physical properties and their behavior in time in a physical universe. Physical models have qualities which make them interesting when working on musical problems.
1. both realistic and virtual models can be made to react in ways we perceive as natural and intuitive in response to input (Figure 1).
2. because the models behave according to physical laws, and give realistic response to stimulus, we also recognize behavior in variations of real models as realistic.
3. physical models may be programmed so they give linear response to linear modifications. The structure of the model and the parameters of the physical forces which affect it can be modulated dynamically, and its response to stimulus changed in a predictable manner.
1.2 Using Physical Models in composition
Physical Models have proven effective in various areas of computer-music applications, and have received much attention from music-researchers and developers of music-technology. The main efforts have focused on sound-synthesis and solutions to problems concerned with musical performance. Typical musical applications are modelling acoustics or synthesizing sound, e.g. room-acoustics by geometric modelling
of virtual rooms[10] or the Karplus-Strong algorithm[8], or developing biomechanical models to generate computer-performances with humanlike expression[12], i.e. human dynamics, ritardando etc.
Figure 2: Black-box: “simple-to-complex”-mapping
1.3 Potentials to composition-work
Relatively little research has been done on physical models in composition-work until now. Experiments have been done by the author[16] and others[3] mapping physical properties from dynamic physical models to controlling composition-parameters, suggesting that potentials in composition-applications are large. Having automatic systems controlling complex sets of musical parameters and responding intuitively to input is obviously interesting. Physical models may bring similar benefits to typical composition-tasks — structuring and disposition of material with similar or contrasting global qualities, generation of patterns with degrees of similarity, controlling evolution of interlocking sets of voices — as they have done to synthesis and performance-systems.
1.4 Black-Box architecture
A Black Box refers in this context to the inclusion of a construct as part of a program, where we are only interested in its input and output characteristics. Input controlling the device may be both simple and precise — and the output arbitrarily complex — as long as the system reacts as we expect. This makes Black-boxes useful when controlling complex musical processes effectively.
If a black-box consists of a Physical Model — real or virtual — the response in the system will also fit with our expectations about how physical devices work. This may provide a system which is intuitive and easy to learn.
When composing music it is interesting to experiment with the response in the system by changing it gradually on a linear scale from obvious to unexpected. Using black-boxes also allows for mapping control-input to resulting output using any of the archetypes — one-to-one, one-to-many, many-to-one and many-to-many — mappings.
1.5 CAC-environments
Some of the powerful approaches to composition offered by modern CAC-software are rule-based and constraint-programming. Composers like them because they allow for explicitly formulating music-theoretical rules and having the computer generate music which fits, or getting a musical score automatically generated just by describing desired results. Examples of implementations of constraint-programming systems are various Constraint-libraries[9] distributed with PatchWork and Open Music, or Common Music's integration with Torsten Anders' Strasheela[2]-environment.
In real-life situations, both constraint-based and rule-based approaches enforce strict limitations on the composer. In a constraint-based system, if a search is to find solutions at all or within acceptable time, the possible set of constraints to be combined in a given search is limited. Using rule-based systems, composers need to explicitly formulate detailed sets of rules and exceptions for all situations occurring in the musical processes they want to prescribe.
These approaches suggest either having a well-defined musical goal to end up with, or a clear idea of how to arrive there. To facilitate creative and intuitive work with yet unknown composition-problems, and still provide a deterministic and reproducible work-flow, other tools may prove useful.
1.6 Using Physical Models in CAC-environments
Typical composition-processes involve experimenting with complex hierarchies of many interdependent parameters generating musical data. Some of the wanted qualities of a good CAC-environment are:
1. simplify control of complex processes with many parameters
2. flexible and effective mappings between composer's input and musical data
3. intuitive or ecological relations between cause and response in parts of the system
4. function as explorative tool
5. “tune to liking” — define and save certain personal styles, approaches, presets
6. not bias the composition-work towards predefined musical styles
Incorporating physical models as black-box-modules in existing and well-functioning CAC-environments may provide such qualities. Figure 3 shows the basis of the workflow in such a system.
Figure 3: Simple physical control of complex musical parameter-space
In performance-control or analysis/synthesis systems the applicable physical models are often limited to define stable and linear systems. Used in synthesis or composition-work there are no real limitations as to what kind of physical models and control-strategies are eligible, since everything which comes out of the system may be considered legitimate material. However, just as in sound-synthesis — and in particular when developing the topologies and parameters of models to be used for composition — starting with
a model which imitates something real and subsequently modifying it may prove more effective than setting up a random topology of some virtual structure.
Figure 4: Components in a physically equipped CAC-environment
1.7 Related research
The questions arising when applying physical models on levels of musical structure typical of composition-tasks overlap to a large extent with those studied in research on embodied music-cognition. The main efforts in this vast field are related to studies of perception and performance.
The project here is more directly related to applications to creative work. Examples of these are physical-modeling tool-kits designed for DSP-applications like Cyrille Henry's pmpd [7], specialized environments like ACROE's Genesis-project[4] or IRCAM's Modalys[15]-project. Some of the most relevant research is connected to using physical models together with sensor-control of virtual instruments (using the models to efficiently map sensor-data to control-data), and experimenting with bio-mechanical models in studies and simulation of human performance as in the Samsara[5]-project conducted by Sylvie Gibet et al.
All work being done to understand and use physical models as black-, white- or gray-boxes in advanced control-algorithms in general[17] is of course very relevant.
2. IMPLEMENTATION
An application to aid research on physical models in composition-work is programmed by the author.
A suitable tool in this research-project shares many of the requirements of CAC-environments for working composers. The following qualities are wanted:
Effective development-environment, minimizing lag in the implement → test → response cycle
Optimized to run efficiently on standard hardware
Flexible architecture, allowing easy modifications, updates, and even substitution of modules or components
Open Source to help sharing of development-work and experiment-results with other projects or individuals
General representation to facilitate analysis- and inter-application-work
The application is programmed as a client/server architecture consisting of two parts — Common Music as the client, and Fluxus with ODE built-in running as a server.
The physically equipped CAC-environment is up & running, and has been used to compose musical material for new compositions.
² CM can be built using either Scheme or Common-Lisp
³ Algorithms with built-in functionality for handling musical time
⁴ Fluxus is perhaps used most as a real-time performance tool
A special OSC-protocol is developed as part of the project. Both CM and Fluxus are Scheme-based programming environments² and could potentially be combined in the same heap. But the way the present project is structured and the current development-state of the involved systems suggest instead running them as separate applications that communicate through OSC.
Separating the components makes the system more flexible. CPU-load may be shared between computers, and OSC-messages may be routed and sent to/from other applications. For convenience, an OSC-sequencer is programmed in SuperCollider[13] to make storing and playback of OSC-data possible out of realtime when eligible.
2.1 Common Music
CM functions as the client. It is an object-oriented music composition environment[14] with broad support for traditional musical entities. It has a well-defined API facilitating definition of, for instance, new I/O-classes. CM supports OSC messaging, and a real-time-scheduler is built-in. CM has powerful support for processes³ and modular pattern-generation macros. Amongst CM's interesting features are dynamic scheduling and control of processes. This makes it possible both to control any running processes, and to set up and schedule further processes based on the current run-time situation in the environment.
In this context CM takes care of musical I/O, algorithms and intermediate representations. CM receives streams of OSC-data from the physical system, in or out of real-time, and uses this input in several ways: triggering events, dynamically controlling the construction, sprouting and evolution of CM processes based on the current state in the physics-system, changing dynamic variables which are looked up by already running processes, etc.
2.2 Fluxus/ODE
Fluxus[6] is a real-time, graphical live-coding environment for Scheme developed by Dave Griffiths. In this project it constitutes a Scheme-controlled physics server. Fluxus can communicate with other applications via OSC network-messages and handle input from audio, keyboard or mouse.
A physics engine — ODE[11] — is built into Fluxus for time-synchronous simulations of rigid body dynamics. ODE is an open source, high performance library for simulating rigid body dynamics. It is fully featured, stable, mature and platform independent with an easy to use C/C++ API. It has advanced joint types and integrated collision detection with friction. ODE is useful for simulating objects in virtual reality environments, vehicles and virtual creatures. It is currently used in many computer games, 3D authoring tools and simulation tools.
Fluxus allows visualization and graphical interaction in real-time. Besides generating interesting graphical output⁴, it provides the user with useful visual feedback on the physical system.
2.3 Virtual mechanical structures
All kinds of realistic and unrealistic virtual mechanical structures are interesting to experiment with in this context. Having access to a general toolkit for rigid-body-mechanics makes all possible shapes, structures or sets of structures definable in this environment available for experimentation.
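CM and the Fluxus/ODE server exchange physical state as OSC messages. The project's own OSC protocol is not specified in this paper, so the following is only a sketch. It assumes the standard OSC 1.0 binary encoding and shows how a notification, such as a body's position, could be packed on the server side and unpacked on the client side. Several details are hypothetical:

* the address `/body/1/position`;
* the helper names;
* the restriction to the `i`, `f` and `s` type tags.

```python
import struct

# Minimal OSC 1.0 message encoding/decoding (type tags i, f and s only).


def _pad4(b: bytes) -> bytes:
    """Pad with null bytes to the next multiple of 4 bytes."""
    return b + b"\x00" * (-len(b) % 4)


def _osc_string(s: str) -> bytes:
    """OSC-string: ASCII text, null-terminated, padded to a multiple of 4 bytes."""
    return _pad4(s.encode("ascii") + b"\x00")


def encode_osc_message(address: str, *args) -> bytes:
    """Encode an address pattern plus int, float and str arguments as one OSC message."""
    tags = ","
    payload = b""
    for a in args:
        if isinstance(a, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(a, int):
            tags += "i"
            payload += struct.pack(">i", a)  # 32-bit big-endian integer
        elif isinstance(a, float):
            tags += "f"
            payload += struct.pack(">f", a)  # 32-bit big-endian IEEE 754 float
        elif isinstance(a, str):
            tags += "s"
            payload += _osc_string(a)
        else:
            raise TypeError(f"unsupported OSC argument: {a!r}")
    return _osc_string(address) + _osc_string(tags) + payload


def _read_osc_string(data: bytes, pos: int):
    """Read an OSC-string starting at pos; return (text, position after padding)."""
    end = data.index(b"\x00", pos)
    text = data[pos:end].decode("ascii")
    nxt = end + 1
    return text, nxt + (-nxt % 4)


def decode_osc_message(data: bytes):
    """Decode a message produced by encode_osc_message into (address, args)."""
    address, pos = _read_osc_string(data, 0)
    tags, pos = _read_osc_string(data, pos)
    if not tags.startswith(","):
        raise ValueError("missing OSC type tag string")
    args = []
    for tag in tags[1:]:
        if tag == "i":
            args.append(struct.unpack_from(">i", data, pos)[0])
            pos += 4
        elif tag == "f":
            args.append(struct.unpack_from(">f", data, pos)[0])
            pos += 4
        elif tag == "s":
            text, pos = _read_osc_string(data, pos)
            args.append(text)
        else:
            raise ValueError(f"unsupported type tag: {tag!r}")
    return address, args


# Hypothetical notification of a rigid body's position from the physics server.
msg = encode_osc_message("/body/1/position", 0.5, -1.25, 2.0)
print(decode_osc_message(msg))  # ('/body/1/position', [0.5, -1.25, 2.0])
```

Each message is a self-contained binary packet. It can therefore be sent as a single UDP datagram, for instance with Python's `socket.sendto`, and routed to any OSC-aware application.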
Figure 5: Virtual hand behaving in a virtual world
The interactive nature of the application suggests starting off with simple structures, observing the results, modifying them, and observing how the musical output changes. Systems responding realistically to input according to physical laws simplify learning their behavior. An example of one such approach used to experiment with is a model of a hand where attributes such as length of fingers or gravity are modulatable. A certain point on the hand is subjected to externally controlled forces in three dimensions, resulting in the whole structure responding in a physically coherent way.
As part of this project virtual structures consisting of mechanical bodies with varying shape, mass and surface-qualities, connected by links of various types and qualities — e.g. “balljoint”, “hingejoint”, “fixedjoint”, “sliderjoint” — are constructed and set to interact in virtual physical worlds with arbitrary values for physical properties such as gravity or friction. All physical parameters in the system — position, speed, forces, collisions between objects, angles of joints etc. — can be read at any time, notified as OSC-messages and mapped to musical parameters.
Different structures and modes of interaction may be saved and recalled as presets, and behavior over time may be recorded and played back.
2.4 Mapping
Special mapping-layers connecting streams of data from the physical world to musical parameters, and fitting their ranges onto eligible scales, are programmed as classes and methods in the CM-environment.
The way data-streams from the physical objects are used to control the evolution of musical parameters defines the resulting music. These choices are made to suit the actual compositional problem at hand.
The enhancements provided by a physical black-box included before the mapping-stage are related to how parameters evolve over time and the way simple inputs are used to control complex sets of parameters through intuitive one-to-many mappings.
2.5 Musical parameters
The system controls parameters on various levels as illustrated on the right-hand side of Figure 3. Examples of interesting compositional parameters range from low-level ones — pitch, onset, register, dynamic — to high-level attributes such as phrasing, ambitus, texture, redundancy.
3. CONCLUSION AND FUTURE WORK
As part of this project several musical works will be composed, and the physically enhanced CAC-environment will be used while composing these pieces. The use and performance of special physical topologies — and physical models in general — will be studied qualitatively and with respect to efficiency, and compared with alternative approaches to solving the same musical problems.
To ensure exchange of relevant results, the research will be done in conjunction with relevant research groups at the University of Oslo and elsewhere.
Results will be presented at various venues: exhibitions, concerts, performances, workshops, seminars and conferences.
4. REFERENCES
[1] C. Agon, G. Assayag, and J. Bresson, editors. The OM Composer's Book, volume 1. Delatour France/Ircam-Centre Pompidou, 2006.
[2] T. Anders, C. Anagnostopoulou, and M. Alcorn. Multiparadigm Programming in Mozart/Oz, volume 3389, chapter Strasheela: Design and Usage of a Music Composition Environment Based on the Oz Programming Model. Springer Berlin/Heidelberg, 2005.
[3] C. Cadoz. The Physical Model as Metaphor for Musical Creation. pico..TERA, a Piece Entirely Generated by a Physical Model. Proceedings of the 2002 International Computer Music Conference, 2002.
[4] N. Castagné and C. Cadoz. Creating music by means of ‘physical thinking’: The musician oriented Genesis environment. In Proc. of the 5th Int. Conference on Digital Audio Effects, DAFX-02, 2002.
[5] S. Gibet, N. Courty, and J.-F. Kamp, editors. Gesture in Human-Computer Interaction and Simulation, 6th International Gesture Workshop, GW 2005, Revised Selected Papers, volume 3881 of LNAI. Springer, Berder Island, France, May 2006.
[6] D. Griffiths. Fluxus, http://www.pawfal.org/fluxus.
[7] C. Henry. pmpd: Physical modelling for pure data, 2004.
[8] K. Karplus and A. Strong. Digital synthesis of plucked string and drum timbres. Computer Music Journal, 7(2):43–55, 1983.
[9] M. Laurson. PWConstraints. X Colloquio di Informatica Musicale, X:332–335, 1993.
[10] F. L. Lezcano. dlocsig, http://ccrma.stanford.edu/~nando/clm/dlocsig/.
[11] Open Dynamics Engine, http://www.ode.org.
[12] R. Parncutt. Modeling piano performance: Physics and cognition of a virtual pianist. ICMC Proceedings, pages 15–18, 1997.
[13] SuperCollider, http://supercollider.sourceforge.net/.
[14] R. Taube. Common Music. http://commonmusic.sourceforge.net.
[15] H. Vinet. Recent research and development at Ircam. Computer Music Journal, 23(3):9–17, 1993.
[16] A. Vinjar. Oppspent line. MIC-recording, 1994. Musical composition.
[17] M. J. Willis and M. T. Tham. Advanced process control, April 1994. Web-document.
Scoring an Interactive, Multimedia Performance Work

Margaret Schedel
Stony Brook University
3304 Staller Center
Stony Brook, NY 11794
1.415.335.7555
gem@schedel.net
Alison Rootberg
Kinesthetech Sense
5009 Woodman Ave #303
Sherman Oaks, CA 91423
1.847.209.8116
arootberg@gmail.com
Elizabeth de Martelly
Stony Brook University
3304 Staller Center
Stony Brook, NY 11794
1.630.533.0194
elizabeth.demartelly@yale.edu
ABSTRACT
The Color of Waiting is an interactive theater work with music, dance, and video which was developed at STEIM in Amsterdam and further refined at CMMAS in Morelia, Mexico, with funding from Meet the Composer. Using Max/MSP/Jitter, a cellist is able to control sound and video during the performance while performing a structured improvisation in response to the dancer's movement. In order to ensure repeated performances of The Color of Waiting, Kinesthetech Sense created the score contained in this paper. Performance is essential to the practice of time-based art as a living form, but has been complicated by the unique challenges in interpretation and re-creation posed by works incorporating technology. Creating a detailed score is one of the ways artists working with technology can combat obsolescence.
Keywords
Interactivity, Dance, Max/MSP/Jitter, Sustainability
1. INTRODUCTION
This score, with descriptions of the electronic sounds, video compositing, choreography, cello tracking, lighting, costumes and stage diagram, will enable performance of this work well into the future. A DVD with the Max/MSP/Jitter patch saved as text, the sound and video files used in the performance, and video clips from various performances is included with the score. By including screen shots of relevant sections of the Max patch in the score, we show which part of the interaction is most important for the artistic success of the work.
2. The Score
The following figures show full pages of the score. The entirety of the introductory materials (figures 1-3) is included, while due to space restrictions only excerpts from the timeline of the performance are included (figures 4-6). The introduction serves to document all elements of the piece, including the set, the lighting and the costumes. The timeline contains sketches of the choreography, performance instructions for the cellist and dancer, musical notation for the cellist including light and dance cues, and stills from the video showing brightness and placement of elements.
3. Philosophy
A live event engages the audience in a unique way, where each member contributes to shaping the event, actively participating in its realization. In this way, the artistic experience becomes a dialectical one. Importantly, the live event implicates its audience both as individuals and as a collective. As critic Nicolas Bourriaud notes, “Each particular artwork is a proposal to live in a shared world…intersubjectivity…becomes the quintessence of artistic practice.” [1] In performance, audience members engage with the artists and their creations in a collective elaboration of meaning. This component of a communal development of meaning is an essential aspect of the artistic experience. At a live event, “there is the possibility of an immediate discussion: I see and perceive, I comment, and I evolve in a unique space and time.” [2] With the diminished critical distance comes an increasing emotional involvement where the participant is immersed “in a 360-degree…unity of time and place.” [3] The live event is thus a site of encounter and exploration.
By contrast, the viewer takes on a much more passive role when experiencing an event through documentation. Instead of a shared site of artistic communion, the document “refer[s] each individual to his or her space of private consumption.” [4] The viewer cannot participate in the communal aspect of a live performance, as the documentary forces him or her to acknowledge his or her current surroundings, separating the individual from the experience while allowing only a glimpse of it. In addition, a documentary of an event lacks the dynamism of meaning one encounters at a live event, as a document is essentially a predigested, one-sided interpretation of a historical circumstance. Documentation, no matter how thorough, is unavoidably biased towards producing a certain interpretation of the event: each image presented is mediated through the critic's lens. Here, the relationship between viewer and image is one of authoritarian promotion and reception. [4] But art exists in time and space, and its reduction to mere document subtracts something essential from it, reducing it to an object that exists within the confined parameters of the viewer's screen. Bourriaud argues that artistic form can only be realized “from a meeting between two levels of reality. For [the homogeneity of a document] does not produce [art]: it produces only the visual, otherwise put, ‘looped information.’” [5] Our score, including the DVD, is not a documentation of a performance, nor is it a document to be used in performance; rather, it is a document to ensure repeated performances.
4. References
[1] Bourriaud, Nicolas. 2002. Relational Aesthetics. Transl. Simon Pleasance and Fronza Woods. Paris: Les presses du réel, 22.
[2] Ibid., 16.
[3] Popper, Frank. 2007. From Technological to Virtual Art. Cambridge, MA: The MIT Press, 181.
[4] Bourriaud, 22.
[5] Ibid., 24.
[6] Ibid., 24.
Figure 1: Page 2 of score
Demos

The following pages include the contributions that have been accepted as demos. The demo program also includes nine further demos associated with papers and posters.

Rhythmic Instruments Ensemble Simulator Generating Animation Movies Using Bluetooth Game Controller

Ayaka Endo
Department of Media Art, Tokyo Polytechnic University
1583 Iiyama Atsugi Kanagawa Japan 243-0297
+81 46 242 4111
endayak@media.t-kougei.ac.jp

Yasuo Kuhara
Department of Media Art, Tokyo Polytechnic University
1583 Iiyama Atsugi Kanagawa Japan 243-0297
+81 46 242 4111
kuha@media.t-kougei.ac.jp
ABSTRACT
We developed a rhythmic instrument ensemble simulator that generates animation using game controllers. A player's motion is transformed into MIDI musical expression data to generate sounds, and the MIDI data are in turn transformed into animation control parameters to generate movies. These animations and the music are presented as a reflection of the player's performance. Multiple players can perform as an ensemble to produce more varied patterns of animation. Our system is so easy to use that anyone can enjoy performing a fusion of music and animation.

Keywords
Wii Remote, Wireless game controller, MIDI, Max/MSP, Flash movie, Gesture music and animation.

1. INTRODUCTION
Many people want to perform music with instruments. However, playing musical instruments requires some degree of training, and as a result it is difficult to acquire the skill to perform well.

Recently, as computer technology has advanced, numerous music video games have been developed as simulators of musical instrument performance, for example Konami's Beatmania and Guitarfreaks. However, in these games players are passive: they only push buttons on controllers in time with preloaded music and cannot perform their own music. The video monitor shows only the performance data and predefined images.

In this paper, we propose a musical instrument performance simulation system that generates animation. Players use an ordinary wireless game controller, the Wii Remote, developed by Nintendo using Bluetooth technology. They play rhythmic instruments by operating the controller. The players' actions are reflected in sound data such as velocity and timing, and animation movies are generated from the same sound data. When multiple players perform as an ensemble, each player generates his or her own movie and influences the images of the others. Players can thus enjoy a performance not only through music for the ears but also through animation for the eyes.

2. METHOD
2.1 Concept
We use rhythmic instruments for the musical ensemble, including drum sets and percussion from various genres such as pop, Latin, ethnic, and techno. They have no clear melody, and compared with melodic instruments it is easy to make a sound with a simple action such as hitting. The performing action is directly related to the generated sound expression, so even beginners or children can play them in a demonstration.

We use the Wii Remote as the wireless controller; it communicates via Bluetooth and is easily connected to a computer. In past reports, Bluetooth controllers have often been used for musical performance [1][2]. The Wii Remote is readily available on the market at a reasonable price and is convenient to operate. A wireless device makes it possible to construct an unfettered environment for music performance, in which players can perform freely without bothersome wires. The Wii Remote has a three-axis acceleration sensor, so various physical motions can be detected, such as shaking, hitting, sliding, turning, and twisting. Rhythmic performance is closely related to these hand actions, which means that players make rhythmic sounds directly by handling the controller.

Our system generates animation movies synchronized in real time with the music performed by the players. The visual aspects of playing music are as important as the sound itself. Players enjoy performing music more with animation that can be controlled to produce various patterns synchronized with musical expression such as tone, velocity, and tempo. If multiple players perform as an ensemble, the animation generated by each player interacts with the others, and as a result the variety of animation increases. The interaction of the movies is interesting for all players in the ensemble because unexpected motion graphics are generated. In our system, a maximum of four players can perform together at a time.

Figure 1. System configuration (Wii Remotes connected via Bluetooth to a PC; Max/MSP: motion to MIDI; MIDI transmitted through flashserver to Flash: MIDI to movie; speakers playing music; display showing animation).

NIME08, June 4-8, 2008, Genova, Italy. Copyright remains with the author(s).
2.2 System Configuration
Our system consists of a PC, Wii Remotes, speakers, a display, Max/MSP software, and Adobe Flash Player (see Figure 1).

Players perform rhythmic music by handling Wii Remotes. The motion data of each controller are transmitted to Max/MSP, where the aka.wiiremote [3] external handles the Wii Remote. In Max/MSP, the motion data are transformed into MIDI musical expression data. Acceleration data are assigned to MIDI parameters such as note number, velocity, duration, and some control change values, which cause a MIDI sound module to generate music. The flashserver [4] software enables communication between Max/MSP and Flash. The MIDI data are transmitted to Flash and transformed into animation control parameters such as color, shape, motion, and timing. As a result, music and animation are synchronized with the players' actions.

With multiple players, the system works basically the same way except for the animation: if animations originating from two different players collide, they interact with each other and change their patterns.

2.3 Performance
For music performance, we aim at greater creativity while keeping the simplicity and popular appeal. A player uses the Wii Remote as if beating a drum to generate sounds and animation movies. The generated sound is one of the rhythmic instruments selected in advance, and a player can change instruments at any time by pushing buttons on the Wii Remote. Each button is assigned a rhythmic instrument, such as hi-hat, snare, kick, cymbal, bongo, conga, wood block, triangle, castanets, or maracas. The vertical axis of the accelerometer is used to trigger sounds with velocity, and the horizontal axis to change tones and colors. Our system can handle a maximum of four Wii Remotes at a time.

The generated animations consist of moving patterns of small geometric figures or graphics of natural objects such as leaves, flowers, stars, and lightning. As the motion gets faster, the volume of the sound and the size of the animation patterns become larger (see Figure 2). Each controller has its own color for drawing graphical patterns, for example red, yellow, green, and blue. When patterns of different colors collide, the two colors are mixed to make a new color, and the size, direction, and velocity of the patterns change. Consequently, multiple players can perform an ensemble in the animation as well.

Figure 2. Screenshot of a movie created by two players.

3. DISCUSSION
In 2006 we developed a circle canon chorus system for enjoyable ensemble singing [5]. In its demonstration, some people hesitated to sing. In contrast, our rhythmic performance system is easy to play without feeling shy. In a demonstration, everyone enjoyed playing rhythmic music and was surprised by the generated animation. Additionally, our system can connect an ordinary MIDI controller, for example a MIDI keyboard, MIDI guitar, or wind synthesizer, via the MIDI interface (see Figure 3). Therefore, we can perform a session with melody or chord instruments, and a different type of animation ensemble is expected. Some photos and movies of the demonstration are shown on our web site [6].

Figure 3. Ensemble of Wii Remote, wind synthesizer, and MIDI guitar with animation projected on a back wall.

4. CONCLUSION
Using the Wii Remote, we developed a rhythmic instrument performance system that generates animation and is so easy that everyone can enjoy it. In the near future, we will build a friendly interface for playing more varied instruments and enjoying the ensemble of sounds and images.

5. REFERENCES
[1] Bowen, A. Sound Stone: A 3-D Wireless Music Controller. In Proceedings of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), Vancouver, BC, Canada, 2005, 268-269.
[2] Hashida, T., Naemura, T., and Sato, T. A System for Improvisational Musical Expression Based on Player's Sense of Tempo. In Proceedings of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), New York, USA, 2007, 407-408.
[3] http://www.iamas.ac.jp/~aka/max/
[4] http://www.nullmedium.de/dev/flashserver/
[5] Nakamoto, M. and Kuhara, Y. Circle Canon Chorus System Used To Enjoy A Musical Ensemble Singing "Frog Round". In Proceedings of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), New York, USA, 2007, 409-410.
[6] http://www.media.t-kougei.ac.jp/wiimu/
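Sections 2.2 and 2.3 of the Wii Remote paper describe the core mapping: vertical acceleration triggers a note whose velocity (and pattern size) grows with the speed of the motion, each of up to four controllers draws in its own color, and colliding patterns mix their colors. The sketch below illustrates that mapping under our own assumptions; it is not the authors' Max/MSP patch or Flash code. The trigger threshold, the linear scaling of acceleration onto MIDI velocity 1-127, the choice of General MIDI drum notes, the pattern-size formula, and RGB averaging as "mixing" are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# General MIDI Level 1 percussion key numbers (channel 10). Which instrument
# sits on which Wii Remote button is a hypothetical choice.
GM_DRUMS = {
    "kick": 36, "snare": 38, "hi-hat": 42, "cymbal": 49,
    "bongo": 60, "conga": 63, "maracas": 70, "wood block": 76, "triangle": 81,
}

# One drawing color per controller (up to four players), as RGB tuples.
PLAYER_COLORS = [(255, 0, 0), (255, 255, 0), (0, 255, 0), (0, 0, 255)]

TRIGGER_THRESHOLD = 1.5  # assumed vertical acceleration (in g) that counts as a hit
MAX_ACCEL = 4.0          # assumed acceleration mapped to full velocity


@dataclass
class NoteOn:
    note: int
    velocity: int  # MIDI velocity, 1..127


def accel_to_note(instrument: str, vertical_g: float) -> Optional[NoteOn]:
    """Map a vertical acceleration peak to a MIDI note-on, or None if too weak."""
    magnitude = abs(vertical_g)
    if magnitude < TRIGGER_THRESHOLD:
        return None
    # Linear scale: threshold -> velocity 1, MAX_ACCEL (or more) -> velocity 127.
    span = MAX_ACCEL - TRIGGER_THRESHOLD
    fraction = min((magnitude - TRIGGER_THRESHOLD) / span, 1.0)
    velocity = 1 + round(fraction * 126)
    return NoteOn(GM_DRUMS[instrument], velocity)


def pattern_size(velocity: int, base: float = 10.0) -> float:
    """Faster motion -> higher velocity -> larger animation pattern."""
    return base * (1 + velocity / 127)


def mix_colors(a: Tuple[int, int, int], b: Tuple[int, int, int]) -> Tuple[int, int, int]:
    """Mix two colliding patterns' colors by averaging their RGB components."""
    return tuple((x + y) // 2 for x, y in zip(a, b))


if __name__ == "__main__":
    print(accel_to_note("snare", 0.8))  # too weak to trigger, prints None
    print(accel_to_note("snare", 3.0))
    print(mix_colors(PLAYER_COLORS[0], PLAYER_COLORS[3]))
```

Keeping the note trigger and the pattern size driven by the same velocity value is one simple way to get the synchronization of sound and image that the paper describes.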
Stage-Worthy Sensor Bows for Stringed Instruments

Keith A. McMillen
BEAM Foundation
970 Miller Ave
Berkeley, CA 94708
1 510 502 5310
keith@beamfoundation.org
ABSTRACT
This demonstration presents a series of properly weighted and balanced Bluetooth sensor bows for violin, viola, cello, and bass.

Keywords
Sensor bow, stringed instruments, Bluetooth

1. INTRODUCTION
Reliable, practical, stage-worthy sensor bows for the string family have not been available to experimenters, composers, and performers. Here a complete series of bows for stringed instruments is presented. The bows provide hair tension, grip pressure, X-Y-Z acceleration, relative X-Y position with respect to the bridge, and the tilt or twist of the violin bow. A Max-based application using custom RFCOMM objects provides processed gesture outputs as well as a two-axis trainable gesture extractor.

2. DESCRIPTION OF THE SENSOR BOW
2.1 Stick assembly
The Sensor Bow (named K-Bow) uses a specially designed and built Kevlar and carbon graphite stick, made 10 grams lighter than a normal stick so that the final weight and balance are those expected of a fine bow. Embedded within the stick are two full-length loop antennas placed at right angles to each other. These antennas are used to determine bridge-fingerboard placement and the twist of the bow with respect to the instrument top. Cavities and connectors within the stick gather signals and convey them to the circuit board within the frog.

2.2 Grip Sensor
A cylindrical pressure sensor made of a five-layer “sandwich” of conductive materials replaces the usual grip. Its resistance changes in relation to the pressure and total surface area of the musician’s grip. The sensor output is fed to a 12-bit ADC before it is transmitted to the host. Repeatability and return to zero are very reliable.

2.3 Bow Hair Tension
Many have tried to measure the pressure of the bow hair on the strings by measuring flexing of the stick [1]. While this provides a usable signal, it is inherently prone to damage from exposure to outside forces. In the K-Bow, a special angular measurement scheme is attached to the bow hair at the frog end. After the bow is brought up to tension, the sensor is auto-calibrated upon power-up. This provides a robust, rapid signal with over 40 dB of dynamic range. Analog filtering was required to remove string audio picked up by this sensor.

2.4 Accelerometer
An integrated 3-axis accelerometer is located within the frog assembly. Its sensitivity can be set by the user from the host computer. The signal is low-pass filtered at 160 Hz, and updated readings can be retrieved at intervals as short as 1.7 ms.

2.5 Signals relative to the instrument
All the sensors mentioned so far operate in free space, with no sense of position relative to the instrument. While they are necessary and useful, knowing the bow’s position relative to the string and bridge is essential for fully understanding the performer’s gestures and intentions.

A small emitter PCB clips under the end of the fingerboard of any traditional instrument and most electric bowed instruments. The emitter creates an RF field and a wide-field cone of modulated IR (infrared) light.

The loop antennas within the stick are tuned to be resonant with the RF field. Further analog signal processing provides an accurate signal-strength reading, which is converted and used as a bow-to-bridge distance measurement.

The quadrature relationship between the two loop antennas provides a twist signal that depends on the rotation of the bow stick relative to the emitter. This is timed and converted with 1% accuracy per 180 degrees of rotation.

Directly beneath the bow hair sensor on the front of the frog is an IR photodetector. The detector receives a modulated signal from an array of LEDs that emerge from beneath the instrument’s fingerboard. Decoded signals from the IR detector represent the distance of the frog from the fingerboard. These are processed in the analog domain before being presented to the 12-bit ADC.

2.6 The board within the frog
Housing all of the circuitry in a frog not much larger than a traditional bass frog was challenging. The board itself forms the major structural element, fastening the hair to the stick through a frog adaptor. The frog is fully adjustable, providing the normal range of hair tension.

The circuitry includes 20 op-amps, two CPUs (an ARM7 and a Silicon Laboratories F411), the Bluetooth transceiver, the accelerometer, and extensive power management systems.
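Section 2.5 of the K-Bow paper says only that the quadrature relationship between the two perpendicular loop antennas yields a twist signal, which the bow times and converts. As a hedged illustration of what a quadrature pair encodes, and not the K-Bow's timing-based circuit, the sketch below assumes two signed, demodulated antenna amplitudes proportional to cos θ and sin θ of the stick's rotation θ and recovers the angle with atan2. Combining both loops into one field-strength value with hypot is likewise our assumption; the paper describes a signal-strength reading but not how it is formed. The function names are hypothetical. The last constant simply restates the quoted accuracy: 1% of 180 degrees is 1.8 degrees.

```python
import math


def twist_angle_deg(loop_a: float, loop_b: float) -> float:
    """Recover rotation in degrees, in (-180, 180], from two quadrature amplitudes.

    Assumes loop_a ~ r*cos(theta) and loop_b ~ r*sin(theta) for an unknown,
    positive field strength r. atan2 cancels r, so the angle estimate does not
    depend on how far the bow is from the emitter.
    """
    if loop_a == 0.0 and loop_b == 0.0:
        raise ValueError("no signal: angle is undefined")
    return math.degrees(math.atan2(loop_b, loop_a))


def field_strength(loop_a: float, loop_b: float) -> float:
    """Combined amplitude r of the two loops, independent of the rotation angle."""
    return math.hypot(loop_a, loop_b)


# 1% of a 180-degree span, the accuracy figure quoted for the twist signal.
TWIST_RESOLUTION_DEG = 0.01 * 180.0

if __name__ == "__main__":
    theta = math.radians(30.0)
    r = 0.4  # arbitrary field strength
    a, b = r * math.cos(theta), r * math.sin(theta)
    print(twist_angle_deg(a, b), field_strength(a, b))
```

Because the angle depends only on the ratio of the two loop signals, a model like this separates twist from distance, which is one reason a right-angle antenna pair is attractive.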
A 6-gram lithium polymer battery provides a full day’s use. The battery is charged through a standard USB connector, which also provides for firmware updates.

Using different frog adaptors and hair mounting brackets allows the same circuitry and housing to be used for violin, viola, cello, and bass bows.

Monitoring the accelerometer’s activity allows the CPU to determine bow activity and power down unused circuits to conserve power. A user-settable “Off Interval” turns the entire bow off when no bow motion has been detected for longer than this interval. Toggling the power switch on the bow allows the bow to be automatically discovered and routed to its previous application address. These states are forwarded to the emitter under the fingerboard so that it can follow similar power management rules.

Connectivity is via Bluetooth 1.2 Class 2 devices. Normal line-of-sight range is greater than 10 meters. Data can be updated at intervals as short as 1.6 ms for a single bow. Up to seven bows can be supported by one host computer.

Figure 1. Degrees of Bow Sensor Data

3. Host Software
A host software program accompanies the bow. Written in Max/MSP, the host application can be easily modified for extended functionality beyond that provided. This program provides user-settable sensitivity options and a calibration routine. Triggers are extracted from inflection points in the bow data. Data smoothing and sensor blending provide fluid, useful data for continuous control functions.

Programmable signal processing for violin audio provides a wide range of timbres for user selection. A four-track “Looper” is integrated into the application, with controls tightly coupled to the bow’s capabilities.

Included in the application is a 2D OCR MXJ object. This can be trained from any two outputs of the bow. One use is to map X and Y position into a trained object that recognizes letters written in the air, for control of recording functions or preset selection.

A custom Bluetooth object for the bow interfaces the RFCOMM layer directly to Max/MSP. Bows can be named for easy recognition when the host OS presents device lists.

4. ACKNOWLEDGMENTS
Our thanks to Ashley Adams, Don Buchla, Dawson Bauman, Chuck Carlson, Joel Davel, Jeff Van Fossen, David Hishinuma, Marriele Jakobsons, Dan Maloney, Denis Saputelli and Barry Threw for their contributions to the project.

5. REFERENCES
[1] Young, Diana S. New Frontiers of Expression Through Real-Time Dynamics Measurement of Violin Bows. Master's Thesis, MIT, September 2001.
[2] Rasamimanana, Nicolas H. Gesture Analysis of Bow Strokes Using an Augmented Violin. Technical Report, IRCAM, June 2004.
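Section 3 of the K-Bow paper states that triggers are extracted from inflection points in the bow data and that smoothing provides fluid data for continuous control, but the Max/MSP implementation is not given. The sketch below is one possible reading, not the K-Bow host software: it assumes an exponential moving average for smoothing (the coefficient `alpha` is a hypothetical parameter) and takes "inflection point" in its mathematical sense, a sign change of the discrete second difference of the signal. A real patch might instead trigger on bow-direction reversals or add hysteresis.

```python
from typing import List, Optional, Sequence


def smooth(samples: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponential moving average: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    y: Optional[float] = None
    for x in samples:
        # The first sample seeds the filter so the output starts on the data.
        y = x if y is None else y + alpha * (x - y)
        out.append(y)
    return out


def inflection_triggers(samples: Sequence[float],
                        min_curvature: float = 1e-9) -> List[int]:
    """Return indices where the discrete second difference changes sign.

    d2[i] = s[i+1] - 2*s[i] + s[i-1] estimates curvature at sample i. A trigger
    is reported at the first index whose curvature has the opposite sign to the
    last non-negligible curvature; |d2| < min_curvature is treated as zero so
    straight stretches of data do not fire.
    """
    triggers: List[int] = []
    prev_sign = 0
    for i in range(1, len(samples) - 1):
        d2 = samples[i + 1] - 2 * samples[i] + samples[i - 1]
        if abs(d2) < min_curvature:
            continue
        sign = 1 if d2 > 0 else -1
        if prev_sign != 0 and sign != prev_sign:
            triggers.append(i)
        prev_sign = sign
    return triggers


if __name__ == "__main__":
    # A cubic has one inflection point, at sample index 5 here, where d2 is
    # exactly zero; the trigger lands on index 6, the first sample whose
    # curvature has the new sign.
    raw = [(i - 5) ** 3 for i in range(11)]
    print(inflection_triggers(smooth(raw, alpha=1.0)))  # prints [6]
```

Smoothing before differencing matters in practice: a second difference amplifies sensor noise, so without a filter such as `smooth` almost every sample could produce a spurious sign change.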
Plink Jet

Lesley Flanigan
Interactive Telecommunications Program
New York University
721 Broadway 4/F
New York, NY, USA
+1 212 998 1894
lesleyflanigan@gmail.com

Andrew Doro
Interactive Telecommunications Program
New York University
721 Broadway 4/F
New York, NY, USA
+1 212 998 1894
andy@sheepish.org
ABSTRACT
Plink Jet is a robotic musical instrument made from scavenged inkjet printers and guitar parts. We investigate the expressive capabilities of everyday machine technology by re-contextualizing the relatively high-tech mechanisms of typical office debris into an electro-acoustic musical instrument. We also explore the performative relationship between human and machine.

Keywords
Interaction Design, Repurposing of Consumer Technology, DIY, Performing Technology, Robotics, Automation, Infra-Instrument

1. INTRODUCTION
Plink Jet is a robotic musical instrument made from scavenged inkjet printers. The mechanical parts of four inkjet printers are diverted from their original function, re-contextualizing the relatively high-tech mechanisms of typical office debris into musical performance. Motorized, sliding ink cartridges and plucking mechanisms play four guitar strings, manipulating both pitch and strumming patterns in a way that mimics human hands fingering, fretting, and strumming a guitar. Plink Jet is designed to play in three modes: automatic (played by a micro-controller), manual (played by a musician), or a combination of both. A musician can choose varying levels of manual control over the different cartridges (fretting) and string plucking speeds (strumming) while improvising with Plink Jet’s preprogrammed sequences.

2. INTERFACE
Plink Jet is designed to play guitar strings both manually and automatically. The interface consists of four toggle switches, four three-way switches, four dials, a single six-position rotary switch, and a single power switch. Each toggle switch and three-way switch is associated with a single ink carriage. The rotary switch allows the user to select different preprogrammed patterns while a carriage is under automatic control.

2.1 Fretting
The guitar strings are strung across the printer mechanism where the optical sensor used to be. Cartridges slide up and down the strings and touch them just enough to change the pitch, similar to a slide guitar. The farther the cartridge is from the plucking mechanism, the lower the pitch of the note.

Each carriage is controlled by a toggle switch and a three-way switch. The toggle switch controls whether the associated inkjet carriage is under manual or automatic control. While under manual control, the back-and-forth motion of each carriage is controlled by the three-way switch. While under automatic control, the carriage is controlled by a micro-controller containing programmed patterns of movement.

2.2 Strumming
The guitar strings are plucked by motors fitted with a single thin metal strip that strikes the string as it rotates. Four dials control the speed of the strumming motors. Control over the strumming motors exists regardless of whether the associated carriage is under manual or automatic control.

2.3 Amplification
Inside each ink cartridge is a piezoelectric microphone used to pick up the sound of the plucked guitar string as well as the ambient sounds of the sliding cartridge. In many ways Plink Jet is an elaborate guitar, and like an electric guitar it has a single quarter-inch output jack that allows it to be connected directly to a guitar amplifier.

3. TECHNOLOGY
The printer carriages and motors are from four inkjet printers. The controlling circuits and electronics are custom-designed. The optical encoder of each inkjet printer has been removed and replaced with a tunable guitar string that uses actual guitar tuning mechanisms built into the machine.

3.1 Circuitry
While under manual control, Plink Jet’s circuitry is completely analog. The only digital element is the micro-controller used in automatic mode.

3.1.1 DC Motors
A DC motor connected to an H-bridge chip controls the back-and-forth movement of each carriage. In manual mode, the three-way switch drives the H-bridge with 5 VDC. In automatic mode, the H-bridge is under the control of the micro-controller.

3.1.2 Stepper Motors
The strumming mechanism is driven by stepper motors normally used for the docking procedure of the ink carriages. Each dial is attached to a potentiometer that controls the speed by changing the voltages on an oscillator chip. The oscillator signals are connected to a hex divider chip that acts as a stepper driver. The stepper signals are then relayed through a Darlington array before triggering the stepper motors.
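Section 2.1 of the Plink Jet paper says that the farther the cartridge is from the plucking mechanism, the lower the note, as with a slide guitar. Under the textbook ideal-string model (our assumption; the paper gives no formula), the fundamental of a vibrating length L is f = (1/2L)·sqrt(T/μ), so at fixed tension T and mass per unit length μ the pitch is inversely proportional to the length between the string's fixed end on the plucking side and the cartridge's contact point. The sketch below uses that relation to predict pitch from position and to place a cartridge a given number of equal-tempered semitones above the open string. Treating the contact as a perfect node, and the 300 mm length and 110 Hz tuning in the example, are hypothetical.

```python
import math


def pitch_at(length_mm: float, open_length_mm: float, open_freq_hz: float) -> float:
    """Ideal-string pitch for a vibrating length, given the open-string tuning.

    f is proportional to 1/L at fixed tension and mass per unit length, so
    f(L) = f_open * L_open / L. The cartridge contact is treated as a perfect node.
    """
    if not 0 < length_mm <= open_length_mm:
        raise ValueError("length must be in (0, open_length_mm]")
    return open_freq_hz * open_length_mm / length_mm


def position_for_semitones(semitones: float, open_length_mm: float) -> float:
    """Vibrating length (mm from the plucking-side end) for an equal-tempered interval."""
    return open_length_mm * 2.0 ** (-semitones / 12.0)


def semitones_above_open(length_mm: float, open_length_mm: float) -> float:
    """Inverse of position_for_semitones: 12 * log2(L_open / L)."""
    return 12.0 * math.log2(open_length_mm / length_mm)


if __name__ == "__main__":
    OPEN_MM, OPEN_HZ = 300.0, 110.0  # hypothetical string length and tuning
    for n in (0, 5, 7, 12):
        length = position_for_semitones(n, OPEN_MM)
        hz = pitch_at(length, OPEN_MM, OPEN_HZ)
        print(f"{n:2d} semitones: {length:6.1f} mm -> {hz:6.1f} Hz")
```

Halving the vibrating length raises the note by an octave, and the semitone positions crowd together as the cartridge moves toward the plucking end, which is why a moving slide needs finer position control for high notes than for low ones.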
3.1.3 Micro-controller
Plink Jet uses an ATMEGA168 chip containing six preprogrammed patterns to control the fretting when a carriage is in automatic mode. A six-position rotary switch selects which pattern to use. When a carriage is in automatic mode, the ATMEGA controls the associated motor’s H-bridge.

Figure 1. Plink Jet at the ITP Winter Show 2007

4. EVERYDAY MACHINES AND MUSIC
The repurposing of consumer technology is a growing trend among artists and technologists in the DIY genre exploring circuit bending, hardware hacking, and retro-engineering [6]. Artists who have used the mechanics of printers to produce sound include Paul Slocum, with his dot matrix printer, and Eric Singer, with his printer/scanner-inspired musical instrument GuitarBot. The innovative American composer Harry Partch built many of his instruments out of trash and his own carpentry. Plink Jet emerged from the process of hardware hacking and could be considered an infra-instrument, a concept developed by John Bowers and Phil Archer. Infra-instruments are often created by taking a non-instrument and finding the instrument within it [1]. With Plink Jet, we have found the infra-instrument within the inkjet printer.

Machines are built to be reliable. When they no longer function consistently, we quickly upgrade to new machines and readily discard the old. But usually these discarded machines are not dead. Their mechanisms still function, though perhaps not as originally intended. Taking apart these discarded machines is an opportunity to appreciate them for what they are, as opposed to what they are intended for, such as appreciating a printer for its mechanics rather than its ability to print. Combining the parts of these machines with parts of musical instruments is a way to see their operative similarities and to learn how they work through sound.

Inside an ordinary inkjet printer are the same toy-like, clockwork mechanisms that have delighted people and sparked imaginations for centuries. When we made Plink Jet, we took these mechanisms and combined them with a guitar. Now we not only see the back-and-forth motion of the inkjet cartridge, we hear it. Adding a guitar string highlights the design structure inherent in a printer by relating pitch and rhythm directly to its mechanics. The mechanical relationship between human fingers fretting a guitar string and an inkjet cartridge riding an optical sensor is heard in a musical scale. We did not take apart printers to make a guitar. Plink Jet is a hybrid: both printer and guitar are heard. The pickups amplify not only the guitar strings but also how the guitar strings sound within a printer. The ticks, clicks, and hums of the printer mechanisms are amplified as expressive sounds, like slapping the neck of a guitar or rubbing its strings. There is respect for the original sound of the printer, and it remains present in the new invention.

Plink Jet is part of a long tradition of re-appropriating office technology to create music. As early as the 1970s, computer technicians learned how to use the early IBM 1400 mainframe computer series to generate music, “… a purpose for which this business machine was not at all designed. The method was simple. The computer's memory emitted strong electromagnetic waves and by programming the memory in a certain way and by placing a radio receiver next to it, melodies could be coaxed out - captured by the receiver as a delicate, melancholy sine-wave tone.” [3] Technicians also learned to use the IBM 1403 printer under the control of an IBM 1401:

Clever engineers figured out what line of characters to print to make a noise at a given pitch, and how many times to print that line repeatedly to sustain that pitch for a given duration. In other words, the printer could play musical notes. All that was needed was a program for the IBM 1401 computer system that read in a deck of punched cards, each card containing a single note of melody, and then played the melody on the printer. The tempo could be adjusted using the sense switches on the computer console. [2]

Future iterations of Plink Jet could include more printers, more precise levels of user control, different stringed instruments, greater levels of automated control, and more precise tuning and plucking mechanisms. But decisions concerning the number of printers used or the use of guitar strings rather than violin strings are ultimately irrelevant to the nature of Plink Jet. Plink Jet is an experiment, a way for us to learn and have fun with mechanics through sound. It is not yet a fine-tuned and finessed robotic instrument; it is an illustration of deconstruction and reconstruction in process, alive with imperfections. It will be interesting to see how future versions of Plink Jet can change our relationship to the original machine. The different sounds we are able to coax out of more printers could inspire new observations on how machines work and teach us more about what we desire in their performance.
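Sections 3.1.1 and 3.1.3 of the Plink Jet paper describe the carriage control path: in manual mode the three-way switch drives a carriage's H-bridge directly, and in automatic mode the ATMEGA168 drives it from one of six stored patterns chosen by the rotary switch. The sketch below models that selection logic in Python rather than AVR firmware. The pattern contents, the step durations, and the two-input (IN1/IN2) H-bridge convention with both inputs low meaning "stop" are all hypothetical, since the paper gives neither the patterns nor the H-bridge part.

```python
from enum import Enum
from typing import Dict, List, Tuple


class Direction(Enum):
    STOP = 0
    FORWARD = 1
    REVERSE = 2


# Assumed two-input H-bridge convention: (IN1, IN2) logic levels per direction.
HBRIDGE_INPUTS: Dict[Direction, Tuple[int, int]] = {
    Direction.STOP: (0, 0),
    Direction.FORWARD: (1, 0),
    Direction.REVERSE: (0, 1),
}

# Six hypothetical fretting patterns, one per rotary-switch position.
# Each step is (direction, duration in milliseconds); patterns loop forever.
Pattern = List[Tuple[Direction, int]]
PATTERNS: List[Pattern] = [
    [(Direction.FORWARD, 200), (Direction.REVERSE, 200)],
    [(Direction.FORWARD, 100), (Direction.STOP, 100), (Direction.REVERSE, 100)],
    [(Direction.FORWARD, 400), (Direction.REVERSE, 400)],
    [(Direction.FORWARD, 50)] * 4 + [(Direction.REVERSE, 200)],
    [(Direction.STOP, 300), (Direction.FORWARD, 150), (Direction.REVERSE, 150)],
    [(Direction.FORWARD, 250), (Direction.STOP, 50), (Direction.REVERSE, 250)],
]


def carriage_inputs(automatic: bool, three_way: Direction,
                    rotary_position: int, t_ms: int) -> Tuple[int, int]:
    """H-bridge inputs for one carriage at time t_ms.

    Manual mode passes the three-way switch straight through; automatic mode
    plays the pattern selected by the rotary switch in a loop.
    """
    if not automatic:
        return HBRIDGE_INPUTS[three_way]
    if not 0 <= rotary_position < len(PATTERNS):
        raise ValueError("rotary switch has six positions (0-5)")
    pattern = PATTERNS[rotary_position]
    period = sum(duration for _, duration in pattern)
    t = t_ms % period
    # Walk the steps until the one containing time t within the period.
    for direction, duration in pattern:
        if t < duration:
            return HBRIDGE_INPUTS[direction]
        t -= duration
    raise AssertionError("unreachable: t always falls inside the period")
```

Keeping the toggle switch as the only thing that decides between the two sources mirrors the paper's description that the three-way switch and the micro-controller never drive the same H-bridge at once.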
5. STRUCTURE AND IMPROVISATION
Intuition plays a powerful role in how and why people perform. A human listens to his or her performance and is able to react and make constant changes. A machine does not have this self-awareness; it simply follows prepared instructions. We do not want our machines to improvise. We want a printer to function as a printer, printing exactly what we want when we want it. If it does not do what we expect, it becomes useless to us. With Plink Jet, human improvisation plays with the machine and transforms the predictable function of a printer into a unique and irreproducible performance.

A musician playing Plink Jet is like a pianist playing a player piano. Two performance operations occur simultaneously: there are the programmed, ordered movements of the machine itself, and there are the improvised decisions of the user regarding levels of automatic and manual control and his or her reactions to the precise mechanical patterns. The player piano is one of the first examples of an automatic, mechanically played musical instrument, but early player piano rolls lacked expressiveness when played because they were created by hand directly from the music score [4]. Electronic or machine music often evokes very different emotions from human-performed music because of its super-human precision. The combination of these two musical aesthetics (prepared and improvised, machine and human) expresses a tension in our relationship with machines. Reflecting upon the interplay between a mechanical presence and a human player, Eric Singer of LEMUR has said, “I believe it is an entirely new experience for the human players. The robots create a physical, responsive presence (unlike synthesizers) which can profoundly affect the humans interacting with them. Because they move as well as sound, they take on a personality of sorts, and inspire the human players in a unique way.” [5] The numerous options for playing Plink Jet between manual and automatic control open a dialog between the player of Plink Jet and the robotics of the mechanisms themselves, and a performance broadcasts this dialog between machine structure and human improvisation.

6. ACKNOWLEDGMENTS
We would like to thank Danny Rozin, Todd Holoubek, Tom Igoe, Gideon D’Arcangelo, and Gian Pablo Villamil.

7. REFERENCES
[1] Bowers, John, and Archer, Phil. Not Hyper, Not Meta, Not Cyber but Infra-Instruments. <http://hct.ece.ubc.ca/nime/2005/proc/nime2005_005.pdf>.
[2] Computer History Museum. <http://www.computerhistory.org/exhibits/>.
[3] Jóhannsson, Jóhann. IBM 1401, A User's Manual. <http://www.ausersmanual.com/data/>.
[4] Kapur, Ajay. A History of Robotic Musical Instruments. <http://mistic.ece.uvic.ca/publications/2005_icmc_robot.pdf>.
[5] Lotti, Giulio. LEMUR: League of Electronic Musical Urban Robots. <http://www.simultaneita.net/lemur2.html>.
[6] Ramocki, Marcin. DIY: The Militant Embrace of Technology. <http://ramocki.net/ramocki-diy.pdf>.
Oto-Shigure: An Umbrella-Shaped Sound Generator for Musical Expression

Yusuke Kamiyama
Faculty of Environment and Information Studies, Keio University
5322 Endo, Fujisawa-shi, Kanagawa 252-8520 Japan
+81-80-5549-5315
t04257yk@sfc.keio.ac.jp

Mai Tanaka
Faculty of Environment and Information Studies, Keio University
5322 Endo, Fujisawa-shi, Kanagawa 252-8520 Japan
+81-80-5189-8263
t06581mt@sfc.keio.ac.jp

Hiroya Tanaka
Faculty of Environment and Information Studies, Keio University
5322 Endo, Fujisawa-shi, Kanagawa 252-8520 Japan
+81-90-1954-5355
htanaka@sfc.keio.ac.jp
ABSTRACT
In this paper we introduce Oto-Shigure, an umbrella-shaped sound generator for musical expression. The music computing community has focused its attention on sound input rather than sound output. However, we argue that in order to make highly expressive sounds, it is important to increase the methods available for sound output. We therefore developed a new sound-generating device for a variety of musical expressions. Oto-Shigure provides two experiences for the user: feeling the sounds as if bathing in the rain, and easily arranging the rotation of sounds above the umbrella by controlling the sound localization. The device is intended not just for musicians but also for non-musicians.

Keywords
umbrella, musical expression, sound generating device, 3D sound system, sound-field arrangement.

1. INTRODUCTION
In the last few years, research on input interfaces for musical expression and on sound generation has been an active area, along with progress in computer technology. Bog [1] is an alien-shaped instrument with a grasping interface; its sound, produced by formant synthesis, is generated from speakers. Articulated Paint [2] controls various musical expressions through painting with conductor-like movements. These methods of musical expression could not have been invented without progress in sensing technology and increases in CPU speed.
There are various works on sound input systems and sound generation, but most of them still use general speakers, headphones, and earphones as their sound output systems. Many diverse sensors and sound synthesis algorithms were created to extend musical expression, and new methods for musical expression have been provided one after another. However, to generate a wider variety of musical expressions, we considered it necessary to use novel sound output systems. We therefore developed a totally new sound-generating interface for musical expression.
The following are the advantages and disadvantages of the two sound-generating devices that have mainly been used.
(1) Speakers have the advantage of generating sound loudly and widely, so many people can enjoy it simultaneously. However, they are difficult to carry around and unsuitable for use with a few people in a public space.
(2) Headphones have the advantage that they can be used in any space. However, they can be used by only one person.
Based on these considerations, we developed Oto-Shigure, which resolves the problems above. Oto-Shigure is a portable and wireless interface that makes it possible for the user to generate sounds anywhere. Additionally, it can be used not only by one person but also by a few people together. The maximum volume is obtained right under the umbrella and people in the distance cannot hear the sound, so the user can also use it in public areas.
When Oto-Shigure is used on its own, its sound is non-localized. But if it is connected to a PC and the signal is specially processed with software, it can produce sound from an arbitrary point on the umbrella. The point is set by controlling the software running on an iPod touch.
Figure 1. Using Oto-Shigure
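The paper states only that the Max/MSP software splits the signal into four equal copies and sets the volume and phase of each from the assigned point. The following is a minimal sketch of one possible volume law for placing a sound at a point on the umbrella rim, assuming four motors at rib tips 90 degrees apart and a pairwise constant-power panning law; the motor angles, the panning law and the function names are assumptions, not the authors' patch, and phase control is omitted.

```python
import math

# Assumed layout: four vibration motors at the rib tips, 90 degrees apart,
# angles measured counter-clockwise from the x axis (radians).
MOTOR_ANGLES = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]


def motor_gains(azimuth):
    """Return four gains that place a source at `azimuth` on the umbrella rim.

    Assumed pairwise constant-power law: only the two motors adjacent to the
    azimuth are active, and their squared gains sum to 1. Phase is not modelled.
    """
    sector = 2 * math.pi / len(MOTOR_ANGLES)
    a = azimuth % (2 * math.pi)
    i = int(a // sector) % 4          # motor at or just before the azimuth
    j = (i + 1) % 4                   # next motor counter-clockwise
    frac = (a - i * sector) / sector  # 0 at motor i, approaching 1 at motor j
    gains = [0.0] * 4
    gains[i] = math.cos(frac * math.pi / 2)
    gains[j] = math.sin(frac * math.pi / 2)
    return gains


def clockwise_path(steps):
    """Azimuths for one clockwise turn, e.g. from a circle drawn on a touch screen."""
    return [-2 * math.pi * k / steps for k in range(steps)]


for az in clockwise_path(8):
    print([round(g, 3) for g in motor_gains(az)])
```

Feeding successive azimuths from `clockwise_path` moves the gain peak from motor to motor, which is one way the "draw a circle, hear it rotate" interaction could be realised.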
2. RELATED WORKS
Various works using umbrellas have been presented. Amagatana [3] is an umbrella-shaped portable device; swinging it generates the sound of a sword in the headphones. Rain Dance [4] is an installation using umbrellas. When a user holding an open umbrella passes through the
shower carrying the audio vibration, the umbrella interrupts the shower and a sound is generated.

3. IMPLEMENTATION
3.1 Technique for Generating Sound
The system presented in this work is based on an original sound-generating system (Figure 2). Oto-Shigure is equipped with vibration motors instead of speakers. The vibration motors are attached to each of the four tips of the umbrella ribs, and they generate sounds by vibrating the whole umbrella cloth. The audio signal from the line input is amplified 100 to 200 times by an audio power amplifier (LM386) (Figure 3). The amplified audio signal is transmitted to the vibration motors, which make the ribs and cloth resonate along with the surrounding air.
A typical sound-generating device, such as a speaker, produces a point source of sound, while our system produces an almost planar source. This is because the umbrella cloth vibrates as a plane and makes the air resonate. The result is non-localized sound that gives the user under the umbrella a quite new experience, like bathing in the sound of rain. Additionally, while umbrellas are generally made of cloth, vinyl, and metal, we used a traditional Japanese umbrella made of Japanese paper and bamboo. Owing to the high sound transmission of these materials, a superior sound characteristic and a high sound volume were achieved.
Figure 2. Two Systems of Oto-Shigure

3.2 3D Sound
To generate 3D sounds, the audio signal is processed by computer software. The processed signal goes through an external audio interface and is reproduced by Oto-Shigure. Specifically, the software splits the input audio signal into four equal signals and controls the volume and phase of each signal based on the assigned point. This software was developed in Max/MSP.
Through this processing, Oto-Shigure makes various musical expressions possible, such as locating the sound on the surface of the umbrella and generating surround sound similar to a 5.1-channel surround system. Also, an original interface running on an iPod touch lets the user control the sound area intuitively in real time without complicated manipulation. For example, when the user draws a circle clockwise on the iPod touch, the sound and its effect are heard rotating clockwise above the user's head.
Figure 3. Printed circuit board of amplifier

4. CONCLUSION AND FUTURE WORK
In this paper, we described a new sound-generating interface that enables the user to develop various musical expressions. There are two ways to generate musical expressions with Oto-Shigure. The first is generating non-localized, airy sounds without any cords; the user can feel as if the sounds of falling rain are encompassing their whole body. The second is generating a 3D sound space under the umbrella by connecting it to a PC and controlling it through the interface on the iPod touch. This is a novel sound output system that provides two quite different kinds of sound. In future work, we will create interactive content that can be used by multiple users.

5. REFERENCES
[1] Takahashi, Bog: Instrumental Aliens, In Proc. of the 2007 Conference on New Interfaces for Musical Expression (NIME2007), 429, 2007.
[2] André Knörig, Boris Müller, Reto Wettach, Articulated Paint: Musical Expression for Non-Musicians, In Proc. of the 2007 Conference on New Interfaces for Musical Expression (NIME2007), 384-385, 2007.
[3] Yuichiro Katsumoto, Masa Inakage, "Amagatana", Ars Electronica 2007 Pixelspace, Linz, Austria, 5-11 September 2007.
[4] Paul De Marinis, Rain Dance, http://www.well.com/~demarini/exhibitions.htm
Video Based Recognition of Hand Gestures by Neural Networks for the Control of Sound and Music
Paul Modler
University of Media, Arts and Design
Department of Media Art/Sound
76135 Karlsruhe, Germany
pmodler@hfg-karlsruhe.de

Tony Myatt
University of York
Department of Music
York YO10 5DD, United Kingdom
am12@york.ac.uk
ABSTRACT
In recent years, video-based analysis of human motion has gained increased interest, which is largely due to the ongoing rapid development of computer and camera hardware, such as increased CPU power, fast and modular interfaces and high-quality image digitisation. An equally important role is played by the development of powerful approaches for the analysis of visual data from video sources. In computer music, this development is reflected by a range of applications that analyse video and image data for gestural control of music and sound, such as EyesWeb, Jitter and CV ([1], [2], [3]). Recognition and interpretation of hand movements is of great interest in both music and software engineering ([4], [5], [6]). In this demo, an approach is presented for the control of music and sound parameters through hand gestures, which are recognised by an artificial neural network (ANN). The recognition network was trained with appearance-based features extracted from image sequences of a video camera.

1. A SET OF CYCLIC HAND GESTURES
Previous experiments showed that hand gestures may be combined as cyclic gestures, such as waving the hand or pointing to the left and to the right with the index finger [7]. For this, the motion of a gesture is grouped into at least two main states performed in a repetitive way. The whole gesture may then be seen as a progression through a cyclic state model, with the aim of viewing the gesture not as an isolated event but in the context of the gesture and related motions.

Gesture                 Description                                  Short names of main states
Index up/down           Index finger moves up and down               indUp, indDo
Index left/right        Index finger moves left and right            indLe, indRi
Cut up/down             Flat hand moves up and down                  cutUp, cutDo
Cut left/right          Flat hand moves left and right               cutLe, cutRi
Horizontal open/close   Hand with horizontal back opens and closes   horOp, horCl
Vertical open/close     Hand with vertical back opens and closes     verOp, verCl
Croco open/close        Hand with thumb opens and closes             corOp, corCl
Swing open/close        Hand turns and opens, turns and closes       swiOp, swiCl
Table 1: A set of cyclic gestures of the left hand

2. VARIATION OF GESTURE INSTANCES
Each gesture was recorded at 3 lower positions and 2 upper positions of the gestural space of the hand and arm, to obtain data reflecting the variance of hand articulation at different locations. All 5 recording instances were intended to lie in a plane parallel to the front of the camera. Blended hand positions for the static states of four cyclic gesture types are shown in Figure 1 to Figure 4.
Figure 1: Horizontal open/close
Figure 2: Vertical open/close
Figure 3: Croco open/close
Figure 4: Swing open/close

3. TIME DELAY NEURAL NETWORKS
Time Delay Neural Networks (TDNN) are feed-forward networks that incorporate the learning of time series through a series of data windows (delays) shifting in time over the data series. An exemplary TDNN would consist of 4x4 input units and 4 input delay frames. To apply such a TDNN to image features, larger input frames were used, i.e. 1024 or 256 input units, with a hidden layer of 50 units ([8]). The number of output units was in the range of 24 to 37, similar to the shown
example. In our approach, each output unit was associated with a certain state of the training patterns for the network.

4. GESTURAL CONTROL OF SOUND
The system realises the control of a sound generation process by using discrete bindings of gestures to sound parameters. It uses gestures of the left hand, which are recognised by the video analysis. Two identical sound generation processes for live sampling and sound modification are realised in a Max/MSP patch. The position space of the hand is divided by a dedicated object (Gitter) into 9 concentric fields (Figure 5).
Figure 5: Division of gesture plane into concentric fields
The binding of parameter groups to Gitter fields aims to place the more important and more often-used parameters in the centre of the position space, and the less often-used parameter groups around the central area.

Remote to body                 Centre                       Close to body
Effect distortion (low)        Diffusion/panning (mid)      Reverb (low)
Volume (low)                   Selection of sound (mid)     Recording (mid)
Effect ring-modulation (low)   Filter (high)                Granulation (high)
Table 2: Binding of sound parameter groups

In Table 2 the estimated complexity and number of parameters is given in brackets. Each field in the gesture coordinates extends into a list of selectable choices. In the stage setup (Figure 7), these choices are mainly implemented as settings for the parameter category associated with the field, representing a morphing state of the sound-processing patch (Figure 6).
Figure 6: Display of sound actions (reverb) bound to the current field of the hand position

5. RESULTS
The demo system is a prototype for the use of gesture recognition with artificial neural networks integrated in a sound generation context. The degree of required recognition precision varies between different performance paradigms. It may range from an aleatoric approach, where a low recognition rate of 75% to 85% is sufficient, to a strict binding of the gestures to a complex control of an elaborate instrument, where a single misrecognition will disturb the whole musical concept or at least be perceived as a hindering error. For both extremes, the aleatoric and the strict approach, the binding of the body gestures to the musical actions has to be considered thoroughly.
A similar situation may be found for the required number of recognisable gestures, which depends on the musical intention and the role it assigns to the gesture recognition. Two or three gestures may be enough to play a central role in a piece. For complex control, a larger number of gestures is required, e.g. more than the 16 gesture states of the hand used for training the neural network of the demo system.
Figure 7: Stage setup
The goal in the applied setup was not to use as many mappings as possible from the gesture recognition or the visual analysis, but to focus on the case in which visual musical control is achieved by an automatic recognition process. Gesture recognition may be only one part of the dramaturgy of a piece, but it reflects the development of information technology, which approaches the human mind and the human body.

6. REFERENCES
[1] Camurri, A., Hashimoto, S., Ricchetti, M., Trocca, R., Suzuki, K., Volpe, G. EyesWeb: Toward gesture and affect recognition in interactive dance and music systems. Computer Music Journal, pp. 57-69, MIT Press, Spring 2000.
[2] cv.jit, Computer vision for Jitter by Pelletier, J. M., http://www.iamas.ac.jp/~jovan02/cv/ (2007).
[3] Cycling74, www.cycling74.com (2008).
[4] Mulder, A. G. E., Fels, S. S., Mase, K. Empty-handed Gesture Analysis in Max/FTS. AIMI International Workshop on Kansei, Genova, Italy, 1997.
[5] Cadoz, C., Wanderley, M. Gesture-Music. In M. Wanderley and M. Battier (eds.): Trends in Gestural Control of Music. Ircam - Centre Pompidou, 2000.
[6] Kolsch, M., Turk, M. Robust Hand Detection. Sixth IEEE International Conference on Automatic Face and Gesture Recognition, 2004.
[7] Modler, P., Myatt, A., Saup, M. An experimental set of hand gestures for expressive control of musical parameters in realtime. NIME 2003, McGill University, Montreal, 2003.
[8] Modler, P., Myatt, A. Image features based on 2-dimensional FFT for gesture analysis and recognition. Proceedings of the 4th Sound and Music Computing Conference, Lefkada, 2007.
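Section 3 of the paper above describes a TDNN as a feed-forward network that sees a window of consecutive feature frames (the delays) shifting over the series. The following is a minimal sketch of only that windowing plus one hidden layer, using the small example dimensions from the text (4x4 = 16 inputs, 4 delay frames) and the stated hidden size of 50. The random weights and placeholder features, the tanh activation, and keeping 4 delays for the larger 256- and 1024-unit inputs are assumptions; no training is shown.

```python
import math
import random


def delay_windows(frames, delays):
    """Concatenate each run of `delays` consecutive frames into one input vector.

    The window shifts by one frame at a time, which is how a TDNN gives an
    ordinary feed-forward layer access to temporal context.
    """
    return [sum(frames[t:t + delays], []) for t in range(len(frames) - delays + 1)]


def hidden_layer(x, weights, biases):
    """Fully connected layer with tanh activation (activation is an assumption)."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


INPUTS, DELAYS, HIDDEN = 16, 4, 50  # 4x4 input units, 4 delay frames, 50 hidden units

rng = random.Random(0)
frames = [[rng.random() for _ in range(INPUTS)] for _ in range(10)]  # placeholder features
W = [[rng.uniform(-0.1, 0.1) for _ in range(INPUTS * DELAYS)] for _ in range(HIDDEN)]
b = [0.0] * HIDDEN

windows = delay_windows(frames, DELAYS)
activations = [hidden_layer(w, W, b) for w in windows]
print(len(windows), len(windows[0]), len(activations[0]))

# Input-to-hidden weight count grows linearly with the feature size.
for n_in in (16, 256, 1024):
    print(n_in, "inputs ->", n_in * DELAYS * HIDDEN, "input-to-hidden weights")
```

Ten frames with 4 delays give 7 windows of 64 values each; with 1024-unit image features the same layer would need 1024 x 4 x 50 = 204,800 input-to-hidden weights.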
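Section 4 of the same paper divides the hand's position space into 9 fields with the Gitter object, and Table 2 binds one parameter group to each field. The following is a minimal sketch of that lookup, assuming the hand position arrives normalised to the range 0 to 1 on both axes and that the 9 fields form a regular 3x3 grid (a centre field surrounded by eight others). The exact geometry of the Gitter object is not given, and the axis orientation used here is hypothetical.

```python
# Parameter groups from Table 2; columns run from "remote to body" to "close to body".
BINDINGS = [
    ["Effect distortion", "Diffusion/panning", "Reverb"],
    ["Volume", "Selection of sound", "Recording"],
    ["Effect ring-modulation", "Filter", "Granulation"],
]


def field_index(x, y, n=3):
    """Map a normalised hand position (0 <= x, y <= 1) to a (row, column) field.

    Hypothetical orientation: x grows from remote to close to the body,
    y grows from the top row of Table 2 to the bottom row.
    """
    col = min(int(x * n), n - 1)  # clamp so that x == 1.0 stays in the last field
    row = min(int(y * n), n - 1)
    return row, col


def bound_group(x, y):
    """Parameter group bound to the field containing the hand."""
    row, col = field_index(x, y)
    return BINDINGS[row][col]


print(bound_group(0.5, 0.5))  # centre field
print(bound_group(0.9, 0.1))
```

Keeping the frequently used "Selection of sound" group in the centre field matches the paper's aim of placing the most-used parameters where the hand rests most often.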
beacon: Embodied Sound Media Environment for Socio-Musical Interaction
Kenji Suzuki
Dept. of Intelligent Interaction Technologies, University of Tsukuba
kenji@ieee.org

Miho Kyoya
School of Art and Design, University of Tsukuba
mihokyoya@mac.com

Takahiro Kamatani
College of Media Arts, Science and Technology, University of Tsukuba
buhii314@gmail.com

Toshiaki Uchiyama
Dept. of Kansei, Behavioral and Brain Sciences, University of Tsukuba
uchi@kansei.tsukuba.ac.jp
ABSTRACT
This research aims to develop a novel instrument for socio-musical interaction in which a number of participants can produce sounds with their feet in collaboration with each other. The developed instrument, beacon, is regarded as an embodied sound media product that provides an interactive environment around it. beacon produces laser beams that lie on the ground and rotate. Sounds are produced when the beams pass an individual performer's foot. As the performers are able to control the pitch and sound length through the foot's location and its angle relative to the instrument, the performers' body motion and foot behavior can be translated into sound and music in an intuitive manner.

Keywords
Embodied sound media, Hyper-instrument, Laser beams

1. INTRODUCTION
Many sound installations with electroacoustic techniques have been presented so far. However, those systems usually require a large space or complicated instruments. There are very few compact interfaces that, like conventional musical instruments, let people enjoy music with other performers or an audience. In this paper, we introduce a portable instrument called beacon for socio-musical interaction. A number of line laser modules are installed, and the laser beams are produced and rotated around the instrument. The beam behaves like a moving string, because sounds are generated every time the beam lying on the ground passes a performer's foot. Real-time motion capture and pattern recognition of the users' feet are used to create a new style of musical interaction. This instrument therefore provides an embodied sound media environment [1] in which everyone can readily enjoy playing sounds without scores and can also interact with others through collaborative musical experiences. In the interactive environment around beacon, people can communicate with each other by means of sound, music and their own body motion, such as dancing or tapping. Promising applications include edutainment, recreation, fitness, rehabilitation, entertainment, sports and new artistic expression.
Figure 1: beacon - a new interface for socio-musical interaction.

2. SYSTEM OVERVIEW
Hardware Configuration: The developed instrument consists of a loudspeaker, a small computer, 60 line laser modules, 2 laser range-finders, a dial and button interface, and a battery. All equipment is installed in a cylinder-shaped interface, as illustrated in Figure 2. The instrument is a kind of small lighthouse sending out line laser beams. The beams are used not only to mark the current location where sound is produced but also to assist musical interaction. In the current implementation, up to 4 laser beams with equiangularly spaced directions lie on the ground and rotate during the musical performance. The rotation speed of the laser beams can be set from 40 bpm to 100 bpm. At the bottom of the instrument, two laser range-finders are installed and used to measure the distance to the performers, in particular their foot positions and angles, every 100 ms at a height of 1 cm above the ground. Each installed range-finder has a 4 m measuring range with 99% range accuracy and a 240-degree angle of view. We used two range-finders in order to obtain an omnidirectional distance map at all times.
Motion-to-sound Mapping: Each performer is regarded as a musical note. beacon generates sounds when the beams pass individual performers, as if the rotating laser beams could detect them. In reality, however, the performers around beacon are detected at all times by the omnidirectional laser range-finders. A number of performers can therefore participate in a musical session, and each performer is able to change the pitch through the distance from the center of the instrument to the foot. The sound length, on the other hand, is determined by the foot angle. When the performer points his/her toes toward beacon, shorter sounds are produced; when the performer places the entire length of the foot facing beacon, longer sounds are played. The attack and decay of each note are predetermined, and the sound volume is fixed in this work. The timbre, tone colors and major/minor key can be selected, even during the performance, using the buttons and dial interface at the top of the instrument.
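The motion-to-sound mapping above can be sketched as two small functions: distance from the instrument selects a pitch, and the foot's facing angle selects a note length. The paper gives neither the scale, the distance range nor the length values, so the C-major note set, the 300 mm inner limit, the direction of the pitch mapping, the 0 to 90 degree angle convention and the note lengths below are all hypothetical; only the 4 m outer limit comes from the stated range-finder range.

```python
# Hypothetical note set: one octave of C major as MIDI note numbers.
SCALE = [60, 62, 64, 65, 67, 69, 71, 72]
MIN_MM, MAX_MM = 300.0, 4000.0  # assumed inner limit; 4 m is the range-finder range


def pitch_from_distance(d_mm):
    """Quantise foot distance into a scale degree (nearer = lower is an assumption)."""
    d = min(max(d_mm, MIN_MM), MAX_MM)
    step = (MAX_MM - MIN_MM) / len(SCALE)
    return SCALE[min(int((d - MIN_MM) / step), len(SCALE) - 1)]


def length_from_angle(facing_deg, shortest=0.125, longest=1.0):
    """Note length in seconds (assumed units).

    0 deg = toes pointing at beacon (short note),
    90 deg = whole foot side-on to beacon (long note).
    """
    a = min(max(facing_deg, 0.0), 90.0) / 90.0
    return shortest + a * (longest - shortest)


print(pitch_from_distance(500), length_from_angle(0))
print(pitch_from_distance(3900), length_from_angle(90))
```

Clamping both inputs keeps out-of-range readings (for example a foot just beyond 4 m) mapped to the nearest valid note instead of raising an error.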
Sensing Spatial Movement: A rectangular-to-polar coordinate transformation is required in order to normalize the size and angles of the performers' feet, owing to the characteristics of the measurement. Figure 3 shows an example of sensing by the range-finder. An identical foot is measured as a different size depending on its distance from the instrument. The adjusting parameters are obtained with a calibration technique and used for the motion-to-sound mapping.
Figure 2: beacon - for socio-musical interaction (labels: light beam; timbre pattern (rotating); beat pattern (dial-like interface); foot angle (note length); distance (pitch))
We conducted two preliminary experiments to evaluate the sensing properties. Subjects were asked to stand in front of the instrument at different distances and with different foot facing angles. The experimental results for 5 male and female subjects with different foot sizes are shown in Figure 4. The relationship between the distance d from the performer to the instrument (X-axis) and the measured angular area θ (Y-axis) is approximated by θ = 2 tan⁻¹(x / 2d), where x is the subject's averaged foot size. This ideal curve, calculated from the geometry, is indicated by the bold line. These results verified that we could approximate the scaling effect by using the offset parameters k_d of the ideal curve described above.
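The ideal curve θ = 2 tan⁻¹(x / 2d) and the rectangular-to-polar step can be sketched together: convert scan points to polar form, measure the foot's angular width, and divide by the ideal width at that distance to obtain a distance-independent size. The 260 mm average foot size, the assumption that scan points arrive in Cartesian millimetres with the instrument at the origin, and the use of a simple ratio in place of the paper's calibrated offset parameter k_d (whose values are not given) are all assumptions.

```python
import math


def ideal_angular_area(foot_mm, d_mm):
    """Angle (radians) subtended by a foot of length foot_mm at distance d_mm."""
    return 2.0 * math.atan(foot_mm / (2.0 * d_mm))


def angular_extent(points_xy):
    """Rectangular-to-polar conversion of one foot's scan points.

    points_xy: (x, y) in mm with the instrument at the origin (assumed format).
    Returns (mean distance, angular width). Assumes the cluster does not
    straddle the +/-180 degree seam of atan2.
    """
    polar = [(math.hypot(x, y), math.atan2(y, x)) for x, y in points_xy]
    r_mean = sum(r for r, _ in polar) / len(polar)
    phis = [p for _, p in polar]
    return r_mean, max(phis) - min(phis)


def normalised_width(measured_rad, d_mm, foot_mm=260.0):
    """Measured / ideal angular area; about 1.0 for an average foot at any distance.

    Stands in for the paper's offset parameter k_d; foot_mm = 260 is hypothetical.
    """
    return measured_rad / ideal_angular_area(foot_mm, d_mm)


for d in (500, 1000, 1500, 3000):
    print(d, "mm:", round(math.degrees(ideal_angular_area(260.0, d)), 1), "deg")
```

The printed values fall steeply with distance, which is why a raw angular width cannot be used directly as a foot-size measure without this normalisation.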
We also obtained the relationship between the facing angle to the instrument and the measured angular area at different distances. The measured data were collected with facing foot angles every 22.5 degrees. A clear difference according to the facing angle of the foot can be seen at distances of 500 mm to 1000 mm. However, the differences are small beyond 1500 mm. Based on this result, we use the foot angle to control sound length only within a radius of 1500 mm from the instrument.
Figure 3: An example of measurement of feet
3. DISCUSSION AND CONCLUSIONS
We introduced a novel interface, beacon, for socio-musical interaction. This instrument allows performers not only to communicate with each other via music and motion, but also to improve the quality of their sound production by training and devising various types of behavior. This novel instrument can thus be used for physical exercise or recreation with fun. Moreover, by arranging small objects around the instrument, a variety of sounds can be produced, like an environmental music box: sounds are generated when the laser beam passes the objects as well as human performers. This installation provides a new form of artistic expression for spatial designers. The round-shaped interface has no directional characteristics and plays a key role in gathering people for affective communication.
Figure 4: Foot sizes with different distances
Figure 5: Foot angles with different distances

Acknowledgement
Part of this work is supported by JST CREST "Generation and Control Technology of Human-entrained Embodied Media."

References
[1] K. Suzuki et al., Proc. IEEE, 92(4), pp. 656-671, 2004.
Prototype GO: Wireless Controller for Pure Data

Eva Sjuve
School of Computing,
Communications and Electronics
University of Plymouth, UK
eva@moomonkey.com
ABSTRACT
This paper describes the development of a wireless wearable controller, GO, for both sound processing and interaction with wearable lights. Pure Data is used for sound processing. The GO prototype is built around a PIC microcontroller, with various sensors receiving information from physical movements.

Keywords
Wireless controller, Pure Data, Gestural interface, Interactive Lights.

1. INTRODUCTION
This paper describes the development of prototype GO, a wireless and wearable controller for sound processing using Pure Data [1]. GO is being developed as part of research in wireless and portable systems for sound processing. Various sensors on the GO board read data from human movements. In addition to live sound processing, GO's output also drives various light modules corresponding to physical movement. The first stage of development was described in "Designing Prototype GO for Sound and Light" [2]. Coupling sound and light for live performance has not been examined within studies of wearable interactive performances.

2. PROTOTYPE GO
Prototype GO is a wearable controller, designed to work with physical movements, that generates interaction with both sound and lights. One of the design ideas of the GO board, which can be seen in Figure 1, is to build a modular controller with easily detachable and exchangeable sensors and light modules, for both practical and artistic purposes.
The circuit board holds a PIC microcontroller [3], an accelerometer from Analog Devices, ADXL210 [4], three micro switches, and one bend sensor from Images Scientific Instruments [5]. The accelerometer outputs values that depend on its orientation relative to earth's gravity; it is a two-axis sensor. The bend sensor, shown in Figure 2, controls volume. Information is sent from GO via a wireless Bluetooth module into Pure Data, over an emulated serial port [6].
Figure 1. Prototype GO.
Figure 2. Volume control.

3. PURE DATA
Pure Data is used for sound processing. GO is set up to interact with Pure Data in two different ways. In the first, GO is placed inside a moving object without a performer, and the accelerometer advances the sound composition depending on the movement of the object. In the second, the composition is driven by the physical movements of a performer, using both the accelerometer and the switches. The main interface in Pure Data is shown in Figure 3.
Figure 3. Main interface in Pure Data.
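Section 2 notes that the two-axis accelerometer's output depends on its orientation relative to gravity. When the board is held still, each axis reads the component of gravity along it, so a tilt angle per axis can be recovered with an arcsine. The following is a minimal sketch, assuming readings have already been converted to units of g; the ADXL210's raw output format, the PIC firmware and the Pure Data side are not described here and are not modelled.

```python
import math


def tilt_degrees(a_g):
    """Tilt of one accelerometer axis from horizontal, given its static reading in g.

    Valid only while the board is not accelerating, so gravity dominates.
    """
    a = max(-1.0, min(1.0, a_g))  # clamp noisy readings into asin's domain
    return math.degrees(math.asin(a))


def tilt_xy(ax_g, ay_g):
    """Per-axis tilt angles (degrees) from the two axes of a 2-axis sensor."""
    return tilt_degrees(ax_g), tilt_degrees(ay_g)


print(tilt_xy(0.0, 0.5))
```

The clamp matters in practice: during movement the reading can briefly exceed 1 g, and without it `math.asin` would raise a `ValueError`.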
4. PERFORMING WITH LIGHTS
Performing with a small number of light sources is not very common. Most music performances take place on a stage with a traditional lighting environment, where the musical performer is fully visible to the audience. A laptop musician often relies on the light coming from the laptop, with the audience's gaze focused on the often unexpressive face of the performer. Prototype GO is an experiment in creating a sound performance with lights corresponding to the performer's movements, coupled with sound processing. Some of the light modules for GO can be seen in Figure 4 and Figure 5.

5. FUTURE WORK
GO is a work in progress. Future developments will focus on a few issues. There will be more work on the sound composition. The final board will be built as a Printed Circuit Board (PCB). One of the design criteria for the final PCB is that it be as small as possible, since the board should not be visible during the performance. There will also be work on computational light modules corresponding to movements, and on developing new light modules as well as new switches.
ACKNOWLEDGEMENTS
Many thanks to White Noise ll and New Composers Series at
White Box in New York for commissioning the GO
Karamazov performance for PERFORMA 07 [7]. Many thanks
also to Nordscen Artist in Residency Program for giving me
space to work out the first prototype and light modules for
GO [8].
Figure 4. Light module circle.
REFERENCES
[1] Pure Data, http://puredata.info/
[2] Sjuve, E.S.I. Designing Prototype GO for Sound and
Light. In Proceedings of Pure Data Convention 2007,
(Montréal, Canada, August 21-26, 2007),
http://artengine.ca/~catalogue-pd/
[3] Microchip, http://www.microchip.com
[4] Analog Devices, http://www.analog.com/
[5] Images Scientific Instruments, http://www.imagesco.com/sensors/flex-sensor.html
[6] Serial communication: RS-232 is emulated by RFCOMM, a Bluetooth protocol.
[7] PERFORMA 07, http://www.performa-arts.org/
[8] Nordscen, Nordic Resort,
http://www.nordscen.org/nordicresort/
Figure 5. Light module centre.
From toy to tutor: Note-Scroller is a game to teach music
Robert Macrae
Centre for Digital Music
Department of Electronic Engineering
Queen Mary, University of London
robert.macrae@elec.qmul.ac.uk

Simon Dixon
Centre for Digital Music
Department of Electronic Engineering
Queen Mary, University of London
simon.dixon@elec.qmul.ac.uk
ABSTRACT
The computer games industry has recently been producing titles in a new genre called 'music' that toys with engaging the user in musical expression. The technology used in these games has allowed for novel interfaces for representing musical instructions, which have yet to be tried within musical practice and tuition. The games themselves have greatly simplified the instruments and the music created, to the point where the skills learnt are not transferable to the actual instruments that they seek to recreate.
The aim of this work is to explore the potential to move from this category of entertainment systems based on musical expression towards tutoring applications that support learning musical instruments through an entertaining and rewarding experience. Note-Scroller (Figure 1) is an interface that has been designed to bridge this gap and to act as a case study for evaluating the potential of such a move. It is hoped that Note-Scroller will be fun and intuitive to use, teaching users how to play music on a piano-style keyboard.
Figure 1: A front and back view of the projected display

Keywords
Graphical Interface, Computer Game, MIDI Display

1. INTRODUCTION
In this paper, we describe an interactive system for providing the user with musical instructions using methods inspired by video games. The system also acts as a learning aid that can evaluate the user's performance in real time to provide visual feedback. As MIDI files often contain the full range of instruments and instructions used in a musical piece, the option of having the computer play selected instruments means that Note-Scroller can also accompany the player.
Previous work in this area includes examples of computer-based musical learning aids [2]. Work by Guillaume Denis and Pierre Jouvelot [3, 4] demonstrates the opportunity to create video games with the purpose of teaching music. The use of visual feedback in musical expression and its implications for mental workload were explored in François et al.'s Mimi tool [5]. Also, games that already go some way towards using musical expression on a more simplified level are proving immensely popular. The Guitar Hero franchise (see Figure 2), for example, is the first video game franchise to generate more than $1bn in revenue, and the title Dance Dance Revolution (with a similar instructive interface) alone sold more than 7.5 million units. Recreating the elements that make these games successful in educational platforms would be the next logical step.
The motivations for incorporating this set of features into one package are varied but ultimately stem from the hope of making music more accessible to a wider audience:
• The display methods used in these popular video games may be more intuitive to non-music-literate users than standard music notation.
• Computer games incorporating modern features have been shown to increase students' motivation [7].
• Adding visual cues to the users' auditory feedback loop may result in an increase in performance.
• As an interface to the vast MIDI content already available on the internet, students will be able to choose musical pieces from an almost limitless source.
By making the performance of music more accessible, it was intended that such a system would remove the typical barriers that deter people from learning to play instruments, such as the cost of tuition, availability and the requirement of reading standard music notation. However, Note-Scroller would ideally also be used in conjunction with other learning methods.
2. DESIGN
Permission to make digital or hard copies of all or part of this work for Similar to the displays featured in video games such as
personal or classroom use is granted without fee provided that copies are Guitar Hero, Frets on Fire and Dance Dance Revolution,
not made or distributed for profit or commercial advantage and that copies the instructions flow to the user as and when they need to
bear this notice and the full citation on the first page. To copy otherwise, to be executed. As these video games show, providing users
republish, to post on servers or to redistribute to lists, requires prior specific with visual cues of what is coming next allows the user to
NIME08, Genova, Italy prepare more efficiently for their subsequent actions utiliz-
Copyright 2008 Copyright remains with the author(s). ing any spare attention [8]. Another benefit to the animated
364
display over static music sheets is that the user does not have to change their point of focus, as the required information moves to where the user’s gaze already is.
Figure 2: From left to right: The interface for Guitar Hero, Dance Dance Revolution and Note-Scroller
Using a MIDI connection to the keyboard, or a Moog PianoBar [9], the proposed system keeps track of which notes are being played correctly. By changing the display of the notes to green or red (depending on whether they are being played), the system provides the user with visual feedback. This colour code follows the familiar ‘traffic light’ convention, which users will recognize [10].
The musical instructions are projected onto a screen above the keyboard/piano, flowing down to where the keys are. Therefore, when a note needs to be played, the graphic representing that note is close to the keys. As the note graphics are vertically aligned with the keys they correspond to, there is a clear natural mapping [10] between the musical instructions and their execution.
3. IMPLEMENTATION
Note-Scroller is designed to be used in conjunction with a projector, screen, keyboard and MIDI input cable. The interface loads a MIDI file and draws all of its notes on a vertical piano roll that then flows down the screen. The user presses a key whenever a note graphic reaches the keys on the piano-style keyboard. The user can select which of the MIDI instruments used in the file are shown, and which instruments are played by the computer. When the notes pass the playing line, the interface checks whether that particular note is being played and changes the display accordingly.
The user is marked on whether or not they have played the correct notes. The user can also choose whether playing additional notes detracts from their score, as some users may wish to improvise intentionally within a performance.
The system is designed to be simple to use: the user opens a MIDI file and uses start, stop, tempo, zoom and position sliders to fully control the flow of information. The display is customizable, so users can configure how much information is shown to suit themselves.
A core MIDI library of basic piano pieces was assembled from freely available MIDI files on the internet. Users can then find and load more MIDI files as they see fit.
4. CONCLUSIONS & FUTURE WORK
Preliminary testing of the Note-Scroller system has so far been promising, showing that music-illiterate players could perform basic pieces and on average thought the system was a better way for them to learn music than reading scores.
Work on improving Note-Scroller is ongoing, as there are many opportunities arising within digital music that may add to the functionality of Note-Scroller and similar systems. One example would be using live audio synchronization so that the user controls the playback speed of the instructions simply by playing faster or slower. There are also interesting developments in haptic interfaces [6] that could pave the way for a touch-sensitive feedback loop in the system.
The main drawback of the Note-Scroller interface is setting it up, as the system requires a projector, a computer and a suitable keyboard/piano screen to project the image onto. This also entails a lot of trial and error in aligning the virtual keys with the keyboard and finding space behind the instrument to position the projector. This problem could be mitigated with a resizable display, and with the advancement of flat-screen monitors the projector will at some point no longer be necessary.
Adapting commercially successful games to suit educational needs is not new; for example, Neverwinter Nights [7] has been transformed into a historical and political educational game for students. Dance Dance Revolution is also being used in schools in West Virginia to aid Physical Exercise lessons [1].
It is the long-term goal of this work to use the popularity of video games, and their ability to reach and motivate millions of people, to teach transferable skills. Learning musical instruments using Note-Scroller has provided a clear example of how this could be done.
5. ACKNOWLEDGMENTS
With thanks to Andrew Robertson for his feedback. RM is supported by a studentship from EPSRC.
6. REFERENCES
[1] R. R. Borja. Dance video games hit the floor in schools. Education Week, 25(22):1, February 2006.
[2] R. B. Dannenberg, M. Sanchez, A. Joseph, R. Joseph, R. Saul, and P. Capell. A computer-based multimedia tutor for beginning piano students. Journal of New Music Research, 19(2-3):155–173, 1990.
[3] G. Denis and P. Jouvelot. Building the case for video games in music education. In Second International Computer Game and Technology Workshop, 2004.
[4] G. Denis and P. Jouvelot. Motivation-driven educational game design: applying best practices to music education. In Proceedings of the 2005 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, volume 265, pages 462–465. ACM, 2005.
[5] A. François, E. Chew, and D. Thurmond. Visual feedback in performance-machine interaction for musical improvisation. In Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), pages 277–280, 2007.
[6] K. S. Hale and K. M. Stanney. Deriving haptic design guidelines from human physiological, psychophysical and neurological foundations. IEEE Computer Graphics and Applications, 24(2):33–39, March 2004.
[7] H. Jenkins, E. Klopfer, K. Squire, and P. Tan. Entering the education arcade. Computers in Entertainment (CIE), 1(1):17–17, October 2003.
[8] N. Lavie and Y. Tsal. Perceptual load as a major determinant of the locus of selection in visual attention. Perception & Psychophysics, 56:183–197, 1994.
[9] Moog. Moog PianoBar. http://www.moogmusic.com.
[10] D. Norman. The Design of Everyday Things. Basic Books, 2002.
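The note-checking and marking step that the Note-Scroller paper describes in its Implementation section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: notes are modelled as a hypothetical `Note(pitch, start, end)` record read from the MIDI file, the set of currently held keys is assumed to come from the MIDI input, and the per-frame scoring rule (+1 for each expected note held, -1 for each expected note missed, and optionally -1 for each extra key) is an assumed interpretation of "marked on whether or not they have played the correct notes", including the user's option to stop extra notes counting against them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    """One note from the MIDI piano roll (hypothetical representation)."""
    pitch: int    # MIDI note number, e.g. 60 = middle C
    start: float  # onset time in seconds
    end: float    # release time in seconds


def note_colours(notes, held_pitches, t):
    """Colour every note crossing the playing line at time t:
    green if its key is held, red if it is not ('traffic light' feedback)."""
    return {
        n: ("green" if n.pitch in held_pitches else "red")
        for n in notes
        if n.start <= t < n.end
    }


def score_frame(notes, held_pitches, t, penalise_extra=True):
    """Score one evaluation frame: +1 per expected note held, -1 per expected
    note missed and, if penalise_extra is True, -1 per held key that no
    expected note asks for (so improvisers can switch the penalty off)."""
    colours = note_colours(notes, held_pitches, t)
    hits = sum(1 for c in colours.values() if c == "green")
    misses = len(colours) - hits
    expected = {n.pitch for n in colours}
    extras = len(set(held_pitches) - expected)
    return hits - misses - (extras if penalise_extra else 0)


if __name__ == "__main__":
    melody = [Note(60, 0.0, 0.5), Note(64, 0.0, 0.5), Note(67, 0.5, 1.0)]
    held = {60, 62}  # C is correct, E is missing, D is an extra key
    print(score_frame(melody, held, 0.25))                        # -1
    print(score_frame(melody, held, 0.25, penalise_extra=False))  # 0
```

Keeping the colour decision separate from the scoring means the display and the marking always agree, while the extra-note penalty stays a single switch.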
Gluisax: Bent Leather Band’s Augmented Saxophone Project
Stuart Favilla
Bent Leather Band
Sonic Frontiers CTME
Victoria University
+61 3 97301026
sfavilla@bigpond.com
Joanne Cannon
Bent Leather Band
Sonic Frontiers CTME
Victoria University
+61 3 97301026
joanne_cannon@bigpond.com
Tony Hicks
Musician/Saxophonist
Improvisation Department
Victorian College of the Arts
+61 3 94597936
hixt@optusnet.com.au
Dale Chant
Musician/Saxophonist
Software Developer
red centre software
+61 3 94597936
dale@redcentresoftware.com
Paris Favilla
Musician/Saxophonist
Improvisation Department
Victorian College of the Arts
ABSTRACT
This demonstration presents three new augmented and meta saxophone interface/instruments built by the Bent Leather Band. The instruments are designed for virtuosic live performance and make use of Sukandar Kartadinata’s Gluion [OSC] interfaces. The project rationale and research outcomes for the first twelve months are discussed. Instruments/interfaces described include the Gluisop, Gluialto and Leathersop.
Keywords
Augmented saxophone, Gluion, OSC, virtuosic performance systems
1. INTRODUCTION
The Bent Leather Band is currently undertaking its Sonic Frontiers residency at Victoria University’s Centre for Telecommunications and Microelectronics [Melbourne]. The ensemble was formed to develop new instruments and new music for virtuosic live performance. So far, the project has developed a number of unique and versatile instruments, including light-harps and electronic leather bassoons. The project has been developing over a long term [15 years], and research has been undertaken concurrently with artistic activities including concerts, exhibitions and recordings. Research projects have investigated a number of areas, including musical languages [Free music, Indian music gamaka, micro-tonality and multi-phonics], interface design, live signal processing, performance techniques, virtuosity, feedback systems and skilled performance.
The project has sought to develop mature, playable instruments from working prototypes. We define playable instruments as expressive, responsive, versatile and practicable: suitable for technical and musical development. The project has also aimed to develop instruments that are intuitive, inspiring and capable of demonstrating their own sound and personality. Additionally, the project has focused on live improvisation and, since the Paris NIME, has been joined by a number of new musicians in order to develop new interfaces or extend their own acoustic instruments with sensors, sound interfaces and software. The project has been working with gluion streaming interfaces [5], with the idea of forming a large ensemble of networked playable instruments.
This demonstration presents work undertaken with three saxophone players, Tony Hicks, Dale Chant and Paris Favilla, to develop extended saxophones for this larger ensemble. Other musicians have also contributed, including Derek Pascoe from the University of Adelaide.
2. BACKGROUND
When embarking on this project, we were conscious that the saxophone has had quite a history of modification and use in electronics. After all, it was Daniel Kientzy’s Computersax [6] work in the late 1980s and early 1990s that served as an initial inspiration for us to head into signal processing. Braxton and Rosenboom’s live interactive CD was a favorite for a while, and even local Melbourne musicians such as Brian Brown were experimenting with leather saxophones and effects machines. Digital controllers by Yamaha, the EWIs and Syntaphones all belong to a MIDI generation and, together with many other experimental interfaces, are beyond the scope of this demo for critical review.
There are now many meta, augmented and hybrid instruments. Strings, percussion and brass instruments are well represented here, but the saxophone perhaps not enough. Even Sukandar’s gluions have featured on at least three meta/Mehta trumpets [Axel Dörner, Jonathan Impett and Rhajeesh Mehta], and there are a number of trombone projects, such as those of Nic Collins and LeMouton [7]. Matthew Burtner’s Metasax is perhaps the best-known augmented saxophone controller of recent times.
His approach of placing sensors over existing saxophone keys to affect expression while playing long notes brings into sharp focus the issues of redundancy of saxophone technique
and interface. Burtner’s musical landscape takes the saxophone well outside the instrument’s traditional jazz repertoire boundaries into a space that redefines timbre. Burtner explains his approach as a modification of the keys, situating force-sensitive resistors under the fingertips to affect “after-touch”. He writes: “…In essence, the saxophone keys which normally execute only on/off changes of the air column, are converted to continuous control levers…” [1].
The saxophone is amongst a number of highly specialized traditional instruments bristling with key-work: instruments that keep your fingers busy while you hold onto them as best you can. It is arguable that this co-dependency of the Meta-sax’s traditional acoustic and electronic sensor interfaces has transformed the instrument’s nature entirely. To take this idea further, an after-touch saxophone may not even need any keys. Instead, it could perhaps be better served bristling with sensors, which is how the third of our bassoons, the contra-monster, was conceived. Sensors were placed under the first three [strong] fingers of both hands, with joysticks situated for the thumbs. The instrument is capable of ten channels of simultaneous control. However, the contra-monster was constructed via a number of prototypes, and specifically to perform a specialized, signal-processing-based musical language.
Schiesser and Traube’s saxophone project [8] offers a more practical solution to this issue. Their augmented saxophone’s electronic sensors were situated for simultaneous and independent actuation alongside the traditional saxophone key-work, allowing the musician to still play conventionally and yet execute independent sensor control. Their USB interface instrument was limited to only six 10-bit analogue controllers [force-sensitive resistors or FSRs, inclinometers and ultrasound proximity] and some buttons. Nevertheless, it demonstrates some practical features, including a control panel mounted on the right-hand side of the bell.
Some other points worth mentioning here are that larger acoustic instruments usually require bigger hand stretches, and that there are other places for sensors to go on the saxophone if the instrument is supported well and the thumbs are free to move.
Perhaps there is a way to augment the saxophone without any redundancy of technique, interface, etc.? Can the acoustic and electronic interfaces be independent of each other and also be effective? What about bending notes and other techniques that are not so on/off? Finally, what about a leather saxophone, i.e. a sensor-only instrument? These questions formed the basic parameters for our project, and we decided to make a number of playable OSC saxophones in collaboration with the musicians.
3. GLUISOP
Amongst the saxophonists involved in the project, there remained a strong interest in developing a small, portable extended saxophone. Touring and air freight issues were the main consideration here. But smaller instruments, well supported by neck-straps, also allow the weight of the instrument to be taken off the thumbs, potentially freeing them to play sensor controls. For these reasons we chose a bent soprano as our first instrument to work with and, in collaboration with the saxophonists, developed a sensor interface consisting of two panels.
The first panel mounted a number of dials, switches and FSRs and was situated on the right-hand side of the saxophone’s bell.
Figure 1. Gluisop
The second panel, which was much smaller, mounted a joystick, two dials and one small FSR for the left hand. The instrument was completed with an extra FSR at the lower right-hand thumb-rest.
Figure 2. Gluisop left-hand panel
Two microphones were used to pick up the instrument’s sound, one clipped on to the bell and another one over the key-work
to pick up key clacks and other techniques. The microphone signal was digitized by a Digidesign 002 audio interface.
The sensors were digitized using a gluion sneaker interface, with sensors cabled [soldered] onto pins, allowing them to be connected directly into the interface housing’s high-density SUB-D connector. Analogue sensors are sampled at 16-bit resolution, and OSC data is streamed directly into MaxMSP with refresh rates of up to 1 msec. With the instrument supported by a neck-strap, the musicians could play the saxophone’s key-work unrestricted and still have at their disposal up to four independent channels of simultaneous sensor control. Situating a joystick at the lower thumb rest extended this further to five.
Dials of various sizes and types were used on the instrument for specific purposes. For the transposition of pitch, or delay time [fine control], a large dial capable of small, well-controlled movements was situated on the right-side panel. Roller dials have been positioned for the thumbs, while FSRs have been nested for the small “pinky” finger to control the feedback of delays and comb filters. The gluion is connected directly to the computer using a standard Ethernet network cable.
3.1 Gluialto & Leathersop
Two other saxophones were also constructed for the project. The Gluialto was constructed in conjunction with Dale Chant and was interfaced to a 16-bit Gluion Slipper interface. This interface stacked another joystick on the left-hand panel and added two extra FSRs.
Finally, the Leathersop [leather soprano sax] was developed both as a new instrument for the Bent Leather collection and as a total sensor saxophone. Similar to the Contramonster [3], this instrument has no open tone holes and instead places sensors onto the closed tube. It has eight FSRs and two joysticks for the hands to play, allowing for up to 12 simultaneous channels of sensor control. The Leathersop also has a number of dial- and switch-based controllers and can be used with either a continuous foot-pedal or an active electromagnetic proximity sensor [expressive radius of up to two meters]. The Leathersop’s Gluion electronics make use of Sukandar’s new, smaller circuit board and are built into the instrument’s body.
4. MAPPING AND PERFORMANCE TRIALS
The saxophones’ preliminary mappings have been based on the Bent Leather Band’s contra-monster work and were developed by Joanne Cannon. This software patch, developed in Max/MSP, brings together a granular pitch shifter, a smooth pitch delay/echo, a modulation delay patch, two transposable buffer delays [for multi-part playing and comb filtering] and, finally, a reverb. The effects are chained in the order listed, with one control knob reserved for wet/dry mix and global level for stage control.
The smooth pitch delay/echo has a nonlinear mapping that expands the range as the delay time approaches zero [for fine pitch control], whereas at the other end of the delay time the range is confined discretely to control the buffer size for rhythmic looping and re-sampling. The two transposable buffer delays have a pitch range of over eight octaves. The mapping was developed as an expressive, intuitive solution for a number of joysticks, wheels and FSRs.
At this stage of the project, the majority of trial performance work has been done with the Gluisop instrument [consisting of regular rehearsals over a six-month period]. During this time, the instrument was secured to an adjustable stand, taking all of the weight off the hands. Another dial controller was added to the left-hand panel just above the joystick. The FSR on this panel was repositioned between the underside of the panel and the left-hand upper thumb rest of the instrument, so that any downward pressure applied to the dials or joystick of this panel could be transferred independently as another channel of control.
These modifications, and the inclusion of the stand, made the instrument much easier to play. Some sensor controllers, such as those situated on the lower panel, still require the right hand to come off the instrument’s key-work. Nevertheless, the sensors and saxophone keys remain independent, and the instrument is capable of the full gamut of saxophone technique in performance with four simultaneous channels of sensor control.
Figures 3 and 4. Tony Hicks and Gluisop
Furthermore, all saxophonists involved in the project found that the instrument could be picked up and played without a
detailed knowledge of the signal processing techniques involved. Once the mapping was set up, the instrument was intuitive to the player. New sounds and techniques have been discovered in each subsequent session, and the development of advanced techniques continues. The mapping and sonic outcomes are also compatible with the Bent Leather Band’s existing ensemble language, so the Gluisop has been brought into the group.
Figures 4 and 5. Gluialto metasax [Bent Leather Band 2008]
This instrument, although not capable of as many simultaneous channels of control as the Leathersop, introduces the main ideas of signal processing expression, such as delay time [dial] and feedback [FSR or pressure] control, the use of two-dimensional joystick controllers, and global parameter settings and controls. It therefore also serves as a training instrument for the more advanced Leathersop sensor interface, as well as being an instrument in its own right.
Figure 6. Gluialto metasax
5. NEXT STEPS
The next stage of the project involves building up the ensemble and networking the interfaces to a single computer. The Bent Leather Band project “Heretics Brew” aims to develop an ensemble of experimental instruments/interfaces for the brass, saxophone, woodwind and guitar families. The project is building momentum and is in the process of staging public performances and recording with Tony Hicks, Dale Chant, Paris Favilla and Melbourne experimental improviser and guitarist Ren Walters.
The project team has also presented the instrument at the University of Adelaide, where experimental saxophonist Derek Pascoe and composer Luke Harrald have been working on live multi-agent performance systems for saxophone. Luke has expressed an interest in writing for the ensemble, and it is anticipated that the new instrument projects will be completed in 2008. New mappings and signal processing techniques, including tuning systems and spatial projection control, will also be trialed.
6. REFERENCES
[1] Burtner M. “The Metasaxophone: Concept, implementation and mapping strategies for a new computer music instrument.” Organised Sound, Vol. 7, no. 2. Cambridge: Cambridge University Press, pp. 201-213.
[2] Burtner M. & Serafin S. “The Exbow-Metasax.” JNMR, July 2002.
[3] Favilla S., Cannon J. & Greenwood G. “Evolution and Embodiment: Playable Instruments for Free Music.” In ICMC Free Sound, ICMA 2005, pp. 628-631, 2005.
[4] De Laubier S. & Goudard V. “Meta-Instrument 3: a look over 17 years of practice.” NIME06, pp. 288-291, 2006.
[5] Kartadinata S. “The gluion, advantages of an FPGA-based sensor interface.” NIME06, pp. 93-96, 2006.
[6] Kientzy D., Teruggi D., Risset J. & Racot G. Sax-Computer, Audio CD, INA GRM, SNA, 1990.
[7] LeMouton S., Stroppa M. & Sluchin B. “Using the augmented trombone in I will not kiss your f.ing flag.” NIME06, pp. 304-307, 2006.
[8] Schiesser S. & Traube C. “On making and playing an electronically-augmented saxophone.” NIME06, pp. 308-313, 2006.
[9] Wright M. & Freed A. “Open Sound Control: A New Protocol for Communicating with Sound Synthesizers.” In Proceedings of the International Computer Music Conference, Thessaloniki, Hellas, 1997.
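The nonlinear delay-time mapping described in the Gluisax paper's mapping section (range expanded as the delay approaches zero for fine pitch control, and confined to discrete buffer sizes for looping at the long end) can be sketched in Python. This is a hypothetical illustration, not Joanne Cannon's Max/MSP patch. The 16-bit input range matches the Gluion sampling resolution given in the text, but the exponential curve, the split point and every delay value are assumptions chosen for the example. The exponential shape reflects why fine control near zero matters: a short delay of d milliseconds fed back on itself behaves like a comb filter resonating near 1000/d Hz, so tiny changes to a very short delay produce large pitch changes.

```python
# Hypothetical constants for illustration only; they are not taken from the Gluisop patch.
SENSOR_MAX = 65535                  # full scale of a 16-bit Gluion analogue reading
SPLIT = 0.7                         # share of dial travel given to the continuous region
MIN_MS, FINE_MAX_MS = 0.05, 50.0    # continuous region: very short delays (pitch/comb range)
LOOP_SIZES_MS = [125.0, 250.0, 500.0, 1000.0, 2000.0]  # discrete loop-buffer sizes


def dial_to_delay_ms(raw: int) -> float:
    """Map a raw 16-bit dial reading to a delay time in milliseconds."""
    x = min(max(raw, 0), SENSOR_MAX) / SENSOR_MAX  # clamp and normalise to 0..1
    if x < SPLIT:
        # Exponential curve: each equal step of dial travel multiplies the delay
        # by the same factor, so the region near zero gets as much travel as the
        # longer delays and very short delays can be set precisely.
        u = x / SPLIT
        return MIN_MS * (FINE_MAX_MS / MIN_MS) ** u
    # Upper region: snap to a few discrete buffer sizes for rhythmic looping.
    u = (x - SPLIT) / (1.0 - SPLIT)
    index = min(int(u * len(LOOP_SIZES_MS)), len(LOOP_SIZES_MS) - 1)
    return LOOP_SIZES_MS[index]


def comb_resonance_hz(delay_ms: float) -> float:
    """Fundamental resonance of a feedback comb filter with the given delay."""
    return 1000.0 / delay_ms


if __name__ == "__main__":
    for raw in (0, 8000, 16000, 32000, 45000, 50000, 65535):
        d = dial_to_delay_ms(raw)
        print(f"raw={raw:5d}  delay={d:8.3f} ms  comb~{comb_resonance_hz(d):9.1f} Hz")
```

Splitting one dial into a continuous region and a quantized region lets a single control serve both the pitch-like comb range and the rhythmic looping range without the long delays swamping the dial's resolution.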

Staas de Jong
LIACS
Leiden University
staas@liacs.nl

The Musical Synchrotron: using wireless motion sensors to study how social interaction affects synchronization with musical tempo
Michiel Demey, Marc Leman
IPEM – Department of Musicology, Ghent University
Blandijnberg 2
9000 Ghent, Belgium
+32 (0)9 264 41 26
michiel.demey@ugent.be
Frederick Bossuyt, Jan Vanfleteren
TFCG Microsystems Lab, Ghent University
Technologiepark Zwijnaarde 914
9052 Zwijnaarde
+32 (0)9 264 53 54
jan.vanfleteren@ugent.be
ABSTRACT
The Musical Synchrotron is a software interface that connects wireless motion sensors to a real-time interactive environment (Pure Data, Max/MSP). In addition to measuring movement, the system provides audio playback and visual feedback. The Musical Synchrotron outputs a score reflecting the degree to which synchronization with the presented music is successful. The interface has been used to measure how people move in response to music. The system was used for experiments at public events.
Keywords
Wireless sensors, tempo perception, social interaction, music and movement, embodied music cognition
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).
1. Introduction
Music is known to be a core component of the social and cultural cohesion of our society [1] [2] [3]. Yet the components of the social power of music, that is, the elements that contribute to social interaction, are poorly understood. In this paper, we describe a system called The Musical Synchrotron that enables the study of the relationship between music and social interaction. The Musical Synchrotron connects commercial and custom-made wireless motion sensors to a real-time interactive music environment. By integrating audio playback, kinetic monitoring and visual feedback, the system offers a useful tool for studying how social interaction affects synchronization with musical tempo.
2. Background
Research on social aspects of musical gesture has so far been conducted mainly in the context of ethnomusicological research, using anthropological methods in combination with audio/video recording and analysis [4] [5]. Given this context, the Musical Synchrotron can be seen as a contribution to the development of a technological platform in which social embodied music interaction [6], including music perception, can be studied and explored in an ecological context. It is also related to studies that explore the relationship between music and gaming [7]. The present study focuses on how social interaction affects the perception of music. This is done by asking people to move along with the music, in particular to synchronize with musical tempo.
3. The Musical Synchrotron System
The basic principle of The Musical Synchrotron is to capture movement data from wireless motion sensors together with the playback of musical stimuli. The pulse or BPM (beats per minute) value of the movement data is determined in (quasi) real time and compared to the annotated BPM value of the musical stimulus being played. This comparison is used to calculate a score, which can be displayed to the participants to show how well they are synchronizing to the music. The software interface is designed so that 4 persons can participate simultaneously, which enables the study not only of multiple individual performances but also of human-human interactions in response to music.
The Musical Synchrotron consists of three main parts, which are discussed in the next sections. The first part is concerned with data acquisition, where the acceleration data from the wireless motion sensors is received. The second part is concerned with processing the acceleration data into BPM values. These values give a direct indication of the synchronization with the musical tempo. Finally, the BPM value is used to provide feedback in the form of a user interface. Data is transmitted between the different parts of the Musical Synchrotron using the Open Sound Control (OSC) protocol [8]. This makes the design both modular and powerful, since the different tasks can be distributed among different computers. When changing the sensor system, only the first part needs to change; all the following steps remain the same.
3.1 Data acquisition
Currently two types of wireless motion sensors are in use, namely the commercially available Nintendo Wii Remote and the custom-made HOP sensors.
At the time of the study there existed no standalone object in Pure Data (PD) to access the acceleration data from the Nintendo Wii. Therefore, an external object for PD, called WiiSense [9], was developed. This external object enabled the readout of the 3 directional values of acceleration measured with the Wii Remote at a rate of 100 Hz. In view of more advanced studies and real-time applications, such as the use of a large number of sensors for a longer time and a more flexible attachment to different human body parts, we envisioned the development of a custom-made sensor called the HOP sensor [10].
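The BPM comparison outlined in the system overview, using the processing parameters that the Processing section (3.2) gives (the three axes summed, a 0.5–4 Hz movement band, a 400-sample FFT window at 100 Hz with 50% overlap), can be sketched in Python with NumPy. This is an illustrative sketch rather than the authors' PD external. In place of a separate time-domain band-pass filter, it removes the mean (the gravity offset) and restricts the spectral peak search to 0.5–4 Hz; the Hann window, the zero-padding to 4096 points and the ±5 BPM scoring tolerance are assumptions. Zero-padding matters here because a 400-sample window at 100 Hz gives bins only 0.25 Hz (15 BPM) apart; padding interpolates the spectrum so the peak can be located more finely, although it does not add true frequency resolution.

```python
import numpy as np

# Parameters from the text: 100 Hz sensors, 400-sample (4 s) window, 50% overlap.
FS = 100.0
WINDOW = 400
HOP = WINDOW // 2          # 50% overlap -> one new estimate every 2 s
F_LO, F_HI = 0.5, 4.0      # movement band in Hz, i.e. 30-240 BPM


def estimate_bpm(acc_xyz, n_fft=4096):
    """Dominant movement tempo (BPM) of one window of 3-axis acceleration.

    acc_xyz has shape (WINDOW, 3). Zero-padding to n_fft points is an
    assumption used to locate the spectral peak more finely."""
    x = np.asarray(acc_xyz, dtype=float).sum(axis=1)  # sum the three axes
    x = x - x.mean()                # remove the constant gravity offset (DC)
    x = x * np.hanning(len(x))      # taper the window to limit spectral leakage
    spectrum = np.abs(np.fft.rfft(x, n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / FS)
    band = (freqs >= F_LO) & (freqs <= F_HI)   # only look for peaks in 0.5-4 Hz
    peak_hz = freqs[band][np.argmax(spectrum[band])]
    return 60.0 * peak_hz


def bpm_stream(acc_xyz):
    """Yield one BPM estimate per hop (every 2 s) over a longer recording."""
    for start in range(0, len(acc_xyz) - WINDOW + 1, HOP):
        yield estimate_bpm(acc_xyz[start:start + WINDOW])


def update_score(score, measured_bpm, annotated_bpm, tol_bpm=5.0):
    """Count up when the movement tempo matches the annotated tempo, down
    otherwise. The +/-5 BPM tolerance is a hypothetical choice."""
    if abs(measured_bpm - annotated_bpm) <= tol_bpm:
        return score + 1
    return score - 1


if __name__ == "__main__":
    t = np.arange(3 * WINDOW) / FS
    acc = np.zeros((len(t), 3))
    acc[:, 0] = np.sin(2 * np.pi * 2.0 * t)   # 2 Hz arm swing = 120 BPM
    acc[:, 2] = 9.81                          # gravity on the z axis
    score = 0
    for bpm in bpm_stream(acc):
        score = update_score(score, bpm, annotated_bpm=120.0)
        print(f"estimated {bpm:.1f} BPM, score {score}")
```

Because each estimate only depends on the latest window, the same functions work whether the acceleration arrives from the Wii Remote or from the HOP sensors, which mirrors the modular split between acquisition and processing described in the text.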
The HOP sensors incorporate a 3D accelerometer and a Wireless USB transceiver, which accesses the 2.4 GHz ISM band. A dedicated HUB is connected via USB to the computer and recognized as a virtual COM port. Sampling rates of 100 Hz are possible, and a receiver range of 40 m is achieved. The major advantage of the HOP sensors is their size: 55 mm long, 32 mm wide and around 15 mm thick (including connectors). Each sensor is powered by a Li-Po battery, which has the same dimensions as the sensor and provides around 18 hours of operation. This small design makes it easy to strap the sensor onto people’s legs or arms using simple stretchable Velcro.
3.2 Processing
In the processing of the 3D acceleration signal, the 3 directional values of acceleration are summed and band-pass filtered to a range from 0.5 Hz to 4 Hz. The high-pass part eliminates the constant offset in the acceleration data due to the earth’s gravity. The low-pass part eliminates the higher frequencies that are irrelevant to human rhythm perception [11]. The BPM value is calculated using an FFT with an input window of 400 samples (4 seconds at 100 Hz) and an overlap of 50%, which results in an update time of 2 seconds. This external enables real-time BPM determination from the acceleration signal.
3.3 Feedback
The third part of the Musical Synchrotron consists of a user interface. This interface controls the playback of the musical stimuli and displays direct visual feedback of the participants’ performance. The feedback is provided through a score that counts up when people move the sensor at the same tempo as the annotated musical tempo, and counts down when they move too fast or too slow. All data are logged to text files containing the sensor acceleration data and the score of each participant for further offline analysis.
The software was developed such that 4 people could participate in the experiment simultaneously. As such, The Musical Synchrotron could be used in the context of social interaction, where participants move along with the tempo of the music they hear. However, the Musical Synchrotron also works with fewer than 4 people.
4. Evaluation and Outlook
The Musical Synchrotron has meanwhile been used in several public events, including a big fair in Ghent (ACCENTA 2007), an event on Scientific Research at Ghent University, and on the
platform that can contribute substantially to our understanding of music-driven social interaction, using the principles of embodied music cognition [6].
5. ACKNOWLEDGMENTS
Special thanks go to Bart Kuyken and Wouter Verstichel, and to the TFCG Microsystems Lab - IMEC under the guidance of Jan Vanfleteren, for designing and manufacturing the HOP sensors. We are also grateful to Prashant Vaibhav for implementing the WiiSense object in PD, and to all the participants in our experiments.
This work is funded by the EmcoMetecca project.
6. REFERENCES
[1] Merriam, A. The Anthropology of Music. Evanston: Northwestern University Press, 1964.
[2] McNeill, W. H. Keeping together in time: dance and drill in human history. Cambridge, Massachusetts: Harvard University Press, 1995.
[3] Bispham, J. Rhythm in music: What is it? Who has it? And why? Music Perception, 24(2), 125-134, 2006.
[4] Lortat-Jacob, B. & Olsen, M. R. Music, anthropology: A necessary marriage. Homme (171-172), 7-26, 2004.
[5] Clayton, M. R. L. Observing entrainment in music performance: Video-based observational analysis of Indian musicians’ tanpura playing and beat making. Musicae Scientiae, 11(1), 27-59, 2007.
[6] Leman, M. Embodied music cognition and mediation technology. Cambridge, MA: The MIT Press, 2007.
[7] Rinman, M. L., Friberg, A., Bendiksen, B., Cirotteau, D., Dahl, S., Kjellmo, I., et al. Ghost in the cave – An interactive collaborative game using non-verbal communication. Gesture-Based Communication in Human-Computer Interaction, 2915, 549-556, 2004.
[8] OSC, http://opensoundcontrol.org/
[9] Vaibhav, P. http://code.google.com/p/wiisense/
[10] Kuyken, B., Verstichel, W., Demey, M. & Leman, M. The HOP sensor: wireless movement sensor, NIME 08.
[11] Van Noorden, L., & Moelants, D. Resonance in the perception of musical pulse. Journal of New Music Research,
Television Broadcasting (KETNET, Betweters) where it was 28(1), 43-66, 1999.
demonstrated in a program for children. In addition to that, the
[12] De Bruyn, L., Leman, M. & Moelants, D. Quantifying
Musical Synchrotron has been used in a scientific study on
children’s embodiment of musical rhythm in individual and
synchrony behavior of children.
group settings. Accepted for publication in Proceedings of
The results, which are described in detail in [12] [13], are the 10th International Conference on Music Perception and
promising in that they show a clear effect of the social condition Cognition. Sapporo, Japan, August 25-29 2008.
(where the 4 participants see each other) in comparison with the [13] De Bruyn, L., Leman, M., Demey, M., Desmet, F. &
individual condition (where the participants are blindfolded). Moelants, D. Measuring and quantifying the impact of social
A major result of The Musical Synchrotron concerns the positive interaction in listeners’ movements to music. Accepted for
attitude of participants that were engaged in this interactive publication in Springer Verlag Lecture Notes in Computer
music-driven game. Using dedicated hardware components that Science.
are currently under development [10] we are on the way towards a
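Section 3.2 of the paper above describes the tempo-estimation chain: sum the three acceleration axes, band-pass the result to 0.5-4 Hz, and take the dominant frequency over 400-sample (4 s) windows with 50% overlap. The following is a minimal plain-Python sketch of that chain under stated assumptions. The first-order filters, the Hann taper and the direct DFT evaluated on a 1-BPM grid are our choices, not the authors' PD external. The DFT replaces an FFT library so that the example has no dependencies. `update_score` and its 5-BPM tolerance are a hypothetical reading of the up/down score from Section 3.3.

```python
import math

FS = 100      # sampling rate in Hz (from the text)
WINDOW = 400  # analysis window: 400 samples = 4 s
HOP = 200     # 50% overlap -> one estimate every 2 s


def bandpass(samples, fs=FS, f_low=0.5, f_high=4.0):
    """First-order high-pass at f_low followed by first-order low-pass at f_high.

    ASSUMPTION: the paper gives only the 0.5-4 Hz pass band, not the filter design.
    """
    dt = 1.0 / fs
    rc_hp = 1.0 / (2 * math.pi * f_low)
    rc_lp = 1.0 / (2 * math.pi * f_high)
    a_hp = rc_hp / (rc_hp + dt)
    a_lp = dt / (rc_lp + dt)
    out = []
    x_prev = samples[0]  # start from the first sample so a constant offset yields 0
    hp_prev = lp_prev = 0.0
    for x in samples:
        hp = a_hp * (hp_prev + x - x_prev)    # removes the gravity offset
        lp = lp_prev + a_lp * (hp - lp_prev)  # removes fast jitter
        out.append(lp)
        x_prev, hp_prev, lp_prev = x, hp, lp
    return out


def dominant_bpm(window, fs=FS, bpm_min=30, bpm_max=240):
    """BPM (1-BPM grid, 30-240 = 0.5-4 Hz) with the largest Hann-windowed DFT magnitude."""
    n = len(window)
    tapered = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
               for i, s in enumerate(window)]
    best_bpm, best_mag = bpm_min, -1.0
    for bpm in range(bpm_min, bpm_max + 1):
        omega = 2 * math.pi * (bpm / 60.0) / fs  # radians per sample
        re = sum(s * math.cos(omega * i) for i, s in enumerate(tapered))
        im = sum(s * math.sin(omega * i) for i, s in enumerate(tapered))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_bpm, best_mag = bpm, mag
    return best_bpm


def estimate_bpm(ax, ay, az):
    """Sum the three axes, band-pass filter, and return one BPM per 2 s hop."""
    summed = [x + y + z for x, y, z in zip(ax, ay, az)]
    filtered = bandpass(summed)
    return [dominant_bpm(filtered[start:start + WINDOW])
            for start in range(0, len(filtered) - WINDOW + 1, HOP)]


def update_score(score, measured_bpm, target_bpm, tolerance_bpm=5.0):
    """HYPOTHETICAL rule: +1 within tolerance of the annotated tempo, otherwise -1."""
    return score + 1 if abs(measured_bpm - target_bpm) <= tolerance_bpm else score - 1


if __name__ == "__main__":
    # Synthetic check: 2 Hz (120 BPM) movement on x, constant gravity on z.
    t = [i / FS for i in range(800)]
    ax = [math.sin(2 * math.pi * 2.0 * ti) for ti in t]
    print(estimate_bpm(ax, [0.0] * 800, [9.81] * 800))
```

In this sketch, 8 seconds of synthetic input give three estimates, one every 2 s, at or very near 120 BPM. The high-pass stage removes the 9.81 gravity offset before the spectral peak is picked. At 100 Hz, a 400-sample FFT has a native bin spacing of 0.25 Hz (15 BPM). That coarse resolution is why the sketch evaluates the spectrum on a finer grid; zero-padding an FFT would serve the same purpose.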
Performances

Opening Concert
June 4, 2008, h. 18.00, Auditorium Casa Paganini
This concert includes four original music pieces emerging from the experience of four
young composers working with interactive technologies, and in particular with the
EyesWeb XMI platform for eXtended Multimodal Interaction (www.eyesweb.org).
EyesWeb XMI supports the design of multimodal interactive systems, the analysis and
processing of expressive full-body movement and gesture, and a large number of further
features. This concert shows on stage concrete results from current research at Casa
Paganini-InfoMus Lab (www.casapaganini.org). The concert is not just held at Casa
Paganini: it fully exploits the whole environment of Casa Paganini as an overall
instrument/interface for musical expression.
The opening concert is partially supported by the EU FP7-ICT Project SAME (www.sameproject.eu) on active music listening, and by the EU-Culture 2007 Project CoMeDiA (www.comedia.eu.org) on networked performance.
In particular, the piece “Lo specchio confuso dall’ombra” by Roberto Girolin addresses the problem of remote communication and social interaction between audiences in different locations: the Foyer and the Auditorium of Casa Paganini. The piece is structured as two separate but communicating installations. One of the main research questions behind this piece, raised and explored during its design and implementation, is “how to interact and to convey expressive content in a remote networked environment?”, one of the core issues of CoMeDiA.
The piece “The Bow is bent and drawn” by Nicola Ferrari is another challenge, this time centered on the SAME ICT EU Project. This piece explores a novel paradigm of “active music listening” developed at Casa Paganini - InfoMus Lab, described in the paper by Camurri et al. in these proceedings. The composer has elaborated the active music listening paradigm and transformed it into a compositional element.
Lo specchio confuso dall’ombra
Composer: Roberto Girolin

robertogirolin@gmail.com
EyesWeb interactive systems design: Paolo Coletta, Simone Ghisio, Gualtiero Volpe
Biographical information
Roberto Girolin (1975) was born in Pordenone, Italy. After studying classical guitar, he began to study piano and composition at the “J. Tomadini” Conservatory in Udine. He studied vocal and instrumental counterpoint, graduating in choral music and conducting at the same Conservatory. He has conducted many choirs and orchestras, exploring different kinds of repertoire, from Gregorian music to contemporary music.
He deepened his study of contemporary music at the University of Udine with Dr. A. Orcalli and then with Dr. N. Venzina at the “B. Maderna” Archive in Bologna (Italy). He has attended several master classes and seminars: choral music, chamber music, composition (Salvatore Sciarrino, Fabio Nieder, Mauro Bonifacio), electronic music (Lelio Camilleri, Agostino Di Scipio), a Sound Design course with Trevor Wishart, a course in Audio Digital Signal Processing for Musical Applications (lab workshop, lessons and applications) with Giuseppe Di Giugno, and live electronics in Luigi Nono's works with Alvise Vidolin and André Richard (Experimental Studio Freiburg für Akustische Kunst).
He graduated with full marks in Electronic Music and Multimedia at the Musical Academy of Pescara (Italy), and in 2006 he also received his degree from the Conservatory of Venice under Alvise Vidolin, with full marks cum laude.
He is actively involved in performing and investigating the compositional and performance potential offered by electronic and multimedia music systems. His music is performed in Italy and abroad. He recently won the “Call 2007” (Italian CEMAT Competition) and a Mention at the 34th “Concours Internationaux de Musique et d'Art Sonore Electroacoustiques de Bourges”, France.
Description of Piece
Lo specchio confuso dall’ombra can be translated as “The mirror confused by its shadow”. It is something between a distributed installation and a concert, in which opposing groups of performers in two remote places play solo or interact.
The audience (two people at a time, one for each installation) activates video and sound transformations, depending on the space they occupy and on their gestures. The two installations are in the Foyer and in the Auditorium, respectively, so the two audience members cannot see or talk to each other. Multimodal data and expressive gesture cues are extracted in real time by an EyesWeb patch, interacting and playing with the electronic performer. The interaction occurs both between the electronic performer and the two places where the audience has access, and between the two remote installations. There are two different levels of intervention in the audio and video transformation: autonomous, depending on the individual person, and conditioned, depending on the behaviour and actions occurring in the other, separate installation.
Furthermore, the entrance to the concert hall has microphones that capture words, sentences, coughs, laughs and other noises, which are transformed in real time and thus enter the piece.
Lo specchio confuso dall’ombra does not require the audience to remain seated or to follow a specific pattern of behaviour. Its duration is indefinite: it changes every time it is performed.
Acknowledgments
This piece has been commissioned by Casa Paganini – InfoMus Lab, to tackle open problems on networked performance
faced in the EU Culture 2007 Project CoMeDiA.
The Bow is bent and drawn

Four Parts Madrigale Rappresentativo
For four Dancers and EyesWeb
Composer: Nicola Ferrari

InfoMus Lab – Casa Paganini, Genova
eusebius_1799@yahoo.com
Based on the installation "Mappe per Affetti Erranti",

designed and developed by Antonio Camurri, Corrado Canepa, Nicola Ferrari, Gualtiero Volpe
texts from Edmund Spenser’s The Faerie Queene and William Shakespeare’s King Lear
with support of EU ICT Project SAME
Vocalists: Roberto Tiranti (tenor and vocal conductor), Valeria Bruzzone (alto),
Chiara Longobardi (soprano), Edoardo Valle (bass)
Dancers: Giovanni Di Cicco (choreography), Luca Alberti, Filippo Bandiera, Nicola Marrapodi
Recording engineer and music consultant: Marco Canepa

Sound engineers: Corrado Canepa (director), Chiara Erra (assistant)
EyesWeb interactive systems design: Paolo Coletta, Barbara Mazzarino, Gualtiero Volpe
Nicola Ferrari was born in 1973. He studied composition with Adriano Guarnieri and took his degree at the ‘G. B. Martini’ Conservatory in Bologna. He received his Master's degree and PhD from the Faculty of Arts and Philosophy at the University of Genoa. Since 2005 he has been a member of the staff of the InfoMus Lab. For many years he directed the ‘S. Anna’ polyphonic choir. He has written scores for theatrical performances.
Description of the Piece
The Bow is a theatrical mise-en-scène of the installation Mappe per Affetti Erranti. During the Science Festival 2007, as preparatory work for the EU ICT Project SAME on active listening (www.sameproject.org), the audience was invited to explore and experience a song by John Dowland (see the paper by Camurri et al. in these proceedings). The audience could walk inside the polyphonic texture, listen to the single parts, and change the expressive quality of the musical interpretation through their movement on the stage of Casa Paganini, as analysed by EyesWeb XMI. Aesthetically, the most interesting result lies in the game of hiding and revealing a known piece. The idea can be matched with the classical theatrical topos of recognition. Thus the musical potential of the ‘interactive performance’ of prerecorded music becomes a new dramaturgical structure.
Under the supervision of Marco Canepa, Roberto Tiranti and his madrigal group recorded different anamorphic interpretations of a Bach chorale. Thanks to the interactive application developed with EyesWeb XMI, the dancers, led by choreographer Giovanni Di Cicco, mix and mould the recorded musical material in real time. At the same time, the live sound of the vocal group explores the whole space of Casa Paganini as a global (both real and imaginary) musical instrument. In a metamorphic game where, following Corrado Canepa’s compositional lesson, electronic and acoustic technologies merge and exchange their specificity, this interactive score of losing and finding, multiplying and distilling the ancient Bach palimpsest tries to tell the dramatic story of King Lear, the most tragic Western figure of the difficulty of reaching the affections one possesses without being able to know or express them.
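The installation behind The Bow let visitors walk inside the polyphonic texture and hear single parts depending on where they stood. The sketch below shows one simple way such position-dependent balancing could work. It assumes a hypothetical layout in which each of four voices (soprano, alto, tenor, bass, matching the vocalists listed above) is anchored at a fixed stage position, and each voice's gain falls off with the listener's distance. The names, coordinates and roll-off law are illustrative only. The actual EyesWeb XMI application, including its control of expressive quality, is described by Camurri et al. and is not reproduced here.

```python
import math

# HYPOTHETICAL stage layout in metres: one zone per voice of the texture.
# The programme note gives no coordinates; these are illustrative only.
VOICE_POSITIONS = {
    "soprano": (1.0, 1.0),
    "alto": (5.0, 1.0),
    "tenor": (1.0, 5.0),
    "bass": (5.0, 5.0),
}


def voice_gains(listener_xy, rolloff_m=2.0):
    """Return a gain per voice that decreases with the listener's distance.

    Gains are normalised so that the closest voice plays at 1.0.
    """
    x, y = listener_xy
    raw = {}
    for voice, (vx, vy) in VOICE_POSITIONS.items():
        distance = math.hypot(x - vx, y - vy)
        raw[voice] = 1.0 / (1.0 + (distance / rolloff_m) ** 2)
    loudest = max(raw.values())
    return {voice: g / loudest for voice, g in raw.items()}


# Standing in the soprano zone: the soprano dominates and the bass is faintest.
print(voice_gains((1.0, 1.0)))
```

A listener at the centre of the stage hears all four parts at equal level. Moving towards one corner progressively isolates that part, which is one plausible reading of "listening to the single parts".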
Acknowledgments
The music commission is kindly offered by Fondazione Spinola. The scientific and technological developments are partially
supported by the EU FP7 ICT Project SAME (www.sameproject.eu).
Tre aspetti del tempo

per iperviolino e computer (for hyperviolin and computer)
Giorgio Klauer
Conservatorio di Como
home: Kreplje, 5
6221 Dutovlje (Slovenija)
0039 329 25 89 352
klauer@alice.it
Giorgio Klauer studied electronic music, instrumental composition, flute and musicology in Trieste, where he was born in 1976, in Cremona and in Liège. He is a professor at the Conservatory of Como, in the school of music and sound technologies.
With a distance sensor placed under the scroll of the instrument and an inclination sensor on the wrist, the displacements of the performer's limbs can be detected. These displacements, drawn onto a Cartesian plane, give the coordinates of a track in an ideal performing space, whose third dimension is added and shaped by the passing of time. The computer makes it possible to associate the sounding path produced by the performer with this track, and hence to hear it again. In this case too, the coordinates used to access it are given by the current gestures, so the dimension of time becomes bundled, somewhat like a parchment palimpsest: the sonic form returned by the computer becomes increasingly dense and inexplicable, and needs an electroacoustic exegesis to unravel it, at least in shreds.
The procedures of musical production are here a metaphor for knowledge; so are the compositional methods underlying the score, which, in providing the prescriptions for the musical path, also portray a mental track.
Aurora Polare
Alessandro Sartini
Conservatorio Niccolò Paganini
Via Terracini 140/2
16166 Genoa, Italy
+393491291231
sartiniale@libero.it
Alessandro Sartini
Born in Genoa in 1982, he studied piano with Canzio Bucciarelli and is attending the final year of Composition at the Conservatory of Genoa with Riccardo Dapelo, who introduced him to “live electronic” treatments. His first public performance was at the Auditorium Montale of the Carlo Felice Theatre in Genoa, during the concert commemorating the 50th anniversary of Béla Bartók's death in 1995. From that year on he collaborated with many solo musicians, who appreciated his way of accompanying; this led him to work in partnership with a good number of professional soloists. In 1999 he joined the Composition class at the Conservatory of Genoa with Luigi Giachino, who introduced him to film music: this interest led him to win the third prize at the Lavagnino International Film Music Festival in Gavi in 2006 and the first prize at the “Concorso Internazionale di Composizione di Alice Belcolle” in 2007. With Valentina Abrami, he founded the “Associazione Musica in Movimento”, which operates at the “International School in Genoa”.
Aurora Polare
Aurora Polare (Polar Dawn) is a short piece for cymbals, tam-tam, vibraphone, live electronics and an EyesWeb system. It was inspired by the smooth movements of waves, the drawings created by polar dawns and the cold weather in polar seas, which is why only metallophones are used.
The first challenge was to have the percussionists process the sound they produce while playing their instruments, and to devise a new, simple way to specify every movement. For this reason, below the traditionally notated score, two additional lines follow the music, specifying the direction of movement: up-down and left-right/near-far. A line approaching the top or the bottom of the Y axis indicates the direction to follow. An example is shown here on the left.
All of these movements interact with EyesWeb and Max/MSP through two 30 fps accelerometer bracelets worn by the performers. Vertical movements control the volume of the processed sound, while horizontal movements control a different Max/MSP patch suited to each instrument: a tam-tam sample-speed controller (which makes the instrument play without being touched), a harmonizer that makes the cymbals sing like a theremin, but with their own processed sound, and the rate of a delay. In the control room, a MIDI controller and a computer are used to manage additional live effects and parameters, such as granular synthesis, reverb and multi-slider filters.
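The gesture mapping described above sends vertical movement to volume and horizontal movement to one instrument-specific parameter. The following minimal sketch of that per-frame mapping makes several assumptions. It assumes the bracelet data have already been reduced to vertical and horizontal values normalised to [-1, 1]. The parameter names and ranges are hypothetical, and assigning the delay to the vibraphone is our guess, since the note does not say which instrument uses it. The piece itself runs this logic in EyesWeb and Max/MSP, not in Python.

```python
# HYPOTHETICAL ranges: the note names the controlled parameters but not their
# ranges or units. Assigning the delay to the vibraphone is our assumption.
HORIZONTAL_TARGETS = {
    "tam-tam": ("sample_speed", 0.25, 2.0),
    "cymbals": ("harmonizer_semitones", -12.0, 12.0),
    "vibraphone": ("delay_rate_hz", 0.5, 8.0),
}


def scale(value, out_lo, out_hi):
    """Clamp value to [-1, 1] and map it linearly onto [out_lo, out_hi]."""
    value = max(-1.0, min(1.0, value))
    return out_lo + (value + 1.0) / 2.0 * (out_hi - out_lo)


def map_frame(instrument, vertical, horizontal):
    """Map one bracelet frame (about every 1/30 s) to gain and an instrument parameter.

    vertical, horizontal: movement values assumed to be pre-normalised to [-1, 1].
    """
    name, lo, hi = HORIZONTAL_TARGETS[instrument]
    return {"gain": scale(vertical, 0.0, 1.0), name: scale(horizontal, lo, hi)}


print(map_frame("cymbals", 0.0, 0.0))  # {'gain': 0.5, 'harmonizer_semitones': 0.0}
```

Keeping the vertical-to-volume mapping shared, while only the horizontal target changes per instrument, mirrors the note's description of one common control and one patch "suited to every instrument".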
Thanks to Martino Sarolli for helping me with Max/MSP, and to Matteo Rabolini and Matteo Bonanni for playing my composition.
Pyrogenesis
Pascal Baltazar
GMEA,
4, rue Sainte Claire
F-81000 Albi France
+33 563 545 175
pb@gmea.net / pb@zkrx.org
Pascal Baltazar is a composer and research coordinator at GMEA, National Center for Musical Creation in Albi, France. His research focuses on the spatial and temporal perception of sound and its relationship to the body and musical gesture. He coordinates the Virage research platform, on control and scripting of novel interfaces for artistic creation and the entertainment industries, funded by the French Research Agency within its Audiovisual and Multimedia program for the 2008-2009 period. He is an active member of the Jamoma collective.
He studied Aesthetics (Master of Philosophy thesis “The sonic image: material and sensation”, 2001, Toulouse III, France) and electroacoustic composition at the National Conservatoire of Toulouse. He has since been involved as a composer or interactive designer in diverse artistic projects: concerts, performing arts shows and interactive installations. He has been commissioned for musical works by several institutions, such as the French State, INA-GRM, GMEA, IMEB… and has participated in international festivals (Présences Électroniques, Paris / Radio France Festival, Montpellier / Synthèse, Bourges / Videomedeja, Novi Sad / Space + Place, Berlin…).
The composition of Pyrogenesis took inspiration from several aspects of blacksmithing, not in a literal way, but rather as a set of correspondences.
First, the gesture, by which the blacksmith continuously shapes the material: striking, heating, twisting and soaking metals to gradually print a form into them.
Then, the tool: just as the blacksmith makes his own tools, I work on developing my own electroacoustic instrument, an instrument to write sound, in space and with a gestural input.
Lastly, the organic construction of the form. Gilles Deleuze says: "Why is the blacksmith a musician? It is not simply because the forging mill makes noise, it is because the music and the metallurgy are haunted by the same problem: that the metallurgy puts the matter in the state of continuous variation just as the music is haunted by putting the sound in a state of continuous variation and to found in the sound world a continuous development of the form and a continuous variation of the matter."
From a more technical point of view, the interaction with the performer uses two interfaces: a Wacom tablet and a set of force-sensitive resistors (read through an analog-to-digital converter). What they have in common is that both allow control through hand pressure, and thus offer a very “physical” mode of control.
The composition/performance environment consists of a set of generative audio modules, fully addressable and presettable, including a mapping engine that offers a quick yet powerful set of mapping strategies from controller inputs and volume envelopes to any parameter, including the parameters of the mappers themselves. This allows a very precise and flexible sound/gesture relationship that can evolve over time.
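The mapping engine described above lets controllers and envelopes drive any parameter, including the mappers' own parameters. That idea of a "mapping of a mapping" can be illustrated with a small sketch. It assumes a flat dictionary of OSC-style address strings and purely linear mappers. The class names, addresses and values are hypothetical, and this is not the actual implementation of Baltazar's environment.

```python
class ParameterSpace:
    """Flat namespace of addressable parameters (OSC-style address strings)."""

    def __init__(self):
        self.values = {}

    def set(self, address, value):
        self.values[address] = value

    def get(self, address, default=0.0):
        return self.values.get(address, default)


class LinearMapper:
    """Maps a source parameter to a target: target = gain * source + offset.

    The mapper's own gain and offset live in the same ParameterSpace,
    so another mapper can target them.
    """

    def __init__(self, space, name, source, target, gain=1.0, offset=0.0):
        self.space = space
        self.source = source
        self.target = target
        self.gain_address = f"/mapper/{name}/gain"
        self.offset_address = f"/mapper/{name}/offset"
        space.set(self.gain_address, gain)
        space.set(self.offset_address, offset)

    def update(self):
        x = self.space.get(self.source)
        gain = self.space.get(self.gain_address)
        offset = self.space.get(self.offset_address)
        self.space.set(self.target, gain * x + offset)


space = ParameterSpace()
# Tablet pressure drives a (hypothetical) filter cutoff in Hz ...
cutoff = LinearMapper(space, "cutoff", "/tablet/pressure", "/filter/cutoff",
                      gain=1000.0, offset=200.0)
# ... while a force-sensitive resistor rescales that mapper's gain.
depth = LinearMapper(space, "depth", "/fsr/1", "/mapper/cutoff/gain", gain=2000.0)

space.set("/tablet/pressure", 0.5)
space.set("/fsr/1", 0.25)
for mapper in (depth, cutoff):  # update the meta-mapper first
    mapper.update()
print(space.get("/filter/cutoff"))  # 450.0
```

The mappers write into the same namespace they read from, so update order matters. Here the meta-mapper runs first, so the cutoff mapper sees its new gain (500.0) within the same control frame.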
The composition was realized through a constant dialogue between improvisations along a predetermined trajectory and subsequent listening to the results. Thus, most of the details of the composition were generated through a process of improvisation and learning through repetition, without any visual support, which emphasizes expressivity while keeping a very direct relationship to the musical gesture.
Keo Improvisation for sensor instrument Qgo

Chikashi Miyama
University at Buffalo
131 Allen St., Apt. 17
Buffalo NY 14201 USA
1-716-868-2819
cmiyama@buffalo.edu
Chikashi Miyama received his BA (2002) and MA (2004) from the Sonology Department, Kunitachi College of Music, Tokyo, Japan, and his Nachdiplom (2007) from the Elektronisches Studio, Musik-Akademie der Stadt Basel, Basel, Switzerland. He is currently pursuing a Ph.D. at the State University of New York at Buffalo. He has studied under T. Rai, C. Lippe, E. Ona, and G. F. Haas. His works, especially his interactive multimedia works, have been performed at international festivals such as June in Buffalo 2001 (New York, USA), Mix '02 (Aarhus, Denmark), Musica Viva '03 (Coimbra, Portugal), the Realtime/non-realtime electronic music festival (Basel, Switzerland) and Next Generation '05 (Karlsruhe, Germany), as well as in various cities in Japan. His papers about his works and his realtime visual processing software "DIPS" have also been accepted by ICMC and presented at several SIGMUS conferences. Since 2005, he has been performing as a laptop musician, employing his original sensor devices, and has been involved in several media-art activities, such as Dorkbot, Shift-Festival, SPARK, and SGMK workshops. His compositions have received an honorable mention in the Residence Prize section of the 30th International Electroacoustic Music Competition Bourges and have been accepted by the International Computer Music Conference in 2004, 2005, 2006 and 2007. Several of his works have been published, including on the Computer Music Journal Vol. 28 DVD by MIT Press and on the ICMC 2005 official CD.
"Keo" is a performance for voice improvisation, the Qgo sensor instrument, and live electronics. The author attempts to realize three concepts in the work. The first is "dual-layered control", in which the performer improvises phrases by singing, thereby providing sound material for a computer. Simultaneously, he sends commands to the computer to process the vocals using a pair of sensor devices worn on both hands. The second is the connection between the visual aspect of the performance and the musical gestures. In most parts of the performance, the movement of the sensor instrument and the musical parameters are clearly connected: if the performer moves his hand even slightly, particular aspects of the sound are influenced in an obvious manner. The third is the strong connection between music and theatricality. In several parts of this work, the body motions of the performer not only control the sensor device but also carry theatrical meaning.
(Photo: Qgo, sensor instrument)

Intersecting Lines
Keith Hamel
University of British Columbia
6361 Memorial Rd.
1-604-822-6308
hamel@interchange.ubc.ca
François Houle
Vancouver Community College
1155 East Boradway
1-604-874-3300
f.houle@telus.net
Aleksandra Dulic
University of British Columbia
6361 Memorial Rd.
1-604-822-8990
adulic@interchange.ubc.ca
François Houle has established himself as one of Canada’s finest musicians. His performances and recordings transcend the stylistic borders associated with his instrument in all of the diverse musical spheres he embraces: classical, jazz, new music, improvised music, and world music. As an improviser, he has developed a unique language, virtuosic and rich with sonic embellishments and technical extensions. As a soloist and chamber musician, he has actively expanded the clarinet’s repertoire by commissioning some of today’s leading Canadian and international composers and premiering over one hundred new works. An alumnus of McGill University and Yale University, François has been an artist-in-residence at the Banff Centre for the Arts and the Civitella Ranieri Foundation in Umbria, Italy. Now based in Vancouver, François is a leader in the city’s music community and is considered by many to be Canada’s leading exponent of the clarinet.
Keith Hamel is a Professor in the School of Music, an Associate Researcher at the Institute for Computing, Information and Cognitive Systems (ICICS), a Researcher at the Media and Graphics Interdisciplinary Centre (MAGIC) and Director of the Computer Music Studio at the University of British Columbia. Keith Hamel has written both acoustic and electroacoustic music, and his works have been performed by many of the finest soloists and ensembles both in Canada and abroad. Many of his recent compositions focus on interaction between live performers and computer-controlled electronics.
Aleksandra Dulic is a media artist, theorist and experimental filmmaker working at the intersection of multimedia and live performance, with research foci in computational poetics, interactive animation and cross-cultural media performance. She has received a number of awards for her short animated films. She is active as a new media artist, curator, writer and educator, teaching courses, presenting art projects and publishing papers across North America, Australia, Europe and Asia. She received her Ph.D. from the School of Interactive Art and Technology, Simon Fraser University, in 2006. She is currently a Postdoctoral Research Fellow at the Media and Graphics Interdisciplinary Centre, University of British Columbia, funded by the Social Sciences and Humanities Research Council of Canada (SSHRC).
Intersecting Lines is a collaboration between clarinetist François Houle, interactive video artist Aleksandra Dulic and computer music composer Keith Hamel. The work grew out of Dulic's research in visual music and involves mapping a live clarinet improvisation onto both the visual and audio realms. In this work, an intelligent system for visualization and signification is used to develop and expand the musical material played by the clarinet. This system monitors and interprets various nuances of the musical performance. The clarinetist’s improvisations, musical intentions, meanings and feelings are enhanced and extended, both visually and aurally, by the computer system, so that the various textures and gestures played by the performer have corresponding visuals and computer-generated sounds. The melodic line, as played by the clarinet, is used as the main compositional strategy for visualization. Since the control input comes from a classical instrument, the strategy is based on calligraphic line drawing using artistic rendering: the computer-generated line is drawn in 3D space and rendered using expressive painterly and ink-drawing styles. The animated lines and textures portray a new artistic expression that transforms a musical gesture onto a visual plane. Kenneth Newby contributed to the development of the animation software. This project was made possible with the generous support of the Social Sciences and Humanities Research Council of Canada.
Vistas
Ernesto Romero
Los Platelmintos
Union 139-5, Col. Escandón, México, D.F.
tait_mx@yahoo.com
Esthel Vogrig
Los Platelmintos
Union 139-5, Col. Escandón, México, D.F.
cuki100@hotmail.com
Los Platelmintos are a group of artists living in Mexico City who work under the premise of interdisciplinarity and experimentation. Dance, music and electronic media are fundamental elements in their work. Ernesto Romero: music composition and electronic media. Studied composition, mathematics and choir conducting in México. Head of the Audio Department at the National Center for the Arts in México, where he researches and develops technology applied to the arts. Esthel Vogrig: choreographer and dancer. Studied contemporary dance and choreography in México, Vienna and the United States. Director of the Los Platelmintos company. Recipient of the "Grant for Investigation and Production of Art Works and New Media” from the National Council of the Arts and the Multimedia Center in Mexico; this grant was used to produce the piece Vistas. Karina Sánchez: dancer. Studied contemporary dance and choreography in Chile, Spain and México.
VISTAS (2005). Choreography with video, one musician playing live electronics and two dancers with metainstruments interacting with the music. Divided into three scenes, the work is conceptually based on the “self-other” cognitive phenomenon, inspired by Edgar Morin's idea of the evolution of society through interdisciplinary interaction. The interdisciplinary character of the piece is carefully constructed using two metainstruments that link the formal elements in a structural way. These metainstruments are two wireless microphones plugged into two stethoscopes attached to the dancers' hands. The movements of the dancers make the microphones generate an amplitude signal that is transmitted to the computer and mapped onto different musical elements. Some live vocal interventions by the dancers add dramatic accents to the piece. Vistas is an integral piece in which the music supports the choreography just as the choreography is influenced by the music. The video supports the scene by creating an abstract space that changes and evolves according to the performance. The musical aesthetic includes noise elements and manipulated voice samples, playing with contrasts of texture and density in a very dynamic way. The language of the choreography comes from an exploration of the planes of three-dimensional space, first separately and later united. The language is also shaped by the need to make the best possible use of the metainstruments.
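The stethoscope metainstruments described above turn the dancers' movements into an amplitude signal that is then mapped onto musical elements. A common way to obtain such a control signal is a block-wise RMS envelope follower. The sketch below assumes that approach, a 512-sample block size, and a hypothetical linear mapping from level to the event density of one sound layer. The programme note specifies none of these details.

```python
import math


def block_rms(samples, block_size=512):
    """RMS level of consecutive, non-overlapping blocks of audio samples."""
    levels = []
    for start in range(0, len(samples) - block_size + 1, block_size):
        block = samples[start:start + block_size]
        levels.append(math.sqrt(sum(s * s for s in block) / block_size))
    return levels


def level_to_density(rms, floor=0.01, ceiling=0.5, max_events_per_s=20.0):
    """HYPOTHETICAL mapping: RMS level -> event density of one sound layer.

    Below `floor` (background noise) nothing is triggered; at or above
    `ceiling` the layer reaches its maximum density.
    """
    if rms <= floor:
        return 0.0
    normalised = min(1.0, (rms - floor) / (ceiling - floor))
    return normalised * max_events_per_s


signal = [0.5, -0.5] * 1024  # a loud, steady test signal: RMS 0.5 in every block
print([level_to_density(r) for r in block_rms(signal)])  # [20.0, 20.0, 20.0, 20.0]
```

The noise floor keeps microphone hiss from triggering material when the dancers are still. In practice, a perceptual (dB) scale or some smoothing between blocks might suit a dance context better than this linear mapping.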
The Pencil Project

Martin Messier
412B, rue Boucher
Montreal Qc Canada
H2J 1B6
001 514 273 87 58
martin.messier@ekumen.com
Jacques Poulin-Denis
2244 de Larivière
Montréal Qc Canada
H2K 4P8
001 514 652 42 52
jacques@ekumen.com
Martin Messier
Holding a diploma in drums for jazz interpretation, Martin Messier has completed a bachelor’s
degree in electroacoustic composition at the University of Montreal, and De Montfort University
in England. Recently, Martin has founded a solo project called « et si l’aurore disait oui… »,
through which he develops live electroacoustic performance borrowing stylistic elements from
Intelligent Dance Music, acousmatic and folk. Based on strong aptitudes for rhythm, Martin’s
esthetic can be defined as a complex, left field and happily strange sound amalgam, constantly
playing with construction and deconstruction.
Jacques Poulin-Denis
Jacques Poulin-Denis is active in projects that intersect theater, dance and music. He completed his undergraduate studies in electroacoustic composition at the University of Montreal and De Montfort University in England. Most of his music was composed for theater and dance. Jacques explores innovative ways of presenting electroacoustic music. Jacques’ musical style is evocative and filled with imagery. Combining traditional and electronic instruments with anecdotal sound sources from everyday life, he creates vibrant music that is fierce and poetic.
Description of the piece

The Pencil Project is a performance piece created by sound artists Martin Messier and Jacques Poulin-Denis. Their intention was to craft a live electronic music piece inspired by the physicality of writing and the imagery it articulates. The performers translate scribbling, scratching, dotting and drawing with pencils into music. The computers are hidden and untouched throughout the piece, allowing object manipulation and the creation of sound to be the performers’ main focus.
The Pencil Project is about musicianship. Liberated from the computer screen and equipped
with hands-on objects, the performers explore a new form of expressivity. Through an
authentic and stimulating performance, the musicians bring computer music intimately close to
playing an actual musical instrument.
Heretic’s Brew
Stuart Favilla
Bent Leather Band
CTME, OTRL, Victoria University
Melbourne, Australia
sfavilla@bigpond.com
Joanne Cannon
Bent Leather Band
Melbourne, Australia
joanne_cannon@bigpond.com
Tony Hicks
Saxophonist/Multi-Instrumentalist
Melbourne Australia
hixt@optusnet.com.au
Composer/improviser Joanne Cannon is one of Australia’s leading bassoonists. Although she began her career as a professional orchestral musician, she now works as a composer and improviser, exploring extended techniques. Stuart Favilla has a background in composition and improvisation. Together they form the Bent Leather Band, a duo that has been developing experimental electronic instruments in Australia for over twenty years. Bent Leather Band blurs virtuosity and group improvisation across a visual spectacle of stunning original instruments, made in conjunction with Tasmanian leather artist Garry Greenwood. The instruments include fanciful dragon-headed Light-Harps and leather Serpents and Monsters that embody sensor interfaces, synthesis and signal processing technology. Practical and intuitive, they have been built with multi-parameter control in mind. Joint winners of the Karl Sczuka Preis, Bent Leather Band have had work selected at Bourges and have won the IAWM New Genre Prize.
Inspired by the legacy of Percy Grainger’s Free music, i.e. “music beyond the constraints of conventional pitch and rhythm” [Grainger,
1951], Bent Leather Band has strived to develop a new musical language that exploits the potentials of synthesis/signal processing,
defining new expressive boundaries and dimensions and yet also connecting with a heritage of Grainger’s musical discourse. Grainger
conceived his music towards the end of the 19th Century, and spent in excess of fifty years bringing his ideas to fruition through
composition for theremin ensemble, the development of 6th tone instruments [pianos and klaviers], the development of polyphonic reed
instruments for portamento control and a series of paper roll, score driven electronic oscillator instruments.
Tony Hicks enjoys a high-profile reputation as Australia's most versatile woodwind artist. Equally adept on saxophones, flutes and
clarinets, he works across a broad spectrum of musical genres. A student of Dr Peter Clinch, Tony also studied at the Eastman School of
Music. He has performed throughout Australia and across Europe, the United States, Japan and China with a number of leading
Australian ensembles, including the Australian Art Orchestra, Elision and the Peter Clinch Saxophone Quartet. He has performed
saxophone concertos with the Melbourne Symphony Orchestra and soloed for Stevie Wonder and his band. As a jazz artist he has
performed and recorded with leading jazz figures Randy Brecker and Billy Cobham and with notable Australian artists Paul Grabowsky, Joe
Chindamo and David Jones, and has also led a number of important groups in the Australian scene. An explorer of improvised music, he
regularly collaborates with numerous artists in Australia and overseas.
Bent Leather Band introduces its new extended instrument project, Heretic's Brew. The
aim of this project is to develop an extended line-up and, ultimately, a larger ensemble.
So far the project (a quintet) has developed a number of new extended saxophone
controllers and is currently working on trumpets and guitars. Their instruments are based on
Gluion OSC interfaces: field-programmable gate array (FPGA) devices with multiple
configurable inputs and outputs. For NIME08, the ensemble, as a trio, will demonstrate their
instruments, language and techniques through ensemble improvisation.
[Pictured Right: Gluisop extended saxophone]
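The Gluion devices above are OSC interfaces, so sensor readings leave them as Open Sound Control messages. As a rough illustration of what such a message looks like on the wire, the sketch below encodes an OSC 1.0 message with float32 arguments: a NUL-padded address string, a NUL-padded type-tag string, then big-endian arguments. The address `/sensor/1` and the helper names are hypothetical; the Gluion's actual address namespace is not described here.

```python
import struct


def osc_string(s: str) -> bytes:
    """Encode an OSC-string: ASCII bytes plus a NUL terminator, padded with NULs to a multiple of 4 bytes."""
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *values: float) -> bytes:
    """Encode an OSC 1.0 message whose arguments are all 32-bit floats (type tag 'f')."""
    type_tags = "," + "f" * len(values)
    args = b"".join(struct.pack(">f", v) for v in values)  # big-endian float32
    return osc_string(address) + osc_string(type_tags) + args


if __name__ == "__main__":
    # Hypothetical address for one sensor channel; a real controller defines its own namespace.
    packet = osc_message("/sensor/1", 0.5, 0.25)
    print(len(packet), packet.hex())
    # Sending is transport-agnostic, e.g. over UDP:
    # socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, ("127.0.0.1", 9000))
```

Because every field is padded to four bytes, a receiver can walk the packet without length prefixes; UDP is a common, but not mandatory, transport.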
The Suicided Voice
Mark A. Bokowiec, University of Huddersfield, School of Music & Humanities, Queensgate, Huddersfield HD1 3DH, 00 44 (0) 1484 472004, m.a.bokowiec@hud.ac.uk
Julie Wilson-Bokowiec, EDT, www.bodycoder.com, 00 44 (1) 484 513158, juliebokowiec@yahoo.com
Mark Bokowiec (Composer, Electronics & Software Designer)

Mark is the manager of the electro-acoustic music studios and the new Spatialization and Interactive Research Lab at the University of
Huddersfield. Mark lectures in interactive performance, interface design and composition. Composition credits include Tricorder, a work
for two quarter-tone recorders and live MSP, commissioned by Ensemble QTR. Commissions for interactive instruments include the
LiteHarp for the London Science Museum and A Passage To India, an interactive sound sculpture commissioned by Wakefield City Art
Gallery. CD releases include Route (2001), the complete soundtrack, on MPS, and Ghosts (2000) on Sonic Art from Aberdeen, Glasgow,
Huddersfield and Newcastle, also on the MPS label. Mark is currently working on an interactive hydro-acoustic installation.
Julie Wilson-Bokowiec (vocalist/performer, video and computer graphics)
Julie has created new works in opera/music theatre, contemporary dance and theatre, including Salome (Hammersmith Odeon, Harvey
Goldsmith/Enid production), Suspended Sentences (ICA & touring), Figure Three (ICA) for Julia Bardsley, Dorian Grey (LBT/Opera
North), Alice (LBT) and a variety of large-scale site-specific and Body Art works. As a performer and collaborator Julie has worked with
such luminaries as Lindsay Kemp, Genesis P-Orridge and Psychic TV, and the notorious Austrian artist Hermann Nitsch. Julie and Mark
began creating work with interactive technologies in 1995, developing the first generation of the Bodycoder System in 1996.
The Suicided Voice

(for performer/vocalist, the Bodycoder System, live MSP, video streaming & computer graphics)
The Suicided Voice is the second piece in the Vox Circuit Trilogy, a series of interactive vocal works completed in 2007. In this piece the
acoustic voice of the performer is "suicided" and given up to digital processing and physical re-embodiment. Dialogues are created
between acoustic and digital voices. Gender-specific registers are willfully subverted and fractured. Extended vocal techniques make
available unusual acoustic resonances that generate rich processing textures and spiral into new acoustic and physical trajectories,
traversing culturally specific boundaries and crossing from the human into the virtual, from the real into the mythical. The piece is fully
scored; no pre-recorded soundfiles are used and there is no sound manipulation external to the performer's control.
In The Suicided Voice the sensor interface of the Bodycoder System is located on the upper part of the torso. Movement data is mapped to
live processing and manipulation of sound and images. The Bodycoder also provides the performer with real-time access to processing
parameters and patches within the MSP environment. All vocalisations, decisive navigation of the MSP environment and Kinaesonic
expressivity are selected, initiated and manipulated by the performer. The primary expressive functionality of the Bodycoder System is
Kinaesonic. The term Kinaesonic is derived from two words: Kinaesthetic, meaning the movement principles of the body,
and Sonic, meaning sound. In terms of interactive technology, the term Kinaesonic refers to the one-to-one mapping of sonic effects to
bodily movements. In our practice this is usually executed in real time. The Suicided Voice was created in residency at the Banff Centre,
Canada and completed in the electro-acoustic music facilities of the University of Huddersfield.
Etch
Mark A. Bokowiec, University of Huddersfield, School of Music & Humanities, Queensgate, Huddersfield HD1 3DH, 00 44 (0) 1484 472004, m.a.bokowiec@hud.ac.uk
Julie Wilson-Bokowiec, EDT, www.bodycoder.com, 00 44 (1) 484 513158, juliebokowiec@yahoo.com
Mark Bokowiec (Composer, Electronics & Software Designer)

Mark is the manager of the electro-acoustic music studios and the new Spatialization and Interactive Research Lab at the University of
Huddersfield. Mark lectures in interactive performance, interface design and composition. Composition credits include Tricorder, a work
for two quarter-tone recorders and live MSP, commissioned by Ensemble QTR. Commissions for interactive instruments include the
LiteHarp for the London Science Museum and A Passage To India, an interactive sound sculpture commissioned by Wakefield City Art
Gallery. CD releases include Route (2001), the complete soundtrack, on MPS, and Ghosts (2000) on Sonic Art from Aberdeen, Glasgow,
Huddersfield and Newcastle, also on the MPS label. Mark is currently working on an interactive hydro-acoustic installation.
Julie Wilson-Bokowiec (vocalist/performer, video and computer graphics)
Julie has created new works in opera/music theatre, contemporary dance and theatre, including Salome (Hammersmith Odeon, Harvey
Goldsmith/Enid production), Suspended Sentences (ICA & touring), Figure Three (ICA) for Julia Bardsley, The Red Room (Canal Café
Theatre), nominated for the Whitbread London Fringe Theatre Award, Dorian Grey (LBT/Opera North), Alice (LBT) and a variety of
large-scale site-specific and Body Art works. As a performer and collaborator Julie has worked with such luminaries as Lindsay Kemp,
Genesis P-Orridge and Psychic TV, and the notorious Austrian artist Hermann Nitsch. She guest lectures in digital performance at a number
of university centres and, together with Mark, regularly publishes articles on interactive performance practice.
Julie and Mark began creating work with interactive technologies in 1995 and in 1996 developed the first generation of the Bodycoder
System, an on-the-body sensor interface that uses radio to transmit data. They have created and performed work with the Bodycoder System
at various events and venues across Europe, the US and Canada, and at artist gatherings including ISEA and ICMC. Major works include
Spiral Fiction (2002), commissioned by Digital Summer (cultural programme of the Commonwealth Games, Manchester); Cyborg
Dreaming (2000/1), commissioned by the Science Museum, London; Zeitgeist at the KlangArt Festival; and Lifting Bodies (1999) at the
Trafo, Budapest, as featured artists at the Hungarian Computer Music Foundation Festival NEW WAVES, supported by the British Council.
Etch
(for performer/vocalist, the Bodycoder System, live MSP & computer graphics)
Etch is the third work in the Vox Circuit Trilogy (2007). In Etch, extended vocal techniques, Yakut and bel canto singing, are coupled
with live interactive sound processing and manipulation. Etch calls forth fauna, building soundscapes of glitch infestations, howler tones,
clustering sonic-amphibians and swirling flocks of synthetic granular flyers. All sounds are derived from the live acoustic voice of the
performer. No pre-recorded soundfiles are used in this piece and there is no sound manipulation external to the performer's control. The
ability to initiate, embody and manipulate both the acoustic sound and multiple layers of processed sound, manipulated simultaneously on
the limbs, requires a unique kind of perceptual, physical and aural precision. This is particularly evident at moments when the performer's
source vocal articulations, unheard in the diffused soundscape, enter as seemingly phantom sound cells, pitch-changed, fractured
and heavily processed. In such instances the sung score and the diffused, physically manipulated soundscape seem to separate, and the
performer is seen working in counterpoint, articulating an unheard score. Etch is punctuated by such separations and correlations, by choric
expansions, intricate micro-constructions and moments when the acoustic voice of the performer soars over and through the soundscape.
Although the Bodycoder interface configuration for Etch is similar to that of The Suicided Voice (located on the upper torso), the
functional protocols and qualities of physical expressivity are completely different. Interface flexibility is a key feature of the Bodycoder
System and allows for the development of interactive works unrestrained by interface limitations or fixed protocols. The flexibility of the
interface does, however, present a number of challenges for the performer, who must be able to adapt to new protocols and adjust and temper
her physical expressivity to the requirements of each piece.
The visual content of both Etch and The Suicided Voice was created in a variety of 2D and 3D packages using original photographic and
video material. Images are processed and manipulated using the same interactive protocols that govern sound manipulation. Content and
processing are mapped to the physical gestures of the performer. As the performer conjures extraordinary voices out of the digital realm, so
she weaves a multi-layered visual environment combining sound, gesture and image to form a powerful ‘linguistic intent’.
Etch was created in residency at the Confederation Centre of the Arts on Prince Edward Island, Canada, in June 2007.
Silent Movies: an improvisational sound / image performance
Thomas Ciufo
Smith College
Seelye Hall, Room B1
Northampton, MA 01063 USA
413-585-3435
tciufo@smith.edu
Thomas Ciufo is an improviser, sound/media artist and researcher working primarily in the areas of electroacoustic
improvisational performance and hybrid instrument/interactive systems design, and is currently serving as artist-in-residence
in Arts and Technology at Smith College. Recent and ongoing sound works include three meditations, for
prepared piano and computer; the series sonic improvisations #N; and eighth nerve, an improvisational piece for prepared
electric guitar and computer. Recent performances include off-ICMC in Barcelona, Visiones Sonoras in Mexico City, the
SPARK festival in Minneapolis, the International Society for Improvised Music conference in Ann Arbor, and the Enaction
in Arts conference in Grenoble.
Silent Movies: an improvisational sound / image performance
Silent Movies is an attempt to explore and confront some of the possible relationships / interdependencies between visual and
sonic perception. In collaboration with a variety of moving image artists, this performance piece complicates visual
engagement through performed / improvised sound. In a sense, Silent Movies plays with the live soundtrack idea, but from a
somewhat different vantage point. Or maybe it is an inversion: a visual accompaniment to an improvised sonic landscape?
For this performance, I will use a hybrid extended electric guitar/computer performance system, which allows me to explore
extended playing techniques and sonic transformations provided by sensor-controlled interactive digital signal processing.
For tonight's performance, the moving image composition is by Mark Domino (fieldform.com).
For more information, please refer to online documentation:

Guitar performance system:
http://ciufo.org/eighth_nerve_guitar.html
Performance documentation:
http://ciufo.org/silent_movies.html
NIME Performance - The Color of Waiting
Alison Rootberg
Artistic Director, Kinesthetech Sense, 5009 Woodman Ave #303, Sherman Oaks, CA 91423, 847-209-8116
arootberg@gmail.com
Margaret Schedel
Assistant Professor, Stony Brook University, PO Box 1137, Sound Beach, NY 11789, 415-246-1096
gem@schedel.net
Kinesthetech Sense was founded by Alison Rootberg and Margaret Schedel in 2006
with the intent to collaborate with visual artists, dancers, and musicians, creating
ferociously interactive experiences for audiences throughout the world. Rootberg, the
Vice President of Programming for the Dance Resource Center, focuses on
incorporating dance with video while Schedel, an assistant professor of music at Stony
Brook University, combines audio with interactive technologies. Oskar Fischinger once
said that, "everything in the world has its own spirit which can be released by setting it
in motion." Together Rootberg and Schedel create systems which are set in motion by
artistic input, facilitating interplay between computers and humans. Kinesthetech Sense
has presented its work throughout the US, Canada, Denmark, Germany, Italy
and Mexico. For more info, please go to: www.ksense.org
Developed in Amsterdam at STEIM, The Color of Waiting uses animation, movement
and video to portray themes of expectation. This collaboration (between animator Nick
Fox-Gieg, choreographer/dancer Alison Rootberg, composer/programmer Margaret
Schedel and set designer Abra Brayman) deals with the anticipation of events by
understanding the way time unfolds. The performers shift between frustration and
acceptance as they portray the emotions evoked when waiting for something or
someone. The Color of Waiting is an experience and a mood, an abstraction depicting
human interaction.
MoPhO – A Suite for a Mobile Phone Orchestra

Ge Wang, Stanford University, Center for Computer Research in Music and Acoustics, U.S.A., ge@ccrma.stanford.edu
Georg Essl, TU Berlin, Deutsche Telekom Laboratories, Germany, georg.essl@telekom.de
Henri Penttinen, Helsinki University of Technology, Department of Signal Processing and Acoustics, Finland, henri.penttinen@hut.fi
Ge Wang received his B.S. in Computer Science from Duke University in 2000 and is completing his PhD in Computer Science at Princeton
University (advisor Perry Cook; expected 2008). He is currently an assistant professor at Stanford University in the Center for Computer Research in
Music and Acoustics (CCRMA). His research interests include interactive software systems (of all sizes) for computer music, programming
languages, sound synthesis and analysis, music information retrieval, new performance ensembles (e.g., laptop orchestra) and paradigms
(e.g., live coding), visualization, interfaces for human-computer interaction, interactive audio over networks, and methodologies for
education at the intersection of computer science and music. Ge is the chief architect of the ChucK audio programming language and the
Audicle environment. He was a founding developer and co-director of the Princeton Laptop Orchestra (PLOrk), the founder and director of
the Stanford Laptop Orchestra (SLOrk), and a co-creator of the TAPESTREA sound design environment. Ge composes and performs via
various electro-acoustic and computer-mediated means, including with PLOrk/SLOrk, with Perry as a live coding duo, and with Princeton
graduate student and comrade Rebecca Fiebrink in a duo exploring new performance paradigms, cool audio software, and great food.
Georg Essl is currently Senior Research Scientist at Deutsche Telekom Laboratories at TU-Berlin, Germany. He works on mobile
interaction, new interfaces for musical expression and sound synthesis algorithms that are abstract mathematical or physical models. After
he received his Ph.D. in Computer Science at Princeton University under the supervision of Perry Cook he served on the faculty of the
University of Florida and worked at the MIT Media Lab Europe in Dublin before joining T-Labs.
Henri Penttinen was born in Espoo, Finland, in 1975. He completed his M.Sc. and PhD (Dr. Tech.) degrees in Electrical Engineering at the
Helsinki University of Technology (TKK) in 2002 and 2006, respectively. He carried out his studies, and teaches digital signal
processors and audio processing, at the Department of Signal Processing and Acoustics (known until 2007 as the Laboratory of Acoustics and
Signal Processing) at TKK. Dr. Penttinen was a visiting scholar at the Center for Computer Research in Music and Acoustics (CCRMA),
Stanford University, during 2007 and 2008. His main research interests are sound synthesis, signal processing algorithms, musical
acoustics and real-time audio applications in mobile environments. He is one of the co-founders and directors, with Georg Essl and Ge Wang,
of the Mobile Phone Orchestra of CCRMA (MoPhO). He is also the co-inventor, with Jaakko Prättälä, of the electro-acoustic bottle
(eBottle). His electro-acoustic pieces have been performed around Finland, in the USA, and Cuba.
Additional Composer Biography: Jeffrey Cooper is a musician/producer from Bryan, Texas. Having worked as a programmer and DJ for
a number of years, he is currently finishing a Master's degree in Music, Science, and Technology at Stanford University/CCRMA. He
co-composes music for mobile phones with the honorable Henri Penttinen.
The Mobile Phone Orchestra is a new repertoire-based ensemble using mobile phones as the primary musical instrument.
The MoPhO Suite contains a selection of recent compositions that highlight different aspects of what it means to compose for and perform
with such an instrument in an ensemble setting. Brief program note: The Mobile Phone Orchestra of CCRMA (MoPhO) presents an
ensemble suite featuring music performed on mobile phones. Far beyond ring-tones, these interactive musical works take advantage of the
unique technological capabilities of today's hardware, transforming phone keypads, built-in accelerometers and built-in microphones into
powerful yet mobile chamber meta-instruments. The suite consists of a selection of representative pieces:
***Drone In/Drone Out (Ge Wang): human players, mobile phones, FM timbres, accelerometers.
***TamaG (Georg Essl): TamaG is a piece that explores the boundary of projecting the human onto mobile devices while at the same time
displaying the fact that they are deeply mechanical and artificial. It explores the question of how much control we have in the interaction
with these devices, or whether the device itself at times controls us. The piece works with the tension between these positions and crosses the
desirable and the alarming, the human voice and mechanical noise. The alarming effect has a social quality and spreads between the performers.
The sounding algorithm is the non-linear circle map, which is used in easier-to-control and hard-to-control regimes to evoke the effects of
control and desirability on the one hand, and the loss of control and mechanistic function on the other.
***The Phones and Fury (Jeff Cooper and Henri Penttinen): how much damage can a single player do with 10 mobile phones? Features
loops, controllable playback speed and solo instruments.
***Chatter (Ge Wang): the audience is placed in the middle of a web of conversations...
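TamaG's note names the non-linear circle map as its sounding algorithm. A minimal sketch of the standard (Arnold) circle map follows; the phase-to-frequency mapping, parameter values and function names are assumptions for illustration, not Essl's implementation. Roughly, for coupling k < 1 the map is invertible and orbits are periodic (mode-locked) or quasi-periodic, a plausible reading of the "easier-to-control" regime, while for k > 1 orbits can become chaotic.

```python
import math


def circle_map(theta0: float, omega: float, k: float, steps: int) -> list[float]:
    """Iterate the standard circle map and return the phases theta_1 .. theta_steps.

    theta_{n+1} = theta_n + omega - (k / (2*pi)) * sin(2*pi*theta_n)   (mod 1)
    """
    phases = []
    theta = theta0 % 1.0
    for _ in range(steps):
        theta = (theta + omega - (k / (2 * math.pi)) * math.sin(2 * math.pi * theta)) % 1.0
        phases.append(theta)
    return phases


def phases_to_hz(phases: list[float], base_hz: float = 220.0, octaves: float = 2.0) -> list[float]:
    """Hypothetical sonification: map each phase in [0, 1) onto a pitch range of `octaves` above base_hz."""
    return [base_hz * 2.0 ** (octaves * p) for p in phases]


if __name__ == "__main__":
    tame = circle_map(0.1, 0.3, 0.5, 8)  # k < 1: regular behaviour
    wild = circle_map(0.1, 0.3, 3.0, 8)  # k > 1: orbits may become chaotic
    print([round(f, 1) for f in phases_to_hz(tame)])
    print([round(f, 1) for f in phases_to_hz(wild)])
```

With k = 0 the map reduces to a pure rotation by omega, which is a convenient sanity check before exploring the non-linear regimes.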
Club Performances

Traces/Huellas
for flute and electronics
Jane Rigler
25 Monroe Street
Brooklyn, NY 11238 USA
1.917.826.9608
Jane Rigler, flutist, composer, educator and curator is known for her innovations in new flute performance, techniques and unique musical
vocabulary. She is a featured performer in contemporary music festivals throughout the U.S. and Europe as a soloist as well as within
chamber ensembles (Ensemble Plural, Either/Or, Ne(x)tworks, Ensemble Sospeso, Anthony Braxton 12tet, etc.). Besides premiering works
written especially for her, Jane's compositions range from simple solo acoustic pieces inspired by language to complex interactive electronic
works that pay homage to painting, poetry and dance. After receiving a B.M. (Northwestern University) and then pursuing flute studies in
various parts of Europe and North America, she gained her M.A. and Ph.D. (UC San Diego) completing The Vocalization of the Flute, a
book demonstrating both new and ancient methods of singing-while-playing the flute. Her expertise has led to performances in
contemporary operas, experimental theater and dance events as well as other interactive electronic festivals. Her compositions are sought
after by other flutists and have been performed in South Korea, Australia, France, Spain, and in concert halls and universities throughout
the U.S.
After living in Spain for 9 years, Jane now resides in Brooklyn, NY, where she has organized events such as Relay~NYC! (held at MoMA)
and the Spontaneous Music Festival, and collaborated with the Conflux Festival in 2007. She has received several Brooklyn Arts Council
grants for her community activities and a Global Connections grant to perform in Munich this year, and has been awarded several artist
residencies for her interactive electronic works from Harvestworks Studios, Art Omi and RPI's Create @ iEar Studios.
Traces/Huellas, a quadraphonic work for flute and computer, is inspired by the ancient storytelling tradition where a bard (the flutist) uses
the voice, motion and gesture to convey characters within a story in order to orally pass on the teachings of a culture. Designed using the
interactive computer program Max/MSP, this work incorporates a tiny triggering device strategically placed on the flute so the performer
can control the timing of the work, the sound processing and the distribution of sound through the space in real-time. The spatialization of
sound provides room for the sonic textures to move and interact with each other, offering clues to the characters and situations within the
story. Although technology is essential to the music and story, its concealment attempts to give the appearance of a free-moving acoustic
instrumentalist-as-storyteller. In this way, Jane’s flute language merges into an organic electronic one creating a compelling musical
narration. Through the poetics of a unique vocal/flute/electronic/physical language, Traces/Huellas’ musical journey re-creates the
traditional story of the hero quest, her transformation and the consequences of this journey.
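Traces/Huellas lets the performer move sound around a quadraphonic space in real time. The note does not say how the Max/MSP patch pans; one common technique, shown here purely as an assumed stand-in, is equal-power panning across four corner speakers: a cosine/sine crossfade on each axis keeps the summed power constant as the source moves. The speaker names and the (x, y) convention are hypothetical.

```python
import math


def quad_gains(x: float, y: float) -> dict[str, float]:
    """Equal-power gains for four speakers at the corners of a square.

    x: 0.0 = left, 1.0 = right; y: 0.0 = front, 1.0 = rear.
    """
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must lie in [0, 1]")
    left, right = math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)
    front, rear = math.cos(y * math.pi / 2), math.sin(y * math.pi / 2)
    # The product of two equal-power crossfades is itself equal-power: the squares sum to 1.
    return {
        "front_left": left * front,
        "front_right": right * front,
        "rear_left": left * rear,
        "rear_right": right * rear,
    }


if __name__ == "__main__":
    # Sweep a source once around the room (hypothetical control path).
    for step in range(8):
        angle = 2 * math.pi * step / 8
        x, y = 0.5 + 0.5 * math.cos(angle), 0.5 + 0.5 * math.sin(angle)
        print(step, {name: round(g, 2) for name, g in quad_gains(x, y).items()})
```

In a live setting the (x, y) pair would come from the performer's controller and the four gains would scale the four output channels.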
Drawing / Dance
Renaud Chabrier, freelance, 54 ter rue de l'Ermitage, 75 020 Paris, France, +33660740143, renaud.chabrier@m4x.org
Antonio Caporilli, Associazione Culturale Iblu, Via Mulineti 33, 16 036 Recco (GE), Italia, +393381058515, antonio_caporilli@hotmail.com
Renaud Chabrier was born in 1974. After receiving a master's degree in physics, computer sciences and cognitive sciences, he devoted
his research to the perception of movement through drawing. He has since become a dancer with the KMK street theatre company and a
children's book illustrator. He develops animation techniques and movement analysis tools for both performance and educational purposes.
Antonio Caporilli is a dancer, performer, videomaker and installation artist. His research on movement and improvisation has led him to
experiment with conventional and unconventional spaces, collaborating with people from different backgrounds, such as Giorgio Strehler,
Robert Wilson and Bill T. Jones.
"Drawing / Dance" shows both the making of a short animation movie and its interpretation as a dance solo.
This performance is based on custom software for real-time animation built from handmade dance sketches drawn on paper.
Thanks to this tool, real-time drawing can be used in a choreographic way, interacting in various manners with a live dance performance.
Real-time animations can thus either lead the dancer's movements or follow them as the drawer sketches him while he dances.
A short dance solo will be created in advance, in relation to animated sequences. The dialogue between those sequences, projected on a
screen, and the solo dance will constitute the heart of the performance.
Some simple animations and dance movements will also be improvised in real time, as an opening and a conclusion to the piece. Those
improvisations offer an insight into the choreographic creative process.
A public rehearsal for this performance, based on drawing and dance improvisations, could also be organised as an installation (duration 3
hours).
[Pictured: The scene, with video images generated from the laptop; Example of sketches]
RADIO WONDERLAND
Joshua Fried
Independent Artist
277 N. 7th Street, Apt. 4R
Brooklyn, NY 11211 USA
+1 718-599-3414
composer@acedsl.com
Joshua Fried's unique profile spans experimental music, electronic dance music, performance, rock and pop. He has performed solo at
Lincoln Center, The Kitchen, CBGB, a Stuttgart disco, a former East Village bathhouse, a Tokyo museum, and the Royal Palace of
Holland; art rock guitar giant Fred Frith soloed on Fried's first solo disk, and Fried has produced or co-produced records by artists as
diverse as They Might Be Giants, Chaka Khan and avant-drone master David First. He is a recipient of numerous awards including two
New York Foundation for the Arts (NYFA) Fellowships, a National Endowment for The Arts (NEA) Composer's Fellowship and artist
residencies at MacDowell, Yaddo, VCCA, Djerassi and the Rockefeller Foundation's Bellagio Center on Lake Como, Italy. Fried won two
large commissions from American Composers Forum: to create live music for Douglas Dunn & Dancers, and to compose for the robotic
instruments of New York's League Of Electronic Musical Urban Robots (LEMUR). Joshua Fried is the youngest composer to appear in
Schirmer Books' American Music in the 20th Century.
RADIO WONDERLAND turns live commercial FM radio into recombinant funk. All sounds originate from an old boombox, playing
radio LIVE. All processing is live, programmed by me in MaxMSP. But I hardly touch the laptop. My controllers are a vintage Buick
steering wheel, old shoes mounted on stands, and some gizmos. You'll hear me build grooves, step by step, out of recognizable radio, and
even UN-wind my grooves back to the original radio source. I want to show that we ALL can interrupt and interrogate the endless flow. So
my transformations, taken individually, must be clear and simple--mostly framing, repeating and changing pitch--although when put
together the whole is indeed complex. My controllers are simple too: the wheel merely a knob to make things go up and down (frequency,
tempo) or play radio loops like a turntable, the shoes merely pads to hit softer or louder. The surreality of these ordinary objects
underscores the absurd disconnect between digital controller and sound, as well as the congenial nature of the aural transformations
themselves. So, too, my riffs must be vernacular and not elite. (We need the funk.)
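The steering wheel is described as moving frequency and tempo up and down and as playing radio loops "like a turntable". If that turntable-style control is plain varispeed playback (an assumption; the MaxMSP patch itself is not described), pitch and tempo are coupled: playing at rate r scales tempo by r and shifts pitch by 12 * log2(r) semitones. The helper names below are hypothetical.

```python
import math


def semitone_shift(rate: float) -> float:
    """Pitch shift, in semitones, caused by playing audio back at `rate` times normal speed."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 12.0 * math.log2(rate)


def rate_for_semitones(semitones: float) -> float:
    """Playback rate needed for a given pitch shift (inverse of semitone_shift)."""
    return 2.0 ** (semitones / 12.0)


def loop_seconds(original_seconds: float, rate: float) -> float:
    """Duration of a loop after varispeed: faster playback makes the loop shorter."""
    return original_seconds / rate


if __name__ == "__main__":
    for rate in (0.5, 0.75, 1.0, 1.5, 2.0):
        print(f"rate {rate:4.2f}: {semitone_shift(rate):+6.2f} semitones, "
              f"2 s loop -> {loop_seconds(2.0, rate):.2f} s")
```

Changing tempo without changing pitch (or vice versa) would instead require time-stretching or pitch-shifting, for example with granular or phase-vocoder techniques.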
Il suono incausato, improvise action for suspended
clarinet, clarinettist and electronics

Silvia Lanzalone
silvialanzalone@crmmusic.it

Biographical information of the author
Silvia Lanzalone (Salerno), flautist and composer, studied flute with Enrico Renna, Annamaria Morini and Peter-Lukas Graf;
composition with Mauro Cardi and Guido Baggiani; and electronic music and composition with Michelangelo Lupone and Giorgio Nottoli.
She collaborates with CRM (Centro Ricerche Musicali, Rome) as a music assistant.
She won an edition of the International Composition Prize "Quarant'anni nel …", instituted by CEMAT (Centri Musicali Attrezzati), for the
realization of a work of musical theater for children.
Her work Il suono incausato, improvise action for suspended clarinet, clarinettist and electronics, won the First Prize ex aequo of the
International Composition Prize "Franco Evangelisti" and is published by Suvini Zerboni. Recordings of some of her compositions are
published on CD by Ars Publica.
She has published analytical articles in specialist periodicals, including Organised Sound (International Journal of Music and
Technology, Cambridge University Press) and Syrinx (AIF).
Her compositions, performed in Italy and abroad, are oriented towards experimentation and research into new expressive and linguistic
solutions, and are realized with information technologies that permit the processing of sound in real time. In recent years she has
been principally interested in gestural possibilities, improvisation and ad lib creation with the computer, often integrated with
interactive installations.
Biographical information of the performer
Massimo Munari (Rome) graduated in clarinet at the A. Casella Conservatory, L'Aquila, studying with C. Taddei and Ivo Meccoli.
He studied composition with M. Gabrieli, R. Santoboni and David Macculi and has followed specialization courses at the Accademia
Musicale Chigiana of Siena with Franco Donatoni and at the Scuola di Musica di Fiesole with Giacomo Manzoni. He studied electronic
music with Riccardo Bianchini and Giorgio Nottoli at the Santa Cecilia Conservatory of Rome. He has performed as a soloist, in chamber
ensembles and in orchestras. At present he collaborates with various chamber ensembles, such as the Mozart ensemble, the Nabla ensemble,
the ensemble Cecilia elettrica, the Domani Musica ensemble and the ensemble Algoritmo. He has performed in various Italian
festivals (Festival incontri musicali nel Lazio, Festival delle Rocche, Domani Musica festival, Nuova Consonanza, Musica
Verticale, Nuove forme sonore, Autunno musicale (University of Caserta), Stagione concertistica dell'Accademia Musicale
Pescarese, Campus Internazionale di Musica di Latina). He has performed as a soloist with the Orchestra Sinfonica of Pescara. His
performances have been broadcast by Radio Vaticana, Radio Tre and Rai Trade. He won a diploma of special mention at the P. Barsacchi
composition competition of Viareggio, and first prize at the S. Ciani competition of Siena (jury including Luciano
Berio).
Description of Piece
Il suono incausato, improvise action for suspended clarinet, clarinettist and electronics,
is a piece which investigates the relationship between the performer and his/her
instrument in a way that goes beyond the usual production of sound. I was prompted by
certain considerations about the principle of causality to try out a system that would release
the clarinet from the traditional excitation of the air column, produced by the clarinettist by
means of the reed. In this piece the uncaused sound is not the result of the system of
vibration (the clarinet) but the cause, the act that allows us to explore its acoustics.
The uncaused is, according to medieval Aristotelianism, the "first unmoved mover", the pure act, the first cause of movement which of
necessity gave rise to the world and its causal sequences. The uncaused sound is also, in Indian culture, "Anāhata", the fourth chakra, the
abode of primordial sound.
Feed Forward Cinema

Luka Dekleva (video manipulations)
Adamičeva 15, 1000 Ljubljana
++386 41 262 428
luka.dekleva@siol.net
Luka Prinčič (data analysis and granular synthesis)
Rakitna
++386 40 512 603
nova@viator.si
Miha Ciglar (feedback TV interfacing)
Sp. Slemen 40 L, 2352 Selnica ob Dravi
++386 40 512 603
miha.ciglar1@guest.arnes.si
Mag. art. Luka Dekleva finished his studies in fine art photography at FAMU Prague in 2000. He works as a
freelance photographer, VJ performer and multimedia artist. Recently he has begun to explore the relation of image to
sound through interactive installations and performances. Luka Prinčič works in the field of sound and programming as a
performer and composer. He is an artist, web developer, DJ, writer, critic, reverse engineer, part-time hacker and open-source
agent. Miha Ciglar is a composer and sound artist currently studying at the University of Music and Dramatic Arts
in Graz, Austria. His principal concern is the problem of absolute awareness of sonic perception, which
is directly connected to the question of the existential legitimacy of sound art.
FeedForward Cinema
Audio/Video composition for three performers.
Closed information flows become networks for interaction. Three A/V instruments are integrated to produce a resonant
and harmonic experience and to give the artists a possibility to interact. They form a bond in which all layers of the composition are
affected by a single change, whether an image creates an audio signal or the other way round. The inherent glitches of the devices used
expose their fragile nature and turn them into instruments of expression. The performers choose to limit their
expressive possibilities and give up part of their “digital freedom”. By doing so, the narrow space of expression that such a
fragile instrument has becomes exposed to outside manipulation. Rather than three separate instruments, FeedForward
Cinema is one instrument for three performers.
The Control Group
Greg Corcoran
Altona, Leopardstown Rd., Dublin 18, Ireland
+353 86 1609557
thomascorcra@gmail.com
Hannah Drayson
Transtechnology Research, University of Plymouth, Portland Square, Plymouth, PL4 8AA
hannah.drayson@plymouth.ac.uk
Miguel Ortiz Perez
SARC, Queen's University, Belfast BT7 1NN, Northern Ireland
+44 (0)28 90974829
miguelortizperez@gmail.com
Koray Tahiroglu
Media Lab, UIAH Helsinki, Hämeentie 135 C, 00560 Helsinki
+90 533 712 8245
koray.tahiroglu@taik.fi
Hannah Drayson is an artist and research student with an interest in science and
technology and their integration into lived reality. She uses media ranging from web and
graphic design to visual performance, video and digital audio production.
Koray Tahiroglu is with the University of Art and Design Helsinki, Finland. He is a sound
artist, performer and researcher who grew up in Istanbul. He has been performing noise
and electronic music in collaboration with different sound artists and performers, as well as
in solo performances.
Miguel Ortiz Perez is a Mexican composer and sound artist based in Belfast. Born in
Hermosillo, Sonora, he has been involved in a vast range of activities related to modern
music and sound art. He has worked professionally as a composer, sound engineer,
lecturer, score editor, promoter and sound designer.
Thomas Greg Corcoran lives and works in Dublin, Ireland. Coming from a math and science
background, he now practices painting, drawing and computer-based art.
The members of the Control Group play melodic improvisations derived from the
physiological signals of their bodies. Communicating via gestures of their nervous systems,
the group improvises within descriptive spaces of sound and visual media.
This is an audio-visual performance based on the idea of exposing the community
of bodies outside the realm of everyday body language. The performers interact with their
own internal rhythms in a feedback loop by observing their data made audiovisual.
Each performer plays one virtual instrument with its own
distinctive sound space, sparse enough to allow unconfused communication: both the
audience and the performers know who is playing what. Components of the sounds used
include biometric bodily sound samples (such as the sound of breath and the beat of the heart),
sonified data streams, either directly rendered or mediated by processes such as physical
models, and further interpretive sounds.
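The mapping from body signals to melody is not specified in the text. As a hypothetical illustration of the "directly rendered" case (not the Control Group's documented method), a parameter-mapping sonification can clamp each sample of a physiological stream to an assumed range, scale it linearly onto a pitch range and snap the result to a scale so that it stays melodic. The 50 to 150 BPM heart-rate range, the C major pentatonic scale and the MIDI note range 60 to 84 are all assumptions.

```python
from typing import Iterable, List, Sequence

# Hypothetical choice: pitch classes of a C major pentatonic scale.
PENTATONIC: Sequence[int] = (0, 2, 4, 7, 9)


def quantize_to_scale(midi_note: float, scale: Sequence[int] = PENTATONIC) -> int:
    """Round to the nearest MIDI note, then move to the closest note in `scale`.

    The search steps outward one semitone at a time; ties go to the lower note.
    """
    base = int(round(midi_note))
    for offset in range(12):
        for candidate in (base - offset, base + offset):
            if candidate % 12 in scale:
                return candidate
    return base  # only reached if `scale` is empty


def sonify(samples: Iterable[float], lo: float = 50.0, hi: float = 150.0,
           note_lo: int = 60, note_hi: int = 84) -> List[int]:
    """Map samples (e.g. heart rate in BPM, assumed range lo..hi) to scale notes."""
    notes = []
    for x in samples:
        x = min(max(x, lo), hi)                  # clamp to the assumed range
        t = (x - lo) / (hi - lo)                 # normalise to 0..1
        raw = note_lo + t * (note_hi - note_lo)  # linear pitch mapping
        notes.append(quantize_to_scale(raw))
    return notes


# Example: a heart rate rising from 60 to 120 BPM becomes a rising melody.
print(sonify([60, 75, 90, 105, 120]))  # prints [62, 67, 69, 72, 76]
```

Snapping to a pentatonic scale is one simple way to keep arbitrary data streams consonant; a physical-model stage, as the text mentions, would replace the direct pitch mapping.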
This piece is the result of a collaboration between active researchers and artists in the area
of bio-music performance and the rendering of physiological control data as sound.
Cent Voies
Nicolas d’Alessandro
Information Technology Group of
Faculté Polytechnique de Mons
31 Boulevard Dolez
B-7000 Mons (Belgium)
(+32) (0)65 37 47 94
nicolas.dalessandro@fpms.ac.be
Nicolas d'Alessandro received an Electrical Engineering degree from the Polytechnic Faculty of Mons (Belgium) in 2004. He completed
his master's thesis in the Faculty of Music of the University of Montreal, collaborating with Prof. Caroline Traube on the measurement of
perceptual analogies between guitar and voice sounds. For the last three years he has worked in the Information Technology Group of the
Polytechnic Faculty of Mons (supervisor: Prof. Thierry Dutoit) as a PhD student. He mainly works on expressive gestural control of sound
production, and more precisely on digital instruments for voice (speech/singing) synthesis. He recently proposed a tablet-based digital
instrument devoted to the real-time manipulation of voice materials (live and synthetic), called the HandSketch, which he presented at
NIME 2007.
It is reasonable to assume that a musical instrument that is not played cannot properly evolve, or does not even exist. The
HandSketch (cf. Fig. 1) is a bi-manual digital instrument (controller and synthesizer) focused on the expressive control of incoming and
synthetic voice material. In order to be played, this instrument thus had to be “excited” by some compositional work. From a purely technical
point of view, “Cent Voies” explores the range of gestures and sounds that the HandSketch can produce. While the singing
synthesis mappings are still exploited, the piece mainly features what has been updated since the instrument's first presentation at NIME 2007, e.g. the
implementation of expressive interactions between the performer’s own voice characteristics and bi-manual gestural control. The most
relevant example is the concurrent control of the intonation of virtual locutors, which results both from the performer’s voice and from
external “sketches”.
The piece explores the boundaries and overlapping regions between narration and music. Part of the incoming material is the performer’s
voice, but its intonation is deformed in order to accentuate the musical aspects of speech prosody. Then the multiple virtual locutors initiated
by the performer become autonomous. In this way, a conversational confusion is created. This confusion is increased by the fact that the
voices progressively lose their human aspects (losing consonants, voicing, and then timbral coherence). On top of this phenomenon,
a virtual singer appears, whose phrases and voice quality variations are entirely driven by hand movements on the tablet and its
embedded force sensors. It draws the spectator of this confusing conversation into a much more intimate context.
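The text does not say how the intonation of the performer's voice is deformed. One common way to accentuate speech prosody, shown here only as a hypothetical sketch and not as the HandSketch's actual processing, is to scale each pitch (F0) value's distance from the phrase's mean pitch on a semitone scale, so the contour keeps its shape but covers a wider range. The function name `exaggerate_f0` and its `gain` parameter are illustrative, and the F0 track is assumed to come from a separate pitch tracker.

```python
import math
from typing import List, Optional


def exaggerate_f0(f0_hz: List[Optional[float]], gain: float = 2.0) -> List[Optional[float]]:
    """Widen an F0 contour around its mean pitch (hypothetical sketch).

    Unvoiced frames are given as None and passed through unchanged. Deviations
    are scaled in semitones (log-frequency), so gain=2.0 doubles every interval
    measured from the mean while preserving the rise/fall shape of the contour.
    """
    voiced = [f for f in f0_hz if f is not None and f > 0]
    if not voiced:
        return list(f0_hz)
    # Geometric mean = the arithmetic mean on a log-frequency scale.
    ref = math.exp(sum(math.log(f) for f in voiced) / len(voiced))
    out: List[Optional[float]] = []
    for f in f0_hz:
        if f is None or f <= 0:
            out.append(f)  # unvoiced frame: nothing to reshape
            continue
        semitones = 12.0 * math.log2(f / ref)  # signed distance from the mean
        out.append(ref * 2.0 ** (gain * semitones / 12.0))
    return out


# Example: a 100 Hz / 200 Hz contour (mean about 141 Hz) widened to about 71 Hz / 283 Hz.
print(exaggerate_f0([100.0, None, 200.0]))
```

Working in semitones rather than hertz keeps the exaggeration perceptually symmetric: a rise and a fall of the same musical interval are stretched by the same amount.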
This piece is also intended as a contextual work. Indeed, talking about convergence between narrative and musical materials leads us to
the typical context of singing, which is actually an “acoustic” way of seeing it. It can also meet some needs in the world of theatre,
where the “music” of sentences is really meaningful for actors and authors, but this work is often based on traditional
references. It is precisely in the need to find new paths between these two modes of expression (narration and music) that the design of
digital musical instruments becomes truly meaningful: a contemporary instrument answering a contemporary need, the hybridization of practices.
Fig. 1 – Illustration of typical playing position with the HandSketch.
Improvisation for hyper-flute, electric guitar and real-time processing
Cléo Palacio-Quintin
LIAM-Université de Montréal
Input Devices and Musical Interaction Laboratory
Centre for Interdisciplinary Research in Music Media and Technology
McGill University, Montreal, QC, Canada
1-514-525-3649
cleo_palacio-quintin@umontreal.ca
Sylvain Pohu
Laboratoire d'informatique, acoustique et musique (LIAM)
Faculté de Musique, Université de Montréal
Montreal, QC, Canada
1-514-725-5544
sylvain_pohu@umontreal.ca
Cléo Palacio-Quintin
Constantly seeking new means of expression and eager to create, the flutist-improviser-composer Cléo Palacio-Quintin takes part in many
premieres as well as improvisational multidisciplinary performances, and composes instrumental and electroacoustic music for various
ensembles and media works. Since 1999, she has extended these explorations into the development of a new instrument: the hyper-flute.
Interfaced to a computer and software by means of electronic sensors, the enhanced flute enables her to compose novel electroacoustic
soundscapes. She is now pursuing doctoral studies in Montreal to compose new works for the hyper-flute and for her new hyper-bass-flute.
Sylvain Pohu
Composer, improviser and guitarist, Sylvain Pohu is a founding member of the contemporary jazz ensemble [iks] and, since 2007, is also
its artistic director. As an electroacoustic composer and improviser, Sylvain Pohu has participated in numerous festivals. Another
dimension of Sylvain's work is the design and production of sound installations and vidéomusique pieces. Parallel to the aforementioned
activities, Sylvain is currently researching the role of improvisation in the compositional process and in real-time processing in conjunction
with a master's degree whose aim is to explore the expressive possibilities of improvised electroacoustic music.
Composers Cléo Palacio-Quintin and Sylvain Pohu are both skilled and experienced performer-improvisers. They have known each other
for years as colleagues at the Université de Montréal, but had never had the chance to share a stage. However, they have always shared their passion
for improvisation and electroacoustic music.
Each of them is working on a new interface to perform live electronics together with their traditional instrument (flute and electric guitar).
In developing their respective computer interfaces for improvised electroacoustic music, they are concerned with the same
issues. They both focus on expressivity and freedom while performing on an augmented instrument.
NIME-08 now gives them an opportunity to confront and merge their extended sonic worlds. No doubt, it's going to be a challenging trip
for your ears!
Improvisation for Guitar/Laptop and HandSketch

Nicolas d’Alessandro
Information Technology Group of Faculté Polytechnique de Mons
31 Boulevard Dolez
B-7000 Mons (Belgium)
(+32) (0)65 37 47 94
nicolas.dalessandro@fpms.ac.be
Sylvain Pohu
Faculty of Music of University of Montreal
200 Avenue Vincent d'Indy
H2V 2T2 Montréal (Canada)
(+1) 514 343 6427
sylvain.pohu@umontreal.ca
Nicolas d'Alessandro received an Electrical Engineering degree from the Polytechnic Faculty of Mons (Belgium) in 2004. He did his
master's thesis in the Faculty of Music of the University of Montreal, collaborating with Prof. Caroline Traube on the measurement of
perceptual analogies between guitar and voice sounds. For the last three years he has worked in the Information Technology Department of
Polytech.Mons (supervisor: Prof. Thierry Dutoit) as a PhD student. He mainly works on expressive gestural control of sound production,
and more precisely on digital instruments for voice (speech/singing) synthesis. He recently proposed (for NIME'07) a digital instrument
devoted to the real-time manipulation of singing voice contents, the HandSketch.
Composer, improviser and guitarist, Sylvain Pohu is a founding member of the contemporary jazz ensemble [iks] and, since 2007, is also
its artistic director. As an electroacoustic composer and improviser, Sylvain Pohu has participated in numerous festivals. Another
dimension of Sylvain's work is the design and production of sound installations and videomusique pieces. Parallel to the aforementioned
activities, Sylvain is currently researching the role of improvisation in the compositional process and in real-time processing in conjunction
with a master's degree whose aim is to explore the expressive possibilities of improvised electroacoustic music.
This piece presents a collaboration between two people interested in pushing forward instrumental control of sounds through improvisation.
On the one hand, the HandSketch is a fully custom (both controller and synthesizer) bi-manual digital instrument devoted to the expressive
control of incoming and synthetic voice materials (speech and singing). It stems from the wish of a PhD student (Nicolas d'Alessandro) in
signal/information processing to develop gestural interaction with narrative contents, in order to manipulate them outside the usual boundaries
and turn them into music. On the other hand, there is the wish of a guitarist (Sylvain Pohu) to extend the possibilities of his six
strings in order to improvise over a wider range of electroacoustic sounds. Each of them is working on new interfaces and
new strategies to perform and improvise live electronics. They meet in this work, starting from two completely different points.
Figure 1. Presentation of Nicolas d'Alessandro's (left) and Sylvain Pohu's (right) setups. The HandSketch (left) is made of a Wacom tablet,
FSR sensors and a custom voice processor/synthesizer. The guitar is extended with MIDI knobs, faders and pedals and live processing.
Anjuna’s Digital Raga

Ajay Kapur
California Institute of the Arts
Valencia, CA
USA
250-704-2531
akapur@calarts.edu
Ajay Kapur is the Music Technology Director at California Institute of the Arts. He received an Interdisciplinary Ph.D. in 2007 from
University of Victoria combining Computer Science, Electrical Engineering, Mechanical Engineering, Music and Psychology with a
focus on Intelligent Music and Media Technology. Ajay graduated from Princeton University in 2002 with a Bachelor of Science in
Engineering in Computer Science. He has been educated by music technology leaders including Dr. Perry R. Cook, Dr.
George Tzanetakis, and Dr. Andrew Schloss, combined with mentorship from robotic musical instrument sculptors Eric Singer and
the world-famous Trimpin. A musician at heart, trained on drum set, tabla, sitar and other percussion instruments from around the
world, Ajay strives to push the technological barrier in order to make new music.
The piece blends Indian classical knowledge with the 21st-century music scene by embracing the age of the human-computer interface. A custom-made
Electronic Sitar with wearable sensors controls modular software systems, used to bring dance and tribal groove to the next
dimension: 21st-century music for the renaissance audience member, bringing the fields of human-computer interfaces together with
custom software design for space-age sonic mosaics.
The Electronic Sitar (ESitar) is a custom-built hyperinstrument that captures performance gestures for musical analysis and real-time
musical expression. It has sensors that help deduce rhythmic strumming information, fret detection, and the tilt of the sitar's neck
in three axes. All sensor data is converted to MIDI messages and used simultaneously with the audio data from the humbucker
pickup to make new sounds and atmospheres.
The KiOm is a custom built wearable instrument that converts accelerometer data to MIDI messages. It is used on the performer’s
head during performance to aid in sound design.
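Both the ESitar and the KiOm convert sensor readings into MIDI messages, but the sensor resolution and controller assignments are not given. A minimal sketch, assuming a 10-bit analog-to-digital converter (readings 0 to 1023) and a hypothetical choice of controller numbers 20 to 22 for the three axes, scales each reading to MIDI's 7-bit data range and packs it into a standard three-byte Control Change message (status byte 0xB0 plus the channel, then controller number, then value).

```python
from typing import Dict, List

# Hypothetical controller assignment: one MIDI CC number per accelerometer axis.
AXIS_TO_CC: Dict[str, int] = {"x": 20, "y": 21, "z": 22}


def adc_to_7bit(raw: int, adc_max: int = 1023) -> int:
    """Scale a raw ADC reading (assumed 10-bit, 0..adc_max) to the MIDI data range 0..127."""
    raw = min(max(raw, 0), adc_max)
    return (raw * 127) // adc_max


def control_change(channel: int, controller: int, value: int) -> bytes:
    """Build a 3-byte MIDI Control Change message: status 0xB0 | channel, controller, value."""
    if not (0 <= channel <= 15 and 0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("MIDI field out of range")
    return bytes([0xB0 | channel, controller, value])


def accel_to_midi(reading: Dict[str, int], channel: int = 0) -> List[bytes]:
    """Turn one accelerometer sample {'x': .., 'y': .., 'z': ..} into three CC messages."""
    return [control_change(channel, AXIS_TO_CC[axis], adc_to_7bit(reading[axis]))
            for axis in ("x", "y", "z")]


# Example: one head-tilt reading, ready to be written to a MIDI output port.
for msg in accel_to_midi({"x": 512, "y": 300, "z": 1023}):
    print(msg.hex())
```

Integer division keeps the scaling exact at both ends (0 maps to 0 and the ADC maximum to 127); a real wearable would likely also smooth the readings before sending them.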
Redshift
Jonathan Pak
124A Johnston St
Fitzroy, VIC 3065 Australia
+613 94171775
jon@pak.id.au
Jonathan Pak is an Australian-based electronic musician, new media artist and technology innovator currently interested in humanising
technology to create interactive art through intuitive hardware and software design. In this endeavour he created the Light Matrix interface
subsequently presented at NIME ’06. Jon performs his music regularly in both solo and collaborative work including performances at the
Melbourne International Arts Festival (2007), Melbourne Electronic Music Festival: Electundra (2004/2005) and a collaborative work with
contemporary music group Re-sound entitled Ungrounded (2005). Other works include installation pieces such as Sonic Feast (2007),
Virtual Vandalism (2007) and Comic Effect (2008).
Redshift is an exploration of sound, light and movement within the intimate space beneath the hands of a solitary performer. The
expressive potential of the human hand is harnessed through the reflection of computer-controlled light patterns that are translated into
music.
The centrepiece of this work is the Light Matrix interface: a device consisting of an array of high intensity red LEDs in a bi-directional
configuration acting as both photosources and photosensors. Light intensity measurements are used to manipulate a range of powerful and,
by nature, parameter-heavy software synthesisers and effects, enabling the performer to sculpt sound timbre with their hands. The changing
light patterns are driven by a sequencer and also derived from the attributes of the resulting sound. This in turn alters the amount of light
reflected from the performer’s hands resulting in a complex and potentially chaotic interplay. In this way the audience can engage with
some of the elements of electronic music such as patterns and automated parameters that traditionally remain hidden during performance.
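The text says the LEDs act as both light sources and light sensors but does not describe the measurement. A widely used technique for LED sensing, assumed here, reverse-biases the LED to charge its junction capacitance and times how long the photocurrent takes to discharge it: more reflected light gives a faster discharge. The sketch below converts such discharge times into normalised 0 to 1 light levels using per-LED calibration points and maps a level onto a synthesiser cutoff; the class and function names, calibration values and 200 to 8000 Hz range are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LedCalibration:
    dark_time: float    # discharge time with no hand above the LED (slowest)
    bright_time: float  # discharge time with a hand reflecting the most light (fastest)


def intensity(discharge_time: float, cal: LedCalibration) -> float:
    """Convert a discharge time into a 0..1 light level.

    Photocurrent grows with incident light, so the discharge rate (1/time) is
    used as the raw light measure and normalised between the two calibration points.
    """
    rate = 1.0 / max(discharge_time, 1e-9)
    dark_rate, bright_rate = 1.0 / cal.dark_time, 1.0 / cal.bright_time
    level = (rate - dark_rate) / (bright_rate - dark_rate)
    return min(max(level, 0.0), 1.0)


def to_cutoff_hz(level: float, lo: float = 200.0, hi: float = 8000.0) -> float:
    """Map a 0..1 level exponentially onto a filter cutoff (hypothetical synth parameter)."""
    return lo * (hi / lo) ** level


# Example: one LED calibrated at 1000 us (dark) and 100 us (hand close) reads 250 us.
cal = LedCalibration(dark_time=1000.0, bright_time=100.0)
level = intensity(250.0, cal)
print(round(level, 3), round(to_cutoff_hz(level)))
```

Per-LED calibration matters because individual LEDs differ in capacitance and sensitivity; the exponential cutoff mapping is a common choice because pitch and brightness are perceived roughly logarithmically.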
Figure 1: The Light Matrix in operation
Installations

Habitat
Olly Farshi
Goldsmiths College
Electronic Music Studios
Department of Music, Goldsmiths,
University of London, New Cross,
London SE14 6NW
+44 (0) 7725 413 088
olly@ollyandjeremy.com
Olly Farshi is a British Sound Artist and Composer, currently residing in Jyväskylä, Finland. He is currently completing his debut
album for Anglo-Canadian record/media-label CocoSolidCiti, due for release in late 2008. As a Sound Artist, Farshi's explorations
concern playful interaction, digital communication and, in an ongoing piece of research entitled iRedux, notions of intangible
property and connected life-styles. Farshi has been involved with a variety of festivals, venues, exhibitions, conferences and
collaborative projects, including: Resfest Austria, Foldback, Mediaterra Greece, FutureSonic, Commonwealth Film Festival,
Defunktion.net, Shunt Gallery London, Salone Internazionale del Mobile, CocoSolidCiti, KunstForum.
Habitat is an ambient sound installation designed to be exhibited within a public city-space. The installation generates a sonic
landscape – a generative wildlife habitat – based upon how populated the installation space is. The objective of Habitat is to
encourage those who experience it to consider the installation space and, by association, the city-space differently; considering the
impact of commercialisation, gentrification and industrialisation on these public spaces which, at one point in time, were not so
densely populated.
Habitat displaces the default sonic landscape – pedestrians traversing the city-space, commercial ambience etc. - overlaying a
generative ambience constructed entirely out of wildlife and field-recordings. As the population shifts within the physical
installation space – transient individuals enter and leave – the sonic landscape generated by Habitat reflects these changes: various
animals and exotic wildlife emerge, flock and interact with each other, resulting in a vibrant and enchanting ambience.
Using Bluetooth, the Habitat software polls the space every 30 seconds, counting the number of mobile devices carried by
individuals in the installation space. The more devices the system counts, the fewer wildlife sounds will be heard, implying that to
encourage wildlife to return to the space, the individuals with those devices must eschew their technology – switch off their mobile
devices – to re-enliven the Habitat.
As fewer devices are counted, more animals and insects flock to the sonic landscape. When the installation space is empty, in
terms of Bluetooth devices counted, the wildlife habitat is vibrant and all possible sounds are playing (see Fig. 1 and Fig. 2 for examples of
empty installation spaces). Thus users with Bluetooth mobile devices can leave the space, knowing that in doing so they are having
a positive impact on the sonic landscape, or, as a more compelling course of action, encourage others within the space to switch
off their mobile devices and instigate a mass change in the installation's sonic landscape.
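The behaviour described above amounts to an inverse mapping from a device count, sampled every 30 seconds, to the number of wildlife layers that play. The exact mapping is not given; this sketch assumes a linear rule in which each counted device (or each group of `devices_per_layer` devices) silences one layer, so an empty space plays every layer. Bluetooth discovery and audio playback are left as injected callables because their APIs depend on the platform; the layer names are made up.

```python
import time
from typing import Callable, List

POLL_INTERVAL_S = 30  # polling period given in the description


def active_layers(device_count: int, layers: List[str], devices_per_layer: int = 1) -> List[str]:
    """Return the wildlife layers that should play for a given device count.

    Hypothetical linear rule: every `devices_per_layer` counted devices silence one
    layer, taken from the end of the list. Zero devices -> every layer plays.
    """
    silenced = device_count // devices_per_layer
    keep = max(len(layers) - silenced, 0)
    return layers[:keep]


def run(scan_device_count: Callable[[], int], layers: List[str],
        set_playing: Callable[[List[str]], None], polls: int) -> None:
    """Poll the space `polls` times, updating the soundscape after each scan.

    `scan_device_count` and `set_playing` are placeholders for the platform's
    Bluetooth discovery call and the audio engine, respectively.
    """
    for i in range(polls):
        set_playing(active_layers(scan_device_count(), layers))
        if i < polls - 1:
            time.sleep(POLL_INTERVAL_S)


# Example: three phones in the space silence three of five layers.
print(active_layers(3, ["birdsong", "frogs", "crickets", "monkeys", "parrots"]))  # prints ['birdsong', 'frogs']
```

Passing the scanner and the audio engine in as functions keeps the mapping testable without Bluetooth hardware.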
The aim is to heighten awareness and consideration of the city-space that one occupies, utilises and traverses. In experiencing
Habitat, one is given the opportunity to build a new relationship with the city-space, to consider our impact as pedestrians, users
and consumers on this space and to consider its natural origins. As each individual within the space contributes to the juxtaposition
of familiar wildlife sounds alongside the city ambience, Habitat invites the flâneur to stop, observe and listen, in order to
experience the true impact that those traversing the installation space have on the sonic environment.
MIRROR OF THE MOON

Soundspace and Video installation
Jeff Talman
Assistant Professor, Visual and Media Arts
Emerson College, Boston, MA
203 W. 109th St, 1W, New York, NY 10025 USA
001 (212) 729-4430
www.jefftalman.com, jefftalman@mindspring.com
Artist Biography
International artist Jeff Talman has created installations in collaboration with the cathedral and the City of Cologne, Germany, for St James
Cathedral, Chicago, at the MIT Media Lab, The Kitchen, Eyebeam and Bitforms Gallery in New York. He completed a series of three
installations in the Bavarian Forest in May 2008. He has been recognized as ‘a pioneer of the use of resonance in artworks’ by Intute, the consortium
of British universities. His unique achievement is self-reflexive resonance, in which the ambient resonance of an installation site becomes
its sole sound source. Talman's work further investigates the nature of sound and light as primal wave/radiant forces. Recent awards include
a Guggenheim Foundation Fellowship in Sound Art (2006) and a New York Foundation for the Arts Fellowship in Computer Arts (2003).
Residencies in 2007 include the Liguria Study Center in Bogliasco, Italy; Yaddo, and the Künstlerhaus Krems.
The Sea, the Sun and the Moon (2008)
Project Description
For the 8th Annual NIME Conference and the Museo d’Arte Contemporanea Villa Croce, it seems only natural to begin with the sea, as the
sea permeates the culture and lives of the Genoese. From this it follows that it is entirely appropriate that a sense of the sea should literally
permeate the museum gallery. Rather than the cartoonish effect of merely transposing literal sea sounds to the gallery, I decided instead to
extract sound waves of the tide mapped to the gallery’s resonant frequencies, so that the gallery itself would harmoniously speak of the sea.
Sonic spectral analysis of the gallery provided a chart of the room’s resonant frequencies. I then programmed progressively shaped, digital
filters and used them to filter a recording of sounds of the tide. Only those frequencies that are resonant to the largest gallery space in the
museum were extracted for use in the installation.
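The filtering step keeps only the parts of the tide recording that coincide with the gallery's resonant frequencies. The "progressively shaped" filters used for the piece are not documented, so the following substitutes a plain technique: a bank of narrow band-pass biquads (constant 0 dB peak gain, from Robert Bristow-Johnson's Audio EQ Cookbook), one per resonance, with their outputs summed. The Q of 30, the 44.1 kHz sample rate and the mode frequencies in `ROOM_MODES_HZ` are placeholders; real values would come from the spectral analysis of the room.

```python
import math
from typing import List, Sequence, Tuple

Coeffs = Tuple[Tuple[float, float, float], Tuple[float, float]]

# Hypothetical room modes in Hz; in practice taken from the gallery's spectral analysis.
ROOM_MODES_HZ = [43.0, 57.2, 86.0, 114.5]


def bandpass_coeffs(f0: float, q: float, fs: float) -> Coeffs:
    """RBJ Audio EQ Cookbook band-pass (constant 0 dB peak gain), normalised by a0."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)  # (a1, a2)
    return b, a


def biquad(x: Sequence[float], coeffs: Coeffs) -> List[float]:
    """Direct Form I: y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    y = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def resonance_filter(x: Sequence[float], resonances_hz: Sequence[float] = ROOM_MODES_HZ,
                     q: float = 30.0, fs: float = 44100.0) -> List[float]:
    """Sum of narrow band-pass outputs, one per room resonance."""
    out = [0.0] * len(x)
    for f0 in resonances_hz:
        for i, v in enumerate(biquad(x, bandpass_coeffs(f0, q, fs))):
            out[i] += v
    return out
```

A tide recording loaded as a list of samples would be passed to `resonance_filter`; energy at the listed modes passes at unit gain, while everything between them is strongly attenuated.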
Humans do not normally pursue sound as a referential aspect of space. The sense of sound of a space remains largely intuitive and/or
subconscious, though it is a significant factor in human spatial cognition, as any blind person would know. By emphasizing the resonance of
the gallery the installation enriches the human perception of the space’s sonic and spatial reality. In MIRROR OF THE MOON this
emphasis on the characteristic sound of the space provides a phenomenological template for hearing/sensing the space, while it serves as
the plastic art material for an expressive sound work. Constructed into a 5-channel sound installation, the temporal field of the work is ever-
changing as one walks through the space. The gallery itself becomes recognizable as a tuned instrument ‘played’ by sounds of the sea.
Further, the gallery space becomes a field of compositional activity, which may be explored interactively by simply walking through the
space and pausing at different locations to witness the room, video projection and the interaction of the room modes and their nodes and
anti-nodes within the space. Everything heard, though heard differently from any location in the room, reflects normally submerged sonic
aspects of the room. Importantly, by harnessing the sound of the sea to the gallery, the installation recognizes a confluence of waveforms,
those of water, sound, light and the effects of gravity. Here the sounds of the Mediterranean meld into an environment that stands in
relation to itself, the people and their city by the sea.
Fold Loud
JooYoun Paek
Eyebeam
540 W. 21st street
New York, NY 10011
1 917 238 9448
jypaek@gmail.com
JooYoun is an artist and interaction designer born in Seoul and based in New York. She has created interactive
objects that reflect on human behavior, technology and social change. She earned a Master’s degree from the Interactive
Telecommunications Program at NYU and is currently an Artist in Residence at Eyebeam. JooYoun’s art has been displayed
by the Museum of Modern Art New York, SIGGRAPH 2007, Museum of Science Boston and Seoul Museum of Art. Her work
has also been published in BBC News, Architectural Magazine, Next Magazine and many other publications.
Fold Loud is a (de)constructing musical play interface that uses origami paper-folding techniques and ritualistic
Taoist principles to give users a sense of slow, soothing relaxation. Fold Loud interconnects ancient traditions and modern
technology by combining origami, vocal sound and interactive techniques. Unlike mainstream technology intended for fast-
paced life, Fold Loud is healing, recovering and balancing.
Playing Fold Loud involves folding origami shapes to create soothing harmonic vocal sounds. Each fold is assigned
to a different human vocal sound so that combinations of folds create harmonies. Users can fold multiple Fold Loud sheets
together to produce a chorus of voices. Open circuits made of conductive fabric are visibly stitched onto the sheets of
paper, which creates a meta-technological aesthetic. When the sheets are folded along crease lines, a circuit is closed like a
switch. Thus, the interface guides participants to use repetitive delicate hand gestures such as flipping, pushing and
creasing. Fold Loud invites users to slow down and reflect on different physical senses by crafting paper into both geometric
origami objects and harmonic music.
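Each fold closes a circuit like a switch and is assigned its own vocal sound, so the set of closed folds determines the harmony. In this hypothetical sketch (fold names and pitches are invented, not taken from the piece), one sheet's switch states arrive as a bit mask, are decoded into folds and then into pitches, and several sheets contribute the union of their pitches to the chorus.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical mapping of the folds on one sheet to voice pitches (MIDI note numbers).
FOLD_TO_PITCH: Dict[str, int] = {
    "valley_1": 48,    # C3
    "valley_2": 55,    # G3
    "mountain_1": 60,  # C4
    "mountain_2": 64,  # E4
}
FOLD_ORDER: List[str] = list(FOLD_TO_PITCH)  # bit i of a mask <-> FOLD_ORDER[i]


def closed_folds(switch_mask: int) -> List[str]:
    """Decode a bit mask of closed circuits (bit set = fold creased shut)."""
    return [name for i, name in enumerate(FOLD_ORDER) if (switch_mask >> i) & 1]


def chord(sheet_masks: Iterable[int]) -> Set[int]:
    """Union of the pitches voiced by every sheet, so folds combine into harmonies."""
    pitches: Set[int] = set()
    for mask in sheet_masks:
        pitches.update(FOLD_TO_PITCH[name] for name in closed_folds(mask))
    return pitches


# Example: two sheets, one with valley_1 folded and one with mountain_2 folded.
print(sorted(chord([0b0001, 0b1000])))  # prints [48, 64]
```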
in a thousand drops... refracted glances

Kenneth Newby
Emily Carr Institute of Art Design & Media
University College of the Fraser Valley
Vancouver, BC, Canada
knewby@eciad.ca
Aleksandra Dulic
Magic Lab, University of British Columbia
Vancouver, BC, Canada
adulic@interchange.ubc.ca
Martin Gotfrit
School for the Contemporary Arts, Simon Fraser University
Vancouver, BC, Canada
gotfrit@sfu.ca
Kenneth Newby is a media artist whose research and creative practice explores expressive applications of computer assisted media
composition, performance and diffusion. He teaches new media composition and technique. Aleksandra Dulic is a media artist and
theorist working in the area of interactive computer animation with current research underway in performative visualization of sound
and music. Martin Gotfrit’s research centres on the creation, performance and function of music and sound in many different
disciplines and contexts. He is the Director of the School for the Contemporary Arts at Simon Fraser University. The three
authors of this work have been working together
as the Computational Poetics Research Group
for the past four years.
Description of Work
in a thousand drops...refracted glances

Interactive Installation. Media Diffusion:
4-channel audio and 100-channel video.
in a thousand drops… refracted glances is an audio/visual sculpture in fragmented space and time that becomes a single
audiovisual image as one interacts with the space of the exhibition. The work presents fragments of the bodies of humans
in hybrid relations to themselves, thereby creating a sense of the fragility of experience. The work reveals a background
made of deeper perennial questions: Who am I? What is my community? Where do its boundaries exist and how
permeable might they be?
The interactive aspects of the work provide points of focus for flows of both audible and visible images. As one moves
with the work a subtle effect is exerted on how these images are animated. Characters composed of multiples emerge and
are accompanied by synchronized emergent musical gestures. The resulting audiovisual environment is one of
construction and deconstruction of bodies through processes of stitching, repetition, collage, stretching, contraction,
multiplication, and reduction. As a result of these processes new hybrid fugal bodies are born that speak to the variety
and complexity of the ecological and interpersonal balances that depend on the mutual interdependencies of the
community of agents that make up its population. Interactions with the work take the form of refracted glances both
rewarding and confounding in an ongoing process of making sense of a chaosmos — the balance between confusion and
order — the fantastic and the logical — dreamt and waking realities.
Musical Interface
A set of layered generative music processes is guided in its production by data inferred by a motion tracking
system, using blob detection to determine individual locations for tracking in relation to the space of the installation,
and optical flow sensing to determine the relative direction of the participants’ movement. The overall effect of the
interactive process is a kind of spatially dynamic orchestration, in which a particular musical process-gesture is
mapped either to a specific location or to a movement style, such as motion along the slow-fast spectrum, the near-far
spectrum, and stillness. These states are mapped onto musical parameters such as orchestration, phrase selection and
detail, as well as stochastic characteristics such as glissando speed and direction. As the same motion tracking information
is also used to guide the visual animations, the audible and visible images are strongly synchronized. The
participants, in this way, become collaborators in the unfolding audio-visual experience.
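The mapping described above reduces tracking data to a few movement states (stillness, slow or fast motion, near or far) that select musical material. The sketch below is a simplified stand-in, not the group's system: it estimates speed from the displacement of a blob centroid between frames rather than from optical flow, assumes centroids are already calibrated to metres on the floor, and measures distance from a single reference point. All thresholds and the `PHRASE_SETS` entries are hypothetical.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]  # blob centroid in metres on the floor plane (assumed calibrated)

STILL_SPEED = 0.05  # m/s; slower than this counts as stillness (assumed)
FAST_SPEED = 1.0    # m/s; faster than this counts as fast motion (assumed)
NEAR_DIST = 2.0     # m from the reference point that counts as near (assumed)


def movement_state(prev: Point, curr: Point, fps: float, ref: Point) -> Tuple[str, str]:
    """Classify one tracked participant as (speed class, distance class)."""
    speed = math.dist(prev, curr) * fps  # displacement per frame -> metres per second
    if speed < STILL_SPEED:
        speed_class = "still"
    elif speed < FAST_SPEED:
        speed_class = "slow"
    else:
        speed_class = "fast"
    distance_class = "near" if math.dist(curr, ref) < NEAR_DIST else "far"
    return speed_class, distance_class


# Hypothetical lookup from movement state to musical material.
PHRASE_SETS: Dict[Tuple[str, str], str] = {
    ("still", "near"): "sustained drones",
    ("still", "far"): "sparse bell tones",
    ("slow", "near"): "melodic fragments",
    ("slow", "far"): "soft textures",
    ("fast", "near"): "rapid glissandi",
    ("fast", "far"): "percussive clusters",
}

# Example: a visitor moving 2 cm per frame at 25 fps, 10 m from the screens.
state = movement_state((0.0, 0.0), (0.02, 0.0), fps=25.0, ref=(10.0, 0.0))
print(state, PHRASE_SETS[state])  # prints ('slow', 'far') soft textures
```

Because the same state could also drive the animation, one classifier per participant is enough to keep sound and image synchronized, as the text describes.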
Given the dynamic character of the multi-screen animation, and the flexibility of the musical production, the work moves
toward what we have been theorizing as a new form of process-based cinematic experience in which the processes
guiding the audible and visible images are braided together into a new heteroform of multiply-mediated experience.
Soundscaper
Jared Lamenzo, Mohit Santram,
Kuan Huan, Maia Marinelli
Mediated Spaces
225 East 4th St.
New York, NY 10009
917-405-4352
jared@mediatedspaces.com
The Soundscaper team has worked on many interactive projects and large-scale installations. The four met at NYU's Interactive
Telecommunications Program. Their work collectively has been covered in The New York Times, Popular Science, The Village Voice,
New York Metro, C|NET and others, and has been displayed at the Chelsea Art Museum, Sony Wonder Tech Lab, Eyebeam, 3rd Ward
(Brooklyn), Proflux (RI), VIDEOFORMES (France), China Digital Entertainment Festival, New York University, University of British
Columbia, Svevo Castle (Italy), DUMBO Arts Festival, Refusalon Gallery (San Francisco), STYLIN Festival (Miami), Schautankstelle
Gallery (Berlin), Merce Cunningham Studio, Rockefeller Center, and many others.
As John Cage demonstrated, even in silence there is sound all around us. By detaching sound from observable cues, recordings often
reveal more about our surroundings than passive hearing does, e.g., the sound of industrialized society masking the natural world, and vice versa.
The Soundscaper is a tool for making geo-tagged recordings, entered from a mobile device. With increasingly sophisticated mobile devices, the
soundscapes of cities and savannahs can be captured for later listening and manipulation. For NIME, we intend to walk around the city and
along the sea to create a Genovese soundscape using Nokia N95 phones: the sound of ships, church bells, factories, parks, buoys, etc. These
recordings can be used for sound games in which people are sent out to find or retrieve specific sounds, games of hear-and-seek, and other
applications.
The Soundscaper widget contains a “See Waypoints” section and a “New Waypoint” section. The database contains a list of “Waypoints”
entered by users. Users can enter new recordings, attached to a Waypoint, with a description of the sound. The database stores date and
time information, and a Web page lists the sounds available for listening and download. N.B. The application does not operate on all
networks in all countries. The only technical requirements are phones that run our widget and small condenser microphones. A project
description may be found at http://mediatedspaces.com/soundscaper/ and a video at http://mediatedspaces/lib/mov/soundscaperlow.mov.
figure i. Major components of the Soundscaper.
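The waypoint/recording structure described above maps naturally onto two record types. The sketch below is an in-memory stand-in for the Soundscaper database, assuming one list of recordings per waypoint; the class and field names (`WaypointStore`, `audio_url`, and so on) are hypothetical, and the actual widget and server code is not described in the text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class Recording:
    audio_url: str         # where the uploaded sound file can be fetched (hypothetical field)
    description: str       # user-entered description of the sound
    recorded_at: datetime  # date and time stored by the database


@dataclass
class Waypoint:
    name: str
    lat: float
    lon: float
    recordings: List[Recording] = field(default_factory=list)


class WaypointStore:
    """In-memory stand-in for the Soundscaper database (illustrative only)."""

    def __init__(self) -> None:
        self._waypoints: Dict[str, Waypoint] = {}

    def new_waypoint(self, name: str, lat: float, lon: float) -> Waypoint:
        wp = Waypoint(name, lat, lon)
        self._waypoints[name] = wp
        return wp

    def add_recording(self, waypoint: str, audio_url: str, description: str,
                      recorded_at: Optional[datetime] = None) -> Recording:
        # Raises KeyError if the waypoint was never created.
        rec = Recording(audio_url, description,
                        recorded_at or datetime.now(timezone.utc))
        self._waypoints[waypoint].recordings.append(rec)
        return rec

    def listing(self) -> List[str]:
        """Lines for a 'See Waypoints' style page, oldest recording first."""
        rows = [(rec.recorded_at, wp, rec)
                for wp in self._waypoints.values() for rec in wp.recordings]
        rows.sort(key=lambda row: row[0])
        return [f"{wp.name} ({wp.lat:.4f}, {wp.lon:.4f}) "
                f"{rec.recorded_at:%Y-%m-%d %H:%M} {rec.description} <{rec.audio_url}>"
                for _, wp, rec in rows]
```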
NIME 08 Performances and Installations Template

Pasquale Napolitano, University of Salerno, Science of Communication Department, Via Ponte Don Melillo, Fisciano (SA), +39 081 512 31 95, pnapolitano@unisa.it
Stefano Perna, soundBarrier_, +393384934729, stefanoperna@fastwebnet.it
Pier Giuseppe Mariconda, soundBarrier_, +393208676543, pg.mariconda@alice.it
Pasquale Napolitano, born in …8, is a researcher in design and visual communication. He graduated with honors from the University of Salerno with a
thesis on the aesthetics of remix. He is a regular collaborator of the Faculty of Industrial Design at the same university, where he curates teaching
laboratories in visual communication as an expert in video design. His doctoral research in Media Studies focuses on the design potential of video as a
form of expression, and on the relation between video and the drift of the contemporary visual imaginary. He has participated in a few exhibitions as a
visual artist, videomaker, and performer. He co-founded the collective Componibile and the SoundBarrier project. A video of his was shown at the 2007
Locarno Film Festival, in the Play Forward section. Some of his papers have been published by Plectica edizioni, and he has published essays on the
relation between audiovisual forms and digital cultures with publishers such as Carocci, Cronopio, and L’Arca.
Stefano Perna, born in …7 in Napoli, Italy, is pursuing a PhD in Communication Sciences at the Università degli Studi di Salerno, focused on the
analysis of the scopic regimes of the Information Age. His work deals with visual culture, information aesthetics, and digital design. He has published
several articles in the fields of Visual Studies and New Media Studies. He is the author, with Ruben Coen Cagli, of Ber.loose.coin, a digital theory and
online project on contemporary politics, now in the Rhizome Art Base.
Pier Giuseppe Mariconda, born in …80 in Avellino, holds a degree in Communication Sciences with a thesis in Design entitled “The Sound Image. Study
and design scapes, and Communicative Use of Sound”. A musician and sound designer (he studied violin and piano at the D. Cimarosa Conservatory of
Avellino from …0 to …7), he is constantly engaged in studies of sound and visual communication and of the relationships between audiovisual and
generative phenomena. His research addresses interaction design for real-time audio and video using experimental software (EyesWeb, Processing,
Pure Data, SuperCollider, etc.). He took part in Vision'R 2008 (an international VJing festival in Paris) as a programmer in the SoundBarrier team.
1. A moving image is a signal that changes continuously in time. More than of forms, figures, and volumes, it is made of dimensions, frequencies, and
intensities. On a digital medium, image and sound share the same substance: electronically coded impulses. What at first glance may appear to be a
simple technical fact can become an aesthetic hypothesis. In the era of digital technologies we can speak of the acoustic or musical qualities of an image
in more than a metaphorical sense. A new kind of connection is emerging: it is now possible to create interactions, driven by mathematical rules, between
audible and visible media in a way that erodes the barrier between the two fields of sound and image. With digital media we reach the point of
indistinguishability where image becomes sound and sound becomes image. In our project, a patch written in EyesWeb analyzes the moving image,
extracting parameters that are subsequently converted into a data stream. The data are translated into MIDI messages that control audio software and
synthesizers. This process makes the sound deeply sensitive to variations in the moving image. Selected films are analyzed and played back. Most of
them come from the American underground cinema of the 1960s. The experimental approach of some directors of that period, their approach to cinema
as a medium and as a machine, deeply influenced the birth of what we now call new media. The selection is at the same time a homage to those directors
and an exercise in media archaeology.
2. The constant reference to the universe of artistic experimentation can serve as an indicator of the new ways of feeling, perceiving, and knowing that
are proper to video cultures. From this perspective it would be extremely interesting to analyze an area of experimentation that is already very
advanced in terms of its realizations but almost entirely unexplored from a theoretical-critical point of view. This area cuts across a series of
productive practices (software art, genetic art, VJing) and investigates the unprecedented and disruptive possibilities of processing and mapping
video streams, and their aesthetic results. It is attracting ever greater interest, not so much from the traditional disciplines of aesthetics as from
the world of design.
3. From this perspective, the need for a “reinvention” (Rosalind Krauss) of the video medium is central. Our analytical artefact on the tape of Andy
Warhol's Empire finds its basis in this context of constant and militant remediation. The American artist's long film has a peculiarity: it stages
time, not chronological time but enduring time (duration) as theorized by Bergson, a peculiarity noticeable in many other works by Warhol as well as
in other video artists who refer to the New American Cinema. In other words, it is an exercise in Bergsonian permanence. During the eight-plus hours
of the film, Warhol's fixed camera never leaves the skyscraper; as a result, the passage of time is entrusted to the shake, the flow, and the flicker
of the film on the tape head. As in other works of that period, all the channel distortions became embodiments of meaning: a glitch avant la lettre,
so to speak. In McLuhan's terms, the medium (and its material device) is the message.
4. The core of the project is an attempt to map and sonify the film's movements, vibrations, and distortions. The screenshot beside shows the patch
created with EyesWeb.
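The analysis chain described above (image parameters, then a data stream, then MIDI messages) can be sketched in plain Python. This is not the EyesWeb patch: it assumes greyscale frames given as nested lists of 0..255 values, uses mean absolute frame difference as a stand-in for "movement" and jumps in mean luminance as a stand-in for "flicker", and all thresholds and the controller and note numbers are hypothetical. Sending the bytes to a synthesizer would additionally require a MIDI output port, which is not shown.

```python
from typing import List, Sequence, Tuple

Frame = Sequence[Sequence[int]]  # greyscale frame, pixel values 0..255


def mean_luminance(frame: Frame) -> float:
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def motion_amount(prev: Frame, curr: Frame) -> float:
    """Mean absolute frame difference, normalised to 0..1."""
    diff = sum(abs(a - b) for rp, rc in zip(prev, curr) for a, b in zip(rp, rc))
    count = sum(len(row) for row in curr)
    return diff / (255 * count)


def control_change(channel: int, controller: int, value: int) -> bytes:
    """3-byte MIDI Control Change message (status 0xB0 | channel)."""
    if not (0 <= channel <= 15 and 0 <= controller <= 127 and 0 <= value <= 127):
        raise ValueError("channel must be 0..15, controller and value 0..127")
    return bytes([0xB0 | channel, controller, value])


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """3-byte MIDI Note On message (status 0x90 | channel); velocity 0 would mean note off."""
    if not (0 <= channel <= 15 and 0 <= note <= 127 and 1 <= velocity <= 127):
        raise ValueError("channel must be 0..15, note 0..127, velocity 1..127")
    return bytes([0x90 | channel, note, velocity])


def analyse(frames: List[Frame], flicker_threshold: float = 8.0,
            motion_cc: int = 1, flicker_note: int = 60) -> List[Tuple[int, bytes]]:
    """Turn consecutive frames into (frame_index, MIDI message) events.

    Every frame after the first emits a CC carrying the amount of motion; a sudden
    jump in mean luminance (a flicker) additionally emits a note whose velocity
    grows with the size of the jump. All parameter values are illustrative.
    """
    events: List[Tuple[int, bytes]] = []
    for i in range(1, len(frames)):
        prev, curr = frames[i - 1], frames[i]
        cc_value = min(127, int(motion_amount(prev, curr) * 127 + 0.5))
        events.append((i, control_change(0, motion_cc, cc_value)))
        jump = abs(mean_luminance(curr) - mean_luminance(prev))
        if jump > flicker_threshold:
            velocity = max(1, min(127, int(jump / 255 * 127 + 0.5)))
            events.append((i, note_on(0, flicker_note, velocity)))
    return events
```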
China Gates
Art Clay, ETH Zurich, Institute for Computer Systems, Clausiusstrasse 59, Zurich CH-8092, ++41 044 632 84 14, art.clay@inf.ethz.ch
Dennis Majoe, ETH Zurich, Institute for Computer Systems, Clausiusstrasse 59, Zurich CH-8092, ++41 044 632 73 23, dennis.majoe@inf.ethz.ch
Art Clay (USA/CHE) is a specialist in the performance of self-created works with the use of intermedia. He has appeared at international
festivals, on radio and television in Europe, Asia & North America. Recently, his work focuses on large performative works and public
spectacles using mobile devices. He is artistic director of the Digital Art Weeks in Zurich and teaches at various Institutes including the
Zurich University of the Arts. http://mypage.bluewin.ch/artclay
Dennis Majoe has a PhD in Navigation related Electronic systems and has worked extensively in the design of a variety of motion and
orientation sensing systems and computer generated environments including 3D audio. He is director of MASC, an innovative electronics
and computer design company active in the field of wireless communications. In addition to his activities at MASC, he is a senior
researcher in the Computer Systems Department at ETH Zurich, where he develops applications related to proactive health.
Description of Piece (Performative Installation)

The work China Gates is technically based on the possibility of synchronizing a group of performers using the clock pulse emitted by GPS
satellites. Aesthetically, China Gates is rooted in works for open public space and belongs to a genre of works that celebrate the use of
innovative mobile technologies to explore public space and the public audience. The performance takes place in a limited city area such as a
city square, a park, or an open courtyard. A series of tuned gongs is used, and the number of gongs is greater than the number of participating
performers. Tuned to an Eastern musical scale, these gongs give the piece a touch of the Orient on the horizontal, melodic side and a
Western type of dissonance on the vertical, chordal side. The gongs circulate among the players through an exchange process, so that an
ongoing change in harmonies can be achieved. Each player wanders freely through the performance space. A custom-built GPS interface on
the wrist registers the player’s position and determines, according to the geographical coordinates, when to play the gong. By using a delay
between the satellite clock pulse and the LED that indicates when to strike the gong, a harmolodic effect is obtained as the players
gradually shift from a chordal to a melodic structure depending on their geographical coordinates. In general, each player tries to move
when another is not, so that a “choreographic counterpoint” results, which allows for a rhythmic-melodic coloring caused by the
vertical-to-horizontal unfolding of the struck gong chord. The performance ends for each player upon returning to the start point. The
interface therefore acts as a “conductor”, indicating when the gongs are to be hit and how the music as a whole will sound in the end.
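The timing scheme can be illustrated with a small sketch under explicit assumptions: the wrist interface receives a GPS clock pulse and a position fix, and the strike delay grows linearly with the player's distance from the shared start point, up to a cap. The linear rule and the values of `ms_per_metre` and `max_delay_s` are hypothetical; the text only states that the delay depends on geographical coordinates.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def distance_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular approximation; adequate over a square or a courtyard."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat) * EARTH_RADIUS_M
    dy = math.radians(lat2 - lat1) * EARTH_RADIUS_M
    return math.hypot(dx, dy)


def strike_delay_s(position, start, ms_per_metre: float = 20.0,
                   max_delay_s: float = 0.9) -> float:
    """Delay after the GPS clock pulse before the LED tells this player to strike.

    position and start are (latitude, longitude) pairs in degrees. Players near
    the start point strike together (a chord); as they wander away, their delays
    diverge and the chord unfolds into a melody.
    """
    d = distance_m(start[0], start[1], position[0], position[1])
    return min(d * ms_per_metre / 1000.0, max_delay_s)


def strike_time(pulse_time_s: float, position, start, **kwargs) -> float:
    """Absolute strike time: the shared pulse plus this player's own delay."""
    return pulse_time_s + strike_delay_s(position, start, **kwargs)
```

Because every player starts at the same point, all delays begin at zero and the gongs sound as a chord; as players spread out, their strikes fan out in time, which is one way to obtain the chordal-to-melodic shift described above.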
Workshops

4th i-Maestro Workshop on Technology-Enhanced Music Education
Kia Ng
ICSRiM - University of Leeds
School of Computing & School of Music
Leeds LS2 9JT, UK
info@i-maestro.org, www.i-maestro.org
Recent advancements in digital media, ICT, and related technologies offer many opportunities for supporting music education. The
i-Maestro workshop series (see www.i-maestro.org/workshop) aims to explore this subject area, including but not limited to the following
topics of interest: Gesture/posture analysis, support and understanding; Score and gesture following; Multimodal interfaces, with
visualization, sonification, etc.; Augmented instruments; Cooperative environments for music; Music notation and representation;
Technology-enhanced music pedagogy; Linking theory and practice training; Exercise generation, packaging and distribution;
Courseware authoring and generation; Assessment support; Profiling and progress monitoring; and, more generally, Interactive multimedia
music.
The i-Maestro project is partially funded by the European Commission under the IST 6th Framework Programme to explore interactive multimedia
environments for technology enhanced music education. The project explores novel solutions for music training in both theory and
performance, building on recent innovations resulting from the development of computer and information technologies, by exploiting new
pedagogical paradigms with cooperative and interactive self-learning environments, and computer-assisted tuition in classrooms including
gesture interfaces and augmented instruments with particular focus on bowed string instruments.
The resulting i-Maestro framework for technology-enhanced music learning is intended to support the creation of flexible and
personalisable e-learning courses. It aims to offer pedagogic solutions and tools to maximise efficiency, motivation, and interest in the
learning process and to improve accessibility to musical knowledge.
At the time of going to press, the provisional programme for this workshop from the first call includes the following presentations and
demos:
An Overview of the i-Maestro Project on Technology Enhanced Learning for Music, Kia Ng
This presentation provides an introduction to the i-Maestro project. It discusses the overall aims and objectives, latest results, achievements
and future directions. An overview of the key components is given with their corresponding pedagogical contexts. This opening
presentation will introduce the structure of the workshop highlighting aspects of the various presentations and demos to follow.
Analysis and Sonification of Bowing Features for String Instrument Training, Oliver Larkin, Thijs Koerselman, Bee Ong, and Kia Ng
The i-Maestro 3D Augmented Mirror (AMIR) allows a teacher and student to study a performance using 3D motion capture technology
and provides multimodal feedback based on analyses of the data. This paper presents a module for AMIR that maps bowing feature
analyses to sound parameters. We describe several sonifications which are designed to be used in both real-time and non-real-time
situations offering an alternative way of looking at the performance which, in some cases, has advantages over visualisation techniques.
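A sonification of this kind is, at its core, a set of mappings from bowing features to synthesis parameters. The sketch below shows one such mapping layer; the three features, their ranges, and the choice of amplitude, filter cutoff, and pan as targets are hypothetical and are not taken from the AMIR module.

```python
def scale(value: float, in_lo: float, in_hi: float,
          out_lo: float, out_hi: float) -> float:
    """Linear mapping with clamping; in_lo may exceed in_hi to invert the mapping."""
    t = (value - in_lo) / (in_hi - in_lo)
    t = max(0.0, min(1.0, t))
    return out_lo + t * (out_hi - out_lo)


def sonify_bowing(bow_speed_m_s: float, bridge_distance_m: float,
                  bow_tilt_deg: float) -> dict:
    """Map three bowing features onto synthesis parameters (all ranges hypothetical)."""
    return {
        # faster bow strokes (in either direction) sound louder
        "amplitude": scale(abs(bow_speed_m_s), 0.0, 1.0, 0.0, 1.0),
        # bowing closer to the bridge sounds brighter
        "cutoff_hz": scale(bridge_distance_m, 0.06, 0.01, 500.0, 5000.0),
        # bow tilt pans the sound left or right
        "pan": scale(bow_tilt_deg, -30.0, 30.0, -1.0, 1.0),
    }
```

Keeping each feature-to-parameter link in one small function makes it easy to swap mappings between real-time and offline (non-real-time) use.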
Three Pedagogical Scenarios using the Sound and Gesture Lab, Nicolas Rasamimanana, Fabrice Guedy, Norbert Schnell, Jean-Philippe Lambert,
and Frederic Bevilacqua
This article reports pedagogical experiments using the “Sound and Gesture Lab”, a prototype application that allows for the
manipulation and processing of sound and gesture data from a live musician. Three scenarios were designed to provide real-time
interactions between an instrumentalist and a computer, or between several instrumentalists. The scenarios make use of direct sonifications
of bow movements, gesture following and sound synthesis. As such, they create movement based interactions that can develop students’
embodiment in music.
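Gesture following estimates, while a musician is playing, where they are within a previously recorded reference gesture. As a stand-in for whatever method the Sound and Gesture Lab actually uses, the sketch below applies open-end dynamic time warping (DTW) to one-dimensional feature sequences such as bow position. This is a swapped-in technique chosen for brevity; recomputing the full cost matrix on every call would be too slow for long real-time sequences.

```python
import math
from typing import Sequence


def follow(reference: Sequence[float], partial: Sequence[float]) -> int:
    """Open-end dynamic time warping.

    Returns the index in `reference` that best aligns with the most recent
    sample of `partial`, i.e. an estimate of how far the player has progressed
    through the reference gesture.
    """
    n, m = len(reference), len(partial)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")
    # cost[i][j]: cheapest alignment of reference[:i] with partial[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(reference[i - 1] - partial[j - 1])
            cost[i][j] = d + min(cost[i - 1][j - 1],  # both advance together
                                 cost[i - 1][j],      # player moves faster than the reference
                                 cost[i][j - 1])      # player moves slower than the reference
    # Open end: the partial performance may stop anywhere in the reference.
    best_i = min(range(1, n + 1), key=lambda i: cost[i][m])
    return best_i - 1
```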
Integration of i-Maestro Gesture Tools, Thijs Koerselman, Oliver Larkin, Bee Ong, Nicolas Leroy, Jean-Philippe Lambert, Diemo Schwarz,
Fabrice Guedy, Norbert Schnell, Frederic Bevilacqua, and Kia Ng
We discuss two prototype pedagogical applications which allow the study of string instrument bowing gesture using different motion
capture technologies. Several modalities for integration of these tools are presented, and the advantages and implications of the various
approaches are discussed. Data exchange between the two applications is achieved using the Sound Description Interchange Format (SDIF),
which has, until now, been used primarily for storing audio analysis data. We describe our method of storing motion and analysis data
in the SDIF format and discuss future work in this direction.
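To make the idea of storing motion data in SDIF concrete, the sketch below writes SDIF-style frames holding one float32 matrix of marker coordinates. It follows our reading of the SDIF binary layout (big-endian fields, matrix data zero-padded to a multiple of 8 bytes, a frame size that counts the bytes after the size field). The signature `XMOC` is a hypothetical user-defined type, and the type-declaration chunk a conforming file would need for it is omitted. This is not the encoding used by the i-Maestro tools, so check it against the SDIF specification before relying on it.

```python
import struct
from typing import List, Sequence

FLOAT32 = 0x0004  # SDIF matrix data type code for 32-bit floats


def _pad8(data: bytes) -> bytes:
    """Matrix data is zero-padded to a multiple of 8 bytes."""
    return data + b"\x00" * (-len(data) % 8)


def _check_signature(sig: bytes) -> None:
    if len(sig) != 4:
        raise ValueError("SDIF signatures are exactly 4 bytes")


def sdif_file_header() -> bytes:
    # "SDIF", size of the rest of the header (8), specification version 3,
    # standard types version 1; all integers are big-endian.
    return b"SDIF" + struct.pack(">iII", 8, 3, 1)


def sdif_matrix(signature: bytes, rows: Sequence[Sequence[float]]) -> bytes:
    """One float32 matrix, e.g. one row per marker and columns x, y, z."""
    _check_signature(signature)
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    header = signature + struct.pack(">iii", FLOAT32, n_rows, n_cols)
    data = b"".join(struct.pack(f">{n_cols}f", *row) for row in rows)
    return header + _pad8(data)


def sdif_frame(signature: bytes, time_s: float, stream_id: int,
               matrices: List[bytes]) -> bytes:
    """Frame = signature, size, time (float64), stream ID, matrix count, matrices.

    The size field counts the bytes that follow it.
    """
    _check_signature(signature)
    body = struct.pack(">dii", time_s, stream_id, len(matrices)) + b"".join(matrices)
    return signature + struct.pack(">i", len(body)) + body
```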
Music Representation with MPEG-SMR, Pierfrancesco Bellini, Francesco Frosini, Nicola Mitolo, and Paolo Nesi
Symbolic music representation is a logical structure of symbolic elements representing music events and the relationships among those
events, and with other media types. The evolution of information technology has recently produced changes in the practical use of music
representation and notation, transforming it from a simple visual coding model for sheet music into a tool for modelling music in computer
programs and electronic devices in general with strong relationships with other audiovisual data: video, images, audio, animations, etc.
MPEG-SMR is a new ISO standard integrated into MPEG-4, enabling new applications in which multimedia and music notation can benefit
from and enrich each other in terms of functionality.
The European Curriculum Challenge: a case study on technology-supported specialised music education, Kerstin Neubarth, Vera Gehrs,
Lorenzo Sutton, Tillman Weyde, and Laura Poggio
The European project i-Maestro is developing an interactive multimedia environment to support music teachers and students at music
schools and conservatories across Europe, focusing on bowed string instruments. This paper analyzes the curricula of European music
education institutions against: pedagogic needs reported by music teachers; desired learning outcomes defined by regulatory frameworks;
findings from music psychology and education research; and the technological priorities of the i-Maestro environment. The analysis leads
to an identification of curriculum areas that are expected to benefit from technology-enhanced teaching and learning experiences as offered
by i-Maestro.
Collaborative Working for Music Education, Pierfrancesco Bellini, Francesco Frosini, Nicola Mitolo, and Paolo Nesi
In the area of technology-enhanced music training, cooperative work is becoming an increasingly feasible concept. It can be used to
experiment with and exploit new modalities of training and to reduce the set-up time necessary for organizing group work, by
integrating distributed systems for audio-visual processing and general control (e.g., synchronous playback and recording). To this end, a
flexible model to cope with groups, roles, tools, and large sets of features is required. This paper presents a Cooperative Work environment
for Max/MSP to facilitate the creation of a variety of cooperative applications (not limited to educational uses), structurally supporting
group, role, and tool concepts; undo/redo; joining and rejoining; and simple, consistent commands. The system
can be used for a large variety of multimedia/multimodal music applications.
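The group/role/undo model listed above can be sketched with one shared history. This is a hypothetical Python illustration, not the Max/MSP environment the paper describes: roles are simply the set of parameters each user may edit, and `join` returns a snapshot of the shared state so that a rejoining user can catch up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Change:
    user: str
    param: str
    old: Optional[float]  # None means the parameter did not exist yet
    new: float


class CooperativeSession:
    """Shared parameter state with per-user roles and one shared undo/redo history."""

    def __init__(self) -> None:
        self.roles: Dict[str, set] = {}
        self.state: Dict[str, float] = {}
        self._undo: List[Change] = []
        self._redo: List[Change] = []

    def join(self, user: str, editable: Iterable[str]) -> Dict[str, float]:
        """(Re)join with a role; returns a snapshot so late joiners can catch up."""
        self.roles[user] = set(editable)
        return dict(self.state)

    def leave(self, user: str) -> None:
        self.roles.pop(user, None)

    def set(self, user: str, param: str, value: float) -> None:
        if param not in self.roles.get(user, set()):
            raise PermissionError(f"{user} may not change {param}")
        self._undo.append(Change(user, param, self.state.get(param), value))
        self._redo.clear()  # a new action invalidates the redo branch
        self.state[param] = value

    def _apply(self, param: str, value: Optional[float]) -> None:
        if value is None:
            self.state.pop(param, None)
        else:
            self.state[param] = value

    def undo(self) -> bool:
        if not self._undo:
            return False
        change = self._undo.pop()
        self._apply(change.param, change.old)
        self._redo.append(change)
        return True

    def redo(self) -> bool:
        if not self._redo:
            return False
        change = self._redo.pop()
        self._apply(change.param, change.new)
        self._undo.append(change)
        return True
```

A single shared history keeps all participants consistent; a per-user history would let each player undo only their own actions, at the cost of conflicts when two users touch the same parameter.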
Integration Aspects in i-Maestro, Marius Spinu, Giacomo Villoresi, Maurizio Campanai, Andrea Mugellini, and Fabrizio Fioravanti
The integration of large software projects is a complex task. The i-Maestro framework is composed of many modules created with
different technologies such as Max/MSP, C++, Java, and PHP. Integration therefore presents challenges in diverse fields, from communication
protocols to Graphical User Interface harmonisation, not forgetting the connection between different tools made with graphical- and
procedural- programming environments. Minor changes performed in any module could have significant effects on other modules both
from a technical and usability perspective. This paper discusses the integration of the i-Maestro project components during the first two
years of development, considering the various aspects such as the collaborative environment, technical and pedagogical issues and
strategy.
i-Maestro: Making Music Tuition Technologies Accessible, Neil McKenzie and David Crombie
This paper discusses a set of reusable user interfaces that have been created for viewing, navigating and editing music notation in
accessible formats. These were successfully incorporated into software developed during the i-Maestro project, which is investigating the
role of technology in improving the quality of music education across Europe. The work follows on from the AccessMusic project which
created converters for producing Braille and speech output from scores created in the popular Finale package. Furthermore, these
interfaces have been designed so that they can be coupled to any music notation viewer or editor.
The paper will present the findings related to this area of the i-Maestro project and provide a demonstration of the tools and interfaces
created for visually impaired end users during the course of the project.
Introducing a Novel Musical Teaching Automated Tool to Transfer Technical Skills from an Anthropomorphic Flutist Robot to Flutist
Beginners, Jorge Solis and Atsuo Takanishi
Up to now, different kinds of musical performance robots (MPRs) and robotic musicians (RMs) have been developed. MPRs are designed
to closely reproduce the human organs involved in playing musical instruments. In contrast, RMs are conceived as automated
mechanisms designed to introduce novel ways of musical expression. Our research on the Waseda Flutist Robot has focused on
clarifying human motor control from an engineering point of view. As a result, the Waseda Flutist Robot No. 4 Refined IV (WF-4RIV)
can play the flute at a level close to that of an intermediate player. Thanks to the human-like design and the advanced technical skills
displayed by the WF-4RIV, novel ways of musical education can be conceived. In this paper, the General Transfer Skill System (GTSS) is
implemented on the flutist robot to enable the automated transfer of technical skills from the robot to flutist beginners. A set of
experiments is carried out to verify the evaluation and interaction modules of the GTSS. The experimental results show that the robot is able
to quantitatively evaluate the performance of beginners and to automatically recognize the melodies they perform.
Audio-driven Augmentations for the Cello, Benjamin Lévy and Kia Ng
This paper presents the development of a suite of audio-driven sonifications and effects for the acoustic cello. Starting with a survey of
existing augmented string instruments, we discuss our approach to augmenting the cello, with particular focus on the player’s gestural
control. By using features extracted from the audio input, we preserve the player’s normal interaction with the instrument and aim to provide
additional possibilities for new forms of expression with technology. The system is developed in Max/MSP. It comprises analysis and
processing modules that are mapped through virtual layers forming effects for either live improvisation or composition. The paper
considers the musicality of the effects and pedagogical applications of the system.
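A typical audio-driven mapping extracts a feature from the cello signal and uses it to steer an effect, leaving the player's normal technique untouched. The sketch below uses a block-RMS envelope follower to control delay feedback; both the feature and the target effect are illustrative choices rather than the ones reported in the paper, and the attack, release, and scaling values are hypothetical.

```python
import math
from typing import Sequence


def rms(block: Sequence[float]) -> float:
    """Root-mean-square level of one block of audio samples."""
    return math.sqrt(sum(x * x for x in block) / len(block)) if block else 0.0


class EnvelopeFollower:
    """One-pole smoother on block RMS, with separate attack and release rates."""

    def __init__(self, attack: float = 0.5, release: float = 0.05) -> None:
        self.attack = attack    # fraction of the gap closed per block when rising
        self.release = release  # fraction of the gap closed per block when falling
        self.level = 0.0

    def process(self, block: Sequence[float]) -> float:
        target = rms(block)
        coeff = self.attack if target > self.level else self.release
        self.level += coeff * (target - self.level)
        return self.level


def delay_feedback(level: float, full_scale: float = 0.5,
                   max_feedback: float = 0.9) -> float:
    """Louder playing feeds more signal back into a delay line (hypothetical mapping)."""
    return max_feedback * min(level / full_scale, 1.0)
```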
Acknowledgement
The i-Maestro project is supported in part by the European Commission under Contract IST-026883 I-MAESTRO. The authors would like
to acknowledge the EC IST FP6 for the partial funding of the I-MAESTRO project (www.i-maestro.org), and to express gratitude to all
I-MAESTRO project partners and participants for their interest, contributions, and collaboration.
Tablet Workshop for Performers and Teachers
Michael Zbyszyński
Center for New Music and Audio
Technologies, UC Berkeley
1750 Arch Street
Berkeley, California USA
+1.510.643.9990 x 314
mzed@cnmat.berkeley.edu
This is a workshop for people who want to get started with tablet-based interfaces or who
want to teach others to use tablet interfaces for music. It is based on my method book (in
progress, see paper accepted to this year's NIME) and will cover:
Basics:
• Short history of pen- and tablet-based interfaces for music
• Choosing a tablet, styluses, etc., and installing and running them on your operating system
• Implementation in musical software, including Max/MSP, Pd, and OpenSoundControl
Exercises and Etudes:
• Musical games to help build technical skills
• Short études that address both skills and aesthetics
Répertoire:
• Workshop presenters will perform and demonstrate their own works, illustrating performance and mapping paradigms that are
unique to their situations, as well as general strategies
• Participants will be given a selection of tablet interfaces that could make up a "tablet orchestra." The final stage of the
workshop will be a group performance on diverse tablet instruments. (It would be great if we could schedule a "tablet jam"
during the conference, perhaps at one of the concerts.)
Participants will leave the workshop with an overview of prior musical work with tablets,
specifically recent pieces by members of the NIME Community. They will also have many
pre-built software tools to implement their own work, to continue practicing tablet skills, and
to teach in individual and classroom settings.
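Since the workshop covers implementation with OpenSoundControl, a minimal encoder for the OSC 1.0 binary message format shows what a tablet patch ultimately sends over the network. The addresses `/tablet/pen` and `/tablet/button` and the choice of x, y, and pressure as arguments are hypothetical; only int32, float32, and string arguments are handled.

```python
import struct
from typing import Tuple, Union

Arg = Union[int, float, str]


def _osc_string(s: str) -> bytes:
    """OSC strings are ASCII, null-terminated, and zero-padded to a multiple of 4 bytes."""
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def _encode_arg(arg: Arg) -> Tuple[str, bytes]:
    if isinstance(arg, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(arg, int):
        return "i", struct.pack(">i", arg)  # int32, big-endian
    if isinstance(arg, float):
        return "f", struct.pack(">f", arg)  # float32, big-endian
    if isinstance(arg, str):
        return "s", _osc_string(arg)
    raise TypeError(f"unsupported OSC argument: {arg!r}")


def osc_message(address: str, *args: Arg) -> bytes:
    """Encode one OSC message: address pattern, type tag string, arguments."""
    if not address.startswith("/"):
        raise ValueError("OSC address patterns start with '/'")
    encoded = [_encode_arg(a) for a in args]
    tags = "," + "".join(tag for tag, _ in encoded)
    return _osc_string(address) + _osc_string(tags) + b"".join(data for _, data in encoded)
```

A message can then be sent as a UDP datagram, e.g. `socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(osc_message("/tablet/pen", x, y, pressure), ("127.0.0.1", 7400))`, where the host and port are placeholders for whatever the receiving Max/MSP or Pd patch listens on.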
Presenters:
Myself (http://www.mikezed.com/). I intend to invite Matt Wright (http://ccrma.stanford.edu/~matt/),
Ali Momeni (http://alimomeni.net/), Jan Schacher aka jasch (http://www.jasch.ch/), and Nicolas
d'Alessandro (http://www.dalessandro.be/). They have all expressed prior interest in the tablet
pedagogy project.
Participants:
Approximately 25 people who want to get started with tablet-based interfaces or who want
to teach others to use tablet interfaces for music.
Schedule:
4 hours: 1.5 hours of lectures/demonstrations, followed by a short break and approximately 2
hours of hands-on laboratory work.
Techniques for Gesture Measurement in Musical Performance
R. Benjamin Knapp, Sonic Arts Research Centre, Queen’s University, Belfast, Northern Ireland, +44 (0)28 9097 4069, b.knapp@qub.ac.uk
Marcelo Wanderley, McGill University, Montreal, Quebec, Canada H3A 1B9, +1 514 398-4535, marcelo.wanderley@mcgill.ca
Gualtiero Volpe, Casa Paganini – InfoMus Lab, Piazza S. Maria in Passione 34, 16123 Genova, Italy, +39 010 2758252, gualtiero.volpe@unige.it
Description of the Workshop
This workshop will present three complementary approaches to measuring gesture during musical
performances:
1. On-body (ambulatory) sensing
2. Motion capture
3. Measuring expressive gesture qualities
The advantages and limitations of each technique will be explored. A review of current equipment
and techniques of data acquisition (including methods of synchronization), processing, and storage
will be presented along with video and live demonstrations of these methods.
The Details:
On-body (ambulatory) sensing – Ben Knapp:
This component of the workshop will examine the use of a number of on-body sensor systems for
measuring kinematics and physiological state during performance. The simultaneous use of motion
sensors (e.g. accelerometers, gyros, and location trackers), force sensors (e.g. FSRs, QTCs, and strain
gauges), and bioelectric sensors (e.g. EMG, EKG, EEG, and GSR) will be demonstrated. The key
issues of synchronization and multimodal processing will be discussed.
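One common step in multimodal processing is putting streams recorded at different rates onto a single time base. The sketch below does this by linear interpolation, assuming every sensor's timestamps are already expressed on one shared clock and are strictly increasing; estimating clock offsets between devices, which is the harder part of synchronization, is not shown. The stream names in any usage are hypothetical.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple


def resample(times: Sequence[float], values: Sequence[float],
             grid: Sequence[float]) -> List[float]:
    """Linear interpolation of one sensor stream onto the time points in grid.

    `times` must be strictly increasing; values outside the recorded range
    are held at the first or last sample.
    """
    out = []
    for t in grid:
        k = bisect_left(times, t)
        if k == 0:
            out.append(values[0])
        elif k == len(times):
            out.append(values[-1])
        else:
            t0, t1 = times[k - 1], times[k]
            v0, v1 = values[k - 1], values[k]
            out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return out


def align_streams(streams: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                  rate_hz: float, start: float, end: float):
    """Put every stream on one common, uniformly sampled time base."""
    n = int(round((end - start) * rate_hz)) + 1
    grid = [start + i / rate_hz for i in range(n)]
    return grid, {name: resample(t, v, grid) for name, (t, v) in streams.items()}
```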
Motion Capture – Marcelo Wanderley:

This part of the workshop will review the basics of using motion capture
(mocap) for the analysis of music performance and present various case
studies of projects using this technique. It will include details about the
most common mocap technologies currently used, including passive and
active infrared, electromagnetic, and inertial systems, as well as software
applications used to analyze mocap data. Examples of mocap data using
systems such as Vicon and BTS Elite (passive infrared), NDI Optotrak
and Certus (active infrared), Polhemus Liberty (electromagnetic), and
XSens (inertial) will be presented. Finally, we will discuss issues of cost-effectiveness
and complexity of analysis using such techniques.
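Most mocap analyses start from marker trajectories and derive kinematic quantities from them. As a minimal example, the sketch below computes per-frame marker speed by finite differences, assuming a fixed sampling rate and no missing (occluded) samples; real data would typically be gap-filled and low-pass filtered first.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def marker_speed(positions: Sequence[Point], rate_hz: float) -> List[float]:
    """Speed of one marker at every frame, in position units per second.

    Interior frames use a central difference; the first and last frames fall
    back to one-sided differences.
    """
    n = len(positions)
    if n < 2:
        raise ValueError("need at least two frames")
    dt = 1.0 / rate_hz
    speeds = []
    for i in range(n):
        if i == 0:
            a, b, span = positions[0], positions[1], dt
        elif i == n - 1:
            a, b, span = positions[-2], positions[-1], dt
        else:
            a, b, span = positions[i - 1], positions[i + 1], 2 * dt
        speeds.append(math.dist(a, b) / span)
    return speeds
```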
Measuring expressive gesture qualities – Gualtiero Volpe:

This part of the workshop will examine the use of video-based techniques for the analysis of music
performance, with a particular focus on the extraction of expressive descriptors from performers' gestures.
It will include details on equipment (different kinds of video cameras and their properties), basic
computer vision algorithms (e.g., background segmentation and blob tracking), and advanced
expressive gesture processing techniques (e.g., extraction of gestural descriptors such as amount of
motion, contraction/expansion, directness, and impulsiveness). The potential and limitations of such
techniques will be discussed. Concrete examples will be demonstrated using the EyesWeb XMI open
platform for eXtended Multimodal Interaction (www.eyesweb.org).
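The silhouette-based descriptors mentioned above can be sketched on binary masks, assuming background segmentation has already produced one silhouette per frame. The definitions below follow descriptors commonly described in the EyesWeb literature (contraction index as silhouette area over bounding-box area; quantity of motion from a silhouette motion image), but they are plain-Python illustrations and may differ in detail from the EyesWeb XMI implementation.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]  # binary silhouette: 1 = performer, 0 = background


def area(mask: Mask) -> int:
    return sum(1 for row in mask for px in row if px)


def bounding_box_area(mask: Mask) -> int:
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return 0
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return (rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1)


def contraction_index(mask: Mask) -> float:
    """Silhouette area / bounding-box area: higher when the body is contracted,
    lower when the limbs extend and enlarge the bounding box."""
    box = bounding_box_area(mask)
    return area(mask) / box if box else 0.0


def quantity_of_motion(history: List[Mask]) -> float:
    """Silhouette-motion-image area divided by the current silhouette area.

    The silhouette motion image holds the pixels covered by the silhouette in
    any of the recent frames but not in the current one (history[-1]).
    """
    current = history[-1]
    smi = sum(
        1
        for r, row in enumerate(current)
        for c, px in enumerate(row)
        if not px and any(m[r][c] for m in history)
    )
    a = area(current)
    return smi / a if a else 0.0
```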
Format:
The workshop will consist of 90 minutes of presentations on the measurement techniques followed
by 90 minutes of interactive discussions and demonstrations.
Audience:
The workshop is intended for anyone interested in understanding the quantitative and qualitative
aspects of measuring gestures during live performance of music. It is open both to novices and to
those who are already experienced in specific techniques but might be interested in learning more
about some of the other methods of gesture acquisition.
Jamoma Workshop
Alexander Refsum Jensenius (a), Timothy Place (b), Trond Lossius (c), Pascal Baltazar (d), Dave Watson (e)
a) University of Oslo & Norwegian Academy of Music, a.r.jensenius@imv.uio.no
b) Electrotap, tim@electrotap.com
c) BEK - Bergen Center for Electronic Arts, lossius@bek.no
d) GMEA - Groupe de Musique Electroacoustique d’Albi-Tarn, pb@gmea.net
e) dave@elephantride.org
Jamoma
Jamoma¹ is an open-source project for developing a structured and modularized approach to programming in Max/MSP and Jitter. The
central concept of Jamoma is the module, which is built from a separate algorithm patcher and a patcher containing the graphical user
interface. Figure 1 shows examples of the three main types of modules: control, audio, and video.
Figure 1. Examples of the three main types of modules (from left): control, audio and video.
Besides providing a large collection of ready-made modules, one of the core strengths of using Jamoma in Max development is that it
simplifies the creation of large projects by enforcing a structured approach to patching. Jamoma uses Open Sound Control (OSC)
for internal and external messaging, making it easy to communicate to and from modules. Recent development has added support for
cues, various types of mapping, and an extensive ramping library, including modular function and dataspace libraries.
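The combination of OSC-style addressing and parameter ramping can be illustrated with a small sketch. It is not Jamoma's API: the `Namespace` and `Parameter` classes, the `/module/parameter` address layout, and the millisecond clock passed to `tick` are hypothetical, and only linear ramps are shown.

```python
from typing import Dict, Optional, Tuple


class Parameter:
    """A module parameter that can jump to a value or ramp to it linearly."""

    def __init__(self, value: float = 0.0) -> None:
        self.value = value
        self._ramp: Optional[Tuple[float, float, float, float]] = None

    def set(self, target: float, ramp_ms: float = 0.0, now_ms: float = 0.0) -> None:
        if ramp_ms <= 0:
            self.value, self._ramp = target, None
        else:
            # (start time, duration, start value, target value)
            self._ramp = (now_ms, ramp_ms, self.value, target)

    def tick(self, now_ms: float) -> float:
        if self._ramp is not None:
            start, duration, v0, v1 = self._ramp
            t = min(max((now_ms - start) / duration, 0.0), 1.0)
            self.value = v0 + (v1 - v0) * t
            if t >= 1.0:
                self._ramp = None
        return self.value


class Namespace:
    """OSC-style tree of /module/parameter addresses (hypothetical, not Jamoma's API)."""

    def __init__(self) -> None:
        self.params: Dict[str, Parameter] = {}

    def register(self, module: str, name: str, value: float = 0.0) -> str:
        address = f"/{module}/{name}"
        self.params[address] = Parameter(value)
        return address

    def send(self, address: str, value: float, ramp_ms: float = 0.0,
             now_ms: float = 0.0) -> None:
        self.params[address].set(value, ramp_ms, now_ms)  # KeyError if unknown

    def tick(self, now_ms: float) -> None:
        for param in self.params.values():
            param.tick(now_ms)
```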
Description of the Workshop

The workshop is aimed at beginning and intermediate Jamoma users. Participants should be familiar with Max/MSP and Jitter to get
the most out of the workshop. We advise participants to download the latest version of Jamoma before the workshop.
• Fundamental ideas of Jamoma:
o Why propose a structured approach to the development and control of modules in Max?
o The need for modularity and flexibility
o A model-view-controller approach to modular development
• Working with Jamoma:
o Creating a patch from modules
o Communicating to and from modules
o Ramping of parameters
o How the concept of values and properties extends the possibilities for expression
o Mapping between modules
o Setting up cues
• Developing a module:
o Model: The algorithm
o View: Designing the graphical user interface
o Controller: Setting up internal communication in a module
o Handling presets
o Documentation
• Examples of scientific and artistic projects using Jamoma
More information about the workshop at: http://www.jamoma.org/wiki/JamomaWorkshopInGenova2008
¹ http://www.jamoma.org

Author Index

Abdallah, Samer, 215
Aitenbichler, Erwin, 285
Ajay, Kapur, 144
Alonso, Marcos, 207, 211
Baltazar, Pascal, 382, 425
Barbosa, Álvaro, 9
Bau, Olivier, 91
Bencina, Ross, 197
Berdahl, Edgar, 61, 299
Blackwell, Alan, 28
Bokowiec, Mark Alexander, 388, 389
Bossuyt, Frederick, 229, 372
Bouënard, Alexandre, 38
Bouillot, Nicolas, 13, 189
Boyle, Aidan, 269
Bozzolan, Matteo, 24
Bryan-Kinns, Nick, 81, 319
Butler, Jennifer, 77
Camurri, Antonio, 134
Canazza, Sergio, 140
Canepa, Corrado, 134
Cannon, Joanne, 366, 387
Caporilli, Antonio, 396
Chabrier, Renaud, 396
Chant, Dale, 366
Chordia, Parag, 331
Ciglar, Miha, 203, 399
Ciufo, Thomas, 390
Clay, Art, 415
Coghlan, Niall, 233
Coletta, Paolo, 134
Collins, Nick, 87
Cooper, Jeff, 356
Cooperstock, Jeremy R., 13, 189
Corcoran, Greg, 400
Corness, Greg, 265
Cospito, Giovanni, 24
Crevoisier, Alain, 113
d'Alessandro, Nicolas, 401, 403
Dattolo, Antonina, 140
De Jong, Staas, 370
de Martelly, Elizabeth, 339
Dekleva, Luka, 399
Delle Monache, Stefano, 154
Demey, Michiel, 229, 372
Dimitrov, Smilen, 211
Dixon, Simon, 364
Doro, Andrew, 349
Drayson, Hannah, 400
Dubrau, Josh, 164
Dulic, Aleksandra, 384, 412
Eigenfeldt, Arne, 144
Endo, Ayaka, 345
Essl, Georg, 185, 392
Falkenberg Hansen, Kjetil, 207
Farshi, Olly, 409
Favilla, Paris, 366
Favilla, Stuart, 366, 387
Feldmeier, Mark, 193
Fernstrom, Mikael, 103
Ferrari, Nicola, 379
Fitzpatrick, Geraldine, 87
Flanigan, Lesley, 349
Follmer, Sean, 354
Fraietta, Angelo, 19
Freed, Adrian, 107, 175
Friberg, Anders, 128
Fried, Joshua, 397
Gatzsche, Gabriel, 325
Geiger, Christian, 303
Gibet, Sylvie, 38
Girolin, Roberto, 378
Godbehere, Andrew B., 237
Goina, Maurizio, 150
Gotfrit, Martin, 412
Grosshauser, Tobias, 97
Hadjakos, Aristotelis, 285
Hamel, Keith, 383
Hartman, Ethan, 356
Hashida, Mitsuyo, 277
Havryliv, Mark, 164
Hayafuchi, Kouki, 241
Hazlewood, William R., 281
Hébert, Jean-Pierre, 261
Henriques, Tomás, 307
Hicks, Tony, 366, 387
Houle, François, 384
Huan, Kuan, 416
Ito, Yosuke, 277
Jacobs, Robert, 193
Jacquemin, Christian, 122
Jensenius, Alexander R., 181, 425
Jo, Kazuhiro, 315
Källblad, Anna, 128
Kamatani, Takahiro, 360
Kamiyama, Yusuke, 352
Kapur, Ajay, 144, 404
Katayose, Haruhiro, 277
Kellum, Greg, 113
Kiefer, Chris, 87
Kim-Boyle, David, 3
Kimura, Mari, 219
Klauer, Giorgio, 380
Knapp, R. Benjamin, 117, 233, 425
Knopke, Ian, 281
Kuhara, Yasuo, 345
Kuuskankare, Mika, 34
Kuyken, Bart, 229
Kyoya, Miho, 360
Lähdeoja, Otso, 53
Lamenzo, Jared, 413
Langley, Somaya, 197
Lanzalone, Silvia, 273, 398
Laurson, Mikael, 34
Leman, Marc, 229, 372
Lossius, Trond, 181, 425
Loviscach, Joern, 221
Mackay, Wendy, 91
Macrae, Robert, 364
Maes, Pattie, 67
Majoe, Dennis, 415
Maniatakos, Vassilios-Fivos A., 122
Mariconda, Pier Giuseppe, 414
Marinelli, Maia, 416
Marquez-Borbon, Adnan, 354
Mazzarino, Barbara, 134
McMillen, Keith A., 347
Mehnert, Markus, 325
Menzies, Dylan, 71
Messier, Martin, 386
Misra, Ananya, 185
Miyama, Chikashi, 383
Modler, Paul, 358
Mühlhäuser, Max, 285
Myatt, Tony, 358
Nagano, Norihisa, 315
Napolitano, Pasquale, 414
Nash, Chris, 28
Nesi, Paolo, 225
Newby, Kenneth, 412
Ng, Kia, 225, 419
Oldham, Collin, 61
O'Modhrain, Sile, 117
Ortiz Perez, Miguel, 400
Paek, Joo Youn, 411
Pak, Jonathan, 405
Pakarinen, Jyri, 49
Palacio-Quintin, Cléo, 293, 402
Papetti, Stefano, 154
Paradiso, Joseph A., 193
Paschke, David, 303
Pelletier, Jean-Marc, 158
Penfield, Kedzie, 117
Penttinen, Henri, 392
Perna, Stefano, 414
Place, Timothy, 181, 425
Plumbley, Mark D., 81, 319
Pohu, Sylvain, 402, 403
Polotti, Pietro, 150, 154
Pöpel, Cornelius, 303
Poulin-Denis, Jacques, 386
Price, Robin, 311
Prinčič, Luka, 399
Puputti, Tapio, 49
Rae, Alex, 331
Räisänen, Juhani, 57
Rebelo, Pedro, 311
Reckter, Holger, 303
Rigler, Jane, 395
Robertson, Andrew, 215, 319
Rocchesso, Davide, 154
Rohs, Michael, 185
Roma, Gerard, 249
Romero, Ernesto, 385
Rootberg, Alison, 339, 391
Santram, Mohit, 416
Sartini, Alessandro, 381
Schacher, Jan C., 168
Schedel, Margaret, 339, 391
Schmeder, Andy, 175
Schutz, Florian, 303
Serafin, Stefania, 211
Settle, Zack, 13, 189
Sjöstedt Edelholm, Elisabet, 128
Sjuve, Eva, 362
Smith III, Julius O., 299
Spratt, Kyle, 356
Steiner, Hans-Christoph, 61
Stöcklmeier, Christian, 325
Stowell, Dan, 81
Suzuki, Kenji, 241, 360
Svensson, Karl, 128
Tahiroglu, Koray, 400
Takegawa, Yoshinari, 289
Talman, Jeff, 410
Tanaka, Atau, 91
Tanaka, Hiroya, 352
Tanaka, Mai, 352
Teles, Paulo Cesar, 269
Terada, Tsutomu, 289
Thiebaut, Jean-Baptiste, 215
Torre, Giuseppe, 103
Torres, Javier, 103
Tsukamoto, Masahiko, 289
Uchiyama, Toshiaki, 360
Välimäki, Vesa, 49
Valle, Andrea, 253, 257
Vanfleteren, Jan, 229, 372
Verstichel, Wouter, 229
Vinjar, Anders, 335
Vogrig, Esthel, 385
Volpe, Gualtiero, 134, 423
Wanderley, Marcelo M., 38, 423
Wang, Ge, 392
Ward, Nathan J., 237
Ward, Nicholas, 117
Warren, Chris, 354
Watson, Dave, 425
Wilde, Danielle, 197
Wilson-Bokowiec, Julie, 388, 389
Wozniewski, Mike, 13, 189
Xambó, Anna, 249
Young, Diana, 44
Zannos, Ioannis, 261
Zbyszynski, Michael, 245, 421
Zoran, Amit, 67
