Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
NIME08
Proceedings of the International Conference on New Interfaces for Musical Expression
http://nime08.casapaganini.org
http://www.casapaganini.org
http://www.infomus.org
Printed by: BETAGRAFICA scrl
ISBN-13: 978-88-901344-6-3
In collaboration with
Facoltà di Lettere e Filosofia, Università degli Studi di Genova
Conservatorio di Musica “Niccolò Paganini”
Museo d’Arte Contemporanea Villa Croce
Ufficio Paganini, Comune di Genova
Polo del Mediterraneo per l’Arte, la Musica e lo Spettacolo
Accademia Ligustica di Belle Arti
Casa della Musica
GOG − Giovine Orchestra Genovese
AIMI (Associazione di Informatica Musicale Italiana)
Centro Italiano Studi Skrjabiniani
Goethe−Institut Genua
Fondazione Bogliasco
Fondazione Spinola
Associazione Amici di Paganini
Festival della Scienza
Radio Babboleo
Partners
Sipra SpA
Sugar Srl
NIME 08 Committees
Performance Committee
Miguel Azguime
Andreas Breitscheid
Pascal Decroupet
Michael Edwards
Neil Leonard
Michelangelo Lupone
Pietro Polotti
Curtis Roads
Jøran Rudi
Rodrigo Sigal
Alvise Vidolin
Daniel Weissberg
Iannis Zannos

Installation Committee
Jamie Allen
Philippe Baudelot
Nicola Bernardini
Riccardo Dapelo
Scott deLahunta
Nicola Ferrari
Sergi Jorda
Lauro Magnani
Pedro Rebelo
Franco Sborgi
Eric Singer
Pavel Smetana
Sandra Solimano
Atau Tanaka

Demo Committee
Alain Crevoisier
Sofia Dahl
Amalia De Gotzen
Emmanuel Flety
Matija Marolt
Barbara Mazzarino
Douglas Irving Repetto
Kenji Suzuki
Andrea Valle
Giovanna Varni

Club NIME Committee
Frédéric Bevilacqua
Nicolas Boillot
Marco Canepa
Jaime Del Val
Davide Ferrari
Jean Jeltsch
Eric Lapie
Claudio Lugo
Leïla Olivesi
Kéa Ostovany
Guillaume Pellerin
Nicolas Rasamimanana
Christophe Rosenberg
Laura Santini
Olivier Villon

Organizing Committee
Corrado Canepa
Francesca Cavallero
Roberto Doati
Nicola Ferrari
Roberta Fraguglia
Donald Glowinski
Lauro Magnani
Andrea Masotti
Barbara Mazzarino
Valentina Perasso
Roberto Sagoleo
Marzia Simonetti
Francesca Sivori
Sandra Solimano

NIME Secretariat
Roberta Fraguglia
Francesca Sivori

Press
Michele Coralli

Cover design
Studiofluo
Preface
We are proud to present the 8th edition of the International Conference on New
Interfaces for Musical Expression (NIME08), hosted by Casa Paganini - InfoMus Lab,
Università degli Studi di Genova.
Since 2005, InfoMus Lab has been housed in the recently restored monumental
building of S. Maria delle Grazie La Nuova - Casa Paganini. The International Centre of
Excellence Casa Paganini – InfoMus Lab aims at cross-fertilizing scientific and
technological research with humanistic and artistic research. Our research explores the
relationships between music, science and emerging technologies: a mission that recalls
Niccolò Paganini's spirit of experimentation.
Exploring new perspectives in contemporary music, multimedia, and digital lutherie is
among the main purposes of the Centre. Casa Paganini - InfoMus Lab studies new directions in
scientific and technological research to improve quality of life (e.g., therapy and
rehabilitation, leisure, sport, edutainment), to develop novel industrial applications and
services (e.g., innovative interfaces and multimedia applications), to contribute to
culture (e.g., museography, support cultural heritage through new technologies).
In this framework, the NIME Conference is a unique occasion for Casa Paganini to
present on the one hand its research outcomes and activities to the scientific community
and on the other hand to get inspiration and feedback for future work. Further, our
efforts have been directed at involving the most important institutions and the whole
city of Genova in NIME. For example, besides the monumental site of Casa Paganini,
which hosts the welcome concert and the scientific sessions, concerts will be held at the
Music Conservatory “Niccolò Paganini”, demos at Casa della Musica, installations at the
Museum of Contemporary Art “Villa Croce” and at the Faculty of Arts and Philosophy of
the University of Genova, posters in the ancient convent of Santa Maria di Castello, club
NIME performances at four different cafés and clubs in Genova (010, Banano Tsunami,
Cafè Garibaldi, Mentelocale).
The artistic program encompasses a welcome concert, 3 NIME concerts, 4 Club NIME
performances, and 7 installations. The NIME concerts and the Club NIME performances
include 23 music pieces, selected by the program committee out of 63 submissions.
The welcome concert on the evening of June 4, offered by Casa Paganini – InfoMus Lab in
collaboration with major music institutions in Genova, will present 4 novel music pieces
by young composers using EyesWeb XMI: one of the pieces has been commissioned to
tackle some open problems on networked performance faced in the EU Culture 2007
Project CoMeDiA; another piece has been commissioned to exploit a paradigm of
"active music listening" which is part of the EU FP7 ICT Project SAME.
Four workshops will precede and follow the official NIME program on June 4 and 8: a
workshop on technology enhanced music education, a tablet workshop for performers
and teachers, one on Jamoma, and one on techniques for gesture measurement in
musical performance.
Moreover, this year the 4th Sound and Music Computing (SMC) Summer School is held
at Casa Paganini in connection with NIME08, on June 9 - 11, 2008. The program of the
school includes plenary lectures, poster sessions, and hands-on activities. The school
will address the following topics: Gesture and Music - Embodied Music Cognition,
Mobile Music Systems, and Active Music Listening.
Organizing the NIME Conference is a huge effort, which is possible only with the help
of many people. We would like to thank the members of the NIME Steering Committee
for their precious and wise suggestions, the demo and installation chair Corrado Canepa,
the performance chair Roberto Doati, the club performance chair Donald Glowinski, and
the members of our program committees who helped in the final selection of papers,
posters, demos, installations, and performances.
We wish to thank the Rector of the University of Genova Professor Gaetano Bignardi,
the Culture Councilor of Regione Liguria Fabio Morchio, and the Culture Councilor of
Provincia di Genova Giorgio Devoto, whose support has been of vital importance for the
creation and maturation of the Casa Paganini project.
We wish to thank Professor Gianni Vernazza, Head of the Faculty of Engineering,
Professor Riccardo Minciardi, Director of the DIST-University of Genova, the colleagues
Lauro Magnani and Franco Sborgi, Professors at the University of Genova; Patrizia
Conti - Director of the Music Conservatory “Niccolò Paganini”; Sandra Solimano -
Director of the Museum of Contemporary Art “Villa Croce”; Teresa Sardanelli - Head of
the Direzione Cultura e Promozione della Città of Comune di Genova and Anna Rita
Certo - Head of the Ufficio Paganini of Comune di Genova; Pietro Borgonovo - Artistic
Director of GOG - Giovine Orchestra Genovese; Enrico Bonanni and Maria Franca
Floris of the Dipartimento Ricerca, Innovazione, Istruzione, Formazione, Politiche
Giovanili, Cultura e Turismo of Regione Liguria; Roberta Canu - Director of Goethe-
Institut Genua; Vittorio Bo and Manuela Arata - Directors of Festival della Scienza;
Francesca Sivori - Vice-President of the Centro Italiano Studi Skrjabiniani; Andrea
Masotti and Edoardo Lattes - Casa della Musica; Giorgio De Martino – Artistic Director
of Fondazione Spinola; Laura Santini of Mentelocale.
Finally, we thank the whole staff of InfoMus Lab – Casa Paganini for their precious help
and the hard work in the organization of the conference.
Stefania Serafin
NIME 08 Program Chair
Table of Contents
PAPERS 1
_____________________________________________________________________
Session 4: Instruments 1
Jyri Pakarinen, Vesa Välimäki, Tapio Puputti
Slide guitar synthesizer with gestural control ............................................................ 49
Otso Lähdeoja
An Approach to Instrument Augmentation: the Electric Guitar.................................. 53
Juhani Räisänen
Sormina - a new virtual and tangible instrument ....................................................... 57
Edgar Berdahl, Hans-Christoph Steiner, Collin Oldham
Practical Hardware and Algorithms for Creating Haptic Musical Instruments ........... 61
Amit Zoran, Pattie Maes
Considering Virtual & Physical Aspects in Acoustic Guitar Design ........................... 67
Session 5: Instruments 2
Dylan Menzies
Virtual Intimacy: Phya as an Instrument .................................................................. 71
Jennifer Butler
Creating Pedagogical Etudes for Interactive Instruments ......................................... 77
Saturday, June 7, 2008
POSTERS 173
_____________________________________________________________________
Nicolas Bouillot, Mike Wozniewski, Zack Settle, Jeremy R. Cooperstock
A Mobile Wireless Augmented Guitar ....................................................................... 189
Robert Jacobs, Mark Feldmeier, Joseph A. Paradiso
A Mobile Music Environment Using a PD Compiler and Wireless Sensors .............. 193
Ross Bencina, Danielle Wilde, Somaya Langley
Gesture ≈ Sound Experiments: Process and Mappings ........................................... 197
Miha Ciglar
“3rd. Pole” - a Composition Performed via Gestural Cues ........................................ 203
Kjetil Falkenberg Hansen, Marcos Alonso
More DJ techniques on the reactable ....................................................................... 207
Smilen Dimitrov, Marcos Alonso, Stefania Serafin
Developing block-movement, physical-model based objects for the Reactable ....... 211
Jean-Baptiste Thiebaut, Samer Abdallah, Andrew Robertson
Real Time Gesture Learning and Recognition: Towards Automatic Categorization . 215
Mari Kimura
Making of VITESSIMO for Augmented Violin:
Compositional Process and Performance ................................................................ 219
Joern Loviscach
Programming a Music Synthesizer through Data Mining .......................................... 221
Kia Ng, Paolo Nesi
i-Maestro: Technology-Enhanced Learning and Teaching for Music ........................ 225
Ioannis Zannos, Jean-Pierre Hébert
Multi-Platform Development of Audiovisual and Kinetic Installations........................ 261
Greg Corness
Performer model: Towards a Framework for Interactive Performance
Based on Perceived Intention ................................................................................... 265
Paulo Cesar Teles, Aidan Boyle
Developing an “Antigenous” Art Installation
Based on A Touchless Endo-system Interface ......................................................... 269
Silvia Lanzalone
The ‘suspended clarinet’ with the ‘uncaused sound’.
Description of a renewed musical instrument ........................................................... 273
Mitsuyo Hashida, Yosuke Ito, Haruhiro Katayose
A Directable Performance Rendering System: Itopul ............................................... 277
William R. Hazlewood, Ian Knopke
Designing Ambient Musical Information Systems ..................................................... 281
Anders Vinjar
Bending Common Music with Physical Models ........................................................ 335
Margaret Schedel, Alison Rootberg, Elizabeth de Martelly
Scoring an Interactive, Multimedia Performance Work ............................................. 339
DEMOS¹ 343
_____________________________________________________________________
¹ These are the contributions accepted as demos. The demo program also includes nine further demos
associated with papers and posters.
PERFORMANCES 375
_____________________________________________________________________
Jane Rigler
Traces/Huellas (for flute and electronics) ................................................................. 395
Renaud Chabrier, Antonio Caporilli
Drawing / Dance ....................................................................................................... 396
Joshua Fried
Radio Wonderland .................................................................................................... 397
Silvia Lanzalone
Il suono incausato, improvise-action
for suspended clarinet, clarinettist and electronics (2005) ........................................ 398
Luka Dekleva, Luka Prinčič, Miha Ciglar
FeedForward Cinema ............................................................................................... 399
Greg Corcoran, Hannah Drayson, Miguel Ortiz Perez, Koray Tahiroglu
The Control Group .................................................................................................... 400
Nicolas d'Alessandro
Cent Voies ................................................................................................................ 401
Cléo Palacio-Quintin, Sylvain Pohu
Improvisation for hyper-flute, electric guitar and real-time processing ...................... 402
Nicolas d'Alessandro, Sylvain Pohu
Improvisation for Guitar/Laptop and HandSketch ..................................................... 403
Ajay Kapur
Anjuna's Digital Raga ............................................................................................... 404
Jonathan Pak
Redshift .................................................................................................................... 405
INSTALLATIONS 407
_____________________________________________________________________
Olly Farshi
Habitat ...................................................................................................................... 409
Jeff Talman
Mirror of the moon .................................................................................................... 410
Joo Youn Paek
Fold Loud.................................................................................................................. 411
Kenneth Newby, Aleksandra Dulic, Martin Gotfrit
in a thousand drops... refracted glances .................................................................. 412
Jared Lamenzo, Mohit Santram, Kuan Huan, Maia Marinelli
Soundscaper ............................................................................................................ 413
Pasquale Napolitano, Stefano Perna, Pier Giuseppe Mariconda
SoundBarrier_ .......................................................................................................... 414
Art Clay, Dennis Majoe
China Gates.............................................................................................................. 415
WORKSHOPS 417
_____________________________________________________________________
Kia Ng
4th i-Maestro Workshop on Technology-Enhanced Music Education ....................... 419
Michael Zbyszyński
Tablet Workshop for Performers and Teachers ........................................................ 421
R. Benjamin Knapp, Marcelo Wanderley, Gualtiero Volpe
Techniques for Gesture Measurement in Musical Performance ............................... 423
Alexander Refsum Jensenius, Timothy Place, Trond Lossius,
Pascal Baltazar, Dave Watson
Jamoma Workshop ................................................................................................... 425
Papers
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
creation of network instruments, generative works, the integration of musical play
within social networks and the creation of immersive environments.

2.1 Network Instruments
Amongst some of the earliest work to utilize telecommunication networks for artistic
purposes are Max Neuhaus's radio projects. Between 1966 and 1977, Neuhaus produced a
series of works, which he termed "Broadcast Works", in which the musical outcome is
dependent upon the active responses of the audience. In the earliest of these works,
Public Supply I (1966), radio listeners were asked to call in to a radio station and
produce any sounds they wanted. Neuhaus then mixed the incoming signals to produce the
musical results. Neuhaus has written of these works: "...it seems that what these works
are really about is proposing to reinstate a kind of music which we have forgotten
about and which is perhaps the original impulse for music in man: not making a musical
product to be listened to, but forming a dialogue, a dialogue without language, a sound
dialogue." [16] The intention is strikingly similar to that expressed by Tanaka: "In
classical art making, we have been mostly concerned with making objects - nodes - is
(sic.) time to think about the process of creating connections?" [22] Like much of
Tanaka's network-based projects, Neuhaus's work exists as an environment which promotes
the agency of its participants through the initiation and development of musical
dialogs. In Public Supply I, however, Neuhaus mediates those relationships through the
mixing process, reinforcing musically interesting dialogs while downplaying those of
less appeal.

In a later realization of Neuhaus's project, listeners from across the Eastern United
States were asked to call in and whistle a single pitch for as long as they were able.
The work, entitled Radio Net, was produced in cooperation with National Public Radio.
Unlike Public Supply I, in this work Neuhaus did not mix the responses live but rather
devised an automated mixing system in which the output switched between various input
signals based on the pitch of the input sounds. The input whistles were also subject to
electronic transformation as the sounds looped from one broadcast station to another.
While Radio Net's realization was perhaps of more interest to its participants than to
a passive audience, and despite the fact that some thousands of listeners participated
in the realization of the work, the result as realized in its only 1977 performance was
coherent, subtle, and at times quite beautiful [14].

To the extent that Radio Net was developed as an environment within which musical
dialogs could be formed and developed, the work does present a number of themes which
we will see taken up in various forms in most subsequent network-based music. These
include the role of the agency of others in conditioning one's own play, the degree to
which dialogs are mediated by the mechanisms of the network, the public vs. private
space of performance, the degree to which the dialogs enabled represent truly unique
ways of communicating, and the new role of the composer as a designer of a musical
environment rather than a creator of a self-contained musical work. Rather than attempt
to address the extent to which all these themes are addressed in Neuhaus's Broadcast
Works, let us for now comment on the question of agency. The network infrastructure of
Radio Net and the transformational processes employed would greatly inhibit the ability
of participants to distinguish their own musical contribution, much less be able to
engage in meaningful dialog with others. Nevertheless, through their participation,
listeners were able to build a community brought together by the exploration of the
network infrastructure. This would suggest that the goals of Radio Net were not so much
participation in dialog but rather the playful exploration of a network environment.

Like these earlier works, one of Neuhaus's most recent projects, Auracle (2004) [15],
adopts a similar network infrastructure, although in this case the network no longer
exists over radio transmissions but rather the internet. In Auracle, participants form
ensembles and collectively modify an audio stream broadcast by a server through the use
of their voice. In a similar manner to the Broadcast Works, the resultant sounds of
Auracle are affected by the proficiency of the participants but also by network
latency. Network latency, a manifestation of temporal dislocation, is often considered
a technical handicap for performers who wish to collaborate over the internet, but it
is a key aesthetic consideration in the work of many composers who exploit it in the
creation of unique musical environments. While latency is minimized in Auracle by the
system architecture employed, it nevertheless clearly distinguishes the relationships
participants form with the audio stream, and through that with other ensemble members,
from those traditional relationships that exist between performers and their
instruments.

Just as in the Broadcast Works, Neuhaus regards Auracle not as a self-contained musical
work in itself but as a collective instrument or musical architecture through which
participants develop relationships through musical dialog. As implied above, those
dialogs are necessarily mediated by the design of the instrument itself. The algorithm
used to extract control features from the sonic input is not made explicit, and the
ability of participants to shape the audio stream with any degree of nuance is quite
limited. Further, there is little direct indication as to how particular gestures
modify the audio stream. While this would seem to inhibit the ability of participants
to engage in meaningful dialog with other participants, it does reinforce the fact
that, like any instrument, Auracle has its own idiosyncrasies.

In comparison with the Broadcast Works, the use of an interface also represents an
important distinction. Existing as the window through which the environment is explored
and dialogs with ensemble members are developed, of immediate note is its simplicity.
With the screen divided into discrete sections representing the geographical location
of participants, the musical contributions of ensemble members are graphically
represented by simple lines. Basic control functions allow participants to record brief
audio samples which transform the audio stream. While the control functions are simple,
they are a necessary consequence of the work's open environment. The interface design
also enables ensemble members to more clearly distinguish their own musical
contributions from those of other members.
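Radio Net's automated mixing, as described above, amounts to a pitch-dependent switch between live inputs. A minimal sketch of that idea follows; the function names, frame structure, and band values are illustrative assumptions for exposition, not a reconstruction of Neuhaus's actual system:

```python
def select_channel(pitches, low, high):
    """Pick the input whose estimated pitch (Hz) falls inside the target
    band [low, high] and lies closest to the band's centre; return None
    if no caller is whistling within that band."""
    center = (low + high) / 2.0
    candidates = [(abs(p - center), i)
                  for i, p in enumerate(pitches) if low <= p <= high]
    return min(candidates)[1] if candidates else None

def auto_mix(frames, bands):
    """Switch the broadcast output between callers, rotating through the
    target pitch bands one frame at a time. Per-frame pitch estimates
    are assumed to come from an upstream analysis stage."""
    routed = []
    for t, pitches in enumerate(frames):
        low, high = bands[t % len(bands)]
        routed.append(select_channel(pitches, low, high))
    return routed

# Three callers whistling A4, A5 and A3; the output hops between them
# as the target band rotates.
print(auto_mix([[440.0, 880.0, 220.0]] * 3,
               [(200, 500), (600, 1000), (150, 300)]))
# → [0, 1, 2]
```

The point of the sketch is only that routing is a pure function of input pitch, so no human mixer is in the loop, which is exactly what distinguishes Radio Net from Public Supply I.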
Through immediate visual and aural feedback, participants of Graph Theory are clearly
able to discern their actions and evaluate them in the context of previous and future
decisions. They are also able to compare their choices with those of others through a
simple "popularity" index which rates the frequency with which subsequent cells are
chosen. The choices made are given a further complexity in that they contribute to a
more global index used to create a score for live performance. Participants thus
contribute to two distinct levels of performance: the private space performance that
takes place within their own immediate interaction with the network, and the public
space performance which results from the collective play of many participants.

"Malleable Mobile Music" (2004) is a good example of this more recent direction. Using
specially modified mobile communication devices equipped with physical sensors that
measure both the pressure with which the device is held and its movement in space,
participants are able to collectively remix a popular song chosen by the members of the
network [23]. Various audio transformations such as time stretching and sampling can be
applied, and rhythmic patterns and sequences can be generated from the original source
material through various built-in software modules. Just as in open form works, these
transformations can be applied in any order, and the various contributions of each
group member become an individual track in the master mix. The physical proximity of
the participants, which is determined through a GPS system, is also used to affect the
dynamic balance of the resultant mix, directly correlating social proximity with
musical presence. The results of the remixing and transformations are broadcast to all
participants. More overtly than Neuhaus's Auracle, Tanaka's instrument creates
immediate collaborative relationships and communities through the virtual environment
of the network technology employed. The "Malleable Mobile Music" project has recently
been employed in a new interactive work, Net_Dérive, for mobile, wearable communication
devices. This latter project was produced in collaboration with Petra Gemeinboeck and
received its premiere in Paris in 2006 [25].
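The GPS-driven balance described above, where social proximity maps directly onto musical presence, can be sketched as a simple distance-to-gain mapping. The inverse-distance law, reference distance, and gain floor below are illustrative assumptions, not details of Tanaka's implementation:

```python
import math

def gain_from_distance(d, d_ref=10.0, floor=0.05):
    """Nearby participants are loudest (gain 1.0 within d_ref metres);
    gain falls off with inverse distance, clamped to a floor so distant
    members never vanish from the mix entirely."""
    return max(floor, min(1.0, d_ref / max(d, 1e-9)))

def proximity_mix(positions, listener):
    """Per-track gains for one listener, treating coordinates as planar
    metres for simplicity (real GPS data would need geodesic distances)."""
    lx, ly = listener
    return [gain_from_distance(math.hypot(x - lx, y - ly))
            for x, y in positions]

# Participants 5 m, 20 m and 500 m away from the listener.
print(proximity_mix([(0.0, 5.0), (0.0, 20.0), (300.0, 400.0)], (0.0, 0.0)))
# → [1.0, 0.5, 0.05]
```

The clamping choices embody the aesthetic point in the text: presence is graded by proximity, but the collective never fully drops out of earshot.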
motion of the head-mounted displays. Consisting of sixteen channels of spatialized
sound, the sounds complement the images generated and provide easily discerned sonic
cues that help establish cooperative relationships between the participants. Ecstasis
defines clear goals for its participants and rewards their explorations with a greater
understanding of their environment. Through the environmental space that the work
presents, Ecstasis becomes a catalyst for collective individuation [12]. As its
participants decode their environment and come to a greater collective awareness, it is
clear that the disjunctions between interface and environment and between public and
private performance spaces are no longer sustainable.

Figure 5. A screenshot from Ecstasis.

3. AESTHETIC THEMES
While each network project examined posits its own aesthetic questions, they all share
a number of common concerns. These range from questions regarding the democratized
performance space which network-based work promotes, through to questions provoked by
the technology through which these works are sustained. Some of these questions include
consideration of how the spatial and temporal aesthetics of network technologies
mediate collaborative relationships [11], while others make overt the influence of
interface design in the promotion of democratized performance environments.

Given the creative role participants play in exploring their musical environments, the
role of the composer has largely been transformed into that of designer, while the
traditional role of the performer has been subsumed by that of player. To a certain
extent this situation is paralleled in traditional open-form works, in which composers
design open musical environments which serve to facilitate an awareness of process and
collective becoming. All network-based musical works posit environments within which
relationships between participants are facilitated and developed. The directives which
determine the extent to which these environments can be explored and relationships
developed differ from composer to composer and from project to project. While artists
such as Tanaka and Neuhaus encourage collaborative relationships and dialogs to be
openly explored within the boundaries of their environments, other artists such as
Metraform and Freeman adopt a less open approach and predefine particular social goals
through and for their work. In Metraform's Ecstasis, as we have seen, this took the
form of an improved environmental understanding, while the creation of a performable
work was an explicit goal of Freeman's Graph Theory. Given the responsibility assumed
of participants, the composer or designer of that environment must also assume some
responsibility for the quality of those relationships that emerge. Dobrian goes further
and states that in a collective performance it is up to the composer to develop an
environment within which compelling work can take place [9], while Tanaka has stated
that interesting results can only be achieved by developing interesting processes
[2007, pers. comm. April]. Bryan-Kinns and Healey have even shown that the effect of
decay within a collective instrument significantly affects how participants engage with
that environment [5]. As we have seen in the work of Neuhaus and Freeman, interface
design is of critical importance in conditioning the ways in which processes,
environments and relationships are able to be explored, while in Tanaka's Global
String, haptic feedback is a critical component in the development of meaningful play.
Indeed, as has become evident, democratized performance spaces can only be realized
through carefully considered interface design.

Transparent interface design also facilitates the ability of participants to surrender
to their environment rather than have to decode the means through which it is
presented. How that environment responds to their own agency is of especial importance.
As noted by Phillips and Rabinowitz,

...when the audience expects instant response, asks the piece for self-affirmation or
affirmation of a learned behavior, the effect closes down what the piece means to open
up. Collaborative art asks for something as complex as inspired surrender and must
elicit recognition, building from reflection. That moment of self-regard should then
develop into more complicated correspondences. Otherwise, the piece can veer toward
superficiality and rely on what we call a "supermarket door process of interactivity":
I walked up to it and it opened; I have power [17].

While technology has not fundamentally changed the defining characteristics of
collaborative art forms, it has certainly mediated them in distinctive ways. In some
environments, such as in Metraform's Ecstasis, this has brought about unique modes of
engagement, while in other projects network latency has produced collective instruments
the aesthetics of which are founded on immediacy and extended reflection [24]. Of
defining character, of course, are the spatial and temporal properties of the network
infrastructure or topology. While these are able to be exploited to musical effect, it
is perhaps counterintuitive that spatial disjunction and temporal dislocation can also
serve to facilitate a greater awareness of agency and collective becoming.

4. SUMMARY
The democratized performance spaces that network-based musical environments support are
a natural response to the musical and social ideals that motivated the work of an
earlier generation of composers for whom such technology did not exist. These
technologies have brought about new modes of awareness of individual agency and of the
creative relationships that can emerge with others through the playful exploration of
the architectures that sustain musical collaboration. The aesthetic features unique to
the genre emphasize the challenges of fully engaging participants in collaborative
processes and moving participants beyond the easy solution of falling back on what Cage
has referred to as superficial habits [6]. These challenges are amply rewarded,
however, by the exciting potential of network music to create unique forms of musical expression and new modes of musical agency and engagement, and in doing so to transcend the network architectures that make such dialogs and relationships possible.

5. ACKNOWLEDGMENTS
I am grateful to Jason Freeman, Lawrence Harvey and Atau Tanaka for providing further information on their work. I would also like to thank John Dack for generously providing a copy of his article on the Scambi project.

6. REFERENCES
[1] Barbosa, A. Displaced soundscapes: a survey of network systems for music and sonic art creation. Leonardo Music Journal, vol. 13, 2003, 53-59.
[2] Broeckmann, A. Reseau/Resonance - connective processes and artistic practice. Artmedia VIII, 2002, Viewed March 2007, <http://www.olats.org/projetpart/artmedia/2002eng/te_aBroeckmann.html>.
[3] Brown, E. Form in new music. Darmstädter Beiträge, vol. 10, 1965, 57-69.
[4] Brün, H. When music resists meaning: the major writings of Herbert Brün. Ed. A. Chandra, Wesleyan University Press, Middletown, CT, 2004.
[5] Bryan-Kinns, N. and Healey, P.G.T. Decay in collaborative music making. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, 2006, 114-117.
[6] Cage, J. Soundings: investigation into the nature of modern music. Neuberger.
[7] Chew, E., Kyriakakis, C., Papadopoulos, C., Sawchuk, A. A., and Zimmermann, R. From remote media immersion to distributed immersive performance. In Proceedings of the 2003 ACM SIGMM Workshop on Experiential Telepresence, Berkeley, CA, 2003, 110-120.
[8] Dack, J. "Open" forms and the computer. In Musiques, Arts, Technologies: Towards a Critical Approach. L'Harmattan, Paris, 2004, 401-412.
[9] Dobrian, C. Aesthetic considerations in the use of "virtual" music instruments. In Proceedings of the 2003 International Conference on New Interfaces for Musical Expression (NIME03), Montreal, 2003, 161-163.
[10] Haubenstock-Ramati, R. Notation - material and form. Perspectives of New Music, vol. 4, no. 1, 1965, 39-44.
[11] Leman, M. Embodied music cognition and mediation technology. MIT Press, Cambridge, MA, 2007.
[12] Massumi, B. The political economy of belonging. In Parables for the Virtual: Movement, Affect, Sensation. Duke University Press, Durham, NC, 2002.
[13] Metraform. Ecstasis. 2004, Viewed December 2006, <http://www.metraform.com>.
[14] Neuhaus, M. Radio Net. 1977. Available at <http://www.ubu.com/sound/neuhaus_radio.html>.
[15] Neuhaus, M., Freeman, J., Ramakrishnan, C., Varnick, K., Burk, P., and Birchfield, D. Auracle. 2006, Viewed June 2006, <http://auracle.org>.
[16] Neuhaus, M. The broadcast works and Audium. 2007, Viewed January 2007, <http://www.max-neuhaus.info>.
[17] Phillips, L., and Rabinowitz, P. On collaborating with an audience. Collaborative Journal, 2006, Viewed January 2006, <http://www.artcircles.org/id85.html>.
[18] Rebelo, P. Network performance: strategies and applications. Presentation at the 2006 International Conference on New Interfaces for Musical Expression (NIME06), Paris, 2006, Viewed March 2007, <http://www.sarc.qub.ac.uk/~prebelo/index>.
[19] Salen, K., and Zimmerman, E. Rules of Play: Game Design Fundamentals. MIT Press, Cambridge, MA, 2004.
[20] Souza e Silva, A. Art by telephone: from static to mobile interfaces. Leonardo Electronic Almanac, vol. 12, no. 10, 2004.
[21] Stockhausen, K. ...how time passes.... Trans. C. Cardew, Die Reihe, 3 (1959), Bryn Mawr, PA, 10-40.
[22] Tanaka, A. Seeking interaction, changing space. In Proceedings of the 6th International Art + Communication Festival 2003, Riga, Latvia, 2003, Viewed July 2006, <http://www.csl.sony.fr/~atau/>.
[23] Tanaka, A. Mobile music making. In Proceedings of the 2004 Conference on New Interfaces for Musical Expression (NIME04), Hamamatsu, 2004, 154-156.
[24] Tanaka, A. Global String. 2005, Viewed July 2006, <www.sensorband.com/atau/globalstring/globalstring.pdf>.
[25] Tanaka, A., Gemeinboeck, P., and Momeni, A. Net_Dérive, a participative artwork for mobile media. In press, 2007.
[26] Turbulence.org. Viewed January 2006, <http://www.turbulence.org>.
[27] Weinberg, G. Interconnected musical networks: toward a theoretical framework. Computer Music Journal, 29:2, 2005, 23-39.
[28] Wolff, C. Open to whom and to what. Interface, 16/3, 1987, 133-141.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

2. BACKGROUND TOPICS
2.1 Sound Objects
Community-driven creation results in a holistic process, i.e., its properties cannot be determined or explained by the sum of its components alone [7]. A community of users involved in a
creation process, through a Shared Sonic Environment, definitely constitutes a Whole in the holistic sense.

According to Jan Smuts (1870-1950), the father of Holism Theory, the concept of a Whole implies that its individual parts are flexible and adjustable. It must be possible for a part to be different in the whole from what it is outside the whole. In different wholes a part must be different in each case from what it is in its separate state.

Furthermore, the whole must itself be an active factor or influence among individual parts, otherwise it is impossible to understand how the unity of a new pattern arises from its elements. Whole and parts mutually and reciprocally influence and modify each other.

Similarly, when questioning objects' behaviors in Physics, it is often by looking for simple rules that it is possible to find the answers. Once found, these rules can often be scaled to describe and simulate the behavior of large systems in the Real World.

This notion applies to the Acoustic Domain through the definition of Sound Objects as a relevant element of the music creation process by Pierre Schaeffer in the 1960s. According to Schaeffer, a Sound Object is defined as:

"Any sound phenomenon or event perceived as a coherent whole (…) regardless of its source or meaning" (Schaeffer, P., 1966).

Sound Object (l'objet sonore) refers to an acoustical object for human perception and not a mathematical or electroacoustical object for synthesis. One can consider a sound object the smallest self-contained particle of a Soundscape [8]. Defining a universe of sound events by subsets of Sound Objects is a promising approach for content-processing and transmission of audio [9], and from a psychoacoustic and perceptual point of view it provides a very powerful paradigm to sculpt the symbolic value conveyed by a Soundscape.

In an artistic context the scope for the user's personal interpretation is wider. Therefore such Sound Objects can have a much deeper symbolic value and represent more complex metaphors. Often there is no symbolic value in a sound, but once there is a variation in one of its fundamental parameters it might then convey a symbolic value.

All these ideas about Sound Objects and the holistic nature of community music are the basis for the main concept behind the Public Sound Objects System. In fact, in PSOs the raw material provided to each user, to create his contribution to a shared musical piece, is a simple Sound Object. These Sound Objects, individually controlled, become part of a complex collective system in which several users can improvise simultaneously and concurrently.

In the system a server-side real-time sound synthesis engine (a Disklavier Piano in the case of the Casa da Musica installation) provides an interface to transform various parameters of a Sound Object, which enables users to add symbolic meaning to their performance.

2.2 About Networked Music
In his keynote speech at ICMC 2003, Roger Dannenberg mentioned "Networked Music" as one of the promising research topics, and at least four papers [2], [10] and [11] were centered on this topic, even though they were scattered over different panels instead of one distinct session.

Since then the term Networked Music has become increasingly consensual in defining the area; according to Jason Freeman's definition [12], it is about music practice situations where traditional aural and visual connections between participants are augmented, mediated or replaced by electronically-controlled connections.

In order to give a broad view of the scientific dissemination of Networked Music research, I present some of the most significant landmarks in the field over the last decade:

2.2.1 Summits and Workshops
The ANET Summit (August 20-24, 2004). The summit, organized by Stanford University's Center for Computer Research in Music and Acoustics (CCRMA) and held at the Banff Centre in Canada, was the first workshop event addressing the topic of high-quality audio over computer networks. The guest lecturers were Chris Chafe, Jeremy Cooperstock, Theresa Leonard, Bob Moses and Wieslaw Woszczyk. A new edition of the ANET Summit is planned for April 2008.

The Networked Music Workshop at ICMC (September 4, 2005). This workshop was held in Barcelona and resulted from experience in previous ICMCs, which called for the need to realize such an event. Guest lecturers were Álvaro Barbosa (Pompeu Fabra University, MTG), Scot Gresham-Lancaster (Cogswell College, Sunnyvale, CA), Jason Freeman (Georgia Institute of Technology) and Ross Bencina (Pompeu Fabra University, MTG).

2.2.2 PhD Dissertations
These are some relevant dissertations published on the topic: 2002 Golo Föllmer, "Making Music on the Net, social and aesthetic structures in participative music" [13]; 2002 Nathan Schuett, "The Effects of Latency on Ensemble Performance" [14]; 2003 Jörg Stelkens, "Network Synthesizer" [15]; 2003 Gil Weinberg, "Interconnected Musical Networks: Bringing Expression and Thoughtfulness to Collaborative Music" [16]; 2006 Álvaro Barbosa, "Displaced Soundscapes" [17].

2.2.3 Journal Articles
There are a number of survey and partial-overview articles on the topic of Networked Music [18], [19], [20], [21] and [22]; in addition, a special issue of the journal Organised Sound from 2005 [23], edited by Leigh Landy, specifically focused on the topic of Networked Music and includes many of the relevant references in this area.

3. THE PSOs INSTALLATION
Casa da Musica is the main concert venue in the city of Porto, and it has a strong activity in what concerns contemporary and experimental forms of music. The commission for the Public Sound Objects Installation had the underlying idea of bringing music to the hallways of the house of music, so that visitors could actually interact with it.
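The shared-control model described in Section 2 — each user transforming the parameters of a personal Sound Object that joins a collective whole — can be illustrated with a short sketch. This is a hypothetical illustration, not the actual PSOs implementation; all class and parameter names (SoundObject, SharedSonicEnvironment, pitch, dynamics, period) are assumptions chosen to mirror the musical facets the paper mentions.

```python
from dataclasses import dataclass

@dataclass
class SoundObject:
    """One user's raw material: a base sound plus transformable parameters."""
    sample: str            # identifier of the base sound
    pitch: float = 0.0     # transposition in semitones
    dynamics: float = 0.8  # relative amplitude, 0.0-1.0
    period: float = 1.0    # repetition period in seconds (rhythm)

class SharedSonicEnvironment:
    """Holds the Sound Objects of all connected users: each part is
    individually controlled, but the rendered result is a property of
    the whole (the 'holistic' view discussed above)."""

    def __init__(self):
        self.objects = {}  # user id -> SoundObject

    def join(self, user_id, obj):
        self.objects[user_id] = obj

    def transform(self, user_id, **params):
        # A user reshapes only their own object...
        for name, value in params.items():
            setattr(self.objects[user_id], name, value)

    def snapshot(self):
        # ...but every transformation changes the collective state
        # that a server-side synthesis engine would render.
        return {uid: vars(obj) for uid, obj in self.objects.items()}

env = SharedSonicEnvironment()
env.join("a", SoundObject(sample="pluck"))
env.join("b", SoundObject(sample="bell"))
env.transform("a", pitch=7.0, period=0.5)  # user "a" reshapes their object
```

In this reading, several users improvising concurrently amounts to many `transform` calls interleaving on one shared environment, while the synthesis engine repeatedly renders `snapshot()`.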
Fig. 3 A PSOs Client with the ETHERSOUND Hub, Speakers and Keyboard concealed in the structure.

All the computer hardware for the server and clients has been cloaked by a metal structure created in coherence with the
The basic idea consists of only transmitting a sound object through the Right Channel of the streamed Soundscape stereo mix when a ball hits the right wall, transmitting only through the Left Channel when a ball hits the left wall, and transmitting in both channels (L+R) if the ball hits the top or bottom wall.

Sound Panorama Adjustment adds an extra cue to the perception of the temporal order of triggered Sound Objects and their correlation to ball impacts.

4. CONCLUSIONS AND FUTURE WORK
The PSOs Installation at Casa da Musica allows a piano to be controlled by 10 instances simultaneously (Ten Hands!) in a coherent and constructive manner, which would hardly be possible in a traditional way.

Even though the interface is radically different from the normal control paradigm of a piano, it is based on the same fundamental musical facets (Rhythm, Pitch, Timbre and Dynamics), and therefore it is an engaging experience, since users recognize a familiar result achieved through a totally different means.

The interface is simple enough to achieve a musical soundscape with zero learning time and without any previous musical practice experience, which made the system very accessible and popular among the average of 500 daily visitors to Casa da Musica.

Controlling a popular acoustic instrument brings users closer to the musical experience, and in this sense we would like to develop this system further by adding a pool of instruments to the piano, such as wind, string and percussion instruments controlled by robotics.

5. ACKNOWLEDGMENTS
The author would like to thank the people who collaborated in this project: Jorge Cardoso (UCP), Jorge Abade (UCP) and Paulo Maria Rodrigues (Casa da Musica).

6. REFERENCES
[1] Barbosa, A. and Kaltenbrunner, M. Public Sound Objects: a shared musical space on the web. In Proceedings of the International Conference on Web Delivering of Music (WEDELMUSIC 2002), Darmstadt, Germany. IEEE Computer Society Press, 2002.
[2] Barbosa, A., Kaltenbrunner, M. and Geiger, G. Interface decoupled applications for geographically displaced collaboration in music. In Proceedings of the International Computer Music Conference (ICMC2003), 2003.
[3] Barbosa, A., Cardoso, J. and Geiger, G. Network latency adaptive tempo in the Public Sound Objects system. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 2005), Vancouver, Canada, 2005.
[4] Puckette, M. Pure Data. In Proceedings of the International Computer Music Conference (ICMC96), San Francisco. International Computer Music Association, 1996, 269-272.
[5] Yamaha Disklavier Piano: http://www.yamaha.co.jp/english/product/piano/product/europe/dl/dl.html (Consulted 2008/01/30)
[6] ETHERSOUND: http://www.ethersound.com/ (Consulted 2008/01/30)
[7] Smuts, J. Holism and Evolution. Macmillan, London, UK, 1926.
[8] Schaeffer, P. Traité des Objets Musicaux. Le Seuil, Paris, 1966.
[9] Amatriain, X. and Herrera, P. Transmitting audio content as sound objects. In Proceedings of the AES22 International Conference on Virtual, Synthetic and Entertainment Audio, 2002.
[10] Stelkens, J. peerSynth: a P2P multi-user software with new techniques for integrating latency in real-time collaboration. In Proceedings of the International Computer Music Conference (ICMC2003), 2003.
[11] Obu, Y., Kato, T. and Yonekura, T. M.A.S.: a protocol for a musical session in a sound field where synchronization between musical notes is not guaranteed. In Proceedings of the International Computer Music Conference (ICMC2003), Singapore. International Computer Music Association, 2003.
[12] Freeman, J. The Networked Music Workshop at ICMC 2005, Barcelona (September 4, 2005).
[13] Föllmer, G. Making Music on the Net, Social and Aesthetic Structures in Participative Music. Ph.D. Thesis, Martin Luther Universität Halle-Wittenberg, Germany, 2002.
[14] Schuett, N. The Effects of Latency on Ensemble Performance. Ph.D. Thesis, Stanford University, California, USA, 2002.
[15] Stelkens, J. Network Synthesizer. Ph.D. Thesis, Ludwig Maximilians Universität, München, Germany, 2003.
[16] Weinberg, G. Interconnected Musical Networks: Bringing Expression and Thoughtfulness to Collaborative Music Making. Ph.D. Thesis, Massachusetts Institute of Technology, USA, 2003.
[17] Barbosa, A. Displaced Soundscapes: Computer Supported Cooperative Work for Music Applications. Ph.D. Thesis, Pompeu Fabra University, Barcelona, Spain, 2006.
[18] Jordà, S. Faust Music On Line (FMOL): an approach to real-time collective composition on the Internet. Leonardo Music Journal, vol. 9, 1999, 5-12.
[19] Tanzi, D. Observations about music and decentralized environments. Leonardo Music Journal, vol. 34, no. 5, 2001, 431-436.
[20] Barbosa, A. Displaced soundscapes: a survey of network systems for music and sonic art creation. Leonardo Music Journal, vol. 13, 2003, 53-59.
[21] Weinberg, G. Interconnected musical networks: toward a theoretical framework. Computer Music Journal, vol. 29, no. 2, 2005, 23-39.
[22] Traub, P. Sounding the net: recent sonic works for the Internet and computer networks. Contemporary Music Review, vol. 24, no. 6, December 2005, 459-481.
[23] Landy, L. (ed.) Organised Sound, vol. 10, no. 3. Cambridge University Press, UK, 2005 (ISSN 1355-7718).
[24] Wright, M. and Freed, A. Open Sound Control: a new protocol for communicating with sound synthesizers. In Proceedings of the International Computer Music Conference, 1997.
[25] Nella, M. J. Constraint satisfaction and debugging for interactive user interfaces. Ph.D. Thesis, University of Washington, Seattle, WA, 1994.
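For reference, the wall-to-channel rule described in Section 3 above (right wall selects the Right channel, left wall the Left channel, top or bottom wall both) reduces to a small mapping. The sketch below is an illustration of that rule, not the installation's actual code; the function name and wall labels are assumptions.

```python
def wall_to_channels(wall):
    """Map a ball impact to the stereo channels that carry the triggered
    Sound Object: right wall -> R only, left wall -> L only,
    top or bottom wall -> both channels (L+R)."""
    mapping = {
        "right": ("R",),
        "left": ("L",),
        "top": ("L", "R"),
        "bottom": ("L", "R"),
    }
    if wall not in mapping:
        raise ValueError(f"unknown wall: {wall}")
    return mapping[wall]
```

Panning each triggered Sound Object this way gives listeners the extra temporal-order cue discussed in the paper, correlating what they hear with the ball impacts they see.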
Keywords
sonic navigation, mobile music, spatial interaction, wireless
audio streaming, locative media, collaborative interfaces
towards a virtual microphone that captures and busses the sound to a loudspeaker where it is heard. The result is a technique of spatial signal bussing, which lends itself particularly well to many common mixing operations. Gain control, for instance, is accomplished by adjusting the distance between two nodes in the scene, while filter parameters can be controlled by changing orientations.

The paradigm of organizing sound processing in three-dimensional space has been explored in some of our previous publications [27, 28, 26]. We have seen that users easily understand how to interact with these scenes, especially when actions are related to everyday activity. For instance, it is instantly understood that increasing the distance between two virtual sound elements will decrease the intensity of the transmitted signal, and that pointing a sound source in a particular direction will result in a stronger signal at the target location. We have designed and prototyped several applications using these types of interaction techniques, including 3-D mixing, active listening, and using virtual effects racks [27, 29]. Furthermore, we began to share virtual scenes between multiple participants, each with subjective audio rendering and steerable audio input, allowing for the creation of virtual performance venues and support for virtual reality video conferencing [31].

While performers appreciated the functionality of these earlier systems, they were nevertheless hampered by constraints on physical mobility. These applications operated mainly with game-like techniques, where users stood in front of screens and navigated through the scene using controllers such as joysticks or gamepads. The fact that the gestures for moving and steering sound were abstracted through these intermediate devices resulted in a lack of immersive feeling and made the interfaces more complicated to learn.

We thus decided to incorporate more physical movement, for example, sensing the user's head movement with an orientation sensor attached to headphones, and applying this to affect changes to the apparent spatial audio rendering. To further extend this degree of physical involvement we began to add real-world location awareness to the system, allowing users to move around the space physically instead of virtually. For example, our 4Dmix3 installation [4] tracked up to six users in an 80m² gallery space. The motion of each user controlled the position of a recording buffer, which could travel among a number of virtual sound generators in the scene. The result was a type of remixing application, where users controlled the mix by moving through space.

In the remainder of this paper, we explore the use of larger-scale position tracking, such as that of a Global Positioning System (GPS), and the resulting challenges and opportunities that such technology presents. We evolve our framework to support a more distributed and mobile-capable architecture, which results in the need for wireless audio streaming and the distribution of information about the mobile participants. Sections 2 and 3 describe the additional technical elements that need to be introduced into the system to support wireless and mobile applications, while Section 4 demonstrates a prototypical musical application using this new architecture. Musicians in the Mobile Audioscape are able to navigate through an outdoor environment containing a superimposed set of virtual audio elements. Real physical gestures can be used to steer and move sound through the space, providing an easily understood paradigm of interaction in what can now be thought of as a mobile music venue.

1.2 Mobile Music Venues
By freeing users from the confines of computer terminals and interfaces that severely limit mobility, application spaces emerge that can operate in a potentially unbounded physical space. These offer many novel possibilities that can lead to new artistic approaches; or they can re-contextualize existing concepts that can then be revisited and expanded upon. An excellent example is parade music, where sound emission is spatially dynamic or mobile; a passive listener remains in one place while different music is coming and going. One hundred years ago, Charles Ives integrated this concept into symphonic works, where different musical material flowed through the score, extending our notions of counterpoint to include those based on proximity of musical material. The example of parade music listening expands to include two other cases: a mobile listener can walk with or against the parade, yielding additional relationships to the music. Our work also integrates the concept of active listening; material may be organized topographically in space, produced by mobile performers and encountered non-linearly by mobile listeners. From this approach come several rich musical forms which, like sculpture, integrate point of view; listeners/observers create their own unique rendering. Thus, artists may create works that explore the spatial dynamics of musical experience, where flowing music content is put in counterpoint by navigation. Musical scores begin to resemble maps, and listeners play a larger role in authoring their experiences.

1.3 Related Work
With respect to collaborative musical interfaces, Blaine and Fels provide an overview of many systems, classifying them according to attributes such as scale, type of media, amount of directed interaction, learning curve, and level of physicality, among others [7]. However, most of these systems rely on users being in a relatively fixed location in front of a computer. The move to augmented- or mixed-reality spaces seems like a natural evolution, offering users a greater level of immersion in the collaboration, and their respective locations can be used for additional control.

In terms of locative media, some projects have considered the task of tagging geographical locations with sound. The [murmur] project [2] is one simple example, where users tag interesting locations with phone numbers. Others can call the numbers using their mobile phones and listen to audio recordings related to the locations. Similarly, the Hear&There project [20] allows recording audio at a given GPS coordinate, while providing a spatial rendering of other recordings as users walk around. Unfortunately, this is limited to a single-person experience, where the state of the augmented reality scene is only available on one computer. Tanaka proposed an ad-hoc (peer-to-peer) wireless networking strategy to allow multiple musicians to share sound simultaneously using hand-held computers [22]. Later work by Tanaka and Gemeinboeck [23] capitalized on location-based services available on 3G cellular networks to acquire coarse locations of mobile devices. They proposed the creation of locative media instruments, where geographic localization is used as a musical interface.

Large areas can also be used for musical interaction in other ways. Sonic City [16] proposed mobility, rather than location alone, for interaction. As a user walks around a city, urban sounds are processed in real time as a result of readings from devices such as accelerometers, light sensors, temperature sensors, and metal detectors. Similarly, the Sound Mapping [19] project included gyroscopes along with GPS sensors in a suitcase that users could push around a small area. Both position changes and subtle movements could be used to manipulate the sound that was transmitted between multiple cases in the area via radio signal.

Orientation or heading can also provide useful feedback,
since spatial sound conveys a great deal of information about the directions of objects and the acoustics of an environment. Projects including GpsTunes [21] and Melodious Walkabout [15] use this type of information to provide audio cues that guide individuals in specific directions.

We take inspiration from the projects mentioned above, and incorporate many of these ideas into our work. However, real-time high-fidelity audio support for multiple individuals has not been well addressed. Tanaka's work [22], as well as some of our past experiences [8], demonstrates how we can deal with the latencies associated with distributed audio performance, but minimizing latency remains a major focus of our work. The ability to create virtual audio scenes will be supported with some additions to our existing Audioscape engine. To address the need for distributed mobile interaction, we are adding large-scale location sensing and the ability to distribute state, signals, and computation among mobile clients effectively. These challenges are addressed in the following sections.

2. LOCATIVE TECHNOLOGY
In order to support interaction in large-scale spaces, we require methods of tracking users and communicating between them. A variety of mobile devices are available for this purpose, potentially equipped with powerful processors, wireless transmission, and sensing technologies. For our initial prototypes, we chose to develop on Gumstix (verdex XM4-bt) processors with expansion boards for audio I/O, GPS, storage, and WiFi communication [17]. These devices have the benefit of being full-function miniature computers (FFMC) with a large development community, and as a result, most libraries and drivers can be supported easily.

2.1 Wireless Standards
Given that the most generally available wireless technologies on mobile devices are Bluetooth and WiFi, we consider the benefits and drawbacks of each of these standards. For transmission of data between sensors located on the body and the main processing device, Bluetooth is a viable solution. However, even with Bluetooth 2.0, a practical transfer rate is typically limited to approximately 2.1 Mbps. If we want to send or receive audio (16-bit samples at 44kHz), approximately 700 kbps of bandwidth is needed for each stream. In theory, this allows for interaction between up to three individuals, where each user sends one stream and receives two. Given the need to support a greater number of participants, we are forced to use WiFi.² Furthermore, the range of Bluetooth is limiting, whereas WiFi can relay signals through access points. We can also make use of higher-level protocols such as the Optimized Link State Routing protocol (OLSR) [18], which computes optimal forwarding paths for ad-hoc nodes. This is a viable way to reconfigure wireless networks if individuals are moving.

2.2 GPS
GPS has seen widespread integration into a variety of commodity hardware such as cell phones and PDAs. These provide position tracking in outdoor environments, typically associated with the 3-D geospatial coordinates of users. However, accuracy in consumer-grade devices is quite poor, ranging between approximately 5m in the best case (high-quality receiver with open skies) [25] to 100 metres or more [6]. Several methods exist to reduce error; for example, differential GPS (DGPS) uses carefully calibrated base stations that transmit error corrections over radio frequencies. The idea is that mobile GPS units in the area will have similar positional drift, and correcting this can yield accuracies of under 1m. Another technique, known as assisted GPS (AGPS), takes advantage of common wireless networks (cellular, Bluetooth, WiFi) in urban environments to access reference stations with a clear view of the sky (e.g., on the roofs of buildings). Although accuracy is still in the order of 15m, the interesting benefit of this system is that localization can be attained indoors (with an accuracy of approximately 50m) [6].

2.3 Orientation & Localization
While GPS devices provide location information, it is also important to capture a listener's head orientation so that spatial cues can be provided, the resulting sound appearing to propagate from a particular direction. Most automotive GPS receivers report heading information by tracking the vehicle trajectory over time. This is a viable strategy for inferring the orientation of a vehicle, but a listener's head can change orientation independently of body motion. Moreover, the types of applications we are targeting will likely involve periods of time where a user does not change position, but stays in one place and orients his or her body in various directions. Therefore, additional orientation sensing seems to be a requirement.

In human psychoacoustic perception, accuracy and responsiveness of orientation information are important, since a listener's ability to localize sound is highly dependent on changes in phase, amplitude, and spectral content with respect to head motion. Responsiveness, in particular, is a significant challenge, considering the wireless nature of the system. Listeners will be moving their heads continuously to help localize sounds, and a delay of more than 70ms in spatial cues can hinder this process [10]. Furthermore, it has been demonstrated that head-tracker latency is most noticeable in augmented reality applications, as a listener can compare virtual sounds to reference sounds in the real environment. In these cases, latencies as low as 25ms can be detected, and begin to impair performance in localization tasks at slightly greater values [11]. It is therefore suggested that latency be maintained below 30ms.

To track head orientation, we attach an inertial measurement unit (IMU) to the headphones of each participant, capable of sensing instantaneous 3-D orientation with an error of less than 1 degree. It should be mentioned that not all applications will require this degree of precision, and some deployments could potentially make use of trajectory-based orientation information. For instance, the Melodious Walkabout [15] uses aggregated GPS data to determine the direction of travel, and provides auditory cues to guide individuals in specific directions. Users hear music to their left if they are meant to take a left turn, whereas a low-pass filtered version of their audio is heard if they are traveling in the wrong direction. We can conceive of other types of applications, where instantaneous head orientation is not needed, and users could adjust to the paradigm of hearing audio spatialization according to trajectory rather than line of sight. Of particular interest are high-velocity applications such as skiing or cycling, where users are generally looking forward, in the direction of travel. Such constraints can help with predictions of possible orientations, while the faster speed helps to overcome the coarse resolution of current GPS technology.

² We note viable alternatives on the horizon, such as the newly announced SOUNDabout Lossless codec, which allows even smaller audio streams to be sent over Bluetooth.

3. WIRELESS AUDIO STREAMING
The move to mobile technology presents significant design challenges in the domain of audio transmission, largely
related to scalability and the effects of latency on user experience. More precisely, a certain level of quality needs to be maintained to ensure that mobile performers and audience members experience audio fidelity comparable to that of traditional venues. The design of effective solutions should take into account that WiFi networks provide variable performance depending on the environment, and that small, lightweight mobile devices are, at present, limited in their computational capabilities.

3.1 Scalability
Reliance on unicast communication between users in a group leads to a potential n² growth of audio interactions between them and, in turn, to a bandwidth explosion. We have investigated a number of solutions to this problem.
Multicast technology, for instance, allows devices to send UDP packets to an IP multicast address that virtualizes a group of receivers. Interested clients can subscribe to the streams of relevance, drastically reducing the overall required bandwidth. However, IP multicast over IEEE 802.11 wireless LAN is known to exhibit unacceptable performance [14], since collision avoidance and acknowledgement are not supported at the MAC layer. Our benchmark tests confirm that multicast transmission experiences higher jitter than unicast, mandating a larger receiver buffer to maintain quality. Furthermore, packet loss for the multicast tests was in the order of 10-15%, resulting in a distorted audio stream, while unicast had almost negligible losses of 0.3%. Based on these results, we decided to rely for now on point-to-point streaming while experimenting with emerging non-standard multicast protocols, in anticipation of future improvements.

3.2 Low Latency Streaming
Mobile applications tend to rely on compression algorithms to respect bandwidth constraints. As a result, they often incur signal delays that challenge musical interaction and performer synchronization. Acceptable latency tolerance depends on the style of music, with figures as low as 10ms [12] for fast pieces. More typically, musicians have difficulty synchronizing with latencies above 50ms [13]. Most audio codecs require more than this amount of encoding time.³ Due in part to the limited computational resources available on our mobile devices, we instead transmit uncompressed audio, thus fully avoiding codec delays in the system.

³Possible exceptions are the Fraunhofer Ultra-Low Delay Codec (offering a 6ms algorithmic delay) [24] and the SOUNDabout Lossless codec (claiming under 10ms).

Other sources of latency include packetization delay, corresponding to the time required to fill a packet with data samples for transmission, and network delay, which varies according to network load and results in jitter at the receiver. Soundcard latencies also play a role, but we consider these to be outside of our control. The most effective method for managing the remaining delays may be to minimize the size of transmitted packets. By sending a smaller number of audio samples in each network packet, we also decrease the time that we must wait for those samples to arrive from the soundcard.
In this context, we have developed a dynamically reconfigurable transmission protocol for low-latency, high-fidelity audio streaming. Our protocol, nstream, supports dynamic adjustment of sender throughput and receiver buffer size. This is accomplished by switching between different levels of PCM quantization (8, 16 and 32 bit), packet size, and receiver buffer size. The protocol is developed as an external for Pure Data [3], and can be deployed on both a central server and a mobile device.
In benchmark tests, we have successfully transmitted uncompressed streams with an outgoing packet size of 64 samples. The receiver buffer holds two packets in the queue before decoding, meaning that a delay of three packets is encountered before the result can be heard. With a sampling rate of 44.1kHz, this translates to a packetization and receiving latency of 3 × (64/44.1) = 4.35ms. In addition, the network delay can be as low as 2ms, provided that the users are relatively close to each other, and typically does not exceed 10ms for standard wireless applications. The sum of these latencies is in the order of 7-15ms.
Practical performance will, of course, depend on the wireless network being used and the number of streams transmitted. Our experiments show that a high packet rate results in network instability and high jitter. In such situations it is necessary to increase the packet size to help maintain an acceptable packet rate. This motivates us, as future work, to investigate algorithms for autonomous adaptation of low-latency protocols that address both quality and scalability.

4. MOBILE AUDIOSCAPE
Our initial prototyping devices, Gumstix, were chosen to provide: 1) wireless networking for bidirectional high-quality, low-latency audio and data streams, 2) local audio processing, 3) on-board hosting for navigation devices and other types of USB or Bluetooth sensors, 4) minimal size/weight, and 5) Linux support. A more detailed explanation of our hardware infrastructure, in particular the method of Bluetooth communication between Gumstix and sensors, can be found in another publication [9].
To develop on these devices, a cross-compilation toolchain was needed that could produce binaries for the ARM-based 400MHz Gumstix processors (Marvell's PXA270). The first library that we needed to build was a version of Pure Data (Pd), which is used extensively for audio processing and control signal management by our Audioscape engine. In particular, we used Pure Data anywhere (PDa), a special fixed-point version of Pd for the processors typically found on mobile devices [5]. Several externals needed to be built for PDa, including a customized version of the Open Sound Control (OSC) objects, to which multicast support was added, and the nstream object mentioned in Section 3.2. The latter was also specially designed to support both regular Pd and PDa, using sample conversion for interoperability between an Apple laptop, a PC and the Gumstix units.
We also supplied each user with an HP iPAQ, loaded with a customized application that could graphically represent their location on a map. This program was authored with the HP Mediascape software [1], which supports the playback of audio, video, and even Flash based on user position. The most useful aspect of this software was that we could use Flash XML Sockets to receive the GPS locations of other participants and update the display accordingly. Although we used the Compact Flash GPS receivers with the iPAQs for sending GPS data, the interface between the Mediascape software and the Flash program running within it only allowed for updates at 2Hz, corresponding to a latency of at least 500ms before position-based audio changes were heard. The use of the GPSstix receiver, directly attached to the Gumstix processor, is highly recommended to anyone attempting to reproduce this work.
The resulting architecture is illustrated in Figure 2. Input audio streams are sent as mono signals to an Audioscape
Figure 2: Mobile Audioscape architecture. Solid lines indicate audio streaming while dotted lines show transmission of control signals.

server on a nearby laptop. The server also receives all control data via OSC from the iPAQ devices and stores location information for each user. A spatialized rendering is computed, and stereo audio signals are sent back to the users. For all streams, we send audio at a sampling rate of 44.1 kHz with 16-bit samples.
In terms of network topology, wireless ad-hoc connections are used, allowing users to venture far away from buildings with access points (provided that the laptop server is moved as well). Due to the number of streams being transmitted, audio is sent with 256 samples per packet, which ensures an acceptable packet rate and reduces jitter on the network. The result is a latency of 3 × (256/44.1) = 17.4ms for packetization and a minimal network delay of about 2ms. However, since audio is sent to a central server for processing before being heard, these delays are actually encountered twice, for a total latency of approximately 40ms. This is well within the acceptable limit for typical musical performance, and was not noticed by users of the system.
The artistic application we designed allows users to navigate through an overlaid virtual audio scene. Various sound loops exist at fixed locations, where users may congregate and jam with the accompanying material. Several virtual volumetric regions are also located in the environment, allowing some users to escape within a sonically isolated area of the scene. Furthermore, each of these enclosed regions serves as a resonator, providing musical audio processing (e.g., delay, harmonization or reverb) to signals played within. As soon as players enter such a space, their sounds are modified, and a new musical experience is encountered. Figure 3 shows two such performers, who have chosen to jam in a harmonized echo chamber. They are equipped with Gumstix and iPAQs, both carried unobtrusively in their pockets.

5. DISCUSSION
Approaching mobile music applications from the perspective of virtual overlaid environments allows novel paradigms of artistic practice to be realized. The virtualization of performer and audience movement allows for interaction with sound and audio processing in a spatial fashion, leading to new types of interfaces and thus new musical experiences.
We have presented the challenges associated with supporting multiple participants in such a system, including the need for accurate sensing technologies and network architectures that can support low-latency communication in a scalable fashion. The prototype application that we developed was well received by those who experimented with it, but many improvements still need to be made. The coarseness of resolution available in consumer-grade GPS technology is such that an application must span a wide area for it to be of any value. This is problematic, since the range of a WiFi network is much smaller, mandating redirection of signals through additional access points or OLSR peers. If all signals must first travel to a server for processing, then distant nodes will suffer from very large latency.
One solution is to distribute the state of the virtual scene to all client machines, and perform rendering locally on the mobile devices. For the prototype application that we developed, this would cut latency in half, since audio signals would only need to travel from one device to another, without the need to return from a central processing server. Furthermore, this strategy would allow users to be completely free in terms of mobility, rather than needing to remain within range of the server for basic functionality. However, for scenes of any moderate complexity, this demands much more processing power and memory than is currently available in consumer devices, and of course, the number of users will still be limited by the network bandwidth required for peer-to-peer streaming.
A full investigation into distributing audio streams, state, and computational load will be presented in future work, but for the moment we have provided a first step into the exploration of large-scale mobile audio environments. The multi-user nature of the system, coupled with high-fidelity audio distribution, provides a new domain for musical practice. We have already designed outdoor spaces for sonic investigation, and hope to perform and create novel musical interfaces in this new mobile context.

6. ACKNOWLEDGEMENTS
The authors wish to acknowledge the generous support of NSERC and the Canada Council for the Arts, which have funded the research and artistic development described in this paper through their New Media Initiative. The prototype application described in Section 4 was produced in co-production with The Banff New Media Institute (Alberta, Canada). The authors would like to thank the participants
of the locative media residency for facilitating the work and, in particular, Duncan Speakman, who provided valuable assistance with the HP Mediascape software.

7. REFERENCES
[1] HP Mediascape website. www.mscapers.com.
[2] The [murmur] project. murmurtoronto.ca.
[3] Pure Data. www.puredata.info.
[4] Webpage: 4Dmix3. audioscape.org/twiki/bin/view/Audioscape/SAT4Dmix3.
[5] G. Geiger. PDa: Real time signal processing and sound generation on handheld devices. In International Computer Music Conference (ICMC), 2003.
[6] R. Bajaj, S. L. Ranaweera, and D. P. Agrawal. GPS: Location-tracking technology. Computer, 35(4):92–94, 2002.
[7] T. Blaine and S. Fels. Contexts of collaborative musical experiences. In Proceedings of the Conference on New Interfaces for Musical Expression (NIME), pages 129–134, Montreal, 2003.
[8] N. Bouillot. nJam user experiments: Enabling remote musical interaction from milliseconds to seconds. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), pages 142–147, New York, NY, USA, 2007. ACM.
[9] N. Bouillot, M. Wozniewski, Z. Settel, and J. R. Cooperstock. A mobile wireless platform for augmented instruments. In International Conference on New Interfaces for Musical Expression, Genova, Italy, 2008.
[10] D. Brungart, B. Simpson, R. McKinley, A. Kordik, R. Dallman, and D. Ovenshire. The interaction between head-tracker latency, source duration, and response time in the localization of virtual sounds. In Proceedings of the International Conference on Auditory Display (ICAD), 2004.
[11] D. S. Brungart and A. J. Kordik. The detectability of headtracker latency in virtual audio displays. In Proceedings of the International Conference on Auditory Display (ICAD), pages 37–42, 2005.
[12] E. Chew, A. A. Sawchuk, R. Zimmerman, V. Stoyanova, I. Tosheff, C. Kyriakakis, C. Papadopoulos, A. R. J. François, and A. Volk. Distributed immersive performance. In Proceedings of the Annual National Association of the Schools of Music (NASM), San Diego, CA, 2004.
[13] E. Chew, R. Zimmermann, A. A. Sawchuk, C. Papadopoulos, C. Kyriakakis, C. Tanoue, D. Desai, M. Pawar, R. Sinha, and W. Meyer. A second report on the user experiments in the distributed immersive performance project. In Proceedings of the 5th Open Workshop of MUSICNETWORK: Integration of Music in Multimedia Applications, 2005.
[14] D. Dujovne and T. Turletti. Multicast in 802.11 WLANs: an experimental study. In MSWiM '06: Proceedings of the 9th ACM International Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 130–138, New York, NY, USA, 2006. ACM.
[15] R. Etter. Implicit navigation with contextualized personal audio contents. In Adjunct Proceedings of the Third International Conference on Pervasive Computing, pages 43–49, 2005.
[16] L. Gaye, R. Mazé, and L. E. Holmquist. Sonic City: the urban environment as a musical interface. In Proceedings of the Conference on New Interfaces for Musical Expression (NIME), pages 109–115, Singapore, 2003.
[17] Gumstix. www.gumstix.com.
[18] T. Clausen and P. Jacquet (eds.). RFC 3626, Optimized Link State Routing Protocol (OLSR), 2003.
[19] I. Mott and J. Sosnin. Sound mapping: an assertion of place. In Proceedings of Interface, 1997.
[20] J. Rozier, K. Karahalios, and J. Donath. Hear & There: An augmented reality system of linked audio. In Proceedings of the International Conference on Auditory Display (ICAD), 2000.
[21] S. Strachan, P. Eslambolchilar, R. Murray-Smith, S. Hughes, and S. O'Modhrain. GpsTunes: Controlling navigation via audio feedback. In International Conference on Human Computer Interaction with Mobile Devices & Services (MobileHCI), pages 275–278, New York, 2005. ACM.
[22] A. Tanaka. Mobile music making. In Proceedings of New Interfaces for Musical Expression (NIME), 2004.
[23] A. Tanaka and P. Gemeinboeck. A framework for spatial interaction in locative media. In Proceedings of New Interfaces for Musical Expression (NIME), pages 26–30, Paris, France, 2006. IRCAM.
[24] S. Wabnik, G. Schuller, J. Hirschfeld, and U. Krämer. Reduced bit rate ultra low delay audio coding. In Proceedings of the 120th AES Convention, May 2006.
[25] M. Wing, A. Eklund, and L. Kellogg. Consumer-grade global positioning system (GPS) accuracy and reliability. Journal of Forestry, 103(4):169–173, 2005.
[26] M. Wozniewski. A framework for interactive three-dimensional sound and spatial audio processing in a virtual environment. Master's thesis, McGill University, 2006.
[27] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A framework for immersive spatial audio performance. In New Interfaces for Musical Expression (NIME), Paris, pages 144–149, 2006.
[28] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A paradigm for physical interaction with sound in 3-D audio space. In Proceedings of the International Computer Music Conference (ICMC), 2006.
[29] M. Wozniewski, Z. Settel, and J. R. Cooperstock. A spatial interface for audio and music production. In Digital Audio Effects (DAFx), 2006.
[30] M. Wozniewski, Z. Settel, and J. R. Cooperstock. Audioscape: A Pure Data library for management of virtual environments and spatial audio. In Pure Data Convention, Montreal, 2007.
[31] M. Wozniewski, Z. Settel, and J. R. Cooperstock. User-specific audio rendering and steerable sound for distributed virtual environments. In International Conference on Auditory Display (ICAD), 2007.
2.1.3 Address Ranges
The namespace feature of OSC is extremely powerful in that it enables a significantly larger namespace and the ability to address a range of points in a single message. For example, the OSC message '/minicv/left* 127' would set the value '127' on everything from '/minicv/leftThumb' right through to '/minicv/leftPinkie'.

2.2 OSC Data Types
One of the brilliant features of OSC is the ability to define different data types that can be transmitted in a message. Although it is possible to send any data type of any resolution using MIDI system exclusive messages, OSC has provided a standard for software and hardware developers from different vendors.

[4]. The speed comparison between OSC and MIDI is always made at MIDI's 31.25 kilobit-per-second data layer, and so Wright and Freed state that MIDI is "roughly 300 times slower" [18] than OSC. Speed, by definition, is a function of time, in the same way that weight is not just a function of mass but also a function of gravity. Comparing the speed of MIDI with that of OSC is akin to comparing the weight of a 2kg ball on Earth with a 600kg ball in outer space, where the gravity is zero. A more accurate speed comparison between OSC and MIDI would compare the two protocols at identical layers of the OSI stack, measuring the time taken for the target data to be encoded and then decoded on identical layers using identical processors. If one were to compare the number of machine instructions required to parse a typical MIDI message with that required for a typical OSC message, MIDI would win hands down.
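The range addressing of Section 2.1.3 can be approximated in a few lines of Python. This sketch is illustrative only: the mapped-point names follow the hypothetical '/minicv' example, and glob-style matching from the standard fnmatch module only approximates OSC's pattern syntax. It also makes visible the per-message cost discussed in the processing-efficiency argument: every stored address must be string-matched against the pattern.

```python
from fnmatch import fnmatchcase

# Hypothetical mapped points on a receiver, following the '/minicv'
# example from Section 2.1.3; all values start at zero.
points = {
    "/minicv/leftThumb": 0,
    "/minicv/leftIndex": 0,
    "/minicv/leftPinkie": 0,
    "/minicv/rightThumb": 0,
}

def set_range(pattern, value):
    """Apply one wildcard message to every matching mapped point.

    Every stored address is string-matched against the pattern --
    this is the per-message parsing cost the paper argues about.
    """
    matched = [addr for addr in points if fnmatchcase(addr, pattern)]
    for addr in matched:
        points[addr] = value
    return matched

# A single message sets all left-hand points, as '/minicv/left* 127' would.
hit = set_range("/minicv/left*", 127)
```

Note that the receiver must test the pattern against every address it holds, whether or not the message is relevant to it, which is exactly the overhead a numeric range test would avoid.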
is the address, which defines the mapped point being referenced. As stated previously in this paper, the advantages given by the addressing scheme are the intuitive names, the increased namespace, and the ability to address a range of points; these will be discussed later in the paper.

3.2.2 Processing Bandwidth
Although the transmission rate is taken into consideration—hence Wright and Freed's assumption "that Open SoundControl [sic] will be transmitted on a system with a bandwidth in the 10+ megabit/sec range" [18]—many seem to forget that after transmission and reception, the packet also needs to be parsed by the target synthesiser. Furthermore, it is not just the target synthesiser that needs to parse the data: all synthesisers that are not the intended recipients are required to stop what they are doing and parse a significant number of bytes before rejecting the message. This in turn raises the minimum processing requirements of each and every component in the entire system. Although many microcontrollers are being developed with higher processing speeds, the "increase in the processor speed is ... accompanied by increased power consumption" [13].

3.2.3 Processing Efficiency
Although the string-based OSC namespace is more efficient for a human to evaluate, a numerical value is much more efficient for the computer, because computers are arithmetic devices. Apart from the number of bytes that need to be parsed, the OSC implementation requires that the namespace be parsed through some sort of string library, requiring additional computation and the memory space to contain the library. In a performance where a mapped point is changed one hundred times a second, the human would not be expected to read that value for every message sent; the computer, however, must. Hence, the message is optimized for the entity that requires it least during performance. The problem with the OSC addressing model is that the coupling between the human cognition of the namespace and the transmission mechanism to the target computer is too tight [6]—the naming, which is effectively the human interface, should be abstracted away from the implementation using a mapping strategy. Two strategies that use this type of mapping are the Internet Name Server [10] for addressing domain names, and the Address Resolution Protocol (ARP) [9] used on local Ethernet networks. Without going into exact details, a brief explanation of how each mechanism operates follows, showing how paradigms similar to the OSC address space are efficiently implemented.

3.2.3.1 Domain Name Mapping
This activity is done behind the scenes and is abstracted away from the user. Although obtaining the IP address before sending a message is effectively a two-step procedure, these two steps make it much more efficient than sending the domain name to every web server.

3.2.3.2 Local Network Ethernet Mapping
On local networks, the abstraction is done through the Media Access Control (MAC) address via ARP [9]. If, for example, a computer whose IP address was '192.168.0.2' on a local network wanted to send a message to the computer addressed '192.168.0.4', it does not send a message to all the computers on the local network expecting all but '192.168.0.4' to reject it. If this were the case, every time a network card received a message, it would be required to interrupt the computer, impacting the performance of the rejecting computer. Rather, the ARP layer maps the IP addresses of the computers on the network to MAC addresses. This MAC address is used to address the network card. The other network cards on the local network ignore the message and do not interrupt the computer. This mapping can be viewed on a computer by typing 'arp -a' at the command prompt.

$> arp -a
Interface: 192.168.0.2 --- 0x2
  Internet Address    Physical Address     Type
  192.168.0.1         00-04-ed-0d-f2-da    dynamic
  192.168.0.4         00-13-ce-f4-63-b6    dynamic

Although these steps are complicated, this is the sort of thing computers are good at, and it makes communication on complex networks very efficient. A similar approach could be used as an underlying layer for OSC. Implementation of such a mechanism for OSC is well beyond the scope of this paper; it does show, however, that such processes are used by other technologies for improved efficiency and should probably be used in OSC.
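By analogy with the ARP and name-server mappings just described, OSC's string addresses could be resolved once into small numeric identifiers. The following Python sketch is our own illustration, not an existing OSC mechanism, and the '/performer1' address is hypothetical: a lookup table is populated on first use, so later per-message work is an integer comparison rather than a string parse.

```python
# A sketch of ARP/DNS-style name-to-number mapping for OSC-like
# addresses (illustrative only, not part of any OSC specification).

class AddressCache:
    def __init__(self):
        self._ids = {}       # address string -> small integer ID
        self._next_id = 0

    def resolve(self, address):
        """Return the integer ID for an address, assigning one on first use."""
        if address not in self._ids:
            self._ids[address] = self._next_id
            self._next_id += 1
        return self._ids[address]

cache = AddressCache()

# The human-readable name is resolved once, behind the scenes...
motor_id = cache.resolve("/performer1/lefthand")

# ...and subsequent messages could carry only the compact numeric ID,
# which a receiver dispatches on with a single integer comparison.
assert cache.resolve("/performer1/lefthand") == motor_id
```

As with ARP, the expensive resolution happens once and is amortized over many messages; the human-readable name never needs to travel with every message.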
on the UNIX name matching and expansion [18, 19]. Once again we see a tight coupling between the human interface and the computer implementation. Although the developers of OSC claim that "with modern transport technologies and careful programming, this addressing scheme incurs no significant performance penalty ... in message processing" [18], using two numbers to define a range would require significantly less processing than decoding a character string with wildcards. For example, in the address range '/minicv/left*', every character would need to be parsed and tested to see whether it was one of the defined wildcard characters. Next, one would have to factor in the string comparison that would be required for every mapped address on the client computer.

Protocols such as MODBUS [http://www.modbus-ida.org/] and DNP [http://www.dnp.org/] are used by telemetry units to control pump stations in real time [2]. These protocols can use a message type that sets a range of mapped points with a single message. When a range is defined using two numbers, it is a simple matter to test whether a mapped point falls within the range. For example, for a message from a protocol that defined the two mapped point bounds 'UPPER_RANGE' and 'LOWER_RANGE', the test algorithm would be as follows.

IF MAPPED_POINT <= UPPER_RANGE
AND MAPPED_POINT >= LOWER_RANGE THEN
    ProcessValue
ENDIF

As with the intuitive names, this requires an additional layer of mapping and abstraction, which in turn means work for the developer. Software engineering has a similar paradigm, where some languages are scripted and some are compiled. Scripted languages require the server to compile human-readable code each time it is executed, while compiled languages use a tool to convert human-readable code to something that is more efficient for the computer. The first type is more efficient for the programmer, because he or she does not need to compile the code after each modification; however, there is a definite performance hit. Compiled languages require an extra step—compiling the human-readable code to machine code—but offer enhanced performance. In terms of communications protocols, OSC is like a scripted language: extremely powerful, but requiring significantly more computing power than is available to most embedded technologies today.

3.2.5 Message Padding
Another possible inefficiency is the padding of all message parameters to four-byte boundaries. For example, a parameter that is only one byte in length is padded to four bytes. The reasoning behind this is that the OSC data structure is optimised for thirty-two bit architectures [OSC Newsgroup]. There have not yet been any conclusive tests to determine whether the gains obtained from this optimisation exceed the additional overhead created by inserting and later filtering these padded bytes [OSC Newsgroup]; however, these results should be forthcoming in the near future. It does, however, mean that there is a decrease in efficiency for eight, sixteen, and sixty-four bit architectures.

4. FAULT TOLERANCE
OSC is a packet-driven protocol that does not accommodate failure in the underlying OSI layers. UDP [12] is the protocol used by many implementations of OSC [19]. UDP does not guarantee that a transmitted packet will be received; moreover, it does not guarantee that the target will receive packets in the order they were sent. OSC is based on the same paradigm as UDP in that it is packet driven. "This leads to a protocol that is as stateless as possible: rather than assuming that the receiver holds some state from previous communications" [18]. The problem with this paradigm is that it is no longer event driven, and it assumes all the relevant data is transmitted at once. If a gestural controller sends an OSC message that is supposed to change a robot motor's direction, immediately followed by a message to start the motor, the OSC receiver may receive them in the opposite order, which may be worse than not receiving the information at all. For example, if a server were to send the following messages using UDP:

/lefthand/motor/direction 1
/lefthand/motor/start

the client could receive them as follows:

/lefthand/motor/start
/lefthand/motor/direction 1

This means that the composer will need to address the possibility of messages arriving in the wrong order without any notice from the protocol. Although one could use TCP "in situations where guaranteed delivery is more important than low latency" [19], lower latency has been one of the OSC evangelists' greatest catch cries.

5. STRATEGIES FOR IMPROVEMENT
The first strategy for improvement is the intelligent mapping of namespaces to numbers. OSC must move away from the stateless protocol paradigm and begin to embrace techniques such as caching [16], which has been used for many years to improve the performance of networks, hard drives, and memory access on CPUs. MIDI's use of running status is an example of how caching can improve performance by nearly thirty-three percent. Caching will be the key to efficient mapping of address patterns to simple numbers without significantly impacting performance.

OSC must move toward an event delegation model, where clients register to receive OSC messages within a particular namespace. Needlessly receiving and parsing large irrelevant messages from OSC servers is a waste of valuable processing power.

The developers of OSC must change their attitude towards MIDI. OSC has been anti-MIDI for a while, with OSC developers often ridiculing MIDI developers [personal correspondence]. Some OSC developers have made token gestures towards MIDI by providing a namespace which is "an OSC representation for all of the important MIDI messages" [19]. This completely defeats the innovative address pattern provided by OSC. Instead, an underlying network layer should convert an intuitively mapped name, such as '/performer1/lefthand', to a MIDI message and then transport it via MIDI, or vice versa. The MIDI controller number should be completely abstracted away from the application layer in order to reduce the coupling between the two. The OSC server should not need to know at the application layer that the motor controlling the robot's left finger is MIDI controller 13. Likewise, the motor being controlled by controller 13 should not need to know that the OSC server is really addressing '/performer1/lefthand'. Although these sorts of strategies have
been employed in dynamic routing schemes in some OSC projects [19], this should be a function of the network layer, not the application layer. When one considers that the longest domain names on the internet can be addressed with only four bytes, it is not unreasonable to expect that even the most complex OSC namespaces could be translated into simple MIDI messages if required.

There needs to be a greater number of message types—currently there are only two. OSC needs to move towards an object-oriented paradigm in the communications protocol [4]. Currently, all the network, data link, and transport layers of transmission have been delegated to the application layer. This is above the presentation layer, which is where OSC exists—completely upside down compared with the OSI model. OSC needs to develop an underlying OSI stack in which the protocol between the client and server is abstracted away from the user; the underlying mapping should direct the message from the source to the destination.

6. CONCLUSION
Although OSC has provided a standard “protocol for communication among computers, sound synthesizers, and other multimedia devices” [19], and was supposed to overcome “MIDI's well-documented flaws” [18], its liberal use of bandwidth may be its Achilles heel, preventing it from ever being the standard end-to-end protocol for communication for low-power and wireless microcontroller interfaces. If OSC is to have any hope of serving this significant and important area of the NIME community, an OSI stack needs to be developed that puts efficiency and performance at the forefront while implementing proven design patterns [6]. This, however, would be a significant research project in itself.

7. ACKNOWLEDGMENTS
I would like to thank Adrian Freed from the Center for New Music and Audio Technologies (CNMAT), University of California, Berkeley, for answering the many questions I asked about OSC. I would also like to thank all the members of the developers' list for the OpenSound Control [sic] (OSC) protocol (osc_dev@create.ucsb.edu) for their input.

8. REFERENCES
[1] Doornbusch, P. Instruments from now into the future: the disembodied voice. Sounds Australian, 2003(62): p. 18.
[2] Entus, M. Running lift stations via telemetry. Water Engineering & Management, 1989. 136(11): p. 41-43.
[3] Fraietta, A. Mini CV Controller - Conference Poster. In Generate and Test: the Australasian Computer Music Conference. 2005. Queensland University of Technology, Brisbane: Australasian Computer Music Association.
[4] Fraietta, A. The Smart Controller: an integrated electronic instrument for real-time performance using programmable logic control. School of Contemporary Arts. 2006, University of Western Sydney.
[5] Kartadinata, S. The gluion: advantages of an FPGA-based sensor interface. In International Conference on New Interfaces for Musical Expression (NIME). 2006. IRCAM - Centre Pompidou, Paris, France.
[6] Larman, C. Applying UML and Patterns: An Introduction to Object-Oriented Analysis and Design and the Unified Process. 2nd ed. 2002, Upper Saddle River, NJ: Prentice Hall PTR. xxi, 627.
[7] Lemieux, J. The OSEK/VDX Standard: Operating System and Communication. Embedded Systems Programming, 2000. 13(3): p. 90-108.
[8] Pawlicki, J. Formalization of embedded system development: history and present. In Quality Congress. ASQ's ... Annual Quality Congress Proceedings. 2003: PROQUEST Online.
[9] Plummer, D.C. RFC 826 - Ethernet Address Resolution Protocol: Or converting network protocol addresses to 48.bit Ethernet address for transmission on Ethernet hardware. <http://www.faqs.org/rfcs/rfc826.html> accessed 28 January 2008.
[10] Postel, J. IEN-89 - Internet Name Server. <ftp://ftp.rfc-editor.org/in-notes/ien/ien89.txt> accessed 28 January 2008.
[11] Postel, J. RFC 760 - DoD standard Internet Protocol. <http://www.faqs.org/rfcs/rfc760.html> accessed 28 January 2008.
[12] Postel, J. RFC 768 - User Datagram Protocol. <http://www.faqs.org/rfcs/rfc768.html> accessed 21 January 2008.
[13] Schiemer, G. and Havryliv, M. Wearable firmware: the Singing Jacket. In Ghost in the Machine: the Australasian Computer Music Conference. 2004. University of Victoria, Wellington.
[14] Schiemer, G. and Havryliv, M. Pocket Gamelan: a Pure Data interface for java phones. In International Conference on New Musical Interfaces for Music Expression (NIME-2005). 2005. University of British Columbia, Vancouver.
[15] Son, S.H. Advances in Real-Time Systems. 1995, Englewood Cliffs, N.J.: Prentice Hall. xix, 537.
[16] Vitter, J.S. External memory algorithms and data structures: dealing with massive data. ACM Computing Surveys, 2001. 33(2): p. 209-271.
[17] Wright, M. Introduction to OSC. <http://opensoundcontrol.org/introduction-osc> accessed 21 January 2008.
[18] Wright, M. and Freed, A. Open SoundControl: A New Protocol for Communicating with Sound Synthesizers. In International Computer Music Conference. 1997. Thessaloniki, Hellas: International Computer Music Association.
[19] Wright, M. and Freed, A. OpenSound Control: State of the Art 2003. In International Conference on New Interfaces for Musical Expression (NIME-03). 2003. Montreal, Quebec, Canada.
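The namespace-translation argument above (a complex OSC address collapsed to a few bytes) can be sketched as a shared lookup table agreed by both endpoints. The table contents and the 16-bit packing below are hypothetical illustrations, not part of OSC or MIDI.

```python
import struct

# Hypothetical shared registry: both endpoints agree on a table that
# binds verbose OSC address patterns to compact 16-bit identifiers.
ADDRESS_IDS = {
    "/performer1/lefthand": 0x0001,
    "/performer1/righthand": 0x0002,
}

def compact_encode(address: str, value: int) -> bytes:
    """Pack a message as a 16-bit address ID plus a 16-bit value:
    4 bytes total, comparable in size to a MIDI channel message."""
    return struct.pack(">HH", ADDRESS_IDS[address], value & 0xFFFF)

def compact_decode(packet: bytes) -> tuple[str, int]:
    """Recover the address pattern and value using the shared table."""
    ident, value = struct.unpack(">HH", packet)
    lookup = {v: k for k, v in ADDRESS_IDS.items()}
    return lookup[ident], value
```

A 20-byte address string thus shrinks to 2 bytes on the wire, at the cost of distributing the table in advance; this is exactly the kind of mapping the text argues should live below the application layer.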
hand of the player.
Zirkonium [9] is a software tool implemented to control spatialization within the Klangdom system at ZKM (Germany). The Klangdom is formed by 39 loudspeakers and can be controlled by Zirkonium through mouse and joystick. It implements various spatialization algorithms (Wave Field Synthesis, Ambisonics, Vector Base Amplitude Panning and Sound Surface Panning) and allows the user to define an arbitrary number of resources to spatialize in the concert hall. The system is controlled through a simple graphical interface.
Challenging Bodies [4] is a complex multidisciplinary project for live performances by disabled people, realized at the Informatics and Music Department of the University of Regina (Canada). Within this wide project, the RITZ system, through various techniques, frontally spatializes up to 10 input signals coming from musical instruments over 7 loudspeakers placed in front of the players. Its control interface is made up of two windows: the first, implemented in GEM, supplies graphical feedback of the loudspeaker configuration and allows the position of the sound sources in the space to be modified, while the second, the main control patch implemented in Pure Data, gives the user the possibility to set the relative and absolute sound levels. The system is strongly oriented towards scalability and usability.
The last example is the work recently proposed by Schacher [10] at the ICMST of the University of Zurich (Switzerland). It consists of a design methodology and a set of tools for the gestural control of sound sources in surround environments. The spatialization is driven by a structured and formalized analysis that maps the player's gestures onto the movements of the sources by applying various typologies of geometric transformations. From the point of view of the input devices, the system does not have a consolidated structure, but the interfaces used up to now span from data gloves equipped with multiple sensors (pressure, position, bending) to haptic arms and graphic and multi-touch tablets. The spatialization algorithms used are Ambisonics and Vector Base Amplitude Panning.

3. IMPLEMENTATION: SMUSIM
SMuSIM is a multichannel sound spatialization system with a multiple and multimodal interaction interface. It is designed for real-time applications in musical expressive contexts (electronic music spatialization, distributed and collaborative network performances).
phonic configuration with the 4 loudspeakers placed at the corners of the room. The projection space can be artificially extended and modified by controlling the direct-to-reverberated signal ratio (for the creation of illusory acoustic spaces).
The spatializer allows the player to control up to four simultaneous sound sources, and graphical feedback gives the instantaneous state of the system.
The system offers a set of functionalities that allow complete and efficient control of the spatialization, in particular: punctual and precise placement of the sound sources in the space, control of relative and absolute volume levels, automation of the movements, non-linear interpolation of the position of the sources in time, and the possibility to load pre-recorded sound files or to acquire signals coming from a microphone or any audio device.
As shown in Figure 1, the system has been implemented in Pure Data (with its graphical interface GrIPD) and EyesWeb (with the creation of ad hoc additional blocks), communicating through the OSC protocol, thus making SMuSIM a natively network-distributed application (with one or more instances on several machines, allowing multiple distributed configurations).

3.1 Interaction interfaces
The prototype offers three different typologies of human-computer interaction devices for the control of the spatialization. Keyboard and mouse are the simplest and most widespread ones. The user controls the diffusion of the sound sources in the space through a combination of actions and commands coming from the PC keyboard and the mouse. In this case the system provides (in addition to the visual feedback window) a bidimensional graphic environment where the player can place and move graphic objects representing the different sound sources.
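Pure Data and EyesWeb exchange data here via OSC over the network. As a minimal illustration of what such a message looks like on the wire, the sketch below hand-encodes a single OSC message carrying a source position; the `/src/1/pos` address is a hypothetical example, not the address scheme actually used by SMuSIM.

```python
import struct

def _osc_pad(data: bytes) -> bytes:
    # OSC strings are NUL-terminated and padded to a 4-byte boundary.
    return data + b"\x00" * (4 - len(data) % 4)

def osc_message(address: str, *args: float) -> bytes:
    """Encode a bare OSC message with float32 arguments (big-endian):
    address pattern, then type-tag string, then the argument data."""
    msg = _osc_pad(address.encode("ascii"))            # address pattern
    msg += _osc_pad(("," + "f" * len(args)).encode())  # type-tag string
    for value in args:
        msg += struct.pack(">f", value)                # 32-bit floats
    return msg

# e.g. one virtual source's (x, y) position, normalised to [0, 1]:
packet = osc_message("/src/1/pos", 0.25, 0.75)
```

Sending such packets over UDP between processes (or machines) is what makes the distributed configuration straightforward: any instance that understands the shared address scheme can consume them.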
widely available on the market, in order to make the system easily usable and accessible at any user level.

3.2 Software components structure
As shown in Figure 3, the application is composed of functional units that perform the various tasks needed. Data coming from the input devices are acquired, formatted and analyzed by the Device controller unit, which is constituted by 4 sub-units, one for each input device. In particular, the Mouse/Keyboard controller supplies a graphic window (the interaction environment) where the user can displace the four objects representing the sound sources with the mouse. A set of keyboard key combinations allows a set of predefined actions to be performed (shifting single sources or groups of sources, maintaining or not their topological configuration, loading/saving default configurations, etc.).
of the attenuation levels to apply to the audio signals on each channel. The spatialization technique is Amplitude Panning extended to the multichannel case. On the basis of the positional data of the virtual sources coming from the input devices, a monophonic signal (considering a single source) is applied to the various channels with a gain factor as follows:

x_i(t) = g_i x(t),  i = 1, ..., N

where x_i(t) is the signal applied to loudspeaker i, g_i the gain factor of the corresponding channel, N the number of loudspeakers and t the time. The gain factor g_i has a non-linear dependence on the position (x, y) of a single sound source in the space. To overcome the 6 dB attenuation at the center of the projection space, a quadratic sinusoidal compensation curve is applied along the two dimensions. By considering all the K sound sources involved, the resulting signal X(t) can finally be defined as:

X(t) = Σ_{j=1}^{K} x_j(t)
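A minimal sketch of the per-channel gain computation and the summation over sources described above, assuming a unit-square room with loudspeakers at the four corners and an equal-power sinusoidal law (the paper names a "quadratic sinusoidal" compensation curve without giving its exact form, so that detail is an assumption here):

```python
import math

def quad_gains(x: float, y: float) -> list[float]:
    """Gain factors g_i for a source at (x, y) in the unit square, with
    loudspeakers at the four corners (FL, FR, RL, RR). An equal-power
    sinusoidal law along each dimension keeps the summed power constant,
    avoiding the 6 dB dip a linear law would give at the centre."""
    gx = (math.cos(x * math.pi / 2), math.sin(x * math.pi / 2))
    gy = (math.cos(y * math.pi / 2), math.sin(y * math.pi / 2))
    return [gx[0] * gy[0], gx[1] * gy[0], gx[0] * gy[1], gx[1] * gy[1]]

def mix(sources):
    """X(t): per-channel sum over the K sources of g_i * x_j(t).
    `sources` is a list of (samples, (x, y)) pairs."""
    length = len(sources[0][0])
    out = [[0.0] * length for _ in range(4)]
    for samples, (x, y) in sources:
        g = quad_gains(x, y)
        for ch in range(4):
            for t in range(length):
                out[ch][t] += g[ch] * samples[t]
    return out
```

At the centre of the room every gain is 0.5, so the summed power across the four channels stays at unity rather than dropping by 6 dB.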
ing and optimizing both the tracking algorithm and the visual feedback production (possibly abandoning the EyesWeb and Pd platforms and realizing an integrated, stand-alone, dedicated software application).
From the point of view of automation, the system does not provide any way of interacting with the player, but is an autonomous and isolated modality. It could be interesting to implement rules for pattern learning and reproduction, in order to make the system able to imitate and continue a performance initially guided by a human user.
Other possible developments could concern the diffusion system (increasing the number of loudspeakers and varying their configuration) and the integration of a sound synthesis engine within the application.
During this first phase of the work there was not enough space for an intensive and structured test session on a large and heterogeneous set of users. However, an evaluation experiment has been prepared for future use.
The experiment has a total duration of about 45 minutes and is composed of six sections:
1) free trial of the instrument (10 min) without any explanation of the working principles of the system (the user has previously read a short user manual);
2) supervised test (10 min) in which the user has to execute some tasks evaluated by the operators;
3) explanation of the working principles (5 min) by an operator, in order to increase the user's awareness of control of the spatialization instrument and to accelerate the learning process;
4) repetition of the test (10 min) after the operator's explanations;
5) evaluation questionnaire (5 min) compiled by the user;
6) interview (5 min) in which the operators explore in depth some aspects that emerged during the test.
The two proposed tests contain a list of 21 tasks (for each test) that the user has to execute. Each task receives a mark according to a five-point Likert scale (1: not executed, 5: executed at the first trial). The tasks are sorted by increasing level of difficulty and are intended to test most of the functionalities of the instrument and its expressive possibilities. The questionnaire presents 22 questions divided into 5 categories: usability of the system (8), learnability (3), audio feedback (3), visual feedback (4) and overall opinion (4). In the questionnaire, too, the players have to give a mark according to a five-point Likert scale (1: bad, 5: very good).

5. CONCLUSIONS
A real-time sound source spatialization system with a multimodal interaction interface has been developed.
The interaction interfaces have been realized with very simple and inexpensive technologies and devices, which have nevertheless shown satisfactory expressive and interaction possibilities. In particular, the best results came, as expected, with the gamepad and the webcam, devices that allow more freedom of movement and a more intuitive and natural interaction. Moreover, the webcam lets the user move each sound source independently (an action impossible with both the mouse and the gamepad). On the other hand, performance is one of the key issues associated with this last kind of device, because the computational load of the image analysis techniques makes the real-time requirement a crucial aspect of the application.
In general, even the graphic rendering operations for the creation of the visual feedback are particularly onerous for the overall performance of the system. In this light, the graphic feedback proposed to the user is quite simple and spare, but it is very efficient and keeps the actual state of the sound sources in the diffusion space always under control.
From the point of view of the sound spatialization, the Amplitude Panning technique produces the expected results. It is very efficient, does not have problems of computational complexity, and is easily configurable for various performance and technical contexts (customization of the panning curves and of the number of diffusion channels).
Even if an intensive and large-scale test session has still to be conducted, SMuSIM has shown good results in terms of learnability, intuitiveness and expressiveness. There are various possible developments of this work, concerning both software and hardware issues (input devices, diffusion system) and applicative and musical aspects.

6. REFERENCES
[1] A. Camurri et al. Toward real-time multimodal processing: EyesWeb 4. In Proceedings of the Convention on Motion, Emotion and Cognition (AISB04), Leeds, UK, 2004.
[2] J. M. Chowning. The simulation of moving sound sources. Journal of the Audio Engineering Society, volume 19, pages 2-6, 1971.
[3] M. Marshall, M. Wanderley, et al. On the development of a system for the gesture control of spatialization. In Proceedings of the 2006 International Computer Music Conference (ICMC06), pages 360-366, New Orleans, USA, 2006.
[4] J. Nixdorf and D. Gerhard. Real-time sound source spatialization as used in Challenging Bodies: implementation and performance. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 318-321, Paris, France, 2006.
[5] N. Orio, N. Schnell, and M. M. Wanderley. Input devices for musical expression: borrowing tools from HCI. In Proceedings of the 2001 International Conference on New Interfaces for Musical Expression (NIME01), 2001.
[6] F. Pachet and O. Delerue. A mixed 2D/3D interface for music spatialization. In Proceedings of the First International Conference on Virtual Worlds, pages 298-307, Paris, France, 1998.
[7] M. Puckette. Pure Data: another integrated computer music environment. In Proceedings of the 1996 International Computer Music Conference (ICMC96), pages 269-272, Hong Kong, China, 1996.
[8] V. Pulkki. Spatial sound generation and perception by amplitude panning techniques. Graduation thesis, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, 2001.
[9] C. Ramakrishnan, J. Gossmann, and L. Brummer. The ZKM Klangdom. In Proceedings of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 140-143, Paris, France, 2006.
[10] J. C. Schacher. Gesture control of sounds in 3D space. In Proceedings of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 358-361, New York, USA, 2007.
[11] L. Schomaker, A. Camurri, et al. A taxonomy of multimodal interaction in the human information processing system. Technical report, Nijmegen University, 1995.
1. INTRODUCTION
Although intricate and complex musical processes involving
rhythm, melody and harmony are to be found in most musical
genres, the use of and conventions relating to tempo are less
adventurous [16]. It has only been the last century or so that has
seen composers, such as Steve Reich and Conlon Nancarrow, experiment with simultaneous musical parts bearing differing tempi [9]. As regards experiments in musical time, the notion of polytempi is crucially different from the relatively more common concepts of polyrhythm and polymetre, which both rely on simple integer divisions of the bars or beats in the piece. In contrast, the multiple simultaneous tempi of polytempo music lead to situations where the bar lines and beats of each part in the piece are themselves incongruent. The timing relationships between the events in each part can no longer be thought of, or expressed in, simple integer fractions (e.g. 3 in the time of 2, or 3/2 vs. 6/4), but instead become irrational.
A number of explanations can be volunteered for the paucity of polytempo use in the modern musical repertoire. For the average listener: the barrage of incongruous notes resulting

Figure 1. Perceived synchronisation in phase music.

Due to the periodic nature of much of the world's rhythms [3], there are various points where disjoint parts can appear more or less temporally-aligned, so that the perceived effect is determined not only by the absolute musical offset, but also by relative factors. For example, in Figure 1, the parts start in-sync and gradually diverge because of differing tempi. Initially, the divergence is small enough that the listener can still collapse their perception of the musical events onto a single time scale, dismissing the offset as they might a digital chorus effect, acoustic echo or performance prosody [9]. After time, the offset increases and the two parts are more easily separated, becoming harder to align perceptually. Yet, by Bar 4, the absolute offset is approximately one beat, and thus the music can be aligned about the beat. As this continues, such alignment occurs relative to other points in the bar, as well as divisions of the beat, and inevitably aligns relative to the bar itself.
The varying incongruity of notes can be seen to form a temporal harmony, where perceived aligned and misaligned episodes correspond to consonance and dissonance respectively. For centuries, these concepts have been powerful tools levered by composers in their engagement with tonal harmony; in the typical case, dissonance giving way to consonance, to provide
resolution [14]. As such, and like dissonant harmonies, the average listener's aversion to apparently cacophonic, misaligned music serves only to reinforce the potential for temporal resolutions.
Harmony and pitch have been studied, codified and notated in a variety of ways to enable performance by musicians and experimentation by composers, yet few notations have arisen to explicitly express temporal relations [16]. In our research, we apply computer technology and interactive notations to tackle these remaining problems. Whereas the problem of directing and coordinating performers is unquestionably also a matter of notation (be it paper, digital, aural, static or interactive), this paper principally focuses on the earlier, pre-requisite stage in the process – the composer's creation of the music. For our purposes, the “super-human virtuosity” currently required of polytempi performers can be provided by the computer [4].

2. BACKGROUND
The simple concept described in the Introduction and Figure 1 underpins the phase music of Steve Reich [15]. “Piano Phase” (1967) contains two musical parts with the same melodic and rhythmic content, but slightly differing (yet constant) tempi. The parts start together, then gradually diverge, or ‘phase’, in musical time, producing moments of dissonance and consonance, as the parts become more or less aligned. Reich's interface to this process began with a tape machine, playing two looped tapes of the phrase at different speeds. Subsequently, and owing to the relative simplicity and repetitive nature of the musical content, he was able to carry the idea to the piano, whereupon two exceptionally disciplined and practiced performers can play the music live. With the exception of the tape speed settings, general performance directions and the looped phrase itself, however, the piece is not fully-scored, but is instead an example of the generative or procedural specification of music. Notably, it is difficult to inspect or manipulate specific, individual notes or events in the performance.
Conlon Nancarrow, a contemporary of Reich, took a different approach to the problems of notation and performance, replacing the human pianist with a pianola (or player piano), notating his music on the paper rolls used by the machine [9]. Unlike score notation, the rolls represent time linearly, and the piano's mechanism eventually afforded the opportunity to dynamically vary tempo within a part. Unlike Reich, Nancarrow's pieces tended not to rely on the phasing of musical events in repetitive parts, but on a grander plan of having a single, climactic point of synchrony.
Alejandro Viñao [1], an established modern-day composer, has been much inspired by Nancarrow's efforts, and now brings a more personal perspective and motivation to our research, joining us as Composer in Residence at the Computer Laboratory. For more than 30 years and in a variety of areas and centres of research (including IRCAM and MIT's Media Lab), he has sought technologies to help him express his musical ideas. Yet, his appropriation of technologies and methods in conventional music practice forces him to an unsatisfying compromise when it comes to exploring polytempi. Using scored music, Alejandro divides the bar into the finest performable resolution (e.g. 1/32nd notes), and uses varying note accents and stresses to give the impression of multiple tempi. Even though he admits such methods do not produce true polytempi, Alejandro manages to create pieces that are nonetheless able to present impressions of temporal harmony, with temporally dissonant passages resolving to consonance. Furthermore, his reliance on more established working practices affords him greater flexibility in instrumentation, arrangement and performance.
Both Nancarrow and Reich effectively used technology to address problems with the use of polytempi, but were both forced to pre-calculate and prescribe the tempo variations long in advance of performance; waiting hours, days or weeks to hear the result of their writing. In all three cases, the composers are forced to limit their creativity in some way, be it temporal freedom, dynamism, note-to-note control or instrumentation.
Approaches to managing complex musical timings tend to focus on performance requirements. Ghent [7] is one of the earlier attempts to use audio cues (e.g. multiple metronomes) for individual musicians. Ligeti [12] uses a similarly audio-based method. Such techniques isolate the musician from the ensemble and, more importantly, the part from the piece, which is not only incompatible with the composer's requirement of a macroscopic view of the music, but also inhibits performer interaction, an important component of the music, socially and aesthetically [19].
Other explicit considerations of polytempo music are sparse, and the paucity of published research in this area is marked by the writings of lamenting composers desperate to explore more advanced musical timings, such as the late Stockhausen [17]. A useful website, run by artist John Greschak [8], contains more information and unpublished articles about polytempo, as well as an annotated list of polytempo music. To our knowledge, there has been no previously published work in the area of music interaction or interface design that has significantly addressed musical tempo as the focus of control, nor explicitly considered the composer and composition as target user or task.

3. A SYSTEM FOR POLYTEMPI
There are two principal requirements of a system allowing composers to interact with tempo and polytempi: a representation of the polytempi, including the temporal relations between parts (the notation); and a method of manipulating and managing the tempo of such parts (the system). In this latter case, interaction should occur in realtime, in order to quickly allow the auditioning of alternative material and the making of expressive refinements.
However, before further considering issues of system design and implementation, we must tackle one of the fundamental goals of our research: the design of a notation for polytempi, upon which the system will be based.

3.1 Notation
The design of our system was arrived at by drawing on our prior research into notations for performance and composition in music and other expressive arts [2][5]. The lack of previous work on this specific problem encouraged us to look for analogies in other disciplines and fields where it is necessary to handle parallel streams, signals and processes – such as physics, data communication, computer security, graphics, and engineering fields.
In facilitated cross-disciplinary meetings with 10 different specialist research groups (see Section 8 for a full list), the concept of phase and synchronization was highlighted in a number of non-musical activities, possibly the closest cousin of
sine wave, at any moment has a phase, frequency and wavelength that might be adapted to music, in the forms of musical position, tempo and bar length, respectively.¹

¹ Amplitude, the remaining fundamental characteristic of audio signals, constitutes an instantaneous property, and might be seen as the counterpart to similar musical properties such as dynamics, pitch, instrumentation, etc.

Considering a musical part as a periodic signal, the challenge moves to representing multiple signals so that the relationship between them is evident. In many fields, phase can be plotted or graphed as a function of other properties of a given system, such as time (e.g. phase plot) or frequency (e.g. Bode plot). In this manner, it would be possible to plot musical position on a vertical axis against absolute time on the horizontal, but this would only be useful in plotting absolute synchronization and absolute time offsets – tempo would be implicitly presented as line gradient, and relative alignments would also be difficult to identify. Instead, we propose a plot of the phase of one signal against the phase of another, as in Figure 2(a). In music, this is the musical position within the bar of one part, against that in another part, as shown in Figure 2(b).

Figure 2. Plots of phase against phase.
(a) A general case. (b) An adaptation for musical purposes (4/4).

Although the plot no longer allows the reader to deduce the individual tempos of each part, the relationship between them is clear – a diagonal line (45 degrees) implies matched tempi; steeper or shallower, and one part is faster or slower than the other. More importantly, the bar-level phase difference is also displayed, allowing the reader to easily deduce points of relative alignment, as shown by the guidelines in Figure 2(b). From the diagram it is possible to see the salient factors of the polytempi process – the relative phases and synchronization of two parts – and extrapolate how changes in each tempo, which affect the gradient of the plot, will affect the degree of synchrony over time. Figure 3 gives an illustration of a musical application.

Figure 3. An idealised example of using a musical phase plot to manage polytempi.
(a) Part Y is progressing faster through the bar. (b) The part is slowed to the tempo of its counterpart, leaving them offset by 1 musical beat. (c) The part is again slowed so that, by (d), the parts are back in sync.

To further illustrate how the plot functions, consider how Reich's “Piano Phase” would be represented: With two parts featuring close yet differing constant tempos, the line would be drawn with a gradient slightly off-diagonal. One part would reach the end of the bar sooner than the other, prompting the line to ‘wrap around’ using the dashed lines, as in Figure 2. The wrap-around line illustrates the relation between the two parts' bar lines (the other part's formed by the axes of the graph), and gradually creeps across the grid, as the bar lines diverge, eventually converging on the opposite extreme of the bar, whereupon the process concludes, having regained synchrony, albeit a bar adrift.

3.2 Interface
The examples above demonstrate how the plot can be used to inspect temporal aspects of a piece but, in order to be of use to composers, a system must allow the viewer to affect the tempi – to draw the line themselves – and react to what they see and hear.
It would be possible to expose the relative synchronisation as a control parameter, but this would require the composer to first select a reference part to which the synchronisation would be relative, effectively restricting tempo variation to a single part at any given time. Instead, we elected to simply control the tempi of both parts independently.
In addition to these two fundamental variables, we envisaged additional control parameters. Notably, the composer will, at different times, wish to affect tempo variations of varying scale. With Reich and Nancarrow, the tempo changes were gradual and finely-controlled, but other composers, such as Alejandro Viñao, desire the expressive freedom to make both fine and more abrupt, coarser variations. Thus, a third variable of control range (or resolution) is required. Finally, observing that temporal harmony involves varying between two extremes (temporal consonance and dissonance), and that most pieces revolve around the journeys between them, we introduce a fourth factor in the interaction: a “gravitational” element that draws the two parts into consonant temporal congruity, to a varying degree. Altogether, this requires an interface offering at least 4 degrees of freedom, corresponding to: tempo of first part, tempo of second part, tempo control resolution and influence of gravity.
Our interface could simply be formed from common input widgets (sliders, rotary knobs, etc.). However, in designing our prototype, we turned to human gesture, where the body affords a large variety of motions to which our scales might be effectively mapped, and where their interrelationships and dependencies might be implicitly reflected. Gesture is often seen as a ‘natural’ interaction mode for computer-based musical applications, owing to the physical and tangible nature of interaction in traditional music making [13]. In this vein, we elected to use gestures, motions and actions that would not appear out of character with those established in live musical
Figure 4. A Vicon™ Motion Capture-based system designed for controlling and representing polytempi.
performance. Specifically, the similarity of expressive roles between our user, the composer, and that of a conductor was a significant influence in our selection. The intended result was a method of interacting that would make a user more comfortable in their manipulation of the system, where musical-like physical actions prompted clear musical results and users did not feel inhibited or self-conscious by having to make overt, overly-exuberant and uncharacteristic gestures.

Projecting the phase schematic onto a wall-mounted screen, a Vicon™ Motion Capture system [18] was used to capture body motion. A similar system has been used to control synthesizers and sound generation [6], but we could find no published account of an attempt to use such a system and gesture to allow higher-level, realtime control of musical composition and expression.

Our system (see Figures 4 and 5) was designed so that the height of each hand would set the tempo of each respective part. Walking forwards or backwards set the tempo range addressed by the hands, literally allowing more "up-close" adjustments or broader handling "from a distance". Appropriately, the effect of gravity could be controlled by bringing the hands closer together laterally, so that clasped hands (vertical and horizontal proximity) would ultimately bring about synchronisation of both tempo and relative position. Additional gestures were added to start and stop playback (a quick clap), and to allow the user to lock the tempo of each part (turning the respective palm up) so that they could focus on the other.

4. TECHNICAL DETAILS
The Vicon™ system works by using multiple cameras that can detect infra-red light reflected off small reflective balls attached to the subject. Belts, hats, gloves and suits adorned with these balls can be worn to allow untethered movement to be recorded within a confined space. The raw camera data is processed by the Vicon™ system into a realtime stream of 3D coordinates, which can be combined into groups representing different bodies and limbs.

For our system, we used two gloves and a belt to allow us to determine the position of the hands relative to the body, and the position of the body relative to the space. The data was piped over a TCP/IP network to a PC running Cycling 74's Max/MSP and Jitter. Using C++, we developed a Max/MSP external that converted the data packets into usable Max variables. Variables corresponding to the positions, velocities and orientation of the waist and each hand were connected to the respective control variables of a MIDI playback engine (playing pre-recorded piano or percussive parts), so that they could be appropriately manipulated. In turn, the control variables, together with the status variables of the engine, were then passed to a Jitter patch that constructed a graphical representation of the musical phase plot, to be fed back to the screen.

Despite a diverse collection of protocols, the different technologies integrated well, and a basic system was up and running quickly, allowing us time to iteratively refine the interaction. The system outlined in Section 4 was designed to encapsulate the relative properties of the synchronisation between parts and, in doing so, would provide only limited insight into the more absolute characteristics of the performance – notably, absolute tempo or absolute part position. In Figures 4 and 5, the screen shows the musical phase plot in a 3D perspective, whereby a different plot is presented for each bar of a single part, flying forward in an abstract 3D space, appearing at a distance from the upper right (allowing bars to be read left-to-right), at a speed matching the part's tempo. The user is thus given the impression that they are progressing through the piece, and at what rate.
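The mapping just described (hand height → part tempo, walking distance → tempo range, lateral hand proximity → gravity) can be sketched as follows. All ranges, units and scale factors here are our own placeholder guesses for illustration, not values from the paper:

```python
def map_gesture(left_hand, right_hand, waist, base_tempo=60.0):
    """Map mocap coordinates (metres; x lateral, y forward, z up) to the
    four control parameters described in the text. The scaling constants
    are hypothetical, chosen only to make the mapping concrete."""
    # Walking forwards/backwards sets the tempo range addressed by the
    # hands: "up-close" adjustment vs. broader handling from a distance.
    tempo_range = max(4.0, 40.0 * waist[1])          # BPM spanned by hand height
    # Height of each hand sets the tempo of the respective part.
    tempo_a = base_tempo + tempo_range * (left_hand[2] - 1.0)
    tempo_b = base_tempo + tempo_range * (right_hand[2] - 1.0)
    # Lateral hand proximity controls the influence of 'gravity';
    # clasped hands (gap near zero) pull the parts into synchrony.
    lateral_gap = abs(left_hand[0] - right_hand[0])
    gravity = max(0.0, 1.0 - lateral_gap)
    return tempo_a, tempo_b, tempo_range, gravity
```

A real implementation would sit inside the Max/MSP external described above, updating these parameters on every incoming Vicon frame.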
and the Socio-Digital System team at the Microsoft Research Centre. For additional input thanks also to Ian Cross, of the Centre for Music & Science. Lastly, many thanks to the Leverhulme Trust and the Engineering and Physical Sciences Research Council (EPSRC), without whose financial support this project would not have been possible.

9. REFERENCES
[1] Bellaviti, S. Perception, Reception, and All That Popular Music: An Interview with Alejandro Viñao. In Discourses in Music, 6, 2 (Spr-Sum '07), University of Toronto, 2007.
[2] Blackwell, A. and Collins, N. The programming language as a musical instrument. In Proceedings of PPIG 2005 (Brighton, UK, June 29-July 1, 2005), 2005, 120-130.
[3] Clark, E. Rhythm and Timing in Music. In The Psychology of Music (Second Edition, ed. Deutsch, D.), Elsevier Press, 1999, 725-792.
[4] Collins, N. Relating Superhuman Virtuosity to Human Performance. In Proceedings of MAXIS (Sheffield Hallam University, April 12-14, 2002), 2002.
[5] Delahunta, S., McGregor, W. and Blackwell, A. Transactables. Performance Research Journal, 9, 2 (Jun. 2004), 67-72.
[6] Dobrian, C. and Bevilacqua, F. Gestural Control of Music: Using the Vicon 8 Motion Capture System. In Proceedings of NIME'03 (Quebec, Canada, May 22-24), 2003, 161-163.
[7] Ghent, E. Programmed Signals to Performers: A New Compositional Resource. In Perspectives of New Music, 6, 1, 1967, 96-106.
[8] Greschak, J. Polytempo Music Articles. Available at http://www.greschak.com/polytempo. Last Updated: Jan. 15, 2008. Last Checked: Jan 30, 2008.
[9] Grout, D. and Palisca, C. A History of Western Music (Fifth Edition). W. W. Norton & Co. Inc., NY, 1996.
[10] Howard, D. and Angus, J. Acoustics and Psychoacoustics (Third Edition). Focal Press, Oxford, UK, 2006.
[11] Jordà, S. New Musical Interfaces and New Music-making Paradigms. In Proceedings of New Interfaces for Musical Expression (ACM CHI'01), ACM Press, New York, 2001.
[12] Ligeti, L. Beta Foly: Experiments with Tradition and Technology in West Africa. In Leonardo Music Journal, 10, 2000, 41-48.
[13] Magnusson, T. and Mendieta, E. The Acoustic, the Digital and the Body: A Survey on Musical Instruments. In Proceedings of NIME '07 (New York, June 6-10), 2007.
[14] Piston, W. and DeVoto, M. Harmony (Fifth Edition). W. W. Norton & Co. Inc., NY, 1987.
[15] Potter, K. Four Minimalists: La Monte Young, Terry Riley, Steve Reich, Philip Glass (Music in the Twentieth Century). Cambridge University Press, Cambridge, UK, 2000.
[16] Read, G. Music Notation: A Manual of Modern Practice (Second Edition). Taplinger Publishing Company, New York, NY, 1979.
[17] Stockhausen, K. How Time Passes. In die Reihe, 3 (Musical Craftsmanship), 10-40.
[18] Vicon Motion Systems. The Vicon MX Motion Capture System. Detailed at http://www.vicon.com. Last Updated: Jan. 30, 2008. Last Checked: Jan 30, 2008.
[19] Williamon, A. Musical Excellence: Strategies and Techniques to Enhance Performance. Oxford University Press, Oxford, UK, 2004.
duration; the system should know about the current musical context such as dynamics, harmony, number of notes in a chord; the system should have knowledge about the current instrument and how it should react to various playing techniques.

Figure 1: Two macro-note realizations that are labelled with "trr". The auxiliary notes are displayed after the main note as note-heads without stems.

4. MACRO-NOTE SYNTAX
Next we go over and discuss the main features of the macro-note syntax. As was already stated above, a macro-note expression uses our scripting syntax having three main parts: (1) a pattern-matching part (PM-part), (2) a Lisp-code part, and (3) a documentation string. In the following code example we give a simple macro-note script that adds auxiliary notes to the main note simulating a repetition gesture (see also Figure 1):

(* ?1 (e ?1 "trr") ; (1) PM-part
   (?if (add-macro-note ?1 ; (2) Lisp-code part
          :dur (synth-dur ?1)
          :dtimes '(.13 30* .12)
          :midis (m ?1)
          :indices 1
          :artic 50
          :time-modif
          (mk-bpf '(0 50 100) '(90 130 100))
          :update-function 'prepare-guitar-mn-data))
   "repetition") ; (3) Documentation

In the PM-part (1) we first state, with a wild-card, '*', and a variable, '?1', that this script is run for each note in the score (thus '?1' will be bound to the current note). Furthermore we check whether the note contains an expression with the label "trr". If this is the case we run the Lisp-code part (2). Here we call the Lisp function 'add-macro-note' that generates a sequence of notes according to its keyword parameters. The arguments are normally numbers, symbols, lists or break-point functions. Internally these arguments are converted to circular lists. In our example we first specify the duration of the sequence (':dur'). Next we give a list of durations (':dtimes'). After this we define the 'pitch-field' of our macro-note, ':midis', which is in our case the midi-value of the current note, '(m ?1)'. A closely related argument, ':indices', follows, which specifies how the pitch-field will be read. Here the pitch-field consists of only one pitch and using the index 1 we get a sequence of repetitions. Two time-related parameters follow: the first one, ':artic', defines an articulation value (which is in our case 50 percent, meaning 'half-staccato'); the second, ':time-modif', is a tempo function, defined as a break-point function, where x-values are relative to the duration of the note (from 0 to 100), and the y-values specify tempo changes as percentage values (100 percent means 'a tempo'). Thus in this gesture we start slower with 90 percent, make an accelerando up to 130 percent, and come back to the 'a tempo' state with 100 percent. Finally, the ':update-function' performs some instrument-specific calibration of the generated macro-note sequence. Figure 1 shows two applications of the macro-note script.

5. REALIZATION EXAMPLES
In this section we discuss three case studies. The first one is a tremolo study realization (the original piece was composed by Francisco Tárrega). The result is given in Figure 2. Although this example is now more complex, it follows a similar scheme to the previous one. The following script was used to realize this example. Here the PM-part (1) accesses all chords in a score and runs the Lisp-code part (2) if the chord contains the expression with the label 'trmch' (the variable '?1' will be bound to the current chord). The pitch-field now consists of all the sorted midi-values contained in the chord. The most complex part of the code deals with the generation of a plucking pattern for the tremolo gesture (see the large 'case' expression). This result defines the ':indices' parameter. Here different patterns are used depending on the note value of the chord. For instance, if the note value is a quarter note, 1/4, then the pattern will be '(2 3)', which will be expanded by the 'add-items' function to '(2 1 1 1 3 1 1 1)'. This means that we will use a typical tremolo pluck pattern where we pluck once the second note and then three times the first note in the pitch-field, then the third note and three times the first note, and so on. We also use here an extra keyword called ':len-function' that guarantees that the sequence is finished after the pattern has reached a given length.

A break-point function controls the overall amplitude contour, ':amp', of the resulting gesture. Note that this contour is added on top of the current velocity value.

The ':artic' parameter is now interpreted by our system as an absolute time value in seconds, here 5.0s (by contrast, in the previous example we used integers that in turn were interpreted as percentage values). This controls the crucial overlap effect of the tremolo gesture. 5.0s is used here as a short-hand to say: 'keep all sounds ringing'. The calculation of the final durations is, however, much more complicated (for instance the low bass notes will ring longer than the upper ones), but this will be handled automatically by the update-function. The ':time-modif' parameter is similar to the one in the previous example: we do an accelerando/ritardando gesture during the tremolo event.

(* ?1 :chord (e ?1 "trmch") ; (1) PM-part
   (?if ; (2) Lisp-code part
    (when (m ?1 :complete? T)
      (let* ((ms (sort> (m ?1)))
             inds len-function)
        (case (note-value ?1)
          (3/4
           (setq inds (add-items '(4 3 2 3 2 3) 3 1)
                 len-function '(= (mod len 24) 0)))
          (1/4
           (setq inds (add-items '(2 3) 3 1)
                 len-function '(= (mod len 8) 0)))
          (1/2
           (setq inds (add-items '(4 3 2 3) 3 1)
                 len-function '(= (mod len 16) 0))))
        (add-macro-note ?1
          :dur (synth-dur ?1)
          :dtimes '(.13 30* .12)
          :midis (mapcar 'list ms ms)
          :indices inds
          :len-function len-function
          :amp (mk-bpf
                '(0.0 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                (g+ '(40 20 0 30 10 50 20 40) (vel ?1)))
          :artic 5.0
          :time-modif (mk-bpf '(0 50 100) '(90 130 100))
          :update-function 'prepare-guitar-mn-data))))
   "tremolo chords")

Our next example is a realization of an arpeggio study by Heitor Villa-Lobos (Figure 3) and the script is quite similar
to the previous one. The main difference is that the pitch-field is sorted according to string number and not according to midi-value, as was the case in the tremolo study example. The ':indices' parameter is also different: now it is static, reflecting the idea of the piece, where the rapid plucking gesture is repeated over and over again.

We combine here two notions of timing control: a global one and a local one. A global tempo function (see the break-point function above the staff that is labelled "/time") makes a slow accelerando gesture lasting for 5 measures. This global timing control is reflected in our script, where the local ':dur' parameter gets gradually shorter and shorter.

(* ?1 :chord (e ?1 "vlarp") ; (1) PM-part
   (?if (when (m ?1 :complete? t) ; (2) Lisp-code part
          (let* ((ms (mapcar #'midi (sort (m ?1 :object T) #'<
                       :key #'(lambda (n)
                                (first (read-key n :fingering)))))))
            (add-macro-note ?1
              :dur (synth-dur ?1)
              :dtimes '(.14 20* .12)
              :midis (mapcar 'list ms ms)
              :indices '(6 4 5 3 4 2 3 1 2 1 3 2 4 3 5 4)
              :artic 1.0
              :amp (mk-bpf
                    '(0.0 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                    (g+ (vel ?1) '(50 30 10 40 20 60 30 50)))
              :len-function '(= len 32)
              :update-function 'prepare-guitar-mn-data))))
   "Villa-Lobos arp")

Our final example, an excerpt from J. S. Bach's Sarabande, is the most complex one, and it is probably also the most delicate one, due to its slow basic tempo. The piece is ornamented with rich improvised textures, such as portamento glides, trills and arpeggios (see Figure 4). In the following we discuss the arpeggio script that is applied three times (see the chords with expressions having the label "carp"). The arpeggio script is similar to the tremolo example in that we have a database of plucking patterns. These are organized here, however, according to the number of notes in the pitch-field. Furthermore, the script can choose randomly (using the 'pick-rnd' function) from several alternatives. This results in arpeggio gesture realizations that are not static but can vary each time the score is recalculated, similar to the baroque performance practices where a player is expected to improvise ornaments.

(* ?1 :chord (e ?1 "carp")
   (?if (when (m ?1 :complete? t)
          (let* ((ms (sort> (m ?1)))
                 (ind (case (length ms)
                        (6 (pick-rnd
                            '(6 5 4 3 2 1 2 3 4 5)
                            '(1 2 1 3 4 3 5 6 5 6 5 4 3 2 1)
                            '(1 2 3 4 5 6 5 4 3 2 1)))
                        (5 (pick-rnd
                            '(5 4 3 2 1 2 3 4 5)
                            '(1 2 1 3 4 3 5 5 4 3 2 1)
                            '(1 2 3 4 5 4 3 2 1)))
                        (4 (pick-rnd
                            '(4 3 2 1 2 3 4)
                            '(1 2 1 3 4 3 4 3 2 1)
                            '(1 2 3 4 4 3 2 1)))
                        (3 (pick-rnd
                            '(3 2 1 2 3)
                            '(1 2 1 3 3 2 1))))))
            (add-macro-note ?1
              :dur (* 0.95 (synth-dur ?1))
              :dtimes '(.15 30* .13)
              :midis (mapcar 'list ms ms)
              :indices ind
              :artic 5.0
              :amp (mk-bpf
                    '(0.0 0.25 25.0 25.25 45.0 45.25 65.0 65.25 100.0)
                    (g+ (vel ?1) '(50 0 30 10 40 20 60 30 0)))
              :time-modif (mk-bpf '(0 50 100) '(60 150 90))
              :update-function 'prepare-guitar-mn-data))))
   "Bach arp")

6. CONCLUSIONS
This paper presents our recent developments dealing with a score-based control system that allows a musical score to be filled with ornamental textures such as trills and arpeggios. After presenting the main syntax features we discussed three larger case studies that aim to show how the macro-note scheme can be used in a musical context.

These examples have been subjectively evaluated by the authors (the first author is a professional guitarist), and we consider the macro-note scheme clearly to improve the musical output of our model-based instrument simulations. While this paper concentrates on the simulation of existing musical instruments, it is obvious that our control scheme could potentially also be used to control new virtual instruments.

7. ACKNOWLEDGMENTS
The work of Mikael Laurson and Mika Kuuskankare has been supported by the Academy of Finland (SA 105557 and SA 114116).
Figure 2: Realization of the opening measures of the tremolo study "Recuerdos de la Alhambra" by Francisco Tárrega.
Figure 3: Arpeggio study by Heitor Villa-Lobos. This example is challenging as we use macro-notes mixed
with ordinary guitar notation.
Figure 4: Johann Sebastian Bach: Sarabande. This example contains macro-note arpeggios and trills,
vibrato expressions, a tempo function and a portamento expression.
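The ':time-modif' break-point functions used throughout these scripts map a position within the gesture (0-100, in percent) to a tempo percentage (100 = a tempo). One way such a function can warp nominal durations is sketched below; this is our interpretation for illustration, not the PWGL implementation:

```python
def bpf(xs, ys, x):
    """Piecewise-linear break-point function,
    e.g. bpf([0, 50, 100], [90, 130, 100], 50) -> 130."""
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    return ys[-1]

def warp_durations(durations, xs, ys):
    """Scale each nominal duration by the local tempo percentage:
    130 percent tempo shortens notes (accelerando), 90 lengthens them."""
    total = sum(durations)
    out, elapsed = [], 0.0
    for d in durations:
        x = 100.0 * (elapsed + d / 2.0) / total  # note midpoint, in percent
        out.append(d * 100.0 / bpf(xs, ys, x))
        elapsed += d
    return out
```

Applied with the breakpoints '(0 50 100) / (90 130 100)' from the repetition script, this yields the slower start, central accelerando and return to a tempo described in Section 4.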
3.2 Motion capture protocol and database
We propose to quantitatively characterize timpani gestures by capturing the motion of several timpani performers. We use a camera-tracking Vicon 460 system and a standard DV camera that allow the retrieval of both gesture and sound.

The main difficulty in using such hardware solutions is then the choice of the sampling frequency for the analysis of percussive gestures (because of the short duration of the beat impact [7]). For our experiments, cameras were set at 250 Hz. With a higher sampling frequency (500 Hz and 1000 Hz), we could expect to retrieve beat attacks more accurately, but the spatial capture range is significantly reduced, so that it is impossible to capture the whole body.

In order to retrieve beat impacts, markers have also been placed on the drumsticks. The smaller timpani (23") has been used to emphasize stick rebounds.

of these gestures, the performer has been asked to change the location of the beat impact according to Figure 2 (right side). Finally, our database is composed of fifteen examples of timpani playing variations for each subject, and to each example correspond five beats per hand. This database will be used when studying in detail the variations of the timpani gesture.

The use of the widespread analysis tools integrated in the Vicon software allows for the representation of temporal sequences as Cartesian or angular trajectories (position, velocity, acceleration), but one can easily observe that such a representation isn't sufficient to finely represent the subtlety of gesture dynamics, and cannot be easily interpreted by performers. In the instrumental gesture context, we are mainly interested in also displaying characteristics such as contact forces, vibration patterns, and a higher-level interpretation of captured data (space occupation, 3D trajectories, orientation of segments).

4. VISUALIZATION
Our visualization framework proposes the design of a virtual instrumental scene, involving the physical modeling and animation of both virtual characters and instruments. Timpani gestures are taken from the database and physically synthesized, making available both kinematic and dynamic cues about the original motion.
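Driving a physical character from captured gesture targets is done with joint-level feedback control (the PID scheme detailed later in Figure 8). A minimal single-joint step of such a controller is sketched below; the gains are placeholders, not the authors' tuned values:

```python
def pid_torque(theta_t, theta_dot_t, theta_s, theta_dot_s, integral, dt,
               kp=120.0, ki=0.5, kd=8.0):
    """One PID step: from a captured target state (theta_t, theta_dot_t)
    and the joint's current simulated state (theta_s, theta_dot_s),
    compute the torque applied to the virtual character's joint.
    Returns the torque and the updated error integral."""
    error = theta_t - theta_s
    integral += error * dt
    torque = (kp * error                      # proportional term
              + ki * integral                 # integral term
              + kd * (theta_dot_t - theta_dot_s))  # derivative term
    return torque, integral
```

In a full simulation this step runs once per physics frame for every actuated joint, with the (Kp, Ki, Kd) coefficients tuned per joint.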
by taking advantage of the dynamic characteristics that are available thanks to our virtual character dynamic model.

Figure 8: PID process. From motion capture data targets (angles ΘT and angular velocities Θ̇T), joints' current state (angles ΘS and angular velocities Θ̇S) and coefficients (Kp, Ki and Kd) to be tuned, torques τ are processed to physically control the virtual character.

4.3.2 Interaction
In order to account for the interaction between the virtual character's sticks and the timpani model, we propose to render a propagating wave on the membrane of the timpani when a beat impact occurs. Although the rendered wave isn't the theoretical solution of the wave equation, this model can take into account the biomechanical properties of the limbs and the properties of the sticks. Once the collision system detects an impact, kinematic and dynamic features - such as the velocity and the impact force - can be extracted. These features instantiate the attributes of the propagating wave, making possible the visualization of the position and the intensity of the impact (Figure 9).

Once kinematic and dynamic features of motion and physical interactions are obtained, we can set up strategies of sound production. In this paper, we limit ourselves to the triggering of pre-recorded sounds available from motion capture sessions. These sounds are played when the impacts of the virtual character's sticks are detected on the membrane of the timpani model.

One can notice that the time when a sound is played doesn't depend on motion capture data, but on the physical simulation and interaction between the virtual performer and the percussion model. This provides an extensive way of designing new gesture-sound interactions based on both kinematic and dynamic gesture features.

5. CONCLUSION
We have presented in this paper a new interface for visualizing instrumental gestures, based on the animation of a virtual expressive humanoid. This interface facilitates the 3D rendering of virtual instrumental scenes, composed of a virtual character interacting with instruments, as well as the visualization of both kinematic and dynamic cues of the gesture. Our approach is based on the use of motion capture data to control a dynamic character, thus making possible a detailed analysis of the gesture, and the control of the dynamic interaction between the entities of the scene. It therefore becomes possible to enhance the visualization of the hitting gesture by showing the effects of the attack force on the membrane. Furthermore, the simulation of movement, including preparatory and interaction movement, provides a means of creating new instrumental gestures, associated with an adapted sound-production process.

In the near future, we expect to enrich the analysis of gesture by extracting relevant features from the captured motion, such as invariant patterns. We will also introduce an expressive control of the virtual character from a reduced specification of the percussion gestures. Finally, we are currently implementing the connection of our simulation framework to well-known physical modeling sound-synthesis tools such as IRCAM's Modalys [10] to enrich the interaction possibilities of this framework. A similar strategy to existing frameworks, such as DIMPLE [21], using Open Sound Control [25] messages generated by the simulation engine, is being considered.

6. ACKNOWLEDGMENTS
The authors would like to thank the people who have contributed to this work, including Prof. Fabrice Marandola (McGill), Nicolas Courty (VALORIA), Erwin Schoonderwaldt (KTH), Steve Sinclair (IDMIL), as well as the tim-
pani performers. This work is partially funded by the Natural Sciences and Engineering Research Council of Canada (Discovery and Special Research Opportunity grants), and the Pôle de Compétitivité Bretagne Images & Réseaux.

7. REFERENCES
[1] R. Boie, M. Mathews, and A. Schloss. The Radio Drum as a Synthesizer Controller. In Proc. of the 1989 International Computer Music Conference (ICMC89), pages 42-45, 1989.
[2] R. Bresin and S. Dahl. Experiments on gesture: walking, running and hitting. In Rocchesso & Fontana (Eds.): The Sounding Object, pages 111-136, 2003.
[3] D. Buchla. Lightning II MIDI Controller. http://www.buchla.com/. Buchla and Associates' Homepage.
[4] A. Camurri, B. Mazzarino, M. Ricchetti, R. Timmers, and G. Volpe. Multimodal analysis of expressive gesture in music and dance performances. In A. Camurri, G. Volpe (Eds.): Gesture-Based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag, pages 20-39, 2004.
[5] K. Chuchacz, S. O'Modhrain, and R. Woods. Physical Models and Musical Controllers: Designing a Novel Electronic Percussion Instrument. In Proc. of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 37-40, 2007.
[6] G. Cook. Teaching Percussion. Schirmer Books, 1997. Second edition.
[7] S. Dahl. Spectral Changes in the Tom-Tom Related to the Striking Force. Speech, Music and Hearing Quarterly Progress and Status Report, KTH, Dept. of Speech, Music and Hearing, Royal Institute of Technology, Stockholm, Sweden, 1997.
[8] S. Dahl. Playing the Accent: Comparing Striking Velocity and Timing in Ostinato Rhythm Performed by Four Drummers. Acta Acustica united with Acustica, 90(4):762-776, 2004.
[9] C. Dodge and T. A. Jerse. Computer Music: Synthesis, Composition and Performance. Schirmer - Thomson Learning, 1997. Second edition.
[10] N. Ellis, J. Bensoam, and R. Causse. Modalys Demonstration. In Proc. of the 2005 International Computer Music Conference (ICMC05), pages 101-102, 2005.
[11] R. Hanninen, L. Savioja, and T. Takala. Virtual concert performance - synthetic animated musicians playing in an acoustically simulated room. In Proc. of the 1996 International Computer Music Conference (ICMC96), pages 402-404, 1996.
[12] K. Havel and M. Desainte-Catherine. Modeling and Air Percussion for Composition and Performance. In Proc. of the 2004 International Conference on New Interfaces for Musical Expression (NIME04), pages 31-34, 2004.
[13] R. Jones and A. Schloss. Controlling a physical model with a 2D Force Matrix. In Proc. of the 2007 International Conference on New Interfaces for Musical Expression (NIME07), pages 27-30, 2007.
[14] A. Kapur, G. Essl, P. Davidson, and P. Cook. The Electronic Tabla Controller. Journal of New Music Research, 32(4):351-360, 2003.
[15] M. Peinado, B. Heberlin, M. M. Wanderley, B. Le Callennec, R. Boulic, and D. Thalmann. Towards Configurable Motion Capture with Prioritized Inverse Kinematics. In Proc. of the Third International Workshop on Virtual Rehabilitation, pages 85-96, 2004.
[16] T. Mäki-Patola, P. Hämäläinen, and A. Kanerva. The Augmented Djembe Drum - Sculpting Rhythms. In Proc. of the 2006 International Conference on New Interfaces for Musical Expression (NIME06), pages 364-369, 2006.
[17] M. Marshall, M. Rath, and B. Moynihan. The Virtual Bodhran - The Vodhran. In Proc. of the 2002 International Conference on New Interfaces for Musical Expression (NIME02), pages 153-159, 2002.
[18] F. W. Noak. Timpani Sticks. Percussion Anthology. The Instrumentalist, 1984. Third edition.
[19] G. B. Peters. Un-contestable Advice for Timpani and Marimba Players. Percussion Anthology. The Instrumentalist, 1984. Third edition.
[20] G. Rule. Keyboard Reports: Korg Wavedrum. Keyboard, 21(3):72-78, 1995.
[21] S. Sinclair and M. M. Wanderley. Extending DIMPLE: A Rigid Body Simulator for Interactive Control of Sound. In Proc. of the ENACTIVE'07 Conference, pages 263-266, 2007.
[22] R. Smith. Open Dynamics Engine. www.ode.org.
[23] R. Taylor, D. Torres, and P. Boulanger. Using Music to Interact with a Virtual Character. In Proc. of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), pages 220-223, 2005.
[24] A. Tindale, A. Kapur, G. Tzanetakis, P. Driessen, and A. Schloss. A Comparison of Sensor Strategies for Capturing Percussive Gestures. In Proc. of the 2005 International Conference on New Interfaces for Musical Expression (NIME05), pages 200-203, 2005.
[25] M. Wright, A. Freed, and A. Momeni. Open Sound Control: The State of the Art. In Proc. of the 2003 International Conference on New Interfaces for Musical Expression (NIME03), pages 153-159, 2003.
Diana Young
MIT Media Laboratory
20 Ames Street
Cambridge, MA, USA
young@media.mit.edu
(k-NN) algorithm was chosen because it is simple and robust for well-conditioned data. Because each data point in the time series was included, the dimensionality, 9152 (1144 samples in each time series x 8 gesture channels), of the gesture data vector was very high. Therefore, the dimensionality of the gesture data set was first reduced before being classified.

3.1 Computing the Principal Components
Principal component analysis (PCA) is a common technique used to reduce the dimensionality of data [12]. PCA is a linear transform that transforms the data set into a new coordinate system such that the variance of the data vectors is maximized along the first coordinate dimension (known as the first principal component). That is, most of the variance is represented, or “explained”, by this dimension. Similarly, the second greatest variance is along the second coordinate dimension (the second principal component), the third greatest variance is along the third coordinate dimension (the third principal component), et cetera. Because the variance of the data decreases with increasing coordinate dimension, higher components may be disregarded for similar data vectors, thus resulting in decreased dimensionality of the data set.

Figure 2: Scatter plot of all six bowing techniques for player 1 (of 8). Accented détaché (square), détaché lancé (triangle), louré (pentagon), martelé (circle), staccato (star), spiccato (diamond). The axes correspond to the first three principal components.
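The reduction described here can be sketched in a few lines of NumPy. This is an illustrative sketch only, not the study's code; a small random matrix stands in for the actual gesture data matrix, and the variable names are hypothetical:

```python
import numpy as np

# Stand-in data: rows are recorded bowing examples, columns are the
# concatenated time-series samples of the gesture channels.
rng = np.random.default_rng(0)
M = rng.normal(size=(24, 200))

# Center the data, then compute the principal components via the SVD:
# M_c = U @ diag(S) @ Vt, where the rows of Vt are the principal
# directions, ordered by decreasing singular value (i.e. variance).
M_c = M - M.mean(axis=0)
U, S, Vt = np.linalg.svd(M_c, full_matrices=False)

# Fraction of the total variance "explained" by each component.
explained = S**2 / np.sum(S**2)

# Keeping only the first three components projects each example into a
# three-dimensional space, as in the scatter plots.
scores = M_c @ Vt[:3].T
```

A nearest-neighbour classifier can then operate on these low-dimensional scores rather than on the full gesture vectors.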
[Figure: Player 5, Principal Components 1–3 (scatter plot).]

In order to reduce the dimensionality of the bowing gesture data in this study, the data were assembled into a matrix and the principal components were computed using the efficient singular value decomposition (SVD) algorithm. For this bowing technique study, there were 576 (8 players x 6 techniques x 4 strings x 3 performances of each) recorded examples produced by the participants, and for each example, 8 channels of the bow gesture data were used. These data were used to form a 576 x 9152 matrix M, which was input to the SVD in order to enable the following analyses.

Before continuing with the classification step, it was informative to illustrate the separability of bowing techniques produced by the individual players. From the matrix M, a
• legato Bound together (literally, “tied”). Without interruption between the notes; smoothly connected, whether in one or several bowstrokes.

• louré A short series of gently pulsed, slurred, legato notes. Varying degrees of articulation may be employed. The legato connection between notes may not be disrupted at all, but minimal separation may be employed.

• martelé Hammered; a sharply accentuated, staccato bowing. To produce the attack, pressure is applied an instant before bow motion begins. Martelé differs from accented détaché in that the latter has primarily no staccato separation between strokes and can be performed at faster speeds.

• staccato Used as a generic term, staccato means a non-legato martelé type of short bowstroke played with a stop. The effect is to shorten the written note value with an unwritten rest.

• spiccato A slow to moderate speed bouncing stroke. Every degree of crispness is possible in the spiccato, ranging from gently brushed to percussively dry.

7. REFERENCES
[1] J. Berman, B. G. Jackson, and K. Sarch. Dictionary of Bowing and Pizzicato Terms. Tichenor Publishing, Bloomington, Indiana, 4th edition, 1999.
[2] F. Bevilacqua, N. H. Rasamimanana, E. Fléty, S. Lemouton, and F. Baschet. The augmented violin project: research, composition and performance report. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME-06), Paris, 2006.
[3] CodaBow. Conservatory Violin Bow. http://www.codabow.com/.
[4] M-Audio. Fast Track USB. http://www.m-audio.com/.
[5] MIT Committee on the Use of Humans as Experimental Subjects (COUHES). http://web.mit.edu/committees/couhes/.
[6] I. T. Nabney. Netlab neural network software. http://www.ncrg.aston.ac.uk/netlab/index.php.
[7] K. Ng, B. Ong, O. Larkin, and T. Koerselman. Technology-enhanced music learning and teaching: i-maestro framework and gesture support for the violin family. In Association for Technology in Music Instruction (ATMI) 2007 Conference, Salt Lake City, 2007.
[8] J. Paradiso and N. Gershenfeld. Musical applications of electric field sensing. Computer Music Journal, 21(3):69–89, 1997.
[9] C. Peiper, D. Warden, and G. Garnett. An interface for real-time classification of articulations produced by violin bowing. In Proceedings of the 2003 Conference on New Interfaces for Musical Expression (NIME-03), Montreal, 2003.
[10] M. Puckette. Pure Data (Pd). http://www.crca.ucsd.edu/~msp/software.html.
[11] N. Rasamimanana, E. Fléty, and F. Bevilacqua. Gesture analysis of violin bow strokes. Lecture Notes in Computer Science, pages 145–155, 2006.
[12] G. Strang. Linear Algebra and Its Applications. Brooks Cole, Stamford, CT, 4th edition, 2005.
[13] D. Trueman and P. R. Cook. BoSSA: The deconstructed violin reconstructed. In Proceedings of the International Computer Music Conference, Beijing, 1999.
[14] Yamaha. SV-200 Silent Violin. http://www.global.yamaha.com/index.html
[15] D. Young. Wireless sensor system for measurement of violin bowing parameters. In Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, August 2003.
[16] D. Young. A Methodology for Investigation of Bowed String Performance Through Measurement of Violin Bowing Technique. PhD thesis, M.I.T., 2007.
[17] D. Young, P. Nunn, and A. Vassiliev. Composing for Hyperbow: A collaboration between MIT and the Royal Academy of Music. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME-06), Paris, 2006.
[18] D. Young and S. Serafin. Investigating the performance of a violin physical model: Recent real player studies. In Proceedings of the International Computer Music Conference, Copenhagen, 2007.
added to keep track of the virtual string location, i.e. an imaginary line representing the virtual string. This is very similar to the work presented in [1]. The line is drawn through the tube and the averaged location of the plucking hand, so that the virtual string slowly follows the player's movements. This prevents the user from drifting away from the virtual string. The API detects the direction of the plucking hand movement, and when the virtual string is crossed, a pluck event and a direction parameter is sent. Also, a minimum velocity limit is defined for the plucking gesture in order to avoid false plucks.

3.3 PD Implementation
When the PD implementation receives an OSC message containing a pluck event, an excitation signal is inserted into each waveguide string. The excitation signal is a short noise burst simulating a string pluck. There is also a slight delay (20 ms) between different string excitations for creating a more realistic strumming feel. The order in which the strings are plucked depends on the plucking direction. Figure 3 illustrates the structure and signaling of the PD patch.

The camera software can be set to show the blob positions on screen in real time. This is not required for playing, but it helps the user to stay in the camera's view. The camera API uses roughly 10% of CPU power without the display and 20-40% with the display turned on. Since PD uses up to 80% of CPU power when playing all six strings, the current VSG implementation can run all six strings in real time without a noticeable drop in performance, provided that the blob tracking display is turned off. By selecting fewer strings, switching the contact sound synthesis off, or dropping the API frame rate to half, the display can be viewed while playing.

3.4 Virtual Slide Guitar
The virtual slide guitar system is illustrated in Fig. 4. The camera API recognizes the playing gestures and sends the plucking and pull-off events, as well as the distance between the hands, to the synthesis control block in PD. The synthesis block consists of the DWG models illustrated in Fig. 1. At its simplest, the VSG is easy to play and needs no calibration. The user simply puts the slide tube and reflecting ring on and starts to play. For more demanding users, the VSG provides extra options, such as altering the tuning of the instrument, selecting the slide tube material, setting the contact sound volume and the balance between static and dynamic components, or selecting an output effect (a reverb or a guitar amplifier plugin).

The tube-string contact sound gives the user direct feedback of the slide tube movement, while the pitch of the string serves as a cue for the tube position. Thus, visual feedback is not needed in order to know where the slide tube is situated on the imaginary guitar neck.

4. CONCLUSIONS
This paper discussed a real-time virtual slide guitar synthesizer with camera-based gestural control. Time-varying digital waveguides with energy compensation are used for simulating the string vibration. The contact noise between the strings and the slide tube is generated with a parametric model. The contact sound synthesizer consists of a noise pulse generator, whose output is fed into a time-varying resonator and a distorting nonlinearity. By controlling the noise pulse firing rate, the resonator's center frequency, and the overall dynamics with the sliding velocity, a realistic time-varying harmonic structure is obtained in the resulting synthetic noise. The overall spectral shape of the contact noise is set with a 4th-order IIR filter.

The slide guitar synthesizer is operated using an optical gesture recognition user interface, similar to the one suggested in [1]. However, instead of a web camera, a high-speed infrared video camera is used for attaining a lower latency between the user's gesture and the resulting sound. This IR-based camera system could also be used for gestural control of other latency-critical real-time applications. The real-time virtual slide guitar model has been realized in PD. A video file showing the virtual slide guitar in action can be found on the Internet: http://youtube.com/watch?v=eCPFYKq5zTk.

5. ACKNOWLEDGMENTS
This work has been supported by the GETA graduate school, the Cost287-ConGAS action, the EU FP7 SAME project, and the Emil Aaltonen Foundation.

6. REFERENCES
[1] M. Karjalainen, T. Mäki-Patola, A. Kanerva, and A. Huovilainen. Virtual air guitar. J. Audio Eng. Soc., 54(10):964–980, Oct. 2006.
[2] M. Karjalainen, V. Välimäki, and T. Tolonen. Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond. Computer Music J., 22(3):17–32, 1998.
[3] T. I. Laakso, V. Välimäki, M. Karjalainen, and U. K. Laine. Splitting the unit delay - tools for fractional delay filter design. IEEE Signal Proc. Mag., 13(1):30–60, 1996.
[4] J. Pakarinen. Modeling of Nonlinear and Time-Varying Phenomena in the Guitar. PhD thesis, Helsinki University of Technology, 2008. Available on-line at http://lib.tkk.fi/Diss/2008/isbn9789512292431/ (checked Apr. 14, 2008).
[5] J. Pakarinen, M. Karjalainen, V. Välimäki, and S. Bilbao. Energy behavior in time-varying fractional delay filters for physical modeling of musical instruments. In Proc. Intl. Conf. on Acoustics, Speech, and Signal Proc., volume 3, pages 1–4, Philadelphia, PA, USA, Mar. 19-23, 2005.
[6] J. Pakarinen, H. Penttinen, and B. Bank. Analysis of handling noises on wound strings. J. Acoust. Soc. Am., 122(6):EL197–EL202, Dec. 2007.
[7] J. Pakarinen, T. Puputti, and V. Välimäki. Virtual slide guitar. Computer Music J., 32(3), 2008. Accepted for publication.
[8] J. Paradiso and N. Gershenfeld. Musical applications of electric field sensing. Computer Music J., 21(2), 1997.
[9] M. Puckette. Pure Data. In Proc. Intl. Computer Music Conf., pages 269–272, 1996.
[10] M. Puckette. Patch for guitar. In Proc. PureData Convention 07, Aug. 21-26, 2007. Available on-line at http://artengine.ca/~catalogue-pd/19-Puckette.pdf (checked Apr. 9, 2008).
[11] J. O. Smith. Physical modeling using digital waveguides. Computer Music J., 16(4):74–87, Winter 1992.
[12] J. O. Smith. Physical Audio Signal Processing. Aug. 2004 draft, http://ccrma.stanford.edu/~jos/pasp/.
[13] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Janosy. Physical modeling of plucked string
[Figure 4: The virtual slide guitar system. The IR camera view is analyzed by the Cam API on the PC, which sends control data (e.g. blob distance, string length, pluck direction, pull-off state) to the PD synthesis patch; the audio output goes to the soundcard.]
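The event flow described in Sections 3.2–3.3 — a pluck fires when the plucking hand crosses the slowly adapting virtual-string line with sufficient velocity, and the strings are then excited in a direction-dependent order with a 20 ms stagger — can be sketched as follows. This is an illustrative Python sketch, not the actual Cam API or PD code; the smoothing factor and velocity threshold are assumed values.

```python
STRINGS = 6
STAGGER_MS = 20.0        # 20 ms inter-string excitation delay (Sec. 3.3)
MIN_VELOCITY = 0.05      # assumed minimum plucking velocity per frame

class VirtualString:
    """Tracks the virtual string line and detects pluck events."""

    def __init__(self, smoothing=0.99):
        self.line_y = 0.0            # position of the virtual string line
        self.smoothing = smoothing   # how slowly the line follows the hand
        self.prev_y = None

    def update(self, hand_y):
        """Feed one plucking-hand position; return (plucked, direction)."""
        # The line slowly follows the averaged hand location, so the
        # player cannot drift away from the virtual string.
        self.line_y = self.smoothing * self.line_y + (1 - self.smoothing) * hand_y
        plucked, direction = False, 0
        if self.prev_y is not None:
            crossed = (self.prev_y - self.line_y) * (hand_y - self.line_y) < 0
            velocity = hand_y - self.prev_y
            # A minimum velocity limit avoids false plucks.
            if crossed and abs(velocity) >= MIN_VELOCITY:
                plucked, direction = True, 1 if velocity > 0 else -1
        self.prev_y = hand_y
        return plucked, direction

def excitation_schedule(direction):
    """Noise-burst onset times (ms) per string; the strumming order
    depends on the plucking direction."""
    order = range(STRINGS) if direction > 0 else reversed(range(STRINGS))
    return {string: i * STAGGER_MS for i, string in enumerate(order)}
```

In a real-time patch, each scheduled onset would trigger a short noise burst into the corresponding waveguide string model.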
commercialised), like the Gibson HD Digital Guitar featuring onboard analog-to-digital conversion and an ethernet connection which outputs an individual digital audio stream for each string.

3. SOUND AND GESTURE RELATIONSHIP IN THE ELECTRIC GUITAR
3.1. The gesture – string – signal continuum
The basis of the electric guitar is a transduction of the vibrating strings' mechanical energy into electricity by a microphone. The electromagnetic pick-up converts the vibration of the string directly into voltage, thus creating an immediate causal relationship between the instrumental gesture providing the initial energy, the string, and the electric signal produced. Thus the basis of the electric guitar preserves the fundamental characteristic of the acoustic instruments, the connection between gesture and sound, through direct energy transduction as described by Claude Cadoz [4]. This intimacy ensures a high quality instrumental relationship between the player and the guitar, a fact that has certainly contributed to the success of the electric guitar among other experimental instruments. Players experience an immediate response, multimodal (haptic, aural, and, to a lesser degree, visual) feedback, a sense of « connectedness » to the instrument.

3.2. A cumulative model of augmentation
While the basis of the electric guitar is a genuine « electrified » acoustic instrument, its hybrid quality becomes more abstruse with the addition of various sound shaping modules or « effects », essential in creating the instrument's tone. These analog or digital extensions are powered by electricity and have no direct energy connection to the initial playing gesture, and therefore alternative strategies for their control must be conceived. This makes up a second « level » of the hybrid instrument, where the gesture–sound relationship has to be designed solely by means of creating correspondences between acquired gesture data and sound processing parameters. The design of the « electric » level of the hybrid instrument is a central question of instrument augmentation, all the more challenging as the electric « implant » should integrate and enhance the instrument without hindering the acoustic level's sonic possibilities and playing technique.

In the case of the electric guitar, the question of coexistence between the acoustic and electric levels of the instrument has been addressed with a cumulative model of augmentation. In this process, the electric level is conceived as an extension of the initial acoustic instrument, leaving the latter relatively intact. Thus the core of the electric guitar does not vary much from the acoustic one's: both hands are involved in the initial sound production, working on the mechanical properties of the strings. The augmented part of the instrument is grafted « on top » of this core by adding various sound processing modules and their individual control interfaces. The consequences of this cumulative process of augmentation are twofold:
1) The playing environment becomes more complex as interfaces are added, each new module requiring a separate means of control. Moreover, as both hands work mainly on the initial sound production, the control of the augmented level needs to be relegated to the periphery of the playing technique, using the little free « space » that can be found in between the existing hand gestures or in other parts of the body like the feet. Due to this marginal position, the control of signal processing seems very limited and qualitatively poor in regard to the sonic possibilities offered by the technologies used.
2) The instrument undergoes spatial extension, going from a single object to a collection of interconnected modules. A common electric guitar playing environment comprises a guitar, a set of « effect » pedals and an amplifier, adding up to form an environment which may easily expand beyond a single person's physical capacities of simultaneous control.

It appears to us that the cumulative approach to « electrification » and augmentation adopted by the electric guitar carries inherent problems for the signal processing control, which lead to downgrading its sonic and expressive possibilities. Nevertheless, the established modular set-up of the electric guitar is currently undergoing a profound transformation with the advent of digital audio computing within the guitar itself or with PC plug-and-play environments. This development could offer an opportunity to redesign the electric guitar by efficiently integrating the signal processing with the player's gestures, and connecting the electronic graft to the instrument and to its playing technique on a fundamental level.

4. « CONTACT POINTS » : AN ANALYSIS OF THE GESTURE-SOUND RELATIONSHIP
The augmentation project we have undertaken has its basis in the observation that a musical instrument is not simply an object, but a meeting point between a gesture and an object, the result of this encounter being the sound which is produced. For us, a musical instrument loses its essence when taken out of its context, i.e. the relationship with the human body. In this « gestural » approach of the instrument, the central question is to find ways of understanding the link between the body, the object and the sound produced. The nature of the continuum between gesture and sound, mediated by the instrument, is a key factor for the expressive and musical qualities of an instrument. A highly functional continuum enables the player to gradually embody the instrument in a process where the musician's proprioception extends to the instrument, resulting in an experience of englobing the instrument and playing directly with the sound [2]. Through observation of how the musician connects to the instrument, it appears that the body manipulates the instrument with a repertory of very precisely defined movements. Each part of the body connecting to the instrument has its own « vocabulary » of gestures adapted to its task and to the constraints of the object. This repertory forms the « instrumental technique », where constituents of the corporal « vocabulary » are combined in real time to form an instrumental discourse. Each movement and combination of movements has its characteristic sonic result. We use the term « contact points » to signify these convergences between gesture and object which result in the production or modification of a sound. It allows us to think in terms of a continuum between these three elements and to establish a « map » of their relationships in the playing environment.

For instance, mapping « contact points » on the electric guitar results in a precise repertory of the gestural vocabulary comprised in the playing technique in relationship with each gesture's corresponding interface and sonic result. We can thus establish a typology of initial sound producing « contact points » (left and right hand techniques on the strings) and of the gesture-interface couples which control the instrument's electric level (potentiometers, switches, pedals etc. and their corresponding gestures). This allows for a comprehensive
articulation of the instrumental environment in the scope of establishing strategies for further and/or alternative augmentations.

Figure 1. Mapping « contact points » on the electric guitar (gesture-specific detail not included here).

In the perspective of instrument augmentation, there is a dual interest in the mapping of « contact points ». On the one side, breaking down the complexities of instrumental playing into a set of « meta-gestures » and their corresponding sonorities allows us to focus on strategies of « tapping into » the gestures of the standard instrumental technique, motivated by an intimate knowledge of gesture and medium. The gesture data acquisition can thus be adapted to the instrument according to its technical and playing specificities, using both direct and indirect acquisition techniques [13]. On the other side, a map of contact points allows for the articulation of the instrument according to a typology of « active zones » participating in the sound production, and of « silent zones »: convergences of gestures and localisations which have no role in the production of sound. From this « map » of « active » and « passive » regions of the instrumental environment, we may go on to find ways of « activating » the silent zones, creating new contact points and new gestures.

5. THE AUGMENTED GUITAR PROJECT
We are currently developing an augmented guitar at the CICM, motivated by the considerations exposed in this article. The project is based on simultaneous and crossover use of direct and indirect gesture data acquisition (i.e. sensors and signal analysis) [13], as well as both existing and new « contact points ». The technological platform is made up of a standard Fender Stratocaster electric guitar equipped with an additional piezoelectric pickup and a selection of sensors (tilt, touch, pressure). The 2-channel audio and multichannel MIDI sensor data output is routed to a PC performing a series of signal analysis operations: perceptive feature extraction from the audio signal (attacks, amplitude, spectrum related data) [5], and gesture recognition on the sensor data. The resulting data is mapped to the audio engine, providing information for dynamic control of signal processing. The project is developed in the Max/MSP environment.

Figure 2. The CICM augmented electric guitar set-up.

In our augmentation project we have adopted a gesture-based methodology which proceeds by an initial mapping of the « contact points » comprised in the guitar's basic playing technique. The augmentation potential of each gesture is evaluated in relationship with the available acquisition and sound processing/synthesis techniques. In parallel, we study the musician's body in the playing context, looking for potential « ancillary » [12] gestures not included in the conventional playing technique. We then look for ways of tapping into these gestures with an adapted gesture acquisition system (sensors or signal analysis), thus activating a new « contact point ». Following is a selection of the augmentations we are working on. Audio and video material of the augmented guitar and its related MAA music project can be found at: www.myspace.com/maamusique

5.1. « Tilt – Sustain » augmentation
This augmentation is motivated by a double observation: 1) the upper body movements that characterise the performance of many guitarists remain disconnected from the actual sound; they carry an untapped expressive potential. 2) the sound of the guitar has a very limited duration, which keeps it from employing long, sustained sounds. The development of the guitar can be seen as a long search for this sustained quality [6]. The electric guitar with distortion and feedback effectively attains it, but only with a very distinct « overdriven » tone and high volume levels. The idea of our augmentation is to create a sustainer controlled by the tilt of the guitar and of the player's torso: the more vertical the guitar, the more sustain. The augmentation is developed with a 2-axis tilt sensor attached to the guitar, mapped to a realtime granular synthesis engine which records the guitar sound and recycles it into a synthesised sustain. The tilt–sustain augmentation activates a new « contact point » in the electric guitar playing technique, incorporating torso movements into sound creation.

5.2. « Golpe » : The percussive electric guitar
The acoustic guitar allows for the possibility of using percussive techniques played on the instrument's body. Due to its microphone design, the electric guitar has lost this ability. The percussive augmentation we are working on aims to restore a percussive dimension to the electric guitar, thus reactivating a traditional « contact point » which remains unused. In order to tap into the sounds of the guitar's body, we have installed a piezo microphone, detecting the percussive attacks and then analysing the signal's spectral content. When hit, different parts of the instrument resonate with specific spectra, thus allowing us to build up a set of localisation–sound couples. The analysed signal drives a sampler where the piezo output is convolved with prerecorded percussive sounds, inspired by Roberto Aimi's approach in his work on augmented percussion [1].

5.3. « Bend » : an integrated « wah-wah » effect
The left-hand fingers operating on the fretboard have an essential role in producing intonations, with horizontal and vertical movements ranging from a minute vibrato to extended four-semitone « bends ». This technique is widely used on the electric guitar, allowing the player to work in the domain of continuous pitch variations as opposed to the semitone divisions of the fretboard. The « bend » technique is often used to enhance the expressiveness of the playing, giving the guitar a « vocal » quality. The motive of this augmentation is to extend the inflexion gesture's effect on the sound from a
variation of the pitch to a double variation of both pitch and timbre. In our system, we use attack detection and pitch following to track the note's evolution relative to its initial pitch. The resulting pitch variation data is mapped to a filter section, emulating the behavior of the classic « wah-wah » effect. We find that controlling the filter through an expressive playing gesture incorporates the effect into the musical discourse in a subtler manner than the expression pedal traditionally used for this type of effect.

5.4. « Palm muting » : an augmented effect switch
A popular playing technique on the electric guitar consists of muting the strings with the picking hand's palm, thus producing a characteristic, short, muffled sound. Our augmentation is based on the detection of the muting gesture by an analysis of the spectral content of the guitar's signal: a loss of energy in the upper zones of the spectrum, regardless of which string(s) is (are) being played. Our system tests the incoming signal against a « model » spectrum, interpreting closely matching signals as the result of a muted attack. The acquired « muting on/off » data is used in our guitar as a haptic augmentation of an effect pedal's on/off switch, allowing the player to add a desired timbre quality (« effect ») to the sound simply by playing in muted mode.

6. CONCLUSION AND FUTURE WORK
The augmented guitar project is currently evolving at a steady pace, exploring new augmentations and sound–gesture relationships. Two different directions seem to emerge from this work: one is refining the traditional electric guitar working environment by finding ways of replacing the poorly integrated effect modules with signal processing control systems more closely connected to the guitar's playing technique. The other direction points towards more radical augmentations of the guitar's soundscape, associated with the will of expanding the guitar's melodically and harmonically oriented musical environment towards novel possibilities of working with timbre and sound texture. A central factor in this research is the establishment of an interactive working relationship between technological innovation and music. Live playing experience provides high quality feedback on our augmentations, and it bears a central role in (in)validating our work. As the augmentations stabilize and become more refined, we are looking forward to conducting a series of user evaluations which could provide useful insight for further development of the augmented guitar.

7. REFERENCES
[1] Aimi, R. M. Hybrid Percussion: Extending Physical Instruments Using Sampled Acoustics. PhD thesis, Massachusetts Institute of Technology, 2007, p. 41.
[2] Berthoz, A. La décision. Odile Jacob, Paris, 2003, pp. 153-155.
[3] Bevilacqua, F. « Interfaces gestuelles, captation du mouvement et création artistique ». L'inouï #2, Léo Scheer, Paris, 2006.
[4] Cadoz, C. « Musique, geste, technologie ». Les nouveaux gestes de la musique, Parenthèses, Marseille, 1999, pp. 49-53.
[5] Jehan, T. Perceptual Synthesis Engine: An Audio-Driven Timbre Generator. Master's thesis, Massachusetts Institute of Technology, 2001.
[6] Laliberté, M. « Facettes de l'instrument de musique et musiques arabes ». De la théorie à l'art de l'improvisation, Delatour, Paris, 2005, pp. 270-281.
[7] Machover, T. Hyperinstruments homepage. http://www.media.mit.edu/hyperins/
[8] Overholt, D. « The Overtone Violin ». Proceedings of NIME 2005, Vancouver, 2005.
[9] Rasamimanana, N. H. Gesture Analysis of Bow Strokes Using an Augmented Violin. Master's thesis, Paris VI University, 2004.
[10] « Roland database ». http://www.geocities.com/SiliconValley/9111/roland.htm
[11] Smithsonian Institution. « The Invention of the Electric Guitar ». http://invention.smithsonian.org/centerpieces/electricguitar
[12] Verfaille, V. « Sonification of musicians' ancillary gestures ». Proceedings of ICAD 2006, London, 2006.
[13] Wanderley, M. Interaction musicien-instrument : application au contrôle gestuel de la synthèse sonore. PhD thesis, Paris VI University, 2001, pp. 40-44.
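As an illustration of the spectral palm-mute detection described in Section 5.4, here is a minimal Python/NumPy sketch. It is not the CICM implementation: it simplifies the « model » spectrum comparison to a high-band energy ratio, and the 2 kHz split point and the threshold are assumed values.

```python
import numpy as np

def band_energies(frame, sr=44100, split_hz=2000.0):
    """Split one audio frame's spectral energy at split_hz."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    return spectrum[freqs < split_hz].sum(), spectrum[freqs >= split_hz].sum()

def is_muted_attack(frame, sr=44100, high_ratio_threshold=0.1):
    """Palm muting shows up as a loss of energy in the upper spectrum,
    regardless of which string is played: flag the attack as muted when
    the high-band share of the energy falls below the threshold."""
    low, high = band_energies(frame, sr)
    total = low + high
    return bool(total > 0 and high / total < high_ratio_threshold)
```

The resulting on/off value can then switch an effect, serving as the haptic replacement for a pedal described above.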
ABSTRACT
This paper describes the Sormina, a new virtual and tangible instrument, which has its origins in both virtual technology and the heritage of traditional instrument design. The motivation behind the project is presented, as well as the hardware and software design. Insights gained through collaboration with acoustic musicians are presented, as well as a comparison to historical instrument design.

Keywords
Gestural controller, digital musical instrument, usability, music history, design.

1. INTRODUCTION
Sormina is a new musical instrument that has been created as part of a research project in the University of Arts and Design Helsinki, Media Lab. Sormina uses sensors and wireless technology to play music. Its design is guided by traditional instrument building.

In new wireless technology, the instrument loses part of its traditional character. The physical connection between the sounding material and the fingers (or lips) is lost. The material does not guide the design, which puts the designer in a totally new situation with new questions. This study tries to answer these questions by exploring the design of a new instrument that is intended for use in the context of a live symphony orchestra. The research has started from the concept of the interface, which traditionally is held in the hands or put in the mouth. The playing posture of the musician, the delicate controllability of the instrument and the ability to create nuances are considered as the key phenomena of the new design. Visual aesthetic and usability are of equal importance.

Sormina aims to take the musician on a tour to the ancient world, where tools were built to fit the fingers of human beings, and where technology was to serve humanity. The technological tools have changed over the centuries, but the idea of music making stays the same. Using the most modern technology for music making does not have to result in underrating our common heritage.

2. MOTIVATION
The motivation for this innovation is the desire to create totally new musical instruments in the context of classical music by using computers and sensors. We are interested in designing digital instruments that could be accepted as part of the standard symphony orchestra. We believe that classical music can benefit from the current developments in digital technology. The symphony orchestra has been quite stable during the last century, although there have been some experiments using electronics. Sormina aims to encourage the symphony orchestra to develop further to meet the challenges of the digital era. A handheld computer interface is operated very close to the body, which makes the user experience quite intimate. By offering new modes of sensory engagement and intimate interaction, Sormina contributes to a change in the digital world, from disembodied, formless, and placeless interaction to materiality and intimacy.

This project participates in a long tradition of similar innovations, starting from the Theremin, which is a rare example of a musical innovation that became part of classical music practice. In addition to the Theremin, one of the most influential sources for the current research has been Rubine and McAvinney's 1990 article in Computer Music Journal, where they presented their VideoHarp controller and discussed issues related to its construction [1]. Michel Waisvisz and his Hands have also been a great inspiration [2]. Recently, Malloch and Wanderley have proposed the T-Stick [3]. Important questions concerning parameter mapping have been discussed by Hunt, Wanderley and Paradis [4].

3. CONSTRUCTING THE INSTRUMENT
3.1 Hardware, sensors
Structurally, the Sormina is built using a Wi-microDig analog-to-digital encoder, a circuit board for the wiring, and 8 potentiometer sensors with custom-made, wooden knobs. The Wi-microDig is a thumb-sized, easily configurable hardware device that encodes up to 8 analog sensor signals to multimedia-industry-compatible messages with high resolution and then transmits these messages wirelessly to a computer in real-time for analysis and/or control purposes [8]. The custom-made circuit board takes care of the wiring. The potentiometers are mounted on the circuit board in an upright position, and the encoder unit is also attached to the circuit board. The knobs of the potentiometers are arranged in a straight line on top of the
otherwise, or republish, to post on servers or to redistribute to lists, instrument.
requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy The manufacturer of Wi-microDig promises that the 8 inputs of
Copyright remains with the author(s). 10 bits resolution each can sample at up to 1500 Hz with only
57
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
milliseconds latency [8]. The wireless transmission complies with the Bluetooth v2.0 standard, which is claimed to be a reliable protocol and, at 115 kb/s, much faster than MIDI. The wireless range is guaranteed up to 100 meters without obstructions, since it is a Bluetooth class 1 device. With the prototype there were considerable problems with the connection range. The encoder in question was, however, an older model than the Wi-microDig.

The construction of the controller is open: it is not enclosed in a box or cover. This arrangement makes the visual design appear light and spacious. However, the decision to use no cover is subject to change in the forthcoming prototypes, as the openness makes the construction vulnerable to dust and moisture.

The Sormina makes use of 8 potentiometer sensors, the maximum number of sensors that can be connected to the encoder. The choice between sensors was made on the basis of three main arguments: stability, precision and tangibility. The Wi-microDig encoder comes with only one potentiometer, which did not meet the standards set for the instrument design. Suitable potentiometers were purchased separately.

The first argument for the selection of the sensor type was stability. In order to attain a stable instrument, the sensors also have to provide this characteristic. Stability in this context means a sensor that preserves its state when not touched. Most of the available sensors are built according to a convention that does not support this demand. A potentiometer sensor changes its state only by intentional action. Stability is also required of an instrument in the sense of durability and robustness. Potentiometers proved to be stable in this sense as well.

3.2 Software
The software for Sormina has been programmed using Max/MSP and Reaktor. It consists of three parts: one handles the communication with the encoder through Bluetooth, the second takes care of the user interface, and the third produces the sound. In addition, external software, Sibelius, was used for the notation.

The Wi-microDig comes with its own software, which is not used in this project. This software is meant to take care of the Bluetooth connection and let the user decide the interpretation of the sensor data, which is then sent forward as MIDI information. In addition to this rather laborious software, the company also offers on its web site, for the same purpose, a Max/MSP patch, which proved to be handier for the purposes of the project. The Wi-microDig patch for Max/MSP appeared to handle the communication with the encoder through Bluetooth quite reliably. The Max/MSP programming environment was also favored for its usefulness in other parts of the project.

The Wi-microDig patch outputs the sensor data as 7-bit information, which was found to be sufficient for the purposes of the project. According to the tests made, it was not possible to produce any finer resolution with finger movements using the small potentiometer knobs of the Sormina.

A visual user interface was programmed using Max/MSP, which also handles the connection to the encoder. One purpose of the interface is to give the musician visual cues for controlling the instrument. This proved to be beneficial especially in the learning phase. In addition to the feel in the fingertips, it was helpful to see the state of all the sensors at a glance on the screen.

The visual interface comprises sliders, number boxes, and basic notations for the sensor input. The interface is also capable of recording a control sequence, which was found useful for learning to play the instrument. While the recorded sequence is playing back, the visual information about the state of the sensors is shown on the interface.

Figure 2. Part of the visual interface

The sound is created using a sound synthesis patch created for the Reaktor software. The patch allows the control of several features of sound synthesis. The mapping of the sensors to the sound synthesis software appeared to be of crucial importance.

Mainly due to the capabilities of the encoder, it was decided that there should be 8 sensors. Nevertheless, this was found to be a very useful restriction. It was assumed that a human being cannot handle too many controls at the same time. Too many
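The bit-reduction described above (10-bit encoder samples presented to the patch as 7-bit values) can be sketched in a few lines. This is an illustrative reconstruction in Python, not the actual Max/MSP patch, and the function name is hypothetical.

```python
def to_7bit(raw: int) -> int:
    """Scale a 10-bit sensor reading (0-1023) to a 7-bit,
    MIDI-style control value (0-127) by dropping the three
    least significant bits."""
    if not 0 <= raw <= 1023:
        raise ValueError("expected a 10-bit reading")
    return raw >> 3

# A full-scale reading maps to 127, mid-scale to 64.
readings = [0, 512, 1023]
print([to_7bit(r) for r in readings])  # [0, 64, 127]
```

The shift keeps the mapping monotonic and cheap; as the paper notes, the finer 10-bit resolution was not reachable by finger movement on the small knobs anyway.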
5. DISCUSSION
The aim of the Sormina project was to explore the main principles of the instruments in classical music, from the musician's point of view, and with these findings to create a new, stable electronic music instrument that could be accepted in a symphony orchestra. The results suggested the importance of three layers in the design of new instruments. The first layer is the sound synthesis that defines the audible response. The second is the mapping of the gestures to the sound parameters, which constitutes the instrument in a conceptual manner to the musician. The third layer, often overlooked in the creation of new digital music instruments (DMIs), is the materiality and usability layer of the controller.

Much weight in the research has been put on the human hand and its capabilities. The author has followed Curt Sachs' findings about the hands and feet being the first instruments [5], and Malcolm McCullough, who praises our hands as the best source of personal knowledge [6]. A remarkable source for understanding the importance of music playing has been Tellef Kvifte, who formulates a classification of instruments using playing techniques, not based on the way the musical sound is produced [7].

The Sormina research suggests that the touch and feel of the interface are important to take into account when designing new instruments. The musician uses subtle, almost intuitive and unconscious movements of her body. The fingers, for example, have developed through evolution to take care of the most sophisticated and precise actions. Therefore it is reasonable to use the fingers for playing music. In the culture of the human being, the fingers have been crucial for survival. Even today, they are used extensively to express our thoughts, by writing with a pen or a computer.

In the course of history, traditional instruments have matured to be well adapted to the human body. Their long evolution has given them the power to survive even in the era of computers. Through careful examination of their principles, it is possible to learn from their pattern and use the results in the design of totally new electronic instruments. In the present research, the role of the physical interface has been found to be fundamental for such a design. It appears that attention should be paid to the physical appearance of the instruments in order to build stable instruments.

Sormina aims to be more than a controller. As Rubine and McAvinney formulate, a musical instrument may be thought of as a device that maps gestural parameters to sound control parameters and then maps the sound control parameters to sound [1]. By binding together a fixed set of sensors with a stable sound source, we have developed Sormina into an instrument, not a controller.

Sormina attempts to be engaging to new musicians, but also rewarding for the professionals. Based on the current evidence, these goals have been reached to a large extent.

The Sormina has been played in concert situations, both solo and with acoustic musicians. Playing with an acoustic cello has been rewarding, but an a cappella choir also made a good combination with the electronic sounds of the Sormina.

The experience of concerts with acoustic instruments and singers points out that the sound quality and playing techniques of the Sormina are well adaptable to a classical music orchestra. The possibility to notate the playing brings another useful characteristic for use with a symphony orchestra.

6. FUTURE DIRECTION
The current research has used the observation of traditional musical instruments and their user experience for the design of a new electronic music instrument. Still, the scope of the exploration has been narrow, concentrating primarily on the author's experience of acoustic instruments. In the future, a more systematic inquiry will be accomplished, where professional musicians will be observed and interviewed about their playing habits. Also, observing the learning process in the study of classical music instruments can reveal qualities that could then assist in new instrument design.

One direction in the development of the instrument is to combine the sound output with a live visual output. This is especially attractive because of the readiness of Max/MSP/Jitter to process and produce video and other moving images. Using the same parameters in video processing brings up interesting questions about the connection between the auditory and visual sensory systems.

To enhance the usability of the instrument, its robustness needs more attention. Also, in order to compete with traditional instruments, the Sormina should be developed more in the direction of a consumer product.

7. ACKNOWLEDGMENTS
The author would like to acknowledge the important contributions of many people to this project, including Martijn Zwartjes, Risto Linnakoski, and Matti Kinnunen. The author received research funds from the Wihuri Foundation and the Runar Bäckström Foundation. The University of Arts and Design Helsinki has also given grants for the research.

8. REFERENCES
[1] Rubine, D. and McAvinney, P. Programmable Fingertracking Instrument Controllers. Computer Music Journal, Vol. 14, No. 1, Spring 1990, 26-40.
[2] Waisvisz, M. The Hands, A Set of Remote MIDI Controllers. In Proceedings of the 1985 International Computer Music Conference. Computer Music Association, San Francisco, 1985.
[3] Malloch, J. and Wanderley, M. The T-Stick: From Musical Interface to Musical Instrument. In Proc. of the 2007 Conf. on New Interfaces for Musical Expression (NIME-07), 2007, 66-69.
[4] Hunt, A., Wanderley, M. and Paradis, M. The importance of parameter mapping in electronic instrument design. In Proc. of the 2002 Conf. on New Interfaces for Musical Expression (NIME-02), 2002, 149-154.
[5] Sachs, C. The History of Musical Instruments. Norton, New York, 1940, 25-26.
[6] McCullough, M. Abstracting Craft: The Practiced Digital Hand. The MIT Press, Cambridge, Massachusetts, 1996, 1-15.
[7] Kvifte, T. Instruments and the Electronic Age: Toward a Terminology for a Unified Description of Playing Technique. Solum förlag, Oslo, 1988, 1.
[8] Wi-Microdig v6.00/6.1 <http://infusionsystems.com/catalog/info_pages.php?pages_id=153>
Table 2: Control Hardware

Control hardware   Maximum sampling rate   Approx. minimum delay   Native floating point
ATMEL-based        ≈ 20 kHz                ≈ 50 μs                 N
DIMPLE             1 kHz                   < 1 ms                  Y
TFCS               40 kHz                  ≈ 20 μs                 Y
ASP                96 kHz typ.             10 ms typ.              Y
ABSTRACT
This paper presents a new approach for designing acoustic guitars, making use of the virtual environment. The physical connection between users and their instruments is preserved, while offering innovative sound design. This paper will discuss two projects: reAcoustic eGuitar, the concept of a digitally fabricated instrument to design acoustic sounds, and A Physical Resonator For a Virtual Guitar, a vision in which the guitar can also preserve the unique tune of an instrument made from wood.

Keywords
Virtual, acoustic, uniqueness of tune, expressivity, sound processing, rapid prototype, 3D printing, resonator.

1. BACKGROUND
Each acoustic instrument made of wood is unique. Each piece of wood is different, leading to uniqueness of tune of the acoustic sound that is created. Both uniqueness and expressivity are the most important characteristics of the acoustic instrument. Digital instruments lack the uniqueness but usually allow more sound flexibility [1], by offering digital sound processing or synthesis [2].

Digital keyboard instruments have been significantly more successful than bowed or plucked instruments, which suffered from lack of expressivity and uniqueness of tune. On the one hand, the digital instrument can add new interfaces, controllers and sound abilities to the musical experience. On the other hand, there is a significant cost for modeling the captured information into a pre-defined digital structure. Besides the processing problem, it usually leads to decreasing or canceling the uniqueness of tune and expressivity of the instrument.

The main approach to dealing with the expressivity problem lies in the field of sound processing, instead of synthesis. One option within this approach is to capture an expressive signal and modify some parameters while preserving the expressive behavior [3].

We suggest a different approach. We believe that significant work can be done by combining benefits from both of the worlds (digital and physical) – preserving the values of acoustic instruments while applying digital control to their structures.

1.1. Acoustic, Electric and Virtual Guitar
The design of a guitar is influenced by its cultural context. For thousands of years lutes and afterwards guitars evolved: starting with ancient instruments that were made out of natural chambers (turtle shells, gourds), through fine handmade wooden chambers [4], to electrically amplified guitars. Carfoot [7] presents and analyzes the huge changes in the guitar in the 20th century; electric guitars, which use electricity in order to amplify instead of chambers, evolved at mid-century and were a part of the musical revolution of Rock & Roll and its distortion sound.

The guitar has been influenced by electrical technologies. It is to be expected that digital technologies will now take a significant part in the guitar's evolution. While sound design has been conventionally done using digital software, expressive digital instruments are starting to appear as well. The Line 6 Variax [5] guitar gives a variety of preset sounds, from classic acoustic and electric tones to sitar and banjo. It allows the player to plug into a computer and customize a chosen tone. Expressive playing and sound flexibility are enhanced with the digital guitar. Another example is Fender's VG Stratocaster [6], a hybrid electric and digital guitar.

Carfoot uses the term virtual instead of digital. If digital defines the type of process being done, virtual refers better to an experience's context. Like virtual reality, the virtual sound created in a digital environment imitates a real-life experience. This experience feels natural to our senses, but it was created with a computer model of that real-life experience. In sections 2 and 3 we present our approach, using the virtual sound experience in order to create a new physical guitar (a conceptual work). In section 4 we present a different vision in which the guitar can also preserve the unique tune of a material (a work in progress).

2. COMBINING VIRTUAL AND PHYSICAL IN GUITAR DESIGN
3D design, sound design and digital music software are becoming common and easier to use. Their combination is leading to the possibility of designing, simulating and printing objects according to pre-required acoustic behavior.
3. reACOUSTIC eGUITAR
Three perspectives are fundamental to the sound experience
created by a musical instrument: the listener, the performer and
the instrument constructor [12].
The vision of reAcoustic eGuitar invites players to become
creators of their acoustic instruments and their sounds with
endless possibilities for the sounds to be re-shaped. Players will
customize their own sounds by assembling different small
chambers instead of using a single large one. Each string has its
own bridge; each bridge is connected to a different chamber.
Changing the chamber size, material or shape will change the
guitar’s sound.
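As a rough, first-order illustration of how chamber size relates to the resulting sound (not taken from the paper), the classical Helmholtz approximation estimates a chamber's lowest resonance from its volume and opening geometry. All dimensions below are invented for the example.

```python
import math

def helmholtz_frequency(volume_m3, neck_area_m2, neck_length_m, c=343.0):
    """First-order Helmholtz resonance of a chamber:
    f = (c / 2*pi) * sqrt(A / (V * L)).
    In practice an end-corrected neck length should be used."""
    return (c / (2 * math.pi)) * math.sqrt(
        neck_area_m2 / (volume_m3 * neck_length_m))

# Illustrative small printed chamber: 0.5 litre volume,
# 2 cm diameter sound hole, 5 mm effective neck length.
area = math.pi * 0.01 ** 2
f = helmholtz_frequency(0.5e-3, area, 5e-3)
print(round(f), "Hz")
```

Halving the volume raises the resonance by a factor of sqrt(2), which is the kind of relationship a player could exploit when choosing among printable chambers.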
Designing sounds digitally allows the player to share the experience of the constructor. This might lead to a change in the relationship between players and their instruments. Today rapid
prototype materials have a broad range of qualities. Players can
now take part in designing their own acoustic sounds, by
modifying the physical structure of their instruments, revealing
the characteristics of new materials (see Figure 1).
We created a simple chamber in a rapid prototype process. This chamber adds a significant amplification to a single string (see Figure 2), even without optimizing acoustical parameters such as membrane thickness and sound box size.

Figure 1: Constructing principles: Searching, downloading, modifying, printing and assembling the chambers.

In the reAcoustic eGuitar vision, digital technology will be used to design the acoustic guitar structure (see Figure 3 for a design suggestion). It presents a novel sound design experience between users, their objects and the digital environment.
Re-designing the guitar according to the characteristics of rapid prototyping materials could lead to sound innovations. Open-source and shared-file environments could create a reality in which a player downloads or designs his own sound cells and plugs them into his instrument (see Figure 4).
Starting from virtual sound, getting the desired virtual shape and
then printing it, the reAcoustic eGuitar offers a new user
experience for the guitar player.
The main disadvantage of the reAcoustic eGuitar concept lies in
the rapid prototype process itself. The process is expensive and
doesn’t preserve uniqueness of tune as wood does. Perhaps in a
few years, 3D printers will become less expensive and more
accessible so this idea can be reconsidered.
unique tool that also enables the player to design the required
sound with the computer.
The uniqueness of a musical instrument influences more than just
its sound. By differing from other instruments, it assumes an individual economic value and establishes a unique relationship with its owner. The structure of the wood is the main reason for
the acoustic instrument’s unique behavior. The grain of the
soundboard [13], the wood’s humidity, the exact thickness and
more influence how it transfers different frequencies. Luthiers [14,15] used their experience to tune the instrument, making modifications to the wood until it gave the required results.
A Physical Resonator For A Virtual Guitar focuses on the
influences of the chamber on the sound of the acoustic guitar. The
chamber’s main parameters are the shape and material [14,15].
The structure and shape can be virtually designed on a computer
and be used as a virtual chamber. The material will not be
synthesized or modulated. In this way we will get a hybrid
chamber – part of it is physical (the guitar’s resonator) and part of
it is virtual (see Figure 5).
A replaceable slice of the material (the guitar resonator) will be connected to the guitar bridge using a mechanism that enables easy replacement. Piezo sensors will capture the frequencies being developed on the guitar's resonator. The signal will be transferred to a digital signal-processing (DSP) unit. The DSP will modify the sound by simulating different chamber shapes and sizes, thicknesses and surface smoothness.
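A minimal sketch of the virtual half of such a hybrid chamber (hypothetical, not the authors' DSP design) runs the captured signal through a small bank of damped resonators that stand in for chamber modes; all mode frequencies, decay times and gains here are invented.

```python
import math

def resonator_coeffs(freq_hz, decay_s, sr=44100):
    """Two-pole resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2],
    with pole radius set by the mode's decay time."""
    r = math.exp(-1.0 / (decay_s * sr))
    a1 = 2 * r * math.cos(2 * math.pi * freq_hz / sr)
    a2 = -r * r
    return a1, a2

def simulate_chamber(signal, modes, sr=44100):
    """Sum the responses of one resonator per simulated chamber mode."""
    out = [0.0] * len(signal)
    for freq, decay, gain in modes:
        a1, a2 = resonator_coeffs(freq, decay, sr)
        y1 = y2 = 0.0
        for n, x in enumerate(signal):
            y = x + a1 * y1 + a2 * y2
            out[n] += gain * y
            y2, y1 = y1, y
    return out

# Invented modes, loosely in the range of a small guitar body.
modes = [(100.0, 0.2, 1.0), (200.0, 0.1, 0.5), (400.0, 0.05, 0.25)]
impulse = [1.0] + [0.0] * 999
response = simulate_chamber(impulse, modes)
```

Changing the mode table is the virtual analogue of swapping chamber shape or size, while the physical resonator slice continues to supply the material's own character.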
Figure 3: reAcoustic eGuitar, a design suggestion.
5. CONCLUSION AND FUTURE WORK
We believe that the future of the guitar lies in the connection between digital sound design and acoustic experience. Digital processing can create new options for sound design, where the acoustic part of the instrument will give the expressivity and uniqueness of tune. The reAcoustic eGuitar concept is based on rapid prototype techniques and 3D printers. This process is expensive and not accessible to the majority of guitar players. There is not enough knowledge and experience of using rapid prototyping for creating acoustic instruments. However, we believe that this may become more feasible in the future.

A Physical Resonator For A Virtual Guitar is a work in progress. We believe that by creating a chamber that is part virtual and part physical, we will preserve expressivity and uniqueness of tune in digital sound design innovations. We intend to develop a working model for A Physical Resonator For A Virtual Guitar. This process will be divided into different parts - from a mechanical solution for the replaceable resonator through the development of a piezo sensor system that will be able to capture the resonator vibration at different locations. We also intend to develop a DSP unit that will implement the digital modeling of the structure.

6. ACKNOWLEDGMENTS
The authors want to thank MIT Media Laboratory, Marco Coppiardi, Cati Vaucelle, Nan-Wei Gong and Tamar Rucham for their help and support.

7. REFERENCES
[1] Magnusson, T., Mendieta, E. H. The Acoustic, the Digital and the Body: A Survey on Musical Instruments. NIME 07, June 6-10, 2007. New York, New York, USA.
[2] Poepel, C., Overholt, D. Recent Developments in Violin-related Digital Musical Instruments: Where Are We and Where Are We Going? NIME 06, June 4-8, 2006. Paris, France.
[3] Merrill, D., Raffle, H. The Sound of Touch. CHI 2007, April 28 - May 3, 2007, San Jose, California, USA.
[4] Jahnel, F. (1962). Manual of Guitar Technology: The History and Technology of Plucked String Instruments. English version of Die Gitarre und ihr Bau by Harvey, J. C. The Bold Strummer Ltd, First English Edition, 1981.
[5] Line 6, Variax®. Product website line6.com/variax. Last accessed: January 27, 2008.
[6] Fender, VG Stratocaster®. Product website www.fender.com/vgstrat/home.html. Last accessed: January 30, 2008.
[7] Carfoot, G. Acoustic, Electric and Virtual Noise: The Cultural Identity of the Guitar. Leonardo Music Journal, Vol. 16, pp. 35-39, 2006.
[8] Gershenfeld, N. FAB: The Coming Revolution on Your Desktop - From Personal Computers to Personal Fabrication, pp. 3-27. Basic Books, April 12, 2005.
[9] RedEye RPM. Guitar with digital manufacturing technology. Company website www.redeyerpm.com. Last accessed: January 27, 2008.
[10] Jonathan, H. Carbon Fiber vs. Wood as an Acoustic Guitar Soundboard. PHYS 207 term paper.
[11] Blackbird Guitars. Blackbird Rider Acoustic. Company website www.blackbirdguitar.com. Last accessed: January 27, 2008.
[12] Kvifte, T., Jensenius, A. R. Towards a Coherent Terminology and Model of Instrument Description and Design. NIME 06, June 4-8, 2006. Paris, France.
[13] Buksnowitz, C., Teischinger, A., Muller, U., Pahler, A., Evans, R. Resonance wood [Picea abies (L.) Karst.] - evaluation and prediction of violin makers' quality-grading. J. Acoustical Society of America 121, 2007.
[14] Kinkead, J. Build Your Own Acoustic Guitar: Complete Instructions and Full-Size Plans. Hal Leonard, 2004.
[15] Cumpiano, W. R., Natelson, J. D. Guitarmaking: Tradition and Technology: A Complete Reference for the Design & Construction of the Steel-String Folk Guitar & the Classical Guitar (Guitar Reference). Chronicle Books, 1998.
Dylan Menzies
Dept. Computer Science and Engineering
De Montfort University
Leicester, UK
rdmg@dmu.ac.uk
displacement
profile at the contact point to generate an audio excitation.
pulse shorter Rolling is similar to sliding, except there is no relative
because k increases
movement at the contact point, resulting in a spectrally
above threshold
less bright version of the sliding excitation. This can be
modeled by appending a lowpass filter that can be varied
constant k/m pulses
according to the slip speed at the contact, creating a strong
cue for the dynamics there. See Figure 3. A second or-
time
der filter is useful to shape the spectrum better. The con-
tact excitation is also amplified by the normal force, in the
same way impacts are modified by collision energy. More
Figure 1: Displacements from three impacts, one of
subtle are modifications to spectral brightness according to
which is stiff.
the m/k ratio that determines the brightness of an impact.
Low m/k corresponds to a light needle reading the surface
at full brightness. Heavier objects result in slower response,
which can modeled again by controlling the lowpass filter.
contact layer Although simple, this efficient model is effective because it
surface
contact surface profile lowpass gain exittion
speed generator
Figure 2: A grazing impact. / position freq
m/k
Impact stiffness is important for providing cues to the slip speed
listener about impact dynamics, because it causes spec-
tral changes in the sound depending on impact strength, normal force
whereas impact strength judged from the amplitude level
of an impact received by a listener is ambiguous because Figure 3: Surface excitation from rolling and slid-
of the attenuating effect of distance. Stiffness can be mod- ing.
eled by making the spring constant increase with impact
displacement. This causes an overall decrease in impact takes in the full dynamic information of the contact and
duration for an increase in impact amplitude, and makes it uses it to shape the audio which we then correlate with the
spectrally brighter, illustrated in Figure 1. The variation visual portrayal of the dynamics. It is also easily customized
in stiffness with impulse is a property of the surface and to fit the sound designers requirements. When flat surfaces
can be modeled reasonably well with a simple breakpoint are in contact over a wide area this can be treated as sev-
scheme, that can be tuned by the sound designer directly. eral spaced out contact points, which can often be supplied
Increasing brightness with note loudness is an important directly by the dynamics-collision system.
attribute of many musical instruments, acoustic and elec-
tronic, and is rooted in our everyday physical experience. 2.2.2 Contact jumps
It might even be called a universal element of expression. Even for a surface that is completely solid and smooth,
Phya incorporates this behavior naturally. the excitations do not necessarily correspond very well with
the surface profile. A contact may jump creating a small
2.1.3 Multiple hits and grazing micro-impact, due to the blunt nature of the contact sur-
Sometimes several hits can occur in rapid succession. A faces, see Figure 4. The sound resulting from this is signifi-
given physics engine would be capable of generating this im- cant and cannot be produced by reading the surface profile
pact information down to a certain time scale. The effect can be simulated by generating secondary impulses according to a simple Poisson-like stochastic process, so that for a larger impact the chance of secondary impacts increases. Also common are grazing hits, in which an impact is associated with a short period of rolling and sliding. This is because the surfaces are uneven, and the main impulse causing the rebound occurs during a period of less repulsive contact. Such fine dynamics cannot be captured by a typical physics engine. However, good results can be achieved by combining audio impulse generation with continuous contact generation, according to the speed of collision and angle of incidence, see Figure 2. The component of velocity parallel to the surface is used as the surface contact speed.

2.2 Continuous contacts

2.2.1 Basic model

Continuous contact generation is a more complex process. The first method introduced, [13], was to mimic a needle following the groove on a record. This corresponds to a contact point on one surface sliding over another surface, and is implemented by reading or generating a surface directly. Again, the detailed modeling of the surface interactions is beyond the capabilities available from dynamics and collision engines, which are not designed for this level of detail. Good results can instead be achieved by adding the jumps, pre-processed, into the profile, Figure 5. Downsampling a jump results in a bump, unless it is sampled with sufficient initial resolution, which may be impractical. A useful variation is therefore to downsample jumps to jumps, by not interpolating. This retains the ’jumpiness’ and avoids the record-slowing-down effect.

2.2.3 Programmatic and stochastic surfaces

Stored profiles can be mapped over surface areas to create varying surface conditions. This can be acceptable for sparse jump-like surfaces that can be encoded at reduced sample rates, but in general the memory requirements can be unreasonable. An alternative is to describe surfaces programmatically, in either a deterministic or fully stochastic way. The advantage of a largely deterministic process is that repetitions of a surface correlate closely, for instance when something is rolling back and forth, providing consistency cues to the dynamic behavior even without visuals. Indexable random number generators provide a way to deterministically generate random surfaces. Others include
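One way such an indexable generator can work is to hash a seed together with the sample index, so that any point of the surface can be queried directly and reproducibly. This is an illustrative Python sketch under assumed names, not Phya's C++ implementation:

```python
import hashlib

def surface_height(seed: int, index: int) -> float:
    """Deterministic, indexable random surface height in [-1, 1).

    Unlike a sequential RNG, any sample can be queried directly by its
    index, so a contact that rolls back and forth over the same region
    sees exactly the same profile each time.
    """
    digest = hashlib.blake2b(f"{seed}:{index}".encode(), digest_size=8).digest()
    value = int.from_bytes(digest, "big") / 2**64  # uniform in [0, 1)
    return 2.0 * value - 1.0

def surface_profile(seed: int, start: int, count: int) -> list[float]:
    """Sample a window of the surface without storing any of it."""
    return [surface_height(seed, i) for i in range(start, start + count)]
```

Because nothing is stored, memory cost is constant regardless of surface area, which is the motivation given above for programmatic surfaces.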
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
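The Poisson-like secondary-impact process described earlier can be sketched as follows. This is illustrative Python, not Phya code; the rate and amplitude scalings are invented for the example:

```python
import random

def secondary_impacts(main_amplitude, rate_per_ms=0.5, window_ms=20.0, seed=None):
    """Schedule secondary micro-impacts following a main impact.

    Events are drawn from a Poisson-like process whose rate grows with
    the main impact amplitude, so a larger impact has a greater chance
    of producing secondary impacts.  Returns (time_ms, amplitude) pairs.
    """
    rng = random.Random(seed)
    rate = rate_per_ms * main_amplitude  # stronger impact -> more events
    events = []
    t = rng.expovariate(rate) if rate > 0 else float("inf")
    while t < window_ms:
        # secondary impacts are weaker than the main one and die off in time
        amp = main_amplitude * rng.uniform(0.1, 0.4) * (1.0 - t / window_ms)
        events.append((t, amp))
        t += rng.expovariate(rate)
    return events
```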
Figure 4: Micro-impact occurring due to contact geometry.

Figure 5: Preprocessing a surface profile to include jumps.

[Figure 6 diagram: normal force → filter → event generator → amplitude pulse → filter → resonance.]

Figure 6: Modeling loose surface particle sound.

parameters are used to determine the activity rate of a Poisson-like process, which then generates impulses mimicking the collisions of gravel particles. A low frequency lowpass filter is used to simulate the duration of the particle spray following an impact. The impulses have randomly selected amplitudes and are shaped or filtered to reflect increased particle collision brightness with increased contact force and speed, before exciting a particle resonance. This model simplifies the fact that at high system collision energies there will still be particle collisions occurring at low energy. It also assumes all particles have the same resonance. The model does however have sufficient dynamic temporal and spectral behavior to be interesting. Three levels of dynamics can be distinguished here: the gross object dynamics, the simulated gravel dynamics, and audio resonance. The detail that can be encoded in surface excitations is critical from the musical point of view. It provides the foundation from which the full sound evolves.

2.2.4 Friction

Friction stick and slip processes are important in string instruments. In virtual environments they are a much less common source of sound than the interactions considered so far. A good example is door creaking, which is visually linked to the movement of the door. Stick and slip for discrete solid objects is simulated well by the generation of pulses at regular linear or angular intervals, the amplitude and spectral profile of the pulses being modified as the contact force and speed change. As contact force increases, the interval between each pulse normally increases, due to the increased static friction limit, with a more or less constant lateral spring constant.

2.2.5 Buzzing

Common phenomena are buzzing and rattling at a contact, caused by objects in light contact that have been set vibrating. Like impact stiffness, it provides a distance-independent cue of dynamic state, which in this case is the amplitude of vibration. Objects that are at first very quiet can become loud when they begin to buzz, due to the nonlinear transfer of low frequency energy up to higher frequencies that are radiated better. Precise modeling of this with a dynamics-collision engine would be infeasible. However, the process can be modeled well by clipping the signal from the main vibrating object, as shown in Figure 7, and feeding it to the resonant objects that are buzzing against each other. The process could be made more elaborate by calculating the mutual excitation due to two surfaces moving against each other.

2.3 Resonators

2.3.1 Modal resonators, calibration, location dependence

There are many types of resonator structure that have been used to simulate sounding objects. For virtual environments we require a minimal set of resonators that can be easily adapted to a wide variety of sounds, and can be efficiently run in numbers. The earliest forms of resonator used for this purpose were modal resonators [5, 13], which consist of parallel banks of second order resonant filters, each with individual coupling constants and damping. These are particularly suited to objects with mainly sharp resonances, such as solid objects made from glass, stone and metal. It is possible to identify spectral peaks in the recording of such an object, and also the damping, by tracking how quickly each peak decays [11]. A command line tool is included with Phya for automating this process. The resultant data is many times smaller than even a single collision sample. Refinements to this process included sampling over a range of impact points, and using spatial sound reconstruction. The associated complexities were not considered a priority in Phya. Hitting an object in different places produces different sounds, but just hitting an object in the same place repeatedly produces different sounds each time, due to the changing state of the resonant filters. It is part of the attraction of physical modeling that such subtleties are manifested. If needed, a collision object can be broken up into
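The stick-slip pulse scheme of Section 2.2.4 can be sketched as follows. This is an illustrative Python fragment, not the Phya implementation; the scaling constants are invented:

```python
def stickslip_pulses(distance, normal_force, base_interval=0.01, force_scale=0.5):
    """Positions (in metres of slip) at which stick-slip pulses fire.

    The spatial interval between pulses grows with normal force,
    modelling the raised static-friction limit with a roughly constant
    lateral spring constant; pulse amplitude also grows with force,
    since more energy is stored in the lateral spring before each slip.
    Returns (position, amplitude) pairs.
    """
    interval = base_interval * (1.0 + force_scale * normal_force)
    n = int(distance / interval)
    amplitude = normal_force * interval  # energy released per slip event
    return [(i * interval, amplitude) for i in range(1, n + 1)]
```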
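The clipping model of buzzing (Section 2.2.5, Figure 7) reduces to a few lines. The threshold and gain parameters below are illustrative assumptions, not Phya's:

```python
def buzz_excitation(signal, threshold=0.3, gain=1.0):
    """Derive a buzz excitation by clipping the main object's output.

    Below the threshold the buzz is silent; above it, the clipped
    residue (rich in high harmonics) is passed on to excite the lightly
    touching object, mimicking the nonlinear transfer of low-frequency
    energy up to better-radiated high frequencies.
    """
    out = []
    for x in signal:
        if x > threshold:
            out.append(gain * (x - threshold))
        elif x < -threshold:
            out.append(gain * (x + threshold))
        else:
            out.append(0.0)
    return out
```

The resulting signal would then be fed into the resonators of the objects that are buzzing against each other.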
several different collision objects, and different Phya sound objects associated with these.

2.3.2 Diffuse resonance

For a large enough object of a given material the modes become very numerous and merge into a diffuse continuum. This coincides with the emergence of time domain structure at scales of interest to us, so that for instance a large plate of metal can be used to create echoes and reverberation. For less dense, more damped material such as wood, pronounced diffuse resonance occurs at modest sizes, for instance in chairs and doors. Such objects are very common in virtual environments, and yet a modal resonator is not able to model diffuse resonance efficiently, or be matched to a recording. Waveguide methods have been employed to model diffuse resonance using either abstract networks, including banded waveguides [4] and feedback delay networks [9], or more explicit structures such as waveguide meshes [14, 15]. An alternative approach, introduced in [6], is to mimic a diffuse resonator by dividing the excitation into frequency bands, and feeding the power in each into a multi-band noise generator, via a filter that generates the time decay for each band, see Figure 8. This perceptual resonator provides a diffuse response that responds to the input spectrum. When combined with modal modeling for lower frequencies it can efficiently simulate wood resonance, and can be easily manipulated by the sound designer. A similar approach had been used in [10] to simulate the diffuse resonance of soundboards to hammer strikes; the difference here is that the resonator follows the spectral profile of a general input.

[Figure 8 diagram: bandpass, envelope follower and lowpass stages drive noise sources through bandpass filters and gains, summed at the output.]

There is a common class of objects that are not completely rigid, but still resonate clearly, for example a thin sheet of metal. Such objects have variable resonance characteristics depending on their shape. While explicit modeling of the resonance parameters according to shape is prohibitive, an excellent qualitative effect that correlates well with visual dynamics is to vary the resonator parameters about a calibrated set, according to variations of shape from the nominal. This can be quantified in a physical model of a deformable body by using stress parameters or linear expansion factors. The large scale oscillation of such a body modulates the audio frequencies, providing an excellent example of audiovisual dynamic coupling.

2.4 Phya overall structure and engine

Phya is built in the C++ language, and is based around a core set of general object types that can be specialized and extended. Sounding objects are represented by a containing object called a Body, which refers to an associated Surface and Resonator object, see Figure 9. Specializations of these include SegmentSurface for recorded surface profiles, RandSurface for deterministically generated stochastic surfaces, and GridSurface for patterns. The resonator subtypes are ModalResonator and PerceptualResonator. Bodies can share the same surface and resonator if required, in order to handle groups of objects more efficiently. Collision states are represented using Impact and Contact objects that are dynamically created and released as collisions occur between physical objects. These objects take care of updating the state of any associated surface interactions.

[Figure 9 diagram: each body links a resonator and a surface; impact and contact generators connect pairs of bodies.]

Figure 9: Main objects in Phya.
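The object relationships of Figure 9 can be mirrored in a toy Python sketch. Phya itself is C++, and the minimal Surface and Resonator behaviour below is invented purely to make the structure concrete:

```python
class Resonator:
    """Accumulates excitation energy (stand-in for a real resonator)."""
    def __init__(self):
        self.energy = 0.0
    def excite(self, amount):
        self.energy += amount

class Surface:
    """Turns contact speed and force into an excitation level."""
    def __init__(self, roughness):
        self.roughness = roughness
    def excitation(self, speed, force):
        return self.roughness * speed * force

class Body:
    """A sounding object: pairs a Surface with a Resonator."""
    def __init__(self, surface, resonator):
        self.surface, self.resonator = surface, resonator

class Contact:
    """Lives while two bodies touch; feeds each body's resonator from
    its own surface, driven by the shared contact speed and force."""
    def __init__(self, body1, body2):
        self.bodies = (body1, body2)
    def update(self, speed, force):
        for b in self.bodies:
            b.resonator.excite(b.surface.excitation(speed, force))
```

As in Phya, two bodies may share one resonator to handle a group of objects efficiently.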
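A minimal modal resonator in the sense of Section 2.3.1 is a parallel bank of second-order resonant filters, each with its own frequency, damping and coupling gain. The following is an illustrative Python sketch, not Phya's ModalResonator:

```python
import math

class ModalResonator:
    """Parallel bank of two-pole resonant filters (one per mode)."""
    def __init__(self, modes, sample_rate=44100):
        # modes: list of (frequency_hz, t60_seconds, coupling_gain)
        self.sections = []
        for freq, t60, gain in modes:
            r = 10 ** (-3.0 / (t60 * sample_rate))  # pole radius from T60
            a1 = -2.0 * r * math.cos(2 * math.pi * freq / sample_rate)
            a2 = r * r
            self.sections.append([a1, a2, gain, 0.0, 0.0])  # + 2 state vars

    def tick(self, x):
        """Advance one sample: sum of y[n] = g*x[n] - a1*y[n-1] - a2*y[n-2]."""
        y = 0.0
        for s in self.sections:
            a1, a2, gain, y1, y2 = s
            y0 = gain * x - a1 * y1 - a2 * y2
            s[3], s[4] = y0, y1
            y += y0
        return y
```

Feeding an impulse in makes each mode ring as a damped sinusoid; the per-mode data (frequency, decay, gain) is exactly the compact calibration data the text describes extracting from recordings.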
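The band-envelope idea behind the perceptual resonator of Section 2.3.2 (Figure 8) can be sketched per processing frame. The bandpass splitting and recombination stages are omitted for brevity, and all names and parameters are illustrative assumptions:

```python
import random

class PerceptualResonator:
    """Per-band envelopes with individual decays scaling noise sources.

    Each band tracks the envelope of its share of the excitation,
    decays it with a one-pole filter set from the band's T60, and uses
    the result to scale a noise source, giving a diffuse response that
    follows the input spectrum.
    """
    def __init__(self, t60s, frame_rate=100.0, seed=0):
        self.decays = [10 ** (-3.0 / (t60 * frame_rate)) for t60 in t60s]
        self.levels = [0.0] * len(t60s)
        self.rng = random.Random(seed)

    def process(self, band_powers):
        out = []
        for i, p in enumerate(band_powers):
            # envelope follower: rise instantly, decay at the band's rate
            self.levels[i] = max(p, self.levels[i] * self.decays[i])
            out.append(self.levels[i] * self.rng.uniform(-1.0, 1.0))
        return out
```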
2.4.2 Tracking contacts

Most collision engines do not use persistent contacts, meaning they forget information about contacts from one collision frame to another. Phya, on the other hand, wishes to remember contacts, because it has audio processes that generate excitations continuously during a contact. The problem can be attacked either by modifying the collision engine, which may be hard or impossible, or by searching contact lists. In the simplest case, the physics engine provides a list of non-persistent physical contacts at each collision step, and no other information. For each physical contact, the associated Phya bodies can be found and compared with a list of current Phya contact pairs. If no pair matches, a new Phya contact is formed. If a pair is found, it is associated with the current physical contact. For any pairs left unmatched, the associated Phya contact is released. See Figure 11. This works on the, mostly true, assumption that if a physical contact exists between two bodies in two successive frames then it is one continuous contact evolving. If two bodies are in contact in more than one place then some confusion can occur, but this is offset by the fact that the sound is more complex. Engines that keep persistent contacts are easier to handle. The ability to generate callbacks when contacts are created and destroyed helps even more.

[Figure 11 diagram: a Physical Contact between Physical Body1 and Physical Body2 is matched, by looking in the list of Phya Contacts, to the Phya Contact between Phya Body1 and Phya Body2.]

Figure 11: Find a Phya contact from a Physical contact.

2.4.3 Smooth surfaces

Another problem of continuous contacts arises from the collision detection of curved surfaces. For example the collision of a cylinder can be detected using a dedicated algorithm, or a more general one applied to a collision net that approximates a cylinder. From a visual dynamic point of view the general approach may appear satisfactory. However, the dynamic information produced may lead to audio that is clearly consistent with an object with corners, and not smooth. A way to improve this situation is to smooth the dynamic information, using linear filters, when it is intended that the surface is smooth. This requires Phya to check the tags on the physical objects associated with a new contact to see if smoothing is intended.

2.4.4 Limiters

The unpredictable nature of physical environmental sound requires automated level control, both to ensure it is sufficiently audible and also not so loud as to dominate other audio sources or clip the audio range. This has already been partly addressed at the stage of excitation generation; however, because of the unpredictability of the whole system, it is also necessary to apply limiters to the final mix. This is best achieved with a short look-ahead brick wall limiter, which can guarantee a limit while also reducing the annoying artifacts that would be caused without any look-ahead. Too much look-ahead would compromise interactivity; however, the duration of a single audio system processing vector, which is typically 128 samples, is found to be sufficient.

3. A VIRTUAL MUSICAL INSTRUMENT

While Phya was designed for general purpose virtual worlds, the variety and detail of sonic interactions on offer lend themselves to the creation of musical virtual instruments. From a more abstract view, the layered, multi-scale dynamics within Phya capture the layered dynamics present in real acoustic instruments. It is sometimes claimed that this structure is particularly relevant to musical performance [8]. Electronic performance systems often fail to embody the full range of dynamic scales, even within physically modeled instruments, which sometimes lack physical control interfaces with appropriate embedded dynamics.

Although grounded in physical behavior, and therefore naturally appealing to human psychology, the intimate interactions can be tailored to more unusual simulations that would be difficult or impossible in the real world. For instance, very deep resonances that would require very heavy objects can easily be created, as can unusual resonances. Likewise, the parameters of surfaces can be composed to ensure the desired musical effect. The physical behavior of objects can be matched to any desired scale of distance, time or gravity. Because the graphical world is virtual, it too can be composed artistically with more freedom than the real world.

The graphical output not only provides additional feedback to the performer, but adds the kind of intimate visual association that is present in traditional musical performance but lacking in much live electronic music, especially that focused around keyboard and mouse control. Phya provides the audience with an alternative to the performer as a visual focus. The mouse interface is readily extended to a more haptically and visually appropriate controller using a device such as a Nintendo Wii remote. This has the effect of making the control path correspond directly to the object path, improving the sense of immersion for the performer. In a CAVE-like environment the performer can maneuver within a spatial audio environment, although without an audience. In a full headset virtual reality environment, the performer can interact directly with objects through virtual limbs, with virtual co-performers and a virtual audience.

While Phya has not yet been used to produce an extended musical work, we discuss musical aspects of some demonstrations. Figures 12 and 13 show simple examples of sonic toys constructed with Phya. In the first, nested spheres form a kind of virtual rattle, with the lowest resonance associated with the biggest sphere. The user interacts by dragging the middle sphere around by invisible elastic. The second shows a deformable teapot with a range of resonances. The deformation parameters are used to modify the resonant frequencies on the fly. The effect is at once familiar and surreal. Further examples demonstrate the stacking of many different resonant blocks. Configuring groups of blocks becomes a musical, zen-like process.

Figure 12: Nested sonic spheres.

4. COPING WITH NETWORK LATENCY

There has been considerable interest in collaborative interactive musical performance over networks. One aspect of such systems is the delay or latency required to transmit information around the network, which can be musically significant for long distance collaborations. In the case of
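The pair-matching procedure of Section 2.4.2 can be sketched as follows. This is illustrative Python; Phya's actual Contact management is C++ and richer:

```python
def track_contacts(active, pairs_this_step, on_new=None, on_release=None):
    """Maintain persistent audio contacts from per-step collision pairs.

    `active` maps a sorted (body1, body2) pair to its audio-contact
    record.  `pairs_this_step` is the physics engine's non-persistent
    contact list.  A pair seen in successive steps is assumed to be one
    evolving contact; unmatched pairs are released, new ones created.
    """
    seen = {tuple(sorted(p)) for p in pairs_this_step}
    for pair in list(active):
        if pair not in seen:
            contact = active.pop(pair)      # bodies separated: release
            if on_release:
                on_release(contact)
    for pair in seen:
        if pair not in active:
            active[pair] = {"pair": pair}   # new audio contact formed
            if on_new:
                on_new(active[pair])
    return active
```

With an engine that offers creation/destruction callbacks, the search over `active` becomes unnecessary.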
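A naive look-ahead brick-wall limiter along the lines of Section 2.4.4 can be sketched as below. For brevity this sketch omits the compensating delay line and gain smoothing of a production limiter; the gain at each sample is simply taken from the loudest peak in the upcoming window:

```python
def lookahead_limit(samples, threshold=1.0, lookahead=128):
    """Guarantee |output| <= threshold using a short look-ahead.

    Because the gain is computed from the peak over the next `lookahead`
    samples (e.g. one 128-sample processing vector), it can come down
    before a peak arrives rather than clipping it abruptly.
    """
    padded = samples + [0.0] * lookahead
    out = []
    for n in range(len(samples)):
        peak = max(abs(x) for x in padded[n:n + lookahead + 1])
        gain = min(1.0, threshold / peak) if peak > 0 else 1.0
        out.append(samples[n] * gain)
    return out
```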
Figure 13: Deformable sonic teapot.

performance with acoustic instruments, it is impossible to make each side hear the same total performance while also playing their instruments normally. Virtual instruments of the kind described here offer another possibility, due to the fact that the dynamics of the virtual world is strictly separated from the control in the outer world. Figure 14 shows a collaboration between two performers across a network. Adding local delays to match the network latency keeps the two virtual worlds synchronized. In each world the audio and graphical elements are of course synchronized. Performance gestures are delayed, but this is not such a severe handicap because the visual feedback remains synchronized, and is a price worth paying to maintain overall synchronization over the network. If control is by force rather than position, the gesture delay is even less intrusive. To eliminate drift between the virtual worlds, and to handle many performers efficiently, a central virtual world can be used, as shown in Figure 15. This adds return latency delays.

[Figure 14 diagram: each performer's control passes through a local Delay D into their own Virtual world; the two worlds are linked by network Latency D.]

Figure 14: Two performers with local virtual worlds.

[Figure 15 diagram: performers connect to a central Virtual world through Latency 1, Latency 2, ….]

Figure 15: Many performers with a central virtual world.

5. BACK TO REALITY

The aesthetics of Phya partly inspired a tangible musical performance piece, which we mention briefly because it provides an interesting example of how the boundary between virtual and real can become blurred. Ceramic Bowl¹ centers around a bowl with 4 contact microphones attached around the base, where there is a hole. Objects are launched manually into the bowl, where they roll, slide and collide in orbit until they exit. The captured sound is computer processed under realtime control and diffused onto an 8 speaker rig. The microphone arrangement allows the spatial sound events to be magnified over a large listening area.

6. CONCLUSION

The original goal was to create a system that can capture the sonic nuance and variety of collisions, and that is easy to configure and use within a virtual reality context. This required the consideration of a variety of inter-dependent factors. The result is a system that is not only useful from the point of view of virtual reality, but has natural aesthetic interest and application in musical performance. The integrated graphical output is part of a fused perceptual aesthetic. Phya is now an open source project.²

7. REFERENCES

[1] F. Avanzini, M. Rath, and D. Rocchesso. Physically-based audio rendering of contact. In Proc. IEEE Int. Conf. on Multimedia and Expo (ICME2002), Lausanne, volume 2, pages 445–448, 2002.

[2] F. Avanzini, S. Serafin, and D. Rocchesso. Interactive simulation of rigid body interaction with friction-induced sound generation. IEEE Trans. Speech and Audio Processing, 13(5):1073–1081, 2005.

[3] P. Cook. Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music Journal, 21(3), 1997.

[4] G. Essl, S. Serafin, P. Cook, and J. Smith. Theory of banded waveguides. Computer Music Journal, Spring 2004.

[5] J. K. Hahn, H. Fouad, L. Gritz, and J. W. Lee. Integrating sounds and motions in virtual environments. In Sound for Animation and Virtual Reality, SIGGRAPH 95, 1995.

[6] D. Menzies. Perceptual resonators for interactive worlds. In Proceedings AES 22nd International Conference on Virtual, Synthetic and Entertainment Audio, 2002.

[7] D. Menzies. Scene management for modelled audio objects in interactive worlds. In International Conference on Auditory Display, 2002.

[8] D. Menzies. Composing instrument control dynamics. Organised Sound, 7(3), April 2003.

[9] D. Rocchesso and J. O. Smith. Circulant and elliptic feedback delay networks for artificial reverberation. IEEE Trans. Speech and Audio, 5(1), 1997.

[10] J. O. Smith and S. A. Van Duyne. Developments for the commuted piano. In Proceedings of the International Computer Music Conference, Banff, Canada, 1995.

[11] K. van den Doel. Sound Synthesis for Virtual Reality and Computer Games. PhD thesis, University of British Columbia, 1998.

[12] K. van den Doel. Physically-based models for liquid sounds. ACM Transactions on Applied Perception, 2:534–546, 2005.

[13] K. van den Doel, P. G. Kry, and D. K. Pai. FoleyAutomatic: Physically-based sound effects for interactive simulation and animation. In Computer Graphics (ACM SIGGRAPH 01 Conference Proceedings), 2001.

[14] S. A. Van Duyne and J. O. Smith. Physical modeling with the 2-D digital waveguide mesh. In Proc. Int. Computer Music Conf., Tokyo, 1993.

[15] S. A. Van Duyne and J. O. Smith. The 3D tetrahedral digital waveguide mesh with musical applications. In Proceedings International Computer Music Conference, 2001.

¹ First performed at the Electroacoustic Music Studies conference, Leicester, 14 June 2007. Broadcast on BBC Radio 3 Hear and Now, 25 August 2007.
² Details at www.zenprobe.com/phya
definition of virtuosity put forward by Dobrian and Koppelman [1]: “the ability to call upon all the capabilities of an instrument at will with relative ease.” As the authors point out, when working with computers it does not make sense to judge virtuosity only by the factor of speed, because computers can unquestionably play faster than humans.

When a performer has achieved virtuosity on an instrument, many levels of control and technique have become subconscious, and “when control of the instrument has been mastered to the point where it is mostly subconscious, the mind has more freedom to concentrate consciously on listening and expression.” [1]

Virtuosic performers are highly valuable to composers and instrument designers. Without virtuosic performers, and instruments capable of adequate expression, composers cannot hear their music fully realized. In many cases, instrument designers and programmers have to rely on their own, often limited, performing skills when first testing a new piece or instrument.

Etudes help to develop virtuosity, and therefore play a crucial role in further developing a repertoire for an instrument. Without etudes, players of acoustic instruments would not be able to handle the technique needed to perform musical works, and composers would not have performers to play the music they imagine. As it says in the New Grove Dictionary, “the true virtuoso has always been prized not only for his rarity but also for his ability to widen the technical and expressive boundaries of his art.” [4]

3. STARTING AT THE BEGINNING

3.1 Ordering the Series

My initial series of etudes includes ten graduated studies that introduce the basic skills needed to manipulate different elements of musical sound. The series is designed for a beginner or novice performer on interactive instruments, and aims to create a non-intimidating experience for a musician with little or no previous experience with electronic music.

In choosing which musical elements and types of controls to include in the etudes, and in which order they will appear, I have also created a priority list. Undoubtedly, my etudes focus on the skills and musical elements most likely to be needed for my own compositions. However, I have tried to make the etudes stylistically diverse. By the end of the series the performer will have experience with triggers, toggles, and more fluid or constant parameters.

3.2 The Etudes

Each etude contains four elements: 1 – a basic description of the purpose and intent of the etude, including a simulation performance of the etude; 2 – a graphically notated score; 3 – the Max/MSP etude patch; and 4 – a Max/MSP patch that will be used to connect the interactive instrument to the etude patch.

Etude 1 introduces the performer to different approaches to rhythm and synchronization. At times rhythmic freedom is encouraged, and at times strict rhythm is required. Etude 2 focuses on pitch control, while Etude 3 focuses on dynamic, or volume, control. Etude 4 combines the elements of rhythm, pitch, and volume control. Etude 5 focuses on spatialization and localization, and Etude 6 on timbre and envelope manipulation. Etude 7 combines the elements used in the first six etudes. Etude 8 introduces different methods of synthesis (for example, granular and FM), and Etude 9 is a study in changing tempos. Etude 10, the final etude in the series, brings together all skills learned in the earlier etudes.

Each of these introductory etudes is notated along a timeline that the performer must follow, using a clock that has been placed in the etude patch (see Figure 2). The main goal is for the performer to become fluent enough on the instrument in these basic control parameters so that when further complexity is added the performer will be ready.

Figure 2. Example of Notation

Complexity is increased gradually throughout the series. It is understood that the level of complexity might depend on the characteristics of each interactive instrument. The main method of adding complexity is to increase the number of different control elements (for example, the number of triggers or different layers of sounds to be controlled) or to increase the speed at which these elements need to be controlled. The first three etudes use only one dimension, layer, or direction of moveable data (constant flow between 0 and 127). Etudes 5 and 6 will involve two such layers. For example, one stream of data could control volume, and the other spatialization. With some instruments or mappings, the gestures that control this data may be completely separate (such as with a keyboard, or different pedals), and with others they may be more connected (such as with a Wii, glove, or mouse). The final etudes will be the most complex, including many control parameters and requiring more intricate synchronization.

However, it is important to keep in mind that for now this is a series of beginner etudes, designed to prepare a beginning performer for future compositions that may require a much higher level of complexity and technique.

4. COMPATIBILITY

4.1 A Universal Interface

One of the most important features of these etudes is their adaptability to many different controllers. Each etude is designed so it is playable by any device that can produce the required types of data. The interface for each etude lists the data needed and provides the necessary links into the etude. For example, Etude 1 requires an instrument that can produce eight separate triggers for sample playback (see Figure 3).

Different mappings and interpretations can easily be tried with each etude. This flexibility will allow performers to practice different movements for different musical parameters, helping them to assess which of the movements will work best. Performers can gain a deeper understanding of the particular strengths and weaknesses of their instrument.

The etudes do not require specific movements, so the performer can choreograph all the gestures. For example, depending on the instrument being used, different actions can activate each trigger; different parameters (position, amplitude, pitch) can produce the same types of continuous numbers – yet the
resulting sounds will always be the same. Similar gestures, listening skills, and types of coordination are used by a large number of interactive instruments. Therefore, the skills a performer develops while learning this series of etudes on one controller will very likely be transferable to other controllers.

marimba. The various strengths and weaknesses of each instrument become quickly apparent when repertoire is shared. Also, many composers, notably John Cage, have written pieces for open instrumentation. Performances of these works can vary widely depending on the instruments chosen. Traditional etudes are also typically practised using a variety of approaches that challenge players in a variety of ways (for example, with different articulations or dynamic levels).
The etudes may also be a good test of which type of controller might be best suited for a certain piece of music. This could be especially useful while the piece is still being composed. A more skilled performer could easily learn these basic etudes on several different controllers and quickly evaluate their effectiveness on many musical levels. As Wanderley and Orio state, “Musical tasks are already part of the evaluation process of acoustic musical instruments, because musicians and composers seldom choose an instrument without extensive testing of how specific musical gestures can be performed.” [6]

Figure 6. Etude 2 User Interface

5. CONCLUSIONS

My primary goals in writing these etudes are to:

1. Create a learning environment in which beginners can experience a non-intimidating introduction to interactive performance.

2. Encourage other composers and performers to create their own etudes and pieces that can be exchanged to broaden the level of shared knowledge, and help to define the skills needed for performing on interactive electronic instruments.

3. Create a tool that will guide performers and instrument builders towards higher levels of control and musical expression.

Interactive electronic music is an emerging field that has yet to solidly establish a repertoire or performance practice. I believe one of the most important steps in developing both of these fundamental parts of a musical genre is to create a method for learning performance technique. In the near future I hope to see strong performances of well-written pieces replacing the demonstrations and experiments that currently occupy many concert spots. For this to occur I believe composers, instrument designers and performers must work together. These etudes can strengthen such collaborations by providing a foundation for evaluation of both the instrument and the performer. This basis for evaluation is an essential ingredient in building a lasting repertoire for interactive instruments.

6. ACKNOWLEDGMENTS

I am very grateful to Dr. Bob Pritchard and Dr. Keith Hamel for their support of this project, programming help, and generous feedback on my work. Thank you also to my husband Michael Begg for his invaluable editing skills. This project is supported in part through the Social Science and Humanities Research Council of Canada, grant 848-2003-0147, and by the University of British Columbia Media And Graphics Interdisciplinary Centre (MAGIC), the UBC Institute for Computing, Information and Cognitive Science (ICICS), and the School of Music.

7. REFERENCES

[1] Dobrian, C., and Koppelman, D. “The ‘E’ in NIME: Musical Expression with New Computer Interfaces”. Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME06), Paris, France, 2006.

[2] Fels, S., Gadd, A., and Mulder, A. “Mapping transparency through metaphor: towards more expressive musical instruments”. Organised Sound 7:2, 109–126. Cambridge University Press, 2002.

[3] Ferguson, H., and Hamilton, K. L. “Study”. Grove Music Online. L. Macy, ed. http://www.grovemusic.com

[4] Jander, O. “Virtuoso”. Grove Music Online. L. Macy, ed. http://www.grovemusic.com

[5] Iazzetta, F. “Meaning in Musical Gesture”. Trends in Gestural Control of Music, M. M. Wanderley and M. Battier, eds. Paris, Fr: IRCAM – Centre Georges Pompidou, 2000.

[6] Wanderley, M. M., and Orio, N. “Evaluation of Input Devices for Musical Expression: Borrowing Tools from HCI”. Computer Music Journal 26:3, 62–76. MIT Press, 2002.
2] for fast multidimensional search.

3. METHOD

In evaluating a musical interface such as the above, we wish to develop a qualitative method which can explore issues such as expressivity and affordances for users. Longitudinal studies may be useful, but imply a high cost in time and resources. Therefore our design aims to provide users with a brief but useful period of exploration of a new musical interface, including interviews and discussion which we can then analyse.

In any evaluation of a musical interface one must decide the context of the evaluation. Is the interface being evaluated as a successor or alternative to some other interface (e.g. an electric cello vs an acoustic cello)? Who is expected to use the interface (e.g. virtuosi, amateurs, children)? Such factors will affect not only the recruitment of participants but also some aspects of the experimental setup.

Our method is designed either to trial a single interface with no explicit comparison system, or to compare two similar systems (as is done below in our case study). The method consists of two types of user session (solo sessions followed by group session(s)), plus the Discourse Analysis of data collected.

3.1 Solo sessions

In order to explore individuals' personal responses to the interface(s), we first perform solo sessions in which a participant is invited to try out the interface(s) for the first time. If there is more than one interface to be used, the order of presentation is randomised in each session.

The solo session consists of three phases for each interface:

Free exploration The participant is encouraged to try out the interface for a while and explore it in their own way.

Guided exploration The participant is presented with audio examples of recordings created using the interface, and encouraged to create recordings inspired by those examples. This is not a precision-of-reproduction task; precision-of-reproduction is explicitly not evaluated, and participants are told that they need not replicate the examples.

Semi-structured interview The interview's main aim is to encourage the participant to discuss their experiences of using the interface in the free and guided exploration phases, both in relation to prior experience and to the other interfaces presented if applicable. Both the free and guided phases are video recorded, and the interviewer may play back segments of the recording and ask the participant about them, in order to stimulate discussion.

The raw data to be analysed is the interview transcript. Our aim is for the participant to construct their own descriptions and categories, which means it is very important that the interviewer is experienced in neutral interview technique, and can avoid (as far as possible) introducing labels and concepts that do not come from the participant's own language patterns.

3.2 Group session

To complement the solo sessions we also conduct a group session: peer group discussion can produce more and different discussion around a topic, and can demonstrate the group negotiation of categories, labels, comparisons, etc. The focus-group tradition provides a well-studied approach to such group discussion [15]. Our group session has a lot in common with a typical focus group in terms of the facilitation and semi-structured group discussion format. In addition we make available the interface(s) under consideration and encourage the participants to experiment with them during the session.

As in the solo sessions, the transcribed conversation is the data to be analysed, which means that a neutral facilitation technique is important – to encourage all participants to speak, to allow opposing points of view to emerge in a non-threatening environment, and to allow the group to negotiate the use of language with minimal interference.

3.3 Data analysis

Our DA approach to analysing the data is based on that of [2, p. 95–102], adapted to our study context. The DA of text is a relatively intensive and time-consuming method. It can be automated to some extent, but not completely, because of the close linguistic attention required. Our approach consists of the following five steps:

(a) Transcription

The speech data is transcribed, using a standard style of notation which includes all speech events (including repetitions, speech fragments, pauses). This is to ensure that the analysis can remain close to what is actually said, and avoid adding a gloss which can add some distortion to the data. For purposes of analytical transparency, the transcripts (suitably anonymised) should be published alongside the analysis results.

(b) Free association

Having transcribed the speech data, the analyst reads it through and notes down surface impressions and free associations. These can later be compared against the output from the later stages.

(c) Itemisation of transcribed data

The transcript is then broken down by itemising every single object in the discourse (i.e. all the entities referred to). Pronouns such as “it” or “he” are resolved, using the participant’s own terminology as far as possible, and for every object an accompanying description is extracted, of the object as it is in that instance – again using the participant’s own language, essentially by rewriting the sentence/phrase in which the instance is found.

The list of objects is scanned to determine if different ways of speaking can be identified at this point. Also, those objects which are also “actors” (or “subjects”) are identified – i.e. those which act with agency in the speech instance; they need not be human.

It is helpful at this point to identify the most commonly-occurring objects and actors in the discourse.

(d) Reconstruction of the described world

Starting with the list of most commonly-occurring objects and actors in the discourse, the analyst reconstructs the depictions of the world that they produce. This could for example be achieved using concept maps to depict the interrelations between the actors and objects. If different ways of speaking have been identified, there will typically be one reconstructed “world” per way of speaking. Overlaps and contrasts between these worlds can be identified.

The “worlds” we produce are very strongly tied to the participant’s own discourse. The actors, objects, descriptions, relationships, and relative importances, are all derived from a close reading of the text. These worlds are
essentially just a methodically reorganised version of the participant’s own language.

In our particular context, we may be interested in the user’s conceptualisation of musical interfaces. It is particularly interesting to look at how these are situated in the described world, and particularly important to avoid preconceptions about how users may describe an interface: for example, a given interface could be: an instrument; an extension of a computer; two or more separate items (e.g. a box and a screen); an extension of the individual self; or it could be absent from the discourse.

(e) Examining context

The relevant context of the discourse typically depends on the field of study, for example whether it is political or psychological. Here we have created an explicit context of other participants. After running the previous steps of DA on each individual transcript, we compare and contrast the described worlds produced from each transcript, first comparing those in the same experimental condition (i.e. same order of presentation, if relevant), then across all participants. We also compare the DA of the focus group session(s) against that of the solo sessions.

4. THE METHOD IN ACTION: EVALUATING VOICE TIMBRE REMAPPING

In our study we wished to evaluate the timbre remapping system with beatboxers (vocal percussion musicians), for two reasons: they are one target audience for the technology in development; and they have a familiarity and level of comfort with manipulation of vocal timbre that should facilitate the study sessions.

We recruited by advertising online (a beatboxing website) and around London for amateur or professional beatboxers. Participants were paid £10 per session plus travel expenses to attend sessions in our (acoustically-isolated) studio. We recruited five participants from the small community, all male and aged 18–21. One took part in a solo session; one in the group session; and three took part in both. Their beatboxing experience ranged from a few months to four years. Their use of technology for music ranged from minimal to a keen use of recording and effects technology (e.g. Cubase).

In our study we wished to investigate any effect of providing the timbre remapping feature. To this end we presented two similar interfaces: both tracked the pitch and volume of the microphone input, and used these to control a synthesiser, but one also used the timbre remapping procedure to control the synthesiser’s timbral settings. The synthesiser used was an emulated General Instrument AY-3-8910 [5], which was selected because of its wide timbral range (from pure tone to pure noise) with a well-defined control space of a few integer-valued variables. We used the method as described in section 3. Analysis of the interview transcripts took approximately 10 hours per participant (around 2000 words each).

We do not report a detailed analysis of the group session transcript here: the group session generated information which is useful in the development of our system, but little which bears directly upon the presence or absence of timbral control. We discuss this outcome further in section 5.

In the following, we describe the main findings from analysis of the solo sessions, taking each user one by one before drawing comparisons and contrasts. We emphasise that although the discussion here is a narrative supported by quotes, it reflects the structures elucidated by the DA process – the full transcripts and Discourse Analysis tables are available online¹. In the study, condition “Q” was used to refer to the system with timbre remapping active, “X” for the system with timbre remapping inactive.

¹ http://www.elec.qmul.ac.uk/digitalmusic/papers/2008/Stowell08nime-data/

4.1 Reconstruction of the described world

User 1

User 1 expressed positive sentiments about both Q and X, but preferred Q in terms of sound quality, ease of use and being “more controllable”. In both cases the system was construed as a reactive system, making noises in response to noises made into the microphone; there was no conceptual difference between Q and X – for example in terms of affordances or relation to other objects.

The “guided exploration” tasks were treated as reproduction tasks. User 1 described the task as difficult for X, and easier for Q, and situated this as being due to a difference in “randomness” (of X) vs. “controllable” (of Q).

User 2

User 2 found the system (in both modes) “didn’t sound very pleasing to the ear”. His discussion conveyed a pervasive structured approach to the guided exploration tasks, in trying to infer what “the original person” had done to create the examples and to reproduce that. In both Q and X the approach and experience was the same.

Again, User 2 expressed preference for Q over X, both in terms of sound quality and in terms of control. Q was described as more fun and “slightly more funky”. Interestingly, the issues that might bear upon such preferences are arranged differently: issues of unpredictability were raised for Q (but not X), and the guided exploration task for Q was felt to be more difficult, in part because it was harder to infer what “the original person” had done to create the examples.

User 3

User 3’s discourse placed the system in a different context compared to others. It was construed as an “effect plugin” rather than a reactive system, which implies different affordances: for example, as with audio effects it could be applied to a recorded sound, not just used in real-time; and the description of what produced the audio examples is cast in terms of an original sound recording rather than some other person. This user had the most computer music experience of the group, using recording software and effects plugins more than the others, which may explain this difference in contextualisation.

User 3 found no difference in sound or sound quality between Q and X, but found the guided exploration of X more difficult, which he attributed to the input sounds being more varied.

User 4

User 4 situated the interface as a reactive system, similar to Users 1 and 2. However, the sounds produced seemed to be segregated into two streams rather than a single sound – a “synth machine” which follows the user’s humming, plus “voice-activated sound effects”. No other users used such separation in their discourse.

“Randomness” was an issue for User 4 as it was for some others. Both Q and X exhibited randomness, although X was much more random. This randomness meant that User 4 found Q easier to control. The pitch-following sound was
felt to be accurate in both cases; the other (sound effects / percussive) stream was the source of the randomness.

In terms of the output sound, User 4 suggested some small differences but found it difficult to pin down any particular difference, but felt that Q sounded better.

4.2 Examining context

Effect of order-of-presentation

Users 1 and 2 were presented with the conditions in the order XQ; Users 3 and 4 in the order QX. Order-of-presentation may have some small influence on the outcomes: Users 3 and 4 identified little or no difference in the output sound between the conditions (User 4 preferred Q but found the difference relatively subtle), while Users 1 and 2 felt more strongly that they were different and preferred the sound of Q. It would require a larger study to be confident that this difference really was being affected by order-of-presentation.

In our study we are not directly concerned with which condition sounds better (both use the same synthesiser in the same basic configuration), but this is an interesting aspect to come from the study. We might speculate that differences in perceived sound quality are caused by the different way the timbral changes of the synthesiser are used. However, participants made no conscious connection between sound quality and issues such as controllability or randomness.

Considerations across all participants

Taking the four participant interviews together, no strong systematic differences between Q and X are seen. All participants situate Q and X similarly, albeit with some nuanced differences between the two. Activating/deactivating the timbre remapping facet of the system does not make a strong enough difference to force a reinterpretation of the system.

A notable aspect of the four participants’ analyses is the differing ways the system is situated (both Q and X). As designers of the system we may have one view of what the system “is”, perhaps strongly connected with technical aspects of its implementation, but the analyses presented here illustrate the interesting way that users situate a new technology alongside existing technologies and processes. The four participants situated the interface in differing ways: either as an audio effects plugin, or a reactive system; as a single output stream or as two. We emphasise that none of these is the “correct” way to conceptualise the interface. These different approaches highlight different facets of the interface and its affordances.

During the analyses we noted that all participants maintained a conceptual distance between themselves and the system, and analogously between their voice and the output sound. There was very little use of the “cyborg” discourse in which the user and system are treated as a single unit, a discourse which hints at mastery or “unconscious competence”. This fact is certainly understandable given that the participants each had less than an hour’s experience with the interface. It demonstrates that even for beatboxers with strong experience in manipulation of vocal timbre, controlling the vocal interface requires learning – an observation confirmed by the participant interviews.

The issue of “randomness” arose quite commonly among the participants. However, randomness emerges as a nuanced phenomenon: although two of the participants described X as being more random than Q, and placed randomness in opposition to controllability (as well as preference), User 2 was happy to describe Q as being more random and also more controllable (and preferable).

A uniform outcome from all participants was the conscious interpretation of the guided exploration tasks as precision-of-reproduction tasks. This was evident during the study sessions as well as from the discourse around the tasks. As one participant put it, “If you’re not going to replicate the examples, what are you gonna do?”

A notable absence from the discourses, given our research context, was discussion which might bear on expressivity, for example the expressive range of the interfaces. Towards the end of each interview we asked explicitly whether either of the interfaces was more expressive, and responses were generally non-committal. We propose that this was because our tasks had failed to engage the participants in creative or expressive activities: the (understandable) reduction of the guided exploration task to a precision-of-reproduction task must have contributed to this. We also noticed that our study design failed to encourage much iterative use of record-and-playback to develop ideas. In section 5 we suggest some possible future directions to address these issues.

5. DISCUSSION

The analysis of the solo sessions provides useful information on the user experience of a voice-controlled music system and the integration of timbre remapping into such a system. Here we wish to focus on methodological issues arising from the study.

Above we raised the issue that our “guided exploration” task, in which participants were asked to record a sound sample on the basis of an audio example, was interpreted as a precision-of-reproduction task. Possibilities to avoid this in future may include: using audio examples which are clearly not originally produced using the interface (e.g. string sections, pop songs), or even non-audio prompts such as pictures; or forcing a creative element by providing two examples and asking participants to create a new recording which combines elements of both.

Other approaches which encourage creative work with an interface could involve tasks in which participants are asked to create compositions, or iteratively develop live performance. We would expect that the use of more creative tasks should produce more participant discussion of creative/expressive aspects of an interface.

Such tasks could also be used to provide more structure during the group sessions: one reason the group session produced less relevant data than the solo sessions is (we believe) the lack of activities, which could have provided a more structured exploration of the interfaces.

6. CONCLUSIONS

We have applied a detailed qualitative analysis to user studies involving a voice-driven musical interface with and without the use of timbre remapping. It has raised some interesting issues in the development of the interface, including the unproblematic integration of the timbral aspect, and the nuanced interaction of issues such as control and randomness.

However, the primary aim of this paper has been to investigate the use of Discourse Analysis to provide a robust qualitative approach to evaluating the affordances and user experience of a musical interface. Results from our DA-based user study indicate that with some modification of the user tasks, the method can derive detailed information about how musicians interact with a new musical interface and accommodate it in their existing conceptual repertoire.

We have presented one specific method for evaluating a musical interface, but of course there may be other appropriate methods. As discussed in the introduction, the state
of the art in evaluating musical interfaces is relatively underdeveloped, and we would hope to encourage others to explore reliable methods for evaluating new musical interfaces in authentic contexts.

7. REFERENCES

[1] C. Antaki, M. Billig, D. Edwards, and J. Potter. Discourse analysis means doing analysis: A critique of six analytic shortcomings. Discourse Analysis Online, 2004.
[2] P. Banister, E. Burman, I. Parker, M. Taylor, and C. Tindall. Qualitative Methods in Psychology: A Research Guide. Open University Press, Buckingham, 1994.
[3] G. De Poli and P. Prandoni. Sonological models for timbre characterization. Journal of New Music Research, 26(2):170–197, 1997.
[4] C. Dobrian and D. Koppelman. The ‘E’ in NIME: musical expression with new computer interfaces. In Proceedings of New Interfaces for Musical Expression (NIME), pages 277–282. IRCAM, Centre Pompidou, Paris, France, 2006.
[5] General Instrument. GI AY-3-8910 Programmable Sound Generator datasheet, early 1980s.
[6] J. Harvey. Evaluation Cookbook, chapter So You Want to Use a Likert Scale? Learning Technology Dissemination Initiative, 1998.
[7] J. Kreiman, D. Vanlancker-Sidtis, and B. R. Gerratt. Defining and measuring voice quality. In Proceedings of From Sound To Sense: 50+ Years of Discoveries in Speech Communication, pages 115–120. MIT, June 2004.
[8] T. Magnusson and E. H. Mendieta. The acoustic, the digital and the body: A survey on musical instruments. In Proceedings of New Interfaces for Musical Expression (NIME), June 2007.
[9] G. Paine, I. Stevenson, and A. Pearce. The thummer mapping project (ThuMP). In Proceedings of New Interfaces for Musical Expression (NIME), pages 70–77, 2007.
[10] C. Poepel. On interface expressivity: A player-based study. In Proceedings of New Interfaces for Musical Expression (NIME), pages 228–231, 2005.
[11] I. Poupyrev, M. J. Lyons, S. Fels, and T. Blaine. New Interfaces for Musical Expression. Workshop proposal, 2001.
[12] F. P. Preparata and M. I. Shamos. Computational Geometry: An Introduction. Springer-Verlag, 1985.
[13] M. Puckette. Low-dimensional parameter mapping using spectral envelopes. In Proceedings of the International Computer Music Conference (ICMC’04), pages 406–408, 2004.
[14] D. Silverman. Interpreting Qualitative Data: Methods for Analysing Talk, Text and Interaction. Sage Publications Inc, 2nd edition, 2006.
[15] D. W. Stewart. Focus Groups: Theory and Practice. SAGE Publications, 2007.
[16] D. Stowell and M. D. Plumbley. Pitch-aware real-time timbral remapping. In Proceedings of the Digital Music Research Network (DMRN) Summer Conference, July 2007.
[17] H. Uszkoreit. Survey of the State of the Art in Human Language Technology, chapter 6 (Discourse and Dialogue). Center for Spoken Language Understanding, Oregon Health and Science University, 1996.
[18] M. M. Wanderley and N. Orio. Evaluation of input devices for musical expression: Borrowing tools from HCI. Computer Music Journal, 26(3):62–76, 2002.
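As step (c) of the method above notes, the itemisation can be automated to some extent but not completely. A minimal sketch of the mechanical part – tallying itemised objects and actors once the analyst has resolved pronouns by hand. The `Instance` structure and the example annotations are hypothetical illustrations, not the authors' tooling:

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Instance:
    """One itemised discourse object: the entity referred to, the
    participant's own description of it in that utterance, and
    whether it acts with agency (an "actor")."""
    obj: str
    description: str
    is_actor: bool = False

# Hypothetical itemisation of a few transcript utterances.
items = [
    Instance("the system", "makes noises back at you", is_actor=True),
    Instance("the microphone", "what you make noises into"),
    Instance("the system", "sometimes does random stuff", is_actor=True),
    Instance("me", "trying to copy the example", is_actor=True),
]

# Step (c) ends by identifying the most commonly-occurring
# objects and actors in the discourse.
object_counts = Counter(i.obj for i in items)
actor_counts = Counter(i.obj for i in items if i.is_actor)
```

The close linguistic reading itself – resolving “it”/“he”, extracting descriptions in the participant’s own words – remains manual, which is why the method is time-consuming.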
criticism was of the device’s lack of absolute positioning capability. The statistics revealed little significant difference between the two controllers; participants displayed no overall preference and the timing errors showed no significant variance.

2.3 Reflections

What we’ve done by presenting this case study is to make explicit the practical issues of conducting a usability experiment. This is often mundane detail that gets omitted from experimental reports, but may be the type of essential detail that will make it easier for others to try out HCI methods. As such, it is worth noting a few key points about how the study was implemented. Firstly, the importance of a pilot study is easy to underestimate. The best way to expose flaws in a script is to put it into practice; for valid results the experimental parameters need to stay constant throughout the study, so flaws need to be removed at this early stage. In retrospect, the difficulty of some tasks in the Wiimote study could still have been better optimised at pilot stage to suit the range of participants’ skill levels.

Secondly, an issue of particular importance in a musical usability study is allotted practice time. There’s a lower limit on the time participants need to spend becoming accustomed to the features of an instrument; getting this amount wrong can result in unrepresentative attempts at a task, concealing the true results. Again, this is something which can be assessed during the pilot study.

Thirdly, the gathering of empirical data presents some challenges. In order for the data to be valid, the participants needed to perform the tasks in the same way, although getting people to perform a precise task can be difficult, especially when you have creative people performing a creative task. There needs to be some built-in flexibility in the tasks which allows for this.

Finally, the time and effort in transcribing interviews cannot be underestimated. Even supposing voice-recognition software of sufficient accuracy was available to help avoid the hard slog of manual annotation, it might be at the cost of the researcher not engaging so deeply with the data by parsing it themselves. An alternative approach is transcribing just the ‘interesting’ sections, which can save time, though this selection process entails some subjectivity. Tagging log file data for analysis was also a long process, as the correct data had to be found manually by comparison to the video; this could have been improved if the logging had been automatically synchronised to the video data.

3. DISCUSSION

The previous section discussed the details of applying the specific methodology we used in this study. It is also useful to reflect more generally on the structuring of the case study and the efficacy of the HCI evaluation. Was it useful to carry out the Wiimote usability study with the methods we chose? Where were the gaps in the results and how could the methodology be improved to narrow these gaps?

The most ‘interesting’ results came from analysis of the interview data. The interviews confirmed some expected results about the controller but more usefully brought up some unexpected issues that some people found with certain tasks, and some surprising suggestions about how the controller could be used. This is the kind of data that shows the benefits of conducting a usability study, the kind of data that is difficult to determine purely by intuition alone and that is best collected from the observations of a larger group of people. From the remaining results, the quantitative results provided objective backup to certain elements of the interview results, some useful data about the functional side of the controller, and insight into global trends of the participants. However, the conclusions reached from these results alone seemed to be a limited measure of the device compared to the subtlety of the participants’ observations.

Did the study result in a complete answer in relation to the research question, how useful is the Wiimote as a musical controller? It’s difficult to answer this objectively, but it can be observed that the results showed a detailed and intimate understanding of the controller in a musical context. One important thing the results do lack is any measure of the participants’ experience while using the controller. The more interesting results came from post-task interviews, but there is no data about their experience in the moment while they were using the device, something that would seem important for a musical evaluation. This gap in the results is partly due to lack of technology and partly due to a lack of methodology. How can musicians self-report their experience while they are using a musical controller without disrupting the experience itself? Are there post-task evaluation techniques that can give a more accurate and objective analysis of a musical experience than an interview? More recent research in HCI is starting to address similar issues and can point to possibilities.

3.1 The ‘Third Paradigm’

Kaye et al. [7], in 2007, described a growing trend in HCI research towards experience-focused rather than task-focused HCI. With this trend comes the requirement for new evaluation techniques to respond to the new kinds of data being gathered. This trend is a response to the evolving ways in which technology is utilised as computing becomes increasingly embedded in daily life, a shift in focus away from productivity environments [8], and from evaluation of efficiency to evaluation of affective qualities [3]. As HCI is increasingly involved in other ‘highly interactive’ fields of computing such as gaming and virtual reality, the requirement for evaluating user experience becomes stronger. This new trend is known as the ‘third paradigm’, and researchers have started to tackle some of the challenges presented by this approach.

The Sensual Evaluation Instrument (SEI), designed by Isbister et al. [6], is a means of self-reporting affect while interacting with a computer system. Users utilise a set of biomorphic sculptured shapes to provide feedback in real-time. Intuitive and emotional interaction occur cognitively on a sub-symbolic level, so the system uses non-verbal communication in order to more directly represent this. With its sub-verbal reporting method, the SEI is a step in the right direction for evaluation of musical interfaces; however, as the reporting technique already involves some interaction itself, it could only be used effectively in less interactive contexts such as evaluating some desktop software. The most dynamic example of its use is from the designers’ tests with a computer game, and they acknowledge in their results that it’s not ideal for time-critical interfaces or tasks that require fine-grained data.

For more interactive tasks such as playing a musical controller, a non-interactive data gathering mechanism is essential, so the measurement of physiological data may yield real-time readings without interrupting the users’ attention. Some studies concentrate on this area of evaluation. Chateau and Mersiol’s AMUSE system [1] is designed to collect and synchronise multiple sources of physiological data to measure a user’s instantaneous reaction while they interact with a computer system. This data might include eye gaze, speech, gestures and physiological readings such as EMG, ECG, EEG, skin conductance and pulse. Mandryk [8] examines the issues associated with the evaluation of affect
using these physiological measures: how to calibrate the sensor readings, and how to correlate multi-point sensor data streams with single-point subjective data. Both studies acknowledge that physiological readings are more valuable when combined with qualitative data. The challenge here is to interpret the data effectively, and research needs to be done into how to calibrate this data for musical experiments.

Fallman and Waterworth [3] describe how the Repertory Grid Technique (RGT) can be used for affective evaluation of user experience. RGT is a post-task evaluation technique based on Kelly's Personal Construct Theory; it involves eliciting qualitative constructs from a user, which are then rated quantitatively. It sits on the border between qualitative and quantitative methods, allowing empirical analysis of qualitative data. RGT isn't ideal in a musical context, as the data isn't collected in the moment of the experience it evaluates; however, it could be an improvement on interviews, and it has the practical advantage that the data analysis is less time-consuming.

A number of experience evaluation techniques attempt to gather data from multiple sources in order to triangulate an overall result. This way of working brings the challenge of synchronising and re-integrating the data sources, and some researchers are creating tools to deal with this [2]. These kinds of tools would have been of great value to the data analysis in the Wiimote study, especially because of the need for log file to video synchronisation.

Developments in new HCI research are encouraging, but how useful are they in a computer music context? All these methodologies need to be assessed specifically in terms of evaluation of musical experience as well as user experience.

4. CONCLUSION
We have examined current intersections between HCI evaluation methodology and computer music, presented a case study of an evaluation based on this methodology, and looked at some of the new research in HCI which is relevant to our field. The evaluation of the Wiimote produced some valuable insights into its use as a musical controller, but it lacked real-time data concerning the participants' experience of using the device. The third wave of HCI holds promising potential for computer music; the two fields share the common goal of evaluating experience and affect between technology and its users. The analysis of musical interfaces can be considered a very specialised area of experience evaluation, though techniques from new HCI research are not necessarily immediately applicable to music technology. New research is needed to adapt and test these methodologies in musical contexts, and perhaps these techniques might inspire new research which is directly useful to musicians.

5. REFERENCES
[1] Noel Chateau and Marc Mersiol. AMUSE: A tool for evaluating affective interfaces. In CHI'05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[2] Andy Crabtree, Steve Benford, Chris Greenhalgh, Paul Tennent, Matthew Chalmers, and Barry Brown. Supporting ethnographic studies of ubiquitous computing in the wild. In DIS '06: Proceedings of the 6th Conference on Designing Interactive Systems, pages 60-69, New York, NY, USA, 2006. ACM.
[3] Daniel Fallman and John Waterworth. Dealing with user experience and affective evaluation in HCI design: A repertory grid approach. In CHI'05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[4] Alan Dix, Janet Finlay, Gregory D. Abowd, and Russell Beale. Human-Computer Interaction. Prentice Hall, 3rd edition, 2004.
[5] Kristina Höök, Phoebe Sengers, and Gerd Andersson. Sense and sensibility: evaluation and interactive art. In CHI '03, pages 241-248, New York, NY, USA, 2003. ACM.
[6] Katherine Isbister, Kia Höök, Jarmo Laaksolahti, and Michael Sharp. The sensual evaluation instrument: Developing a trans-cultural self-report measure of affect. International Journal of Human-Computer Studies, 65:315-328, April 2007.
[7] Joseph 'Jofish' Kaye, Kirsten Boehner, Jarmo Laaksolahti, and Anna Ståhl. Evaluating experience-focused HCI. In CHI '07 Extended Abstracts on Human Factors in Computing Systems, pages 2117-2120, New York, NY, USA, 2007. ACM.
[8] Regan Lee Mandryk. Evaluating affective computing environments using physiological measures. In CHI'05 Workshop on Evaluating Affective Interfaces - Innovative Approaches, 2005.
[9] James McCartney. Rethinking the computer music language: SuperCollider. Computer Music Journal, 26(4):61-68, 2002.
[10] Cornelius Poepel. On interface expressivity: a player-based study. In NIME '05: Proceedings of the 2005 Conference on New Interfaces for Musical Expression, pages 228-231, 2005.
[11] Marcelo Mortensen Wanderley and Nicola Orio. Evaluation of input devices for musical expression: Borrowing tools from HCI. Computer Music Journal, 26(3):62-76, 2002.
KEYWORDS
Generative design tools, Instrument building, Multi-faceted
audio, Personal music devices, Tangible user interfaces,
Technology probes
1. INTRODUCTION
We are interested in creating tangible user interfaces that exploit the semantic richness of sound. Our research draws from two disciplines: Human-Computer Interaction (HCI) and NIME instrument design. The former offers a number of examples of the use of sound in graphical interfaces, including Buxton et al.'s [2] early work, Gaver's auditory icons [5] and Beaudouin-Lafon and Gaver's [1] ENO system. These systems focused primarily on sound as a feedback mechanism, with an emphasis on graphical rather than tangible user interfaces.

Figure 1. The A20 is a working prototype of a technology probe for exploring music and sound in a tangible interface.

We draw upon HCI design methods, particularly participatory design [7][12], that emphasize the generation of ideas in collaboration with users. In particular, technology probes [9] engage users as well as designers to create novel design concepts, inspired by the use of the technology in situ. This generative design approach challenges both users and designers

This paper describes the design and development of the A20 (Figure 1), a polyhedron-shaped, multi-channel audio device that allows direct manipulation of media content through touch and movement, with various forms of aural and haptic feedback. During a series of participatory design sessions, both users and designers used the A20 to generate and explore novel interface designs. The easily modifiable software architecture allowed us to create various mappings between gestural and pressure inputs, producing specific sounds and haptic output. Meanwhile, the flexibility of the A20 as an interface allowed users a range of interpretations for any given mapping. The A20 was never intended as a prototype of a specific future system. Instead, we sought to use it as a design tool to explore the potential of music and sound in tangible interfaces. Our participatory design workshops served both to evaluate the A20 itself and to explore novel interface designs, including social interaction through portable music players.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).
1 www.arduino.org
defines the interaction mappings. A second module is in charge of audio processing and sends data back to the A20. Both modules communicate via Open Sound Control². We chose the UDP protocol for its efficiency in time-sensitive applications.

The A20 interaction mappings are implemented in C++ as a server process that aggregates data from the accelerometer, gyroscope and pressure sensors. We used the OpenGL graphics library to program a visual representation of the physical prototype for debugging interaction mappings, and to accelerate matrix operations during real-time sound mapping on the device. We vectorized sound location across the surface of the icosahedron in a way similar to Pulkki's work on vector-based sound positioning [14]. Vector-based audio panning extends the principle of stereo panning with an additional dimension, making it useful when the listener is not fixed at a sweet spot, or in cases where sound distribution includes a vertical dimension.

In the control software, 3D vectors represent sound sources. The origin of the 3D coordinate system is the center of the object, in this case the center of the A20. Each face and corresponding speaker is represented by a vector from that origin to its center. The control software outputs, in real time, a vector angle for each sound source. The audio engine can then calculate amplitude levels given the angular distance between the vectors representing the sound sources and those representing the speakers. The control software dynamically calculates the source vectors, resulting in sounds moving across a series of faces. After audio processing, this results in a gradual multidimensional panning between those faces, giving the impression of sound moving across the surface of the object.

This software can be adapted to a range of different shapes. The vectors representing the faces are computed according to the number of speakers and their placement. The audio engine is then configured with the proper number of speakers and data relative to their output capabilities, such as physical size and amplitude range. Thus the same software works for the original cube-shaped prototype and for the 20-sided icosahedron.

The audio engine is written in Max/MSP and is divided into two parts. The main control program is the master of two slave patches, each controlling a sound card. The audio engine manages multiple sound streams that can be placed at different positions on the device according to location attributes sent by the control software. This software allows us to use synthesized sounds as well as samples of recorded music in MP3 format. Post-treatment algorithms are applied to achieve acoustical effects from the real world. For example, Doppler shift changes the sound pitch as it moves closer or further away, and filtering effects change the sound timbre as the sound moves behind obscuring objects, thus enhancing the effect of sound movement around the device.

5. EVALUATION
In order to evaluate the A20, we invited non-technical users to the third in a series of participatory design workshops. The first two sessions, not reported here, focused on an interview-based exploration of existing personal music player usage, and on structured brainstorming about communicating entertainment devices, respectively. Evaluation of the A20 comprised two activities. The first type of evaluation focused on its perceptual characteristics as a multi-faceted, multi-channel audio device. The second type of evaluation used the A20 as a technology probe and an instrument, to inspire and explore different forms of interaction with a tangible audio device.

5.1 Multi-faceted Audio Perception
The purpose of the first set of tests was to assess the users' ability to perceive different modes of audio display on the A20, including their ability to perceive sound position, motion around the device, and haptic patterns. We also wanted to familiarize them with the A20 so they could participate in the second set of participatory design exercises.

Figure 5. Testing how a user perceives the A20

We asked 16 participants to perform a set of tests, in individual sessions lasting approximately 10 minutes each. Each participant was given the A20 (Figure 5) and asked to perform the following tasks:

Test 1: Localizing Sound
Impulse sounds were played randomly on one of the facets, and the participant was asked to identify the source facet without touching the A20. (Repeated five times.)

Test 2: Detecting Direction of Movement
An impulse train was panned around the equator of the device to simulate a sound source moving in a circular pattern. The participant was asked to identify whether the direction of movement was clockwise or counter-clockwise, without touching the A20.

Test 3: Distinguishing Static from Dynamic
We combined the first two tests to determine whether the participant could distinguish a moving sound from a static sound. The participant was presented with four conditions: two static sounds (derived from Test 1) and two moving sounds (clockwise and counter-clockwise), in a counterbalanced presentation sequence.

Test 4: Distinguishing Haptic Stimuli
We combined the auditory and haptic channels to create various combinations: some where the two modes were synchronous, reinforcing perception of a single source, and others that presented two distinct sources, one in each modality. The haptic channels were presented on the lateral faces under the participant's hands, whereas the auditory channel (a musical excerpt from a well-known pop song) was presented on the 'pie' zone at the top of the A20. In some combinations, the haptic channel corresponded to the music being heard, while in others the haptic and audio stimuli were independent. The participant was asked to indicate whether or not the haptic and audio signals were the same. In cases where the haptic signal was derived from the music, several variations were made to bring more or less of the music into the haptic range. This included generating the haptic signal from the amplitude envelope of the music, or low-pass filtering the music before generating the corresponding haptic stimulus.

2 www.opensoundcontrol.org
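As a rough illustration of the vector-based panning described in this section, the following Python sketch computes per-face amplitudes from the angular distance between a source vector and each face vector. This is our reconstruction, not the authors' C++/Max implementation; the cosine falloff and the `spread` parameter are simplifying assumptions.

```python
import math

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return tuple(x / n for x in v)

def angular_distance(a, b):
    # Angle between two unit vectors, in radians.
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    return math.acos(dot)

def face_gains(source, faces, spread=math.pi / 2):
    """Amplitude per face: cosine falloff with angular distance,
    zero beyond `spread`, then normalized to constant power."""
    src = normalize(source)
    raw = []
    for f in faces:
        ang = angular_distance(src, normalize(f))
        raw.append(math.cos(ang * math.pi / (2 * spread)) if ang < spread else 0.0)
    power = math.sqrt(sum(g * g for g in raw)) or 1.0
    return [g / power for g in raw]

# Example: three orthogonal faces; a source pointing straight at the first face
# receives all the energy, the others none.
faces = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
gains = face_gains((1, 0, 0), faces)
```

Moving the source vector smoothly between two face vectors cross-fades their gains, which is what produces the impression of sound travelling across the surface.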
of the richness and innovation of the ideas generated by non-technical users, which go far beyond the creativity we saw in previous workshops, when they had no specific instrument on which to play and explore ideas.

6. CONCLUSION AND FUTURE WORK
Our goal has been to use the expressivity and open-endedness typical of musical instruments to create generative design tools, encouraging both users and designers to imagine new interfaces using the evocative richness of sound. In workshops, users experienced, tested and explored design ideas, immersed in the context provided by the workshop theme and the A20's specific sound characteristics. We feel that the A20 successfully acted as an expansive platform for generating and exploring new sound interaction ideas.

The icosahedron form served as a generic interface that could be reinterpreted in different ways. The A20 constrained the design space to gestural input and multi-directional sound output, and the idiosyncratic form factor influenced some participants' scenario interpretations. However, since the sound control software can easily be adapted to work on other form factors, different shapes could be used depending upon the design questions to be treated, allowing us to transpose the design space. This could be achieved by creating a wider range of simple forms, or even by using Lego-like building blocks to create a shape around the multidirectional sound source.

In our future work, we plan to extend the output and networking capabilities of the A20. We found the preliminary perception tests with haptic patterns interesting, and we also plan to explore audio-haptic correlation and audio-to-haptic information transitions and to add these features to another instrument interface. This would allow user interface designers to take the haptic capabilities of audio displays into account and to further explore the multimodal potential of sound and touch together. We hope to develop a fully wireless, lightweight version of the A20 and would also like to add networking features so that multiple A20s can communicate with each other and encourage diverse forms of musical collaboration among their users.

7. ACKNOWLEDGMENTS
This project was developed at Sony Computer Science Laboratory Paris. Our thanks to the project interns, Emmanuel Geoffray from IRCAM and Sonia Nagala from Stanford University, and to Nicolas Gaudron for the icosahedron structure.

8. REFERENCES
[1] Beaudouin-Lafon, M. and Gaver, W. (1994). ENO: synthesizing structured sound spaces. In Proc. of UIST'94. ACM. pp. 49-57.
[2] Buxton, W., Gaver, W. and Bly, S. (1994). Auditory Interfaces: The Use of Non-Speech Audio at the Interface. http://www.billbuxton.com/Audio.TOC.html
[3] Chalmers, M. and Galani, A. (2004). Seamful interweaving: heterogeneity in the theory and design of interactive systems. In Proc. of DIS'04. ACM. pp. 243-252.
[4] Freed, A., Avizienis, R., Wessel, M. and Kassakian, P. (2006). A Compact 120 Independent Element Spherical Loudspeaker Array with Programmable Radiation Patterns. In Proc. of AES'06. Paper 6783.
[5] Gaver, W. (1989). The Sonic Finder: An Interface that Uses Auditory Icons. Human-Computer Interaction, 4(1), pp. 67-94.
[6] Gaver, W.W. and Dunne, A. (1999). Projected Realities: Conceptual Design for Cultural Effect. In Proc. of CHI'99. pp. 600-608.
[7] Greenbaum, J. and Kyng, M., Eds. (1992). Design at Work: Cooperative Design of Computer Systems. Lawrence Erlbaum Associates, Inc.
[8] Hunt, A., Wanderley, M.M. and Paradis, M. (2002). The importance of parameter mapping in electronic instrument design. In Proc. of NIME'02. pp. 149-154.
[9] Hutchinson, H., Mackay, W.E., Westerlund, B., Bederson, B., Druin, A., Plaisant, C., Beaudouin-Lafon, M., Conversy, S., Evans, E., Hansen, H., Roussel, N., Eiderbäck, B., Lindquist, S. and Sundblad, Y. (2003). Technology Probes: Inspiring Design for and with Families. In Proc. of CHI'03. pp. 17-24.
[10] Lindquist, S., Westerlund, B., Sundblad, Y., Tobiasson, H., Beaudouin-Lafon, M. and Mackay, W. (2007). Co-designing Technology with and for Families - Methods, Experiences, Results and Impact. In Streitz, N., Kameas, A. and Mavrommati, I. (Eds.), The Disappearing Computer, LNCS 4500, Springer Verlag, 2007, pp. 99-119.
[11] Mackay, W.E. and Fayard, A-L. (1997). HCI, Natural Science and Design: A Framework for Triangulation Across Disciplines. In Proc. of DIS'97. ACM. pp. 223-234.
[12] Muller, M.J. and Kuhn, S. (Eds.) (1993). Communications of the ACM Special Issue on Participatory Design, 36(6). pp. 24-28.
[13] Poupyrev, I., Newton-Dunn, H. and Bau, O. (2006). D20: interaction with multifaceted display devices. In CHI'06 Extended Abstracts. ACM. pp. 1241-1246.
[14] Pulkki, V. (1997). Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc., 45(6). pp. 456-466.
[15] Tanaka, A. (2006). Interaction, Agency, Experience, and the Future of Music. In Brown, B. and O'Hara, K. (Eds.), Consuming Music Together: Social and Collaborative Aspects of Music Consumption Technologies. Computer Supported Cooperative Work (CSCW) Vol. 35. Springer, Dordrecht. pp. 267-288.
[16] Trueman, D., Bahn, C. and Cook, P. (2000). Alternative Voices For Electronic Sound, Spherical Speakers and Sensor-Speaker Arrays (SenSAs). In Proc. of ICMC'00.
[17] Warusfel, O. and Misdariis, N. (2001). Directivity Synthesis With a 3D Array of Loudspeakers: Application for Stage Performance. In Proc. of DAFx'01.
[18] Williamson, J., Murray-Smith, R. and Hughes, S. (2007). Shoogle: excitatory multimodal interaction on mobile devices. In Proc. of CHI'07. ACM. pp. 121-124.
the complete pressure of the bow with sensors connected to the bow hair [5].

Maestre presents a gesture tracking system based on a commercial EMF device [6]. One sensor is glued on the bottom of the violin near the neck, a second one on the bow. From this capture, position data and the pressure that deforms the bow can be calculated. Many more systems exist, but most are combined with a camera, which does not seem to be stable and reliable enough for performances and everyday use.

A different approach is developed at IRCAM by Bevilacqua [7]. The sensing capabilities are added to the bow and measure the bow acceleration in real time. A software-based recognition system detects different bowing styles.

Guaus measures the overall bow pressure [8], not the pressure of each individual finger on the bow. Sensors are fixed on the bow hair at the tip and the frog. This adds weight at the tip, which could influence professional violin playing because of the leverage effect.

The recent paper by Young [9] describes a database of bow strokes with many kinds of sensor data, such as 3D acceleration, 2D bow force, and electric field position sensing, again with an overall bow force measurement.

The measuring system presented here is easy to install: the flexible sensor, less than 1 mm thick, is simply stuck onto the bow or finger and connected to the converter box. As every single finger is measured individually, the system can detect not only pressure and force allocation and their changes between the fingers in different playing techniques, but also muscle cramps and wrong finger positions.

4. PRESSURE AND POSITION MEASUREMENT

4.1 Strings
The basic measurements at the violin (exemplary for strings) are:

4.1.1 Pressure and Position of each Finger of the Right Hand
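As a minimal sketch of how such per-finger pressure streams could be screened for the sustained over-pressure ("stiffness") mentioned above, consider the following illustrative Python. The 0..1 pressure scaling, the threshold and the window size are our assumptions, not part of the presented system.

```python
def detect_stiffness(pressure_series, max_pressure=0.8, window=10):
    """Flag sustained excessive finger pressure: return the start indices
    of every `window`-sample run whose mean pressure exceeds `max_pressure`.
    Pressure values are assumed normalized to the range 0..1."""
    flags = []
    for i in range(len(pressure_series) - window + 1):
        mean = sum(pressure_series[i:i + window]) / window
        if mean > max_pressure:
            flags.append(i)
    return flags

# A relaxed passage followed by a cramped, over-pressed one:
series = [0.3] * 20 + [0.95] * 20
flags = detect_stiffness(series)
```

Running the same test per finger makes it possible to tell which finger is cramping, which is exactly what individual-finger sensing adds over whole-bow force measurement.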
Figure 6. Comparison Shoulder-Chin Rest Pressure and Position

Figure 8. Visualisation of 3rd Finger, "Stiffness Control", too much force/pressure
Similar methods could be applied to the right hand sensors, but it is quite difficult to change the pressure without influencing the sound too much.

Figure 11. Extended Score

This extended score is a part of the piece "concertare" by Tobias Grosshauser [13].

For keyboard instruments, for instance, modulation is possible simply by changing the finger pressure after the key has already been struck. This new playing technique allows new ways of articulation, even when the key is already pressed, and sound effects such as vibrato on the piano or other keyboard instruments.

4.4 Further Scenarios and Research
Further research will address finger pressure measurements on wind instruments and drums. In combination with position recognition and acceleration sensors, the most important parameters can be detected.

These pressure and force sensors provide more and more possibilities for new music compositions, in combination with extended scores and simplified real-time interaction within electronic environments.

Concerning pedagogic issues, the systems and methods will become more accurate and user-friendly for a wider range of usage and target audiences.

The combination of traditional instruments, computers and high-tech tools such as new sensors could motivate a new generation of young musicians to learn with new methods that they like and are increasingly used to. Cheap and easy-to-use sensor systems would support this development. If teaching and pedagogy were more of an adventure, a search for new possibilities in "unknown terrain", then making music, learning and playing a musical instrument, and practising could all be more fascinating.

5. REFERENCES
[1] C. Poepel, D. Overholt. Recent Developments in Violin-related Digital Musical Instruments: Where Are We and Where Are We Going? NIME06, 6th International Conference on New Interfaces for Musical Expression, 2006.
[2] A. Askenfelt. Measurement of bow motion and bow force in violin playing. Journal of the Acoustical Society of America, 80, 1986.
[3] J. A. Paradiso and N. A. Gershenfeld. Musical applications of electric field sensing. Computer Music Journal, 21(2), pp. 69-89, MIT Press, Cambridge, Massachusetts, 1997.
[4] D. S. Young. Wireless sensor system for measurement of violin bowing parameters. Stockholm Music Acoustics Conference, 2003.
[5] M. Demoucron, R. Caussé. Sound synthesis of bowed string instruments using a gesture based control of a physical model. International Conference on Noise & Vibration Engineering, 2007.
[6] E. Maestre, J. Janer, A. R. Jensenius and J. Malloch. Extending GDIF for instrumental gestures: the case of violin performance. International Computer Music Conference, submitted, 2007.
[7] F. Bevilacqua, N. Rasamimanana, E. Flety, S. Lemouton, F. Baschet. The augmented violin project: research, composition and performance report. NIME06, 6th International Conference on New Interfaces for Musical Expression, 2006.
[8] E. Guaus, J. Bonada, A. Perez, E. Maestre, M. Blaauw. Measuring the bow pressure force in a real violin performance. International Conference on Noise & Vibration Engineering, 2007.
[9] D. Young, A. Deshmane. Bowstroke Database: A Web-Accessible Archive of Violin Bowing Data. NIME07, 7th International Conference on New Interfaces for Musical Expression, 2007.
[10] M. Okner, T. Kernozek. Chinrest pressure in violin playing: type of music, chin rest, and shoulder pad as possible mediators. Clin Biomech (Bristol, Avon), 12(3):S12-S13, 1997.
[11] R. Möller. High-speed-camera Recording of Pulp Deformation while Playing Piano or Clavichord. Musikphysiologie und Musikermedizin, 2004, 11. Jg., Nr. 4.
[12] G. Van den Berghe, B. De Moor, W. Minten. Modeling a Grand Piano Key Action. Computer Music Journal, 19(2), pp. 15-22, The MIT Press, 1995.
[13] T. Grosshauser. Concertare, www.extendedmusic.net, click "concertare".
ABSTRACT
In this paper, we describe an algorithm for the numerical evaluation of the orientation of an object to which a cluster of accelerometers, gyroscopes and magnetometers has been attached. The algorithm is implemented through a set of new Max/MSP and pd externals. Through the successful implementation of the algorithm, we introduce Pointing-at, a new gesture device for the control of sound in a 3D environment. This work has been at the core of the Celeritas Project, an interdisciplinary research project on motion tracking technology and multimedia live performances between the Tyndall Institute of Cork and the Interaction Design Centre of Limerick.

Figure 1: Mote and its cluster of sensors with battery pack. Dimensions are 25 x 25 x 50 mm

Keywords
Tracking Orientation, Pitch Yaw and Roll, Quaternion, Euler, Orientation Matrix, Max/MSP, pd, Wireless Inertial Measurement Unit (WIMU) Sensors, Micro-Electro-Mechanical Systems (MEMS), Gyroscopes, Accelerometers, Magnetometers

1. INTRODUCTION
Motion tracking technology has interested the multimedia art community for two or more decades. Most of these systems have tried to offer a valid alternative to camera-based systems such as VNS [2] and EyesWeb [14]. Among them are: DIEM [1], Troika Ranch [15], Shape Wrap, Pair and Wisear [19], Eco [17], Sensemble [13], The Hands [8] and Celeritas [20, 16] from the authors.

In this paper we describe the algorithm to numerically solve the orientation of each single mote in our Celeritas system. We also aim to give an introduction to the topic for those who want to develop their own tracking device (using Arduino, for example). Although a full Max/MSP and pd library has been developed and made available at [10], we have listed in the references of this paper other Max/MSP developers [11, 12, 18] whose work has been freely released, though their work focuses only on the conversion between different numerical representations and does not interact with the specific device described above.

On the basis of the results achieved, we introduce in the last section Pointing-at, a new gesture device for the control of sound in a 3D space or any surround system. The device can be used both in studio and in live performances. Our Celeritas system is built around Tyndall's 25mm WIMU, which is an array of sensors combined with a 12-bit ADC [6, 4, 5, 7]. The sensor array is made up of three single-axis gyroscopes, two dual-axis accelerometers and two dual-axis magnetometers.

The accelerometers measure the acceleration on the three orthogonal axes (U, V and W, as shown in Figure 2). The gyroscopes measure the angular rate around the three orthogonal axes. The magnetometers measure the Earth's magnetic field on the three orthogonal axes.

2. TERMINOLOGY
Before going into the description of the algorithm, we would like to introduce the reader to some of the most common terms in use, to make the following sections easier to understand. A good explanation of these terms and of the underlying 3D math can also be found at [9].

System of Reference. We will discuss two systems of reference: the Earth-fixed frame (x, y, z), which has the x axis pointing at the North Pole, the y axis pointing west and the z axis pointing at the Earth's core; and the IMU-fixed frame (u, v, w), whose three orthogonal axes are parallel to the sensor's sensitive axes.

Quaternions form a 4-dimensional normed division algebra over the real numbers. A rotation is represented by a unit quaternion (qw, qx, qy, qz) satisfying qw^2 + qx^2 + qy^2 + qz^2 = 1. Quaternions are used to represent the rotation of an object in 3D space. They are very common in programming, as they don't suffer from problems with singularities at 90 degrees.

Euler Angles. The Euler angles are usually given in aeronautical terms as Pitch, Roll and Yaw, as shown in Figure
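To make the quaternion terminology concrete, the following illustrative Python sketch (ours, not part of the authors' library) normalizes a quaternion (qw, qx, qy, qz) and converts it to the equivalent 3 x 3 rotation matrix using the standard formula:

```python
import math

def quat_normalize(q):
    # Enforce qw^2 + qx^2 + qy^2 + qz^2 = 1.
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

def quat_to_matrix(q):
    """Rotation matrix equivalent of a unit quaternion (w, x, y, z)."""
    w, x, y, z = quat_normalize(q)
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]

# Identity rotation:
m = quat_to_matrix((1, 0, 0, 0))
# 90-degree rotation about the z axis:
s = math.sin(math.pi / 4)
mz = quat_to_matrix((math.cos(math.pi / 4), 0, 0, s))
```

Because the conversion only ever produces well-defined matrices from unit quaternions, it avoids the 90-degree singularities that affect Euler-angle representations.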
3. ALGORITHM
With our cluster of sensors we calculate the orientation of the sensor with respect to the Earth-fixed frame of reference. The orientation is retrieved using two sources of estimation: the output of the gyroscopes on one side, and the combination of accelerometers and magnetometers on the other. The reason for doing this is that gyroscopes are not self-sufficient for long-term precision, because of a drift associated with their readings. Accelerometers and magnetometers, on the other hand, are good for long-term stability but not good for short-term accuracy, due to occasional inaccuracies caused by linear and rotational acceleration. Thus, our algorithm combines the short-term precision of the gyroscopes with the long-term precision of the accelerometers and magnetometers.

3.1 Reading the values from the sensor
As the data from the motes are sent wirelessly to a base station connected to the host computer via a serial port, we designed a C driver to handle this stream. Ultimately, we compiled a new external (mote) to import this stream into Max/MSP or Pd. Values appearing in our host application

For each new sample k+1, the incremental rotation about the u axis by the gyroscope increment Δα(k+1) is:

               | 1        0                0             |
R(u, α, k+1) = | 0   cos Δα(k+1)   -sin Δα(k+1) |
               | 0   sin Δα(k+1)    cos Δα(k+1) |

which can be generally written as:

Rotation(k+1) = R(w, θ, k+1) * R(v, φ, k+1) * R(u, α, k+1)

Therefore we define our Orientation, in matrix format, to be:

Orientation(k+1) = Rotation(k+1) * Orientation(k)

From these results, the algorithm converts the resulting matrix into quaternion and (angle, x, y, z) formats, which facilitates use in graphically oriented programming languages such as Max/MSP and pd.

3.3 Orientation using Accelerometers and Magnetometers
So far we have considered the 3 x 3 Orientation Matrix as the matrix describing the orientation of the IMU-fixed frame in relation to the Earth-fixed frame. Conversely, the Inverse Rotation Matrix describes the orientation of the Earth-fixed
frame in relation to the IMU-fixed frame, and can be written as:

    Orientation^-1 = | a11 a12 a13 |
                     | a21 a22 a23 |
                     | a31 a32 a33 |

quat2axis converts the quaternion format to the angle, x, y, z format.

azi ele converts the input to azimuth and elevation numbers, making the format readable by Vector Base Amplitude Panning (VBAP) or other multi-channel libraries.

A schematic of the Max patch is shown in Figure 4.
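The two conversions above can be sketched as follows. The function names are placeholders describing the behavior of the quat2axis and azi ele objects, not their actual implementation; here the axis-angle form is recovered directly from the rotation matrix (rotations near 180°, where the sine term vanishes, would need an extra branch that is omitted for brevity).

```python
import math

def matrix_to_axis_angle(r):
    """Recover (angle, x, y, z) from a 3x3 rotation matrix, the same
    angle/axis format that quat2axis produces from a quaternion."""
    trace = r[0][0] + r[1][1] + r[2][2]
    angle = math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))
    s = 2.0 * math.sin(angle)
    if abs(s) < 1e-12:                  # angle ~ 0: axis is arbitrary
        return 0.0, 1.0, 0.0, 0.0
    return (angle,
            (r[2][1] - r[1][2]) / s,
            (r[0][2] - r[2][0]) / s,
            (r[1][0] - r[0][1]) / s)

def to_azimuth_elevation(x, y, z):
    """Map a direction vector to (azimuth, elevation) in degrees, the
    coordinate pair expected by VBAP-style spatialization objects."""
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation
```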
8. ADDITIONAL AUTHORS
Additional Author: Brendan O'Flynn, Tyndall National Institute, University College Cork. Email: brendan.oflynn@tyndall.ie
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

Figure 1: FSR Footswitch

The foot/switch interface is provided by a soft but “grippy” toroidal ring of molded rubber embedded in a hard PVC disk.
This device was designed to roll up and fit into a guitar case.
The electronics is configured to measure the positions of up to
two concurrent depressions of the strip.
3. Augmenting controllers
3.1 Pressure sensing buttons
In a project augmenting the cello [6, 7] the author discovered many situations where it was as easy to install a pressure sensor as a switch. Many microcontrollers have built-in A/D converters, so there is often little or no additional cost to using pressure sensors in place of switches. Continuing this idea, that the fastest route to a new controller may be to modify an existing sensor, we see in Figure 9 how to retrofit the Monome button array with pressure sensors (the grey octagonal disks).
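Reading such a pressure sensor through a built-in A/D converter amounts to one fixed resistor and one conversion formula. A minimal sketch, assuming the sensor forms the upper leg of a voltage divider to ground; the fixed-resistor value, the 10-bit ADC range, and the function name are illustrative, not taken from the text:

```python
def sensor_resistance(adc_counts, adc_max=1023, r_fixed=10_000.0):
    """Estimate a piezoresistive sensor's resistance from one A/D reading.
    Assumed topology: Vcc -> sensor -> ADC input -> r_fixed -> ground,
    so Vout = Vcc * r_fixed / (r_fixed + r_sensor)."""
    if adc_counts <= 0:
        return float("inf")          # no measurable current: no touch
    ratio = adc_max / adc_counts     # = Vcc / Vout
    return r_fixed * (ratio - 1.0)
```

Pressure can then be estimated from the resistance with whatever calibration curve the particular fabric or FSR exhibits.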
Monome (http://monome.org) interfaces are square arrays of lit switches interfaced over USB using OSC messaging. A large part of the desirability of this interface is the tactile quality of the buttons, created with careful design of the silicone molding. Each button has a ring of conductive rubber attached to connect with a circular array of interdigitated contacts. The conductivity of this connection does change with pressure, but the conductivity is so effective that it is hard to measure the change accurately. By cutting a small disk of piezoresistive fabric with a central hole, we can retrofit a higher-resistance-range pressure sensor. With careful design of the interface electronics we can even eliminate the array of diodes needed to scan concurrent depressions of the buttons [9, 13].

Figure 9: Pressure Sensitive Monome Adaptation

3.2 Dual Touch Pad
Because of the number of connections required for matrix scanning, it is difficult to rapidly prototype multi-touch systems. Even optical systems that avoid matrix scanning on the surface itself are hard to build quickly, because of the difficulties of sufficiently illuminating the interior of the touch surface and the complexities of calibrating the optical path of the camera [8].

We can still explore some multitouch gestures by assembling a pad that senses two simultaneous touches, as shown in Figure 10. A pair of SlideWide sensors (http://infusionsystems.com) are stuck to each other at right angles. Instead of measuring a single touch position for each axis using the well-known potential divider method, we ground the “wiper” contact and measure the two end-point resistances to this ground node to estimate the position of the outermost touch point pair. This idea was patented in 1972 for duophonic analog synthesizer keyboards (US Patent 3,665,089). The method was independently rediscovered for resistive touch applications by the author and Mr. Loviscach [10].

Figure 10: Dual Touch Pad

The controller in Figure 10 also includes a sheet of piezoresistive fabric to measure a single pressure estimate. The SlideWide sensors flex sufficiently for a useful touch pressure range.

3.3 Touch Pad
Most laptop touch pads use capacitive measuring techniques because of the low cost of high-volume PCB production. Touch pressure cannot be measured by these pads, which is unfortunate, as it is an extremely useful control parameter in musical applications, especially in combination with spatial location [16]. Resistive touch pads, by contrast, provide x, y and z axis sensing, require simple calibration, and are less prone to electrical interference and variations in ambient humidity.

Interlink, the main supplier of resistive xyz touch pads, offers them in only a few small standard sizes. They are rather expensive and technically challenging to employ in large arrays. By combining Velostat (http://3m.com), an electrically resistive plastic sheet material, with piezoresistive fabric, we can rapidly build xyz pads for modest cost and in a wide range of sizes, as illustrated in Figure 11.
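The dual-touch estimate of Section 3.2 reduces to two resistance ratios. A sketch of the arithmetic, with positions normalized to 0.0-1.0 and with function and parameter names of my own choosing: because the touched point is grounded, each end of the strip "sees" only the segment of track up to the nearest grounded touch, so the two end measurements locate the two outermost touches independently.

```python
def outer_touch_positions(r_end_a, r_end_b, r_total):
    """Estimate the outermost pair of touch positions on a resistive strip
    whose wiper contact is grounded.

    r_end_a, r_end_b -- resistances measured from each end of the strip
                        to the grounded wiper
    r_total          -- end-to-end resistance of the untouched strip
    Returns normalized positions (0.0 = end A, 1.0 = end B)."""
    x_near_a = r_end_a / r_total          # touch closest to end A
    x_near_b = 1.0 - r_end_b / r_total    # touch closest to end B
    return x_near_a, x_near_b
```

With a single touch, both estimates coincide; the spread between them is what signals a second finger.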
potential. Most engineering departments still focus on high-manufacturing-volume materials made with standard milling and printing techniques.

A difficult problem for experienced designers is that they have to abandon standard assumptions, such as “conductors are metals and plastics are nonconductive”. Polymers exist now that are nearly as conductive as copper and are expected soon to be more conductive. Even translucent concrete is now available.

Physical computing books mostly encapsulate workable recipes that are twenty years old. Vendor application notes usually address very narrow application spaces.

Effective application of the new materials requires a new curriculum based on emerging design patterns, and will require a context where the wisdom and experience of fiber and malleable-materials artists can be melded with that of materials scientists and application developers.

6. ACKNOWLEDGMENTS
Frances Marie Uitti’s slider controller bag motivated the author’s exploration of fabric sensing. Thanks to Leah Buechley and Syuzi Pakchyan for generously sharing their sources and techniques. Thanks to Judi Pettite for providing a challenging and rewarding studio environment in her fiber arts and malleable materials class.

7. REFERENCES
[1] R. Avizienis and A. Freed, OSC and Gesture features of CNMAT's Connectivity Processor, Open Sound Control Conference, Berkeley, CA, 2004.
[2] B. Hartmann, S. R. Klemmer, M. Bernstein, L. Abdulla, B. Burr, A. Robinson-Mosher and J. Gee, Reflective Physical Prototyping through Integrated Design, Test, and Analysis, UIST '06, ACM, Montreux, Switzerland, 2006.
[3] L. Buechley, N. Elumeze and M. Eisenberg, Electronic/computational textiles and children's crafts, Interaction Design And Children (2006), pp. 49-56.
[4] A. Chang and H. Ishii, Zstretch: a stretchy fabric music controller, Proceedings of the 7th International Conference on New Interfaces for Musical Expression (2007), pp. 46-49.
[5] A. Freed, R. Avizienis and M. Wright, Beyond 0-5V: Expanding Sensor Integration Architectures, International Conference on New Interfaces for Musical Expression, Paris, France, 2006.
[6] A. Freed, A. Lee, J. Schott, F.-M. Uitti, M. Wright and M. Zbyszynski, Comparing Musical Control Structures and Signal Processing Strategies for the Augmented Cello and Guitar, International Computer Music Conference, International Computer Music Association, New Orleans, LA, 2006, pp. 636-642.
[7] A. Freed, F.-M. Uitti, D. Wessel and M. Zbyszynski, Augmenting the Cello, International Conference on New Interfaces for Musical Expression, Paris, France, 2006, pp. 409-413.
[8] J. Y. Han, Low-cost multi-touch sensing through frustrated total internal reflection, Proceedings of the 18th Annual ACM Symposium on User Interface Software and Technology, ACM, Seattle, WA, USA, 2005.
[9] W. D. Hillis, A High-Resolution Imaging Touch Sensor, The International Journal of Robotics Research, 1 (1982), pp. 33.
[10] J. Loviscach, Two-finger input with a standard touch screen, Proceedings of the 20th Annual ACM Symposium on User Interface Software and Technology, ACM, Newport, Rhode Island, USA, 2007.
[11] R. Koehly, D. Curtil and M. M. Wanderley, Paper FSRs and latex/fabric traction sensors: methods for the development of home-made touch sensors, Proceedings of the 2006 Conference on New Interfaces for Musical Expression (2006), pp. 230-233.
[12] D. Overholt, Musical Interaction Design with the CREATE USB Interface: Teaching HCI with CUIs instead of GUIs, ICMC, New Orleans, LA, USA, 2006.
[13] J. A. Purbrick, A Force Transducer Employing Conductive Silicone Rubber, Proceedings of the 1st International Conference on Robot Vision and Sensory Controls (1981), pp. 73-80.
[14] J. T. Remillard, J. R. Jones, B. D. Poindexter, J. H. Helms and W. H. Weber, Degradation of Urethane-Foam-Backed Poly(vinyl chloride) Studied Using Raman and Fluorescence Microscopy, Applied Spectroscopy, 52 (1998), pp. 1369-1376.
[15] W. M. Johnston, J. R. P. Hanna and R. J. Millar, Advances in dataflow programming languages, ACM Comput. Surv., 36 (2004), pp. 1-34.
[16] D. Wessel, R. Avizienis, A. Freed and M. Wright, A Force Sensitive Multi-touch Array Supporting Multiple 2-D Musical Control Structures, New Interfaces for Musical Expression, New York, 2007.
[17] M. Wright, Open Sound Control: an enabling technology for musical networking, Organised Sound, 10 (2005), pp. 193-200.
alain.crevoisier@heig-vd.ch greg.kellum@cmusge.ch
2.4 Communication
Contact points and their intensity information are sent to the client application using the OSC protocol. This way, our multi-touch system can be used as an input device by a multitude of OSC-compatible applications (Reaktor, Max/MSP, SuperCollider, and so on). The messages are formatted as follows:

/touchEvent id touchState xPos yPos amplitude frequency
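For clients without an OSC library, a message in this format can be packed by hand. The following sketch illustrates OSC 1.0 binary encoding (NUL-padded address and type-tag strings, big-endian arguments); the argument types, integer id/state and float coordinates, are an assumption on my part, since only the address pattern is given above.

```python
import struct

def _osc_string(s):
    """OSC strings are NUL-terminated and padded to a 4-byte boundary."""
    b = s.encode("ascii") + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def touch_event(touch_id, touch_state, x, y, amplitude, frequency):
    """Encode a /touchEvent message as a raw OSC 1.0 packet, ready to be
    sent in a single UDP datagram."""
    args = [("i", touch_id), ("i", touch_state),
            ("f", x), ("f", y), ("f", amplitude), ("f", frequency)]
    type_tags = "," + "".join(tag for tag, _ in args)
    payload = b"".join(struct.pack(">" + tag, value) for tag, value in args)
    return _osc_string("/touchEvent") + _osc_string(type_tags) + payload
```

The resulting bytes can be handed directly to `socket.sendto` on a UDP socket aimed at the client application's OSC port.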
Figure 2. Image seen by the camera. Visible light is filtered out using an 800 nm pass filter.

The TUIO protocol, developed for communication with table-top tangible user interfaces [5], is also supported. Messages are sent
in this protocol using the message type for 2-D cursors, 2Dcur. These TUIO messages do not, however, contain all of the information that is being sent with the previously mentioned OSC message format. TUIO supports sending the identifiers for touch points and their x and y positions explicitly, as well as their touch state implicitly. It does not explicitly provide support for sending amplitude or frequency information, but it does provide a single free parameter that can be used to send an int, float, string or blob. We are using this free parameter to send the amplitude of the touch events while discarding their frequencies.

Both our custom OSC messages and TUIO can be received by a variety of software clients. We began by working with Max as our preferred client, but we found that mapping a surface in Max to assign functions to various zones on the interface was quite cumbersome. Even though we were using a scripting language inside Max to perform the mapping from contact points to zones, it still took an inordinate amount of time to create the mapping script. Therefore, we have designed a dedicated application for mapping input gestures to MIDI or OSC events, as described in section 4.

3. IN USE
Since no image is projected on the surface, users need to know what they are doing, and what the state of their actions is, in a different manner. We have explored three different interaction strategies, as presented below.

3.2 Auxiliary Screen & Reference Grid
In this configuration, a visual reference is placed on the surface, in the form of a grid, representing the control area (Figure 4). Control widgets displayed on the screen are aligned according to the same repartition of lines and columns. Figure 5 shows an example of three different mapping layouts that have been designed using Max/MSP. Users can switch from one page to another during performance using the two buttons on the bottom right of each page. The first page features a 4x4 array of pads with a single fader, the second page a 2D continuous controller with the same single fader, and the third page an array of 5 faders. In practice, experiments have shown that the grid on the surface gave sufficient information to establish a clear correlation between the screen and the surface, allowing the user to select and activate the desired control widgets in a single step. The advantage is thus a more direct and engaging interaction, compared to the previous approach.
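The grid correlation described above amounts to a cell lookup: given a contact point, find the widget drawn in the corresponding cell. A hypothetical sketch for a page like the 4x4 pad array; the function name and the assumption of normalized 0.0-1.0 surface coordinates are mine:

```python
def touch_to_cell(x, y, columns=4, rows=4):
    """Map a normalized contact point (x, y in 0.0 .. 1.0) to the
    (column, row) of the reference-grid cell it falls in, so a mapping
    layer can route the touch to the control widget shown in that cell."""
    col = min(int(x * columns), columns - 1)  # clamp x == 1.0 to last cell
    row = min(int(y * rows), rows - 1)
    return col, row
```

A mapping application can then pair each (column, row) with a MIDI or OSC event for the active page, and swap that table when the user switches pages.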
Figure 4. Reference grid on the surface and auxiliary screen.

4. SURFACE EDITOR
In order to create control layouts and configure surfaces more easily than using Max, we are currently developing a dedicated software tool. The Surface Editor is organized around a main
window, representing the interface, and several configuration and browsing windows that can be either floating or docked on the border of the main screen (Figure 6). The editor has two modes: the editing mode, where all configuration windows are visible, and the full screen mode, where only the interface is visible. Information on the latest version is available on our website [17].

Figure 6. The main screen of the Surface Editor.

5. ACKNOWLEDGMENTS
The project presented here is supported by the Swiss National Funding Agency and the University of Applied Sciences. Special thanks to all the people involved in the developments presented here, in particular Pierrick Zoss for the programming of the editor’s interface, Aymen Yermani for the initial development of the multi-touch technology, and Mathieu Kaelin for his work on the integrated illuminator.

6. REFERENCES
1. Crevoisier, A. Future-instruments.net: Towards the Creation of Hybrid Electronic-Acoustic Musical Instruments. Proc. of the CHI Workshop on Sonic Interaction Design, 2008.
2. Dietz, P. H., and Leigh, D. L. DiamondTouch: A Multi-User Touch Technology. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2001.
3. Han, J. Y. Low-Cost Multi-Touch Sensing through Frustrated Total Internal Reflection. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2005.
4. Jordà, S., Kaltenbrunner, M., Geiger, G., and Bencina, R. The reacTable*. Proceedings of the International Computer Music Conference (ICMC 2005), Barcelona, Spain.
5. Kaltenbrunner, M., Bovermann, T., Bencina, R., and Costanza, E. TUIO - A Protocol for Table Based Tangible User Interfaces. Proceedings of the 6th International Workshop on Gesture in Human-Computer Interaction and Simulation (GW 2005), Vannes, France.
6. Koike, H., Sato, Y., and Kobayashi, Y. Integrating Paper and Digital Information on EnhancedDesk: a Method for Realtime Finger Tracking on an Augmented Desk System. ACM Transactions on Computer-Human Interaction (TOCHI), 8 (4), 307-322.
7. Letessier, J., and Berard, F. Visual Tracking of Bare Fingers for Interactive Surfaces. Proc. of the ACM Symposium on User Interface Software and Technology (UIST), 2004.
8. Malik, S., and Laszlo, J. Visual Touchpad: A Two-Handed Gestural Input Device. Proceedings of the International Conference on Multimodal Interfaces, 2004, 289-296.
9. Martin, D. A., Morrison, G., Sanoy, C., and McCharles, R. Simultaneous Multiple-Input Touch Display. Proc. of the UbiComp 2002 Workshop.
10. Polotti, P., Sampietro, M., Sarti, A., and Crevoisier, A. Acoustic Localization of Tactile Interactions for the Development of Novel Tangible Interfaces. Proc. of the 8th Int. Conference on Digital Audio Effects (DAFX-05), Madrid, Spain, 2005.
11. Rekimoto, J. SmartSkin: An Infrastructure for Freehand Manipulation on Interactive Surfaces. Proceedings of CHI 2002, 113-120.
12. Tomasi, C., Rafii, A., and Torunoglu, I. Full-size Projection Keyboard for Handheld Devices. Communications of the ACM, 46 (7), 2003, 70-75.
13. Wilson, A. PlayAnywhere: A Compact Tabletop Computer Vision System. Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), 2005.
14. Wilson, A. TouchLight: An Imaging Touch Screen and Display for Gesture-Based Interaction. Proceedings of the International Conference on Multimodal Interfaces, 2004.
15. Wu, M., and Balakrishnan, R. Multi-finger and Whole Hand Gestural Interaction Techniques for Multi-User Tabletop Displays. Proc. of the ACM Symposium on User Interface Software and Technology, 2003.
16. http://www.nime.org
17. http://www.future-instruments.net
18. http://www.jazzmutant.com
19. http://www.surface.com
20. http://www.tactex.com
21. http://www.celluon.com
22. http://www.lumio.com
23. http://www.smarttech.com
24. http://www.merl.com/projects/DiamondTouch/
25. http://nuigroup.com/wiki/Diffused_Illumination_Plans/
26. http://www.naturalpoint.com/
ABSTRACT
This paper presents a comparison of the movement styles of two theremin players based on observation and analysis of video recordings. The premise behind this research is that a consideration of musicians’ movements could form the basis for a new framework for the design of new instruments. Laban Movement Analysis is used to qualitatively analyse the movement styles of the musicians and to argue that the Recuperation phase of their phrasing is essential to achieve satisfactory performance.

Keywords
Effort Phrasing, Recuperation, Laban Movement Analysis, Theremin

1. INTRODUCTION
A decoupling occurs in the design of a new digital music instrument (DMI) due to the freedom of not having to match physical exertion to driving energy. The instrument can be viewed as being composed of distinct interconnected parts such as the interface, mapping, sound engine, and sound reinforcement. This decoupling has been the focus of much research in terms of the opportunities it affords, and also in terms of the problems that arise for the instrumentalist and audience when the relationship between gesture and sonic result is obfuscated or removed.

In a ‘traditional’ acoustic instrument, the physics of the sound production provides a guiding framework within which the instrument design evolves. The excitation of a string, the physics of a standing wave in a column of air: these physical realities force their influence upon the instrument’s overall physical realisation, its size, where the valves and buttons are positioned, etc. In the design of Digital Musical Instruments (DMIs) this framework is absent.

Many different approaches to DMI design are evident in the literature. The field of tangible interface design points towards the notion of physicality in the interface, particularly when contrasted with standard mouse and keyboard paradigms for computer performance control [1]. Other designers look towards HCI for a framework to base the design process on. Design practice may be informed by ergonomics, with a task-based view of instrumental performance and an associated desire to reduce the effort required by the performer to complete a musical task [2]. Ryan and others, however, have pointed towards a notion of desirable effort in instrumental performance, expressing a view that an integral part of expressive musicianship stems from the struggle with the instrument in the creation of the sound: “Though the principle of effortlessness may guide good word processor design, it may have no comparable utility in the design of a musical instrument. In designing a new instrument it might be just as interesting to make control as difficult as possible. Physical effort is a characteristic of the playing of all musical instruments.” [3]

The notion of introducing physicality by making ‘control difficult’ is one that has been explored in several new interfaces [4]. As previously described by the first author, the GSpring can be seen in this vein [5]. This approach, however, trivializes the complexity of human expressive movement. Simply requiring more force for a given result does not necessarily engender a more expressive performance. The Theremin, for example, is an instrument that at first consideration would seem to require little force in its performance, due to its ‘hands free’ non-contact interface. It does, however, allow for rich expressive movement as part of its performance. Clearly there is something more to this notion of effortful performance than merely the requirement for physical exertion. As Waisvisz indicates in his comments regarding effort and expression: “In the early eighties I formulated thoughts about the importance of forcing the performer to apply physical effort when playing sensor instruments. I assumed that also this effort factor was crucial in the transmission of musicality through electronic instruments. Now I think the crucial aspect of perceived musicality is not the notion of effort itself, but what we feel and perceive of how the physical effort is managed by the performer.” [6] The term ‘managed’ is key here. It invokes an acknowledgement of the temporality of movement: that movement unfolds in time, and that therefore to consider musical performance is to consider how the performer’s movements unfold over time. Analogous to the musical idea of phrasing, we must consider how the performer phrases their movement and how this correlates with the musical result if we are to enquire into the nature of effortful musical expression. What are the qualities that establish a certain relationship between movement phrasing and sonic phrasing as desirable? It is our belief that a better understanding of these qualities could inform the design of
new DMIs that allow for the visceral physicality of performance visible on acoustic instruments, without being simply mimetic.

As a starting point in our enquiry into the nature of physicality in musical performance, this paper presents an analysis of two musicians’ theremin performances and attempts to draw from these observations lessons that may inform the design of new instruments. Our premise is simple: when playing a musical instrument, the human performing artist needs to use their body in order to allow an expressive musical process to occur. Therefore, in designing a new musical instrument, we need to take this premise into account and design instruments that allow for – perhaps even invite – expressive human movement in the production of the sound.

In our performer observations Laban Movement Analysis (LMA) is used as a qualitative framework for the description of this movement. In particular we focus on one concept taken from the Laban framework, Exertion/Recuperation, and its role in movement phrasing. All terms that are part of LMA are capitalized. Within LMA these terms have specific defined meaning.

2. BACKGROUND
The study of musicians’ movement is a well explored area of research within the field of NIME, seeing much concentration on attempts to classify musicians’ movement in terms of gesture types [7]. On the quantitative level, motion capture has been used to study the ancillary gestures of clarinetists [8]. Laban Movement Analysis (LMA) has been applied to provide a qualitative description of clarinetists’ ancillary gestures [9]. Elsewhere Laban’s theory of Effort, which constitutes part of the LMA framework, has been used to investigate the impact of dynamic resistance modulation on performers’ movement [10]. Since it is the musician’s body movement that produces the sound from the instrument, we believe that qualitatively observing and describing the body’s movement, both in conversational and in performing situations, can give us an understanding of how that performer produces his/her expressivity while playing.

2.1 Overview of LMA
A full description of LMA is beyond the scope of this paper. Here we present a general overview of the framework and explain the background to Effort phrasing, in particular Exertion/Recuperation.

LMA provides a rich overview of the scope of movement possibilities. The basic elements of Body, Effort, Shape and Space (BESS) can be used for describing movement and providing an inroad to understanding movement. Every human being combines these movement factors in their own unique way, and organizes them to create phrases and relationships which reveal personal, artistic, or cultural style [11].

An important distinction between LMA and other forms of movement description is that LMA describes the movement of the body in qualitative terms, not in aesthetic, quantitative or anatomical terms. It is the qualitative aspect of movement that is the key to describing, and therefore understanding, expressive non-verbal communication. If we are describing a businessman losing his temper at a board meeting and we say, “His arm traveled downwards until his closed hand came to the table” (a quantitative or mechanical description of an action), we have no sense of how the gesture expressed his fury. If we say “He brought his fist down onto the table with a diminished Punch – Strong, Quick, Direct” – we have a better idea of the expression. Equally, if we saw the gesture (an arm coming down with a clenched fist toward the table) done with that Strong Quick Direct Effort quality, we would not have to hear the words, nor would the hand even have to hit the table, in order to see and interpret the expression of the gesture as angry. If the hand came down (clenched or not) with Light, Free Flow, Sustained qualities, we would probably not interpret it as an angry gesture. The expression in the communication comes through the quality of the movement. This is an example of Effort. The Body, Shape and Space are also informative of the individual’s movement patterns in other ways. It is assumed that each individual has their own movement preferences in all four categories of movement, and that these preferences are recognisable aspects of that person’s personality and expressive style.

Figure 1 Basic Elements of LMA [12]

An aspect that we feel is of particular interest in studying musicians’ movement is how Efforts are sequenced together. Laban emphasised the role of rhythm in manual work, which led to the development of his concept of phrasing in movement. This was particularly evident in his work with Lawrence on the movement of factory workers in wartime Britain. In this situation Laban’s remit was to alleviate strain and dissatisfaction amongst conveyor-belt-based workers in factories. Prior to this, ‘time and motion’ studies had been used to minimize the amount of movement required for a particular job, in the belief that this approach would maximize productivity. Laban, however, emphasized the need for full body movement as part of any process to alleviate strain, and through his Effort system developed movement training programs for workers that allowed for Recuperation as part of the production process. He emphasized the correct phrasing of Efforts to allow for Recuperation following Exertion. In this way workers were able to minimize strain, enjoy their work more and work for longer. [13]

Many movement observers [14][15][16] have developed Laban’s concept of phrasing, or what he originally called “rhythm”, in their work. Peggy Hackney defines phrases as perceivable units of movement which are in some sense meaningful. They begin and end while containing a through line [17]. Irmgard Bartenieff was particularly interested in how a well-phrased movement was simultaneously more expressive and more functionally effective: “Thus, it is not just the activity that identifies the behavior but it is the sequence and phrasing with their distinctive rhythms that express and reinforce verbal and emotional content.” [18] Maletic
comments on a particular basic training exercise that Laban used: “Its characteristic sequence is a rhythmic chain of a preparatory swing (Anschwung) followed by a main swing (Aufschwung) and its expiration which can coincide with its re-initiation.” [19]

Practitioners using the concept of a phrase of movement have divided it up in various ways. Hackney depicts this graphically:

Figure 2 Phases of Phrasing [20]

2.2 Phases of Phrasing
In LMA a movement is commonly seen as being organized into three sections: a Preparation phase that may overlap with the initiation phase, a Main Action, and finally a Recuperation or follow-through phase that may resolve into a transition and subsequent preparation phase for the next action. Here we use the idea that a phrase has three phases: Preparation, Main Action, and Recuperation.

2.2.1 Preparation phase
In order to do any physically demanding or complex task, we see an individual prepare themselves: whether it is the ballet dancer before a pirouette, a pole vaulter about to run or a new graduate about to go into a job interview. The body prepares as the mind (i.e. the focus, concentration) prepares for the task. “It is in the preparation moment that we claim our intention. Intention patterns the organism.” [21]

3. MATERIALS AND METHODS
3.1 Choice of material
These observations are taken from a DVD produced in 1998 by Moog Music Inc called Mastering the Theremin [23]. Two theremin players, Lydia Kavina and Clara Rockmore, demonstrate, discuss and perform pieces on the theremin. A movement analysis of the two women has been carried out using both formal performance and informal interactive situations. In the case of Clara Rockmore three different situations were analysed: firstly a social gathering discussing the theremin with her sister and nephew at her apartment, secondly a demonstration of theremin performance technique where she explains the basics of performing with the instrument, and finally her performances of three pieces accompanied by her sister on piano. In the case of Lydia Kavina we analyzed six lessons given by her to camera and four performances. The performers were chosen as they are both considered experts on this instrument. The availability of footage showing the two performers both in and out of performance was a requirement for the study, as we wished to be able to compare both styles in as wide a context as possible. We were also interested in observing their individual movement patterns of phrasing in different situations.

3.2 Analysis Method
Each section of the DVD described above was observed on four different days over a period of a month. Each observation session lasted between two and four hours. Three other Laban Movement Analysts were consulted on an informal basis to compare observations. All four categories of BESS were used in order to establish some understanding of the differences in the performers’ movement styles before focusing on their respective Recuperation phase of phrasing.
musical expression. Though LMA has been applied to the analysis of musicians' movement in NIME before, an aspect that has until now been ignored is Effort phrasing. Here we have particularly focused on Recuperation as part of phrasing, hypothesising that the degree to which the performer may recuperate influences the perceived musical tension. We have highlighted the factors which influence the Recuperation phase (the instrument's realisation, the musical goal and the performer's skill) in the belief that an early consideration of the interdependent nature of these three factors can inform the design of a new interface.

Specifically, we have presented an analysis of two thereminists' movement. We have focused on the manner in which they Recuperate following the Exertion of playing the instrument. Our analysis shows that each performer has a different style of Recuperation, and we have demonstrated the utility of LMA in qualitatively describing this difference.

In the analysis of Recuperation presented here we have focused on the Recuperation evident as the performer finishes a piece or takes a rest while the accompanist carries the music. Future work will seek to focus LMA's theory of phrasing on musicians' movement, looking at how the performer phrases their Exertion and Recuperation whilst actively playing, and seeking to correlate this with perceived musical tension.

6. REFERENCES
[1] Ishii, H. and Ullmer, B. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. Proceedings of CHI, 1997, 234-241.
[2] Wanderley, M.M. and Orio, N. Evaluation of Input Devices for Musical Expression: Borrowing Tools from HCI. Computer Music Journal. 2002;26(3):62-76.
[3] Ryan, J. Some Remarks on Musical Instrument Design at STEIM. Contemporary Music Review. 1991;6(1):3-17.
[4] Bennett, P., Ward, N., O'Modhrain, S. and Rebelo, P. DAMPER: a platform for effortful interface development. Proceedings of the 7th International Conference on New Interfaces for Musical Expression. 2007; 273-276.
[5] Lebel, D. and Malloch, J. The G-Spring controller. Proceedings of the 2006 Conference on New Interfaces for Musical Expression (Paris, France: IRCAM, Centre Pompidou, 2006), 85-88.
[6] Waisvisz, M. Composing the now. Available at: http://www.crackle.org/composingthenow.htm [Accessed September 6, 2007].
[7] Cadoz, C. and Wanderley, M. Gesture-Music. Reprint from: Trends in Gestural Control of Music, M.M. Wanderley and M. Battier, eds. 2000.
[8] Wanderley, M.M. et al. The Musical Significance of Clarinetists' Ancillary Gestures: An Exploration of the Field. Journal of New Music Research. 2005;34(1):97-113.
[9] Campbell, L. On the use of Laban-Bartenieff techniques to describe ancillary gestures of clarinetists [Internet]. 2005. Available from: http://www.music.mcgill.ca/musictech/clarinet/LBMF_Final_Report.pdf
[10] Bennett, P., Ward, N., O'Modhrain, S. and Rebelo, P. DAMPER: a platform for effortful interface development. Proceedings of the 7th International Conference on New Interfaces for Musical Expression. 2007; 273-276.
[11] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge; 1998; 237.
[12] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge; 1998.
[13] Davies, E. Beyond Dance: Laban's Legacy of Movement Analysis. Brechin Books. 2001; 44-45.
[14] North, M. Personality Assessment Through Movement. Plays, Inc. 1975.
[15] Penfield, K. Comparison of Two Dancers. Unpublished Certificate Dissertation at Dance Notation Bureau, NYC. 1972.
[16] Davis, M. Movement characteristics of hospitalized psychiatric patients. American Journal of Dance Therapy, 4(1) (1981), 52-71.
[17] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge; 1998; 239.
[18] Bartenieff, I. and Lewis, D. Body Movement: Coping with the Environment. Routledge. 1980; 73.
[19] Maletic, V. Body, Space, Expression: The Development of Rudolf Laban's Movement and Dance Concepts. Walter de Gruyter. 1987; 96.
[20] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge; 1998.
[21] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge; 1998; 237.
[22] Hackney, P. Making Connections: Total Body Integration Through Bartenieff Fundamentals. Routledge; 1998; 240.
[23] Mastering The Theremin. (2004). DVD. Moog Music.
[24] Mastering The Theremin. (2004). DVD. Moog Music.
[25] Laban, R.V. and Ullmann, L. Choreutics. Macdonald & Evans; 1966; 4.
would be a very interesting input in the design of a virtual music instrument, and what remains to be done is the validation, classification and detection of such kinds of emotions with the proper interface. Therefore Pogany, as a member of the affective interfaces family, seems to have a priori a major advantage over other interfaces in the context of music performance and interaction.

3. OVERVIEW OF POGANY INTERFACE

Figure 1: a) physical interface; b) interaction holes (KeyPoints); c) types of meaningful gestures detected

'Pogany' is a head-shaped tangible interface for the generation of facial expressions through intuitive contact or proximity gestures. The input to the interface consists of intentional and/or natural affective gestures. The interface takes advantage of camera-capture technology, passing a video stream to a computer for processing. A number of constraints mentioned in [1] gave the interface the size of a joystick and the form shown in figure 1. The position of the KeyPoints, small holes on the surface of the head used for capturing finger position, was inspired by the MPEG-4 control points. In a lit environment, passing over or covering these holes with the hands varies the luminosity level captured by a camera placed inside the facial interface. From each frame of the raw video image captured, we analyse only the pixel blocks that correspond to KeyPoints and thus to gestural information in the vicinity of the head.

3.1 Front-end: gesture capture
The front-end module of the system is based on a camera and video-capture software interfacing to the 'Virtual Choreographer' (VirChor) environment [5]. An image segmentation tool integrated in VirChor keeps only the important blocks from the image and finds the normalized mean luminosity value of the pixels that belong to each block. In this way we keep just one normalized value of light intrusion (called the alpha value) for each of the pixel blocks that correspond to a KeyPoint:

    alpha value = current luminosity / luminosity at calibration time  (1)

The alpha value is bounded between 0 and 1, with 0 corresponding to maximum light intrusion (that means no covering of the hole, thus zero activity) and 1 to minimum light intrusion (the hole is fully covered, maximum activation of the KeyPoint). The output of the front-end consists of instantiations of a 43-float vector at a rate of 30 frames/sec, thus providing the gesture recognition core with a low-dimensional vector instead of raw image data. Further information concerning the particular techniques used (image segmentation, calibration tool) can be found in [1], [8].

3.2 Middle part: gesture processing
The middle part of the system includes the processing/feature-extraction unit and the gesture recognition module.

3.2.1 Gesture analysis and feature extraction
Here we extract useful features from the gesture, such as energy and velocity.
Energy: Of particular importance for the mapping procedure in the next stage is the definition of the energy of the signal that denotes activation in front of the interface. We call this multidimensional signal X_t and define its energy as E_t = X_t^2, where E_t is the temporal energy vector for frame t = 0, 1, ..., n. The normalized mean short-time energy of the signal at frame t is:

    E_t = (1 / N_kp) * sum_{j=1}^{N_kp} X_{j,t}^2  (2)
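Equations (1) and (2) are simple enough to sketch directly. The following is an illustrative reconstruction, not the actual VirChor code: the function names and the clipping of the luminosity ratio to [0, 1] are our own assumptions.

```python
N_KP = 43  # size of the front-end observation vector (one value per KeyPoint)

def alpha_value(current_lum, calib_lum):
    """Eq. (1): alpha = current luminosity / luminosity at calibration time.
    Clipped to [0, 1]; per the paper, 0 means maximum light intrusion
    (hole uncovered, no activity) and 1 means the hole is fully covered."""
    return min(max(current_lum / calib_lum, 0.0), 1.0)

def mean_short_time_energy(frame):
    """Eq. (2): E_t = (1/N_kp) * sum_j X_{j,t}^2 for one 43-float frame."""
    return sum(x * x for x in frame) / len(frame)
```

At 30 frames/sec the front-end would emit one such 43-float frame of alpha values per video frame; mean_short_time_energy then yields the activation feature used by the mapping stage.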
3.2.2 Real-time Gesture Recognition Module
The gesture recognition module is responsible for identifying, in real time, a 'meaningful' gesture or posture that the user addresses to the interface out of a continuous stream of gesture data. The difference between gestures and postures lies in the motion or motionlessness of the hand in front of the interface. Meaningful gestures (figure 1c) are gestures with a particular significance that the system has been trained to recognize; classified at a high level, they function as expressivity-related commands that modify the sound synthesis procedure in the form of modulations or interrupts. In order to be distinguished from raw gesture data with a higher success rate, these gestures demand permanent contact with the interface.

Inspired by our experiments on off-line isolated gestures based on HMMs presented in [6], we developed a real-time module for continuous gesture recognition. In parallel, we were interested in keeping a high degree of expandability in the system, that is, leaving open future enhancements with multiple gestures, complex gestures and a large-scale gesture vocabulary.

HMM configuration: In our HMM models the number N of states is set to 4, plus the two non-emitting states at the start and the end. We use a left-to-right no-skips topology and an observation vector of size 43. The training of the system is based on the Baum-Welch algorithm. For recognition we employed a non-consuming Viterbi-like algorithm.

Segmentation for continuous gesture: An important issue for the recognition of continuous gesture is segmentation. It is implemented in the activity detector module, which is responsible for detecting predefined meaningful gestures and postures in raw gesture data (meaningless gestures and silent parts). The module makes use of the previously defined MMV and MAR metrics in combination with a number of constraints. The core of the algorithm for a) separating activity parts (gestures and postures) from silent parts (no activity in front of the interface) and b) separating gesture from posture is:

    if MMV > thresh1 then 'activity'
    else 'silence'
    if 'activity' then
        if MAR > thresh2 then 'gesture'
        else 'posture'

MMV represents the general amount of activation in the vicinity of the interface. It therefore gives evidence of the existence (or not) of some kind of activity (gestural or postural) or, for values near zero, of what we call 'gesture silence'. MAR expresses the speed of the gesture and is therefore useful in separating gestural from postural activity. thresh1 and thresh2 are thresholds used to regulate the procedure relative to light conditions. According to the output of the activity detector module described above, the system triggers (or not) the gesture and posture recognition and replies according to the vocabulary of meaningful types of gestural/postural activity it is trained to detect.

3.2.3 Implementations
In order to support the interface, we have implemented a variety of cooperating modules which we have integrated into the VirChor rendering environment (image segmentation, gesture collection and data transformation algorithm, gesture detection module, HMM recognition core, etc.). Also worth mentioning is a module for visual feedback, in the form of an animated head for facial expressions: this permits the implicit linking of user gestures with emotions arising from facial expressions. Finally, for the HMM core we used HTKLib (the library of the HTK toolkit for speech recognition) [7], adequately adapted to deal with real-time recognition issues for gesture. Details of these modules, as well as of a module for gesture intention recognition based on the Token-Passing algorithm (estimation of the type of gesture before it is completed), are described in [8].

4. MAPPING STRATEGIES
For the mapping module (see figure 2) we followed mixed direct and indirect strategies: the first concerns low-level continuous information arising from direct gesture processing; the second refers to the semantic (high-level) information of meaningful gestures and postures. We linked this information with parameters from two types of synthesis: FM and Granular Synthesis (GS). In general, for low-level information we used one-to-one mapping and for high-level information one-to-many mapping. Correspondences are shown in table 1.

Table 1: Mapping low- & high-level information to FM and GS parameters

                        low level: Energy   low level: Velocity    high level: Gesture
    FM                  loudness            Modulation Index       Frequency Ratio
    Granular Synthesis  loudness            time between grains    audio sample, grain duration, pitch transposition, ...

4.1 Direct Mapping strategies
The MMV and MAR metrics mentioned in the previous section serve as continuous parameters that adjust music parameters in the synthesis procedure.

4.1.1 Mapping Energy
The Magnitude Value per Frame (MpF) represents loudness. The function selected for this transformation was:

    MpF(nMMV) = 1 - e^(-nMMV/a), 0 <= nMMV <= 1,  (5)

where nMMV is the normalized MMV in [0..1] and a is a parameter controlling the gradient of MpF(x). This parameter helps to adjust the radius of sensitivity around the interface. MpF reconciles the need to quasi-linearize the distance factor with the need to preserve the additive effect of multiple-finger haptic interaction at zero distance.

4.1.2 Mapping Velocity
MAR was defined as a metric for the speed of the gesture in front of the interface. According to theory, the Modulation Index MI = Am/Fm in FM is responsible for the brightness of the sound, as the relative strength of the different sidebands (which affects the timbre) is determined by the relationship between the modulator amplitude (Am) and the modulator frequency (Fm). Hence, we have set MI = MAR/b, where b is a normalization factor which gives
to the continuously changing value a meaningful (in musical terms) range [8]. The MAR metric was also adapted for granular synthesis, this time in order to control the time between grains.

3) learnability, and 4) explorability. Furthermore, the evaluation process was properly adapted in order to provide an objective measure for judging the effect of high-level discrete gestural information on musical expressivity.
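The activity-detector rules of section 3.2.2 and the two direct mappings of section 4.1 (eq. (5) for energy, MI = MAR/b for velocity) are simple enough to sketch together. This is an illustrative reconstruction, not the authors' code: the function names and the default values of thresh1, thresh2, a and b are placeholders, since the paper leaves them as tuning parameters (the thresholds, in particular, are adjusted to light conditions).

```python
import math

def classify_frame(mmv, mar, thresh1=0.1, thresh2=0.05):
    """Activity detector: label one frame from the MMV and MAR metrics.
    MMV measures overall activation near the interface; MAR measures
    the speed of the gesture in front of it."""
    if mmv <= thresh1:
        return "silence"   # no activity in front of the interface
    if mar > thresh2:
        return "gesture"   # activity with a moving hand
    return "posture"       # activity with a (near) motionless hand

def mpf(n_mmv, a=0.3):
    """Eq. (5): MpF(nMMV) = 1 - exp(-nMMV / a), for nMMV in [0, 1].
    Maps normalized activation to loudness; a sets the gradient and
    hence the radius of sensitivity around the interface."""
    return 1.0 - math.exp(-n_mmv / a)

def modulation_index(mar, b=10.0):
    """MI = MAR / b: gesture speed drives FM brightness; b normalizes
    the continuously changing value to a musically meaningful range."""
    return mar / b
```

Only frames labelled 'gesture' or 'posture' would then be passed on to the HMM recognition core, while mpf and modulation_index run continuously on every frame.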
that they definitely succeeded in learning new gestures throughout the little time they were given for manipulation, while the sixth (referred to as the 1st in the statistics) also gave a positive answer, but with less certainty. On the question of whether, even after the experiment, the subject could recall correspondences between gestures and resulting sounds, all subjects gave a positive answer, each time with more or less certainty. The opinion of the subjects on the matter was of great interest, as with their spontaneous thoughts they underlined one of the most important issues for an interface: how to establish a learning curve that does not discourage amateurs from getting on with learning and at the same time sets high limits for the perfection of performance, and is thus intriguing enough for more experienced users to go on exploring the capabilities of the virtual instrument. Hence, insofar as learnability converges with the issues raised by explorability, it is worth looking at the statements of some of the subjects (1st, 5th and 6th respectively):

'Many difficulties encountered when trying to explore new sounds... difficulties to find a logic and patterns...'

'...For the manipulation some time is necessary to explore the possibilities but when it's done, it is very interesting to produce different sounds.'

'...However, the control on the second experiment was less effective, maybe due to that it demanded a higher degree of expertise gained through practice.'

In a question asking for the subject's expectations concerning the exploration of new sounds in a hypothetical second session with the interface, all subjects responded positively, as if the impression created was that part of the potential of the interface has not yet been discovered. Some of the subjects underlined the importance of the visual feedback, in the form of an animated head, for the exploration of the sound capabilities of the interface. Concerning this kind of feedback, all subjects found it useful in every way, also mentioning 'control' and 'logic' in the sound as factors of the creation to which it can contribute.

Regarding the general impression of the interface, the 1st subject was rather negative. He insisted on the problems he encountered in trying to understand how exactly it works. All the other subjects found the interface at least interesting. Although some subjects claimed not to be familiar with the 'type' of music it produced, or even not to find it pleasant, this did not prevent them from forming a good general impression:

'...sometimes it is noisy, but it's funny. I felt like playing (good or bad!) a music instrument...'

One subject underlined the constructive appropriateness of the 'Pogany' interface for such a scope:

'Touching the interface seems important and the contact/touch impression is quite nice...'

Finally, some of the subjects proposed types of usage where setups such as the 'Pogany' one for music would prove particularly useful, for instance for blind people. An inspiring point of view was also offered by one of the subjects, mostly concerning the intuitive purposes of tangible interfaces for music:

'With this interface, people have to guess how to touch it, to learn it by themselves... perhaps a "traditional" instrument player, after practicing with an interface such as the head interface, will try to find other manners to play with his instrument and produce new sounds.'

6. CONCLUSIONS-FUTURE RESEARCH
The impressions we obtained from this experiment were encouraging at many different levels. First, the high-level gestural information decoding module in the second session proved to be particularly useful in terms of the expressivity of the user, as stated by all the subjects and confirmed by the equivalence of the two sessions in all other aspects of the synthesis' global quality. Second, even through an uncomplicated mapping, the general impressions of timbre modification and time precision were positive, as were those of the interface itself as a device. Third, the interface succeeded in providing sufficient conditions for learning patterns and exploring new gestures, with priority given to the advanced users' learning curve. Finally, even if not proved by this particular experiment, the decisions concerning the expandability options of the setup that were left open during the architectural design (such as the option for the interface to be trained for complex gestures) were not discouraged by the results of the experiment.

Recent results showed that the use of an interface for music within an affective protocol could be beneficial. In the future we will focus on consolidating our results with further experiments and artistic performance use cases. In this framework it is also worth dealing with technical issues concerning the interface: robustness, increased sensitivity and enhanced multimodal techniques, instability under difficult light conditions, latency, etc. Enhancements within pure recognition issues could also help to improve the overall performance of the interface.

Finally, for an affective interface such as 'Pogany', even if the visual head animation feedback implicitly creates correspondences between users' emotions and sound results, a study of related research in the field of psychology (such as a model for touching parts of the body) is more than imperative. However, such a model is difficult to evaluate, due to the polyparametric nature of actions of touch among people and the effect of social factors. Nevertheless, this would surely help create a solid base for the semantic space to be linked to gestural information.

7. REFERENCES
[1] Jacquemin, C. 'Pogany: A tangible cephalomorphic interface for expressive facial animation', ACII 2007, Lisbon, Portugal, 2007.
[2] Paiva, A., Andersson, G., Hook, K., Mourao, D., Costa, M., Martinho, C. 'SenToy in FantasyA: Designing an affective sympathetic interface to a computer game', Personal Ubiquitous Comput. 6(5-6) (2002) 378-389.
[3] Yonezawa, T., Suzuki, N., Mase, K., Kogure, K. 'HandySinger: Expressive Singing Voice Morphing using Personified Hand-puppet Interface', NIME 06, Paris, 2006.
[4] Ekman, P., Friesen, W.V. 'Facial action coding system: A technique for the measurement of facial movement'. Consulting Psychologists Press, Palo Alto, CA, USA, 1978.
[5] http://virchor.sourceforge.net
[6] Maniatakos, F. 'Affective interface for emotion-based music synthesis', Sound and Music Computing conference SMC07, Leykada, Greece, 2007.
[7] htk.eng.cam.ac.uk/
[8] Maniatakos, F. 'Cephalomorphic interface for emotion-based musical synthesis', ATIAM Master Thesis, UPMC Paris 6 & IRCAM, LIMSI-CNRS, Orsay, France, 2007.
[9] Wanderley, M., Orio, N. 'Evaluation of input devices for musical expression: borrowing tools from HCI', Computer Music Journal, 26(3):62-76, 2002.
video input [1] that has been used in several European research projects (e.g. [2]). There exists a multitude of other software aimed at real-time video analysis and manipulation (e.g. Troikatronix 3, MAX/MSP/Jitter 4). However, EyesWeb was designed primarily for the analysis of expressive human gestures and was thus chosen for the present application.

Gesture control of audio with video recognition has been used in several applications at KTH. The first attempt was the Groove Machine, in which a dancer controlled the mixing of different music loops using the emotional expression of the dance, predicted by a combination of overall motion features extracted from a video camera. This evolved later into the computer game Ghost in the Cave, centered on the emotional expression of gestures and music [6]. It was played by two competing teams, each with one to three main players. The main players had to express different emotions, either with gestures or vocally, which were then recognized by the computer using fuzzy logic techniques [4]. An important feature was the collaborative aspect of the game. For example, the team controlled the speed of an avatar while the main player controlled the steering. The team also controlled the music: the more they danced, the more intense the music became. The music was synchronized across the teams. One of the teams controlled the percussion instruments and the other team all the other instruments.

In the recent artistic research project "Nu Moove", led by Lisa Ladberg, Royal College of Music, Stockholm, gesture control of sounds was used in a stage production. Two professional dancers controlled an interactive sound synthesis using both overall motion parameters and different zones on the stage.

Siegel and Jacobsen [7] describe an example of an interactive dance application which has many ideas in common with the present study. Instead of non-invasive video cameras, they used custom-built bending sensors attached to the joints of a professional dancer. The bending data was transferred wirelessly to a computer and mapped to sound-generating software. Several scenes were defined using different mappings and sound material. They also discussed the interesting shift in the roles of the composer and dancer. When the dancer is controlling the music there is a shift from being a dancer to becoming a musician. This new role of using gestures to control the sound may come into conflict with the visual impression of the gesture, the latter obviously being the modus operandi of a dancer.

The dancers' role in the interaction between choreographer and dancers has changed considerably in recent years. The dancer's role has undergone a change from interpreting the choreographer's intentions and movements to actively participating in the creative process by also providing more of the movements, that is, going from interpreter to creator. In the current project this has been taken one step further in that there is no predetermined choreography and the audience members become performers themselves, taking an active part in the result.

2. METHOD
Children's Moving Session
The development of the installation was the last stage in a three-step process that started with the observation of children's free movement. Twelve groups of children aged 3-6, with 7-9 children in each group (divided by age), were let into an empty dance studio (without mirrors) measuring 150 m2. Each session was filmed from a fixed video camera. The instruction was "Welcome in, we will start in a while". The children, who all knew each other from the same preschool group, immediately started to move around in the space. After approx. 5 minutes (depending on the activity in the room) music was put on. Music of different styles (orchestral, pop songs) with different rhythms and tempos was played for each group. The music was played for 10-15 minutes, after which the children were gathered in a circle to rest and to discuss what had happened. The children were asked if they remembered how they had moved. The same music was played again. The children then repeated some of the movements and made new ones. During the second phase of moving around, 2-3 children at a time were invited to a 5-minute recorded interview. Accompanied by the interviewer, they left the room and followed blue arrows taped on the floor of the corridor outside to another room close by. After the interview they returned to the dance studio and joined in the ongoing activity. Summarizing these interviews, most of the children experienced the session in the dance studio as positive. No one expressed negative feelings, with the exception of a few responses referring to a particular music example and to the dark studio being scary. (Two groups came a second time and instead spent the session in a black dance studio with purposely little lighting, using flashlights. The idea was to observe whether a nearly dark room affected the amount of movement activity. It turned out that it influenced the movement very little.) In these interviews, to the question "what is your favorite type of movement", the answer most often was running.

Choreographic Processing With Dancers
In the second step, a 23-minute dance was choreographed based on an analysis of what happened in the sessions with the children: looking at the shapes, effort and timing of movement; the balance and shift of weight of the body; overall movement patterns, interaction, and moods. This was done in collaboration with five professional contemporary dancers. The dancers began by learning some of the children's movement in detail. In this process, something that stood out compared to other choreographic processes was the relationship to time, interaction and perception of self. In the children's movement there was no expression of anticipation, planning, or judging. The adult dancers tried to move with the same intent and found this very difficult. When trying to perform the children's material they became aware of their own habits of, for example, anticipation or judging. Trying to move without these learned habits became one of the main focuses of the work and affected it on all levels: space, shape, timing, emotional expression, the way scenes were structured, overall dramaturgy, and relationships between the characters/dancers and the audience. When looking at the movement and working with it we found an emotional content that we tried to put into the choreography and its performance. This came to play an important role in the choreography and therefore also in the music based on the choreography. An example of these types of gestures can be seen in Figure 1.

Composing Music
In the third step, music was composed for the choreography by the composer Niko Röhlcke, based on a live dance performance and on a video recording of the choreography. He analyzed the choreography and made a chart of it according to a timeline, marking for example rhythmical and spatial patterns and themes. He also worked with the tempos and efforts set by the dancing,

3 www.troikatronix.com
4 www.cycling74.com
images that the different sections evoked, and specific movements in the choreography. The finished music contained six sections of clearly different character.

Figure 1. An example taken from the dance performance with the dancers interpreting children's movements.

Even though the installation was temporary, we wanted the equipment to look as if it was integrated into the room. By removing distractions we wanted to help the participants focus on their own bodies in the interaction with the music and the light. As the exhibition hall could never meet the weight and power requirements of a full light rig, we had to find a flexible, light and low-power solution. We therefore chose fluorescent light modules from the manufacturer Leader Light. Each module is fitted with four tubes in the colors red, green, blue and white, offering full color blending and a very bright output with very low power usage and a reasonable weight. Fourteen modules, divided into two parallel rows, were neatly fitted into gaps in the acoustic padding in the ceiling, covering the room's entire length. The light installation was programmed and controlled through an AVAB Pronto! ver. 3.1 DMX light console.

A speaker was placed in each of the four corners. All the equipment was connected to one PC running Windows XP, equipped with a sound card (E-MU 0404) and two analog video capture boards (IEI, IVC-200).
accuracy. The resulting output from the video analysis is a stream
of numbers reflecting the amount of motion in each zone.

A patch written in pure-data (pd)5 served as the overall control
unit. The complete sequence consisted of six different scenes
controlled by a timer unit. Each scene was active for a fixed time
duration; the whole sequence took about 20 min to cycle through
and was then repeated. All the audio control was done within the
pd patch. There was a separate sub-patch for each scene,
containing the audio samples and the specific control of that
scene. All audio samples were panned in four channels and
positioned according to their trigger position in the room, making
it easier for the user to understand the interaction.

A large floor map, showing the correlation between movement
and sound response in the different scenes, was on display outside
the exhibition space and was used to explain to the visitors how
the interaction worked. The idea of having the map on the floor
was to better couple the explanation with the installation and to
encourage physical action in the visitors while they comprehended
the functionality of the system. Following is a description of each
scene and its interaction.

Scene 1 Disco
The tracks in the original sound track were divided into four main
parts coupled to the four cameras. The music was groove-oriented,
with two different percussion parts, each divided in two levels,
one bass part, and one part with the rest of the melodic lines and
the accompaniment. The interaction can be seen as an advanced
mixer in which the music is continuously playing and the volume
of the different tracks is controlled by the QoM from each camera.
Thus, when nobody is moving in the room it is silent. Moving in
one zone will activate the corresponding track. The amount of
motion controls the volume but will also activate new tracks in
order to enhance the coupling between motion and sound. Thus, to
"play" the whole music there must be several people moving.
Scene 1 was optimized so that the typical children's movement of
running around the room in a big circle would maximize the
music output.

The basic light was dim blue with just a tint of red. When
movement occurred in one of the zones, the blue light started to
move in a smooth chase clockwise around the room. If two zones
were activated simultaneously, a red chase was added. Just as
smooth, the red chase was slightly faster, allowing every nuance
of color between blue, purple and red to run through the room.
Activating a third zone made the intensity of the blue chase reach
maximum, and adding movement in the fourth zone made the red
chase peak as well.

The main idea was that the running light would stimulate the
participants to run. In practice, if a few people were running
around very fast they could activate all four zones, thus getting the
entire soundtrack as well as the full light.

The light started with an instant bright all-covering green light,
sharply contrasting the purple color of the previous scene. This
light quickly faded to the basic dim light of the scene. We worked
exclusively with the green color in this scene. Each of the sixteen
zones was connected to the closest corresponding light. As the
participants played the different instruments, they were at the
same time playing the light. In this way the light hinted at the
different camera zones, giving the participants a visual reference
to where a particular sound was played.

Scene 3 Waltz
Similar to scene 1, the tracks of the original music were divided
into different groups. However, the control was slightly different.
The melody track was activated if there was movement anywhere
in the room. Thus one person could walk around the whole room
and play the melody. When there was movement in two zones
simultaneously, different effect-type sounds were also activated.
When there was movement in at least three major zones, the
rhythmic accompaniment was finally activated.

In scene three we worked with a pale yellow color base. When a
zone was activated, as the music faded up, the pale yellow light
dimmed slightly and gave way to a soft red pulse. This change
only occurred in the activated zone. The idea was to stimulate
movement in one zone at a time. With participants in all four
zones the entire room became warmer and more suitable for a
soothing waltz.

Scene 4 Lightning
The major interaction was the possibility to exchange "lightning"
pulses from one side to the other. A large movement in one corner
activated these: a light rapidly moved to the other side, with a
corresponding sound moving in the same direction. This pulse
could not be retriggered before it had been "sent" back from the
other side. In addition, movement in the outer areas activated
some background sounds. The light base was space blue. Using
the white fluorescents in one row successively created the
lightning pulse.

Scene 5 Techno/metal
Similar to scenes 1 and 3, the music was divided into tracks. The
major groove was activated if everybody was gathered in the
middle of the room and was dancing forcefully. Some effect
sounds were triggered in the corners of the room.

The green and blue lights corresponding to the activation zones in
the middle were dimly lit from the start. When the first of these
zones was activated, the green lights started to strobe to the beat
of the music. Upon adding activity in the second and third zones,
the blue and white fluorescents started to strobe as well.
Activating the outer corner zones sent red light running back and
forth through the entire length of the room.
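As an illustrative sketch (ours, not the actual pd patch), the scene 1 "advanced mixer" logic — per-zone QoM controlling a track's volume, with stronger motion unlocking an additional layer — could look as follows; the thresholds and the layering of all four parts are assumptions:

```python
# Sketch of the scene 1 "advanced mixer": each camera zone's quantity
# of motion (QoM, normalized to 0..1) sets the volume of its track, and
# strong motion opens a second layer of that track. Thresholds are
# illustrative, not the tuned values of the installation.

TRACKS = ["percussion_a", "percussion_b", "bass", "melody"]
ACTIVATION_THRESHOLD = 0.05   # below this a zone is considered still
EXTRA_TRACK_LEVEL = 0.6       # strong motion adds the second layer

def mix_levels(qom_per_zone):
    """Map the four zones' QoM values to track volumes (0..1)."""
    levels = {}
    for track, qom in zip(TRACKS, qom_per_zone):
        levels[track] = qom if qom >= ACTIVATION_THRESHOLD else 0.0
        levels[track + "_layer2"] = qom if qom >= EXTRA_TRACK_LEVEL else 0.0
    return levels

# Nobody moving -> the room is silent:
silent = mix_levels([0.0, 0.0, 0.0, 0.0])
# One person moving a lot in zone 0 plays that track loudly plus its layer:
running = mix_levels([0.8, 0.0, 0.0, 0.0])
```

Running around the room in a big circle sweeps the QoM peak through all four zones in turn, which is exactly the behavior the scene was optimized for.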
4. EVALUATION
The exhibition was open to the public during a five-week period.
Schools could book a guided tour of the installation and a
workshop. During public open hours there were personnel
available to answer questions and to offer a guide to the
installation. The installation came to attract people of all ages,
from preschool children to senior citizens. All available school
tours were fully booked and there were in total about 3980
visitors. Interviews were made with 25 children aged 7 two weeks
after they had visited the installation. Summarizing their answers,
the children all thought it was a fun, exciting and magical
experience. They were fascinated by the fact that sounds were
invisible. They could not see any instruments, and they were
impressed that one could not see where the sound came from.
They felt great freedom to dance and to move around freely
without instructions from any adult. This was an indication that
the intended purpose of the installation worked with the children.
It was also informally confirmed while observing the visitors in
activity in the installation.

Following are some representative quotes from the interviews:
Q: What did you think when you heard the music?
A: I thought it was fun and that you could dance how you
wanted and you didn't need to decide, as it usually is.
A: Different things, perhaps that it was fun.
A: I thought when there was lightning, and then I was just
playing and having fun. It was a bit strange because they were
transparent; I mean invisible instruments in that room.
A: I thought it was very well done because I would never
dream about making such invisible instruments in the rooms. It
was like magic. The lightning, it was like you only did like that
with the hand in the air. That was very cool.
A: It was exciting
Q: What was exciting?
A: That the music started without you knowing it. It changed.
Q: Why did you want to move to that music in that room?
A: It felt so good to be able to move when the music was on.
A: It's fun, instead of standing still. It is more fun to dance.
A: It's more fun to move than to stand still.
A: It was so, it was like it steered me. It was so fun I felt I had
to move.
A: They (the instruments) almost steered me. It felt a little
weird, sometimes you could do it but you could not. (Shows a
movement)
Q: You could not?
A: I wanted to do it, then it steered, so I started running instead.
Q: How did it feel to move in that room?
A: Good!
A: It was fun. To be with friends and play with them at the
same time you are dancing.
A: It was fun with my friends.
A: It was a lot of fun.
Q: Why was it fun?
A: It was just like that with different music and dances and
different light and everything was sort of higgledy-piggledy.

Following are some comments from the staff at the exhibition hall
to the question "What have you seen children and adults do?"

Children were open and unafraid; grown-ups asked what one
should do. The children were more spontaneous and wild. I
told them to go in, to try it a little and to come back. I tried to
get them to discover new movement patterns. Many people
were looking for an answer/key in order to interpret it in the
right way. Adults with little children were more relaxed. Older
people were both shy and let go. Many of them brought their
grandchildren. 9-13 year-olds who hang out in the nearby
shopping center found "Hoppsa", where they ran around and
played for a while.

Many old people came, some from the elderly home, and
groups from schools for people with special needs. These
groups spontaneously saw a room that gave energy, enabling
them to walk around. They were sometimes hesitant at first,
standing in a corner. High school kids went mad, the teachers
backed off, and there became groups within the groups. The
"cool guy" was actively participating, but the shy ones also
dared.

5. DISCUSSION
The idea of the project was, both in the process of making it and
in the final installation, to be very open to input from those who
participated and to allow them to shape the project. Making room
in the process and in the final "product" for shared ideas and
working material (movements, music) contributed greatly to the
project and gave it lots of energy and momentum in moving
forward. In this process with its different steps, how something
was done, to a great extent, also shaped what was done.

Something can be said about usefulness and purpose together with
openness. The aim was to offer a room that had possibilities to
become different things depending on how one moved in it. So
one could see it as a place where the participant used the room for
her or his needs, and at the same time, through her or his dancing,
expressed this in a sort of performance. This could thus be seen as
a "product" that consists of an empty room, no light, no sound,
until someone steps in and shapes it according to her or his needs
or pleasures. This also touches on questions such as: does the
visitor become a dancer? Does the dancer become a musician, or
does the music become a dance or a choreography? From our
experience with the installation, professional dancers tended to
see it as a dance improvisation focusing on the physical and
emotional experience in interaction with music and light.
Musicians, on the other hand, tended to see it as musical
instruments controlled by gestures. Either way, it is through their
bodies in motion that they interact with the room.

When meeting with dance and art teachers who were to receive
school groups in the installation, a discussion came up about to
what extent the visitors/participants should be given instructions.
The teachers had planned for a workshop in and around the
installation that included painting and making a performance. The
choreographer wanted the visitors to have the freedom to do what
they felt like in the installation. Depending on one's background,
one had different preferences for how to use the installation. This
relates to the possibility of shaping the "product" or experience.
How one uses it affects what one gets, in this case the children's
experience of being in the space.

It is popular from elementary to high school education to use art
and dance for learning about other things, for example
communication skills, discipline, conflict solving, physical
exercise, mathematics and history6. We argue that it is also very
important to practice art as art in order to understand it and its
processes and techniques, including creativity, to fully make use
of it in all those other areas mentioned above. From the interviews
with the 7 year olds who had visited the installation one can see

6 http://www.oru.se/templates/oruExtNormal____37079.aspx
that they had strong memories and impressions. The opportunities
they were given to explore and make decisions themselves played
an essential part in their experience and memory of it.

6. ACKNOWLEDGMENTS
This project was supported by The University College of Dance,
Stockholm, Swedish Arts Council, City of Stockholm, Stockholm
County Council, The Modern Dance Theater, Stockholm, and The
Municipality of Botkyrka.

We would like to thank the composer Niko Röhlcke, the dancers
Linda Adami, Johanna Klint, Kerstin Abrahamsson, Maryam
Nikandish, Stina Nyberg, and the set designer Tove Axelsson.
different expressive performances of the same music piece.
Users can navigate such affective spaces by their expressive
movement and gesture. On the other hand, Mappe per Affetti
Erranti explicitly addresses fruition by multiple users and
encourages collaborative behavior: only social collaboration
allows a correct reconstruction of the music piece. In other
words, while users explore the physical space, the (expressive)
way in which they move and the degree of collaboration between
them allow them to explore at the same time an affective,
emotional space.

Section 2 presents the concept of Mappe per Affetti Erranti;
Section 3 focuses on the specific aspect of expressive movement
analysis and describes the model we designed for navigating the
affective space; Sections 4 and 5 illustrate the implementation of
an installation of Mappe per Affetti Erranti developed for the
science exhibit "Metamorfosi del Senso" (Casa Paganini, Genova,
Italy, October 25 – November 6, 2007). The conclusions
summarize some issues and future work that emerged from this
installation.

2. CONCEPT
The basic concept of Mappe per Affetti Erranti is the collaborative
active listening of a music piece through the navigation of maps at
multiple levels, from the physical level to the emotional level.

At the physical level, space is divided into several areas. A voice
of a polyphonic music piece is associated to each area. The
presence of a user (even a single user) triggers the reproduction of
the music piece. By exploring the space, the user walks through
several areas and listens to the single voices separately. If the user
stays in a single area, she listens to the voice associated to that
area only. If the user does not move for a given time interval, the
music fades out and turns off.

The user can mould the voice she is listening to in several ways.
At a low level, she can intervene on parameters such as loudness,
density, and amount of reverberation. For example, by opening her
arms, the user can increase the density of the voice (she listens to
two or more voices in unison). If she moves toward the back of
the stage the amount of reverberation increases, whereas toward
the front of the stage the voice becomes drier.

At a higher level, the user can intervene on the expressive features
of the music performance. This is done through the navigation of
an emotional, affective space. The system analyzes the expressive
intention the user conveys with her expressive movement and
gesture and translates it into a position (or a trajectory) in an
affective, emotional space. Like the physical space, such affective,
emotional space is also divided into several areas, each one
corresponding to a different performance of the same voice with a
different expressive intention. Several examples of such affective,
emotional spaces are available in the literature, for example the
spaces used in dimensional theories of emotion (e.g., see [3][4])
or those especially developed for analysis and synthesis of
expressive music performance (e.g., see [5][6][7]).

Users can thus explore the music piece from a twofold
perspective: navigating the physical space they explore the
polyphonic musical structure; navigating the affective, emotional
space they explore music performance. A single user, however,
can only listen to and intervene on a single voice at a time: she
cannot listen to the whole polyphonic piece with all the voices.
Only a group of users can fully experience Mappe per Affetti
Erranti. In particular, the music piece can be listened to in its
whole polyphony only if a number of users at least equal to the
number of voices is interacting with the installation. Moreover,
since each user controls the performance of the voice associated
to the area she occupies, the whole piece is performed with the
same expressive intention only if all the users are moving with the
same expressive intention. Thus, the more users move with
different, conflicting expressive intentions, the more the musical
output is incoherent and chaotic. But the more users move with
similar expressive intentions and in a collaborative way, the more
the musical output is coherent and the music piece is listened to in
one of its different expressive performances.

Mappe per Affetti Erranti can therefore be experienced at several
levels: by a single user, who has a limited but still powerful set of
possibilities of interaction; by a group of users, who can fully
experience the installation; or by multiple groups of users. In fact,
each physical area can be occupied by a group of users. In this
case each single group is analyzed and each participant in a group
contributes to intervene on the voice associated to the area the
group is occupying. Therefore, at this level a collaborative
behavior is encouraged both among the participants in each single
group and among the groups participating in the installation.

The possibility of observing a group or multiple groups of users
during their interaction with Mappe per Affetti Erranti makes this
installation an ideal test-bed for investigating and experimenting
with group dynamics and social network scenarios.

3. EXPRESSIVE MOVEMENT ANALYSIS
This section focuses on a specific and relevant aspect of Mappe
per Affetti Erranti, i.e., how the system analyses the expressive
intentions conveyed by a user through her expressive movement
and gesture. Such information is used for navigating the affective,
emotional space and for controlling the expressive performance of
a voice in the polyphonic music piece.

Expressive movement analysis is discussed with reference to an
implementation of Mappe per Affetti Erranti we recently
developed (see Section 4). In such implementation we selected
four different expressive intentions: the first one refers to a happy,
joyful behavior, the second one to solemnity, the third one to an
intimate, introverted, shy behavior, and the fourth to anger. In
order to make the description easier we will label such expressive
intentions as Happy, Solemn, Intimate, and Angry. Please note,
however, that we consider the reduction to such labels a too
simplistic way of describing very subtle nuances of both
movement and music performance. In fact, we never described
Mappe per Affetti Erranti to users in terms of such labels. Rather,
we provided (when needed) more articulated descriptions of the
kind of expressive behavior we (and the system) expected, and we
let users discover the installation themselves step by step.

These four expressive intentions were selected since they are
different and characterized enough to be easily conveyed and
recognized by users. Furthermore, they are examples of low/high
positive/negative affective states that can be easily mapped on
existing dimensional theories of emotion (e.g., valence-arousal or
Tellegen's space).

3.1 Feature extraction
In our current implementation, analysis of expressive gesture is
performed by means of twelve expressive descriptors: Quantity of
Motion computed on the overall body movement and on
translational movement only, Impulsiveness, vertical and
horizontal components of velocity of peripheral upper parts of the
body, speed of the barycentre, variation of the Contraction Index,
Space Occupation Area, Directness Index, Space Allure,
Amount of Periodic Movement, and Symmetry Index. Such
descriptors are computed in real-time for each user. Most
descriptors are computed on a time window of 3 s. In the context
of Mappe per Affetti Erranti, we considered such a time interval a
good trade-off between the need, on the one hand, of having a
responsive enough system, and the need, on the other hand, to
give the users a time long enough for displaying an expressive
intention.

…the following motion phase. The variance of such inter-onset
intervals is taken as an approximate measure of PM.

Symmetry Index (SI) is computed from the position of the
barycenter and the left and right edges of the body bounding
rectangle. That is, it is the ratio between the difference of the
distances of the barycenter from the left and right edges and the
width of the bounding rectangle:

SI = (d_left - d_right) / w

where d_left and d_right are the distances of the barycenter from
the left and right edges of the bounding rectangle, and w is its
width.
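A minimal sketch of the Symmetry Index computation as described above; the variable names are ours, and the barycenter x-position and bounding-box edges are taken as given inputs:

```python
# Sketch: Symmetry Index (SI) from the barycenter x-position and the
# left/right edges of the body bounding rectangle.

def symmetry_index(x_barycenter, x_left, x_right):
    """SI = (d_left - d_right) / width, in [-1, 1]; 0 means centered."""
    d_left = x_barycenter - x_left      # distance from the left edge
    d_right = x_right - x_barycenter    # distance from the right edge
    width = x_right - x_left            # bounding rectangle width
    return (d_left - d_right) / width

# A barycenter exactly in the middle of the bounding box is symmetric:
print(symmetry_index(0.5, 0.0, 1.0))   # prints 0.0
```

A barycenter shifted toward the right edge gives a positive SI, toward the left edge a negative one.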
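The classification scheme described in this section scores each candidate intention by how closely the measured descriptors match the expected levels of Table 1 (Gaussians peaking at the expected value, sigmoids for Very Low / Very High), and the intention with the highest summed score wins. A minimal sketch of such template matching, using only Gaussians and illustrative expected values rather than the tuned parameters of the actual installation:

```python
# Sketch of the intention classifier: each motion descriptor's measured
# value is compared with the value expected for each intention; a
# Gaussian peaks (at 1) when the value equals the expected one, and the
# intention with the highest summed score wins. Expected values and the
# sigma are illustrative assumptions.
import math

EXPECTED = {  # descriptor -> expected value per intention (0..1 scale)
    "QoM": {"Happy": 0.8, "Solemn": 0.2, "Intimate": 0.2, "Angry": 0.8},
    "dCI": {"Happy": 0.5, "Solemn": 0.2, "Intimate": 0.2, "Angry": 0.9},
    "DI":  {"Happy": 0.5, "Solemn": 0.8, "Intimate": 0.2, "Angry": 0.2},
}

def gaussian(value, expected, sigma=0.2):
    """Closeness of a measured value to the expected one, peak = 1."""
    return math.exp(-((value - expected) ** 2) / (2 * sigma ** 2))

def classify(measured):
    """Return the intention whose expected profile best matches."""
    intentions = ["Happy", "Solemn", "Intimate", "Angry"]
    scores = {i: sum(gaussian(measured[d], EXPECTED[d][i])
                     for d in measured) for i in intentions}
    return max(scores, key=scores.get)

# High QoM, very high CI variation, low directness -> Angry-like profile:
print(classify({"QoM": 0.85, "dCI": 0.9, "DI": 0.15}))   # prints Angry
```

The real system uses twelve descriptors, per-descriptor weights, and sigmoids for the Very Low / Very High levels, but the argmax-over-summed-closeness structure is the same.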
where one parameter is used for tuning the range of values for
which the descriptor can be considered to be at the appropriate
level (Very High or Very Low), a second parameter controls the
steepness of the sigmoid, and a third controls the type of sigmoid,
i.e., it is set to 1 if the descriptor is expected to be Very Low and
to -1 if the descriptor is expected to be Very High (inverse
sigmoid tending to 1 for high values).

Intuitively, the output of the Gaussian and sigmoid functions
applied to motion descriptors is a measure of how near the actual
value of a motion descriptor is to the value expected for a given
expressive intention. For example, if a motion descriptor is
expected to be Low for a given expressive intention and its
expected value is 0.4, a Gaussian is placed with its peak
(normalized to 1) centered at 0.4. That motion descriptor will
therefore provide the highest contribution to the overall sum if the
real value is in fact the expected value. As a consequence, the
highest value for the sum is obtained by the expressive intention
whose expected values for the descriptors according to Table 1
best match the actual computed values.

Table 1. Expected levels of each motion descriptor
for the four expressive intentions

Motion descriptor | Happy        | Solemn       | Intimate     | Angry
QoM               | High         | Low          | Low          | High
TQoM              | High         | Low          | Low          | High
IM                | Medium       | Low          | Very low     | Very high
VV                | High         | Low          | Low          | Medium
HV                | High         | Medium       | Low          | High
BS                | Not relevant | Not relevant | Low          | Medium
dCI               | Medium       | Low          | Low          | Very high
SOA               | Not relevant | Not relevant | Low          | High
DI                | Medium       | High         | Low          | Low
SA                | Low          | Low          | Medium       | Low
PM                | High         | Very high    | Low          | Very low
SI                | Medium       | Medium       | Not relevant | Low

4. THE INSTALLATION AT THE SCIENCE
EXHIBITION "METAMORFOSI DEL SENSO"
Mappe per Affetti Erranti was presented for the first time at the
science exhibition "Metamorfosi del Senso", held at Casa
Paganini, Genova, Italy, on October 25th – November 6th, 2007.
The exhibition was part of "Festival della Scienza", a huge
international science festival held in Genova every year.

Mappe per Affetti Erranti was installed on the stage of the 250-
seat auditorium at Casa Paganini, an international center of
excellence for research on sound, music, and new media, where
InfoMus Lab has its main site. The installation covered a surface
of about 9 m × 3.5 m. A single videocamera observed the whole
surface from the top, about 7 m high and at a distance of about
10 m from the stage (we did not use sensors or additional
videocameras in this first experience). Four loudspeakers were
placed at the four corners of the stage for audio output. A white
screen covered the back of the stage for the whole 9 m width: this
was used as scenery, since the current implementation of the
installation does not include video feedback. Lights were set in
order to enhance the feeling of immersion for the users and to
have a homogeneous lighting of the stage.

The music piece we selected is "Come again" by John Dowland,
for four singing voices: contralto, tenore, soprano, and basso.
With the help of singer Roberto Tiranti and composer Marco
Canepa we chose a piece that could be soundly interpreted with
different expressive intentions (i.e., without becoming ridiculous)
and could be interesting and agreeable for non-expert users. We
asked professional singers to sing it with the four different
expressive intentions Happy, Solemn, Intimate, and Angry. The
piece was performed so that the changes in interpretation could be
perceived even by non-expert users.

The physical map is composed of four rectangular, parallel areas
on the stage. The tenore and soprano voices are associated with
the central areas, the contralto and basso with the lateral ones.
This allows an alternation of female and male voices and attracts
users toward the "stronger" voices, i.e., the central ones.

Navigation in the affective, emotional space is obtained with the
techniques for expressive movement analysis and classification
discussed in Section 3. As for music performance, each recorded
file was manually segmented into phrases and sub-phrases.
Changes in the expressive intention detected from movement
trigger a switch to the corresponding audio file, at a position
coherent with the position reached by that expressive
interpretation as a result of the movement of other users/groups.
In such a way we obtain a continuous resynchronization of the
single voices depending on the expressive intentions conveyed by
users.

On the occasion of the opening of "Metamorfosi del Senso",
choreographer Giovanni Di Cicco and his dance ensemble
designed and performed a contemporary dance performance on
Mappe per Affetti Erranti. In this performance, dancers interacted
with the installation for over 20 min, repeatedly moving from
order to chaos. The public of the dance performance counted more
than 400 persons in 3 days. Figure 1 shows a moment of the dance
performance and a group of users experiencing Mappe per Affetti
Erranti. The installation was experienced by more than 1500
persons during "Metamorfosi del Senso", with generally positive
and sometimes enthusiastic feedback.

5. IMPLEMENTATION: THE EYESWEB XMI
OPEN PLATFORM AND THE EYESWEB
EXPRESSIVE GESTURE PROCESSING
LIBRARY
The instance of Mappe per Affetti Erranti we developed for the
exhibit "Metamorfosi del Senso" was implemented using a new
version of our EyesWeb open platform [13][18]: EyesWeb XMI
(for eXtended Multimodal Interaction). The EyesWeb open
platform and related libraries are available for free on the
EyesWeb website www.eyesweb.org.

With respect to its predecessors, EyesWeb XMI strongly enhances
support for the analysis and processing of synchronized streams at
different sampling rates (e.g., audio, video, data from sensors). We
exploited such support for the synchronized processing and
reproduction of the audio tracks in "Come Again". The whole
installation was implemented as a couple of
EyesWeb applications (patches): the first one managing video
processing, extraction of expressive features from movement and
gestures, and navigation in the physical and affective spaces; the
second one devoted to audio processing, real-time audio mixing,
and control of audio effects. Every single component of the two
applications was implemented as an EyesWeb sub-patch. The two
applications ran on two workstations (Dell Precision 380,
equipped with two Pentium 4 3.20 GHz CPUs, 1 GB RAM,
Windows XP Professional) with a fast network connection.

Extraction of expressive descriptors and models for navigating the
physical and expressive spaces were implemented as EyesWeb
modules (blocks) in a new version of the EyesWeb Expressive
Gesture Processing Library.

Figure 1. Mappe per Affetti Erranti: on the top a snapshot
from the dance performance; on the bottom a group of users
interacting with the installation.

6. CONCLUSIONS
From our experience with Mappe per Affetti Erranti, especially at
the science exhibit "Metamorfosi del Senso", several issues
emerged that need to be taken into account in future work.

A first issue is related to the expressive movement descriptors and
the modalities of fruition of Mappe per Affetti Erranti. The
installation can be experienced by a single user, by a group, or by
multiple groups. However, the expressive descriptors have been
defined and developed for analyzing the movement and expressive
intention of single users. To what extent can they be applied to
groups? Can we compute the expressive intention of a group as a
kind of average of the expressive intentions conveyed by its
components, or do more complex group dynamics have to be
taken into account? Research on computational models of
emotion, affective computing, and expressive gesture processing
usually focuses on the expressive content communicated by single
users. Such group dynamics and their relationships with emotional
expression are still largely uninvestigated.

Another issue concerns the robustness of the selected expressive
movement descriptors with respect to different analysis contexts.
For example, the kind of motion a user performs when she stays
inside an area in the space is often different, under several aspects,
from the motion she performs when wandering around the whole
space. Motion inside an area is characterized by movement of the
limbs: the amount of energy is mainly due to how much the limbs
move, and the expressive intention is conveyed through movement
in the Kinesphere. Walking is instead the main action
characterizing motion around the space: the amount of energy of
walking is much higher than the amount of energy associated with
possible other movements of the arms, and the expressive
intention is conveyed through the walking style. The system
should be able to adapt to such different analysis contexts, and
different sets of motion descriptors should be developed, either
specifically for a given context or robust to different contexts.

Future work will also include refinements to the classifier and
formal evaluation with users. As for the classifier, it encompasses
many parameters (e.g., weights, parameters of the functions
applied to the movement descriptors) that need to be fine-tuned.
In this first installation such parameters have been set empirically
during tests with dancers and potential users. However, a deeper
investigation based on rigorous experiments would be needed in
order to individuate a minimum set of statistically significant
descriptors and to find suitable values or ranges of values for their
parameters. Formal evaluation with professional and non-expert
users is needed for a correct estimation of the effectiveness of the
installation and its usability.

Such future work will be addressed in the framework of the EU-
ICT Project SAME (www.sameproject.eu), focusing on new
forms of participative and social active listening of music.

7. ACKNOWLEDGMENTS
We thank our colleague and composer Nicola Ferrari for his
precious contribution in developing the concept of Mappe per
Affetti Erranti; choreographer Giovanni Di Cicco and singer
Roberto Tiranti for the useful discussions and stimuli during the
preparation of the dance and music performance; and composer
Marco Canepa for recording and preparing the audio material. We
also thank singers Valeria Bruzzone, Chiara Longobardi, and
Edoardo Valle, who with Roberto Tiranti performed "Come
Again" with different expressive intentions, and dancers Luca
Alberti, Filippo Bandiera, and Nicola Marrapodi, who with
Giovanni Di Cicco performed the dance piece on Mappe per
Affetti Erranti. Finally, we thank our colleagues at DIST –
InfoMus Lab for their concrete support to this work, Festival della
Scienza, and the visitors of the science exhibition "Metamorfosi
del Senso", whose often enthusiastic feedback strongly
encouraged us to go on with this research.

8. REFERENCES
[1] Rowe, R. Interactive music systems: Machine listening and
composition. MIT Press, Cambridge MA, 1993.
[2] Camurri A., Canepa C., and Volpe G. Active listening to a
applied to groups of users? Can we approximate the expressive virtual orchestra through an expressive gestural interface:
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
acoustic open work (Scambi by Henri Pousseur), characterized by a variety of sequences and of different performance degrees of freedom.

2. ACTOR-BASED ZIGZAG MODEL
In order to present our model in section 3, this section introduces in two separate subsections (2.1 and 2.2) the basics of the ZigZag model and a brief description of the actors, a particular class of computational agents.

2.1 The ZigZag model
ZigZag [6] introduces a new, graph-centric system of conventions for data and computing; it separates the structure of information from its visualization (i.e. the way the data – text, audio, video – is presented to the user); therefore a ZigZag structure handles all the different visualizations necessary to realize an Electronic Edition of musical works.

The main element of the ZigZag model is the zz-structure: it can be viewed as a multigraph with colored edges, with the restriction that every vertex, called a zz-cell, has at most two incident edges of the same color; the sub-graphs, each of which contains edges of a unique color, are called dimensions. The cells in a same dimension are linked into linear, directed sequences called ranks. Each dimension can contain a number of parallel ranks, each a series of distinct cells connected sequentially.

Since there is no canonical visualization, the pseudo-space generated by zz-structures is called zz-space and may be viewed in various ways. A view is a presentation of some portion of zz-space and is rendered by a view program, which visualizes, for example, a region around a particular cursor.

A 2D view can be drawn by picking a single cell as a focal point and drawing the neighborhood around that cell along two chosen dimensions. By changing the chosen pair of dimensions, we can visually reveal, hide, and rearrange nodes in interesting ways. Considering that a zz-structure may be very large, and that there is usually not enough room in the 2D view for all of the cells, we restrict the size of the 2D view.

Some observations are necessary on the zz-cells. They are the principal unit of the system, and they are conceived not only as passive containers of primitive data (i.e., text, graphics, audio, etc.): they can also have types, based either on their functions or on the types of data they contain. Thus, a zz-cell may have a variety of different properties and functions, such as holding executable programs or scripts (this type of cell is called a progcell) or representing a package of different cells (a referential cell).

Analogous observations can be made on the dimensions; in fact, they can be passive and nominal (merely receiving and presenting data) or operational, programmed to monitor changing zz-structures and events and to calculate and present results automatically (for example, the dimensions d.cursor and d.clone). From these considerations it turns out that it is reductive to treat zz-cells as the passive entities that simple nodes are in a graph. So, we have considered the opportunity of modelling a zz-cell by means of a specific class of computational agents, the actors.

2.2 Brief description of the actor model
The actor model [1] is a model of concurrent computation in distributed systems; it is organized as a universe of inherently autonomous computational agents, called actors, which interact with each other by sending messages, improving on the sequential limitations of passive objects.

Each actor is defined by three parts: a passive part, a set of local variables, termed acquaintances, that constitute its internal state; an active part, which reacts to the external environment by executing its procedural skills, called scripts, and constitutes the actor's behaviour; and a third part, the actor's mail queue, which buffers incoming communication (i.e., messages).

Each actor has a unique name (the uniqueness property) and a given behaviour; it communicates with other actors via asynchronous messages. Actors are reactive in nature, i.e., they execute only in response to messages received.

An actor can perform three basic actions on receiving a message: create a finite number of actors with universally fresh names, send a finite number of messages, and assume a new behaviour.

The actor's behaviour is deterministic in that its response to a message is uniquely determined by the message contents and its internal state. Furthermore, all actions performed on receiving a message are concurrent.

In order to describe the actors in our model we adopt the formalism used in [3]:

(DefActor ActorName
  [inherits-from Class-Name]
  <acquaintances list>
  {scripts list})

Therefore an actor is described by specifying its superclass, its data part and its script part; the script part represents the set of scripts that can be executed.

3. THE MODEL
The architecture of our model is organized in two layers: a component layer contains the zz-cells, which are actors specifically designed to model the audio documents domain; a meta layer contains the actor classes specialized, for example, to manage connections among zz-cells or to generate specific views on them. The interaction between actors is defined using the diagrammatic language AUML (Agent Unified Modelling Language) [8], an extension of UML for agents.

3.1 Component layer
The component layer is defined in relation to the magnetic tape structure: each open reel is usually composed of several physical segments, i.e. pieces of magnetic tape connected by means of adhesive tape (called a junction). In each segment, the audio signal is recorded in one, two or more tracks. Following this structure we define the actors Source, PhysicalSegment, and DigitalSignal. Moreover, the actor LogicalSegment is introduced, with the aim of comparing the sources on the basis of a segmentation different from the physical one. The actor Source represents the overall characteristics of the document, such as the tape width (typical values are 1/4, 1/2, 1, and 2 inch) and the cataloguing fields.

(DefActor Source
  <physicalSegments
   width
   archive
   shelfMark
   inventory
   conservationCondition
   ...>
  {calculateDuration
   ...})

It also contains a list of physicalSegments, which compose the open reel tape. This actor is able to perform several
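As an informal illustration of the DefActor formalism above (a sketch only: the class, method, and message names here are invented and are not the system described in this paper), an actor with acquaintances, scripts, and a mail queue might look like this in Python:

```python
from collections import deque

class Actor:
    """Minimal actor sketch: acquaintances (internal state), scripts
    (behaviour), and a mail queue buffering incoming messages."""

    def __init__(self, name, acquaintances=None):
        self.name = name                                # unique name (uniqueness property)
        self.acquaintances = dict(acquaintances or {})  # passive part
        self.scripts = {}                               # active part: selector -> script
        self.mail_queue = deque()                       # buffers incoming messages

    def on(self, selector, script):
        """Register a script (procedural skill) for a message selector."""
        self.scripts[selector] = script

    def send(self, actor, selector, *args):
        """Asynchronous send: just enqueue in the target's mail queue."""
        actor.mail_queue.append((selector, args))

    def step(self):
        """Reactive execution: process one buffered message, if any."""
        if self.mail_queue:
            selector, args = self.mail_queue.popleft()
            if selector in self.scripts:
                self.scripts[selector](self, *args)

# Example: a Source-like actor that accumulates segment durations.
source = Actor("Source", {"duration": 0.0})
source.on("addSegment", lambda a, d: a.acquaintances.__setitem__(
    "duration", a.acquaintances["duration"] + d))
source.send(source, "addSegment", 2.5)
source.send(source, "addSegment", 1.5)
source.step()
source.step()
print(source.acquaintances["duration"])  # 4.0
```

A DefActor declaration then corresponds to constructing such an actor with its acquaintance list and registered scripts; actor creation and behaviour replacement, the other two basic actions, are omitted from this sketch.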
parameters) from one sequence to the next (a sort of ‘continuity principle’).

Figure 1. Characteristic per sequence (from: Decroupet [4]).

This segmentation process can be iteratively applied to all the sequences, obtaining a set of audio segments linked along two dimensions. The user-author can generate new performances by mixing different sequences, also in polyphonic structures. To do so, the user can apply deterministic laws (given by the composer), stochastic models, or self-oriented choices; this allows the user to generate new ‘reading’ performances of an open work.

Figure 2. A screenshot of the system. X-axis: time; Y-axis: sequences. The user can realize a polyphonic structure and modify the pitch, the duration and the volume of each sequence.

We assume that a user is interested in creating a new performance starting from a segment srcn; this request is captured by the meta-actor generativeProcess and forwarded to the source rank rsrc (which manages the logical segments src1, …, src32). rsrc sends a synchronous multicast message (CalculateRule) to all its logical segments. In order to enable the comparison between its features and those of the other segments, each segment srcj (j = 1, …, 32) assigns this task to the rank re-sj (which manages the logical segments able to follow srcn on the basis of Pousseur’s ‘continuity principle’). Each re-sj contacts (in a synchronous multicast way) all its components e-srcjs (s = 1, …, m) and, following a stochastic law defined off-line by the user or driven by user input, chooses the components with the best matching. This information is returned to generativeProcess. This last actor collects the segments and creates the new requested performance. A screenshot of the system is shown in Figure 2.

5. CONCLUSION
The era of high modernism, in which the concept of the open work was a radical resistance to the dominant aesthetic, has been relegated to history. Contemporary western culture, as is well known, assumes that all musical works are open to perpetually renewed interpretation by listeners, musicologists, analysts, and performers [2]. In particular, in the multimedia domain no work is permitted to resist endless (interactive) interpretation. This contemporary situation is partly the effect of the invention of the concept of the open musical work, of which Pousseur was a precursor. For this reason, interest in Scambi is particularly high today, as also proved by the success of the Scambi Project (www.scambi.mdx.ac.uk/). One effect of our work might be to free the historical musical open work from its iconic status as history, to revive and redefine its specific openness within a general (digital and interactive) openness, and to return a continuous presence to it by opening it up to interpretive renewal.

6. REFERENCES
[1] Agha, G. Actors: A Model of Concurrent Computation in Distributed Systems. MIT Press, Cambridge, MA, 1986.
[2] Ayrey, C. Pousseur’s Scambi (1957), and the new problematics of the open work. Proc. of the Symposium on Scambi, Goldsmiths College, University of London, 2005.
[3] Dattolo, A. and Loia, V. Distributed Information and Control in a Concurrent Hypermedia-oriented Architecture. International Journal of SEKE, Vol. 10, n. 6, pp. 345-369, 2000.
[4] Decroupet, P. Vers une théorie generale – Henri Pousseurs “Allgemeine Periodik”, in Theorie und Praxis, MusikTexte 98, pp. 31-43, 2003.
[5] Eco, U. The role of the reader: explorations in the semiotics of texts. Indiana University Press, USA, 1979.
[6] Nelson, T. H. Cosmology for a Different Computer Universe. Journal of Digital Information, Vol. 5, Issue 1, 2004.
[7] Pousseur, H. Scambi. In Gravesaner Blätter IV, pp. 36-54, 1959.
[8] Winikoff, M. Toward making agent UML practical: a textual notation and a tool. First International Workshop on Integration of Software Engineering and Agent Technology, Melbourne, Australia, pp. 401-412, 2005.
defined rhythmic pattern. The goal of an agent is to play his instrument in synchronism with the others.”

Murray-Rust and Smaill’s AgentBox [17] uses multi-agents in a graphic environment, in which agents “listen” to those agents physically (graphically) close to one another. A human conductor can manipulate the agents, by moving them around, in a “fast and intuitive manner”, allowing people to alter aspects of music “without any need for musical experience”. The stimulus behind AgentBox is to create a system that will “enable a wider range of people to create music,” and facilitate “the interaction of geographically diverse musicians”.

2.2 Rhythm Generation
Various strategies and models have been used to generate complex rhythms within interactive systems. Brown [2] describes the use of cellular automata (CA) to create monophonic rhythmic passages and polyphonic textures in “broad-brush, rather than precisely deterministic, ways.” He suggests that “CA provide a great deal of complexity and interest from quite simple initial set-up”. However, complexity generated by CA is no more musical than complexity generated by constrained randomness. Brown recognises this when he states that rhythms generated through the use of CA “often result in a lack of pulse or metre. While this might be intellectually fascinating it is only occasionally successful from the perspective of a common aesthetic.”

Pachet [19] proposes an evolutionary approach for modeling musical rhythm, noting that “in the context of music catalogues, [rhythm] has up to now been curiously under studied.” In his system, “rhythm is seen as a musical form, emerging from repeated interaction between several rhythmic agents.” Pachet’s model is that of a human improvisational ensemble: “these agents engage into a dynamic game which simulates a group of human players playing, in real time, percussive instruments together, without any prior knowledge or information about the music to play, but the goal to produce coherent music together.” Agents are given an initial rhythm and a set of transformation rules from a shared rule library; the resulting rhythm is “the result of ongoing play between these co-evolving agents.” The agents do not actually communicate, and the rules are extremely simple: i.e. add a random note, remove a random note, move a random note. The system is more a proof of concept than a performance tool; it developed into the much more powerful Continuator [20], a real-time stylistic analyser and variation generator.

Martins and Miranda [13] describe a system that uses a connectionist approach to representing and learning rhythms using neural networks. The approach allows the computer to learn rhythms through similarity by mapping incoming rhythms in a three-dimensional space. The research is part of a longer project [16, 14] in which self-organising agents create emergent music through social interactions; as such, the emphasis is not upon the interaction of rhythms but upon the emergence of new and/or related rhythmic patterns.

Gimenes [9] explores a memetic approach that creates stylistic learning methods for rhythm generation. As opposed to viewing rhythmic phrases as consisting of small structural units combined to form larger units (a more traditional method of musical analysis), the memetic approach suggests longer blocks that are dependent upon the listener (suggesting a more recent cognitive method of rhythmic analysis that utilizes “chunking”). RGeme “generates rhythm streams and serves as a tool to observe how different rhythm styles can originate and evolve in an artificial society of software agents.”

Kinetic Engine, in collaboration with MahaDeviBot, builds upon such previous efforts; however, it is fundamentally different in two respects: firstly, it is a real-time system with performance as its primary motivation; secondly, the software controls a physical instrument that requires mechanical movement.

3. AGENT-GENERATED RHYTHM
It is important to recognize that rhythmic intricacy can result not only from the evolution of individual rhythms, but also through the interaction of quite simple parts; such interaction can produce musical complexity within a system. The interrelationship of such simple elements requires musical knowledge in order to separate interesting from pedestrian rhythm. Such interaction suggests a multi-agent system, in which complexity results from the interaction of independent agents.

Existing musical models for such a system can be found in the music of African drum ensembles and Central and South American percussion ensembles (note that Indian classical music, which contains rhythmic constructions of great complexity, is fundamentally solo, and therefore lacks rhythmic interaction of multiple layers). Furthermore, models for the relationship of parts within an improvising ensemble can be found in jazz and certain forms of Techno. For more information on such modeling, see [8].

4. TOOLS

4.1 MahaDeviBot
Figure 1. MahaDeviBot controlled by Kinetic Engine.

The development of the MahaDeviBot serves as a paradigm for various types of solenoid-based robotic drumming techniques, striking twelve different percussion instruments gathered from around India, including frame drums, bells, finger cymbals, wood blocks, and gongs. The machine even has a bouncing head that can portray tempo to the human performer. The MahaDeviBot serves as a mechanical musical instrument that extends North Indian musical performance scenarios; it arose out of a desire to build a pedagogical tool to keep time and help portray complex rhythmic cycles to novice performers in a way that no audio speakers can emulate. It accepts MIDI messages to communicate with any custom software or hardware interface.
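Since all communication with the robot happens over MIDI, driving an instrument reduces to sending ordinary channel messages. A minimal sketch in Python, with an invented note-to-instrument mapping (the actual note assignments of the MahaDeviBot are not given here):

```python
def note_on(channel, note, velocity):
    """Build a raw 3-byte MIDI Note On message (status 0x90 | channel)."""
    if not (0 <= channel < 16 and 0 <= note < 128 and 0 <= velocity < 128):
        raise ValueError("out-of-range MIDI value")
    return bytes([0x90 | channel, note, velocity])

# Hypothetical mapping of robot instruments to MIDI note numbers;
# the real assignments are not documented in this paper.
INSTRUMENTS = {"frame_drum": 36, "tambourine": 42, "hand_drum": 48}

msg = note_on(0, INSTRUMENTS["tambourine"], 100)
print(msg.hex())  # 902a64
```

In practice these bytes would be handed to a MIDI output port; any MIDI library or hardware interface that accepts raw channel messages would do.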
4.2 Kinetic Engine
Kinetic Engine is a real-time composition/performance system created in Max/MSP, in which intelligent agents emulate improvising percussionists in a drum ensemble. It arose out of a desire to move away from constrained random choices and utilise more musically intelligent decision-making within real-time interactive software.

The principal human control parameter in performance is limited to density: how many notes are played by all agents. All other decisions (when to play, what rhythms to play in response to the global density, how to interact with other agents) are left to the machine’s individual agents.

Agents generate specific rhythms in response to a changing environment. Once these rhythms have been generated, agents “listen” to one another, and potentially alter their patterns based upon these relationships. No databases of rhythms are used: instead, pre-determined musical rules govern both the generation and the alteration of rhythmic patterns.

5. AGENTS
Agent-based systems allow for limited user interaction or supervision, allowing more high-level decisions to be made within the software. This models interactions between intelligent improvising musicians, albeit with a virtual conductor shaping and influencing the music.

There are two agent classes: a conductor and an indefinite number of players (although in this case the agents are limited to the twelve instruments of the robot).

Type can be loosely associated with the instrument an agent plays, and the role such an instrument would have within the ensemble. Table 1 describes how type influences behaviour.

Table 1. Agent type and influence upon agent behaviour.

            Type Low              Type Mid               Type High
Timbre      low frequency:        midrange frequency:    high frequency:
            frame drums, gongs    tambourine, shakers    hand drum
Density     lower than average    average                higher than average
Variation   less often            average                more often

The stored personality traits include Downbeat (preference given to notes on the first beat), Offbeat (propensity for playing off the beat), Syncopation (at the subdivision level), Confidence (number of notes with which to enter), Responsiveness (how responsive an agent is to global parameter changes), Social (how willing an agent is to interact with other agents), Commitment (how long an agent will engage in a social interaction), and Mischievous (how willing an agent is to disrupt a stable system). A further personality trait is Type-scaling, which allows agents to be less restricted to their specific types. For example, low agents will tend to have lower densities than other types, but a low agent with a high type-scaling will have higher than usual densities for its type. See Figure 2 for a display of all personality parameters.
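As a toy illustration of how type and type-scaling might combine (the paper gives no formula; the bias values and the linear blend below are invented for the example), an agent's target density could be biased by its type and relaxed toward the global density as type-scaling grows:

```python
# Illustrative only: these bias values and this blend are invented,
# not the formula used by Kinetic Engine.
TYPE_BIAS = {"low": -0.25, "mid": 0.0, "high": +0.25}

def target_density(agent_type, type_scaling, global_density):
    """Bias the agent's target density (0..1) by its type, relaxed by
    type-scaling in [0, 1]: 0 = fully type-bound, 1 = type ignored."""
    bias = TYPE_BIAS[agent_type] * (1.0 - type_scaling)
    return min(1.0, max(0.0, global_density + bias))

print(target_density("low", 0.0, 0.5))   # 0.25
print(target_density("low", 1.0, 0.5))   # 0.5
print(target_density("high", 0.5, 0.5))  # 0.625
```

The behaviour matches the description above: a low agent with high type-scaling drifts toward the ensemble average rather than staying below it.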
[2] if the accumulated density is “too low”, active agents can add notes (or subtract them if the density is “too high”).

[3] if the accumulated density is judged to be “close enough”, agent densities are considered stable.

6.2 Density Spread
An agent’s density (i.e. seven notes) is “spread” across the available beats (i.e. four beats) using fuzzy logic to determine probabilities, influenced by the agent’s downbeat and offbeat parameters (see Figure 3 for an example of probability weightings spread across four beats). Thus, an example spread of seven notes for agent A, below, might be (3 1 2 1), in which each beat is indicated with its assigned notes.

Figure 3. Example density spread weightings for two agents, 4/4 time with different downbeat and offbeat parameter values.

Agents determine the placement of the notes within the beat using a similar technique, but influenced by the agent’s syncopation parameter.

6.3 Pattern Checking
After an initial placement of notes within a pattern has been accomplished, pattern checking commences. Each beat is evaluated against its predecessor and compared to a set of rules in order to avoid certain patterns and encourage others.

Figure 4. Example pattern check: given a previous beat’s rhythm, with one note required for the current beat, two “preferred” patterns for the current beat (with coefficients of 30% and 90%).

In the above example, if the current beat has one note in it, and the previous beat contains the given rhythm, a test is made (a random number is generated between 0 and 1). If the generated number is less than the coefficient for pattern A (.3, or a 30% chance), the test passes, and pattern A is substituted for the original pattern. If the test fails, another test is made for pattern B, using the coefficient of .9 (or 90%). If this last test fails, the original rhythm is allowed to remain. Using such a system, certain rhythmic patterns can be suggested through probabilities. Probability coefficients were hand-coded by the first author after extensive evaluation of the system’s output.

7. SOCIAL BEHAVIOUR
Once all agents have achieved a stable density and have generated rhythmic patterns based upon this density, agents can begin social interactions. These interactions involve potentially endless alterations of agent patterns in relation to other agents; they continue as long as the agents have a social bond, which is broken when a test of an agent’s social commitment parameter fails. This test is done every “once in a while”, an example of a “fuzzy” counter.

Social interaction emulates how musicians within an improvising ensemble listen to one another, make eye contact, and interact by adjusting and altering their own rhythmic patterns in various ways. In order to determine which agent to interact with, agents evaluate other agents’ density spreads. Evaluation methods include comparing density spread averages and weighted means, both of which are fuzzy tests.

Table 2. Example density spreads in 4/4: comparing agent 1 with agents 2 and 3.

Agent #                1       2       3
Density spread         3122    1221    2333
Similarity rating              0.53    0.48
Dissimilarity rating           0.42    0.33

An agent generates a similarity and a dissimilarity rating between its density spread and that of every other active agent. The highest overall rating determines the type of interaction: a dissimilarity rating results in rhythmic polyphony (interlocking), while a similarity rating results in rhythmic heterophony (expansion). Note that interlocking interactions (dissimilarities) are actually encouraged through weightings.

Once another agent has been selected for social interaction, the agent attempts to “make eye contact” by messaging that agent. If the other agent does not acknowledge the message (its own social parameter may not be very high), the social bond fails, and the agent will look for other agents with which to interact.

Figure 5. Social messaging between agents.

7.1 Interaction types: Polyphonic
In polyphonic interaction, agents attempt to “avoid” partner notes, both at the beat and pattern level. For example, given a density spread of (3 1 2 2) and a partner spread of (1 2 2 1), both agents would attempt to move their notes to where their partner’s rests occur (see Figure 6). Because both agents are continually adjusting their patterns, stability is actually difficult to achieve.
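The partner-selection step described in section 7 can be sketched as follows. The actual fuzzy tests (density spread averages and weighted means) are not specified in detail in the paper, so a simple normalized note-count difference stands in for them here; the numbers therefore do not reproduce Table 2:

```python
def similarity(spread_a, spread_b):
    """Similarity in [0, 1] between two density spreads: 1 minus the
    normalized per-beat note-count difference. A stand-in for the
    fuzzy tests used by Kinetic Engine, not the published formula."""
    diff = sum(abs(a - b) for a, b in zip(spread_a, spread_b))
    total = sum(spread_a) + sum(spread_b)
    return 1.0 - diff / total if total else 1.0

def choose_interaction(own, partners):
    """Return (partner, mode, rating) with the highest rating: high
    dissimilarity selects polyphonic interlocking, high similarity
    selects heterophonic expansion."""
    best = None
    for name, spread in partners.items():
        s = similarity(own, spread)
        for mode, rating in (("heterophonic", s), ("polyphonic", 1.0 - s)):
            if best is None or rating > best[2]:
                best = (name, mode, rating)
    return best

own = (3, 1, 2, 2)                                  # agent 1 in Table 2
partners = {"agent2": (1, 2, 2, 1), "agent3": (2, 3, 3, 3)}
partner, mode, rating = choose_interaction(own, partners)
print(partner, mode)  # agent3 heterophonic
```

A weighting applied to the dissimilarity rating, as the paper notes, would bias the choice toward interlocking.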
10. ACKNOWLEDGMENTS
7.2 Interaction types: Heterophonic We would like to thank Trimpin and Eric Singer for their
In heterophonic interaction, agents alter their own density support in building the MahaDevibot.
spread to more closely resemble that of their partner, but no
attempt is made to match the actual note patterns (see Figure 7).
11. REFERENCES
[1] Beyls, P. Interaction and Self-Organization in a Society of
Musical Agents. Proceedings of ECAL 2007 Workshop on
Music and Artificial Life (MusicAL 2007) (Lisbon,
Portugal, 2007).
[2] Brown, A. Exploring Rhythmic Automata. Applications
On Evolutionary Computing, Vol. 3449 (2005), 551-556.
[3] Burtner, M. Perturbation Techniques for Multi-Agent and
Multi-Performer Interactive Musical Interfaces.
Figure 7. Example heterophonic interaction result Proceedings of the New Interfaces for Musical Expression
between agents A and B, with density spreads of (3 1 2 2) Conference (NIME 2006) (Paris, France, June 4-8, 2006).
and (2 1 2 1). Agent B had an initial spread of (1 2 2 1). [4] Dahlstedt, P., McBurney, P. Musical agents. Leonardo,
39, 5 (2006), 469-470.
8. ADDITIONAL AGENT KNOWLEDGE [5] Dixon, S. A lightweight multi-agent musical beat tracking
Because each agent is sending performance information, via system. Pacific Rim International Conference on Artificial
MIDI, to a specific percussion instrument, agents require Intelligence (2000), 778-788.
detailed knowledge about that instrument. Each instrument has [6] Eigenfeldt, A. Kinetic Engine: Toward an Intelligent
a discrete velocity range, below which it will not strike, and Improvising Instrument. Proceedings of the 2006 Sound
above which it may double strike. These ranges change each and Music Computing Conference (SMC 2006) (Marseille,
time the robot is reassembled after moving. Therefore, a France, May 18-20, 2006).
velocity range test patch was created which determines these
[7] Eigenfeldt, A. Drum Circle: Intelligent Agents in
limits quickly and efficiently before each rehearsal or
Max/MSP. Proceedings of the 2007 International
performance. These values are stored in a global array, which
Computer Music Conference (ICMC 2007) (Copenhagen,
each agent directly accesses in order to appropriately choose
Denmark, August 27-31, 2007)
velocities within the range of its specific instrument.
Similarly, each instrument also has a physical limit as to how fast it can re-strike; this limit is also determined through a test patch and used to inform the program of potential tempo limitations. For example, the frame drums have limits of approximately 108 BPM for three consecutive sixteenths (138 ms inter-onset times), while the tambourine and hand-drum can easily play the same three sixteenths at over 200 BPM (better than 75 ms inter-onset times). The conductor will limit the overall tempo and subdivisions so as not to exceed these limitations; furthermore, individual agents will attempt to limit consecutive notes for each drum at contentious tempi.
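The arithmetic behind these figures can be sketched directly (function names are illustrative, not from Kinetic Engine):

```python
# Sketch of the tempo check implied above: the inter-onset interval (IOI) of
# consecutive sixteenth notes at a given tempo, and the fastest tempo an
# instrument with a given minimum re-strike time can sustain.

def sixteenth_ioi_ms(bpm: float) -> float:
    """Inter-onset interval of consecutive sixteenth notes, in milliseconds."""
    return 60000.0 / bpm / 4          # one quarter note = 60000/bpm ms

def max_bpm_for(min_restrike_ms: float) -> float:
    """Fastest tempo at which consecutive sixteenths remain playable."""
    return 60000.0 / (4 * min_restrike_ms)
```

This reproduces the paper's numbers: `sixteenth_ioi_ms(108)` is about 138.9 ms, and a 75 ms re-strike limit gives `max_bpm_for(75) == 200`.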
9. CONCLUSION
Kinetic Engine has been used previously as an independent ensemble, both autonomously (as an installation) and under performance control (via a network of nine computers for the composition Drum Circle); its use as a generative environment for the control of MahaDeviBot has been discussed here. This “collaboration” has been used in performance, in which the first
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
not admit of being directly compared/.. but both are referable to a universal formula.” These concepts seem perfectly in line with present-day psychological and technological research on cross-modality [6]. What we pursue in our present and future research is the exploration of cross-modality features (i.e., of Goethe's universal formula) by investigating an abstract version of the gesture-sound-image triangle.

In his Pedagogical Sketchbook, Klee points out a didactic path for his students at the Bauhaus but, at the same time, presents the general principles of his artistic research. In the first part of the book, Klee introduces the transformation of the static dot into linear dynamics. In the colorful words of Sibyl Moholy-Nagy's preface to the Sketchbook, the line, being a sequence of dots, “walks, circumscribes, creates passive-blank and active filled planes” (see Figures 1, 2 and 3). What we are trying to do here is a sort of reverse process, from gesture to sound, looking at Klee's lesson on dots and lines to define a new way of designing sound through gesture.

3. TOWARDS A SONIFICATION OF GESTURE THROUGH ELEMENTARY SOUNDS
At this stage of our research, the aim is to create a virtual instrument producing “abstract” sounds via gesture analysis and recognition, where gesture is understood as an abstract entity. The objective is to look for original relationships between gesture and sound through the recombination of elementary categories. In our conception, we assume that there is no necessary relationship between gesture and sound. On the contrary, the goal is to show how it is possible to build new, effective and meaningful relationships between gesture and sound by defining abstract relationships and appropriate mappings. The main idea is to define a number of elementary components of gesture trajectories and to associate with each of them a specific category of sounds. In this section we discuss the principles adopted and the preliminary results obtained.
MnM [10] is a package included in FTM [11], an external library for Max/MSP, which provides a Gesture Follower (see [12] and [13]). Unfortunately, it was not suitable for our purposes: this tool is intended for recognizing a large collection of specific objects, while we need to recognize only some more abstract characteristics. Here the purpose is to identify a common characteristic of infinitely many objects. MnM needs to learn many single object families in order to recognize similar ones. Our aim is to find a common algorithm, a model that is valid for all cases of a general category, for instance the category of curvilinear movements (e.g., circles and spirals belong to the same category).

In EGGS, visual data concerning gesture are processed by a color-tracking routine that returns five values. The first one, ranging from 0 to 3, discriminates between stillness, circular counter-clockwise (CCW) movement, straight movement, and circular clockwise (CW) movement (see Table 1). The second value is the scalar velocity of the gesture. The third one is the angle, in radians, of the velocity vector, calculated from the origin. The fourth value is the total angle, in radians, accumulated from the start of the session; this value is useful in order to have a continuously varying angle, avoiding the gap between the end of a circle and the beginning of the next one.

Figure 5: Trajectory detection and classification: a CCW curvilinear movement.

From a technical point of view, the discrimination between straight and circular movements is obtained by measuring the angle variations of the segments generated by three subsequent couples of points, i.e. the centripetal acceleration of the motion. A variation near zero is classified as a straight trajectory; otherwise the curvilinear category is chosen.
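The straight-versus-curvilinear test just described can be sketched minimally as follows (this is not EGGS's actual Max/MSP patch; the threshold value and the CW/CCW sign convention are illustrative assumptions):

```python
# Minimal sketch: take three subsequent tracked points, measure how much the
# direction of the two resulting segments changes, and classify a near-zero
# variation as straight, otherwise as curvilinear (CW or CCW from the sign
# of the turn). Threshold is illustrative.
import math

def classify(p0, p1, p2, threshold=0.05):
    """p0, p1, p2: (x, y) points. Returns 'still', 'straight', 'cw' or 'ccw'."""
    v1 = (p1[0] - p0[0], p1[1] - p0[1])
    v2 = (p2[0] - p1[0], p2[1] - p1[1])
    if v1 == (0, 0) and v2 == (0, 0):
        return "still"
    # signed turn angle between the two segments
    turn = math.atan2(v1[0] * v2[1] - v1[1] * v2[0],   # cross product
                      v1[0] * v2[0] + v1[1] * v2[1])   # dot product
    if abs(turn) < threshold:
        return "straight"
    return "ccw" if turn > 0 else "cw"
```

For instance, three collinear points classify as straight, while a right-angle turn classifies as curvilinear, with the sign of the cross product distinguishing the two rotation directions.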
Figure 6: EGGS in action: detection of a curvilinear movement.

5. PERFORMATIVE POTENTIALITIES AND FUTURE DEVELOPMENTS
EGGS provides a basic performance system. Many possibilities of articulation and combination of the elementary mappings are conceivable. We have tested a simple realization of an accumulation process, where stillness is the starting signal for looping a sonification. A fast alternation of movements and still instants creates polyphonic situations, in which every loop automatically fades out in time.

Also, as in any musical practice, the learnability issue is fundamental. Exercise is important in order to understand the possibilities of the instrument and obtain relevant results. However, few technical skills are needed, as any simple gesture produces a meaningful sonification.

Furthermore, following once more Klee's and the Bauhaus' teaching and the “Punkt, Linie, Fläche” (point, line, plane) paradigm, we are working on an extension of the system in order to define plane sonification. From a sonic point of view, this will correspond to sound textures. More generally, our future plans are to investigate the idea of using gesture as a control of both sound and image generation. We can imagine three directions in creating correspondences between sounds and images: mapping sound to image, mapping image to sound, and concurrent generation of sound and image. With EGGS, the ultimate objective would be to search for novel relations between sound and image by means of recombining abstract categories controlled by gesture. The intention is to investigate if the definition of abstract (gestural) categories and the definition of effective (and independent) mappings for both sound generation and image generation will

6. REFERENCES
[1] Kagan, A. Paul Klee: Art & Music. Cornell University Press, Ithaca, New York, 1987.
[2] Klee, P. Pedagogical Sketchbook, trans. Sibyl Moholy-Nagy. Frederick A. Praeger, New York, 1965.
[3] Cadoz, C., Luciani, A., and Florens, J.-L. Artistic creation and computer interactive multisensory simulation force feedback gesture transducers. In Proc. Conf. on New Interfaces for Musical Expression (NIME), pages 235-246, Montreal, Canada, May 2003.
[4] Rocchesso, D. and Fontana, F., editors. The Sounding Object. Mondo Estremo, Firenze, 2003.
[5] Kennedy, A. Bauhaus. Flame Tree Publishing, London, 2006.
[6] Camurri, A., Drioli, C., Mazzarino, B., and Volpe, G. "Controlling Sound with Senses: multimodal and cross-modal approaches to control of interactive systems". In P. Polotti and D. Rocchesso, eds. Sound to Sense, Sense to Sound. A State of the Art in Sound and Music Computing. Logos Verlag, Berlin, 2008.
[7] Camurri, A. and Volpe, G., eds. Gesture-based Communication in Human-Computer Interaction, LNAI 2915, Springer Verlag, February 2004.
[8] http://en.wikipedia.org/wiki/Shepard_tone
[9] Roads, C. Computer Music Tutorial. The MIT Press, Massachusetts, 1996.
[10] Bevilacqua, F., Müller, R., and Schnell, N. MnM: a Max/MSP mapping toolbox. Proceedings of the New Interfaces for Musical Expression Conference (NIME), Vancouver, Canada, 2005.
[11] Schnell, N., Borghesi, R., Schwarz, D., Bevilacqua, F., and Müller, R. "FTM – Complex Data Structures for Max." Proc. of ICMC 2005. International Computer Music Association, Barcelona, Spain, 2005.
[12] http://ftm.ircam.fr/index.php/Gesture_Follower
[13] Bevilacqua, F., Guédy, F., and Schnell, N. "Wireless sensor interface and gesture-follower for music pedagogy." In Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, NY, USA.
or things which occur in nature. In any case, the artist (e.g., the musician) exploits the potentialities of the objects as a vehicle of artistic meaning. These objects are denoted as “found” in order to distinguish them from other purposely created items used in the arts. From early Musique Concrète compositions such as Pierre Henry's Variations pour une porte et un soupir (1963), John Cage's compositions, or astonishing soundtracks such as that of Jacques Tati's movie Playtime, this practice continues to investigate the expressive qualities of everyday artifacts, electronics included (see for instance [5] for the practice of circuit bending, and [6] for the notion of infra-instruments).

through in-solid acoustic waves; further, thanks to the analogy of these waves with sound, they can be “naturally” mapped to a perceptually clear and energetically consistent sound response. The limit (or the advantage) of TAIs with respect to TUIs is a restriction of scope: from “no limit” in the physical design of the input interface, to “no limit” in the choice of any object as an input interface. The possibility of using “any object” offers the great opportunity to skip (to a certain extent) any training or practice stage: the interaction-to-sound mapping can be designed so that the sound responds in an effective way to usual, everyday interactions with the objects.
physics-based sound models during the interaction. To this end, we created families of parameter configurations among which to morph; 2) thanks to the possibility of recording gestural data, it is possible to interact with gestural loops in a “sequence and playback” style; 3) interaction modalities (configurations) are investigated in order to set basic musical gestures (e.g., bending or finger-picking for a guitar). In detail:

the cutlery: both the fork and the knife make use of the friction sound model. By exploiting combinations of buttons and movements, users can range over different presets, or effectively and reliably drive the control parameters of the sound model, such as the stiffness and viscosity of the interaction, or the mass and the resonant qualities of the objects (Figure 1);

the bottles make use of a continuous-crumpling sound model [23]. The available control parameters are the stiffness and shape of particles, and material resistance as a metaphor of the present quantity of liquid (Figure 2);

the steak configuration: typically when holding the fork with the left hand and the knife with the right one;

the pasta configuration: when holding the fork with one hand and a dressing bottle with the other.

7. CONCLUSIONS
In this paper, we present a development, in a musical direction, of our former work on sonic interaction design for artifacts. Some examples of what we call SAFOs are illustrated. These new instruments reflect the impulse of giving voice to everyday objects that belongs to musical traditions of every time and culture. This practice is here brought to the present by making use of current technologies and interaction design.

8. REFERENCES
[1] A. Gadd and S. Fels, “MetaMuse: metaphors for expressive instruments,” Proc. Conf. on New Interfaces for Musical Expression (NIME), Dublin, Ireland, May 24-26, 2002.
[2] R. Hoskinson, K. van den Doel and S. Fels, “Real-time Adaptive Control of Modal Synthesis,” Proc. Conf. on New Interfaces for Musical Expression (NIME), Montreal, pp. 99-103, 2003.
[3] “A Roadmap for Sound and Music Computing,” http://smcnetwork.org/roadmap
[4] C. Sachs, The History of Musical Instruments, Norton and Company, Inc., 1940.
[5] R. Ghazala, Circuit-Bending: Build Your Own Alien Instruments, Wiley Publishing Inc., Indianapolis, USA, 2005. http://www.anti-theory.com/soundart/
[6] J. Bowers and P. Archer, “Not Hyper, Not Meta, Not Cyber but Infra-Instruments,” Proc. Conf. on New Interfaces for Musical Expression (NIME), Vancouver, BC, Canada, 2005.
[7] P. Dourish, Where the Action Is: The Foundations of Embodied Interaction, MIT Press, Cambridge, MA, USA, 2001.
[8] F. J. Varela, E. Thompson and E. Rosch, The Embodied Mind: Cognitive science and human experience, MIT Press, Cambridge, MA, USA, 1991.
[9] N. Armstrong, “An Enactive Approach to Digital Musical Instrument Design,” PhD thesis, Princeton University, 2006.
[10] S. Fels, L. Kaastra, S. Takahashi and G. McCaig, “Evolving Tooka: from experiment to instrument,” Proc. Conf. on New Interfaces for Musical Expression (NIME), Hamamatsu, Shizuoka, Japan, June 3-5, 2004.
[11] G. Essl and S. O'Modhrain, “An enactive approach to the design of new tangible musical instruments,” Org. Sound 11, 3 (Dec. 2006), pp. 285-296.
[12] J. Patten, B. Recht and H. Ishii, “Interaction Techniques for Musical Performance with Tabletop Tangible Interfaces,” ACE 2006 Advances in Computer Entertainment, Hollywood, California, June 14-16, 2006.
[13] G. Weinberg, “Playpens, Fireflies and Squeezables – New Musical Instruments for Bridging the Thoughtful and the Joyful,” Leonardo Music Journal, MIT Press, vol. 12, pp. 43-51.
[14] A. Crevoisier and P. Polotti, “Tangible Acoustic Interfaces and their Applications for the Design of New Musical Instruments,” Proc. Conf. on New Interfaces for Musical Expression (NIME), Vancouver, Canada, May 26-28, 2005.
[15] D. Rocchesso and F. Fontana, editors, The Sounding Object, Mondo Estremo, 2003. Available at http://www.soundobject.org/
[16] W. W. Gaver, “How do we hear in the world? Explorations of ecological acoustics,” Ecological Psychology, vol. 5, no. 4, pp. 285-313, 1993.
[17] K. Moriwaki, “MIDI scrapyard challenge workshops,” Proc. Conf. on New Interfaces for Musical Expression (NIME), New York, June 6-10, 2007.
[18] P. Cook, “Musical Coffee Mugs, Singing Machines, and Laptop Orchestras,” 151st Meeting of the Acoustical Society of America, Providence, May 2006.
[19] S. Jordà, M. Kaltenbrunner, G. Geiger and R. Bencina, “The reacTable,” Proc. Intern. Computer Music Conf. (ICMC), 2005.
[20] K. Ferris and L. Bannon, “The Musical Box Garden,” Proc. Conf. on New Interfaces for Musical Expression (NIME), Dublin, Ireland, May 24-26, 2002.
[21] A. A. Cook and G. Pullin, “Tactophonics: Your Favourite Thing Wants to Sing,” Proc. Conf. on New Interfaces for Musical Expression (NIME), pp. 285-288, New York, NY, USA, 2007.
[22] P. Polotti, S. Delle Monache, S. Papetti and D. Rocchesso, “Gamelunch: Forging a Dining Experience through Sound,” Proc. Conf. on Human Factors in Computing Systems (CHI), Florence, Italy, 2008. http://www.vimeo.com/874774
[23] R. Bresin, S. Delle Monache, F. Fontana, S. Papetti, P. Polotti and Y. Visell, “Auditory feedback through continuous control of crumpling sound synthesis,” Proc. CHI Sonic Interaction Design workshop, Florence, Italy, April 6, 2008.
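The morphing among families of parameter configurations mentioned in the list of interaction modalities above can be sketched as plain interpolation between presets. The parameter names below are illustrative placeholders, not the actual parameters of the friction or crumpling models.

```python
# Sketch of morphing between parameter configurations ("presets") of a
# physics-based sound model. Parameter names are illustrative placeholders.

def morph(preset_a: dict, preset_b: dict, t: float) -> dict:
    """Linearly interpolate between two presets; t=0 gives A, t=1 gives B."""
    return {k: (1 - t) * preset_a[k] + t * preset_b[k] for k in preset_a}

smooth = {"stiffness": 0.2, "viscosity": 0.8, "mass": 1.0}
rough  = {"stiffness": 0.9, "viscosity": 0.1, "mass": 2.0}
```

Sweeping t continuously while playing lets a user glide between interaction qualities (e.g., from a smooth to a rough frictional contact) without discontinuities in the sound.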
sonArt [30], for instance, is a system for creating music from information contained in still pictures.

Some research has been done from a more scientific perspective towards the sonification of vector fields, which at the mapping stage shares some similarities with the framework presented here.

Funk, Kuwabara and Lyons used an optical flow field in conjunction with face detection and zones to devise a musical interface that can be played with the muscles of the face [7]. Jakovich and Beilharz used a dense optical flow field (one computed at every pixel in the image) to alter the cells of a cellular automaton running a “game of life”, which in turn controlled a granular synthesizer [10].

The most similar research to date is that of Kapur et al., who used motion data from a VICON system to control parameters of various synthesis algorithms [11]. While their direct mapping (for instance, using n motion vectors to control n sinusoids of an additive synthesiser) closely mirrors that of the framework presented here, the VICON system, with its six cameras and physical markers, imposes great physical and technological constraints that limit the range of its practical uses. Furthermore, the authors focused their research on human gestures, whereas this research aims towards the use of arbitrary imaging data.

3. FEATURES AND FLOW
3.1 Image Features
Raw images contain a vast amount of information: a single-channel 320 by 240 pixel 8-bit image contains 76,800 bytes, which at 30 frames per second translates to 2,304,000 bytes per second. By contrast, a stereo 16-bit audio stream at 44.1 kHz yields only 176,400 bytes per second. In order to limit the amount of data available, salient image features must first be identified.

Figure 1: Features computed at two different scales.

While the term “feature” is used extensively in the computer vision literature, its definition remains somewhat vague. A feature can be seen as “an interesting image structure that could arise from a corresponding interesting scene structure. Features can be single points such as interest points, curve vertices, image edges, lines or curves or surfaces, etc.” [5] For the purpose of this paper, however, features can be seen as having the following properties: 1) they are local, that is, they have a specific (x, y) position; 2) they exist at a given scale (for example, a square can either yield a single large-scale feature or four small-scale features, one at each corner); 3) they are local maxima of some image-intensity variation metric. The features that match these properties are often referred to as “corners”.

As feature detection is now one of the most fundamental processes in computer vision, several algorithms have been put forward [15]. The Harris detector [8] and its multi-scale variant [14], and the very closely related Shi-Tomasi detector [22], which are based on the partial derivatives of the image intensity, are some of the most commonly used algorithms. Other detectors include the difference of Gaussians (DoG), the SUSAN corner detector [25] and the FAST corner detector [21]. If we limit our search to smallest-scale features (those occurring in a 9 pixel by 9 pixel neighborhood), the machine-learning-based FAST detector is well suited due to its rapid execution time. However, the Shi-Tomasi detector and the DoG detector may prove better choices in certain situations.

3.2 Motion Flow
As a result of performing feature detection, the image is described as a field of image coordinates (and optionally scale values) corresponding to the features in the image. While it would be possible to use this information as it is, in order to perform more significant mappings it is important to find out how these features move from frame to frame. The techniques to achieve this can be broadly classified in two categories: feature matching techniques and optical flow-based techniques.

Feature matching techniques [25] involve finding the features in two different frames and matching each feature in one frame to the most similar feature in the second. A number of statistical metrics can be used to measure the similarity of two features based on the values of their pixel neighborhoods. The sum of squared differences and the earth mover's distance are two such metrics that perform well [25]. It should be noted that feature matching is an asymmetric process: not all features in both images can be matched into pairs. Some features in the first image will be lost, some in the second will appear, and some, outliers, will be mismatched.

Instead of computing features for each frame and finding matching pairs, it is also possible to start with a given set of features and calculate the optical flow at each of these points. The optical flow is a (Δx, Δy) vector expressing the apparent motion at a point. Common optical flow estimation algorithms can be classified between block-matching methods [1] (which are computationally similar to feature matching algorithms) and differential methods such as the Lucas-Kanade algorithm and its more robust pyramidal implementation [2]. Knowing the displacement value, it is possible to compute the new position of every feature at each frame. Because in most cases features will be lost, for example by moving outside the image bounds,
and new features are bound to appear, it is necessary to update the feature list in parallel with the optical flow calculation. Hence, the image is processed in this way: optical flow is calculated for existing features and their positions are updated; features that could not be successfully tracked are removed from the list; new features are searched for in image areas where there are currently no features.

Regardless of the combination of feature detection and tracking algorithms used, the result is conceptually the same: a field of motion vectors, either in the format (x, y, Δx, Δy) or (x, y, s, Δx, Δy, Δs), where x and y denote position and s denotes scale.

3.3 The Flow Field
In its raw state, the motion vector field computed above is not usable for musical purposes. It will typically contain a certain number of outlier vectors, which will tend to produce jarring and unpredictable results when mapped to sound synthesis parameters. It is thus necessary to run a rather strict filter on the motion field to get rid of these outliers. This filter can be implemented in several different ways, including the median flow technique described by Smith et al., in which “each vector in turn is compared with its neighbours. If it points in a similar direction or is a similar, small, length when compared to the 'median flow' in that area, then it is classified as an inlier, otherwise it is discarded as an outlier.” [25]

While the motion flow field is expressed using Cartesian coordinates and deltas, for later mapping purposes it is useful at this stage to translate at least the displacement values to a polar coordinate system. This yields the following motion vector: (ρ, θ, s, Δρ, Δθ, Δs), or in hybrid form: (x, y, s, Δρ, Δθ, Δs). (Here also, the scale dimension is optional.)

It would be possible at this stage to perform further analysis on the motion field. 3D reconstruction algorithms would allow us to recover some form of depth information, either in the form of camera ego-motion or scene structure. More general algorithms can quantify certain types of macroscopic motion, such as contractions and expansions, as well as perform object segmentation. However, in this framework, this step is skipped in favor of using the vectors directly.

4. MAPPING
4.1 Time
Depending on the type of synthesis technique used, it may be necessary to process the motion vectors temporally. Using current hardware, frame rates of 30 Hz are typical for camera-based systems. More specialized cameras can image at up to 120 Hz, but processing these images in real time becomes problematic. For additive synthesis and similar generators, a control rate as low as 30 Hz may not always be a significant problem; however, the time quantization artifacts that result when motion vectors are used to generate sonic grains are quite noticeable and likely undesirable. The solution to this problem is to smooth out the vector field temporally by delaying each vector individually by some random value between 0 and the projected time until the next frame is processed.

4.2 Space
Since images are inherently spatial, the most natural and motivated mapping possible is that of vector position to sound position. As a matter of fact, the framework outlined in this paper is particularly apt at creating complex spatial trajectories.

The simplest type of spatial mapping is to assign the normalized x value of each vector to the stereo pan position of the sonic component it corresponds to. There is typically more freedom as to how the y axis can be interpreted: in a planar surround playback environment it can be mapped to the front-back axis, although in some setups it could also be assigned to the up-down axis.

It is also possible to generate positional vectors for the various audio spatialization methods available. The scale dimension (if it is calculated) can be mapped to the z or y axes, with larger features being mapped to closer positions. While this would correctly translate features becoming smaller and larger into sounds moving further and closer, it is a rather naïve mapping that can often lead to undesirable results: large features do not necessarily correspond to closer objects. In this case, one dimension must be assumed to be constant: the vectors are assumed to move along a plane, though in some cases this is not an accurate representation of the motion of the object being sensed. In some situations it should also be possible to extrapolate the z-axis displacement of motion vectors using 3D reconstruction algorithms.

One of the great advantages of using motion vectors for spatialization is that, since we know not only where a feature lies but also how fast and in what direction it is moving, it is also possible to use this information to control Doppler shift simulations.

4.3 Amplitude
An often convincing approach to controlling the amplitude parameters of synthesis components is to assign the length of the displacement vector (Δρ) to amplitude. As Δρ is directly related to motion velocity, this means that faster objects will sound louder. This relationship is somewhat metaphorically grounded: if the sound is thought to be generated through friction, then indeed faster gestures will produce louder sounds. Hence, the velocity-to-amplitude mapping is to an extent perceptually motivated.

Overall amplitude is also indirectly controlled via vector density. As has already been mentioned, motion flow over a given area exhibits smooth transitions. This means that areas with a high density of features will tend to produce several similar sound components which, adding up, result in greater overall amplitude.

Lastly, when scale is taken into consideration, it can make sense to use it to control amplitude, with larger features sounding louder. Note that since spatialization, beyond simple linear panning, also affects amplitude, it might not always be necessary to control amplitude directly.

4.4 Frequency and Timbre
The most difficult mapping to motivate is that of parameters that affect the pitch and timbre of the sound. That is not to say that such mappings must always be arbitrary, but they largely depend on the nature of the image used, the type of synthesis technique
employed and, most importantly, the intent of the composer or performer.

Even in situations where there is no spatialization, mapping vector displacement in a pseudo-Doppler fashion can often result in interesting sound textures. Here, frequency is a function of the displacement relative to the image origin.

In some cases, where the image is to be controlled by a musical performer, simply assigning a given axis value to pitch can be a convincing and easy-to-understand approach.

Other possible mappings for frequency include distance from the origin (ρ), displacement direction (similar conceptually to accordions and harmonicas) or displacement amplitude (related to the pseudo-Doppler approach).

Timbre control in this framework is achieved by the superposition of a great number of sound components and by altering the pitch and amplitude of these components. It is also possible to affect the timbre through the number of features present in the image. This can be done either by changing the input image so that it is less complex or by changing the threshold of the feature detector. A greater number of features directly translates to a greater number of synthesis components.

When using granular processing of recorded sound, it is also possible to control the timbre of the resulting sound by assigning vector position to sound file position. For example, a vector moving from the left edge of the image to the right edge might trigger a sound to be played back from start to end (or vice versa).

5. AESTHETIC ISSUES
The framework presented here is meant to be general in nature and adaptable to many different situations. As a performance tool, it offers a natural method of controlling sound clouds and dense textures. Two usage examples highlight an important aesthetic aspect, that of control gestalt. This control gestalt acts as a binding agent between perceptual groups, or clusters, in

three voices to the music corresponding to each dancer, they were never explicitly identified by the system. The polyphonic aspect of the music was a direct translation of the “polyphonic” nature of the action on stage.

To a limited extent, this form of control gestalt, where global control structures implicitly result in similar global output, was already present in early zone-based systems like the VNS. However, the greater amount of information contained in motion vector fields, coupled with microsonic sound generation, means that these relationships occur at a much finer degree.

The performance scenarios outlined above use image analysis in a traditional fashion. Some more exotic approaches include the use of pre-recorded video as a composition tool. Translating visual structures and movements to musical forms can be a very efficient and rewarding method of generating musical material that can be further edited or processed as part of a composition. The motion flow field lends itself especially well to the generation of dense, micro-polyphonic scores.

Returning to the realm of performance, the framework has also been used in conjunction with video content generated in real time by a VJ, in order to have the visuals linked to part of the music. The possibility of robustly coping with a vast range of possible input structure is a great asset in this scenario.

In a somewhat less musical vein, it is also possible to use systems based on motion flow fields to perform automated foley tasks. While some research has been done in this direction in the past [17], it assumed that the motion of objects in the scene was already known. With some adjustments and proper sound generation algorithms, it is possible to create convincing sound effects, especially considering the spatial gestalts outlined above.

6. IMPLEMENTATION
While the general concepts of how the framework can be implemented are presented in earlier sections, the current system implementation will be described in greater detail.
both the source image and its sonified form [28].
Recent portable computers often come equipped with a camera Despite its usefulness, the computation of the motion flow field
mounted somewhere above the screen. With this camera, we can remains somewhat intensive, limiting both the maximum frame
control the parameters of a sound mass generated through rate, minimum latency, image size and CPU cycles left for
additive synthesis. If frequency is a function of the motion sound generation. This is especially problematic since the sound
vector's position, then head movements towards and away the synthesis algorithms used tend to also be rather taxing. In the
screen will result in sonic expansion and contraction, as each of earliest implementation, the solution was to use two computers
the components' frequencies more towards and away from each with one dedicated to image analysis and the other to sound
other. The image features are also contracting and expanding generation. This solution worked well but it is bulky and costly.
away from each other. However, we do not need to actually In recent years, much attention has been directed towards
measure this change. By virtue of direct sonification, the global general processing on graphical processors (GPGPU) [13].
characteristics of the motion flow field are expressed in the Already a number of libraries, such as OpenVIDIA [6],
sound output. implement some computer vision task on the GPU, freeing the
CPU for other tasks and sometimes yielding improvement in
The first use of a system based on this framework by the author performance of an order of magnitude [23].
occurred in January 2007 for an improvised dance performance
held at the Hy go Performing Arts Center. The sounds were The system is currently implemented as an external object for
generated by granular processing of sound files, with each Cycling'74's Jitter system. Standard Jitter functionality is used
grain's spacial location mapped so that it would sound to the for image input but all further processing is carried out
audience as though it was coming from where the dancer was. If internally. While this most recent implementation of the
there were two dancers standing at opposing sides of the stage, framework uses the GPU to perform the image analysis, it is
two different sound clusters could be heard in those positions. independent of existing software libraries. When GPU
When a third dancer raced across the stage, yet another sound processing is available, features are identified using the Shi-
followed him. However, while it sounded as though there were Tomasi method and matching is performed using the sum of
squared differences. If the computation cannot be performed on the GPU, it reverts to the previous CPU-based algorithm, where features are selected using the FAST method and are then tracked using pyramidal Lucas-Kanade optical flow estimation. It should be noted that since different feature detection and tracking algorithms are used, the vector fields generated by the GPU and CPU implementations will differ. In practice, however, they will display similar characteristics that will result in very similar sound output.

After the motion flow field has been processed to remove the noise and make adjustments to its coordinates, it is sent to the sound synthesizer via OSC [29]. OSC is used to decouple the analysis module from the synthesis module, which is meant to be implemented by the user. Temporal smoothing through random delay can be electively performed prior to output.

7. CONCLUSION
Motion flow fields are not a perfect method of controlling musical parameters. As outlined above, the temporal resolution is comparatively poor. The biggest flaw is probably that feature detection and tracking algorithms are not perfectly robust. When used as an instrument, it is often very difficult to finely control individual components, as one cannot know with certainty where precisely features will be identified in real-world situations. However, motion flow fields are better suited to the control of dense masses of sound, which in practice alleviates the problem. The approach's main merits lie in its generality, the possibility of using natural structures as a source of sonic complexity and the control gestalts outlined above.

8. REFERENCES
[1] S. S. Beauchemin and J. L. Barron, "The Computation of Optical Flow," ACM Computing Surveys, vol. 27, no. 3, pp. 433-466, 1995.
[2] J.-Y. Bouguet, "Pyramidal implementation of the Lucas-Kanade feature tracker," Intel Corporation Microprocessor Research Labs, 1999.
[3] A. Camurri, S. Hashimoto, M. Ricchetti, A. Ricci, K. Suzuki, R. Trocca, and G. Volpe, "EyesWeb: Toward Gesture and Affect Recognition in Interactive Dance and Music Systems," Computer Music Journal, vol. 24, no. 1, pp. 57-69, Apr. 2000.
[4] M. Cardle, S. Brooks, Z. Bar-Joseph, and P. Robinson, "Sound-by-numbers: motion-driven sound synthesis," in Proceedings of the 2003 ACM Siggraph/Eurographics Symposium on Computer Animation, pp. 349-356, 2003.
[5] R. Fischer, K. Dawson-Howe, A. Fitzgibbon, C. Robertson, and E. Trucco, Dictionary of Computer Vision and Image Processing, New York: Wiley, 2005.
[6] J. Fung and S. Mann, "OpenVIDIA: parallel GPU computer vision," in Proceedings of the 13th Annual ACM International Conference on Multimedia, pp. 849-852, 2005.
[7] M. Funk, K. Kuwabara, and M. J. Lyons, "Sonification of facial actions for musical expression," in Proceedings of the 2005 Conference on New Interfaces for Musical Expression, pp. 127-131, 2005.
[8] C. G. Harris, "Determination of ego-motion from matched points," in Proceedings of the 3rd Alvey Vision Conference, pp. 189-192, 1987.
[9] A. Hunt, M. Wanderley, and R. Kirk, "Towards a Model for Instrumental Mapping in Expert Musical Interaction," in Proceedings of the 2000 International Computer Music Conference, pp. 209-212, 2000.
[10] J. Jakovich and K. Beilharz, "ParticleTecture: interactive granular soundspaces for architectural design," in Proceedings of the 2007 International Conference on New Interfaces for Musical Expression, pp. 185-190, 2007.
[11] A. Kapur, G. Tzanetakis, N. Virji-Babul, G. Wang, and P. Cook, "A Framework for Sonification of Vicon Motion Capture Data," in Proceedings of the 8th International Conference on Digital Audio Effects, 2005.
[12] E. Klein and O. Staadt, "Sonification of Three-Dimensional Vector Fields," in Proceedings of the SCS High Performance Computing Symposium, pp. 115-121, 2004.
[13] D. Luebke, M. Harris, N. Govindaraju, A. Lefohn, M. Houston, J. Owens, M. Segal, M. Papakipos, and I. Buck, "GPGPU: general-purpose computation on graphics hardware," in Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, p. 208, 2006.
[14] K. Mikolajczyk and C. Schmid, "Scale & Affine Invariant Interest Point Detectors," International Journal of Computer Vision, vol. 60, no. 1, pp. 63-86, Oct. 2004.
[15] F. Mokhtarian and F. Mohanna, "Performance evaluation of corner detectors using consistency and accuracy measures," Computer Vision and Image Understanding, vol. 102, no. 1, pp. 81-94, Apr. 2006.
[16] N. Moody, N. Fells, and N. Bailey, "Ashitaka: an audiovisual instrument," in Proceedings of the 2007 International Conference on New Interfaces for Musical Expression, pp. 148-153, 2007.
[17] M. Nayak, S. H. Srinivasan, and M. S. Kankanhalli, "Music synthesis for home videos: an analogy based approach," in Proceedings of the IEEE Pacific-Rim Conference on Multimedia, pp. 1556-1560, 2003.
[18] M. Ojanen, J. Suominen, T. Kallio, and K. Lassfolk, "Design principles and user interfaces of Erkki Kurenniemi's electronic musical instruments of the 1960's and 1970's," in Proceedings of the 2007 International Conference on New Interfaces for Musical Expression, pp. 88-93, 2007.
[19] C. Roads, Microsound, Cambridge, Mass., USA: MIT Press, 2001.
[20] D. Rokeby, "Very Nervous System," Nov. 2000. [Online]. Available: http://homepage.mac.com/davidrokeby/vns.html [Accessed Apr. 15, 2008].
[21] E. Rosten and T. Drummond, "Machine learning for high-speed corner detection," in Proceedings of the 9th European Conference on Computer Vision, pp. 430-443, 2006.
[22] J. Shi and C. Tomasi, "Good Features to Track," in Proceedings of the 1994 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 593-600, 1994.
[23] S. Sinha, J.-M. Frahm, M. Pollefeys, and Y. Genc, "GPU-based video feature tracking and matching," presented at the Workshop on Edge Computing Using New Commodity Architectures, Chapel Hill, North Carolina, USA, May 2006.
[24] P. Smith, D. Sinclair, R. Cipolla, and K. Wood, "Effective Corner Matching," in Proceedings of the 9th British Machine Vision Conference, pp. 545-556, 1998.
[25] S. M. Smith and J. M. Brady, "SUSAN—A New Approach to Low Level Image Processing," International Journal of Computer Vision, vol. 23, no. 1, pp. 45-78, May 1997.
[26] S. Soto-Faraco and A. Kingstone, "Multisensory Integration of Dynamic Information," in The Handbook of Multisensory Processes, G. A. Calvert, C. Spence, and B. E. Stein, Eds. Cambridge, Mass., USA: MIT Press, pp. 49-68, 2004.
[27] B. Truax, "Real-Time Granular Synthesis with a Digital Signal Processor," Computer Music Journal, vol. 12, no. 2, pp. 14-26, Summer 1988.
[28] S. Williams, "Perceptual Principles in Sound Grouping," in Auditory Display: Sonification, Audification and Auditory Interfaces, G. Kramer, Ed. Santa Fe Institute Studies in the Sciences of Complexity, Proc. Vol. XVIII, Reading, MA: Addison-Wesley, pp. 95-125, 1994.
[29] M. Wright and A. Freed, "Open Sound Control: A New Protocol for Communicating with Sound Synthesizers," in Proceedings of the 1997 International Computer Music Conference, pp. 101-104, 1997.
[30] W. Yeo and J. Berger, "Application of Image Sonification Methods to Music," in Proceedings of the 2005 International Computer Music Conference, 2005.
certain importance, which may not be fully realised in current processes of sound generation.

2. P[A]RA[PRA]XIS: A SEMI[ER]OTIC MACHINE
P[a]ra[pra]xis provides a platform for the performer (or musician, or writer) to sculpt a personally meaningful system of linguistic substitution within a self-created text. Although the P[a]ra[pra]xis Suite software is applicable to any project involving the sonification of data gathered from lingual substitutions, it was created with a particular direction in mind. The term 'Parapraxis' emerged as an English translation for what Freud termed die Fehlleistung, literally, 'faulty action', used to describe the unintentional miscommunication occurring during even the most banal of daily human interactions [7]. It encompasses the range of mistaken perceptions, actions or speech which occur when the subconscious and the conscious mind, as is generally the case, are working to non-aligned agendas, and is commonly known as the Freudian slip, where you may 'say one thing but mean your mother'. Needless to say, its motives are often classed as sexual.

The unique combinations of words and concepts which parapraxis creates also lend an additional flexibility to grammatical norms. Whereas Freud's 'parapraxis' is either a singular instance or a genre-descriptor of such an error and constitutes that which is a kind of 'sub-normal activity' in relation to the business of perception and communication, our version, P[a]ra[pra]xis, conflates the nuance of 'para' meaning 'beyond', or 'outside of', with the academic notion of 'praxis' as theory put into action: thus it comes to describe an entire way of creatively exploring language and music through the building of user-initiated dictionaries based on free association and metonymic slippage [8].

In the early 1900s, the Swiss linguist Ferdinand de Saussure was responsible for the development of a linguistic apparatus which re-defined the focus of the relationship between words and the ways in which meanings become attached to them. Saussure claimed the linguistic sign as "a two-sided psychological entity", consisting only of "a concept which exists in equilibrial relationship with a sound pattern" [9].

native tongue) has nothing to do with either the image it conjures up, or the physical reality of a tree. This idea, that sign and signified have no innate connection, has played out in many different guises over the course of the last hundred odd years, beginning with early modernism, and culminating in multiple instances of user-created semiotic systems, where any sign may be attached to any signifier, as long as the relationship is pre-determined. In the paper previously mentioned, Magnusson sees that "actors and the contexts in which they function are all elements in a semiotic language…We provide a semiotics or suggest language games where the behaviour of an actor maps onto some parameters in a sound engine. For example, vertical location of an actor could signify the pitch of a tone or playback rate of a sample" [5].

In taking on Saussure's notion that 'the link between signal and signification is arbitrary', many conceptual versions of semiotic systems fail to take a key factor into account: much of the power of language arises precisely because of the false innate meaning we ascribe to individual words. P[a]ra[pra]xis aims to utilise this power by involving the performer/user in a tension between emotional or psychic resonances which may be attached to particular word significations and the implementation of a rule-set which can make what may at first appear to be extremely radical changes to the associations between words as we generally use them.

This returns us to Freud's investigation of the hidden associations lurking in every Parapraxis; P[a]ra[pra]xis works to open up these associations in several ways. Firstly, a user involved in entering or modifying words for the dictionary file is free to explore their own mental links between sounds, text and ideas. When dealing with the word 'box', one man's 'bo[ra]x' may be another man's 'b[ot]ox'. When playing P[a]ra[pra]xis in real time, users will be forced to respond to lingual substitutions determined by a dynamic, but grammatically oriented, rule-set. A player writing a poem or story will be subjected to a continually altering narrative, and will thus involuntarily form new chains of signification, by either engaging or refusing to engage with the material presented.
word has at least one substitution that meets the conditions of a rule. For example, if the rule stipulates that nouns can only be replaced with other nouns, and the typed word is a noun but none of its possible replacements are nouns, no substitution is made. Figure 2 shows how a set of possible substitutions is filtered into a set of legal substitutions.

properties. These are appended sequentially to the list broadcast on the /knownWord and /replacement address patterns.
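The rule-based filtering just described can be sketched as follows. This is a minimal illustration and not the actual P[a]ra[pra]xis code; the `DICTIONARY` and `POS` tables and the `legal_substitutions` helper are hypothetical stand-ins for the user-built dictionary file and rule-set the paper describes.

```python
# Hypothetical sketch of rule-based substitution filtering: a rule
# permits a substitution only when the replacement's part of speech
# matches that of the typed word.

# Toy dictionary: word -> (part of speech, candidate substitutions).
DICTIONARY = {
    "box":  ("noun", ["bo[ra]x", "b[ot]ox", "boxing"]),
    "sing": ("verb", ["s[t]ing", "singe"]),
}

# Toy part-of-speech lookup for the candidate replacements.
POS = {
    "bo[ra]x": "noun",
    "b[ot]ox": "noun",
    "boxing":  "verb",
    "s[t]ing": "verb",
    "singe":   "verb",
}

def legal_substitutions(word):
    """Filter the possible substitutions down to the legal ones.
    Returns an empty list (i.e. no substitution is made) when no
    candidate matches, as the rule in the text stipulates."""
    if word not in DICTIONARY:
        return []
    pos, candidates = DICTIONARY[word]
    return [c for c in candidates if POS.get(c) == pos]
```

With the nouns-for-nouns rule in force, `legal_substitutions("box")` keeps only the noun candidates; a real implementation would then broadcast the typed word and its chosen replacement, for instance on the /knownWord and /replacement address patterns mentioned above.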
'hearer' and from the utilitarian meanings we ascribe to words for the sake of shared communication to the metonymic resonances (often unwelcome) which are engendered in the unconscious mind.

Figure 4. Screen shots of the IM performance. Each person sees the original text they type whilst only seeing the altered version of the other person's text.

Because the text is re-written in real time on the other person's screen (typically animated at around 25 msec per character), the performance develops its own pace. Also, a visual counterpoint develops between the two screens, as the square brackets make substituted sections appear especially dense.

The music is generated by interpreting a number of performance artefacts. Based on a set of endless glissandi [Risset], their relative base frequencies and speed are continually modified as a counterpoint to the tension in the dialogue. Specific factors controlling musical parameters are: average time between keystrokes; sentence length; phrase length (how much a person types before pressing the 'send' button); and type of substitution. Interpreting the type of substitution is especially powerful. Whilst most of the substitutions are midrashes and use square brackets, phonetic substitutions and anagrams provide visual relief as well as prompting a different kind of intellectual reaction from an audience. The music-generating algorithm uses these to structure the relationships between glissandi in a fugal counterpoint, and to signal the start of a new invocation of the cantus firmus, or principal melodic line.

As the performer/musician/writer has complete control not only over the possible substitutions created for dictionary words, but also over the framework in which to define their relationships, it is very easy to generate audio output which maps the emotionality of the piece through changes in the text.

5. CONCLUSION
The development of this P[a]ra[pra]xis software suite marks a milestone in a continually evolving and expanding project. Starting from the simple shared idea of a basic real-time interactive poetry generator, we have been drawn to grammar, linguistics, psychoanalytical theory and serial, electronic composition as tools to investigate the human relationship to language.

P[a]ra[pra]xis marks a collaboration between two authors from divergent backgrounds within the Creative Arts field: Poetry and Sonic Arts. In order to make P[a]ra[pra]xis a genuine collaboration, not just an outsourcing of difficult specialist tasks, we have had to adjust and develop our perceptions of our own and each other's language, just as those who play P[a]ra[pra]xis will. Hopefully others will find this as beneficial as we have.

6. REFERENCES
[1] Judge, A. <http://www.laetusinpraesens.org/docs00s/convert.php> (draft, 2007).
[2] Worral, D., Bylstra, M., Barrass, S., and Dean, R. SoniPy: The Design of an Extendable Software Framework for Sonification Research and Auditory Display. Proc. ICAD 2007, Montreal, Canada.
[3] NLTK <http://nltk.sourceforge.net>
[4] WordNet <http://wordnet.princeton.edu>
[5] Magnusson, T. Screen-Based Musical Interfaces as Semiotic Machines. Proc. NIME 2006, Paris, France.
[6] See, for example, the work of Jenkins, G.S. at <http://www.1-4inch.com/archive05.html>
[7] Freud, S. 'The Psychopathology of Everyday Life' in The Standard Edition of the Complete Psychological Works of Sigmund Freud, Vol. VI (London: The Hogarth Press and the Institute of Psychoanalysis, 1966).
[8] Detailed discussion on the role played by metonymic slippage in the functioning of the unconscious can be found in: Lacan, J. The Four Fundamental Concepts of Psycho-analysis, ed. Jacques-Alain Miller, trans. Alan Sheridan (London: Hogarth Press, 1973).
[9] de Saussure, F. Course in General Linguistics, ed. Charles Bally and Albert Sechehaye with Albert Riedlinger, trans. Roy Harris (London: Duckworth, 1983), 67.
[10] Dubrau, J. and Havryliv, M. P[a]ra[pra]xis. In Proc. ACMC, June 2007.
with listening on at least two levels (natural sound and composed electronic music) and to engender from this a meaningful musical experience. This setting is representative of a number of everyday musical and sonic experiences that form part of our contact with mobile technologies. Ubiquitous or pervasive computing is a growing trend through the closer integration of an increasing number of technologies into our personal communication devices. Tapping into this potential for musical creation and acquiring knowledge and experience in dealing with these issues is one of the main motivations of this project.

The following projects serve as examples of other location-based concepts in the urban field. Both Akitsugu Maebayashi's Sonic Interface [7] and Lalya Gaye's Sonic City [8] use urban sounds and user interaction to create a mobile personal soundscape. Marc Shepard's Tactical Sound Garden [9] is about planting sounds in an urban context, and it locates the user through triangulation of known wireless hotspots. The projects Mediascape by HP Labs [10] and net_dérive by Atau Tanaka for Sony CSL [11] merge different types of media content and location technologies to create an urban and social interaction.

3. COMPOSITION
The first task in our composition process was to define how the landscape should be subdivided. Eight routes were devised, each representing an essential aspect of the area: the two town centers (Davos is actually split in two); the lake-side promenade; the town's park; the famous two-kilometer-long hill-side promenade; the walk downriver to the secluded forest cemetery; the high pastures and woods on the slopes above the Schatzalp sanatorium; and finally the alpine hiking trails high up towards the Weissfluhjoch. Each of these routes was treated differently, the sonic structure or spatial placement of the music governed by a different principle. The longitudinal topography of the hillside promenade, for example, engendered sequential musical segments that connect differently according to the point of entry and the direction along which one walks on the promenade.

Figure 1. The lakeside promenade and its corresponding sound zones (image from Google Earth)

The circling of the lake by its promenade led to overlapping zones, some of which functioned like musical beacons across the water. The common structure that emerged and became the guiding principle for all areas was the use of circular zones centered on a point of interest in the landscape and extending spheres of influence of varying size. We collected the GPS coordinates for each spot and assigned them their music, a playback mode, a radial amplitude envelope which controls the cross fade between zones or the increase in volume towards the point of interest, and finally the size of the sphere. The eight routes were mapped out with a total of 86 points of interest, each covering a large area and overlapping with one or several of the neighboring zones. Google Earth became an invaluable tool for planning and visualizing the spatial relationships of the sound zones and routes (Fig. 1).

While approaching the task of producing the actual music for the davos soundscape, several additional strategies emerged. Since one of the premises was to present a transparent sound overlaying natural and composed elements through open headphones, it was quite natural to think of the effects of an augmented acoustic reality achieved by using field recordings or recordings of natural sounds as well as their polar opposites, the purely synthetic sounds. It soon became apparent that the strict separation of the two would be difficult and not very desirable if one wants to maintain the sonic unity of the piece. The music contains brief sequences of field recordings made on site, sometimes deliberately displaced, for example where the cowbells from the alpine pastures make their appearance on the busy main thoroughfare or when the sound of the wash of the waves on the lakeshore reappears in the middle of the mountain woods. Since most of the time no predetermined chronology is possible in the arrangement of sounds, the music rarely establishes a linear evolution. Most zones have several possible neighborhood relationships; the music can overlap and occur in a number of combinations, all depending on the itineraries chosen by the visitors. It was our principal intention to generate an indeterminate field of acoustic possibilities that had to be explored and experienced in an individual way.

4. TECHNOLOGY
At present mobile devices equipped with GPS are becoming very common, but they were less accessible when we evaluated possible solutions for our GPS-enabled music platform. Based on the options available at the time, the choice was made to use a semi-industrial platform running Linux.

4.1 Hardware
The prerequisites for the mobile device were guided by a number of personal choices. We wanted to be able to write custom code without having to develop the entire software from the ground up. We wanted a device that gives access to all low-level routines of the firmware or OS in order to set up daemons for automatic upkeep of the devices for extended periods of time. Coming from a background in electronic music, we were interested in using data-flow software for composition. The device needed to contain or easily connect to a GPS receiver and make the position data available on a software-accessible interface. It had to have a solid-state medium on board that would store several hours of uncompressed stereo PCM audio and an analog audio output which could be controlled from software. The device needed to be able to run for about eight hours on one battery charge. Most importantly, we wanted to avoid having to build any custom electronic components ourselves.

4.1.1 Choice of Platform
After evaluating all available options, ranging from commercially available GPS-equipped PDAs to open source hardware, we finally decided to use the Gumstix platform [12]. It fulfills many
of the prerequisites by offering a selection of expansion boards in addition to its ARM-based motherboards. The three determining factors were that the Gumstix run a Linux OS, that they offer an expansion board that hosts both a GPS receiver and an audio I/O chip and, as we were excited to learn, that a port of Pure Data called PDa (Pure Data anywhere) [13] is available for the ARM processors used on many embedded single-board devices. Finally, several expansion modules exist for the Gumstix that offer Compact Flash or MMC interfaces to connect large solid-state storage devices.

4.1.2 Device Assembly
In addition to the embedded computer with its daughter boards, three other components are necessary to make the device complete: the battery, the GPS antenna and the headphones. Unsure about the actual power consumption, we opted for a large single-cell 3300 mAh Lithium Polymer battery. This is the same technology found in mobile phones and laptops. Preliminary tests had shown that the device consumed roughly 250 to 350 milliamps, so in theory a full charge should run for a full eight hours. Active GPS antennas are readily available and have a form factor ideally suited for mounting on top of headphones. The entire device was assembled in a standard electronics shielding metal case and packed into a soft case for protection and user-friendlier packaging (see Fig. 3).

keyboard. All operations are executed from the host computer on the command line through a secure shell.

4.2.2 PDa
An important step for our project was to port PDa to the Gumstix. Featuring a limited set of functions, PDa still contains all the essential tools for audio on such a system. Originally ported to run on an iPaq PDA, this downscaled version of Pure Data has been successfully applied to Apple iPods, Linksys routers and a variety of portable devices running Linux [14]. Since the ARM-type processor doesn't feature a dedicated Floating Point Unit, and software processing of IEEE-754 32-bit floating point numbers is extremely slow, PDa has been rewritten to run all DSP code in 16-bit fixed point numbers. This makes extending the audio capabilities difficult; for that reason, for example, it is not possible to play compressed audio in the Ogg/Vorbis or mp3 formats. Apart from this limitation, normal patches can be written, system access is given through the shell external and access to serial ports is possible after porting Pure Data's comport object. The most essential feature of PDa for our application is the ability to extend its functionalities by writing dedicated objects in C.

4.2.3 Custom C Code
Because of the limited set of objects and for the sake of efficiency, PDa is used as a kind of framework within which to run our own code. The first task is to obtain the coordinates from the GPS receiver. Thankfully, the data from the module used by the Gumstix is made accessible on a standard serial port. This stream of data is parsed for the standard NMEA GPS reports to obtain a new set of coordinates every second [15]. At startup a database file is loaded into a simple data structure which contains the map with the coordinates of all the points of interest and their associated sound files and further information about global scaling factors, reference points and envelope tables. With each new GPS coordinate, this internal map is evaluated and the appropriate commands are generated to control a very simple patch which consists of four sound file players and a mixer.
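The parse-and-evaluate loop just described can be sketched in outline. The authors implemented this in C as dedicated PDa objects; the Python below is an illustrative reconstruction only, and the `parse_gga` and `zone_amplitude` helpers, the equirectangular distance approximation and the linear fade are assumptions standing in for the piece's actual envelope tables.

```python
import math

def parse_gga(sentence):
    """Parse an NMEA $GPGGA report into (lat, lon) decimal degrees.
    NMEA encodes latitude as ddmm.mmmm and longitude as dddmm.mmmm."""
    f = sentence.split(",")
    lat = int(f[2][:2]) + float(f[2][2:]) / 60.0
    if f[3] == "S":
        lat = -lat
    lon = int(f[4][:3]) + float(f[4][3:]) / 60.0
    if f[5] == "W":
        lon = -lon
    return lat, lon

def zone_amplitude(pos, centre, radius_m):
    """Radial amplitude envelope for one circular zone: full volume at
    the point of interest, fading linearly to silence at the zone edge
    (the linear fade is an assumed placeholder for an envelope table)."""
    # Equirectangular approximation; adequate for zones a few hundred
    # metres across.
    r_earth = 6371000.0
    dlat = math.radians(pos[0] - centre[0])
    dlon = math.radians(pos[1] - centre[1]) * math.cos(math.radians(centre[0]))
    d = r_earth * math.hypot(dlat, dlon)
    return max(0.0, 1.0 - d / radius_m)
```

In the actual system, evaluating every zone against the newest fix in this way would yield the amplitude commands sent to the mixer feeding the four sound file players.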
When moving outdoors with the device there is a brief waiting period, since the GPS receiver needs to locate and identify the satellites, download the almanac and obtain a stable position fix. A printed map of the landscape, including point descriptions and information, is handed out together with the GPS device. The eight routes that make up the davos soundscape are clearly marked. In the real landscape, to facilitate orientation but more importantly to leave a physical mark, stakes painted bright orange and bearing the logo of the davos soundscape are planted at all 86 points of interest.

Feedback from members of the audience clearly indicates that a memorable sonic experience was presented. Of course not all of the music can be heard in one day, and sometimes the participants have difficulty orienting themselves within the multitude of elements present in the acoustic domain. Often the terms “treasure hunt” and “exploring new territories” are mentioned. The intention to enhance Davos’ sonic reality by overlaying the natural acoustic environment with electronic sounds is not always recognized. This might be largely due to the fact that we have all been conditioned to filter out external sounds when wearing headphones. Depending on the weather, the individual experiences can also vary. Satellite signals are disturbed by certain atmospheric conditions; some people reported problems during thunderstorms and were clearly apprehensive about walking around under such conditions wearing an antenna on their head!

Figure 3. One of 86 markers in the landscape and a visitor with the GPS device and headphones.

6. CONCLUSION AND OUTLOOK
Davos soundscape taught us some valuable lessons. Complexity emerged as the constantly challenging factor. It appeared on a technical level during the first phase, when a lot of elements had to be assembled and problems sorted out before even raw sketches of the planned features could be made and evaluated. Once the prototype for the device was functioning, the challenge of imagining, structuring and composing music for a landscape arose. As musicians we are clearly not trained to think in loose aural or temporal relationships, and we need to learn how to deal with a real topographical space as the stage for our music. The final and most valuable lesson learned was never to underestimate the demands that a series of experimental devices makes in order to run for an extended period of time without any attendance. This being said, it still seems an intriguing concept to be able to take a device capable of real-time interaction with intelligent electronic music generation out into nature and to witness a musical expression and spatial sonic experience which would not be possible any other way. Due to the complexity of all elements involved, the composition and topographical principles applied to the music in the davos soundscape had to remain quite simple. The platform’s computing power and the flexibility of the software offer a much greater creative potential that remains to be explored. Generative, algorithmic music and a closer integration of the user through sensor technology are only some of the ideas that come to mind.

For future iterations of the piece, the software will be ported to one of the new commercially available GPS-enabled devices running Linux, such as the N810 Internet tablet by Nokia. With these devices the hardware constraints are resolved, and since the software has already received its validation, location-based interactive music experiences can now be imagined in many other forms.

7. ACKNOWLEDGEMENTS
I'd like to thank Marcus Maeder for his partnership in this creative endeavor, Alejandro Duque for his expertise in all things Linux, and the organizers and sponsors of the Davos Festival 2007 for making this project possible.

8. REFERENCES
[1] http://www.davosoundscape.ch
[2] Galloway, A.; Ward, M. Locative Media as Socialising and Spatialising Practices: Learning from Archaeology. Leonardo Electronic Almanac, Vol. 14, Issue 3/4, 2006.
[3] http://library.nothingness.org/articles/all/all/display/314
[4] Deleuze, G.; Guattari, F. Mille Plateaux. Minuit, coll. « Critique », Paris, 1980, 645 p.
[5] Eco, U. Opera aperta. Forma e indeterminazione nelle poetiche contemporanee. Bompiani, 1962.
[6] Leman, M. Embodied Music Cognition and Mediation Theory. The MIT Press, 2008, pp. 52-53. ISBN 978-0-262-12293-1.
[7] http://www2.gol.com/users/m8/installation.html
[8] Gaye, L.; Mazé, R.; Holmquist, L. E. Sonic City: The Urban Environment as a Musical Interface. NIME 2003, Montreal, Canada, May 2003.
[9] http://www.tacticalsoundgarden.net
[10] http://www.hpl.hp.com/mediascapes
[11] http://www.csl.sony.fr/items/2006/net_derive
[12] http://www.gumstix.com/
[13] Geiger, G. PDa: Real Time Signal Processing and Sound Generation on Handheld Devices. Proceedings of the International Computer Music Conference (ICMC'03), Singapore, Sept. 29 - Oct. 4, 2003.
[14] http://gige.xdv.org/pda/
[15] http://gpsd.berlios.de/NMEA.txt
All URLs were accessed and verified in April 2008.
Posters
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

To promote wider use of OSC, we initially targeted a physically small, readily available, extremely low-cost (USD $25) hardware platform, the PIC18F2455-based “bitwacker”.

1 A database of OSC implementations and their features is online: http://opensoundcontrol.org/implementations.
The uOSC project source code, new developments, benchmarks and details beyond the scope of this paper are documented online at http://cnmat.berkeley.edu/research/uosc.

2. HARDWARE PLATFORM

2.1 Microchip PIC USB Full-Speed
uOSC runs on the popular and compact Microchip PIC18F USB Full-Speed family of microcontrollers. The product line spans chips with 20-80 pins, 10+ analog inputs, hardware modules for TTL, PWM, etc., 2-4 KB of RAM, 8-128 KB of ROM, and CPU speeds of 12 MIPS. Many prototyping boards for these devices are available for less than USD $100. The initial release of uOSC specifically supports the Sparkfun Bitwacker, the CREATE USB Interface (CUI) [4], and the Olimex PIC-USB-455x (pictured in Figure 1, ordered bottom-to-top). Microchip provides a free C compiler (C18), an implementation of the C standard library and a comprehensive IDE.

…point that is irrelevant to the affordable applications we have in mind.

3. FIRMWARE OVERVIEW
uOSC builds on the MCHPFSUSB firmware [13], an open-source implementation of the USB control endpoint and a USB class-compliant serial port. The uOSC core program is triggered by activity on the USB interface: receipt of the USB start-of-frame (SOF) packet from the host controller serves as an isochronous 1000 Hz timing beacon to which the firmware operations are synchronized.

3.1 Device Clock
The current time, relative to device initialization, is tracked with a precision of 1 msec. The clock is incremented by the SOF interrupt. Because this signal comes from the host controller, the clock is not subject to any thermal drift or resonator imprecision caused by the hardware. The clock is used for bundle timestamping and scheduling.
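The device clock of Section 3.1 can be sketched as follows. This is our own portable illustration (the PIC interrupt wiring is omitted): a hypothetical 1 kHz SOF handler increments a millisecond counter, from which an NTP-style 32.32 fixed-point timestamp can be derived exactly, so the fraction returns to zero every 1000 ticks with no accumulating roundoff.

```c
#include <stdint.h>

/* Millisecond clock, incremented by the (hypothetical) SOF interrupt. */
static volatile uint32_t g_ms_since_boot = 0;

/* Called once per USB start-of-frame (1000 Hz). */
void sof_interrupt_handler(void)
{
    g_ms_since_boot++;
}

/* Split the millisecond clock into NTP-format seconds and 32-bit fraction.
   The fraction (ms % 1000) * 2^32 / 1000 is computed in 64-bit arithmetic
   to avoid overflow; it is exactly zero whenever ms is a multiple of 1000. */
void osc_timestamp(uint32_t ms, uint32_t *seconds, uint32_t *fraction)
{
    *seconds  = ms / 1000u;
    *fraction = (uint32_t)(((uint64_t)(ms % 1000u) << 32) / 1000u);
}
```

On the 8-bit target the same split would be done with the narrower integer arithmetic described later in Section 6.1; the point here is only the exactness of the fixed-point conversion.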
SLIP is the recommended framing method for OSC encoding over stream-oriented transports such as TCP, and has already been used for this purpose in the popular Make Controller Kit by Making Things [http://makingthings.com].

4. ULTRA-LIGHT OSC PROGRAMMING
The small memory model, limited type support, and low clock rate of the microcontroller impose challenging limitations on the implementation of an OSC library that is both full-featured and easy to create and understand.

4.1.1 OSC as Binary Data Structure
OSC implementations typically translate the OSC binary message structure to and from an appropriately typed data structure in the native format of the language, along with encoding metadata. With only a few thousand bytes of memory to work with, uOSC cannot accommodate this style, so the programmer works directly with C pointers to a statically allocated buffer. Only one incoming message and one outgoing message are processed at a time. This style was anticipated in the OSC specification with the mod-4 byte-alignment rule and conservative native type support.

4.1.2 Open-Ended Bundles
An important feature of an OSC bundle is that the total length of the frame is not encoded in the bundle header. This allows uOSC to format bundles with multiple messages while retaining only a single outgoing message in memory. In addition, the number of responses generated by an OSC pattern dispatch does not need to be known in advance.

4.1.3 Type Considerations
The PIC18 is an 8-bit processor, so for efficiency the use of 8-bit and 16-bit numbers is preferred. OSC uses minimum 32-bit numbers, so uOSC provides efficient routines to pack 8-bit and 16-bit numbers. uOSC also provides routines to pack low-bit-depth integers as normalized floating-point fractions, and to pack automatically padded strings from ROM or RAM data. uOSC packs boolean data using the ‘T’ and ‘F’ typetags, which do not consume any space in the data section of an OSC message.

4.1.4 Push-down of the SLIP Encoder
The SLIP reserved characters have the two highest bits set (ASCII characters >= 192). The bulk of the output data stream does not require SLIP encoding. For example, the SLIP encoder can remain inactive for OSC address patterns that are printable ASCII, bundle sub-message lengths, NULL-padding bytes, and other bytes known to be strictly less than 192.

4.1.5 Input Decoding State Machine
The SLIP decoder must be active at all times. To avoid the need to reexamine input bytes, the OSC parser is embedded inside the SLIP decoder. The SLIP decoder, in turn, is embedded in the USB serial input handler, resulting in a third-order nested state machine. The OSC parser consists of bundle start detection, basic sanity checks on the packet format, and retention of pointers to the locations of the address, the typetags, and the start and end of the data section. Any SLIP decoding error causes the entire bundle to be discarded.

4.2 Code Example
The following example illustrates the programming style on the microcontroller, using the ultra-light OSC implementation to create a port report with 8 data values of variable type:

    oscBundleOpenTimestamped();    // sends SLIP_END and packs time
    oscMessageOpen();              // reserves 4 bytes at start for length
    oscPackROMString("/rb");
    p_osc_tt = p_osc_message + 1;  // pointer to typetags
    oscPackROMString(",NNNNNNNN"); // final types unknown
    for (i = 0; i < 8; i++) {
        // invokes oscPackInt16ToFractionalFloat,
        // returns 'T' or 'F' for digital pins
        *p_osc_tt++ = oscReportPin(i);
    }
    oscMessageClose();             // prepends length, invokes CDCTxRAM
    // other messages are packed in here…
    oscBundleClose();              // sends SLIP_END and finalizes CDCTxRAM

5. LOW-COST FLOATING POINT
A widely adopted OSC convention, also used by audio plug-ins, is to scale control parameters to floating-point values in a conventional representation such as the unit interval. The benefit of this abstraction became obvious for the PIC18 family of microcontrollers when Microchip recently upgraded the ADC on some new variants from 10-bit to 12-bit: an integer encoding would require target-specific logic on the client side to accommodate both ranges.

Even though the PIC18 processor has no hardware FPU, Microchip provides an implementation of <math.h>, the C float type, and IEEE-754 compliant operations by software emulation. Profiling of this code revealed that the cost of int-to-float conversion (90 microseconds per conversion) was too great for use at the desired reporting rates.

We therefore created novel special-purpose code for floating-point conversion that is exact for integers up to 23 bits and is approximately 3 times faster than the general-purpose library.

5.1 Theory
We take the normalized target range to be the closed interval [0.0, 1.0]. This results in the conversion formula:

    y = x / (2^n - 1)

For simplicity, suppose that n = 8. x is given in binary digits as:

    x = x8 x7 x6 x5 x4 x3 x2 x1

where x8 is the most significant bit. Then, as a repeating binary fraction:

    y = 0.x8x7x6x5x4x3x2x1 (x8x7x6x5x4x3x2x1) …

The conversion to y attains sufficient precision once the expansion reaches the first repetition of the most significant bit of x (the 9th fractional digit above). This bit equals 1 when y >= 0.5, and 0 otherwise. Furthermore, a special case applies when x = 2^n - 1: then y = 1.0, since the repeating binary fraction 0.11111111(11111111)… = 1.0.

5.2 Conversion Algorithm
The calculation of y as an IEEE-754 single-precision floating-point number proceeds as follows:

1. If x = 0, return 0.0. If x = 2^n - 1, return 1.0.
2. Scan the digits of x to find the index i of the most significant non-zero bit; this requires O(log n) comparisons. If x >= 2^(n-1), the least significant bit of y (the first repetition of the most significant bit of x) is 1; otherwise it is 0.
3. Compute the exponent as e = 127 - (n - i).
4. Left-shift x by (n - i) + 1 places to obtain the mantissa.
5. Composite the exponent, mantissa and least significant bit together to realize the IEEE-754 format; this requires O(n/8) shift and OR operations.
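The conversion of Section 5.2 can be cross-checked with a portable sketch of our own devising (not the paper's register-level PIC18 routine): it builds the repeating expansion of Section 5.1 in a 32-bit word, normalizes, and rounds with the first repeated bit, which by the theory above yields exactly the round-to-nearest value of x / 255.

```c
#include <stdint.h>
#include <string.h>

/* Convert x in [0, 255] to the nearest IEEE-754 float in [0.0, 1.0],
   exploiting the fact that x / (2^8 - 1) has the bits of x as a
   repeating binary fraction with period 8. Portable sketch only. */
float fraction_from_uint8(uint8_t x)
{
    if (x == 0)   return 0.0f;
    if (x == 255) return 1.0f;   /* special case: 0.11111111... = 1.0 */

    /* 32 bits of the repeating expansion 0.b8..b1 b8..b1 ... */
    uint32_t rep = ((uint32_t)x << 24) | ((uint32_t)x << 16)
                 | ((uint32_t)x << 8)  |  (uint32_t)x;

    /* Normalize so the leading 1 of the expansion sits at bit 31. */
    int shift = 0;
    while (!(rep & 0x80000000u)) { rep <<= 1; shift++; }

    int e = -(shift + 1);        /* unbiased exponent */

    uint32_t sig = rep >> 8;     /* 24-bit significand, leading 1 explicit */
    sig += (rep >> 7) & 1u;      /* round with the next (repeating) bit;
                                    the periodic tail is never zero, so this
                                    is exact round-to-nearest, never a tie */
    if (sig & 0x01000000u) { sig >>= 1; e++; }  /* rounding overflow */

    uint32_t bits = ((uint32_t)(e + 127) << 23) | (sig & 0x007FFFFFu);
    float y;
    memcpy(&y, &bits, sizeof y);
    return y;
}
```

For every input this agrees bit-for-bit with the correctly rounded quotient x / 255.0, illustrating why the single repetition bit in step 2 of the paper's algorithm suffices.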
The first byte is never SLIP encoded (the sign bit is always zero). The last byte is SLIP encoded for y >= 0.5; otherwise the last byte is zero.

The inverse conversion algorithm is similar, but also requires detection of denormal numbers and a rounding operation.

6. OSC REPORTING
uOSC sends OSC packets reporting the current state of all pins isochronously at intervals of two milliseconds. The reporting itself consumes only approximately one millisecond of processor time. The remaining time is used to handle other device functions such as processing of incoming OSC messages. Note that this does not mean that there is 2 msec of jitter in the measurements themselves: their timing relationship to the 1000 Hz USB-SOF beacon is precisely known. An appropriately implemented host driver could achieve sub-millisecond timing precision.

6.1 Bundle Timestamps
The bundle timestamp conforms to the NTP fixed-point format described in the OSC specification. The fractional part is computed to a precision of 1 msec. This is approximately 2^-10, so a 16-bit integer is sufficient for the calculation. The fractional part is exactly zero at intervals of 1000 SOF interrupts, i.e., there is no roundoff error accumulation. The integer part is a long integer, which is unbounded for all practical purposes. Since the host and microcontroller have a point-to-point connection, the timestamp can theoretically be conformed to the host computer's best UTC approximation [2].

6.1.1 Use of IMMEDIATE
Informational messages such as the device firmware version, pin capability reports, and profiling and debugging information are not time-sensitive and are encapsulated in bundles that use the IMMEDIATE timestamp (integer part 0, fractional part 1).

6.2 Efficient Encoding of Port Reports
To save space in the data stream, sequentially numbered pins are grouped together in a single message called a port report. Each analog input pin is reported as a normalized floating-point number, OSC typetag ‘f’, requiring approximately 5 bytes. A pin configured as a digital input or output is reported as a boolean using OSC typetag ‘T’ or ‘F’, requiring 1 byte of data space. A pin that is not connected or is in a reserved state (e.g., in use by a hardware module) is reported as NULL using OSC typetag ‘N’, consuming 1 byte of data space. The CNMAT OpenSoundControl object for MaxMSP (see Footnote 2) supports these types sensibly.

For a port of 8 pins, the total size of the OSC message is 12-60 bytes, depending on the current pin configuration. The same number of pins encoded as separately addressed messages would require 96-160 bytes.

7. OSC DISPATCH
An incoming OSC message is dispatched by matching its address pattern against a nested structure of path components and invoking the appropriate callback for each match. Full support for OSC address pattern matching is implemented in uOSC.

7.1 Dispatch Table Structure
The dispatch structure is a statically allocated tree using the following data structure:

    typedef struct _oscSchemaNode {
        oscCallback target;
        byte num_children;
        rom char* child_name[OSC_MAX_CHILDREN];
        rom struct _oscSchemaNode* child[OSC_MAX_CHILDREN];
    } oscSchemaNode;

Adding new method calls is simply a matter of inserting new nodes into the root node.

7.2 Efficiency of Pattern Matching
The purpose of the OSC pattern syntax is primarily to enable the client to compactly describe certain bulk and atomic operations, not to provide a sophisticated search mechanism. The OSC address pattern syntax is significantly less complex than typical general-purpose regular expression languages. Specifically: 1. patterns may not cross ‘/’ boundaries in the address; 2. list matches do not support nesting or containment of other pattern operators; and 3. character-class matches and the wildcard operators ‘?’ and ‘*’ are always greedy, obviating the need for backtracking. Therefore a pattern match is O(1) in memory.

The set of possible matching addresses is finite, and for patterns up to a set length the total execution time of a match is bounded. Furthermore, the dispatch process can leverage the nested structure, since child addresses cannot match if the parent fails to match.

Our profiling shows that the cost of matching in uOSC is not a cause for concern; in particular, it is no more expensive than a standard string comparison in the most common case of addresses that contain no wildcards.

7.3 Scheduled Dispatch
When a received bundle has a timestamp in the future relative to the device's internal clock, the action of the packet can be delayed until the requested time. A bundle with a timestamp in the past is discarded. This mechanism makes possible the forward synchronization method for jitter compensation [5]. The embedded processor has insufficient RAM to retain entire packets for future processing, so scheduling is limited to digital pin writes, which are stored in a fixed-length, insertion-sorted list.

7.4 Port Writes and Pin Aliasing
The client can write to groups of pins organized in ports using the same format described in Section 6.2. Individual pins can also be addressed using their specific addresses.

8. DEBUGGING AND PROFILING
Profiling is essential for code optimization. However, the use of in-circuit serial debuggers is known to be problematic for USB devices because the associated interrupts are time-sensitive. Timing issues can also arise when using printf-style debugging over the TTL serial port.

uOSC includes a microsecond-accuracy profiling system; when it is enabled by a compile-time switch, the timing of various operations is measured and reported periodically in supplemental OSC messages. This solution has negligible impact on the timing performance of the system.
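The walk over the oscSchemaNode tree of Section 7.1 can be sketched host-side as follows. This is our own illustration, not the uOSC source: the PIC-specific rom and byte types are replaced with portable equivalents, and wildcard pattern matching is omitted, so only literal path components are walked.

```c
#include <string.h>
#include <stddef.h>

#define OSC_MAX_CHILDREN 8

typedef void (*oscCallback)(void);

/* Portable stand-in for the statically allocated dispatch tree. */
typedef struct _oscSchemaNode {
    oscCallback target;
    unsigned char num_children;
    const char *child_name[OSC_MAX_CHILDREN];
    const struct _oscSchemaNode *child[OSC_MAX_CHILDREN];
} oscSchemaNode;

/* Return the callback registered for a literal address like "/rb/0",
   or NULL if no node matches. */
oscCallback osc_dispatch(const oscSchemaNode *node, const char *address)
{
    while (*address == '/') {
        const char *start = ++address;
        while (*address && *address != '/')
            address++;
        size_t len = (size_t)(address - start);

        const oscSchemaNode *next = NULL;
        for (unsigned char i = 0; i < node->num_children; i++) {
            if (strlen(node->child_name[i]) == len &&
                memcmp(node->child_name[i], start, len) == 0) {
                next = node->child[i];
                break;
            }
        }
        if (!next)
            return NULL;   /* parent failed: children cannot match */
        node = next;
    }
    return node->target;
}

/* Hypothetical example tree: /rb/0 mapped to a no-op handler. */
static void noop(void) {}
static const oscSchemaNode pin0 = { noop, 0, {0}, {0} };
static const oscSchemaNode rb   = { NULL, 1, {"0"}, {&pin0} };
static const oscSchemaNode root = { NULL, 1, {"rb"}, {&rb} };
```

The early return on a failed component is the property exploited in Section 7.2: a child address can never match once its parent has failed.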
variations apply for other boards because of the “user friendly” design choice that parameters are named according to the silk screens on each development board.

9.1 Port and Pin Messages

    /ra ffffFf : generates/accepts port-report format
      /0 : individual pin control for /ra/0
        /info : returns “dio”, “adc”, “pwm”, “ttl”
        /state : returns “input”, “output”, “reserved”
        /set : accepts “low”, 0, 0.0, or “high”, 1, 1.0
        /get : see /set
      /1-5 (same as /ra/0)
    /rb fffffFFF : /rb port report (8 pins)
      /0-7 (same as /ra/0)
      /tx (same as /ra/0)
      /rx (same as /ra/0)
    /status
      /0 : controls the yellow status LED
        /set : accepts “off”, 0, 0.0, “on”, 1, 1.0
        /get : returns LED state
      /1 : controls the red status LED

9.2 Device Messages

    /device
      /platform : returns “Bitwacker”, “CUI”, etc.
      /firmware : returns “uOSC 1.0”
      /processor : returns “PIC18F2455 Rev. B4”
      /ports : returns a list of port addresses
      /pins : returns a list of pin addresses
      /id : user-writeable 64-bit hex string
      /save 1 : commits pin and module state to flash
      /reset 1 : restores default state
    /modules
      /list : list available modules
      /enable s : enable a module
      /disable s : disable a module
    /pwm
      /0 : control the first hardware PWM
        /rate f : rate in Hz
        /duty f : duty cycle in [0.0-1.0]
    /ttl
      /0 : control the first hardware TTL
        /open [baud, bits, stopbit]
        /read : return string data
        /write : write string data
        /close
    /usb
      /stall : stall detected
      /error : error detected

10.1.1 CoreAudio
The CoreAudio path copies the sensor data into a dedicated audio channel, available directly as audio in Mac applications, in particular as a signal in MaxMSP. Since primary interrupts and Core Audio threads are the highest-priority operations in OS X, no priority inversion occurs. This represents the highest-reliability operating-system path for musical applications, and it has a consistent input latency of 4 msec and peak jitter of 0.7 msec, corresponding to the gesture-input scan rate.

10.1.2 /dev/osc
The /dev/osc path writes the sensor data into a UNIX-style character-device file which is read using standard file I/O operations via the devosc object for MaxMSP (see Footnote 2). Although preemption can delay packet delivery to Max, only a single context switch is required to read the data.

10.2 uOSC via Serial Connectivity
The serial driver is known to contain some input buffering, so it is expected that this data pathway will not be as fast as the reference platform. Two variations on accessing the serial port data were tested.

10.2.1 MaxMSP serial -> slipOSC
The built-in MaxMSP serial object is used to perform high-rate non-blocking reads on the corresponding serial port. A custom object, slipOSC, decodes the SLIP framing into OSC “fullpacket” messages compatible with the CNMAT OpenSoundControl object.

10.2.2 py-serial to UDP
In this configuration, a Python program reads the serial port, decodes the SLIP framing, and relays the resulting datagram to MaxMSP via the network stack as a UDP datagram.

10.3 Discussion
The py-serial method is clearly the worst performer (Figures 2, 3), as expected due to the extra layer of indirection.

It is clearly possible to attain timing performance within the desired latency bounds for musical performance (~6-8 msec); however, the observed jitter requires consideration. OSC bundle timestamping can be used to compensate for this jitter, and this will be an interesting topic for future work.

Figure 3: Latency histogram under system load

11. Sample Applications
The uOSC platform has been successfully integrated into several new music controllers developed at CNMAT [3]. The compact size of the various hardware platforms has also allowed us to retrofit older MIDI and analog devices such as the Max Mathews radio drum and various foot pedals.

A more sophisticated sensor platform was constructed using a custom hardware module extension added to uOSC that makes
use of the SPI port and other pins to communicate with a 3-axis
magnetometer having a digital communication interface.
Combined with standard analog input capabilities of uOSC, a
compact, high-speed inertial measurement module was
constructed for research into spatial gesture tracking (Figure 4).
12. CONCLUSION
This paper describes the implementation of OSC, including its advanced timing features and type support, for an embedded microprocessor.

By including deadline scheduling and timestamping, uOSC contributes to a large project now underway to implement solid deadline scheduling in future multi-core desktop and handheld device operating systems [5].

The inclusion of end-to-end latency and jitter performance benchmarking demonstrates the current results with USB-serial in relation to a best-case reference platform, an analysis that the authors consider essential for the discussion and careful evaluation of any similar implementation.

Figure 4: IMU+magnetometer hybrid sensor built on the Bitwacker running uOSC, mounted on Sennheiser HD650

13. FUTURE WORK
Measurements and tuning on a wide range of host platforms are ongoing. We are exploring other USB device classes such as the CDC-ECM (Ethernet Control Model) and USB-Audio classes, both of which can use isochronous endpoints with improved reliability for real-time applications.

As we release the source code, we will support the community's applications and participate in ports to other processors, with an initial focus on PIC controllers with integrated Ethernet.

The code structure of uOSC anticipates the desire to port to new microprocessor targets by isolating platform-independent code components. We are exploring the implementation of uOSC on the ATmega controllers employed on the Arduino and Wiring platforms. These implementations rely on a separate USB serial controller instead of integrated USB, and therefore cannot implement different USB protocols. They are also more expensive and have slower performance than PIC18F systems for time-critical applications. The Wiring platform, for example, has three different unconnected clock domains. Many Arduino-compatible systems such as the Lilypad use a cheap integrated clock that is neither accurate nor precise. We have achieved some success using forward and backward synchronization on the host side to obviate these problems [8], but we strongly encourage designers of future physical computing platforms to carefully study these timing and performance issues.

14. ACKNOWLEDGEMENTS
We gratefully acknowledge the financial support of Sennheiser, the pioneering implementations of Making Things by Liam Staskawicz, and Dan Overholt, who brought the integration advantages of the PIC processors to our attention with his CUI board.

15. REFERENCES
[1] Brandt, E.; Dannenberg, R. Time in Distributed Real-Time Systems. In Proceedings of the ICMC (San Francisco, CA, USA, 1998), 523-526.
[2] Freed, A. Towards a More Effective OSC Time Tag Scheme. In Proceedings of the OSC Conference (Berkeley, CA, USA, June 30, 2004).
[3] Freed, A. Application of New Fiber and Malleable Materials for Agile Development of Augmented Instruments and Controllers. In Proceedings of the NIME Conference (Genova, Italy, 2008).
[4] Freed, A.; Avizienis, R.; Wright, M. Beyond 0-5V: Expanding Sensor Integration Strategies. In Proceedings of the NIME Conference (Paris, France, 2006), 97-100.
[5] Hayes, B. Computing in a Parallel Universe. American Scientist, Volume 95, Issue 6, 2007, 476-480.
[6] Overholt, D. Musical Interaction Design with the CREATE USB Interface: Teaching HCI with CUIs instead of GUIs. In Proceedings of the ICMC (New Orleans, LA, USA, 2006).
[7] Romkey, J. A Nonstandard for Transmission of IP Datagrams over Serial Lines: SLIP. RFC 1055, http://rfc.net/rfc1055.html, 1988.
[8] Schmeder, A.; Freed, A. Implementation and Applications of Open Sound Control Timestamps. In Proceedings of the ICMC (Belfast, Ireland, 2008).
[9] Wessel, D.; Wright, M. Problems and Prospects for Intimate Musical Control of Computers. Computer Music Journal, Volume 26, Issue 3, 2002, 11-22.
[10] Wright, M. The Open Sound Control 1.0 Specification. http://opensoundcontrol.org/spec-1_0
[11] Wright, M.; Cassidy, R. J.; Zbyszynski, M. F. Audio and Gesture Latency Measurements on Linux and OSX. In Proceedings of the ICMC (Miami, FL, USA, 2004), 423-429.
[12] The Universal Serial Bus Specification, Revision 2.0. http://www.usb.org, April 27, 2000.
[13] MCHPFSUSB User's Guide, DS51679A. Microchip Technology Inc., 2007.
3.3 Controlling the Range
For integer, float and list nodes a range can be specified. This can be useful for setting up auto-scaling mappings from one value to another, or for clipping the output range. The clipping property can be none, low, high or both. The range properties are accessed thus:

    :/range/bound :/range/bound:/get
    :/range/clipmode :/range/clipmode:/get

3.4.3 OSC Namespace for Ramping Properties
Ramping properties are addressed using the :/ramp/drive and :/ramp/function OSC name classes. The ramping case provides an example of a node class which contains other node classes, as illustrated in Figure 2. As discussed in Section 2.5, information on all available ramp units or functions can be requested with the standardized :/catalog method. If the current function or ramp unit
contains additional parameters, the namespace of the unit [4] J. Malloch, S. Sinclair, and M. M. Wanderley. From
can be retrieved by the :/namespace method, while :/dump controller to sound: Tools for collaborative
returns the state of the node: development of digital musical instruments. In
:/ramp/drive :/ramp/drive:/get Proceedings of the International Computer Music
:/ramp/drive:/catalog Conference, Copenhagen, 2007.
:/ramp/drive:/dump [5] N. Peters, S. Ferguson, and S. McAdams. Towards a
:/ramp/drive:/namespace Spatial Sound Description Interchange Format
:/ramp/drive:/catalog (SpatDIF). Canadian Acoustics, 35(3):64 – 65,
:/ramp/function :/ramp/function:/get September 2007.
:/ramp/function:/catalog [6] T. Place and T. Lossius. Jamoma: A modular
:/ramp/function:/dump standard for structuring patches in max. In
:/ramp/function:/namespace Proceedings of the International Computer Music
For instance the user can control how often the sched- Conference, pages 143–146, New Orleans, LA, 2006.
uler RampUnit is to update by setting the :/granularity
property of the ramp:

:/ramp/drive:/granularity
:/ramp/drive:/granularity:/get

The same principles apply to the function units used for ramping.

3.5 DataspaceLib
In addition to the current RampLib and FunctionLib, work has started on the implementation of a DataspaceLib. The DataspaceLib will enable nodes to be addressed using one of several interchangeable measurement units. For example, a gain parameter can be set using MIDI, dB or linear amplitude, depending on the context and preferences of the user. The OSC representation of this will be implemented as a set of properties of the node. The DataspaceLib is also meant to offer mapping between more complex interrelated coordinate systems, so that e.g. Cartesian and spherical coordinates can be used interchangeably to describe points in space, as proposed for SpatDIF [5].

4. DISCUSSION
As discussed in Section 2.1.1, other projects have also proposed standardizing the means of querying values of OSC nodes and the OSC namespace in general. They propose syntax that differs from, or conflicts with, the suggestions put forward in this paper, as well as with each other. The authors call on the OSC developer community to work towards a standardized query system to extend the current OSC 1.0 specification, resolving these conflicts in the process.
At the same time we would like to point out that the proposal put forward in this paper broadens the scope of the Integra project and Jazzmutant OSC 2 proposals by integrating a querying system with the notion of nodes as classes. The proposal could thus be considered one step in the direction of a more object-oriented approach to Open Sound Control.

5. ACKNOWLEDGMENTS
The authors would like to thank all Jamoma developers and users for valuable contributions, and iMAL Center for Digital Cultures and Technology for organizing a workshop where the issues presented in this paper were discussed.

6. REFERENCES
[1] T. Bray, J. Paoli, C. M. Sperberg-McQueen, E. Maler, and F. Yergeau. Extensible Markup Language (XML) 1.0 (Fourth Edition). Technical report, W3C, September 2006.
[2] M. Habets. OSCQS – Schema for Open Sound Control Query System. Version 0.0.1, 2005.
[3] Jazzmutant. Extension and enhancement of the OSC protocol. Draft, 25 July 2007.
[7] A. W. Schmeder and M. Wright. A query system for Open Sound Control. Draft proposal, July 2004.
[8] M. Wright. The Open Sound Control 1.0 Specification. Version 1.0. Technical report, available: http://opensoundcontrol.org/spec-1_0, 2002.
[9] M. Wright. Open Sound Control: an enabling technology for musical networking. Organised Sound, 10(3):193–200, 2005.
[10] M. Wright and A. Freed. Open Sound Control: A new protocol for communicating with sound synthesizers. In Proceedings of the International Computer Music Conference, pages 101–104, Thessaloniki, 1997.
[11] M. Wright, A. Freed, and A. Momeni. Open Sound Control: State of the Art 2003. In Proceedings of NIME-03, Montreal, 2003.
[12] M. Zbyszynski and A. Freed. Control of VST plug-ins using OSC. In Proceedings of the International Computer Music Conference, pages 263–266, Barcelona, 2005.
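As an illustration of the interchangeable units described in Section 3.5, the following sketch (not Jamoma code; the class name and the MIDI curve are our own placeholder assumptions) stores a gain in one neutral unit, linear amplitude, and converts to and from dB and MIDI on access:

```cpp
#include <cmath>

// Sketch of a DataspaceLib-style gain dataspace (illustrative only).
// The node stores a single neutral unit -- linear amplitude -- and
// registered converters translate on the way in and out.
struct GainDataspace {
    double linear = 1.0;  // neutral unit: linear amplitude

    // dB <-> linear amplitude (standard definitions)
    void   setDb(double db) { linear = std::pow(10.0, db / 20.0); }
    double getDb() const    { return 20.0 * std::log10(linear); }

    // Hypothetical MIDI mapping: 0..127 scaled linearly to amplitude 0..1.
    // A real dataspace would register whatever curve the project adopts.
    void   setMidi(double m) { linear = m / 127.0; }
    double getMidi() const   { return linear * 127.0; }
};
```

With converters of this kind in place, setting the same node via MIDI, dB or linear amplitude always resolves to the same stored value.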
184
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
ABSTRACT
Many mobile devices, specifically mobile phones, come equipped with a microphone. Microphones are high-fidelity sensors that can pick up sounds relating to a range of physical phenomena. Using simple feature extraction methods, parameters can be found that map sensibly to synthesis algorithms to allow expressive and interactive performance. For example, blowing noise can be used as a wind instrument excitation source. Other types of interaction, such as striking, can also be detected via microphones. Hence the microphone, in addition to allowing literal recording, serves as an additional source of input to the developing field of mobile phone performance.

Keywords
mobile music making, microphone, mobile-stk

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

1. INTRODUCTION
Many mobile devices come with the intrinsic ability to generate sound and hence suggest their use as musical instruments. There has thus been an increasing interest in finding ways to make interactive music performance possible with these devices.
An important step in this process is the discovery of expressive ways to interact with mobile devices. In recent years, optical sensing, keys, touch-pads and various motion and location sensors have been explored for this purpose.
In this work we consider the built-in microphone of mobile phones as a generic sensor to be used for mobile performance. One reason for using microphones is that they are integral to any mobile phone, no matter how basic. It seems natural to integrate microphones into mobile phone performance as well.
To this end we implemented stand-alone recording (half-duplex) as well as simultaneous recording and playback (full-duplex) for MobileSTK on Symbian OS mobile devices [4]. Using this implementation we explore ways to use the microphone away from the traditional use of direct recording. Instead we want to use it as a generic sensor to drive sound synthesis algorithms in expressive ways. We discuss a few basic parameter detection methods to extract and give abstract representations to the microphone signal. These parameters are then used to drive the various forms of parametric synthesis. We believe that the microphone of mobile devices is a useful addition to the palette of sensory interactions for musical expression.
In recent years there has been intensified work to create sensor-based interaction and parametric playback on mobile devices. Tanaka presented an accelerometer-based custom-made augmented PDA that could control streaming audio [14], and ShaMus uses both accelerometers and magnetometers to allow varied interaction types [5]. Geiger designed a touch-screen-based interaction paradigm with integrated synthesis on the mobile device, using a port of Pure Data (PD) for Linux-enabled portable devices like iPaqs [8, 7]. CaMus uses the camera of mobile camera phones to track visual references; motion data is then sent to an external computer for sound generation [12]. Various GPS-based interactions have also been proposed [13, 15]. A review of the general community was recently presented by Gaye and co-workers [6].
The microphone signal as a generic sensor signal has been used previously in the design of various new musical instruments. For example, PebbleBox uses a microphone to pick up collision sounds between coarse objects like stones, while CrumbleBag picks up sounds from brittle material to control granular synthesis [11]. Scrubber uses microphones to pick up friction sounds and sense motion direction [3]. Live audio manipulation based on microphone pickup is a known concept; it has for instance been used by Jehan, Machover and coworkers [9, 10], where the audio was driven by an ensemble mix of traditional acoustical musical instruments. Microphones are also used for control in non-musical settings: for example, they can be used to derive position via microphone arrays [2].

2. TURNING MICROPHONES INTO SENSORS
A goal of the project is the broad accessibility of microphone recording for engineers and musicians interested in mobile phone performance. Hence it was natural to consider extending MobileSTK to include microphone recording. For this, it was necessary to recover the audio recording capability which already existed in the original STK [1] for Symbian OS, make it accessible in the MobileSTK context, and offer examples of use of the capability. MobileSTK provides a range of digital filters and synthesis algorithms in C++ and the capability to interact with and play sound. Mobile operating systems are still in a process of maturation, which adds some complications to the development of applications for this platform.
The complete architecture can be seen in Figure 1. The core to make this possible is allowing recording audio from the microphone. Then the microphone data is processed. The processed data can either be used directly as output or
control parameters. The next sections describe the details of each part of this architecture.

Figure 1: Processing pipeline: Audio received by the microphone passes through digital signal processing units. The output of these can be sent directly to the speaker for playback, or act as control parameters or input to STK unit generators.

2.1 Recording under Symbian
Recording has been implemented for Series 60, 3rd edition devices running version 9.1 of Symbian OS¹. The recording class needs a CMdaAudioInputStream object for real-time audio input streaming, and must implement the MMdaAudioInputStreamCallback interface. This interface includes methods that are called when the input stream has been opened or closed, and when a buffer of samples has been copied from the recording hardware. Reading the next buffer of input audio from the hardware starts with a call to the ReadL() function of CMdaAudioInputStream.
With this framework, half-duplex audio works simply by ensuring that only one of the two audio streams—input or output—is open at any time. Full-duplex audio succeeds if the output stream has been opened and is in use before the input stream is started. Experiments with the Nokia 5500 Sport phone yielded a delay of up to half a second between audio recording and playback in full-duplex mode within MobileSTK. It is also worth noting that the audio input buffer size is fixed for each phone. The Nokia 5500 Sport and Nokia 3250 use audio input buffers of 1600 samples. Other Nokia S60 3rd edition phones use 4096-sample buffers, while S60 2nd edition phones use 320-sample buffers. The buffer size may further differ for other mobile phone models.
S60 2nd edition phones such as the Nokia 6630 use a different full-duplex framework. Both the recording and playback classes implement the MDevSoundObserver interface, which includes methods called when an audio buffer needs to be read or written. Both also have a CMMFDevSound object. The recording and playback classes run on different threads, and audio is passed between the two via a shared buffer. As the older phones tend to have much less processing power, we focus on S60 3rd edition phones here.

¹Resources on audio programming for Symbian OS in C++ can be found at http://www.forum.nokia.com/main/resources/technologies/symbian/documentation/multimedia.html

2.2 Additions to MobileSTK
Microphone input has been integrated into MobileSTK via a new MicWvIn class, which inherits from the WvIn class already in STK. MicWvIn acts as the interface between the microphone and the rest of MobileSTK. As required, it implements the MMdaAudioInputStreamCallback interface and contains a CMdaAudioInputStream object. In addition, it holds two audio buffers. The CMdaAudioInputStream object copies input samples from the recording hardware into the first of these buffers. Samples from the first buffer are then copied into the second and typically larger internal buffer, to be stored until they are sought elsewhere in MobileSTK.
The interface provided by MicWvIn includes the following methods:

• OpenMic() : Opens the audio input stream and starts recording. The buffer sizes and input/output modes can be set via this method.
• CloseMic() : Closes the audio input stream.
• tick() : Returns the next sample of audio, as read from the internal storage buffer. This is inherited from WvIn along with other ticking methods.
• Rewind() : Resets the output index to 0 so that tick() starts returning samples from the beginning of the internal storage buffer.

With this framework, using microphone input within MobileSTK involves creating a MicWvIn object and calling OpenMic(). Individual samples from the microphone can then be obtained via tick() and directed into a playback buffer or as input into processing and synthesis modules. A new version of MobileSTK, which contains these additions, is already available under an open software license.²

²MobileSTK can be downloaded at http://sourceforge.net/projects/mobilestk

2.3 Deriving control parameters from audio signals
Audio signals are very rich in nature. They often contain more information than is needed to identify certain physical parameters. For example, the loudness of an impact sound is a good correlate of the impact strength, while spectral information contains cues about the impact position [17, 16]. Separating these parameters and removing fine detail that does not influence the desired control is an important step in using the microphone as an abstracted sensor.
In many of our usage examples we are indeed not interested in the content of the audio signal per se, but in the more general physical properties that lead to the audio signal. So a form of signal processing is necessary, which one can think of as a simple version of feature extraction. Some relevant methods to do this for impact sounds (detecting impact moment, impact strength and estimates of spectral content) have already been proposed in a slightly different context [11]. Similarly, [3] describes separation of amplitude and spectral content for sustained friction sounds. We implemented the onset detection method from [11] to allow impact detection.
Another relevant and interesting use of the microphone is as a virtual mouthpiece for wind instruments. We take the heuristic assumption that audio signal amplitude is a good indicator of blow pressure; hence, arriving at an abstracted pressure measurement means keeping a windowed-average amplitude of the incoming waveform. This value is then rescaled to match the expected values for a physical model of a mouthpiece as can be found in STK [4].
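The blow-pressure abstraction just described can be sketched as a simple envelope follower (our own illustration, not the actual MobileSTK code): a one-pole leaky average of the rectified signal approximates the windowed mean, and the result is rescaled into the range a mouthpiece model expects.

```cpp
#include <cmath>

// Sketch of a windowed-average amplitude used as an abstracted blow
// pressure (illustrative; the scaling gain is an assumption that would
// be calibrated per phone and microphone).
class BlowPressure {
public:
    // A pole close to 1.0 gives a longer effective averaging window.
    explicit BlowPressure(double pole = 0.995) : pole_(pole) {}

    // Feed one microphone sample; returns the current envelope value.
    double tick(double sample) {
        env_ = pole_ * env_ + (1.0 - pole_) * std::fabs(sample);
        return env_;
    }

    // Rescale into [0, maxPressure] for the physical mouthpiece model.
    double pressure(double maxPressure = 1.0) const {
        double p = env_ * 4.0;  // empirical gain, assumed here
        return p > maxPressure ? maxPressure : p;
    }

private:
    double pole_;
    double env_ = 0.0;
};
```

In use, each sample obtained from tick() on the microphone side would be fed to this follower, and pressure() would drive the breath-pressure input of an STK wind model.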
2.4 Mobile phone camera as tone-hole
A tone-hole in a conventional instrument is a hole that is covered or uncovered to control the produced sound. To allow tone-hole-like behavior, we read information from the mobile phone camera. We let the camera lens act as a tone-hole by computing the average grayscale value of the camera input image. When this value drops below a threshold, we estimate that the camera lens (or, metaphorically, the tone-hole) is covered. We can also estimate degrees of covering by setting several different thresholds.
While this technique has drawbacks when used against a completely dark background, it succeeds in most normal, reasonably bright surroundings. It also sets the stage for further ways for camera and microphone information to complement each other in creating a unified expressive musical instrument. For example, more dynamic input like the strumming gesture of guitar-plucking can be sensed by the camera and combined with microphone input.

phone input through a simple onset detector and play a sound file from memory each time it detects an onset. The file played contains either hi-hat samples or a more pitched sound. The value of the input signal's amplitude envelope at the time of onset detection, relative to its value at the previous onset, determines which file is played. Thus if the input grows louder from one onset to the next, we hear the hi-hat, while the pitched sound is played if it becomes softer.
This simple example can be extended in various ways. For instance, which audio is played, and how, could be determined by other features of the microphone input or by information from other sensors. It would be feasible to include several more drum samples and create a mini drum kit. The onset detection could also control other unit generators. However, the perceptible delay between cause and effect in full-duplex mode may prove more challenging to the striking paradigm than to other interaction models.
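The relative-loudness selection just described can be sketched as follows (our own illustration, not the authors' code; the file names are hypothetical, and the onset detection itself, taken from [11] in the paper, is not shown):

```cpp
#include <string>

// Sketch of the drum example: at each detected onset, compare the
// current amplitude-envelope value with the value at the previous
// onset.  Louder -> hi-hat sample, softer -> pitched sample.
class OnsetSampleChooser {
public:
    std::string onOnset(double envelopeAtOnset) {
        bool louder = envelopeAtOnset > lastEnvelope_;
        lastEnvelope_ = envelopeAtOnset;
        return louder ? "hihat.wav" : "pitched.wav";
    }

private:
    double lastEnvelope_ = 0.0;  // envelope at the previous onset
};
```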
it adds the current microphone input, so that one hears the repeating loop as well as any noise one is currently making near the microphone. This gives the performer a way to accompany himself in creating interactive music. Such an instrument could also be modified to let the current microphone input drive one of the instruments described earlier, or be otherwise processed before reaching the output stage.
Another example, the Fast-Forward, uses the previously recorded audio samples along with camera input. In this case, the amount by which the camera lens is covered controls the speed at which the recorded loop is played. If not covered at all, the samples are played at normal speed. If the lens is slightly covered, every other sample is played, while if it is fully covered, every fourth sample of the pre-recorded segment is played. This example does not use full-duplex audio at all, but allows the camera input to control playback in a way that would be difficult if the samples to play were not recorded in advance.

4. CONCLUSIONS
We presented the use of the microphone of mobile phones as a generic sensor for mobile phone performance. The fidelity and dynamic range, along with the types of physical effects that can be picked up via acoustic signals, make this an interesting addition to the range of sensors available in mobile devices for mobile music making. The microphone is particularly interesting for picking up performance types that are not otherwise easily accessible to mobile devices. For example, the wind noise from blowing into the microphone can be used to simulate the behavior of a simple mouthpiece of a wind instrument, or just a police whistle. At the same time the sensor also allows other types of gestures, like striking, to be detected. In addition, it allows instant recording and manipulation of audio samples, letting the samples heard in performance be directly related to the venue.
The great advantage of microphone sensing in mobile devices is its broad availability. While accelerometers are only just emerging in contemporary high-end models of mobile devices (Nokia's 5500 and N95, Apple's iPhone), microphones are available in any programmable mobile phone and offer signals of considerable quality.
One current limitation for interactive performance is the limited floating-point performance of current devices. This means that currently either all signal processing has to be implemented in fixed-point, or one has to tolerate somewhat limited computational complexity in processing algorithms. It is very likely that this will change with the evolution of smart mobile phones. Already Nokia's N95 contains a vector floating-point unit, and its overall computational power is considerably higher than that of the earlier Nokia 5500. One can expect this trend to continue, eventually making this limitation obsolete.
Microphones offer yet another sensor capability that can be used for mobile music performance, allowing performers to whistle, blow and tap their devices as a vocabulary of musical expression.

5. REFERENCES
[1] P. Cook and G. Scavone. The Synthesis ToolKit (STK). In Proceedings of the International Computer Music Conference, Beijing, 1999.
[2] H. Do and F. Silverman. A method for locating multiple sources using a frame of a large-aperture microphone array data without tracking. In Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, NV, April 2008.
[3] G. Essl and S. O'Modhrain. Scrubber: An Interface for Friction-induced Sounds. In Proceedings of the Conference for New Interfaces for Musical Expression, pages 70–75, Vancouver, Canada, 2005.
[4] G. Essl and M. Rohs. Mobile STK for Symbian OS. In Proceedings of the International Computer Music Conference, pages 278–281, New Orleans, Nov. 2006.
[5] G. Essl and M. Rohs. ShaMus – A Sensor-Based Integrated Mobile Phone Instrument. In Proceedings of the International Computer Music Conference (ICMC), Copenhagen, Denmark, August 27–31 2007.
[6] L. Gaye, L. E. Holmquist, F. Behrendt, and A. Tanaka. Mobile music technology: Report on an emerging community. In NIME '06: Proceedings of the 2006 Conference on New Interfaces for Musical Expression, pages 22–25, June 2006.
[7] G. Geiger. PDa: Real Time Signal Processing and Sound Generation on Handheld Devices. In Proceedings of the International Computer Music Conference, Singapore, 2003.
[8] G. Geiger. Using the Touch Screen as a Controller for Portable Computer Music Instruments. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME), Paris, France, 2006.
[9] T. Jehan, T. Machover, and M. Fabio. Sparkler: An audio-driven interactive live computer performance for symphony orchestra. In Proceedings of the International Computer Music Conference, Göteborg, Sweden, September 16–21 2002.
[10] T. Jehan and B. Schoner. An audio-driven, spectral analysis-based, perceptual synthesis engine. In Proceedings of the 110th Convention of the Audio Engineering Society, Amsterdam, Netherlands, 2001.
[11] S. O'Modhrain and G. Essl. PebbleBox and CrumbleBag: Tactile Interfaces for Granular Synthesis. In Proceedings of the International Conference for New Interfaces for Musical Expression (NIME), Hamamatsu, Japan, 2004.
[12] M. Rohs, G. Essl, and M. Roth. CaMus: Live Music Performance using Camera Phones and Visual Grid Tracking. In Proceedings of the 6th International Conference on New Instruments for Musical Expression (NIME), pages 31–36, June 2006.
[13] S. Strachan, P. Eslambolchilar, R. Murray-Smith, S. Hughes, and S. O'Modhrain. GpsTunes: Controlling Navigation via Audio Feedback. In Proceedings of the 7th International Conference on Human Computer Interaction with Mobile Devices & Services, Salzburg, Austria, September 19–22 2005.
[14] A. Tanaka. Mobile Music Making. In NIME '04: Proceedings of the 2004 Conference on New Interfaces for Musical Expression, pages 154–156, June 2004.
[15] A. Tanaka, G. Valadon, and C. Berger. Social Mobile Music Navigation using the Compass. In Proceedings of the International Mobile Music Workshop, Amsterdam, May 6–8 2007.
[16] K. van den Doel and D. K. Pai. The sounds of physical shapes. Presence, 7(4):382–395, 1998.
[17] R. Wildes and W. Richards. Recovering material properties from sound. In W. Richards, editor, Natural Computation. MIT Press, Cambridge, Massachusetts, 1988.
3. AUGMENTED GUITAR AND MAPPING
The development of a mobile augmented guitar involves several issues. One challenge is the delay between acquisition of gestures and related feedback. Wireless audio streaming must be sufficiently fast to allow for real-time interaction and control. The 6 ms of network delay, combined with minimal OSC overhead, was regarded as sufficient during experimentation.

to the x-axis, allowing the performer to play desired sections of the material with horizontal motions. This is interesting when harmonic progressions have been recorded, allowing musicians to explore tonal or atonal accompaniment, as described in Section 4.
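The horizontal mapping just described can be sketched as follows (our own illustration, not the authors' implementation): a normalized horizontal position selects where the phase-vocoder sample reader starts reading the recorded material.

```cpp
#include <algorithm>
#include <cstddef>

// Sketch of mapping a horizontal position x in [-1, 1], e.g. derived
// from the headstock-mounted controller, to a read position within the
// recorded buffer (illustrative; names are our own).
std::size_t sectionStart(double x, std::size_t bufferLen) {
    if (bufferLen == 0) return 0;
    double t = std::max(-1.0, std::min(1.0, x));  // clamp to [-1, 1]
    double norm = (t + 1.0) / 2.0;                // map to [0, 1]
    return static_cast<std::size_t>(norm * (bufferLen - 1));
}
```

Sweeping x from left to right then scans through the recorded material, which is what lets horizontal motions replay desired sections.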
are mobility and ease of use during play.

4. MUSICAL PRACTICE
Several body movements are easily achievable by musicians while playing the guitar. For example, the neck can be moved horizontally and vertically while the musicians walk, run and jump in any direction. The Wii controller, attached to the guitar's headstock, easily allows for the detection of neck movements; it provides control of the augmented guitar features while the instrument is being played. This allows a musician to experiment with various forms of self-interaction within solo or group performance.³ Since performers are able to move around within WiFi 802.11g range (100 meters), they may interact with the surrounding environment in additional ways. For example, a composer can arrange the space with ambient light or moving objects with which the performer can interact. Multi-user interactions can also include IR LEDs or reflectors worn to augment group interaction.
In terms of musical dialogues, the augmented system provides some interesting techniques for an individual performer that usually cannot be accomplished with a single instrument. We consider the possibility of three such dialogues:

• Questions and answers: Samples can be recorded while playing, and then pointing the guitar in a particular direction replays the melody or sound in a different order and speed.
• Self-duo: The same technique as questions and answers, but pointing and simultaneously playing the guitar allows one to perform counterpoint-like interactions.
• Self-accompaniment: Recorded chords can be played back using dance-like movements while simultaneously playing guitar.

The dialogues that allow for playing two melodies simultaneously can be interesting in terms of tonal or atonal experimentation. In particular, when pitch shift is disabled, a musician can record a progression of chords or a melody that can be used as the sample for the phase vocoder. The musician can then experiment with various guitar-played harmonies while triggering sounds from the recording using body movements.

³Demo videos are available at http://cim.mcgill.ca/~nicolas/NIME2008/

5. CONCLUSIONS
We presented the design and investigation of a mobile wireless augmented guitar, in which powerful control mechanisms allow the user to control sonic events through simple gestures while simultaneously playing the instrument. Augmentation includes sample recording, looping, distortion and a gesture-based phase-vocoding sample reader. We described our experience with the initial prototype, as well as new musical practices that it supports, including gesture-based self-duo and self-accompaniment.
The benefit of mobility, such as that provided by our small form-factor wireless system, is that its features can be used in classrooms or during rehearsals and do not depend on the technology available at a particular venue. However, the computational demands for signal processing necessitate, at present, the use of a remote computer for generating the audio output. In such a context, remotely computed audio raises the crucial issue of feedback latency. Fortunately, experimental results demonstrate that our dynamically reconfigurable audio streaming protocol satisfies both the timing and fidelity requirements for the demands of musical performance.
The gesture-controlled phase-vocoding features of our system suggest some interesting application possibilities, including note matching for counterpoint techniques and musical transcription, where sample-part selection is controlled by the user's movements.
Our experiences with this platform lead us to believe that in the near future, a more capable mobile platform will extend the range of pedagogical and artistic practice. This may open up a new range of interaction possibilities in such creative areas as music, theater and dance.

6. ACKNOWLEDGMENTS
The authors wish to acknowledge the generous support of NSERC and the Canada Council for the Arts, which have funded the research and artistic development described in this paper through their New Media Initiative.

7. REFERENCES
[1] nSlam: TOT [Territoires Ouverts – Open Territories] Website. http://tot.sat.qc.ca/logiciels_nslam.html.
[2] Arduino. http://www.arduino.cc.
[3] F. Bevilacqua, F. Guédy, N. Schnell, E. Fléty, and N. Leroy. Wireless sensor interface and gesture-follower for music pedagogy. In NIME '07: Proceedings of the 7th International Conference on New Interfaces for Musical Expression, pages 124–129, New York, NY, USA, 2007. ACM.
[4] G. Geiger. PDa: Real time signal processing and sound generation on handheld devices. In Proceedings of the International Computer Music Conference (ICMC), 2003.
[5] Gumstix. http://www.gumstix.com.
[6] S. Schiesser and C. Traube. On making and playing an electronically-augmented saxophone. In NIME '06: Proceedings of the 2006 Conference on New Interfaces for Musical Expression, pages 308–313, Paris, France, 2006. IRCAM Centre Pompidou.
[7] Z. Settel and C. Lippe. Real-time musical applications using frequency-domain signal processing. In IEEE ASSP Workshop Proceedings, 1995.
[8] D. Wessel, M. Wright, and J. Schott. Situated trio: An interactive live performance for a hexaphonic guitarist and two computer musicians with expressive controllers. In International Conference on New Interfaces for Musical Expression (NIME), pages 171–173, Dublin, Ireland, 2002.
[9] M. Wozniewski, N. Bouillot, Z. Settel, and J. R. Cooperstock. Large-scale mobile audio environments for collaborative musical interaction. In International Conference on New Interfaces for Musical Expression, Genova, Italy, 2008.
[10] A. K. E. Yang and A. T. P. Driessen. Wearable sensors for real-time musical signal processing. In IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PACRIM), Aug. 2005.
ABSTRACT
Keywords
1. INTRODUCTION
2. COMPILER
3. PHYSICAL INSTANTIATION
[Figure: Pure Data patch realizing the example mapping. Sensor streams (a_x, a_y, a_z, gyro_x, gyro_y) and derived features (|Jerk|, |angular acceleration|, linear loudness) feed footfall prediction and detection, which in turn drive the sample-playback voices (spacedrone, tonedrone, squishbeat, ghostpad and a pad pattern).]
4. EXAMPLE MAPPING
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
5. FUTURE WORK
6. REFERENCES
accelerometer based game controllers used here. Two significant points of difference are, firstly, his application of a parameterisation strategy based on an analysis of traditional instrumental methods of sound production (as opposed to, say, the natural causality of Chion's synchretic footsteps [3], or a musical structuring model grounded in a more general sonic typology [15]) and secondly, an interface-centric rather than body-centric orientation towards control affordances. Moody et al. [10] develop a mapping strategy for gestural control of an audio-visual system, which seeks to achieve synchresis between generated audio and video. Their argument is relevant to the present work, where we seek synchresis between observed performer gesture and generated synthetic sound.

The technical development of the sensor filtering and mappings described below has been informed by a range of literature concerning gestural control of music [18], pragmatic accelerometer-based motion analysis [4, 9] and, more specifically, accelerometer controlled synthesis [13, 17]. Much of the inertial motion analysis literature is concerned with more elaborate sensing and filtering schemes than were applied here; however, we found the mathematical development in Ilmonen and Jalkanen's system for analysis of conductor gestures particularly helpful [7].

3. TECHNOLOGY OVERVIEW
Although we were interested in exploring a range of gestural input technologies, the Nintendo Wii Remote was settled on as a sensor platform for prototyping. The Wii Remote was chosen for pragmatic reasons, as it provided a wireless 3-axis accelerometer in an off-the-shelf package. As we were primarily concerned with gestural input, only accelerometer data from the Wii Remotes was utilised. Masayuki Akamatsu's aka.wiiremote Max objects [1] were adapted to simultaneously convert accelerometer data from up to 6 Wii Remotes into an OSC data stream that was used as an input to AudioMulch, where mapping and sound synthesis were performed. Mappings were developed using an embedded Lua script interpreter running inside AudioMulch.

4. PROTOTYPING PROCESS
The schema for gesture ≈ sound prototyping arose out of a belief that interweaving the development of sound and movement would open up new ways of thinking about gestural sound performance and lead to gestural sound synchresis. We adopted a strategy of minimal development – pursuing development in each modality only sufficiently to allow or provoke advance of the work as a whole. We were thus prevented from falling back on known methods and solutions, or staying in our comfort zones. The different modalities – sound, movement and technology – were developed in tandem. A new vocabulary was allowed to emerge from our existing skills and the area of inquiry. Our approach included 'vocal prototyping', discussed below, and, while neither extensively nor rigorously evaluated, resulted in each of us working in new and unexpected ways, with positive outcomes.

4.1 Moving Musicians
According to our criteria, a gesture controlled sonic performance needs to engage the body of the performer in movement which incorporates a broad spectrum of physical expression. Successfully engaging musicians and technologists in physical exploration can prove challenging, as typically they do not focus on, nor do they have highly developed skills in, this area. How might musicians/technologists explore physical expressiveness in extended ways?

While there was no desire to privilege the physical, we felt it important to short-circuit the musician's and technologist's tendencies to de-prioritise their body's expressive range – to create a different mindset from which to launch our investigation.

Our working process began with free-form brain and body storming. We brainstormed possible uses of the sensor technology before plugging in the Wii Remote and playing with it, to avoid imaginations being tempered by knowledge of the device's limitations, and to encourage working directly with the body as a medium through which we could undertake our research. Similarly, we created short physical vignettes without setting a specific point of departure or other assistive limitations, thus forcing open engagement of the imagination to be linked directly to the body from the outset. This process enabled the musician/composer collaborators to familiarise themselves with extended physical expression and with varying levels of physical proximity. It also formed a platform from which we could begin to talk about movement.

For the duration of our working process, we made a point of preceding discussions, brainstorming and ideation sessions with movement sessions. This allowed us to approach non-physical and physical tasks alike in a 'physically ready' gestural state.

4.2 The Approach
Over the course of the residency we engaged in a range of activities to develop our gesture sound mappings, continually striving to broaden the parameters within which we were thinking and working. We investigated ideas stimulated by the sensor technology, such as how and where the Wii Remote could be placed on the body and what kinds of gestures it might be able to sense and measure. We thought directly about sound – without limiting our ideas to the constraints of the technology – and worked directly from a consideration of the body's affordances and dynamic capabilities. Throughout, we engaged in repeated ideation sessions, developed simple patches in response to ideas, and tried to understand what different choices afforded and what directions might be valuable for us to pursue. All of our experiments were captured on video to enable ongoing assessment and review.

Although we kept our attention on the technology, we remained cautious that its demands not draw our focus away from other areas of inquiry. One method we used to counter this tendency was to vocally prototype our ideas, so that we could discuss and explore links between sound and movement without being limited by the technical constraints of the mapping process.

4.3 Vocal Prototyping
The aim of vocal prototyping was to challenge our usual ways of thinking about movement and sound and to begin to understand the kinds of relationships we might make between them. Through this process we generated a substantial amount of material and made concrete steps towards formalising a gesture sound vocabulary. As outlined below, vocally prototyping ideas naturally flowed out of other approaches.
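The chain described above (Wii Remote accelerometer → OSC stream → mapping → synthesis parameter) can be sketched in outline. The following is a minimal illustration, not the authors' AudioMulch/Lua code; the class name, the one-pole smoothing scheme and the parameter ranges are all our own assumptions:

```python
# Hypothetical sketch of a gesture-to-parameter mapping stage: raw 3-axis
# accelerometer samples are reduced to a smoothed "activity" value and
# rescaled onto an arbitrary synthesis-parameter range.
import math

class AccelMapper:
    def __init__(self, smoothing=0.9, out_lo=0.0, out_hi=1.0):
        self.smoothing = smoothing      # one-pole low-pass coefficient
        self.out_lo, self.out_hi = out_lo, out_hi
        self.level = 0.0                # smoothed activity state

    def step(self, ax, ay, az):
        """Feed one accelerometer sample (in g); return a control value."""
        # Magnitude of the acceleration vector, gravity included.
        mag = math.sqrt(ax * ax + ay * ay + az * az)
        # Remove the ~1 g gravity baseline so a device at rest maps near zero.
        activity = abs(mag - 1.0)
        # One-pole smoothing to tame sensor jitter.
        self.level = self.smoothing * self.level + (1 - self.smoothing) * activity
        # Clamp and rescale to the synthesis-parameter range.
        x = min(self.level, 1.0)
        return self.out_lo + x * (self.out_hi - self.out_lo)

mapper = AccelMapper(smoothing=0.5, out_lo=200.0, out_hi=2000.0)  # e.g. a cutoff in Hz
print(mapper.step(0.0, 0.0, 1.0))   # device at rest: stays at out_lo
print(mapper.step(1.0, 1.0, 1.0))   # a jolt: value rises towards out_hi
```

In practice such a mapper would be driven once per incoming OSC message, one instance per Wii Remote.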
We began by exploring a range of processes to develop appropriate sounds. Working individually, we identified sounds from the Freesound creative commons database [5], which we used as a basis for discussing and understanding the qualities of sonic space we each desired to create. This was followed by free-form sound generation using the voice only; physical performance making sessions during which we vocalised sounds that were suggested by movement; and free-form movement and sound generation using the voice and entire body.

PREPARED VIGNETTES: Take 10 minutes to compose a short gestural/vocalised vignette that is then performed. Decide where the sensor technology would be placed and how the data would affect the sonic output. Experiment with the imagined placement of technology – identical to the other person, mirrored, completely unconnected. Experiment also with possible sound output and effects, performance relationships, etc.

While the above is not exhaustive, it gives an indication of our approach in what we hope is a repeatable manner. As mentioned previously, the challenge was to find new ways of working with and thinking about both sound and movement. 'Vocal prototyping' was found to be ideally suited to this task; it also released us from the constraints of technology development. The methodology was both rich and fecund.
As previously observed by Bahn et al. [2], performing with the whole body involves skills not always possessed by musicians – some of the authors are now considering training in this area to continue the research.

Finally, the sensor technology employed so far has been adopted as a pragmatic prototyping aid. We are now considering options for smaller, wearable sensor platforms.

8. CONCLUSION
The gesture ≈ sound experiments outlined in this paper represent, for the authors, a solid foundation from which to continue our research. While many questions remain unanswered, the process has both provoked and supported new ways of grappling with the problem of mapping gesture and sound. The importance of getting musicians to think through their bodies has been highlighted. By consistently approaching non-physical and physical tasks alike in a 'physically ready' and gestural state, our way of working, thinking and creating shifted dramatically. Our clear intent to develop movement and sound mappings in tandem was central to our approach, and was integral to providing the outcomes presented here.

In our search for gesture sound synchresis, we have established clear directions for ongoing research and an approach which promises to support development of a diverse performance vocabulary.

9. ACKNOWLEDGMENTS
We gratefully acknowledge the support of STEIM (The Studio for Electro-Instrumental Music, Amsterdam, the Netherlands; http://www.steim.org) for hosting this residency. For their financial assistance we thank The Australia Council for the Arts, The Australian Network for Arts and Technology, Monash University Faculty of Art and Design and the CSIRO Division of Textile and Fibre Technology.

10. REFERENCES
[1] Akamatsu, M. aka.objects: aka.wiiremote. Website: http://www.iamas.ac.jp/~aka/max/. Accessed 31 January 2008.
[2] Bahn, C., Hahn, T. and Trueman, D. Physicality and Feedback: A Focus on the Body in the Performance of Electronic Music. In Proceedings of the 2001 International Computer Music Conference, Havana. ICMA, 2001.
[3] Chion, M. Audio-Vision: Sound on Screen. Columbia University Press, 1994.
[4] Davey, N. P. Acquisition and analysis of aquatic stroke data from an accelerometer based system. M.Phil. Thesis, Griffith University, Australia, 2004.
[5] The Freesound Project. Website: http://freesound.iua.upf.edu/. Accessed 31 January 2008.
[6] Hahn, T. and Bahn, C. Pikapika – The collaborative composition of an interactive sonic character. Organised Sound, 7, 3 (2002), Cambridge: Cambridge University Press, 229-238.
[7] Ilmonen, T. and Jalkanen, J. Accelerometer-Based Motion Tracking for Orchestra Conductor Following. In Proceedings of the 6th Eurographics Workshop on Virtual Environments, Amsterdam, 2000, 187-196.
[8] Langley, S. ID/i-o. Website: http://www.criticalsenses.com. Accessed 31 January 2008.
[9] Mizell, D. Using Gravity to Estimate Accelerometer Orientation. In Proceedings of the 7th IEEE International Symposium on Wearable Computers (ISWC). IEEE Computer Society, Washington, DC, 2000, 252.
[10] Moody, N., Fells, N. and Bailey, N. Ashitaka: an audiovisual instrument. In Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, NY, USA, 2007.
[11] Paine, G. Interfacing for dynamic morphology in computer music performance. The inaugural International Conference on Music Communication Science, 5-7 December 2007, Sydney, Australia.
[12] Riddell, A. HyperSense Complex: An Interactive Ensemble. In Proceedings of the Australasian Computer Music Conference. Brisbane, Australia, Australasian Computer Music Association, 2005.
[13] Ryan, J. and Salter, C. TGarden: wearable instruments and augmented physicality. In Proceedings of the 2003 Conference on New Interfaces for Musical Expression (NIME03). National University of Singapore, Singapore, 2003, 87-90.
[14] Simulus P5 Glove Developments. Website: http://www.simulus.org/p5glove/. Accessed 31 January 2008.
[15] Smalley, D. Spectromorphology and Structuring Processes. In Simon Emmerson (Ed.), The Language of Electroacoustic Music, London, 1986.
[16] Steiner, H. Towards a catalog and software library of mapping methods. In Proceedings of the 2006 Conference on New Interfaces for Musical Expression (NIME06), Paris, France, 2006.
[17] Trueman, D. and Cook, P. BoSSA: The Deconstructed Violin Reconstructed. In Proceedings of the 1999 International Computer Music Conference, Beijing, China, 1999, 232-239.
[18] Wanderley, M. Gestural Control of Music. In Proceedings of the International Workshop on Human Supervision and Control in Engineering and Music, Kassel, Germany, 2001.
[19] Motion Analysis at Wiili.org. Website: http://www.wiili.org/index.php/Motion_analysis. Accessed 31 January 2008.
[20] Wilde, D. and Achituv, R. 'faceClamps' (1998). Website: http://www.daniellewilde.com/docs/faceclamps/faceClamps.htm. Accessed 31 January 2008.
[21] Wilde, D. hipDisk: using sound to encourage physical extension, exploring humour in interface design. Special Ed., International Journal of Performing Arts and Digital Media (IJPADM). Intellect, 2008. Forthcoming.
[22] Winkler, T. Making Motion Musical: Gesture Mapping Strategies for Interactive Computer Music. In Proceedings of the 1995 International Computer Music Conference, San Francisco, CA. Computer Music Association, 1995.
multidimensional data, but to take into account the particular characteristics of human-body gestures, with special consideration of human perception capabilities and their tolerance against significant variations in different realizations. It is an attempt to simulate the human ability to read and recognize a gesture as an abstract sign, by drawing attention to the temporal progression of the relational statistics among selected body features. The key thesis I was trying to work with is that a human body-gesture can be sufficiently described, or abstracted, by the trajectories of the inter-point (marker) distance variations.

1.2.3 Inter-point distance variation
A first version with 4 points (markers), which mark the ends of the extremities (arms, legs), was completed, where the variation in distance between each pair of markers is taken as a feature. By choosing this approach, we immediately get rid of the absolute coordinates in space and are not bound to a specific location or orientation of the performer inside the tracking area. Four markers generate 6 inter-marker distances, and hence a 6-dimensional vector space for modeling the state sequence that classifies a particular gesture. With 4 markers we thus achieve a 50% dimensionality reduction, from 12 (4 × x, y, z coordinates) to 6 (inter-point distances); however, this approach would not be as effective once we increase the number of markers.

dimensions (the relation to the other three static limbs). Ideally, in this case we would generate a single location change in the dimension space. In practice, however, the individual dimension values would not change absolutely synchronously, resulting in a trace line through the vector space which is composed of unstable (elusive) states and points from one stable state to the other. This phenomenon appears even more pronounced in the case of complex, full-body gestures. There is a slight variation in the sequence, as well as in the actual presence of those unstable states, in successive realizations of the same gesture, which is why we need to define a radius of tolerance (a cluster in the vector space) for each incoming state of the probe sequence. This radius was designed to exhibit dynamic behavior, namely to allow for a specific degree of deviation from the currently compared state in the reference (exemplar) sequence, while simultaneously remaining indifferent towards the specific "location" (the exact dimension) in which the deviation might occur.
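The feature reduction described here can be sketched in a few lines. This is an illustrative sketch under our own assumptions (plain Euclidean distances, made-up coordinates), not the paper's implementation:

```python
# Inter-point distance features: 4 markers in 3-D are reduced to the
# 6 pairwise distances, i.e. n*(n-1)/2 features for n markers.
import itertools, math

def distance_features(markers):
    """markers: list of (x, y, z) tuples; returns the pairwise distances."""
    return [math.dist(a, b) for a, b in itertools.combinations(markers, 2)]

# Four markers at the ends of the extremities (coordinates are made up).
pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
feats = distance_features(pose)
print(len(feats))        # 6 features instead of 12 raw coordinates

# The feature is invariant to where the performer stands: translating
# every marker by the same offset leaves all distances unchanged.
shifted = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in pose]
print(distance_features(shifted) == feats)   # True: translation-invariant
```

Note the quadratic growth: 10 markers would already yield 45 distances, which is why the text warns the approach scales poorly with more markers.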
1.2.7 Time warping
Although the adaptive filtering component should foster the disaggregation of temporal clusters and an equal state-density distribution along gesture progression, there are still situations where the proportional variations of individual gesture segments exceed the threshold of correct recognition. If the current state of the probe signal, for example, does not match the currently compared state of the recorded reference sequence, nor do its values fit inside the probe-state cluster's tolerance radius, the incoming state vector is passed on to a time-warping function, which compares it against a certain neighborhood of states. If this function finds a match in the values of the neighboring states of the reference sequence, it time warps the probe signal to it and updates the index of the state that is to be compared next.

Fig. 3: Time warping the probe to the reference gesture – image taken from [4]

1.2.8 Identification process
One of the intentions of this project was to blur the causal relationship of movement and sound, as is usually the case when we apply direct mapping between sensor data and musical parameters. However, the approach of generating musical parameters via gestural cues should not restrain the control data to discrete values emerging at the end of a successful completion of a predefined gesture. The goal was rather to keep the possibility of generating continuous output data, but to restrain it to accompany only specific choreographic material. Thus we expect to work with a continuous output parameter describing the degree of completion of a particular gesture in real time. The algorithm does not need to output probability values or to show a degree of deviation from the momentarily observed state, since the acceptable deviation limits are already integrated in the clustering radius, the time-warping function, etc. described above. We are not interested in how strong a deviation really is, as long as it stays inside a carefully chosen tolerance radius providing adequate inter- and intra-gestural discrimination / tolerance. Each gesture in our prerecorded gallery has its own module, continuously monitoring the input feed. If the initial state of a gesture is detected, the attention is put on the next one, and so on, for as long as the break condition is not exceeded. If it is exceeded, the algorithm stops tracking the gesture and returns to the initial state, to continue looking for the beginning of the gesture again. As soon as it turns out that a gesture is not the one we are looking for, the algorithm needs to be ready to accept a new "candidate" sequence. Not all the incoming data needs to be assigned to a particular prerecorded gesture, and therefore we are not selecting the highest likelihood among our reference sequences to match the probe sequence. Thus the dancer is able to provoke an expected sonic result by selecting his choreographic material in real time, and to avoid sonification of his actions (when they differ from the recorded gestures) if this is not desired.

1.2.9 Results and observations
The algorithm is still in development at this time, and not all constellations of different parameters have been extensively tested yet. The tests we have made up to now showed the following results: through careful tuning of the algorithm parameters, it was possible to achieve around 80% correct identifications – 4 out of 5 identical gestures (including variation factors) were recognized 100%. At the same time, the inter-gesture discrimination was kept under 70%, i.e. no more than 30% of a "false" (arbitrary) gesture was identified as one of the reference gestures.

It is obvious that an approximation of a gesture through four points on a human body is not very accurate or satisfactory. Further, the concept of inter-point distance variation usually does not discriminate between mirror-inverted gestures, etc. We also discovered that it is possible to work with gestures of varying complexity levels (from robotic to more fluent and natural choreographies), but it is very important to maintain an equal degree of complexity in all gestures that we want to identify, since the algorithm's tuning parameters depend strongly on gesture complexity. The selected choreographic vocabulary has to exhibit as much diversity between its single elements (gestures) as possible, and the algorithm parameters need to be tuned accordingly. However, if we take into account the specific conditions and limitations of such an approach, we can still develop a well distinguishable choreographic language / vocabulary that might even set off a new and unique, system-conditioned aesthetic of movement.

1.2.10 Future work
For now there is still a lot of testing and tuning work to be done with this particular approach. In the further development of the gesture recognition system, I would like to stick to the basic principles of spatiotemporal quantization described in this paper, but to put more focus on the state bursts (the temporal clusters described above) in the recognition process. Perhaps more reliable information could be gained by disregarding the exact temporal progression of the state vectors and analyzing the temporal progression of state clusters instead. The statistics of state occurrences in such a cluster would then be compared across different gesture realizations. Since it was found that the clusters mark the transition points of gesture segments, they consequently include the directional information of both the preceding and the following segment.

1.3 Feedback
By moving through space, the dancer conducts actions in three spatial dimensions plus one temporal dimension. A fundamental part of the musical composition is the function that translates those actions to a two-dimensional space (a time-varying amplitude: the audio signal), which will undergo a detailed discussion later in the text. The dimension of amplitude refers to the (fast-changing) electronic signal waveform corresponding to the sound being generated and projected. In addition to the sonification of the electronic waveform, which produces auditory feedback, the dancer is also exposed to an alternative instance of the same signal. This instance is the (amplified) signal itself, in its primary (electronic) domain. The connection with the dancer is established by a cable which he is holding in his mouth. This concept of direct electronic
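The matching step just described can be sketched roughly as follows. This is our own illustrative reconstruction, not the author's implementation; the function names, the Euclidean metric and the fixed neighborhood size are assumptions:

```python
# Hypothetical sketch of tolerance-radius matching with a small
# time-warping neighborhood search: an incoming probe state is first
# tested against the currently expected reference state; on failure a
# few states ahead/behind are searched, and a hit "warps" the index.
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_state(probe, reference, index, radius, neighborhood=2):
    """Return the updated reference index, or None if no state matches."""
    # Direct hit on the expected state: advance to the next one.
    if euclid(probe, reference[index]) <= radius:
        return index + 1
    # Time warping: look a few states ahead/behind in the reference.
    for offset in range(1, neighborhood + 1):
        for j in (index + offset, index - offset):
            if 0 <= j < len(reference) and euclid(probe, reference[j]) <= radius:
                return j + 1            # warp the probe onto state j
    return None                         # break condition: stop tracking

ref = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]     # toy 2-D states
print(match_state((1.05, 0.0), ref, index=1, radius=0.2))  # direct hit -> 2
print(match_state((2.1, 0.0), ref, index=1, radius=0.2))   # warped ahead -> 3
print(match_state((9.0, 9.0), ref, index=1, radius=0.2))   # no match -> None
```

A per-gesture tracking module would call this once per incoming state vector, resetting to the initial state when None is returned.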
signal-feedback was already applied and discussed in my earlier compositions and interface designs [9], [10]. It enables the dancer / performer to experience an alternative impression of the induced sound. Since it is electricity we are dealing with here, the dancer feels a pain whose intensity depends on the waveform (sound amplitude). Therefore, we need to be very careful with the amplification of the signal in order not to seriously harm the dancer.

Fig. 4: the dancer with the audio-output cable in her mouth

2. ARTISTIC CONCEPTION
In a dance performance there are usually two elements (visual and audible) that need to be arranged and put into a contrasting, harmonizing, etc. context. The title "3rd. Pole" is meant to indicate the inclusion of a third, haptic component, contributed by the electronic current running through the dancer's body. He is exposed to a situation where he has absolute decision power and needs to consider and balance all three elements (poles). As already mentioned, we have the induced sound, respectively its electronic abstraction, which is in direct contact with the performer's body. This enables a different corporeal perception and interpretation of the caused sound, since the performer now has not only an audible but also a haptic reference – i.e. pain, caused by the electric current – for the choice of his following actions. Therefore, the process of composition, or better said the final arrangement of pre-composed material, is only possible in real time, since we are interested in an alternative arrangement of the choreographic and musical progression, inspired by all three "poles" together. A pre-composed form or sequence of events would not make any sense, apart from satisfying possible sadistic tendencies of the composer.

3. CONCLUSION AND FUTURE WORK
The focus of this paper was on the interfacing concept and the interactivity of the system. A second major component of this project, besides the gesture recognition system, was the sound design, which was not discussed here at all. Those components, however, are not bound to each other, so the project presented here is not meant to be considered a sealed (finished) entity. It can be developed further independently in both the artistic (musical, choreographic) and/or technological domains. "3rd. Pole" is only a first manifestation of an artwork and stands for one of many possible results that can be achieved in the future.

4. ACKNOWLEDGMENTS
This project has been supported by STEIM – www.steim.org – in Amsterdam (NL), who offered me a residency during which I completed most of the sound design for this composition. The most important part, however, was contributed by the Institute for Electronic Music (IEM) – www.iem.at – in Graz (A), by providing the facilities and technical infrastructure. Special thanks go to Dr. Gerhard Eckel and David Pirro for providing theoretical opinions, and to IOhannes Zmölnig for technical assistance.

5. REFERENCES
[1] Aylward, R. and Paradiso, J. "Sensemble: A wireless, compact, multi-user sensor system for interactive dance". Proc. of the International Conference on New Interfaces for Musical Expression (NIME 06), Paris, France, 2006.
[2] Bevilacqua, F., Fléty, E., Lemouton, S., Rasamimanana, N. and Baschet, F. "The augmented violin project: research, composition and performance report". Proc. of the International Conference on New Interfaces for Musical Expression (NIME 06), Paris, France, 2006.
[3] Bevilacqua, F. and Dobrian, C. "Gestural Control of Music Using the Vicon 8 Motion Capture System". Proc. of the International Conference on New Interfaces for Musical Expression (NIME 03), Montreal, Canada, 2003.
[4] Bevilacqua, F., Fléty, E., Guédy, F., Leroy, N. and Schnell, N. "Wireless sensor interface and gesture-follower for music pedagogy". Proc. of the International Conference on New Interfaces for Musical Expression (NIME 07), New York, NY, USA, 2007.
[5] Bevilacqua, F., Muller, R. and Schnell, N. "MnM: a Max/MSP mapping toolbox". Proc. of the International Conference on New Interfaces for Musical Expression (NIME 05), Vancouver, Canada, 2005.
[6] Bevilacqua, F., Cuccia, D. and Ridenour, J. "3D motion capture data: motion analysis and mapping to music". Proceedings of the Workshop/Symposium on Sensing and Input for Media-centric Systems, Santa Barbara, CA, 2002.
[7] Brand, M. "Style machines". In Proceedings of SIGGRAPH, New Orleans, Louisiana, USA, 2000.
[8] Ciglar, M. Homepage: http://www.ciglar.mur.at
[9] Ciglar, M. "I.B.R. Variation III." Proceedings of the EMS – Electroacoustic Music Studies Network Conference, Beijing, China, October 2006.
[10] Ciglar, M. "Tastes Like…" Proceedings of the ACM Multimedia Conference, Singapore, November 2005.
[11] Yang, J., Xu, Y. and Chen, C.S. "Human action learning via hidden Markov model". IEEE Transactions on Systems, Man and Cybernetics, Part A, Jan. 1997.
[12] Max/MSP programming environment: http://www.cycling74.com/products/maxmsp.html
[13] Puckette, M. "Pure Data". Proceedings of the ICMC, 1996.
[14] Rabiner, L. R. and Juang, B. H. "An introduction to hidden Markov models". IEEE Acoust. Speech Sign. Process. Mag. 3 (1986), 4-16.
[15] Vicon 8 motion capture system: http://www.vicon.com/entertainment/technology/v8
[16] Wright, M. "Open Sound Control: an enabling technology for musical networking". Organised Sound, Volume 10, Issue 3, p.193-200, 2005.
ABSTRACT
This paper describes a project started for implementing DJ scratching techniques on the reactable. By interacting with objects representing scratch patterns commonly performed on the turntable and the crossfader, the musician can play with DJ techniques and manipulate how they are executed in a performance. This is a novel approach to digital DJ applications and hardware. Two expert musicians practised and performed on the reactable in order both to evaluate the playability and to improve the design of the DJ techniques.

Keywords
reactable, DJ scratch techniques, interfaces, playability

1. INTRODUCTION AND BACKGROUND
It is well known that scratch DJs acquire very specific skills and learn a more or less defined set of playing techniques. One recent example of formalizing the techniques can be found in the DVD by DJ Q-bert [6], one of the leading musicians in the field. In the DVD, about one hundred different "scratches", or techniques, are demonstrated. These techniques are interesting for several reasons: they represent a natural starting point for studying how turntable musicians—or turntablists—play expressively, they define what a new (non-vinyl) DJ interface should manage, and they offer an approach to performing complicated playing gestures with simple actions.

Since turntablism peaked in popularity in the late nineties, many solutions for scratching and DJing without vinyl records and a turntable have surfaced. These are mentioned in several earlier papers, see e.g. [2, 10, 13]. Such hardware includes, among others, CD scratch decks (e.g. the Pioneer CDJ1000), time-coded vinyl controlling sound files stored on a computer (e.g. Final Scratch), software simulations (e.g. TerminatorX and FruityLoops), various "scratch pads" and jog wheels, and also controllers found on keyboards.¹ Common to all these directions is that they mimic or model the speed manipulation of a turntable. To our knowledge, there are no commercial products that take direct advantage of the above-mentioned DJ playing techniques.

On the software side, we have seen some attempts at using scratch techniques to simplify the process of sounding like a real DJ. For example, with Scratcher² the user can manually draw speed and amplitude envelopes and play them back, making scratch patterns on audio files. This opens the possibility of coming up with new techniques, experimenting with the sounds, or composing music for the turntable. The disadvantage of drawing envelopes is the lack of real-time control for performance situations.

Another path is seen in Skipproof,³ where scratch techniques can be assigned to hardware or software controllers. Here, all the techniques are based on models derived from analysis of real DJs' movements [7]. The user affects the playback of the techniques through the assigned action or gesture; for instance, the speed of the scratch can be controlled by the effort of the player. Skipproof has been used in combination with the Radio Baton, gesture sensors, MIDI devices, and computer input such as a Wacom tablet. However, it has been desirable to treat the techniques as individual building blocks in a scratch performance.

The presented work builds on the Skipproof application in combination with the reactable.

The reactable instrument
The reactable is by now a well-known novel electronic musical instrument, with recent massive exposure in all kinds of media, especially since the artist Björk gave it a pronounced position in her stage shows and compositions. It is a versatile instrument that works in a similar way to tools like Pure Data, Max/MSP or Reaktor. It was designed to meet artistic and musical demands not catered for by other interfaces,⁴ and it follows a well-defined principle for developing the behavior of its physical objects that are handled on the table top [11, 12].

Integration of DJ techniques on the reactable was started a while back with the development of a few objects that could provide some of the functions from Skipproof [1]. These

¹These are examples from the many emerging products. For instance, the CDJ1000 was not the first CD scratch player, but it represented a market break-through.
²http://web.ics.purdue.edu/~afaulsti/skrasms/
³http://www.csc.kth.se/~kjetil/software.html
⁴There are, however, a number of interfaces with a similar approach; most of those are listed on the reactable project website: http://reactable.iua.upf.edu/?related
were later combined with a different approach to scratching by Dimitrov [4, 9]. Although these objects worked, some improvements remained to be done, and a formal evaluation of the scratch functionality was needed.

Within the SID initiative⁵ the development that started in 2006 could continue. The main aims were to get better scratch possibilities on the reactable, and to investigate how a trained DJ could interact with familiar techniques in new ways. Our method included letting musicians into the design loop to receive their feedback and expertise, and letting them evaluate the playability. Given the latency introduced by a system based on video recognition, we never expected the system to be responsive enough for performing real scratch gestures comfortably. For this reason, our main focus was on the higher level of control: letting the users control scratch models instead of scratching directly.

2. METHOD
For the reactable, the development of the objects is done in Pure Data (Pd) patches. This allows for fast prototyping and can even be done at run-time. The video projection that is used to provide visual feedback on the table is also the user environment that is displayed on the computer screen, called the virtual environment. Working with the objects in real life (moving, twisting and turning them) is very different from working with the objects in the virtual environment. During the design phase, the virtual environment was used for editing and simple testing. The new patches were then tested on the real table by the developers, and parameters were adjusted to correspond to the objects.

The underlying concept of the scratch objects on the reactable is that some of the patterns that DJs play on their turntables and crossfaders are used as control models that are triggered and manipulated by new gestures and actions. The mapping between gestures and control is the most critical part of the design process. By assigning control properties and behavior to physical objects and by making connections to sample playback functions, we came up with a totally new method of "scratching"—and, as a consequence, with new conventions for playing. We made an effort to respect the reactable principles in designing the objects, although we needed to make some compromises. Given a virtually endless number of possible functionalities and mappings to test, a few were settled on.

Since there are only a handful of reactable instruments and the number of performers is accordingly very limited, we decided to use two experts for testing. One was a professional reactable musician, and the other an experienced scratch DJ. By using experts from different fields, we aimed at highlighting important aspects of the DJ and reactable domains respectively.

Two sessions were arranged for each subject. In the first session, a 30 min rehearsal was followed by a 45 min performance, while the second session only had a 45 min performance. Some tasks were given, for instance to explore all objects, to perform scratching with and without backing music, and to try beat juggling (another common DJ technique), but the subjects disposed of the time as they wished.

For the performance sessions, the musicians were left alone and undisturbed in a rehearsal room, and listened to their own performance through loudspeakers. The second performance was videotaped. While practising, they had the possibility to get help and ask questions (however, they both preferred to practise without help).

For evaluating playability, the DJ and the reactable player answered a questionnaire and an open interview following each of the performances, including the rehearsal. The questionnaire was a modified version of the TAI-CHI evaluation questionnaire proposed by Bornand et al. [3] for testing two different interfaces. Our modified version, which added a few questions directly concerning the reactable scratch objects, was not used to make comparisons between interfaces (the reactable and standard scratch tools, for instance), but rather to assess within-subject improvement across the performance sessions and between-subject differences.

The interviews tried to isolate specific problems subjects faced while playing, or any other comments not accounted for in the questionnaire. As a last part of the interview, the subjects suggested possible improvements to the objects and their behavior. Between the two sessions, most of their suggested improvements could be addressed and implemented.

3. RESULTS
The reactable objects developed in a previous phase of the project [1] were only slightly modified for the first session. In addition to the existing sample player, the vinyl movement models object, and the crossfader movement models object, we introduced a "manual crossfader" that changed from on to off when moved, and a second sample player that used a different playback function. After the first session, the functionality of these two new objects was integrated into the crossfader object and the sample player, respectively.

As the development followed an iterative process, the results from the first session of evaluation naturally affected the objects used in the second session. Not only were the parameters adjusted between the sessions; even the functionality and mapping were changed and improved. The following section describes the state of the objects at the final stage. One important improvement between sessions was achieved by moving to a new reactable software version that increased the time resolution of the video recognition from around 30 fps to 60 fps.

3.1 New reactable objects
Figure 1 shows the three different objects active in the virtual environment. The Loop player object plays back an audio file and has visual representations of the track progression (ii) and sound level (iii). A wave form (i) is "travelling" from the object towards the Out. The Crossfader object applies a crossfader movement pattern (B) to the sound, resulting in the chopped-up sounds typical for scratching. The sound level the Loop player will get (graphically represented by A) moves out from the Crossfader object. The Movement speed object changes the sample player's speed in some defined patterns (3) and beat subdivisions (2) synchronized with the current bpm of the table.⁶ The speed alteration enforced on the audio playback is shown with (1)

⁵SID is the acronym for COST IC0601 Action on Sonic Interaction Design, http://www.cost-sid.org. The presented work is the result of a Short Term Scientific Mission of two weeks granted by SID, reported in [8].
⁶There is a global bpm object in the normal reactable setup that is not included in Figure 1. The metronome is visualized with a wave (a) propagating from the Out.
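The combined behaviour of a movement-speed model and a crossfader model, both synchronized to the table's bpm, can be sketched as follows. This is an illustrative Python sketch of the general idea, not the authors' Pd implementation; the function name `render_scratch` and the particular pattern functions are our own assumptions.

```python
import math

def render_scratch(audio, sr=44100, bpm=120,
                   speed=lambda t: math.sin(2 * math.pi * 2 * t),
                   gate=lambda t: 1.0 if math.sin(2 * math.pi * 8 * t) > 0 else 0.0):
    """Render one beat of 'scratched' audio: a signed speed model moves the
    playhead back and forth over the sample (the vinyl movement pattern),
    while a crossfader model gates the output (the chopped-up sound)."""
    n = int(60.0 / bpm * sr)        # one beat of output at the current bpm
    out, pos = [], 0.0
    for i in range(n):
        t = i / n                   # normalized time within the pattern
        pos += speed(t)             # integrate signed speed -> playhead position
        j = min(max(int(pos), 0), len(audio) - 1)
        out.append(audio[j] * gate(t))
    return out
```

Swapping the `speed` and `gate` functions corresponds to selecting different vinyl-movement and crossfader patterns, while scaling their frequencies corresponds to changing the beat subdivision.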
[Figure 1: the Loop player object, Crossfader object and Movement speed object in the virtual environment, with their visual elements (i, ii, iii, A, B, 1, 2, 3, a) and the Out.]

…ject, it was found that having too many patterns was confusing, and a few were chosen, including the chirp, the flare and a muting of the vinyl's return movement.

Also for the crossfader object, a manual mode was included where the sound was constantly on or off. Moving the object would produce a very short silence or burst of sound, like moving the crossfader between the fingers.

3.2 Evaluation
with turntables. With proper training, as seen in the evaluation, performances will improve greatly.

Musically, the most interesting result could probably be found in the meeting between a skilled DJ and the easy access to the techniques normally used, with the added dimension of manipulating their parameters in real time with unfamiliar means.

4. DISCUSSION
This is the first major test of performing with scratch techniques, or the combination of synchronized crossfader patterns and oscillating movement patterns. The advantage of this approach, compared to modeling the turntable and mixer, is that even non-experts can perform rather intricate and correct techniques without much practice. Results from the evaluation show that the non-expert felt confident playing with the models. Given some more development, this approach can provide realistic-sounding scratching for various types of interfaces.

Today, the reactable is mostly used for either beat-based or more freely structured experimental electronic music. Visually, the performances are very exciting as the blocks on the table create dynamically changing images. To see it being played in traditional DJ style demonstrates another side of the graphical feedback, where the visualizations aid the musician in performance. Traditionally, turntablists mark their records with stickers or note the position of the center label to find the right spot in the music. Here, we experimented with other representations, and both subjects were able to use them to their advantage.

For real virtuoso playing, the reactable implementation of scratching cannot match real turntables, but by manipulating the models the musician can, on the other hand, effortlessly go beyond what is normally possible to accomplish, for example very fast scratches.

After the evaluation (and an unscheduled jam session), the two musicians and the reactable team suggested a number of improvements to the objects. For the main part, the suggestions involve making the interaction smoother and easier, not changing the way the objects are designed.

Working with the reactable has proved to be a helpful opportunity for testing how specific playing styles and musical ideas can be transferred to unfamiliar interfaces. The fast and easy means for prototyping and testing have many advantages. High latency and slow response time, determined by the frame rate and processing of the video image, might pose a problem. For manipulating techniques and patterns this was not troublesome, but for more direct manipulation of playback speed and amplitude, the instrument was, as foreseen, far too slow for expert performances with our implementation.

As mentioned in the Introduction, there was also a related project by Dimitrov, who connected the reactable scratch objects to physics-based models of friction sounds [9, 5]. Although not tested extensively, it was clearly possible to use friction models instead of sampled audio as the sound source for scratch patterns.

5. ACKNOWLEDGMENTS
We are very grateful for all the time and effort that the two dexterous musicians Lele and Carles put into the project. Thanks to the whole reactable development team (Sergi, Martin, Günter) and the MTG for all help.

Also thanks to Smilen Dimitrov, who worked alongside this project on the implementation of physics-based models of friction sounds for both reactable and scratching.

This work was sponsored as a Short Term Scientific Mission by COST IC0601 Action on Sonic Interaction Design (SID), http://www.cost-sid.org, and by BrainTuning FP6-2004-NEST-PATH-028570.

6. REFERENCES
[1] M. Alonso. Scientific Report from ConGAS Short Term Scientific Mission (STSM) to Stockholm. Technical report, ConGAS COST Action 287, http://www.cost287.org/documentation/stsms, October 2006.
[2] T. H. Andersen. In the Mixxx: Novel digital DJ interfaces. In Proc. of CHI, pages 1136–1137, 2005.
[3] C. Bornand, A. Camurri, G. Castellano, S. Catheline, A. Crevoisier, E. Roesch, K. Scherer, and G. Volpe. Usability evaluation and comparison of prototypes of tangible acoustic interfaces. In Proceedings of ENACTIVE05, 2005.
[4] S. Dimitrov. Scientific Report from ConGAS Short Term Scientific Mission (STSM) to Stockholm. Technical report, ConGAS COST Action 287, March 2007.
[5] S. Dimitrov. Scientific report from SID Short Term Scientific Mission (STSM) to Barcelona. Technical report, MTG, UPF, 2008. Online: http://www.cost-sid.org/browser/action/stsm/reports/.
[6] DJ Q-bert. Scratchlopedia Breaktannica: 100 secret skratches. DVD: SECRT001-DVD, 2007.
[7] K. F. Hansen. The Basics of Scratching. Journal of New Music Research, 31(4):357–365, 2002.
[8] K. F. Hansen. Scientific report from SID Short Term Scientific Mission (STSM) to Barcelona. Technical report, MTG, UPF, 2008. Online: http://www.cost-sid.org/browser/action/stsm/reports/.
[9] K. F. Hansen, M. Alonso, and S. Dimitrov. Combining DJ scratching, tangible interfaces and a physics-based model of friction sounds. In Proc. of the International Computer Music Conference, 2007.
[10] K. F. Hansen and R. Bresin. Mapping strategies in DJ scratching. In Proc. of the Conference on New Interfaces for Musical Expression, 2006.
[11] S. Jordà, G. Geiger, M. Alonso, and M. Kaltenbrunner. The reacTable: Exploring the synergy between live music performance and tabletop tangible interfaces. In Proc. of the First International Conference on Tangible and Embedded Interaction, Baton Rouge, Louisiana, 2007.
[12] M. Kaltenbrunner, S. Jordà, G. Geiger, and M. Alonso. The reacTable*: A collaborative musical instrument. In Proc. of the Workshop on "Tangible Interaction in Collaborative Environments" (TICE), at the 15th International IEEE Workshops on Enabling Technologies, 2006.
[13] T. M. Lippit. Turntable music in the digital era: Designing alternative tools for new turntable expression. In Proc. of the Conference on New Interfaces for Musical Expression, 2006.
sense, the Reactable does not exhibit 'acoustic' behaviour in a physical sense; however, due to the tactile nature of the way one interacts with it, one can easily imagine a related - physical - set of table and objects, made of a rough material. In this 'rough physical' case, gliding the objects upon the table surface would produce a contact friction sound, which lasts as long as the objects are in motion. To create an analogy with the real world: assuming that the objects and table are made of, say, wood, it is easy to conceptualize that producing a significant amount of sound from this system would require both a significant amount of force and specific motions from the player. For the purposes of this paper, we will name such motions 'block movements'.

2. METHODOLOGY
The work of developing objects that would demonstrate block motions on a Reactable was made much easier by the efforts of the Reactable team, who provided a working and fully compatible standalone Reactable simulator for Windows, with an audio engine based in Pure Data (Pd) [10]. The simulator renders the visual representation of the Reactable objects on screen, and allows these representations to be manipulated through the GUI - as on a real Reactable. Patches developed on the simulator can then be ported and tested on a real Reactable. As it was relatively easy to build upon existing objects for inheritance of the user interaction, most of the work consisted of audio programming in Pd.

Since one of the defining high-level characteristics of block motion sound seems to be the relationship between sound volume and the velocity of the objects on the table, it was decided that the main parameter obtained from the objects (besides the standard parameters) would be the velocity of the objects, which could then be mapped to a sound parameter. In principle, a contact friction sound is perceptually noisy, and thus it could be generated through various sources [1].

A Pure Data real-time implementation of a physical model of frictional interaction between dry surfaces was available, which has already been described in [12, 13, 11]. It was decided that this friction model could be used as a sound generator for contact friction - especially in those ranges where high forces and low velocities would be involved. Due to the limited duration of the STSM visit, only the design and implementation of a prototype object was initially planned, to be followed by an expert user evaluation.

Figure 3: Reactable simulator showing the second proposal for a mapping strategy to connect the Reactable to the friction model.

This first proposal contradicted common mapping strategies used in the Reactable, where frequency is related to the rotation of an object. Moreover, the relatively slow camera used for tracking created some differences in response between the simulator and the tangible interface. Therefore
a second mapping strategy was investigated, as shown in Figure 3. In this second strategy, the parameters mapped in the exciter object were switched: the constant velocity was mapped to a finger parameter and force was mapped to rotation. The second prototype was further improved, to produce the strategy shown in Figure 4.

Figure 4: Reactable simulator showing the third proposal for a mapping strategy to connect the Reactable to the friction model.

The implementation of the SkipProof engine (a DJ scratching application and virtual turntable developed by Hansen and others at KTH [5]) as a set of DJ scratching objects for the Reactable [2] was also furthered. As interfacing between the friction physical model and the SkipProof engine was attempted as part of a previous STSM visit [3], it was attempted again - this time as an experiment in the context of Reactable objects.

The original intent to develop a single block-motion object changed soon after deciding to take on the friction model as a sound engine base - as in frictional interactions in the real world, we can observe situations where several objects interact, but only one of them is the primary sound source. Hence, the goal was extended to the development of a prototype set of objects for the Reactable, where one would represent the interaction control and the other would represent the source. As an analogy to bowed string instruments, we can consider these objects as a 'bow' interacting with a 'string'.

3. RESULTS
The main results of the study visit are the production of prototypes of two sets of Reactable objects, and their preliminary (and informal) evaluation by an expert Reactable user. As a first set of experiments, we tried to emulate the sound of a bow exciting a violin string. The second set is a single object intended to simulate the sound of surface friction of moving objects in contact. Additionally, video recordings were taken from some of the development tests; these, along with a development log, were posted online [4].

4. DISCUSSION
4.1 Reactable as a development platform
As mentioned previously, since the Reactable (both real and simulator) has an interface to Pd, the easiest way to create additional audio capabilities for it is by creating plugins for Pd. From the perspective of a new Reactable object developer, possibly the only glitch in the engine is the current impossibility of setting the so-called 'finger' parameters of Reactable objects directly from Pd (as is possible with the 'rotation' parameter of the objects, for instance). Otherwise, it is relatively easy to develop the auditory behaviour of new objects using a Reactable simulator locally.

In our experiments we used a vision tracking system working at 60 fps. This created some problems with motion blur during fast motions. For future experiments, the motion blurring and the (in audio terms) low 60 fps framerate must be taken into account - especially for objects that are to be moved in a faster, linear manner across the table. This proved to be a major difficulty in implementing a motion-based object, as for faster linear motions the system failed to detect the object, and the corresponding control signal used in audio was interrupted. Some measures were attempted to overcome this, which were not successful - which finally resulted in the not-so-ecstatic evaluation of these Reactable object prototypes.

Here, the framerate issue had to be taken into account for both the exciter object and the single friction object, whose average velocity of motion across the table was used to derive a bow velocity signal. For these objects, accumulation of signal values, undersampling and linear smoothing were attempted to overcome the sudden changes of values (during video tracking blurring). This, however, didn't prove to be efficient; averaging and low-pass filtering in the audio signal domain would possibly be a much better approach to overcome these problems. On the other hand, one can try to avoid linear motions when designing interaction and replace them with rotatory ones - as was suggested by the expert user. It is important to note, though, that the Reactable team is currently working on overcoming these problems
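The derivation of a bow-velocity control signal from tracked object positions, including smoothing over tracking dropouts, can be sketched as follows. This is a minimal Python illustration of the general idea, not the authors' Pd implementation; the function name `bow_velocity`, the one-pole smoothing coefficient, and the use of `None` for a lost frame are our own assumptions.

```python
def bow_velocity(positions, fps=60.0, smoothing=0.9):
    """Derive a smoothed bow-velocity control signal from object positions
    sampled at the camera frame rate. A one-pole low-pass filter smooths
    the frame-to-frame speed estimate; tracking dropouts (fast, blurred
    motion, reported here as None) simply hold the smoothed value."""
    out, last, smooth = [], None, 0.0
    for p in positions:
        if p is None or last is None:
            v = smooth                              # no new measurement: hold
        else:
            dx, dy = p[0] - last[0], p[1] - last[1]
            v = (dx * dx + dy * dy) ** 0.5 * fps    # units per second
        smooth += (1.0 - smoothing) * (v - smooth)  # one-pole low-pass
        out.append(smooth)
        if p is not None:
            last = p
    return out
```

As the paper notes, filtering in the audio-signal domain would likely behave better than this control-rate smoothing, but the sketch shows why a dropped frame otherwise interrupts the control signal driving the friction model.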
with the vision input, and at some point in the future, such problems could become minor.

6. ACKNOWLEDGMENTS
The first author would like to thank the researchers at the Music Technology Group at UPF for welcoming him for the STSM and for providing useful input in the development of the application. The STSMs were sponsored by ConGAS, European COST Action 287, and SID, European COST Action 601, respectively.
ABSTRACT
L, will depend on the sampling rate, e.g. 6 (x, y, z) triplets at 50 Hz for a gesture that lasts 120 ms. The user triggers the capture of a template or reference gesture by pressing a button ('A') on the controller at the end of the movement¹.

A \cdot B = \sum_{i=1}^{L} A_x(i)B_x(i) + A_y(i)B_y(i) + A_z(i)B_z(i), \qquad (2)

that is, a sum over the L samples and the three dimensions. The gesture is detected when the distance drops below, or reaches a minimum below, a given threshold, as shown in fig. 2(b).

[Figure 2 plots: (a) incoming signals from the Wii controller, over samples 0–400.]

Figure 2: Analysis of the signal for a repeated gesture. The reference gesture was taken from the beginning and is visible where the distance drops to zero. Suitable thresholds for detection are shown as black horizontal lines.

2.2 Second method: cosine similarity
The cosine of the angle between the reference vector and the input vector can be computed by taking the dot product and dividing by the norms of the two vectors:

C = \frac{V_r \cdot V_i}{\sqrt{V_r \cdot V_r}\,\sqrt{V_i \cdot V_i}}, \qquad (3)

using the same definition of the dot product as before. It is 1 when the vectors are parallel, i.e. the gestures are identical up to an arbitrary scaling factor. Thus, we can detect gestures similar to the reference by looking for peaks in the cosine above a certain threshold, as shown in fig. 2(c).

2.3 Discussion
Supervised recognition, in both cases presented above, seems to be an appropriate method for the definition of precise gestures. By focusing on one gesture at a time, we are able to repeat a movement several times until the vibration produced (as a result of the recognition) arrives at the moment it is expected. Moreover, the issue of latency due to the various processing steps can be addressed. A gesture can be recognized before it is finished as long as its initial fragment can reliably be recognized in advance. In our case, we observed that initial fragments of more than 80 ms are usually distinct enough not to be confused with other gestures. If we increase the 'anticipatory lag' by choosing a gesture template from an initial fragment that ends well before the end of the gesture, the haptic feedback can be triggered at the time the performer expects, but on the other hand, the detection is less reliable. The number of entries in the constituted database is also an important factor in the overall error rate.

We chose to analyse a regular, repeated movement, consisting of cycling through three hand movements, visible as the large peaks in fig. 2(a). One of these movements, extracted from near the beginning of the signal, was taken to be the reference gesture—it is visible in fig. 2(b) at the point where the Euclidean distance drops to zero. As shown in figure 2, repetitions of the same gesture are not identical; therefore the threshold for detection must be larger than 0 or less than 1 for the two methods respectively. The cosine method, being invariant to the overall magnitude of the accelerometer signals, is able to recognize the reference gesture even if it is performed at a larger scale, as long as it has the same duration.

Both methods are quite sensitive to the choice of reference gesture and the thresholds, but in this case we were able to find parameters that gave successful detection of all 45 instances of the reference gesture using the cosine method, and 44/45 using the Euclidean distance measure, with no false positives. We were also able to use the results of the initial run to construct a better reference gesture by averaging all the previously detected instances. This gave perfect results using both methods.

3. UNSUPERVISED METHOD USING INFORMATION DYNAMICS
The above supervised method requires two distinct pieces of information to recognise a gesture in a timely way: one is the reference gesture with its label, and the other is the indication of the particular time point, relative to the reference, at which to respond to the gesture. This can be thought of as a mark indicating the 'perceptual centre' of the gesture (see fig. 3).

Though in some applications it may be possible to interleave the training phases with the performance phases, as we did in the system described above, in other applications it may not be possible for the person or system creating the gestures to provide this extra stream of information stating that 'this is gesture A', 'this is gesture B', and so on. For example, a dancer's movements might be improvised and the dancer too occupied with the actual execution of them to be able to mark and label them as well. However, human observers are capable of recognising a repeated gesture and inferring a series of relatively precise timings from what is on the face of it an unstructured continuous movement.

¹Pressing the button while doing the gesture is not an appropriate solution in the long term, as it affects the gesture itself. This problem is addressed in the unsupervised version (see section 3).
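The two matching methods of sections 2.1 and 2.2 (Euclidean distance and cosine similarity over windows of accelerometer triplets, eqs. (2) and (3)) can be sketched as follows. This is an illustrative Python sketch, not the authors' Java/Max-MSP implementation; the function names and thresholds are our own.

```python
import math

def dot(a, b):
    # Eq. (2): a sum over the L samples and the three (x, y, z) dimensions.
    return sum(ax * bx + ay * by + az * bz
               for (ax, ay, az), (bx, by, bz) in zip(a, b))

def euclidean_distance(ref, win):
    return math.sqrt(sum((r[k] - w[k]) ** 2
                         for r, w in zip(ref, win) for k in range(3)))

def cosine_similarity(ref, win):
    # Eq. (3): equals 1.0 when the gestures are identical up to a scaling factor.
    denom = math.sqrt(dot(ref, ref)) * math.sqrt(dot(win, win))
    return dot(ref, win) / denom if denom > 0 else 0.0

def detect(stream, ref, dist_thresh=1.0, cos_thresh=0.9):
    """Slide the reference over the incoming (x, y, z) triplet stream and
    report sample indices where either matcher fires."""
    L = len(ref)
    return [i for i in range(len(stream) - L + 1)
            if euclidean_distance(ref, stream[i:i + L]) < dist_thresh
            or cosine_similarity(ref, stream[i:i + L]) > cos_thresh]
```

Note how the cosine measure, unlike the distance, still reports a perfect match for a gesture performed at twice the scale, which is the scale-invariance property discussed above.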
[Figure: a 'position' vs 'time' plot with the perceptual centre marked, alongside a diagram of 20 numbered nodes.]

Figure 3: A one-dimensional gesture (e.g. a hand moving up and down) where the implied punctual event or beat is marked as the perceptual onset and is some time after the initial onset.
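The per-state and per-transition 'predictive information' quantities used in this information-dynamics analysis can be sketched for a discrete Markov chain as follows. This is a hedged illustration of one common formulation (the KL divergence between the predicted future before and after an observation); the paper's exact definitions may differ, and the function names are our own.

```python
import math

def predictive_information(A, i, j):
    """Predictive information (in bits) of observing the transition i -> j in
    a Markov chain with row-stochastic transition matrix A: the KL divergence
    between the belief about the next state after the observation, A[j], and
    before it, (A @ A)[i]."""
    n = len(A)
    prior = [sum(A[i][k] * A[k][m] for k in range(n)) for m in range(n)]
    return sum(p * math.log(p / q, 2)
               for p, q in zip(A[j], prior) if p > 0)

def average_predictive_information(A, i):
    # Expected predictive information of the next transition out of state i.
    return sum(A[i][j] * predictive_information(A, i, j)
               for j in range(len(A)) if A[i][j] > 0)
```

A fully deterministic chain yields zero everywhere (each observation is already predicted), while a partly random chain yields positive values; plotting these per state and per transition is the kind of analysis shown in Figure 6.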
ing algorithm for the HMM [7]. The explicit probabilistic
formulation of the model makes it well suited to handling the
detection latency problem by predicting the future motion
of the controller and estimating how accurate this prediction
might be. The supervised method, however, is implemented
in Java as a plug-in for Max/MSP and works in real-time.
An external to calculate the Euclidean and cosine matching
methods for any signal will soon be released. Online
training of HMMs is possible but is an inherently more
difficult problem which we are currently researching.

Part of the motivation behind this work is that multiple
performers could use the system and thereby share
information about gestures made. For example, when a gesture
triggers or schedules a sonic or visual event, it could also
cause a vibration signal to be sent to the other performers'
controllers. This extra level of haptic communication could
enhance the sonic and visual interaction without interfering
with the performance as seen and heard by the audience.
Future work will explore the importance of shared cues between
performers and the development of haptic solutions
to communicate these cues.

Figure 6: Information dynamic analysis of accelerometer
signals in top panel. The middle panel shows the
state sequence inferred from the HMM in a way that
highlights the average informativeness of each state in
the sequence: the shading of each marker encodes which
of the 20 states is active, while the y-axis represents
the average predictive information associated with that
state. In the bottom panel, the shading encodes the state
as before, but the y-axis encodes the predictive information
associated with that particular transition in context.

5. ACKNOWLEDGMENTS
This work was partly supported by two EPSRC grants:
GR/S82213/01 and EP/E045235/1. A. Robertson and J.-B.
Thiebaut are supported by EPSRC Research studentships.
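The latency-handling idea above, predicting the controller's future motion from the model, can be sketched as a forward propagation of the state distribution through the transition matrix. This is only an illustrative stand-in, not the authors' implementation, and the two-state matrix is invented:

```python
def predict_future(p, A, k):
    """Propagate a state-occupancy distribution k steps ahead through
    an HMM transition matrix A, i.e. p_{t+k} = p_t A^k. How sharply
    peaked the result remains gives a rough estimate of how accurate
    the prediction might be."""
    for _ in range(k):
        p = [sum(p[i] * A[i][j] for i in range(len(A)))
             for j in range(len(A[0]))]
    return p

# Invented two-state example: start certain of state 0, look 2 frames ahead.
A = [[0.9, 0.1], [0.2, 0.8]]
ahead = predict_future([1.0, 0.0], A, 2)
```

A flat resulting distribution would signal an unreliable prediction, so a system could fall back to waiting for more sensor data instead of scheduling an event early.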
Keywords
Augmented Violin, gesture tracking, interactive performance
1. INTRODUCTION
In June 2006, after meeting at NIME 2006 at IRCAM, Dr.
Frédéric Bevilacqua (who had previously collaborated with
composer Florence Baschet [2]) and I decided to collaborate on
what became my project of creating a new work entitled
VITESSIMO for Violin and the Augmented Violin,
commissioned by Harvestworks. Nicolas Rasamimanana, one
of the designers of the Augmented Violin, said: "It is important
for us to stress that IRCAM's ultimate goal is to make such a
device affordable and easy for any acoustic instrument to be
'augmented'." [3]

2. How to use the Augmented Violin
I understood very early that the Augmented Violin could become
just a fancy device that ends up as an alternative to a
simple footswitch, only creating what George Lewis would call
the "Command and Obey" mechanism, and not a true
interaction [4]. Dr. Andrew Schloss, one of the foremost
composer/percussionists working with the Radio Drum, a
3-dimensional computerized gesture controller [5], mentions
that he also differentiates two kinds of information coming
out from his device: "meta-information" and "information", where
the meta-information is information about the event, but not
the event itself [6]. Dr. Schloss's comment corresponds to my
own observation of violin bowing described below.

2.1 Observing Bowings
For composing VITESSIMO using the Augmented Violin, I
started making observations of my bowings. My findings so
far can be described in the two main points below:

A. Bowing is a functional movement to create sounds. But the

Figure 1. Crescendo and amplitude discrepancy

B. However, bowing movements before and after a functional
bowing (how you prepare before starting a stroke, and how you
release the bow after ending a stroke) directly affect the
expression that the bow arm must make (or has just made), in order
to create a 'correct' or desired movement and musical
expression. I personally recognize these 'non-sound-producing'
movements as a kind of gold mine of musical
expression, as such information is not transmittable without
the Augmented Violin; it 'augments' the expression of the
violin.

3. Building a 'palette'
It was essential for me to first acquire an entirely new 'palette'
of expressions using the Augmented Violin in order to start
composing VITESSIMO. Interactive installation artist David
Rokeby wrote: "Rather than creating finished works, the
interactive artist creates relationships. The ability to represent
relationships in a functional way adds significantly to the
expressive palette available to artists." [7]

I imagined performance scenarios that can only be realized using
the Augmented Violin, such as:

3.1 'Silent' violin
[Example 1] These low 'echo' pizzicatos are generated by the
Augmented Violin, which detects a 'mock' pizzicato
movement of my right arm.

[Example 1] 'silent Pizzicato'

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies
are not made or distributed for profit or commercial advantage and that
copies bear this notice and the full citation on the first page. To copy
otherwise, or republish, to post on servers or to redistribute to lists,
requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).
[Example 2] Tracking 'molto rit.'

3.3 Control without playing
I use the 'retake' bowing gesture for creating expressions;
especially the movements right before the second stroke, I believe,
must be consistent with the expression of the musical context
of the second stroke. [Example 3] shows the non-sound-making
up-bow 'retake' movement, controlling the glissando
rate of the pitch-shifted, delayed chord.

[Example 3] 'Retake' tracking

4. Making the Augmented Violin Glove
When Dr. Bevilacqua loaned me the Augmented Violin, the
device was made of two small parts connected with short wires.
The sensor portion attaches to the bow, and a small circuit
board containing a battery and the wireless portion attaches to the
bow arm with a Velcro band. I therefore created my own
Augmented Violin Glove: a lace glove containing
both the sensor and the battery portion of the Augmented
Violin. The glove is made of Velcro strips and balloons attached
to a lace glove for elasticity. The Velcro strips allow
experimenting quickly with different placements and angles
of the accelerometers. (See [Figure 2])

5. Conclusion
This paper describes a 'palette' of expression using the
Augmented Violin. I believe that a gesture-tracking device
such as the Augmented Violin should be musically coherent
and effective, even without visual effect. There is also a danger
that a gesture-tracking interface could make a performer
unknowingly calibrate his/her gestures for the device. At the
same time, I believe that using the Augmented Violin and
creating a new 'palette' of expression is an extraordinary
learning process of human-machine interaction, developing
new kinds of expression of our time.

ACKNOWLEDGMENTS
Special thanks to: Nicolas Leroy and Emmanuel Fléty and the
Real Time Musical Interactions Team at IRCAM, Harvestworks,
and Hervé Brönnimann.

REFERENCES
[1] F. Bevilacqua, F. Guédy, N. Schnell, E. Fléty, N. Leroy.
"Wireless sensor interface and gesture-follower for music
pedagogy". In Proceedings of the 2007 Conference on New
Interfaces for Musical Expression (NIME07), New York,
NY, USA, 2007.
[2] F. Bevilacqua, N. Rasamimanana, E. Fléty, S. Lemouton and
F. Baschet. "The augmented violin project: research,
composition and performance report". In Proceedings of
the 2006 Conference on New Interfaces for Musical
Expression (NIME06), Paris, France, 2006.
[3] N. Rasamimanana. Email correspondence with the author.
[4] G. Lewis. "Interacting with latter-day musical automata".
Aesthetics of Live Electronic Music, Contemporary Music
Review, 18(3): 99-122, 1999.
[5] R. Jones and A. Schloss. "Controlling a physical model with a
2D force matrix". In Proceedings of the 7th Conference on
New Interfaces for Musical Expression, 2007, pp. 27-30.
[6] A. Schloss. Email correspondence with the author.
[7] D. Rokeby. "Transforming Mirrors: Subjectivity and
Control in Interactive Media". In Critical Issues in Interactive
Media, ed. S. Penny, SUNY Press, 1996, pp. 133-158.
Jörn Loviscach
Hochschule Bremen (University of Applied Sciences)
Flughafenallee 10
28199 Bremen, Germany
joern.loviscach@hs-bremen.de
Figure 3: The “autocompletion” interaction mode augments the synthesizer’s interface with a parallel coordinates plot of the library.

Figure 6: The parameters (represented by dots) are
arranged according to their statistical relation, with
their colors representing functional groups. The
disk indicates the influence radius. The window
title names the parameter below the cursor.

energy E = Σ_{X≠Y} (1 − d_{X,Y}^actual / d_{X,Y}^target)² while staying in a
square of 400 × 400 pixels. Here, d_{X,Y}^actual denotes the
distance of the markers representing the parameters X and
Y on the screen, and the targeted distance is given by
d_{X,Y}^target = (I(X;Y)³ + 1/200)⁻¹, so that unrelated parameters
are pushed 200 pixels apart. The third power lets related
parameters exhibit a strong pull on each other.
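The two formulas can be read together as a scoring function over candidate layouts; the optimizer that actually minimizes E is not shown here. A minimal sketch, with invented parameter names and mutual-information values:

```python
import math

def layout_energy(pos, mi):
    """E = sum over pairs X != Y of (1 - d_actual/d_target)^2, where
    d_target = (I(X;Y)^3 + 1/200)^(-1). `pos` maps a parameter name
    to its (x, y) marker position; `mi` maps a frozenset pair of
    names to their mutual information I(X;Y)."""
    names = list(pos)
    energy = 0.0
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            d_actual = math.dist(pos[x], pos[y])
            d_target = 1.0 / (mi[frozenset((x, y))] ** 3 + 1.0 / 200.0)
            energy += (1.0 - d_actual / d_target) ** 2
    return energy

# Two unrelated parameters (I = 0) sit best exactly 200 pixels apart.
pos = {"cutoff": (0.0, 0.0), "lfo_rate": (200.0, 0.0)}
mi = {frozenset(("cutoff", "lfo_rate")): 0.0}
```

Each unordered pair is counted once here; counting ordered pairs, as a literal reading of the sum might suggest, only doubles E and does not change the minimizing layout.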
The user can specify an influence radius in this 2D representation
to control how many other parameters a change
in one parameter will affect. The new value of each influenced
parameter is computed through a weighted average
of its value in every patch. The relative weight of a patch is
exp(−r² / (2 · 0.01²)), where r denotes the difference between the
parameter value set by the user and its value in the patch.
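This update amounts to Gaussian-kernel interpolation over the patch library. A minimal sketch under the fixed kernel width 0.01 from the text; the patch data and parameter names are invented:

```python
import math

SIGMA = 0.01  # kernel width from the weighting formula

def interpolate(user_param, user_value, library, influenced):
    """Set each influenced parameter to the weighted average of its
    values across all patches, each patch weighted by
    exp(-r^2 / (2 * SIGMA^2)) with r the distance between the user's
    value and that patch's value of the edited parameter."""
    result = {}
    for p in influenced:
        num = den = 0.0
        for patch in library:
            r = user_value - patch[user_param]
            w = math.exp(-r * r / (2.0 * SIGMA * SIGMA))
            num += w * patch[p]
            den += w
        result[p] = num / den
    return result

# Invented library: moving "cutoff" onto a patch pulls "resonance" along.
library = [{"cutoff": 0.5, "resonance": 0.2},
           {"cutoff": 0.9, "resonance": 0.8}]
```

With the narrow kernel above, setting "cutoff" to 0.5 gives the first patch essentially all the weight, so "resonance" lands at that patch's value.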
5. CONCLUSION AND OUTLOOK
This work presented two interaction modes that give new
meaning to the classic controls of a synthesizer, no matter if
they are actual knobs or if they are drawn on a computer's
screen. This allows sticking to existing hardware or to ex-
ABSTRACT
This paper presents a project called i-Maestro
(www.i-maestro.org) which develops interactive multimedia
environments for technology-enhanced music education. The
project explores novel solutions for music training in both
theory and performance, building on recent innovations
resulting from the development of computer and information
technologies, by exploiting new pedagogical paradigms with
cooperative and interactive self-learning environments, gesture
interfaces, and augmented instruments. This paper discusses the
general context along with the background and current
developments of the project, together with an overview of the
framework and discussions on a number of selected tools to
support technology-enhanced music learning and teaching.

Keywords
Music, education, technology-enhanced learning, motion,
gesture, notation, sensor, augmented instrument, multimedia,
interactive, interface, visualisation, sonification.

1. INTRODUCTION
The i-Maestro project [1, 12, 18, 19] aims to explore novel
solutions for music training in both theory and performance,
building on recent innovations in computer and information
technologies. New pedagogical approaches are being studied
with interactive cooperative and self-learning environments,
and computer-assisted tuition in classrooms including gesture
interfaces and augmented instruments. The project develops a
technology-enhanced environment for aural and instrumental
training both for individuals and ensembles, as well as tuition in
musical analysis, theory, and composition.

The project specifically addresses training support for string
instruments, among the many challenging aspects of music
education, and is particularly interested in linking music
practice and theory training.

2. I-MAESTRO
Starting from an analysis of pedagogical needs, the project develops
enabling technologies to support music performance and theory
training, including tools based on augmented instruments,
gesture analysis, audio analysis and processing, score
following, symbolic music representation, cooperative support
and exercise generation. The resulting i-Maestro framework for
technology-enhanced music learning is designed to support the
creation of flexible and personalisable e-learning courses, and
aims to offer pedagogic solutions and tools to maximise
efficiency, motivation, and interest in the learning process
and improve accessibility to musical knowledge.

A process of continuous user requirements analysis, started
at the beginning of the project, forms the basis of the specification of a
framework which includes enabling technologies, pedagogic
tools and the production of content, as well as supportive pedagogical
aspects, such as modelling and formalising educational models
for music and courseware production tools. These include
innovative aspects such as models and support for cooperative
training, interactive and creative interfaces with sensors and
gesture tracking, client tools for theory and play training,
distribution and management tools for music lessons, and music
exercise generation.

The outcomes are being validated by several European
institutions including the Accademia Nazionale di Santa Cecilia
(Rome), the Fundación Albéniz (Madrid) and IRCAM (Paris).
[Figure: i-Maestro P2P and Cooperative Work Support, linking educational content, student profiles, history and results with individual students and groups of students]
2.3 Symbolic Music Representation
Music notation is one of the fundamentals of music education.
i-Maestro is promoting MPEG Symbolic Music Representation
(SMR), an ISO standard for the representation of music
notation with enhanced multimedia features [3, 4, 8, 9, 15].

Cooperative work is another key area of music education;
cooperative work support allows different components of the
i-Maestro framework to be used across a network. Other tools
include the Exercise Generator, which supports (semi-)automated
creation of exercises, and the School Server, which offers online
access to stored lesson material for sharing learning material at
home and in the classroom.

Figure 5 shows an MPEG SMR player/decoder within the IM1
MPEG-4 reference software. The MPEG-SMR has been
accepted as an ISO standard under MPEG-4.

An overarching pedagogical approach and model [16] for
technology-enhanced teaching and learning has been
developed. On this basis, a set of detailed pedagogical scenarios
related to the use of the i-Maestro tools has been created.

This paper presented a brief overview of the i-Maestro project:
the overall framework design and several tools to support music
learning and teaching, including MPEG SMR for theory
training and gesture analysis for performance training.

The final results are expected to consist of a framework for
technology-enhanced music training that combines proven and
novel pedagogical models with technological tools such as
collaborative work support, symbolic music processing, audio
processing, and gesture interfaces. Offering accessible tools for
music performance and theory training as well as for authoring
lessons and exercises will ensure wide participation.

Many of the prototype tools are expected to be incorporated
in various new products and services, which will be made
available to both the general public and educational
establishments. These are in the process of being validated and
refined, and the project is inviting music teachers and students
to take part in the testing phase of the i-Maestro software. We are
particularly interested in testing the system in real pedagogical
situations to see how teachers and students interact with the
technology. At the ICSRiM - University of Leeds (UK), open
lab sessions are being organised for people to come and try out
the i-Maestro 3D augmented mirror system with a 12-camera
motion capture system.
Production of Cross Media Content for Multi-channel
Distribution (AXMEDIS 2006), www.axmedis.org/axmedis2006,
Volume for Workshops, Tutorials, Applications and Industrial,
pp. 87-91, 13th-15th December 2006, Leeds, UK, Firenze
University Press (FUP), ISBN: 88-8453-526-3,
http://digital.casalini.it/8884535255
[6] Cont, A., Schwarz, D. (2006), Score Following at IRCAM,
MIREX'06 (Music Information Retrieval Evaluation
eXchange), The Second Annual Music Information
Retrieval Evaluation eXchange Abstract Collection, edited
by The International Music Information Retrieval Systems
Evaluation Laboratory (IMIRSEL), Graduate School of
Library and Information Science, University of Illinois at
Urbana-Champaign, http://www.music-ir.org/evaluation/MIREX/2006_abstracts/MIREX2006Abstracts.pdf,
p. 94, October 2006, Victoria, Canada
(http://ismir2006.ismir.net/)
[7] F. Bevilacqua, N. Rasamimanana, E. Fléty, S. Lemouton,
F. Baschet (2006), The augmented violin project: research,
composition and performance report, 6th International
Conference on New Interfaces for Musical Expression
(NIME 06), Paris, 2006.
[8] Pierfrancesco Bellini, Paolo Nesi, Maurizio Campanai,
Giorgio Zoia, FCD version of the Symbolic Music
Representation standard, MPEG2006/N8632,
October 2006, Hangzhou, China.
[9] P. Bellini, F. Frosini, G. Liguori, N. Mitolo, and P. Nesi,
MPEG Symbolic Music Representation Editor and Viewer
for Max/MSP, in Proceedings of the Second International
Conference on Automated Production of Cross Media
Content for Multi-channel Distribution (AXMEDIS 2006),
www.axmedis.org/axmedis2006, Volume for Workshops,
Tutorials, Applications and Industrial, pp. 87-91, 13th-15th
December 2006, Leeds, UK, Firenze University Press
(FUP), ISBN: 88-8453-526-3,
http://digital.casalini.it/8884535255
[10] Norbert Schnell, Frederic Bevilacqua, Diemo Schwarz,
Nicolas Rasamimanana, and Fabrice Guedy, Technology
and Paradigms to Support the Learning of Music
Performance, in Proceedings of the Second International
Conference on Automated Production of Cross Media
Content for Multi-channel Distribution (AXMEDIS 2006),
www.axmedis.org/axmedis2006, Volume for Workshops,
Tutorials, Applications and Industrial, pp. 87-91, 13th-15th
December 2006, Leeds, UK, Firenze University Press
(FUP), ISBN: 88-8453-526-3,
http://digital.casalini.it/8884535255
[11] Ong, B., Khan, A., Ng, K., Nesi, P., Mitolo, N. (2006),
Gesture-based Support for Technology-Enhanced String
Instrument Playing and Learning, International Computer
Music Conference (ICMC), 6-11 November 2006, New
Orleans, Louisiana, USA, www.icmc2006.org, ISBN:
0-9713192-4-3
[12] Bee Ong, Kia Ng, Nicola Mitolo, and Paolo Nesi,
i-Maestro: Interactive Multimedia Environments for Music
Education, in Proceedings of the Second International
Conference on Automated Production of Cross Media
Content for Multi-channel Distribution (AXMEDIS 2006),
www.axmedis.org/axmedis2006, Volume for Workshops,
Tutorials, Applications and Industrial, pp. 87-91, 13th-15th
December 2006, Leeds, UK, Firenze University Press
(FUP), ISBN: 88-8453-526-3,
http://digital.casalini.it/8884535255
[13] Kia Ng, Oliver Larkin, Thijs Koerselman, and Bee Ong,
i-Maestro Gesture and Posture Support: 3D Motion Data
Visualisation for Music Learning and Playing, in
Proceedings of the EVA 2007 London International
Conference, eds. Jonathan P. Bowen, Suzanne Keene,
Lindsay MacDonald, London College of Communication,
University of the Arts London, UK, 11-13 July 2007,
pp. 20.1-20.8.
[14] Kia Ng, Oliver Larkin, Thijs Koerselman, Bee Ong,
Diemo Schwarz, Frederic Bevilacqua, The 3D Augmented
Mirror: Motion Analysis for String Practice Training, in
Proceedings of the International Computer Music
Conference, ICMC 2007 - Immersed Music, Volume II,
pp. 53-56, 27-31 August 2007, Copenhagen, Denmark,
ISBN: 0-9713192-5-1
[15] Kia Ng and Paolo Nesi (eds.), Interactive Multimedia
Music Technologies, ISBN: 978-1-59904-150-6
(hardcover), 978-1-59904-152-0 (ebook), 394 pages, IGI
Global, Information Science Reference, Library of
Congress 2007023452, 2008.
[16] Tillman Weyde, Kia Ng, Kerstin Neubarth, Oliver Larkin,
Thijs Koerselman, and Bee Ong, A Systemic Approach to
Music Performance Learning with Multimodal Technology
Support, in Theo Bastiaens and Saul Carliner (eds.),
Proceedings of E-Learn 2007, World Conference on E-Learning
in Corporate, Government, Healthcare, & Higher
Education, Québec City, Québec, Canada, Association for
the Advancement of Computing in Education (AACE),
October 15-19, 2007.
[17] Thijs Koerselman, Oliver Larkin, and Kia Ng, The MAV
Framework: Working with 3D Motion Data in Max MSP /
Jitter, in Proceedings of the 3rd International Conference
on Automated Production of Cross Media Content for
Multi-channel Distribution (AXMEDIS 2007), Volume for
Workshops, Tutorials, Applications and Industrial,
i-Maestro 3rd Workshop, Barcelona, Spain, ISBN:
978-88-8453-677-8, 28-30 November 2007.
[18] Kia Ng, Tillman Weyde, Oliver Larkin, Kerstin Neubarth,
Thijs Koerselman, and Bee Ong, 3D Augmented Mirror: A
Multimodal Interface for String Instrument Learning and
Teaching with Gesture Support, in Proceedings of the 9th
International Conference on Multimodal Interfaces,
Nagoya, Japan, pp. 339-345, ISBN: 978-1-59593-817-6,
ACM, SIGCHI, DOI:
http://doi.acm.org/10.1145/1322192.1322252, 2007.
[19] Kia Ng, 4th i-Maestro Workshop on Technology-Enhanced
Music Education, in Proceedings of the 8th International
Conference on New Interfaces for Musical Expression
(NIME 2008), Genova, Italy, 5-7 June 2008.
ABSTRACT
This paper describes the HOP system. It consists of a wireless
module made up of multiple nodes and a base station. The nodes
detect acceleration of, e.g., human movement. At a rate of 100
Hertz the base station collects the acceleration samples. The data
can be acquired in real-time software like Pure Data and
Max/MSP, and can be used to analyze and/or sonify movement.

Keywords
Digital Musical Instrument, Wireless Sensors, Inertial Sensing,
Hop Sensor

1. INTRODUCTION
This paper presents wireless motion sensors. The application is a
multipoint-to-one system. Three people can attach a sensor to
their body; their acceleration will be measured in three
dimensions and transmitted to a central base station at a data rate
of 100 Hertz. The collected data can be used to analyze or sonify
movement [1]. Distances up to 30 meters are allowed between
transmitter and receiver. The goal of the project is to increase the
number of users to over 10 people at the same data rate.

2. HARDWARE
The system consists of wireless nodes and a base station. The
wireless nodes collect the acceleration data and send these
samples to the base station at a rate of 100 Hz.

The radio chip used to establish the wireless link is the
CYWM6935 of Cypress Semiconductor [5]. The chip runs the
WirelessUSB protocol, a short-range, high-bandwidth wireless
radio communication protocol that uses the 2.4 GHz band, so it is
well suited for our application.

Figure 1. The node

The data of the accelerometer is acquired by the microcontroller,
an Atmel ATmega168V [7], through the I²C interface. The
microcontroller processes this information and sends the data to
the transceiver chip through the SPI [6] interface, and the
transceiver transmits the data to the base station.
Figure 4. Dataflow at the base station

Each packet contains the node address (one byte), a time stamp
(two bytes), acceleration information (three bytes) and button
information (three bytes). The microcontroller sends this packet to
the 'USB to UART' chip, which sends the packet to a host
computer by USB.

2.4.2 Noise
Figure 3 shows the noise of the accelerometer. The
resolution of one acceleration sample here is 7 bits. The standard
deviation of the noise is roughly 0.054 g. If this value proves
too large, the noise can be reduced by taking
more acceleration samples while keeping the data rate constant:
the microcontroller then reads the value of the accelerometer 4 times
in one timeframe and takes the average of these samples before it
transmits the acceleration data to the base station. Figure 6 shows
the result; the standard deviation is decreased to
approximately 0.024 g.
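The 9-byte packet layout described above (one address byte, a two-byte time stamp, three acceleration bytes, three button bytes) can be unpacked on the host in a few lines. This is a hedged sketch: the big-endian time stamp and the one-unsigned-byte-per-axis acceleration encoding are assumptions, not details taken from the HOP firmware:

```python
import struct

def parse_packet(packet: bytes) -> dict:
    """Unpack one node packet: address (1 byte), time stamp (2 bytes,
    byte order assumed big-endian), acceleration (3 bytes, assumed
    one unsigned byte per axis) and button states (3 bytes)."""
    if len(packet) != 9:
        raise ValueError("expected a 9-byte packet")
    node = packet[0]
    (timestamp,) = struct.unpack(">H", packet[1:3])
    accel = tuple(packet[3:6])      # (x, y, z)
    buttons = tuple(packet[6:9])
    return {"node": node, "timestamp": timestamp,
            "accel": accel, "buttons": buttons}
```

A Pure Data or Max/MSP patch reading the serial port would apply the same slicing to each 9-byte frame before scaling the raw axis values to g.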
[Figure: accelerometer noise over time; vertical axis: Acceleration (g)]
1. INTRODUCTION
Increasingly, computers are becoming part of our everyday
lives, not only as our familiar laptops or towers, but built into
less inherently digital items, from fridges to water-filters, cars

Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

2. PROBLEMS OF ESS
Emotional state sensing is currently far from an exact science.
There are three core problems with accurately judging the
emotions of a human being using indicators from the
autonomic nervous system (ANS) [2].

Firstly there is what is referred to as the 'Baseline Problem':
finding a condition against which changes in the ANS can be
measured. How does one induce a state of emotional
'neutrality' in a subject for study? Individual physiological
characteristics also mean that readings may be at the high end
of the scale for one person, with lower readings for another,
while both are experiencing the 'same' emotion.
Environmental factors such as ambient temperature can also
play a part.

Secondly there is the 'Timing of Data Assessment Problem'.
Emotions can be fleeting, arising and disappearing in a matter
of seconds; Levinson [3] suggests that they may be as short as
0.5 seconds and last up until 4 seconds or beyond. This means
that by measuring at the wrong time an emotion might be
missed. Some emotions may have a long initial onset, such as
anger, whilst others may be much shorter, such as surprise.

Thirdly there is the 'Intensity of Emotion Problem', which
addresses the correlation between the magnitude of the
physiological response and the 'intensity' of the emotion felt.
At low levels of emotion there may be little response from the
ANS, whilst at higher levels the pattern of ANS activity
associated with a particular emotion may be destroyed.

Other issues complicate the graphing and reporting of
emotions, such as how the emotion was induced, how the
subject was encouraged (or not) to 'express' the emotion, and
complications from physiological responses not connected to
emotional state [4].

Systems which rely on data from audience members also raise
questions relating to what we shall call 'sensor ethics'. An
audience member may feel uncomfortable being monitored in
this way. Perhaps they wish to pretend to be enjoying the
piece for motives of their own. Perhaps they are
uncomfortable as 'performers' or have a medical condition that
the sensors might illuminate. For reasons such as these we
must approach audience monitoring/participation in the same
way as we would for conducting a physiological or
psychological experiment.

chair, both as independent streams and an interpolated view of
data from all four chairs.

The chairs were fitted with a sensor package consisting of a
light dependent resistor (LDR) mounted in the back of the
chair, two pressure sensors under the legs (one left, one right)
composed of Quantum Tunnelling Compound (QTC), and a
galvanic skin response (GSR) sensor mounted on the arm of
the chair. These allowed the system to capture various
physical movements (posture in the chair, measured with the
LDR; left/right movement in the chair, measured with the
QTC) as well as biometric data (weight and galvanic skin
response: QTC & GSR).

3.2 Gauging Emotion
In order to create an 'emotionally-aware' system, the data from
the four sensors was graphed according to the affect scale in
common use per Russell [7]. The X-axis was labelled
'Valence'; its output was a combined product of the
pressure and LDR sensor data and was used as an indicator of
the occupant's 'enjoyment' level. The Y-axis was labelled
'Arousal' and is a product of the GSR readings. The GSR is an
indicator of skin conductance, measured across the hand, which
increases linearly with a person's level of overall arousal [2].
This was used as an indicator of the chair occupants'
intellectual engagement. This divides the graphing window
into four distinct sectors or 'affect spaces', as may be seen
below, with some common emotions indicated.
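The division into four 'affect spaces' reduces to a sign test on the two axes. A minimal sketch; the origin at 0 and the quadrant labels are illustrative assumptions, not the system's calibration:

```python
def affect_quadrant(valence: float, arousal: float) -> str:
    """Map a (valence, arousal) reading onto one of four quadrants
    of a Russell-style circumplex. The boundary at 0.0 and the
    emotion labels are illustrative only."""
    if arousal >= 0.0:
        return "excited/happy" if valence >= 0.0 else "angry/afraid"
    return "relaxed/content" if valence >= 0.0 else "sad/bored"
```

In practice the raw pressure/LDR and GSR values would first be normalized against each occupant's baseline, which is exactly the 'Baseline Problem' discussed above.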
This file may then be imported into a graphing program such 3.3.3 Experiment C
as Matlab or Excel for visualisation. The operator may also
select the option of recording an audio file in parallel with the data, again for later study in conjunction with the sensor data.

3.3 Experimental Results
The 'Sensory Chairs' system provides a way of monitoring multiple audience or performer biosignals at the same time and using this data to make basic emotional state judgements. It is also possible to compare this data against the performance which generated it, either as an audio recording or a recording of the performers' own biosignal data.

3.3.1 Experiment A
During a performance of experimental electronic music, data was recorded from four volunteers seated in the Sensory Chairs system. Audio from the three performances was recorded simultaneously. The participants were unfamiliar with the pieces played and were seated centrally in the performance space.

Subsequent analysis of the data showed notable differences between the magnitudes of individual participants' responses to the performance. Some proved very sensitive to GSR monitoring while others showed more muted responses. Comparison of the data with the accompanying audio indicated fluctuations in the sensor readings that appeared linked to audio events. During 'calm' or 'soothing' portions of the pieces we noted a lowering in the GSR reading, indicating a relaxed state. Sudden loud sonic events following such portions of audio showed a rise in the GSR, indicating a state of alertness. Accompanying these events were spikes in the pressure sensors and LDR sensor, indicating movement in the chair, probably in response to the sudden sound.

3.3.2 Experiment B
An interactive audio piece was created specifically for the Sensory Chairs system. This was an 'enactive' composition in which the volunteers seated in the chairs generated sonic events based on the biometric data sent from the system. Each chair/volunteer was assigned a specific 'voice' in the piece, with rhythmic and melodic events as well as processing controlled by their emotions and movements.

A short questionnaire and informal debriefing afterwards revealed that participants found it difficult to connect a sense of control or ownership to their sounds. This illustrates a mapping issue pertaining to emotional state sensing and biosignals: how does one sonify an emotion or an affective state?

A short binaural audio play was created to test the system, played over headphones, comprising a recording of footsteps running towards the listener from behind, followed by a loud gunshot very close by, with a high degree of realism. This produced both a physical reaction in the listener (a jump in their seat) and a spike in the GSR reading, although the spike was very brief and difficult to detect without more sensitive equipment.

Figure 2: Example Output A - Sensor Data Over Time

Figure 3: Example Output B - Sensor Data Over Time

Figures 2 and 3 show the recorded sensor data for two individuals during the same performance, graphing the output of each sensor (vertical axis) over 0.5 second intervals (horizontal axis). The red and blue plots show the readings from the pressure (QTC) sensors, the yellow shows the LDR, and the green line is the GSR. Note the values from the LDR remain at maximum for most of the performance, indicating both participants leant heavily against the back of the chairs. In Fig. 4 we see a participant's GSR reading suddenly drop; this is a result of the participant having removed their hand from the sensor and then replaced it during the performance.

We may clearly see the difference in magnitude of the sensor readings for each individual across all the sensors. If we compare the pressure (QTC) sensor data (blue and red) for both graphs, we can see the individual in Fig. 2 remained relatively still in their chair, whereas the individual in Fig. 3 shifted in their seat much more.

Closer observation also reveals similar rise/fall cycles between participants' GSR, corresponding with relaxing or sudden events in the audio performance.
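The rise/fall analysis described above, relating GSR rises to sudden sonic events, could be prototyped with a simple moving-baseline detector. This is an illustrative sketch only: the window and threshold are hypothetical, not values from the study, and samples are assumed to arrive at one per 0.5 s interval, matching Figures 2 and 3.

```python
# Illustrative detector (window and threshold are hypothetical, not values
# from the study): flag samples where the GSR rises sharply above a short
# moving baseline, as in the rise/fall analysis above. Samples are assumed
# to arrive at one per 0.5 s interval, matching Figures 2 and 3.

def gsr_events(samples, window=8, jump=0.15):
    """Return indices where GSR exceeds the recent mean by more than `jump`."""
    events = []
    for i in range(window, len(samples)):
        baseline = sum(samples[i - window:i]) / window
        if samples[i] - baseline > jump:
            events.append(i)
    return events

calm = [0.40] * 10
startled = calm + [0.70, 0.72]       # a sudden loud event at t = 5 s
print(gsr_events(startled))          # [10, 11]
```

A falling counterpart of the same loop would flag the relaxation phases noted during 'calm' portions of the pieces.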
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
ABSTRACT
We present examples of a wireless sensor network as applied to wearable digital music controllers. Recent advances in wireless Personal Area Networks (PANs) have precipitated the IEEE 802.15.4 standard for low-power, low-cost wireless sensor networks. We have applied this new technology to create a fully wireless, wearable network of accelerometers which are small enough to be hidden under clothing. Various motion analysis and machine learning techniques are applied to the raw accelerometer data in real-time to generate and control music on the fly.

2. HARDWARE
The hardware components of the system, in essence, comprise a basic Motion Capture (MC) system. Accelerometers placed at different points on the arms, legs, and head track the motion of the user. This data, collected at different points around the body, must be transmitted to a computer for analysis and translation into music. To minimize hindrance to the user, our MC system completely eliminates wires. Data is transmitted wirelessly and independently from each accelerometer to a base station, which is attached to a computer.
microcontroller and the radio (see Figure 1) are available together from Atmel's Z-Link series, designed for Zigbee and 802.15.4 networks. The Atmel radio, the AT86RF230, offers a digital radio solution that requires a bare minimum of external components, allowing for low cost and a physically small footprint. A Linx chip antenna is used to minimize the form factor of the devices. The three-axis accelerometers from Kionix offer 6 g sensitivity and 12 bits of resolution, allowing the sensors to detect fluctuations in acceleration as small as 0.003 g in any direction. The radio and the accelerometer both interface with the microcontroller through an SPI (Serial Peripheral Interface) link, with speeds up to 2 Mbps, as the ATMega644 microcontroller is operated at 4 MHz.

The radio operates in the 2.4 GHz band, although the IEEE specification defines two other bands, around 800 and 900 MHz, which may be used when there is too much noise in the 2.4 GHz band. The radio, when operating at 2.4 GHz, is capable of a raw throughput of 250 kbps. As each sample from the accelerometer contains approximately 50 bits (12 bits * 3 axes plus protocol overhead bits), each node is itself theoretically capable of transmitting around 5000 samples per second. With a 5-sensor node system, the theoretical limit of the rate at which samples may be collected from the entire system is around 1000 samples every second. This time resolution is more than sufficient for a responsive system without noticeable latency. Our experiments have used as few as 60 samples per second with excellent results and no noticeable latency. This wide range allows for successful operation of the system even in electrically noisy environments where the communications rate is forced to drop.

Figure 1. A wireless node

2.3 Network Layer Design
The software that runs on each node in the network is built on top of a custom library, designed according to an AT86RF230 software programming document [1], which encapsulates the physical layer of the network. The network layer is kept very simple to allow for fast implementation of new techniques, which are not incorporated into a typical 802.15.4 Medium Access Control (MAC) layer. In addition, we are interested in a single-hop network and do not need many of the features the full 802.15.4 specification provides. The networking layer we have implemented is not 802.15.4 compatible, although the physical layer is.

Our system requires several independent sensor nodes to communicate with a single base station. Communication latencies must be kept to a minimum, samples should be collected from each node at regular intervals, and power consumption should be minimized. The 802.15.4 standard describes the Guaranteed Time Slot (GTS) feature that allows rigid, reliable data transmission rates between network slaves and a network master. However, the GTS feature requires the slaves to be either persistently listening, which wastes power, or time-synchronized, which requires extra communication.

To solve this problem, our system utilizes a collaborative virtual time slot allocation technique, which takes advantage of the Carrier Sense Multiple Access with Collision Avoidance (CSMA-CA) feature. In essence, when each node wants to transmit, it listens to see if the channel is busy. If it is not busy, it will wait a random interval before transmitting. After a successful transmission, the node starts a deterministic timer, corresponding with the desired sampling rate, which indicates when the node should transmit its next sample. In the steady state, the node will transmit the next message after this predetermined interval and will settle into a regular transmission schedule. If the node listens and finds the channel busy, it will wait a random interval before attempting to transmit again. It will continue to wait and check the channel until it finds the channel is not busy. At this point, the node will transmit its message.

With every node following this behavior, and using the same sampling rate, they will eventually settle into a schedule that fits every node, where no messages overlap, assuming the message lengths are short enough given the sampling rate that is used. In addition, between each sample, the node can enter a standby mode to reduce power consumption and extend battery life. This scheme works well in a system such as this sensor network, where each data frame to be transmitted is of exactly the same length and each node takes samples at exactly the same rate. Since the clocks are not synchronized, however, and may actually run at slightly different rates, the "set" schedule for each node is not actually fixed. This scheme is flexible: as each sample timer is started only after a successful transmission, the schedule is readjusted such that no messages overlap. To minimize the latency jitter this may introduce, a reasonably low sampling rate is required, to allow some room in the transmission schedule for readjustments.

In short, this transmission scheme allows for high throughput without the communication overhead that would be required with other schemes. Samples are transmitted on reasonably tight schedules that allow for little random jitter in the time intervals between them, and this is done without the use of timestamps and the overhead of clock coordination.
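The per-node behavior of the collaborative time-slot scheme in Section 2.3 (carrier sense, random back-off, then a deterministic resampling timer) can be sketched as a small event-driven simulation. This is an illustrative model, not the authors' firmware; the constants (60 samples per second, a roughly 50-bit frame at 250 kbps) are taken from the figures reported in the text.

```python
import heapq
import random

# Illustrative event-driven model of the collaborative virtual time-slot
# scheme (not the authors' firmware). Each node carrier-senses before
# sending; on a busy channel it backs off by a random interval, and after
# a successful transmission it re-arms a deterministic timer at the shared
# sampling interval, so nodes settle into a non-overlapping schedule
# without any clock synchronization.

SAMPLE_INTERVAL = 1 / 60       # 60 samples per second, as reported in the text
FRAME_TIME = 50 / 250_000      # ~50-bit frame at the 250 kbps raw rate

def simulate(n_nodes=5, duration=2.0, seed=1):
    random.seed(seed)
    # Each node first wakes at a random offset within one sampling interval.
    events = [(random.uniform(0, SAMPLE_INTERVAL), i) for i in range(n_nodes)]
    heapq.heapify(events)
    busy_until = 0.0
    sends = backoffs = 0
    while events:
        t, node = heapq.heappop(events)
        if t >= duration:
            continue                 # drop events past the simulated horizon
        if t < busy_until:
            # Channel busy: wait a random interval, then try again.
            backoffs += 1
            heapq.heappush(events, (busy_until + random.uniform(0, FRAME_TIME), node))
        else:
            # Channel clear: transmit, then arm the deterministic sample timer.
            busy_until = t + FRAME_TIME
            sends += 1
            heapq.heappush(events, (t + SAMPLE_INTERVAL, node))
    return sends, backoffs

sends, backoffs = simulate()
print(sends, backoffs)
```

With five nodes at 60 samples per second the nodes contend only briefly after start-up, after which each transmission recurs at the fixed interval and `backoffs` stays small relative to `sends`, mirroring the settled schedule described above.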
3. SOFTWARE
The base station is connected to the computer via a USB connection. FTDI's D2XX drivers (http://www.ftdichip.com/Drivers/D2XX.htm) allow direct access to the USB device through a DLL, so our software can access it through a series of DLL function calls. We wrote this software using flext (http://grrrr.org/ext/flext/), a C++ layer for cross-platform development of Max/MSP (http://www.cycling74.com/products/maxmsp) and Pure Data (Pd) (http://puredata.info). This gives us an object, or external, to use in either of these graphical programming languages that interfaces directly with the base station through the USB connection and streams the accelerometer data into our Max/MSP or Pd programs, or patches.

We then designed a suite of patches to enable use of the sensor network with direct and indirect mappings and to allow the user to create or manipulate music in real-time. The accelerometer data can be processed in various manners to extract inclination and orientation when accelerometers are not moving (i.e. when overall acceleration is about 1 g) and to detect movements and gestures when in motion. By creating a library of low-level data processing patches that analyze the raw accelerometer data and extract meaningful parameters about the sensor nodes, we were able to provide functional components for use in higher-level designs.

3.1 Data processing
The low-level library includes patches for calibration and converting ADC values to real measures of acceleration in g, calculating total acceleration, jerk, frequency, and overall activity, and determining orientation and inclination. The total acceleration patch can be used for detecting overall acceleration of a sensor, but is also important in inclination error control. If the total acceleration of a sensor goes above 1 g, there are forces other than gravity acting on it and inclination calculations are no longer valid.

One simple orientation patch takes the raw acceleration of three axes as input and essentially outputs which axis is facing upward, with a check that the accelerometer isn't in motion and a small bias toward the current orientation. For a more accurate indication of the accelerometer's position in three-dimensional space, we created an inclination patch to use on a per-sensor basis. It includes trigonometric calculations that use gravity to determine angles referred to as pitch, roll, and yaw for rotation about the accelerometer's x, y, and z axes. The method maintains constant sensitivity and allows tilt angles greater than 45° to be sensed accurately and precisely by using the acceleration of all three axes in each calculation of pitch,
roll, and yaw [5]. For example, the pitch (X-tilt) calculation is
given by φ in Equation 1.
φ = arctan( ax / √(ay² + az²) )        (1)
After performing the three inclination calculations, making
further corrections with sign recognition, and testing whether
the sensor is moving and its data is valid, the patch outputs
accurate measures of pitch, roll, and yaw in degrees.
Note that while designed for our sensor system, these patches
also work with popular accelerometer-based input devices
such as the Nintendo Wii Remote and Apple iPhone.
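The tilt computation of Equation 1 (and its roll analogue) can be sketched in a few lines. This is an illustrative reimplementation, not the original Max/MSP patch, with the 1 g validity check from Section 3.1 included.

```python
import math

# Illustrative reimplementation of the Eq. (1) tilt computation (not the
# original Max/MSP patch). Using all three axes keeps the sensitivity
# constant for tilt angles beyond 45 degrees.

def pitch_roll(ax, ay, az):
    """Return (pitch, roll) in degrees from a static 3-axis reading in g."""
    pitch = math.degrees(math.atan2(ax, math.hypot(ay, az)))   # Eq. (1)
    roll = math.degrees(math.atan2(ay, math.hypot(ax, az)))    # roll analogue
    return pitch, roll

def is_static(ax, ay, az, tol=0.1):
    """Inclination is only valid when total acceleration is about 1 g."""
    total = math.sqrt(ax * ax + ay * ay + az * az)
    return abs(total - 1.0) < tol

print(pitch_roll(0.0, 0.0, 1.0))   # flat and face-up: (0.0, 0.0)
```

Using `atan2` rather than a plain `arctan` handles the sign correction mentioned in the text, and the `is_static` guard mirrors the total-acceleration error control of the patch library.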
3.2 Motion analysis
Patches were also written for movement and gesture recognition. Patches were created to determine the magnitude and direction of movements. Directionality is determined by using the last known orientation of the sensor at rest as the initial state and comparing this to the detected vector of movement. While we often used a simple measure of acceleration for the magnitude of a movement, we also found it helpful to track the duration of a movement as an important basic parameter.

We considered two forms of gesture recognition, essentially separating them into programmed and learned gestures. The programmed gesture schemes used a simple patch that detects when one specified action follows another within a specified time frame. This enabled us to combine multiple movements such that the overall gesture occurs when one movement is followed by another movement within a certain time period. A useful instance of these manually programmed gestures was that of recognizing a specified orientation followed by motion in a certain direction. We designed this example with an accelerometer attached to the wrist to detect 6 orientations (palm up, palm down, thumb up, thumb down, fingers up, fingers down) and 6 directions of movement (up, down, left, right, forward, backward), which provide 36 different orientation/movement combinations. When combined with a second sensor for the other hand, the number of orientation/movement combinations is in the thousands. This example illustrates the ability to use the system to make commands with an "alphabet" of gestures, much like flag semaphore signaling uses two flags held in specific positions to signify letters.

The second form of gesture recognition uses machine learning techniques to teach the computer a set of gestures. Then, an arbitrary motion can be recognized from that set in real-time. We explored gesture recognition with hidden Markov models (HMM) by utilizing the FTM and MnM libraries [1]. The system has the capability to learn gestures, e.g. drawing shapes or numbers in the air, perform gesture following, and detect gestures with accompanying degrees of certainty.
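The programmed orientation-then-movement gestures described in Section 3.2 amount to a small state machine. Below is a hedged sketch; the class and event names are hypothetical, not the authors' patch API.

```python
# Hedged sketch of a programmed two-event gesture (class and event names
# are hypothetical, not the authors' patch API): the gesture fires when a
# specified orientation is followed by a specified movement direction
# within a time window.

class ComboGesture:
    def __init__(self, orientation, direction, window=0.5):
        self.orientation = orientation   # e.g. "palm_down"
        self.direction = direction       # e.g. "left"
        self.window = window             # seconds allowed between the events
        self._armed_at = None

    def on_orientation(self, orientation, t):
        # First event: arm the timer when the watched orientation is seen.
        if orientation == self.orientation:
            self._armed_at = t

    def on_movement(self, direction, t):
        # Second event: fire only if armed and still inside the window.
        armed = self._armed_at is not None and (t - self._armed_at) <= self.window
        self._armed_at = None
        return armed and direction == self.direction

g = ComboGesture("palm_down", "left")
g.on_orientation("palm_down", t=0.0)
print(g.on_movement("left", t=0.3))   # True: movement within the 0.5 s window
```

Instantiating one such object per orientation/direction pair yields the 36 combinations per wrist described in the text.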
4. APPLICATIONS
Our hardware and software infrastructure was applied to a number of scenarios with success. One of the most valuable was using the system on trained dancers (see Figure 2) with
ABSTRACT
This research aims to develop a wearable musical interface which enables the control of audio and video signals by using hand gestures and human body motions. We have been developing an audio-visual manipulation system that realizes track control, time-based operations, and searching for tracks in a massive music library. It aims to build an emotional and affecting musical interaction, and will provide people with a better method of music listening. A sophisticated glove-like device with an acceleration sensor and several strain sensors has been developed. Realtime signal processing and musical control are executed as a result of gesture recognition. We also developed a stand-alone device that performs as a musical controller and player at the same time. In this paper, we describe the development of a compact and sophisticated sensor device, and demonstrate its performance in audio and video signal control.

Keywords
Embodied Sound Media, Music Controller, Gestures, Body Motion, Musical Interface

1. INTRODUCTION
The listening habits of people have been changing dramatically in recent years because people can carry massive libraries of digital music on small portable music players like the iPod. In this situation, a new system is required with which people can find the music they want among enormous numbers of digital media files. Many methods have been proposed to address this problem; for example, a method of graphical visualization to organize music libraries lucidly was suggested [1].

In order to allow users more degrees of freedom, a variety of physical input devices such as pen tablets, dials, or glove-shaped interfaces have been commercialized and are widely used in various fields. In addition, intuitive input devices based on touch and haptics have been attracting social attention. To date there have been a number of studies of gesture interfaces for music [2, 3]. For example, musical controllers that can control electronic devices by using simple finger gestures were proposed in several studies such as FreeDigiter [4] and Ubi-Finger [5]. Recently, a system that controls tracks like a DJ has also been proposed [6].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008. Copyright remains with the author(s).

In this study, we focused on a sophisticated interface that enables users to control sound and music in an intuitive and efficient manner. The glove-like input device is one of the conventional interfaces for human-computer interaction. The developed system, MusicGlove, plays the role of an interactive music player and explorer, performing track control, time-stretching of audio and video signals, and information retrieval from a massive music library as a result of hand gesture and body motion recognition. In particular, time-based multimedia interaction involving audio and video signals has become popular in recent years. To date some rich time-based operations have also been proposed for different applications [7].

Gestural control allows real-time control and a high affinity for expressive performance. The targets of the MusicGlove project include an all-in-one device that can control music and generate audio by itself. In this instance, the user can listen to music that is produced by the wearable device. Therefore, people can enjoy musical control by using MusicGlove at any time and place, even in transit or while walking. The developed device and system can contribute to a new style of listening to music from a massive music library.

2. SYSTEM OVERVIEW
In this study, the musical control is mainly divided into two functions: track control and audio/video time-based control. The track control covers the common manipulations of a music player, such as play, stop, and skip to the next track. In addition, a function for searching tracks in the music library is implemented, which is similar to the manipulation performed by a DJ. On the other hand, audio/video time-based control is signal processing which directly controls the sound waveform, such as changing the tempo or adding tonal effects, functions that resemble audio-effects processing.

2.1 Hardware Overview
The overview of the developed glove-like sensing device is shown in Figure 1. The device consists of one 3-axis acceleration sensor, 4 strain sensors, 1 microprocessor for signal processing and control, a Bluetooth wireless module, a portable music player that is used in the Wearable Music application (all-in-one), and a battery. The measurement range of the acceleration sensor is ±10 g. As the sensor is fixed on the external side of the wrist, the X, Y, and Z axes are also fixed at a given position. Four strain sensors are mounted on the upper side of the index and middle fingers, and also on the inner and outer sides of the wrist, as illustrated in Figure 2. The strain sensors provide analog values of the bending at each position. The glove-like device is lightweight and
Figure 1: The overview of the MusicGlove input/output device (LED, microprocessor, wireless communication module, strain sensors)

Figure 3: Flow diagram (video files, video data buffer, video control and synchronism, screen output)
[System diagram: glove device (acceleration sensor, 4 strain sensors, wireless communication module) connected to a control computer running Max/MSP and EyesWeb for gesture recognition, beat extraction, and audio/visual processing and synchronism, with display/speaker output.]

Figure 4: Air Disc Jockey (DJ)

Figure 5: Air Conductor

Figure 6: Wearable Music
When a user begins to swing his/her arm at a constant tempo and keeps it for three beats within a certain tempo variation, this conducting interaction is initiated.

(3) Wearable music: This is another application of the developed MusicGlove. A portable music player can be attached to the glove device, and users are able to control the music player with their gestures, as illustrated in Figure 6. This enables the system to be stand-alone, and users can listen to music via headphones or earphones that are directly connected to the developed device without any other equipment. The embedded microprocessor produces control signals for the player such as: play or stop tracks, skip to the next or previous track, fast forward and rewind, and volume control, by means of the acceleration and strain sensors. The predetermined gestures are the same as the ones in the track control mode.

motion which is detected based on the accumulated value of the acceleration sensors over all axes. The user listens to audio data, and makes a grasping motion to play the music when she/he finds a desired track or library. The grasping motion is detected based on the strain sensors. In addition, waving the hand is regarded as "shuffle search": the user is able to choose a media library or tracks in a random manner. These manipulations provide users with an intuitive search, like grasping music out of the air.

4. PERFORMANCE DEMONSTRATIONS
In this section, we show some performance demonstrations with the developed device. We first describe time-series examples of the sensors and gesture classification. Next, an example waveform of the scratch motion will be shown with its spectrogram.
Figure 9: Search for audio tracks mode
[Figure residue: time-series plots of the accelerations of X, Y, Z [G], strain of the index finger and extension of the wrist [V], and spectrograms [Hz] against time [sec], with regions A-E and A-B marked.]

In addition, a distinguishing spectral feature can be seen in region A of Figure 10(a). That wave pattern is similar to region B, which is seen in Figure 10(b). The wave pattern is generated by a particular rotation of the turntable after the scratching action. It can be said that we have duplicated the behavior of a digital turntable system. The waveform of the audio output by the developed system has quite similar characteristics to that of the digital turntable system in terms of temporal transition. The response to gestural control is fast enough to realize natural sound effects by scratching.

5. DISCUSSION AND CONCLUSIONS
In this paper, we proposed a method of active music listening for massive media libraries. Different styles of musical interaction are realized by using the sophisticated glove-like input device. The developed system allows humans not only to control audio and video signals but also to search audio tracks in a massive media library by hand gestures and body motion. In addition, it is possible to listen to music

Acknowledgement
A part of this work is supported by the Japan Science and Technology Agency (JST), CREST "Generation and Control Technology of Human-entrained Embodied Media."

6. REFERENCES
[1] M. Torrens, P. Hertzog and J. L. Arcos. Visualizing and Exploring Personal Music Libraries. Proc. of the 5th International Conference on Music Information Retrieval (ISMIR 2004), 2004. Available online.
[2] M. Wanderley. Gestural Control of Sound Synthesis. Proc. of the IEEE, 92(4), pp. 632-644, 2004.
[3] K. Ng. Music via Motion: transdomain mapping of motion and sound for interactive performances. Proc. of the IEEE, 92(4), pp. 645-655, 2004.
[4] C. Metzger, M. Anderson and T. Starner. FreeDigiter: A Contact-free Device for Gesture Control. Proc. of the Intl. Symp. on Wearable Computers, pp. 18-21, 2004.
[5] K. Tsukada and M. Yasumura. Ubi-Finger: a Simple Gesture Input Device for Mobile and Ubiquitous Environment. Journal of Asian Information, Science and Life (AISL), Vol. 2, No. 2, pp. 111-120, 2004.
[6] K. F. Hansen and R. Bresin. DJ Scratching Performance Techniques: Analysis and synthesis. Proc. Stockholm Music Acoustics Conf., pp. 693-696, 2003.
[7] E. Lee et al. Toward a Framework for Interactive Systems to Conduct Digital Audio and Video Streams. Computer Music J., 30(1), pp. 21-36, 2006.
[8] EyesWeb, InfoMus Lab. http://www.infomus.dist.unige.it/EywMain.html
[9] K. Suzuki et al. Robotic Interface for Embodied Interaction via Dance and Musical Performance. Proc. of the IEEE, 92(4), pp. 656-671, 2004.
to learn by playing Chopin Op. 25 no. 10, rather than just scales in octaves.

2.2 Basic techniques
The most successful methods address the development of executive skills in a larger context of attentive practice and musical development. When physical skills need to be practiced, they should be focused on specifically and with the same intensity as music making. This describes the approach of Joe Allard [19], who strongly influences David Liebman's method (see Table 1). [17] Liebman does not offer the student any musical études; instead he devotes the first seven chapters of his book to the act of making a sound with the saxophone, covering the mechanism part by part, offering visualizations and physical exercises. His discussion of expressive techniques covers devices such as pitch bends, portamento, and vibrato, but does not address how to be expressive, as defined above; rather, it presents techniques that could be used for "furthering one's personal expression, so long as it is within the bounds of artistic and musical taste." Finally, he offers advice on practicing, which makes it clear that Liebman expects the student (or teacher) to find other sources for études (e.g. [21]) and repertoire that will round out a whole curriculum.

Table 1. Chapter headings from Developing a Personal Saxophone Sound [17]
Chapter One: Overview of The Playing Mechanism
Chapter Two: Breathing
Chapter Three: The Larynx
Chapter Four: The Overtone Exercises
Chapter Five: The Tongue Position and Articulation
Chapter Six: The Embouchure
Chapter Seven: Reeds and Mouthpieces
Chapter Eight: Expressive Techniques
Chapter Nine: Practicing

In a two-hour practice session, one hour is devoted to different categories of tone exercises, 20 minutes to sight-reading, and 40 minutes to "scales, arpeggios, and intervals … in order to learn the alphabet of music." For a method book to function in the context of "new" music and new interfaces, the possible alphabet(s) of music would need to expand beyond these patterns. Also, there is no expressive music making in this practice session – that happens at some other point. The point of practicing is "to insure that the needed physical and technical manipulations occur quickly and efficiently, so that a musical idea is immediately transferable from ear to mind with the soul (emotions) monitoring the entire process."

2.3 Practical Information and Advice
A third type of material in a pedagogical method is practical information and advice. Steve Lacy offers a wealth of information in his book Findings [15]. In addition to standard fare, such as fingering charts, he advises against smoking and poetically describes the rigors of life as an improvising musician. This book also has exercises and études.

The Inner Game of Music [13] moves away from the category of method book entirely, offering exclusively advice in prose. With no musical examples, this is still an important addition to instrument pedagogy. Like the methods above, the authors understand that other texts will provide the missing pieces of the curriculum. In the case of the tablet method, there are no other methods to fill in the gaps. It will be important that all three of these categories are represented.

3. STYLUS and TABLET RESEARCH
3.1 An extremely short history

Figure 1: Bert Sutherland at the TX-2, with light pen [1]

The first appearance of a pen-computer interface is the Lincoln TX-0 computer from the MIT Lincoln Laboratory in 1957 [31]. There are many music-specific implementations of tablet and spatial interfaces, including the Fairlight CMI (http://en.wikipedia.org/wiki/Fairlight_CMI) (although not for real-time performance), Xenakis' UPIC [18], Buxton's SSSP [5], and the Boie/Mathews/Schloss Radio Drum [3].

3.2 HCI
Much can be learned about tablet and stylus interfaces from the literature of human-computer interaction. [22] An important early study of pointing technologies was done by Paul Fitts in 1954. [12] His formulation, now referred to as Fitts' Law, predicts the time required to rapidly move to a target area as a function of the distance to the target and the size of the target; in its common Shannon form, the movement time is MT = a + b log2(D/W + 1), for target distance D and target width W. This work has been expanded with the Steering Law, [1] which deals not just with targets, but also with trajectories. This work shows that tablets out-perform other input devices (mouse, trackpoint, touchpad, and trackball). While both laws have wide-reaching implications for designers of interfaces, their focus on untrained movements limits their applicability for authors of method books. However, the underlying metrics for evaluating interfaces (indexes of performance) could be applied to evaluating performers and their progress. Also, the selection of mappings and gestural situations is especially critical when preparing an instrument for students. [25]

4. A TABLET METHOD BOOK
4.1 Why Tablet?
It would be impossible to write a method book that addressed the entire range of instruments that arrive at NIME. While practicing and learning can be addressed in a general context, the details of implementation and developing performance skills are specific to an instrument. Since specific skills are critical to understanding performance practice, it is desirable to develop a whole method, from basics to real music, around one instrument as an example for other instruments. The Method for Tablet could potentially spawn a whole series: Method for Wii Remote, Method for Footswitch, etc.

Previous work [29] surveyed musical work with tablets, and presented reasons why digitizing tablets make good interfaces. Briefly, the tablet interface offers:
• Low cost
• Easy availability
• High resolution output data
• Fine temporal resolution
• Multiple axes of control

These qualities are even more important in choosing the focus for a method book than they are in choosing one's personal instrument. It would be impractical to write a method for a unique interface, no matter how good it is. The desire is for people to use this text, either for individual practice, in groups, or in the classroom.

Tablet interfaces offer other benefits as an instrument for beginners. Stylus-based interfaces outperform other pointing devices, such as joysticks, because they leverage the high bandwidth of the thumb and finger in combination [2]. Most performers come to the tablet with pre-existing pen skills, and the physical demands of the instrument are such that they are attainable by a large number of users. (There are no issues with handedness, for instance.) Tablet interfaces have been part of the NIME community since the beginning [26] and are now well established, appearing both in performance and in print [8, 24, 29].

Figure 2: An Elementary Method for Tablet

4.2 The Method
The method book will have three basic sections: Practical Issues, Basic Exercises, and Études.

The practical issues section covers getting situated with a tablet interface, including a discussion of which tablet to acquire (sizes and models, strengths and weaknesses), the use of alternate pens, etc. Setting and adjusting the driver and sensitivities for musical performance follow, then recommended software implementations and conventions specific to the method book. The exercises and études in the method use Max/MSP3 and Jean-Marc Couturier's Wacom Object4. They are programmed so that students can use the free, runtime version of Max, and are distributed under a Creative Commons License5 that allows sharing in a non-commercial context. (It is also worth considering implementing some of the method in Pd6, to be compatible with the largest number of possible users.) Incoming tablet data is mapped using an Open Sound Control [28] wrapper, which is part of the CNMAT Max/MSP/Jitter Depot [30].

Figure 3: An exercise based on Engraver Script by Willis A. Baird (http://www.zanerian.com/BairdLessons.html)

The second section consists of basic exercises, analogous to the scales and arpeggios of classical instrumental technique. Their nature as interactive software means that some of the pitfalls of exercises (e.g. mindless repetition) are avoided. While the instrument mapping should stay the same, the content and difficulty of an exercise adapt to the level of the student. An alternate model for these exercises is a computer game [9].

The third section is the largest and most musically interesting. It consists of études by a number of composers. For example:
• M. Zbyszyski's News Cycle #2 [29] requires the player to pull lines from a video stream to generate sound.7 A Fitts-esque exercise involves quickly and accurately putting the pen down in a zone on the tablet surface.
• News Cycle #2 also uses the buttons and sliders on an Intuos3 tablet, and requires the user to switch pens.
• N. D'Alessandro's HandSketch [8] controller uses a polar coordinate system, calibrated to the ergonomics of a performer's arm. This mapping is presented, and calibrated for individual users. Individual gestures (forearm for pitch, fingers for intensity) are practiced in isolation and in combination.
• Ali Momeni [29, 20] uses multiple interpolation spaces: one controlled by the tip of the pen and one by the tilt. While initially difficult, this complex spatial navigation scheme has huge expressive potential.
• Matthew Wright [29] employs a scrubbing metaphor, where a click on the tablet defines the material to which a long trajectory is applied.8 This method also generates multiple spaces and navigation challenges.

In addition to myself, I have already invited other members of the NIME community to contribute, and I anticipate involving additional composers in response to this paper. Études should be short, focused pieces that deal with a technical issue from the composer's musical practice. Hopefully, the pieces will be more in the model of Chopin than Czerny: fully formed pieces of music and not simply exercises.

Further important topics will be addressed in an appendix or in the Advanced Method. These include:
• Études and exercises intended for use in pairs or in larger groups; such pieces are also desirable in this section.

3 http://www.cycling74.com/
4 http://cnmat.berkeley.edu/
5 http://creativecommons.org/licenses/by-nc/3.0/
6 http://crca.ucsd.edu/~msp/software.html
7 video at: http://www.mikezed.com/music/nc2.html
8 video at: http://www.youtube.com/watch?v=4dTcSeDTq84
• Extensions to the tablet interface that employ an alternate controller in the other hand, including the QWERTY keyboard, fader boxes, and FSRs.
• Material with a more explicit connection to the use of the stylus in other arts, such as writing, drawing, and painting – calligraphy- or sumi-e-inspired études.

5. ACKNOWLEDGMENTS
Thanks to my colleagues Richard Andrews, Adrian Freed, David Wessel, and Matthew Wright, and to Wacom Co., Ltd.

6. REFERENCES
[1] Accot, J. and S. Zhai. "Performance evaluation of input devices in trajectory-based tasks: An application of the steering law." Proc. of the ACM Conference on Human Factors in Computing Systems. Pittsburgh, PA, 1999, pp. 466-472.
[2] Balakrishnan, R. and I. S. MacKenzie. "Performance differences in the fingers, wrist, and forearm in computer input control." Proc. of the SIGCHI Conference on Human Factors in Computing Systems. Atlanta, Georgia, United States, March 22-27, 1997. S. Pemberton, Ed.
[3] Boie, B., M. Mathews, and A. Schloss. "The Radio Drum as a Synthesizer Controller." Proc. of the International Computer Music Conference. Columbus, OH, 1989, pp. 42-45.
[4] Buxton, B. Sketching User Experiences: Getting the Design Right and the Right Design. Morgan Kaufmann, San Francisco, 2007.
[5] Buxton, W., R. Sniderman, W. Reeves, S. Patel and R. Baecker. "The Evolution of the SSSP Score Editing Tools." Computer Music Journal (The MIT Press: Cambridge, MA), Volume 3.4, Winter 1979, pp. 14-25.
[6] Czerny, C. The Art of Finger Dexterity. G. Schirmer, New York, 1986.
[7] Czerny, C. The School of Velocity. G. Schirmer, New York, 1986.
[8] D'Alessandro, N. and T. Dutoit. "HandSketch Bi-Manual Controller." Proc. of the New Interfaces for Musical Expression Conference. New York, USA, June 2007, pp. 78-81.
[9] Denis, G. and P. Jouvelot. "Motivation-driven educational game design: applying best practices to music education." Proc. of the 2005 ACM SIGCHI International Conference on Advances in Computer Entertainment Technology. Valencia, Spain, June 15-17, 2005, pp. 462-465.
[10] Dobrian, C. and D. Koppelman. "The 'E' in NIME: Musical Expression with New Computer Interfaces." Proc. of the New Interfaces for Musical Expression Conference. Paris, France, June 2006, pp. 277-281.
[11] Ericsson, K. A., R. Th. Krampe, and C. Tesch-Römer. "The role of deliberate practice in the acquisition of expert performance." Psychological Review, 100, 1993, pp. 363-406.
[12] Fitts, P. "The information capacity of the human motor system in controlling the amplitude of movement." Journal of Experimental Psychology, 47, 1954, pp. 103-112.
[13] Green, B. and T. Gallwey. The Inner Game of Music. New York, Doubleday, 1986.
[14] Hanon, C. L. The Virtuoso Pianist in 60 Exercises. G. Schirmer, New York, 1986.
[15] Lacy, S. Findings: My Experience with the Soprano Saxophone. Paris, Outre Mesure, 1994.
[16] Lehman, A. A. "Efficiency of deliberate practice as a moderating variable in accounting for sub-expert performance." In Deliege and Sloboda (eds.), Perception and Cognition of Music. Hove, East Sussex, Psychology Press, 1997.
[17] Liebman, D. Developing a Personal Saxophone Sound. Medfield, MA, Dorn Publications, 1994.
[18] Marino, G., M. Serra, and J. Raczinski. "The UPIC System: Origins and Innovations." Perspectives of New Music (Seattle, WA), Volume 31.1, 1993, pp. 258-269.
[19] McKim, D. J. Joseph Allard: His Contributions to Saxophone Pedagogy and Performance. Doctor of Arts Dissertation, University of Colorado, 2000.
[20] Momeni, A. and D. Wessel. "Characterizing and Controlling Musical Material Intuitively with Geometric Models." Proc. of the New Interfaces for Musical Expression Conference. Montreal, Canada, 2003, pp. 54-62.
[21] Niehaus, L. Jazz Conception for Saxophone. Hollywood, Try Publishing Company, 1965.
[22] Orio, N., N. Schnell, and M. Wanderley. "Input Devices for Musical Expression: Borrowing Tools from HCI." Proc. of the New Interfaces for Musical Expression Conference. Seattle, WA, 2001.
[23] Sand, B. L. Teaching Genius: Dorothy DeLay and the Making of a Musician. New York, Amadeus Press, 2000.
[24] Schacher, J. "Gestural Control of Sounds in 3D Space." Proc. of the New Interfaces for Musical Expression Conference. New York, USA, June 2007, pp. 358-362.
[25] Wanderley, M. "Gestural Control of Music." Proc. of the International Workshop on Human Supervision and Control in Engineering and Music. Kassel, Germany, 2001.
[26] Wessel, D. and M. Wright. "Problems and Prospects for Intimate Musical Control of Computers." ACM Computer-Human Interaction Workshop on New Interfaces for Musical Expression. Seattle, WA, 2001.
[27] Whiteside, A. On Piano Playing. Amadeus Press, Portland, OR, 1997.
[28] Wright, M. and A. Freed. "Open Sound Control: A New Protocol for Communicating with Sound Synthesizers." Proc. of the International Computer Music Conference. Thessaloniki, Hellas, 1997, pp. 101-104.
[29] Zbyszyski, M., M. Wright, A. Momeni, and D. Cullen. "Ten Years of Tablet Musical Interfaces at CNMAT." Proc. of the New Interfaces for Musical Expression Conference. New York, USA, June 2007, pp. 100-105.
[30] Zbyszyski, M., M. Wright, and E. Campion. "Design and Implementation of CNMAT's Pedagogical Software." Proc. of the International Computer Music Conference, Volume 2. Copenhagen, Denmark, 2007, pp. 57-60.
[31] "The TX-0: Its Past and Present." The Computer Museum Reports, Vol. 8. Boston, Computer History Museum, 1984.
ABSTRACT
We present an audio waveform editor that can be oper-
ated in real time through a tabletop interface. The system
combines multi-touch and tangible interaction techniques in
order to implement the metaphor of a toolkit that allows di-
rect manipulation of a sound sample. The resulting instru-
ment is well suited for live performance based on evolving
loops.
Keywords
tangible interface, tabletop interface, musical performance,
interaction techniques
1. INTRODUCTION
The user interface of audio editors has changed relatively
little over time. The standard interaction model is centered
on the waveform display, allowing the user to select portions of the waveform along the horizontal axis and execute commands that operate on those selections. This model is not very different from that of the word processor, and its basics are usually understood by computer users even with little or no experience in specialized audio tools. As a graphical representation of sound, the waveform is already familiar to many people approaching computers for audio and music composition. Thus, audio editors have become general tools used for many different applications.

Figure 1: Close view of the waveTable prototype.

Particularly interesting are creative uses of these programs that go beyond their originally devised functionality. In describing 'glitch' music, Kim Cascone wrote:

"In this new music, the tools themselves have become instruments, and the resulting sound is born of their use in ways unintended by their designers." [4]

In this sense, sound editors have proved especially useful in the production of errors and glitches and, in general, for experimental sound design. Thus, it may not be surprising that, despite the essentially non-realtime interaction model that typically governs these programs, many musicians have used them in live performances. One example was the Sound Waves live set by Saverio Evagelista and Federico Spini (9th LEM International Experimental Music Festival, Barcelona 2005), which inspired this project. The use of a standard audio editor on a laptop computer as a sophisticated looper served the performers' minimalist glitch aesthetics. Perhaps more importantly, the projected waveform provided a visual cue that helped the audience follow the evolution of the concert, a simple solution to one of the most criticized problems of laptop-based performance.

Tabletop tangible interfaces have gained popularity in recent years by allowing intuitive interaction with computers. In music performance, they bring back the visual contact with the audience that is missing in laptop music by making interaction readable [17]. Thus, the availability of low-cost means for building multi-touch and tangible interfaces opens the door to a new revision of the possibilities of direct interaction with waveforms.

In this article we describe the waveTable, a tangible sound editor that may be used as a sophisticated looping and sample-manipulation device for live performance, with an intuitive interface that provides feedback to both the performer and the audience.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy. Copyright 2008. Copyright remains with the author(s).

2. RELATED WORK
The idea of sound generation from hand-made waveforms was already envisioned in the 1920s by László Moholy-Nagy, who proposed that the incisions in wax played by the phonograph could be created by hand (quoted in [9]). In the 1970s, Iannis Xenakis' UPIC explored many different techniques for sound generation from drawings, including waveforms and envelopes [8]. Also, one of the first commercial samplers, the Fairlight CMI, included a light pen for the purpose of waveform and envelope editing. In recent years a number of systems have been developed
that exploit tangible interfaces based on commodity components for music composition and performance. Most of them are focused on synthetic sound generation or realtime sequencing, but do not directly address the problem of tangible manipulation of waveform data.

One of the applications of Enrico Costanza's d-touch [5] library, the physical sequencer, represents sound samples as tangible objects in a cyclic timeline. Sounds can be metaphorically loaded into objects through a microphone, and several effects can be applied by overdub recording. The Music Table [19] allows users to compose music patterns by placing cards on a table. Cards are tracked by a camera and displayed on a separate screen with an augmented-reality layer overlapped. Copying patterns is supported thanks to a copy card which stores patterns in phrase cards to be reused or edited at any time without requiring the presence of their note cards. The reactable [12] has become one of the most popular multi-touch and tangible tabletop instruments. This collaborative instrument implements dynamic patching in the tradition of modular synthesizers. Among many other features, the reactable allows the performer to draw the waveform of a wavetable oscillator using one finger. Looping samples is also supported with sampler objects [11]. Using the reactable technology, the scoreTable* [10] explores realtime symbolic composition on a circular stave. Being more focused on the higher-level compositional aspects of music performance, this project takes advantage of the reactable's round shape to represent a cyclic timeline, allowing users to collaborate in moving physical notes along the stave. Golan Levin's Scrapple [15] allows the generation of a spectrographic score using tangible objects laid on a long table, with an augmented-reality overlay for visual feedback. This approach seeks a compromise between compositional precision and flexibility in the tradition of spectrogram-based composition.

One common aspect of these projects is that tangible objects are used as physical representations of data. Thus, their interfaces imply that manipulating a tangible object is analogous to performing modifications on the underlying model. The main drawback of this approach is that physical objects cannot be created from scratch, nor can they be duplicated. As seen with the Music Table, this may lead to breaking the relationship between the tangible object and the digital model. We propose the utilization of tangibles as tools, which represent functions that operate on data. We show how this approach enables the implementation of basic concepts of data editing available in desktop computers in the case of a tabletop sound editor. The result is a tool that allows the user to sculpt sound in a convenient way, so that sound design becomes a realtime composition process.

3. INTERACTION DESIGN

3.1 Toolkit Metaphor
The relevance of metaphor is traditionally recognized in the field of human-computer interaction and interface design, and it is also applicable to Tangible User Interfaces (TUIs) [7]. Interface metaphors are able to communicate the way users can interact with the system, suggesting or simplifying possible actions [6]. Within tangible interfaces, it has been identified that real-world objects can be used in computer systems to couple physical and digital representations [18]. Thus, metaphor and coupling should provide meaning by helping to establish a continuous dialogue between the physical and the virtual [6]. This is accomplished in our system by metaphorically mapping tangible pucks to tools [7].

The principal metaphor chosen for interacting with the waveTable system is closely inspired by the widely used concept of a tools palette found in graphical desktop applications since the 1980s (e.g. in MacPaint or HyperCard), including some sound editors. This approach may be useful for shaping the waveform graphically employing tangible and iconic tools, establishing an interactive dialogue that uses familiar verbs and nouns (in the sense proposed in [7]). Thus, an effective toolkit is provided that can be easily exploited by musicians, experts or beginners, facilitating the act of editing sound.

3.2 Tools and Gestures
According to Bill Buxton, the natural language of interaction should deal with non-verbal dialogues and highlight gestures as phrases with their own meaning [3]. The interaction elements used in the waveTable are both physical artifacts and fingers, and the properties detected by the system are 2D position, rotation and presence of the objects, as well as one- or two-finger movements. The toolkit is composed of tools representing basic operations such as copy, paste or erase, as well as tools that represent audio effects applied in real time. The chosen mapping is one object
per tool. There are four main groups of gestures and tools, namely Editing, Effects, File and Visualization/Navigation operations, as shown in Figure 2.

Editing tools represent operations used for basic modification of the sound: Eraser, Pencil, Copy, Paste and Gain. Eraser deletes part of the sample when moving along the x axis. Pencil allows freely drawing waveforms with one finger when present. Copy stores a fragment selected by dragging over the waveform along the x axis. Paste stamps that fragment at the object position and repeats it when moved along the x axis. Gain increases or decreases the overall amplitude when the object is turned clockwise or counterclockwise.

Effects tools represent common audio effects applied to the sound in real time: Delay, Resonant low-pass filter, Tremolo, Reverb and Bit crush. In all cases position and rotation are detected, modifying respectively the position and shape of an envelope. Each envelope controls the most relevant parameter of its respective effect.

File tools are applied following the VCR metaphor (common buttons present in VCRs or CD players [6]): Open file, Play and Record. Open file allows previewing samples from a collection displayed in a radial menu, and loading one by pointing with the finger. Play reproduces the sound in a loop while it is present. Turning the object clockwise or counterclockwise increases or decreases the playback rate respectively. Record captures the output of the system in real time when present, and then swaps the playback sample for the result.

Visualization/navigation gestures and tools are concerned with displacement and zoom level: Two-finger zoom, One-finger scroll and Grid. Two-finger zoom allows navigation between the closest detail and the most general overview of the waveform, depending on the direction and distance between fingers. One-finger scroll provides the option of moving from the starting point towards the end point of the sample by scrolling right or left along the x axis. Grid shows a pattern of vertical lines in order to facilitate the task of some editing tools, such as eraser, copy or paste.

4. IMPLEMENTATION
Using computer vision software like reacTIVision [13], it is now possible to build tangible and multi-touch interfaces with low-cost components. In order to implement the concept of realtime direct manipulation of a looping sample, the waveTable system was developed as a program that runs on reactable-class hardware using the reacTIVision framework.

Figure 3: System overview.

Tools for the described operations are simple acrylic plastic pieces with fiducial markers attached on one side and descriptive icons on the other. Tools and fingers are illuminated using infrared LEDs and captured by a webcam with a visible-light blocking filter. Captured video is processed by reacTIVision, which tracks position and rotation of fiducial markers, as well as position of fingers. This information is encoded using the Tangible User Interface Objects (TUIO) protocol based on Open Sound Control (OSC) [14] and sent over UDP to the waveTable software. Visual feedback is provided through rear projection on the table surface.

Figure 4: A sample of waveTable tools.

The software is written in the SuperCollider 3 [16] language, which is based on a distributed architecture where audio synthesis is carried out by a specialized server process that is controlled using OSC. This environment allows rapid development and very easy implementation and evaluation of all kinds of effects and operations over audio buffers that can be used in real time. Moreover, the Mac OS X version provides a set of graphics primitives and a ready-made waveform display. On the other hand, the distributed nature of the system involves some complications in the synchronization of data between the server and the client. In the current prototype this limitation is overcome by using a RAM disk. Integration with reacTIVision is done through Till Bovermann's implementation of the TUIO protocol [1].

The software is logically divided into control, model and view modules. The control module is composed of a hierarchy of TUIO objects that handle each of the tools, and a class that handles TUIO cursors (fingers). The model is implemented as a SuperCollider server node tree that runs synth definitions for playing the sound buffer and dynamically applying effects. An overdub record synth definition allows swapping the playback buffer with a recording of the output. The view module is also composed of a hierarchy of objects that implement the graphic representation of tools and envelopes, and a container view that manages the main waveform display.

5. USAGE
The resulting prototype makes it possible to modify a sound sample using fingers and tangible artifacts at the same time it is being played in a loop. This is accomplished by loading a sample with the Open file tool. To start from scratch by drawing a waveform, a sample filled with silence of the desired length may be loaded. The waveform of the sample is then projected onto the table, and can be zoomed and scrolled using finger gestures. Locating the Play tool starts looping the sound, and rotating it modifies the playback rate.
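The control module described above receives TUIO "set" messages (one per tracked fiducial per frame) and routes them to the object that handles the corresponding tool. As a rough illustration of that dispatch step, here is a minimal sketch in Python rather than the paper's SuperCollider; the fiducial-ID numbers, tool names, and function names are all hypothetical, not taken from the waveTable code:

```python
# Hypothetical fiducial-ID -> tool-name table; the real system maps
# reacTIVision marker IDs to its own tool set.
TOOLS = {0: "eraser", 1: "pencil", 2: "copy", 3: "paste", 4: "gain",
         10: "play", 11: "record", 12: "open_file"}

class ToolState:
    """Last known position/rotation of one tangible tool on the table."""
    def __init__(self, name):
        self.name = name
        self.x = self.y = 0.0   # normalized surface coordinates in [0, 1]
        self.angle = 0.0        # rotation in radians

def handle_2dobj_set(states, session_id, class_id, x, y, angle):
    """Handle one TUIO /tuio/2Dobj 'set' event (position + rotation).

    Creates a ToolState the first time a session ID is seen, updates it
    on later events, and returns it; returns None for fiducials that are
    not mapped to a tool.
    """
    name = TOOLS.get(class_id)
    if name is None:
        return None
    state = states.setdefault(session_id, ToolState(name))
    state.x, state.y, state.angle = x, y, angle
    return state
```

In a full client these handlers would be driven by an OSC/UDP listener, and each updated ToolState would trigger the matching action (e.g. a rotation delta on the "gain" tool scaling the overall amplitude).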
ABSTRACT
This paper describes a new algorithmic approach to instrumental musical composition that will allow composers to explore in a flexible way algorithmic solutions for different compositional tasks. Even though the use of computational tools is a well-established practice in contemporary instrumental composition, the notation of such compositions is still substantially a labour-intensive process for the composer. Integrated Algorithmic Composition (IAC) uses a fluid system architecture where algorithmic generation of notation is an integral part of the composition process.

Keywords
Algorithmic composition, automatic notation

Figure 1: Rackbox vs. fluid architecture.
1. INTRODUCTION
Algorithmic composition can be defined as a composition practice that employs formalized procedures (algorithms) for the generation of the representation of a musical piece. Apart from the many ante litteram examples, algorithmic musical composition has been proposed and practiced widely starting from the '50s. In particular, from the late '50s a computational perspective started spreading across the two Western continents (see [1] for a detailed discussion). An interesting shift in perspective has occurred roughly from the '60s up to the present day. The first approaches to algorithmic composition were driven by instrumental scoring. But, even if computer tools are largely widespread in contemporary instrumental scoring through computer-assisted composition systems (henceforth CAC; e.g. PatchWork, Open Music [2], PWGL [9], but also Common Music [11]), the idea of a purely algorithmic approach, in which a strict formalization rules the whole composition process, is no longer pursued in its entirety and has migrated from the instrumental domain to the electroacoustic one. In fact, considering the final output of the composition process, while in the electroacoustic domain the synthesis of the audio signal is a trivial task per se, in the instrumental domain the generation of musical notation still remains a very difficult task ([4], [10]). This notational issue has prevented the diffusion of real algorithmic practice in instrumental composition. Such an approach, in which the composition process is turned into a completely algorithmic workflow – from the first idea to the final score – can be defined as Integrated Algorithmic Composition (IAC). An IAC approach pursues the integration of notation generation with musical data manipulation, so that any manual process can be removed from the composition pipeline.

The paper is organized as follows: first, IAC/CAC approaches are discussed in relation to different software architectures; then, the need for a specific architecture is motivated in relation to automatic generation of music notation; finally, two cases are presented.

2. RACKBOX VS. GLUE ARCHITECTURES
CAC systems are intended to aid the composer in the computational manipulation of musical data: these data, in the end, can be exploited in traditional score writing. Typically based on Lisp, CAC systems offer a large body of functionalities: pitch/rhythm operations remain the core of the system, with the inclusion of input modules for audio analysis and sound synthesis modules in output. All these functionalities are typically accessible through a GUI environment. While the GUI is the main interface to the system, offering easier access for the less programming-oriented composer, a high degree of flexibility is offered by enabling the user to extend the program via the Lisp language. Still, CAC architectures are based on the assumption that new functionalities must in some way be adapted to the hosting environment. A CAC application architecture can be thought of as a rackbox containing a certain number of modules (Figure 1, left): the box can leave large room for other modules to be inserted in it. Still, the container is solid and consequently rigid, its capacity is finite, and the modules, in order to be inserted, must meet the requirements of the box geometry. By reversing the perspective, a different approach to computer-based composition environments can
Music engraving by LilyPond 2.5.29 — www.lilypond.org
Music engraving by LilyPond 2.5.29 — www.lilypond.org ppp Music engraving by LilyPond 2.5.29 — www.lilypond.org Music engraving by LilyPond 2.5.29 — www.lilypond.org
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Music engraving by LilyPond
Music engraving 2.5.29 2.5.29
by LilyPond — www.lilypond.org
— www.lilypond.org
Music engraving by LilyPond 2.5.29 — www.lilypond.org
Music engraving by LilyPond 2.5.29 — www.lilypond.org Python Music engraving by LilyPond 2.5.29 — www.lilypond.org
Music engraving by LilyPond 2.5.29 — ww
p
.063 ppp
Music engraving by LilyPond 2.5.29 — www.lilypond.org
vertices
.061 Music engraving by LilyPond 2.5.29 — www.lilypond.org Musi
2 notation
.044 Music engraving by LilyPond 2.5.29 — www.lilypond.org
3
Music engraving by LilyPond 2.5.29 — www.lilypond.org
ppp
.141 Labels (edges) Music engraving by LilyPond 2.5.29 — www.lilypond.org
Music engraving by LilyPond2.5.29 — www.lilypond.org
Music engraving by LilyPond 2.5.29 — www.lilypond.org
.124
[Figure 3: A graph model is fed into Scalptor, gluing LilyPond and ConTeXt to generate a graphic notation. The score excerpt carries performing annotations such as "Prestissimo possibile, ma preciso" and, translated from Italian: "On each edge, the label indicates the value to which the last note of the vertex the edge starts from must be tied. Everything must be played two octaves higher (15ma)."]
provided with some drawing capabilities. Examples of architectures implementing IAC systems are described in [13] (where they are referred to as Automatic Notation Generators), [5], and [3]. In the rest of the paper, we discuss two cases of IAC where different fluid systems are designed to fit different needs, allowing for complete algorithmic control over the final score.

4. GRAPHIC NOTATION
In the first project, the final score (for piano solo) is composed of a page in very large format (A0) containing graphical notation. The formal composition model is a graph, and the notation visually mirrors the graph structure (Figure 3). All information associated with the graph data structure in the model has to be mapped into music/notation information, so that notation can be generated automatically. The score is made up of musical notation (vertices and edge labels), graphics (graph drawing), and text (performing annotations) (Figure 3, right): all these components must be provided by programmable modules and their output integrated in a unique document. A strong constraint is that musical tradition requires high typographic quality, both for the overall document and for the specific musical notation elements. As all the involved components are alphabetic or geometric, vector graphic solutions are consequently needed. In general, as standard GUI applications are not relevant here, the possible candidates share a TeX-based approach ([8]), i.e. they are command languages, to be input via a textual interface and compiled in order to generate vectorial output. Concerning musical notation, among the possible candidates (for a review see [10]), LilyPond, while still sharing a TeX-oriented approach, ensures very high typesetting quality but at the same time can be tailored for advanced uses, has a simple, human-readable syntax, has undergone fast development, and is now the most common text-based music notation application. LilyPond scripting solves the problem of generating standard notation for the vertices of the graph, but the resulting files (one for each vertex, in pdf/ps format) must then be included into the drawing of the graphic notation. It is interesting to see that many candidates fit fluidly in this case. LaTeX and ConTeXt are two typesetting systems for document preparation implemented as a set of TeX macros. Both can work together with advanced graphic packages. ConTeXt ([6]) has been chosen as it provides direct support for the MetaPost graphic language and extends it by adding a superset of macros (named "Metafun") explicitly oriented towards design drawing (e.g. allowing pdf inclusion). For this particular project, Python has been chosen as the gluing language: it has a remarkably clear syntax and meets all the previously discussed requirements for an IAC language. Python handles all composition data processing, i.e. the graph generation and manipulation algorithms, as well as the gluing, scripting process. The Python module, named Scalptor ("engraver"), generates the score by writing text files containing code for each of the involved modules and calling each module in order to render it.

5. SPECTRAL COMPOSITION
As previously noted, an IAC system should provide room for inserting modules specialized in audio analysis. Analysis parameters can then be processed and used as starting material for musical composition. Figure 4 represents an implementation of an IAC fluid system for a composition project involving parameter extraction from audio signals. In particular, the commission was to use as starting material an excerpt from Sophocles' Antigone, which was read by a philologist so as to respect as closely as possible the reconstructed classical Greek pronunciation. Three voices sing melodies generated from data resulting from the analysis of the original audio file, in particular from the fundamental frequency and the first two formants. The Praat software has been chosen for the analysis task, as it is specialized in phonetic processing. The SuperCollider application ([12], henceforth SC) has been chosen both as system glue and as an audio module: as a language, SuperCollider is rich in data structures, highly expressive, provides an interface to the OS environment, and allows for string manipulation; as an audio server, it provides state-of-the-art sound processing. Most importantly, from a UI perspective, SC allows for interactive sessions and also provides programmable GUIs.
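The gluing workflow described above (a Python module writing LilyPond and ConTeXt source files, then invoking each renderer in order) can be sketched as follows. This is a minimal illustration, not the actual Scalptor code: the file layout, template strings, and function names are assumptions; only the overall write-then-render pattern is taken from the text.

```python
import subprocess
from pathlib import Path

# Minimal templates; double braces escape literal braces for str.format().
LILY_VERTEX = r"""\score {{ {music} }}
"""

CONTEXT_DOC = r"""\starttext
{figures}
\stoptext
"""

def write_vertex(workdir: Path, vertex_id: int, music: str) -> Path:
    """Write one LilyPond source file for a graph vertex."""
    src = workdir / f"vertex{vertex_id}.ly"
    src.write_text(LILY_VERTEX.format(music=music))
    return src

def write_document(workdir: Path, pdf_names: list) -> Path:
    """Write a ConTeXt document that includes the rendered vertex PDFs."""
    figures = "\n".join(rf"\externalfigure[{name}]" for name in pdf_names)
    src = workdir / "score.tex"
    src.write_text(CONTEXT_DOC.format(figures=figures))
    return src

def render(workdir: Path, vertices: dict) -> None:
    """Call the external renderers in order (assumes the lilypond and
    context executables are on PATH)."""
    pdfs = []
    for vid, music in vertices.items():
        src = write_vertex(workdir, vid, music)
        subprocess.run(["lilypond", str(src)], cwd=workdir, check=True)
        pdfs.append(src.with_suffix(".pdf").name)
    doc = write_document(workdir, pdfs)
    subprocess.run(["context", str(doc)], cwd=workdir, check=True)
```

The key design point, per the paper, is that every stage is a plain text file, so each intermediate artifact can be inspected and versioned.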
255
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
[Figure 4: An IAC fluid system for spectral composition. Audio data is input to Praat for audio analysis; SuperCollider handles compositional data processing, transcription, and audio synthesis.]
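As an illustration of the kind of pitch-data processing such a system performs, the sketch below quantizes a Praat-style fundamental-frequency track (frames in Hz, with None for unvoiced frames) into equal-tempered MIDI notes and collapses repeats. The function names and this particular reduction are illustrative assumptions, not the project's actual transcription algorithm.

```python
import math

def freq_to_midi(f_hz: float) -> int:
    """Quantize a frequency in Hz to the nearest equal-tempered
    MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(f_hz / 440.0))

def to_melody(pitch_track):
    """Collapse a frame-by-frame pitch track into a note sequence,
    dropping unvoiced frames (None) and immediate repetitions."""
    melody = []
    for f in pitch_track:
        if f is None:
            continue
        note = freq_to_midi(f)
        if not melody or melody[-1] != note:
            melody.append(note)
    return melody
```

For example, a track hovering around A4 and then jumping to C5 reduces to the two notes 69 and 72.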
[…] before, a transcription module is responsible for the generation of LilyPond files, which can then be rendered to the final pdf score file. The transcription algorithm also performs a melodic contour evaluation on the input data, so that continuous pitch increases/decreases are converted into ascending/descending […]. As […] actually does not support Unicode, the LilyPond file has been post-processed by a Python module replacing special ASCII string sequences with the necessary Unicode glyphs. SC provides facilities to sonify all the data in real time, i.e. before and after processing, and, through GUI packages, the same data can be displayed on screen. For purposes of documentation, high-quality vector graphics have been generated by writing the appropriate modules in SC. Such modules allow interfacing with Praat, which is able to create graphics from all its data, and with the PyX Python graphics package, which has been used to plot compositional data structures. Figure 5 shows (from top to bottom) a GUI from SC plotting formant data, the same data exported by Praat into an eps file, and an excerpt from the final score by LilyPond. This rich system […] gained by using an interactive language as a system glue.

[Figure 5: Different outputs. SC GUI, Praat graphics, LilyPond notation.]

7. REFERENCES
[1] G. Assayag, C. Rueda, M. Laurson, C. Agon, and O. Delerue. Computer-assisted composition at IRCAM: From PatchWork to OpenMusic. Computer Music Journal, 23(3):59–72, 1999.
[2] […]
[3] T. Bača. Re: Lilypond for serial music? LilyPond mailing list (lilypond-user@gnu.org), Nov. 28, 2007.
[4] D. Byrd. Music notation software and intelligence. Computer Music Journal, 18(1):17–20, 1994.
[5] N. Didkovsky. Java Music Specification Language, v103 update. In Proceedings of the International Computer Music Conference 2004, Miami, 2004.
[6] H. Hagen. ConTeXt the manual. PRAGMA Advanced Document Engineering, Hasselt NL, 2001.
[7] K. Hamel. A design for music editing and printing software based on notational syntax. Perspectives of New Music.
[8]–[12] […]
[13] H. Wulfson, G. D. Barrett, and M. Winter. Automatic notation generators. In Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 2007), New York, 2007.
Andrea Valle
CIRMA, Università di Torino
via Sant’Ottavio, 20
10124 - Torino, Italy
andrea.valle@unito.it
ABSTRACT
This paper is about GeoGraphy, a graph-based system for the control of both musical composition and interactive performance, and its implementation in a real-time, interactive […]

[Figure: a graph over time t, showing vertex labels (vLab: woodLow, woodHi), vertex IDs (vID: 1, 2), and edge durations (eID: eDur, e.g. 4: 1, 3: 0.5, 1: 0.7, 2: 1.2).]
[…]becomes the origin of a path); then the actant navigates the […]
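The vertex/edge scheme shown in the figure (vertex labels and IDs, edge durations) and the actant navigation just mentioned can be illustrated with a toy walk. All names and data structures here are hypothetical stand-ins for the actual SuperCollider GeoGraphy classes:

```python
import random

# Hypothetical stand-ins for the GeoGraphy data: each vertex has a
# label (vLab) and a duration; each edge carries a duration (eDur),
# read here as the delay until the next vertex is activated.
vertices = {1: ("woodLow", 1.0), 2: ("woodHi", 0.5)}
edges = {1: [(2, 0.7)], 2: [(1, 1.2), (2, 0.5)]}

def navigate(origin, steps, seed=0):
    """An 'actant' walks the graph from an origin vertex, emitting
    (onset_time, label, duration) events."""
    rng = random.Random(seed)
    t, vid, events = 0.0, origin, []
    for _ in range(steps):
        label, dur = vertices[vid]
        events.append((round(t, 3), label, dur))
        vid, delay = rng.choice(edges[vid])  # pick an outgoing edge
        t += delay  # edge duration = delay to the next vertex onset
    return events
```

Starting from vertex 1, the first two events are woodLow at time 0 and woodHi 0.7 seconds later, after which the walk branches randomly.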
Figure 4: GeoGraphy GUIs. 1. Runner GUI, 2. Painter, 3. SC Post Window, 4. GraphParser GUI.
ties. Interactive graph manipulation can mix iXno scripting with GUI control through the Painter space. GUI creation/deletion can be scheduled by SC programming. As iXno is defined on top of the SC GeoGraphy classes, it represents a simplified interface to SC classes which can be used directly in SC programming, by constructing strings and passing them to GraphParser instances. Through iXno, code snippet readability is highly improved with respect to SC-only code. This is useful for live coding both in terms of typing speed and code management.

5. CONCLUSIONS
The real-time, interactive implementation of the GeoGraphy system aims at merging different attitudes towards composition. Real-time usage has required the development of different interfacing systems: GUI, code and script. This has been possible through a strictly modular architecture, which favors the insertion of modules specialized for different tasks. The resulting three-layer structure has proven useful in allowing the user maximum control flexibility, providing a smooth transition between the two extremes of code typing and GUI. In this way, it is possible for the musician to merge features typical of "out-of-time" algorithmic composition with real-time control over performance.
GeoGraphy is distributed as a "quark", a public SC extension (see http://quarks.sourceforge.net/). See also http://www.cirma.unito.it/andrea/notation/.

6. ACKNOWLEDGMENTS
Thanks are due to Vincenzo Lombardo for his continuous support.

7. REFERENCES
[1] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley, 1995.
[2] E. Gansner, E. Koutsofios, and S. North. Drawing graphs with dot, 2006.
[3] G. Wang and P. Cook. ChucK: A concurrent, on-the-fly audio programming language. In Proceedings of the International Computer Music Conference (ICMC), Singapore, 2003.
[4] T. Magnusson. The ixiQuarks: Merging code and GUI in one creative space. In Proceedings of the International Computer Music Conference 2007, Copenhagen, August 27-31, 2007.
[5] C. Nilson. Live coding practice. In NIME '07: Proceedings of the 7th International Conference on New Interfaces for Musical Expression, pages 112–117, New York, NY, USA, 2007. ACM.
[6] E. Pietriga. A toolkit for addressing HCI issues in visual language environments. IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC), 00:145–152, 2005.
[7] C. Roads. The Computer Music Tutorial. The MIT Press, Cambridge, Mass., 1996.
[8] J. Rohrhuber, A. de Campo, R. Wieser, J.-K. van Kampen, E. Ho, and H. Hölzl. Purloined letters and distributed persons. In Music in the Global Village Conference, Budapest, December 2007.
[9] A. Sorensen. Impromptu: An interactive programming environment for composition and performance. In Proceedings of the Australasian Computer Music Conference 2005, pages 149–153. ACMA, 2005.
[10] H. Taube. An introduction to Common Music. Computer Music Journal, 21(1):29–34, 1997.
[11] A. Valle and V. Lombardo. A two-level method to control granular synthesis. In Proceedings of the XIV CIM 2003, pages 136–140, Firenze, 2003.
[12] S. Wilson, D. Cottle, and N. Collins, editors. The SuperCollider Book. The MIT Press, Cambridge, Mass., 2008.
[Figure 5: Inputs and Outputs in a Script GUI]
[Figure 7: The Resource Pane Window]
2.3.1 Extensions of the Interconnection Scheme
Interconnections are not limited to audio signals, but can also be created for control signals. Additionally, there is a similar scheme for linking scripts so that they can exchange messages or function calls. This is implemented by attaching editable pieces of code, called "snippets", to scripts, which can be used to further control or automate the script's behavior.

3. RESOURCE MANAGEMENT
A characteristic difference between experimental, programmable development environments and commercial tools for image or sound processing is the relative lack of management facilities in the former. Applications such as Final Cut Pro, DVD Studio, Logic Audio, Cubase etc. use their own file formats for saving "project data", which include settings such as the paths of the audio files used, processing data on the files, etc. One of the objectives of the present work is to provide such resource management facilities to SuperCollider. The usefulness of such facilities is easy to demonstrate: when experimenting with several scripts that require synthesis algorithms, buffers, and bus interconnections, it is convenient to be able to save the configuration of scripts, buffers, synthesis algorithms and interconnections to file with a mouse click. This is implemented in Lilt by the concept of a Session. A session saves all the above data as a Script that can recreate the session's elements. The Script is generated in SuperCollider code and can therefore be inspected by the user.

4. APPLICATION EXAMPLE: AN AUDIOVISUAL SEQUENCE
The tools described above are currently being evaluated for application in mixed media for artistic production and for education. Figure 6 shows the results of work done by two students, Alexandros Synodinos and Christos Mousas, at the Department of Audiovisual Arts at the Ionian University as part of 4th-year undergraduate coursework. These students had no experience in programming at all.

[Figure 8a: stages 1–4 of an algorithmic audiovisual piece]
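The Session mechanism of Section 3 (serialize the current configuration as a script that, when executed, recreates it) can be sketched as follows. This is a toy Python illustration of the pattern only: Lilt itself emits SuperCollider code, and the resource types and function names here are hypothetical.

```python
def save_session(buffers, synths, links):
    """Generate a script (plain source text, standing in for the
    SuperCollider code Lilt emits) that recreates the session."""
    lines = ["session = {'buffers': [], 'synths': [], 'links': []}"]
    for path in buffers:
        lines.append(f"session['buffers'].append({path!r})")
    for name in synths:
        lines.append(f"session['synths'].append({name!r})")
    for src, dst in links:
        lines.append(f"session['links'].append(({src!r}, {dst!r}))")
    return "\n".join(lines)

def load_session(script):
    """Recreate the session by executing the generated script; being
    plain source, the script can be inspected by the user first."""
    env = {}
    exec(script, env)
    return env["session"]
```

Round-tripping a configuration through `save_session` and `load_session` reproduces the original scripts, buffers, and interconnections, which is exactly the property the Session concept relies on.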
[Figure 8b: Further Stages of an Algorithmic Audiovisual Piece (stage 6)]

The examples of Figures 8a and 8b show several phases in the unfolding of an algorithmically composed audiovisual piece running on Processing and SuperCollider. It is visible how the students created a work with several distinct sections, starting […]

7. REFERENCES
[1] Wright, M. and Freed, A. Open Sound Control: A New Protocol for Communicating with Sound Synthesizers. Proceedings of the 1997 International Computer Music Conference, Thessaloniki, Hellas (Greece), 1997, 101-104.
[2] Alvaro, J., Miranda, E. and Barros, B. EV Ontology: Multilevel Knowledge Representation and Programming. Proceedings of the 10th Brazilian Symposium on Computer Music (SBCM), Belo Horizonte (Brazil), 2005.
[3] Papert, S. Mindstorms: Children, Computers, and Powerful Ideas. Basic Books, N.Y., 1980.
ABSTRACT
Through the development of tools for analyzing performers' sonic and movement-based gestures, research into the system-performer interaction has focused on the computer's ability to respond to the performer. While such work shows interest within the community in developing an interaction paradigm modeled on the player, by focusing on the perception and reasoning of the system this research assumes that the performer's manner of interaction is in agreement with this computational model. My study presents an alternative model of interaction designed for improvisatory performance, centered on the perception of the performer as understood by theories taken from performance practices and cognitive science.

Keywords
Interactive performance, Perception, HCI

1. INTRODUCTION
For the past two decades, composers have been designing interactive music systems that are often viewed as new musical instruments, or as an emulation of a player or conductor [10][3]. As processor speeds increase, the systems being designed not only produce more complex sounds, but also generate responses and analyze performers' gestures with increasing sophistication. In conjunction with these developments, increasing amounts of intelligence and autonomy are being built into systems for use in a variety of performance situations including improvisation. But as the autonomy of these systems increases, it may be necessary to reconsider the models used for designing the interaction. Research into the performer-system interaction has focused largely on the computer's ability to respond. As composers explore giving agency to the computer, the performer is being required to be responsive. This study addresses a number of issues that lead towards constructing a framework for developing a performer-based model for improvisatory interaction.

2. TRADITIONAL MODELS
In the early years of interactive music, Robert Rowe proposed a distinction between an instrument paradigm and a player paradigm as one axis along which we could place different interactive systems [7]. Rowe suggests, "Instrument paradigm systems are often more concerned with timbral generation, while the player paradigm requires the use of some meta-compositional generation method to produce musical output" [10]. His taxonomy, though mainly focused on the process used to generate the system's response, implies a consideration of a fundamental difference in the manner of interaction. The instrument paradigm suggests devices used for direct control of synthesis and low-level parameters (pitch, volume, on/off), while the player paradigm generally involves sensors that allow for the mapping of larger performative gestures to global parameters.

Improvisational music systems often implement elaborate sensors and algorithms for analyzing the physical and sonic gestures of the performer [7][14][11]. The assumption underlying this approach is that much of the communication between performers, and in particular musicians, is through the context and syntax of their sonic response. This argument is not wholly untrue and has produced some very accomplished systems. However, musicians tend to play within what might be termed social contact with each other. With this term I refer to communication modes, such as eyesight, that are separate from the act of playing, but I also intend to bring attention to the social aspects of music that give it common ground with other performance disciplines.

Other interactive music projects have expanded the mode of communication to explore other cues such as visual movement cues [14], acoustic variation [7][14] and multisensory multimedia [3], often in the context of interdisciplinary performance. Exploration in multimedia and interdisciplinary interaction has found that the system-performer communication cannot rely on the syntax of a particular performance domain but rather must be expanded to general expressive gestures.

This research into performer-system communication shows an interest in developing a player paradigm for interaction. However, the research has focused on the perception of the system, thus ignoring aspects of human communication. This study presents an alternative model of interaction appropriate for improvisatory performance by examining theories taken from performance practices and cognitive science to focus on the performer's ability to perceive intention.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 5-7, 2008, Genova, Italy
Copyright remains with the author(s).

3. BACKGROUND THEORY
Communication in performance is an inter-subjective phenomenon where understanding is agreed upon by the agents involved in the moment. As Lockford and Pelias explain:
"Even when faced with the challenge to perform in an unscripted moment, performers understand that they are engaged in an ongoing communicative exchange. This exchange is a process best conceived, not as an act of information transmission or shared understanding, but, as communication scholar H. L. Goodall, Jr. would have it, as an act of 'boundary negotiation'." [8, p. 433]

Here "boundary negotiation" refers to the process of the self of the performer being incrementally built within the context of the performance. In a theatrical sense, this is the build-up of character as new information is revealed in the scene. In a musical sense, the negotiation is between soloist and accompanist over harmonic extensions and rhythms that occur during a particular solo. Such a negotiation implies that the agent must be able to respond to new information while simultaneously presenting information to contribute to the self of other agents. Negotiation in these terms is a coordination of the interaction between agents [8]. It becomes imperative that all agents are able to negotiate the coordination of their intention and are therefore able to track the intention of the others.

The importance of the agent's ability to track intention can be clearly seen when considering the notion of trust. Since the agents constitute themselves and each other through the negotiation of boundaries [8], this inter-subjective communication requires a sense of trust. For a performer to be open to constituting their performance identity anew in negotiation with others on the stage, they must trust the environment. Furthermore, a sense of support is established when their actions both affect and support other agents. Again, this support comes from trust in the inter-subjective understanding of the moment. This understanding keeps the ensemble synchronized, but requires that all the performer-agents are able to track the intention of the others. Therefore, it becomes imperative that all agents be able to project their own intentions.

3.2 Agencies and State Knowledge
Bogart and Landau coach students of improvisation to "trust in letting something occur onstage, rather than making it occur" [1]. Applicable to both sonic and physical gestures, their statement does not mean that nothing should be started, but rather that a start should not be forced. We might call this an additive approach, where "additive" suggests that the agency is added to the state of the system, whether it is in a steady state or a dynamic state. The implications of this view can be seen when considering the response of the performer rather than the system. To trust in the something that will happen is to coordinate the actions, adding to the action of the system. This cannot be done in response. The improviser must move beyond the cognitive and trust in the intuitive [8].

3.3 Intuition and Intention
Research in the field of neuroscience has recently suggested links between intuition and intention. Neurons found in pre-motor areas of the brain have been shown to fire not only when producing a sound or action, but also when the subject hears the sound or observes others doing the action [5][9][6]. The firing of these neurons allows the subject to predict the outcome of their own actions as well as the actions of others. "This implicit, automatic, and unconscious process of motor simulation enables the observer to use his/her own resources to penetrate the world of the other without the need of theorizing about it" [4]. What is crucial to this phenomenon is that the action observed must be goal-oriented, that is, it must have intention [5][9][6].

However, there is some question as to the usefulness of mirror neurons in human-computer interaction. The findings to date concerning a person's ability to perceive intention in others suggest that the ability diminishes in correspondence to the physical similarity with the other. This means that a human subject perceives the intention of other humans, but less so of apes, only slightly of other animals, and not at all of machines [4][5]. The prevalent reason given for this distinction is a perceived similarity of motion [5]. It is then unclear whether a system's response actions, even if accurately modeled on human action, would affect the pre-cognitive process of a subject.

Still, the presence of the pre-cognitive function implies that the human cognitive system as a whole works in connection with this mechanism, and that even at a cognitive level, interaction is governed by the prediction of events as much as or more than reaction to events, an interpretation supported by the presented theories on improvisational performance. These findings suggest that as social beings we have developed the ability to intuitively predict the actions and sounds of those around us.

The idea that human action and intention happen before the act has been shown in other experiments as well. Wegner, in his book "The Illusion of Conscious Will", presents the work of Kornhuber and Deecke (1965) as well as Libet (1983). These researchers measured a rise in brain activity up to 800 ms before an action took place. In the case of Libet's experiments, brain activity was recorded over 300 ms before the subject was even aware they wanted to act [13].

These findings further indicate that humans do not live in a static present moment but rather in a moment becoming the next. Our social engagements are informed by an embodied empathy that allows minor predictions of those around us. We react not in the moment but in the moment next, over half a second later.

4. PERFORMER MODEL
The theories presented give an understanding of the role of perception of intention in human interaction. Based on these theories, I suggest that a framework for interaction between autonomous agents should address:
1) The need to negotiate boundaries and build trust with others.
2) The development of an inter-subjective understanding of the moment.
3) The need to feel supported through one's agency and acceptance in the environment.
I propose that these criteria may be addressed by incorporating into the system a mechanism to allow the performer to perceive the system's intention. Therefore, I have started a series of studies looking at the experience of the performer working in a system designed to project its intention.

5. SYSTEM DESIGN
The system used to conduct the study took two forms, visual and sonic. Both systems were constructed through an iterative design
process using a first-person methodology. In order to focus the study on methods for modeling an embodied projection of the system's intention, gestures in both systems were generated with simple random processes, avoiding any signifiers that may come from structure or syntax, and the system was allowed to enact its own "intention" with no sense of the performer. The response paradigms chosen for both test systems were informed by human response and perception behaviors but were not meant to mimic them. Finally, the research was set up as studies into the experience of a subject being afforded the ability to move with the system. No expectation of creation or performance was imposed.

5.1 Visual System
The response gestures in the visual system were realized using an image of two concentric circles generated in MAX/Jitter. This image was projected onto the floor of the performance space using an I-CUE DMX-controllable mirror. The behavior of the system was set so that the inside circle needed to move off center for the entire image to move in the space. Stopping required the circle to return to the center. The direction and amount that the circle moved off center corresponded to the direction and speed at which the image was about to move. The time required for the inner circle to reach its maximum point was set at 200 ms, in line with the research presented by Wegner. The movement of the light object was constrained using a dynamic weighted random algorithm. The probability of the light moving in any direction was a function of its position in the space.

5.2 Sonic System
The sonic version of the study was modeled on the common idea that breath can be used to synchronize a group. The system used a physical model of a flute constructed in the PeRColate synthesis library for MAX/MSP [12]. Each session explored different approaches to perceiving information embedded in different parts of the breath sound. The information was embedded by manipulating the parameters of the flute model to get different qualities of breath sounds before and after the tone. The timings of these different breath qualities in each session were functions of the generated gesture's length, density and speed.

6. QUALITATIVE DATA
6.1 Visual System
I spent a number of sessions working in the system to feel the experience of being in the space with it. As might be expected, it was easy to anthropomorphize the light. I perceived its motion as a nervous, exploring intention, even though I knew the movements were random. Still, it quickly became apparent that the system had no sense of my presence. This had been part of the design; however, it was interesting to note how easily I perceived the design as experience. Furthermore, this perception profoundly changed the quality of the interaction from the intended design model of tag to one of playing in ocean waves or taunting a blindfolded partner. My perception of the system's movement intention, stalking and lunging with no focus on me, inspired a sense of teasing. I noticed myself considering which way the system was "thinking of moving" and circling to the other side just out of "reach". The random process used for starting and stopping also produced occasional motions perceived as "fakes", in which the Light Actor moved its "weight" in one direction then immediately moved it back to a center position. This emergent behavior was of special interest. The perception that I could tell where it was "thinking" of moving encouraged me to get close, but the impression that it could "change its mind" kept up my interest in the engagement.

6.1.1 Test with non-projecting system
Some time was spent comparing the system with and without the center circle active. Without the center active I noticed I was not inspired to get close to the light, and my willingness to engage with the system was shorter. Similarly, I noticed that when the response behavior was tuned to give fewer fakes, the movements became easier to predict, but the interaction became less engaging in the context of a tag paradigm.

6.1.2 Moving with the light
During a second session I focused on moving with the light rather than avoiding it. At first I changed only my behavior; the system's behavior pattern remained the same as before. However, I found this interaction very unsatisfying. Although I could tell where the light was going, I had very little time to coordinate my own movements. The interaction quickly became a dodging rather than a moving with.

The behavior settings of the system were then changed to generate movements that tended to be longer, with fewer "fake" motions. These changes were modeled after mirroring exercises in which human partners try to mimic each other's motion without a sense of leading. In these exercises, fluid, often slow, predictable motions are emphasized. With the system's behavior modeling mirror exercises, I found the interaction with the light more of a moving-with experience. However, the quality of my movement remained at a "proof of concept" level. The interaction did not inspire flow or exploration in my own movement.

6.1.3 Shape
As a final note, I noticed that the circle-inside-a-circle design had more the top-down look of a joystick than of a human. I tried giving it a more human shape by using ovals rather than circles, but found the oval shape less engaging than the circles. Though this can be explained by the fact that an oval implies a direction and the system was not programmed to take the direction of the image into account, my experience suggests that the circle configuration, though endowed with behavioral characteristics, remained a spot of light. My perception of the object combined "lightness" with behavior and did not need to construct a new humanoid entity.

6.2 Sonic System
The audio-based system had a different initial impact. Whereas the visual system had inspired an avoidance response and, only after being re-modeled, produced a moving-with response, my experience was that the breath model in the audio-based system immediately inspired a moving-with response. The randomness of the gestures had less of an effect, perhaps because there were no fake gestures produced by the sonic system. The breath sound in the first session was linked to the duration of the generated phrase and produced a feeling of lift into the tone of the sound. This feeling of lift encouraged my motion with the onset of the sound even though I had no knowledge of when it would happen.
267
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Through reflecting on my response I noticed two parts to the breath generated by the physical model: the inhale and the stream focusing. I was lifting on the inhale but moving on the focusing change of breath just before the flute tone. This discovery inspired a series of sessions exploring the breaking of the breath sound into three parts: inhale, focused airstream and breath trail-off. Considering that a breath into a beat is often used to signal a downbeat, and that more air is needed to play longer phrases, I mapped inhale duration to tempo and inhale volume to phrase duration. This mapping frequently allowed me to anticipate the tempo of the phrase and move with it, but only within a small range of values. However, when inhale duration was a function of phrase length I found that I moved with the sound without much thought. The mapping of duration to tempo effected in me a more rational approach to moving.

7. DISCUSSION
The literature and theories presented in this paper suggest that human interaction is not restricted to reacting to enacted events. Instead, as social beings, our interactions include the understanding and prediction of events through the perception of the intention of others in the environment. From these theories, I have suggested a framework for interaction, modeled around the abilities and needs of a performer pertaining to perception of intention. The crucial point is that all agents in the environment need to be able to perceive the intentions of the other agents. The framework that I am constructing has a crossover with the “Player” paradigm of interaction, first suggested by Rowe, in that agency is being given to the system. However, the proposed framework differs from Rowe’s paradigm by focusing on interaction through the perception of intention rather than through a process for responding.

In order to demonstrate the implications of this approach in the context of both sonic and physical interaction, I have discussed two example systems: a visual-based system and a sonic-based system. Both systems were designed around the claim that the performer needs to be able to perceive the intention of the system in anticipation of any action. The result of my studio work indicates that both visual and sonic systems provide the opportunity to embed information in the system’s response media, projecting the system’s general intention. Analyses of the results indicate further that the two systems share many of the same issues. The cognitive load imposed on the performer when trying to predict the action of the system was revealed as an issue when using analytical models to indicate intentions. These models were most prevalent in the sonic system, and yet a similar effect was observed in the visual system. Both systems indicated an experiential difference between “natural” and analytical interactions; however, the parameters for separating these qualities have not been isolated. What was made clear by the studio work was that the manner in which the system expressed its intention did not need to be “true”, modeled on a human gesture. However, there is some indication that a stronger reference to signifiers that are already part of the performer’s body knowledge reduced the need to rationally analyze the intention of the system. Of prime importance was the observation that a feeling of trust and sharing of space was created by the system projecting its intention that was not present in the response-only system. With more investigation it is hoped that a system may be developed that enables the integration and alignment of both the performer’s and the system’s intentions for a more unified and balanced interaction.

9. REFERENCES
[1] Bogart, A., and Landau, T. The Viewpoints Book: A Practical Guide to Viewpoints and Composition. New York: Theatre Communications Group (2005)
[2] Camurri, A., and Ferrentino, P. Interactive Environments for Music and Multimedia. Multimedia Systems 7 (1999): 32-47
[3] Camurri, A., et al. The MEGA Project: Analysis and Synthesis of Multisensory Expressive Gesture in Performing Art Applications. Journal of New Music Research 34:1, 5-21
[4] Gallese, V. The “Shared Manifold” Hypothesis: From Mirror Neurons to Empathy. Journal of Consciousness Studies 8:5-7 (2001): 33-50
[5] Gallese, V. The Intentional Attunement Hypothesis: The Mirror Neuron System and Its Role in Interpersonal Relations. Biomimetic Neural Learning (2005): 19-30
[6] Iacoboni, M., et al. Grasping the Intention of Others with One’s Own Mirror Neuron System. PLoS Biology 3:3 (2005): 529-35
[7] Lewis, G. E. Interacting with Latter-Day Musical Automata. Contemporary Music Review 18:3 (1999): 99-112
[8] Lockford, L., and Pelias, R. Bodily Poeticizing in Theatrical Improvisation: A Typology of Performative Knowledge. Theatre Topics 14:2 (2004): 431-43
[9] Kohler, E., et al. Hearing Sounds, Understanding Actions: Action Representation in Mirror Neurons. Science 297 (2002): 846-8
[10] Rowe, R. Incrementally Improving Interactive Music Systems. Contemporary Music Review 13:2 (1996): 47-62
[11] Thom, B. Artificial Intelligence and Real-Time Interactive Improvisation. Proceedings of the Seventeenth National Conference on Artificial Intelligence, Austin, Texas, August (2000)
[12] Trueman, D., and DuBois, R. L. PeRColate: A Collection of Synthesis, Signal Processing, and Video Objects for MAX/MSP/Nato, v1.0b3
[13] Wegner, D. M. The Illusion of Conscious Will. Cambridge, MA: MIT Press (2002)
[14] Weinberg, G., and Driscoll, S. The Perceptual Robotic Percussionist: New Developments in Form, Mechanics, Perception and Interaction Design. 2nd ACM/IEEE International Conference on Human-Robot Interaction, Washington, DC, USA, March 9-11 (2007)
Mitsuyo Hashida Yosuke Ito Haruhiro Katayose
School of Science & School of Science & School of Science &
Technology Technology Technology
Kwansei Gakuin University Kwansei Gakuin University Kwansei Gakuin University
Sanda, 669-1337 JAPAN Sanda, 669-1337 JAPAN Sanda, 669-1337 JAPAN
hashida@kwansei.ac.jp katayose@kwansei.ac.jp
[Figure: overview of the expression-copy workflow. The user’s actions — input a target score, indicate a phrase (melody fragment), modify the structure (option), choose a phrase, indicate the strategy for copying parameters of expression, indicate a threshold (for strategies 2 and 3), and edit an approximate melody line (option) — drive three stages: Structure Analysis, Similarity Search against the examples (cases) to be referred (case1, case2, case3, ...) and their super-class models (e.g. waltzes, sonatas, Chopin), shown at thresholds 1.0, 0.5 and 0.0, and Expression Copy, which outputs the performance.]

The strategies for copying parameters of expression are:
1. Usage of the referred melody fragment: the most similar melody fragment, all of the extracted melodic fragments over the threshold, or the melodic fragments that the user selects for herself/himself.
2. Usage of the parameters: weighted parameters based on the similarity of the referred melody fragments, or the simple average of the parameters of the referred melodic fragment.
3. Use the parameters of the super class.
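A minimal sketch of choices 1 and 2 above (all names are ours and purely illustrative — none are taken from the actual implementation; choice 3 would simply return the stored super-class parameters):

```python
def select_fragments(candidates, mode, threshold=0.5, user_picks=None):
    """Choice 1 -- which referred melody fragments to use.
    candidates: list of (similarity, parameter) pairs, similarity in [0, 1]."""
    if mode == "most_similar":
        return [max(candidates, key=lambda c: c[0])]
    if mode == "over_threshold":
        return [c for c in candidates if c[0] >= threshold]
    return list(user_picks or [])   # fragments the user selects her/himself


def combine_parameters(selected, weighted=True):
    """Choice 2 -- similarity-weighted parameters, or their simple average."""
    if weighted:
        total = sum(sim for sim, _ in selected)
        return sum(sim * p for sim, p in selected) / total
    return sum(p for _, p in selected) / len(selected)
```

Raising the threshold narrows the search to only the closest cases, which is consistent with the threshold = 1.0 / 0.5 / 0.0 settings shown in the figure.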
The duration of each note is calculated in the same manner as the power calculation. The only difference is the parameters of the note, the position and the time-value of which are the same between the target and the referred melody fragment, when calculating the duration.

As for the timing control, tempo is controlled by sending the local score beat-time given by the following equation every infinitesimal score time $S_k$:

$$T(T_k) = \prod_{l} T_{\mathrm{ratio}}(T_{\tilde{k}}, N_i, l)$$

where $N_i$ is the note that contains $T_k$ in its control, and $T_{\mathrm{ratio}}(T_{\tilde{k}}, N_i, l)$ is the score beat-time ratio at time $T_{\tilde{k}}$ of the referred melody fragment of the target melody fragment at level $l$. $I_{\mathrm{ratio}}(N_i, l)$ and $T_{\mathrm{ratio}}(T_{\tilde{k}}, N_i, l)$ are calculated by referring to the user’s preferences (see section 3.2).

5.2.2 Musical ability of the system
Itopul allows users to generate expressions of metric structure and phrasing based on the results of a similar-melodic-fragment search. At present, the functions to copy expressions regarding tempo, dynamics and the duration of notes have been implemented. One of our future jobs will be to provide functions to deal with the articulation of each note within chords.

The current version of Itopul is designed for monophonic expression. The expression of the accompaniment part is generated by simply copying the expression of the corresponding melody part. We should improve the system so that it can deal with polyphony. We are planning to transplant the functions from jPop-E [4] that we have been developing.

In addition, expression marks explicitly described in scores, such as staccato or legato, need to be handled.
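Read as pseudocode, the tempo product above multiplies the beat-time ratios contributed by each structural level. A minimal numeric sketch, with made-up ratio tables (the paper does not publish its actual values):

```python
from math import prod  # Python 3.8+

# Hypothetical per-level ratio tables: for each hierarchy level, the score
# beat-time ratio taken from the referred fragment at a given score time.
t_ratio = {
    1: {0.0: 1.00, 0.5: 0.95, 1.0: 1.10},   # e.g. phrase level
    2: {0.0: 1.02, 0.5: 1.00, 1.0: 0.90},   # e.g. sub-phrase level
}

def local_beat_time(t_k):
    """T(T_k) = product over levels l of T_ratio at score time t_k."""
    return prod(level[t_k] for level in t_ratio.values())
```

Because the levels multiply, a ritardando expressed at the phrase level and an accelerando at the sub-phrase level partially cancel rather than overwrite each other.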
may shift from green to red when a particular stock drops significantly in price. Another example of a visual ambient display is the DataFountain by Koert van Mensvoort. This display is comprised of three water jets that project water to different heights based upon the relative value of the Yen, Euro, and US Dollar.

One of the problems with the normal visual cues used in AIS is that they must be within the reader’s field of vision. However, one of the primary advantages of this class of information technology is that one may perceive it while focusing directly on other tasks. This has led us to believe that sound may be a preferable medium to use when developing this sort of system, although sound-based AIS has been largely left unexplored.

Unlike more traditional musical interfaces, which are designed to be directly observed in public performances, we need our instrument to have some subtlety and to perform in the periphery of the listener’s awareness.

This paper describes an instrument designed to study music as a medium for delivering ambient information in the style of AIS. We begin by discussing our design rationale for the development of the initial instrument, and follow with a description of its actual development and construction. Finally, we discuss some evaluations and the next stages in our research.

3. DESIGNING AN AMIS
Because of the lack of existing sound-based implementations, we sought to construct our own AIS that can convey a simple stream of information within a public setting. We are proposing the term Ambient Musical Information System (AMIS) for this type of research, based on Pousman and Stasko’s definition of an Ambient Information System, but focused on audio-based delivery.

One type of information which is both important and useful, but not necessarily critical, is regarding how and when people are making use of public spaces (i.e., lounges, conference rooms, study halls, etc.). The particular situation we wished to experiment with is a system that can inform people in one location how much activity is taking place in a remote location. For example, if there are two separate lounges in a building it could be useful to know if a high level of activity is taking place in the other lounge, so that one could choose to relocate to the area where most of one’s colleagues are. Alternatively, they could choose to relocate to the other lounge if less activity was taking place there, and they needed a place to study or have a private meeting.

This is one of the most important and difficult features of an AIS to design. We had to carefully consider different qualities of sound that might be appropriate for this sort of informational interface. The sound produced needs to situate the information being conveyed at the edge of the listener’s perception, and fade in and out of their awareness depending on the level of information that is being presented. The idea that there could be an optimal kind of music for this sort of display has been the source of some debate within our group.

The need to focus on tangible representations required that we could not simply place audio speakers into the public space and provide the information as disembodied sound. Instead, we had to build a physical musical instrument that was capable of conveying musical information in a subtle manner. We decided to take our inspiration from Eric Singer’s “League of Electronic Musical Urban Robots” and create a semi-autonomous automated musical instrument [8], which can be fed information regarding the activities in a remote location, and change its state accordingly.

Our musical interface has to be able to alter the music it was producing to inform listeners about different levels of activity in remote locations, but had to do so without forcing itself into the listener’s primary focus of attention. We had to come up with a way to change the type of music being played such that it mapped to the remote level of activity, but without being distracting. To achieve this, our system makes use of Markov chains in a generative context. This is discussed further in Section 6.1.

To build a proper AIS we had to take the aesthetics of our instrument into careful consideration. Maintaining the instrument’s perceptual subtlety requires that it blend smoothly into the environment so that people would not be overly engaged with it. This means that it could not be overly attractive or unattractive. In this case, we chose to make the instrument resemble a piece of generic artwork, akin to those that might be found in common waiting rooms or lobbies. For our sound-emitting material, we made use of thin bars of slate stone which were tuned and could be played similarly to a xylophone or marimba. To keep the motion and electronics from drawing attention to the instrument, we hid all of the mechanisms that were operating the instrument internally. To the casual onlooker, it would appear that the instrument was nothing more than a simple, somewhat bland, wall-hanging piece of artwork.

4. MUSIC AS INFORMATION
As we have mentioned, there have been few explorations in AIS that make use of sound as an information channel, but the underlying concept of providing music as an additional information layer has some precedents. Perhaps the most pervasive example of music as information is Muzak. Used by 90 million people each day, this company’s traditional products are designed specifically to be non-invasive, yet are subversively “made and programmed for business environments to reduce stress, combat fatigue, and enhance sales” [4, pp. 4], and in some cases have been used to increase worker productivity in factories by arranging songs in cycles of increasing tempo [4, pp. 43–5]. Muzak is a good example of music that was developed to be perceived outside the direct focus of attention. The style of Muzak is produced such that it is deliberately tame (“easy listening”), and does not give cause for listeners to become overly interested in what is being played.

One of the continuing topics of discussion concerning our instrument is regarding the qualities of sound that are best suited for conveying ambient information. Of course we know what sound qualities are probably inappropriate (e.g. a fog horn, or drum set), but we believe that there may be other sound qualities (e.g. tone, timbre, resonance) that may be best for delivering information in an ambient manner. In considering the construction of our instrument we felt that the right place to start was with a highly resonant, slowly changing sound source.

5. IMPLEMENTATION
Our instrument consists of five tuned bars (sound elements) of slate stone mounted to a hollow wooden box with
a hole beneath each bar to amplify the sound. Beneath each of the bars is a single solenoid that can be activated to strike the bar, producing sound. The solenoids are controlled programmatically by using an existing hardware platform called “Phidgets” [2]. This setup proved to be more complicated than we had anticipated. Individual components of our instrument are discussed below.

5.1 Physical Aspects
Upon acquiring our sound elements, we tested them on a standard xylophone mounting and found that the sound produced had a mellow quality and a decay rate similar to a marimba. To amplify the sound and contain the electronic components, a resonant box was constructed out of 1/4" thick particle board with 3" holes below each sound element. After mounting the sound elements we found that the density of the wood, and the mechanism used to mount the sound elements vertically, had an effect on the decay of the sound elements. We assumed that part of the problem was the thickness of the wood, so a second box was constructed from 1/8" wood stock. This improved the sound by increasing the overall decay rate, but the act of mounting the sound elements vertically still caused them to lose some resonance. In our second iteration on the instrument’s construction, we designed a new mounting bracket that pinched the drilled mounting holes in the sound elements between two small pieces of foam not much larger than the holes themselves. This provided some improvement over the initial mounting. We are still experimenting with better ways to mount these sorts of sound elements vertically, so that they produce the same sound as when they are mounted horizontally.

5.2 Working with Solenoids and Phidgets
The primary difficulty in automating our instrument was in acquiring solenoids that would best suit our purposes. A solenoid is a device that can convert energy into a linear motion by making use of a simple electromagnet. Inside the electromagnet is a simple piston that is drawn in when power is passed through the magnetic coil. Solenoids can be categorized as either pull-type or push-type (sometimes called thrust-type), depending on the motion they create. The push-type solenoids differ from the pull-type only in that another, smaller piston is attached to the primary, so that when the primary is drawn in, the other pushes out in the opposite direction.

Solenoids like these are used for everything from controlling automated car locks to operating soda vending machines. The companies that produce these devices will make custom orders to match the needs of a particular project, but they normally expect very large orders in order to do so. For someone doing a project like ours, where we need only 5 to 10 solenoids, purchases are likely going to be made through surplus retailers, and only the basic model will be available. A basic solenoid model will consist of only the plunger and the frame. If the plunger is placed halfway into the frame and power is applied, the plunger will quickly force itself into the stop position. Making this mechanism useful requires the additional construction of a return spring to move the plunger back to the start position when the power is turned off. Without access to specialized drilling machines, we attempted several less-than-optimal ways to create a functioning return spring for each of our solenoids. The final solution involved welding a small spring to the base of the plunger and attaching the opposite end of the spring to a cap fitted over the back of the solenoid. This solution worked well enough to make our solenoids functional, but the resulting assembly produces a bit of extra noise that lessens the overall effectiveness as an AIS device. Other researchers have proposed different strategies for dealing with solenoid problems, although some of these were not applicable in our case [3].

The Phidgets platform works very well as a means to control the solenoids we manufactured for our instrument. We were able to make use of the Phidget Interface Kit 8/8/8 [2] to control three Dual Relay Boards in combination with a 24 Volt power supply. The only drawback to using the Phidget Dual Relay Boards is that they produce some extra noise. Specifically, a “click” can be heard when the board switches between the on and off positions. This had the effect of giving our instrument a sound that has some similarities to a pinball machine, which we are not completely certain is appropriate for a sound-based ambient information source. The company that developed the Phidgets platform has just released a new component that is similar to the Dual Relay Board, but this model makes use of a solid-state switch which operates silently, and we believe it will solve this problem.

6. MUSICAL CHARACTERISTICS
There are a number of musical factors that must be carefully considered when designing an ambient musical instrument. One strategy for building an ambient instrument could be to use specific melodies or well-known songs. However, the difficulty with this approach is that these kinds of musical materials bring in a variety of distracting cultural and semantic associations, as can be seen in the current “ringtone” phenomenon with cellular phones, that we feel would bias our results. Instead, the approach used here has been to use somewhat non-descript musical materials, and to transmit information to the user through changes in global musical characteristics. For instance, changes in tempo, activity levels, repetitiveness of pitches or regularity of rhythm can be used to indicate changes in some aspect of the information being represented. Additionally, there is the need to supply musical materials over long periods of time, such as entire days or perhaps even weeks at a time.

However, music that is completely redundant, such as an endlessly repeating rising scale, can be extremely tiring for a listener. A chorus that is repeated multiple times may be acceptable within the confines of a three-minute pop song, but is likely to be unacceptable over longer periods of time. Alternatively, as has been pointed out by other researchers [5], completely random music with no predictable attributes also has a tendency to be extremely tiring for a listener. This is perhaps in part because the only predictable attribute is that it is unpredictable. The long-term characteristics of the instrument must sit somewhere between these two extremes. The ideal music, for our purposes, should not carry recognizable cultural attributes, be able to transmit information through large changes in style, have a partial degree of redundancy, and be able to run continuously for extremely long periods of time.

6.1 Markov Chains
Discrete Markov chains have a long history in computer
music and algorithmic composition [1], and are ideally suited to our purposes.

A discrete Markov chain is a discrete-time stochastic process that can be used to model a series of events, where each event is assumed to belong to one of a finite set of unique states. The entire process must always be in a single state at any one time, and will change state based on some kind of received information. One interesting aspect of Markov chains is the underlying Markov property, which assumes that future states depend only on the present state, and not on previous states.

7. EVALUATION
Our current, implemented device is shown in Figure 1. While more extensive listener studies are planned, we have already tested the instrument casually with a number of listeners. We have received a range of opinions that have proven valuable in both the planning of the experiment and the next version of the instrument. The most prevalent feature that has been mentioned, and that we would like to introduce (besides fixing the aforementioned click problem), would be better control over the volume of individual notes, which is difficult with our present solenoid-based system. The present version has convinced us that we are moving in the right direction and that the AMIS concept shows much promise.
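As a sketch of how a Markov chain in a generative context might map a remote activity level onto note choice across the instrument's five bars — this is our own illustrative code with an invented weighting scheme, since the paper does not specify its transition probabilities:

```python
import random

BARS = [0, 1, 2, 3, 4]  # the five tuned slate bars

def next_bar(current, activity, rng):
    """First-order Markov step: the next state depends only on the current
    one (the Markov property). activity in [0, 1] biases the transition
    weights -- low activity favors repetition and small steps, high
    activity favors large leaps between bars."""
    weights = []
    for candidate in BARS:
        leap = abs(candidate - current)
        weights.append((1 - activity) / (1 + leap) + activity * leap)
    return rng.choices(BARS, weights=weights)[0]

def phrase(length, activity, seed=0):
    """Generate a note sequence; seeding makes a run reproducible."""
    rng = random.Random(seed)
    sequence, current = [], rng.choice(BARS)
    for _ in range(length):
        current = next_bar(current, activity, rng)
        sequence.append(current)
    return sequence
```

Because the chain is neither fully deterministic nor fully random, its output sits between the two tiring extremes discussed in Section 6: locally unpredictable, but with statistics that shift audibly as the activity level changes.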
muscle. While a student performs thumb touches, the amount of horizontal image coordinates. For that reason, it needs to know
muscular activity is shown on a screen. In comparison to a the horizontal position of the left and right end of the claviature,
control group that received traditional training of the thumb as well as the vertical coordinate of the front. These positions
attack, a biofeedback group was able to match the muscle have to be configured once by the user using a GUI.
activity pattern of professional pianists better.
The keys are mapped to horizontal positions by linear
A multimodal feedback system is used by Riley to improve interpolation. Each key corresponds to an area with the width of
piano lessons [11]. The system can record and replay MIDI, 1/88 of the width of the claviature. Although this approximation
EMG, and video. The video and MIDI output is synchronized does not reflect the structure of the claviature, it is sufficiently
with a piano roll of the performance. accurate for the Elbow Piano.
Mora et al. developed a system that overlays a 3D mesh of a suggested posture over a video of the student's performance [10]. The student can see the differences and adopt the suggested posture. To generate the 3D mesh, the posture of a professional pianist was recorded using motion capturing.

3. ELBOW PIANO
A user who wants to practice with the Elbow Piano sits at the keyboard and attaches the goniometers to her arms. She then plays a tuning chord, which is necessary to initialize the visual tracking system. Visual tracking of the hands is used to assign the goniometer measurements to the notes played. In the following, we describe a typical use case of our system.

The user starts playing. Sometimes the user performs elbow-touches, sometimes the user avoids them. Whenever the user plays an elbow-touch, the system plays the elbow-touch sound. If the user unintentionally plays an elbow-touch, the system will also produce the elbow-touch sound. The user can stop playing at this point and use a graphical visualization to analyze this condition. After some time, the user continues playing.

3.1 Hardware Setup and Software Architecture Overview
The Elbow Piano consists of sensors, which are connected to a computer, and software that analyzes the incoming data stream and controls an attached synthesizer. The sensor hardware of the Elbow Piano consists of a MIDI keyboard, a webcam placed above the keyboard, and a pair of self-built goniometers. The webcam is used to visually track the hands. The goniometers provide data about the angles in the elbow joints.

When the system receives a note-on event from the keyboard, the system assigns the note to the left or right hand. This is done by means of visual tracking (section 4.2). The history of goniometer data of the identified hand is examined to determine whether the key was pressed with activity of the elbow joint (section 4.3), or not.

The visual tracking is locked on the hands when the user plays the tuning chord. This chord consists of two black keys per hand (each hand plays an f#-c# chord in a different octave). When the tuning chord is played, the approximate positions of the hands are estimated by the system. Two areas, which are located in front of the claviature at the horizontal positions of the hands, are used to calculate the histograms of the skin colors (for each hand separately) and serve as initial search windows for the tracking algorithm.

The visual tracking of the hands is done with the OpenCV implementation of the CAMSHIFT algorithm. CAMSHIFT [2] climbs the gradient of the probability distribution, which is computed using a histogram, to adjust the position of the search window. CAMSHIFT continuously changes the size of the search window. Therefore, the entire hand of the Elbow Piano user is tracked after some iterations of the algorithm. Because of the operating principle of CAMSHIFT, the colors of the floor and the clothing have to be sufficiently distinct from skin color, and the user has to wear long-sleeved clothing.

The Elbow Piano segments the claviature into a part for the left hand and a part for the right hand. For this purpose, the system examines the rightmost pixel assigned to the left hand and the leftmost pixel assigned to the right hand and determines the middle. Each note is assigned to the left or right hand by comparing its position to the middle.

The user receives visual feedback from the hand tracking module (Figure 1): the hands are surrounded by a circle in the image from the webcam.
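The claviature segmentation just described can be sketched in a few lines. This is a minimal illustration, not the authors' code; the pixel coordinates in the usage example are hypothetical.

```python
# Sketch of the hand-assignment step: the claviature is split at the
# midpoint between the tracked hands, and each note is assigned to a
# hand by its horizontal position in the webcam image.

def split_point(left_hand_rightmost_px: int, right_hand_leftmost_px: int) -> float:
    """Middle between the rightmost pixel assigned to the left hand
    and the leftmost pixel assigned to the right hand."""
    return (left_hand_rightmost_px + right_hand_leftmost_px) / 2.0

def assign_hand(note_x_px: float, middle: float) -> str:
    """Assign a note to a hand by comparing its horizontal position
    (in image coordinates) to the split point."""
    return "left" if note_x_px < middle else "right"

middle = split_point(310, 470)     # example pixel columns from the webcam image
print(middle)                      # 390.0
print(assign_hand(352.0, middle))  # left
print(assign_hand(415.0, middle))  # right
```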
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
3.3.1 Goniometers
A pair of custom-built goniometers (Figure 2) provide the computer with data about the angles in the elbow joints. Each goniometer consists of a potentiometer and two plastic strips. The plastic strips are connected to the axis of the potentiometer so that the motion of the plastic strips is transferred to the potentiometer. The potentiometer has an aluminum case and can therefore sustain the occurring physical forces. Velcro fasteners are attached to the plastic strips and are used to mount them on a suitable pullover. The goniometers are additionally fixated by rubber bands. The potentiometers are wired up as voltage dividers and are connected to A/D converters. The digital signal is transmitted to the computer via USB with a rate of up to 100 Hz (values are transmitted on change only). The A/D converter used is a CreateUSB board.

Figure 2. Goniometer

3.3.2 Decision
The goniometer data is continuously stored with the corresponding timestamps. When the keyboard sends a note-on message, the sensor data log of the arm that produced the tone is examined. The latest 0.2 seconds of the sensor data is analyzed. Considering the rapid sequences of movements that can occur in piano playing, the choice of this rather large time interval is reasonable, because elbow-touches cannot be done (much) more rapidly. The lowest angle in the elbow during that time interval and the last measured angle are compared. If the difference of these angles exceeds a predefined threshold, the touch is classified as a touch with elbow activity.

3.3.3 Decision Visualization
To provide feedback to the user, a visualization of the decision process was developed. The Elbow View (Figures 3 and 4) shows the goniometer data that was used to decide whether the touch was executed with activity in the elbow joint, or not. The graph is inverted along the y-axis to provide a more intuitive visualization. The graph of an elbow touch begins with high values and slopes to the right; this corresponds to the movement of the forearm performing an elbow-touch, which starts at a high position and then moves downwards. Relevant information about the graph and the decision is provided to the user: the lowest angle, the last angle, the difference between the two, and the result of the decision. The user can access past visualizations. To assist the user in navigating, a separate view is provided for each hand. Furthermore, the graphs are stacked when a hand plays a chord, i.e., if two notes are received within 0.1 seconds.

Figure 3. Visualization of an elbow touch
Figure 4. Visualization of a non-elbow touch

3.4 Sound Generation
When an elbow-touch is recognized, the system passes the note-on MIDI message that was received from the keyboard to the connected (software or hardware) synthesizer. The system changes the channel of the MIDI message to reflect which arm executed the elbow-touch. By configuring the synthesizer to play different instruments on these channels, the Elbow Piano can play different sounds for the left and right arm. The generated sound effects can be prolonged by using the sustain pedal of the MIDI keyboard.

4. EVALUATION
The Elbow Piano was evaluated with students of the HfMDK Frankfurt (University of Music and Performing Arts Frankfurt). Four pianists (professional level), one composer (advanced level), and one singer (intermediate level) participated in the user study. The system was briefly explained to each participant. Each participant then played pieces of her/his own choice with the Elbow Piano. Afterwards, the participant filled out a questionnaire and was interviewed. The questionnaire contained different statements, which were rated by the participants on a scale from 1 (disagree) to 5 (agree) (Table 1).

It was evident that the participants improved their ability to control the system during the sessions. At the beginning, the participants tended to imitate the mere appearance of the motion which had been shown to them. The participants often did not consistently use the elbow joint to move the fingers but used the wrist, the shoulder, and (on one occasion) the back instead. During the session, the participants moved more consistently and could therefore control the system better. The participants expected that they could learn to control the system better if they had more time to practice with it, and they stated that the system increased their awareness of arm movements. Despite all that, all but one participant did not want to use the system for practice or teaching, because the system focuses only on one specific aspect of piano technique and can therefore not (yet) be integrated into a piano syllabus. Overall, we received very positive feedback and were encouraged to continue with our approach.
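The touch-classification rule of section 3.3.2 can be sketched directly from its description: within the last 0.2 seconds of goniometer data, compare the lowest angle to the last measured angle. The threshold value below is an assumption; the paper does not state it.

```python
# Sketch of the elbow-touch decision (the data format and threshold
# are hypothetical; the 0.2 s window follows the paper).

WINDOW_S = 0.2        # length of the examined sensor-data window
THRESHOLD_DEG = 15.0  # assumed value; the paper does not state the threshold

def is_elbow_touch(log, note_on_time, threshold=THRESHOLD_DEG):
    """log: list of (timestamp_s, elbow_angle_deg) samples for one arm.
    Compare the lowest angle within the last 0.2 s to the last measured
    angle; a difference above the threshold means elbow activity."""
    window = [a for (t, a) in log if note_on_time - WINDOW_S <= t <= note_on_time]
    if not window:
        return False
    lowest, last = min(window), window[-1]
    return (last - lowest) > threshold

log = [(0.00, 140.0), (0.05, 120.0), (0.10, 95.0), (0.15, 110.0), (0.20, 125.0)]
print(is_elbow_touch(log, note_on_time=0.20))  # True: 125 - 95 = 30 > 15
```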
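The per-arm channel rewrite described in the Sound Generation section amounts to replacing the channel nibble of the MIDI status byte. The sketch below is illustrative; the concrete channel numbers are an assumption, since the paper does not state which channels are used.

```python
# Sketch of forwarding a note-on on a per-arm MIDI channel so that the
# synthesizer can play a different sound for each arm. Channel choices
# are hypothetical.

LEFT_ARM_CHANNEL = 0   # MIDI channel 1
RIGHT_ARM_CHANNEL = 1  # MIDI channel 2

def rewrite_channel(status: int, arm: str) -> int:
    """Replace the channel nibble of a MIDI status byte (0x9n is a
    note-on on channel n) with the channel assigned to the given arm."""
    channel = LEFT_ARM_CHANNEL if arm == "left" else RIGHT_ARM_CHANNEL
    return (status & 0xF0) | channel

print(hex(rewrite_channel(0x95, "left")))   # 0x90: note-on, channel 1
print(hex(rewrite_channel(0x95, "right")))  # 0x91: note-on, channel 2
```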
Table 1. Questionnaire
Statement | Score (Avg.)
I have good control of the sound. | 3.5 of 5
I would learn to control the system better if I had more time to practice with it. | 4.8 of 5
I am more aware of the movement of my arm when using the system. | 4.5 of 5
Using the system is fun. | 4.2 of 5
I would use the system to practice or teach the piano. | 2.5 of 5

5. CONCLUSIONS
Awareness of playing movements can be beneficial for instrumental performance. The Elbow Piano distinguishes two types of touch: a touch with movement in the elbow joint and a touch without. Therefore, the Elbow Piano can increase awareness of these movements, with possible beneficial effects on normal piano performance.

The Elbow Piano consists of a MIDI keyboard, a webcam, a pair of custom-built goniometers, a computer to which these sensors are connected, and software that analyzes the incoming data stream and controls an attached synthesizer. The system uses visual tracking to find the positions of the hands on the keyboard. On each keypress, the goniometer data of the matched arm is evaluated and the system decides what type of touch the user executed. The user gets visual feedback about the decision process and can evaluate the decision of the system.

A user study with music students of the HfMDK Frankfurt was conducted. The participants learned to better control their arm movements during the sessions. Despite that, most participants did not want to use the system to practice or teach the piano. Although a conservative attitude might be a minor factor in this result, we think that our approach needs to be integrated into a systematic piano syllabus to make it more convincing. To this end, we are currently applying the presented approach to other playing movements.

6. FUTURE WORK
Gorodnichy and Yogeswaran developed a system that allows tracking of the hands, the fingers, and the position of the keyboard in the images of a camera placed above the keyboard [5]. The visual tracking of the Elbow Piano could be improved using this approach, and the user would not need to configure the position of the keyboard.

Movements in the elbow joints are not only performed to press a key downwards. They also occur when the player moves the hand forwards, backwards, or sideways. These changes of elbow angles could be estimated using information gained by visual tracking of the hands. The goniometer data could be cleaned of this effect before the recognition method is applied. The posture of the player also has an effect on the angles in the elbows. Measured changes of posture could likewise be used to clean the goniometer data before the recognition method is applied.

We are currently exploring the use of different sensors to generalize the presented approach to other playing movements.

7. REFERENCES
[1] Bernstein, S. Twenty Lessons In Keyboard Choreography. Seymour Bernstein Music, 1991.
[2] Bradski, G. R. Real Time Face and Object Tracking as a Component of a Perceptual User Interface. In Fourth IEEE Workshop on Applications of Computer Vision (WACV '98), 1998.
[3] Dannenberg, R. B., Sanchez, M., Joseph, A., Joseph, R., Saul, R., and Capell, P. Results from the Piano Tutor Project. In Proceedings of the Fourth Biennial Arts and Technology Symposium, 1993.
[4] Gat, J. The Technique of Piano Playing. Collet's Holdings, London, 1965.
[5] Gorodnichy, D. O., and Yogeswaran, A. Detection and tracking of pianist hands and fingers. In Proceedings of the 3rd Canadian Conference on Computer and Robot Vision (CRV'06), 2006.
[6] Goebl, W., and Widmer, G. Unobtrusive Practice Tools for Pianists. In Proceedings of the 9th International Conference on Music Perception and Cognition (ICMPC9), 2006.
[7] Ng, K., Weyde, T., Larkin, O., Neubarth, K., Koerselman, T., and Ong, B. 3D Augmented Mirror: A Multimodal Interface for String Instrument Learning and Teaching with Gesture Support. In ICMI '07: Proceedings of the 9th International Conference on Multimodal Interfaces, 2007.
[8] Lin, C., and Liu, D. S. An Intelligent Virtual Piano Tutor. In Proceedings of the 2006 ACM International Conference on Virtual Reality Continuum and Its Applications, 2006.
[9] Montes, R., Bedmar, M., and Martin, M. S. EMG Biofeedback of the Abductor Pollicis Brevis in Piano Performance. Biofeedback and Self-Regulation, 18, 2, 1993.
[10] Mora, J., Lee, W., Comeau, G., Shirmohammadi, S., and Saddik, A. E. Assisted Piano Pedagogy through 3D Visualization of Piano Playing. In HAVE 2006 - IEEE International Workshop on Haptic Audio Visual Environments and their Applications, 2006.
[11] Riley, K., Coons, E. E., and Marcarian, D. The Use of Multimodal Feedback in Retraining Complex Technical Skills of Piano Performance. Medical Problems of Performing Artists, 20, 2, 2005.
[12] Shirmohammadi, S., Khanafar, A., and Comeau, G. MIDIATOR: A Tool for Analyzing Students' Piano Performance. Revue de recherche en éducation musicale, 2, 2006.
[13] Smoliar, S. W., Waterworth, J. A., and Kellock, P. R. pianoFORTE: A System for Piano Education Beyond Notation Literacy. In MULTIMEDIA '95: Proceedings of the Third ACM International Conference on Multimedia, 1995.
ABSTRACT
Musical keyboard instruments have a long history, which resulted in many kinds of keyboards (claviers) today. Since the hardware of conventional musical keyboards cannot be
Figure 4: The hardware of a Unit (wireless module, microcomputer, connectors, and input/output devices; the latter on the EnhancedUnit only)

Establishing connection to the host A UnitKeyboard broadcasts a “New Entry” command after it is turned on, and when the UnitKeyboard receives an acknowledgement from the host, it sends an “ID” and “connector data”, such as the number of connectors, to the host.

Sending keying data A UnitKeyboard sends keying data to the host when the status of the UnitKeyboard keys changes.

Sending connection data A UnitKeyboard sends a “Connection Status” command to the host when the status of its connectors changes.

3.3 EnhancedUnit
The EnhancedUnit has two models: a simple model that only controls the diapason of a UnitKeyboard and a high-end model that is equipped with sensors, actuators, and a wireless module to operate settings of the UnitKeyboards. The former is inserted between UnitKeyboards to increase the diapason. It has a simple structure that consists of two connectors and a variable electric resistance. Since the connectors of a UnitKeyboard can measure the change of voltage that corresponds to the value of the variable resistance, UnitKeyboards that interleave with simple EnhancedUnits convert the measured voltage into a change of the diapason.

Figure 4 shows the hardware of the high-end EnhancedUnit. The main differences between the EnhancedUnit and the UnitKeyboard are that the EnhancedUnit does not have a keyboard and has various input/output devices. The high-end EnhancedUnit has the following functions.

Connection to the host The EnhancedUnit broadcasts a “New Entry” command after the power is turned on and establishes a connection with the host just like a UnitKeyboard.

Sending connection data The EnhancedUnit monitors the status of its own connectors, and it sends a “Connection Status” command to the host when it detects a change of connection, just like the UnitKeyboard.

Sending of input data from input devices The EnhancedUnit collects data from its input devices and informs the host of this data according to the requirements of the host.

Control of output devices The EnhancedUnit controls its output devices according to commands sent from the host.

3.3.1 Input/Output devices
We developed a high-end EnhancedUnit prototype equipped with various kinds of input/output devices.

Distance sensor Users can control the diapason of a UnitKeyboard neighboring an EnhancedUnit equipped with distance sensors. For example, the longer the distance between the UnitKeyboard and the EnhancedUnit, the higher the diapason of the UnitKeyboard.

Acceleration sensor Users control the tone of UnitKeyboards with their posture, which is calculated and detected from the data of the acceleration sensor.

Motor Users can move UnitKeyboards automatically by using an EnhancedUnit equipped with motors attached to a propeller and wheels. For example, if musicians use an EnhancedUnit equipped with a motor and wheels, they can add/subtract a diapason by automatically moving a UnitKeyboard, as shown in Figure 5.

Figure 5: An EnhancedUnit with electric motor

4. CONSIDERATIONS
We discuss the usability of the proposed UnitKeyboard based on reviews by 5 amateur pianists and 5 professional pianists who actually used the UnitKeyboard. We have demonstrated the UnitKeyboard at various kinds of events, such as the Kobe Luminarie Live Stage on December 8th and 9th, 2007. (Kobe Luminarie began in 1995 and commemorates the Great Hanshin earthquake of that year; about 4 million participants attended last year.)

4.1 Performance Evaluation
Visibility We checked that the function that automatically assigns the settings of the UnitKeyboards according to the relationships among all the UnitKeyboards was working well. The host settled conflicting settings among the UnitKeyboards. Moreover, judging from the participants' reviews, the proposed automatic-assignment algorithm was intuitive.

Because the participants could see the connection relationships between the UnitKeyboards, it was easy to recognize the relative diapason of each UnitKeyboard. However, it was difficult to recognize the absolute diapason of each UnitKeyboard. In the present implementation, participants could not see the BaseUnit and the diapason of the BaseUnit. Therefore, participants had to press the keys of each UnitKeyboard to check the diapason.

For future work, we plan to develop an EnhancedUnit with LEDs and a display for checking the settings of the UnitKeyboard.

Wireless vs. Wired connections We adopted a wireless connection for communication between the host and the Units.

With the wireless connection there was some delay between a key press and the output sound, but the delay was not very noticeable in the music. However, the more UnitKeyboards were used, the higher the possibility of packet loss and longer delays. On the other hand, the delay produced using a wired connection was less than that of the wireless connection.

Because both methods have advantages and disadvan-
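The Unit-to-host protocol described in section 3 (“New Entry”, “ID” with connector data, keying data, “Connection Status”) can be sketched as a small host-side dispatcher. The message field names below are assumptions; the paper only names the commands.

```python
# Host-side sketch of the Unit protocol (hypothetical message format:
# each message is a dict with a "cmd" field plus command-specific data).

class Host:
    def __init__(self):
        self.units = {}        # unit_id -> connector data
        self.connections = {}  # unit_id -> connector status

    def handle(self, msg: dict):
        cmd = msg["cmd"]
        if cmd == "New Entry":
            # Acknowledge; the unit then replies with "ID" + connector data.
            return {"cmd": "Ack"}
        if cmd == "ID":
            self.units[msg["id"]] = msg["connectors"]
        elif cmd == "Keying":
            # Key status changed: forward to the sound generator.
            return ("note", msg["id"], msg["key"], msg["on"])
        elif cmd == "Connection Status":
            self.connections[msg["id"]] = msg["status"]
        return None

host = Host()
print(host.handle({"cmd": "New Entry"}))  # {'cmd': 'Ack'}
host.handle({"cmd": "ID", "id": 3, "connectors": 2})
print(host.units)  # {3: 2}
```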
5. CONCLUSIONS
We proposed the UnitKeyboard, which can realize various kinds of keyboard instruments by connecting one-octave keyboards together. Moreover, the UnitKeyboard has various functions, such as automatic settings that consider the relationships among multiple UnitKeyboards, intuitive controls, and new performance possibilities using an EnhancedUnit.

We intend to evaluate the hardware and the usability of the system in the future.
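The automatic diapason assignment summarized above can be sketched as follows. The octave-chaining rule and the base note are assumptions consistent with the description of units tuned one octave above the BaseUnit.

```python
# Sketch of automatic diapason assignment (hypothetical rule): units
# connected in a row are tuned in octave steps from the BaseUnit.

def assign_diapasons(num_units: int, base_midi_note: int = 60):
    """Return the starting MIDI note of each one-octave unit, counted
    from the BaseUnit (unit 0) upward in octave (12-semitone) steps."""
    return [base_midi_note + 12 * i for i in range(num_units)]

print(assign_diapasons(3))  # [60, 72, 84]
```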
6. ACKNOWLEDGMENTS
This research was supported in part by a Grant-in-Aid for Scientific Research (A) (17200006) from the Japanese Ministry of Education, Culture, Sports, Science and Technology, a Grant-in-Aid for Scientific Research from the JSPS Research Fellowship, and by the Hayao Nakayama Foundation for Science & Technology and Culture.
7. REFERENCES
Cléo Palacio-Quintin
LIAM - Université de Montréal - Montreal, QC, Canada
IDMIL - Input Devices and Music Interaction Laboratory
CIRMMT - Centre for Interdisciplinary Research in Music Media and Technology
McGill University - Montreal, QC, Canada
cleo.palacio-quintin@umontreal.ca
ABSTRACT
After eight years of practice on the first hyper-flute prototype (a flute extended with sensors), this article presents a retrospective of its instrumental practice and the new developments planned from both technological and musical perspectives. Design, performance skills, and mapping strategies are discussed, as well as interactive composition and improvisation.
Keywords
hyper-instruments, hyper-flute, sensors, gestural control,
mapping, interactive music, composition, improvisation
1. INTRODUCTION
Since 1999, I have been performing on the hyper-flute [13]. Interfaced to a computer by means of electronic sensors and Max-MSP software, the extended flute enables me to directly control the digital processing parameters as they affect the flute’s sound while performing and allows me to compose unusual electroacoustic soundscapes.

Until now, I have mostly used the hyper-flute to perform improvised music. Wishing to expand a repertoire for the hyper-flute, I began doctoral studies in January 2007 to work on written compositions. Before developing a core repertoire, I decided to review my experience with the instrument.

This article presents the original design of the hyper-flute and the learning experience of eight years of practice on it. The performance skills and mapping strategies developed over time now suggest new enhancements of the instrument. Technological and musical issues in the development of a new prototype of the hyper-flute as well as a hyper-bass-flute will be discussed.

Figure 1: The hyper-flute played by Cléo Palacio-Quintin. Photograph by Carl Valiquet.

2. BACKGROUND
2.1 Why, Where and When
By the end of my studies in contemporary flute performance (Université de Montréal – 1997), I was heavily involved in improvised music and had started looking for new sonorities for the flute in my own compositions. Already familiar with electroacoustic music and with the use of the computer, it was an obvious step to get into playing flute with live electronics. My goal was to keep the acoustic richness of the flute and my way of playing it. The computer would then become a virtual extension of the instrument.

During post-graduate studies in Amsterdam, I had the chance to meet the experienced instrument designer Bert Bongers [3]. In 1999, I participated in the Interactive Electronic Music Composition/Performance course with him and the meta-trumpeter Jonathan Impett [9] at the Dartington International Summer School of Music (U.K.). There, I made my first attempt at putting several sensors on my flute, programming a Max interface, and performing music with it. Several months later, I registered as a student at the Institute of Sonology in The Hague (The Netherlands) in order to build my hyper-flute. The prototype of the hyper-flute was mainly built during the Fall of 1999 with the help of Lex van den Broek. Bert Bongers was a valuable consultant for the design. He also made the main connector from the sensors to the Microlab interface.
Combinations of convergent and divergent mappings are always experienced while playing an acoustic instrument. It seems much more appropriate to control complex sound processing parameters according to the same principles. These highly non-linear mappings take substantial time to learn, but further practice improves control intimacy and competence of operation.

Different sound processing methods demand different ways of controlling them. Mappings must be adapted for each specific situation, and a lot of fine tuning is necessary. I experimented with different combinations of direct, convergent and divergent mapping, some being more suitable to control specific sound processing patches. As my software evolves for each new piece, no definitive mapping is possible. However, I try to keep as much consistency as possible in the use of the sensors, so that the precision of the control is maintained for each performance.

4. INTERACTIVE COMPOSITION, IMPROVISATION & PERFORMANCE
Joel Chadabe is one of the pioneers of real-time computer music systems. In 1983, he proposed a new method of composition called interactive composing, which he defined in the following terms: “An interactive composing system operates as an intelligent instrument – intelligent in the sense that it responds to a performer in a complex, not entirely predictable way, adding information to what a performer specifies and providing cues to the performer for further actions. The performer, in other words, shares control of the music with information that is automatically generated by the computer, and that information contains unpredictable elements to which the performer reacts while performing. The computer responds to the performer and the performer reacts to the computer, and the music takes its form through that mutually influential, interactive relationship.” (page 144) [5]

From this point of view, the performer also becomes an improviser, structuring his way of playing according to what he hears and feels while interacting with the computer.

In most cases, users of interactive computer systems are at once composer, performer and improviser. Due mostly to the novelty of the technology, few experimental hyper-instruments are built by artists. These artists mostly use the instruments themselves. There is no standardized hyper-instrument yet for which a composer could write. It is difficult to draw the line between the composer and the performer while using such systems. The majority of performers using such instruments are concerned with improvisation, as a way of making musical expression as free as possible. Jonathan Impett also thinks that the use of computers to create real-time music has profoundly changed the traditional kinds of music practices. “In such a mode of production, the subdivisions of conventional music are folded together: composer, composition, performer, performance, instrument and environment. Subject becomes object, material becomes process.” (page 24) [10]

Using an interactive computer system, the performer has to develop a relation with different types of electroacoustic sound objects and structures. These relationships constitute the fundamentals of musical interaction. The computer part can be supportive, accompanying, antagonistic, alienated, contrasting, responsive, developmental, extended, etc. All the musical structures included in a piece have different roles. Some affect the micro-structure of a musical performance, others affect the macro-structure, and many are in between. The interaction between the performer and these musical structures varies. The structures can also support different levels of interactivity between each other. We can divide these structures into three distinct types:

• sound processing transforming the acoustic sound,
• sound synthesis,
• pre-recorded sound material.

On the hyper-flute, I have focused on the development of the first type: transforming the flute sound with live digital processing. However, when looking for new extended flute sonorities, the process also leads to the integration of sound synthesis.

In an improvisational context, the interactive computer environment is designed to maximize flexibility in performance. The environment must give the opportunity to generate, layer and route musical material within a flexible structure, like an open form composition. Ideally, the computer environment would give the same improvisational freedom the performer has developed with his acoustic instrument. Each performer has his personal repertoire of instrumental sounds and playing techniques from which he can choose while performing. This sound palette can be very wide, and switching from one type of sound to another is done within milliseconds. Of course, any interactive gestural interface has a limited number of controllers. The sound processing patches can only generate the sounds that have been programmed (even if they include some random processes). The freedom of the performer is somewhat limited by the computer environment.

My long-term goal is to develop an interactive sound processing palette that is as rich and complex as my instrumental one. I want to improvise freely and to be able to trigger many different processes at any time, without disturbing my flute playing. Though there are still programming issues to be addressed before achieving an ideal environment, I have always felt more limited by the number of controllers and buttons on the hyper-flute. This has led me to new developments on the instrument itself.

5. NEW DEVELOPMENTS
After eight years of practice, I am now very comfortable playing the hyper-flute. I have also developed a very good knowledge of my musical needs in order to control the live electronics while performing. Over the years, I found what works best and what is missing on the instrument. So I decided to make a new prototype which will feature some new sensors. As I also perform on the bass flute, a hyper-bass-flute is in development. The following sections briefly present the planned design of those new hyper-instruments.

5.1 Hyper-Flute
To maintain the playing expertise I have developed over the years, most sensors used since 1999 will be kept in the same physical configuration, but will include technical improvements (ultrasound transmitter, magnetic field sensors on the little fingers, and force sensing resistors under the left hand and thumbs). There will be several more buttons on the new prototype, located close to the right thumb, which is more free while playing.

Earlier I mentioned the necessity of having more sensors which do not disturb the hands and fingers while playing. The new prototype is thus designed with a two-axis accelerometer placed on the foot-joint of the instrument. This accelerometer gives information about the position of the flute (inclination and tilt of the instrument) in a continuous data stream instead of the simple on/off switches used previously.
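The two-axis accelerometer mapping described in section 5.1 can be sketched as follows. The axis convention, the scaling, and the menu-scrolling range are assumptions; the instrument streams raw sensor values, not angles.

```python
# Sketch of deriving inclination and tilt from a static two-axis
# accelerometer reading (in units of g), and of mapping inclination to
# a menu index, as in the menu-scrolling example of section 5.1.

import math

def inclination_tilt(ax: float, ay: float, g: float = 1.0):
    """Estimate inclination (pitch of the flute toward/away from
    horizontal) and tilt (roll around the flute's axis) in degrees."""
    incl = math.degrees(math.asin(max(-1.0, min(1.0, ax / g))))
    tilt = math.degrees(math.asin(max(-1.0, min(1.0, ay / g))))
    return incl, tilt

def menu_index(incl_deg: float, n_items: int, span_deg: float = 60.0):
    """Map an inclination range of +/- span/2 degrees onto a menu of
    n_items entries, clamped to the valid index range."""
    pos = (incl_deg + span_deg / 2) / span_deg  # 0.0 .. 1.0
    return max(0, min(n_items - 1, int(pos * n_items)))

print(menu_index(0.0, 8))    # 4: flute horizontal, middle of the menu
print(menu_index(-30.0, 8))  # 0: lowest inclination, first entry
```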
The present proprioceptive sensors on the hyper-flute give information about muscle actions that are not visible to the audience (except for the ultrasound sensor and the tilt switches working with the inclination of the instrument). The use of an accelerometer will give more multidimensional data about movements and positions which are visible to the audience. This will help to correlate the amount of activity of the computer with the physical activity of the performer. The amount of data produced by the accelerometer greatly increases the possibilities of multiparametric mapping and permits the development of more complex musical structures. This will be very helpful to increase the number of tasks while playing. For example, one can use the inclination to scroll through long menus of different sound processing modules or to choose between several buffers to record in. This way, only one button is necessary to trigger many different tasks. As I am already aware of the instrument’s inclination while playing (because of the tilt switches), it is now easier to remember the physical position at various angles.

Fastening the sensors on the flute has always been problematic. I own only one (expensive) flute and I do not wish to solder anything onto it. Therefore I have been using double-sided tape to attach the sensors to the flute. This way, the sensors can be taken off when the instrument needs to be cleaned or repaired. But this is a tedious exercise and there is always a risk of breaking them. I am now trying to build the sensors on clips that can easily be attached and removed. This will make it easier to transform any flute into a hyper-flute, and will eventually give opportunities to other performers to play my music.

A first test was to use a Bo-Pep for the accelerometer and ultrasound transducer (as shown in Figure 3). These plastic hand supports for the flute are simply clipped onto the body of the instrument, and can be taken on and off in a second. Some sensors can simply be applied on a Bo-Pep, while others will need a custom-made clip.

Figure 3: Accelerometer and ultrasound transducer mounted on a Bo-Pep

5.2 Hyper-Bass-Flute
I am also developing a hyper-bass-flute, a noticeably different instrument than the hyper-flute. The bass flute has

5.3 Interface
For both hyper-flutes, I will replace the Microlab device with a new interface using the Open Sound Control protocol. “OSC is a protocol for communication among computers, sound synthesizers, and other multimedia devices that is optimized for modern networking technology. Bringing the benefits of modern networking technology to the world of electronic musical instruments, OSC’s advantages include interoperability, accuracy, flexibility, and enhanced organization and documentation. This simple yet powerful protocol provides everything needed for real-time control of sound and other media processing while remaining flexible and easy to implement.” [2]

This protocol will allow the transmission of different types of parameters with more resolution and speed. This will be achieved with fewer intermediary interfaces and will be much faster. Data will go directly from one interface to the computer through a USB connection. Previously, the Microlab was plugged into a MIDI interface and then into the computer.

A new ultrasonic range finder is being implemented on a PSoC chip by Avrum Holliger at IDMIL. It has a much more refined resolution than the one used on the Microlab, which was limited to 128 values by the MIDI protocol. This new range finder will be directly linked to the main interface.

For the bass flute, it is possible to install the complete interface on the instrument. The hyper-bass-flute will be connected to the computer with a single USB cable. A prototype is now in development using an Arduino-mini interface [1] which is small enough to fit on the instrument. A wireless connection is not desirable because of its need for power: a 9 volt battery would be too heavy to install on the flute.

5.4 Mapping Research Project
For my doctoral project, my compositions will aim to optimize the mappings of my extended instruments in the context of new computer music pieces. My first intention when building the hyper-flute was to use the natural gestures of the flutist to control sound processing parameters. However, as stated above, I was obliged to develop new playing techniques to control some of the sensors.

In the Performance skills section, I mention that the ultrasound transducer, pressure sensors and magnet sensors continually capture the natural movement of a performer. It is a similar situation with the new accelerometer. Those gestures are directly related to the musical material being performed.

With the new prototype of the hyper-flute, more information from the natural gestures of the performer will be usable. I would like to use these gestures to control the computer so that the performer will not need to add too many extra movements. To achieve this, I will study the gestural data captured by the new hyper-flute (and hyper-bass-flute) [15].

Instrumental music material will be written first, then performed on the hyper-flutes. The performer will play
the advantage of being much bigger so there is more space without taking notice of the sensors. All the gestural data
to attach sensors. Nevertheless, the weight of the instru- will be recorded together with the flute sound. I will then
ment limits the capacity of the thumbs to reach different be able to analyse the gestural data in a specific musical
sensors while playing. The new design of the sensors needs context. This analysis will guide the choice of mappings
to be different than the hyper-flute. Only the accelerome- between the sensors and the computer’s live processing pa-
ter and ultrasound transducer can be installed on the bass rameters. The use of sensors will be precisely specified in
flute as on the flute. Compositional strategies will need to a musical context and will be directly related to the per-
be adapted for this instrument and a new period of learn- former’s natural gestures. This should allow a more subtle
ing will be necessary to perform with it. Even if many and expressive control of the sound processing than is pos-
controllers will be different, I expect the learning process to sible in an improvised music context.
be much faster due to my experience with the hyper-flute. To explore the differences of motion between performers,
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
[Figure: block diagram of the bridge-coupled waveguide — two loop low-pass filters (LPF 1, LPF 2) joined at a bridge with gain g, with outputs Out 1 and Out 2]
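The diagram above joins two string sections at a bridge through loop low-pass filters. As a rough, simplified sketch of the kind of string model being controlled (in the spirit of Karplus-Strong synthesis [13], not the authors' actual implementation), a single string can be modeled as a delay line with an averaging low-pass filter in its feedback loop, driven by an external excitation signal:

```python
# Minimal Karplus-Strong-style string: a delay line whose output is fed
# back through a two-point averaging low-pass filter, with an external
# excitation signal summed into the loop. All parameters are illustrative.

def ks_string(excitation, delay, n_samples, loss=0.996):
    line = [0.0] * delay          # delay line length sets the pitch (fs / delay)
    out = []
    for n in range(n_samples):
        x = excitation[n] if n < len(excitation) else 0.0
        # averaging the two oldest samples acts as the loop low-pass filter
        y = x + loss * 0.5 * (line[0] + line[1])
        line = line[1:] + [y]     # advance the delay line by one sample
        out.append(y)
    return out

# A short burst stands in for a measured pluck excitation.
tone = ks_string([0.5, -0.3, 0.8, -0.6, 0.2], delay=50, n_samples=2000)
```

With a measured, non-resonant excitation in place of the toy burst, the loop supplies the resonances while the sensed signal supplies the player's gesture.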
…model output given various excitation sources such as plucking, picking, bowing, and scraping. To enable others to excite their physical models with quality excitation signals, we provide the corresponding non-resonant excitation signals themselves. Finally, an example melody played on the tangible virtual string demonstrates the viability of physically motivated instrument design.

5. FUTURE WORK
The behavior of the interface could be further refined with force-feedback. For example, the excitation string segment could be made into a haptic device by adding an actuator. Then the piece of physical string could be joined to a portion of the waveguide using teleoperator techniques [9]. It would be essential that the string segment have as little mass as possible to avoid loading down the virtual waveguide at the point of connection. We would also like to eventually construct a six-string version to promote the maximum transfer of guitarists' skills from real guitars to virtual guitars.

6. CONCLUSION
We have presented a physically motivated interface for controlling a virtual digital waveguide string. The excitation and pitch are sensed separately using two independent string segments. In contrast with prior interfaces, the excitation to the physical model is measured according to the principles of physically motivated interfaces. In particular, we measure the excitation signals with high-quality, linear, low-noise sensors at the audio sampling rate. We hope that interfaces such as this one will continue to promote skill transfer from traditional acoustic musical instruments to the virtual domain.

7. ACKNOWLEDGMENTS
We thank the Stanford Graduate Fellowship program for supporting this work.

8. REFERENCES
[1] http://www.graphtech.com/. Accessed April 14, 2008.
[2] http://www.starrlabs.com/. Accessed April 14, 2008.
[3] http://www.vg-8.com/. Accessed April 14, 2008.
[4] http://www.yamaha.com/. Accessed April 14, 2008.
[5] W. Aitken, A. Sedivy, and M. Dixon. Electronic musical instrument. International Patent WO/1984/004619, 1984.
[6] E. Berdahl, G. Niemeyer, and J. O. Smith. Applications of passivity theory to the active control of acoustic musical instruments. In Proc. of the Acoustics '08 Conference, June 2008.
[7] C. Cadoz, A. Luciani, and J.-L. Florens. Synthèse musicale par simulation des mécanismes instrumentaux. Revue d'acoustique, 59:279–292, 1981.
[8] A. Freed and O. Isvan. Musical applications of new, multi-axis guitar string sensors. In Proc. of the Int. Computer Music Conf., Aug. 27–Sept. 1, 2000.
[9] B. Hannaford. A design framework for teleoperators with kinesthetic feedback. IEEE Transactions on Robotics and Automation, 5(4):426–434, August 1989.
[10] R. Hanson. Optoelectronic detection of string vibration. The Physics Teacher, 25:165–166, 1987.
[11] A. Hoover. Controls for musical instrument sustainers. U.S. Patent No. 6034316, 2000.
[12] M. Karjalainen, T. Mäki-Patola, A. Kanerva, and A. Huovilainen. Virtual air guitar. Journal of the Audio Engineering Society, 54(10):964–980, October 2006.
[13] K. Karplus and A. Strong. Digital synthesis of plucked string and drum timbres. Computer Music Journal, 7(2):43–55, 1983.
[14] L. Kessous, J. Castet, and D. Arfib. 'GXtar', an interface using guitar techniques. In Proc. of the International Conf. on New Interfaces for Musical Expression, pages 192–195, 2006.
[15] N. Lee and J. O. Smith. Vibrating-string coupling estimation from recorded tones. In Proc. of the Acoustics '08 Conference, June 2008.
[16] A. Luciani, J.-L. Florens, D. Couroussé, and C. Cadoz. Ergotic sounds. In Proc. of the 4th Int. Conf. on Enactive Interfaces, November 2007.
[17] E. Miranda and M. Wanderley. New Digital Musical Instruments. A-R Editions, Middleton, WI, 2006.
[18] J. O. Smith. Physical Audio Signal Processing: For Virtual Musical Instruments and Audio Effects. http://ccrma.stanford.edu/~jos/pasp/, 2007.
[19] M. Yunik, M. Borys, and G. Swift. A digital flute. Computer Music Journal, 9(2):49–52, Summer 1985.
…offering novel and interesting functions, contents, interactions and styles of presentation. PQ measures the traditional concept of usability, i.e., how well the user achieves her goals with the product. This model has been implemented as AttrakDiff2™, a web-based instrument for measuring the attractiveness of interactive products. With the aid of pairs of opposite adjectives, users indicate how they experience the design [5]. Innovative interfaces for intuitive musical expression should also have pragmatic and hedonic qualities. Creating musical expressions with new interface concepts should be easier and more joyful for a performer and attractive for the audience. Thus, measurement approaches for hedonic and pragmatic qualities, like the ones developed by the HCI community, can be very helpful to design and evaluate new musical interfaces. In particular, we used the AttrakDiff2™ approach to evaluate our designs.

3. RELATED WORK
As the work presented here falls into two categories, we will refer to both Theremin-inspired controllers and the evaluation of new musical interfaces. Owing to its successful use over decades, the Theremin has had a sustained influence on researchers working in the field of musical expression. Regarding the idea of using two hands, moving freely in the air, to play electronic sounds, "The Hands" [1] is an early example of a highly flexible and expressive musical interface offering a set of sensors and keys to be played by hands and fingers. For his virtual musical instruments, Mulder [3] used data gloves and a Polhemus 3-D tracking system to shape and play sounds, including a visual representation of sounding objects and virtual hands. A Swedish project used optical tracking to develop virtual instruments that are controlled by gestures [4]. Four virtual instruments, including a virtual xylophone and an air guitar, have been developed using a Cave-like virtual room and have been evaluated concerning their efficiency and learning curve. As we follow the interface paradigm "low threshold, high ceiling", the controllers involved in this research are low-cost, commercially available ones. We use the Nintendo Wii controllers, and others have been using them for the control of sound as well. Paine [6] developed a method for dynamic sound morphology using the Wiimote and Nunchuck controllers, seeking two goals: to increase the performance and the ability to communicate with the audience. Having used the system successfully in several concerts, he points out the potential for further investigation. While a method for evaluation using tools from HCI has been provided by Wanderley and Orio [7], it has not yet often been used. Isaacs [10] presents a study comparing a 3-D accelerometer with a Korg Kaoss Pad KP2, looking at participants learning to play with them. In addition, a method to compare digital instruments based on findings of music psychology on musical expression has been provided [8]. Since basic evaluation methods seem not yet to have been established, however, we consider it an important issue to provide and test frameworks for the evaluation of newly designed interfaces.

4. DESIGN OF VRemin VARIANTS
The design of NIMEs is a challenging task, and we propose a participatory design approach that evaluates design prototypes with end users and iteratively refines the designs according to test results. Our design approach for NIMEs consists of the following steps: 1) the classification of NIMEs with regard to interface complexity and interaction space, 2) the development of interface prototypes with high-level tools, 3) user testing and analysis, and 4) cycles of refinement and final implementation. In this study we classify NIMEs along two dimensions: interaction space and interface complexity. The interaction space is defined as the spatial extent occupied by the user during interaction with the NIME. Figure 1 denotes the different values for interaction space along with the second dimension, interface complexity. A detailed description of the two dimensions and of how to define and measure interface complexity for NIMEs will be presented in [15]. The prototype development is divided into a music generation back-end and an interface front-end. The presented interface variants were assembled using the MIDI-based sensor kits from I-CubeX [2], optical tracking using IR lighting and/or fiducials [13], and the Wiimote game controller.

Figure 1. Classification

The prototyping of the back-end was straightforward. We implemented a small synthesizer application using Native Instruments Reaktor 5. We implemented a lean MIDI interface that allows selecting a small set of predefined instruments or effects and controlling effect parameters, pitch and volume by MIDI commands. This allows us to develop different variants of input devices and techniques without altering the music generation back-end. The PDA-based version and the massive multi-user version will be described in [15].

4.1 VRemin I – Wii Controller
The initial approach for a Theremin-based interaction scenario uses the Wii game controllers for interaction. For the VRemin I, the Wiimote is controlled by the dominant hand (DH, usually right) and the Nunchuck is used by the non-dominant hand (NDH, usually left). Pitch and volume are controlled by the Wiimote's acceleration sensor. The buttons are used for switching the current note on and off and permit interrupting the sound generation. The NDH is used to select the predefined effect with the Nunchuck buttons and to control the effect parameter with the acceleration sensor (see Figure 2). The Wiimote / Nunchuck values are recorded and transformed to MIDI notes using the DarwiinRemote software and a virtual MIDI device (IAC device driver). The interaction space is determined by each hand's rotation and thus is the smallest of all variants. The sound generation is designed as an asymmetrical two-handed interaction [10] because both hands perform different tasks with different gestures / interaction techniques.
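The acceleration-to-MIDI mapping described above, raw sensor values scaled to a note number and a volume value, can be sketched as a simple linear scaling. The 0-255 input range and the two-octave target interval below are illustrative assumptions, not the settings of the actual prototype:

```python
# Sketch: scale raw accelerometer axes to a MIDI note number and a
# volume controller value. Input and output ranges are assumptions.

def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from [lo, hi] to [out_lo, out_hi], clamped."""
    t = (value - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))
    return int(round(out_lo + t * (out_hi - out_lo)))

def wiimote_to_midi(accel_pitch, accel_roll):
    """Return (note, volume) MIDI data bytes from two acceleration axes."""
    note = scale(accel_pitch, 0, 255, 48, 72)    # two octaves upward from C3
    volume = scale(accel_roll, 0, 255, 0, 127)   # e.g. sent as a volume CC
    return note, volume
```

Splitting the mapping from the transport layer like this is what lets the interface front-end change while the Reaktor back-end only ever sees MIDI.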
The second variant interacts within an arm-based space. The VRemin II tracks the X and Y position of the dominant hand and assigns volume level and pitch level based on the assigned position values. We selected optical tracking for monitoring the hand movement. In the current prototype we attached a small fiducial marker at the DH's wrist and use a webcam to capture the image. The reacTIVision software package [13] is used to analyze the image, calculate the position values and send a MIDI value, which controls the pitch of the sound, to the back-end interface. The selection and adjustment of effects is also controlled with the DH. We built a small custom glove-based input device that uses a bend sensor for each finger. The sensors are connected to an I-CubeX MIDI converter that generates MIDI signals. The poses of all fingers determine an individual combination of effects and their strength. This approach realizes a unimanual interaction technique.

Figure 3. VRemin II – Hand tracking and Glove (prototype)

5. EVALUATION
We are interested in the usability and manageability of our interfaces as well as the potential they offer for interaction variations. Since the AttrakDiff test is new in the evaluation of musical interfaces, we decided to start with a pre-test. Controlled laboratory tests of the variations »Theremin«, »VRemin I« and »VRemin II« were performed for deployment to the usability evaluator. Three questionnaires surveyed the subjects' evaluations after each test. Video and audio recordings added quantitative data to the collected materials. However, the pre-test focuses on the quantitative data of the questionnaires, since the surveyed subjects …

Figure 4. Participant questionnaire

Subjects were assigned the test variants in permuted order to exclude corruption of the results by learning experience or a change of perception caused by preceding tasks. Each subject received a short introduction to the instrument immediately before each test. After a short familiarization period of five minutes the subject was given two tasks. The first task consisted in playing a musical scale. The second task allowed the subject to improvise over a played-back drum beat. Subsequently the subject answered two questionnaires. The first form is based on the AttrakDiff2 described in section 2. A second form amended the AttrakDiff2 questionnaire. This form queries facts which are also similarly surveyed in AttrakDiff2 (complexity, precision, comfort, etc.) and thus increases result validity. In addition the subjects evaluated musical qualities, to allow for further indications of the adequacy of the interface as a musical instrument. Due to the limited training of the subjects as musicians, this questionnaire was considered only in a reduced fashion. The tests were concluded for each subject with a comparison questionnaire, which allowed the results of the previous questionnaires to be checked comparatively once more.

The result diagram of the AttrakDiff2 questionnaire (figure 5) shows that the Theremin (number 3) is neutrally valued in pragmatic (PQ) and hedonic (HQ) quality. The Theremin was deemed suitable for playing music, but it only achieved average evaluations. Also the hedonic quality was within the average range: the Theremin was only averagely interesting. The VRemin II (number 1) has similar PQ values to the Theremin. It supports the user in fulfilling his tasks, but obtains only average values. However, compared to the Theremin it is valued as more exciting and it generates curiosity (HQ). Both instruments show a large variance in user rating (confidence rectangle). The VRemin I (number 2) convinced the subjects with its pragmatic and hedonic qualities (PQ and HQ of number 2). The users had the feeling of being able to play music more easily, and it motivated and stimulated the user more than the other two variants. The Theremin and the VRemin II show potential for improvement with respect to usability, though the VRemin II, with a value of 0.6, was deemed more attractive than the Theremin at 0.05. Figure 8, right, shows the averages of some important values of the second questionnaire. The VRemin II is deemed more complex and as having more capabilities than the Theremin and the VRemin I. The VRemin I is superior to the other two variants with respect to handling, usability, precision, comfort, and controllability.
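The aggregation of such semantic-differential ratings can be illustrated as follows. The word-pair items and their assignment to the PQ and HQ scales below are placeholders, not the actual AttrakDiff2 item set:

```python
# Sketch: each word pair is rated on a 7-point scale, recoded to -3..+3,
# and averaged per quality dimension (PQ, HQ). Item names are placeholders.

def scale_means(ratings, items_by_scale):
    means = {}
    for scale_name, items in items_by_scale.items():
        recoded = [ratings[item] - 4 for item in items]   # 1..7 -> -3..+3
        means[scale_name] = sum(recoded) / len(recoded)
    return means

items_by_scale = {
    "PQ": ["simple_complicated", "practical_impractical"],
    "HQ": ["dull_captivating", "ordinary_novel"],
}
ratings = {"simple_complicated": 5, "practical_impractical": 6,
           "dull_captivating": 3, "ordinary_novel": 4}
m = scale_means(ratings, items_by_scale)   # {'PQ': 1.5, 'HQ': -0.5}
```

Per-subject means like these are what the PQ/HQ positions and confidence rectangles in the result diagram are computed from.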
[Figure 5 chart: second-questionnaire ratings for Theremin, VRemin I and VRemin II on the items Ease of handling, Instrument operation, Precision, Complexity, Range of capabilities, Your progress, Comfort, Reached goals, Interface optic, Sound quality, Measurement of control]

Figure 5. AttrakDiff results and second questionnaire

Progress with all instruments is deemed similar, in the neutral range (value 3). With respect to appearance, the VRemin I is placed before the VRemin II and the Theremin, in the positive range. This confirms the findings of the AttrakDiff2 questionnaire. Both VRemins appeal more to the subjects; both generate curiosity and stimulate the participants. Despite the complexity and prototypical character of the VRemin II, the subjects were able to attain the task goals (pragmatic values), but these are valued slightly below the values for the Theremin and much more so below the VRemin I. On the other hand, the VRemin II is valued as more complex and as having more capabilities. The VRemin I obtains throughout neutral to very good values and hence shows its qualities in the specialized questionnaire as well. While the Theremin is viewed as mostly neutral with respect to the usability of an interactive device for playing music, the VRemin I is seen as superior in all aspects. The VRemin II appeals as interesting, stimulating, and fascinating, at the cost of an increase in complexity and, accordingly, operational difficulty. Further development of the VRemin II will require a reduction in its capabilities or a longer training phase, and the transformation of the prototype into a more stable version. A further planned test in spring, with a group of musicians (sound editors) and a longer training period, will analyze the identified points of criticism and the harmonic capabilities of the interactive devices.

6. Conclusion
The presented approach methodically analyzed digital input devices as computer-supported musical instruments. The presented evaluation steps and the pre-test argue that digital developments based on the Theremin appear to the casual player as tantamount. At the same time, the attractiveness and use of the new input modes is viewed as more positive and more appealing. The capabilities seem to the subjects higher-valued than those of the original instrument introduced by Lev Theremin. For our subjects, the VRemin I seems to be superior to the VRemin II for playing. However, the VRemin II is still at a prototype stage and has a higher degree of complexity, and this results in a higher degree of usage difficulty. The approach with AttrakDiff2, the specialized questionnaire with explicit music-related questions, and a validity check by comparison questionnaire has proven itself. The claims were congruent even with a small number of subjects. The variance fluctuations are defensible; a larger number of subjects or a firmer selection of subjects should result in a reduced variance. The planned test for the analysis of the potential for musical expression of the evolutionary VRemin series should give more information on the pragmatic quality and further the advancement of the VRemins as musical instruments. We plan to use this research to develop Interface Design Patterns for Musical Instruments (IDP-FMI).

7. REFERENCES
[1] M. Waisvisz. The Hands, a set of remote MIDI-controllers. In B. Truax, editor, Proceedings of the 1985 International Computer Music Conference, pages 313-318, 1985.
[2] www.infusionsystems.com
[3] A. Mulder. Design of three-dimensional instruments for sound control. PhD thesis, Simon Fraser University, Vancouver, Canada, 1998.
[4] T. Mäki-Patola, J. Laitinen, A. Kanerva, and T. Takala. Experiments with Virtual Reality Instruments. In Proc. NIME 2005, pages 11-16, Vancouver, Canada, 2005.
[5] www.attrakdiff.de
[6] G. Paine. Interfacing for dynamic morphology in computer music performance. In Proceedings of the 2007 International Conference on Music Communication Science, pages 115-118, Sydney, Australia, December 2007.
[7] M. M. Wanderley and N. Orio. Evaluation of input devices for musical expression, borrowing tools from HCI. Computer Music Journal, 26(3):62-76, 2002.
[8] C. Poepel. On interface expressivity: A player-based study. In Proc. NIME 2005, pages 228-231, Vancouver, Canada, 2005.
[9] M. Hassenzahl, A. Platz, M. Burmester, and K. Lehner. Hedonic and ergonomic quality aspects determine a software's appeal. CHI Letters, 2(1):201-208, 2000.
[10] D. Isaacs. Evaluating input devices for musical expression. Master's thesis, University of Queensland, Brisbane, Australia, 2003.
[11] Y. Guiard. Asymmetric division of labor in human skilled bimanual action: The kinematic chain as a model. Journal of Motor Behavior, 19(4), 1987.
[12] C. P. Mason. Theremin "Terpsitone", a new electronic novelty. Radio Craft, Dec. 1936, p. 365.
[13] M. Kaltenbrunner and R. Bencina. reacTIVision: A Computer-Vision Framework for Table-Based Tangible Interaction. In 1st Conf. on Tangible and Embedded Interaction, Baton Rouge, Louisiana, 2007.
[14] C. Geiger, H. Reckter, D. Paschke, and F. Schulz. Poster: Evolution of a Theremin-based 3D Interface for Music Synthesis. In Conf. on 3D User Interfaces, Reno, Nevada, 2008.
[15] D. Paschke. A Method to Design and Implement New Interfaces for Musical Expressions. Bachelor thesis, University of Applied Sciences Harz (in German, forthcoming).
META-EVI
Innovative Performance Paths with a Wind Controller
Tomás Henriques
Faculdade de Ciências Sociais e Humanas, Universidade Nova de Lisboa
Av. de Berna, 26-C 1069-061 Lisboa, Portugal
+351 96 991 1001
tomas.henriques@fcsh.unl.pt
…because they are sonically unsteady, technically difficult, or at times insufficiently expressive, they fail at being able to create a new performance model. By conquering specific new levels of performance technique, the META-EVI presents innovative solutions for the player of a monophonic instrument. These include, among others, playing complex harmonic structures while simultaneously playing a lead melody, a process where the two strands are played in real time and totally asynchronously from each other.

Attempts to tackle this particular issue have been made with some commercial electronic instruments, but with very limited results, specifically with the AKAI EWI 3020 [2] (an offspring of the early EWI) or the Syntophone [3]. These instruments are able to trigger a previously stored stack of notes in a sound module when a specific pitch is played. This approach, although allowing the playback of some harmonic material, offers very little flexibility.

Figure 2. Side view: joystick and 3-membrane sensor

The second goal for the creation of the META-EVI was the intention of gathering and taking full advantage of the gestures and body motions that are naturally used by a performer of a brass/wind instrument. These gestures are always present and very visible, and they are used to directly influence various parameters of the software application that controls the ongoing performance.

4. PLAYING THE META-EVI
The META-EVI is played by blowing through its mouthpiece and fingering four touch sensors or virtual "valves", which allow the production of the 12 chromatic steps. Three of those valves or keys sit on top of the instrument and are the equivalent of the three valves of a regular trumpet, played with the index, middle and ring fingers of the musician's right hand. The fourth valve (which lowers the pitch by a fourth) consists of a metal ring snug against the lower edge of the instrument, and it is accessed by the index finger of the player's left hand.

The instrument is supported mainly with the left hand, which holds the canister, a cylinder-shaped component located at its bottom edge. The controller has a pitch span of seven full octaves, with octave switching done by sliding the thumb of the left hand on a set of six metal rollers (also touch sensors) that are housed inside the canister.

4.1 The extra analog sensors
While most of the extra sensors and switches are placed at specific places in order to be easily accessed by the fingers of the musician, the placement of both the accelerometer and the gyroscope was chosen to fully optimize their readings as a function of the performer's motions. The accelerometer sits at the end of the top part of the instrument, being able to detect the amplitude of motion within the vertical plane. It basically detects how high the musician is holding the instrument, a measurement that can vary between 0 and 180º. Similarly, the tilt component of the accelerometer will detect roll, determining when and how much the musician bends his/her body sideways. Here the amplitude of motion can also be made to vary between 0 and 180º.

The gyroscope, being a sensor that detects rotational acceleration, is used in the META-EVI as a means of detecting fast movements of the upper torso, specifically when the musician swings his/her upper body while playing the instrument.

While the force resistance sensor (FRS) was mounted right by the '3rd valve' of the MIDI EVI, comfortably accessed by the right-hand pinky finger of the musician, the 3-membrane position sensor, which consists of three small parallel strips that are able to independently detect the exact location where they are touched, was placed along the left side of the instrument. The strips allow the right hand of the musician to touch them with either the pinky finger or the thumb while playing the three main MIDI EVI keys. They can also be touched simultaneously with up to three fingers of the right hand when the musician is not using the virtual valves.

The joystick was placed under the body of the instrument and is controlled by the right hand's thumb, and the two linear potentiometers are placed inside the canister of the MIDI EVI. The two potentiometers are accessed by the index and middle fingers of the left hand of the musician, the same hand that controls the octave the instrument is playing in as well as the half-octave of the MIDI EVI. The two potentiometers have heavy usage in the extended instrument since they are controlled by the left hand of the performer, which is freer from the keying scheme.

The force resistance sensor, the 3-membrane position sensor, the joystick and the two linear potentiometers, being independent from the player's gestures, allowed them to have a more extensive set of functions and to respond to multiple mapping options.
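The 0-180º elevation and roll measurements described above amount to the angle between a sensor axis and the direction of gravity. A minimal sketch of recovering such an angle from a normalized accelerometer component follows; the axis assignment is assumed, not taken from the META-EVI electronics:

```python
# Sketch: angle (0-180 degrees) between a sensor axis and gravity,
# computed from the gravity component measured along that axis
# (normalized so that 1 g corresponds to g=1.0).
import math

def tilt_deg(component, g=1.0):
    c = max(-1.0, min(1.0, component / g))   # clamp against sensor noise
    return math.degrees(math.acos(c))

# Axis aligned with gravity: 0 degrees; horizontal: 90; inverted: 180.
```

Applying the same function to two different axes yields the elevation (how high the instrument is held) and the roll (sideways bend) independently.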
9. SOURCES
…signal that had first been processed through two signal processors: a sample- and bit-rate reduction unit, and a filter and dub-delay unit. The signal processing varied with the heat: below a threshold level, the sound was passed through the sample- and bit-rate reduction unit. The heat parameter controlled the amount of sample- and bit-rate reduction, so that if the set was left tuned then slowly, over time, the sound degraded with a tearing effect. If a gentle motion was applied to any of the dials the sound recovered; a harder motion would supply enough energy to remove the degradation unit from the DSP chain, leaving the sound arriving through the filter and dub-delay unit. Here the heat parameter controlled the length of the delay line, with higher heat leading to a shorter delay line, down to a minimum of 5 ms, and the frequency of a low-frequency oscillator. This LFO was added to the position of the volume dial to control the frequency of a peak equalizer filter that sat in the feedback path of the delay unit, giving it the dub sound. The heat similarly controlled the frequency of another low-frequency oscillator that controlled the smooth interpolation between a number of different filters, the cut-off frequency of which was controlled by the tone dial.
Figure 4. Mapping strategy.
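The tuned-set behaviour just described can be condensed into a single routing function of the heat parameter. The threshold and the delay-line range below are illustrative assumptions; only the 5 ms minimum delay comes from the text:

```python
# Sketch: one 'heat' value selects and scales the active effect. Low
# heat (set left alone) increases degradation; enough heat removes the
# degradation stage and instead shortens the dub-delay line.

HEAT_THRESHOLD = 0.5   # illustrative; heat normalized to [0, 1]
MIN_DELAY_MS = 5.0     # minimum delay length stated in the text
MAX_DELAY_MS = 500.0   # illustrative upper bound

def route_dsp(heat):
    """Return (degradation_amount, delay_ms); delay_ms is None while the
    degradation unit, rather than the delay unit, is in the chain."""
    if heat < HEAT_THRESHOLD:
        return 1.0 - heat / HEAT_THRESHOLD, None
    t = (heat - HEAT_THRESHOLD) / (1.0 - HEAT_THRESHOLD)
    return 0.0, MAX_DELAY_MS - t * (MAX_DELAY_MS - MIN_DELAY_MS)
```

A single scalar driving both the routing decision and the per-unit parameters is what lets gentle versus hard dial motions produce qualitatively different sonic responses.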
When the set was detuned, the altered signal was faded out and replaced by the output of a phase vocoder playing a loop of the last ten seconds of recorded radio station input. The playback speed of this output would slowly drop, giving the effect of the real signal shifting out of place. This provided a sound bed for the key feature of the untuned set: the sonified, sorted and browsable contents of the database of detected radio sounds. This was achieved through the playback of the 250 ms samples, looped, filtered and enveloped by amplitude and low-pass filter envelopes. The shape of these envelopes was set to track the heat of the system, with higher temperatures corresponding to shorter, sharper envelopes and a brighter filter sound. As in the installation's tuned mode, the behaviour changed when the heat rose above a threshold value. At this point the tone dial controlled the length of the loops: at either extremity of the dial the end point of the loop and the sample would match, but as the dial was rotated towards the centre the loop shrank towards a 10 ms minimum, giving a glitching effect. The selection of samples to voice was made by sorting the database samples in order of loudness, brightness or noisiness, depending on which of the FM, MW or LW buttons was depressed, and mapping the location of the
frequency dial to a distance down the sorted list. Up to eight consecutively ordered samples (thus timbrally similar in some fashion) were voiced at a time. The number and rhythmic timing of the samples was controlled by a collection of first order Markov chains, where each entry in a thirty-two step sequence table corresponded to the probability of a sound being triggered at that point. The probabilities changed and interpolated to create more frenetic probability-based beats, with more samples voiced, as the heat of the system increased.
Figure 3. Signal flow diagram.
Figure 5. Example of database visualization.
3.4 Database visualization
In tandem with the sonic output of the radio set, the installation included a projected visualization of the sound database, showing the currently selected sounds and a representation of the unfolding audio output. This was achieved through a custom-programmed Processing sketch running on a separate laptop, wirelessly networked to the one running Max/MSP and communicating via OSC. The radii of fifty-one rings were symmetrically mapped to the energy in twenty-five frequency bands, with the radius of the thicker central ring controlled by the loudest band at any time. The length of the whole graphic equalizer bar extended and contracted with the volume of the output. The degree of damping applied both to this motion and to the changing ring radii was controlled by the heat of the system, such that at higher temperatures low damping afforded rapid, jerky movement closely following the output, while lower temperatures gave a slow, glacial movement. The database of samples was visualized around this
313
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Figure 1. Treatment of image data
Linear PCM sound data consists of samples. Each sample stands in a line at uniform intervals (i.e. at the sampling rate) and is defined within a certain range. We sequentially treat each sample as a separate 8-bit datum from the start to the end of the sound data (see Figure 2).
Figure 2. Treatment of sound data
In this algorithm, we exchange each 8-bit color value for an 8-bit sample and vice versa. Each color value of a pixel is separately treated as a sample of sound data, and each sequence of three samples is treated as the three color values (R, G, B) of a pixel (see Figure 3).
4. Monalisa: "see the sound, hear the image"
4.1 Monalisa Application
Monalisa Application is standalone software that transparently treats image data and sound data. It uses the Core Image and Core Audio APIs as the basis of its image and sound processing, and can open any type of image or sound data that these APIs support. The data is treated as both image and sound: based on the 8-bit algorithm, image data can be played as sound and sound data can be shown as an image. These APIs also provide access to a system-level plug-in architecture, Image Unit and Audio Unit. Using these plug-ins, Monalisa Application can add image and sound effects to the data. We briefly describe two examples of such effects: inverting the sound, and delaying the image.
Invert the sound: Invert is an image effect that inverts each color value of the pixels. In Monalisa Application, each sample of the sound data is treated as a separate 8-bit color value. By adding invert to the sound, each sample is inverted within the given 8-bit range, forming a phase-reversed version of the original sound (see Figure 5).
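The 8-bit exchange between sound and image can be sketched in a few lines (a hedged illustration: the helper names are ours, and the real application works on raw byte streams via Core Audio/Core Image rather than on Python lists):

```python
def sound_to_pixels(samples):
    """8-bit algorithm, sound-to-image direction: every three consecutive
    8-bit samples become one (R, G, B) pixel, in order. Trailing samples
    that do not fill a whole pixel are dropped in this sketch."""
    usable = len(samples) - len(samples) % 3
    return [tuple(samples[i:i + 3]) for i in range(0, usable, 3)]

def pixels_to_sound(pixels):
    """Image-to-sound direction: each color value becomes one 8-bit sample."""
    return [value for pixel in pixels for value in pixel]

def invert(samples):
    """The 'invert' image effect applied to sound: flipping each 8-bit value
    mirrors the waveform around its midpoint, i.e. a phase reversal."""
    return [255 - value for value in samples]

snd = [0, 128, 255, 10, 20, 30]
assert sound_to_pixels(snd) == [(0, 128, 255), (10, 20, 30)]
assert pixels_to_sound(sound_to_pixels(snd)) == snd
assert invert([0, 255, 128]) == [255, 0, 127]
```

The round trip is lossless whenever the sample count is a multiple of three, which is why the same bytes can move freely between the two domains.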
4.2 Monalisa-Audio Unit
Monalisa-Audio Unit is plug-in software for sound processing applications. Currently it works in Audio Unit host applications on Mac OSX (e.g. Apple GarageBand, Apple Logic Pro). It enables people to add several kinds of image effects to sound data in real-time by wrapping existing Image Unit plug-ins as Audio Unit plug-ins. In Audio Unit host applications, the plug-in behaves as a single Audio Unit plug-in. Using this plug-in, the sound data is split into separate bitmap images of a given buffer size along the timeline. Based on the 24-bit algorithm, each sample of the sound data is treated as a pixel of 24-bit image data. Image effects (i.e. Image Unit plug-ins) can then be added to the sound data through Monalisa-Audio Unit; several kinds of Image Unit plug-ins are pre-installed on Mac OSX. Each pixel of the processed image is re-treated as a sample, forming new sound data in real-time.
4.3 Monalisa-Image Unit
Monalisa-Image Unit is plug-in software for image processing applications. Currently it works in Image Unit host applications on Mac OSX (e.g. Pixelmator, Apple Motion). It enables people to add existing sound effects to image data within several applications by wrapping existing Audio Unit plug-ins as Image Unit plug-ins. It works not only on static image data, but also on motion graphics, treated as a collection of static images. The plug-in works with the standalone software Monalisa-Image Unit Generator, which separately wraps each existing Audio Unit plug-in as an Image Unit plug-in. In Image Unit host applications, each wrapped Audio Unit plug-in behaves as a single Image Unit plug-in. Based on the 8-bit algorithm, each color value of a pixel is treated as an 8-bit sample of sound data. Audio effects (i.e. Audio Unit plug-ins) can be added to the image data as standard Image Unit plug-ins; several kinds of Audio Unit plug-ins are pre-installed on Mac OSX. Each sample of the processed sound is re-treated as a color value of a pixel, forming new image data in real-time.
Figure 7. Setting of Monalisa "shadow of the sound".
When entering the room, each participant saw his/her image projected on the screen. When he/she pushed the switch, the image was captured as static bitmap image data and the light of the room was gradually dimmed to darkness. The image data was transferred to the application and automatically played as a stream of sound through the speaker. Then the microphone captured the stream of sound and transferred it back to the application, which re-projected the incoming sound data as an image on the screen, from top left to bottom right. In this installation, the re-projected image reflects the reverberation of the room as duplications of the image (see Figure 8).
Figure 8. Original image and re-projected image.
We have developed two custom versions of the Monalisa application: one for image capture and sound production, and another for sound capture and image production. The applications were installed on different PCs. The first one (PC1) was connected to the camera, the projector, and the speaker; it captured the image and played the image as sound. The other one (PC2) was connected to the microphone and the speaker; it captured the sound and projected the sound as an image. To control the trajectory of the installation, we employed MaxMSP [4] on a third PC (PC3), connected to the switch, the light, and a video switcher. When it received a signal from the switch, it sent a capture message to PC1 through Open Sound Control (OSC) [9]. The light was controlled by a DMX controller through MIDI. When the light had dimmed to darkness, PC3 sent a play message to PC1 and a capture message to PC2, and switched the video switcher from PC1 to PC2. After PC2 had projected the whole captured sound, PC3 raised the light back to its ordinary level (see Figure 9).
5. DISCUSSIONS
In Monalisa, image data become sound data and vice versa. As Whitelaw cited from the email of Christopher Sorg, "all data inside the computer is essentially the same, ... either with ears or eyes, or whatever senses we care to translate the switching of 1s and 0s into..." [15]. While sonification and visualization lay emphasis on the use of sound / image to help a user monitor and comprehend whatever it is that the sound / image output represents [8], our software platform gives people new modes of manipulation to employ the sound data and the image data as
materials to produce their own creative works. Sometimes such unintended use may result in horrible noise, while at other times it can produce wondrous tapestries of sound [3]. The software platform also enables people to use existing image plug-ins in existing sound processing applications and vice versa. This experience suggests accessing old media objects in new ways, congruent with the information interfaces we use in our everyday life [10].
We have developed two algorithms: 8-bit and 24-bit. While the 8-bit algorithm provides a one-to-one relationship between each color value and each sound sample, the 24-bit algorithm provides a one-to-one relationship between each pixel and each sample. The result of adding effects to the data therefore differs between the two algorithms. For instance, with 8-bit, the resulting sound will be low-bit, like an old hip-hop sample. In contrast, 24-bit retains the resolution of the sound data that most existing sound processing applications provide. We think their distinct characters and the variety of expressions they afford are not a trivial feature, so we plan to offer a choice between the two algorithms in a future release.
Due to a technical limitation, we currently fix the buffer size at 4096 samples in Monalisa-Audio Unit. This limits the range of the transformation and prevents treating the whole sound data if its length is not a multiple of 4096 samples. We therefore also plan to add an adjustable buffer size in a future release.
While exhibiting the installation Monalisa "shadow of the sound", we made several observations. Due to limited space, we briefly introduce the two following observations.
Figure changes the sound: The resulting sound was produced from the image of the participant, so the participant's figure affects the character of the sound; for example, a white T-shirt produced higher frequencies, and the borders of a striped shirt made a kind of rhythmical sound.
Sound / image equipment affects image / sound: We tested several video cameras, microphones, speakers and lights. When we changed this equipment, the quality of the sound equipment affected the image and vice versa. For instance, the sensitivity of the video camera affects the spectrum of the sound, and the directivity of the microphone affects the clearness of the image. These characteristics of Monalisa show potential as an alternative tool for checking the quality of sound and image equipment.
The possibility of the computational process as a material for artistic creation [9] has not yet been fully investigated. We are interested in exploring alternative ways of image and sound processing, and we anticipate that our initial explorations of the software platform will stimulate new ideas for instruments for image and sound production.
6. ACKNOWLEDGEMENTS
This work was developed under the support of the FY2005 IPA Exploratory Software Project (Project Manager: KITANO Hiroaki) provided by the Information-Technology Promotion Agency (IPA) and the NTT InterCommunication Center. We would also like to thank Nao Tokui and Kazuo Ohno for their valuable comments and Karl D.D. Wills for his excellent graphic design.
7. REFERENCES
[1] Anderson, L. The Record of the Time: Sound in the Work of Laurie Anderson. NTT Publishing Co., Ltd., 2005.
[2] Apple. Core Audio, Core Image. http://www.apple.com
[3] Cascone, K. The Aesthetics of Failure: "Post-Digital" Tendencies in Contemporary Computer Music. Computer Music Journal, Vol. 24, No. 4, December 2000, pp. 12-18.
[4] Cycling '74. MaxMSP. http://www.cycling74.com/products/maxmsp
[5] Erbe, T. SoundHack: A Brief Overview. Computer Music Journal, Vol. 21, No. 1, Spring 1997, pp. 35-38.
[6] Gallo, E. and Tsingos, N. Efficient 3D Audio Processing with the GPU. In GP2, ACM Workshop on General Purpose Computing on Graphics Processors, 2004.
[7] Kandinsky, W. Concerning the Spiritual in Art. New York: Dover Publications Inc., 1977.
[8] Kramer, G. (ed.) Auditory Display: Sonification, Audification, and Auditory Interfaces. Addison-Wesley, 1994.
[9] Maeda, J. Design by Numbers. Cambridge, MA: MIT Press, 1999.
[10] Manovich, L. The Anti-Sublime Ideal in Data Art. http://www.manovich.net/DOCS/data_art.doc, 2002.
[11] OSC, Open Sound Control. http://cnmat.cnmat.berkeley.edu/OSC/
[12] Sengmuller, G. VinylVideo (TM). Leonardo, Vol. 35, No. 5, October 2002, pp. 504-504.
[13] Wenger, E. MetaSynth. http://metasynth.com
[14] Whalen, S. Audio and the Graphics Processing Unit. http://www.node99.org/projects/gpuaudio/gpuaudio.pdf, 2005.
[15] Whitelaw, M. Hearing Pure Data: Aesthetics and Ideals of Data-Sound. In Arie Altena (ed.) Unsorted: Thoughts on the Information Arts: An A to Z for Sonic Acts X. Amsterdam: Sonic Acts/De Balie, 2004.
[16] Whitney, J. Digital Harmony: On the Complementarity of Music and Visual Art. McGraw-Hill, Inc., New York, NY, 1981.
[17] Xenakis, I. Formalized Music, rev. ed. Stuyvesant, NY: Pendragon Press, 1992.
[18] Yeo, W. S., Berger, J., and Lee, Z. SonART: A Framework for Data Sonification, Visualization and Networked Multimedia Applications. In Proceedings of the International Computer Music Conference, Miami, FL, USA, November 2004.
[19] Zhang, Q., Ye, L., and Pan, Z. Physically-Based Sound Synthesis on GPUs. In Entertainment Computing - ICEC 2005, Springer Berlin/Heidelberg, Vol. 3711/2005, pp. 328-333.
8. APPENDIX
The Monalisa application is under development for its next release. Monalisa-Audio Unit and Monalisa-Image Unit are downloadable from the following URL:
http://nagano.monalisa-au.org/?page_id=351
The Japanese techno musician Junichi Watanabe employed Monalisa-Audio Unit to produce his latest album "LITTLE SQUEEZE PROPAGANDA" (ADDL-004, AsianDynasty, 2007).
ABSTRACT
Off-line beat trackers are often compared to human tappers who provide a ground truth against which they can be judged. In order to evaluate a real-time beat tracker, we have taken the paradigm of the 'Turing Test' in which an interrogator is asked to distinguish between human and machine. A drummer plays in succession with an interactive accompaniment that has one of three possible tempo-controllers (the beat tracker, a human tapper and a steady-tempo metronome). The test is double-blind since the researchers do not know which controller is currently functioning. All participants are asked to rate the accompaniment and to judge which controller they believe was responsible. This method for evaluation enables the controllers to be contrasted in a more quantifiable way than the subjective testimony we have used in the past to evaluate the system. The results of the experiment suggest that the beat tracker and a human tapper are both distinguishable from a steady-tempo accompaniment and are preferable according to the ratings given by the participants. Also, the beat tracker and a human tapper were not sufficiently distinguishable by any of the participants in the experiment, which suggests that the system is comparable in performance to a human tapper.
Keywords
Automatic Accompaniment, Beat Tracking, Human-Computer Interaction, Musical Interface Evaluation
1. INTRODUCTION
Our research concerns the task of real-time beat tracking with a live drummer. In a paper at last year's NIME Conference [6], we introduced a software program, B-Keeper, and described the algorithm used. However, the evaluation of the algorithm was mainly qualitative, relying on testimonial from drummers who had tried using the software in performances and rehearsal.
In trying to find a scientific method for testing the program, we could not use previously established beat tracking tests, such as the MIREX Competition [4], since these did not involve the necessary component of interaction, and our beat tracker was highly specialised for performance with input from drums. In MIREX, the beat trackers are compared to data collected from forty human tappers who collectively provide a ground truth annotation [5].
In order to test the real-time beat tracker, we wanted to make a comparison with a human tapper and to do so within a live performance environment, yet in a way that would be both scientifically valid and also provide quantitative as well as qualitative data for analysis.
In Alan Turing's 1950 paper, 'Computing Machinery and Intelligence' [9], he proposes replacing the question 'can a computer think?' by an Imitation Game, popularly known as the "Turing Test", in which a computer is required to imitate a human being¹ in an interrogation. If the computer is able to fool a human interrogator a substantial amount of the time, then the computer can be credited with 'intelligence'. Turing considered many objections to this philosophical position within the original paper and there has been considerable debate as to its legitimacy, particularly regarding the position referred to as 'Strong A.I.'. Famously, John Searle [7] put forward the Chinese room argument, which proposes a situation in which a computer might be able to pass the test without ever understanding what it is doing.
The Imitation Game might prove to be an interesting model for constructing an experiment to evaluate an interactive musical system. Whilst we do not wish to claim the system possesses 'intelligence', its ability to behave as if it had some form of 'musical intelligence' is vital to its ability to function as an interactive beat tracker.
B-Keeper controls the tempo by processing onsets detected by a microphone placed in the kick drum, with additional tempo information from a microphone on the snare drum. The beat tracker is event-based and uses a method related to the oscillator models used by Large [3] and Toiviainen [8]. Rather than processing a continuous audio signal, it processes events from an onset detector and modifies its tempo output accordingly. B-Keeper interprets the onsets with respect to bar position using an internal weighting mechanism and uses Gaussian windows around the expected beat locations to quantify the accuracy and relevance of each onset for beat tracking. A tempo tracking process to determine the best inter-onset interval operates in parallel with a synchronisation process which makes extra adjustments to remain in phase with the drums. The parameters defining
¹ As Turing formulates the problem, the computer imitates a man pretending to be a woman, so as to negate the element of bias due to the imitation process from the test.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genoa, Italy
Copyright 2008 Copyright remains with the author(s).
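The Gaussian weighting of onsets around an expected beat location can be sketched as follows (a simplified illustration: `sigma`, the gain and the exact update rule are our assumptions, not B-Keeper's published parameters):

```python
import math

def gaussian_weight(onset_time, expected_beat, sigma=0.04):
    """Weight an onset by a Gaussian window centred on the expected beat
    location (times in seconds); onsets far from the prediction
    contribute little to the tempo update."""
    d = onset_time - expected_beat
    return math.exp(-(d * d) / (2.0 * sigma * sigma))

def update_period(period, onset_time, expected_beat, gain=0.5, sigma=0.04):
    """Nudge the beat period toward the observed timing error, scaled by
    the Gaussian confidence weight (an assumed update rule)."""
    w = gaussian_weight(onset_time, expected_beat, sigma)
    return period + gain * w * (onset_time - expected_beat)

# An onset exactly on the predicted beat has full weight and leaves
# the period unchanged:
assert gaussian_weight(1.0, 1.0) == 1.0
assert update_period(0.5, 1.0, 1.0) == 0.5
```

The key property is graceful degradation: a slightly early or late hit pulls the tempo a little, while an onset far from the prediction (e.g. a fill) is effectively ignored.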
2. EXPERIMENTAL DESIGN
The computer's role in controlling the tempo of an accompaniment might also be undertaken by a human controller. This, therefore, suggests that we can compare the two within the context of a "Turing Test" or Imitation Game. We also extend the test by including a control: a steady accompaniment which remains at a fixed tempo dictated by the drummer. For each test, the drummer gives four steady beats of the kick drum to start, and this tempo is used as the fixed tempo.
The test involves a drummer playing along to the same accompaniment track three times. Each time, a human tapper (AR) taps the tempo on the keyboard, keeping time with the drummer, but only in one of the three trials does this alter the tempo of the accompaniment. For the trials controlled by the human tapper, we applied a Gaussian window to the intervals between taps in order to smooth the tempo fluctuation, so that it would still be musical in character. Of the other two, one is an accompaniment controlled by the B-Keeper system and the other is the same accompaniment at a fixed tempo (see Figure 2). The sequence in which these three trials happen is randomly chosen by the computer and only revealed to the participants after the test, so that the experiment accords with the principle of being 'double-blind': i.e. neither the researchers nor the drummer know which accompaniment is which. Hence, the quantitative results gained by asking for opinion measures and performance ratings should be free from any bias.
We are interested in the interaction between the drummer and the accompaniment which takes place through the machine. In particular, we wish to know how this differs from the interaction that might take place with a person, or in this case, a human beat tracker. We might expect that, if our beat tracker is functioning well, the B-Keeper trials would be 'better' or 'reasonably like' those controlled by the human tapper. We would also expect them to be
Figure 2: Design set-up for the experiment. Three possibilities: (a) Computer controls tempo from drum input; (b) Steady Tempo; (c) Human controls tempo by tapping beat on keyboard
After each trial, we asked each drummer to mark an 'X' on an equilateral triangle to indicate the strength of their belief as to which of the three systems was responsible. The three corners corresponded to the three choices, and the nearer to a particular corner they placed the 'X', the stronger their belief that that was the tempo-controller for that particular trial. Hence, an 'X' placed on a corner would indicate certainty that that was the scenario responsible. An 'X' on an edge would indicate confusion between the two nearest corners, whilst an 'X' in the middle indicates confusion between all three. This allowed us to quantify an opinion measure for identification over all the trials. The human tapper (AR) and an independent observer also marked their interpretation of the trial in the same manner.
In addition, each participant marked the trial on a scale of one to ten as an indication of how well they believed that test worked as 'an interactive system'. They were also asked to make comments and give reasons for their choice. A sample sheet from one of the drummers is shown in Figure 3.
We carried out the experiment with eleven professional and semi-professional drummers. All tests took place at the Listening Room of the Centre for Digital Music, Queen Mary, University of London, which is an acoustically isolated studio space. Each drummer took the test (consisting of the three randomly-selected trials) twice, playing to two different accompaniments. The first was based on a dance-rock piece first performed at the Live Algorithms for Music Conference, 2006, which can be viewed on the internet [1]. The second piece was a simple chord progression on a software
version of a Fender Rhodes keyboard with some additional percussive sounds. The sequencer used was Ableton Live [2], chosen for its time-stretching capabilities.
We recorded all performances on video and audio and stored data from the B-Keeper algorithm. This allowed us to see how the algorithm processed the data, and enabled us to look in detail at how the algorithm behaved and monitor how the tempo of the accompaniment was changed by the system.

Table 2: Polarised decisions made by the drummers for the different trials.
                      Judged as:
Controller      B-Keeper   Human   Steady
B-Keeper          9.5       8.5      4
Human Tapper      8         10       4
Steady Tempo      2          4      16

Table 3: Polarised decisions made by the drummers over the Steady Tempo and Human Tapper trials.
                      Judged as:
Controller      Human Tapper   Steady Tempo
Human Tapper        12              4
Steady Tempo         5             14

Table 4: Decisions made by the drummers contrasting the B-Keeper and Human Tapper trials.
                      Judged as:
Controller      Human Tapper   B-Keeper
Human Tapper         9             8
B-Keeper             8             8

Of the B-Keeper trials themselves, the drummers were least confident in identifying it as the controller. The researchers, who acted as independent observer and tapper, were more confident. In an analogous result, we might expect the human tapper, the first author, to be able to distinguish the trials in which he controlled the tempo; however, this did not appear to be the case. He was more successful at discerning the other two trials.
We can polarise the decisions made by drummers by taking their highest score to be their decision for that trial. In the case of a tie, we split the decision equally. The advantage of this method is that we can make pair-wise comparisons between any of the controllers, whilst also allowing the participants the flexibility to remain undecided between two possibilities. Table 2 shows the polarised decisions made by drummers over the trials. There is confusion between the B-Keeper and Human Tapper trials, whereas the Steady Tempo trials were identified over 70% of the time. The B-Keeper and Human Tapper trials were identified 43% and 45% of the time respectively, little better than chance.
been correctly identified. The distribution does not seem to have the kind of separation seen for the Steady Tempo trials, suggesting that they had difficulty telling the two controllers apart, but could tell that the tempo had varied.
As the B-Keeper trial shares the characteristic of having variable tempo, and thus is not identifiable simply by trying to detect a tempo change, we would expect that if there were a machine-like characteristic to B-Keeper's response, such as an unnatural response or unreliability in following tempo fluctuation, syncopation and drum fills, then the drummer would be able to identify the machine. It appeared that, generally, there was no such characteristic, and drummers had difficulty deciding between the two controllers. It may appear that having the Human Tapper visible to them would give them an advantage; however, this did not prove to be the case, as the similarity between the computer's response and a human tapping along was close enough that often the observer and the human tapper were also unsure of the controller.
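The polarising rule described above (highest score becomes the decision, ties split equally) can be sketched as follows; the dictionary keys are our own labels for the three controllers:

```python
def polarise(scores):
    """Convert one trial's opinion scores (strengths derived from the
    triangle marks) into a polarised decision: the highest-scoring
    controller takes the whole decision, and ties split it equally."""
    top = max(scores.values())
    winners = [name for name, value in scores.items() if value == top]
    share = 1.0 / len(winners)
    return {name: (share if name in winners else 0.0) for name in scores}

# A clear mark near one corner yields a full decision:
assert polarise({"B-Keeper": 0.7, "Human": 0.2, "Steady": 0.1}) == \
    {"B-Keeper": 1.0, "Human": 0.0, "Steady": 0.0}
# A mark on an edge (a two-way tie) splits the decision:
assert polarise({"B-Keeper": 0.5, "Human": 0.5, "Steady": 0.0}) == \
    {"B-Keeper": 0.5, "Human": 0.5, "Steady": 0.0}
```

Summing these polarised decisions over all trials yields fractional counts like the 9.5 and 8.5 entries in Table 2.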
start and the end of the spiral winding are formed by the same tone. If the pitch space is modified such that these two pitch classes occupy the same XY-location (see Figure 2c, Figure 1b), it is possible to represent many basic tonal relationships by simple geometric structures or in simple geometric ratios (Table 1). That simplification again makes it much easier to navigate and find desired tonal structures within the pitch space.
Figure 2. The extraction of one spiral winding results in a 2D subspace that represents a diatonic key. This 2D space is the basis of the proposed interface [6]
Representing one extracted and modified spiral winding in a 2D plane, as shown in Figure 2, results in a new geometric sub pitch space which particularly expresses key-related tonal structures such as functional relationships, aspects of tension and resolution, as well as relationships between tones, intervals and chords [6]. This expressivity led to the decision to develop a musical interface based on the adapted Krumhansl space.

Table 1. Often-used tone combinations and their positions
Diatonic key: one complete spiral winding
Major/minor chord: three neighbouring tones
Relative minor: direct neighbour counter-clockwise of the major chord
Relative major: direct neighbour clockwise of the minor chord
Parallel major: the chord located directly one spiral winding above a given minor chord
Parallel minor: the chord located directly one spiral winding below a given major chord
Diminished: the tones forming the start and the end of the selected spiral winding
Major/minor seventh chord: four neighbouring tones (G-b-D-F)
Subdominant chords: all tones to the left of the symmetry axis (e.g. d-F-a, F-a-C) [8]
Tonic chords: tones centred around the geometric center of the selected spiral (e.g. a-C-e, C-e-G) [8]
Dominant chords: all tones to the right of the symmetry axis (e.g. e-G-b, G-b-D) [8]

3.2 Geometric versus cognitive center
It is important to distinguish between the geometric center and the cognitive center of a key. The geometric center is located exactly in the middle of the selected spiral winding (Figure 2, represented by the tilted d). The geometric center is the same for all diatonic keys (major, minor, …). Awareness of the geometric center helps to recognize redundancies in western tonal music. An example: the chord progression C-f-C has exactly the same geometric structure as the chord progression a-E-a, but the two structures are mirrored around the geometric center of the key. Many more such major/minor mirror relationships become apparent when the pitch space is aligned to the geometric center. From a music-psychological point of view, however, the geometric center is not the tonic of a key. The tonic of a key is represented by the root of the given key, which we denote the cognitive center. The most restful tone in a given key is that root note. While the geometric center of a key is mode-independent, the cognitive center of a key changes if the mode changes. It could therefore be better to align the pitch space not to the geometric center but to the appropriate root note. This is shown in Figure 3: Figure 3a shows the system aligned to the geometric center. It can be seen that major and minor chords together form a perfectly symmetric structure. It should be noted that the geometric distances along the spiral correspond to the tones' distances on a semitone scale. Figure 3b shows the system aligned to the cognitive center of a minor. The root note "a" is represented at the circle's top. The root of the subdominant ("d") and the root of the dominant ("g") are now symmetrically arranged around the tonic. Figure 3c shows this for C major: the root "C" is aligned to the circle's top, and the subdominant ("F") and the dominant ("G") are symmetrically arranged around the tonic ("C"). To make a musical interface intuitive, it should allow switching between the two alignment types: it should be possible to change the alignment between the major tonic, the minor tonic and the key's geometric center.
Figure 3. The difference between a key's geometric center and a key's cognitive center: a) the geometric center, b) the major root and c) the minor root are represented on the top
4. NAVIGATION IN PITCH SPACE
Now we will derive an interface that makes it possible to select parts of the pitch space like those described in Table 1. Firstly, a simple set of parameters will be chosen in order to define the desired sound. The interface proposed here has to support the following movements and tasks in pitch space:
a) Moving within one spiral winding to play tones and chords of one key. If the selected spiral winding is projected onto a 2D plane as shown in Figure 2, this results in a 2D navigation.
b) Defining which pitches are selected once a certain spatial position has been reached. This requires parameters that define the dimensions of the selected part of the space.
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
c) Changing the current spiral winding to change the key or to temporarily play chords from other keys. This results in a movement within the third dimension.
d) According to Figure 3 it must be possible to align the pitch space to the major tonic, to the minor tonic or to the geometric center of a given key. The alignment should be a consciously executed task.

4.1 Playing Tones and Chords - Navigating within one spiral winding
The most basic task in creating music is to navigate within one spiral winding and to select tones of one key. Figure 4 shows that concrete pitch classes are represented at discrete angles. But to make pitch classes audible we also have to assign a pitch height to every pitch class. For this the authors propose to use the radial dimension to assign different root positions to every pitch class. According to Figure 4 this results in four control parameters for playing tones, intervals and chords: 1.) a start angle, 2.) an apex angle, 3.) a start radius and 4.) an apex radius. The start angle defines the root of the chord that is to be played (Figure 5). The apex angle defines how many pitch classes neighboring the root are played (Figure 6). The start radius is used to define the pitch height of the played pitch class such that the pitch height increases continuously with the radial position. So the higher a tone's pitch, the greater is the tone's radial position. Because chords can also be composed of tones of more than one octave, the apex radius can be used to increase or decrease the number of octaves that are used to generate the tone combination. Together these parameters allow a continuous fading between single tones, third intervals, major and minor chords, or major or minor seventh chords.

Figure 4. The four control parameters: start angle, apex angle, start radius (r1) and apex radius (r2).

4.2 Changing the key - Changing the spiral winding
A distinction has to be made between two modes of spiral winding change. 1.) The permanent key change is used if the key shall be modified permanently. This permanent modification is used, for example, if another song shall be accompanied or if the current musical piece is to be transposed. In that case it is necessary to switch to another spiral winding and also to rotate the spiral winding such that the geometric or cognitive center (Section 3.2) is represented at the same position where the former key's center was represented. In Figure 3 this position is the circle's top.
2.) The second mode is the temporary key change: in many cases there is a fixed key, but we need to play chords that belong to another diatonic set. For example, it is often required to play the dominant major chord in harmonic minor. In that case it is necessary to jump to another spiral winding, but the tonal center shall remain at the same spatial location.

Table 2. Several key changes and the corresponding spiral winding changes:

  1. Jump to the parallel major key: select the spiral winding directly above the current one
  2. Jump to the parallel minor key: select the spiral winding directly below the current one
  3. Shift the key by one semitone: shift the spiral by five fifths
  4. Jump to the next key in the circle of fifths: shift the spiral by one fifth to the left or to the right

Table 2 shows characteristic key changes. The two most important spiral winding changes are the parallel major/minor ones (Table 2: 1, 2). These changes convert a given major chord to a minor chord and a given minor chord to a major chord (see also Figure 1). It can be seen that these changes are simple operations. Other operations, like the transformation of a given major chord into the dominant 7th chord, are more complicated and have to be regarded in more detail.
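The four control parameters and the layout of one spiral winding can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: the winding layout follows Figure 2 (d-F-a-C-e-G-h stacked in thirds), while the function name, the index-based "angles" and the MIDI numbering are invented here.

```python
# Sketch (not the authors' code): selecting tones from one spiral
# winding of the spiral of thirds via the four control parameters
# named in the text: start angle, apex angle, start radius, apex radius.

# One winding of the C major / a minor spiral, ordered in thirds
# (d-F-a-C-e-G-h as in Figure 2); values are pitch classes (MIDI mod 12).
WINDING = [("d", 2), ("F", 5), ("a", 9), ("C", 0), ("e", 4), ("G", 7), ("h", 11)]

def select_pitches(start_angle_idx, apex_count, start_octave, octave_span):
    """Map the four control parameters to concrete MIDI notes.

    start_angle_idx -> root position on the winding (start angle)
    apex_count      -> how many neighbouring tones sound (apex angle)
    start_octave    -> base pitch height (start radius)
    octave_span     -> extra octaves spanned by the chord (apex radius)
    """
    notes = []
    for i in range(apex_count):
        name, pc = WINDING[(start_angle_idx + i) % len(WINDING)]
        # spread the selected tones over the requested octave span
        octave = start_octave + (i * octave_span) // max(apex_count, 1)
        notes.append((name, 12 * (octave + 1) + pc))
    return notes

# Three neighboured tones starting at C -> the C major triad C-e-G:
print(select_pitches(3, 3, 4, 0))   # [('C', 60), ('e', 64), ('G', 67)]
```

Starting four tones earlier on the winding at G yields G-h-d-F, the dominant seventh chord of Table 1; the key changes of Table 2 would correspond to swapping or rotating the `WINDING` list.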
6. SOFTWARE ARCHITECTURE
Figure 10 shows the software architecture that realizes the presented musical interface approach. The core module is "Geometric Pitch Distribution". This module defines where the tones are geometrically positioned within the pitch space. In our case this module realizes the spiral of thirds and can be replaced by other geometric pitch distributions, e.g. the circle of fifths or a Riemann network. The module "Pitch Selection" defines in what way pitches can be selected for playout. The Pitch Selection depends on the geometric pitch distribution and provides a high-level interface to change e.g. the start and apex angle, the start and apex radius (Figure 4) or the velocity of the tones to be played. The control parameters provided by the pitch selection module must be mapped to the parameters of a given hardware controller, e.g. the SpaceNavigator. This is done by the "Parameter Mapper". This module receives the events from the hardware controller (SpaceNavigator), transforms those events if required and maps them to the control parameters provided by the module "Pitch Selection".

Figure 10. The software architecture of the presented interface approach: Parameter Mapper, Pitch Selection, Pitch Weighting, Geometric Pitch Distribution, Midi Note Generator and Synthesizer.

The module "Pitch Weighting" takes the current Pitch Selection and the given Geometric Pitch Distribution and derives the weights of the currently selected pitches. The weighted pitches are in turn forwarded to the "Midi Note Generator", which generates a MIDI signal that is fed to a synthesizer.

7. EVALUATION
To evaluate the parameter mapping proposed in Table 3, several informal tests, a focus group and a usability test have been conducted. The focus group consisted of five participants of varying musical background. The usability test featured 20 participants (10 musicians, 10 non-musicians). Task 1 (as proposed in Table 3) was perceived as conforming to the model in the software, and was generally accepted. Due to the self-centering property of the SpaceNavigator, Tasks 2-4 were evaluated as difficult and only suitable for experimental musical settings. Thus the start radius has preliminarily been set to a fixed position. This led to an alteration of the controller assignment of Tasks 3 and 4 such that a note is now played when a certain radius margin is crossed. The velocity of the played note is now derived from the velocity of the movement during the crossing of that margin. Tasks 5-7 were perceived as very problematic and unusable, due to physical limitations of the SpaceNavigator. It has been found that the 6 degrees of freedom of the SpaceNavigator cannot always be handled simultaneously. For example, a full rotation locks the controller cap and prevents movement to the desired angle (as in Task 1, Table 3). Also, handling two independent parameters on two different axes led to problems. The reason for this was that manipulating a parameter A involuntarily led to a change in another parameter B as well. In addition, the simultaneous manipulation of two or more independent parameters was perceived as very complicated. The manipulation of the apex angle (Table 3, Task 11) was perceived as totally unusable by more than half of the test persons. This was due to the high sensitivity of the SpaceNavigator. Also, the time it takes to set the angle was rated as too long for real musical applications. Some people stated that they liked the perceptual link to the pitch space and the way that notes are selected and played (parameter 1, Table 3).

8. APPLICATIONS
The proposed pitch-space-based musical interface is interesting for different target groups. Children often have extensive skills in computer games and in the usage of new hardware controllers. Such an interface could help them to learn tonal Western music step by step. Combined with an appropriate visualization they can quickly learn many theoretical relationships such as the composition of chords, functional relationships between chords (subdominant, tonic, dominant) or relationships between different keys, but also more psychophysical relationships such as the distinction between pitch chroma (assigned to angle) and pitch height (assigned to radius), which have been shown to be processed in different brain regions [10].

Music students often have to learn many music-theoretical terms. For them it is important to keep an overview. The challenge in this context is to bridge the gap between theoretical knowledge and its practical application. Using a pitch-space-based instrument could help them to organize many theoretical terms by linking them to a spatial model. The possibility to interact directly with such a geometric representation of tones and to hear the result immediately will additionally help them to improve their learning progress. For this the proposed musical interface could become part of standard school education.

Older people are often willing to learn a new instrument, but classical instruments like piano or violin are too complicated and require the development of extensive motor skills. A musical instrument with a simple set of parameters could motivate them to take up a new challenge, i.e. to start learning a new instrument.

Musicians, DJs, composers: combined with an advanced sound synthesis (Figure 10), the pitch-space-based instrument can become a creative tool that supports the finding of new chord progressions throughout all keys and the development of advanced sound textures.

9. SUMMARY AND RESULT
A new musical interface approach based on tonal pitch spaces was presented. The approach targets both musicians and non-musicians. The combination of tonal pitch spaces with 3D navigation tasks and a real-time auralisation provides many new musical possibilities. The used model was derived from a music-psychological model, which guarantees a strong relationship between the used model of tonality and cognitive principles. With the extraction of one spiral winding and its projection onto an XY plane, a strong simplification of the tonal space and of the complexity of the navigation tasks could be reached (Figure 2). The alignment of that diatonic subspace to the symmetry axis
or, respectively, to the geometric center of the extracted key helped to uncover structural redundancies between major and minor, which in turn reduces the amount of material to be learned.

A simple set of parameters to navigate within the pitch space and to interact with pitches has been provided. This interface allows fading continuously between tones, intervals, chords and inversions of chords, as well as defining chord progressions throughout all keys, modulations, etc. The spiral of thirds partitions the 12 major and the 12 minor keys into easily understandable diatonic subspaces which can be easily accessed. The spiral has been designed such that important tonal relationships like parallel and relative major and minor chords are neighbors.

To create music out of the pitch space, several navigation tasks have been defined and mapped to the control parameters of the 3Dconnexion SpaceNavigator. A subsequent usability test consisting of a focus group and 20 individually tested participants showed that a controller is required which allows independent control of different parameters (push/pull, rotate, …) simultaneously. The tested controller did not meet this requirement.

Next steps in the development of the interface will be 1.) to evaluate controller alternatives that meet the requirements denoted before, 2.) to add the possibility of navigating within different parts of the space independently, 3.) to introduce other pitch spaces to play other and more complex chords and scales, and 4.) to develop an advanced visualization.

10. REFERENCES
[2] Krumhansl, Carol L.: Cognitive Foundations of Musical Pitch. Oxford Psychology Series, no. 17. Oxford University Press, 1990. ISBN 0-19-505475-X
[3] Lerdahl, Fred: Tonal Pitch Space. Oxford: Oxford University Press, 2001. ISBN 0-1950-5834-8
[4] Shepard, Roger N.: Geometrical approximations to the structure of musical pitch. In: Psychological Review (1982), no. 89, pp. 305-333
[5] Tymoczko, Dmitri: The geometry of musical chords. In: Science (2006), no. 313, pp. 72-74
[6] Gatzsche, G.; Mehnert, M.; Gatzsche, D.; Brandenburg, K.: A symmetry based approach for musical tonality analysis. In: Proceedings of the 8th International Conference on Music Information Retrieval, ISMIR 2007, Vienna, 2007
[7] Gatzsche, G.; Mehnert, M.; Gatzsche, D.; Brandenburg, K.: Mathematical optimization of a toroidal tonality model. 8th Conference of The Society for Music Perception and Cognition, Concordia University, Montreal, QC, Canada, 2007
[8] Mehnert, M.; Gatzsche, G.; Gatzsche, D.; Brandenburg, K.: The analysis of tonal symmetries in musical audio signals. International Symposium on Musical Acoustics ISMA 2007, Institut d'Estudis Catalans, Barcelona, 2007
[9] 3Dconnexion, a Logitech company, SpaceNavigator, http://www.3dconnexion.com/, January 2008
[10] Warren, J. D., et al.: Separating pitch chroma and pitch height in the human brain. www.pnas.org, 100(17):10038-10042, 2003
[11] Krumhansl, C.; Kessler, E.: Tracing the dynamic changes in perceived tonal organization in a spatial representation of musical keys. In: Psychological Review (1982), no. 89, pp. 334-368
[13] Mehnert, Markus; Gatzsche, Gabriel; Brandenburg, Karlheinz; Arndt, Daniel: Circular pitch space based harmonic change detection. In: 124th AES Convention (2008)
Figure 2: The left panel shows a screenshot of the Java external embedded in the Max/MSP environment. The user is given access to various parameters that control the algorithm. The pitch track and amplitude of the signal are displayed here. The right panel shows the display of the posterior distribution over the raag targets and the pitch class distribution.
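The posterior over raag targets shown in the right panel can be sketched as a simple probabilistic update over pitch-class distributions. This is a reconstruction of the general idea only; the two toy raag profiles, the uniform prior and the independence assumption are mine, not the authors' actual model.

```python
# Sketch (my reconstruction, not the authors' code): accumulate the
# observed pitch classes into a pitch-class distribution (PCD) and
# score each candidate raag profile by log-likelihood, normalising
# the result to a posterior.
import math
from collections import Counter

PROFILES = {                      # hypothetical per-raag pitch-class profiles
    "desh":  {0: .25, 2: .2, 4: .15, 5: .1, 7: .2, 11: .1},
    "yaman": {0: .2, 2: .15, 4: .2, 6: .15, 7: .15, 9: .1, 11: .05},
}

def posterior(observed_pcs):
    """Posterior over raags given observed pitch classes (uniform
    prior, small floor probability for pitch classes absent from a
    profile, stabilised in log space)."""
    counts = Counter(observed_pcs)
    logs = {raag: sum(n * math.log(prof.get(pc, 1e-3))
                      for pc, n in counts.items())
            for raag, prof in PROFILES.items()}
    m = max(logs.values())
    unnorm = {r: math.exp(l - m) for r, l in logs.items()}
    z = sum(unnorm.values())
    return {r: v / z for r, v in unnorm.items()}

# A phrase leaning on the natural fourth (pitch class 5) favours the
# "desh" profile over "yaman", whose profile has the sharp fourth (6):
print(posterior([0, 2, 4, 5, 7, 5, 0]))
```

As more pitches arrive, the counts sharpen the log-likelihoods, which mirrors the behaviour described below where initial ambiguities resolve as material accumulates.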
The system correctly identified raag Desh as performed by the author in less than fifteen seconds, and the posterior became locked within thirty. There was some slight confusion with Jaijaiwante, which is similar in both scale and phraseology. In the first few seconds, when the performer had played only two or three notes, with a prominent major seventh scale degree, raag Yaman had the highest posterior; however, with the introduction of more material, the estimate quickly shifted to Desh. As a special case we were interested to see whether the introduction of raag Jaijaiwante's signature phrase, which includes the minor third scale degree not present in raag Desh, would be immediately detected. In fact, the change in the PCD brought on by this phrase did indeed produce a nearly immediate shift.

Raag Kedar (also performed live) provided a challenge: it is a raag that is primarily distinguished by its zig-zagging phrases, and has a major scale similar to several other raags. Not surprisingly, the system took longer to converge, and for the first two to three minutes Kedar was confused with Maru Bihag, Jaijaiwante, and Darbari. The first two were unsurprising due to their similar scales; however, Darbari contains a minor third and minor sixth, neither of which is present in Kedar. We observed that this confusion occurred when the fifth scale degree was the most prominent, an example of the dominant note effect.

Raag Darbari, as sung by Amit Mukherjee, provided an interesting example of tetrachordal ambiguity being resolved. Often a performer will focus on half of the scale for a period of time. In this case, the singer lingered on the upper four notes of the scale for nearly forty seconds before including the other notes. During this time, Malkauns had by far the highest posterior, an unsurprising confusion given the two raags' similarity in the upper tetrachord. However, the system immediately and unambiguously switched to Darbari upon hearing notes from the lower tetrachord.

It should be noted that in some cases the system was unable to converge to the correct raag, or only briefly touched upon it before converging elsewhere. For example, Bhimpalasi was confused with Darbari and Bageshri, and at times with Malkauns, and was never correctly identified.

4. CONCLUSIONS
We have demonstrated that real-time raag recognition is possible in realistic performance situations with minimal adjustments needed for different performers. In terms of accuracy we noted that as additional information is collected, initial ambiguities are often resolved, leading to correct classification. In some cases, however, as the performer begins to focus on certain phrases that overlap with nearby raags, the estimate begins to fluctuate between different possibilities. In the latter case, this error is in part attributable to our feature choice, which does not include any sequence information. This makes the system biased towards simply using the most commonly occurring pitches for the classification decisions, without consideration of the melodic context. This suggests that if sequence were taken into account, such as by counting pitch dyads, we would be able to more clearly disambiguate raags in the same neighborhood. This is consistent with what we found earlier in our non-real-time experiments, where pitch dyads were also used as features.

Another point that becomes clear is that the algorithm's lack of a note model makes it very difficult for it to estimate the perceptual salience of a pitch. For example, for a human listener, the fleeting introduction of a new note can dramatically change the tonal landscape even though its effect on the PCD is minimal, perhaps indistinguishable from noise. Another example is the use of gliding approaches to notes, which are very common in Indian music. For our system, such glides introduce energy at all the frequency bands between the starting and ending points, and for a plucked instrument that decays quickly, may emphasize the starting point. This leads to a noisier PCD estimate. For a human, however, the opposite is often true: glissandi serve to emphasize the target note. In both these cases, the crucial
difference is that the system has no concept of what constitutes a note. Humans are able to integrate many different types of information to resolve notes from complex time-varying pitch tracks, from the basic perception of vibrato as a modulation of a central frequency to understanding the importance of a pitch within the musical structure.

These insights were suggested concretely by our ability to visualize the inner workings of the algorithm, both the data presented to it and its hypotheses, while simultaneously listening.

We have created a framework for high-level musical interaction for Indian electroacoustic music. We envision a number of extensions to this system to allow it to be used effectively in a performance setting. Aside from tracking more features to yield different and more robust classification estimates, the momentary maximum posterior can be used, for example, to introduce phrases from the same or related raags. The time-varying posteriors can also be treated as a modulating control signal that tracks a very high-level aspect of the music. The time-varying PCDs can be used in a similar fashion. These signals can be used to generate tonally relevant material.

5. FUTURE WORK
Improving the pitch estimation algorithm would likely yield better results. The primary difficulties faced are tracking pitches that vary rapidly over time, which is typical of Indian music, eliminating pitches that are not part of the main melodic line, and source separation from accompaniment and resonating strings.

Future systems will attempt to use simple sequential information such as pitch-class dyad distributions as well, which would require segmenting the signal into notes using an onset detector. This is particularly difficult in Indian music, where notes do not often correspond to clear onsets in the time domain. If solved, this would partially address the lack of a note model referred to above, and would likely lead to a substantial increase in accuracy. However, creating a perceptual note model remains a fundamental and interesting problem.

6. ACKNOWLEDGMENTS
The authors would like to acknowledge the following musicians who made substantial contributions to the raag database: Prattyush Banerjee (sarod), Nayan Ghosh (sitar), Amit Mukherjee (vocal), Sugato Nag (sitar), Falguni Mitra (vocal), Manilal Nag (sitar).

7. REFERENCES
[1] V. Bhatkande. Hindusthani Sangeet Paddhati. Sangeet Karyalaya, 1934.
[2] P. Chordia. Automatic raag classification of pitch-tracked performances using pitch-class and pitch-class dyad distributions. In Proceedings of the International Computer Music Conference, 2006.
[3] P. Chordia and A. Rae. Raag recognition using pitch-class and pitch-class dyad distributions. In Proceedings of the International Conference on Music Information Retrieval, 2007.
[4] A. de Cheveigne and H. Kawahara. YIN, a fundamental frequency estimator for speech and music. Journal of the Acoustical Society of America, 111(4):1917-1930, 2002.
[5] D. Huron. Sweet Anticipation: Music and the Psychology of Expectation. MIT Press, 2006.
[6] H. Sahasrabuddhe and R. Upadhy. On the computational model of raag music of India. In Proc. Indian Music and Computers: Can Mindware and Software Meet?, 1994.
[7] X. Sun. A pitch determination algorithm based on subharmonic-to-harmonic ratio. In Proc. of the International Conference of Speech and Language Processing, 2000.
[8] I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2005.
Anders Vinjar
Institute for Musicology
University of Oslo
Oslo, Norway
andersvi@extern.uio.no
ABSTRACT
A general CAC (computer-aided composition) environment charged with physical-modelling capabilities is described. It combines Common Music, ODE and Fluxus in a modular way, making a powerful and flexible environment for experimenting with physical models in composition.
Composition in this respect refers to the generation and manipulation of structure, typically on or above a note, phrase or voice level. Compared to efforts in synthesis and performance, little work has gone into applying physical models to composition. The potential in composition applications is presumably large.
The implementation of the physically equipped CAC environment is described in detail.

Figure 1: Intuitive response in physical model

Keywords

… human research and cpu-time — are again focused on problems concerning musical structure.
- Open Source, to help sharing of development work and experiment results with other projects or individuals
- General representation, to facilitate analysis and inter-application work

The application is programmed as a client/server architecture consisting of two parts — Common Music as the client, and Fluxus with ODE built in running as a server. (Common Music can be built using either Scheme or Common Lisp and provides algorithms with built-in functionality for handling musical time; Fluxus is perhaps most used as a real-time performance tool.)

The physically equipped CAC environment is up and running, and has been used to compose musical material for new compositions.

2.3 Virtual mechanical structures
All kinds of realistic and unrealistic virtual mechanical structures are interesting to experiment with in this context. Having access to a general toolkit for rigid-body mechanics makes all possible shapes, structures or sets of structures definable in this environment available for experimentation.
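The core idea of driving compositional parameters from a stepped physical simulation can be sketched as follows. This is a toy illustration only: the environment described here uses ODE rigid bodies read out over OSC and mapped in Common Music, whereas the damped mass-spring model, the Euler integrator and all names below are invented for the sketch.

```python
# Sketch of a physical model driving musical parameters through a
# one-to-many mapping (toy stand-in for the ODE/Common Music chain).

def simulate(steps, dt=0.05, k=4.0, damping=0.3, x0=1.0):
    """Integrate x'' = -k*x - damping*x' with explicit Euler steps,
    returning the (position, velocity) state after each step."""
    x, v, states = x0, 0.0, []
    for _ in range(steps):
        a = -k * x - damping * v
        v += a * dt
        x += v * dt
        states.append((x, v))
    return states

def to_musical(state, base_pitch=60):
    """One-to-many mapping: a single physical state drives both a
    pitch offset (from position) and a dynamic (from speed)."""
    x, v = state
    pitch = base_pitch + round(12 * x)            # position -> pitch
    dynamic = min(127, int(40 + 80 * abs(v)))     # speed -> dynamic
    return pitch, dynamic

# Read the evolving state each step and map it to note parameters:
notes = [to_musical(s) for s in simulate(8)]
```

Because the structure responds to its physical laws rather than to direct parameter edits, the resulting pitch and dynamic curves decay and oscillate coherently, which is the intuitive behavior Figure 1 alludes to.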
Figure 5: Virtual hand behaving in a virtual world

The interactive nature of the application suggests starting off with simple structures, observing the results, modifying, and observing how the musical output changes. Systems that respond realistically to input according to physical laws simplify learning their behavior. An example of one such approach used for experimentation is a model of a hand in which attributes such as the length of the fingers or gravity are modulatable. A certain point on the hand is subjected to externally controlled forces in three dimensions, resulting in the whole structure responding in a physically coherent way.

As part of this project, virtual structures consisting of mechanical bodies with varying shape, mass and surface qualities, connected by links of various types and qualities — e.g. "balljoint", "hingejoint", "fixedjoint", "sliderjoint" — are constructed and set to interact in virtual physical worlds with arbitrary values for physical properties such as gravity or friction. All physical parameters in the system — position, speed, forces, collisions between objects, angles of joints etc. — can be read at any time, notified as OSC messages and mapped to musical parameters.

Different structures and modes of interaction may be saved and recalled as presets, and behavior over time may be recorded and played back.

2.4 Mapping
Special mapping layers connecting streams of data from the physical world to musical parameters, and fitting their ranges onto eligible scales, are programmed as classes and methods in the CM environment.

The way data streams from the physical objects are used to control the evolution of musical parameters defines the resulting music. These choices are made to suit the actual compositional problem at hand.

The enhancements provided by a physical black box included before the mapping stage concern how parameters evolve over time and the way simple inputs are used to control complex sets of parameters through intuitive one-to-many mappings.

2.5 Musical parameters
The system controls parameters on various levels, as illustrated on the right-hand side of figure 3. Examples of interesting compositional parameters range from low-level ones — pitch, onset, register, dynamic — to high-level attributes such as phrasing, ambitus, texture, redundancy.

4. REFERENCES
[1] C. Agon, G. Assayag, and J. Bresson, editors. The OM Composer's Book, volume 1. Delatour France/Ircam-Centre Pompidou, 2006.
[2] T. Anders, C. Anagnostopoulou, and M. Alcorn. Strasheela: Design and Usage of a Music Composition Environment Based on the Oz Programming Model. In Multiparadigm Programming in Mozart/OZ, volume 3389. Springer Berlin/Heidelberg, 2005.
[3] C. Cadoz. The Physical Model as Metaphor for Musical Creation. pico..TERA, a Piece Entirely Generated by a Physical Model. In Proceedings of the 2002 International Computer Music Conference, 2002.
[4] N. Castagne and C. Cadoz. Creating music by means of 'physical thinking': the musician oriented Genesis environment. In Proc. of the 5th Int. Conference on Digital Audio Effects, DAFX-02, 2002.
[5] S. Gibet, N. Courty, and J.-F. Kamp, editors. Gesture in Human-Computer Interaction and Simulation, 6th International Gesture Workshop, GW 2005, Revised Selected Papers, volume 3881 of LNAI. Springer, Berder Island, France, May 2006.
[6] D. Griffiths. Fluxus. http://www.pawfal.org/fluxus.
[7] C. Henry. pmpd: physical modelling for Pure Data, 2004.
[8] K. Karplus and A. Strong. Digital synthesis of plucked-string and drum timbres. Computer Music Journal, 7(2):43-55, 1983.
[9] M. Laurson. PWConstraints. X Colloquio di Informatica Musicale, X:332-335, 1993.
[10] F. L. Lezcano. dlocsig. http://ccrma.stanford.edu/ nando/clm/dlocsig/.
[11] Open Dynamics Engine. http://www.ode.org.
[12] R. Parncutt. Modeling piano performance: physics and cognition of a virtual pianist. In ICMC Proceedings, pages 15-18, 1997.
[13] SuperCollider. http://supercollider.sourceforge.net/.
[14] R. Taube. Common Music. http://commonmusic.sourceforge.net.
[15] H. Vinet. Recent research and development at IRCAM. Computer Music Journal, 23(3):9-17, 1993.
[16] A. Vinjar. Oppspent line. MIC recording, 1994. Musical composition.
[17] M. J. Willis and M. T. Tham. Advanced process control, April 1994. Web document.
ABSTRACT
The Color of Waiting is an interactive theater work with music, dance, and video which was developed at STEIM in Amsterdam and further refined at CMMAS in Morelia, Mexico, with funding from Meet the Composer. Using Max/MSP/Jitter, a cellist is able to control sound and video during the performance while performing a structured improvisation in response to the dancer's movement. In order to ensure repeated performances of The Color of Waiting, Kinesthetech Sense created the score contained in this paper. Performance is essential to the practice of time-based art as a living form, but has been complicated by the unique challenges in interpretation and re-creation posed by works incorporating technology. Creating a detailed score is one of the ways artists working with technology can combat obsolescence.

Keywords
Interactivity, Dance, Max/MSP/Jitter, Sustainability

1. INTRODUCTION
This score, with descriptions of the electronic sounds, video compositing, choreography, cello tracking, lighting, costumes and stage diagram, will enable performance of this work well into the future. A DVD with the Max/MSP/Jitter patch saved as text, the sound and video files used in the performance, and video clips from various performances is included with the score. By including screen shots of relevant sections of the Max patch in the score, we show which part of the interaction is most important for the artistic success of the work.

2. The Score
The following figures show full pages of the score. The entirety of the introductory materials (figures 1-3) is included, while due to space restrictions only excerpts from the timeline of the performance are included (figures 4-6). The introduction serves to document all elements of the piece including the set, the lighting and the costumes. The timeline contains sketches of the choreography, performance instructions for the cellist and dancer, musical notation for the cellist including light and dance cues, and stills from the video showing brightness,

3. Philosophy
A live event engages the audience in a unique way, where each member contributes to shaping the event, actively participating in its realization. In this way, the artistic experience becomes a dialectical one. Importantly, the live event implicates its audience both as individuals and as a collective. As critic Nicolas Bourriaud notes, "Each particular artwork is a proposal to live in a shared world…intersubjectivity…becomes the quintessence of artistic practice." [1] In performance, audience members engage with the artists and their creations in a collective elaboration of meaning. This component of a communal development of meaning is an essential aspect of the artistic experience. At a live event, "there is the possibility of an immediate discussion: I see and perceive, I comment, and I evolve in a unique space and time." [2] With the diminished critical distance comes an increasing emotional involvement where the participant is immersed "in a 360-degree…unity of time and place." [3] The live event is thus a site of encounter and exploration.

By contrast, the viewer takes on a much more passive role when experiencing an event through documentation. Instead of a shared site of artistic communion, the document "refer[s] each individual to his or her space of private consumption." [4] The viewer cannot participate in the communal aspect of a live performance, as the documentary forces him or her to acknowledge his or her current surroundings, separating the individual from the experience while allowing only a glimpse of it. In addition, a documentary of an event lacks the dynamism of meaning one encounters at a live event, as a document is essentially a predigested, one-sided interpretation of a historical circumstance. Documentation, no matter how thorough, is unavoidably biased towards producing a certain interpretation of the event: each image presented is mediated through the critic's lens. Here, the relationship between viewer and image is one of authoritarian promotion and reception. [4] But art exists in time and space, and its reduction to mere document subtracts something essential from it, reducing it to an object that exists within the confined parameters of the viewer's screen. Bourriaud argues that artistic form can only be realized "from a meeting between two levels of reality. For [the homogeneity of a document] does not produce [art]: it produces only the visual, otherwise put, 'looped information.'" [5] Our score, including the DVD, is not a documentation of a performance, nor is it a document to be used in performance; rather, it is a document to ensure repeated performances.

4. References
[1] Bourriaud, Nicolas. 2002. Relational Aesthetics. Transl. Simon Pleasance and Fronza Woods. Paris: Les presses du reel, 22.
and placement of elements. [2] Ibid., 16.
[3] Popper, Frank. 2007. From Technological to Virtual Art.
Cambridge, MA: The MIT Press, 181.
[4] Bourriaud, 22.
[5] Ibid., 24.
[6] Ibid., 24.
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Demos

The following pages include the contributions that have been accepted as demos. The demo program also includes nine further demos associated with papers and posters.
ABSTRACT
We developed a rhythmic-instrument ensemble simulator that generates animation using game controllers. The motion of a player is transformed into MIDI musical expression data to generate sounds, and the MIDI data are transformed into animation control parameters to generate movies. These animations and the music are shown as a reflection of the player's performance. Multiple players can perform as an ensemble to make more varied patterns of animation. Our system is so easy that everyone can enjoy performing a fusion of music and animation.

Keywords
Wii Remote, Wireless game controller, MIDI, Max/MSP, Flash movie, Gesture music and animation.

1. INTRODUCTION
Many people desire to perform music with instruments. However, playing a musical instrument requires some degree of training, and as a result it is difficult to acquire the skill to perform well.

Recently, as computer technology has advanced, numerous music video games have been developed as simulators of musical instrument performance, for example Konami's Beatmania or Guitarfreaks. In those games, however, players are passive: they only push buttons on controllers according to preloaded music. They cannot perform their own music. The video monitor shows only the performance data and provided images.

In this paper, we propose a musical instrument performance simulating system that generates animation. Players use a normal wireless game controller, the Wii Remote, developed by Nintendo using Bluetooth technology. They can play rhythmic instruments by operating the controller. The players' actions are reflected in sound data such as velocity and timing. Additionally, animation movies are generated based on the sound data. If multiple players perform an ensemble, each player generates his own movie and influences the images of the others. They can enjoy a performance not only as music for the ears, but also as animation for the eyes.

2. METHOD

2.1 Concept
We use rhythmic instruments for the musical ensemble, including drum sets and percussion from various genres such as pop, latin, ethnic, and techno. They have no clear melody, and it is easier to make a sound with a comparably simple action, like hitting, than with other melodic instruments. The performing action is directly related to the generated sound expression. Even beginners or children can play them in a demonstration.

We use a Wii Remote as the wireless controller via Bluetooth, which is easily connected to a computer. In past reports, Bluetooth controllers have often been used for musical performance [1][2]. The Wii Remote is an obtainable device on the market at a reasonable price and has useful controls. A wireless device makes it possible to construct an unfettered environment for the music performance, and players can perform free from bothersome wires. The Wii Remote has three-axis acceleration sensors, so various physical motions can be detected, such as shake, hit, slide, turn, and twist. Rhythmic performance is highly related to these handy actions, which means players directly make rhythmic sounds by handling it.

Our system generates animation movies synchronized in real time with the music performed by players. The visual aspects of playing music are as important as the sound itself. Players enjoy performing music more with animation, which can be controlled and makes various patterns synchronized with musical expression such as tone, velocity, and tempo. If multiple players perform an ensemble, the animation generated by each player interacts with the others, and as a result the variations of animation are increased. The interaction of movies is interesting for all players in the ensemble, because unexpected motion graphics are generated. In our system, a maximum of four players can perform together at a time.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, June 4-8, 2008, Genova, Italy
Copyright remains with the author(s).

Figure 1. System configuration (Wii Remotes connect over Bluetooth to a PC running Max/MSP and Flash via flashserver; motion is transmitted and converted to MIDI, and MIDI to movie; speakers play the music while a display shows the animation).
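The motion-to-MIDI step described above can be sketched as follows: a hit is detected when the total acceleration crosses a threshold, and the peak magnitude is scaled to a MIDI note-on velocity. This is a minimal illustration only; the function name and the threshold/saturation values are assumptions, not taken from the authors' Max/MSP patch.

```python
import math

HIT_THRESHOLD = 1.5   # g; assumed trigger level for a "hit" motion
MAX_ACCEL = 3.0       # g; assumed saturation level of the sensor

def accel_to_velocity(ax, ay, az):
    """Return a MIDI velocity (1-127) for one accelerometer sample,
    or None if the motion is below the hit threshold."""
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude < HIT_THRESHOLD:
        return None
    # Linear scaling between threshold and saturation.
    scaled = (magnitude - HIT_THRESHOLD) / (MAX_ACCEL - HIT_THRESHOLD)
    return min(127, max(1, int(scaled * 126) + 1))
```

A shake of the remote would thus produce louder notes than a gentle tap, mirroring how the players' actions are "reflected in sound data such as velocity and timing."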
2.2 Grip Sensor
A cylindrical pressure sensor made of a 5-layer "sandwich" of conductive materials replaces the usual grip. Changes in resistance occur in relationship to the pressure and total surface area of the musician's grip. The sensor output is fed to a 12-bit ADC before it is transmitted to the host. Repeatability and return to zero are very reliable.

2.3 Bow Hair Tension
Many have tried to measure the pressure of the bow hair on the strings by measuring flexing [1] of the stick. While this provides a useable signal, it is inherently prone to damage from exposure to outside forces. In the K-bow a special angular measurement scheme is attached to the bow hair at the frog end. After bringing the bow up to tension, and upon power up, the sensor is auto-calibrated.

Directly beneath the bow hair sensor on the front of the frog is an IR photo detector. The detector receives a modulated signal from an array of LEDs that emerge from beneath the instrument's fingerboard. Decoded signals from the IR detector represent the distance of the frog from the fingerboard. These are processed in the analog domain before being presented to the 12-bit ADC.

2.6 The board within the frog
Housing all of the circuitry in a frog not much larger than a traditional bass frog was challenging. The board itself forms the major structural element fastening the hair to the stick through a frog adaptor. The frog is fully adjustable, providing the normal range of hair tension. The circuitry includes 20 op-amps, two CPUs (an ARM7 and a Silicon Laboratories F411), the Bluetooth transceiver, an accelerometer, and extensive power management systems.
A 6-gram lithium polymer battery provides a full day's use. The battery is charged through a standard USB connector, which also provides for firmware updates.

Using different frog adaptors and hair mounting brackets allows the same circuitry and housing to be used for violin, viola, cello, and bass bows.

Monitoring the accelerometer's activity allows the CPU to determine bow activity and power down unused circuits to conserve power. A user-settable "Off Interval" turns the entire bow off when this time is exceeded due to lack of bow motion. Toggling the power switch on the bow allows the bow to be automatically discovered and routed to its previous application address. These states are forwarded to the Emitter under the fingerboard so it can follow similar power management rules.

Connectivity is via Bluetooth 1.2 Class 2 devices. Normal line-of-sight range is greater than 10 meters. Data rates can be updated as fast as 1.6 ms for a single bow. Up to seven bows can be supported by one host computer.

3. Host Software
A host software program accompanies the bow. Written in Max/MSP, the Host application can be easily modified for extended functionality beyond that provided. This program provides user-settable sensitivity options and a calibration routine. Triggers are extracted from inflection points in bow data. Data smoothing and sensor blending provide fluidly useful data for continuous control functions.

Programmable signal processing for violin audio provides a wide range of timbres for user selection. A four-track "Looper" is integrated into the application with controls tightly coupled to the bow's capabilities.

Included in the application is a 2D OCR MXJ object. This can be trained from any 2 outputs from the bow. One use is to map X and Y position into a trained object that recognizes letters written in the air for control of recording functions or preset selection.

A custom Bluetooth object for the bow interfaces the RFCOMM layer directly to Max/MSP. Bows can be named for easy recognition when presented in device lists by the Host OS.

4. ACKNOWLEDGMENTS
Our thanks to Ashley Adams, Don Buchla, Dawson Bauman, Chuck Carlson, Joel Davel, Jeff Van Fossen, David Hishinuma, Marriele Jakobsons, Dan Maloney, Denis Saputelli and Barry Threw for their contributions to the project.

5. REFERENCES
[1] Young, Diana S. New Frontiers of Expression Through Real-Time Dynamics Measurement of Violin Bows. Masters Thesis, MIT, September 2001.
[2] Rasamimanana, Nicolas H. Gesture Analysis of Bow Strokes Using an Augmented Violin. Technical Report, IRCAM, June 2004.
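The "data smoothing" used for continuous control can be illustrated with a simple exponential moving average over the raw 12-bit sensor stream. This is a sketch under assumed parameters (the function name and alpha value are illustrative), not the K-bow host software's actual implementation.

```python
def smooth(samples, alpha=0.2):
    """Exponentially smooth a sequence of raw sensor readings.

    alpha close to 1 tracks the raw data closely; alpha close to 0
    yields a heavily smoothed, slowly responding signal.
    """
    out, state = [], None
    for s in samples:
        # Seed the filter with the first sample, then blend.
        state = s if state is None else alpha * s + (1 - alpha) * state
        out.append(state)
    return out
```

Smoothed streams like this can then be blended across sensors (e.g. grip pressure and hair deflection) before being mapped to continuous control functions.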
Plink Jet

Lesley Flanigan
Interactive Telecommunications Program
New York University
721 Broadway 4/F
New York, NY, USA
+1 212 998 1894
lesleyflanigan@gmail.com

Andrew Doro
Interactive Telecommunications Program
New York University
721 Broadway 4/F
New York, NY, USA
+1 212 998 1894
andy@sheepish.org
2. RELATED WORKS
Various works using umbrellas have been presented. Amagatana [3] is an umbrella-shaped portable device: by swinging it, the sounds of a sword are generated from the headphones. Rain Dance [4] is an installation using umbrellas. When a user holding up an umbrella passes through the
shower, which carries the audio vibration, the shower is cut off by the umbrella and a sound is generated.

3. IMPLEMENTATION

3.1 Technique for Generating Sound
The system presented in this work is based on an original sound-generating system (Figure 2). Oto-Shigure is equipped with vibration motors instead of speakers. The vibration motors are attached to each of the four tips of the umbrella ribs, and they generate sounds by vibrating the whole umbrella cloth. The audio signal from the line input is amplified 100 to 200 times by an operational amplifier (LM386) (Figure 3). The amplified audio signal is transmitted to the vibration motors and resonates the ribs and cloth along with the surrounding air.

A sound-generating device such as a conventional speaker produces a point sound source, while our sound-generating system produces an almost planar sound source. That is due to the umbrella cloth, which vibrates as a plane and makes the air resonate. This allows non-localized sound and gives the user under the umbrella a quite new experience, like bathing in the sound of rain. Additionally, while umbrellas are generally made of cloth, vinyl, and metal, we used a traditional Japanese umbrella made of Japanese paper and bamboo. Owing to these materials' high sound transmission, superior sound characteristics and high sound volume were realized.

Figure 2. Two Systems of Oto-Shigure

3.2 3D Sound
To generate 3D sounds, the audio signal is processed by computer software. The processed signal goes through an external audio interface and is output from Oto-Shigure. Specifically, the software splits the input audio signal into four equal signals and controls the volume and phase of each signal based on the assigned point. This software is developed in Max/MSP.

Through this processing, Oto-Shigure can produce various musical expressions: locating the sound on the surface of the umbrella and generating surround sound like a 5.1ch surround system. Also, an original interface running on an iPod touch enables the user to control the sound area intuitively in real time without complicated manipulation. For example, when the user draws a circle clockwise on the iPod touch, the sound and its effects are heard rotating clockwise above his head.

Figure 3. Printed circuit board of amplifier

4. CONCLUSION AND FUTURE WORK
In this paper, we explained a new sound-generating interface that enables the user to develop various musical expressions. There are two ways to generate musical expressions with Oto-Shigure. One is generating non-localized and airy sounds without any cords; the user can feel as if the sounds of falling rain are encompassing their whole body. The second is generating a 3D sound space under the umbrella by connecting to a PC and controlling the interface built into the iPod touch. This is a novel sound output system that provides two quite different kinds of sound. For future work, we will create interactive content that can be used among multiple users.

5. REFERENCES
[1] Takahashi, Bog: Instrumental Aliens. In Proc. of the 2007 Conference on New Interfaces for Musical Expression (NIME2007), 429, 2007.
[2] André Knörig, Boris Müller, Rato Wettach: Articulated Paint: Musical Expression for Non-Musicians. In Proc. of the 2007 Conference on New Interfaces for Musical Expression (NIME2007), 384-385, 2007.
[3] Yuichiro Katsumoto, Masa Inakage: "Amagatana". Ars Electronica 2007 Pixelspace, Linz, Austria, 5-11 September, 2007.
[4] Paul De Marinis: Rain Dance. http://www.well.com/~demarini/exhibitions.htm
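The four-way split with per-channel volume control can be sketched as a simple distance-weighted gain law: the input block is copied to four channels whose gains fall off with distance from the assigned point. The motor positions, gain law, and function name here are assumptions for illustration; the actual processing is done in the authors' Max/MSP patch.

```python
import math

# Assumed positions of the four vibration motors at the rib tips,
# placed on a unit circle (x, y).
MOTORS = [(1, 0), (0, 1), (-1, 0), (0, -1)]

def channel_gains(px, py):
    """Return four gains for a sound assigned to point (px, py),
    normalised so the gains sum to 1."""
    weights = []
    for mx, my in MOTORS:
        d = math.hypot(px - mx, py - my)
        weights.append(1.0 / (1.0 + d))  # closer motor -> larger weight
    total = sum(weights)
    return [w / total for w in weights]
```

Sweeping the assigned point in a circle (as with the iPod touch gesture) would then rotate the dominant gain around the four motors, moving the perceived sound around the umbrella surface.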
ABSTRACT
In recent years video-based analysis of human motion has gained increased interest, which for a large part is due to the ongoing rapid developments of computer and camera hardware, such as increased CPU power, fast and modular interfaces and high quality image digitisation. A similarly important role is played by the development of powerful approaches for the analysis of visual data from video sources. In computer music this development is reflected by a row of applications approaching the analysis of video and image data for gestural control of music and sound, such as Eyesweb, Jitter, CV ([1], [2], [3]). Recognition and interpretation of hand movements is of great interest both in the areas of music and software engineering ([4], [5], [6]). In this demo an approach is presented for the control of music and sound parameters through hand gestures, which are recognised by an artificial neural network (ANN). The recognition network was trained with appearance-based features extracted from image sequences of a video camera.

For this, the motion of a gesture is grouped into at least two main states performed in a repetitive way. The whole gesture may then be seen as a progression through a cyclic state model, with the aim of viewing the gesture not as an isolated event but in the gesture context and related motions.

2. VARIATION OF GESTURE INSTANCES
Each gesture was recorded at 3 lower positions and 2 upper positions of the gestural space of the hand and arm to obtain data reflecting the variance of the hand articulation at differing locations. All 5 recording instances were aimed to be in a plane parallel to the front of the camera. Blended hand positions for the static states of four cyclic gesture types are shown in Figure 1 to Figure 4.
example. In our approach each output unit was associated with a certain state of the training patterns for the network.

4. GESTURAL CONTROL OF SOUND
The system realises the control of a sound generation process by using discrete bindings of gestures to sound parameters. The system uses gestures of the left hand, which are recognised by the video analysis. Two identical sound generation processes for live sampling and sound modification are realised in a Max/MSP patch. The position space of the hand is divided through a dedicated object (Gitter) into 9 concentric fields (Figure 5).

Between the aleatoric and the strict approach, the binding of the body gestures to the musical actions has to be considered thoroughly. A similar situation may be found for the required number of recognisable gestures, which differs between the musical intention and the role it assigns the gesture recognition. Two or three gestures may be enough to play a central role in a piece. For complex control a larger number of gestures is required, e.g. more than the 16 gesture states of the hand used for the training of the neural network of the demo system.
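The division of the hand's position space into 9 concentric fields can be sketched as a mapping from a normalised position to a ring index by distance from the centre. The equal ring spacing and the function name are assumptions for illustration, not a description of the Gitter object's actual behaviour.

```python
import math

def field_index(x, y, n_fields=9):
    """Map a normalised hand position (x, y in [-1, 1], centre at 0, 0)
    to a concentric field index 0 (innermost) .. n_fields - 1 (outermost)."""
    # Normalise the radius so the corner of the square space maps to 1.
    r = min(1.0, math.hypot(x, y) / math.sqrt(2))
    return min(n_fields - 1, int(r * n_fields))
```

The resulting index could then select which sound-parameter binding an otherwise identical gesture triggers, giving position-dependent control on top of gesture recognition.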
ABSTRACT
This research aims to develop a novel instrument for socio-musical interaction where a number of participants can produce sounds with their feet in collaboration with each other. The developed instrument, beacon, is regarded as an embodied sound media product that provides an interactive environment around it. The beacon produces laser beams lying on the ground and rotating. Audio sounds are then produced when the beams pass an individual performer's foot. As the performers are able to control the pitch and sound length according to the foot location and angles facing the instrument, the performer's body motion and foot behavior can be translated into sound and music in an intuitive manner.

Keywords
Embodied sound media, Hyper-instrument, Laser beams

1. INTRODUCTION
Many sound installations with electroacoustic techniques have been presented so far. However, those systems usually require a large space or complicated instruments. There are very few compact interfaces for enjoying music with other performers or an audience, like conventional musical instruments. In this paper, we introduce a portable instrument called beacon for socio-musical interaction. A number of line laser modules are installed, and the laser beams are produced and rotated around the instrument. The beam performs like a moving string because sounds are generated every time the beam lying on the ground passes the performers' feet. A real-time motion capture technique and pattern recognition of users' feet are used in order to create a new style of musical interaction. Therefore, this instrument provides an embodied sound media environment [1] where everyone can readily enjoy playing the sounds without scores and can also interact with others through collaborative musical experiences. In the interactive environment around beacon, people can communicate with each other by means of sound, music, and their own body motion, like dancing or tapping. The promising applications include edutainment, recreation, fitness, rehabilitation, entertainment or sports, and new artistic expression.

2. SYSTEM OVERVIEW
Hardware Configuration: The developed instrument consists of a loudspeaker, a small-size computer, 60 line laser modules, 2 laser range finders, a dial-and-buttons interface, and a battery. All equipment is installed in a cylinder-shaped interface as illustrated in figure 2. This instrument is a kind of small lighthouse sending out line laser beams. The beams are used not only to mark the current location to produce the sound but also to assist musical interaction. In the current implementation, up to 4 laser beams with equiangularly-spaced directions lie on the ground and rotate during the musical performance. The rotation speed of the laser beams can be set from 40 bpm to 100 bpm. At the bottom of the instrument, two laser range-finders are installed and used for distance measurement to the performers, in particular their foot positions and angles, every 100 ms at a height of 1 cm from the ground. The installed range-finder has a 4 m measuring range with 99% range accuracy, and each has a 240-degree angle of view. We used two range-finders in order to obtain an omni-directional distance map every time.

Motion-to-sound Mapping: The performer is regarded as a musical note. beacon generates sounds when the beams pass individual performers, as if the rotating laser beams could detect them. In reality, however, the performers around beacon are detected at all times by the equipped omni-directional laser range-finders. A number of performers, therefore, can participate in a musical session, and individual per-

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008 Copyright remains with the author(s).

Figure 1: beacon - a new interface for socio-musical interaction.
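The beam/foot interaction above can be sketched as an angle-crossing test: the beam's angle advances each tick, and a performer (whose angular position is known from the range-finders) is triggered when a beam sweeps past them. The four equiangular beams match the current implementation; the function names and trigger logic are illustrative assumptions, not beacon's actual software.

```python
def crossed(prev_angle, angle, target):
    """True if a beam moving from prev_angle to angle (degrees,
    increasing, wrapping at 360) swept past target."""
    prev_angle %= 360
    angle %= 360
    target %= 360
    if prev_angle <= angle:
        return prev_angle < target <= angle
    return target > prev_angle or target <= angle  # interval wrapped past 0

def beam_triggers(prev_angle, angle, performer_angles, n_beams=4):
    """Return indices of performers crossed by any of n_beams
    equiangular beams during one rotation step."""
    hits = []
    for i, p in enumerate(performer_angles):
        for b in range(n_beams):
            offset = b * 360.0 / n_beams
            if crossed(prev_angle + offset, angle + offset, p):
                hits.append(i)
                break
    return hits
```

Each trigger could then be turned into a note whose pitch depends on the performer's measured distance and foot angle, as described in the motion-to-sound mapping.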
Acknowledgement
A part of this work is supported by JST CREST "Generation and Control Technology of Human-entrained Embodied Media."

References
[1] K. Suzuki et al., Proc. IEEE, 92(4), pp. 656-671, 2004.

Figure 5: Foot angles with different distances
Keywords
Wireless controller, Pure Data, Gestural interface, Interactive Lights.

1. INTRODUCTION
This paper describes the development of prototype GO, a wireless and wearable controller for sound processing using Pure Data [1]. GO is being developed as part of research into wireless and portable systems for sound processing. Various sensors on the GO board read data from human movements. Output from GO drives, in addition to live sound processing, various light modules corresponding to physical movement. The first stage of development was described in Designing Prototype GO for Sound and Light [2]. Coupling sound and light for live performance has not previously been examined within studies of wearable interactive performances.

Figure 1. Prototype GO.

Figure 2. Volume control.

Figure 3. Main interface in Pure Data.
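The movement-to-volume mapping suggested by Figure 2 can be sketched as clamping and scaling a raw sensor reading to a 0-1 volume for the sound-processing patch. The sensor range values and function name here are assumptions for illustration, not GO's actual firmware or Pure Data patch.

```python
SENSOR_MIN, SENSOR_MAX = 120, 900  # assumed raw flex-sensor range

def sensor_to_volume(raw):
    """Scale a raw sensor value to a volume in [0.0, 1.0]."""
    clamped = max(SENSOR_MIN, min(SENSOR_MAX, raw))
    return (clamped - SENSOR_MIN) / (SENSOR_MAX - SENSOR_MIN)
```

A value like this could be sent to the Pure Data patch as a control message, so bending the sensor further raises the output level smoothly.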
ACKNOWLEDGEMENTS
Many thanks to White Noise ll and New Composers Series at
White Box in New York for commissioning the GO
Karamazov performance for PERFORMA 07 [7]. Many thanks
also to Nordscen Artist in Residency Program for giving me
space to work out the first prototype and light modules for
GO [8].
Figure 4. Light module circle.
REFERENCES
[1] Pure Data, http://puredata.info/
[2] Sjuve, E.S.I. Designing Prototype GO for Sound and
Light. In Proceedings of Pure Data Convention 2007,
(Montréal, Canada, August 21-26, 2007),
http://artengine.ca/~catalogue-pd/
[3] Micro Chip, http://www.microchip.com
[4] Analog Devices, http:// www.analog.com/
[5] Images Scientific Instruments,
http://www.imagesco.com/sensors/flex-sensor.html
[6] Serial communication, RS-232 is emulated by rfcomm, a
Bluetooth protocol.
[7] PERFORMA 07, http://www.performa-arts.org/
[8] Nordscen, Nordic Resort,
http://www.nordscen.org/nordicresort/
Figure 5. Light module centre.
ABSTRACT
The computer games industry has recently been producing
titles in a new genre called ‘music’ that toys with engag-
ing the user in musical expression. The technology used in
these games has allowed for novel interfaces for represent-
ing musical instructions which has yet to be tried within
musical practice and tuition. The games themselves have
greatly simplified the instruments and the music created to
the point where the skills learnt are not transferrable to the
actual instruments that they seek to recreate.
The aim of this work is to explore the potential to move
from this category of entertainment systems based on mu- Figure 1: A front and back view of the projected
sical expression towards tutoring applications that support display
learning musical instruments in an entertaining and reward-
ing experience. Note-Scroller (Figure 1) is an interface that
has been designed to bridge this gap and act as a case study 2) for example, is the first video game franchise to generate
for evaluating the potential of such a movement. It is hoped more than $1bn in revenue and the title Dance Dance Revo-
that Note-Scroller will be fun and intuitive to use, teaching lution (with a similar instructive interface) alone sold more
users how to play music on a piano-style keyboard. than 7.5 million units. Recreating the elements that make
these games successful in educational platforms would be
the next logical step.
Keywords
Graphical Interface, Computer Game, MIDI Display

1. INTRODUCTION
In this paper, we describe an interactive system for providing the user with musical instructions using methods inspired by video games. This system also has the purpose of acting as a learning aid that can evaluate the user's performance in real-time to provide visual feedback. As MIDI files often contain the full range of instruments and instructions used in a musical piece, the option of having the computer play selected instruments means Note-Scroller can also accompany the player.
Previous work in this area includes examples of computer-based musical learning aids [2]. Work by Guillaume Denis and Pierre Jouvelot [3, 4] demonstrates the opportunity to create video games with the purpose of teaching music. The use of visual feedback in musical expression and its implications for mental workload was explored in François et al.'s Mimi tool [5]. Also, games that already go some way towards using musical expression on a more simplified level are proving immensely popular. The Guitar Hero franchise (see Figure

The motivations for incorporating this set of features into one package are varied but ultimately stem from the hope of making music more accessible to a wider audience:
• The display methods used in these popular video games may be more intuitive to non-music-literate users than standard music notation.
• Computer games incorporating modern features have been shown to increase students' motivation [7].
• Adding visual cues to the users' auditory feedback loop may result in an increase in performance.
• As an interface to the vast MIDI content already available on the internet, students will be able to choose musical pieces from an almost limitless source.
By making the performance of music more accessible, it was intended that such a system would remove the typical barriers that deter people from learning to play instruments, such as the cost of tuition, availability and the requirement of reading standard music notation. However, Note-Scroller would also ideally be used in conjunction with other learning methods.

2. DESIGN
Similar to the displays featured in video games such as Guitar Hero, Frets on Fire and Dance Dance Revolution, the instructions flow to the user as and when they need to be executed. As these video games show, providing users with visual cues of what is coming next allows the user to prepare more efficiently for their subsequent actions, utilizing any spare attention [8]. Another benefit to the animated

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME08, Genova, Italy
Copyright 2008. Copyright remains with the author(s).
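The scrolling display and timing feedback described above lend themselves to a compact model. The following sketch is our own illustration, not the Note-Scroller implementation (the constant and function names are hypothetical): each note event carries an onset time and a MIDI pitch, notes drift toward a fixed hit line at x = 0, and a small timing window supplies the real-time evaluation of the player.

```python
# Sketch of a game-style note scroller (a hypothetical model, not the
# actual Note-Scroller code): notes drift toward a hit line at x = 0,
# which marks the moment they should be played.

SCROLL_SPEED = 120.0  # assumed display constant, in pixels per second

def note_positions(events, playback_time):
    """Map (onset_seconds, midi_pitch) events to on-screen x positions.

    x is 0 exactly when the note is due; positive x means the note is
    still approaching the hit line.
    """
    return [(pitch, (onset - playback_time) * SCROLL_SPEED)
            for onset, pitch in events]

def judge(onset, press_time, window=0.1):
    """Real-time visual feedback: was the key pressed within the window?"""
    return abs(press_time - onset) <= window

events = [(1.0, 60), (1.5, 64), (2.0, 67)]  # (seconds, MIDI pitch)
```

Rendering then reduces to redrawing `note_positions(events, t)` on every frame while advancing the playback clock `t`.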
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
and interface. Burtner's musical landscape takes the saxophone well outside the instrument's traditional jazz repertoire boundaries into a space that redefines timbre. Burtner explains his approach as a modification of the keys, situating force-sensitive resistors under the finger-tips to affect "after-touch". He writes: "…In essence, the saxophone keys which normally execute only on/off changes of the air column, are converted to continuous control levers…" [1].

3. GLUISOP
Amongst the saxophonists involved in the project, there remained a strong interest in developing a small, portable extended saxophone. Touring and air freight issues were the main consideration here. But also smaller instruments, well supported by neck-straps, allow for the weight of the instrument to be taken off the thumbs, potentially freeing them to play sensor controls. For these reasons we chose a bent soprano as our first instrument to work with and, in collaboration with the saxophonists, developed a sensor interface consisting of two panels.

The first panel mounted a number of dials, switches and FSRs and was situated on the right hand side of the saxophone's bell.

Figure 2. Gluisop left-hand panel

Two microphones were used to pick up the instrument's sound, one clipped on to the bell and another one over the key-work
to pick up key clacks and other techniques. The microphone signal was digitized by a Digidesign 002 audio interface.

The sensors were digitized using a gluion sensor interface, with sensors cabled [soldered] onto pins, allowing them to be connected directly into the interface housing's high density SUB-D connector. Analogue sensors are sampled at 16-bit resolution and OSC data streamed directly into MaxMSP with refresh rates of up to 1 msec. With the instrument supported by a neck-strap the musicians could play the saxophone's key-work unrestricted and still have at their disposal up to four independent channels of simultaneous sensor control. Situating a joystick at the lower thumb rest extended this further to five.

The microphone range is confined discretely to control the buffer size for rhythmic looping and re-sampling. The two transposable buffer delays have a pitch range of over eight octaves. The mapping was developed as an expressive, intuitive solution for a number of joysticks, wheels and FSRs.

At this stage of the project the majority of trial performance work has been done with the Gluisop instrument, consisting of regular rehearsals over a six-month period. During this time, the instrument was secured to an adjustable stand taking all of the weight off the hands. Another dial controller was added to the left-hand panel just above the joystick. The FSR on this panel was repositioned between the underside of the panel and the left-hand upper thumb rest of the instrument so that any downward pressure applied to the dials or joystick of this panel could be transferred independently as another channel for control.
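The OSC stream from the sensor interface can be illustrated with a minimal codec. This is a sketch only, written against the OSC 1.0 binary layout (null-terminated address and type-tag strings padded to 4-byte boundaries, big-endian float32 arguments); the `/fsr/1` address is a hypothetical example, not the gluion's actual namespace.

```python
import struct

def _pad(s):
    """Null-terminate a string and pad it to a 4-byte boundary (OSC rule)."""
    b = s.encode() + b"\x00"
    return b + b"\x00" * (-len(b) % 4)

def encode_osc(address, *floats):
    """Build an OSC message whose arguments are all float32."""
    typetags = "," + "f" * len(floats)
    return (_pad(address) + _pad(typetags)
            + b"".join(struct.pack(">f", v) for v in floats))

def decode_osc(packet):
    """Decode an OSC message of float32 arguments; returns (address, args)."""
    end = packet.index(b"\x00")
    address = packet[:end].decode()
    off = (end + 1 + 3) & ~3                    # skip padding after address
    tag_end = packet.index(b"\x00", off)
    typetags = packet[off + 1:tag_end].decode() # drop the leading ','
    off = (tag_end + 1 + 3) & ~3
    args = []
    for tag in typetags:
        if tag == "f":
            args.append(struct.unpack(">f", packet[off:off + 4])[0])
            off += 4
    return address, args

# Hypothetical sensor message: one FSR value normalised to 0..1
msg = encode_osc("/fsr/1", 0.5)
```

In practice a patch in MaxMSP would simply bind such addresses with `udpreceive`; the codec above only shows what travels on the wire.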
detailed knowledge of the signal processing techniques involved. Once the mapping was set up the instrument was intuitive to the player. New sounds and techniques have been discovered in each of the following sessions and the development of advanced techniques continues. The mapping and sonic outcomes are also compatible with the Bent Leather Band's existing ensemble language, so the Gluisop has been brought into the group.

Fig. 4 & 5. Gluialto metasax [bent leather band 2008]

This instrument, although not capable of as many simultaneous channels of control as the Leathersop, introduces the main ideas of signal processing expression such as delay time [dial] and feedback [FSR or pressure] control, the use of two-dimensional joystick controllers, and global parameter settings and controls. Therefore it also serves as a training instrument for the more advanced Leathersop sensor interface as well as an instrument in its own right.

5. NEXT STEPS
The next stage of the project involves building the ensemble up and networking interfaces to a single computer. The Bent Leather Band project "Heretics Brew" aims to develop an ensemble of experimental instruments/interfaces for the brass, saxophone, woodwind and guitar families. The project is building momentum and is in the process of staging public performances and recording with Tony Hicks, Dale Chant, Paris Favilla and Melbourne experimental improviser and guitarist Ren Walters.

The project team has also presented the instrument at the University of Adelaide, where experimental saxophonist Derek Pascoe and composer Luke Harrald have been working on live multi-agent performance systems for saxophone. Luke has expressed an interest in writing for the ensemble and it is anticipated that the new instrument projects will be completed in 2008. New mappings and signal processing techniques, including tuning systems and spatial projection control, will also be trialed.

6. REFERENCES
[1] Burtner, M. "The Metasaxophone: Concept, implementation and mapping strategies for a new computer music instrument." Organised Sound, Vol. 7, No. 2, Cambridge: Cambridge University Press, pp. 201-213.
[2] Burtner, M. and Serafin, S. "The Exbow-Metasax." JNMR, July 2002.
[3] Favilla, S., Cannon, J. and Greenwood, G. "Evolution and Embodiment: Playable Instruments for Free Music." In ICMC Free Sound, ICMA, pp. 628-631, 2005.
[4] De Laubier, S. and Goudard, V. "Meta-Instrument 3: a look over 17 years of practice." NIME06, pp. 288-291, 2006.
[5] Kartadinata, S. "The gluion, advantages of an FPGA-based sensor interface." NIME06, pp. 93-96, 2006.
[6] Kientzy, D., Teruggi, D., Risset, J. and Racot, G. Sax-Computer. Audio CD, INA GRM, SNA, 1990.
[7] LeMouton, S., Stroppa, M. and Sluchin, B. "Using the augmented trombone in I will not kiss your f.ing flag." NIME06, pp. 304-307, 2006.
[8] Schiesser, S. and Traube, C. "On making and playing an electronically-augmented saxophone." NIME06, pp. 308-313, 2006.
[9] Wright, M. and Freed, A. "Open Sound Control: A New Protocol for Communicating with Sound Synthesizers." In Proceedings of the International Computer Music Conference, Thessaloniki, Hellas, 1997.
Staas de Jong
LIACS
Leiden University
staas@liacs.nl
The HOP sensors incorporate a 3D accelerometer and a Wireless USB transceiver, which accesses the 2.4 GHz ISM band. A dedicated HUB is connected via USB to the computer and recognized as a virtual COM port. Sampling rates of 100 Hz are possible and a receiver range of 40 m is achieved. The major advantage of the HOP sensors is their size: 55 mm long, 32 mm wide and around 15 mm thick (including connectors). Each sensor is powered by a Li-Po battery, which has the same dimensions as the sensor and provides around 18 hours of operation time. This small design makes it easy to strap the sensor on the legs or arms of people using simple stretchable Velcro.

…platform that can highly contribute to our understanding of music-driven social interaction, using the principles of embodied music cognition [6].

5. ACKNOWLEDGMENTS
Special thanks to Bart Kuyken and Wouter Verstichel and the support of the TFCG Microsystems Lab - IMEC under the guidance of Jan Vanfleteren to design and manufacture the HOP sensors. Also our gratitude goes to Prashant Vaibhav for implementing the WiiSense object in PD and to all the participants in our experiments.
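Reading such a sensor stream from the virtual COM port amounts to framing and unpacking bytes. The frame layout below is purely hypothetical (the actual HOP wire protocol is not documented here): a sync byte, a sensor id, and three little-endian 16-bit accelerometer axes.

```python
import struct

# Hypothetical frame layout (the actual HOP protocol is not given in the
# paper): 0xAA sync byte, sensor id, three little-endian int16 axes.
FRAME = struct.Struct("<BBhhh")

def parse_frames(buf):
    """Extract (sensor_id, x, y, z) tuples from a raw serial byte buffer,
    resynchronising on the sync byte if the stream starts mid-frame."""
    frames, i = [], 0
    while i + FRAME.size <= len(buf):
        if buf[i] != 0xAA:
            i += 1                      # not aligned: slide forward one byte
            continue
        _sync, sid, x, y, z = FRAME.unpack_from(buf, i)
        frames.append((sid, x, y, z))
        i += FRAME.size
    return frames
```

At 100 Hz such an 8-byte frame per sensor stays well under the bandwidth of a virtual COM port, which is consistent with the sampling rate quoted above.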
Performances
Opening Concert
This concert includes four original music pieces emerging from the experience of four
young composers working with interactive technologies, and in particular with the
EyesWeb XMI platform for eXtended Multimodal Interaction (www.eyesweb.org).
EyesWeb XMI supports the design of multimodal interactive systems, the analysis and
processing of expressive full-body movement and gesture, and a large number of further
features. This concert shows on stage concrete results from current research at Casa
Paganini-InfoMus Lab (www.casapaganini.org). The concert is not just held at Casa
Paganini: it fully exploits the whole environment of Casa Paganini as an overall
instrument/interface for musical expression.
In particular, the piece "Lo specchio confuso dell'ombra" by Roberto Girolin addresses the problem of remote communication and social interaction between audiences in different locations: the Foyer and the Auditorium of Casa Paganini. The piece is structured as two separate but communicating installations. One of the main scientific research issues behind this piece, raised and explored during its design and implementation, is how to interact and convey expressive content in a remote networked environment, one of the core issues of CoMeDiA.
The piece "The Bow is bent and drawn" by Nicola Ferrari poses another challenge, this time centered on the SAME ICT EU Project. This piece explores a novel paradigm of "active music listening" developed at Casa Paganini - InfoMus Lab and described in the paper by Camurri et al. in these proceedings. The active music listening paradigm has been elaborated and transformed into a compositional element by the composer.
Biographical information
Roberto Girolin (1975) was born in Pordenone, Italy; after studying classical guitar he began to study piano and composition at the "J. Tomadini" Conservatory in Udine. He studied vocal and instrumental counterpoint, graduating in choral music and conducting at the same Conservatory. He has conducted many choirs and orchestras, exploring repertories ranging from Gregorian chant to contemporary music.
He deepened his study of contemporary music at the University of Udine with Dr. A. Orcalli and then with Dr. N. Venzina at the "B. Maderna" Archive in Bologna (Italy). He has attended several masterclasses and seminars: choral music, chamber music, composition (Salvatore Sciarrino, Fabio Nieder, Mauro Bonifacio), electronic music (Lelio Camilleri, Agostino Di Scipio), a Sound Design course with Trevor Wishart, a course on Audio Digital Signal Processing for Musical Applications (lab workshop, lessons and applications) with Giuseppe Di Giugno, and live electronics in Luigi Nono's works with Alvise Vidolin and André Richard (Experimental Studio für Akustische Kunst, Freiburg).
He graduated with full marks in Electronic Music and Multimedia at the Musical Academy of Pescara (Italy) and in 2006 he also received his degree at the Conservatory of Venice under the direction of Alvise Vidolin with full marks (cum laude).
He is actively involved in performing and in investigating the compositional and performance potential offered by electronic and multimedia music systems. His music is performed in Italy and abroad. He recently won the "Call 2007" (Italian CEMAT competition) and received a Mention at the 34th Concours Internationaux de Musique et d'Art Sonore Electroacoustiques de Bourges, France.
Description of Piece
Lo specchio confuso dall'ombra can be translated as "The mirror confused by its shadow"; it lies between a distributed installation and a concert, in which opposing groups of performers in two remote places play solo or interact.
The audience (two people at a time, one for each installation) activates video and sound transformations depending on the space they occupy and their gestures. The two installations are in the Foyer and in the Auditorium, respectively, so the two persons from the audience cannot see or talk to each other. Multimodal data and expressive gesture cues are extracted in real-time by an EyesWeb patch, interacting and playing with the electronic performer. The interaction occurs both between the electronic performer and the two places where the audience has access, and between the two remote installations. There are two different levels of intervention in the audio and video transformation: autonomous, depending on the single person, and conditioned, depending on the behaviour and the actions occurring in the other, separate installation.
Further, the entrance of the concert hall has microphones which capture words, sentences, coughs, laughs and other noises; these are transformed in real-time and thus enter into the piece.
Lo specchio confuso dall'ombra does not require the audience to remain seated or to follow a specific pattern of behaviour. Its duration is indefinite: it changes every time it is performed.
Acknowledgments
This piece has been commissioned by Casa Paganini – InfoMus Lab, to tackle open problems on networked performance
faced in the EU Culture 2007 Project CoMeDiA.
Dancers: Giovanni Di Cicco (choreography), Luca Alberti, Filippo Bandiera, Nicola Marrapodi
EyesWeb interactive systems design: Paolo Coletta, Barbara Mazzarino, Gualtiero Volpe
Biographical information
Nicola Ferrari was born in 1973. He studied composition with Adriano Guarnieri and took his degree at the 'G. B. Martini' Conservatory in Bologna. He took his Master's Degree and PhD from the Faculty of Arts and Philosophy at the University of Genoa. Since 2005 he has been a member of the staff of the InfoMus Lab. For many years he directed the 'S. Anna' polyphonic choir. He has written scores for theatrical performances.
The bow is a theatrical mise-en-scène of the installation Mappe per Affetti Erranti. During the Science Festival 2007, as preparatory work for the EU ICT Project SAME on active listening (www.sameproject.org), the audience was invited to explore and experience a song by John Dowland (see the paper in these proceedings by Camurri et al). The audience could walk inside the polyphonic texture, listen to the single parts, and change the expressive quality of the musical interpretation through their movement on the stage of Casa Paganini, analysed with EyesWeb XMI. Aesthetically, the most interesting result consists in the game of hiding and revealing a known piece. The idea can be matched with the classical theatrical topos of recognition. Thus the musical potential of the 'interactive performance' of prerecorded music becomes a new dramaturgical structure.
Roberto Tiranti and his madrigal group recorded, under the supervision of Marco Canepa, different anamorphic interpretations of a Bach chorale. Thanks to the interactive application developed with EyesWeb XMI, the group of dancers led by the choreographer Giovanni Di Cicco mix and mould the recorded musical material in real time. At the same time, the live sound of the vocal group explores the whole space of Casa Paganini as a global (both real and imaginary) musical instrument. In a metamorphic game where, according to Corrado Canepa's compositional lesson, electronic and acoustic technologies merge and interchange their specificities, this interactive score of losing and finding, multiplying and distilling the ancient Bachian palimpsest tries to tell the dramatic history of King Lear, the most tragic Western figure of the difficulty of reaching the affects one possesses without being able to know or express them.
Acknowledgments
The music commission is kindly offered by Fondazione Spinola. The scientific and technological developments are partially
supported by the EU FP7 ICT Project SAME (www.sameproject.eu).
Biographical information
Giorgio Klauer studied electronic music, instrumental composition, flute and musicology in Trieste, where he was born in 1976, in Cremona and in Liège. He is a professor at the Conservatory of Como, school of music and sound technologies.
Description of Piece
By placing a distance sensor under the scroll of the instrument and an inclination sensor on the wrist, the displacements of the performer's limbs can be detected. These displacements, drawn onto a Cartesian plane, give the coordinates of a track in an ideal performing space, whose third dimension is formed by the passing of time. The computer makes it possible to assimilate the sounding path proposed by the performer to this track, and hence to rehear it. In the latter case too, the coordinates to access it are given by the current gestures; the dimension of time therefore becomes bundled, somewhat like a parchment palimpsest: the sounding form returned by the computer becomes increasingly dense and inexplicable, and needs an electroacoustic exegesis to unleash it, at least in shreds.
The procedures of musical production are here a metaphor for knowledge; so are the compositional methods at the root of the score which, providing the prescriptions of the musical path, portray in addition a mental track.
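The palimpsest mechanism described above can be sketched in a few lines. This is our own illustration, not the piece's actual patch: gestures are logged as (x, y, t) points, and a current gesture position recalls the time of the nearest stored moment, collapsing the time dimension onto the plane.

```python
import math

# Sketch of the palimpsest idea (our illustration, not the piece's
# actual patch): gestures are logged as (x, y, t); a current position
# on the Cartesian plane recalls the nearest recorded moment.

class GestureTrack:
    def __init__(self):
        self.points = []                # recorded (x, y, t) triples

    def record(self, x, y, t):
        self.points.append((x, y, t))

    def recall(self, x, y):
        """Return the time of the recorded point closest to (x, y)."""
        return min(self.points,
                   key=lambda p: math.hypot(p[0] - x, p[1] - y))[2]
```

Replaying audio from the recalled time is what makes the returned form grow "increasingly dense": each pass layers earlier material under the current gesture.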
Aurora Polare
Alessandro Sartini
Conservatorio Niccolò Paganini
Via Terracini 140/2
16166 Genoa, Italy
+393491291231
sartiniale@libero.it
Alessandro Sartini
Born in Genoa in 1982, he studied piano with Canzio Bucciarelli and is attending the last year of Composition at the Conservatory of Genoa with Riccardo Dapelo, who introduced him to live electronics. His first public performance was at the Auditorium Montale of the Carlo Felice Theatre in Genoa, during the concert commemorating the 50th anniversary of Béla Bartók's death in 1995. From that year on he established a great number of collaborations with various solo musicians, who appreciated his way of accompanying; this led him to work in partnership with a good number of professional soloists. In 1999 he joined the Composition class at the Conservatory of Genoa with Luigi Giachino, who introduced him to film music: this interest led him to win the third prize at the Lavagnino International Film Music Festival in Gavi in 2006 and the first prize at the "Concorso Internazionale di Composizione di Alice Belcolle" in 2007. With Valentina Abrami, he is the founder of the "Associazione Musica in Movimento", which operates at the International School in Genoa.
Aurora Polare
Aurora Polare (Polar Dawn) is a short piece for cymbals, tam-tam, vibraphone, live electronics and the EyesWeb system. The piece was inspired by the smooth movements of waves, the drawings created by polar dawns and the cold weather of polar seas, which is why only metallophones are used.
The first challenge was to let the percussionists elaborate the sound they produce while playing their instruments, and to craft a simple new way to specify every movement. For this reason, under the traditional notation, two special lines follow the score, specifying the direction to move in: up-down and left-right/near-far. A line approaching the top or the bottom of the Y axis indicates the path to track (an example is shown on the left).
All of these movements interact with EyesWeb and Max/MSP through two 30 fps accelerometer bracelets worn by the performers. Every vertical movement controls the volume of the processed sound, while horizontal movements manage a different Max/MSP patch suited to each instrument: a tam-tam sample speed controller (this makes the instrument play without being touched), a harmonizer that makes the cymbals sing just like a Theremin, but with their own processed sound, and the rate of a delay. In the control room a MIDI controller and a computer are used to manage additional live effects and parameters, such as granular synthesis, reverb and multi-slider filters.
Thanks to Martino Sarolli for helping me with Max/MSP, and to Matteo Rabolini and Matteo Bonanni for playing my composition.
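The bracelet mapping just described can be sketched as follows. This is a hypothetical illustration, not the actual EyesWeb/Max patch: the vertical accelerometer axis drives the volume of the processed sound, the horizontal axis a per-instrument effect parameter, and both are normalised to [0, 1].

```python
# Hypothetical mapping sketch (not the actual EyesWeb/Max patch): the
# bracelet's vertical axis drives volume, its horizontal axis one
# per-instrument effect parameter, both clamped to [0, 1].

def clamp01(v):
    return max(0.0, min(1.0, v))

def map_bracelet(vertical_g, horizontal_g, g_range=2.0):
    """Map accelerometer readings (in g, assumed range +/- g_range) to
    normalised (volume, effect) control values."""
    volume = clamp01((vertical_g + g_range) / (2 * g_range))
    effect = clamp01((horizontal_g + g_range) / (2 * g_range))
    return volume, effect
```

At a 30 fps update rate such a mapping would typically be smoothed (e.g. with a one-pole filter) before reaching the audio engine, to avoid zipper noise.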
Pyrogenesis
Pascal Baltazar
GMEA,
4, rue Sainte Claire
F-81000 Albi France
+33 563 545 175
pb@gmea.net / pb@zkrx.org
Biographical information
Pascal Baltazar is a composer and research coordinator at GMEA, National Center for Musical Creation in Albi, France. His research focuses on the spatial and temporal perception of sound, and its relationship to the body and musical gesture. He coordinates the Virage research platform, on control and scripting of novel interfaces for artistic creation and the entertainment industries, funded by the French Research Agency within its Audiovisual and Multimedia program for the 2008-2009 period. He is an active member of the Jamoma collective.
He studied Aesthetics (Master of Philosophy thesis, The sonic image: material and sensation, 2001, Toulouse III, France) and electroacoustic composition at the National Conservatoire of Toulouse. He has since been involved as a composer or interactive designer in diverse artistic projects: concerts, performing arts shows and interactive installations. He has been commissioned for musical works by several institutions, such as the French State, INA-GRM, GMEA, IMEB… and has participated in international festivals (Présences Électroniques, Paris / Radio France Festival, Montpellier / Synthèse, Bourges / Videomedeja, Novi Sad / Space + Place, Berlin…).
Description of Piece
The composition of Pyrogenesis took inspiration from several aspects of blacksmithing, not in a literal way, but as a set of correspondences:
First, the gesture, by which the blacksmith models the matter continuously: striking, heating, twisting, soaking metals to gradually print a form into them.
Then, the tool: just like the blacksmith manufactures his own tools, I work on developing my own electroacoustic instrument: an instrument to write sound, in space and with a gestural input.
Lastly, the organic construction of the form. Gilles Deleuze asks: "Why is the blacksmith a musician? It is not simply because the forging mill makes noise, it is because music and metallurgy are haunted by the same problem: metallurgy puts matter in a state of continuous variation, just as music is haunted by putting sound in a state of continuous variation and by founding in the sound world a continuous development of form and a continuous variation of matter."
From a more technical point of view, the interaction with the performer uses two interfaces: a Wacom tablet and a set of force-resistive sensors (through an analog-to-digital converter), whose common point is that they both allow control by the pressure of the hands, and thus offer a very "physical" mode of control.
The composition/performance environment consists of a set of generative audio modules, fully addressable and presettable, including a mapping engine that allows quick yet powerful mapping strategies from controller inputs and volume envelopes to any parameter, including those of the mappers themselves, allowing a very precise, flexible and evolutive sound/gesture relationship in time.
The composition was realized through a constant dialogue between improvisations along a pre-determined trajectory and subsequent listening to the produced result. Thus, most of the details of the composition were generated by an improvisation and learning-through-repetition process, without any visual support, emphasizing expressivity while keeping a very direct relationship to the musical gesture.
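A mapping engine whose mappers are themselves mappable can be sketched minimally. This is our own illustration of the idea, not the actual composition environment (all names and ranges are hypothetical): a mapper scales an input range to a parameter range, and because its bounds are stored as parameters, one mapper's output can retarget another mapper.

```python
# Sketch of a self-targeting mapping engine (our illustration of the
# idea, not the actual environment): a mapper scales an input range to
# a parameter range, and its bounds are themselves addressable
# parameters, so one mapper can modulate another.

class Mapper:
    def __init__(self, in_lo, in_hi, out_lo, out_hi):
        self.params = {"in_lo": in_lo, "in_hi": in_hi,
                       "out_lo": out_lo, "out_hi": out_hi}

    def __call__(self, value):
        p = self.params
        t = (value - p["in_lo"]) / (p["in_hi"] - p["in_lo"])
        return p["out_lo"] + t * (p["out_hi"] - p["out_lo"])

# Hypothetical routing: FSR pressure drives a gain...
pressure_to_gain = Mapper(0.0, 127.0, 0.0, 1.0)
# ...while the tablet retargets that mapper's own output ceiling.
tablet_to_ceiling = Mapper(0.0, 1.0, 0.2, 1.0)
pressure_to_gain.params["out_hi"] = tablet_to_ceiling(0.5)
```

Making bounds ordinary parameters is what allows the "including those of the mappers themselves" behaviour: the mapping network can reshape itself under gestural control.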
Biographical information
Chikashi Miyama received his BA (2002) and MA (2004) from the Sonology Department, Kunitachi College of Music, Tokyo, Japan, and his Nachdiplom (2007) from the Elektronisches Studio, Musik-Akademie der Stadt Basel, Basel, Switzerland. He is currently attending the State University of New York at Buffalo for his Ph.D. He has studied under T. Rai, C. Lippe, E. Ona and G. F. Haas. His works, especially his interactive multimedia works, have been performed at international festivals such as June in Buffalo 2001 (New York, USA), Mix '02 (Aarhus, Denmark), Musica Viva '03 (Coimbra, Portugal), the Realtime/non-realtime electronic music festival (Basel, Switzerland) and Next Generation '05 (Karlsruhe, Germany), as well as in various cities in Japan. His papers about his works and the realtime visual processing software "DIPS" have been accepted by the ICMC and presented at several SIGMUS conferences. Since 2005, he has been performing as a laptop musician, employing his original sensor devices and involving himself in several media-art activities, such as Dorkbot, Shift-Festival, SPARK and SGMK workshops. His compositions have received an honorable mention in the Residence Prize section of the 30th International Electroacoustic Music Competition Bourges and have been accepted by the International Computer Music Conference in 2004, 2005, 2006 and 2007. Several of his works have been published, including on the Computer Music Journal Vol. 28 DVD by MIT Press and the ICMC 2005 official CD.
Description of Piece
"Keo" is a performance for voice improvisation, Qgo sensor instrument, and live electronics. The author attempts to realize three concepts in the work. The first is "dual-layered control," in which the performer improvises phrases by singing, providing sound materials for a computer; simultaneously, he sends commands to the computer to process the vocals using a pair of sensor devices worn on both hands. The second is the connection between the visuality of the performance and the musical gestures. In most parts of the performance, the movement of the sensor instrument and the musical parameters are clearly connected: if the performer moves his hand even slightly, particular aspects of the sound are influenced in an obvious manner. The third is the strong connection between music and theatricality. In several parts of this work, the body motions of the performer not only control the sensor device, but also carry theatrical meaning. (Photo: Qgo sensor instrument)
Intersecting Lines
Keith Hamel
University of British Columbia
6361 Memorial Rd.
Vancouver, B.C. Canada
1-604-822-6308
hamel@interchange.ubc.ca
François Houle
Vancouver Community College
1155 East Boradway
Vancouver, B.C. Canada
1-604-874-3300
f.houle@telus.net
Aleksandra Dulic
University of British Columbia
6361 Memorial Rd.
Vancouver, B.C. Canada
1-604-822-8990
adulic@interchange.ubc.ca
Biographical information
François Houle has established himself as one of Canada's finest musicians. His performances and recordings transcend the stylistic borders associated with his instrument in all of the diverse musical spheres he embraces: classical, jazz, new music, improvised music, and world music. As an improviser, he has developed a unique language, virtuosic and rich with sonic embellishments and technical extensions. As a soloist and chamber musician, he has actively expanded the clarinet's repertoire by commissioning some of today's leading Canadian and international composers and premiering over one hundred new works. An alumnus of McGill University and Yale University, François has been an artist-in-residence at the Banff Centre for the Arts and the Civitella Ranieri Foundation in Umbria, Italy. Now based in Vancouver, François is a leader in the city's music community and is considered by many to be Canada's leading exponent of the clarinet.
Keith Hamel is a Professor in the School of Music, an Associate Researcher at the Institute for Computing, Information and Cognitive Systems (ICICS), a Researcher at the Media and Graphics Interdisciplinary Centre (MAGIC) and Director of the Computer Music Studio at the University of British Columbia. Keith Hamel has written both acoustic and electroacoustic music and his works have been performed by many of the finest soloists and ensembles both in Canada and abroad. Many of his recent compositions focus on interaction between live performers and computer-controlled electronics.
Aleksandra Dulic is a media artist, theorist and experimental filmmaker working at the intersections of multimedia and live performance, with research foci in computational poetics, interactive animation and cross-cultural media performance. She has received a number of awards for her short animated films. She is active as a new media artist, curator, writer and educator, teaching courses, presenting art projects and publishing papers across North America, Australia, Europe and Asia. She received her Ph.D. from the School of Interactive Art and Technology, Simon Fraser University, in 2006. She is currently a postdoctoral research fellow at the Media and Graphics Interdisciplinary Centre, University of British Columbia, funded by the Social Sciences and Humanities Research Council of Canada (SSHRC).
Description of Piece
Intersecting Lines is a collaboration between clarinetist François Houle, interactive video artist Aleksandra Dulic and computer music
composer Keith Hamel. The work grew out of Dulic's research in visual music and involves mapping a live clarinet improvisation onto
both the visual and audio realms. In this work an intelligent system for visualization and signification is used to develop and expand the
musical material played by the clarinet. This system monitors and interprets various nuances of the musical performance. The clarinetist’s
improvisations, musical intentions, meanings and feelings are enhanced and extended, both visually and aurally, by the computer system,
so that the various textures and gestured played by the performer have corresponding visuals and computer-generated sounds. The
melodic line, as played by the clarinet, is used as the main compositional strategy for visualization. Since the control input is based on a
classical instrument, the strategy is based on calligraphic line drawing using artistic rendering: the computer-generated line is drawn in 3D
space and rendered using expressive painterly and ink drawing styles. The appearance of animated lines and textures portrays a new artistic
expression that transforms a musical gesture onto a visual plane. Kenneth Newby made contributions to the development of the animation
software. This project was made possible with the generous support of the Social Sciences and Humanities Research Council of Canada.
384
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Vistas
Ernesto Romero
Los Platelmintos
Union 139-5, Col. Escandón, México, D.F.
tait_mx@yahoo.com
Esthel Vogrig
Los Platelmintos
Union 139-5, Col. Escandón, México, D.F.
cuki100@hotmail.com
Biographical information
Los Platelmintos are a group of artists, living in Mexico City, who work under the premise of interdisciplinarity and experimentation. Dance,
music and electronic media are fundamental elements in their work. Ernesto Romero: music composition and electronic media. He studied
composition, mathematics and choir conducting in México. Chief of the Audio Department at the National Center for the Arts in México,
where he researches and develops technology applied to the arts. Esthel Vogrig: choreographer and dancer. She studied contemporary dance
and choreography in México, Vienna and the United States. Director of the Los Platelmintos company. Recipient of the "Grant for
Investigation and Production of Art Works and New Media” from the National Council of the Arts and the Multimedia Center in Mexico.
This grant was used to produce the piece Vistas. Karina Sánchez: dancer. She studied contemporary dance and choreography in Chile, Spain
and México.
Description of Piece
VISTAS (2005). Choreography with video, one musician playing live electronics and two dancers with metainstruments interacting with
the music. Divided into three scenes, the work is conceptually based on the “self-other” cognitive phenomenon, inspired by Edgar Morin's idea
of the evolution of society through interdisciplinary interaction. The interdisciplinary character of the piece is carefully constructed using two
metainstruments that link the formal elements in a structural way. These metainstruments are two wireless microphones plugged into two
stethoscopes attached to the dancers' hands. The movements of the dancers make the microphones generate an amplitude signal that is
transmitted to the computer and mapped onto different musical elements. Live vocal interventions by the dancers add dramatic
accents to the piece. Vistas is an integral piece in which the music supports the choreography and the choreography is in turn influenced by
the music. The video supports the scene, creating an abstract space that changes and evolves according to the performance. The musical
aesthetic draws on noise elements and voice sample manipulation, playing with texture and density contrasts in a very dynamic way. The
language of the choreography comes from an exploration of the planes of three-dimensional space, first separately and later united. The
language is also shaped by the need to make the best possible use of the metainstruments.
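The amplitude-to-music mapping described above (microphone level driving musical elements) can be sketched as a simple envelope follower whose output is scaled onto a parameter range. This is an illustrative reconstruction under stated assumptions, not the ensemble's actual software; the function names and the parameter targets mentioned in the comments are invented for the example.

```python
def envelope_follower(samples, attack=0.5, release=0.05):
    """Track the amplitude of a microphone signal: the envelope rises
    quickly on loud input (attack) and decays slowly (release)."""
    env, out = 0.0, []
    for x in samples:
        rect = abs(x)                         # full-wave rectification
        coeff = attack if rect > env else release
        env += coeff * (rect - env)           # one-pole smoothing
        out.append(env)
    return out

def map_to_param(env_value, lo, hi):
    """Scale an envelope value in [0, 1] onto a musical parameter
    range, e.g. grain density or filter cutoff (hypothetical targets)."""
    clamped = min(max(env_value, 0.0), 1.0)
    return lo + clamped * (hi - lo)

# A burst of dancer movement followed by stillness:
env = envelope_follower([0.0, 0.9, 0.8, 0.0, 0.0])
cutoffs = [map_to_param(e, 200.0, 2000.0) for e in env]
```

The asymmetric attack/release constants are what make the mapping feel responsive to a sudden gesture while avoiding an abrupt drop when the dancer pauses.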
Biographical information
Martin Messier
Holding a diploma in drums for jazz interpretation, Martin Messier has completed a bachelor’s
degree in electroacoustic composition at the University of Montreal, and De Montfort University
in England. Recently, Martin has founded a solo project called « et si l’aurore disait oui… »,
through which he develops live electroacoustic performance borrowing stylistic elements from
Intelligent Dance Music, acousmatic and folk. Based on a strong aptitude for rhythm, Martin's
aesthetic can be defined as a complex, left-field and happily strange sound amalgam, constantly
playing with construction and deconstruction.
Jacques Poulin-Denis
Jacques Poulin-Denis is active in projects that intersect theater, dance and music. He has
completed his undergraduate studies in electroacoustic composition from the University of
Montreal, and De Montfort University in England. Most of his music was composed for theater
and dance. Jacques explores innovative ways of presenting electro-acoustic music. Jacques’
musical style is evocative and filled with imagery. Combining traditional and electronic
instruments with anecdotic sound sources of everyday life, he creates vibrant music that is
fierce and poetic.
The Pencil Project is about musicianship. Liberated from the computer screen and equipped
with hands-on objects, the performers explore a new form of expressivity. Through an
authentic and stimulating performance, the musicians bring computer music intimately close to
playing an actual musical instrument.
Heretic’s Brew
Stuart Favilla
Bent Leather Band
CTME, OTRL, Victoria University
Melbourne, Australia
sfavilla@bigpond.com
Joanne Cannon
Bent Leather Band
Melbourne, Australia
joanne_cannon@bigpond.com
Tony Hicks
Saxophonist/Multi-Instrumentalist
Melbourne Australia
hixt@optusnet.com.au
Biographical information
Composer/improviser Joanne Cannon, is one of Australia’s leading bassoonists. Although she began her career as a professional
orchestral musician, she now works as a composer and improviser, exploring extended techniques. Stuart Favilla has a background in
composition and improvisation. Together they form the Bent Leather Band, a duo that has been developing experimental electronic
instruments for over twenty years in Australia. Bent Leather Band blurs virtuosity and group improvisation across a visual spectacle of
stunning original instruments. These were made in conjunction with Tasmanian leather artist, Garry Greenwood. The instruments
include fanciful dragon headed Light-Harps, leather Serpents and Monsters that embody sensor interfaces, synthesis and signal
processing technology. Practicable and intuitive instruments, they have been built with multi-parameter control in mind. Joint winners of
the Karl Szucka Preis, their work of Bent Leather has gained selection at Bourges and won the IAWM New Genre Prize.
Inspired by the legacy of Percy Grainger’s Free music, i.e. “music beyond the constraints of conventional pitch and rhythm” [Grainger,
1951], Bent Leather Band has strived to develop a new musical language that exploits the potentials of synthesis/signal processing,
defining new expressive boundaries and dimensions and yet also connecting with a heritage of Grainger’s musical discourse. Grainger
conceived his music towards the end of the 19th Century, and spent in excess of fifty years bringing his ideas to fruition through
composition for theremin ensemble, the development of 6th tone instruments [pianos and klaviers], the development of polyphonic reed
instruments for portamento control and a series of paper roll, score driven electronic oscillator instruments.
Tony Hicks enjoys a high-profile reputation as Australia's most versatile woodwind artist. Equally adept on saxophones, flutes and
clarinets, his abilities span a broad spectrum of music genres. A student of Dr Peter Clinch, Tony also studied at the Eastman School of
Music. He has performed throughout Australia, and across Europe, the United States, Japan and China with a number of leading
Australian ensembles including the Australian Art Orchestra, Elision, and the Peter Clinch Saxophone Quartet. He has performed
saxophone concertos with the Melbourne Symphony Orchestra, and soloed for Stevie Wonder and his band. As a jazz artist he has
performed and recorded with leading jazz figures Randy Brecker and Billy Cobham, and notable Australian artists Paul Grabowsky, Joe
Chindamo and David Jones, and has also led a number of important groups on the local Australian scene. An explorer of improvised music, he
consistently collaborates with numerous artists both in Australia and overseas.
Description of Piece
Bent Leather Band introduces their new extended instrument project, Heretic’s Brew. The
aim of the project is to extend the line-up towards a larger ensemble. So far the project
[quintet] has developed a number of new extended saxophone controllers and is currently
working on trumpets and guitars. Their instruments are based on Gluion OSC interfaces:
field-programmable gate array devices with multiple configurable inputs and outputs. For
NIME08, the ensemble trio will demonstrate their instruments, language and techniques
through ensemble improvisation.
[Pictured Right: Gluisop extended saxophone]
The Suicided Voice
Mark A. Bokowiec
University of Huddersfield
School of Music & Humanities
Queensgate, Huddersfield HD1 3DH
+44 (0)1484 472004
m.a.bokowiec@hud.ac.uk
Julie Wilson-Bokowiec
EDT
www.bodycoder.com
+44 (0)1484 513158
juliebokowiec@yahoo.com
Etch
Mark A. Bokowiec
University of Huddersfield
School of Music & Humanities
Queensgate, Huddersfield HD1 3DH
+44 (0)1484 472004
m.a.bokowiec@hud.ac.uk
Julie Wilson-Bokowiec
EDT
www.bodycoder.com
+44 (0)1484 513158
juliebokowiec@yahoo.com
Etch
(for performer/vocalist, the Bodycoder System, live MSP & computer graphics)
Etch is the third work in the Vox Circuit Trilogy (2007). In Etch, extended vocal techniques, Yakut and Bel Canto singing, are coupled
with live interactive sound processing and manipulation. Etch calls forth fauna, building soundscapes of glitch infestations, howler tones,
clustering sonic-amphibians, and swirling flocks of synthetic granular flyers. All sounds are derived from the live acoustic voice of the
performer. There are no pre-recorded soundfiles used in this piece and no sound manipulation external to the performer’s control. The
ability to initiate, embody and manipulate both the acoustic sound and multiple layers of processed sound simultaneously on
the limbs requires a unique kind of perceptual, physical and aural precision. This is particularly evident at moments when the source
vocal articulations of the performer, unheard in the diffused soundscape, enter as seemingly phantom sound cells, pitch-changed, fractured
and heavily processed. In such instances the sung score and the diffused, physically manipulated soundscape seem to separate, and the
performer is seen working in counterpoint, articulating an unheard score. Etch is punctuated by such separations and correlations, by choric
expansions, intricate micro constructions and moments when the acoustic voice of the performer soars over and through the soundscape.
Although the Bodycoder interface configuration for Etch is similar to that of The Suicided Voice, located on the upper torso, the
functional protocols and qualities of physical expressivity are completely different. Interface flexibility is a key feature of the Bodycoder
System and allows for the development of interactive works unrestrained by interface limitations or fixed protocols. The flexibility of the
interface does however present a number of challenges for the performer who must be able to adapt to new protocols, adjust and temper her
physical expressivity to the requirements of each piece.
The visual content of both Etch and The Suicided Voice was created in a variety of 2D and 3D packages using original photographic and
video material. Images are processed and manipulated using the same interactive protocols that govern sound manipulation. Content and
processing is mapped to the physical gestures of the performer. As the performer conjures extraordinary voices out of the digital realm, so
she weaves a multi-layered visual environment combining sound, gesture and image to form a powerful ‘linguistic intent’.
Etch was created in residency at the Confederation Centre of the Arts, Prince Edward Island, Canada, in June 2007.
Thomas Ciufo
Smith College
Seelye Hall, Room B1
Northampton, MA 01063 USA
413-585-3435
tciufo@smith.edu
Biographical information
Thomas Ciufo is an improviser, sound / media artist, and researcher working primarily in the areas of electroacoustic
improvisational performance and hybrid instrument / interactive systems design, and is currently serving as artist-in-
residence in Arts and Technology at Smith College. Recent and ongoing sound works include three meditations for
prepared piano and computer, the series sonic improvisations #N, and eighth nerve, an improvisational piece for prepared
electric guitar and computer. Recent performances include off-ICMC in Barcelona, Visiones Sonoras in Mexico City, the
SPARK festival in Minneapolis, the International Society for Improvised Music conference in Ann Arbor, and the Enaction
in Arts conference in Grenoble.
Description of Piece
Silent Movies: an improvisational sound / image performance
Silent Movies is an attempt to explore and confront some of the possible relationships / interdependencies between visual and
sonic perception. In collaboration with a variety of moving image artists, this performance piece complicates visual
engagement through performed / improvised sound. In a sense, Silent Movies plays with the live soundtrack idea, but from a
somewhat different vantage point. Or maybe it is an inversion; a visual accompaniment to an improvised sonic landscape?
For this performance, I will use a hybrid extended electric guitar / computer performance system, which allows me to explore
extended playing techniques and sonic transformations provided by sensor controlled interactive digital signal processing.
For tonight's performance, the moving image composition is by Mark Domino (fieldform.com).
Alison Rootberg
Artistic Director - Kinesthetech Sense
5009 Woodman Ave #303
Sherman Oaks, CA 91423
847-209-8116
arootberg@gmail.com
Margaret Schedel
Assistant Professor - Stony Brook University
PO Box 1137
Sound Beach, NY 11789
415-246-1096
gem@schedel.net
Biographical information
Kinesthetech Sense was founded by Alison Rootberg and Margaret Schedel in 2006
with the intent to collaborate with visual artists, dancers, and musicians, creating
ferociously interactive experiences for audiences throughout the world. Rootberg, the
Vice President of Programming for the Dance Resource Center, focuses on
incorporating dance with video while Schedel, an assistant professor of music at Stony
Brook University, combines audio with interactive technologies. Oskar Fischinger once
said that, "everything in the world has its own spirit which can be released by setting it
in motion." Together Rootberg and Schedel create systems which are set in motion by
artistic input, facilitating interplay between computers and humans. Kinesthetech Sense
has had their work presented throughout the US, Canada, Denmark, Germany, Italy,
and Mexico. For more info, please go to: www.ksense.org
Description of Piece
georg.essl@telekom.de
ge@ccrma.stanford.edu
henri.penttinen@hut.fi
Biographical information
Ge Wang received his B.S. in Computer Science in 2000 from Duke University, PhD (soon) in Computer Science (advisor Perry Cook) in
2008 from Princeton University, and is currently an assistant professor at Stanford University in the Center for Computer Research in
Music and Acoustics (CCRMA). His research interests include interactive software systems (of all sizes) for computer music, programming
languages, sound synthesis and analysis, music information retrieval, new performance ensembles (e.g., laptop orchestra) and paradigms
(e.g., live coding), visualization, interfaces for human-computer interaction, interactive audio over networks, and methodologies for
education at the intersection of computer science and music. Ge is the chief architect of the ChucK audio programming language and the
Audicle environment. He was a founding developer and co-director of the Princeton Laptop Orchestra (PLOrk), the founder and director of
the Stanford Laptop Orchestra (SLOrk), and a co-creator of the TAPESTREA sound design environment. Ge composes and performs via
various electro-acoustic and computer-mediated means, including with PLOrk/SLOrk, with Perry as a live coding duo, and with Princeton
graduate student and comrade Rebecca Fiebrink in a duo exploring new performance paradigms, cool audio software, and great food.
Georg Essl is currently Senior Research Scientist at Deutsche Telekom Laboratories at TU-Berlin, Germany. He works on mobile
interaction, new interfaces for musical expression and sound synthesis algorithms that are abstract mathematical or physical models. After
he received his Ph.D. in Computer Science at Princeton University under the supervision of Perry Cook he served on the faculty of the
University of Florida and worked at the MIT Media Lab Europe in Dublin before joining T-Labs.
Henri Penttinen was born in Espoo, Finland, in 1975. He completed his M.Sc. and PhD (Dr. Tech.) degrees in Electrical Engineering at the
Helsinki University of Technology (TKK) in 2002 and 2006, respectively. He teaches digital signal processors and audio
processing at the Department of Signal Processing and Acoustics (until 2007 known as the Laboratory of Acoustics and
Signal Processing) at TKK. Dr. Penttinen was a visiting scholar at the Center for Computer Research in Music and Acoustics (CCRMA),
Stanford University, during 2007 and 2008. His main research interests are sound synthesis, signal processing algorithms, musical
acoustics, and real-time audio applications in mobile environments. He is one of the co-founders and directors, with Georg Essl and Ge Wang,
of the Mobile Phone Orchestra of CCRMA (MoPhO). He is also the co-inventor, with Jaakko Prättälä, of the electro-acoustic bottle
(eBottle). His electro-acoustic pieces have been performed around Finland, in the USA, and Cuba.
Additional Composer Biography: Jeffrey Cooper is a musician / producer from Bryan, Texas. Having worked as a programmer and DJ for
a number of years, he is currently finishing a Master's degree in Music, Science, and Technology at Stanford University / CCRMA. Co-
composer of music for mobile phones with the honorable Henri Penttinen.
Description of Piece
The Mobile Phone Orchestra is a new repertoire-based ensemble using mobile phones as the primary musical instrument.
The MoPhO Suite contains a selection of recent compositions that highlights different aspects of what it means to compose for and perform
with such an instrument in an ensemble setting. Brief program note: The Mobile Phone Orchestra of CCRMA (MoPhO) presents an
ensemble suite featuring music performed on mobile phones. Far beyond ring-tones, these interactive musical works take advantage of the
unique technological capabilities of today's hardware, transforming phone keypads, built-in accelerometers, and built-in microphones into
powerful and yet mobile chamber meta-instruments. The suite consists of a selection of representative pieces:
***Drone In/Drone Out (Ge Wang): human players, mobile phones, FM timbres, accelerometers.
***TamaG (Georg Essl): TamaG is a piece that explores the boundary of projecting the humane onto mobile devices while at the same time
displaying the fact that they are deeply mechanical and artificial. It explores the question of how much control we have in the interaction with
these devices, or whether the device itself at times controls us. The piece works with the tension between these positions and crosses the desirable
and the alarming, the human voice with mechanical noise. The alarming effect has a social quality and spreads between the performers. The
sounding algorithm is the non-linear circle map, which is used in easier-to-control and hard-to-control regimes to evoke the effects of
control and desirability on the one hand and the loss of control and mechanistic function on the other.
***The Phones and Fury (Jeff Cooper and Henri Penttinen): how much damage can a single player do with 10 mobile phones? Featuring
loops, controllable playback speed, and solo instruments.
***Chatter (Ge Wang): the audience is placed in the middle of a web of conversations...
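The non-linear circle map named in TamaG's note is a standard dynamical system; a minimal sketch of how it can be iterated as a sound-generating process is below. The rendering scheme and parameter values are assumptions for illustration, not Essl's actual implementation.

```python
import math

def circle_map_step(theta, omega, k):
    """One iteration of the standard (sine) circle map:
    theta' = theta + omega - (k / 2*pi) * sin(2*pi * theta), mod 1.
    Small k gives quasi-periodic, easy-to-control behaviour; large k
    drives the map into mode locking and chaos."""
    return (theta + omega
            - (k / (2.0 * math.pi)) * math.sin(2.0 * math.pi * theta)) % 1.0

def render(n, omega, k, theta=0.0):
    """Iterate the map n times; the phase sequence, recentered to
    [-1, 1), can serve directly as a crude audio signal."""
    out = []
    for _ in range(n):
        theta = circle_map_step(theta, omega, k)
        out.append(2.0 * theta - 1.0)
    return out

tame = render(64, omega=0.35, k=0.5)   # controllable regime
wild = render(64, omega=0.35, k=4.5)   # hard-to-control regime
```

Sweeping k between the two regimes is one plausible way to realize the piece's tension between control and loss of control described above.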
Club Performances
Traces/Huellas
for flute and electronics
Jane Rigler
25 Monroe Street
Brooklyn, NY 11238 USA
1.917.826.9608
Biographical information
Jane Rigler, flutist, composer, educator and curator, is known for her innovations in new flute performance techniques and unique musical
vocabulary. She is a featured performer in contemporary music festivals throughout the U.S. and Europe as a soloist as well as within
chamber ensembles (Ensemble Plural, Either/Or, Ne(x)tworks, Ensemble Sospeso, Anthony Braxton 12tet, etc.). Besides premiering works
written especially for her, Jane’s compositions range from simple solo acoustic pieces inspired by language to complex interactive electronic
works that pay homage to painting, poetry and dance. After receiving a B.M. (Northwestern University) and then pursuing flute studies in
various parts of Europe and North America, she gained her M.A. and Ph.D. (UC San Diego) completing The Vocalization of the Flute, a
book demonstrating both new and ancient methods of singing-while-playing the flute. Her expertise has led to performances in
contemporary operas, experimental theater and dance events as well as other interactive electronic festivals. Her compositions are sought
after by other flutists and have been performed in South Korea, Australia, France, Spain, and in concert halls and universities throughout
the U.S.
After living in Spain for 9 years, Jane resides in Brooklyn, NY, and has organized events such as Relay~NYC! held at MoMA and the
Spontaneous Music Festival, and collaborated with the Conflux Festival in 2007. She has received several Brooklyn Arts Council grants
for her community activities, a Global Connections grant to perform in Munich this year and was awarded several artist residencies for her
interactive electronic works from Harvestworks Studios, Art Omi and RPI’s Create @ iEar Studios.
Description of Piece
Traces/Huellas, a quadraphonic work for flute and computer, is inspired by the ancient storytelling tradition where a bard (the flutist) uses
the voice, motion and gesture to convey characters within a story in order to orally pass on the teachings of a culture. Designed using the
interactive computer program Max/MSP, this work incorporates a tiny triggering device strategically placed on the flute so the performer
can control the timing of the work, the sound processing and the distribution of sound through the space in real-time. The spatialization of
sound provides room for the sonic textures to move and interact with each other, offering clues to the characters and situations within the
story. Although technology is essential to the music and story, its concealment attempts to give the appearance of a free-moving acoustic
instrumentalist-as-storyteller. In this way, Jane’s flute language merges into an organic electronic one creating a compelling musical
narration. Through the poetics of a unique vocal/flute/electronic/physical language, Traces/Huellas’ musical journey re-creates the
traditional story of the hero quest, her transformation and the consequences of this journey.
Drawing / Dance
Biographical information
Renaud Chabrier was born in 1974. After receiving a master’s degree in physics, computer sciences and cognitive sciences, he has devoted
his research to the perception of movement through drawing. He has since become a dancer at KMK street theatre company and a
children’s book illustrator. He develops animation techniques and movement analysis tools for both performance and education purposes.
Antonio Caporilli is a dancer, performer, videomaker and installation artist. His research on movement and improvisation has led him to
experiment with conventional and unconventional spaces, collaborating with people from different backgrounds, such as Giorgio Strehler,
Robert Wilson and Bill T. Jones.
Description of Piece
"Drawing / Dance" shows both the making of a short animation movie and its interpretation as a dance solo.
This performance is based on custom software for real-time animation, working from handmade dance sketches drawn on paper.
Thanks to this tool, real-time drawing can be used in a choreographic way, interacting in various manners with a live dance performance.
Real-time animations can thus either lead the dancer’s movements or follow them as the drawer sketches him while he dances.
A short dance solo will be created in advance, in relation with animated sequences. The dialogue between those sequences, projected on a
screen, and the solo dance will constitute the heart of the performance.
Some simple animations and dance movements will also be improvised in real time, as an opening and a conclusion to the piece. Those
improvisations offer an insight into the choreographic creative process.
A public rehearsal for this performance, based on drawing and dance improvisations, could also be organised as an installation (duration 3
hours).
[Pictured: the scene, with video images generated from the laptop; example of sketches]
RADIO WONDERLAND
Joshua Fried
Independent Artist
277 N. 7th Street, Apt. 4R
Brooklyn, NY 11211 USA
+1 718-599-3414
composer@acedsl.com
Biographical information
Joshua Fried's unique profile spans experimental music, electronic dance music, performance, rock and pop. He has performed solo at
Lincoln Center, The Kitchen, CBGB, a Stuttgart disco, a former East Village bathhouse, a Tokyo museum, and the Royal Palace of
Holland; art rock guitar giant Fred Frith soloed on Fried's first solo disk, and Fried has produced or co-produced records by artists as
diverse as They Might Be Giants, Chaka Khan and avant-drone master David First. He is a recipient of numerous awards including two
New York Foundation for the Arts (NYFA) Fellowships, a National Endowment for The Arts (NEA) Composer's Fellowship and artist
residencies at MacDowell, Yaddo, VCCA, Djerassi and the Rockefeller Foundation's Bellagio Center on Lake Como, Italy. Fried won two
large commissions from American Composers Forum: to create live music for Douglas Dunn & Dancers, and to compose for the robotic
instruments of New York's League Of Electronic Musical Urban Robots (LEMUR). Joshua Fried is the youngest composer to appear in
Schirmer Books' American Music in the 20th Century.
Description of Piece
RADIO WONDERLAND turns live commercial FM radio into recombinant funk. All sounds originate from an old boombox, playing
radio LIVE. All processing is live, programmed by me in MaxMSP. But I hardly touch the laptop. My controllers are a vintage Buick
steering wheel, old shoes mounted on stands, and some gizmos. You'll hear me build grooves, step by step, out of recognizable radio, and
even UN-wind my grooves back to the original radio source. I want to show that we ALL can interrupt and interrogate the endless flow. So
my transformations, taken individually, must be clear and simple--mostly framing, repeating and changing pitch--although when put
together the whole is indeed complex. My controllers are simple too: the wheel merely a knob to make things go up and down (frequency,
tempo) or play radio loops like a turntable, the shoes merely pads to hit softer or louder. The surreality of these ordinary objects
underscores the absurd disconnect between digital controller and sound, as well as the congenial nature of the aural transformations
themselves. So, too, my riffs must be vernacular and not elite. (We need the funk.)
Il suono incausato, improvise-action for suspended
clarinet, clarinettist and electronics
Silvia Lanzalone
silvialanzalone@crm-music.it
Biographical information of the author
Silvia Lanzalone (Salerno), flautist and composer, studied flute with Enrico Renna, Annamaria Morini and Peter-Lukas Graf,
composition with Mauro Cardi and Guido Baggiani, and electronic music and composition with Michelangelo Lupone and Giorgio Nottoli.
She collaborates with CRM (Centro Ricerche Musicali), Rome, as music assistant.
She won an edition of the International Prize of Composition “Quarant’anni nel 2000”, instituted by CEMAT (Centri
Musicali Attrezzati), for the realization of a work of musical theater for children.
Her work Il suono incausato, improvise-action for suspended clarinet, clarinettist and electronics, won the First Prize ex
aequo of the International Prize of Composition “Franco Evangelisti” and is published by Suvini Zerboni. Recordings of her
compositions are published on CD by Ars Publica.
She has published analytical articles in specialist periodicals, including “Organised Sound” (International Journal of Music and
Technology, Cambridge University Press) and “Syrinx” (AIF).
Her compositions, performed in Italy and abroad, are orientated towards experimentation and research into new expressive and linguistic
solutions, and are realized with information technologies which permit the processing of sound in real time. During recent years she has
been principally interested in gestural possibilities, improvisation and ad-lib creation with the computer, often integrated with
interactive installations.
Biographical information of the performer
Massimo Munari (Roma) graduated in clarinet at the Conservatory A. Casella, L’Aquila, studying with C. Taddei and Ivo Meccoli.
He studied composition with M. Gabrieli, R. Santoboni and David Macculi, and has followed specialization courses at the Accademia
Musicale Chigiana of Siena with Franco Donatoni and at the Scuola di Musica di Fiesole with Giacomo Manzoni. He studied electronic
music with Riccardo Bianchini and Giorgio Nottoli at the Conservatory Santa Cecilia of Rome. He has performed as a soloist, in chamber
ensembles and in orchestras. At the moment he collaborates with various chamber ensembles such as the Mozart ensemble, the Nabla ensemble,
the ensemble Cecilia elettrica, the Domani Musica ensemble and the ensemble Algoritmo. He has performed in various Italian
festivals (Festival incontri musicali nel Lazio, Festival delle Rocche, Domani Musica festival, Nuova Consonanza, Musica
Verticale, Nuove forme sonore, Autunno musicale of the University of Caserta, Stagione concertistica dell’Accademia Musicale
Pescarese, Campus Internazionale di Musica di Latina). He has collaborated as a soloist with the Orchestra Sinfonica of Pescara, and his
performances have been broadcast by Radio Vaticana, Radio Tre and Rai Trade. He won the diploma of special regard at the composition
competition P. Barsacchi of Viareggio, and the first prize at the competition S. Ciani of Siena, with Luciano Berio on the jury.
Description of Piece
Il suono incausato, improvise-action for suspended clarinet, clarinettist and electronics,
is a piece which investigates the relationship between the performer and his/her
instrument in a way that goes beyond the usual production of sound. I was prompted by
certain considerations about the principle of causality to try out a system that would release
the clarinet from the traditional excitation of the air column produced by the clarinettist by
means of the reed. In this piece the uncaused sound is not the result of the system of
vibration (the clarinet), but its cause: the act that allows us to explore its acoustics.
The uncaused is, according to medieval Aristotelism, the “first immobile motor”, the pure act, the first cause of movement that of
necessity gave rise to the world and its causal sequences. The uncaused sound is also, in Indian culture, “Anāhata”, the fourth chakra, the
abode of primigenial sound.
398
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Biographical information
Mag. art. Luka Dekleva finished his studies in fine art photography at FAMU Prague in 2000. He works as a
freelance photographer, VJ performer and multimedia artist. Recently he has begun to explore the relation of image to
sound, through interactive installations and performances. Luka Prinčič works in the field of sound and programming as a
performer and composer. He is an artist, web-developer, dj, writer, critic, reverse engineer, part-time hacker and open
source agent. Miha Ciglar is a composer and sound artist currently studying at the University of Music and Dramatic Arts
in Graz, Austria. His subject of high concern and priority is the problem of absolute awareness of sonic perception which
is directly connected with the question of existential legitimacy of sound art.
FeedForward Cinema
Audio/Video composition for three performers.
Closed information flows become networks for interaction. Three A/V instruments are integrated to produce a resonant
and harmonic experience and a possibility for the artists to interact: a bond where all layers of the composition are
affected by a single change, be it an image creating an audio signal or the other way round. The inherent glitch of the
devices used exposes their fragile nature and turns them into instruments of expression. The performers choose to limit their
expressive possibilities and submit a part of their “digital freedom”. By doing so, the narrow space of expression that such a
fragile instrument has becomes exposed to outside manipulation. Rather than three separate instruments, FeedForward
Cinema combines them into one instrument for three performers.
399
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Biographical information
Hannah Drayson is an artist and research student with an interest in science and
technology and their integration into lived reality. She uses media ranging from web and
graphic design to visual performance, video and digital audio production.
Koray Tahiroglu is with the University of Art and Design Helsinki, Finland. He is a sound
artist, performer and a researcher who grew up in Istanbul. He has been performing noise
and electronic music collaborating with different sound artists and performers as well as
with solo performances.
Miguel Ortiz Perez is a Mexican composer and sound artist based in Belfast. Born in
Hermosillo Sonora, he has been involved in a vast range of activities related to modern
music and sound art. He has worked professionally as a composer, sound engineer,
lecturer, score editor, promoter and sound designer.
Thomas Greg Corcoran lives and works in Dublin, Ireland. Previously of a math/science
background, he now practices painting, drawing, and computer-based art.
Description of Piece
The members of the Control Group play melodic improvisation derived from the
physiological signals of their bodies. Communicating via gestures of their nervous systems,
the group plays an improvisation within descriptive spaces of sound and visual media.
This is an audio-visual performance based upon the idea of exposing the community
of bodies outside the realm of everyday body language. The performers interact with their
own internal rhythms in a feedback loop by observing the data-made-audiovisual.
Performers play one person per virtual instrument, each with their own
distinctive sound space, sparse enough to allow unconfused communication: i.e., both the
audience and the performers know who is playing what. Components of the sounds used
include biometric bodily sound samples (such as the sound of breath and beat of heart),
sonified data streams either directly rendered or mediated by processes such as physical
models, and further interpretive sounds.
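The direct rendering of a physiological data stream mentioned above can be sketched as follows. This is a minimal illustration, not the group's actual mapping: the pitch range and the inter-beat-interval bounds are assumptions.

```python
def heartbeat_to_pitch(ibi_ms, base_midi=48, span=24):
    """Sonify an inter-beat interval (in ms) as a MIDI pitch: shorter
    intervals (a faster heart rate) map to higher notes. The clamp range
    and pitch span are illustrative values only."""
    ibi_ms = max(400.0, min(1200.0, ibi_ms))   # clamp to plausible IBIs
    t = (1200.0 - ibi_ms) / 800.0              # 0 = slow heart, 1 = fast
    return base_midi + int(round(t * span))
```

A stream of such pitches, one per detected beat, is the simplest "directly rendered" sonification; mediation by a physical model would replace the direct pitch mapping with excitation of a simulated resonator.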
This piece is the result of a collaboration between active researchers and artists in the area
of bio-music performance and the rendering of sound from physiological control data.
400
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Cent Voies
Nicolas d’Alessandro
Information Technology Group of
Faculté Polytechnique de Mons
31 Boulevard Dolez
B-7000 Mons (Belgium)
(+32) (0)65 37 47 94
nicolas.dalessandro@fpms.ac.be
Biographical information
Nicolas d'Alessandro holds an Electrical Engineering degree (2004) from the Polytechnic Faculty of Mons (Belgium). He completed
his master's thesis in the Faculty of Music of the University of Montreal, collaborating with Prof. Caroline Traube on the measurement of
perceptual analogies between guitar and voice sounds. For the last 3 years he has worked in the Information Technology Group of the
Polytechnic Faculty of Mons (supervisor: Prof. Thierry Dutoit) as a PhD student. He mainly works on expressive gestural control of sound
production, and more precisely on digital instruments achieving voice (speech/singing) synthesis. He recently proposed a tablet-based digital
instrument devoted to the realtime manipulation of voice materials (live and synthetic), called the HandSketch, presented during
NIME 2007.
Description of Piece
It is relatively straightforward to think that if a musical instrument is not played, it cannot properly evolve, or even does not exist. The
HandSketch (cf. Fig. 1) is a bi-manual digital instrument (controller and synthesizer) focused on the expressive control of incoming and
synthetic voice material. In order to be played, this instrument thus had to be “excited” by some composing work. From a purely technical
point of view, “Cent Voies” explores the field of possibilities in gestures and sounds that the HandSketch can produce. While the singing
synthesis mappings are exploited, the piece also highlights features that have been updated since the instrument's first presentation at
NIME 2007, e.g. the implementation of expressive interactions between the performer's own voice characteristics and bi-manual gestural control. The most
relevant example is the concurrent control of the intonations of virtual locutors, which is both the result of the performer's voice and
external “sketches”.
The piece explores the boundaries and overlapping regions between narration and music. Part of the incoming material is the performer's
voice, but the intonation is deformed in order to exacerbate the musical aspects of speech prosody. Then the multiple virtual locutors initiated
by the performer become autonomous. In this way, a conversational confusion is created. This confusion is increased by the fact that the
voices progressively lose their human aspects (disappearing consonants, voicing, then timbral coherence). On top of this phenomenon,
a virtual singer appears, whose phrases and voice quality variations are entirely due to hand movements on the tablet and its
embedded force sensors. It brings the spectator of this confusing conversation into a much more intimate context.
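A tablet-to-voice mapping in the spirit described above, where stylus position drives pitch continuously and force-sensor pressure drives voice quality, can be sketched as follows. All names and ranges here are assumptions for illustration, not the actual HandSketch mapping.

```python
def handsketch_map(x, pressure, x_range=(0, 32767), semitone_span=24, f0_base=110.0):
    """Map a stylus x position to a continuous fundamental frequency
    (no frets: pitch glides over semitone_span semitones above f0_base)
    and clamp a force-sensor reading to a 0..1 'vocal effort' value.
    Tablet resolution and pitch range are illustrative values."""
    t = (x - x_range[0]) / (x_range[1] - x_range[0])
    f0 = f0_base * 2 ** (t * semitone_span / 12.0)  # equal-tempered glide
    effort = max(0.0, min(1.0, pressure))           # normalized pressure
    return f0, effort
```

The continuous (rather than quantized) pitch axis is what lets such an instrument deform speech intonation into song, as the piece requires.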
This piece also aims to serve as a contextual work. Indeed, talking about convergence between narrative and musical materials leads us to
the typical context of singing, which is actually an “acoustic” way of seeing it. It can also meet some needs in the world of theatre
performance, where the “music” of sentences is really meaningful for actors and authors, yet such work often rests on traditional
references. It is precisely in the need to find new paths between these two modes of expression (narration and music) that the design of
digital musical instruments is really meaningful. A contemporary instrument responding to a contemporary need: the hybridisation of practices.
401
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Biographical information
Cléo Palacio-Quintin
Constantly seeking new means of expression and eager to create, the flutist-improviser-composer Cléo Palacio-Quintin takes part in many
premieres as well as improvisational multidisciplinary performances, and composes instrumental and electroacoustic music for various
ensembles and media works. Since 1999, she extended these explorations into the development of a new instrument: the hyper-flute.
Interfaced to a computer and software by means of electronic sensors, the enhanced flute enables her to compose novel electroacoustic
soundscapes. She is now pursuing doctoral studies in Montreal to compose new works for the hyper-flute and for her new hyper-bass-flute.
Sylvain Pohu
Composer, improviser and guitarist, Sylvain Pohu is a founding member of the contemporary jazz ensemble [iks] and, since 2007, is also
its artistic director. As an electroacoustic composer and improviser, Sylvain Pohu has participated in numerous festivals. Another
dimension of Sylvain's work is the design and production of sound installations and vidéomusique pieces. Parallel to the aforementioned
activities, Sylvain is currently researching the role of improvisation in the compositional process and in real-time processing in conjunction
with a master's degree whose aim is to explore the expressive possibilities of improvised electroacoustic music.
Description of Piece
Composers Cléo Palacio-Quintin and Sylvain Pohu are both skilled and experienced performer-improvisers. They have known each other
for years as colleagues at the Université de Montréal, but never got the chance to share a stage. However, they have always shared a passion
for improvisation and electroacoustic music.
Each of them is working on a new interface to perform live electronics together with their traditional instrument (flute and electric guitar).
In the development of their respective computer interfaces to perform improvised electroacoustic music, they are concerned with the same
issues. They both focus on expressivity and freedom while performing on an augmented instrument.
NIME-08 now gives them an opportunity to confront and merge their extended sonic worlds. No doubt, it's going to be a challenging trip
for your ears!
402
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Biographical information
Nicolas d'Alessandro holds an Electrical Engineering degree (2004) from the Polytechnic Faculty of Mons (Belgium). He completed his
master's thesis in the Faculty of Music of the University of Montreal, collaborating with Prof. Caroline Traube on the measurement of
perceptual analogies between guitar and voice sounds. For the last 3 years he has worked in the Information Technology Department of
Polytech.Mons (supervisor: Prof. Thierry Dutoit) as a PhD student. He mainly works on expressive gestural control of sound production,
and more precisely digital instruments achieving voice (speech/singing) synthesis. He recently proposed (for NIME'07) a digital instrument
devoted to the realtime manipulation of singing voice contents, the HandSketch.
Composer, improviser and guitarist, Sylvain Pohu is a founding member of the contemporary jazz ensemble [iks] and, since 2007, is also
its artistic director. As an electroacoustic composer and improviser, Sylvain Pohu has participated in numerous festivals. Another
dimension of Sylvain's work is the design and production of sound installations and videomusique pieces. Parallel to the aforementioned
activities, Sylvain is currently researching the role of improvisation in the compositional process and in real-time processing in conjunction
with a master's degree whose aim is to explore the expressive possibilities of improvised electroacoustic music.
Description of Piece
This piece presents a collaboration between two people interested in pushing forward instrumental control of sounds through improvisation.
On the one hand, the HandSketch is a fully invented (controller and synthesizer) bi-manual digital instrument devoted to the expressive
control of incoming and synthetic voice materials (speech and singing). It comes from the will of a PhD student (Nicolas d'Alessandro) in
signal/information processing to develop gestural interaction with narrative contents, in order to manipulate them outside usual boundaries
and make them become music. On the other hand, there is the will of a guitarist (Sylvain Pohu) to extend the possibilities of his six
strings in order to improvise on a wider range of sounds, creating electroacoustic music. Each of them is working on new interfaces and
new strategies to perform and improvise live electronics. They meet in this work, starting from two completely different points.
Figure 1. Presentation of Nicolas d'Alessandro's (left) and Sylvain Pohu's (right) setups. The HandSketch (left) is made of a Wacom tablet,
FSR sensors and a custom voice processor/synthesizer. The guitar is extended with MIDI knobs, faders and pedals and live processing.
403
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Biographical information
Ajay Kapur is the Music Technology Director at California Institute of the Arts. He received an Interdisciplinary Ph.D. in 2007 from
University of Victoria combining Computer Science, Electrical Engineering, Mechanical Engineering, Music and Psychology with a
focus on Intelligent Music and Media Technology. Ajay graduated with a Bachelor in Science and Engineering Computer Science
degree from Princeton University in 2002. He has been educated by music technology leaders including Dr. Perry R. Cook, Dr.
George Tzanetakis, and Dr. Andrew Schloss, combined with mentorship from robotic musical instrument sculptors Eric Singer and
the world famous Trimpin. A musician at heart, trained on Drumset, Tabla, Sitar and other percussion instruments from around the
world, Ajay strives to push the technological barrier in order to make new music.
Description of Piece
Blending Indian classical knowledge with the 21st-century music scene by embracing the age of the human-computer interface: a
custom-made Electronic Sitar with wearable sensors controlling modular software systems is used to bring dance and tribal groove to the next
dimension. 21st-century music for the renaissance audience member, bringing the field of human-computer interfaces together with
custom software design for space-age sonic mosaics.
The Electronic Sitar (ESitar) is a custom built hyperinstrument that captures performance gestures for musical analysis and real-
time musical expression. It has sensors that help deduce rhythmic strumming information, fret detection, and tilt of the neck of the
sitar in 3 axes. All sensor data is converted to MIDI messages and used simultaneously with the audio data from the humbucker
pickup to make new sounds and atmospheres.
The KiOm is a custom built wearable instrument that converts accelerometer data to MIDI messages. It is used on the performer’s
head during performance to aid in sound design.
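The accelerometer-to-MIDI conversion described for the KiOm can be sketched as follows. This is a minimal illustration under stated assumptions: the controller numbers, ±2 g sensor range and 7-bit scaling are hypothetical, not the actual KiOm firmware.

```python
def accel_to_midi_cc(accel_g, axis_cc=(16, 17, 18), g_range=2.0):
    """Map a 3-axis accelerometer reading (in g) to three MIDI Control
    Change messages. Each axis is clipped to +/- g_range and rescaled to
    the 7-bit MIDI data range 0..127, with 64 meaning 'level'."""
    messages = []
    for cc, g in zip(axis_cc, accel_g):
        g = max(-g_range, min(g_range, g))                   # clip to range
        value = int(round((g + g_range) / (2 * g_range) * 127))
        messages.append((0xB0, cc, value))                   # (status, CC, value)
    return messages
```

Streaming these messages alongside the audio signal is what lets head tilt act as a continuous sound-design control during performance.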
404
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Redshift
Jonathan Pak
124A Johnston St
Fitzroy, VIC 3065 Australia
+613 94171775
jon@pak.id.au
Biographical information
Jonathan Pak is an Australia-based electronic musician, new media artist and technology innovator currently interested in humanising
technology to create interactive art through intuitive hardware and software design. In this endeavour he created the Light Matrix interface
subsequently presented at NIME ’06. Jon performs his music regularly in both solo and collaborative work including performances at the
Melbourne International Arts Festival (2007), Melbourne Electronic Music Festival: Electundra (2004/2005) and a collaborative work with
contemporary music group Re-sound entitled Ungrounded (2005). Other works include installation pieces such as Sonic Feast (2007),
Virtual Vandalism (2007) and Comic Effect (2008).
Description of Piece
Redshift is an exploration of sound, light and movement within the intimate space beneath the hands of a solitary performer. The
expressive potential of the human hand is harnessed through the reflection of computer-controlled light patterns that are translated into
music.
The centrepiece of this work is the Light Matrix interface: a device consisting of an array of high intensity red LEDs in a bi-directional
configuration acting as both photosources and photosensors. Light intensity measurements are used to manipulate a range of powerful and,
by nature, parameter heavy software synthesisers and effects enabling the performer to sculpt sound timbre with their hands. The changing
light patterns are driven by a sequencer and also derived from the attributes of the resulting sound. This in turn alters the amount of light
reflected from the performer’s hands resulting in a complex and potentially chaotic interplay. In this way the audience can engage with
some of the elements of electronic music such as patterns and automated parameters that traditionally remain hidden during performance.
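The translation of reflected light intensity into a synthesis parameter, the core of the interplay described above, can be sketched as follows. The sensor range and target parameter are assumptions for illustration, not the Light Matrix's actual calibration.

```python
def intensity_to_param(raw, lo=20, hi=900, out_min=200.0, out_max=8000.0):
    """Map a raw photosensor reading (e.g. a 10-bit ADC value from an
    LED used as a light sensor) to a synthesis parameter such as a
    filter cutoff in Hz, clipping to the usable sensor range.
    All ranges here are illustrative values."""
    raw = max(lo, min(hi, raw))
    t = (raw - lo) / (hi - lo)          # normalize to 0..1
    return out_min + t * (out_max - out_min)
```

Because the emitted light patterns are themselves derived from the resulting sound, feeding this value back into the synthesiser closes the potentially chaotic loop the description mentions.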
405
Installations
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Habitat
Olly Farshi
Goldsmiths College
Electronic Music Studios
Department of Music, Goldsmiths,
University of London, New Cross,
London SE14 6NW
+44 (0) 7725 413 088
olly@ollyandjeremy.com
Biographical information
Olly Farshi is a British Sound Artist and Composer, currently residing in Jyväskylä, Finland. He is currently completing his debut
album for Anglo-Canadian record/media-label CocoSolidCiti, due for release in late 2008. As a Sound Artist, Farshi's explorations
concern playful interaction, digital communication and, in an ongoing piece of research entitled iRedux, notions of intangible
property and connected life-styles. Farshi has been involved with a variety of festivals, venues, exhibitions, conferences and
collaborative projects, including: Resfest Austria, Foldback, Mediaterra Greece, FutureSonic, Commonwealth Film Festival,
Defunktion.net, Shunt Gallery London, Salone Internazionale del Mobile, CocoSolidCiti, KunstForum.
Description of Piece
Habitat is an ambient sound installation designed to be exhibited within a public city-space. The installation generates a sonic
landscape – a generative wildlife habitat – based upon how populated the installation space is. The objective of Habitat is to
encourage those who experience it to consider the installation space and, by association, the city-space differently; considering the
impact of commercialisation, gentrification and industrialisation on these public spaces which, at one point in time, were not so
densely populated.
Habitat displaces the default sonic landscape – pedestrians traversing the city-space, commercial ambience etc. - overlaying a
generative ambience constructed entirely out of wildlife and field-recordings. As the population shifts within the physical
installation space – transient individuals enter and leave – the sonic landscape generated by Habitat reflects these changes: various
animals and exotic wildlife emerge, flock and interact with each other, resulting in a vibrant and enchanting ambience.
Using Bluetooth, the Habitat software polls the space every 30 seconds, counting the number of mobile devices carried by
individuals in the installation space. The more devices the system counts, the fewer wildlife sounds will be heard, implying that to
encourage wildlife to return to the space, the individuals with those devices must eschew their technology – switch off their mobile
devices – to re-enliven the Habitat.
As fewer devices are counted, more animals and insects flock to the sonic landscape. When the installation space is empty, in
terms of Bluetooth devices counted, the wildlife habitat is vibrant and all possible sounds are playing (see Fig. 1 and Fig. 2 for examples of
empty installation spaces). Thus users with Bluetooth mobile devices can leave the space, knowing that in doing so they are having
a positive impact on the sonic landscape, or, as a more compelling course of action, encourage others within the space to switch
off their mobile devices and instigate a mass change on the installation's sonic landscape.
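The inverse relation between the Bluetooth device count and the audible wildlife can be sketched as follows. The layer count and the saturation point are illustrative assumptions, not the values used by the Habitat software.

```python
def active_wildlife_layers(device_count, total_layers=24, saturation=8):
    """Return how many wildlife sound layers should play for a given
    number of Bluetooth devices counted in the installation space: an
    empty space plays every layer, and layers fade out linearly with the
    count until none remain. total_layers and saturation are
    illustrative values only."""
    if device_count <= 0:
        return total_layers          # empty space: fully vibrant habitat
    if device_count >= saturation:
        return 0                     # crowded space: wildlife falls silent
    return round(total_layers * (1 - device_count / saturation))
```

Running this mapping once per 30-second polling cycle reproduces the behaviour described: switching a device off immediately re-enlivens the habitat on the next poll.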
The aim is to heighten awareness and consideration of the city-space that one occupies, utlises and traverses. In experiencing
Habitat, one is given the opportunity to build a new relationship with the city-space, to consider our impact as pedestrians, users
and consumers on this space and to consider its natural origins. As each individual within the space contributes to the juxtaposition
of familiar wildlife sounds alongside the city ambience, Habitat invites the flâneur to stop, observe and listen, in order to
experience the true impact that those traversing the installation space have on the sonic environment.
409
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Jeff Talman
Assistant Professor, Visual and Media Arts
Emerson College, Boston, MA
203 W. 109th St, 1W, New York, NY 10025 USA
001 (212) 729-4430
www.jefftalman.com, jefftalman@mindspring.com
Artist Biography
International artist Jeff Talman has created installations in collaboration with the cathedral and the City of Cologne, Germany, for St James
Cathedral, Chicago, at the MIT Media Lab, The Kitchen, Eyebeam and Bitforms Gallery in New York. He completed a series of three
installations in the Bavarian Forest in May 2008. Recognized as ‘a pioneer of the use of resonance in artworks’ by ‘Intute’ the consortium
of British universities, his unique achievement is self-reflexive resonance, in which the ambient resonance of an installation site becomes
its sole sound source. Talman's work further investigates the nature of sound and light as primal wave/radiant forces. Recent awards include
a Guggenheim Foundation Fellowship in Sound Art (2006) and a New York Foundation for the Arts Fellowship in Computer Arts (2003).
Residencies in 2007 include the Liguria Study Center in Bogliasco, Italy; Yaddo, and the Künstlerhaus Krems.
Project Description
For the 8th Annual NIME Conference and the Museo d’Arte Contemporanea Villa Croce, it seems only natural to begin with the sea, as the
sea permeates the culture and lives of the Genoese. From this it follows that it is entirely appropriate that a sense of the sea should literally
permeate the museum gallery. Rather than the cartoonish effect of merely transposing literal sea sounds to the gallery, I decided instead to
extract sound waves of the tide mapped to the gallery’s resonant frequencies, so that the gallery itself would harmoniously speak of the sea.
Sonic spectral analysis of the gallery provided a chart of the room’s resonant frequencies. I then programmed progressively shaped, digital
filters and used them to filter a recording of sounds of the tide. Only those frequencies that are resonant to the largest gallery space in the
museum were extracted for use in the installation.
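The first step of the process described above, charting a room's resonant frequencies from its magnitude response, can be sketched as a simple peak pick. A real analysis would start from a measured impulse response of the gallery; the threshold and values here are illustrative assumptions.

```python
def resonant_peaks(magnitude_db, freqs_hz, threshold_db=-30.0, max_peaks=5):
    """Pick the strongest local maxima of a room's magnitude response
    (in dB, sampled at freqs_hz) that rise above threshold_db: a minimal
    stand-in for the sonic spectral analysis that charts a space's
    resonant frequencies. Returns frequencies, strongest first."""
    peaks = []
    for i in range(1, len(magnitude_db) - 1):
        if (magnitude_db[i] > threshold_db
                and magnitude_db[i] >= magnitude_db[i - 1]
                and magnitude_db[i] > magnitude_db[i + 1]):
            peaks.append((magnitude_db[i], freqs_hz[i]))
    peaks.sort(reverse=True)                    # strongest resonances first
    return [f for _, f in peaks[:max_peaks]]
```

Narrow band-pass filters centred on the returned frequencies would then extract, from the tide recording, only the partials to which the gallery itself resonates.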
Humans do not normally pursue sound as a referential aspect of space. The sense of sound of a space remains largely intuitive and/or sub-
conscious, though it is a significant factor in human spatial cognition – as any blind person would know. By emphasizing the resonance of
the gallery the installation enriches the human perception of the space’s sonic and spatial reality. In MIRROR OF THE MOON this
emphasis on the characteristic sound of the space provides a phenomenological template for hearing/sensing the space, while it serves as
the plastic art material for an expressive sound work. Constructed into a 5-channel sound installation, the temporal field of the work is ever-
changing as one walks through the space. The gallery itself becomes recognizable as a tuned instrument ‘played’ by sounds of the sea.
Further, the gallery space becomes a field of compositional activity, which may be explored interactively by simply walking through the
space and pausing at different locations to witness the room, video projection and the interaction of the room modes and their nodes and
anti-nodes within the space. Everything heard, though heard differently from any location in the room, reflects normally submerged sonic
aspects of the room. Importantly, by harnessing the sound of the sea to the gallery, the installation recognizes a confluence of waveforms,
those of water, sound, light and the effects of gravity. Here the sounds of the Mediterranean meld into an environment that stands in
relation to itself, the people and their city by the sea.
410
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Fold Loud
JooYoun Paek
Eyebeam
540 W. 21st street
New York, NY 10011
1 917 238 9448
jypaek@gmail.com
Biographical information
JooYoun is an artist and interaction designer born in Seoul and based in New York. She has created interactive
objects that reflect on human behavior, technology and social change. She earned a Master’s degree from the Interactive
Telecommunications Program at NYU and is currently an Artist in Residence at Eyebeam. JooYoun’s art has been displayed
by the Museum of Modern Art New York, SIGGRAPH 2007, Museum of Science Boston and Seoul Museum of Art. Her work
has also been published in BBC News, Architectural Magazine, Next Magazine and many other publications.
Description of Piece
Fold Loud is a (de)constructing musical play interface that uses origami paper-folding techniques and ritualistic
Taoist principles to give users a sense of slow, soothing relaxation. Fold Loud interconnects ancient traditions and modern
technology by combining origami, vocal sound and interactive techniques. Unlike mainstream technology intended for fast-
paced life, Fold Loud is healing, recovering and balancing.
Playing Fold Loud involves folding origami shapes to create soothing harmonic vocal sounds. Each fold is assigned
to a different human vocal sound so that combinations of folds create harmonies. Users can fold multiple Fold Loud sheets
together to produce a chorus of voices. Opened circuits made out of conductive fabric are visibly stitched onto the sheets of
paper, which creates a meta-technological aesthetic. When the sheets are folded along crease lines, a circuit is closed like a
switch. Thus, the interface guides participants to use repetitive delicate hand gestures such as flipping, pushing and
creasing. Fold Loud invites users to slow down and reflect on different physical senses by crafting paper into both geometric
origami objects and harmonic music.
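The switch logic described above, where each closed fold circuit contributes one vocal pitch to a harmony, can be sketched as follows. The fold names and pitch assignments are hypothetical illustrations, not the actual Fold Loud mapping.

```python
# Hypothetical fold-to-pitch assignment (illustrative only): each fold
# whose conductive-fabric circuit closes contributes one vocal pitch.
FOLD_PITCHES = {"valley": 60, "mountain": 64, "petal": 67, "reverse": 71}

def sounding_chord(closed_folds):
    """Return the sorted MIDI note numbers sounding for the folds whose
    crease-line circuits are currently closed; combinations of folds
    therefore sound together as harmonies."""
    return sorted(FOLD_PITCHES[f] for f in closed_folds if f in FOLD_PITCHES)
```

Folding several sheets at once simply unions their closed-fold sets, which is how multiple Fold Loud sheets produce a chorus of voices.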
411
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Biographical information
Kenneth Newby is a media artist whose research and creative practice explores expressive applications of computer-assisted media
composition, performance and diffusion. He teaches new media composition and technique. Aleksandra Dulic is a media artist and
theorist working in the area of interactive computer animation, with current research underway in performative visualization of sound
and music. Martin Gotfrit's research centres on the creation, performance and function of music and sound in many different
disciplines and contexts. He is the Director of the School for the Contemporary Arts at Simon Fraser University. The three
authors of this work have been working together as the Computational Poetics Research Group for the past four years.
Description of Work
in a thousand drops… refracted glances is an audio/visual sculpture in fragmented space and time that becomes a single
audiovisual image as one interacts with the space of the exhibition. The work presents fragments of the bodies of humans
in hybrid relations to themselves, thereby creating a sense of the fragility of experience. The work reveals a background
made of deeper perennial questions: Who am I? What is my community? Where do its boundaries exist and how
permeable might they be?
The interactive aspects of the work provide points of focus for flows of both audible and visible images. As one moves
with the work a subtle effect is exerted on how these images are animated. Characters composed of multiples emerge and
are accompanied by synchronized emergent musical gestures. The resulting audiovisual environment is one of
construction and deconstruction of bodies through processes of stitching, repetition, collage, stretching, contraction,
multiplication, and reduction. As a result of these processes new hybrid fugal bodies are born that speak to the variety
and complexity of the ecological and interpersonal balances that depend on the mutual interdependencies of the
community of agents that make up its population. Interactions with the work take the form of refracted glances both
rewarding and confounding in an ongoing process of making sense of a chaosmos — the balance between confusion and
order — the fantastic and the logical — dreamt and waking realities.
Musical Interface
A set of layered generative music processes are guided in their production by data inferred by a motion tracking
system, including blob detection, to determine individual locations for tracking in relation to the space of the installation,
and optical flow sensing, to determine the relative direction of the participants' movement. The overall effect of the
interactive process is one of a kind of spatially dynamic orchestration, in which a particular musical process-gesture is
mapped to either a specific location or a movement style such as motion along the slow-fast spectrum, the near-far
spectrum, and stillness. These states are mapped onto musical parameters such as orchestration, phrase selection and
detail, as well as stochastic characteristics such as glissandi speed and direction. As the same motion tracking information
is also used to guide the visual animations, the audible and visible images have a strong synchronization. The
participants, in this way, become collaborators with the unfolding audio-visual experience.
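A mapping from tracked motion to musical parameters of the kind just described can be sketched as follows. The parameter names, ranges and thresholds are assumptions for illustration, not the installation's actual mapping.

```python
def motion_to_music(speed, distance, speed_max=2.0, dist_max=6.0):
    """Map a tracked participant's speed (m/s, from optical flow) and
    distance from the screens (m, from blob position) onto generative
    music parameters: a rough sketch of spatially dynamic orchestration.
    All names and ranges are illustrative values."""
    s = max(0.0, min(1.0, speed / speed_max))      # slow-fast spectrum, 0..1
    d = max(0.0, min(1.0, distance / dist_max))    # near-far spectrum, 0..1
    return {
        "phrase_density": s,                # faster motion: denser phrases
        "glissando_speed": 0.5 + s,         # stochastic glissandi speed up
        "orchestration_layer": int(d * 3),  # near/far selects layers 0..3
        "stillness": s < 0.05,              # near-stillness gates a state
    }
```

Feeding the same (speed, distance) pair to the animation engine is what gives the audible and visible images their strong synchronization.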
Given the dynamic character of the multi-screen animation, and the flexibility of the musical production, the work moves
toward what we have been theorizing as a new form of process-based cinematic experience in which the processes
guiding the audible and visible images are braided together into a new heteroform of multiply-mediated experience.
412
Proceedings of the 2008 Conference on New Interfaces for Musical Expression (NIME08), Genova, Italy
Soundscaper
Jared Lamenzo, Mohit Santram,
Kuan Huan, Maia Marinelli
Mediated Spaces
225 East 4th St.
New York, NY 10009
917-405-4352
jared@mediatedspaces.com
Biographical information
The Soundscaper team has worked on many interactive projects and large-scale installations. The four met at NYU's Interactive
Telecommunications Program. Their work collectively has been covered in The New York Times, Popular Science, The Village Voice,
New York Metro, C|NET and others, and has been displayed at the Chelsea Art Museum, Sony Wonder Tech Lab, Eyebeam, 3rd Ward
(Brooklyn), Proflux (RI), VIDEOFORMES (France), China Digital Entertainment Festival, New York University, University of British
Columbia, Svevo Castle (Italy), DUMBO Arts Festival, Refusalon Gallery (San Francisco), STYLIN Festival (Miami), Schautankstelle
Gallery (Berlin), Merce Cunningham Studio, Rockefeller Center, and many others.
Description of Piece
As John Cage demonstrated, even in silence, there is sound all around us. By detaching sound from observable cues, recordings often
reveal more about our surroundings than passive hearing, i.e., the sound of industrialized society masking the natural world, and vice-versa.
The Soundscaper is a tool to make geo-tagged recordings, entered from a mobile device. The soundscapes of cities and savannahs can be
captured for later listening and manipulation, by increasingly sophisticated mobile devices. For NIME, we intend to walk around the city
and the sea to create a Genovese soundscape using Nokia N95 phones—the sound of ships, church bells, factories, parks, buoys, etc. These
recordings can be used for sound games in which people are sent out to find or retrieve specific sounds, games of hear-and-seek, and other
applications.
The SoundScaper widget contains a “See Waypoints” section and a “New Waypoint” section. The database contains a list of “Waypoints”
entered by users. Users can enter new recordings, attached to a Waypoint, with a description of the sound. The database stores date and
time information, and a Web page lists the sounds available for listening and download. N.B. The application does not operate on all
networks in all countries. The only technical requirements are phones that run our widget and small condenser microphones. A project
description may be found at http://mediatedspaces.com/soundscaper/ and a video at http://mediatedspaces.com/lib/mov/soundscaperlow.mov.
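The waypoint database described above can be sketched as a minimal data model. This is an illustrative sketch only: the field names (lat, lon, description, sound_url) and helper functions are hypothetical, since the actual SoundScaper schema is not published here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Waypoint:
    # field names are hypothetical; the real SoundScaper schema is not given here
    lat: float
    lon: float
    description: str
    sound_url: str
    recorded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

db: list[Waypoint] = []  # stands in for the server-side database

def new_waypoint(lat, lon, description, sound_url):
    # mirrors the widget's "New Waypoint" section: store a geo-tagged recording
    wp = Waypoint(lat, lon, description, sound_url)
    db.append(wp)
    return wp

def see_waypoints():
    # mirrors the "See Waypoints" section: list sounds available for listening
    return [(wp.description, wp.sound_url) for wp in db]

# a geo-tagged recording from a walk around Genova
new_waypoint(44.4056, 8.9463, "church bells", "http://example.org/bells.mp3")
listing = see_waypoints()
```

Storing the timestamp in UTC matches the description's note that the database records date and time information alongside each sound.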
413
Biographical information
Pasquale Napolitano is a researcher in design and visual communication. He graduated with honors from the University of Salerno, with a thesis
on the aesthetics of remix. He is a permanent collaborator of the faculty of Industrial Design at the same university, where he curates didactic laboratories in
visual communication, as an expert in video design. His doctoral research in Media Studies is focused on the design potential of video as a form of
expression, as well as on the relation between video and the drift of the contemporary visual imaginary. He has participated in several exhibitions as a
visual artist, videomaker and performer. He co-founded the collective Componibile and the project SoundBarrier. One of his videos was shown at the Locarno
Film Festival 2007, in the section Play Forward. Some of his papers have been published by Plectica edizioni. He has also published essays on the relation between
audiovisual forms and digital cultures with publishers such as Carocci, Cronopio and L'Arca.
Stefano Perna, born in Napoli, Italy, is pursuing a PhD in Communication Sciences at the Università degli Studi di Salerno, focused on
the analysis of the scopic regimes of the information age. His work concerns visual culture, information aesthetics and digital design. He has published several
articles in the fields of Visual Studies and New Media Studies. He is the author, with Ruben Coen Cagli, of Ber.loose.coin, a digital theory and online project on
contemporary politics, now in the Rhizome ArtBase.
Pier Giuseppe Mariconda, born in '80 in Avellino, holds a degree in Communication Sciences with a thesis in Design entitled "The Sound Image. Study and
Design of Scapes, and Communicative Use of Sound". A musician and sound designer (studies of violin and piano at the Conservatory D. Cimarosa of
Avellino), he is constantly engaged in studies on sound and visual communication and on the relationships arising from audiovisual and generative
occurrences. He researches interaction design in real-time audio and video through the use of experimental software (EyesWeb, Processing,
Pure Data, SuperCollider, etc.). He participated in Vision'R 2008 (international VJing festival in Paris) as a programmer of the SoundBarrier team.
Description of Piece
1. A moving image is a signal that continuously changes in time. More than forms, figures and volumes, it is made of dimensions, frequencies, intensities. On
a digital medium, image and sound share the same substance: electronically coded impulses. What at first glance may appear to be a simple technical fact can
become an aesthetic hypothesis. In the era of digital technologies we can speak of acoustic or musical factors of an image in more than a metaphorical way. A
new kind of connection is emerging. It is now possible to create interactions, driven by mathematical rules, between the audible and the visible in a
way that erodes the barrier between the two fields of sound and image. With digital media we reach the point of indistinguishability where image becomes sound
and sound becomes image. In our project, a patch written in EyesWeb analyzes the moving image, extracting parameters that are subsequently
converted into a data stream. The data are translated into a MIDI signal that controls audio software and synthesizers. This process gives the sound a deep
sensitivity to the variations of the moving image. Selected movies are analyzed and played back. Most of them come from the American underground cinema of
the 1960s. The experimental approach of some directors of that period, their approach to cinema as a medium and as a machine, deeply influenced the birth
of what we now call new media. The selection is at once a homage to those directors and an exercise in media archaeology.
2. The constant reference to the universe of artistic experimentation can serve as an indicator of the new ways of feeling, perceiving and knowing
that are proper to video cultures. In this perspective it is of great interest to analyze
that area of experimentation, already very advanced in its realizations but almost
entirely unexplored from a theoretical-critical point of view, which cuts across a series
of productive practices (software art, genetic art, VJing) and investigates the truly
unprecedented and disruptive possibilities of elaborating, mapping and aesthetically
rendering video flows; an area to which not so much the traditional disciplines of
aesthetics as the universe of design is turning with ever greater interest.
3. In this perspective, the need for a "reinvention" (Rosalind Krauss) of the video medium
is central. It is in this context of constant and militant remediation that our analytical
artefact on Andy Warhol's Empire finds its basis. The long film of the American artist has a
peculiarity: it stages time, not chronological time but duration as
theorized by Bergson (a peculiarity noticeable in many other works by Warhol, as
in other video artists who refer to the New American Cinema). In other words: a
Bergsonian exercise in permanence. During the eight (and then some) hours of the film,
Warhol never moves his fixed gaze from the skyscraper; as a consequence, the
scansion of time is entrusted to the shake, the flow and the flicker of the film on the tape
head. As in other works of that period, all the distortions of the channel become incarnations of
sense. A glitch ante litteram, so to speak. In McLuhanian terms: the medium (and its material
device) is the message.
4. The core of the project is to map and sonify the movements, vibrations and distortions of this film. The patch created with EyesWeb is shown in the
screenshot beside.
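The analysis-to-MIDI chain can be sketched schematically. The EyesWeb patch itself is not reproduced here, so both the chosen image feature (mean frame luminance) and the linear mapping to a 7-bit controller value are illustrative assumptions, not the patch's actual parameters.

```python
def frame_luminance(frame):
    # mean pixel brightness of a grayscale frame (pixel values 0-255)
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels)

def to_midi_cc(value, lo=0.0, hi=255.0):
    # linearly rescale an image feature to a 7-bit MIDI controller value (0-127)
    x = (value - lo) / (hi - lo)
    return max(0, min(127, round(x * 127)))

# a tiny stand-in for one analyzed video frame
frame = [[0, 128], [255, 128]]
cc = to_midi_cc(frame_luminance(frame))  # the value a synthesizer would receive
```

Flicker and channel distortions show up as frame-to-frame variation in such a feature, which is what gives the sound its sensitivity to the tape's movements.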
414
China Gates
Art Clay Dennis Majoe
ETH Zurich ETH Zurich
Institute for Computer Systems Institute for Computer Systems
Clausiusstrasse 59, Zurich CH-8092 Clausiusstrasse 59, Zurich CH-8092
++41 044 632 84 14 ++41 044 632 73 23
art.clay@inf.ethz.ch dennis.majoe@inf.ethz.ch
Biographical information
Art Clay (USA/CHE) is a specialist in the performance of self-created works with the use of intermedia. He has appeared at international
festivals, on radio and television in Europe, Asia & North America. Recently, his work focuses on large performative works and public
spectacles using mobile devices. He is artistic director of the Digital Art Weeks in Zurich and teaches at various Institutes including the
Zurich University of the Arts. http://mypage.bluewin.ch/artclay
Dennis Majoe has a PhD in Navigation related Electronic systems and has worked extensively in the design of a variety of motion and
orientation sensing systems and computer generated environments including 3D audio. He is director of MASC, an innovative electronics
and computer design company active in the field of wireless communications. In addition to his activities at MASC he is a senior
researcher at ETH Zurich in the Computer Systems Department, where he is developing applications related to proactive health.
415
Workshops
Recent advancements in digital media, ICT, and related technologies offer many opportunities for supporting music education. The
i-Maestro workshop series (see www.i-maestro.org/workshop) aims to explore the subject area, including but not limited to the following
topics of interest: Gesture/posture analysis, support and understanding; Score and gesture following; Multimodal interfaces, with
visualization, sonification, etc.; Augmented instruments; Cooperative environments for music; Music notation and representation;
Technology-enhanced music pedagogy; Linking theory and practice in training; Exercise generation, packaging and distribution;
Courseware authoring and generation; Assessment support; Profiling and progress monitoring; and, more generally, Interactive multimedia
music.
The i-Maestro project is partially funded by the European Commission under the IST 6th Framework to explore interactive multimedia
environments for technology enhanced music education. The project explores novel solutions for music training in both theory and
performance, building on recent innovations resulting from the development of computer and information technologies, by exploiting new
pedagogical paradigms with cooperative and interactive self-learning environments, and computer-assisted tuition in classrooms including
gesture interfaces and augmented instruments with particular focus on bowed string instruments.
The resulting i-Maestro framework for technology-enhanced music learning is intended to support the creation of flexible and
personalisable e-learning courses. It aims to offer pedagogic solutions and tools to maximise efficiency, motivation, and interest in the
learning process and to improve accessibility to musical knowledge.
At the time of going to press, the provisional programme for this workshop from the first call includes the following presentations and
demos:
An Overview of the i-Maestro Project on Technology Enhanced Learning for Music, Kia Ng
This presentation provides an introduction to the i-Maestro project. It discusses the overall aims and objectives, latest results, achievements
and future directions. An overview of the key components is given with their corresponding pedagogical contexts. This opening
presentation will introduce the structure of the workshop highlighting aspects of the various presentations and demos to follow.
Analysis and Sonification of Bowing Features for String Instrument Training, Oliver Larkin, Thijs Koerselman, Bee Ong, and Kia Ng
The i-Maestro 3D Augmented Mirror (AMIR) allows a teacher and student to study a performance using 3D motion capture technology
and provides multimodal feedback based on analyses of the data. This paper presents a module for AMIR that maps bowing feature
analyses to sound parameters. We describe several sonifications which are designed to be used in both real-time and non-real-time
situations offering an alternative way of looking at the performance which, in some cases, has advantages over visualisation techniques.
Three Pedagogical Scenarios using the Sound and Gesture Lab, Nicolas Rasamimanana, Fabrice Guedy, Norbert Schnell, Jean-
Philippe Lambert, and Frederic Bevilacqua
This article reports pedagogical experimentations using the “Sound and Gesture Lab”, a prototype application that allows for the
manipulation and processing of sound and gesture data of a real music player. Three scenarios were designed to provide real-time
interactions between an instrumentalist and a computer, or between several instrumentalists. The scenarios make use of direct sonifications
of bow movements, gesture following and sound synthesis. As such, they create movement based interactions that can develop students’
embodiment in music.
Integration of i-Maestro Gesture Tools, Thijs Koerselman, Oliver Larkin, Bee Ong, Nicolas Leroy, Jean-Philippe Lambert, Diemo
Schwarz, Fabrice Guedy, Norbert Schnell, Frederic Bevilacqua, and Kia Ng
We discuss two prototype pedagogical applications which allow the study of string instrument bowing gesture using different motion
capture technologies. Several modalities for integration of these tools are presented, and the advantages and implications of the various
approaches are discussed. Data exchange between the two applications is achieved using the Sound Description Interchange (SDIF) format
which has, until now, been used primarily as a format for storing audio analysis data. We describe our method of storing motion- and
analysis- data in the SDIF format and discuss future works in this direction.
Music Representation with MPEG-SMR, Pierfrancesco Bellini, Francesco Frosini, Nicola Mitolo, and Paolo Nesi
Symbolic music representation is a logical structure of symbolic elements representing music events and the relationships among those
events, and with other media types. The evolution of information technology has recently produced changes in the practical use of music
representation and notation, transforming it from a simple visual coding model for sheet music into a tool for modelling music in computer
programs and electronic devices in general with strong relationships with other audiovisual data: video, images, audio, animations, etc.
419
MPEG SMR is a new ISO standard integrated into MPEG-4 allowing the realization of new applications in which multimedia and music
notation may take advantage and enrich each other in terms of functionalities.
The European Curriculum Challenge: a case study on technology-supported specialised music education, Kerstin Neubarth, Vera
Gehrs, Lorenzo Sutton, Tillman Weyde, and Laura Poggio
The European project i-Maestro is developing an interactive multimedia environment to support music teachers and students at music
schools and conservatories across Europe, focusing on bowed string instruments. This paper analyzes the curricula of European music
education institutions against: pedagogic needs reported by music teachers; desired learning outcomes defined by regulatory frameworks;
findings from music psychology and education research; and the technological priorities of the i-Maestro environment. The analysis leads
to an identification of curriculum areas that are expected to benefit from technology-enhanced teaching and learning experiences as offered
by i-Maestro.
Collaborative Working for Music Education, Pierfrancesco Bellini, Francesco Frosini, Nicola Mitolo, and Paolo Nesi
In the area of technology enhanced music training, cooperative work is becoming an increasingly feasible concept. It can be used to
experiment with and to exploit new modalities of training and to reduce the set up time necessary for the organization of group work -
integrating distributed systems for audio-visual processing and general control (i.e. synchronous playback, recording). To this end, a
flexible model to cope with groups, roles, tools, and large sets of features is required. This paper presents a Cooperative Work environment
for Max MSP to facilitate the creation of a variety of cooperative applications (not limited to educational uses) structurally supporting:
group role and tool concepts, undo/redo, joining and rejoining, preserving the simplicity and consistency of commands, etc. The system
can be used for a large variety of multimedia/multimodal music applications.
Integration Aspects in i-Maestro, Marius Spinu, Giacomo Villoresi, Maurizio Campanai, Andrea Mugellini, and Fabrizio Fioravanti
The integration of large software projects is a complex task. The i-Maestro framework is composed of many modules created with
different technologies such as Max/MSP, C++, Java and PHP. Thus, integration presents challenges in diverse fields: from communications
protocols to Graphical User Interface harmonisation, not forgetting the connection between different tools made with graphical- and
procedural- programming environments. Minor changes performed in any module could have significant effects on other modules both
from a technical and usability perspective. This paper discusses the integration of the i-Maestro project components during the first two
years of development, considering the various aspects such as the collaborative environment, technical and pedagogical issues and
strategy.
i-Maestro: Making Music Tuition Technologies Accessible, Neil McKenzie and David Crombie
This paper discusses a set of reusable user interfaces that have been created for viewing, navigating and editing music notation in
accessible formats. These were successfully incorporated into software developed during the i-Maestro project which is investigating the
role of technology in improving the quality of music education across Europe. The work follows on from the AccessMusic project which
created converters for producing Braille and speech output from scores created in the popular Finale package. Furthermore, these
interfaces have been designed such that they can be coupled onto any music notation viewer or editor.
The paper will present the findings related to this area of the i-Maestro project and provide a demonstration of the tools and interfaces
created for visually impaired end users during the course of the project.
Introducing a Novel Musical Teaching Automated Tool to Transfer Technical Skills from an Anthropomorphic Flutist Robot to
Flutist Beginners, Jorge Solis and Atsuo Takanishi
Up to now, different kinds of musical performance robots (MPRs) and robotic musicians (RMs) have been developed. MPRs are designed
to closely reproduce the human organs involved during the playing of musical instruments. In contrast, RMs are conceived as automated
mechanisms designed to introduce novel ways of musical expression. Our research on the Waseda Flutist Robot has been focused on
clarifying human motor control from an engineering point of view. As a result, the Waseda Flutist Robot No. 4 Refined IV (WF-4RIV)
is able to play the flute at a level comparable to an intermediate player. Thanks to the human-like design and the advanced technical skills
displayed by the WF-4RIV, novel ways of musical education can be conceived. In this paper, the General Transfer Skill System (GTSS) is
implemented on the flutist robot, towards enabling the automated transfer of technical skills from the robot to flutist beginners. A set of
experiments is carried out to verify the evaluation and interaction modules of the GTSS. The experimental results show that the robot is able
to quantitatively evaluate the performance of beginners and to automatically recognize the melodies they perform.
Audio-driven Augmentations for the Cello, Benjamin Lévy and Kia Ng
This paper presents the development of a suite of audio-driven sonifications and effects for the acoustic cello. Starting with a survey of
existing augmented string instruments, we discuss our approach of augmenting the cello, with particular focus on the player’s gestural
control. Using features extracted from the audio input we maintain the player’s normal interactions with the instrument and aim to provide
additional possibilities to allow new expressions with technology. The system is developed in Max/MSP. It comprises analysis and
processing modules that are mapped through virtual layers forming effects for either live improvisation or composition. The paper
considers the musicality of the effects and pedagogical applications of the system.
Acknowledgement
The i-Maestro project is supported in part by the European Commission under Contract IST-026883 I-MAESTRO. The authors would like
to acknowledge the EC IST FP6 for the partial funding of the I-MAESTRO project (www.i-maestro.org), and to express gratitude to all I-
MAESTRO project partners and participants, for their interests, contributions and collaborations.
420
Michael Zbyszyński
Center for New Music and Audio
Technologies, UC Berkeley
1750 Arch Street
Berkeley, California USA
+1.510.643.9990 x 314
mzed@cnmat.berkeley.edu
This is a workshop for people who want to get started with tablet-based interfaces or who
want to teach others to use tablet interfaces for music. It is based on my method book (in
progress, see paper accepted to this year's NIME) and will cover:
Basics:
• Short history of pen and tablet based
interfaces for music
• Choosing a tablet, styluses, etc. and
installing and running on your operating
system
• Implementation in musical software
including Max/MSP, Pd, and
OpenSoundControl
Repertoire
• Workshop presenters will perform and
demonstrate their own works, illustrating
performance and mapping paradigms that are unique to their situations, as well as general
strategies
• Participants will be given a selection of tablet interfaces that could make up a "tablet
orchestra." The final stage of the workshop will be a group performance on diverse tablet
instruments. (It would be great if we could schedule a "tablet jam" during the conference,
perhaps at one of the concerts.)
421
Participants will leave the workshop with an overview of prior musical work with tablets,
specifically recent pieces by members of the NIME Community. They will also have many
pre-built software tools to implement their own work, to continue practicing tablet skills, and
to teach in individual and classroom settings.
Presenters:
Myself (http://www.mikezed.com/), and I intend to invite Matt Wright (http://ccrma.stanford.edu/~matt/),
Ali Momeni (http://alimomeni.net/), Jan Schacher aka jasch (http://www.jasch.ch/), and Nicolas
D'Alessandro (http://www.dalessandro.be/). They have all expressed prior interest in the tablet
pedagogy project.
Participants:
Approximately 25 people who want to get started with tablet-based interfaces or who want
to teach others to use tablet interfaces for music.
Schedule:
4 hours: 1.5 hours of lectures/demonstrations, followed by a short break and approximately 2
hours of hands-on laboratory work
422
The Details:
On-body (ambulatory) sensing – Ben Knapp:
This component of the workshop will examine the use of a number of on-body sensor systems for
measuring kinematics and physiological state during performance. The simultaneous use of motion
sensors (e.g. accelerometers, gyros, and location trackers), force sensors (e.g. FSRs, QTCs, and strain
gauges), and bioelectric sensors (e.g. EMG, EKG, EEG, and GSR) will be demonstrated. The key
issues of synchronization and multimodal processing will be discussed.
423
Format:
The workshop will consist of 90 minutes of presentations on the measurement techniques followed
by 90 minutes of interactive discussions and demonstrations.
Audience:
The workshop is intended for anyone interested in understanding the quantitative and qualitative
aspects of measuring gestures during live performance of music. It is open to both the novice as well
as those that are already experienced in specific techniques but might be interested in learning more
about some of the other methods of gesture acquisition.
424
Jamoma Workshop
Alexander Refsum Jensenius,a Timothy Place,b Trond Lossius,c
Pascal Baltazar,d Dave Watsone
a) University of Oslo & Norwegian Academy of Music, a.r.jensenius@imv.uio.no
b) Electrotap, tim@electrotap.com
c) BEK - Bergen Center for Electronic Arts, lossius@bek.no
d) GMEA - Groupe de Musique Electroacoustique dʼAlbi-Tarn, pb@gmea.net
e) dave@elephantride.org
Jamoma
Jamoma1 is an open-source project for developing a structured and modularized approach to programming in Max/MSP and Jitter. The
main idea of Jamoma is the module, which is built up of a separate algorithm patcher and a patcher containing the graphical user
interface. Figure 1 shows examples of the three main types of modules: control, audio and video.
Figure 1. Examples of the three main types of modules (from left): control, audio and video.
Besides providing a large collection of ready-made modules, one of the core strengths of using Jamoma in Max development is that it
simplifies the creation of large projects by enforcing a structured approach to patching. Jamoma uses Open Sound Control (OSC)
for internal and external messaging, thus making it easy to communicate to and from modules. Recent development adds support for
cues and various types of mappings, as well as an extensive ramping library and modular function and dataspace libraries.
1 http://www.jamoma.org
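Because Jamoma's messaging is plain OSC, any client that can build an OSC packet can address a module. As a minimal illustration of the wire format (the address /my.module/gain is hypothetical, not an actual Jamoma module namespace), a float message can be encoded by hand:

```python
import struct

def osc_pad(data: bytes) -> bytes:
    # OSC strings are null-terminated and padded to a 4-byte boundary
    return data + b"\x00" * (4 - len(data) % 4)

def osc_message(address: str, *args: float) -> bytes:
    # encode an OSC message carrying big-endian float32 arguments
    packet = osc_pad(address.encode("ascii"))
    packet += osc_pad(("," + "f" * len(args)).encode("ascii"))
    for value in args:
        packet += struct.pack(">f", value)
    return packet

# hypothetical address; real Jamoma modules define their own namespaces
packet = osc_message("/my.module/gain", 0.5)
```

Sending such a packet over UDP to the Max patch hosting the module is all that is needed for external control.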
425
Author Index
Abdallah, Samer, 215 Drayson, Hannah, 400
Aitenbichler, Erwin, 285 Dubrau, Josh, 164
Ajay, Kapur, 144 Dulic, Aleksandra, 384, 412
Alonso, Marcos, 207, 211 Eigenfeldt, Arne, 144
Baltazar, Pascal, 382, 425 Endo, Ayaka, 345
Barbosa, Álvaro, 9 Essl, Georg, 185, 392
Bau, Olivier, 91 Falkenberg Hansen, Kjetil, 207
Bencina, Ross, 197 Farshi, Olly, 409
Berdahl, Edgar, 61, 299 Favilla, Paris, 366
Blackwell, Alan, 28 Favilla, Stuart, 366, 387
Bokowiec, Mark Alexander, 388, 389 Feldmeier, Mark, 193
Bossuyt, Frederick, 229, 372 Fernstrom, Mikael, 103
Bouënard, Alexandre, 38 Ferrari, Nicola, 379
Bouillot, Nicolas, 13, 189 Fitzpatrick, Geraldine, 87
Boyle, Aidan, 269 Flanigan, Lesley, 349
Bozzolan, Matteo, 24 Follmer, Sean, 354
Bryan-Kinns, Nick, 81, 319 Fraietta, Angelo, 19
Butler, Jennifer, 77 Freed, Adrian, 107, 175
Camurri, Antonio, 134 Friberg, Anders, 128
Canazza, Sergio, 140 Fried, Joshua, 397
Canepa, Corrado, 134 Gatzsche, Gabriel, 325
Cannon, Joanne, 366, 387 Geiger, Christian, 303
Caporilli, Antonio, 396 Gibet, Sylvie, 38
Chabrier, Renaud, 396 Girolin, Roberto, 378
Chant, Dale, 366
Chordia, Parag, 331 Goina, Maurizio, 150
Ciglar, Miha, 203, 399 Gotfrit, Martin, 412
Ciufo, Thomas, 390 Grosshauser, Tobias, 97
Clay, Art, 415 Hadjakos, Aristotelis, 285
Coghlan, Niall, 233 Hamel, Keith, 383
Coletta, Paolo, 134 Hartman, Ethan, 356
Collins, Nick, 87 Hashida, Mitsuyo, 277
Cooper, Jeff, 356 Havryliv, Mark, 164
Cooperstock, Jeremy R., 13, 189 Hayafuchi, Kouki, 241
Corcoran, Greg, 400 Hazlewood, William R., 281
Corness, Greg, 265 Hébert, Jean-Pierre, 261
Cospito, Giovanni, 24 Henriques, Tomás, 307
Crevoisier, Alain, 113 Hicks, Tony, 366, 387
d'Alessandro, Nicolas, 401, 403 Houle, François, 384
Dattolo, Antonina, 140 Huan, Kuan, 416
De Jong, Staas, 370 Ito, Yosuke, 277
de Martelly, Elizabeth, 339 Jacobs, Robert, 193
Dekleva, Luka, 399 Jacquemin, Christian, 122
Delle Monache, Stefano, 154 Jensenius, Alexander R., 181, 425
Demey, Michiel, 229, 372 Jo, Kazuhiro, 315
Dimitrov, Smilen, 211 Källblad, Anna, 128
Dixon, Simon, 364 Kamatani, Takahiro, 360
Doro, Andrew, 349 Kamiyama, Yusuke, 352
429
Kapur, Ajay, 144, 404 Paek, Joo Youn, 411
Katayose, Haruhiro, 277 Pak, Jonathan, 405
Kellum, Greg, 113 Pakarinen, Jyri, 49
Kiefer, Chris, 87 Palacio-Quintin, Cléo, 293, 402
Kim-Boyle, David, 3 Papetti, Stefano, 154
Kimura, Mari, 219 Paradiso, Joseph A., 193
Klauer, Giorgio, 380 Paschke, David, 303
Knapp, R. Benjamin, 117, 233, 425 Pelletier, Jean-Marc, 158
Knopke, Ian, 281 Penfield, Kedzie, 117
Kuhara, Yasuo, 345 Penttinen, Henri, 392
Kuuskankare, Mika, 34 Perna, Stefano, 414
Kuyken, Bart, 229 Place, Timothy, 181, 425
Kyoya, Miho, 360 Plumbley, Mark D., 81, 319
Lähdeoja, Otso, 53 Pohu, Sylvain, 402, 403
Lamenzo, Jared, 413 Polotti, Pietro, 150, 154
Langley, Somaya, 197 Pöpel, Cornelius, 303
Lanzalone, Silvia, 273, 398 Poulin-Denis, Jacques, 386
Laurson, Mikael, 34 Price, Robin, 311
Leman, Marc, 229, 372 Prinčič, Luka, 399
Lossius, Trond, 181, 425 Puputti, Tapio, 49
Loviscach, Joern, 221 Rae, Alex, 331
Mackay, Wendy, 91 Räisänen, Juhani, 57
Macrae, Robert, 364 Rebelo, Pedro, 311
Maes, Pattie, 67 Reckter, Holger, 303
Majoe, Dennis, 415 Rigler, Jane, 395
Maniatakos, Vassilios-Fivos A., 122 Robertson, Andrew, 215, 319
Mariconda, Pier Giuseppe, 414 Rocchesso, Davide, 154
Marinelli, Maia, 416 Rohs, Michael, 185
Marquez-Borbon, Adnan, 354 Roma, Gerard, 249
Mazzarino, Barbara, 134 Romero, Ernesto, 385
McMillen, Keith A., 347 Rootberg, Alison, 339, 391
Mehnert, Markus, 325 Santram, Mohit, 416
Menzies, Dylan, 71 Sartini, Alessandro, 381
Messier, Martin, 386 Schacher, Jan C., 168
Misra, Ananya, 185 Schedel, Margaret, 339, 391
Miyama, Chikashi, 383 Schmeder, Andy, 175
Modler, Paul, 358 Schutz, Florian, 303
Mühlhäuser, Max, 285 Serafin, Stefania, 211
Myatt, Tony, 358 Settle, Zack, 13, 189
Nagano, Norihisa, 315 Sjöstedt Edelholm, Elisabet, 128
Napolitano, Pasquale, 414 Sjuve, Eva, 362
Nash, Chris, 28 Smith III, Julius O., 299
Nesi, Paolo, 225 Spratt, Kyle, 356
Newby, Kenneth, 412 Steiner, Hans-Christoph, 61
Ng, Kia, 225, 419 Stöcklmeier, Christian, 325
Oldham, Collin, 61 Stowell, Dan, 81
OʼModhrain, Sile, 117 Suzuki, Kenji, 241, 360
Ortiz Perez, Miguel, 400 Svensson, Karl, 128
430
Tahiroglu, Koray, 400
Takegawa, Yoshinari, 289
Talman, Jeff, 410
Tanaka, Atau, 91
Tanaka, Hiroya, 352
Tanaka, Mai, 352
Teles, Paulo Cesar, 269
Terada, Tsutomu, 289
Thiebaut, Jean-Baptiste, 215
Torre, Giuseppe, 103
Torres, Javier, 103
Tsukamoto, Masahiko, 289
Uchiyama, Toshiaki, 360
Välimäki, Vesa, 49
Valle, Andrea, 253, 257
Vanfleteren, Jan, 229, 372
Verstichel, Wouter, 229
Vinjar, Anders, 335
Vogrig, Esthel, 385
Volpe, Gualtiero, 134, 423
Wanderley, Marcelo M., 38, 423
Wang, Ge, 392
Ward, Nathan J., 237
Ward, Nicholas, 117
Warren, Chris, 354
Watson, Dave, 425
Wilde, Danielle, 197
Wilson-Bokowiec, Julie, 388, 389
Wozniewski, Mike, 13, 189
Xambó, Anna, 249
Young, Diana, 44
Zannos, Ioannis, 261
Zbyszynski, Michael, 245, 421
Zoran, Amit, 67
431