Interactive Visual Music: A Personal Perspective

Roger B. Dannenberg
School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213 USA
rbd@cs.cmu.edu
Computer Music Journal, 29:4, pp. 25–35, Winter 2005. © 2005 Massachusetts Institute of Technology.

Interactive performance is one of the most innovative ways computers can be used in music, and it leads to new ways of thinking about music composition and performance. Interactive performance also poses many technical challenges, resulting in new languages and special hardware including sensors, synthesis methods, and software techniques. As if there are not already enough problems to tackle, many composers and artists have explored the combination of computer animation and music within interactive performances. In this article, I describe my own work in this area, dating from around 1987, including discussions of artistic and technical challenges as they have evolved. I also describe the Aura system, which I now use to create interactive music, animation, and video.

My main interest in composing and developing systems is to approach music and video as expressions of the same underlying musical deep structure. Thus, images are not an interpretation or accompaniment to audio but rather an integral part of the music and the listener/viewer experience. From a systems perspective, I have always felt that timing and responsiveness are critical if images and music are to work together. I have focused on software organizations that afford precise timing and flexible interaction, sometimes at the expense of rich imagery or more convenient tools for image-making.

Interaction is commonly accepted (usually without much thought) as a desirable property of computer systems, so the term has become overused and vague. A bit of explanation is in order. Interaction is a two-way flow of information between live performer(s) and computer. In my work, I see improvisation as a way to use the talents of the performer to the fullest. Once a performer is given the freedom to improvise, however, it is difficult for a composer to retain much control. I resolve this problem by composing interaction. In other words, rather than writing notes for the performer, I design musical situations by generating sounds and images that encourage improvising performers to make certain musical choices. In this way, I feel that I can achieve compositional control over both large-scale structure and fine-grain musical texture. At the same time, performers are empowered to augment the composed interactive framework with their own ideas, musical personality, and sounds.

Origins

In the earliest years of interactive music performance, a variety of hardware platforms based on minicomputers and microprocessors were available. Most systems were expensive, and almost nothing was available off-the-shelf. However, around 1984, things changed dramatically: one could purchase an IBM PC, a MIDI synthesizer, and a MIDI interface, allowing interactive computer music using affordable and portable equipment. I developed much of the CMU MIDI Toolkit software (Dannenberg 1986) in late 1984, which formed the basis for interactive music pieces by many composers. By 1985, I was working with Cherry Lane Technologies on a computer accompaniment product for the soon-to-be-announced Amiga computer from Commodore. Cherry Lane carried a line of pitch-to-MIDI converters from IVL Technologies, so I was soon in possession of very portable machines that could take in data from my trumpet as MIDI, process the MIDI data, and control synthesizers.

Unlike the PC, the Amiga had a built-in color graphics system with hardware acceleration. At first, I used the Amiga graphics to visualize the internal state of an interactive music piece that was originally written for the PC. It was a small step to use the graphics routines to create abstract shapes in response to my trumpet playing. My first experiment was a sort of music notation where performed notes were displayed in real time as growing squares positioned according to pitch and starting time. Because time wrapped around on the horizontal axis, squares in different colors would pile up on top of other squares, or very long notes would fill large portions of the screen, effectively erasing what was there before. This was all very simple, but I became hooked on the idea of generating music and images together. This would be too much for a performer alone, but with a computer helping out, new things became possible.

Lines, Polygons, Screens, and Processes

One of the challenges of working with interactive systems, as opposed to video or pre-rendered computer animation, is that few computers in the early days could fill a screen in 60 msec, which is about the longest tolerable frame period for animation. Even expensive 3-D graphics systems fell short of delivering good frame rates. The solution was to invent interesting image manipulations that could be accomplished without touching every pixel on every frame.

[…] rotated within the color lookup table. Effects analogous to chaser lights on movie marquees could be created, again with very little computation. Scenes could also be rendered incrementally. Drawing objects one by one from lines and polygons can be an interesting form of animation that spreads the drawing time over many video frames. Polygons can grow simply by overwriting larger and larger polygons, again with low cost as long as the polygons are not too large.

One of my first animations had worm-like figures moving around on the screen. Although the worms appeared to be in continuous motion, each frame required only a line segment to be added to the head and a line segment to be erased at the tail. Sinuous paths based on Lissajous figures were used in an attempt to bring some organic and fluid motion to the rather flat two-dimensional look of the animation. At other times, randomly generated polygons were created coincident with loud percussion sounds, and color map manipulations were used to fade these shapes to black slowly.
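The cheap-animation tricks above, rotating the color lookup table and drawing incrementally, can be sketched in a few lines. This is an illustrative sketch in Python, not the original Amiga code: the `Worm` class, the Lissajous parameters, and the palette representation are all my own inventions.

```python
import math
from collections import deque

def rotate_palette(palette):
    """Rotate color-lookup-table entries by one slot. Pixels that
    reference palette indices appear to animate, with no pixel writes."""
    return palette[-1:] + palette[:-1]

def lissajous_point(t, a=3, b=2, scale=100):
    """A point on a Lissajous figure, used as a sinuous worm path."""
    return (scale * math.sin(a * t), scale * math.sin(b * t))

class Worm:
    """Incremental animation: each frame adds one segment at the head
    while the oldest tail segment drops off, so per-frame drawing
    cost is O(1) regardless of screen size."""
    def __init__(self, length=20):
        self.body = deque(maxlen=length)  # old tail points fall off automatically
        self.t = 0.0

    def step(self, dt=0.05):
        self.t += dt
        self.body.append(lissajous_point(self.t))
        return self.body[-1]  # only this new segment needs to be drawn
```

Cycling `rotate_palette` once per frame gives the marquee "chaser light" effect; calling `Worm.step` once per beat reproduces the head-grows, tail-erases motion described above.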
[…] an equally rich, complex visual field can be absorbed at the same time vastly overestimates the powers of human perception. Some composers may pursue this path as a logical extension of the complexity that has arisen in modern music, in spite of the perceptual difficulties. Others may want to consider film music as a model. A successful composer described to me how he sometimes writes very simple lines, lines that might be too simple to stand alone, knowing that the picture and dialog will complete the film and satisfy the audience. This is not to say music and sound in film (see Bordwell and Thompson 1986; Altman 1992; Chion 1994) is simple, only that image and sound perception are intertwined.

In a similar way, if we place rich music and performers in the foreground, it makes sense that images can consist of very simple lines and still be very satisfying. Audience members have often commented that there is simply too much to see and hear in some of my pieces, and to some extent this is purposeful. This is a much better reaction than "I left early because nothing was happening," but I am still struggling with this problem. Images, sounds, and their impressions cannot be considered separately, because the perception of each one is affected by the other.

Visual and Auditory Time

Secondly, visual events do not work the way sound events do. As an example, consider again the animated worms mentioned above. These were set in the context of a steady eighth-note rhythm at about 150 beats per minute (200 msec per eighth note). The worms moved by adding to the head and erasing from the tail every eighth note in synchrony with the music. Now, 200 msec means a 5-Hz update rate, which is well below the rate at which frames fuse into continuous motion. This is also well beyond the asynchrony (around 50 msec) below which visual and auditory events seem to be simultaneous. Therefore, it should be quite apparent that the worms are constructed from distinct visual elements, and it should be clear whether these elements appear in synchrony with the music or not.

However, many people who see this performance never notice the coincidence. There may be evolutionary reasons for this. If one hears footsteps at a rate of 300 per minute in the jungle, one might want to pay attention, but it seems unlikely one would ever see flashing lights or visual stimuli at a similar frequency. On the other hand, we are very sensitive to visual patterns, spatial frequency, and motion. Although there are many parallels between music and animation, and the synaesthetic possibilities are very appealing, composers should be careful not to fall into the trap of mapping musical experience directly into the visual world.

Making the Connection Interesting

Third, there is a common temptation to draw connections between music and image at a very superficial level. The best (and worst) examples of this are music visualizers as supported by, for example, Windows Media Player, WinAmp, and iTunes. I dislike the animations created by these programs because they offer only what is readily apparent in the music itself. As soon as the obvious connections from sound to image are made, the image ceases to be interesting or challenging. A better approach that is particularly available to composers is to make connections between deep compositional structure and images. Usually in interactive music systems, there are some music generation components with control parameters and state information that affect the music structure but which are not directly perceivable. By tying visuals to this deep, hidden information, the audience may perceive that there is some emotional, expressive, or abstract connection, but the animation and music can otherwise be quite independent and perhaps more interesting.

For example, in my Nitely News (1993/1995), four contrapuntal musical voices are connected to four graphical random walks that accumulate to form a dense image over time (see Figure 1). Later, a stick-figure dancer appears and reacts to changes in tempo and improvisational style (see Figure 2). Thus, the […]
Figure 3. In The Words are Simple (1995), a background fall image is overlaid with handwriting from a graphics tablet and falling, spinning leaves.

Figure 4. In Transit (1997) uses iterated function systems and simulated heat diffusion to generate chaotic, flowing images with high-level controls derived from music improvisation.
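The strategy of tying visuals to hidden generative state, as in the per-voice random walks of Nitely News, can be illustrated with a short sketch. This is not the piece's actual code: the `activity` parameter and the mapping from voice to step size are invented stand-ins for whatever internal state a real piece would expose.

```python
import random

class VoiceWalk:
    """One graphical random walk tied to a (hypothetical) musical voice.
    The walk's stride tracks the voice's 'activity' value, a piece of
    generative state the audience cannot hear directly."""
    def __init__(self, x=0.0, y=0.0, seed=None):
        self.x, self.y = x, y
        self.rng = random.Random(seed)
        self.trail = [(x, y)]  # accumulated points form a dense image over time

    def step(self, activity):
        # A more active voice takes larger visual strides.
        self.x += self.rng.uniform(-1, 1) * activity
        self.y += self.rng.uniform(-1, 1) * activity
        self.trail.append((self.x, self.y))

walks = [VoiceWalk(seed=i) for i in range(4)]   # four contrapuntal voices
for beat in range(100):
    for v, walk in enumerate(walks):
        walk.step(activity=0.5 + 0.5 * v)       # stand-in for per-voice state
```

Because the trails accumulate rather than being erased, the image grows denser as the music unfolds, while the moment-to-moment motion still reflects hidden compositional state rather than the audible surface.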
[…] (Matsuda and Rai 2000), and Isadora by TroikaTronix. Other approaches can be seen in Videodelic by U & I Software, Processing (see www.proce55ing.org), and Electronika by Aestesis. As more processing power becomes available (which is inevitable), many new video processing and image-making techniques will be explored.

Video processing opens up the possibility of images controlling sounds. For many years, researchers have created interactive systems that use video cameras for sensing (Krueger 1983; Rokeby 1997; see also STEIM's BigEye software, available online at www.steim.org/steim/bigeye.html). In some ways, this is simpler than image generation, because low frame rates and very low resolution can often be used for sensing, and software does not necessarily need to process every pixel. On the other hand, there is a tendency to use video merely as a triggering device, and there are relatively few examples where video-based controllers achieve continuous and subtle sound control. In The Watercourse Way (2003), I used video to capture light reflected from water, transforming this into time-varying spectra for synthesizing sound (Dannenberg et al. 2003; Dannenberg and Neuendorffer 2003). The patterns of light were clearly audible in the sounds. This was augmented by the theatrical aspect of a pool of water on stage as well as the human aspect of a dancer making ripples in the water (see Figure 6). In addition, the video images were texture-mapped onto pulsating graphical objects that were projected behind the dancer and ensemble.

In the future, we will see more technology and greatly enhanced processing power. While "faster" does not mean "better," new technology will certainly enable new artistic directions. Advances in consumer technology such as video games and animated films will influence how audiences perceive and interpret computer music with video and animation. Low resolution and uneven frame rates may have been considered "high tech" years ago, but the same technology might appear to today's audiences as intentionally "retro."

Software Architecture

Creating works for integrated media poses some difficult problems for software design and implementation. As mentioned earlier, there is a mismatch between the timing requirements for audio and video, corresponding to fundamental differences in our aural and visual perception. Video and graphics processing can slow down audio performance if the two are tightly integrated at the software level. At the other extreme, highly decoupled systems can suffer from a lack of coordination and communication between image and sound.
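The light-to-sound idea behind The Watercourse Way, described above, can be sketched in the abstract: reduce a strip of pixel brightnesses to a small set of partial amplitudes, then use those amplitudes for additive synthesis. The function names and the reduction scheme below are invented for illustration and are not the system described in Dannenberg et al. (2003).

```python
import math

def frame_to_spectrum(pixel_row, n_partials=8):
    """Reduce a row of pixel brightnesses (0..255) to partial amplitudes
    in 0..1 by averaging the row in n_partials chunks."""
    step = max(1, len(pixel_row) // n_partials)
    return [sum(pixel_row[i:i + step]) / (step * 255.0)
            for i in range(0, step * n_partials, step)]

def additive_synth(amps, f0=220.0, sr=8000, dur=0.01):
    """Additive synthesis: one sinusoidal partial per amplitude,
    at harmonics of the fundamental f0."""
    n = int(sr * dur)
    out = []
    for i in range(n):
        t = i / sr
        out.append(sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t)
                       for k, a in enumerate(amps)))
    return out
```

Recomputing the amplitudes on every video frame yields a time-varying spectrum, so spatial patterns of light become audible as spectral change, which is the continuous kind of control the preceding paragraph argues video sensing too rarely achieves.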
Timing Requirements

Let us begin by examining some requirements. Audio computation should never be delayed by more than a few milliseconds to avoid perceptible latency and jitter, whereas video and graphics can be delayed by about one frame time (15 to 40 msec) or more without noticeable problems.

The audio timing requirement stems from two sources. First, there is the desire to achieve low latency for audio effects applied to live sound. To keep latency down, audio buffers must be short. If audio samples are not computed soon enough to refill the buffer, audible pops and clicks will result. Therefore, the deadline to complete each audio computation is bounded by the low overall audio latency. The second reason for low latency is that we can perceive jitter of only a few milliseconds in a sequence of equally spaced sounds, such as a run of fast 32nd notes.

Video is more forgiving, in part because when video deadlines are missed and a new frame is not ready, the previous frame is displayed again, at least in most modern double-buffered graphics systems. Our eyes are not very sensitive to this; in fact, when converting from film (24 fps) to NTSC video (29.97 fps), roughly every fourth frame is doubled, causing frequent timing errors of 20 msec, yet we do not notice.

In actual practice, timing requirements depend upon the nature of the composition and performance, and many wonderful works have been created without meeting such stringent timing requirements. On the other hand, taking an engineering perspective, we do not want to preempt artistic decisions by building in limitations at the system level.

The Aura System

Aura is a software framework for creating interactive, multimedia performances (Dannenberg and Rubine 1995; Dannenberg and Brandt 1996; Dannenberg 2004a, 2004b). Aura has a number of interesting design elements that support the integration of real-time graphics, video, and music processing.

The timing issues mentioned above drive much of the Aura architecture. The mixture of high- and low-latency computation calls for multiple processes and a priority-based scheduling scheme so that audio computations can preempt graphics, video, and other high-latency tasks. This naturally leads to a separation or decoupling of graphics and music processing, so one of the goals of Aura is to restore close coupling and communication without sacrificing real-time performance. Aura also looks forward to future parallel-processing systems based on multicore, multiprocessor, and distributed systems.

The key to communication in Aura is an efficient object and message-passing system, which aims to give the illusion of ordinary object-oriented programming even when objects are spread across multiple threads and even multiple address spaces (Dannenberg and Lageweg 2001). To accomplish this, objects are referenced by 64-bit, globally unique identifiers rather than memory addresses. Whereas message passing in object-oriented languages like C++ and Java amounts to a synchronous procedure call, Aura messages are true messages: a sequence of bytes containing a size, destination, timestamp, method name, and parameters. Also, Aura messages are normally asynchronous, so there is no return value, and the sender does not wait for the completion of the operation. This is important because it allows low-latency operations to continue without waiting.

One could imagine a system in which every object is executed by its own thread, allowing the scheduler to decide which objects receive priority. Every object would then have a private stack and message queue, requiring lots of memory and adding a lot of context-switching overhead. To avoid this, Aura uses relatively few threads and partitions objects by assigning each to a single thread. A thread with its collection of objects is called a zone in Aura, and zones run at different priority levels. Typically, there are three zones: one for hard-deadline audio processing, one for soft-deadline music control, and one for high-latency graphics, animation, video, and user-interface processing. Each zone can exist in a […]
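The zone-and-message design just described might be sketched as follows. This is a toy illustration in Python, not Aura's implementation: a real zone would also set its thread's scheduling priority (omitted here for portability), and the latency helper simply restates the buffer-size arithmetic from the Timing Requirements section.

```python
import queue
import threading

def audio_latency_ms(buffer_frames, sample_rate=44100):
    """Worst-case latency added by one audio buffer, in milliseconds."""
    return 1000.0 * buffer_frames / sample_rate

class Zone:
    """One thread owning a collection of objects and draining an
    asynchronous message queue (a toy analogue of an Aura zone)."""
    def __init__(self, name):
        self.name = name
        self.objects = {}            # globally unique id -> object
        self.inbox = queue.Queue()   # messages carry destination, method, args
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, obj_id, method, *args):
        # Asynchronous send: no return value, the sender never blocks.
        self.inbox.put((obj_id, method, args))

    def _run(self):
        while True:
            obj_id, method, args = self.inbox.get()
            obj = self.objects.get(obj_id)
            if obj is not None:
                getattr(obj, method)(*args)
```

For example, a 64-sample buffer at 44.1 kHz adds about 1.45 msec of latency (`audio_latency_ms(64)`), which is why the hard-deadline audio zone must run at the highest priority while graphics zones can afford to wait.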
[…] integration and control. In the early days, computer power was lacking, resulting in low frame rates or various shortcuts to achieve some form of animation. Computers have gained enough speed to process video at the pixel level, and graphics processors now produce high-resolution, three-dimensional images with amazing flexibility.

The Aura system is designed to facilitate the integration of these new and evolving technologies by offering a flexible, distributed, real-time object system. Aura provides the glue to combine video, graphics, audio, and control. Aura emphasizes performance and ease of integration with hardware and software systems, but its open-endedness can be a problem for less-than-expert programmers. Fortunately, there are alternatives that fall at different points in the spectrum from very general to very easy to use.

Certainly, the biggest challenges ahead are artistic rather than technological. While better projection technology, greater resolution, faster computers, and lower costs may be desirable, we already have plenty of capabilities to explore right now. One of the attractions of this pursuit is that there are relatively few precedents and no established theory. Interactive visual music welcomes the creative spirit.

References

Altman, R. 1992. Sound Theory/Sound Practice. New York: Routledge.

Bargar, R., et al. 1994. "Model-Based Interactive Sound for an Immersive Virtual Environment." Proceedings of the 1994 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 471–474.

Bordwell, D., and K. Thompson. 1986. "Sound in the Cinema." In Film Art, 2nd ed. New York: Newbery Award Records, pp. 233–260.

Burger, J. 1993. The Desktop Multimedia Bible. Reading, Massachusetts: Addison-Wesley.

Camurri, A., et al. 2000. "EyesWeb: Toward Gesture and Affect Recognition in Interactive Dance and Music Systems." Computer Music Journal 24(1):57–69.

Chion, M. 1994. Audio-Vision: Sound on Screen, trans. C. Gorbman. New York: Columbia University Press.

Danks, M. 1997. "Real-Time Image and Video Processing in GEM." Proceedings of the 1997 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 220–223.

Dannenberg, R. B. 1986. "The CMU MIDI Toolkit." Proceedings of the 1986 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 53–56.

Dannenberg, R. B. 1993. "Software Design for Interactive Multimedia Performance." Interface 22(3):213–228.

Dannenberg, R. B. 2004a. "Aura II: Making Real-Time Systems Safe for Music." Proceedings of the 2004 Conference on New Interfaces for Musical Expression. Hamamatsu, Japan: Shizuoka University of Art and Culture, pp. 132–137.

Dannenberg, R. B. 2004b. "Combining Visual and Textual Representations for Flexible Interactive Audio Signal Processing." Proceedings of the 2004 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 240–247.

Dannenberg, R. B., et al. 2003. "Sound Synthesis from Video, Wearable Lights, and The Watercourse Way." Proceedings of the Ninth Biennial Symposium on Arts and Technology. New London, Connecticut: Connecticut College, pp. 38–44.

Dannenberg, R. B., and E. Brandt. 1996. "A Flexible Real-Time Software Synthesis System." Proceedings of the 1996 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 270–273.

Dannenberg, R. B., and P. v. d. Lageweg. 2001. "A System Supporting Flexible Distributed Real-Time Music Processing." Proceedings of the 2001 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 267–270.

Dannenberg, R. B., and T. Neuendorffer. 2003. "Sound Synthesis from Real-Time Video Images." Proceedings of the 2003 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 385–388.

Dannenberg, R. B., and D. Rubine. 1995. "Toward Modular, Portable, Real-Time Software." Proceedings of the 1995 International Computer Music Conference. San Francisco: International Computer Music Association, pp. 65–72.

Deutsch, D., ed. 1999. The Psychology of Music, 2nd ed. San Diego: Academic Press.

Gromala, D., M. Novak, and Y. Sharir. 1993. "Dancing with the Virtual Dervish." Paper presented at the Fourth […]