Cheng* and Gregory H. Wakefield†

Moving Sound Source Synthesis for Binaural Electroacoustic Music Using Interpolated Head-Related Transfer Functions (HRTFs)

*Dolby Laboratories
100 Potrero Avenue
San Francisco, California 94103-4813, USA
cnc@dolby.com
http://www.eecs.umich.edu/~coreyc
This work was completed at the University of Michigan, Ann Arbor.

†University of Michigan
Department of Electrical Engineering and Computer Science
1101 Beal Ave.
Ann Arbor, Michigan 48109, USA
ghw@umich.edu
http://www.eecs.umich.edu/~ghw
Recent advances in computational power, acoustic measuring techniques, and hearing technology have made sound spatialization and moving sound source synthesis both popular and widely accessible sound-sculpting tools. In particular, much attention has been given to the use of Head-Related Transfer Functions (HRTFs), filters that mimic the directionally dependent filtering of the human external ear. HRTFs have been used in headphone- and loudspeaker-based spatialization systems to simulate the spectral cues responsible for directional hearing.

This article presents some techniques for composing with interpolated HRTFs to synthesize moving sound sources for binaural electroacoustic music intended for headphone listening. In this sense, we intend the article to describe some pragmatic compositional techniques and "rules of thumb" that may serve as a link between spatially based musical ideas and the scientific and technological realities of current binaural, headphone-based sound reproduction systems, such as those described in Kendall (1995).

We first review duplex theory, a simple model that explains directional hearing in the azimuth (left-right) direction. Next, we introduce HRTFs and show how a novel, spatially based visualization strategy for HRTF data employing Spatial Frequency Response Surfaces (SFRSs) provides important insight about the structure of HRTF data. We then define interpolated HRTFs, describe why they are important for synthesizing moving sound sources for headphone listening, and describe a new method for computing interpolated HRTFs based on observations made from the SFRS-based visualization of HRTF data.

Using a MATLAB-based implementation of a simple spatialization system that uses interpolated HRTFs to synthesize moving sound sources, we describe the capabilities and limitations of HRTF-based moving sound source synthesis with some observations made from informal listening experiments. Based on these observations, we suggest that current technical restrictions define certain compositional problems that a composer might choose to solve, avoid, or exploit to produce musical results. We isolate left-right, front-back, and up-down spatial trajectories, demonstrate some techniques useful for emphasizing these trajectories, and show how these techniques can be used to express certain spatially based musical ideas.

There are several binaural sound examples on the accompanying compact disc which demonstrate many of the spatialization techniques discussed here. These binaural examples have been specifically processed to be listened to over a good pair of headphones. Nonetheless, some of these effects are more successful than others, and we realize that not all listeners may be able to immediately hear the intended spatial effects. Because spatialization effects can be delicate and may vary somewhat from person to person, we suggest listening to each

Computer Music Journal, 25:4, pp. 57–80, Winter 2001. © 2001 Massachusetts Institute of Technology.
We begin our background discussion with an overview of duplex theory.

Duplex Theory

Duplex theory is a perceptual model for estimating a sound's spatial location using two binaural cues: interaural time differences and interaural intensity differences (Rayleigh 1907). An interaural time difference (ITD) is defined as the difference in arrival times of a sound's leading wavefront at the left and right ears. Similarly, an interaural intensity difference (IID) is defined as the amplitude difference generated by a sound in the free field between the left and right ears. In general, a sound is perceived to be closer to the ear at which the first wavefront arrives, where a larger ITD translates to a larger lateral displacement. However, at frequencies above about 1500 Hz, the wavelength of sound becomes comparable to the diameter of the head, the head starts to shadow the ear farther away from the sound, and ITD cues can become ambiguous. At these higher frequencies, the IID generated by head shadowing becomes important to perceptual decoding of azimuth angle. Loosely speaking, perceived azimuth varies approximately linearly with the logarithm of the IID (Blauert 1983).

Although the simplicity and success of duplex theory are attractive, the theory only explains the perception of sounds in the azimuth or "left-right" direction. If one attempts to apply duplex theory to estimate a sound's location in the free field where

Because listeners can differentiate among the locations of sounds in the free field, it is widely thought that the auditory system relies on ITD, IID, and other spectral cues to determine spatial location. These spectral cues are called Head-Related Transfer Functions (HRTFs) and summarize the direction-dependent acoustic filtering a sound undergoes owing to interactions with the head, torso, and outer ear (or pinna). Intuitively, HRTFs are simply filters that mimic the acoustic filtering of the head, torso, and pinna and operate on free-field sounds much as graphic equalizers operate on recorded sounds, as is shown in Figure 1. Different HRTFs corresponding to different spatial locations can be described by different equalizer settings. HRTFs are useful because they can be used to filter a monaural sound into a binaural, stereo sound which will sound as though it originates from a prescribed spatial location.

Formally, a single HRTF is defined as a particular subject's left or right far-field frequency response as measured from a specific point in the free field to a specific point in the ear canal. Because the overall spectral contours of an individual's HRTFs do not change significantly as a function of distance (Duda and Martens 1998), both left- and right-ear HRTFs are empirically measured from humans or mannequins for several azimuths (locations along the "left-right" direction) and elevations (locations along the "up-down" direction) at a fixed radius from the head. Thus, spatial location is designated by an ordered pair of angles (azimuth θ, elevation φ), where (0°, 0°) corresponds to the location di-
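At low frequencies, the ITD's dependence on azimuth is often approximated with a simple spherical-head model. The sketch below uses Woodworth's classic approximation, which is not part of this article's method; the head radius and speed of sound are assumed nominal values:

```python
import math

def itd_spherical_head(azimuth_deg, head_radius_m=0.0875, c_m_per_s=343.0):
    """Approximate the interaural time difference (in seconds) for a
    far-field source, using Woodworth's spherical-head formula:
    ITD = (a / c) * (sin(theta) + theta),
    valid for azimuths between -90 and +90 degrees."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / c_m_per_s) * (math.sin(theta) + theta)
```

For a typical head this predicts an ITD of zero at 0° azimuth, growing to roughly 0.65 msec toward ±90°, consistent with the lateral-displacement behavior described above.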
[Figure 1. HRTFs as direction-dependent equalizers: a chirp presented at location i reaches the ears after a delay ITD_i and filtering by the left- and right-ear HRTFs (HRTF_L,i and HRTF_R,i); HRTFs for different spatial locations correspond to different left- and right-ear equalizer settings across frequency (roughly 300 Hz to 15 kHz in the figure).]
rectly in front of a listener. Similarly, (−90°, 0°) and (+90°, 0°) correspond to locations directly opposite the left and right ears, respectively; (0°, −45°) and (0°, +45°) correspond to locations in front-below and in front-above the listener, respectively. In addition to ordered pairs of angles, spatial location may also be described with the general terms ipsilateral and contralateral, meaning "on the same side of the head as the sound source" and "on the opposite side of the head as the sound source," respectively. For example, the right ear is the contralateral ear for sounds originating on the left side of the head.

Technically, HRTFs are commonly specified as minimum phase filters. Note that an HRTF subsumes both ITD and IID information: time delays are encoded into the filter's phase spectrum, and IID information is related to the overall power of the filter. However, HRTFs have been found empirically to be minimum phase systems (Kulkarni et al. 1995, 1999), which allows us to simplify the Finite Impulse Response Filter (FIR) description of HRTFs in two important ways. First, the minimum phase assumption allows us to uniquely specify an HRTF's phase by its magnitude response alone, because the log magnitude frequency response and the phase response of a minimum phase causal system form a Hilbert transform pair (Oppenheim and Schafer 1989). Second, the minimum phase assumption allows us to separate ITD information from the FIR specification of HRTFs. Because minimum phase filters have the minimum group delay property and minimum energy delay property, most of an HRTF's energy occurs at the beginning of its impulse response, so that the left and right ear minimum phase HRTFs both have approximately zero delay. Thus, complete characterization of the auditory cues associated with a single spatial location involves the measurement of three quantities: left and right ear HRTF magnitude responses and the ITD.

There are several different methods for measur-
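Because the log magnitude and phase of a minimum phase system form a Hilbert transform pair, a minimum phase impulse response can be recovered from a magnitude response alone. The real-cepstrum "folding" method below is one standard numerical route to this (a generic NumPy sketch, not the authors' implementation):

```python
import numpy as np

def minimum_phase_ir(magnitude):
    """Return the minimum-phase impulse response whose magnitude response
    matches the given N-point (full-circle, nonzero) magnitude samples.
    Folds the real cepstrum of the log magnitude onto positive
    quefrencies, the discrete analog of the Hilbert-transform relation
    between log magnitude and phase."""
    n = len(magnitude)
    cepstrum = np.fft.ifft(np.log(magnitude)).real
    fold = np.zeros(n)
    fold[0] = 1.0                  # keep the zeroth quefrency
    fold[1:(n + 1) // 2] = 2.0     # double positive quefrencies
    if n % 2 == 0:
        fold[n // 2] = 1.0         # Nyquist term is kept once
    return np.fft.ifft(np.exp(np.fft.fft(cepstrum * fold))).real
```

For a magnitude response taken from a system that is already minimum phase, this recovers the original impulse response up to numerical error.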
the head. Specifically, the head can have an amplifying effect on sounds originating from certain locations, even when these locations are completely blocked or "shadowed" by the head. Described as "bright spots" by Shaw (1974) because they appear as local maxima, these diffraction effects can be seen by noting the relatively large peaks at low frequencies at azimuths +90° for the left and −90° for the right HRTFs, respectively, in Figure 3.

Although this method for visualizing HRTF data in the frequency domain is intuitive, it is cumbersome when used to compare different sets of HRTFs, as only a single "slice" of HRTFs sharing the same azimuth or elevation can be compared at a time. Alternatively, there are other ways of visualizing different cross-sections of HRTF data, including additional frequency-domain methods (Carlile and Pralong 1994), cluster analysis (Martens 2000), and time-domain comparisons (Duda and Martens 1998).

In this article, we focus on spatial representations of HRTFs which plot the magnitude response of all HRTFs in a data set for a fixed frequency as a function of azimuth and elevation. Specifically, one color plot is constructed for every frequency bin in the HRTF left or right magnitude response, where magnitude is plotted as a color height against azimuth and elevation. Because the spatial sampling pattern used during HRTF measurement is irregular, triangulation and linear interpolation are used to construct a surface that approximates a continuous surface. These graphs are called Spatial Frequency Response Surfaces (SFRSs), and they indicate how much energy the right and left ears receive as a function of spatial location. Figure 4 shows the construction of SFRSs from HRTFs.

SFRSs provide an immediate, visual, and very compact presentation of HRTF data. The most striking quality of SFRSs is the location and apparent motion of the "hot spots," or well-defined local peaks, seen in SFRSs as a function of frequency. Figure 5 shows SFRSs constructed for several frequencies, and Table 1 summarizes the main structures found in each of these SFRSs. In addition, one can readily see general, cross-subject trends in these plots which are more difficult to identify by analyzing raw magnitude frequency responses alone. Specifically, initial observations reveal that SFRSs are relatively smooth surfaces that change slowly as a function of frequency, and that different subjects' SFRSs are very similar to each other for a specific frequency. Figure 6 shows a comparison of three subjects' SFRSs for 2.4 kHz and 9.7 kHz.

Interpolated HRTFs

Because the goal of synthesizing virtual auditory space is to be able to place a sound at an arbitrary location in virtual space, computing interpolated HRTFs is of central importance for synthesizing both stationary and moving sounds. We define an interpolated HRTF to be an HRTF corresponding to an arbitrary spatial location which is computed from a known, finite set of HRTFs. A widely used time-domain HRTF interpolation method constructs a time-domain, interpolated impulse response from a weighted average of other time-domain HRTFs corresponding to locations near the desired spatial location. A similar interpolation algorithm can be performed in the frequency domain, where an interpolated HRTF magnitude response is constructed from a weighted average of other HRTF magnitude responses corresponding to locations near the desired
[Figures 2–3. Left- and right-ear HRTF magnitude responses in relative dB, vertically offset and plotted against frequency (0 to 25 kHz), one trace per azimuth from −180° to +169°.]
[Figure 4. Construction of an SFRS from HRTFs: for a fixed frequency bin (here bin 72, 7 kHz), the magnitude responses |H| of the left-ear HRTFs at all available spatial locations (e.g., azimuth 0°/elevation 20°, azimuth 20°/elevation 50°, azimuth −60°/elevation 30°, azimuth 120°/elevation 20°) are triangulated and linearly interpolated over azimuth and elevation to form the left-ear SFRS for that bin.]
spatial location. A good summary and comparison of some of these HRTF interpolation methods can be found in Hartung et al. (1999).

Our method for computing interpolated HRTFs is derived from observations made from SFRSs. Examination of the SFRSs in Figures 5 and 6 suggests that for a fixed frequency, HRTF data contain similar, cross-subject features which are continuous and change relatively slowly as a function of spatial location. Because the construction of SFRSs relies on linear interpolation and the resulting SFRSs seem to preserve these features, it is reasonable to assume that SFRSs suggest an alternative HRTF interpolation method that exploits the interpolation used to create the SFRSs themselves.

The present algorithm constructs an interpolated HRTF magnitude response for a desired spatial location one frequency at a time, according to a weighted average of values taken directly from each SFRS. Specifically, the algorithm first performs a triangulation of the azimuth-elevation coordinate system to create a grid for the available, irregularly spaced HRTF data. The vertices of the triangulation are the locations at which HRTFs are known. In order to minimize the effect of the irregularity in spatial sampling, interpolated locations are taken only from where the triangulation is most uniform. For each SFRS, a plane is constructed using the three magnitude response values associated with the three vertices of the triangle enclosing the desired spatial location. The interpolated value for that surface is taken as the value of the plane evaluated at the desired spatial location. This process is repeated for each SFRS, and each interpolated value is placed into the appropriate frequency bin of the interpolated magnitude response. Figure 7 describes the interpolation algorithm and shows how it relates to the interpolation used to create SFRSs.

This interpolation algorithm is an immediate, quantitative application of SFRSs. In addition, this interpolation algorithm allows different frequency components of different HRTFs to contribute to the interpolated HRTF to varying degrees, unlike simpler spatial methods (Middlebrooks 1992). No internal parameters, such as model order, need be predetermined, as is the case with pole-zero interpolation (Blommer and Wakefield 1995). The present interpolation algorithm has been shown to produce perceptually convincing HRTFs corresponding to arbitrary spatial positions, and further details on the algorithm's performance can be found in Cheng and Wakefield (1999a). All of the interpolated HRTFs generated for this article and accompanying sound examples were generated by our algorithm described herein.

Composing in Space with Moving Sound Sources

There are several well-known techniques that can be used to spatialize moving sound sources. Simple
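Evaluating a plane through three triangle vertices is equivalent to barycentric interpolation, so one triangle's worth of the per-bin computation can be sketched as follows (an illustrative fragment with hypothetical names; the full algorithm also builds the triangulation and selects the enclosing triangle):

```python
def barycentric_weights(triangle, point):
    """Barycentric coordinates of point = (azimuth, elevation) with
    respect to a triangle of three (azimuth, elevation) vertices."""
    (x1, y1), (x2, y2), (x3, y3) = triangle
    px, py = point
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (px - x3) + (x3 - x2) * (py - y3)) / det
    w2 = ((y3 - y1) * (px - x3) + (x1 - x3) * (py - y3)) / det
    return w1, w2, 1.0 - w1 - w2

def interpolate_hrtf_magnitude(triangle, vertex_spectra, point):
    """Interpolated magnitude response at an arbitrary location: for
    every frequency bin, evaluate the plane through the three vertex
    magnitudes, i.e., take the barycentric weighted average."""
    w = barycentric_weights(triangle, point)
    num_bins = len(vertex_spectra[0])
    return [sum(wi * spec[k] for wi, spec in zip(w, vertex_spectra))
            for k in range(num_bins)]
```

Because the weights depend only on geometry, they are computed once per target location and reused for every frequency bin, which is what lets different frequency components of different HRTFs contribute to varying degrees.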
[Figure 5. Left- and right-ear SFRSs (magnitude plotted over azimuth and elevation in degrees) for several frequency bands: (a, b) 292.97–390.63 Hz; (c, d) 878.91–976.56 Hz; (e, f) 1855.5–1953.1 Hz; (g, h) 2343.8–2441.4 Hz; (i, j) 3808.6–3906.3 Hz; (k, l) 4785.2–4882.8 Hz; (m, n) 5761.7–5859.4 Hz; (o, p) 7714.8–7812.5 Hz; (q, r) 9668–9765.6 Hz; (s, t) 12598–12695 Hz.]
panning methods are still very effective for many types of left-right motions, and dynamic reverberation manipulation produces effective illusions of varying depth. More complex, pitch-based processing, such as Doppler shifting, can also produce compelling examples of moving sounds. Other methods for producing moving sound sources, such as ambisonics (Malham and Myatt 1995) and interaural cross-talk cancellation (Cooper and Bauck 1989), rely on systems designed for loudspeaker reproduction of spatial sound. An excellent production-quality software package that incorporates these and other spatialization techniques is IRCAM's Spatialisateur ("Spat") program.

However, our method for synthesizing moving sound sources relies only on the use of interpolated HRTFs and ITDs as described in previous sections of this article, and thus the present results are intended for headphone listening only. In this sense, our current research attempts to discover how well we can spatialize sounds using these two parameters alone, and thus to isolate the strengths and weaknesses of HRTF-based moving sound synthesis. This section documents a MATLAB-based spatialization tool and graphical user interface (GUI) designed to synthesize moving sound sources with interpolated HRTFs and ITDs. Then, we outline the strengths and weaknesses of the spatialization tool in creating convincing spatialized sounds having different spatial trajectories. Finally, we discuss compositional techniques designed to take advan-
tage of the strengths of HRTF-based moving sound source synthesis.

Implementation of a MATLAB-Based Spatialization GUI

Interpolated HRTFs can be used not only to place a sound at an arbitrary point in space, but they can also be used to produce a moving sound following an arbitrary trajectory through space. In order to synthesize moving sounds for headphone playback, interpolated HRTFs and interpolated ITDs corresponding to slowly varying spatial locations along the desired spatial trajectory are dynamically updated during the processing of a monaural sound source, as depicted in Figure 8.

To make the process of experimenting with interpolated HRTFs and spatial trajectories more intuitive and user-friendly, we have developed a simple MATLAB-based user interface which implements a non-real-time version of the spatialization system shown in Figure 8. All of the accompanying HRTF-based sound examples were produced using this interface. Following the paradigm of volume and panning envelope editors present in many multitrack mixing programs, this interface allows the user to manipulate a sound's azimuth and elevation trajectories independently with piecewise linear envelopes that can be saved and applied to other sound files of different lengths. Sound designers can use the interface to process sounds with different sets of interpolated HRTFs to customize spatialized sounds to a particular individual. The interface also shows the source waveform to be spatialized so that sound designers can synchronize azimuth and elevation trajectories with important, visually identifiable cues in the source sound, such as sharp transients, clicks, or onsets of sounds. Finally, the interface provides rudimentary playback capabilities for both source and target (spatialized) sounds, so sound designers can evaluate input sounds, output sounds, and spatial trajectories simultaneously. Figure 9 shows a snapshot of this interface.

Although the implementation of the system depicted in Figure 8 is relatively straightforward, there are some details about its use of interpolation and time-varying filtering which are important to discuss. First, note that it is undesirable for either the delay lines or the filters to change so abruptly that their combined processing will produce clicks,
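A common way to avoid such clicks is to process the sound in short, half-overlapping blocks, filter each block with the HRTF for its current position, and let adjacent blocks cross-fade into one another. The overlap-add sketch below illustrates the idea (a generic NumPy illustration, not necessarily the GUI's exact implementation):

```python
import numpy as np

def spatialize_mono(mono, hrtf_for_block, block_len):
    """Overlap-add processing with time-varying filters: each
    half-overlapping block is windowed with a linear crossfade and
    convolved with the HRTF impulse response for its spatial position,
    so the effective filter changes smoothly from block to block
    instead of switching abruptly (which would produce clicks).
    hrtf_for_block(i) returns the impulse response for block i."""
    hop = block_len // 2
    up = np.arange(hop) / hop
    window = np.concatenate([up, 1.0 - up])  # 50%-overlap fades sum to 1
    n_ir = len(hrtf_for_block(0))
    padded = np.concatenate([mono, np.zeros(block_len)])
    out = np.zeros(len(padded) + n_ir - 1)
    for i, start in enumerate(range(0, len(mono), hop)):
        block = padded[start:start + block_len] * window
        seg = np.convolve(block, hrtf_for_block(i))
        out[start:start + len(seg)] += seg
    return out[:len(mono) + n_ir - 1]
```

With a fixed filter this reduces to ordinary convolution after the initial fade-in; with a filter that changes per block, the output cross-fades between the two filters over each overlap region.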
[Figure 10. Informal observations of spatialization quality, plotted over the azimuth-elevation coordinate system with front at (0°, 0°), back at (−180°, 0°), right at (+90°, 0°), and down at (0°, −90°).]
and problems that occur in practice. While some of these restrictions place limits on the "spatial tessitura" available to composers, other restrictions can be viewed as interesting compositional problems that can be exploited to produce musical results.

Many limitations of moving sound source synthesis are closely related to some well-known limitations of stationary sound source synthesis, in which the synthesized sounds do not move through space. For example, the simple HRTF-based spatialization algorithm shown in Figure 8 may not produce the intended spatialization effects, even for broadband stationary sounds: listeners often report that there is a lack of "presence" in spatially synthesized sounds, and that sounds spatialized near the median plane (0° azimuth) sound as though they are inside the head (Griesinger 1999). Signals processed to sound as though they originate from in front of or above a listener actually sound like they originate from in back of or below the listener (the so-called "front-back" and "up-down" confusions) (Wightman and Kistler 1989b). Synthesis of sounds with non-zero elevations is difficult. Also, because every individual has a unique set of HRTFs, a subject listening to a spatialized sound generated from a "generalized" set of HRTFs may not perceive the sound in the intended spatial location (Pralong and Carlile 1996; Wenzel et al. 1993).

Figure 10 summarizes some informal observations that show how stationary sounds are more convincingly spatialized or externalized at some spatial locations rather than others. Composers might choose to limit themselves to these spatial areas if they wish to render spatial ideas that can be easily heard by a wide range of listeners over a wide variety of headphones. Alternatively, compos-
Left-Right Motion

Left-right motion is perhaps still the best known,
[Figure 12. The analogy between a pivot chord and a pivot location: (a) a pivot chord shared by two key areas (analyzed in both C major, I–V4/3–I6, and G major, IV6–V6/5–I); (b) a pivot location shared by a left-right and a forward-backward spatial trajectory; (c) a table listing, for each of the ten tennis strokes in Sound Example 5, its left-right (L/R) and front-back (F/B) trajectory positions.]
gerated change in externalization by intentionally and quickly moving a sound source from a location with good externalization (directly opposite the ears at ±90° azimuth) to a location with poor localization (directly in front of the listener at 0° azimuth).

Front-Back Motion

One of the advantages HRTF-based spatialization algorithms have over simpler methods is the ability to render sources in front of and behind a listener. However, synthesizing these types of sounds is still very difficult, as front-back confusions occur often, especially for sounds generated from generic, non-individualized sets of HRTFs and played back over commonly available headphones.

In Sound example 4, the tennis ball travels repeatedly from left and in front of the listener to left and in back of the listener. Note that this effect is more subtle than left-right motion and might be more difficult to hear. For this reason, we have found it helpful to keep changes in the angular speed of the sound source fairly low when trying to achieve front-back motion. In addition, another helpful "rule of thumb" is to try to keep the trajectory's elevation constant and near 0° when trying to achieve front-back motion.

The delicacy of front-back motion has important musical consequences. Rendering a sound that passes either extremely slowly or extremely quickly from front to back is difficult using HRTF-based spatialization techniques alone. This is because sounds that travel too quickly from front to back can sound like they travel from back to front, and sounds that travel too slowly from back to front or front to back can lose their overall front-back spatialization effect.

However, this limitation may be of musical interest to composers who wish to exploit the spatial ambiguity between front and back locations over an extended period of time. For example, a composer may wish to write in a "minimalist" style in which recurring patterns of sounds have spatial trajectories that change slowly over time. By gradually swapping front locations with rear locations in these repeating spatial trajectories, a composer may be able to create a piece in which a listener suddenly perceives a large change in the spatial character of the sound from front to back after an extended period of listening.

The rendering of an ambiguous spatial location can also be used as a transition point, or "pivot location," that connects two different spatial ideas. Specifically, a front or rear spatial location can serve as a common spatial location shared by two distinct spatial trajectories. In this sense, a front or rear spatial location can function much like a "pivot chord" does during the modulation between two different tonal key areas: whereas a pivot chord is a chord common to both the source and destination key areas, a front or rear location may serve as a "pivot location" common to two spatial trajectories. Figure 12 shows the analogy between "pivot chords" and "pivot locations," and illustrates the idea of linking two different spatial trajectories.

Sound example 5, another spatialization of the tennis example, attempts to mix two distinct spa-
tial ideas: left-right motion and front-back motion. The intent here is to "trick" the listener into initially hearing one spatial trajectory, and then to shift the listener's perceived spatial orientation and attention to another spatial trajectory through the use of a "pivot location" common to both trajectories.

In Sound example 5, the listener hears left-right motion for the first five strokes of the tennis ball, and is most likely convinced that the two tennis players are on the listener's left and right. However, strokes 5–8 all occur on the left side of the listener, such that there is a subtle front-back motion of the tennis ball for these strokes. Therefore, the listener realizes that, in order for the tennis players to be on different sides of the net, one player must be in front of the listener, and the other player must be in back of the listener.

Thus, for the first five strokes, one of the players is in back of and to the left of the listener, and the other player is in front of and to the right of the listener. Because the front-back spatialization of the first five strokes is more subtle than the left-right spatialization of the first five hits, most listeners will not hear the front-back trajectory of the tennis ball and will instead hear only the tennis ball's more dominant left-right trajectory. It is only on the fifth stroke, which occurs in a rear-left "pivot location," that the listener may realize that the first five strokes might have contained a front-back component to spatialization all along.

Up-Down Motion

Using interpolated HRTFs also enables a composer to move sounds from positions above the listener (positive elevations) to positions below the listener (negative elevations). However, this effect is more difficult to produce than left-right motion, as the discernment of both up-down and front-back trajectories depends heavily on the particular set of HRTFs used during processing. For example, it has been shown that the number of "front-back" and "up-down" confusions increases when listening to spatialized sounds produced from non-individualized HRTFs (Wenzel et al. 1993).

In order to orchestrate up-down trajectories so that they sound more convincing to a wider range of listeners, we have found it useful to modify up-down spatial trajectories having a single azimuth so that they traverse other spatial locations which are more easily spatialized. For example, Figure 13a is a sketch of a spatial trajectory intended to pass from lower elevations to higher elevations directly across from the left and right ears (azimuths −90° and +90°, respectively). However, according to Figure 10, it might be useful to slightly skew the "straight" up-down trajectory in Figure 13a to take advantage of better-spatialized areas behind/below and in front of/above the listener. Thus, to realize the intended spatial trajectory in Figure 13a, we actually implement the spatial trajectory in Figure 13b, which traverses several different azimuths to exploit these spatial "sweet spots."

[Figure 13. (a) An up-down trajectory at fixed azimuths directly opposite the ears; (b) the modified trajectory, skewed through several azimuths to pass through better-spatialized regions.]

Sound example 6 contains a monaural, unspatialized recording of footsteps climbing up and down a flight of stairs. Sound example 7 contains a spatialized version of the footsteps that makes use of the modified spatial trajectory sketched in Figure 13b to accentuate the intended up-down trajectory. Figure 9 shows the exact spatial trajectory programmed into the spatialization GUI used to produce this sound example. Note that as elevation increases from −40° to +40°, the azimuth trajectory increases from −135° to −45° as well. In this manner, the elevation change is reinforced with an azimuth change that takes advantage of the good spatialization properties behind and in front of the
[Figure 14. The "slingshot" effect: (1) start with the HRTF-spatialized version of the sound; (2) cross-fade to a hard left-right panned version; (3) cross-fade back to the HRTF-spatialized version. A mono soundfile becomes a stereo sound spatialized with interpolated HRTFs.]

[Figure 15. Motivation for volumetric representations of HRTF data.]
impression of highly accelerated motion around the listener's head. This "slingshot" effect exploits the different externalization properties of each spatialization method. Because the panned sound sounds closer to the listener and the HRTF-spatialized sound sounds farther away from the listener, cross-fading between these two versions of the same source sound can produce the impression of increasing acceleration. Figure 14 shows how the same sound spatialized these two different ways can be cross-faded to produce a sound that originates from a distant spatial location to the left of a listener, approaches and accelerates quickly around the listener's right side, and then returns to a distant location on the left.

Sound example 15, another excerpt from Fishbowl, demonstrates the "slingshot" effect. Note how the spatial focus of the excerpt moves from the center to the near right, and then quickly to the far left, producing the illusion that the sounds are accelerating quickly from near the listener to far away. The sounds at the near right of the listener are produced with panning, and the sounds at the far left of the listener are produced with HRTF-based spatialization. Compare this to Sound example 16, an excerpt from Fishbowl that attempts to develop this same spatial trajectory over a longer period of time with different sounds.

Finally, Sound example 17 contains the entire piece Fishbowl. Although tension and release are developed in spatial terms throughout the piece, Fishbowl also attempts to integrate these spatial ideas with other non-spatial ideas into a larger musical context. For example, the manipulation of spatial trajectories combined with the extramusical ideas implied by some of the processed water sounds helps to paint the fishbowl's creatures with different personalities and temperaments. Also, the final "splash" of tone color near the end of the piece is an attempt to release spatial tension by using pitched material.

Conclusions and Future Directions

This article introduced several techniques for composing with moving sound sources generated from interpolated HRTFs. Using SFRSs to visualize HRTF data suggests a new signal processing strategy for computing interpolated HRTFs, a technique that is useful for the synthesis of moving sound sources for binaural electroacoustic music. In the future, new visualization approaches may uncover more structure in HRTFs that could suggest other signal processing and compositional strategies for handling the synthesis of moving sound sources. Investigations into even more compact, three-dimensional, volumetric representations of HRTFs similar to those already used to visualize the human anatomy are already underway. Figure 15 shows the motivation behind these studies, and the interested reader can find preliminary results in Cheng and Wakefield (2000).

Admittedly, it is sometimes easy for the authors, as scientists trained in inquiry for inquiry's sake, to forget about music for music's sake. Science and technology may be art forms in themselves, but we must not forget that in the ideal case, electroacoustic music should not have to exist simply to express the intricate technology behind it. Indeed, with reference to the current subject of spatialization, others have cautioned that "[i]t is important to remember that space, and spatialization as a sound parameter, can and should be used compositionally in computer music" (Pope 1995). With this in mind, this article has presented several practical, concrete techniques for working with moving sound sources synthesized from interpolated HRTFs. It is our sincere hope that these techniques add to a growing vocabulary for the expression of spatial ideas, and that the science and engineering from which these techniques are motivated and built prove useful for achieving artistic goals.

Acknowledgments

The authors wish to thank Dr. John C. Middlebrooks at the Kresge Hearing Research Institute of the University of Michigan for providing data used in this research. We also thank Dr. Thomas Buell and Ms. Heather Kelly at the Naval Submarine Medical Research Laboratory (NSMRL) in Groton, Connecticut, for their work on developing the