
Grove Music Online

Hearing and psychoacoustics


Brian C.J. Moore

https://doi.org/10.1093/gmo/9781561592630.article.42531
Published in print: 20 January 2001
Published online: 2001

Hearing is the sense with which sound is detected and analysed.


Psychoacoustics is concerned with the relationship between the
physical characteristics of sound (e.g. intensity, physical location in
space) and what is actually perceived by the listener (e.g. loudness,
perceived position in space). It is also concerned with the ability to
discriminate between different sounds. This section deals with basic
aspects of hearing; for other aspects see Periodicals; Consonance,
§2; Psychology of music, §II; and Sound.

1. Sound spectra and level.

Sound usually originates from the vibration of an object. This vibration is impressed upon the surrounding medium (usually air) as
a pattern of changes in pressure. The pressure changes are
transmitted through the medium and may be heard as sound.
Although any sound can be described in terms of sound pressure as
a function of time (often called the waveform of the sound), it is
often more meaningful to describe sound in a different way, based on
a theorem by Fourier, who proved that any complex waveform can be
analysed (or broken down) into a series of sinusoids. A sinusoid
resembles the sound produced by a tuning-fork, and it is often called
a simple tone or a pure tone. The analysis of a sound in this way is
called Fourier analysis, and each sinusoid is called a (Fourier)
‘component’ of the complex sound. A plot of the magnitudes of the
components as a function of frequency is called the ‘spectrum’ of the
sound.
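
This kind of analysis is easy to carry out numerically. The following sketch (in Python, using the numpy library) computes the spectrum of a sampled waveform with the discrete Fourier transform; the waveform and all parameter values are illustrative, not drawn from the article.

```python
import numpy as np

# Fourier analysis of a sampled waveform: a minimal sketch.
fs = 44100                     # sampling rate in Hz (assumed)
t = np.arange(0, 0.5, 1 / fs)  # 0.5 seconds of time samples

# Example waveform: two sinusoidal components, 440 Hz and 660 Hz.
x = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 660 * t)

magnitudes = np.abs(np.fft.rfft(x)) * 2 / len(x)  # component amplitudes
freqs = np.fft.rfftfreq(len(x), d=1 / fs)         # frequency of each bin

# The two largest peaks fall at the frequencies of the two components.
peaks = freqs[np.argsort(magnitudes)[-2:]]
print(np.sort(peaks))   # -> [440. 660.]
```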

Many sounds produced by musical instruments are periodic, or almost periodic; the waveform repeats at regular time intervals and the repetition rate remains roughly constant over the
duration of a musical note. Such sounds have a clear pitch. Other
sounds, such as that of a snare drum, are aperiodic and noise-like. A
periodic sound is composed of a number of sinusoids, each of which
has a frequency that is an integer multiple of the frequency of a
common (not necessarily present) fundamental component. The
fundamental component has a frequency equal to the repetition rate
of the complex waveform as a whole. The frequency components of
the complex sound are known as harmonics and are numbered, the
fundamental being given harmonic number 1. The nth harmonic has
a frequency which is n times that of the fundamental. The relative

magnitudes of the harmonics vary across different instruments. For
example, the clarinet has a relatively weak 2nd harmonic and a
strong 3rd harmonic.
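
By way of illustration, the sketch below synthesizes a harmonic complex tone; the amplitude values are arbitrary, chosen only to echo the clarinet-like pattern just described, and the fundamental is picked so that its period is a whole number of samples.

```python
import numpy as np

# Harmonic numbering: the nth harmonic has frequency n times the
# fundamental. Amplitudes here are arbitrary (weak 2nd, strong 3rd,
# loosely echoing the clarinet example in the text).
fs, f0 = 44100, 220.5            # f0 chosen so fs/f0 is a whole number
t = np.arange(0, 1.0, 1 / fs)
amps = {1: 1.0, 2: 0.1, 3: 0.9, 4: 0.3, 5: 0.4}   # harmonic no. -> amplitude

wave = sum(a * np.sin(2 * np.pi * n * f0 * t) for n, a in amps.items())

# Whatever the harmonic amplitudes, the waveform repeats at the
# period of the fundamental, 1/f0 seconds (here 200 samples).
period = round(fs / f0)
print(np.allclose(wave[:period], wave[period:2 * period]))   # -> True
```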

One of the reasons for representing sounds in terms of their sinusoidal components is that the human auditory system performs a
similar analysis. For example, two simultaneous sinusoids, whose
frequencies are not too close, are usually heard as two separate
tones each with its own pitch. The perceived timbre of steady tones
is quite closely related to the spectrum (Plomp, 1976).

Because the auditory system can deal with a huge range of sound
pressures, sound level or magnitude is usually expressed using a
logarithmic measure known as the Decibel. Each 20 decibel increase
in level corresponds to an increase in sound pressure by a factor of
ten. For example, a 60 decibel increase corresponds to a 1000-fold
increase in sound pressure. Normal conversation typically has a
level of 65–70 decibels, while an orchestra playing fortissimo may
produce sound levels of 110 decibels at seats close to the front.
Musicians seated in front of the brass section in an orchestra may be
exposed to sound levels up to 120 decibels, which can be damaging
to the ear (see also Loudness).
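
The decibel arithmetic can be made concrete with a short sketch; the reference pressure of 20 micropascals is the conventional zero for sound pressure level in air, rather than a figure from the article.

```python
import math

def spl_db(p, p0=20e-6):
    """Sound pressure level in decibels for a pressure p in pascals."""
    return 20 * math.log10(p / p0)

# A tenfold increase in pressure adds 20 decibels...
print(spl_db(0.02) - spl_db(0.002))   # -> 20.0
# ...and a 1000-fold increase adds 60 decibels, as stated above.
print(spl_db(2.0) - spl_db(0.002))    # -> 60.0
```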

2. Structure and function of the auditory system.

Fig.1 shows the structure of the peripheral part of the human auditory system. The outer ear is composed of the pinna and the
auditory canal or meatus. Sound travels down the meatus and
causes the eardrum, or tympanic membrane, to vibrate. These
vibrations are transmitted through the middle ear by three small
bones, the ossicles (malleus, incus and stapes) to a membrane-
covered opening (the oval window) in the bony wall of the spiral-
shaped structure of the inner ear, the cochlea.

The cochlea is divided along its length by the basilar membrane, which moves in response to sound. The response to sinusoidal
stimulation takes the form of a travelling wave which moves along
the membrane, with an amplitude that increases at first and then
decreases rather abruptly. Fig.2 shows the instantaneous
displacement of the basilar membrane for two successive instants in
time, in response to a 200 Hz sinusoid. The line joining the
amplitude peaks is called the envelope. The envelope shows a peak
at a particular position on the basilar membrane.

The position of the peak in the envelope differs according to the frequency of stimulation. High-frequency sounds (around 15,000 Hz)
produce a peak near the oval window, while low-frequency sounds
(around 50 Hz) produce a peak towards the other end of the
membrane (the apex). Intermediate frequencies produce peaks at
intermediate places. Thus, each point on the basilar membrane is
‘tuned’ to a particular frequency. When a sound is composed of
several sinusoids with different frequencies, each sinusoid produces
a peak at its own characteristic place on the basilar membrane. In
effect, the cochlea behaves like a Fourier analyser, although with a
less than perfect frequency-analysing power.

Recent measurements of basilar membrane vibration have shown that the membrane is much more selectively tuned than originally
found by von Békésy (1960). The better the physiological condition
of the membrane, the more selective is the tuning (Khanna and
Leonard, 1982). In a normal, healthy ear, each point on the basilar
membrane responds with high sensitivity to a limited range of
frequencies; higher sound intensities are required to produce a
response as the frequency is made higher or lower. This selective
tuning and high sensitivity probably reflect an active process; that is,
they do not result simply from the mechanical properties of the
membrane and surrounding fluid, but depend on biological
structures that actively influence the mechanics (Yates, 1995).

Lying above the basilar membrane is a second structure, the tectorial membrane. Between the two membranes are hair cells,
which form part of a structure called the organ of Corti. The hair
cells are divided into two groups by an arch known as the tunnel of
Corti. Those on the side of the arch closest to the outside of the
cochlea are called outer hair cells, and are arranged in three rows in
cats and up to five rows in humans. The hair cells on the other side
of the arch form a single row and are called inner hair cells. There
are about 25,000 outer and about 3500 inner hair cells. The tectorial
membrane, which has a gelatinous structure, lies above the hairs.
When the basilar membrane moves up and down, a shearing motion
is created between the basilar membrane and the tectorial
membrane. As a result, the hairs at the tops of the hair cells are
displaced. This leads to excitation of the inner hair cells, which leads
in turn to the generation of action potentials in the neurones, or
nerve cells, of the auditory nerve. The action potentials are brief
electrical ‘spikes’ or ‘impulses’ which travel along the nerve and
carry information to the brain. The main role of the outer hair cells
may be actively to influence the mechanics of the cochlea so as to
produce high sensitivity and selective tuning (Yates, 1995).

Each neurone in the auditory nerve derives its activity from one or
more hair cells lying at a particular place on the basilar membrane.
Thus, the neurones are ‘tuned’. In addition, nerve firings tend to be
phase-locked or synchronized to the time pattern of the stimulating
waveform. A given neurone does not necessarily fire on every cycle
of the stimulus but, when firings do occur, they occur at roughly the
same point on the waveform each time. This phase-locking is lost at
high frequencies, above around 5000 Hz.
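
A toy simulation (not a physiological model) may make phase-locking concrete: the simulated neurone below skips cycles at random, but when it does fire it fires at the same phase of the stimulus, so its interspike intervals are integer multiples of the stimulus period. All parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
freq = 500.0                  # stimulus frequency in Hz (illustrative)
period = 1.0 / freq
n_cycles = 1000
fire_prob = 0.3               # chance of firing on any given cycle

fired = rng.random(n_cycles) < fire_prob
spike_times = np.nonzero(fired)[0] * period   # same phase on every firing

intervals = np.diff(spike_times) / period     # in units of the period
print(np.allclose(intervals, np.round(intervals)))   # -> True
```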

3. The limits of hearing.

The absolute threshold of a sound is the minimum detectable level of that sound in the absence of any other external sounds. The sounds
are usually delivered by a loudspeaker in a large anechoic chamber
(a room whose walls are highly sound-absorbing). The measurement
of sound level is made after the listener is removed from the sound
field, at the point formerly occupied by the centre of the listener's
head.

Fig.3 shows estimates of the absolute threshold of sound at various frequencies. The curve represents the average data from many
young listeners with normal hearing. However, individual listeners
may have thresholds as much as 20 decibels above or below the
mean at a specific frequency and still be considered ‘normal’.
Absolute sensitivity is greatest in the frequency range between 2
and 5 kHz, partly because of a broad resonance produced by the ear
canal. This frequency range corresponds to the higher formant
frequencies (resonances in the vocal tract) of speech sounds. The
‘singing formant’, a resonance in the vocal tract produced by singers
to boost frequencies between 2 and 3 kHz, typically falls within this
range as well (Sundberg, 1974).
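
A rough calculation suggests the size of the ear-canal effect. Treating the canal as a tube closed at one end (a quarter-wavelength resonator), and assuming a typical canal length of about 2.5 cm (a textbook value, not one given here), puts the resonance inside the most sensitive region:

```python
# Quarter-wavelength resonance of a tube closed at one end: f = c / 4L.
c = 343.0   # speed of sound in air, m/s
L = 0.025   # assumed ear-canal length, m
print(f"{c / (4 * L):.0f} Hz")   # -> 3430 Hz, within the 2-5 kHz region
```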

Thresholds increase rapidly at very high and very low frequencies. This effect depends at least partly on the transmission characteristic
of the middle ear. Transmission is most efficient for mid-range
frequencies and drops off markedly for very low and very high
frequencies (Rosowski, 1991). The highest audible frequency varies
considerably with the age of the listener. Young children can often
hear tones as high as 20 kHz, but for most adults the threshold rises
rapidly above about 15 kHz. The loss of sensitivity with increasing
age (presbyacusis) is much greater at high frequencies than at low,
and the variability between different listeners is also greater at high
frequencies.

There is no clear low-frequency limit to human hearing. However, sounds with frequencies below about 16 Hz are not heard in the
normal sense, but are detected by virtue of the distortion products
(harmonics) that they produce after passing through the middle ear.
In addition, very intense low-frequency tones can sometimes be felt
as vibration before they are heard. The low-frequency limit for the
‘true’ hearing of pure tones probably lies at about 16 Hz. This is
close to the lowest frequency that evokes a pitch sensation.

4. Masking and frequency analysis.

The auditory system acts as a limited-resolution frequency analyser; complex sounds are broken down into their sinusoidal components.
This analysis almost certainly depends mainly on the tuning
observed on the basilar membrane. Largely as a consequence of this
analysis, we are able to hear one sound in the presence of another
sound with a different frequency. This ability is known as frequency
selectivity, or frequency resolution. Frequency selectivity plays a role
in many aspects of auditory perception, including pitch, timbre and
loudness.

Important sounds are sometimes rendered inaudible by other sounds, a process known as ‘masking’. Masking may be considered
as a failure of frequency selectivity, and it can be used as a tool to
measure the frequency selectivity of the ear. One theory of masking
assumes that the auditory system contains a bank of overlapping
band-pass filters (Fletcher, 1940; Patterson and Moore, 1986). Each
of these ‘auditory filters’ is assumed to respond to a limited range of
frequencies. In the simple case of a sinusoidal signal presented in a
background noise, it is assumed that the listener detects the signal
using the filter whose output has the highest signal-to-masker ratio.
The signal is detected if that ratio exceeds a certain value. In most
situations, the filter involved has a centre frequency close to that of
the signal.
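
The sketch below illustrates this account for a sinusoid in white noise. The rounded-exponential (‘roex’) filter shape associated with Patterson and Moore, and the equivalent-rectangular-bandwidth formula of Glasberg and Moore (1990) discussed in the next paragraph, are standard modelling choices; the stimulus values are illustrative.

```python
import numpy as np

def erb(fc):
    """Equivalent rectangular bandwidth in Hz at centre frequency fc
    (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000 + 1)

def roex_weight(f, fc):
    """Relative power transmission at frequency f of the auditory
    filter centred at fc, using the roex(p) shape."""
    p = 4 * fc / erb(fc)        # slope chosen so the filter has the right ERB
    g = abs(f - fc) / fc        # normalized deviation from the centre
    return (1 + p * g) * np.exp(-p * g)

signal_freq = 1000.0   # Hz
noise_density = 1.0    # white-noise power per Hz (arbitrary units)

# Signal power at each filter's output, divided by the noise power the
# filter passes (for white noise, proportional to its ERB): the listener
# is assumed to use the filter where this ratio is highest.
centres = np.linspace(500, 2000, 301)
snr = np.array([roex_weight(signal_freq, fc) / (noise_density * erb(fc))
                for fc in centres])
print(f"best filter centred at {centres[np.argmax(snr)]:.0f} Hz")  # -> 1000 Hz
```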

A good deal of work has been directed towards determining the characteristics of the auditory filters (see Moore, 4/1997). One way
of characterizing a filter is in terms of the range of frequencies to
which it responds most strongly. This range is referred to as the
‘bandwidth’. The bandwidth of an auditory filter estimated from
masking experiments is often called the ‘critical
bandwidth’ (Fletcher, 1940; Zwicker, 1961), although more recently
the term ‘equivalent rectangular bandwidth’ has been used (Moore
and Glasberg, 1983; Glasberg and Moore, 1990). This is defined as
the width of a rectangular filter that has the same peak transmission and passes the same total power of white noise (a
sound containing equal energy at all frequencies). When we listen to
a complex sound containing many partials, an individual partial can
be ‘heard out’ (perceived as a separate tone) when it is separated from
neighbouring partials by a little more than one equivalent
rectangular bandwidth (Moore and Ohgushi, 1993). For harmonic
complex tones, this means that only the lower harmonics (up to the
5th to 8th) can be heard out (Plomp, 1964).
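
The ERB formula gives a rough account of this limit, since for a harmonic complex tone the spacing between neighbouring partials equals the fundamental frequency; in the sketch below the factor 1.25 is one illustrative reading of ‘a little more than one’ bandwidth.

```python
def erb(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg and Moore, 1990)."""
    return 24.7 * (4.37 * fc / 1000 + 1)

f0 = 200.0   # fundamental frequency in Hz (example value)

# Harmonic n is taken to be audible as a separate tone when the spacing
# f0 exceeds about 1.25 ERBs at the harmonic's frequency n * f0.
audible = [n for n in range(1, 21) if f0 > 1.25 * erb(n * f0)]
print(max(audible))   # -> 6: only the lower harmonics can be heard out
```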

5. The perception of loudness.

The Loudness of a given sound generally increases with increasing physical intensity. However, two sounds with the same intensity may
appear very different in loudness, since loudness is also affected
strongly by the spectrum of the sounds. It is useful to have a scale
that allows one to compare the loudness of different sounds. A first
step towards this is to construct equal-loudness contours for
sinusoids of different frequencies. Say, for example, we take a
standard tone of 1 kHz at a level of 40 decibels, and ask the listener
to adjust the level of a second tone (say, 2 kHz) so that it sounds
equally loud. If we repeat this for many different frequencies of the
second tone, then the sound level required, plotted as a function of
frequency, maps out an equal-loudness contour. If we repeat this
procedure for different levels of the 1 kHz standard tone, then we
will map out a family of equal-loudness contours. Note that the
contours resemble the absolute threshold curve (lowest curve in the
figure) at low levels, but tend to become flatter at high levels. As a
result, the relative loudness of different frequencies can change with
overall sound level. For example, a 100 Hz tone at 40 decibels would
sound quieter than a 1000 Hz tone at 30 decibels. However, if both
tones were increased in level by 60 decibels, the 100 Hz tone at 100
decibels would sound louder than the 1000 Hz tone at 90 decibels.

The subjective loudness of a sound is not directly proportional to its physical intensity. For sound levels above about 40 decibels, the
loudness roughly doubles when the intensity is increased by a factor
of ten, which is equivalent to adding 10 decibels (Stevens, 1957).
This property of the ear has important implications for the
perception of musical sounds. For example, ten violins each playing
with the same intensity will sound only twice as loud as a single
violin, and 100 violins will sound only four times as loud as a single
violin.
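
The violin figures follow directly from the doubling rule, assuming that the intensities of the (incoherent) sources simply add:

```python
import math

def relative_loudness(n_sources):
    """Loudness of n equal incoherent sources relative to one source,
    using the rule that loudness doubles per 10 dB (Stevens, 1957)."""
    level_gain_db = 10 * math.log10(n_sources)   # intensities add
    return 2 ** (level_gain_db / 10)

print(relative_loudness(10))    # -> 2.0: ten violins, twice as loud
print(relative_loudness(100))   # -> 4.0: a hundred violins, four times as loud
```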

6. Frequency discrimination and the perception of pitch.

Pitch is defined as the attribute of auditory sensation in terms of which sounds may be ordered on a musical scale, that is, the
attribute in which variations constitute melody (see Pitch). For
sinusoids (pure tones) the pitch is largely determined by the
frequency: the higher the frequency, the higher the pitch. One of the
classic debates in hearing theory is concerned with the mechanisms
underlying the perception of pitch. One theory, called the ‘place’
theory, suggests that pitch is related to the position of maximum
vibration on the basilar membrane, which is coded in terms of the
relative activity of neurones tuned to different frequencies. The
alternative theory, the ‘temporal’ theory, suggests that pitch is
determined by the time pattern of neural spikes (phase-locking).

One major fact that these theories have to account for is our
remarkably fine acuity in detecting frequency changes. This ability is
called frequency discrimination and is not to be confused with
frequency selectivity. For two tones presented successively and
lasting 500 milliseconds, a difference of about 3 Hz (or less in
trained subjects) can be detected at a centre frequency of 1 kHz. It
has been suggested that tuning-curves (or auditory filters) are not
sufficiently sharp to account for this acuity in terms of the place
theory (Moore and Glasberg, 1986). A further difficulty for the place
theory is that frequency discrimination worsens abruptly above 4 or
5 kHz (Moore, 1973). Neither neural measures of frequency
selectivity (such as tuning-curves) nor psychoacoustical measures of
frequency selectivity (such as auditory filters) show any abrupt
change there.

These facts can be explained by assuming that temporal mechanisms are dominant at frequencies below 4–5 kHz. The worsening
performance for frequencies above this level corresponds well with
the frequency at which the temporal information ceases to be
available. Studies of our perception of musical intervals also indicate
a change in mechanism around 4–5 kHz (Ward, 1954). Below this, a
sequence of pure tones with appropriate frequencies conveys a clear
sense of melody. Above this, the sense of musical interval and of
melody is lost, although the changes in frequency may still be heard.
The important frequencies for the perception of music and speech lie
in the frequency range where temporal information is available.

When we listen to a complex tone, such as that produced by a musical instrument or a singer, the pitch usually corresponds to the
fundamental component. However, the same pitch is heard when the
fundamental component is weak or absent completely, an effect
called ‘the phenomenon of the missing fundamental’. It appears that
the perceived pitch is somehow constructed in the brain from the
harmonics above the fundamental (Moore, 4/1977, #2731).

7. Localization of sounds.

Two major cues for sound localization are differences in the time of
arrival and differences in intensity at the two ears. For example, a
sound coming from the left will arrive first at the left ear and be
more intense in the left ear. For steady sinusoidal stimulation,
differences in time of arrival can be detected and used to judge
location only for frequencies below about 1500 Hz. At low frequencies, changes in the relative time of arrival at the two ears as small as about 10–20 millionths of a second can be detected, which is equivalent to a lateral movement of the sound source of one to two degrees.
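
These figures can be checked against Woodworth's spherical-head approximation for the interaural time difference; the head radius below is an assumed typical value rather than one given in the article.

```python
import math

def itd_seconds(azimuth_deg, head_radius=0.0875, c=343.0):
    """Interaural time difference for a distant source at the given
    azimuth, using the spherical-head (Woodworth) approximation."""
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))

print(f"{itd_seconds(1) * 1e6:.0f} microseconds")   # -> ~9 for 1 degree
print(f"{itd_seconds(2) * 1e6:.0f} microseconds")   # -> ~18 for 2 degrees
```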

Intensity differences between the two ears are primarily useful at high frequencies. This is because low frequencies bend or diffract
around the head, so that there is little difference in intensity at the
two ears whatever the location of the sound source. At high
frequencies the head casts more of an acoustic ‘shadow’, and above
2–3 kHz the intensity differences are sufficient to provide useful
cues. For complex sounds, containing a range of frequencies, the
difference in spectral patterning at the two ears may also be
important.

Binaural cues are not sufficient to account for all of our localization
abilities. For example, a difference in time or intensity will not define
whether a sound is coming from in front or behind, or above or
below, but people can clearly make such judgments. The extra
information is provided by the pinnae (Grantham, 1995). The spectra
of sounds entering the ear are modified by the pinnae in a way that
depends on the direction of the sound source. This direction-
dependent filtering provides cues for sound-source location. The
cues occur mainly at high frequencies, above about 6 kHz. The
pinnae are important not only for localization, but also for judging
whether a sound comes from within the head or from the outside
world. A sound is judged as coming from outside only if the spectral
transformations characteristic of the pinnae are imposed on it. Thus,
sounds heard through headphones are normally judged as being
inside the head; the pinnae do not have their normal effect when
headphones are worn. However, sounds delivered by headphones
can be made to appear to come from outside the head if the signals
delivered to the headphones are pre-recorded on a dummy head or
synthetically processed (filtered) so as to mimic the normal action of
the pinnae. Such processing can also create the impression of a
sound coming from any desired direction in space.
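
In signal-processing terms this amounts to filtering the signal for each ear with a head-related impulse response (HRIR) measured for the desired direction. The sketch below shows only the structure of that processing; the placeholder impulse responses are invented stand-ins for real measurements (e.g. from a dummy head) and would not by themselves produce convincing externalization.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 44100
mono = np.random.randn(fs)   # one second of noise as a test signal

# Placeholder impulse responses; in practice these are measured HRIRs
# for the chosen source direction.
hrir_left = np.zeros(256)
hrir_left[0] = 1.0                 # direct, full-level path to the left ear
hrir_right = np.zeros(256)
hrir_right[30] = 0.5               # delayed and attenuated at the right ear

left = fftconvolve(mono, hrir_left)      # filter each ear's signal
right = fftconvolve(mono, hrir_right)
binaural = np.stack([left, right], axis=1)   # two-channel headphone signal
```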

Bibliography

H. Fletcher: ‘Auditory Patterns’, Reviews of Modern Physics, 12 (1940), 47–65

G. von Békésy: ‘The Variations of Phase along the Basilar Membrane with Sinusoidal Vibrations’, JASA, 19 (1947), 452–60

W.D. Ward: ‘Subjective Musical Pitch’, JASA, 26 (1954), 369–80

S.S. Stevens: ‘On the Psychophysical Law’, Psychological Review, 64 (1957), 153–81

G. von Békésy: Experiments in Hearing (Eng. trans., New York, 1960)

E. Zwicker: ‘Subdivision of the Audible Frequency Range into Critical Bands (Frequenzgruppen)’, JASA, 33 (1961), 248 only [letter to editor]

R. Plomp: ‘The Ear as a Frequency Analyzer’, JASA, 36 (1964), 1628–36

B.C.J. Moore: ‘Frequency Difference Limens for Short-Duration Tones’, JASA, 54 (1973), 610–19

J. Sundberg: ‘Articulatory Interpretation of the “Singing Formant”’, JASA, 55 (1974), 838–44

R. Plomp: Aspects of Tone Sensation (London, 1976)

S.M. Khanna and D.G.B. Leonard: ‘Basilar Membrane Tuning in the Cat Cochlea’, Science, 215 (1982), 305–6

B.C.J. Moore and B.R. Glasberg: ‘Suggested Formulae for Calculating Auditory-Filter Bandwidths and Excitation Patterns’, JASA, 74 (1983), 750–53

B.C.J. Moore and B.R. Glasberg: ‘The Role of Frequency Selectivity in the Perception of Loudness, Pitch and Time’, Frequency Selectivity in Hearing, ed. B.C.J. Moore (London, 1986), 251–308

R.D. Patterson and B.C.J. Moore: ‘Auditory Filters and Excitation Patterns as Representations of Frequency Resolution’, Frequency Selectivity in Hearing, ed. B.C.J. Moore (London, 1986), 123–77

L. Robles, M.A. Ruggero and N.C. Rich: ‘Basilar Membrane Mechanics at the Base of the Chinchilla Cochlea I: Input-Output Functions, Tuning Curves, and Response Phases’, JASA, 80 (1986), 1364–74

A.R. Palmer: ‘Physiology of the Cochlear Nerve and Cochlear Nucleus’, Hearing, ed. M.P. Haggard and E.F. Evans (Edinburgh, 1987), 838–55

B.R. Glasberg and B.C.J. Moore: ‘Derivation of Auditory Filter Shapes from Notched-Noise Data’, Hearing Research, 47 (1990), 103–38

J.J. Rosowski: ‘The Effects of External and Middle-Ear Filtering on Auditory Threshold and Noise-Induced Hearing Loss’, JASA, 90 (1991), 124–35

B.C.J. Moore and K. Ohgushi: ‘Audibility of Partials in Inharmonic Complex Tones’, JASA, 93 (1993), 452–61

D.W. Grantham: ‘Spatial Hearing and Related Phenomena’, Hearing, ed. B.C.J. Moore (San Diego, 1995), 297–345

G.K. Yates: ‘Cochlear Structure and Function’, Hearing, ed. B.C.J. Moore (San Diego, 1995), 41–73

Acoustics: Reference Zero for the Calibration of Audiometric Equipment, Part 7: Reference Threshold of Hearing under Free-Field and Diffuse-Field Listening Conditions (Geneva, 1996) [ISO 389–7]

B.C.J. Moore: An Introduction to the Psychology of Hearing (San Diego, 4/1997) [orig. pubd Baltimore and London, 1977]
See also

Computers and music, §VIII: Psychology research

Psychology of music, §II, 1(iv): Perception & cognition of pitch: Auditory scene analysis

Rhythm, §I, 2: Fundamental concepts & terminology: The perception of duration and succession

Sound, §3: Visual representation of sound

