
Chapter 1: Introduction

• General introduction
• Communication by sound and voice
– Examples of communication situations
• Systems approach to communication
• Modeling and theory formation in research

1
M. Karjalainen
Information Transmission by Sound

Environmental orientation by sound

Communication by Speech

Speech communication via acoustic medium

Communication by Music

Music via acoustic medium


Communication by Music

Origins of speech and music?

Speech has been important in evolution, but what about music?
Role of music: just a side product or an important factor?
- Charles Darwin: important for mating, etc.

Two interesting recent books:

Steven Mithen: “The Singing Neanderthals: The Origins of Music, Language, Mind, and Body”, Harvard University Press, 2006
Daniel J. Levitin: “This Is Your Brain on Music: The Science of a Human Obsession”, Plume, 2006
Speech Transmission

Speech communication via electronic medium

Virtual Acoustic Reality

Virtual instrument in virtual space


Man-Machine Communication by Speech

Speech synthesis and recognition


A Black-Box Approach

Input-output relationship

A Systems Approach

A multi-level system
Systemic Concepts

• Element (part of a whole, entity)


• Relation / property
• Structure (relatively permanent properties of a system)
• Function(ality) (relatively variant properties of a system)
• Event (a relatively discrete change, typically in time)
• State
• Object
• Type (class)
• System
• Control
• Process
• Organization
• Hierarchy / heterarchy
• Data / information / knowledge (communication, language)

Abstraction in Modeling and Theory Formation

Abstraction hierarchy
Communication by Sound and Voice

contentware
Information Cognition
functionware
Analysis Synthesis

software
Signals Physics
hardware

Chapter 2: Acoustics

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.

Chapter 2: Acoustics

Sound as physical phenomenon

When a tree in a forest falls, and there is no one to listen, does it make a sound?

• Vibration – generation of sound


• Sound radiation
• Sound propagation
• Reflection, absorption
• Diffraction, refraction
• Standing waves
• Resonance, resonators

Vibrating systems

• Simple vibration: mass–spring system

Vibrating systems

Undamped and damped oscillation

Resonance

Mass-spring resonator

Helmholtz resonator

Two-mass vibrating system

Transverse and longitudinal vibration of a two-mass system

Vibration modes of a string

Wave propagation

Wave equation:

D’Alembert:
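
The equations on this slide did not survive extraction. As a reminder, and assuming the slide used the standard one-dimensional forms (the symbols y, x, t, c, f and g are chosen here, not taken from the original), the wave equation and d'Alembert's travelling-wave solution can be written as:

```latex
\frac{\partial^2 y}{\partial t^2} = c^2 \, \frac{\partial^2 y}{\partial x^2},
\qquad
y(x,t) = f(x - ct) + g(x + ct)
```

Here f and g are arbitrary waveforms travelling in the positive and negative x directions at speed c.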

Sound pressure, sound pressure level, decibel

Sound pressure: p [Pa]


Sound pressure level:
Reference:
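
The formulas on this slide were lost in extraction. A minimal sketch, assuming the standard definition L_p = 20 log10(p / p0) with the usual airborne reference p0 = 20 µPa (the function and constant names are illustrative):

```python
import math

P_REF = 20e-6  # assumed reference pressure in Pa (20 micropascals)


def spl_db(p_rms: float, p_ref: float = P_REF) -> float:
    """Sound pressure level in dB for an RMS sound pressure given in Pa."""
    return 20.0 * math.log10(p_rms / p_ref)


print(round(spl_db(1.0), 1))    # 1 Pa RMS, about 94 dB
print(round(spl_db(20e-6), 1))  # the reference pressure itself, 0 dB
```

A tenfold increase in pressure adds 20 dB, which is why the 1 Pa calibration level sits near 94 dB.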

Wave phenomena: spherical wave

Sound velocity in the air:

Spherical wave:

Wave phenomena: planar wave

Planar wave in a tube:

Reflection (and transmission):

Lowest resonance modes in a tube

Open ends One end closed

Spectral content of string vibration

Bar and membrane modes

Bar

Membrane

Reflection and refraction (bending)

Diffraction

Sound propagation paths in a room

Sound field decay in a room

Tapiola-sali

Sound field in a room, Computer simulation

Sound field level in a reverberant room

Modal behavior in a room

L_i = dimensions of a rectangular room
n_i = integer indices 0, 1, 2, ...

measured magnitude response in a room
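
The mode formula itself is missing from the extracted slide. A sketch, assuming the usual rigid-wall result f = (c/2) sqrt(sum_i (n_i / L_i)^2) and a sound speed of 343 m/s (both are assumptions here, as are the function names):

```python
import math
from itertools import product


def room_mode_hz(n, dims, c=343.0):
    """Frequency of mode (nx, ny, nz) in a rigid-walled rectangular room.

    n    -- integer mode indices n_i, each 0, 1, 2, ...
    dims -- room dimensions L_i in metres
    c    -- speed of sound in m/s (assumed value)
    """
    return (c / 2.0) * math.sqrt(sum((ni / li) ** 2 for ni, li in zip(n, dims)))


def lowest_modes(dims, count=5, max_index=3, c=343.0):
    """Return the `count` lowest modes as (frequency, indices) pairs."""
    modes = [
        (room_mode_hz(n, dims, c), n)
        for n in product(range(max_index + 1), repeat=3)
        if any(n)  # skip the trivial (0, 0, 0) case
    ]
    return sorted(modes)[:count]


for f, n in lowest_modes((5.0, 4.0, 3.0)):
    print(n, round(f, 1), "Hz")
```

At low frequencies the modes are sparse, which is one way to read the isolated peaks in a measured room response.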

Sound propagation by image source model

Solid line = real path; dotted line = virtual path

Electroacoustics: Loudspeaker

Dynamic loudspeaker: principle, driver structure, enclosure

Electroacoustics: Microphone

Condenser microphone: principle, construction

Chapter 3: Sound and Voice as Signals

This is background information that is not asked directly in the exam, but knowing it certainly helps, especially if you need to apply your knowledge in practice.

Sound and Voice as Signals

In signal representations, a physical or abstract variable is typically represented as a function of time, such as:

• Signal as a mathematical function:


– Pure tone:

– Random signal:

• Discrete-time numeric sequence

Sound and Voice as Signals

• Graphical presentations (figure panels): sine wave, random noise, sample sequence, speech waveform, unit impulse, unit pulse

Linear and time-invariant (LTI) systems

Properties of LTI systems:


• Any (stable) LTI system can be fully
represented by its impulse response
• Output cannot include any frequencies that
are not in the input (no nonlinear distortion)
• Any bandlimited LTI system can be
approximated by digital filters with arbitrary
accuracy (theoretically)
Signal processing algorithms

Convolution

Fourier analysis

Signal processing algorithms

Fourier synthesis

Convolution vs. Fourier transform
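
The relation between the two operations can be checked numerically. The sketch below assumes the point of the slide is the convolution theorem (convolution in time corresponds to multiplication of spectra); it uses a naive DFT and circular convolution on zero-padded sequences, and all names are illustrative:

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def circular_convolve(x, h):
    """Direct circular convolution of two equal-length sequences."""
    n = len(x)
    return [sum(x[m] * h[(t - m) % n] for m in range(n)) for t in range(n)]


x = [1.0, 2.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0]   # signal, zero-padded
h = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # impulse response, zero-padded

direct = circular_convolve(x, h)
via_dft = [v.real for v in idft([a * b for a, b in zip(dft(x), dft(h))])]
print([round(v, 6) for v in direct])
print([round(v, 6) for v in via_dft])
```

Because both sequences are padded with enough zeros, the circular result equals the ordinary (linear) convolution.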

Decomposition of sawtooth waveform

Spectrum analysis

Magnitude spectrum

Phase spectrum

Phase delay
Group delay
Fourier analysis with windowing

• Rectangular window
• Hamming window
• Hann(ing) window
• Kaiser window
• Blackman (Blackman-Harris) window

Spectrum analysis using Fourier analysis with windowing

Sine wave

Sine wave, windowed synchronously

Sine wave, windowed non-synchronously

Sine wave, Hamming-windowed
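
The four cases can be reproduced with a small experiment. This is a sketch under assumed parameters (64-sample analysis frame, 10 versus 10.5 cycles per frame, naive DFT); the helper names are hypothetical:

```python
import cmath
import math


def dft_magnitudes(x):
    """Magnitudes of a naive O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def sine(cycles, n):
    """n samples of a unit sine wave containing `cycles` periods."""
    return [math.sin(2 * math.pi * cycles * t / n) for t in range(n)]


def hamming(n):
    """Hamming window, w[t] = 0.54 - 0.46 cos(2 pi t / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)) for t in range(n)]


def leakage(mags, peak_bin):
    """Largest magnitude outside the peak bin and its mirror image."""
    n = len(mags)
    return max(m for k, m in enumerate(mags) if k not in (peak_bin, n - peak_bin))


N = 64
sync = dft_magnitudes(sine(10, N))        # whole number of cycles in the frame
nonsync = dft_magnitudes(sine(10.5, N))   # the frame cuts the last cycle in half
windowed = dft_magnitudes([s * w for s, w in zip(sine(10.5, N), hamming(N))])

print("synchronous leakage:    ", leakage(sync, 10))
print("non-synchronous leakage:", leakage(nonsync, 10))
print("far bin 25, Hamming vs rectangular:", windowed[25], nonsync[25])
```

With synchronous windowing the energy falls into a single bin pair. Otherwise it spreads across bins, and a tapered window such as Hamming typically trades a wider main lobe for lower leakage far from the peak.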

Vowel spectra

Time-frequency representations: Spectrogram

Word: /kaksi/

Auto- and cross-correlation

Cross-correlation
Autocorrelation

Cepstrum

• Compute Fourier transform


• Logarithm of (power) spectrum
• Inverse Fourier transform
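
The three steps above can be sketched as follows. The toy input (an impulse plus a half-amplitude echo 8 samples later) and the function names are assumptions chosen so that the echo shows up as a cepstral peak:

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def power_cepstrum(x):
    """Real part of the inverse DFT of the log power spectrum of x."""
    n = len(x)
    log_power = [math.log(abs(v) ** 2) for v in dft(x)]   # steps 1 and 2
    return [sum(log_power[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]                              # step 3


# Toy signal: an impulse plus a half-amplitude echo DELAY samples later.
N, DELAY = 64, 8
x = [0.0] * N
x[0], x[DELAY] = 1.0, 0.5

c = power_cepstrum(x)
peak = max(range(1, N // 2), key=lambda q: c[q])
print("cepstral peak at quefrency", peak)
```

For voiced speech the same idea separates the pitch period (a peak at high quefrency) from the slowly varying vocal tract envelope (low quefrencies).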

Digital signal processing: DSP systems

• Analog-to-digital (A/D) converter


• Digital signal processor (+ software)
• Digital-to-analog (D/A) converter

Signal quantization: A/D conversion

• Linear quantization (PCM-coding)


• Discrete levels: 2^n (n = number of bits)
• 16–24 bits/sample in audio (about 6 dB of SNR per bit, i.e., roughly 96 dB at 16 bits)
• Sample rate: 44100 or 48000 samples/sec
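
The numbers above follow from simple arithmetic. A sketch, assuming a uniform quantizer and the textbook rule SNR ≈ 6.02 n + 1.76 dB for a full-scale sine wave (the 96 dB figure corresponds to the 6 dB per bit part alone); function names are illustrative:

```python
def quantization_levels(bits: int) -> int:
    """Number of discrete levels of an n-bit linear quantizer."""
    return 2 ** bits


def sine_snr_db(bits: int) -> float:
    """Theoretical SNR for a full-scale sine and uniform quantization noise."""
    return 6.02 * bits + 1.76


def quantize(x: float, bits: int) -> float:
    """Round x in [-1, 1) to the nearest level of an n-bit uniform quantizer."""
    step = 2.0 / quantization_levels(bits)
    return max(-1.0, min(1.0 - step, round(x / step) * step))


for bits in (8, 16, 24):
    print(bits, quantization_levels(bits), round(sine_snr_db(bits), 1), "dB")
```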
Z-transform

Linear transform of sequence x(n) :

Unit delay as building element:

Digital filtering can be expressed as a rational function (or polynomial) of z^-1
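
The formulas for this slide are missing from the extraction. Assuming the standard definitions (the coefficient names b_k, a_k and the orders M, N are chosen here, not taken from the original), the transform, the unit delay, and a rational transfer function read:

```latex
X(z) = \sum_{n=-\infty}^{\infty} x(n)\, z^{-n},
\qquad
x(n-1) \;\longleftrightarrow\; z^{-1} X(z),
\qquad
H(z) = \frac{Y(z)}{X(z)}
     = \frac{b_0 + b_1 z^{-1} + \dots + b_M z^{-M}}
            {1 + a_1 z^{-1} + \dots + a_N z^{-N}}
```

A pure polynomial in z^{-1} (denominator equal to 1) corresponds to an FIR filter; a non-trivial denominator gives an IIR filter.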

Digital filtering: FIR filters

FIR = finite impulse response filter

Digital filtering: IIR filters

IIR = infinite impulse response filter
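
The difference between the two filter types is easiest to see from their impulse responses. A minimal sketch of a direct-form difference equation, with coefficient conventions (a[0] = 1, feedback terms subtracted) assumed here rather than taken from the slides:

```python
def lti_filter(b, a, x):
    """Direct-form difference equation.

    y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], with a[0] == 1.
    An FIR filter is the special case a == [1.0].
    """
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


impulse = [1.0] + [0.0] * 7

fir = lti_filter([0.5, 0.3, 0.2], [1.0], impulse)   # response ends after 3 samples
iir = lti_filter([1.0], [1.0, -0.5], impulse)       # one pole at z = 0.5, decays forever
print(fir)
print(iir)
```

The FIR response is just the coefficient list followed by zeros, while the one-pole IIR response halves at every step without ever reaching zero.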

Linear prediction (AR-modeling)

Modeling of signal generation with flat-spectrum excitation (impulse or noise) and an IIR (all-pole) filter. Speech example:

Signal

Windowed LP-spectra

FFT-spectrum
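
One common way to estimate the all-pole filter is the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which method was used, so the sketch below is an assumption; the toy input is the impulse response of a known one-pole filter, so the expected coefficients are easy to check:

```python
def autocorrelation(x, max_lag):
    """Unnormalised autocorrelation r[0..max_lag] of a finite sequence."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the linear prediction normal equations.

    Returns (a, err), where A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order is
    the prediction-error filter and err is the residual energy. The all-pole
    synthesis filter is 1 / A(z).
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                       # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Toy "signal": impulse response of the all-pole filter 1 / (1 - 0.9 z^-1).
x = [0.9 ** n for n in range(200)]
a, err = levinson_durbin(autocorrelation(x, 2), order=2)
print([round(v, 4) for v in a])
```

For speech, the same analysis is applied to short windowed frames, and the magnitude of 1 / A(z) gives the smooth LP spectrum that follows the formant peaks.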

Neural networks

MLF = multilayer feedforward network = multilayer perceptron

Input layer + hidden and output layer nodes with sigmoidal nonlinearity

Backpropagation algorithm for training

Hidden Markov models (HMM)

For probabilistic modeling of state sequences


Used especially in speech recognition

Audio reproduction: loudspeaker response

Magnitude response of a non-ideal loudspeaker

Group delay response of a loudspeaker

Reproduction quality: Distortion and SNR

Nonlinearity results in distortion: a sine wave input results in generation of harmonic components A(i)
Distortion (usually given in %):

Distortion in general is discussed in later chapters

Signal-to-noise ratio (SNR):
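
The formulas for both measures are missing from the extracted slide. The sketch below assumes one common convention, THD = sqrt(A(2)^2 + A(3)^2 + ...) / A(1), and SNR = 20 log10 of the RMS amplitude ratio; the original slide may have normalised differently (for example by the total signal):

```python
import math


def thd_percent(harmonics):
    """Total harmonic distortion in percent.

    harmonics -- amplitudes [A(1), A(2), A(3), ...], A(1) being the fundamental.
    """
    fundamental, *overtones = harmonics
    return 100.0 * math.sqrt(sum(a * a for a in overtones)) / fundamental


def snr_db(signal_rms, noise_rms):
    """Signal-to-noise ratio in dB from RMS amplitudes."""
    return 20.0 * math.log10(signal_rms / noise_rms)


print(round(thd_percent([1.0, 0.03, 0.04]), 3), "%")
print(round(snr_db(1.0, 0.001), 1), "dB")
```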

Response equalization

A non-flat magnitude response can be equalized (flattened) by digital filtering.
Example using so-called frequency-warped filters

Chapter 4: Speech and Music

• Speech communication
• Speech production:
– Speech production mechanism
– Vocal cords – phonation
– Vocal and nasal tract – articulation
– Units and notation of speech: vowels, consonants
– Prosody of speech
– Modeling of speech production
• Singing voice
• Speech processing: analysis, synthesis, coding, recognition
• Musical instruments as sound sources
• Music signal processing
– Sound synthesis techniques
– Physical modeling
– Digital audio vs. music

Speech communication chain

Speech production mechanism

Phonation and articulation

• Vocal cords (vocal folds) — phonation


– Generation and controlling of voiced sound at glottis
• Vocal tract and nasal tract — articulation
– Controlling of voice features by articulation organs

• Concepts:
– Glottis (vocal cord opening)
– Voiced / unvoiced / combined
– Constriction
– Formant (and antiformant)
– Vowel / consonant
– Prosodic features

Units and notation of speech – Phonetics

• Phonetics: study and description of spoken language


• Languages and language families
– Indo-European, Finno-Ugric, …
• Phonetic alphabet:
– IPA (International Phonetic Alphabet)
– Computerized: SAMPA, Worldbet, ...
• Units of spoken language:
– Phoneme (smallest linguistic unit), abstract unit class
– Allophone (variant of a phoneme)
– Phone (äänne in Finnish), a concrete unit of speech
– Diphone (from the middle of one phone, via the transition, to the middle of the next one)
– Triphone (similar combination of three successive phones)
– Speech segment (typically subunit of a phone)
Vowels (Finnish)

• Front–back (etisyys: etu–taka)


• Open–closed (suppeus: väljä–suppea)
• Rounded–unrounded (pyöreä–lavea)

Consonants (Finnish)

• Articulation place (ääntämispaikka):


– Labial, dental, palatal, velar, laryngeal
• Articulation manner (ääntämistapa):
– Stop consonant (klusiili), fricative (frikatiivi), nasal (nasaali), tremulant (tremulantti), lateral (lateraali), semivowel (puolivokaali)

Prosody (suprasegmental features)

• Intonation (intonaatio)
– Primarily by fundamental frequency trajectory
• Stress (paino)
– Primarily by intensity (loudness) of pronunciation
• Timing (ajoitus)
– Rhythmic pattern (primarily by segment durations)

Modeling of speech production

• Simplification of the speech production mechanism


– Acoustic model

Circuit model (transmission-line model)

• Glottal oscillator
– Varying cross-section between vocal cords
• Vocal tract as a transmission line
– Two-directional wave propagation
• Lip radiation (acoustic load)

• Variables: pressure and volume velocity


Signal model = Source-Filter model

• Source = excitation
– (a) voiced = quasiperiodic excitation
– (b) unvoiced = noise-like excitation
• Filter = vocal and nasal tract

Glottal oscillation

• Phonation = vibration of vocal folds


– Glottal opening is a function of time:
• Open phase, closed phase
• Glottal closure event generates the main
excitation to the vocal tract

Formants (tract resonances)

• Example: resonances of a homogeneous tube


– Volume velocity transfer function

– 17 cm tube corresponds to typical male vocal tract


– quarter-wavelength resonator with resonances at about 500, 1500, 2500, ... Hz
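
The resonance frequencies of such a tube follow from the quarter-wavelength condition. A sketch, assuming an ideal uniform tube closed at the glottis and open at the lips, and a sound speed of 340 m/s (an assumed round value):

```python
def quarter_wave_resonances(length_m, count=3, c=340.0):
    """Resonances of a uniform tube closed at one end and open at the other.

    f_n = (2n - 1) * c / (4 * L), n = 1, 2, 3, ...
    c is the speed of sound in m/s (assumed value).
    """
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


print([round(f) for f in quarter_wave_resonances(0.17)])  # [500, 1500, 2500]
```

These values are close to the formants of a neutral vowel for a typical adult male vocal tract; a shorter tract shifts all of them upward.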

Vocal tract transfer functions: vowel /i/

• Inhomogeneous vocal tract area profile /i/


– Constriction in frontal tract
– Cavity in the rear part of tract
– First formant down from neutral position
– Second formant up from neutral position

Radiation directivity of speech

• Omnidirectional at low frequencies


• Increased frontal directivity at high frequencies

Azimuth Elevation
Singing voice

• Classical singing style


– ’Singer’s formant’ around 3 kHz makes voice more audible
– In soprano singing the high fundamental frequency or a
harmonic component should match a formant
• Singing in popular music
– Style and way of voice production is free since
amplification makes it loud anyway
– Personality of voice is important

Speech processing

• Speech analysis
– Feature analysis of speech signals
• Speech synthesis
– Typically synthesis from text
• Speech recognition
– From speech to text or commands
• Speech coding
– Compression for transmission or storage
• Speech enhancement
– Improving degraded speech signals

Formant synthesis models

• Cascaded and parallel filter models

Synthesis by waveform concatenation

• Overlap-add reconstruction of voiced speech


– Fundamental frequency (pitch) can be changed

Text-to-speech synthesis

• Transforming text to speech signal


– Language-dependent text processing
– Speech signal production quite language-independent

Text-to-speech synthesis

Speech coding

• Speech signal analysis


– Typically model-based (linear prediction) where source and
filter parameters are analyzed from speech signal
• Quantization of the parameters (bit compression)
• Transmission or storage of parametrized speech
• Reconstruction of parameters
• Reconstruction of speech signal

• Encoding -> transmission -> decoding

Speech recognition

• Feature analysis of signal


– Typically mel cepstral coefficients
– Compression of data & redundancy removal
• Pattern recognition
– Comparison to speech units
– Typically by Hidden Markov Models (HMM)
• Possible postprocessing
– Language modeling
• Formal grammar
• Unlimited text is difficult

Musical instrument sounds

• String instruments
– Plucked string instruments
– Struck string instruments
– Bowed string instruments
• Wind instruments
– Brass instruments
– Woodwind instruments
• Percussion instruments
– Drums etc.

Modeling of musical instruments (string modeling)

• String model
– Two-directional waveguide (transmission line)
– Excitation (pluck) inserted to both delay lines
– Wave reflections at terminations modeled as filters
– Output is taken at bridge or pickup, sum of both lines
– The same model is applicable to wind instrument bores
(but there is a nonlinear oscillating feedback in them)

Simplified string modeling

• String model reduction (signal model)


– Two delay lines can be combined to one
– Filters in the loop can be combined to a single loop filter
– Computation is more efficient
– The so-called Karplus-Strong model is a simplified case where
initial random noise is inserted in the delay line before
synthesis and the loop filter is a simple two-tap FIR filter
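
The simplified single-delay-line model described above can be sketched in a few lines. The parameter names, the seeded noise source, and the choice of a plain two-sample average as the loop filter are assumptions for illustration:

```python
import random
from collections import deque


def karplus_strong(freq_hz, sample_rate, n_samples, seed=0):
    """Plucked-string tone from a noise-filled delay line.

    The loop filter averages two adjacent delay-line samples (two-tap FIR).
    """
    period = max(2, round(sample_rate / freq_hz))       # delay-line length in samples
    rng = random.Random(seed)
    line = deque(rng.uniform(-1.0, 1.0) for _ in range(period))
    out = []
    for _ in range(n_samples):
        first = line.popleft()
        out.append(first)
        line.append(0.5 * (first + line[0]))             # low-pass and feed back
    return out


tone = karplus_strong(220.0, 8000, 8000)   # one second at an 8 kHz sample rate
```

The averaging filter attenuates high frequencies a little on every round trip, so the tone starts bright and decays toward a mellow, nearly sinusoidal tail, much like a plucked string.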

Impulse response of a simple string model

• Impulse and magnitude responses of the previous model

Body response modeling

• String instrument body works like an LTI system (filter)

Impulse response

Magnitude response (low frequencies)

Chapter 5: Structure and Function of Hearing

• Peripheral hearing
– External ear
– Middle ear
– Inner ear (cochlea)
• Basilar membrane
• Hair cells
• Auditory nerve
• Active cochlea and nonlinearities
• Higher levels of the auditory system
• Basic properties of human hearing
– Effective hearing area (level vs. frequency)
– Equal loudness curves
– Technical measures related to hearing
• Sound level and frequency weighting functions

Approaches to hearing research

• Anatomy of hearing
– The structure of hearing organs is studied
• Physiology of hearing
– The (physiological) responses of hearing to physical
sound stimuli are studied
• Psychology of hearing
– Functional properties of auditory perception are studied
as subjects’ reactions to physical sound stimuli

• The main interest here is ’Engineering psychoacoustics’ and computational models of auditory functions

Peripheral hearing

• External ear (outer ear) Middle ear Inner ear

Schematic of peripheral hearing

• External ear (outer ear) Middle ear Inner ear

External ear and ear canal transmission

• Transfer functions
– Frontal sound source to the eardrum (solid line)
– Entrance of ear canal to the eardrum (dotted line)

• Head-related transfer functions (HRTFs) discussed later


Middle ear: Bone conduction

• Ossicles
– Malleus (hammer-shaped bone)
– Incus (anvil-shaped bone)
– Stapes (stirrup-shaped bone)

• Impedance match from air to liquid (1:3000)

Animations of middle ear function

Animations: University of Wisconsin, http://www.neurophys.wisc.edu/~ychen/auditory/fs-auditory.html

Middle ear conduction and features

• Signal transfer function is a bandpass filter

• Other middle ear features:


– Acoustic reflex
– Eustachian tube
Inner ear: the cochlea

• Cochlea is a spiral-shaped, liquid-filled tube of about 2.7 turns, about 35 mm long
• Stapes vibration enters the cochlea through the oval window
• Another window to the middle ear is called the round window
• Basilar membrane divides the cochlea into two parts

Cochlea linearized

Cross-section of the cochlea

• Basilar membrane between bony shelves


– Division to scala vestibuli and scala tympani
• Reissner’s membrane separates scala media
• Organ of Corti: hair cells
• Tectorial membrane

Basilar membrane motion: traveling waves

• Basilar membrane is a nonhomogeneous transmission line:


– Wider and more massive towards apex
– Sound pressure entering the liquid of cochlea generates a
traveling wave along the basilar membrane
– Traveling wave has maximum vibration amplitude depending
on the frequency of wave (characteristic frequency = C.F.)
– High frequencies resonate close to the oval window and low
frequencies close to helicotrema

Animation of basilar membrane motion

Basilar membrane response to a square-wave signal

• Time–position–amplitude pattern of basilar membrane movement as a response to a square-wave signal

Hair cells

• Inner hair cells, in one row


• Outer hair cells, in 3-5 rows
• Together about 15000 – 16000 hair cells
• Each hair cell is equipped on top with a U-, V-, or W-shaped bundle of filaments called stereocilia
• Neural fibers are connected to hair cells

Hair cells in the organ of Corti

Stereocilia (= ’hair bundles’ of hair cells)

Movement of the organ of Corti

Movement and activation of hair cells

Hair cells: neural conduction

• Vibration of the basilar membrane causes bending of the stereocilia; this opens ion channels, which modulates the potential within the cell
• Activation of the cell releases neurotransmitter to
synaptic junctions between hair cell and neural fibers of
the auditory nerve
• A neural spike is generated that propagates in the
auditory nerve fiber
• Next spike possible only after at least 1 ms

Activation and inhibition of hair cells

• Asymmetrical effect of stereocilia bending on firing rate


• Cochlear potentials

Phase-locking and synchrony of neural firing

• Statistically phase-locked within half cycle
• Statistical synchrony of neural firing

Passive vs. active cochlea

• Georg von Békésy found basilar membrane behavior by experimentation with ears from dead animals
=> reduced frequency resolution
• Explanation: second filter needed
• Now it is known that the cochlea is active:
– Especially at low signal levels the outer hair cells amplify
basilar membrane motion
• Outer hair cells receive many efferent neural fibers from
higher neural levels
• Outer hair cells are able to change their length very
rapidly (in synchrony with high audio frequencies)
• Otoacoustic emission (cochlear echo) as a response to
external stimulus, recordable in the ear canal, is related to
this phenomenon
Auditory nerve responses: firing rate

• Steady-state firing rate is a saturating function of level, starting from a spontaneous rate (= rate without sound excitation)
• There are fibers with different sensitivity (and
spontaneous rate)

Poststimulus time histogram (PST)

• Firing rate overshoot and undershoot with onset and offset of excitation
– Works like automatic gain control

PST with steady-state sinusoidal excitation

• Statistically, half-wave rectification appears along with automatic gain control

Firing rate saturation for a vowel excitation

• For increasing level of excitation, the firing rate profile (’neural activation spectrum’) saturates

Tuning curves for constant firing level

• If the firing rate of a neural fiber is kept constant for varying excitation frequency, a tuning curve is obtained
• This characterizes the frequency selectivity of cochlea

Effects of active cochlea

• Low-level signals are amplified substantially by the active cochlea:
– Sensitivity of hearing is increased
– Due to AGC-like compression, the narrow dynamic range
(about 25 dB) of hair cells is expanded to more than 100 dB
• Selectivity (frequency resolution) is increased
(especially at low signal levels) due to active function
• If outer hair cells are damaged, the active
amplification is degraded or disappears
– Loss of auditory sensitivity
– Tuning curves are broadened
– Otoacoustic emissions disappear

Cochlear nonlinearity: Two-tone suppression

• Addition of another tone (shaded area in the figure) suppresses the activation due to the probe tone at its characteristic frequency (= a kind of masking)

Cochlear nonlinearity: Combination tones

• Nonlinear interaction of two tones generates new tones that are perceived:
– Difference tone: fdiff = f2 – f1
• E.g.: 1.1 kHz and 1.0 kHz => 100 Hz
– Cubic difference tone: fcubic = 2f1 – f2
• E.g.: 1.0 kHz and 1.1 kHz => 900 Hz
• Appears already at low level of excitation

Central auditory system

• Higher-level functions are not well known.
• Cochlear nucleus has specific cells such as ’chopper cells’ that do temporal processing. Spectral information is recovered unsaturated.
• Binaural hearing starts at the superior olive level.
• Auditory cortex is the center for processing perceptions and integrating the sound scene.
• Interaction with other senses (vision) is strong.
Dynamic range of hearing

Sound level ’thermometer’

6 dB steps

3 dB steps

1 dB steps

Equal loudness curves and threshold of hearing

• Equal loudness level perception, unit phon = SPL at 1 kHz

Sound level and frequency weighting curves

• Weighting filters for sound level measurement (A most common)

Recommended frequencies and bands

• Recommended frequencies and frequency bands for measurements and technical applications:

• Octave = 2:1
• 1/2 octave
• 1/3 octave
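
Fractional-octave band frequencies can be generated from a reference frequency. The sketch assumes exact base-2 spacing around 1 kHz; the standardized nominal values that appear in tables are rounded (for example 1250 Hz rather than 1259.9 Hz), and standards may use base-10 spacing instead:

```python
def band_center_hz(index, fraction=3, ref_hz=1000.0):
    """Exact centre frequency of a 1/fraction-octave band.

    index 0 is the band centred on ref_hz; each step is 1/fraction octave.
    """
    return ref_hz * 2.0 ** (index / fraction)


def band_edges_hz(center_hz, fraction=3):
    """Lower and upper band edge, half a band width either side of the centre."""
    half = 2.0 ** (1.0 / (2 * fraction))
    return center_hz / half, center_hz * half


for i in range(-3, 4):
    fc = band_center_hz(i)
    lower, upper = band_edges_hz(fc)
    print(f"{fc:7.1f} Hz  ({lower:7.1f} to {upper:7.1f})")
```

Setting `fraction=1` gives full octave bands, and `fraction=2` gives half-octave bands.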

Filtered noise demo

• White noise

• Low-pass filtered noise, decreasing cutoff frequency

• High-pass filtered noise, increasing cutoff frequency

• 1/3 octave noise, increasing center frequency

• White and pink noise

Chapter 6: Fundamentals of Psychoacoustics

• Psychoacoustics = auditory psychophysics


• Sound events vs. auditory events
– Sound stimuli types, psychophysical experiments
– Psychophysical functions
• Basic phenomena and concepts
– Masking effect
• Spectral masking, temporal masking
– Pitch perception and pitch scales
• Different pitch phenomena and scales
– Loudness formation
• Static and dynamic loudness
– Timbre
• as a multidimensional perceptual attribute
– Subjective duration of sound

Psychophysical experimentation

• Sound events (s_i) = physical (objective) events
• Auditory events (h_i) = subject’s internal events
– Need to be studied indirectly from reactions (b_i)
• Psychophysical function h = f(s)
• Reaction function b = f(h)

Sound events: Stimulus signals

• Elementary sounds
– Sinusoidal tones
– Amplitude- and frequency-modulated tones
– Sinusoidal bursts
– Sine-wave sweeps, chirps, and warble tones
– Single impulses and pulses, pulse trains
– Noise (white, pink, uniform masking noise)
– Modulated noise, noise bursts
– Tone combinations (consisting of partials)
• Complex sounds
– Combination tones, noise, and pulses
– Speech sounds (natural, synthetic)
– Musical sounds (natural, synthetic)
– Reverberant sounds
– Environmental sounds (nature, man-made noise)
Sound generation and experiment environment

• Reproduction techniques
– Natural acoustic sounds (repeatability problems)
– Loudspeaker reproduction
– Headphone reproduction
• Reproduction environment
– Not critical in headphone reproduction
– Anechoic chamber (free field)
• Room effects minimized
• Not a natural environment
– Listening room
• Carefully designed, relatively normal acoustics
– Reverberation chamber
• Special experiments with diffuse sound field
Psychophysical functions

• Sound event property to auditory event property mapping

h = a log(s) (Weber, Weber-Fechner law)

h = c s^k (power law; e.g., loudness)

Experimental concepts: Thresholds

• Threshold values
– Absolute thresholds (e.g., threshold of hearing)
– Difference thresholds (just noticeable difference, JND)

Example: Threshold of perception:


- 50%, 75%, etc. thresholds

Experimental concepts

• Comparison of percepts
– Magnitude estimation
– Magnitude production
• Probe tone method
– Generation of a probe tone to make test tone
audible/noticeable
– Modulation, canceling, interference
• Classification and scaling of percepts
– Nominal scale (rough, sharp, reverberant, …)
– Ordinal scale (percepts have ordering)
– Interval scale (numeric scale, no zero point defined)
– Ratio scale (numeric scale, zero point defined)
• Multidimensional scaling
– Semantic differentials: low – high, dull – sharp, ...

Psychoacoustic experiments

• Description of auditory events


– Oral or written description
• Method of adjustment
– Adjusting a stimulus to correspond to a reference
• Selection methods
– Forced choice methods (select one!):
• Two alternative forced choice (TAFC, 2AFC)
• Method of tracking
– Tracking with varying stimulus
• Békésy audiometry
• Bracketing method
– Descending and ascending bracketing
• Yes/no answering
• Reaction time measurement
– Indicates the difficulty of decision task

Békésy audiometry

• Slow frequency sweep and level tracking

Typical psychoacoustical test types

• AB test
– Set in preference order / select one
– AB hidden reference (one must be recognized)
• AB scale test
– As AB but assign numeric values for A and B
• ABC test
– A is fixed reference (anchor point) for assigning
values for B and C
• ABX test
– Which one, A or B, is equal to X ?
• TAFC (2AFC)
– Two alternative forced choice

• Formation of a listening test panel


• Formation of a description language
Masking effect

• ”A loud sound makes a weaker sound imperceptible”


• Categories and aspects of masking
– Frequency masking
– Temporal masking
– Time-frequency masking
– Frequency selectivity of the auditory system
– Psychophysical tuning curves
– Critical band
• Bark bandwidth
• ERB bandwidth

• Masking tone and test tone

Frequency masking

• Masking by white noise

Frequency masking

• Masking by narrow-band noise (0.25, 1, 4 kHz)

Frequency masking

• Frequency masking as a function of masker level

Frequency masking

• Frequency masking by lowpass and highpass noise

Frequency masking

• Frequency masking by 1 kHz sinusoidal signal

Frequency masking

• Frequency masking by a complex tone (harmonic complex)

Temporal masking

• Masking before and after a noise signal

Temporal masking

• Beginning of postmasking

Temporal masking

• Postmasking as a function of time


– For 200 ms long masker
– For 5 ms long masker

Time-frequency masking

• Masking of a tone burst in time and frequency by a time-frequency block of noise

Temporal masking

• Masking due to an impulse train

Frequency selectivity of hearing

• Masking curves tell much about auditory selectivity


• Psychophysical tuning curves match with physiological curves

Critical band experiment

• Experiment: loudness vs. bandwidth of noise

Critical band

• Loudness vs. bandwidth of noise


– Loudness increases when bandwidth exceeds
a critical band

Critical band (Bark band) vs. frequency

• Critical band (Bark band) fG vs. mid frequency


• Ref: just noticeable tone frequency change vs. frequency

Critical band: 24 Bark bands (Zwicker)

ERB band experiment

• ERB = Equivalent Rectangular Bandwidth


• Loudness of a tone is measured as a function of frequency
gap in masking noise around the test tone
• ERB band is narrower than Bark band, especially at low frequencies

Pitch scales

• Pitch = subjective measure of tone height


• Mel scale
or

• Bark scale

or

Inverse function:
• ERB scale

Inverse :
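
The scale formulas on this slide were lost in extraction, and several variants exist in the literature. The sketch below uses commonly cited approximations (O'Shaughnessy's mel formula, the Zwicker and Terhardt arctangent formula for Bark, the Glasberg and Moore ERB-rate formula); these may differ from the exact expressions shown on the original slide:

```python
import math


def hz_to_mel(f):
    """Mel scale, O'Shaughnessy's formula (one of several in use)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel()."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def hz_to_bark(f):
    """Bark scale, Zwicker and Terhardt's arctangent approximation."""
    return 13.0 * math.atan(0.00076 * f) + 3.5 * math.atan((f / 7500.0) ** 2)


def hz_to_erb_rate(f):
    """ERB-rate scale, Glasberg and Moore's formula."""
    return 21.4 * math.log10(1.0 + 0.00437 * f)


def erb_rate_to_hz(e):
    """Inverse of hz_to_erb_rate()."""
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


for f in (100, 1000, 4000):
    print(f, round(hz_to_mel(f)), round(hz_to_bark(f), 2), round(hz_to_erb_rate(f), 2))
```

All three scales are roughly linear at low frequencies and roughly logarithmic at high frequencies, which reflects place coding along the basilar membrane.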
Logarithmic pitch scale

• Logarithmic scale used in music and audio


• Frequency ratios more important than absolute frequencies
• Octave and ratios of small integers important

Comparison of pitch scales

• Pitch scales are related to place coding on the basilar membrane, although they are measured by psychoacoustic experiments

Comparison of pitch scales

• Comparison (log reference) of:


– logarithmic scale
– ERB scale
– Bark scale
– linear scale

Comparison of pitch scales

• Comparison (linear reference) of:


– logarithmic scale
– ERB scale
– Bark scale
– linear scale

Pitch phenomena

• Pitch of a pure tone as a function of amplitude


– Individually varying property

JND of frequency modulation

• Frequency modulation JND threshold


– As a function of carrier frequency
– As a function of modulation frequency
– About 4 Hz modulation most easily perceivable

Minimum duration of a tone for pitch percept

• Duration to make pitch perceivable


– Duration in milliseconds
– Duration of two cycles as a reference

JND pitch change vs. tone duration

• Threshold of perceived pitch variation increases below 200 ms duration

Pitch strength

• How strong or weak is a pitch percept?

Pitch phenomena and theories

• Place (spectral) pitch vs. temporal pitch theories


• Spectral pitch (due to spectral peak)
• Temporal pitch (periodicity)
• Missing fundamental
• Virtual pitch
• Repetition pitch
• Pitch of inharmonic signals
• Absolute pitch (memory)

Loudness

• Loudness is the perceived subjective ’strength’ (’volume’, ’intensity’, etc.) of a sound
– Subjective scale defined in relation to physical scale
– Unit is sone: 1 sone corresponds to 40 dB SPL at 1 kHz

Loudness of a sinusoidal tone

• Loudness N vs. SPL of a 1 kHz tone


– Power law found to match best

Loudness vs.
loudness level :

Power law:

More precisely:
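
The formulas for this slide are missing from the extraction. A sketch, assuming the standard sone definition in which loudness doubles for every 10 phon above 40 phon (equivalent to a power law with an exponent of about 0.3 on intensity); the more precise expression referred to on the slide is not reproduced here:

```python
import math


def phon_to_sone(loudness_level_phon):
    """Loudness in sones from loudness level in phons.

    N = 2 ** ((L_N - 40) / 10); a reasonable approximation above about 40 phon.
    """
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


def sone_to_phon(loudness_sone):
    """Inverse mapping, loudness level in phons from loudness in sones."""
    return 40.0 + 10.0 * math.log2(loudness_sone)


for level in (40, 50, 60, 80):
    print(level, "phon =", phon_to_sone(level), "sone")
```

For a 1 kHz tone the loudness level in phons equals the SPL in dB, so the table also reads as loudness versus SPL at 1 kHz.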

Partial loudness (by noise masking)

• Partial loudness of 1 kHz tone in presence of masking noise


– As a function of tone level and masking noise level

Loudness example: two tones

• Loudness of a pair of tones as a function of frequency difference


– Slow beat range: loudness due to peaks (6 dB over 60 dB)
– Medium rate fluctuation: power doubled => 3 dB increase
– Fast fluctuation: wideband signal => loudness doubled (10 dB)

Loudness computation (Zwicker formulation)

• Excitation signal => power spectral density on the Bark scale

• Spreading function B(z), such as

• Convolution by spreading function

• Loudness density

• Total loudness
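
The chain above (excitation, spreading, loudness density, total loudness) can be sketched as follows. This is an uncalibrated illustration, not the Zwicker procedure itself: it assumes a Schroeder-type spreading function and a compressive exponent of 0.23 for the loudness density, since the slide's own formulas did not survive. All names are hypothetical.

```python
import math

def spreading_db(dz):
    """Schroeder-type spreading function in dB for a Bark distance dz (assumed form)."""
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def excitation(band_power):
    """Convolve per-Bark-band power with the spreading function (power domain)."""
    n = len(band_power)
    exc = []
    for i in range(n):          # analysis band
        total = 0.0
        for j in range(n):      # source band
            total += band_power[j] * 10.0 ** (spreading_db(i - j) / 10.0)
        exc.append(total)
    return exc

def loudness_density(exc, exponent=0.23):
    """Compress the excitation; the exponent is an assumed, uncalibrated value."""
    return [e ** exponent for e in exc]

def total_loudness(band_power):
    """Sum the loudness density over all Bark bands (arbitrary units)."""
    return sum(loudness_density(excitation(band_power)))

if __name__ == "__main__":
    narrow = [0.0] * 24
    narrow[8] = 24.0            # all power in one band
    wide = [1.0] * 24           # same total power spread over 24 bands
    print(total_loudness(narrow), total_loudness(wide))
```

With equal total power, the wideband input yields the larger value, which matches the observation on the two-tone slide that a wideband signal is louder than a narrowband one.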

Loudness computation, examples

• Left: excitation level for sinusoidal tone and white noise
• Right: loudness density for sinusoidal tone and white noise

Loudness graphically

• Graphical chart determination of loudness (Zwicker)

JND of loudness level

• Just noticeable difference by amplitude modulation
– Modulation of 1 kHz tone
– Modulation of white noise
– Modulation frequency 4 Hz

JND of loudness level

• Just noticeable difference by amplitude modulation
– As a function of modulation frequency
– Modulation of 1 kHz tone
– Modulation of white noise

Modulation detection

• Detection of amplitude and frequency modulation
– Amplitude modulation easily detectable by ’off-band listening’
(loudness modulated due to upper spreading slope variation)
– No slope variation in frequency modulation

Loudness vs. duration

• Temporal integration of loudness for duration < 200 ms
– Loudness level decreases by 10 phon for a 10-fold decrease in duration

Loudness formation temporally

• Loudness formation for different durations of a tone burst
– Peak value of total loudness is tracked in time-varying cases

Timbre (perceived ’sound color’)

• Timbre is a multidimensional attribute of sound
– For stationary sounds:
• Spectrum (loudness spectrum)
• Periodicity (periodic, multiperiodic, noise-like)
• Repetitiveness (reflections, reverberation, spatialness)
– For time-varying signals:
• Amplitude envelope important
– Amplitude envelope at each critical band
– For transients and onsets:
• Changes are more prominent than steady-state parts,
especially onsets

Subjective duration

• Subjective vs. objective duration

Auditory Demonstrations 1

1 Cancelled harmonics
2-6 Critical bands by masking
7 C.B. by loudness comparison
8-11 The decibel scale
12-16 Filtered noise
17-18 Frequency response of the ear
19-20 Loudness scaling
21 Temporal integration
22 Asymmetry of masking by pulsed tones
23-25 Backward and forward masking
26 Pulsation threshold
Auditory Demonstrations 2

27-28 Dependence of pitch on intensity
29 Pitch salience and tone duration
30 Influence of masking noise on pitch
31 Octave matching
32 Stretched and compressed scales
33 Frequency difference limen
34-35 Log and lin frequency scales
36 Pitch streaming
37 Virtual pitch (missing fundamental)
38-39 Shift of virtual pitch
40-42 Masking spectral and virtual pitch
Auditory Demonstrations 3

43-45 Virtual pitch with random harmonics
46-47 Strike note of chime
48 Analytic vs synthetic pitch
49-51 Scales with repetition pitch
52 Circularity in pitch judgment
53 Effect of spectrum on timbre
54-56 Effect of tone envelope on timbre
57 Change in timbre with transposition
58-61 Tones and tuning with stretched partials
62-63 Primary and secondary beats

Chapter 7: Other psychoacoustic concepts

• Sharpness
– Spectral center of gravity
• Fluctuation strength
– Perception of slow modulations (beats)
• Impulsiveness
• Roughness
– Perception of fast modulations
• Tonality
– Periodic vs. random excitation
• Sensory pleasantness
• Psychoacoustic concepts and music
– Sensory consonance and dissonance
– Intervals, scales, and tunings
– Rhythm, tempo, bar, measure
• Perceptual organization of sound

Sharpness

• Perceived sharpness is proportional to spectral center of gravity
• Unit of sharpness is 1 acum, corresponding to 1 Bark wide noise centered at 1 kHz at 60 dB
• Sharpness for 1 Bark wide noise, lowpass noise, and highpass noise
• Increase of level from 30 dB to 90 dB doubles the sharpness

Bandpass noises:

Computation of sharpness

• Sharpness can be estimated (without level effect) from

where the weighting function is defined by the curve:

Fluctuation strength

• Perception of relatively slow modulations: fluctuation strength
• Highest sensitivity to modulation at 4 Hz
• Unit of fluctuation strength is 1 vacil, corresponding to a 1 kHz, 60 dB tone
with 100 % amplitude modulation at 4 Hz
• Figure: (a) AM broadband noise, (b) AM sinusoidal tone,
(c) FM sinusoidal tone
• Modulation frequencies shown: 1 Hz, 4 Hz, 16 Hz

Fluctuation strength

• Left: fluctuation strength for AM (4 Hz) wideband noise (60 dB)
• Right: sine tone, 1.5 kHz, 70 dB, modulated at 4 Hz, as a function
of FM deviation

Fluctuation strength

• Fluctuation strength computation:

Impulsiveness

• There is no clearly defined psychoacoustic concept of impulsiveness
• Impulsiveness is related to rapid onsets in signal
• If the repetition rate of impulses is > 10–15 Hz, roughness is perceived
• In noise control, impulsiveness is considered to increase hearing
damage risk compared to non-impulsive sound of same energy

Roughness

• Fast (> 15 Hz) modulation is perceived as roughness
• Addition of two tones of different frequencies creates envelope
fluctuation
• When the frequency difference increases, tones start to segregate
• When the frequency difference is larger than a critical band,
roughness disappears
• Examples: tones at 1 kHz and 1 kHz + f, with f = 7 Hz, 70 Hz, 300 Hz

Roughness

• Unit of roughness is 1 asper, corresponding to a 1 kHz, 60 dB tone
with 100 % amplitude modulation at 70 Hz
• Towards lower and higher modulation frequencies the roughness
decreases

Roughness

• Roughness for different carrier frequencies as a function of
amplitude modulation frequency, with 100 % modulation

Tonality

• Tonality (tonalness) = sound exhibits voiced component(s), periodicity
• Non-tonal sound is noise-like, non-periodic
• Non-tonal (noisy) signal masks a tonal one more easily than vice versa
• For tonality index α, critical band index i, the masking threshold is:
– (α = 0.0: non-tonal, α = 0.5: half-tonal, α = 1: fully tonal)

• Tonality with varying modal density, log. distribution of frequencies (approx/critical band):
10/CB 20/CB 40/CB 80/CB

Sensory pleasantness

• Sensory pleasantness (example by Zwicker):
– P = sensory pleasantness
– S = sharpness
– R = roughness
– T = tonality
– N = loudness
– Product sound quality measures are often constructed by
similar techniques.

Sensory consonance and dissonance

• Consonance and dissonance are closely related to roughness
• Consonance vs. dissonance of two partials:

Consonance and dissonance of harmonic tones

• Roughness due to interaction of partials in a sound contributes to
dissonance
• Ratios of small integers are most consonant (just intonation)
• Consonance vs. dissonance of two harmonic complexes:

Examples of intervals

• Pythagoras noticed that intervals 2:1, 3:2, and 4:3 sound ”pleasant”
• Consonant intervals (decreasing order of consonance):
– 2:1 octave
– 3:2 perfect fifth (equally tempered fifth: 1.4983)
– 4:3 perfect fourth
– 5:3 major sixth
– 5:4 major third (equally tempered third: 1.2599)
– 8:5 minor sixth
– 6:5 minor third
– 16/15 (dissonant)
– 40/27 (dissonant)
Examples of intervals

Octave and its partitioning

• Log and lin uniformly spaced scales
• Which one is the best octave?
• Stretched and compressed scales

Circularity of pitch

• Shepard effect

Intervals, scales, tuning

• Just intonation, Pythagorean scale, (equally) tempered scale
• On a tempered scale a semitone is a frequency ratio of 2^(1/12), i.e. about 1:1.05946
• 1 cent is 1/100 of a semitone
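
The relation between frequency ratios, tempered semitones, and cents can be checked with a few lines of Python. This is a minimal sketch with hypothetical names; the just-intonation ratios are the ones listed on the interval slide.

```python
import math

SEMITONE = 2.0 ** (1.0 / 12.0)   # equally tempered semitone, about 1.05946

def cents(ratio):
    """Size of a frequency ratio in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(ratio)

def tempered_ratio(semitones):
    """Frequency ratio of an equally tempered interval."""
    return SEMITONE ** semitones

# (just ratio, number of tempered semitones)
INTERVALS = {
    "perfect fifth": (3 / 2, 7),
    "perfect fourth": (4 / 3, 5),
    "major third": (5 / 4, 4),
    "minor third": (6 / 5, 3),
}

if __name__ == "__main__":
    for name, (just, semis) in INTERVALS.items():
        diff = cents(tempered_ratio(semis)) - cents(just)
        print(f"{name:15s} tempered {tempered_ratio(semis):.4f}  deviation {diff:+.1f} cents")
```

The tempered fifth comes out within about 2 cents of 3:2, whereas the tempered major third deviates from 5:4 by roughly 14 cents.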
Non-western scales and tunings

• The (tempered) western scale is adapted to a multitude of
harmonic timbres of western instruments
• For example the Balinese gamelan music is quite different
– W. A. Sethares: Tuning, Timbre, Spectrum, Scale. Springer 1998
• Example of tuning where octave is a very dissonant interval!
• Tunings and musical scales are strongly bound with spectral
properties of musical instruments

Temporal structures in music: Rhythm, tempo

• Rhythm: periodicity and repeated structure in music
• Tempo: rate of main events in music
• Beat: positioning of emphasis on some events
• Measure: basic rhythmic sequence
• Duration of a note or another basic unit

Perception of magnitude and phase spectrum

• Magnitude
– 1 dB deviation per critical band noticeable in direct comparison.
Even smaller deviations can be noticed by trained ”golden ears”
– Even ± 3...5 dB deviations are not easy to ”perceive” when there is
no immediate reference (except for well trained listeners)
– Magnitude response deviations = spectral coloration
• Phase and time differences
– The auditory system is relatively insensitive to phase (Helmholtz)
in general: magnitude spectrum more important than phase
spectrum, but sometimes phase is important
– Phase functions from Fourier analysis are circular and difficult to
analyze and interpret
– Group delay (phase derivative) is a relatively good perceptual
measure which describes the delay of modulation (not the carrier)

Perception of phase: extreme cases

• Special phase effects:
– The following two signals have the same magnitude spectrum but
sound (as well as look) different

This is what the response looks like in a single critical band

Perceptual organization of sound

• Streaming (sequential grouping) of pitch sequences:
– Slow repetition: one stream perceived
– Fast repetition: segregation into two separate streams
• Figure: tones A to F over time, (a) heard as one stream, (b) heard as two streams

Perceptual organization of sound

• Streaming may also change the perceived rhythm:
– Large separation: B-D-F vs. A-C-E
– Small separation: B-D vs. A-C-E-F
• Figure: the tones over time, divided into an upper stream and a lower stream in each case

Perceptual organization of sound

• Streaming with increasing tempo
– Increasing tempo or frequency difference leads to segregation of multiple streams
• Figure: stream segregation over time (panel label: timbre/texture)

Perceptual organization of sound

• Streaming or segregation as a function of frequency
difference and repetition period
• Figure: regions labeled ”always separated”, ”separated or coherent”, and
”always coherent”, plotted over repetition period (0 to 500 msec) and
frequency difference (scale 0 to 20)
Auditory scene analysis

• Auditory scene analysis
– Bregman: Auditory scene analysis (MIT Press, 1990)
• Sequential integration and segregation
– Spectral vs. temporal relations
– Spatial cues in segregation
• Integration and segregation of simultaneous auditory components
– Spectral vs. temporal relations
– The ”old-plus-new” heuristics
– Spatial cues in segregation
• Primitive auditory organization
– Built-in and low-level mechanisms
• Schema-based auditory organization
– Learning of stream integration and segregation

Computational auditory scene analysis (CASA)

• Computational auditory scene analysis (CASA) is an attempt to
computationally simulate and model human auditory scene analysis
– Sound source segregation (separation)
– Multipitch signal analysis of harmonic sound mixtures
– Bottom-up vs. top-down driven processing
– Prediction-driven processing
– Spatial source separation (cocktail-party effect)
– Applications:
• Audio content analysis and content-based coding
• Automatic music transcription
• Speech recognition

Spatial hearing

Ville Pulkki
Laboratory of Acoustics and Audio Signal Processing
Helsinki University of Technology
Espoo, Finland
http://www.acoustics.hut.fi/
Ville.Pulkki@hut.fi
TKK, Laboratory of Acoustics and Audio Signal Processing, 26.3.2002

Sound in space


Spatial hearing
Directional hearing
• Accuracy of directional hearing
• Theory of directional hearing
Distance hearing
Perception of space
Spatial sound reproduction


Transfer function from the sound source to the ear canal

Head Related Impulse Response (HRIR)
Head Related Transfer Function (HRTF)
© Duda: http://interface.cipic.ucdavis.edu/CIL tutorial/


Measuring HRTFs

© Algazi et al.: http://interface.cipic.ucdavis.edu/


Dependence of the HRTF on sound source direction

• Figure: head-related impulse responses (0 to 2 ms) at the left and right ears for
(a) ϕ = 0, δ = 0, (b) ϕ = 60, δ = 0, (c) ϕ = 0, δ = 60. © M. Karjalainen


Dependence of the HRTF on the azimuth of the sound source

© Algazi et al.: http://interface.cipic.ucdavis.edu/


Dependence of the HRTF on the elevation of the sound source

© Algazi et al.: http://interface.cipic.ucdavis.edu/


Dependence of the HRTF on sound source direction

• Figure: HRTF magnitude responses (0 to -40 dB, 100 Hz to 10 kHz) at the left and right ears for
(a) ϕ = 0, δ = 0, (b) ϕ = 60, δ = 0, (c) ϕ = 0, δ = 60. © M. Karjalainen


Accuracy of directional hearing in the horizontal plane

• Figure: direction of the sound event vs. direction of the auditory event (ϕ). © M. Karjalainen
– Sound event at 0°: auditory event at 359° ± 3.6°
– Sound event at 90°: auditory event at 80.7° ± 9.2°
– Sound event at 180°: auditory event at 179.3° ± 5.5°
– Sound event at 270°: auditory event at 281.6° ± 10°


Accuracy of directional hearing in the median plane

• Figure: directions of sound events (δ = 0°, 36°, 90° at ϕ = 0° and ϕ = 180°) and of the
corresponding auditory events, with localization blur between ±9° and ±22°. © M. Karjalainen


Lateralization experiments

• Figure: test setups with (a) delay circuits τph1, τph2 and (b) attenuators a1, a2
in the signal paths. © M. Karjalainen


Lateralization experiments, time delay

• Figure: perceived lateral position (left to right) as a function of the interaural
phase delay τph / μs, with regions marked ”left earlier” and ”left later”. © M. Karjalainen


Lateralization experiments, properties
Advantages:
• Any ITD-ILD combination can be produced freely
• Basic results
Problems:
• Unnaturalness
• Localization inside the head
• Reproduction of high frequencies differs between listening sessions


Directional hearing

Cues:
• Binaural cues
– Interaural time difference
– Interaural level difference
• Monaural spectrum
• Effect of head movements on binaural cues
• Suppression of reflections


Binaural cues

• Interaural Time Difference
• ITD


Frequency dependence of ITD
• Low frequencies, ~200 to ~1600 Hz: time delay of the carrier
• High frequencies, > ~1600 Hz: time delay of the envelope
• Figure: left- and right-ear waveforms with the ITD marked in both cases


Modeling of ITD

• Figure: network of delay elements (τ) fed from the left ear and from the right ear,
with outputs labeled left, center, right. © M. Karjalainen


Modeling of ITD

• Figure: block diagram with a GTFB for each ear, half-wave rectification and low-pass
filtering, IACC per band, and the resulting ITD spectrum and composite IACC
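
The cross-correlation idea behind these ITD models can be sketched in a few lines. The example below is a broadband simplification with hypothetical names: it skips the filterbank, rectification, and low-pass stages of the block diagram and simply takes the lag that maximizes the interaural cross-correlation within an assumed range of ±1 ms.

```python
import math
import random

def estimate_itd(left, right, fs, max_itd_s=0.001):
    """Lag (in seconds) that maximizes the interaural cross-correlation.

    A positive result means the right-ear signal lags the left-ear signal.
    """
    max_lag = int(round(max_itd_s * fs))
    n = min(len(left), len(right))
    best_lag, best_val = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        if acc > best_val:
            best_val, best_lag = acc, lag
    return best_lag / fs

if __name__ == "__main__":
    rng = random.Random(0)
    fs = 48000
    left = [rng.gauss(0.0, 1.0) for _ in range(2048)]
    delay = 24                                  # 0.5 ms at 48 kHz
    right = [0.0] * delay + left[:-delay]       # right ear receives the sound later
    print(estimate_itd(left, right, fs))
```

A per-band version would apply the same search to each auditory filter channel, giving an ITD value per frequency band.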


Cross-correlation in ERB channels

• Figure: band cross-correlation functions for ERB channels between 200 Hz and 21 kHz,
plotted against direction (90° ... 0° ... 90°)


Frequency dependence of ITD

• Figure: ITD [ms] as a function of direction [degree] (-90 to 90) and frequency [kHz] (0.2 to 18.2)


Binaural cues

• Interaural Level Difference
• ILD


Modeling of ILD

• Figure: block diagram with a GTFB for each ear followed by loudness level (LL) computation;
the outputs form a composite loudness level (CLL) spectrum and an ILD spectrum


Frequency dependence of ILD

• Figure: ILD [phon] (-60 to 60) as a function of direction [degree] (-90 to 90) and frequency [kHz] (0.2 to 18.2)


Cone of confusion

• Figure: a sound source on the cone of confusion, with the angles θcc and φcc
• ITD and ILD determine which cone of confusion the sound source is on
– effect of the pinna and the body
– head movements


Effect of head movements on binaural cues

• Figure: head rotation with three source positions marked:
– ITD & ILD constant
– ITD & ILD change a lot
– ITD & ILD change a lot in the opposite direction
- a coarse cue


Effect of the body

Pinna, head, body
Spectrum changes, ILD changes


Effect of the pinna

• The cavities of the pinna color the sound depending on its direction of arrival


Effect of elevation on the spectrum

• Figure (panels 1 and 2): loudness level spectrum [phon] (-30 to 30) as a function of
elevation [degr] (-30 to 90) and frequency [kHz] (0.2 to 18.2)

Auditory spectrum in the median plane, with the direction-independent part averaged out.


Effect of elevation on the spectrum

• Figure (panels 3 and 4): loudness level spectrum [phon] (-30 to 30) as a function of
elevation [degr] (-30 to 90) and frequency [kHz] (0.2 to 18.2)

Auditory spectrum in the median plane, with the direction-independent part averaged out.


Reliability of cues
If the cues conflict:
• Signal spectrum < ~1000 Hz
– ITD usually the strongest
– ILD weak, trading?
• Higher frequencies
– ITD and ILD both strong
– ILD sometimes stronger
• The more consistent cue wins [Wightman]
• Several direction percepts may arise
• Size of the sound source
• Individuality


Physiology of directional hearing

© Kalat 1998


Precedence effect

The cues are relevant only when the direct sound dominates


Precedence effect

• Figure: loudspeakers SO and ST at ϕ = 40° and ϕ = -40° (α = 80°); direction of the auditory
event as a function of the delay τph of ST (0 to 2 ms and 20 to 50 ms), showing the first
auditory event, the echo threshold, and the echo


Detection thresholds of echoes

• Figure: level difference LST - LSO (dB, -40 to 40) as a function of the delay of ST (0 to 100 ms), with curves for:
– first sound event no longer distinguishable (primary sound suppressed) (≥ 6 persons)
– first sound event and echo equally loud (≥ 6 persons)
– echo disturbing (80 persons)
– masking threshold (1-2 persons)


Effect of loudness in the free field

• Figure: distance of the auditory event / m as a function of the distance of the
sound source / m, average of five subjects. © M. Karjalainen


Perception of distance

Cues
• Loudness
• Binaural cues
• Ratio of direct sound to the reverberant field
• Spectrum


Spatial sound reproduction methods

• Conventional reproduction
– Monophony
– Stereophony
– Multichannel 2-D
– Multichannel 3-D
• Binaural reproduction
– Headphones
– Loudspeakers, crosstalk cancellation


Monophonic reproduction


Stereophonic reproduction


“Surround” reproduction


3-D multichannel reproduction


Binaural reproduction

• Figure: signal paths from a mono source xm through Hl and Hr to the ear signals yl, yr, and
from the signals ŷl, ŷr through Hi and Hc to the ear signals yl, yr. © M. Karjalainen

In the simplest case, a dummy-head or real-head recording is listened to over headphones.


Binaural reproduction

• Figure: block diagrams for the conversions (a) mono to binaural (Hl, Hr), (b) binaural to
transaural (1/(Hi + Hc), 1/(Hi − Hc)), (c) stereo to transaural (Hl ± Hr, Hi ± Hc),
(d) transaural to binaural (Hi + Hc, Hi − Hc). © M. Karjalainen


Chapter 9: Auditory modeling

• Simple psychoacoustic models
– Psychoacoustic spectrum and spectrogram
– Mel spectrum and cepstrum
– Perceptual linear prediction
– Examples of auditory spectra
• Auditory filter bank models
– Gammatone filterbanks
– Inner ear simulation models
– Temporal dynamics and masking
• Cochlear models
– Basilar membrane models
– Hair cell models
• Modeling of higher level functions
– Pitch and periodicity analysis
– Speech specific models
– Computational auditory scene analysis
• Binaural auditory modeling

Simple psychoacoustic modeling

• Problems with Fourier spectrum from auditory perception viewpoint:
– Linear frequency scale vs. critical band scale
– Level (dB) vs. loudness scaling
– Frequency bins vs. spreading and masking
– Flat response vs. equal loudness sensitivity
– Windowing vs. temporal integration and masking
– Temporal adaptation in auditory perception

Auditory spectrum through FFT

Examples of psychoacoustic spectra

• Auditory spectra
– Sinewave (400 Hz)
– White noise

Examples of psychoacoustic spectra

• Vowel /a/ and fricative /s/
– Fourier spectrum vs. auditory spectrum

Mel frequency cepstral coefficients

• MFCC computation
– FFT, mel warping, logarithm, inverse cosine transform
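
The processing chain listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the exact procedure of the slides: it uses a naive DFT in place of an FFT, the common 2595·log10(1 + f/700) mel formula, triangular mel filters, and a DCT-II as the cosine transform. All function names and default parameters are hypothetical.

```python
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Naive DFT power spectrum of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    w = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(w[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = -sum(w[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        spec.append(re * re + im * im)
    return spec

def mel_energies(spec, fs, n_filters):
    """Energies of triangular filters spaced uniformly on the mel scale."""
    nyquist = fs / 2.0
    top = hz_to_mel(nyquist)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    out = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        e = 0.0
        for k, p in enumerate(spec):
            f = k * nyquist / (len(spec) - 1)
            if lo < f <= mid:
                e += p * (f - lo) / (mid - lo)
            elif mid < f < hi:
                e += p * (hi - f) / (hi - mid)
        out.append(e)
    return out

def mfcc(frame, fs, n_filters=20, n_coeffs=12):
    """Log mel energies followed by a DCT-II."""
    log_e = [math.log(e + 1e-12) for e in mel_energies(power_spectrum(frame), fs, n_filters)]
    return [sum(log_e[m] * math.cos(math.pi * c * (m + 0.5) / n_filters)
                for m in range(n_filters)) for c in range(n_coeffs)]

if __name__ == "__main__":
    fs = 16000
    frame = [math.sin(2 * math.pi * 440 * i / fs) for i in range(256)]
    print([round(c, 2) for c in mfcc(frame, fs)])
```

In practice an FFT routine would replace the naive DFT, and the frame would be one of many overlapping frames cut from the signal.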

Filterbank auditory models

• General principle of an auditory filterbank model

Response of a filterbank model (Bark-bank)

• Simple Bark-filterbank by warped filters (Karjalainen)

Gammatone filterbank

• Temporal and magnitude response of one channel
• Filterbank
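
A single gammatone channel and the channel spacing of a filterbank can be sketched as follows. The code assumes the usual textbook form g(t) = t^(n-1) · exp(-2πbt) · cos(2πf_c t) with order n = 4, bandwidth b = 1.019 · ERB(f_c), and the Glasberg-Moore ERB formula; none of these values is given on the slide, and the function names are hypothetical.

```python
import math

def erb_hz(fc):
    """Equivalent rectangular bandwidth in Hz (Glasberg-Moore formula, assumed)."""
    return 24.7 * (4.37 * fc / 1000.0 + 1.0)

def hz_to_erb_rate(f):
    return 21.4 * math.log10(4.37 * f / 1000.0 + 1.0)

def erb_rate_to_hz(r):
    return (10.0 ** (r / 21.4) - 1.0) * 1000.0 / 4.37

def gammatone_ir(fc, fs, duration_s=0.025, order=4, b_factor=1.019):
    """Peak-normalized gammatone impulse response for center frequency fc."""
    b = b_factor * erb_hz(fc)
    n = int(duration_s * fs)
    ir = []
    for i in range(n):
        t = i / fs
        ir.append(t ** (order - 1) * math.exp(-2.0 * math.pi * b * t)
                  * math.cos(2.0 * math.pi * fc * t))
    peak = max(abs(x) for x in ir)
    return [x / peak for x in ir]

def erb_spaced_centers(f_lo, f_hi, n_channels):
    """Center frequencies spaced uniformly on the ERB-rate scale."""
    lo, hi = hz_to_erb_rate(f_lo), hz_to_erb_rate(f_hi)
    step = (hi - lo) / (n_channels - 1)
    return [erb_rate_to_hz(lo + k * step) for k in range(n_channels)]

if __name__ == "__main__":
    for fc in erb_spaced_centers(100.0, 8000.0, 8):
        ir = gammatone_ir(fc, 16000)
        print(f"fc = {fc:7.1f} Hz, ERB = {erb_hz(fc):6.1f} Hz, taps = {len(ir)}")
```

Filtering a signal with each impulse response (for example by direct convolution) gives the channel outputs of the filterbank.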

Neural adaptation

• Neural adaptation model by Dau et al.
– Automatic gain control feedbacks

Temporal processing

• Adaptation, temporal integration, and masking model (Karjalainen)
– Neural feedback model
– Adaptation (AGC)
– Loudness (level) computation
– Temporal masking effect

Responses

• Excitation, firing rate response, and loudness level response

Basilar membrane traveling wave model

• Principle of approximating basilar membrane traveling wave propagation

Meddis hair cell model

• Processing of neurotransmitter in the hair cell

Periodicity analysis (Meddis)

• Computation of sum autocorrelation function (SACF)

Periodicity analysis example

• Signal, filterbank responses, cochleagrams, sum autocorrelation for speech

Auditory spectrum vs. auditory formant spectrum

• Example of vowel /ä/ and fricative /s/

Auditory representation of speech

• Example of vowel transitions /...iaiai.../
– Auditory spectrogram
– Auditory formant spectrogram

Applications of auditory modeling

• Audio coding
– Psychoacoustic or perceptual models of masking
• Sound quality modeling
– Modeling of perceived differences
– Criteria for audio reproduction
– Binaural audio quality
• Speech recognition
– Advanced front-end models
• Advanced hearing aids
– Cochlear implants

Chapter 10: Sound quality

• Effects of sound:
– Physical effects (generally meaningless)
– Physiological effects (hearing loss)
– Information and knowledge effects (communication)
– Esthetic and emotional effects (communication)
• Concept of quality in general:
– Quality as contrast to quantity (categorical
dissimilarity)
– Quality on scale low-Q vs. high-Q (measure of
preference)
• Speech intelligibility and quality
• Sound quality of concert halls and auditoria
• Sound quality in audio reproduction
• Noise quality
• Product sound quality
Evaluation and measurement of sound quality

• Sound quality is a fundamentally subjective (perceptual) concept
but it can be approximated by objective and computational criteria
• Subjective quality can be evaluated by listening experiments, for
example:
– Compare to ’perfect quality’ reference to find out if any degradation
can be noticed
– Compare two or more sounds and sort them by quality preference
– Characterize sound quality by conceptual description (such as not
annoying, slightly annoying, annoying, very annoying)
– Give an overall quality rating on a numerical scale
– Give a rating for a specific quality factor (numerical scale)
– Give quality ratings for several different quality factors
(multidimensional scaling)
• Based on subjective experimentation, a computational (objective)
measure and model can be derived to simulate the perceived quality
– Objective measures are less laborious and yield high repeatability
– It is important to check the validity range of a model
Development of sound quality models and theories

Theories and models in general

Computational models

Computational models with reference


Intelligibility and quality of speech

• Intelligibility of speech in general depends on:
– the ability of a speaker to produce an intelligible message and clear speech
– quality of speech transmission medium (acoustic or technical)
– the ability of a listener to analyze and conceive the message
• Technical concept of speech intelligibility:
– related to the quality of transmission channel
– developed since the 1920s (Harvey Fletcher, Bell Labs)
• Articulation
– score of correct recognition of phones and (nonsense) phone sequences
– articulation index is a measure that is additive from frequency bands
(like loudness adds from critical band specific loudnesses)
• Speaker identification score
– quality of channel to convey speaker identity
• Naturalness of speech
– particularly in speech synthesis (and coding)

Speech quality: subjective measures and methods

• Articulation tests and articulation score
– /CV/ or /CVC/ sequences used to measure recognition percentage
• Intelligibility test and intelligibility score
– recognition percentage using meaningful words or sentences
• Rhyme tests (RT)
– using ’rhyme’ words or syllables (in Finnish: /patti/, /tatti/, /katti/)
• Diagnostic rhyme tests (DRT)
– modifying single distinctive feature at a time (nasality, voicing, etc.) in RT
• Speech interference tests (find the disturbing noise level that yields 50% articulation)
• Quality comparison method, including pairwise comparison methods
– ordering of sound examples by overall or specific quality factor
• Mean opinion score (MOS)
– overall rating on 1–5 scale
• Other methods
– Indirect judgement tests (PARM, QUART)
– Communicability tests (communicate a drawing task, measure the difficulty)
– Task recall tests (memorizing ability)
– Analytic measures (multidimensional scaling)

Speech quality: objective measures and methods

• Articulation index (AI)
– for measuring a (linear) speech transmission channel with additive noise
– articulation loss is assumed to be additive from 20 frequency band AI
values
• Percentage articulation loss of consonants (%ALcons)
– measure of speech intelligibility, can be estimated from acoustic
properties of a room
• Room acoustical indices, see below
• Speech transmission index (STI, RASTI)
– based on modulation transfer function, see below
• Signal-to-noise ratio (SNR)
– ratio of speech vs. noise (power) level (in dB)
– segmental SNR (SNRseg) based on short-time segmental SNRs
• Spectral distance measures (distance measures in the frequency domain)
• Auditory sound quality measures (based on auditory modeling)
• Other methods
– weighted spectral slope distance
– LPC (linear prediction) distance measure
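
Of the measures above, SNR and segmental SNR are simple enough to sketch directly. The example assumes time-aligned clean and degraded signals of equal length; the frame length and the clamping of per-frame values to the range -10 to 35 dB are common practical choices, not values taken from the slides. Function names are hypothetical.

```python
import math

def snr_db(clean, degraded):
    """Overall SNR in dB, treating (clean - degraded) as the noise."""
    signal_energy = sum(c * c for c in clean)
    noise_energy = sum((c - d) ** 2 for c, d in zip(clean, degraded))
    return 10.0 * math.log10(signal_energy / noise_energy)

def segmental_snr_db(clean, degraded, frame_len=256, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [floor_db, ceil_db]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        d = degraded[start:start + frame_len]
        s = sum(x * x for x in c)
        n = sum((x - y) ** 2 for x, y in zip(c, d))
        if s == 0.0:
            continue                      # skip silent frames
        v = ceil_db if n == 0.0 else 10.0 * math.log10(s / n)
        values.append(min(max(v, floor_db), ceil_db))
    if not values:
        raise ValueError("no non-silent frames")
    return sum(values) / len(values)

if __name__ == "__main__":
    clean = [math.sin(2 * math.pi * 440 * i / 16000) for i in range(1024)]
    degraded = [1.1 * c for c in clean]   # error is 10 % of the signal
    print(snr_db(clean, degraded), segmental_snr_db(clean, degraded))
```

Because the segmental measure averages dB values per frame, quiet passages with poor SNR weigh as much as loud ones, which is why it differs from the overall SNR on real speech.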

MOS (mean opinion score)

• A very popular technique to quantify overall quality in speech
and audio
• Combines a quantitative scale and qualitative categorizations
• Three sorts of MOS measures used:
– MOS = (direct) evaluation on 1–5 scale
– DMOS = degradation MOS (how much signal is degraded)
– CMOS = comparative MOS (typically scale -3...+3)
• Sometimes a scale of 1–10 by step of 0.1 is used instead
• Basic MOS scaling:

Rating   Quality (MOS)   Degradation (DMOS)
5        Excellent       Not noticeable
4        Good            Just noticeable, not disturbing
3        Fair            Noticeable, slightly disturbing
2        Poor            Disturbing but tolerable
1        Bad             Very disturbing
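As a small illustration of how a MOS value is obtained from listener ratings on the scale above, the sketch below averages the ratings and also reports an approximate 95 % confidence half-width. The confidence interval (normal approximation) and the sample ratings are assumptions added for the example and are not part of the slide.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (MOS, half-width of an approximate 95 % confidence interval)."""
    for r in ratings:
        if not 1 <= r <= 5:
            raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    if len(ratings) < 2:
        return mos, float("nan")
    # Normal approximation: 1.96 times the standard error of the mean.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight listeners for one test item.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```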
Modulation transfer function

• The auditory system analyzes signals by critical bands
• Each band is analyzed by signal level, i.e., modulation envelope
• More important than the exact transfer function is modulation transfer function, i.e., how signal modulations in each critical band are transmitted
• The auditory system is most sensitive to modulations of about 4 Hz
• Modulation transfer is degraded by:
– Reverberation (lowpass of modulation)
– Background noise (reduction of relative modulation)
– These effects are multiplicative (cascaded)
• Modulation transfer function is a mathematically motivated approximation of auditorily relevant signal transfer analysis
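The two degradations and their multiplicative combination can be sketched with the closed form commonly used in the STI literature, m(F) = 1/sqrt(1 + (2πF·RT/13.8)²) · 1/(1 + 10^(-SNR/10)). This form assumes an ideal exponential reverberation decay and stationary noise; it is given here as an assumption, since the slide text does not state a formula.

```python
import math


def mtf(mod_freq_hz, rt60_s, snr_db):
    """Modulation reduction factor m for one band (idealized model)."""
    # Reverberation acts as a first-order lowpass on the modulation envelope.
    reverb = 1.0 / math.sqrt(
        1.0 + (2.0 * math.pi * mod_freq_hz * rt60_s / 13.8) ** 2)
    # Stationary noise reduces the relative modulation depth at all rates.
    noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    # The two effects are cascaded, i.e. multiplied.
    return reverb * noise


for rt in (0.5, 1.0, 2.0):
    print(rt, round(mtf(4.0, rt, 20.0), 3))
```

In this model a longer reverberation time lowers m mainly at high modulation frequencies, while noise lowers it equally at all modulation frequencies.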

Modulation transfer function (2)

Modulation transfer function (3)

Modulation transfer function (4): STI

• Total effect on modulation transfer function

• Apparent SNRapp vs. modulation reduction

• Speech transmission index (STI), for each band:


– STI = 1.0 for SNRapp ≥ 15 dB
– STI = 0.0 for SNRapp ≤ -15 dB
– otherwise STI = m, see also next figure
– (Weighted) average of SNRapp values of bands is computed and converted to total STI
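A minimal sketch of the band averaging described above, under stated assumptions: the apparent SNR is taken as SNRapp = 10·log10(m / (1 - m)), it is clipped to ±15 dB, and the averaged value is mapped linearly to the range 0...1 with (SNRapp + 15) / 30. The linear mapping between the two limits follows the usual STI formulation and is an assumption relative to the slide text; equal band weights are used unless others are supplied.

```python
import math


def apparent_snr_db(m):
    """Apparent SNR in dB for a modulation reduction factor 0 < m < 1."""
    m = min(max(m, 1e-9), 1.0 - 1e-9)  # avoid log of zero and division by zero
    return 10.0 * math.log10(m / (1.0 - m))


def sti(m_by_band, band_weights=None):
    """m_by_band holds, per band, a list of m values (one per modulation frequency)."""
    if band_weights is None:
        band_weights = [1.0] * len(m_by_band)
    band_snrs = []
    for m_values in m_by_band:
        # Clip each apparent SNR to the +/-15 dB range before averaging.
        clipped = [min(max(apparent_snr_db(m), -15.0), 15.0) for m in m_values]
        band_snrs.append(sum(clipped) / len(clipped))
    mean_snr = (sum(w * s for w, s in zip(band_weights, band_snrs))
                / sum(band_weights))
    # Convert the mean apparent SNR to an index between 0 and 1.
    return (mean_snr + 15.0) / 30.0


print(round(sti([[0.9, 0.8, 0.6], [0.7, 0.5, 0.4]]), 2))
```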
Modulation transfer function (5)

STI vs. speech intelligibility

RASTI vs. STI

• RASTI = Rapid STI
• Partial evaluation of frequency bands & modulation bands used
• Specific RASTI instrument available for speech acoustics evaluation

Percentage articulation loss of consonants (%ALcons)

• Estimate of speech intelligibility
• %ALcons can be estimated

• where
– r = distance of source and listener
– RT = reverberation time
– V = room volume
– Q = directivity of a sound source
– k = constant (for individual listener) = 1.5 ... 12.5 %
• %ALcons can also be estimated from room measurements
• %ALcons up to 25...30% can be tolerated in meaningful speech due to information redundancy
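The estimation formula itself did not survive in the slide text. A commonly quoted form attributed to Peutz uses exactly the variables listed above, %ALcons = 200·r²·RT² / (V·Q) + k, and is usually considered valid only up to a limiting distance, beyond which the loss stops growing with distance. The sketch below assumes this form; the function name, parameter names and the example room are hypothetical.

```python
def alcons_percent(r_m, rt_s, volume_m3, q=1.0, k_percent=1.5):
    """Estimate %ALcons from room properties (assumed Peutz-type formula).

    r_m        distance between source and listener in metres
    rt_s       reverberation time in seconds
    volume_m3  room volume in cubic metres
    q          directivity of the sound source
    k_percent  listener constant, 1.5 ... 12.5 %
    """
    return 200.0 * r_m ** 2 * rt_s ** 2 / (volume_m3 * q) + k_percent


# Hypothetical hall: 5000 m3, RT = 1.5 s, listener at 10 m, source with Q = 2.
print(alcons_percent(10.0, 1.5, 5000.0, q=2.0, k_percent=2.0))
```

In this form, doubling the distance or the reverberation time quadruples the room-dependent part of the loss, while a more directive source (larger Q) reduces it.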
Sound quality in concert halls (and performing spaces)

• Esthetic effects very important
– communication by esthetic and emotional factors
• ’Good acoustics’ depends on type of music
– for example tempo, mixture of instruments (size of orchestra)
• Many factors to be taken into account
– multidimensional scaling of quality needed
• Different proposed theories and models exist
– no full agreement upon indices and factors of quality
• Visual factors also very prominent in concert halls
– a concert is a multimodal experience to most listeners
• Not only the audience matters but also the musicians
– stage acoustics is important as well
• Theaters and other performing spaces
– may require different acoustics
• Active (electroacoustically created or enhanced) acoustics
– used increasingly except for classical acoustic music

Sound quality in concert halls: (1) subjective indices

• Intimacy or presence
• Reverberation (subjective)
• Spaciousness (apparent source width, listener envelopment)
• Clarity (separation of sounds and sources)
• Warmth (level and reverberation at low frequencies)
• Loudness
• Acoustic glare (walls should not reflect like mirrors)
• Brilliance (due to long reverberation at high frequencies)
• Balance (how sound sources (instruments) are balanced)
• Blend (how instruments are mixed harmonically)
• Ensemble (how musicians can play together)
• Immediacy of response (from the hall back to musicians)
• Texture (how early reflections arrive at listeners)
• Freedom from echo (discrete echoes are highly undesirable)
• Dynamic range (useful range of playing levels)
• Extraneous effects on tonal quality (no extra sounds desired)
• Uniformity of sound (quality should be equal in all positions)

Sound quality in concert halls: (2) objective measures

• Loudness
– Gmid (sound level at mid frequencies)
• Reverberation time
– RT60 (decay time of 60 dB for full hall)
– EDT (early decay time, 0–10 dB scaled to correspond to 60 dB)
• Clarity
– Early vs. late energy ratio C80 (empty hall)
• Spaciousness
– IACCearly (interaural cross-correlation, early)
– LFearly (lateral energy fraction, early)
• Envelopment
– IACClate and visual inspection of surface irregularity
• Intimacy
– ITDG (initial time delay gap)
• Warmth
– BR (bass ratio, full hall)
• Stage support
– Early energy (20-100 ms), sound source on the stage 1 m from the microphone

Objective sound quality in concert halls: definitions

• Interaural cross-correlation function IACF_t(τ) from pressure signals of left and right ears

• Interaural cross-correlation (IACC), max of IACF_t(τ)
• Lateral energy fraction (LF or LEF)

• Gain factor (level vs 10 m free field distance level)

• Bass ratio

• Stage support
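The defining equations of this slide were not preserved. As a hedged sketch, two of the listed quantities are computed below using their usual room-acoustics definitions: the normalized interaural cross-correlation function IACF(τ) = Σ pL(t)·pR(t+τ) / sqrt(Σ pL² · Σ pR²), with IACC as its maximum magnitude for |τ| ≤ 1 ms, and the bass ratio BR = (RT125 + RT250) / (RT500 + RT1000). Both definitions, the 1 ms lag range and the example values are assumptions relative to the slide.

```python
import math


def iacc(left, right, fs, max_lag_ms=1.0):
    """Maximum magnitude of the normalized interaural cross-correlation."""
    n = len(left)
    max_lag = int(round(fs * max_lag_ms / 1000.0))
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        best = max(best, abs(acc) / norm)
    return best


def bass_ratio(rt_125, rt_250, rt_500, rt_1000):
    """Low-frequency reverberation relative to mid-frequency reverberation."""
    return (rt_125 + rt_250) / (rt_500 + rt_1000)


# Toy ear signals: the same burst, arriving 10 samples later at the right ear.
burst = [math.sin(0.3 * i) for i in range(50)]
left = [0.0] * 20 + burst + [0.0] * 20
right = [0.0] * 30 + burst + [0.0] * 10
print(round(iacc(left, right, fs=48000), 3))
print(round(bass_ratio(2.2, 2.0, 1.9, 1.9), 2))
```

A purely delayed copy is fully correlated, so the IACC stays at its maximum; diffuse lateral reflections that differ between the ears lower it, which is associated with greater spaciousness.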

Early vs. late ratios

Clearness

Centertime
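The formulas of this slide were not preserved either. The sketch below assumes the standard definitions: the early-to-late energy ratio in dB with an 80 ms split (C80, with 50 ms often used for speech), and the center time as the energy-weighted mean arrival time of the impulse response. The toy impulse response is hypothetical.

```python
import math


def clarity_db(impulse_response, fs, split_ms=80.0):
    """Early-to-late energy ratio in dB (C80 for split_ms = 80)."""
    k = int(round(fs * split_ms / 1000.0))
    early = sum(x * x for x in impulse_response[:k])
    late = sum(x * x for x in impulse_response[k:])
    return 10.0 * math.log10(early / late)


def center_time_s(impulse_response, fs):
    """Centre of gravity of the squared impulse response, in seconds."""
    energy = [x * x for x in impulse_response]
    weighted = sum((i / fs) * e for i, e in enumerate(energy))
    return weighted / sum(energy)


# Toy response at fs = 1000 Hz: direct sound plus one reflection at 100 ms.
fs = 1000
h = [0.0] * 200
h[0] = 1.0
h[100] = 0.5
print(round(clarity_db(h, fs), 2), round(center_time_s(h, fs), 3))
```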

Audio sound quality

• HiFi (High Fidelity) vs. professional reproduction
• Good quality is defined indirectly by the absence of degradations
• Degradations & distortions:
– Linear distortion
– Nonlinear distortion
– Transient distortion
– Noise & quantization noise (SNR)
– Spatially poor reproduction

Perception of audio reproduction

• Phase in audio reproduction
– Group delay differences of about 1 ms are noticed in extreme cases
• In high-Q-value cases even much lower differences
– Group delay differences of about 2 ms become noticeable in critical listening (about 60 cm of propagation distance difference)
– 5-10 ms group delay differences may start to be disturbing
– Even 50-100 ms group delay errors may be tolerable sometimes
– In spatial sound perception (Chapter 8): precedence effect
• Perception of distortion
– Linear distortion = magnitude and phase distortion
– Nonlinear distortion = new spectral components are produced

Nonlinear distortion

• Nonlinear distortion
– In a nonlinear system a sine wave generates harmonics:

– If total rms level is:

– Then harmonic distortion (HM):

– HM is not a particularly good measure from a perceptual point of view
– Low-order HM may improve perceived quality
– JND: 1% for 2nd, 0.3% for 3rd, 0.1-0.3% for 4th harmonic
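The equations of this slide were lost in extraction. A minimal sketch of the harmonic distortion measure, assuming the usual definition in which the rms level of the harmonics (2nd and higher) is divided by the total rms level including the fundamental:

```python
import math


def harmonic_distortion(rms_amplitudes):
    """Harmonic distortion as a fraction of the total rms level.

    rms_amplitudes[0] is the fundamental, the rest are harmonics 2, 3, ...
    """
    total_rms = math.sqrt(sum(a * a for a in rms_amplitudes))
    harmonics_rms = math.sqrt(sum(a * a for a in rms_amplitudes[1:]))
    return harmonics_rms / total_rms


# Hypothetical output: fundamental with 1 % second and 0.3 % third harmonic.
d = harmonic_distortion([1.0, 0.01, 0.003])
print(f"{100.0 * d:.2f} %")
```

Because all harmonics are pooled into one number, the measure cannot distinguish a relatively benign 2nd harmonic from a higher-order harmonic of the same level, which is one reason it correlates poorly with perception.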

Audio distortion mechanisms

• Other distortion mechanisms:
– Intermodulation distortion (IM)
• Sine waves of f1 and f2 generate f1 – f2, f1 + f2, etc.
• IM describes perceived distortion better than HM
– Transient intermodulation distortion (TIM)
• Distortion that is created in fast transients but not in steady-state signals
– Quantization noise in digital signal processing
• Perceived as distortion if correlated with the signal
• Perceived as noise if not correlated
– Pre-echo in audio coding
• Temporal spreading of a signal "backwards" in time
– Perceptual criteria needed in digital audio instead of simple distortion and SNR measures
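To make the intermodulation products mentioned above concrete, the sketch below lists the frequencies m·f1 + n·f2 (m and n nonzero integers) up to a chosen order. The function name and the example tone pair are hypothetical; the order is taken as |m| + |n|.

```python
def intermodulation_products(f1, f2, max_order=3):
    """Return {order: sorted positive frequencies m*f1 + n*f2}."""
    products = {}
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if m == 0 or n == 0 or order > max_order:
                continue  # pure harmonics (m or n zero) are not IM products
            freq = m * f1 + n * f2
            if freq > 0:
                products.setdefault(order, set()).add(freq)
    return {order: sorted(freqs) for order, freqs in sorted(products.items())}


# Two tones 100 Hz apart: some products fall close to the original tones.
products = intermodulation_products(1000, 1100)
for order, freqs in products.items():
    print(order, freqs)
```

Unlike harmonics, several of these components are not harmonically related to either tone, which helps explain why IM tends to be more audible than harmonic distortion.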
Perceptual (objective) sound quality models for audio

• Schroeder et al.:

• Karjalainen:

PAQM (perceptual audio quality measure)

Product sound quality

• Minimize negative effects and maximize positive effects of product sound
• Examples:
– Cars and work machines
– Home appliances
– Office equipment
– Personal devices
• Computational models of product sound quality

Chapter 11: Technical audiology

• How do we hear? (discussed already)
• What if we don’t hear?
– Why don’t we hear? (mechanisms)
– How to measure? (audiometry)
– How to improve hearing? (hearing aids)
• Technical devices:
– Audiometric equipment
– Hearing aids
– Cochlear implants

Hearing degradation I

• Hearing disabled population
– WHO: 270 million hearing disabled in the world (5 %)
– In Finland: ~740 000 with hearing degradation, 14 000 new hearing device fittings per year
• Categories of handicap
– Disease
– Impairment
– Disability
– Handicap
• Hearing disorders: social classification
– Hard-of-hearing persons
– Deafened persons
– Deaf persons

Hearing degradation II
• Medical classification of hearing impairments
– Conductive hearing loss
• External and middle ear problems
• Attenuation of loudness
– Sensorineural hearing loss
• Inner ear and retrocochlear problems
• Attenuation or recruitment
• Tinnitus
– Central hearing loss
• Higher neural levels
• Problems in sound separation or speech analysis
• Problems in localization (spatial separation)
• Tinnitus
– Psychic hearing problems
• No clear physiological reason

Hearing threshold change

Audiometry

Audiometer and calibrated headphones


Audiogram behavior

Loud noise effect (impulse noise)
Effect of age (presbyacusis)

Degrees of hearing impairment

• Measure of hearing degradation
– Average of threshold values at 500, 1000, 2000, 4000 Hz
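The averaging described above can be sketched as follows. The audiogram values are hypothetical, and no impairment categories are assigned because the category limits shown on the slide figure are not available in the text.

```python
def pure_tone_average(thresholds_db, freqs_hz=(500, 1000, 2000, 4000)):
    """Mean hearing threshold (dB HL) over the given audiogram frequencies."""
    return sum(thresholds_db[f] for f in freqs_hz) / len(freqs_hz)


# Hypothetical audiogram of one ear: frequency in Hz -> threshold in dB HL.
audiogram = {250: 15, 500: 20, 1000: 30, 2000: 40, 4000: 50, 8000: 60}
print(pure_tone_average(audiogram))
```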

Other hearing impairment problems

• Other effects of impairment
– Sound separation problems, particularly in noise and reverberation
– Speech communication problems
– Tinnitus
• Source at different levels
• No good treatment known
• Often like sinusoidal tone, but can be hum, broadband noise, pulsation, etc.

Ear drum impedance measurement

Noise and causes of hearing loss

• Noise measurement
– A-weighted equivalent level

– 85 dB long-term daily exposure limit


• Other factors:
– Vibration
– Smoking
– Drugs
– Diseases
– Genetic effects
– Combined effect is often more than the sum of the individual effects
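The formula for the equivalent level under "Noise measurement" was lost in extraction. The sketch below assumes the standard energy-average definition, L_eq = 10·log10((1/T)·Σ t_i·10^(L_i/10)), and the usual normalization of a shorter exposure to an 8-hour working day. The levels are assumed to be already A-weighted by the sound level meter, and the example exposure is hypothetical.

```python
import math


def equivalent_level_db(levels_db, durations_h):
    """Energy-averaged level over periods with constant levels."""
    total_time = sum(durations_h)
    energy = sum(t * 10.0 ** (level / 10.0)
                 for level, t in zip(levels_db, durations_h))
    return 10.0 * math.log10(energy / total_time)


def daily_exposure_db(laeq_db, duration_h, reference_h=8.0):
    """Normalize an exposure of duration_h hours to the reference day length."""
    return laeq_db + 10.0 * math.log10(duration_h / reference_h)


# Hypothetical working day: 4 h at 80 dB(A) followed by 4 h at 90 dB(A).
laeq = equivalent_level_db([80.0, 90.0], [4.0, 4.0])
print(round(laeq, 1), "dB(A), limit exceeded:", laeq > 85.0)
```

Because the average is taken over energy rather than over decibel values, the louder period dominates the result.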
Inner ear damage

Inner hair cell damage

Outer hair cell partial damage

Outer hair cell full damage

Temporary threshold shift

Hearing protectors

Ear plugs Ear muffs


Attenuation

Hearing aid types

Hearing aid response

Typical frequency response of a traditional hearing aid

Multichannel digital hearing aids:
- each frequency channel programmed separately

Hearing aid gain control

Linear gain + limiter Automatic gain control

Hearing aid AGC control

Feedback control Feedforward control

Hearing aid output waveforms

Other issues in hearing aids

• Directional microphones
• Binaural processing
• Noise cancellation
• Wind noise cancellation
• Feedback cancellation
• Speech enhancement

Cochlear implants

• Electronic stimulation of auditory nerve

Cochlear implants II

• ~100 000 units fitted worldwide
• For deafened adults and deaf-born children
• Price about $50 000 in the USA
• Multielectrode devices nowadays (e.g. 24 channels)
– Speech from microphone is divided into channels
– Inductive coupling through skin
– Multielectrode in the cochlea
– Different pulse modulations used
