潘奕誠
4/7/2003
Introduction
Efficient speech-coding techniques
Advantages for VoIP
Digital streams of ones and zeros
The lower the bandwidth, the lower the quality
RTP payload types
Processing power
Better quality (for a given bandwidth) requires a more complex algorithm
A balance between quality and cost
Voice Quality
Bandwidth is easily quantified
Voice quality is subjective
MOS, Mean Opinion Score
ITU-T Recommendation P.800
Excellent – 5
Good – 4
Fair – 3
Poor – 2
Bad – 1
A minimum of 30 people
Listen to voice samples or in conversations
P.800 recommendations
The selection of participants
The test environment
Explanations to listeners
Analysis of results
Toll quality
A MOS of 4.0 or higher
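The MOS arithmetic above can be sketched in a few lines. This is only an illustration: the function names and the sample panel are hypothetical, and the full P.800 procedure (participant selection, test environment, analysis of results) is not modeled.

```python
from statistics import mean

# Opinion scale, panel size and toll-quality threshold as listed above.
SCALE = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}
MIN_LISTENERS = 30
TOLL_QUALITY_MOS = 4.0


def mean_opinion_score(ratings):
    """Average a list of integer ratings (1..5) into a MOS."""
    if len(ratings) < MIN_LISTENERS:
        raise ValueError("need at least %d listeners" % MIN_LISTENERS)
    if any(r not in SCALE for r in ratings):
        raise ValueError("ratings must be integers from 1 to 5")
    return mean(ratings)


def is_toll_quality(mos):
    """Toll quality means a MOS of 4.0 or higher."""
    return mos >= TOLL_QUALITY_MOS


# Hypothetical panel of 30: 12 x Excellent, 14 x Good, 4 x Fair.
ratings = [5] * 12 + [4] * 14 + [3] * 4
mos = mean_opinion_score(ratings)
print(round(mos, 2), is_toll_quality(mos))  # prints: 4.27 True
```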
About Speech
Speech
Air pushed from the lungs past the vocal cords and along the vocal tract
The basic vibrations – vocal cords
The sound is altered by the disposition of the vocal tract (tongue and mouth)
Model the vocal tract as a filter
The shape changes relatively slowly
The vibrations at the vocal cords
The excitation signal
Speech sounds
Voiced sound
The vocal cords vibrate open and close
Quasi-periodic pulses of air
The rate of the opening and closing – the pitch
Unvoiced sounds
Forcing air at high velocities through a constriction
Noise-like turbulence
Show little long-term periodicity
Short-term correlations still present
Plosive sounds
A complete closure in the vocal tract
Air pressure is built up and released suddenly
Voice Sampling
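Discrete-Time LTI Systems: The Convolution Sum
$$x[n] = \sum_{k=-\infty}^{\infty} x[k]\,\delta[n-k]$$
$$y[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$$
[Figure: example sequences. h[n] = 1 for n = 0, 1, 2; x[n] with x[0] = 0.5 and x[1] = 2; output y[n] plotted for n = 0 to 3, with values 0.5, 2.5 and 2 marked]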
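A direct evaluation of the convolution sum for finite sequences might look like the sketch below. The input values are an assumption read from the slide's figure (x[0] = 0.5, x[1] = 2, h[n] = 1 for n = 0, 1, 2); the function name `convolve` is illustrative.

```python
def convolve(x, h):
    """Convolution sum y[n] = sum_k x[k] * h[n - k].

    Both sequences are assumed to start at n = 0 and be zero elsewhere.
    """
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            # x[k] * h[m] contributes to output index n = k + m
            y[k + m] += xk * hm
    return y


x = [0.5, 2.0]       # assumed from the figure
h = [1.0, 1.0, 1.0]  # assumed from the figure
y = convolve(x, h)
print(y)  # prints: [0.5, 2.5, 2.5, 2.0]
```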
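Nyquist sampling theorem
$$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$$
$$x_s(t) = x_c(t)\,s(t) = x_c(t) \sum_{n=-\infty}^{\infty} \delta(t - nT)$$
$$S(j\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$$
[Figure: spectrum X_c(jΩ), band-limited to −Ω_N ≤ Ω ≤ Ω_N; axis marks at −Ω_s, 0, Ω_s and (Ω_s − Ω_N)]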
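The sampling theorem requires the sampling frequency to exceed twice the highest signal frequency (Ω_s > 2Ω_N). A small numerical sketch of what goes wrong otherwise: at the 8 kHz telephony rate, a 5 kHz tone yields exactly the same samples as a 3 kHz tone. The tone frequencies and the helper name `sample_tone` are chosen only for illustration.

```python
import math


def sample_tone(freq_hz, fs_hz, count):
    """Samples of cos(2*pi*f*t) taken at t = n / fs, n = 0..count-1."""
    return [math.cos(2 * math.pi * freq_hz * n / fs_hz) for n in range(count)]


fs = 8000  # samples per second

# 5000 Hz is above fs / 2 = 4000 Hz, so it aliases to fs - 5000 = 3000 Hz.
above_nyquist = sample_tone(5000, fs, 16)
alias = sample_tone(3000, fs, 16)

max_diff = max(abs(p - q) for p, q in zip(above_nyquist, alias))
print(max_diff < 1e-9)  # the two sample sequences are indistinguishable
```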
Quantization (Scalar Quantization)
[Figure: quantization levels v_1, v_2, ..., v_{k+1}, ..., v_L]
[Figure: block diagram. x[n] → F(x) → Uniform Quantization → Uniform Decoder → F^{-1}(x) → x̂[n]]
formant structure of
speech signals
A good approximation,
though not precise enough
LPC Vocoder (Voice Coder)
[Figure: transmitter. x[n] → LPC Analysis → {a_k}, N, G, v/u → Encoder → bit stream …11011…]
N by pitch detection
v/u by voicing detection
[Figure: receiver. bit stream …11011… → Decoder → {a_k}, N, G, v/u; excitation (Ex) g[n] → G(z) → x[n]]
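The receiver rebuilds an excitation g[n] from the transmitted pitch period N, gain G and v/u flag. The slide does not say how the excitation block works; the sketch below assumes the common textbook scheme of an impulse train with period N for voiced frames and white noise for unvoiced frames. The function name, frame length and parameter values are hypothetical.

```python
import random


def excitation(voiced, pitch_period, gain, length, seed=0):
    """Generate one frame of excitation g[n] (assumed scheme, see lead-in).

    voiced       -- the v/u flag from voicing detection
    pitch_period -- N in samples, from pitch detection (used only if voiced)
    gain         -- G
    length       -- frame length in samples
    """
    if voiced:
        if pitch_period <= 0:
            raise ValueError("pitch period must be positive")
        # Quasi-periodic pulses: one impulse every N samples.
        return [gain if n % pitch_period == 0 else 0.0 for n in range(length)]
    # Noise-like turbulence: Gaussian white noise scaled by the gain.
    rng = random.Random(seed)
    return [gain * rng.gauss(0.0, 1.0) for _ in range(length)]


voiced_frame = excitation(True, pitch_period=40, gain=2.0, length=160)
unvoiced_frame = excitation(False, pitch_period=40, gain=2.0, length=160)
pulse_positions = [n for n, v in enumerate(voiced_frame) if v != 0.0]
print(pulse_positions)  # prints: [0, 40, 80, 120]
```

Feeding g[n] through the vocal tract filter G(z) with the received coefficients {a_k} would then yield the synthesized speech.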
If uniform quantization
12 bits × 8 k samples/sec = 96 kbps
Non-uniform quantization
64 kbps DS0 rate
μ-law
North America
A-law
Other countries, a little friendlier to
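The bit-rate arithmetic and the benefit of non-uniform quantization can be illustrated as follows. Several things here are assumptions not stated on the slide: 8 bits per sample for the 64 kbps DS0 rate, the continuous μ-law curve with μ = 255, and a simple mid-rise uniform quantizer on [−1, 1]. This is not the segmented G.711 encoder itself.

```python
import math

MU = 255.0  # assumed mu-law parameter


def bit_rate_kbps(bits_per_sample, samples_per_sec=8000):
    return bits_per_sample * samples_per_sec / 1000


def mu_compress(x):
    """F(x) = sgn(x) * ln(1 + mu*|x|) / ln(1 + mu), for |x| <= 1."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_uniform(v, bits):
    """Mid-rise uniform quantizer on [-1, 1]; returns the reconstructed value."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(levels - 1, max(0, int(math.floor((v + 1.0) / step))))
    return -1.0 + (index + 0.5) * step


print(bit_rate_kbps(12), bit_rate_kbps(8))  # prints: 96.0 64.0

# A quiet sample: compare 8-bit uniform against 8-bit companded quantization.
x = 0.01
uniform_err = abs(quantize_uniform(x, 8) - x)
companded_err = abs(mu_expand(quantize_uniform(mu_compress(x), 8)) - x)
print(companded_err < uniform_err)  # companding is more accurate at low levels
```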
of the excitation
Transmit
Filter coefficients, gain, a pointer to the vector chosen
Delay < 1 ms
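Frequency-Domain Representation of Sampling
$$s(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT), \qquad x_s(t) = x_c(t)\,s(t)$$
$$S(j\Omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\Omega - k\Omega_s)$$
[Figure: X_c(jΩ) band-limited to −Ω_N ≤ Ω ≤ Ω_N; S(jΩ) with impulses at multiples of Ω_s; axis marks at −Ω_s, −Ω_N, 0, Ω_N, Ω_s and (Ω_s − Ω_N)]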
Speech Source Model and
Source Coding
Vocal Tract Model
$$x[n] = u[n] + \sum_{k=1}^{p} a_k\, x[n-k]$$
$$G(z) = \frac{X(z)}{U(z)} = \frac{1}{1 - \sum_{k=1}^{p} a_k z^{-k}}$$
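The all-pole vocal tract model can be run directly as a recursion: each output sample is the excitation sample plus a weighted sum of the previous p outputs. The sketch below assumes the sign convention x[n] = u[n] + Σ a_k x[n−k] (the signs are not legible in the slide) and uses an arbitrary first-order coefficient for the demonstration; the function name is illustrative.

```python
def all_pole_filter(u, a):
    """Run excitation u[n] through G(z) = 1 / (1 - sum_k a_k z^-k).

    a[0] holds a_1, a[1] holds a_2, and so on; samples before n = 0 are zero.
    """
    x = []
    for n, un in enumerate(u):
        acc = float(un)
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        x.append(acc)
    return x


# Impulse response of a first-order model with a_1 = 0.5 (arbitrary value).
impulse_response = all_pole_filter([1, 0, 0, 0], [0.5])
print(impulse_response)  # prints: [1.0, 0.5, 0.25, 0.125]
```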