/2m2
(1)
where k is the time index and m is the scale parameter. The Fourier
transform of the Morlet wavelet is given by
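$$\Psi(\omega, m) = \pi^{-1/4}\, e^{-(\,\cdots\,)^{2}/2} \qquad (2)$$

[The argument $(\cdots)$ of the exponent, which involves $\omega$, $\omega_o$ and $m$, and the range condition that follows the formula are missing in the source.]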
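For orientation, the Fourier-domain Morlet wavelet can be evaluated numerically in the convention of Torrence and Compo [5]. The sketch below uses that convention, which is not necessarily the exact form or normalization of Eq. (2). The function names `morlet_ft` and `cwt_morlet`, the default center frequency `omega0 = 6`, and the FFT-based evaluation are assumptions made for illustration, and the scale-dependent normalization factor of [5] is omitted.

```python
import numpy as np

def morlet_ft(omega, scale, omega0=6.0):
    """Fourier transform of the Morlet wavelet at angular frequencies `omega`.

    Convention of Torrence and Compo [5] (assumed here):
    pi**(-1/4) * H(omega) * exp(-(scale*omega - omega0)**2 / 2),
    where H is the Heaviside step (zero for omega <= 0).
    """
    omega = np.asarray(omega, dtype=float)
    gauss = np.exp(-0.5 * (scale * omega - omega0) ** 2)
    return np.pi ** -0.25 * gauss * (omega > 0)

def cwt_morlet(x, scales, omega0=6.0):
    """Wavelet coefficients of `x` (rows: time index, columns: scale) via the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    omega = 2.0 * np.pi * np.fft.fftfreq(n)  # angular frequency in rad/sample
    x_hat = np.fft.fft(x)
    coeffs = np.empty((n, len(scales)), dtype=complex)
    for j, s in enumerate(scales):
        # The transform is real-valued, so no complex conjugate is needed.
        coeffs[:, j] = np.fft.ifft(x_hat * morlet_ft(omega, s, omega0))
    return coeffs
```

In this convention the response at a given scale peaks at the angular frequency `omega0 / scale`, so small scales respond to high frequencies and large scales to low frequencies.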
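$$
W_k = \begin{bmatrix}
(k, 0) & \cdots & (k, M-1) \\
(k-1, 0) & \cdots & (k-1, M-1) \\
\vdots & & \vdots \\
(k-N+1, 0) & \cdots & (k-N+1, M-1)
\end{bmatrix} \qquad (4)
$$

[Only the (time index, scale index) pair of each entry survives; the symbol of the entries is missing in the source.]

[Eqs. (5)-(8) are missing in the source. The only surviving fragment is the signal model, in which the observation $y_k$ is the sum of a signal component $x_k$ and a noise component $v_k$:]

$$y_k = x_k + v_k$$

The generalized likelihood ratio test (GLRT) compares the log-likelihood ratio with a threshold $\gamma$:

$$
l(y_k) = \ln p(y_k, \hat{R}_{yy}; H_1) - \ln p(y_k, \hat{R}_{vv}; H_o)
\;\underset{H_o}{\overset{H_1}{\gtrless}}\; \gamma \qquad (9)
$$

where $\hat{R}_{yy}$ and $\hat{R}_{vv}$ are the maximum likelihood estimates of the covariance under hypotheses $H_1$ and $H_o$, respectively. This GLRT is implemented by a simple and efficient scheme based on principal component analysis (PCA), or the subspace technique. For this, the covariance matrix $R_{yy}$ of $y_k$ under the hypothesis $H_1$ is given by

[Eqs. (10) and (11) are missing in the source.]

With $U_i$ and $\sigma_i^2$ the eigenvectors and eigenvalues of $R_{yy}$, and $\sigma_v^2$ the noise variance, the log-likelihood ratio is

$$
l(y_k) = -\frac{1}{2}\sum_{i=1}^{M} \ln\sigma_i^2 + \frac{M}{2}\ln\sigma_v^2
+ \frac{1}{2\sigma_v^2}\sum_{i=1}^{M} \frac{\sigma_i^2 - \sigma_v^2}{\sigma_i^2}\,|U_i^H y_k|^2 \qquad (12)
$$

Combining the data-independent terms into the threshold and scaling, the test statistic $T(y_k)$ for the GLRT becomes

$$T(y_k) = \sum_{i=1}^{M} G_i\,|U_i^H y_k|^2 \qquad (13)$$

where

$$G_i = \frac{\sigma_{x_i}^2}{\sigma_i^2}, \qquad \sigma_{x_i}^2 = \sigma_i^2 - \sigma_v^2 \qquad (14)$$

In Eq. (14), the singular value $\sigma_{x_i}$ characterizes the signal component. The decision rule then becomes

$$T(y_k) \;\underset{H_o}{\overset{H_1}{\gtrless}}\; \gamma' \qquad (15)$$

where, collecting the data-independent terms of Eq. (12),

$$\gamma' = \sigma_v^2\left(2\gamma + \sum_{i=1}^{M}\ln\sigma_i^2 - M\ln\sigma_v^2\right) \qquad (16)$$

Using the subspace concept [8] and the SVD of the correlation matrix $R_{yy}$, the test statistic can be approximated as

$$\tilde{T}(y_k) = \sum_{i=1}^{r} |\tilde{U}_i^H y_k|^2 \qquad (17)$$

where $\tilde{U} = [\tilde{U}_1 \cdots \tilde{U}_r]$ contains the $r$ orthogonal eigenvectors associated with the $r$ largest eigenvalues of $R_{yy}$, whereas the remaining $(M - r)$ eigenvectors correspond to the eigenvalues (power spectral density) of the noise. The decision rule can then be implemented as

$$\tilde{T}(y_k) \;\underset{H_o}{\overset{H_1}{\gtrless}}\; \tilde{\gamma} \qquad (18)$$

where

$$\tilde{\gamma} = \sigma_v^2\Big[\,\cdots\;\sum_{i=1}^{r}\ln\sigma_i^2\;\cdots\;(M-r)\ln\sigma_v^2\,\Big] \qquad (19)$$

[The signs and any remaining terms of Eq. (19) are missing in the source.]

and

$$
\sigma_i^2 \approx
\begin{cases}
\sigma_{x_i}^2, & i = 1, \ldots, r \\
\sigma_v^2, & i = r+1, \ldots, M
\end{cases} \qquad (20)
$$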
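The weighted statistic of Eqs. (13) and (14) and its rank-$r$ approximation in Eq. (17) can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: the function name `subspace_detector`, the use of a sample covariance as the estimate of $R_{yy}$, and the assumption that the noise variance `noise_var` is known beforehand (for example, from noise-only frames) are all choices made here. The thresholds of Eqs. (15) and (18) are left to the caller.

```python
import numpy as np

def subspace_detector(frames, noise_var, r):
    """Return (T, T_sub) for each row y_k of `frames` (shape: n_frames x M).

    T     : sum_i G_i |U_i^H y_k|^2 with G_i = (s_i - noise_var) / s_i
    T_sub : sum over the r dominant eigenvectors of |U_i^H y_k|^2
    """
    Y = np.asarray(frames)
    # Sample estimate of R_yy = E[y y^H]; frames are assumed zero mean.
    R = Y.T @ Y.conj() / Y.shape[0]
    eigvals, U = np.linalg.eigh(R)            # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    s, U = eigvals[order], U[:, order]
    # Projections U_i^H y_k for every frame k and eigenvector i.
    P = Y @ U.conj()
    energy = np.abs(P) ** 2
    # Gains G_i; the signal variance is clipped at zero (an assumption).
    G = np.maximum(s - noise_var, 0.0) / np.maximum(s, 1e-12)
    T = energy @ G
    T_sub = energy[:, :r].sum(axis=1)
    return T, T_sub
```

A frame would be declared speech ($H_1$) when its statistic exceeds the chosen threshold. Dropping the weights for the $r$ dominant directions corresponds to $G_i \approx 1$ there and $G_i = 0$ for the noise directions.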
4. PERFORMANCE EVALUATION
$$RE = \frac{|AD - ED|}{AD} \qquad (21)$$

Fig. 1. The results of segmentation for recorded speech data (the segmentation functions are plotted together with the real input signal; x-axis: sample, y-axis: amplitude): (a) Example 1; (b) Example 2.
the signal, pauses with durations of more than 1/2 second are selected automatically. The duration of each selected pause is then extended to 2 seconds, so that the overlap between two successive active speech segments caused by reverberation can be overcome. For our particular database, the durations of 1/2 second (i.e., 2000 samples) and 2 seconds (16,000 samples) were chosen experimentally. In the future, we would like to vary these fixed values based on the reverberation time and the tail information of the speech signal to improve the performance. The original speech signals and the resulting extended output signals for various male and female speakers can be found at
http://www.ntu.edu.sg/home/efsattar/web/audio files.htm.
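The pause selection and extension described above can be sketched as follows. The text does not say how a pause is lengthened, so the sketch assumes that silence (zeros) is appended to each selected pause until it reaches the target duration. The function name `extend_pauses`, the boolean `active` mask (taken to be the output of the segmentation stage) and the parameter names are hypothetical, and the durations are given in seconds so that the sampling rate `fs` stays an explicit input.

```python
import numpy as np

def extend_pauses(signal, active, fs, min_pause=0.5, target_pause=2.0):
    """Lengthen every pause longer than `min_pause` seconds to `target_pause` seconds.

    signal : 1-D array of samples
    active : boolean array of the same length, True where speech is detected
    fs     : sampling rate in Hz
    """
    signal = np.asarray(signal)
    active = np.asarray(active, dtype=bool)
    assert signal.shape == active.shape
    if signal.size == 0:
        return signal.copy()
    min_len = int(round(min_pause * fs))
    target_len = int(round(target_pause * fs))

    # Start index of every run of identical labels, and the matching stop index.
    change = np.flatnonzero(np.diff(active.astype(int))) + 1
    starts = np.concatenate(([0], change))
    stops = np.concatenate((change, [active.size]))

    pieces = []
    for a, b in zip(starts, stops):
        pieces.append(signal[a:b])
        is_pause = not active[a]
        if is_pause and min_len < (b - a) < target_len:
            # Assumption: pad the selected pause with silence up to the target.
            pieces.append(np.zeros(target_len - (b - a), dtype=signal.dtype))
    return np.concatenate(pieces)
```

Pauses shorter than `min_pause`, and pauses that already last `target_pause` or longer, are passed through unchanged.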
6. OBJECTIVE QUALITY MEASURE
We measured the spectral distances between the reference and distorted speech utterances. Our objective measure is based on parameterizations of a linear predictive vocal tract model of the reference and the distorted speech (due to reverberation in our case).
The parameters used in such measures can be the linear predictor
coefficients (LPC) [9]. In order to measure this, the reference and
distorted signals are divided into analysis frames of 20 ms duration. A linear predictive analysis is done for each frame of speech,
and the distance measure is calculated from the results of analysis
in the following way [10, 11, 12]:
$$
d(Q, p, m) = \left[\frac{1}{N}\sum_{i=1}^{N} \left|Q(i, m, \phi) - Q(i, m, d)\right|^{p}\right]^{1/p} \qquad (22)
$$
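A frame-wise distance in the form of Eq. (22) can be sketched as below, under stated assumptions. The parameters $Q$ are taken to be LPC coefficients estimated by the autocorrelation method, $N$ is taken to be the number of LPC parameters per frame, and the frames are Hamming-windowed and non-overlapping. The predictor order of 10, the small regularization term and the function names `lpc` and `frame_distances` are choices made for this illustration and are not specified in the text.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients a_1..a_order by the autocorrelation method."""
    frame = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1 : len(frame) + order]      # lags 0..order
    # Toeplitz autocorrelation matrix of the normal equations R a = r[1:].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * (r[0] + 1.0) * np.eye(order)           # keeps silent frames solvable
    return np.linalg.solve(R, r[1 : order + 1])

def frame_distances(reference, distorted, fs, order=10, p=2, frame_ms=20):
    """Distance between the LPC vectors of corresponding 20 ms frames."""
    n = int(round(fs * frame_ms / 1000.0))             # samples per frame
    n_frames = min(len(reference), len(distorted)) // n
    d = np.empty(n_frames)
    for m in range(n_frames):
        q_ref = lpc(reference[m * n : (m + 1) * n], order)
        q_dis = lpc(distorted[m * n : (m + 1) * n], order)
        # [ (1/N) * sum_i |Q_ref(i) - Q_dis(i)|^p ]^(1/p) with N = order
        d[m] = np.mean(np.abs(q_ref - q_dis) ** p) ** (1.0 / p)
    return d
```

Identical inputs give a distance of zero in every frame, and the values grow as the spectral envelope of the distorted signal departs from that of the reference.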
Fig. 2. The application of segmentation: (a) original speech signal; (b) corresponding reverberated speech signal (x-axis: samples, y-axis: amplitude).

Fig. 3. [Caption missing in the source; panels (a) and (b).]

Fig. 4. Illustration of the speech quality/intelligibility measure (x-axis: frame index, y-axis: frame distance): (a) frame distances measured for Fig. 2; (b) frame distances measured for Fig. 3.

7. CONCLUSION

8. REFERENCES

[3] ETSI GSM 06.94, "Digital cellular telecommunications system; Voice activity detector (VAD) for adaptive multi-rate (AMR) speech traffic channels," European Telecommunications Standards Institute, 1999.
[4] O. Rioul and M. Vetterli, "Wavelets and signal processing," IEEE Signal Processing Magazine, pp. 14-38, 1991.
[5] C. Torrence and G. P. Compo, "A practical guide to wavelet analysis," Bulletin of the American Meteorological Society, vol. 79, pp. 61-76, Jan. 1998.
[6] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory, Prentice-Hall, 1998.
[7] L. L. Scharf, Statistical Signal Processing: Detection, Estimation and Time Series Analysis, Addison-Wesley, 1991.
[8] A.-J. van der Veen, E. F. Deprettere, and A. L. Swindlehurst, "Subspace-based signal analysis using singular value decomposition," Proc. of the IEEE, vol. 81, pp. 1277-1307, 1993.
[9] M. H. Hayes, Statistical Digital Signal Processing and Modeling, John Wiley & Sons, 1996.
[10] S. R. Quackenbush, T. P. Barnwell III, and M. A. Clements, Objective Measures of Speech Quality, Prentice Hall, 1988.
[11] J. R. Deller, J. H. L. Hansen, and J. G. Proakis, Discrete-Time Processing of Speech Signals, IEEE Press, 1996.
[12] J. G. Proakis and D. G. Manolakis, Digital Signal Processing: Principles, Algorithms, and Applications, Prentice Hall, 1996.
[13] D. Arfib, F. Keiler, and U. Zölzer, DAFX - Digital Audio Effects, John Wiley, 2002.