Mehryar Mohri
Courant Institute of Mathematical Sciences
mohri@cims.nyu.edu
Speech Recognition Components
Acoustic and pronunciation model:
$$\Pr(o \mid w) = \sum_{d, c, p} \Pr(o \mid d)\, \Pr(d \mid c)\, \Pr(c \mid p)\, \Pr(p \mid w).$$
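As a rough illustration of the sum over d, c, p in the acoustic and pronunciation model, the sketch below marginalizes over small hand-made probability tables for one fixed observation o and one fixed word sequence w. The function name, the dictionary layout, and every number are hypothetical; reading d, c, p as hidden sequences (for instance distributions, context-dependent phones, and phones) is an assumption made here for the example.

```python
from itertools import product


def marginal_likelihood(p_o_given_d, p_d_given_c, p_c_given_p, p_p_given_w):
    """Return Pr(o | w) as the sum over d, c, p of the four-factor product.

    Hypothetical table layout, for one fixed o and one fixed w:
      p_o_given_d[d]      = Pr(o | d)
      p_d_given_c[(d, c)] = Pr(d | c)
      p_c_given_p[(c, p)] = Pr(c | p)
      p_p_given_w[p]      = Pr(p | w)
    Missing (d, c) or (c, p) pairs are treated as probability 0.
    """
    ds = list(p_o_given_d)
    cs = sorted({c for (_, c) in p_d_given_c})
    ps = list(p_p_given_w)
    total = 0.0
    for d, c, p in product(ds, cs, ps):
        total += (p_o_given_d[d]
                  * p_d_given_c.get((d, c), 0.0)
                  * p_c_given_p.get((c, p), 0.0)
                  * p_p_given_w[p])
    return total


# Toy tables (made-up values): two pronunciations, each mapping
# deterministically to one c and one d.
toy = marginal_likelihood(
    p_o_given_d={"d1": 0.5, "d2": 0.1},
    p_d_given_c={("d1", "c1"): 1.0, ("d2", "c2"): 1.0},
    p_c_given_p={("c1", "p1"): 1.0, ("c2", "p2"): 1.0},
    p_p_given_w={"p1": 0.75, "p2": 0.25},
)
print(toy)
```

Enumerating all triples is only practical for toy tables like these; it is meant to make the structure of the sum visible, not to suggest how a recognizer would compute it.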
[Figure: regions R1, R2 with labels a1, a2; variable X1.]
• Regression:
$$(x_1, y_1), \ldots, (x_m, y_m) \in \mathbb{R}^N \times \mathbb{R}$$
• Classification (k classes):
$$(x_1, y_1), \ldots, (x_m, y_m) \in \mathbb{R}^N \times \{1, \ldots, k\}$$
Result:
$$h(x) = \sum_{i=1}^{r} a_i \, 1_{x \in R_i}$$
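The resulting hypothesis is piecewise constant: it returns the value a_i attached to the region R_i containing x. A minimal sketch, assuming each region is given as a membership predicate (the helper name `make_tree_predictor`, the two regions, and the values are hypothetical):

```python
def make_tree_predictor(regions, values):
    """Build h(x) = sum_i a_i * 1[x in R_i].

    regions: list of predicates, regions[i](x) is True when x lies in R_i.
    values:  list of the corresponding constants a_i.
    When the regions partition the input space, exactly one term is nonzero.
    """
    def h(x):
        return sum(a for contains, a in zip(regions, values) if contains(x))
    return h


# Hypothetical partition of the plane by a single threshold on the
# first coordinate.
regions = [lambda x: x[0] <= 2.0, lambda x: x[0] > 2.0]
values = [1.5, -0.5]
h = make_tree_predictor(regions, values)

print(h((1.0, 3.0)))  # x falls in the first region
print(h((4.0, 0.0)))  # x falls in the second region
```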
[Figure: example decision tree with leaves labeled "play" and "do not play"; one node question is "barometer <= 18 in"; node counts shown: (19/236), (1/2), (9/72), (48/113).]
• Misclassification:
$$C(T, j) = \frac{1}{|\{x_i \in R_j\}|} \sum_{x_i \in R_j} 1_{y_i \neq c_j} = 1 - p_{c_j, j}.$$
• Entropy:
$$C(T, j) = -\sum_{c=1}^{k} p_{c,j} \log p_{c,j}.$$
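Both node costs depend only on the class proportions p_{c,j} of the points falling in region R_j. The sketch below computes them from a list of labels, assuming c_j is the majority class of the region and using the natural logarithm (the slides do not fix the base); the function names and the label data are made up for illustration.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return {class: fraction of the labels in this region}."""
    n = len(labels)
    return {c: count / n for c, count in Counter(labels).items()}


def misclassification(labels):
    """1 - p_{c_j, j}, taking c_j to be the most frequent class."""
    return 1.0 - max(class_proportions(labels).values())


def entropy(labels):
    """-sum_c p_{c,j} log p_{c,j}; classes with zero count never appear."""
    return -sum(p * math.log(p) for p in class_proportions(labels).values())


# Hypothetical region with four points.
region_labels = ["play", "play", "play", "do not play"]
print(misclassification(region_labels))
print(entropy(region_labels))
```

A pure region gives a cost of zero under both measures, and the entropy is largest when the classes are evenly mixed.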
Consonants:
• plosives
• affricates
• fricatives
• nasals
• approximants
n ae r:
0 --d0:ε--> 1 --d1:ε--> 2 --d2:ae_{n,r}--> 3

p ae n:
0 --d0:ε--> 1 --d1:ε--> 2 --d2:ae_{p,n}--> 3
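[Figure: a node split into two children with likelihoods L(S_l) and L(S_r).]
Best question:
$$q^* = \operatorname*{argmin}_{q} \left[ m_l \sum_{k=1}^{N} \log(\sigma_{lk}^2) + m_r \sum_{k=1}^{N} \log(\sigma_{rk}^2) \right],$$
with
$$\sigma_{lk}^2 = \frac{1}{m_l} \sum_{x \in S_l} x_k^2 - \frac{1}{m_l^2} \Big( \sum_{x \in S_l} x_k \Big)^2,$$
$$\sigma_{rk}^2 = \frac{1}{m_r} \sum_{x \in S_r} x_k^2 - \frac{1}{m_r^2} \Big( \sum_{x \in S_r} x_k \Big)^2,$$
where m_l = |S_l| and m_r = |S_r|. Minimizing this count-weighted sum of log-variances is equivalent to maximizing L(S_l) + L(S_r) when each child is modeled by a diagonal-covariance Gaussian.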
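The selection of the best question can be sketched as follows. Each candidate question splits the sample into S_l and S_r, and the cost here is taken as the count-weighted sum of per-dimension log-variances over both children, which is what maximizing L(S_l) + L(S_r) gives if each child is modeled by a diagonal-covariance Gaussian (an assumption about the likelihood model). The function names, the variance floor, the toy contexts, and the toy feature vectors are all hypothetical.

```python
import math


def sum_log_variances(vectors, var_floor=1e-6):
    """Return sum_k log(sigma_k^2), with
    sigma_k^2 = (1/m) sum x_k^2 - (1/m^2) (sum x_k)^2.
    var_floor is an added safeguard (not from the slides) against log(0).
    """
    m = len(vectors)
    total = 0.0
    for k in range(len(vectors[0])):
        s1 = sum(x[k] for x in vectors)
        s2 = sum(x[k] ** 2 for x in vectors)
        variance = s2 / m - (s1 / m) ** 2
        total += math.log(max(variance, var_floor))
    return total


def best_question(samples, questions):
    """samples: list of (context, vector); questions: {name: predicate}.
    Returns (name, cost) of the question with the smallest cost, or None
    if every question leaves one side empty.
    """
    best = None
    for name, ask in questions.items():
        left = [x for ctx, x in samples if ask(ctx)]
        right = [x for ctx, x in samples if not ask(ctx)]
        if not left or not right:
            continue  # a split with an empty child is not a usable split
        cost = (len(left) * sum_log_variances(left)
                + len(right) * sum_log_variances(right))
        if best is None or cost < best[1]:
            best = (name, cost)
    return best


# Toy data: one-dimensional features tagged with a left-context phone.
samples = [({"left": "n"}, (1.0,)), ({"left": "m"}, (1.2,)),
           ({"left": "p"}, (5.0,)), ({"left": "t"}, (5.3,))]
questions = {
    "left is nasal": lambda ctx: ctx["left"] in {"n", "m"},
    "left is n": lambda ctx: ctx["left"] == "n",
}
print(best_question(samples, questions))
```

On this toy data the nasal question separates the two clusters cleanly, so both children have small variance and its cost is the lower of the two.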
[Figure: transducer with states labeled (y, y) and (y, ε) and transitions labeled y:y/y_ε, y:y/x_ε, y:y/ε_ε.]