Tony (Mr. T)
Papers
[1] Clark D. Shaver, John M. Acken, “A Brief Review of Speaker Recognition Technology”, Electrical and Computer Engineering Faculty Publications and Presentations, 2016
[2] Lindasalwa Muda, Mumtaj Begam, and I. Elamvazuthi, “Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques”
[3] Douglas A. Reynolds, Thomas F. Quatieri, and Robert B. Dunn, “Speaker Verification Using Adapted Gaussian Mixture Models”, M.I.T. Lincoln Laboratory, 244 Wood St., Lexington, Massachusetts 02420, 2000
[4] Masahide Sugiyama, Hidehumi Sawai, and Alexander H. Waibel, “Review of TDNN (Time Delay Neural Network) Architectures for Speech Recognition”, School of Computer Science, CMU, Pittsburgh, PA 15213, U.S.A.
[5] David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, Sanjeev Khudanpur, “X-Vectors: Robust DNN Embeddings for Speaker Recognition”, The Johns Hopkins University, Baltimore, MD 21218, USA
[6] Wei Li, Tianfan Fu, Jie Zhu, “An improved i-vector extraction algorithm for speaker verification”, EURASIP Journal on Audio, Speech, and Music Processing, December 2015
[7] Arsha Nagrani, Joon Son Chung, Andrew Zisserman, “VoxCeleb: a large-scale speaker identification dataset”, Visual Geometry Group, Department of Engineering Science, University of Oxford, UK
Speaker Recognition [1]
Type of Speaker Recognition [1]
● Speaker Recognition
○ Speaker Verification
○ Speaker Identification
   ■ Open-Set Identification
   ■ Closed-Set Identification
      ▪ Text-Independent
      ▪ Text-Dependent
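The difference between the task types above comes down to the decision rule applied to match scores. A minimal sketch, assuming each enrolled speaker already has a similarity score against the test utterance (higher means more similar); the function names, scores, and thresholds are hypothetical and not taken from [1]:

```python
from typing import Dict, Optional


def verify(score: float, threshold: float) -> bool:
    """Speaker verification: accept or reject one claimed identity."""
    return score >= threshold


def identify_closed_set(scores: Dict[str, float]) -> str:
    """Closed-set identification: the speaker is assumed to be enrolled,
    so the best-scoring enrolled speaker is always returned."""
    return max(scores, key=scores.get)


def identify_open_set(scores: Dict[str, float], threshold: float) -> Optional[str]:
    """Open-set identification: the speaker may be unknown, so the best
    match is returned only if its score clears the threshold."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


# Hypothetical scores for three enrolled speakers.
scores = {"alice": 0.41, "bob": 0.73, "carol": 0.12}

print(verify(scores["bob"], threshold=0.6))      # True
print(identify_closed_set(scores))               # bob
print(identify_open_set(scores, threshold=0.8))  # None
```

Text-dependent versus text-independent operation changes what is spoken at enrollment and test time, not the decision rule, so it is not modeled here.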
Speaker Recognition System [1]
[Figure: system block diagram with Enroll, Database, Test, Decision, and ID blocks]
High Level Features [2][3]
[Figure: feature extraction pipeline; recoverable labels: frame size, overlap, iDCT]
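The labels on this slide (frame size, overlap, and a cosine transform) match the usual cepstral front end described in [2]. A rough sketch in NumPy, not the pipeline from the slide: the 400-sample frame, 240-sample overlap, and 13 coefficients are assumed values, and the mel filterbank that MFCC proper applies before the logarithm is left out to keep the example short.

```python
import numpy as np


def frame_signal(signal, frame_size, overlap):
    """Split a 1-D signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    hop = frame_size - overlap
    if hop <= 0:
        raise ValueError("overlap must be smaller than frame_size")
    if len(signal) < frame_size:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_size) // hop
    # Row i holds samples [i * hop, i * hop + frame_size).
    idx = np.arange(frame_size)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs."""
    n = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T


def cepstral_features(signal, frame_size=400, overlap=240, n_coeffs=13):
    """Frame, window, take the log power spectrum, then decorrelate with a DCT."""
    frames = frame_signal(signal, frame_size, overlap) * np.hamming(frame_size)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    log_power = np.log(power + 1e-10)  # small constant avoids log(0)
    return dct_ii(log_power, n_coeffs)


# One second of synthetic noise at an assumed 16 kHz sampling rate.
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
features = cepstral_features(audio)
print(features.shape)  # (98, 13)
```

With a 16 kHz signal, these assumed settings correspond to 25 ms frames taken every 10 ms.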
Deep Features [5][6]
● i-vector
○ Baseline model
● X-vector
○ Time Delay Neural Network
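A time-delay layer can be read as an affine transform applied to each frame spliced with its temporal neighbors. In the x-vector approach [5], frame-level layers of this kind are followed by statistics pooling (mean and standard deviation over time) to get one fixed-length vector per utterance. A toy sketch with random, untrained weights; the layer sizes and context offsets are illustrative assumptions, and the segment-level layers that follow pooling in [5] are omitted.

```python
import numpy as np


def tdnn_layer(x, weights, bias, context):
    """Apply one time-delay layer with ReLU.

    x:       (T, D) array of frame-level features
    context: time offsets spliced around each frame, e.g. (-2, -1, 0, 1, 2)
    weights: (len(context) * D, H) array; bias: (H,) array
    Only frames with a complete context window are kept, so the output is shorter.
    """
    n_frames = x.shape[0]
    start = max(0, -min(context))
    stop = n_frames - max(0, max(context))
    centers = np.arange(start, stop)
    spliced = np.concatenate([x[centers + c] for c in context], axis=1)
    return np.maximum(spliced @ weights + bias, 0.0)


def stats_pooling(h):
    """Collapse a variable-length (T, H) sequence into one (2 * H,) vector."""
    return np.concatenate([h.mean(axis=0), h.std(axis=0)])


rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 24))  # 50 frames of 24-dim features

ctx1 = (-2, -1, 0, 1, 2)
w1 = 0.1 * rng.standard_normal((len(ctx1) * 24, 32))
h1 = tdnn_layer(frames, w1, np.zeros(32), ctx1)

ctx2 = (-2, 0, 2)  # subsampled context: skips the immediate neighbors
w2 = 0.1 * rng.standard_normal((len(ctx2) * 32, 32))
h2 = tdnn_layer(h1, w2, np.zeros(32), ctx2)

pooled = stats_pooling(h2)
print(h1.shape, h2.shape, pooled.shape)  # (46, 32) (42, 32) (64,)
```

The pooled vector has the same length whatever the utterance duration, which is what lets later layers produce a fixed-size speaker embedding.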
Classifier [4]
● Cosine distance
● K-nearest neighbors
● SVM
● PLDA
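Cosine scoring is the simplest of the back ends listed above: it compares only the direction of two speaker embeddings and needs no training. A minimal sketch, assuming the embeddings (i-vectors or x-vectors) have already been extracted; the three-dimensional vectors are made-up toy values.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two embeddings, in [-1, 1]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_distance(a, b):
    """Cosine distance: 0 for the same direction, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(a, b)


# Toy embeddings; real ones typically have hundreds of dimensions.
enrolled = np.array([1.0, 2.0, 0.0])
same_direction = np.array([2.0, 4.0, 0.0])  # scaled copy of the enrolled vector
orthogonal = np.array([0.0, 0.0, 3.0])

print(round(cosine_distance(enrolled, same_direction), 6))
print(round(cosine_distance(enrolled, orthogonal), 6))
```

Because the score ignores vector length, a scaled copy of the enrolled embedding is treated as a perfect match. Trained back ends such as SVM and PLDA need labeled speaker data to fit.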
X-Vector: the State of the Art in SR [5][7]
● Deep learning
● Large dataset: VoxCeleb 1 + 2
○ Free
● Open source: Kaldi