
Speaker Recognition

Tony (Mr. T)

Papers
[1] Clark D. Shaver, John M. Acken, “A Brief Review of Speaker Recognition Technology”, Electrical and Computer Engineering Faculty Publications and Presentations, 2016

[2] Lindasalwa Muda, Mumtaj Begam, and I. Elamvazuthi, “Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques”

[3] Douglas A. Reynolds, Thomas F. Quatieri, and Robert B. Dunn, “Speaker Verification Using Adapted Gaussian Mixture Models”, M.I.T. Lincoln Laboratory, 244 Wood St., Lexington, Massachusetts 02420, 2000

[4] Masahide Sugiyama, Hidehumi Sawai, and Alexander H. Waibel, “Review of TDNN (Time Delay Neural Network) Architectures for Speech Recognition”, School of Computer Science, CMU, Pittsburgh, PA 15213, U.S.A.

[5] David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, and Sanjeev Khudanpur, “X-Vectors: Robust DNN Embeddings for Speaker Recognition”, The Johns Hopkins University, Baltimore, MD 21218, USA

[6] Wei Li, Tianfan Fu, and Jie Zhu, “An improved i-vector extraction algorithm for speaker verification”, EURASIP Journal on Audio, Speech, and Music Processing, December 2015

[7] Arsha Nagrani, Joon Son Chung, and Andrew Zisserman, “VoxCeleb: a large-scale speaker identification dataset”, Visual Geometry Group, Department of Engineering Science, University of Oxford, UK
Speaker Recognition [1]

Types of Speaker Recognition [1]

Speaker Recognition
● Speaker Verification
● Speaker Identification
○ Open-Set Identification
○ Closed-Set Identification: Text-Independent or Text-Dependent
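
The difference between these tasks shows up in the decision rule. The following is a minimal sketch, assuming each enrolled speaker already has a similarity score against the test utterance (higher means more similar). The function names, the example scores, and the thresholds are hypothetical and are not taken from [1].

```python
def verify(score, threshold):
    """Speaker verification: accept or reject one claimed identity."""
    return score >= threshold


def identify_closed_set(scores):
    """Closed-set identification: the speaker is assumed to be enrolled,
    so the best-scoring enrolled speaker is always returned."""
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Open-set identification: the speaker may be unknown, so the best
    match is rejected (None) when its score is below the threshold."""
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None


# Hypothetical similarity scores for two enrolled speakers.
scores = {"alice": 0.4, "bob": 0.7}
print(verify(scores["bob"], 0.5))        # True
print(identify_closed_set(scores))       # bob
print(identify_open_set(scores, 0.8))    # None
```

Open-set identification can be read as closed-set identification followed by a verification step on the best match.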

Speaker Recognition System [1]

Train / Enroll / Test → Pre-processing → Features Extraction
● Train → Models
● Enroll → Enroll Database
● Test → Scoring → Decision → ID

High Level Features [2][3]

Mel-Frequency Cepstral Coefficients

● Sampling: frame size, overlap
● Frequency domain
● Mimic human ear
● iDCT: reconstruct voice source
● Features
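
The MFCC steps can be sketched in plain Python. This is a minimal illustration, not the implementation from [2] or [3]: it assumes a Hamming window, a naive DFT, triangular mel filters, the common conversion mel = 2595 log10(1 + f / 700), and a DCT-II for the final transform. All function names and default parameters (frame size 64, hop 32, 10 filters, 5 coefficients) are hypothetical.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(x, frame_size, hop):
    """Split x into frames; consecutive frames overlap by frame_size - hop samples."""
    return [x[i:i + frame_size] for i in range(0, len(x) - frame_size + 1, hop)]


def power_spectrum(frame):
    """Hamming-window the frame, then return |DFT|^2 for bins 0..N/2 (naive O(N^2) DFT)."""
    n = len(frame)
    w = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
         for i, s in enumerate(frame)]
    return [abs(sum(w[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    mels = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge
            filt[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge
            filt[k] = (hi - k) / (hi - mid)
        bank.append(filt)
    return bank


def dct2(v, n_out):
    """First n_out coefficients of an unnormalised DCT-II."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]


def mfcc(x, sample_rate, frame_size=64, hop=32, n_filters=10, n_ceps=5):
    bank = mel_filterbank(n_filters, frame_size, sample_rate)
    feats = []
    for frame in frame_signal(x, frame_size, hop):
        p = power_spectrum(frame)
        # Small constant avoids log(0) for filters that cover no bins.
        log_e = [math.log(sum(f * v for f, v in zip(filt, p)) + 1e-10) for filt in bank]
        feats.append(dct2(log_e, n_ceps))
    return feats


# 256 samples of a 440 Hz tone sampled at 8 kHz.
x = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(256)]
feats = mfcc(x, 8000)
print(len(feats), len(feats[0]))  # 7 frames, 5 coefficients each
```

In practice a library routine with an FFT would replace the naive DFT; the frame size and overlap here are chosen only to keep the example small.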

Deep Features [5][6]

● i-vector
○ Baseline model

● X-vector
○ Time Delay Neural Network
○ Can be understood as a 1D convolutional neural network
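
The 1D-convolution view of a TDNN layer can be sketched as follows. This is a minimal illustration under stated assumptions: each output frame is a ReLU of a weighted sum over input frames at fixed time offsets (the "context"), and outputs are produced only where the full context is available. The function name tdnn_layer, the weight layout, and the toy values are hypothetical and do not reproduce the architecture in [5].

```python
def tdnn_layer(frames, weights, bias, context):
    """One TDNN layer as a 1D convolution over time.

    frames:  list of T input feature vectors, each of length D
    weights: weights[o][c][d] for output unit o, context index c, input dim d
    bias:    one bias value per output unit
    context: time offsets, e.g. [-1, 0, 1] or a dilated [-2, 0, 2]
    """
    t_start = max(0, -min(context))
    t_end = len(frames) - max(0, max(context))
    out = []
    for t in range(t_start, t_end):
        row = []
        for w_o, b_o in zip(weights, bias):
            acc = b_o
            for c_idx, offset in enumerate(context):
                acc += sum(w * x for w, x in zip(w_o[c_idx], frames[t + offset]))
            row.append(max(0.0, acc))  # ReLU
        out.append(row)
    return out


# Five frames of 1-dimensional features, one output unit that sums its context.
frames = [[1.0], [2.0], [3.0], [4.0], [5.0]]
weights = [[[1.0], [1.0], [1.0]]]
print(tdnn_layer(frames, weights, [0.0], [-1, 0, 1]))  # [[6.0], [9.0], [12.0]]
```

Using offsets such as [-2, 0, 2] corresponds to a dilated convolution: the layer sees a wider time span with the same number of weights.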

Classifier [4]

● Cosine distance
● K-nearest neighbors
● SVM
● PLDA
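
Of the back-ends listed, cosine distance is the simplest to show. The following is a minimal sketch, assuming each speaker is represented by one fixed-length embedding (for example an i-vector or x-vector); the function names and the toy 2-dimensional embeddings are hypothetical.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """0 for vectors pointing the same way, 1 for orthogonal vectors."""
    return 1.0 - cosine_similarity(a, b)


def nearest_speaker(test, enrolled):
    """Return the enrolled speaker whose embedding is closest to the test embedding."""
    return min(enrolled, key=lambda name: cosine_distance(test, enrolled[name]))


# Toy 2-dimensional embeddings; real embeddings have hundreds of dimensions.
enrolled = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))   # 0.0
print(nearest_speaker([0.9, 0.1], enrolled))     # alice
```

Cosine distance ignores vector length, so only the direction of the embedding matters.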

X-Vector: the State of the Art in Speaker Recognition [5][7]

● Deep Learning
● Big Dataset: VoxCeleb 1 + 2

○ ~6000 speakers, over 1 million utterances

○ Free
● Open source: Kaldi

Speech → MFCC → TDNN → PLDA → ID

X-vector for Speaker Recognition


Thank you so much
