Speaker Recognition: Tony (Mr. T)

This document discusses speaker recognition technology. It provides an overview of speaker recognition, including the different types (verification vs identification), systems used (enrollment databases, feature extraction, models, scoring, decision), and high level features involved (MFCC, i-vectors, x-vectors). Deep learning methods using neural networks like TDNN and large datasets like VoxCeleb have become the state-of-the-art for speaker recognition, represented by "x-vector" embeddings trained on millions of utterances. Classification is typically done using metrics like cosine distance or PLDA models.

Original description:

These slides are for my presentation at the School of AI, Pi Campus, Rome, Italy, 2018.

Original title

SpeakerRecognition_paperaday

Uploaded by Tran Trung

Speaker Recognition

Tony (Mr. T)

Papers
[1] Clark D. Shaver, John M. Acken, “A Brief Review of Speaker Recognition Technology”, Electrical and Computer Engineering Faculty Publications and Presentations, 2016

[2] Lindasalwa Muda, Mumtaj Begam, and I. Elamvazuthi, “Voice Recognition Algorithms using Mel Frequency Cepstral Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques”

[3] Douglas A. Reynolds, Thomas F. Quatieri, and Robert B. Dunn, “Speaker Verification Using Adapted Gaussian Mixture Models”, M.I.T. Lincoln Laboratory, 244 Wood St., Lexington, Massachusetts 02420, 2000

[4] Masahide Sugiyama, Hidehumi Sawai, and Alexander H. Waibel, “Review of TDNN (Time Delay Neural Network) Architectures for Speech Recognition”, School of Computer Science, CMU, Pittsburgh, PA 15213, U.S.A.

[5] David Snyder, Daniel Garcia-Romero, Gregory Sell, Daniel Povey, Sanjeev Khudanpur, “X-Vectors: Robust DNN Embeddings for Speaker Recognition”, The Johns Hopkins University, Baltimore, MD 21218, USA

[6] Wei Li, Tianfan Fu, Jie Zhu, “An improved i-vector extraction algorithm for speaker verification”, EURASIP Journal on Audio, Speech, and Music Processing, December 2015

[7] Arsha Nagrani, Joon Son Chung, Andrew Zisserman, “VoxCeleb: a large-scale speaker identification dataset”, Visual Geometry Group, Department of Engineering Science, University of Oxford, UK
Speaker Recognition [1]

Type of Speaker Recognition [1]

Speaker Recognition
● Speaker Verification
● Speaker Identification
○ Open-Set Identification
○ Closed-Set Identification
● Text-Dependent vs. Text-Independent
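The difference between these modes shows up at decision time. The sketch below is illustrative only and is not taken from the slides or from [1]: it assumes similarity scores where higher means more similar, and the function names, speaker names, scores, and thresholds are all hypothetical.

```python
def verify(score, threshold):
    """Speaker verification: accept or reject one claimed identity."""
    return score >= threshold


def identify_closed_set(scores):
    """Closed-set identification: the speaker is assumed to be enrolled,
    so the best-scoring enrolled speaker is always returned."""
    return max(scores, key=scores.get)


def identify_open_set(scores, threshold):
    """Open-set identification: the best match is returned only if it
    clears the threshold; otherwise the speaker is treated as unknown."""
    best = identify_closed_set(scores)
    return best if scores[best] >= threshold else None


# Hypothetical scores of one test utterance against three enrolled speakers.
scores = {"alice": 0.41, "bob": 0.78, "carol": 0.12}

print(verify(scores["bob"], threshold=0.6))      # claimed identity: bob
print(identify_closed_set(scores))               # always names someone
print(identify_open_set(scores, threshold=0.9))  # may answer "unknown" (None)
```

Verification is a one-to-one comparison, while identification is one-to-many; the open-set case adds a rejection option on top of the closed-set choice.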

Speaker Recognition System [1]

Inputs: Train, Enroll, Test
Pipeline: Pre-processing → Feature Extraction → Models → Scoring → Decision → ID
Storage: Enroll Database

High Level Features [2][3]

● Sampling
● Framing: frame size, overlap
● Frequency domain
● Mimic human ear
● iDCT: reconstruct voice source
● Features

Mel-Frequency Cepstral Coefficients
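The MFCC steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the procedure from [2] or [3]: the frame size, hop, filter count, and coefficient count are arbitrary small values, the Hamming window is an added assumption, the DFT is a naive one chosen for readability, and the final transform is a DCT-II, which is the usual practical stand-in for the "iDCT" step.

```python
import cmath
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(signal, frame_size, hop):
    """Split the signal into overlapping frames (overlap = frame_size - hop)."""
    return [signal[i:i + frame_size]
            for i in range(0, len(signal) - frame_size + 1, hop)]


def power_spectrum(frame):
    """Hamming window, then a naive DFT; returns power for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale (the 'human ear' step)."""
    top = hz_to_mel(sample_rate / 2)
    mel_points = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):      # rising edge of the triangle
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):     # falling edge of the triangle
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct2(values, n_coeffs):
    """Unnormalized DCT-II of the log filterbank energies."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc(signal, sample_rate, frame_size=64, hop=32, n_filters=10, n_coeffs=5):
    bank = mel_filterbank(n_filters, frame_size, sample_rate)
    features = []
    for frame in frame_signal(signal, frame_size, hop):
        power = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(filt, power)) for filt in bank]
        # A small floor avoids log(0) for empty or silent filters
        log_energies = [math.log(max(e, 1e-10)) for e in energies]
        features.append(dct2(log_energies, n_coeffs))
    return features


# Hypothetical input: a 440 Hz tone sampled at 8 kHz, 256 samples long.
sample_rate = 8000
signal = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(256)]
feats = mfcc(signal, sample_rate)
print(len(feats), len(feats[0]))  # prints: 7 5
```

Each frame yields one short feature vector, so an utterance becomes a sequence of vectors that later stages (models, embeddings) consume.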

Deep Features [5][6]

● i-vector
Baseline model

● X-vector
Time Delay Neural Network

Can be understood as a 1D convolutional neural network
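To make the 1D-convolution reading concrete, here is a minimal sketch of a single TDNN layer in plain Python. It is not the architecture from [4] or [5]: the function name, the ReLU nonlinearity, and the toy weights are assumptions made for illustration. The layer applies the same weights at every time step to a small window of frames selected by the context offsets.

```python
def tdnn_layer(frames, weights, bias, context):
    """One TDNN layer over a sequence of feature frames.

    frames  : list of T feature vectors, each of length D
    weights : one (H x D) matrix per context offset, shared across time
    bias    : list of H biases
    context : time offsets relative to the current frame, e.g. [-1, 0, 1]
    """
    lo, hi = min(context), max(context)
    outputs = []
    # Only positions where every offset stays inside the sequence are used
    for t in range(max(0, -lo), len(frames) - max(0, hi)):
        out = list(bias)
        for w, offset in zip(weights, context):
            x = frames[t + offset]
            for h, row in enumerate(w):
                out[h] += sum(wi * xi for wi, xi in zip(row, x))
        outputs.append([max(0.0, v) for v in out])  # ReLU (assumed)
    return outputs


# Toy input: 5 frames of 2-dimensional features, one output unit,
# all weights set to 1 so each output is the sum over its context window.
frames = [[1, 0], [0, 1], [1, 1], [2, 0], [0, 2]]
ones = [[[1, 1]], [[1, 1]], [[1, 1]]]

print(tdnn_layer(frames, ones, [0.0], context=[-1, 0, 1]))  # [[4.0], [5.0], [6.0]]
print(tdnn_layer(frames, ones, [0.0], context=[-2, 0, 2]))  # [[5.0]]
```

The second call skips neighbouring frames, which corresponds to a dilated 1D convolution; stacking such layers widens the temporal context seen by the upper layers.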

Classifier[4]

● Cosine distance
● K-nearest neighbors
● SVM
● PLDA
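As an illustration of the simplest option in this list, the sketch below scores a test embedding against enrolled speaker embeddings with cosine distance and picks the nearest one (a 1-nearest-neighbor decision). The embeddings are made-up 3-dimensional vectors and the function names are hypothetical; real i-vectors or x-vectors have hundreds of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)


def nearest_speaker(test_embedding, enrolled):
    """Return the enrolled speaker whose embedding is closest in cosine distance."""
    return min(enrolled, key=lambda name: cosine_distance(test_embedding, enrolled[name]))


# Hypothetical enrolled embeddings and one test embedding.
enrolled = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.1, 0.8, 0.3],
}
test_embedding = [0.2, 0.7, 0.4]

print(nearest_speaker(test_embedding, enrolled))  # bob
```

Cosine scoring ignores vector length and compares direction only, which is why it needs no training, unlike SVM or PLDA.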

X-Vector: the state of the art in SR [5][7]

● Deep Learning
● Big dataset: VoxCeleb 1 + 2:

○ ~7,000 speakers, more than 1 million utterances

○ Free
● Open source: Kaldi

Speech → MFCC → TDNN → PLDA

X-vector for Speaker Recognition

Thank you so much
