
V. Satish Kumar, Y. Anil Kumar, S. Durga Prasad

Automatic speech recognition (ASR)

What is the task?
What are the main difficulties?
How is it approached?
How good is it?
How much better could it be?

Getting a computer to understand spoken language

By "understand" we might mean:
React appropriately
Convert the input speech into another medium, e.g. text

Several variables impinge on this

How do humans do that?


Articulation produces sound waves which the ear conveys to the brain for processing

Digitization: Converting analogue signal into digital representation
Signal processing: Separating speech from background noise
Phonetics: Variability in human speech
Phonology: Recognizing individual sound distinctions (similar phonemes)
Lexicology and syntax: Disambiguating homophones; features of continuous speech
Syntax and pragmatics: Interpreting prosodic features
Pragmatics: Filtering of performance errors (disfluencies)
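The digitization step (converting an analogue signal into a digital representation) can be illustrated with a minimal sketch. It is not taken from the slides: the function name `digitize`, the 16 kHz sample rate, the 16-bit depth, and the 440 Hz test tone are all assumptions chosen for illustration. The sketch samples a continuous function at regular intervals and rounds each sample to a signed integer level.

```python
import math


def digitize(signal, duration_s, sample_rate_hz=16000, bits=16):
    """Sample a continuous signal and quantize each sample to a signed integer.

    `signal` is any function of time (seconds) returning an amplitude,
    nominally in [-1.0, 1.0]. All parameter defaults are illustrative.
    """
    max_level = 2 ** (bits - 1) - 1           # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                 # sampling: discrete time steps
        amplitude = max(-1.0, min(1.0, signal(t)))  # clip out-of-range input
        samples.append(round(amplitude * max_level))  # quantization
    return samples


def tone(t):
    """Hypothetical stand-in for the analogue input: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), min(pcm), max(pcm))
```

Sampling fixes the time resolution and quantization fixes the amplitude resolution; both discard information, which is one reason the later stages have to cope with imperfect input.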

Figure: the words "go home" sit above a Markov model backbone composed of phones (hidden because we don't know the correspondences), which sits above the acoustic observations x0 to x9.

Each line in the figure represents a probability estimate (more later)
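To make the hidden correspondence between phones and acoustic observations concrete, here is a minimal sketch of Viterbi alignment over a left-to-right phone chain for "go home". Everything numeric in it is an assumption for illustration, not something given in the slides: the phone symbols, the discrete frame labels standing in for x0 to x9, the 0.7/0.1 emission probabilities, and the 0.5 stay/advance transition probability. A real recognizer would use continuous acoustic features and trained probabilities.

```python
import math


def viterbi_align(phones, observations, emission_prob, stay_prob=0.5):
    """Find the most probable phone index for each frame.

    The chain is left-to-right: at each frame the model either stays in the
    current phone or advances to the next one. Scores are log probabilities.
    """
    n_states, n_frames = len(phones), len(observations)
    log_stay = math.log(stay_prob)
    log_move = math.log(1.0 - stay_prob)
    neg_inf = float("-inf")
    score = [[neg_inf] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]

    # The path must start in the first phone.
    score[0][0] = math.log(emission_prob(phones[0], observations[0]))
    for t in range(1, n_frames):
        for s in range(n_states):
            stay = score[t - 1][s] + log_stay
            move = score[t - 1][s - 1] + log_move if s > 0 else neg_inf
            best, prev = (move, s - 1) if move > stay else (stay, s)
            if best > neg_inf:
                emit = math.log(emission_prob(phones[s], observations[t]))
                score[t][s] = best + emit
                back[t][s] = prev

    # The path must end in the last phone; follow back-pointers from there.
    path = [n_states - 1]
    for t in range(n_frames - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, score[-1][-1]


def toy_emission(phone, label):
    """Hypothetical emission table: 0.7 if the frame label matches the phone."""
    return 0.7 if label == phone else 0.1


phones = ["g", "ow", "hh", "ow", "m"]          # assumed phones for "go home"
frames = ["g", "g", "ow", "ow", "hh", "ow", "ow", "ow", "m", "m"]  # x0..x9
path, log_prob = viterbi_align(phones, frames, toy_emission)
print([phones[s] for s in path])
```

With these toy numbers the best path assigns frames x0 and x1 to the first phone, x2 and x3 to the second, x4 to the third, x5 through x7 to the fourth, and x8 and x9 to the last. The phone sequence is known here; what the search recovers is the hidden frame-to-phone correspondence.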

Different types of tasks with different difficulties

Speaking mode (isolated words / continuous speech)
Speaking style (read / spontaneous)
Enrollment (speaker-independent / speaker-dependent)
Vocabulary (small < 20 words / large > 20k words)
Language model (finite state / context sensitive)
Perplexity (small < 10 / large > 100)
Signal-to-noise ratio (high > 30 dB / low < 10 dB)
Transducer (high quality microphone / telephone)
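Two of the task parameters above are numeric and can be computed directly. The sketch below uses the standard definitions: perplexity as 2 raised to the negative mean log2 probability the language model assigns to each word, and signal-to-noise ratio as 10 times log10 of the ratio of mean signal power to mean noise power. The function names and the toy inputs are assumptions for illustration.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the model probability of each word."""
    mean_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
    return 2 ** (-mean_log2)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from two lists of samples."""
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


# A model that always chooses uniformly among 10 words has perplexity 10.
print(perplexity([0.1] * 5))

# Noise at one tenth of the signal amplitude gives an SNR of about 20 dB.
print(snr_db([1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1]))
```

Perplexity can be read as the effective number of equally likely words the recognizer must choose between at each step, so a lower value generally means an easier task.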

Health care
Military
Air traffic controller
Mobile telephony
Voice user interface
Speech to text

Speech recognition works best if the microphone is close to the user (e.g. in a phone, or if the user is wearing a microphone).
More distant microphones (e.g. on a table or wall) tend to increase the number of errors.
Users may speak different languages.
Local accents may not be recognized.

Encouraged by some innovative models, developments in ASR appear to be accelerating. The outlook is optimistic: future applications of automatic speech recognition are expected to contribute substantially to the quality of life of deaf children and adults and of those who share their lives, and to benefit the public and private sectors of the business community.
