Speech Recognition and Synthesis

Liverpool John Moores University
School of Computing and Mathematical Sciences
CMSCD1008 Introduction to Multimedia Technology

Lecture 6: Speech recognition and synthesis
Chris Wren
c.wren@livjm.ac.uk
In this session...
What is speech recognition?

Types of speech recognition
How does it work?
What are its uses?
What is speech synthesis?

How does it work?
What are its uses?
Speech recognition
Speech recognition is the process of recognising and understanding human speech and interpreting this inside a computer
It is a very demanding task that requires a huge amount of processing power
Bear in mind that it takes humans many years to fully understand spoken instructions!
It has recently become more widespread due to:

Reductions in the cost of CPU chips
Increases in CPU processing power
Types of speech recognition
Command-based
The user gives the computer simple spoken commands, e.g. Start Word
Discrete recognition
The user speaks single words separated by distinct pauses to construct a given sentence or phrase
e.g. This is HND Multimedia
Continuous recognition
The user speaks using natural language with no pauses (i.e. they use normal conversation)
Characteristics of speech
Phonemes are the fundamental elements of pronunciation in a language

There are about 44 phonemes in the English language (the exact count varies with accent) from which all words are constructed
Normal conversation requires the recognition of 10 to 15 phonemes per second
Syllables are composed of phonemes
Words are made up of syllables
Phrases and sentences are composed of words
Understanding speech
Spoken: I would like an ice-cream
This is the sentence that has been spoken by the human.

First attempt: Eye wood like a nice scream
The first attempt at recognition will try to identify words that it understands.

Second attempt: I would like a nice scream
The recogniser then uses language- and grammar-specific rules to determine whether the sentence actually makes sense.

Final result: I would like an ice-cream
Some words have common pairings in general text and the recogniser will try many different combinations.
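
The idea of preferring common word pairings can be sketched roughly as follows. This is only an illustration, not how any particular recogniser works: the pair counts are made up, and the names PAIR_COUNTS, pair_score and best_candidate are hypothetical. Each candidate sentence is scored by the average log count of its adjacent word pairs, and the highest-scoring candidate is kept.

```python
import math

# Made-up word-pair counts; a real recogniser would learn these
# from a large body of general text.
PAIR_COUNTS = {
    ("i", "would"): 50,
    ("would", "like"): 40,
    ("like", "an"): 20,
    ("like", "a"): 30,
    ("an", "ice-cream"): 15,
    ("a", "nice"): 10,
    ("nice", "scream"): 1,
}


def pair_score(sentence):
    """Average log count of adjacent word pairs (1 is added so unseen pairs score 0)."""
    words = sentence.lower().split()
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    total = sum(math.log(PAIR_COUNTS.get(pair, 0) + 1) for pair in pairs)
    return total / len(pairs)


def best_candidate(candidates):
    """Return the candidate sentence whose word pairs are most common."""
    return max(candidates, key=pair_score)


candidates = [
    "Eye wood like a nice scream",
    "I would like a nice scream",
    "I would like an ice-cream",
]
print(best_candidate(candidates))
```

Averaging over the number of pairs stops longer candidates from winning simply because they contain more pairs.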
Speaker recognition
Speaker-independent recognition
The software can recognise any user
Is generally pre-trained by a lot of different users
Difficult to develop and expensive to build
Speaker-dependent recognition
The software can only recognise one user
Is generally trained by that user
Can be made to recognise new words
Problems in recognising speech

Background noise
Differences in microphones

Headsets, lapel microphones and speaker phones

Children's voices as well as adult voices
Accents and dialects
Foreign languages
Specialist vocabularies
e.g. legal or medical terminology
Recognition rates
The recognition rate is the percentage of test words that a recogniser recognises correctly
This can be calculated as follows:
Recognition rate (%) = (Number of correct words / Number of test words) × 100

Current rates are around 90% to 95%
To improve this rate you have to train the recogniser to recognise your voice
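
As a small worked example of the calculation above, the following sketch computes the rate for a made-up test of 50 words. The function name recognition_rate and the numbers are illustrative assumptions only.

```python
def recognition_rate(correct_words, test_words):
    """Percentage of test words that were recognised correctly."""
    if test_words <= 0:
        raise ValueError("test_words must be greater than zero")
    return 100 * correct_words / test_words


# Made-up test: 45 of 50 spoken words were recognised correctly.
print(recognition_rate(45, 50))
```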
Uses for speech recognition
Hands-free typing
Dictation (no need for a secretary)
Language translation
Voice print identification

e.g. accessing a cash machine using your voice
Devices that are difficult to use with your hands

In a car, e.g. the AutoPC

Improved accessibility for disabled people
Telephone support

Eliminating "press hash now" type prompts
Speech synthesis
Speech synthesis is the conversion of electronic text into spoken output
Sometimes known as Text-To-Speech (TTS)
Has a reputation for sounding like a robot
Listen to Stephen Hawking's speech synthesiser!
Modern TTS synthesisers have very realistic sounding voices for general text
Some can even be made to sing and whistle!
Types of speech synthesiser
Formant synthesis
This models the human vocal system from scratch using a frequency analysis of real speech
It then recreates these frequencies using a sound synthesiser
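
A very crude illustration of recreating analysed frequencies is to add together sine waves at a vowel's formant frequencies. This is not a real formant synthesiser, which would pass a voice-like source through resonant filters. The formant values below are approximate textbook figures for an "ah" vowel, and SAMPLE_RATE, AH_FORMANTS and sine_vowel are hypothetical names chosen for this sketch.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)

# Approximate formant frequencies in Hz for an "ah" vowel (illustrative values).
AH_FORMANTS = [730, 1090, 2440]


def sine_vowel(formants, duration=0.5, sample_rate=SAMPLE_RATE):
    """Return samples made by summing one sine wave per formant frequency."""
    count = int(duration * sample_rate)
    samples = []
    for i in range(count):
        t = i / sample_rate
        value = sum(math.sin(2 * math.pi * f * t) for f in formants)
        samples.append(value / len(formants))  # keep the result within -1..1
    return samples


samples = sine_vowel(AH_FORMANTS)
print(len(samples))
```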
Concatenative synthesis
This constructs the speech by joining together small samples of the basic phonemes that make up real speech
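
The joining step can be sketched as below, assuming a store of pre-recorded phoneme samples. The tiny waveforms and the names PHONEME_SAMPLES and concatenate are made up for illustration; a real synthesiser would use recorded audio and would smooth the joins between samples.

```python
# Hypothetical store of recorded phoneme samples (tiny made-up waveforms).
PHONEME_SAMPLES = {
    "k": [0.0, 0.3, -0.2],
    "ae": [0.1, 0.5, 0.4, 0.1],
    "t": [0.2, -0.1],
}


def concatenate(phonemes, store=PHONEME_SAMPLES):
    """Build one waveform by joining the stored sample for each phoneme in order."""
    waveform = []
    for phoneme in phonemes:
        if phoneme not in store:
            raise KeyError(f"no recorded sample for phoneme {phoneme!r}")
        waveform.extend(store[phoneme])
    return waveform


# The word "cat" as a sequence of three phonemes.
print(concatenate(["k", "ae", "t"]))
```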
Problems with speech synthesis

Speech synthesis is very challenging
The main area where speech synthesisers are weak is in their simulation of prosody
The changes in rhythm, intonation and stress as we speak
To generate realistic prosody, the computer must understand the meaning of the text
Otherwise it sounds lifeless and electronic
Uses for speech synthesis
Proof-reading of written text

Documents, reports, etc.
Email
Improved accessibility for disabled people

Screen readers
Automated telephone help systems
User interface enhancements

Synthesised voice can be more friendly than text
Voice does not take up any screen space
Summary
Speech recognition and speech synthesis require a large amount of processing power to be effective
They also require large amounts of memory in which to process this data
Successful use of a recognition package requires extensive training, which is a continuous process
Recognition rates are currently around 90% to 95%
Next session...
Is a directed learning week
