In this session...
Speech recognition
Speech recognition is the process of recognising and understanding human speech and interpreting it inside a computer.
It is a very demanding task that requires a huge amount of processing power.
Bear in mind that it takes humans many years to fully understand spoken instructions!
Command-based
The user gives the computer simple spoken commands, e.g. Start Word
Discrete recognition
The user speaks single words separated by distinct pauses to construct a given sentence or phrase
e.g. This is HND Multimedia
Continuous recognition
The user speaks using natural language with no pauses (i.e. they use normal conversation)
Characteristics of speech
Syllables are composed of phonemes
Words are made up of syllables
Phrases and sentences are composed of words
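The hierarchy above can be pictured as a nested data structure. The following is a minimal sketch, not a real pronunciation dictionary: the `WORDS` table and the `count_units` helper are hypothetical names, and the phoneme symbols are approximate ARPAbet-style transcriptions chosen purely for illustration.

```python
# Each word maps to a list of syllables; each syllable is a list of phonemes.
# Transcriptions are approximate and for illustration only.
WORDS = {
    "speech": [["S", "P", "IY", "CH"]],
    "synthesis": [["S", "IH", "N"], ["TH", "AH"], ["S", "AH", "S"]],
}


def count_units(words):
    """Return (word count, syllable count, phoneme count) for a phrase."""
    syllable_count = sum(len(syllables) for syllables in words.values())
    phoneme_count = sum(
        len(syllable)
        for syllables in words.values()
        for syllable in syllables
    )
    return len(words), syllable_count, phoneme_count


print(count_units(WORDS))
```

A recogniser works up this hierarchy (sounds to words to sentences), while a synthesiser works down it.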
Understanding speech
I would like an ice-cream
This is the sentence that has been spoken by the human.
Eye wood like a nice scream
The first attempt at recognition will try to identify words that it understands.
The recogniser then uses language- and grammar-specific rules to determine whether the sentence actually makes sense.
Some words have common pairings in general text and the recogniser will try many different combinations.
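One way to picture the use of common word pairings is a simple scoring function that prefers hypotheses whose neighbouring words often occur together. This is only a sketch under stated assumptions: the counts in `PAIR_COUNTS` are invented for illustration (a real recogniser would derive them from a large body of text), and `pair_score` and `best_hypothesis` are hypothetical names.

```python
# Made-up counts of how often each word pair appears in general text.
PAIR_COUNTS = {
    ("i", "would"): 50,
    ("would", "like"): 40,
    ("like", "an"): 20,
    ("an", "ice-cream"): 10,
    ("like", "a"): 30,
    ("a", "nice"): 15,
}


def pair_score(sentence):
    """Sum the counts of every adjacent word pair; unseen pairs score 0."""
    words = sentence.lower().split()
    return sum(PAIR_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))


def best_hypothesis(hypotheses):
    """Pick the candidate sentence with the highest pairing score."""
    return max(hypotheses, key=pair_score)


candidates = ["Eye wood like a nice scream", "I would like an ice-cream"]
print(best_hypothesis(candidates))
```

Both candidates sound almost identical, but with these counts the pairings in "I would like an ice-cream" score higher than those in "Eye wood like a nice scream", so the first is chosen.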
Speaker recognition
Speaker-independent recognition
The software can recognise any user
Is generally pre-trained by a lot of different users
Difficult to develop and expensive to build
Speaker-dependent recognition
The software can only recognise one user
Is generally trained by that user
Can be made to recognise new words
Children's voices as well as adult voices
Accents and dialects
Foreign languages
Specialist vocabularies, e.g. legal or medical terminology
Recognition rates
The recognition rate is the percentage of words that a recogniser recognises correctly.
This can be calculated as follows:
recognition rate = (number of words recognised correctly / total number of words spoken) x 100
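As a worked example of the percentage described above, the sketch below compares what was spoken with what was recognised, word by word. It assumes both texts contain the same number of words in the same positions; real evaluations usually align the two texts first so that inserted or missing words are handled. The function name `recognition_rate` is hypothetical.

```python
def recognition_rate(spoken, recognised):
    """Percentage of spoken words that were recognised correctly.

    Assumes a simple position-by-position comparison (no alignment).
    """
    spoken_words = spoken.lower().split()
    recognised_words = recognised.lower().split()
    if not spoken_words:
        raise ValueError("spoken text must contain at least one word")
    correct = sum(
        1 for s, r in zip(spoken_words, recognised_words) if s == r
    )
    return 100.0 * correct / len(spoken_words)


# Three of the four words match, so the rate is 75%.
print(recognition_rate("This is HND Multimedia", "This is and Multimedia"))
```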
Hands-free typing
Dictation (no need for a secretary)
Language translation
Speech synthesis
Speech synthesis is the conversion of electronic text into spoken output
Sometimes known as Text-To-Speech (TTS)
Has a reputation for sounding like a robot
Listen to Stephen Hawking's speech synthesiser!
Modern TTS synthesisers have very realistic-sounding voices for general text
Some can even be made to sing and whistle!
Formant synthesis
This models the human vocal system from scratch using a frequency analysis of real speech.
It then recreates these frequencies using a sound synthesiser.
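The idea of recreating the frequencies found in real speech can be sketched very roughly by adding together sine waves at a vowel's main resonant (formant) frequencies. This is a heavily simplified illustration, not how a production formant synthesiser works: real systems pass a voice-like source signal through resonant filters. The frequencies used (about 730, 1090 and 2440 Hz for an "ah" sound) are approximate textbook values, and `synthesise_vowel` is a hypothetical name.

```python
import math

SAMPLE_RATE = 16000  # samples per second (an assumed, commonly used rate)


def synthesise_vowel(formants_hz, duration_s=0.2, sample_rate=SAMPLE_RATE):
    """Return audio samples made by averaging one sine wave per formant."""
    sample_count = round(duration_s * sample_rate)
    samples = []
    for i in range(sample_count):
        t = i / sample_rate  # time of this sample in seconds
        value = sum(math.sin(2 * math.pi * f * t) for f in formants_hz)
        samples.append(value / len(formants_hz))  # keep within -1.0..1.0
    return samples


# Approximate formant frequencies for an "ah" vowel.
ah = synthesise_vowel([730, 1090, 2440])
print(len(ah))
```

The resulting list of samples could be written to a sound file or sent to an audio device; changing the formant frequencies changes which vowel the tone resembles.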
Concatenative synthesis
This constructs the speech by joining together small samples of the basic phonemes that make up real speech
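Joining phoneme samples together can be sketched as simple list concatenation. The example below is a minimal illustration under stated assumptions: `SAMPLE_BANK` holds tiny made-up sample lists standing in for real recordings, the phoneme labels are approximate ARPAbet-style symbols, and `concatenate` is a hypothetical name. Real concatenative synthesisers also smooth the joins between samples, which is omitted here.

```python
# Made-up "recordings": each phoneme maps to a short list of audio samples.
SAMPLE_BANK = {
    "HH": [0.1, 0.2],
    "AY": [0.3, 0.4, 0.5],
}


def concatenate(phonemes, sample_bank):
    """Build an utterance by joining the stored samples for each phoneme."""
    output = []
    for phoneme in phonemes:
        output.extend(sample_bank[phoneme])
    return output


# The word "hi" as two phonemes joined end to end.
print(concatenate(["HH", "AY"], SAMPLE_BANK))
```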
Speech synthesis is very challenging
The main area where speech synthesisers are weak is in their simulation of prosody
Prosody is the change in rhythm, intonation and stress as we speak
To generate realistic prosody, the computer must understand the meaning of the text
Otherwise it sounds lifeless and electronic
Summary
Speech recognition and speech synthesis require a large amount of processing power to be effective
They also require large amounts of memory in which to process this data
Successful use of a recognition package requires extensive training, which is a continuous process
Recognition rates are currently around 90% to 95%
Next session...