Abstract Speech recognition is the process of converting a speech signal into a sequence of words by means of algorithms implemented as a computer program. Speech is the most natural form of human communication. Speech recognition technology has made it possible for computers to follow human voice commands and understand human languages. The main goal of the speech recognition area is to develop techniques and systems for speech input to machines. Dynamic Time Warping and Hidden Markov Model techniques are used for isolated word recognition. The objective of this paper is to compare their performances.
Keywords Comparison, Performance Evaluation, Dynamic Time Warping Model, Hidden Markov Model
1. Introduction
Speech recognition is an area with considerable literature, but there is little discussion of the topic within the computer science algorithms literature. Speech recognition is a multileveled pattern recognition task in which acoustic signals are examined and structured into a hierarchy of sub-word units (e.g. phonemes), words, phrases, and sentences. Each level may provide additional temporal constraints, e.g. known word pronunciations or legal word sequences, which can compensate for errors or uncertainties at lower levels. This hierarchy of constraints is best exploited by combining decisions probabilistically at all lower levels and making discrete decisions only at the highest level [5]. There are two related speech tasks: speech understanding and speech recognition.
Speech understanding is extracting the meaning of an utterance so that one can respond properly, whether or not one has correctly recognized all the words. Speech recognition is simply transcribing the speech without necessarily knowing the meaning of the utterance. The technology is also helpful to handicapped persons who might otherwise require helpers to control their environments. Studies in the speech recognition field, as in other fields, follow two trends: fundamental research, whose goal is to devise and test new methods, algorithms, and concepts in a non-commercial manner, and applied research, whose goal is to improve existing methods following specific criteria.
Fundamental research aims at medium- and especially long-term benefits, while applied research aims at quick performance improvements of existing methods or at extending their use to domains where they have been used less so far. Improvements in speech recognition performance can be assessed against the following criteria:
- Size of the recognizable vocabulary.
- Degree of spontaneity of the speech to be recognized.
- Dependence on or independence from the speaker.
- System start-up time.
- Time for the system to adapt to new speakers.
- Decision and recognition time.
- Recognition rate, expressed per word or per sentence.
2015, IRJIE-All Rights Reserved. ISSN: 2395-0560
2. Literature Review
Speech recognition technology has many applications on embedded systems, such as stand-alone devices and single-purpose command-and-control systems. This paper presents a comparative study of the algorithms mainly used for speech recognition. It also emphasizes techniques to prevent overflows of probability scores and to efficiently represent some of the key variability when implementing them. The highly complex algorithms have to be optimized to meet the limitations in computing power and memory resources.
Although dynamic time warping (DTW) is an early ASR technique, it remains popular in many applications; it now plays an important role in the well-known Kinect-based gesture recognition application. This paper proposes an intelligent comparison of speech recognition systems using an improved dynamic time warping approach for multimedia and other areas. The improved version of dynamic time warping presented in recent work, called HMM-like dynamic time warping, is essentially a Hidden Markov Model-like method in which concepts from the typical HMM statistical model are brought into the design of dynamic time warping. The HMM-like DTW method, which transforms feature-based DTW recognition into model-based DTW recognition, can behave like the HMM recognition technique, and therefore the proposed HMM-like DTW recognition model gains the capability to further perform model adaptation (speaker adaptation).
Speaker verification is one of the most widely used biometrics, usually offering more secure authentication for user access than regular passwords. This is one of the areas in which speech recognition plays an important role for security. Dynamic Time Warping and the Hidden Markov Model are two well-studied non-linear sequence alignment algorithms. The research trend shifted from dynamic time warping to hidden Markov models in approximately 1988-1990, since dynamic time warping is deterministic and lacks the power to model stochastic signals. Dynamic time warping has been applied mostly in speech recognition, since speech obviously tends to vary in temporal rate, and alignment is very important for good performance. Standard dynamic time warping basically applies the idea of deterministic dynamic programming. However, many real signals, such as speech and video signals, are stochastic processes. A new algorithm called stochastic dynamic time warping was introduced to address the drawbacks of basic dynamic time warping. In a hidden Markov model, the Viterbi algorithm is used to search for the optimal state transition sequence for a given observation sequence; it turns out to be another application of dynamic programming that cuts down the computation. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible. Each state has a probability distribution over the possible output tokens, so the sequence of tokens generated by a hidden Markov model gives some information about the sequence of states.
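The Viterbi search described above can be sketched on a toy example. The two-state model below, its transition and emission probabilities, and the observation coding are all illustrative assumptions, not taken from this paper:

```python
import numpy as np

# Toy HMM: two hidden states, each emitting one of two observation symbols.
# All numbers here are illustrative, not from the paper.
states = ["S0", "S1"]
start_p = np.array([0.6, 0.4])                 # initial state probabilities
trans_p = np.array([[0.7, 0.3],                # transition probabilities
                    [0.4, 0.6]])
emit_p = np.array([[0.9, 0.1],                 # P(observation symbol | state)
                   [0.2, 0.8]])

def viterbi(obs):
    """Most likely hidden state sequence for a list of observation indices."""
    T = len(obs)
    # delta[t, s]: max probability of any path ending in state s at time t
    delta = np.zeros((T, len(states)))
    psi = np.zeros((T, len(states)), dtype=int)   # backpointers
    delta[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        for s in range(len(states)):
            scores = delta[t - 1] * trans_p[:, s]
            psi[t, s] = np.argmax(scores)          # best predecessor state
            delta[t, s] = scores[psi[t, s]] * emit_p[s, obs[t]]
    # Backtrack from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 0, 1, 1]))  # ['S0', 'S0', 'S1', 'S1']
```

The dynamic programming structure is visible in the `delta` recursion: each cell keeps only the best path into it, which is what makes the search tractable.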
3. Speech Recognition
The schematic diagram below depicts the block diagram of speech recognition. The process is as follows: the input is digitized into a sequence of feature vectors. An acoustic-phonetic recognizer transforms the feature vectors into a time-sequenced lattice of phones. A word recognition module transforms the phone lattice into a word lattice with the help of a lexicon. Finally, in the case of continuous or connected word recognition, a grammar is applied to pick the most likely sequence of words from the word lattice. Among the many paradigms of speech recognition systems, the two used most abundantly are the stochastic approach and the template-based approach.
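The stochastic approach rests on the standard Bayes decision rule of speech recognition (standard in the literature, though not stated explicitly here):

```latex
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)
```

where $A$ is the acoustic feature sequence, $P(A \mid W)$ is the acoustic model (e.g. an HMM), and $P(W)$ is the language model, here the grammar applied to the word lattice.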
Figure: Block diagram of speech recognition. Speech → feature vector lattice → phonetic recognition (using acoustic models) → phone lattice → word recognition (using a lexicon) → word lattice → task recognition (using a grammar) → text.
The other algorithm used efficiently in speech recognition is the Dynamic Time Warping (DTW) algorithm. Dynamic Time Warping computes an optimal warping path between two time series, yielding both the warping path between the two series and the distance between them. Suppose we have two numerical sequences (a1, a2, ..., an) and (b1, b2, ..., bm); as we can see, the lengths of the two sequences can differ. The algorithm starts by calculating the local distances between the elements of the two sequences, for which different distance measures can be used. The most frequently used measure is the absolute distance between the values of the two elements (Euclidean distance). This results in a distance matrix with n rows and m columns. Starting from the local distance matrix, the minimal distance between the sequences is then determined by dynamic programming.
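The procedure just described can be sketched in a few lines of Python. This is a generic illustration of the local-distance and accumulated-cost matrices under the classic step pattern, not the implementation evaluated in this paper:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Builds the local-distance matrix described in the text (absolute
    difference between elements), then fills the accumulated-cost
    matrix by dynamic programming.
    """
    n, m = len(a), len(b)
    # Local distance matrix: d[i, j] = |a[i] - b[j]|
    d = np.abs(np.subtract.outer(np.asarray(a, float), np.asarray(b, float)))

    # Accumulated cost matrix; boundary condition D[0, 0] = d[0, 0]
    D = np.full((n, m), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Continuity: a cell is reached from its left, lower,
            # or lower-left neighbour (the classic step pattern)
            best_prev = min(
                D[i - 1, j] if i > 0 else np.inf,
                D[i, j - 1] if j > 0 else np.inf,
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            D[i, j] = d[i, j] + best_prev
    return D[n - 1, m - 1]

# Two sequences of different lengths, as in the text
print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0: the warp absorbs the repeat
```

Note how the boundary and continuity constraints listed later in Table 1 appear directly in the recursion: the path must start at (0, 0), end at (n-1, m-1), and move only to adjacent cells.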
1. Deterministic model: this exploits some known property of the signal, like the amplitude of the wave.
2. Statistical model: this model takes statistical properties of the signal into account. Examples of this type of model are the Gaussian model, the Poisson model, the Markov model, and the Hidden Markov Model.
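The difference between the two model families can be illustrated with a toy sketch. The signal values, the threshold, and the helper names below are hypothetical, chosen only for illustration:

```python
import math

signal = [0.1, 0.4, 0.35, 0.5, 0.45, 0.2]   # made-up frame amplitudes

# Deterministic model: exploit a known property such as amplitude,
# e.g. flag the frame as active if the peak amplitude exceeds a
# fixed threshold (threshold chosen arbitrarily here).
def deterministic_is_speech(frames, threshold=0.3):
    return max(abs(x) for x in frames) > threshold

# Statistical (Gaussian) model: fit mean and variance to the samples
# and score new values by their likelihood under that distribution.
def fit_gaussian(frames):
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    return mean, var

def gaussian_likelihood(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(deterministic_is_speech(signal))   # True: peak 0.5 exceeds 0.3
mean, var = fit_gaussian(signal)
# A value near the mean is more likely than an outlier under the fit
print(gaussian_likelihood(0.4, mean, var) > gaussian_likelihood(1.0, mean, var))  # True
```

The HMM extends the statistical family: each hidden state carries such an output distribution, and the model additionally captures how states follow one another over time.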
The categorization of speech recognition can be based on different aspects of speech, described as follows:
1.
…nizer
As this recognizer is complex in nature, the word boundary conditions are followed: the start and end of each word must be specified in detail.
3.
Speaker Dependent
This category of speech recognition needs less training, and the recognition of speech is based on a specific speaker.
4.
Speaker Independent
A speaker-independent recognizer does not depend on a specific speaker. This is a more general recognition category, but it requires a large amount of training data.
5. Comparative Analysis
Each algorithm has its own advantages and disadvantages, which makes dynamic time warping good for one type of application and the hidden Markov model better for another. The hidden Markov model works very well if a large amount of training data is available, but it does not do as well as dynamic time warping when the number of training samples is limited. A core advantage of the hidden Markov model is that it can work well even if there is high within-gesture variance; dynamic time warping, on the other hand, needs several templates if there is high within-gesture variance. The major disadvantage of the hidden Markov model is that it has many magic numbers that need to be selected by the user before training; dynamic time warping has very few magic numbers and can therefore be easier to apply.
Table 1. Comparative Analysis. The table compares the two algorithms on ten parameters: complexity (Dynamic Time Warping: O(n + |E| log n); Hidden Markov Model: O(|E| log |E|)), structure, security, reliability, backup, data transmission, economy, approaches (HMM: forward-backward algorithm), applications (HMM: cryptanalysis, speaker recognition, speech synthesis, machine translation), and constraints (DTW: boundary conditions and continuity).
6. Conclusion
In this research we tried to find the better technique for speech recognition. The fact that the performance of the Hidden Markov Model recognizer is somewhat poorer than that of the Dynamic Time Warping based recognizer appears to be primarily due to insufficient training data for the Hidden Markov models. The performance of the Hidden Markov Model depends on the number of states of the model; the number of states must be sufficient to model the word. The time and space complexity of the Hidden Markov Model approach is less than that of the Dynamic Time Warping approach, because the hidden Markov model only needs to compute the probability of each model producing the observed sequence. In some cases the accuracy of the hidden Markov model is better than that of the dynamic time warping algorithm for speech recognition. Speech recognition using hidden Markov models gives good results due to the resemblance between the architecture of the hidden Markov model and varying speech data.
The neural network is another method, which uses gradient descent with the back-propagation algorithm. In the hidden Markov model, recognition ability is good for unknown words. The hidden Markov model is a generic concept and is used in many areas of research. The dynamic time warping algorithm is very useful for isolated word recognition with a limited dictionary; for fluent speech recognition, Hidden Markov Model chains are used. The main difficulty with dynamic time warping is that it may not be satisfactory for a large dictionary, which could otherwise ensure an increase in the success rate of the recognition process. The models provide a flexible but rigorous stochastic framework in which to build our systems. However, hidden Markov models do not model certain aspects of speech, such as suprasegmental (long-span) phenomena, well.
REFERENCES
[1] F. Jelinek, "Continuous Speech Recognition by Statistical Methods," Proceedings of the IEEE, vol. 64, no. 4, pp. 532-556, 1976.
[2] S. Young, "A Review of Large-Vocabulary Continuous Speech Recognition," IEEE Signal Processing Magazine, pp. 45-57, Sep. 1996.
[3] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, 1978.
[4] D. Raj Reddy, "Speech Recognition by Machine: A Review," Proceedings of the IEEE, vol. 64, no. 4, pp. 501-531, April 1976.
[5] J. Santos and J. Nombela, "Text-to-Speech Conversion in Spanish: A Complete Rule-Based Synthesis System," Proc. IEEE ICASSP '82.
[6] A. Kain and M. W. Macon, "Spectral Voice Conversion for Text-to-Speech Synthesis," Proc. IEEE ICASSP '98.