Abstract Speech recognition is the process of converting a speech signal into a sequence of words by means of algorithms implemented as a computer program. Speech is the most natural form of human communication. Speech recognition technology has made it possible for computers to follow human voice commands and understand human languages. The main goal of the speech recognition area is to develop techniques and systems for speech input to machines. Dynamic Time Warping and Hidden Markov Model techniques are used for isolated word recognition. The objective of this paper is to compare their performances.
Keywords Comparison, Performance Evaluation, Dynamic Time Warping Model, Hidden Markov Model
1. Introduction
Speech recognition is an area with considerable literature, but there is little discussion of the topic within the computer science algorithms literature. Speech recognition is a multileveled pattern recognition task in which acoustic signals are examined and structured into a hierarchy of sub-word units (e.g. phonemes), words, phrases, and sentences. Each level may provide additional temporal constraints, e.g. known word pronunciations or legal word sequences, which can compensate for errors or uncertainties at lower levels. This hierarchy of constraints is best exploited by combining decisions probabilistically at all lower levels and making discrete decisions only at the highest level [5]. There are two related speech tasks: speech understanding and speech recognition.
Speech understanding is extracting the meaning of an utterance so that one can respond properly, whether or not one has correctly recognized all the words. Speech recognition is simply transcribing the speech without necessarily knowing the meaning of the utterance. The technology is also helpful to handicapped persons who might otherwise require helpers to control their environments. Studies in the speech recognition field, as in other fields, follow two trends: fundamental research, whose goal is to devise and test new methods, algorithms, and concepts in a non-commercial manner, and applied research, whose goal is to improve existing methods following specific criteria.
Fundamental research aims at medium- and especially long-term benefits, while applied research aims at quick performance improvements of existing methods or at extending their use to domains where they have been used less so far. Improvements in speech recognition performance can be assessed against the following criteria:
- Size of the recognizable vocabulary.
- Degree of spontaneity of the speech to be recognized.
- Dependence on or independence from the speaker.
- System start-up time.
- Time for the system to adapt to new speakers.
- Decision and recognition time.
- Recognition rate, expressed per word or per sentence.
2015, IRJIE-All Rights Reserved. ISSN: 2395-0560
2. Literature Review
Speech recognition technology has many applications on embedded systems, such as stand-alone devices and single-purpose command-and-control systems. This paper presents a comparative study of the algorithms mainly used for speech recognition. It also emphasizes techniques to prevent overflows of probability scores and to efficiently represent some of the key variability when implementing them. The highly complex algorithms have to be optimized to meet the limitations in computing power and memory resources.
Although dynamic time warping (DTW) is an early ASR technique, it remains popular in many applications; it now plays an important role in the well-known Kinect-based gesture recognition application. This paper proposes an intelligent comparison of speech recognition systems using an improved dynamic time warping approach for multimedia and other areas. The improved version of dynamic time warping presented in recent work, called HMM-like dynamic time warping, is essentially a Hidden Markov Model-like method in which concepts from the typical HMM statistical model are brought into the design of dynamic time warping. The HMM-like DTW method, which transforms feature-based DTW recognition into model-based DTW recognition, can behave like the HMM recognition technique, and therefore the proposed HMM-like DTW recognition model gains the capability to further perform model adaptation (speaker adaptation).
Speaker verification is one of the most widely used biometrics, usually offering more secure authentication for user access than regular passwords. This is one of the areas in which speech recognition plays an important role for security. Dynamic Time Warping and the Hidden Markov Model are two well-studied non-linear sequence alignment algorithms. The research trend shifted from dynamic time warping to hidden Markov models in approximately 1988-1990, since dynamic time warping is deterministic and lacks the power to model stochastic signals. Dynamic time warping has been applied mostly in speech recognition, since speech obviously tends to vary in temporal rate, and alignment is very important for good performance. Standard dynamic time warping basically applies the idea of deterministic dynamic programming. However, many real signals, such as speech and video signals, are stochastic processes. A new algorithm called stochastic dynamic time warping was introduced to address the drawbacks of basic dynamic time warping. In a hidden Markov model, the Viterbi algorithm is used to search for the optimal state transition sequence for a given observation sequence; it turns out to be another application of dynamic programming that cuts down the computation. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible. Each state has a probability distribution over the possible output tokens, so the sequence of tokens generated by a hidden Markov model gives some information about the sequence of states.
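The Viterbi search described above can be sketched on a toy example. The two-state model below, its transition and emission probabilities, and the observation coding are all illustrative assumptions, not taken from this paper:

```python
import numpy as np

# Toy HMM: two hidden states, each emitting one of two observation symbols.
# All numbers here are illustrative, not from the paper.
states = ["S0", "S1"]
start_p = np.array([0.6, 0.4])                 # initial state probabilities
trans_p = np.array([[0.7, 0.3],                # transition probabilities
                    [0.4, 0.6]])
emit_p = np.array([[0.9, 0.1],                 # P(observation symbol | state)
                   [0.2, 0.8]])

def viterbi(obs):
    """Most likely hidden state sequence for a list of observation indices."""
    T = len(obs)
    # delta[t, s]: max probability of any path ending in state s at time t
    delta = np.zeros((T, len(states)))
    psi = np.zeros((T, len(states)), dtype=int)   # backpointers
    delta[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, T):
        for s in range(len(states)):
            scores = delta[t - 1] * trans_p[:, s]
            psi[t, s] = np.argmax(scores)          # best predecessor state
            delta[t, s] = scores[psi[t, s]] * emit_p[s, obs[t]]
    # Backtrack from the best final state
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return [states[s] for s in reversed(path)]

print(viterbi([0, 0, 1, 1]))  # ['S0', 'S0', 'S1', 'S1']
```

The dynamic programming structure is visible in the `delta` recursion: each cell keeps only the best path into it, which is what makes the search tractable.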
3. Speech Recognition
The schematic diagram below depicts the block diagram of speech recognition. The process is as follows: the input is digitized into a sequence of feature vectors. An acoustic-phonetic recognizer transforms the feature vectors into a time-sequenced lattice of phones. A word recognition module transforms the phone lattice into a word lattice with the help of a lexicon. Finally, in the case of continuous or connected word recognition, a grammar is applied to pick the most likely sequence of words from the word lattice. Among the many paradigms of speech recognition systems, the two used most abundantly are the stochastic approach and the template-based approach.
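The stochastic approach rests on the standard Bayes decision rule of speech recognition (standard in the literature, though not stated explicitly here):

```latex
\hat{W} = \arg\max_{W} P(W \mid A) = \arg\max_{W} P(A \mid W)\, P(W)
```

where $A$ is the acoustic feature sequence, $P(A \mid W)$ is the acoustic model (e.g. an HMM), and $P(W)$ is the language model, here the grammar applied to the word lattice.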
Figure: Block diagram of speech recognition. Speech → feature vector lattice → phonetic recognition (using acoustic models) → phone lattice → word recognition (using a lexicon) → word lattice → task recognition (using a grammar) → text.
The other algorithm used efficiently in speech recognition is the Dynamic Time Warping (DTW) algorithm. Dynamic Time Warping computes an optimal warping path between two time series, yielding both the warping path between the two series and the distance between them. Suppose we have two numerical sequences (a1, a2, ..., an) and (b1, b2, ..., bm); as we can see, the lengths of the two sequences can differ. The algorithm starts by calculating the local distances between the elements of the two sequences, for which different distance measures can be used. The most frequently used measure is the absolute distance between the values of the two elements (Euclidean distance). This results in a distance matrix with n rows and m columns. Starting from the local distance matrix, the minimal distance between the sequences is then determined by dynamic programming.
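The procedure just described can be sketched in a few lines of Python. This is a generic illustration of the local-distance and accumulated-cost matrices under the classic step pattern, not the implementation evaluated in this paper:

```python
import numpy as np

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences.

    Builds the local-distance matrix described in the text (absolute
    difference between elements), then fills the accumulated-cost
    matrix by dynamic programming.
    """
    n, m = len(a), len(b)
    # Local distance matrix: d[i, j] = |a[i] - b[j]|
    d = np.abs(np.subtract.outer(np.asarray(a, float), np.asarray(b, float)))

    # Accumulated cost matrix; boundary condition D[0, 0] = d[0, 0]
    D = np.full((n, m), np.inf)
    D[0, 0] = d[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Continuity: a cell is reached from its left, lower,
            # or lower-left neighbour (the classic step pattern)
            best_prev = min(
                D[i - 1, j] if i > 0 else np.inf,
                D[i, j - 1] if j > 0 else np.inf,
                D[i - 1, j - 1] if i > 0 and j > 0 else np.inf,
            )
            D[i, j] = d[i, j] + best_prev
    return D[n - 1, m - 1]

# Two sequences of different lengths, as in the text
print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0: the warp absorbs the repeat
```

Note how the boundary and continuity constraints listed later in Table 1 appear directly in the recursion: the path must start at (0, 0), end at (n-1, m-1), and move only to adjacent cells.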
1. Deterministic model: this exploits some known property of the signal, like the amplitude of the wave.
2. Statistical model: this model takes statistical properties of the signal into account. Examples of this type of model are the Gaussian model, the Poisson model, the Markov model, and the Hidden Markov Model.
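The difference between the two model families can be illustrated with a toy sketch. The signal values, the threshold, and the helper names below are hypothetical, chosen only for illustration:

```python
import math

signal = [0.1, 0.4, 0.35, 0.5, 0.45, 0.2]   # made-up frame amplitudes

# Deterministic model: exploit a known property such as amplitude,
# e.g. flag the frame as active if the peak amplitude exceeds a
# fixed threshold (threshold chosen arbitrarily here).
def deterministic_is_speech(frames, threshold=0.3):
    return max(abs(x) for x in frames) > threshold

# Statistical (Gaussian) model: fit mean and variance to the samples
# and score new values by their likelihood under that distribution.
def fit_gaussian(frames):
    mean = sum(frames) / len(frames)
    var = sum((x - mean) ** 2 for x in frames) / len(frames)
    return mean, var

def gaussian_likelihood(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

print(deterministic_is_speech(signal))   # True: peak 0.5 exceeds 0.3
mean, var = fit_gaussian(signal)
# A value near the mean is more likely than an outlier under the fit
print(gaussian_likelihood(0.4, mean, var) > gaussian_likelihood(1.0, mean, var))  # True
```

The HMM extends the statistical family: each hidden state carries such an output distribution, and the model additionally captures how states follow one another over time.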
The categorization of speech recognition can be based on different aspects of speech, described as follows:
1.
…nizer
As this recognizer is complex in nature, the word boundary conditions are followed: the start and end of each word must be specified in detail.
3.
Speaker Dependent
This category of speech recognition needs less training, and the recognition of speech is based on a specific speaker.
4.
Speaker Independent
A speaker-independent recognizer does not depend on a specific speaker. This is a more general recognition category, but it requires a large amount of training data.
5. Comparative Analysis
Each algorithm has its own advantages and disadvantages, which makes dynamic time warping good for one type of application and the hidden Markov model better for another. The hidden Markov model works very well if a large amount of training data is available, but it does not do as well as dynamic time warping when the number of training samples is limited. A core advantage of the hidden Markov model is that it can work well even if there is high within-gesture variance; dynamic time warping, on the other hand, needs several templates if there is high within-gesture variance. The major disadvantage of the hidden Markov model is that it has many magic numbers that need to be selected by the user before training; dynamic time warping has very few magic numbers and can therefore be easier to apply.
Table 1. Comparative Analysis. The table compares the two algorithms on ten parameters: complexity (Dynamic Time Warping: O(n + |E| log n); Hidden Markov Model: O(|E| log |E|)), structure, security, reliability, backup, data transmission, economy, approaches (HMM: forward-backward algorithm), applications (HMM: cryptanalysis, speaker recognition, speech synthesis, machine translation), and constraints (DTW: boundary conditions and continuity).
6. Conclusion
In this research we tried to find the better technique for speech recognition. The fact that the performance of the Hidden Markov Model recognizer is somewhat poorer than that of the Dynamic Time Warping based recognizer appears to be primarily due to insufficient training data for the Hidden Markov models. The performance of the Hidden Markov Model depends on the number of states of the model; the number of states must be sufficient to model the word. The time and space complexity of the Hidden Markov Model approach is less than that of the Dynamic Time Warping approach, because the hidden Markov model only needs to compute the probability of each model producing the observed sequence. In some cases the accuracy of the hidden Markov model is better than that of the dynamic time warping algorithm for speech recognition. Speech recognition using hidden Markov models gives good results due to the resemblance between the architecture of the hidden Markov model and varying speech data.
The neural network is another method, which uses gradient descent with the back-propagation algorithm. In the hidden Markov model, recognition ability is good for unknown words. The hidden Markov model is a generic concept and is used in many areas of research. The dynamic time warping algorithm is very useful for isolated word recognition with a limited dictionary; for fluent speech recognition, Hidden Markov Model chains are used. The main difficulty with dynamic time warping is that it may not be satisfactory for a large dictionary, which could otherwise ensure an increase in the success rate of the recognition process. The models provide a flexible but rigorous stochastic framework in which to build our systems. However, hidden Markov models do not model certain aspects of speech, such as suprasegmental (long-span) phenomena, well.
REFERENCES
[1] F. Jelinek, "Continuous Speech Recognition by Statistical Methods," Proceedings of the IEEE, vol. 64, no. 4, pp. 532-556, 1976.
[2] S. Young, "A Review of Large-Vocabulary Continuous Speech Recognition," IEEE Signal Processing Magazine, pp. 45-57, Sep. 1996.
[3] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, 1978.
[4] D. Raj Reddy, "Speech Recognition by Machine: A Review," Proceedings of the IEEE, vol. 64, no. 4, pp. 501-531, April 1976.
[5] J. Santos and J. Nombela, "Text-to-Speech Conversion in Spanish: A Complete Rule-Based Synthesis System," Proc. IEEE ICASSP '82.
[6] A. Kain and M. W. Macon, "Spectral Voice Conversion for Text-to-Speech Synthesis," Proc. IEEE ICASSP '98.