
ISSN: 2395-0560
International Research Journal of Innovative Engineering
www.irjie.com
Volume 1, Issue 3, March 2015

The Comparative Study of Speech Recognition Models
- Hidden Markov and Dynamic Time Warping Models

V. Vaidhehi1, Anusha J2, Anand P3
1,2,3 Department of Computer Science, Christ University, Bangalore, 560034, India

Abstract Speech recognition is the process of converting a speech signal to a sequence of words by means of algorithms implemented as a computer program. Speech is the most natural form of human communication. Speech recognition technology has made it possible for computers to follow human voice commands and understand human languages. The main goal of the speech recognition area is to develop techniques and systems for speech input to machines. Dynamic Time Warping and Hidden Markov Model techniques are used for isolated word recognition. The objective of this paper is to compare their performance.

Keywords Comparison, Performance Evaluation, Dynamic Time Warping Model, Hidden Markov Model

1. Introduction
Speech recognition is an area with considerable literature, but there is little discussion of the topic within the computer science algorithms literature. Speech recognition is a multilevel pattern recognition task, in which acoustic signals are examined and structured into a hierarchy of sub-word units (e.g. phonemes), words, phrases, and sentences. Each level may provide additional temporal constraints, e.g. known word pronunciations or legal word sequences, which can compensate for errors or uncertainties at lower levels. This hierarchy of constraints is best exploited by combining decisions probabilistically at all lower levels and making discrete decisions only at the highest level [5]. There are two related speech tasks: speech understanding and speech recognition.
Speech understanding is extracting the meaning of an utterance so that one can respond properly, whether or not one has correctly recognized all the words. Speech recognition is simply transcribing the speech without necessarily knowing the meaning of the utterance. The technology is also helpful to handicapped persons who might otherwise require helpers to control their environments. Studies in the speech recognition field, as in other fields, follow two trends: fundamental research, whose goal is to devise and test new methods, algorithms and concepts in a non-commercial manner, and applied research, whose goal is to improve existing methods following specific criteria.

Fundamental research aims at medium- and especially long-term benefits, while applied research aims at quick performance improvements of existing methods or at extending their use to domains where they have so far been less used. Improvement of performance in speech recognition can be assessed against the following criteria:
Dimension of the recognizable vocabulary.
Degree of spontaneity of the speech to be recognized.
Dependence on or independence of the speaker.
Time to put the system into operation.
Time for the system to accommodate new speakers.
Decision and recognition time.
Recognition rate, expressed per word or per sentence.

2015, IRJIE - All Rights Reserved

Today's speech recognition systems are based on the general principles of pattern recognition [1][2]. The methods and algorithms that have been used so far can be divided into four large classes:

1. Discriminant analysis methods based on Bayesian discrimination.
2. Hidden Markov Models.
3. Dynamic programming - Dynamic Time Warping (DTW) [4].
4. Neural networks.
Algorithm optimization is therefore necessary to remove undesirable operations as far as possible. The Hidden Markov Model is a popular statistical model used to implement speech recognition technologies [3]. The time variances in spoken language are modeled as Markov processes with discrete states. Each state produces speech observations according to the probability distribution characteristic of that state. The speech observations can take on discrete or continuous values. In either case, a speech observation represents a fixed time duration, that is, a frame. The states are not directly observable, which is why the model is called the hidden Markov model.
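As an illustrative sketch (not from the paper), such a discrete HMM can be simulated by sampling a hidden state sequence and emitting one observation per frame from each state's distribution; the two states and the observation symbols below are made-up toy values.

```python
import random

random.seed(0)

# Hypothetical two-state HMM with discrete per-frame observations.
states = ["S1", "S2"]
trans = {"S1": {"S1": 0.7, "S2": 0.3},   # state-transition probabilities
         "S2": {"S1": 0.4, "S2": 0.6}}
emit = {"S1": {"a": 0.9, "b": 0.1},      # per-state emission distributions
        "S2": {"a": 0.2, "b": 0.8}}

def sample(dist):
    """Draw one outcome from a {outcome: probability} distribution."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point round-off

def generate(n_frames, start="S1"):
    """Generate n_frames observations; the state path stays hidden."""
    state, obs = start, []
    for _ in range(n_frames):
        obs.append(sample(emit[state]))
        state = sample(trans[state])
    return obs

print(generate(5))
```

Only the emitted symbols are returned, mirroring the point above: an observer sees the frames but never the state sequence that produced them.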
This paper is structured as follows. The introduction leads the reader from a general subject area to a particular field of research: it establishes the context and significance of the research by summarizing current understanding and background information, states the purpose of the work in the form of a research problem supported by a hypothesis or a set of questions, briefly explains the methodological approach, highlights the potential outcomes of the study, and outlines the remaining structure of the paper. The literature review describes the overall goal with an integrative summary of other research findings and of the questions that remain unanswered or require additional research. The next sections give a brief insight into speech recognition and the categorization of speech recognition. A comparative analysis is then made based on the study carried out. Conclusions are drawn at the end of the work, discussing the instances in which the findings were made.

2. Literature review
Speech recognition technology has many applications on embedded systems, such as stand-alone devices and single-purpose command-and-control systems. This paper presents a comparative study of the algorithms mainly used for speech recognition. It also emphasizes techniques to prevent overflows of probability scores and to efficiently represent some of the key variability involved in implementing them. The highly complex algorithms have to be optimized to meet the limitations in computing power and memory resources. The optimization, which typically involves
simplification and approximation, inevitably leads to a loss of precision and a degradation of recognition accuracy. This paper compares state-of-the-art algorithms and techniques for speech recognition. By optimizing the speech recognition algorithms, the computation time for both the front end and pattern recognition has been efficiently reduced. On the other hand, the execution time for the back end is proportional to the complexity of the model: the more complex the model, the more execution time and memory are required.
The aim of this paper is to investigate the algorithms of speech recognition. In the past, the kernel of Automatic Speech Recognition (ASR) was Dynamic Time Warping, which is feature-based template matching and belongs to the category of Dynamic Programming (DP) techniques.

Although dynamic time warping is an early ASR technique, it has remained popular in many applications, and it now plays an important role in the well-known Kinect-based gesture recognition application. This paper proposes an intelligent comparison of speech recognition systems using an improved dynamic time warping approach for multimedia and other areas. The improved version of dynamic time warping presented in recent work, called HMM-like dynamic time warping, is essentially a Hidden Markov Model-like method in which concepts from the typical hidden Markov statistical model are brought into the design of dynamic time warping. The developed HMM-like dynamic time warping method, transforming feature-based DTW recognition into model-based DTW recognition, is able to behave like the hidden Markov model recognition technique; therefore, the proposed HMM-like DTW, with its HMM-like recognition model, has the capability to further perform model adaptation (speaker adaptation).

Speaker verification is one of the widely used biometrics, usually offering more secure authentication for user access than regular passwords. This is one of the areas in which speech recognition plays an important role for security. Dynamic Time Warping and the Hidden Markov Model are two well-studied non-linear sequence alignment algorithms. The research trend shifted from dynamic time warping to the hidden Markov model in approximately 1988-1990, since dynamic time warping is deterministic and lacks the power to model stochastic signals. Dynamic time warping has been applied mostly in speech recognition, since it is evident that speech tends to have a varying temporal rate, and alignment is very important for good performance. Standard dynamic time warping is basically an application of deterministic dynamic programming. However, many real signals are stochastic processes, such as speech and video signals. A new algorithm called stochastic dynamic time warping was introduced to address the drawbacks of basic dynamic time warping. In the hidden Markov model, the Viterbi algorithm is used to search for the optimal state transition sequence for a given observation sequence; it turns out to be another application of dynamic programming that cuts down the computation. In a regular Markov model, the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but variables influenced by the state are visible. Each state has a probability distribution over the possible output tokens; therefore the sequence of tokens generated by a hidden Markov model gives some information about the sequence of states.
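To make the Viterbi step concrete, here is a minimal dynamic-programming sketch; the two-state model and its probabilities are illustrative values, not parameters from this paper.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state path for an observation sequence."""
    # V[t][s] = probability of the best path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max((V[t-1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # Backtrack from the most probable final state.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical toy model: two states, two observation symbols.
states = ("S1", "S2")
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"a": 0.9, "b": 0.1}, "S2": {"a": 0.2, "b": 0.8}}
print(viterbi(["a", "b", "b"], states, start_p, trans_p, emit_p))
# → ['S1', 'S2', 'S2']
```

The table V is exactly the dynamic-programming reuse mentioned above: each cell depends only on the previous time step, so the search over all state paths collapses to O(T·N²) work.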

3. Speech Recognition
The schematic diagram depicts the block diagram of speech recognition. The process is as follows: the input is digitized into a sequence of feature vectors. An acoustic-phonetic recognizer transforms the feature vectors into a time-sequenced lattice of phones. A word recognition module transforms the phone lattice into a word lattice with the help of a lexicon. Finally, in the case of continuous or connected word recognition, a grammar is applied to pick the most likely sequence of words from the word lattice. Among the many paradigms of speech recognition systems, the two major paradigms used abundantly are the stochastic approach and the template-based approach.


Figure 1: Block diagram of speech recognition (speech → feature vector lattice → phonetic recognition using acoustic models → phone lattice → word recognition using a lexicon → word lattice → task recognition using a grammar → text)

The other algorithm used efficiently in speech recognition is the Dynamic Time Warping (DTW) algorithm. Dynamic Time Warping calculates an optimal warping path between two time series, yielding both the warping path between the two series and the distance between them. Suppose we have two numerical sequences (a1, a2, ..., an) and (b1, b2, ..., bm); as can be seen, the lengths of the two sequences can differ. The algorithm starts by calculating local distances between the elements of the two sequences, using one of several types of distances. The most frequently used local distance is the absolute difference between the values of the two elements (the Euclidean distance in one dimension). This results in a matrix of distances with n rows and m columns. Starting from the local distance matrix, the minimal distance between the sequences is then determined using dynamic programming.
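The procedure just described can be sketched as a short dynamic program; the local distance is the absolute difference, and the example sequences are made up for illustration.

```python
def dtw_distance(a, b):
    """Minimal accumulated warping distance between numeric sequences a and b."""
    n, m = len(a), len(b)
    INF = float("inf")
    # D[i][j] = minimal accumulated distance aligning a[:i] with b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i-1] - b[j-1])           # local distance
            D[i][j] = local + min(D[i-1][j],       # step in a only
                                  D[i][j-1],       # step in b only
                                  D[i-1][j-1])     # step in both
    return D[n][m]

print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # → 0.0 (the repeat is absorbed by warping)
```

Note how sequences of different lengths align with zero cost when one merely stretches the other, which is exactly the temporal-rate variability DTW is designed to absorb.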

4. Categories of Speech Recognition


The problem of fundamental interest is characterizing such signals in terms of a signal model. A signal model gives us the following: a theoretical description of a signal processing system which can be used to process the signal so as to provide the desired output, and it also helps us understand a great deal about the signal source without having the source available. Signal models can be classified as:

1. Deterministic model: exploits some known property of the signal, such as the amplitude of the wave.
2. Statistical model: takes the statistical properties of the signal into account. Examples of this type of model are the Gaussian model, Poisson model, Markov model and Hidden Markov Model.

The categorization of speech recognition can be based on different aspects of speech, described as follows:

1. Single word recognizer.
The single word recognizer is very easy to construct, and there is no boundary limitation to condition the speech.

2. Continuous word recognizer.
The continuous word recognizer has a complex structure compared to the single word recognizer. As this recognizer is complex in nature, word boundary conditions are followed: the start and end conditions must be specified in detail.

3. Speaker dependent.
This category of speech recognition needs less training, and the recognition of speech is based on a specific speaker.

4. Speaker independent.
A speaker independent recognizer does not depend on a specific speaker. This is the more general recognition category, but it requires large training data.

5. Comparative Analysis
Each algorithm has its own advantages and disadvantages, which makes dynamic time warping good for one type of application and the hidden Markov model better for another. The hidden Markov model works very well if a large amount of training data is available; however, it does not do as well as dynamic time warping when the number of training samples is limited. A core advantage of the hidden Markov model algorithm is that it can work well even if there is high within-gesture variance. Dynamic time warping, on the other hand, needs several templates if there is high within-gesture variance. The major disadvantage of the hidden Markov model algorithm is that it has many "magic numbers" that need to be selected by the user before training; dynamic time warping has very few magic numbers and can therefore be easier to set up.
Table 1. Comparative Analysis

1. Complexity
Hidden Markov Model: The HMM algorithm is complex in comparison to the DTW algorithm; the complexity of the HMM is O(|E| log |E|). In the HMM, each node is traversed exactly once; no repetition is allowed.
Dynamic Time Warping: The complexity of the DTW algorithm is O(n + |E| log n). In DTW there may be a number of paths that traverse the data, and nodes may be traversed repeatedly.

2. Structure
Hidden Markov Model: The structure of the HMM is complex. It uses unicasting, so the number of comparisons in the HMM is smaller; but with unicasting the comparisons are complex, and finding the shortest path at minimum cost is very difficult.
Dynamic Time Warping: The structure of DTW is very easy to understand. DTW uses broadcasting to traverse the data; the number of comparisons is larger, and there may be a number of paths traversing the data.

3. Security
Hidden Markov Model: The HMM is more secure in comparison to DTW because it uses unicasting to transfer the data; only the source and destination know about the data, so the data transmission is secure.
Dynamic Time Warping: In DTW, securing the data is not possible because it uses broadcasting; any node can access the data whether it needs it or not, so it is not secure.

4. Reliability
Hidden Markov Model: The HMM is not reliable, because there is only one source and one destination; if for any reason the system fails, there is no chance to recover the data.
Dynamic Time Warping: DTW is more reliable than the HMM because it uses broadcasting and there are many sources and destinations; if any one node fails, the other systems get the data, and in case of corruption of any system the data can be recovered.

5. Backup
Hidden Markov Model: In the HMM there need be no backup of the data.
Dynamic Time Warping: In DTW, a backup of the data should be kept on other systems.

6. Data Transmission
Hidden Markov Model: Uses unicasting technology for data transmission.
Dynamic Time Warping: Broadcasting technology is used.

7. Economical
Hidden Markov Model: The HMM implementation is economical in comparison to DTW because the software is installed on only a single system for data transmission, so installations are not required in huge numbers.
Dynamic Time Warping: In DTW, the nodes are used for transmissions, so the software and components are installed on all the systems, which is much costlier.

8. Approaches
Hidden Markov Model: The HMM uses a dynamic Bayesian network approach; the core algorithm used in the HMM is the forward-backward algorithm.
Dynamic Time Warping: DTW uses time-series analysis with a stochastic approach, mainly used for measuring similarity between two temporal sequences which may vary in time or speed.

9. Applications
Hidden Markov Model: The application goal of the HMM algorithm is to recover a data sequence that is not immediately observable. Applications include cryptanalysis, speech synthesis and machine translation.
Dynamic Time Warping: The application goal of DTW is to measure an optimal warping path between two time series. Applications include speaker recognition, online signature recognition and partial shape matching.

10. Constraints
Hidden Markov Model: Evaluation; uncovered hidden data.
Dynamic Time Warping: Boundary conditions; continuity.
6. Conclusion
In this research we tried to find the better technique for speech recognition. The fact that the performance of the Hidden Markov Model recognizer is somewhat poorer than that of the Dynamic Time Warping based recognizer appears to be primarily due to the insufficiency of the Hidden Markov Model's training data. The performance of the Hidden Markov Model depends on the number of states of the model; the number of states must be such that they can model the word. The time and space complexity of the Hidden Markov Model approach is less than that of the Dynamic Time Warping approach because the hidden Markov model only needs to compute the probability of each model producing the observed sequence. In some cases, the accuracy of the hidden Markov model is better than that of the dynamic time warping algorithm for speech recognition. Speech recognition using the hidden Markov model gives good results due to the resemblance between the architecture of the hidden Markov model and varying speech data.
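The model-likelihood computation mentioned above (scoring how probable an observed sequence is under each word model) is commonly done with the forward algorithm; a brief sketch with made-up model parameters, not values from this paper:

```python
def forward_likelihood(obs, states, start_p, trans_p, emit_p):
    """P(observation sequence | model), summed over all hidden state paths."""
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans_p[p][s] for p in states) * emit_p[s][o]
                 for s in states}
    return sum(alpha.values())

# Hypothetical two-state model scoring a short observation sequence.
states = ("S1", "S2")
start_p = {"S1": 0.6, "S2": 0.4}
trans_p = {"S1": {"S1": 0.7, "S2": 0.3}, "S2": {"S1": 0.4, "S2": 0.6}}
emit_p = {"S1": {"a": 0.9, "b": 0.1}, "S2": {"a": 0.2, "b": 0.8}}
print(forward_likelihood(["a", "b"], states, start_p, trans_p, emit_p))
```

In an isolated word recognizer, one such model is trained per word and the word whose model yields the highest likelihood wins; for long sequences, practical implementations scale or work in log space to prevent the probability underflow discussed in the literature review.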

The neural network is another method, which uses the gradient descent method with the backpropagation algorithm, while in the hidden Markov model the recognition ability is good for unknown words. The hidden Markov model is a generic concept and is used in many areas of research. The dynamic time warping algorithm is very useful for isolated word recognition with a limited dictionary. For fluent speech recognition, Hidden Markov Model chains are used. The main difficulty in using dynamic time warping is that it may not be satisfactory for a large dictionary, which would be needed to ensure an increase in the success rate of the recognition process. These models provide a flexible but rigorous stochastic framework in which to build our systems. However, hidden Markov models do not model certain aspects of speech, such as suprasegmental (long-span) phenomena, well.

REFERENCES
[1] F. Jelinek, "Continuous Speech Recognition by Statistical Methods," Proceedings of the IEEE, vol. 64, no. 4, 1976, pp. 532-556.
[2] S. Young, "A Review of Large-Vocabulary Continuous Speech Recognition," IEEE Signal Processing Magazine, pp. 45-57, Sep. 1996.
[3] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition," IEEE Trans. Acoustics, Speech, and Signal Processing, vol. ASSP-26, 1978.
[4] D. Raj Reddy, "Speech Recognition by Machine: A Review," Proceedings of the IEEE, vol. 64, no. 4, April 1976, pp. 501-531.
[5] J. Santos and J. Nombela, "Text-to-Speech Conversion in Spanish: A Complete Rule-Based Synthesis System," Proc. IEEE ICASSP '82, Madrid, Spain, 1982.
[6] A. Kain and M. W. Macon, "Spectral Voice Conversion for Text-to-Speech Synthesis," Proc. IEEE ICASSP '98, 1998.

