© Telkom University Publications. http://www.telkomuniversity.ac.id
Figure: MFCC feature extraction pipeline (Pre-Emphasis, Frame Blocking, Windowing (Hamming), FFT, Mel Warping, Log Filter Bank, DCT, Cepstral Coefficients) feeding DHMM modelling.
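The feature extraction stages named above (pre-emphasis, frame blocking, Hamming windowing, FFT, mel-warped log filter bank, DCT) can be sketched as follows. This is an illustrative outline, not the paper's MATLAB implementation; the sample rate, frame length, hop size, and filter counts are assumed typical values.

```python
import numpy as np

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """Sketch of the MFCC pipeline: pre-emphasis -> frame blocking ->
    Hamming windowing -> FFT -> mel filter bank -> log -> DCT."""
    # 1. Pre-emphasis: boost high frequencies, y[n] = x[n] - 0.95*x[n-1]
    emphasized = np.append(signal[0], signal[1:] - 0.95 * signal[:-1])
    # 2. Frame blocking: split into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i*hop : i*hop+frame_len] for i in range(n_frames)])
    # 3. Hamming window to reduce spectral leakage at frame edges
    frames = frames * np.hamming(frame_len)
    # 4. Magnitude spectrum via FFT
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # 5. Mel-warped triangular filter bank
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for j in range(1, n_filters + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        for k in range(l, c):
            fbank[j - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[j - 1, k] = (r - k) / max(r - c, 1)
    energies = np.maximum(spectrum @ fbank.T, 1e-10)
    # 6. Log compression of the filter bank energies
    log_e = np.log(energies)
    # 7. DCT (type II) decorrelates the log energies -> cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return log_e @ dct.T

# toy usage: one second of a synthetic 440 Hz tone
sig = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
feats = mfcc(sig)
```

Each row of `feats` is the cepstral coefficient vector of one frame; these per-frame vectors are the observations later modelled by the (D)HMM.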
In the 1960s, several fundamental ideas in speech recognition surfaced and were published. Since computers were still not fast enough, several special-purpose hardware systems were built [24]. The decade started with several Japanese laboratories entering the recognition arena and building special-purpose hardware as part of their systems. One early Japanese system, described by Suzuki and Nakata of the Radio Research Lab in Tokyo, was a hardware vowel recognizer [25]. An elaborate filter bank spectrum analyzer was used along with logic that connected the outputs of each channel of the spectrum analyzer (in a weighted manner) to a vowel decision circuit, and a majority decision logic scheme was used to choose the spoken vowel. Another hardware effort in Japan was the work of Sakai and Doshita of Kyoto University in 1962, who built a hardware phoneme recognizer [25]. A hardware speech segmenter was used along with a zero-crossing analysis of different regions of the spoken input to provide the recognition output. A third Japanese effort was the digit recognizer hardware of Nagata and coworkers at NEC Laboratories in 1963 [26].

3.1.3. 1970-1980:
In the 1970s speech recognition research achieved a number of significant milestones. First, the area of isolated word or discrete utterance recognition became a viable and usable technology based on fundamental studies by Velichko and Zagoruyko in Russia [27], Sakoe and Chiba in Japan [28], and Itakura in the United States. The Russian studies helped advance the use of pattern recognition ideas in speech recognition; the Japanese research showed how dynamic programming methods could be successfully applied; and Itakura's research showed how the ideas of linear predictive coding (LPC), which had already been successfully used in low-bit-rate speech coding, could be extended to speech recognition systems through the use of an appropriate distance measure based on LPC spectral parameters [29]. Another milestone of the 1970s was the beginning of a longstanding, highly successful group effort in large vocabulary speech recognition at IBM, in which researchers studied three distinct tasks over a period of almost two decades, namely the New Raleigh language [30] for simple database queries, the laser patent text language [31] for transcribing laser patents, and the office correspondence task called Tangora [32].

3.1.4. 1980-1990:
Just as isolated word recognition was a key focus of research in the 1970s, the problem of connected word recognition was a focus of research in the 1980s. Here the goal was to create a robust system capable of recognizing a fluently spoken string of words (e.g., digits) based on matching a concatenated pattern of individual words. In the early 1980s, Moshey J. Lasry developed a feature-based speech recognition system, in which he studied speech spectrograms of letters and digits [33]. A wide variety of algorithms based on matching a concatenated pattern of individual words were formulated and implemented, including the two-level dynamic programming approach of Sakoe at Nippon Electric Corporation (NEC) [34], the one-pass method of Bridle and Brown at the Joint Speech Research Unit (JSRU) in the UK [35], the level building approach of Myers and Rabiner at Bell Labs [36], and the frame synchronous level building approach of Lee and Rabiner at Bell Labs [37]. Most practical speech recognition systems are based on the statistical framework developed in the 1980s, with significant additional improvements made in the 1990s, such as the HMM, neural nets, and the DARPA programs:

a) Hidden Markov Model (HMM):
One of the key technologies developed in the 1980s is the hidden Markov model (HMM) approach [38]. It is a doubly stochastic process with an underlying stochastic process that is not observable (hence the term hidden), but which can be observed through another stochastic process that produces a sequence of observations. Although the HMM was well known and understood in a few laboratories (primarily IBM, the Institute for Defense Analysis (IDA), and Dragon Systems), it was not until widespread publication of the methods and theory of HMMs in the mid-1980s that the technique became widely applied in virtually every speech recognition research laboratory in the world. In the early 1970s, Lenny Baum of Princeton University invented a mathematical approach to recognizing speech called hidden Markov modeling. The HMM pattern-matching strategy was eventually adopted by each of the major companies pursuing the commercialization of speech recognition technology (SRT). The U.S. Department of Defense sponsored many practical research projects during the 1970s that involved several contractors, including IBM, Dragon, AT&T, Philips, and others. Progress was slow in those early years.

b) Neural Net:
Another new technology that was reintroduced in the late 1980s was the idea of applying neural networks to problems in speech recognition. Neural networks were first introduced in the 1950s, but they did not prove useful initially because of many practical problems. In the 1980s, however, a deeper understanding of the strengths and limitations of the technology was achieved, as well as an understanding of how the technology relates to classical signal classification methods. Several new ways of implementing systems were also proposed [39].

c) DARPA Program:
Finally, the 1980s was a decade in which a major impetus was given to large vocabulary, continuous speech recognition systems by the Defense Advanced Research Projects Agency (DARPA) community, which sponsored a large research program aimed at achieving high word accuracy for a 1000-word continuous speech recognition, database management task.

3.1.5. 2000-2009:
A) General:
International Journal of Faculty Electrical Engineering, ISSN 0001-0002, Volume 1, Number 1 (2017)
Around 2000, variational Bayesian (VB) estimation and pooling techniques were developed [40]. Unlike the maximum likelihood approach, the VB approach is based on a posterior distribution of the parameters. Giuseppe Riccardi [41] developed techniques to solve adaptive learning problems in automatic speech recognition and even proposed an active learning algorithm for ASR. In 2005, several enhancements were made to improve the performance of large vocabulary continuous speech recognition systems. In 2007, the difference in acoustic features between spontaneous and read speech was analyzed using the large-scale Corpus of Spontaneous Japanese (CSJ). Sadaoki Furui studied speech recognition methods that can adapt to word variations using a large number of trained models based on pooling techniques. In 2008, the authors explored the application of conditional random fields (CRFs) to combine local posterior estimates provided by multilayer perceptrons based on frame-level predictions of phone and phonologically linked classes. De Wachter et al. tried to cope with the time-dependence problems of speech recognition using a template matching method, and Xinwei Li et al. proposed a new optimization method, semidefinite programming (SDP), for solving the large margin estimation (LME) problem of continuous-density HMMs (CDHMMs) in speech recognition. Discriminative training of acoustic models for speech recognition under maximum mutual information (MMI) was also offered. Around 2007, Rajesh M. Hegde et al. proposed an alternative method for processing the Fourier transform for speech feature extraction, the group delay function (GDF), which can be computed directly from the speech signal.

3.2. The Process of Classification of Voice Recognition
Speech recognition is a process by which the computer can identify spoken words. The identification process is required for the system to recognize a voice input so that it can be utilized. Voice recognition results can be used to perform various tasks such as controlling a machine, accessing a database, and producing text [16].

Figure 8: HMM Model left-right

In its calculations, the HMM uses several notations, including:

1. N represents the number of hidden states (states 1, 2, ..., N) in the HMM. All the states in the model are connected to each other.

2. M is the number of distinct observation symbols per state.

3. T is the length of the observation sequence.

4. π represents the initial state distribution, with π_i = P(q_1 = i), i.e., the probability that the model starts in state i. In this case, Σ_{i=1}^{N} π_i = 1.
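The HMM notation above (N states, M symbols, a length-T observation sequence, and initial distribution π) can be illustrated with a minimal sketch. The transition matrix A and emission matrix B complete the standard notation of Rabiner's tutorial [16]; the probability values and the left-right topology below are illustrative only, and the forward algorithm shown is one standard way to score an observation sequence against a model.

```python
import numpy as np

# Toy discrete HMM in the notation above. N = 2 states, M = 3 symbols.
# (All probability values here are made up for illustration.)
A = np.array([[0.7, 0.3],           # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.0, 1.0]])          # left-right: no transition back to state 1
B = np.array([[0.5, 0.4, 0.1],      # B[i, k] = P(o_t = k | q_t = i)
              [0.1, 0.3, 0.6]])
pi = np.array([1.0, 0.0])           # pi_i = P(q_1 = i); sums to 1

def forward(obs):
    """Forward algorithm: P(O | model), summing over all hidden state paths."""
    alpha = pi * B[:, obs[0]]            # initialization at t = 1
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction: propagate then emit
    return alpha.sum()                   # termination: total likelihood

obs = [0, 1, 2]                          # an observation sequence, T = 3
p = forward(obs)
```

In a word recognizer, one such model is trained per word, and the word whose model gives the highest `forward` likelihood for the test observations is selected.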
Notes:
1. Microphone: voice input device for the speaker/user; the signal is forwarded through an interconnect from the microphone to the laptop's line in.
2. Interconnect from laptop line out to DTMF decoder: the conductor that carries the DTMF signal from the laptop's line out to the DTMF decoder block.
3. DTMF decoder block: translates the DTMF signal received from the laptop into a 4-bit binary number.
4. ATmega16 minimum system: processes the 4-bit data from the DTMF decoder, matches it to one of the commands in the database, and displays the result on the LCD display.
5. LCD display: device to display the results of the commands determined by the ATmega16.
6. AC-to-DC 5 V adapter: device that produces a 5 V output from the 220 V AC input as the system power supply.

Figure 9: Block Diagram System
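Steps 3 and 4 of the notes above can be illustrated in miniature: the decoder emits a 4-bit code per DTMF key, and the microcontroller maps that code to a command. The 4-bit codes below follow the common MT8870-style decoder convention, and the command entries are hypothetical examples, not the actual command database of this system.

```python
# DTMF key -> 4-bit decoder output (MT8870-style convention, assumed here)
DTMF_TO_NIBBLE = {
    '1': 0b0001, '2': 0b0010, '3': 0b0011,
    '4': 0b0100, '5': 0b0101, '6': 0b0110,
    '7': 0b0111, '8': 0b1000, '9': 0b1001,
    '0': 0b1010, '*': 0b1011, '#': 0b1100,
}

# Hypothetical command database, standing in for the ATmega16's lookup
COMMANDS = {0b0001: "LED ON", 0b0010: "LED OFF"}

def decode(key):
    """Return the 4-bit code and the command text shown on the LCD."""
    code = DTMF_TO_NIBBLE[key]
    return code, COMMANDS.get(code, "UNKNOWN")

code, cmd = decode('1')
```

The microcontroller firmware would read the four decoder output pins as this nibble and index into its stored command table the same way.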
7. Conclusion:
Speech is the primary and most convenient means of communication between people. Whether due to the technological curiosity to build machines that mimic humans or the desire to automate work with machines, research in speech and speaker recognition, as a first step toward natural human-machine communication, has attracted much enthusiasm over the past five decades. We have also encountered a number of practical limitations which hinder the widespread deployment of applications and services. In most speech recognition tasks, human subjects produce one to two orders of magnitude fewer errors than machines. Based on the review, implementation, and testing of the speech recognition system using HMM methods, several things can be concluded, among others: the speech signal can be processed, trained, and compared with the feature vectors obtained by processing the speech.

6.2. Software Design
Implementation of the feature extraction and classification methods is done using MATLAB. The system is divided into two processes, namely the training process and the test process. The training process aims to generate a database of HMM parameters from the MFCC characteristic coefficients to be used in the test process. The test process works by using the HMM parameters that have been generated by the training process to determine the word model that best matches the MFCC characteristic coefficients of the test data.

6.3. Test
In this option there are two test methods: realtime and non-realtime. The non-realtime test evaluates previously recorded test data, while the realtime test evaluates data recorded at test time.
6.4. Hardware Design
The hardware designed in this research is a microcontroller system consisting of an ATmega16 minimum system, an LCD display circuit, and a DTMF (Dual-Tone Multi-Frequency) decoder circuit, with the power source using an AC-to-DC 5 V adapter. Below is a picture of the device that has been created.

References:
[1] Adhi, Agita Prasetyo, "Design & Implementation of MFCC on Robot Using Voice Based on Microcontroller", Institut Teknologi Telkom, 2012.
[2] Yuan Mang, "Speech Recognition on DSP: Algorithm Optimization & Performance Analysis", The Chinese University of Hong Kong, 2004.
[3] D. Huggins-Daines, M. Kumar, A. Chan, A. Black, M. Ravishankar, and A. Rudnicky, "Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System", 2006.
[4] Rumia Sultana and Rajesh Palit, "A Survey on Bengali Speech-To-Text Recognition Techniques", The 9th
International Forum on Strategic Technology, Cox's Bazar, Bangladesh, 2014.
[5] S. Suganya and C. Sheeba Joice, "Speech Recognition Using Discrete Hidden Markov Model", Department of ECE, Saveetha Engineering College, Chennai, India, 2015.
[6] S. Suganya and G. Premaletha, "Speech to Text Conversion Using Discrete Hidden Markov Model", CK College of Engineering & Technology, Tamil Nadu, India, 2016.
[7] Joshi, Siddhant C. and A. N. Cheeran, "MATLAB based Feature Extraction using Mel Frequency Cepstrum Coefficients for Automatic Speech Recognition", IJESTR, 3(6), 2014.
[8] Mikael Nilsson and Marcus Ejnarsson, "Speech Recognition using Hidden Markov Model: Performance Evaluation in Noisy Environment", Department of Telecommunications and Signal Processing, Blekinge Institute of Technology, 2002.
[9] Zayana Dwistha Baswara, "Design and Implementation of Speech Recognition for Smart House Applications Using Hidden Markov Model", Universitas Telkom, Indonesia, 2014.
[10] Satya Dharanipragada et al., "Gaussian Mixture Models with Covariances or Precisions in Shared Multiple Subspaces", IEEE Transactions on Audio, Speech and Language Processing, 14(4), July 2006.
[11] Joshi, Siddhant C., "MATLAB Based Feature Extraction Using Mel Frequency", IJSTR, 2014.
[12] Ben J. Shannon and Kuldip K. Paliwal, "A Comparative Study of Filter Bank Spacing for Speech Recognition", 2003.
[13] Yuan Mang, "Speech Recognition on DSP: An Algorithm on Optimization & Performance Analysis", The Chinese University of Hong Kong, pp. 1-18, 2004.
[14] Nitin Washani, "Speech Recognition System: A Review", M.Tech Scholar, DIT University, 2015.
[15] N. Srivastava, "Speech Recognition using Artificial Neural Network", IJESIT, Volume 3, Issue 3, May 2014.
[16] Rabiner, Lawrence R., "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", IEEE ASSP Magazine, 1989.
[17] Pan Shing-Tai and Xu-Yu Li, "An FPGA-Based Embedded Robust Speech Recognition System Designed by Combining Empirical Mode Decomposition and a Genetic Algorithm", IEEE Transactions on Instrumentation and Measurement, 61(9), 2012.
[18] T. Ma, Y.-D. Kim, Q. Ma, M. Tang, and W. Zhou, "Wireless and Mobile Computing, Networking and Communications", IEEE International Conference, 2005.
[19] A. J. Carlos and M. Paul, "Computer Science and Information Systems", ch. Ambient Intelligence: Concepts and Applications, 2007.
[20] Sadaoki Furui, "50 Years of Progress in Speech and Speaker Recognition Research", ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[21] K. H. Davis, R. Biddulph, and S. Balashek, "Automatic Recognition of Spoken Digits", J. Acoust. Soc. Am., 24(6):637-642, 1952.
[22] H. F. Olson and H. Belar, "Phonetic Typewriter", J. Acoust. Soc. Am., 28(6):1072-1081, 1956.
[23] D. B. Fry, "Theoretical Aspects of Mechanical Speech Recognition", and P. Denes, "The Design and Operation of the Mechanical Speech Recognizer at University College London", J. British Inst. Radio Engr., 19:4, 211-299, 1959.
[24] J. Suzuki and K. Nakata, "Recognition of Japanese Vowels Preliminary to the Recognition of Speech", J. Radio Res. Lab, 37(8):193-212, 1961.
[25] T. Sakai and S. Doshita, "The Phonetic Typewriter, Information Processing 1962", Proc. IFIP Congress, 1962.
[26] K. Nagata, Y. Kato, and S. Chiba, "Spoken Digit Recognizer for Japanese Language", NEC Res. Develop., No. 6, 1963.
[27] D. R. Reddy, "An Approach to Computer Speech Recognition by Direct Analysis of the Speech Wave", Tech. Report No. C549, Computer Science Dept., Stanford Univ., September 1966.
[28] V. M. Velichko and N. G. Zagoruyko, "Automatic Recognition of 200 Words", Int. J. Man-Machine Studies, 2:223, June 1970.
[29] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-26(1):43-49, February 1978.
[30] F. Itakura, "Minimum Prediction Residual Applied to Speech Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-23(1):67-72, February 1975.
[31] C. C. Tappert, N. R. Dixon, A. S. Rabinowitz, and W. D. Chapman, "Automatic Recognition of Continuous Speech Utilizing Dynamic Segmentation, Dual Classification, Sequential Decoding and Error Recovery", Rome Air Dev. Cen, Rome, NY, Tech. Report TR-71-146, 1971.
[32] F. Jelinek, L. R. Bahl, and R. L. Mercer, "Design of a Linguistic Statistical Decoder for the Recognition of Continuous Speech", IEEE Trans. Information Theory, IT-21:250-256, 1975.
[33] R. K. Moore, "Twenty Things We Still Don't Know About Speech", Proc. CRIM/FORWISS Workshop on Progress and Prospects of Speech Research and Technology, 1994.
[34] H. Sakoe, "Two Level DP Matching: A Dynamic Programming Based Pattern Matching Algorithm for Connected Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-27:588-595, December 1979.
[35] J. S. Bridle and M. D. Brown, "Connected Word Recognition Using Whole Word Templates", Proc. Inst. Acoust. Autumn Conf., 25-28, November 1979.
[36] C. S. Myers and L. R. Rabiner, "A Level Building Dynamic Time Warping Algorithm for Connected Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-29:284-297, April 1981.
[37] C. H. Lee and L. R. Rabiner, "A Frame Synchronous Network Search Algorithm for Connected Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., 37(11):1649-1658, November 1989.
[38] J. Ferguson, Ed., "Hidden Markov Models for Speech", IDA, Princeton, NJ, 1980.