© Telkom University Publications. http://www.telkomuniversity.ac.id
Figure: MFCC feature extraction pipeline (Pre-Emphasis, Frame Blocking, Windowing (Hamming), FFT, Mel Warping, Log Filter Bank, DCT, Cepstral Coefficients) feeding DHMM modelling.
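The feature extraction stages named above (pre-emphasis, frame blocking, Hamming windowing, FFT, mel-warped log filter bank, DCT) can be sketched as follows. This is an illustrative outline, not the paper's MATLAB implementation; the sample rate, frame length, hop size, and filter counts are assumed typical values.

```python
import numpy as np

def mfcc(signal, sr=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """Sketch of the MFCC pipeline: pre-emphasis -> frame blocking ->
    Hamming windowing -> FFT -> mel filter bank -> log -> DCT."""
    # 1. Pre-emphasis: boost high frequencies, y[n] = x[n] - 0.95*x[n-1]
    emphasized = np.append(signal[0], signal[1:] - 0.95 * signal[:-1])
    # 2. Frame blocking: split into overlapping frames
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    frames = np.stack([emphasized[i*hop : i*hop+frame_len] for i in range(n_frames)])
    # 3. Hamming window to reduce spectral leakage at frame edges
    frames = frames * np.hamming(frame_len)
    # 4. Magnitude spectrum via FFT
    spectrum = np.abs(np.fft.rfft(frames, axis=1))
    # 5. Mel-warped triangular filter bank
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((frame_len + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, frame_len // 2 + 1))
    for j in range(1, n_filters + 1):
        l, c, r = bins[j - 1], bins[j], bins[j + 1]
        for k in range(l, c):
            fbank[j - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[j - 1, k] = (r - k) / max(r - c, 1)
    energies = np.maximum(spectrum @ fbank.T, 1e-10)
    # 6. Log compression of the filter bank energies
    log_e = np.log(energies)
    # 7. DCT (type II) decorrelates the log energies -> cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return log_e @ dct.T

# toy usage: one second of a synthetic 440 Hz tone
sig = np.sin(2 * np.pi * 440 * np.arange(8000) / 8000)
feats = mfcc(sig)
```

Each row of `feats` is the cepstral coefficient vector of one frame; these per-frame vectors are the observations later modelled by the (D)HMM.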
In the 1960s, several fundamental ideas in speech recognition surfaced and were published. Since computers were still not fast enough, several special-purpose hardware systems were built [24]. The decade started with several Japanese laboratories entering the recognition arena and building special-purpose hardware as part of their systems. One early Japanese system, described by Suzuki and Nakata of the Radio Research Lab in Tokyo, was a hardware vowel recognizer [25]. An elaborate filter bank spectrum analyzer was used along with logic that connected the outputs of each channel of the spectrum analyzer (in a weighted manner) to a vowel decision circuit, and a majority decision logic scheme was used to choose the spoken vowel. Another hardware effort in Japan was the work of Sakai and Doshita of Kyoto University in 1962, who built a hardware phoneme recognizer [25]. A hardware speech segmenter was used along with a zero-crossing analysis of different regions of the spoken input to provide the recognition output. A third Japanese effort was the digit recognizer hardware of Nagata and coworkers at NEC Laboratories in 1963 [26].

3.1.3. 1970-1980:
In the 1970s speech recognition research achieved a number of significant milestones. First, the area of isolated word or discrete utterance recognition became a viable and usable technology based on fundamental studies by Velichko and Zagoruyko in Russia [27], Sakoe and Chiba in Japan [28], and Itakura in the United States. The Russian studies helped advance the use of pattern recognition ideas in speech recognition; the Japanese research showed how dynamic programming methods could be successfully applied; and Itakura's research showed how the ideas of linear predictive coding (LPC), which had already been successfully used in low-bit-rate speech coding, could be extended to speech recognition systems through the use of an appropriate distance measure based on LPC spectral parameters [29]. Another milestone of the 1970s was the beginning of a longstanding, highly successful group effort in large vocabulary speech recognition at IBM, in which researchers studied three distinct tasks over a period of almost two decades, namely the New Raleigh language [30] for simple database queries, the laser patent text language [31] for transcribing laser patents, and the office correspondence task called Tangora [32].

3.1.4. 1980-1990:
Just as isolated word recognition was a key focus of research in the 1970s, the problem of connected word recognition was a focus of research in the 1980s. Here the goal was to create a robust system capable of recognizing a fluently spoken string of words (e.g., digits) based on matching a concatenated pattern of individual words. In the early 1980s, Moshey J. Lasry developed a feature-based speech recognition system, in which he studied speech spectrograms of letters and digits [33]. A wide variety of algorithms based on matching a concatenated pattern of individual words were formulated and implemented, including the two-level dynamic programming approach of Sakoe at Nippon Electric Corporation (NEC) [34], the one-pass method of Bridle and Brown at the Joint Speech Research Unit (JSRU) in the UK [35], the level building approach of Myers and Rabiner at Bell Labs [36], and the frame synchronous level building approach of Lee and Rabiner at Bell Labs [37]. Most practical speech recognition systems are based on the statistical framework developed in the 1980s, with significant additional improvements made in the 1990s, such as the HMM, neural nets, and the DARPA programs:

a) Hidden Markov Model (HMM):
One of the key technologies developed in the 1980s is the hidden Markov model (HMM) approach [38]. It is a doubly stochastic process with an underlying stochastic process that is not observable (hence the term hidden), but which can be observed through another stochastic process that produces a sequence of observations. Although the HMM was well known and understood in a few laboratories (primarily IBM, the Institute for Defense Analysis (IDA), and Dragon Systems), it was not until widespread publication of the methods and theory of HMMs in the mid-1980s that the technique became widely applied in virtually every speech recognition research laboratory in the world. In the early 1970s, Lenny Baum of Princeton University invented a mathematical approach to recognizing speech called hidden Markov modeling. The HMM pattern-matching strategy was eventually adopted by each of the major companies pursuing the commercialization of speech recognition technology (SRT). The U.S. Department of Defense sponsored many practical research projects during the 1970s that involved several contractors, including IBM, Dragon, AT&T, Philips, and others. Progress was slow in those early years.

b) Neural Net:
Another new technology that was reintroduced in the late 1980s was the idea of applying neural networks to problems in speech recognition. Neural networks were first introduced in the 1950s, but they did not prove useful initially because of many practical problems. In the 1980s, however, a deeper understanding of the strengths and limitations of the technology was achieved, as well as an understanding of how the technology relates to classical signal classification methods. Several new ways of implementing systems were also proposed [39].

c) DARPA Program:
Finally, the 1980s was a decade in which a major impetus was given to large vocabulary, continuous speech recognition systems by the Defense Advanced Research Projects Agency (DARPA) community, which sponsored a large research program aimed at achieving high word accuracy for a 1000-word continuous speech recognition, database management task.

3.1.5. 2000-2009:
A) General:
International Journal of Faculty Electrical Engineering, ISSN 0001-0002, Volume 1, Number 1 (2017)
Around 2000, variational Bayesian (VB) estimation and pooling techniques were developed [40]. Unlike the maximum likelihood approach, the VB approach is based on a posterior distribution of the parameters. Giuseppe Riccardi [41] developed techniques to solve adaptive learning problems in automatic speech recognition and even proposed an active learning algorithm for ASR. In 2005, several enhancements were made to improve the performance of large vocabulary continuous speech recognition systems. In 2007, the difference in acoustic features between spontaneous and read speech was analyzed using the large-scale Corpus of Spontaneous Japanese (CSJ). Sadaoki Furui studied speech recognition methods that can adapt to word variations using a large number of trained models based on pooling techniques. In 2008, the authors explored the application of conditional random fields (CRFs) to combine local posterior estimates provided by multilayer perceptrons based on frame-level predictions of phone and phonologically linked classes. De Wachter et al. tried to cope with the time-dependence problems of speech recognition using a template matching method, and Xinwei Li et al. proposed a new optimization method, semidefinite programming (SDP), for solving the large margin estimation (LME) problem of continuous-density HMMs (CDHMMs) in speech recognition. Discriminative training of acoustic models for speech recognition under maximum mutual information (MMI) was also offered. Around 2007, Rajesh M. Hegde et al. proposed an alternative method for processing the Fourier transform for speech feature extraction, the group delay function (GDF), which can be computed directly from the speech signal.

3.2. The Process of Classification of Voice Recognition
Speech recognition is a process by which the computer can identify spoken words. The identification process is required for the system to recognize a voice input so that it can be utilized. Voice recognition results can be used to perform various tasks such as controlling a machine, accessing a database, and producing text [16].

Figure 8: HMM Model left-right

In its calculations, the HMM uses several notations, including:

1. N represents the number of hidden states (states 1, 2, ..., N) in the HMM. All the states in the model are connected to each other.

2. M is the number of distinct observation symbols per state.

3. T is the length of the observation sequence.

4. π represents the initial state distribution, with π_i = P(q_1 = i), i.e., the probability that the model starts in state i. In this case, Σ_{i=1}^{N} π_i = 1.
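The HMM notation above (N states, M symbols, a length-T observation sequence, and initial distribution π) can be illustrated with a minimal sketch. The transition matrix A and emission matrix B complete the standard notation of Rabiner's tutorial [16]; the probability values and the left-right topology below are illustrative only, and the forward algorithm shown is one standard way to score an observation sequence against a model.

```python
import numpy as np

# Toy discrete HMM in the notation above. N = 2 states, M = 3 symbols.
# (All probability values here are made up for illustration.)
A = np.array([[0.7, 0.3],           # A[i, j] = P(q_{t+1} = j | q_t = i)
              [0.0, 1.0]])          # left-right: no transition back to state 1
B = np.array([[0.5, 0.4, 0.1],      # B[i, k] = P(o_t = k | q_t = i)
              [0.1, 0.3, 0.6]])
pi = np.array([1.0, 0.0])           # pi_i = P(q_1 = i); sums to 1

def forward(obs):
    """Forward algorithm: P(O | model), summing over all hidden state paths."""
    alpha = pi * B[:, obs[0]]            # initialization at t = 1
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]    # induction: propagate then emit
    return alpha.sum()                   # termination: total likelihood

obs = [0, 1, 2]                          # an observation sequence, T = 3
p = forward(obs)
```

In a word recognizer, one such model is trained per word, and the word whose model gives the highest `forward` likelihood for the test observations is selected.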
Notes:
1. Microphone: voice input device for the speaker/user; the signal is forwarded through an interconnect from the microphone to the laptop's line in.
2. Interconnect from laptop line out to DTMF decoder: the conductor that carries the DTMF signal from the laptop's line out to the DTMF decoder block.
3. DTMF decoder block: translates the DTMF signal received from the laptop into a 4-bit binary number.
4. ATmega16 minimum system: processes the 4-bit data from the DTMF decoder, matches it to one of the commands in the database, and displays the result on the LCD display.
5. LCD display: device to display the results of the commands determined by the ATmega16.
6. AC-to-DC 5 V adapter: device that produces a 5 V output from the 220 V AC input as the system power supply.

Figure 9: Block Diagram System
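Steps 3 and 4 of the notes above can be illustrated in miniature: the decoder emits a 4-bit code per DTMF key, and the microcontroller maps that code to a command. The 4-bit codes below follow the common MT8870-style decoder convention, and the command entries are hypothetical examples, not the actual command database of this system.

```python
# DTMF key -> 4-bit decoder output (MT8870-style convention, assumed here)
DTMF_TO_NIBBLE = {
    '1': 0b0001, '2': 0b0010, '3': 0b0011,
    '4': 0b0100, '5': 0b0101, '6': 0b0110,
    '7': 0b0111, '8': 0b1000, '9': 0b1001,
    '0': 0b1010, '*': 0b1011, '#': 0b1100,
}

# Hypothetical command database, standing in for the ATmega16's lookup
COMMANDS = {0b0001: "LED ON", 0b0010: "LED OFF"}

def decode(key):
    """Return the 4-bit code and the command text shown on the LCD."""
    code = DTMF_TO_NIBBLE[key]
    return code, COMMANDS.get(code, "UNKNOWN")

code, cmd = decode('1')
```

The microcontroller firmware would read the four decoder output pins as this nibble and index into its stored command table the same way.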
7. Conclusion:
Speech is the primary and most convenient means of communication between people. Whether due to the technological curiosity to build machines that mimic humans or the desire to automate work with machines, research in speech and speaker recognition, as a first step toward natural human-machine communication, has attracted much enthusiasm over the past five decades. We have also encountered a number of practical limitations which hinder the widespread deployment of applications and services. In most speech recognition tasks, human subjects produce one to two orders of magnitude fewer errors than machines. Based on the review, implementation, and testing of the speech recognition system using HMM methods, several things can be concluded, among others: the speech signal can be processed, trained, and compared with the feature vectors obtained by processing the speech.

6.2. Software Design
Implementation of the feature extraction and classification methods is done using MATLAB. The system is divided into two processes, namely the training process and the test process. The training process aims to generate a database of HMM parameters from the MFCC characteristic coefficients to be used in the test process. The test process works by using the HMM parameters that have been generated by the training process to determine the word model that best matches the MFCC characteristic coefficients of the test data.

6.3. Test
In this option there are two test methods: realtime and non-realtime. The non-realtime test evaluates previously recorded test data, while the realtime test evaluates data recorded at test time.
6.4. Hardware Design
The hardware designed in this research is a microcontroller system consisting of an ATmega16 minimum system, an LCD display circuit, and a DTMF (Dual-Tone Multi-Frequency) decoder circuit, with the power source using an AC-to-DC 5 V adapter. Below is a picture of the device that has been created.

References:
[1] Adhi, Agita Prasetyo, "Design & Implementation of MFCC on Robot Using Voice Based on Microcontroller", Institut Teknologi Telkom, 2012.
[2] Yuan Mang, "Speech Recognition on DSP: Algorithm Optimization & Performance Analysis", The Chinese University of Hong Kong, 2004.
[3] D. Huggins-Daines, M. Kumar, A. Chan, A. Black, M. Ravishankar, and A. Rudnicky, "Pocketsphinx: A Free, Real-Time Continuous Speech Recognition System", 2006.
[4] Rumia Sultana and Rajesh Palit, "A Survey on Bengali Speech-To-Text Recognition Techniques", The 9th
International Forum on Strategic Technology, Cox's Bazar, Bangladesh, 2014.
[5] S. Suganya and C. Sheeba Joice, "Speech Recognition Using Discrete Hidden Markov Model", Department of ECE, Saveetha Engineering College, Chennai, India, 2015.
[6] S. Suganya and G. Premaletha, "Speech to Text Conversion Using Discrete Hidden Markov Model", CK College of Engineering & Technology, Tamil Nadu, India, 2016.
[7] Joshi, Siddhant C. and A. N. Cheeran, "MATLAB based Feature Extraction using Mel Frequency Cepstrum Coefficients for Automatic Speech Recognition", IJESTR, 3(6), 2014.
[8] Mikael Nilsson and Marcus Ejnarsson, "Speech Recognition using Hidden Markov Model: Performance Evaluation in Noisy Environment", Department of Telecommunications and Signal Processing, Blekinge Institute of Technology, 2002.
[9] Zayana Dwistha Baswara, "Design and Implementation of Speech Recognition for Smart House Applications Using Hidden Markov Model", Universitas Telkom, Indonesia, 2014.
[10] Satya Dharanipragada et al., "Gaussian Mixture Models with Covariances or Precisions in Shared Multiple Subspaces", IEEE Transactions on Audio, Speech and Language Processing, 14(4), July 2006.
[11] Joshi, Siddhant C., "MATLAB Based Feature Extraction Using Mel Frequency", IJSTR, 2014.
[12] Ben J. Shannon and Kuldip K. Paliwal, "A Comparative Study of Filter Bank Spacing for Speech Recognition", 2003.
[13] Yuan Mang, "Speech Recognition on DSP: An Algorithm on Optimization & Performance Analysis", The Chinese University of Hong Kong, pp. 1-18, 2004.
[14] Nitin Washani, "Speech Recognition System: A Review", M.Tech Scholar, DIT University, 2015.
[15] N. Srivastava, "Speech Recognition using Artificial Neural Network", IJESIT, Volume 3, Issue 3, May 2014.
[16] Rabiner, Lawrence R., "A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition", IEEE ASSP Magazine, 1989.
[17] Pan Shing-Tai and Xu-Yu Li, "An FPGA-Based Embedded Robust Speech Recognition System Designed by Combining Empirical Mode Decomposition and a Genetic Algorithm", IEEE Transactions on Instrumentation and Measurement, 61(9), 2012.
[18] T. Ma, Y.-D. Kim, Q. Ma, M. Tang, and W. Zhou, "Wireless and Mobile Computing, Networking and Communications", IEEE International Conference, 2005.
[19] A. J. Carlos and M. Paul, "Computer Science and Information Systems", ch. Ambient Intelligence: Concepts and Applications, 2007.
[20] Sadaoki Furui, "50 Years of Progress in Speech and Speaker Recognition Research", ECTI Transactions on Computer and Information Technology, Vol. 1, No. 2, November 2005.
[21] K. H. Davis, R. Biddulph, and S. Balashek, "Automatic Recognition of Spoken Digits", J. Acoust. Soc. Am., 24(6):637-642, 1952.
[22] H. F. Olson and H. Belar, "Phonetic Typewriter", J. Acoust. Soc. Am., 28(6):1072-1081, 1956.
[23] D. B. Fry, "Theoretical Aspects of Mechanical Speech Recognition", and P. Denes, "The Design and Operation of the Mechanical Speech Recognizer at University College London", J. British Inst. Radio Engr., 19:4, 211-299, 1959.
[24] J. Suzuki and K. Nakata, "Recognition of Japanese Vowels Preliminary to the Recognition of Speech", J. Radio Res. Lab, 37(8):193-212, 1961.
[25] T. Sakai and S. Doshita, "The Phonetic Typewriter, Information Processing 1962", Proc. IFIP Congress, 1962.
[26] K. Nagata, Y. Kato, and S. Chiba, "Spoken Digit Recognizer for Japanese Language", NEC Res. Develop., No. 6, 1963.
[27] D. R. Reddy, "An Approach to Computer Speech Recognition by Direct Analysis of the Speech Wave", Tech. Report No. C549, Computer Science Dept., Stanford Univ., September 1966.
[28] V. M. Velichko and N. G. Zagoruyko, "Automatic Recognition of 200 Words", Int. J. Man-Machine Studies, 2:223, June 1970.
[29] H. Sakoe and S. Chiba, "Dynamic Programming Algorithm Optimization for Spoken Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-26(1):43-49, February 1978.
[30] F. Itakura, "Minimum Prediction Residual Applied to Speech Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-23(1):67-72, February 1975.
[31] C. C. Tappert, N. R. Dixon, A. S. Rabinowitz, and W. D. Chapman, "Automatic Recognition of Continuous Speech Utilizing Dynamic Segmentation, Dual Classification, Sequential Decoding and Error Recovery", Rome Air Dev. Cen, Rome, NY, Tech. Report TR-71-146, 1971.
[32] F. Jelinek, L. R. Bahl, and R. L. Mercer, "Design of a Linguistic Statistical Decoder for the Recognition of Continuous Speech", IEEE Trans. Information Theory, IT-21:250-256, 1975.
[33] R. K. Moore, "Twenty Things We Still Don't Know About Speech", Proc. CRIM/FORWISS Workshop on Progress and Prospects of Speech Research and Technology, 1994.
[34] H. Sakoe, "Two Level DP Matching: A Dynamic Programming Based Pattern Matching Algorithm for Connected Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-27:588-595, December 1979.
[35] J. S. Bridle and M. D. Brown, "Connected Word Recognition Using Whole Word Templates", Proc. Inst. Acoust. Autumn Conf., 25-28, November 1979.
[36] C. S. Myers and L. R. Rabiner, "A Level Building Dynamic Time Warping Algorithm for Connected Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., ASSP-29:284-297, April 1981.
[37] C. H. Lee and L. R. Rabiner, "A Frame Synchronous Network Search Algorithm for Connected Word Recognition", IEEE Trans. Acoustics, Speech, Signal Proc., 37(11):1649-1658, November 1989.
[38] J. Ferguson, Ed., "Hidden Markov Models for Speech", IDA, Princeton, NJ, 1980.