Using different acoustic, lexical and language modeling units for ASR of an under-resourced language Amharic

Martha Yifiru Tachbelie a,*, Solomon Teferra Abate a, Laurent Besacier b

a School of Information Sciences, Addis Ababa University, Addis Ababa, Ethiopia
b Laboratoire d'informatique de Grenoble (LIG), Université Joseph Fourier, Grenoble 1, France

* Corresponding author. Tel.: +251 923518241.
Abstract
State-of-the-art large vocabulary continuous speech recognition systems mostly use phone based acoustic models (AMs) and word based lexical and language models. However, phone based AMs are not efficient in modeling long-term temporal dependencies, and the use of words in lexical and language models leads to the out-of-vocabulary (OOV) problem, which is a serious issue for morphologically rich languages. This paper presents the results of our contributions on the use of different units for acoustic, lexical and language modeling for an under-resourced language (Amharic, spoken in Ethiopia). Triphone, syllable and hybrid (syllable-phone) units have been investigated for acoustic modeling. Words and morphemes have been investigated for lexical and language modeling. We have also investigated the use of longer (syllable) acoustic units and shorter (morpheme) lexical as well as language modeling units in a speech recognition system. Although hybrid AMs did not bring much improvement over context dependent syllable based recognizers in speech recognition performance with word based lexical and language models (i.e. word based speech recognition), we observed a significant word error rate (WER) reduction compared to triphone-based systems in morpheme-based speech recognition. Syllable AMs also led to a WER reduction over the triphone-based systems both in word based and morpheme based speech recognition. It was possible to obtain a 3% absolute WER reduction as a result of using syllable acoustic units in morpheme-based speech recognition. Overall, our results show that syllable and hybrid AMs are best fitted to morpheme-based speech recognition.
© 2013 Elsevier B.V. All rights reserved.

Keywords: Syllable-based acoustic modeling; Hybrid (phone-syllable) acoustic modeling; Morpheme-based; Speech recognition; Under-resourced languages; Amharic
1. Introduction
Many languages, especially languages of developing countries, lack sufficient resources and tools required for the implementation of human language technologies. These languages are commonly referred to as under-resourced or low density languages (Besacier et al., 2006). The term under-resourced languages, introduced by (Berment, 2004), refers to a language with some of the following aspects: lack of a unique writing system or stable orthography,
Table 1
Amharic consonants (adapted from Leslau, 2000): stops, fricatives, nasals, liquids and semivowels, classified by place of articulation (labial, dental, palatal, velar, glottal) and, for stops and fricatives, into voiceless, voiced, glottalized and rounded (e.g. kw, gw, qw, hw) series.
instrument used for breaking can be derived. Case, number, definiteness, and gender marker affixes inflect nouns. Adjectives are derived from nouns, stems or verbal roots by adding a prefix or a suffix. For example, it is possible to derive dIngayama 'stony' from the noun dIngay 'stone'; zngu 'forgetful' from the stem zng; and sEnEf 'lazy' from the root snf by suffixation and intercalation. Adjectives can also be formed through compounding. For instance, hodEsE 'tolerant, patient' is derived by compounding the noun hod 'stomach' and the adjective sE 'wide'. Like nouns, adjectives are inflected for gender, number, and case (Yimam, 2007).

Unlike other word categories such as nouns and adjectives, the derivation of verbs from other parts of speech is not common. The conversion of a root to a basic verb stem requires both intercalation and affixation. For instance, from the root gdl 'kill' we obtain the perfective verb stem gEddEl by intercalating the pattern E_E. From this perfective stem, it is possible to derive a passive (tEgEddEl-) and a causative stem (asgEddEl-) using the prefixes tE- and as-, respectively. Other verb forms are also derived from roots in a similar fashion. Verbs are inflected for person, gender, number, aspect, tense and mood (Yimam, 2007). Other elements like negative markers also inflect verbs in Amharic. In this work, only the concatenative morphemes are considered.
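To make the non-concatenative part of this derivation concrete, the root-and-pattern step can be pictured as interdigitating root consonants into a vocalic template. The sketch below is our own illustration; the function name, the template notation and the simplified treatment of gemination are ours, not the authors'.

```python
def apply_pattern(root, template):
    """Interdigitate a consonantal root into a vocalic template.

    root     -- root consonants, e.g. "gdl" ('kill')
    template -- string in which digits 1..n select root consonants and all
                other characters are copied verbatim; a repeated digit
                expresses gemination, as in the perfective stem gEddEl.
    """
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in template)

perfective = apply_pattern("gdl", "1E22E3")   # -> "gEddEl"
passive = "tE" + perfective                   # -> "tEgEddEl" (prefix tE-)
causative = "as" + perfective                 # -> "asgEddEl" (prefix as-)
print(perfective, passive, causative)
```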
1.3.3. Amharic writing system
Amharic is written in its own script, which is known as fidEl. The Amharic script is a syllabary, since each symbol represents a consonant combined with a vowel and the vowel has no independent existence (Leslau, 2000). In other words, each symbol in Amharic orthography represents a CV syllable.1 The writing system consists of 276 distinct symbols, 20 numerals and eight punctuation marks. There are 33 core consonants, each of which has seven shapes or orders according to the vowel combined with it, as shown in Table 2. This makes 231 (33 × 7) distinct symbols (CV syllables) out of the 276. The remaining symbols include labiovelars (20), labialized consonants (18) and the consonant v (which appears only in modern loan words like viza 'visa') in its seven orders.
1.3.4. Amharic syllable structure
Most Amharic linguists (Yimam, 2007; Haile, 1995) agree that the syllable structure of Amharic is (C)V(C)(C), where C represents a consonant and V a vowel. That means the syllable types of Amharic are V, CV, CVC, VC, CVCC and VCC. Some others (Seyoum, 2001) claim that the only Amharic syllable types are CV and CVC. However, CV syllables cover the large majority of the syllable distribution in Amharic (H/Mariam et al., 2004). Since the Amharic writing system is a syllabary (representing approximately CV syllables) and since CV syllables cover the large majority of the syllable distribution in Amharic, only CV syllables have been considered in the current investigation.

1 Except the 6th order character, which represents a consonant with or without the vowel I.

Table 2
Sample Amharic fidEl.
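Because the orthography is (approximately) a CV syllabary, mapping a transliterated word onto the CV acoustic units used in this work is close to trivial. The sketch below is our own illustration; the vowel inventory and the fallback to bare consonants (which in actual fidEl would be 6th-order symbols) are simplifications, not the authors' implementation.

```python
VOWELS = set("aeiouEI")   # the seven vowel orders in the paper's transliteration

def cv_units(translit):
    """Split a transliterated word into CV units; a consonant not followed by
    a vowel is emitted on its own (in fidEl it would be a 6th-order symbol)."""
    units, i = [], 0
    while i < len(translit):
        if translit[i] not in VOWELS and i + 1 < len(translit) and translit[i + 1] in VOWELS:
            units.append(translit[i:i + 2])   # consonant + vowel
            i += 2
        else:
            units.append(translit[i])         # bare consonant or vowel
            i += 1
    return units

print(cv_units("gEddEl"))   # ['gE', 'd', 'dE', 'l']
print(cv_units("dIngay"))   # ['dI', 'n', 'ga', 'y']
```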
1.4. Previous works on Amharic speech recognition
Research in automatic speech recognition for Amharic started in 2001, when (Berhanu, 2001) developed an isolated Consonant-Vowel syllable recognition system. Since then, several attempts have been made in academic research. At the beginning, the studies were conducted using small data sets developed by the researchers themselves for their own research purposes. The development of a medium size read speech corpus (Abate et al., 2005) facilitated research in the area. Although there are several attempts (Tadesse, 2002; Seifu, 2003; Tachbelie, 2003; Girmaw, 2004; Seid and Gamback, 2005; Abate, 2006; Abate and Menzel, 2007a; Abate and Menzel, 2007b; Pellegrini and Lamel, 2006a, 2009; Tachbelie et al., 2009, 2010, 2011), in this section we give only a review of pertinent works that investigated the use of units different from phones for acoustic modeling and different from words for lexical and language modeling.
For Amharic, the first experiment on the use of syllables in acoustic modeling is due to (Abate and Menzel, 2007a). As the Amharic orthography has more or less a one to one correspondence with consonant vowel (CV) syllabic sounds, they experimented on the use of CV syllables in acoustic modeling. Models with different HMM topologies have been developed. A model with five states per HMM and with no skips was found to be the best one in terms of accuracy. Compared to a triphone-based model, the context independent syllable based model performed slightly worse in terms of accuracy. However, the syllable based recognizers were found to be better in terms of recognition speed and storage requirements. Thus, they concluded that the use of CV syllables is a promising alternative in the development of automatic speech recognition for Amharic.
The application of automatic word decomposition (using the Harris algorithm) for Amharic speech recognition has been investigated by (Pellegrini and Lamel, 2006a). In their study, the units obtained through decomposition have been used in both lexical and language models. They reported recognition results for four different configurations: full word and three decomposed forms (detaching both prefix and suffix, prefix only, and suffix only). A word error rate (WER) reduction over the baseline word-based system has been reported using two hours of training data for all decomposed forms, although the level of improvement varies. The highest improvement (5.2% absolute WER reduction) has been obtained with the system in which only the prefixes have been detached. When both prefixes and suffixes are considered, the improvement in performance is small, namely 2.2%. As the authors said, this might be due to the limited span of the n-gram language models.
Decomposing lexical units with the same algorithm led to worse performance when more training data (35 h) was used (Pellegrini and Lamel, 2007). This can be explained by a higher acoustic confusability. (Pellegrini and Lamel, 2007, 2009) tried to solve this problem by using other, modified decomposition algorithms. Their starting algorithm was Morfessor (Creutz and Lagus, 2005), which they modified by adding different information. In Morfessor Baseline, the prior probability of getting N distinct morphs (p(Lexicon)) is estimated on the basis of the frequency and length (character sequence probability) of morphs. The first modification made by (Pellegrini and Lamel, 2007, 2009) affects the calculation of morph length. In Morfessor Baseline, character probabilities are static and calculated as a simple ratio of the number of occurrences of the character (irrespective of its place in words) divided by the total number of characters in the corpus. Inspired by the Harris algorithm, (Pellegrini and Lamel, 2007, 2009) made the calculation context sensitive. The probability that a word beginning (WB) is a morpheme is defined as the ratio of the number of distinct letters L(WB) which can follow WB over the total number of distinct letters L. The other modification is adding a phone-based feature in the calculation of p(Lexicon). The third modification is to avoid segmentation if it results in phonetically confusable morphemes. During the decomposition process, morphemes that differ from each other by only one syllable are compared. If the pair of syllables is among the most frequently confused pairs (found in their previous study (Pellegrini and Lamel, 2006b)), the segmentation is forbidden. They were only able to achieve a word error rate reduction if the phonetic confusion constraint was used to block the decomposition of words which would result in acoustically confusable units.
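As one reading of the Harris-inspired modification just described, the context-sensitive boundary probability can be computed from successor counts. The sketch below is our own approximation; the published implementation may differ in details such as normalisation and smoothing.

```python
from collections import defaultdict

def boundary_probabilities(words):
    """P(word beginning WB ends a morph) ~= L(WB) / L, where L(WB) is the
    number of distinct letters observed after WB and L is the total number
    of distinct letters in the corpus."""
    followers = defaultdict(set)
    alphabet = set()
    for w in words:
        alphabet.update(w)
        for i in range(1, len(w)):
            followers[w[:i]].add(w[i])
    total_letters = len(alphabet)
    return {wb: len(nexts) / total_letters for wb, nexts in followers.items()}

corpus = ["gEddEl", "gEddElku", "gEdEl", "tEgEddEl", "gEday"]
probs = boundary_probabilities(corpus)
print(sorted(probs.items(), key=lambda kv: -kv[1])[:5])
```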
(Tachbelie et al., 2010) showed the effect of OOV words on the performance of an Amharic speech recognition system and investigated the use of sub-word units in lexical and language modeling with the aim of reducing the OOV rate and thereby the performance degradation it causes. Morfessor, a freely available, language-independent, unsupervised morphology learning tool that tries to identify all the morphemes found in a given word, has been used for morphological segmentation. The acoustic model used in the study was a collection of cross-word triphone models.
The speech corpus used for the speech recognition experiments is a read speech corpus (Abate et al., 2005) developed at the University of Hamburg. The audio corpus was collected in the following manner. Texts were first extracted from news websites and then segmented by sentence. Recordings were made by native speakers reading sentence by sentence, with the possibility to re-record any time they considered that they had mispronounced. The corpus contains 20 hours of training speech collected from 100 speakers who read a total of 10,850 sentences (28,666 tokens). Compared to other speech corpora that contain hundreds of hours of speech data for training, this corpus is obviously small in size, and accordingly the models will suffer from a lack of training data.
The corpus also includes four different test sets (5k and 20k, both for development and evaluation). However, for the purpose of the current investigation we
have used the 5k development test set, which includes
360 sentences (4106 tokens or 2836 distinct words)
read by 20 speakers.
The text corpus used for this study is the ATC_120k
(Tachbelie, 2010). It consists of 120,262 sentences
(2,348,150 tokens or 211,120 types). The ATC_120k corpus
has been used to derive the vocabulary for the pronunciation dictionaries and to train language models.
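Since the OOV rate of the recognition vocabulary on the test transcriptions drives much of the later discussion, it is worth noting how simply it is computed. The sketch below is our own; the file and function names are hypothetical.

```python
def oov_rate(vocab_path, test_path):
    """Percentage of test tokens that are not in the recognition vocabulary."""
    with open(vocab_path, encoding="utf-8") as f:
        vocab = {line.strip() for line in f if line.strip()}
    total = oov = 0
    with open(test_path, encoding="utf-8") as f:
        for line in f:
            for token in line.split():
                total += 1
                oov += token not in vocab
    return 100.0 * oov / total if total else 0.0

# e.g. oov_rate("vocab_65k.txt", "dev5k_transcriptions.txt")
```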
Table 3
Number of approved, rejected and total HITs for the manual and automatic transcriptions.
Table 4
Content of rejected HITs.

Nature of rejected HITs       % of rejected HITs
Empty                         60.57
Non-sense                     20.33
Copy from instruction          5.70
Trying without knowledge      13.40
Table 6
Performance of CI syllable-based recognizers.

Syllable CI models     WER in %
Syll_CI_12gau          18.9
Syll_CI_16gau          18.3
Syll_CI_24gau          18.0
Syll_CI_32gau          18.3
Table 5
Performance of triphone-based systems.

Triphone models                WER in %
Triphone_2500sen_16gau_w?      18.8
Triphone_2500sen_16gau         18.2
Triphone_250sen_16gau_w?       22.5
Triphone_250sen_16gau          21.2
Table 7
Performance of CD syllable-based recognizers.

Syllable CD models         WER in %
Syll_CD_1000sen_16gau      18.0
Syll_CD_1500sen_24gau      17.3
Syll_CD_2000sen_24gau      17.6
Syll_CD_2500sen_24gau      17.6
Syll_CD_3000sen_24gau      17.9
Syll_CD_3500sen_24gau      17.9
acoustic units context dependent. However, in our experiment only a slight improvement (WER of 18% vs. 17.3%) has been obtained as a result of context modeling. This can be explained by the scarcity of training speech data: a very large amount of training data is needed to implement context dependent syllable acoustic units, as the number of parameters to be estimated is very large. As we have used only 20 hours of training speech, it is obvious that our models suffer from insufficient training data. The other reason is the nature of the acoustic unit itself, i.e. syllables are less context sensitive than phones.

Further parameter tuning of the best CD CV syllable model (Syll_CD_1500sen_24gau) brought a small improvement in performance (WER of 17.1%). As we did for the best triphone-based system, we have used user defined questions for decision tree clustering. However, no WER reduction has been obtained. Rather, a slight increase in WER (from 17.1% to 17.3%) has been observed. This can be explained by the simplicity4 of the clusters that we have defined for the syllable-based recognizers.

4 The clusters (nasals, fricatives, etc.) are defined based only on the phonetic category of the consonant, irrespective of the vowel of the syllable.
3.4. Hybrid recognizers

From our experiments on syllable-based acoustic modeling, we noticed that some of the CV syllables are relatively rare in the training data and, therefore, not trained very well. Fig. 2 shows the distribution of syllables in our training data. Since it is not feasible to record audio data in order to enrich the rare syllables with the time and resources available for our experiment, and since our aim is to find the best way of developing a speech recognition system for Amharic (an under-resourced language) using the available data, we decided to decompose the rare CV syllables (based on their frequency in the training transcription) into constituent phones and train hybrid (CV syllable and phone) acoustic models.
Table 8
WER of hybrid acoustic models using word-based LM.

Hybrid models                   WER in %
Hybrid_204Units_5statesWS       17.8
Hybrid_204Units_5statesWOS      16.9
Hybrid_204Units_4statesWOS      17.1
Hybrid_170Units_5statesWS       18.9
Hybrid_170Units_5statesWOS      17.0
Hybrid_170Units_4statesWOS      17.5
Using the pronunciation dictionary of the CV syllable-based recognizers, we have prepared two versions of pronunciation dictionaries: HybridDict_FL100 and HybridDict_FL500. In HybridDict_FL100, all the CV syllables with a frequency of less than 100 have been decomposed into their constituent phones. The number of distinct pronunciation units considered in this dictionary is 204 (31 phones, 172 CV syllables and a silence). CV syllables that appeared less than 500 times in the training transcription have also been decomposed (into phones) to form the HybridDict_FL500 dictionary. The total number of distinct pronunciation units in this dictionary is 170 (41 phones, 128 CV syllables and a silence). Hybrid acoustic models have then been developed using these dictionaries (HybridDict_FL100 and HybridDict_FL500) with different HMM topologies.
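The two hybrid dictionaries can thus be seen as a simple frequency filter over the syllable-based pronunciations. The sketch below is our own reconstruction of the procedure (helper names and data structures are hypothetical): every CV unit whose count in the training transcription falls below the chosen threshold (100 or 500) is decomposed into its constituent phones.

```python
from collections import Counter

def hybrid_pronunciations(syllable_lexicon, training_units, threshold):
    """Build a hybrid lexicon from a CV syllable lexicon.

    syllable_lexicon -- {word: [CV units]} as in the syllable-based dictionary
    training_units   -- iterable of CV units from the training transcriptions
    threshold        -- minimum count for a CV unit to be kept whole (100 or 500)
    CV units rarer than the threshold are split into consonant and vowel phones.
    """
    counts = Counter(training_units)

    def expand(unit):
        if len(unit) == 1 or counts[unit] >= threshold:
            return [unit]
        return list(unit)                      # e.g. "gE" -> ["g", "E"]

    return {word: [p for unit in pron for p in expand(unit)]
            for word, pron in syllable_lexicon.items()}

# e.g. hybrid_pronunciations({"gEddEl": ["gE", "d", "dE", "l"]}, ["gE", "dE"], 100)
```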
While it might be possible to use different numbers of states for the various units (phones and syllables) in one system, we decided to use a common HMM topology of five states with skips. We assume that this HMM topology handles the irregularities (in length) of the hybrid acoustic units. However, for comparison purposes, we have also developed hybrid acoustic models with four and five states without skips.
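To see why the skip topology is more data-hungry, one can count the free transition parameters of a left-to-right HMM under each topology. The rough sketch below is ours; it ignores entry/exit states and the (much larger) Gaussian parameter counts, and is only meant to illustrate the trend.

```python
def free_transition_params(n_states, allow_skips):
    """Rough count of free transition probabilities per left-to-right HMM:
    every emitting state has a self-loop, an arc to the next state and, when
    skips are allowed, an arc that jumps over one state; one probability per
    state is fixed by normalisation."""
    arcs_per_state = 3 if allow_skips else 2
    return n_states * (arcs_per_state - 1)

for label, states, skips in [("5 states, with skips", 5, True),
                             ("5 states, no skips", 5, False),
                             ("4 states, no skips", 4, False)]:
    print(f"{label}: {free_transition_params(states, skips)} free transition parameters")
```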
The hybrid acoustic models have been evaluated using the 65k pronunciation dictionary used in the syllable-based recognizers. However, the CV syllables that are decomposed into phones in the training dictionaries have also been decomposed in the 65k dictionary. Table 8 presents the performance of the hybrid systems evaluated on the 5k development test set.

As can be seen from the table, decomposing rare syllables into phones did not bring a significant performance improvement over the syllable based systems. Rather, in some cases, the result is even worse compared to the pure context dependent CV syllable-based speech recognition systems. The use of the five states with skips topology led to the worst performance (17.8% and 18.9% for the acoustic models developed with the HybridDict_FL100 and HybridDict_FL500 dictionaries, respectively). Although this topology enables us to capture irregularities in acoustic unit length, it requires much more training data, as the number of parameters (transition matrices) to be estimated is larger than the number of parameters needed in acoustic models without skips. This is why hybrid models with skips did not perform well compared to those with five states without skips (see Hybrid_204Units_5statesWOS and Hybrid_170Units_5statesWOS in the table).
Table 9
WER of several AMs with FSM segmented morpheme-based LMs.

Units                   Models                          WER in %
Phone                   Triphone_3states                16.7
                        Triphone_3states_UDQ            15.9
CV Syllable             CD_Syllable                     14.3
                        CD_Syllable_UDQ                 14.6
Phone + CV Syllables    Hybrid_170Units_5statesWS       16.0
                        Hybrid_170Units_5statesWOS      13.9
                        Hybrid_170Units_4statesWOS      14.3
                        Hybrid_204Units_5statesWS       14.6
                        Hybrid_204Units_5statesWOS      14.2
                        Hybrid_204Units_4statesWOS      14.3
The acoustic models used in morpheme-based recognition are the triphone, context dependent CV syllable and hybrid models described in Section 3. Morpheme-based trigram language models have been developed (in a similar fashion to the word trigram language model described in Section 3.1) with the unsupervised and supervised morphologically segmented corpora. For the latter corpus, a 65k morpheme vocabulary has been prepared by taking the most frequent morphemes. This vocabulary has been used to prepare three types of pronunciation dictionaries according to the units (phone, CV syllable, and hybrid) used in the acoustic models. Recognition experiments have then been performed using the 5k development test set. Table 9 gives the WER of the different acoustic models in FSM segmented morpheme-based recognition.
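The morpheme-based language models are built exactly as the word-based ones, only over segmented tokens. The sketch below illustrates the preprocessing and raw trigram counting; the segmentation mapping and names are hypothetical, and the actual models were trigrams trained with a standard LM toolkit.

```python
from collections import Counter

def segment_corpus(sentences, segmentation):
    """Replace each word by its morphemes (unknown words map to themselves)."""
    return [[m for w in s.split() for m in segmentation.get(w, [w])]
            for s in sentences]

def trigram_counts(segmented_sentences):
    """Raw trigram counts over sentences padded with <s> ... </s>."""
    counts = Counter()
    for toks in segmented_sentences:
        toks = ["<s>", "<s>"] + toks + ["</s>"]
        for i in range(len(toks) - 2):
            counts[tuple(toks[i:i + 3])] += 1
    return counts

segmentation = {"asgEddEl": ["as", "gEddEl"], "tEgEddEl": ["tE", "gEddEl"]}
counts = trigram_counts(segment_corpus(["tEgEddEl asgEddEl"], segmentation))
print(counts.most_common(3))
```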
As the table shows, the CV syllable-based acoustic models outperformed the triphone-based ones. A 2.4% absolute WER reduction (cf. Triphone_3states and CD_Syllable in the table) has been obtained as a result of using syllable acoustic units in morpheme-based recognition. This improvement is statistically significant with a p-value of less than 0.001. The use of user defined questions for decision tree-based clustering has a positive influence (resulting in a 0.8% absolute WER reduction) on the triphone-based acoustic models. However, using user defined questions did not bring a WER reduction for the syllable-based acoustic models, as was also true in the word-based recognition experiments described in Section 3.3. Nevertheless, the syllable-based acoustic model with user defined questions (CD_Syllable_UDQ) resulted in a significant (at a p-value of 0.005) WER reduction compared to the equivalent triphone-based system (Triphone_3states_UDQ). Generally, all hybrid acoustic models performed significantly (at a p-value of 0.001) better than the triphone-based systems, the best performing (with a WER of 13.9%) being Hybrid_170Units_5statesWOS. This system also has a slightly lower WER compared to the pure CV syllable-based systems. Although the model topology is crude in representing the acoustic units (not state of the art for phones), this model achieved the lowest WER among all the others. This indicates the potential of the hybrid acoustic model for even higher performance, provided that proper topologies are used for each of the units.
Table 10
WER of several AMs with Morfessor segmented morpheme-based LMs.

Units                   Models                          WER in %
Phone                   Triphone_3states                17.8
                        Triphone_3states_UDQ            15.9
CV Syllable             CD_Syllable                     14.8
                        CD_Syllable_UDQ                 13.9
Phone + CV Syllables    Hybrid_170Units_5statesWS       15.5
                        Hybrid_170Units_5statesWOS      13.7
                        Hybrid_170Units_4statesWOS      13.5
                        Hybrid_204Units_5statesWS       13.7
                        Hybrid_204Units_5statesWOS      13.7
                        Hybrid_204Units_4statesWOS      13.3
For the unsupervised morpheme-based recognition, all the distinct morphemes in the segmented text (45k) have been considered as entries for the pronunciation dictionaries. As we did for the FSM segmented text, we prepared three versions of pronunciation dictionaries according to the type of units used in the acoustic models. The performance of the Morfessor morpheme-based recognition using the different acoustic models on the 5k development test set is presented in Table 10. As in the FSM segmented morpheme-based recognition experiment, the use of syllable acoustic models led to greater WER reductions in morpheme-based recognition. 3% and 2% absolute WER reductions have been obtained compared to the triphone-based models that use automatically and user defined tree questions, respectively. These error rate reductions are statistically significant with a p-value of less than 0.001. The result clearly shows that syllable-based acoustic models are best fitted for morpheme-based recognition. The hybrid acoustic models (except Hybrid_170Units_5statesWS) outperformed the triphone and CV syllable-based models, although the improvement over the syllable-based model with user defined questions (CD_Syllable_UDQ) is not statistically significant. In all the other cases the WER reduction is statistically significant at a p-value of less than 0.001 compared to the triphone-based systems and with a minimum p-value of 0.002 compared to the syllable-based one. This shows that the hybrid models are the best for Amharic morpheme-based recognition systems.
Although the difference is not big, Morfessor-based segmentation led to a lower WER than FSM-based segmentation. This can be explained by the relatively high OOV rate (3.58%) of the FSM based system compared to that of Morfessor's, which is almost zero (0.10%). The results presented in this section also showed that using morphemes (instead of words) as entries in the pronunciation dictionary and as units in the language model brings improvement in Amharic speech recognition. Our results show that the use of long acoustic units (syllables) and short lexical units (morphemes) is the best for Amharic speech recognition. CD syllables are the best alternative acoustic units provided that enough training data is available.
Hybrid (phone-syllable) based recognizers did not bring a significant performance improvement over the best CD syllable based recognizers when word units are used in the pronunciation dictionary and language model. However, they brought better WER reductions in morpheme-based speech recognition. The syllable-based acoustic models also outperformed the triphone-based models in morpheme-based speech recognition. This enables us to conclude that the use of syllable and hybrid units in acoustic modeling and morphemes in lexical and language modeling is the best for Amharic speech recognition. We recommend investigating the use of syllable acoustic models in morpheme-based speech recognition for other morphologically rich languages.
In the current study, only consonant vowel (CV) syllables are considered. As Amharic has other syllable structures (V, VC, CVC, CVCC) as well, we will investigate the use of all types of syllables in acoustic modeling. We think that a better performance can be obtained by considering all syllable structures in acoustic modeling, because the use of long (in time) units will improve recognition. However, considering all syllable structures has its own challenges. Since the number of syllables will be much larger than the 233 considered in this study, a large amount of training speech data is required to train all the syllables adequately. Thus, a way to model all Amharic syllable types with the available training data has to be found. Experiments with numbers of states per HMM other than 5 will also be conducted for syllable based acoustic modeling.
In hybrid acoustic modeling, we have used an HMM topology of 5 states with skips, assuming that this topology can handle the irregularities (in length) of the acoustic units. However, since the number of parameters estimated in this model topology is very big, a large amount of training data is required to adequately train such models. As we have used only 20 hours of training speech data, we could not see the benefit of using such a topology in hybrid acoustic models. Instead, a higher WER has been observed. Thus, in order to get the real benefit of using models with skips, a larger amount of training data has to be used. Since acquiring data is not easily achievable (especially for under-resourced languages), using different model topologies for syllable and phone units in hybrid acoustic modeling will be investigated.
In our morpheme-based systems we decomposed all the words in our corpus irrespective of their frequency in our data. Decomposing rare words while keeping frequent words as they are is an interesting direction in dealing with morphologically rich languages. An approach followed by (El-Desoky et al., 2009), where no segmentation is done for the top N highly ranked decomposable words, is an interesting future endeavor.
Last but not least, specific language issues like gemination, epenthetic vowel insertion and glottal stop realization will also be handled. We will model geminated and non-geminated consonants separately. In addition, the
Seifu, Zegaye, 2003. HMM Based Large Vocabulary, Speaker Independent, Continuous Amharic Speech Recognizer. M.Sc. Thesis, School
of Information Studies for Africa, Addis Ababa University, Ethiopia.
Sethy, Abhinav, Narayanan, Shrikanth, Parthasarthy, S., 2002. A syllable based approach for improved recognition of spoken names. In: Proceedings of the 5th ISCA Pronunciation Modeling Workshop, pp. 30-35.
Seyoum, Mulugeta, 2001. The Syllable Structure and Syllabification in Amharic. Master's thesis, Department of Linguistics, Trondheim, Norway.
Siivola, Vesa, Hirsimaki, Teemu, Creutz, Mathias, Kurimo, Mikko, 2003. Unlimited vocabulary speech recognition based on morphs discovered in an unsupervised manner. In: Proceedings of Eurospeech, pp. 2293-2296.
Snow, Rion, O'Connor, Brendan, Jurafsky, Daniel, Ng, Andrew Y., 2008. Cheap and fast - but is it good? Evaluating non-expert annotations for natural language tasks. In: Proceedings of EMNLP-08, pp. 254-263.
Stolcke, Andreas, 2002. SRILM - an extensible language modeling toolkit. In: Proceedings of ICSLP-2002, Denver, Colorado, USA, pp. 901-904.
Tachbelie, Martha Yifiru, 2003. Automatic Amharic Speech Recognition System to Command and Control Computers. M.Sc. Thesis, School of Information Studies for Africa, Addis Ababa University, Ethiopia.
Tachbelie, Martha Yifiru, 2010. Morphology-Based Language Modeling for Amharic. Ph.D. thesis, University of Hamburg, Germany.
Tachbelie, Martha Yifiru, Abate, Solomon Teferra, Menzel, Wolfgang, 2009. Morpheme-based language modeling for Amharic speech recognition. In: Proceedings of the 4th Language and Technology Conference LTC-09, pp. 114-118.
Tachbelie, Martha Yifiru, Abate, Solomon Teferra, Menzel, Wolfgang, 2010. Morpheme-based automatic speech recognition for a morphologically rich language Amharic. In: Proceedings of SLTU'10, Penang, Malaysia, pp. 68-73.
Tachbelie, Martha Yifiru, Abate, Solomon Teferra, Menzel, Wolfgang, 2011. Morpheme-based and factored language modeling for Amharic speech recognition. In: Lecture Notes in Computer Science, Human Language Technology: Challenges for Computer Science and Linguistics, vol. 6562, pp. 82-93.
Tachbelie, Martha Yifiru, Abate, Solomon Teferra, Besacier, Laurent, 2011. Part-of-speech tagging for under-resourced and morphologically rich languages - the case of Amharic. In: Proceedings of the HLTD 2011, pp. 50-55.
Tadesse, Kinfe, 2002. Sub-Word Based Amharic Speech Recognizer: An
Experiment Using Hidden Markov Model (HMM). M.Sc. Thesis,
School of Information Studies for Africa, Addis Ababa University,
Ethiopia.
Thangarajan, R., Natarajan, A.M., 2008. Syllable based continuous speech recognition for Tamil. South Asian Language Review 17 (1), 71-85.
Voigt, Reiner M., 1987. The classification of Central Semitic. Journal of Semitic Studies 32 (1), 1-21.
Whittaker, E.W.D., Woodland, P.C., 2000. Particle-based language modeling. In: Proceedings of the International Conference on Spoken Language Processing, pp. 170-173.
Whittaker, E.W.D., Van Thong, J.M., Moreno, P.J., 2001. Vocabulary independent speech recognition using particles. In: IEEE Workshop on Automatic Speech Recognition and Understanding, pp. 315-318.
Woodland, P.C., Leggetter, C.J., Odell, J.J., Valtchev, V., Young, S.J., 1995. The 1994 HTK large vocabulary speech recognition system. In: Proceedings of the 1995 International Conference on Acoustics, Speech and Signal Processing, vol. 1, pp. 73-76.
Yimam, Baye, 2007. yEamarNa sEwasEw, second ed., EMPDE, Addis
Ababa.