1 MIC research team, MISC Laboratory, Faculty of Sciences, Ibn Tofail University, Kenitra, Morocco
2 RIME research team, LRIE Laboratory, Mohammadia School of Engineers, Mohammed V University in
Rabat, Morocco
3 OMEGA research team, LERES Laboratory, FSJES Faculty, Moulay Ismail University, Meknes, Morocco
ABSTRACT
Dictionary resources are essential for Natural Language Processing (NLP), and generating high
quality dictionary resources is a crucial step for the success and effectiveness of NLP applications.
The linguistic information held in a lexical database is complex, large and varied (i.e., phonological,
morphological, syntactic, semantic and pragmatic). Conjugated verbs are among the entries of such a
database. To this end, we present in this paper the open source mobile application of our
conjugator, which we developed in Java on the Android platform. This conjugator allowed us to generate
a lexicon of more than 18667 conjugated verbs. This lexicon will be used to generate textual words. The
resulting lexicon can be used in various applications such as morphological analysis (lexical approach),
text indexing, etc.
KEYWORDS
NLP, Lexical databases, conjugator, Root, Pattern, textual words, morphological analysis.
1. INTRODUCTION
In the Arabic language, every meaningful word consists of a root and a pattern. We can therefore
represent all Arabic words by a matrix in which the columns are the patterns and the rows are the
roots. A word is simply an element of this matrix [1] [3] [5] [9] (see Figure 1).
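The matrix view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: roots and patterns are written in a Latin transliteration, each pattern is a format template whose slots {0}, {1}, {2} receive the root consonants, and the names ROOTS, PATTERNS and build_matrix are hypothetical rather than taken from the conjugator.

```python
# Hypothetical transliterated data: two triliteral roots and two verbal patterns.
ROOTS = ["DKL", "KTB"]
PATTERNS = ["{0}a{1}a{2}a", "Ya{0}o{1}u{2}u"]


def build_matrix(roots, patterns):
    """Return matrix[root][pattern] -> word (rows are roots, columns are patterns)."""
    return {
        root: {pattern: pattern.format(*root) for pattern in patterns}
        for root in roots
    }


matrix = build_matrix(ROOTS, PATTERNS)

# A word is the cell found at the intersection of a root row and a pattern column.
print(matrix["DKL"]["Ya{0}o{1}u{2}u"])  # YaDoKuLu
```

Storing the matrix as a dictionary of dictionaries keeps the row and column labels explicit, which suits a small illustration; a real lexicon would more likely compute cells on demand.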
International Journal on Natural Language Computing (IJNLC) Vol. 5, No.2, April 2016
Verbs: aspect (perfect, imperfect, imperative); voice (passive, active); number (singular,
dual, plural); gender (masculine, feminine); person (first person, second person, third person);
case (nominative, accusative, apocope); insistence.
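One possible way to represent these verb features in code is shown below. It is a sketch under assumptions: the class and field names are hypothetical, the insistence feature is modelled as a simple boolean, and no attempt is made to rule out feature combinations that do not occur in Arabic.

```python
from dataclasses import dataclass
from enum import Enum


class Aspect(Enum):
    PERFECT = "perfect"
    IMPERFECT = "imperfect"
    IMPERATIVE = "imperative"


class Voice(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class Number(Enum):
    SINGULAR = "singular"
    DUAL = "dual"
    PLURAL = "plural"


class Gender(Enum):
    MASCULINE = "masculine"
    FEMININE = "feminine"


class Case(Enum):
    NOMINATIVE = "nominative"
    ACCUSATIVE = "accusative"
    APOCOPE = "apocope"


@dataclass(frozen=True)
class VerbFeatures:
    """Hypothetical bundle of the grammatical features attached to a verb form."""
    aspect: Aspect
    voice: Voice
    number: Number
    gender: Gender
    person: int                    # 1, 2 or 3
    case: Case = Case.NOMINATIVE
    insistence: bool = False

    def __post_init__(self):
        if self.person not in (1, 2, 3):
            raise ValueError("person must be 1, 2 or 3")


features = VerbFeatures(
    Aspect.IMPERFECT, Voice.ACTIVE, Number.SINGULAR, Gender.MASCULINE, person=3
)
```

Because the dataclass is frozen, a feature bundle is hashable and can serve directly as a dictionary key when looking up a model or mould.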
Determining the verb class (see Figure 6), either by consulting the lexicon of triliteral verbs or
by processing the verb according to its length and the position of certain consonants.
The appropriate model is determined from the triple (class, tense, voice) (see Figure 7).
The triple (person, number, gender) and the model number are used to extract the desired
mould (see Figure 8).
The substitution operation consists of replacing the mould numbers with the corresponding
consonants of the verb, as shown in the example of Table 1.
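The substitution step can be sketched as follows, using the verb of Table 1. Several things here are assumptions made for illustration: the Latin transliteration (upper-case letters are consonants, lower-case letters are vowel marks, with "o" standing for the absence of a vowel), the mould string "Ya1o2u3u", and the function names consonants and substitute.

```python
def consonants(verb):
    """Extract the consonants of a transliterated verb (the upper-case letters)."""
    return [ch for ch in verb if ch.isupper()]


def substitute(verb, mould):
    """Replace each digit n of the mould with the n-th consonant of the verb."""
    cons = consonants(verb)
    out = []
    for ch in mould:
        if ch.isdigit():
            index = int(ch) - 1
            if not 0 <= index < len(cons):
                raise ValueError(f"mould slot {ch} has no consonant in {verb!r}")
            out.append(cons[index])
        else:
            # Vowels and affixes of the mould are copied unchanged.
            out.append(ch)
    return "".join(out)


print(substitute("DaKaLa", "Ya1o2u3u"))  # YaDoKuLu
```

The sketch stops at substitution; the transformation rules and corrections described next would be applied to its output.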
Table 1. The substitution operation.
Verb                  Mould    Substitution
(DaKaLa, Enter)       123      (YaDoKuLu, Enter)
For each case, we define the actions to perform on the consonants and vowels of the verb. Table 2
shows an example of the transformation actions:
Table 2. Some transformation actions.
Action number    Action    Coding of actions
Table 3. Application of the transformation rules (RT).
Rule number    Transformation actions    Example (before)    Example (after)
               2,3
               2,3,4
Spellchecking: This step performs the possible corrections on the verb, based on the lexicon
of known anomalies and their corresponding corrections; an example is given in
Table 4.
Table 4. Anomaly and correction.
Verb anomaly
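A lexicon of anomalies and corrections can be applied as a simple lookup-and-replace pass. The sketch below is an assumption about how this could look, not the conjugator's own rule set: the single rule is a standard Arabic spelling convention chosen for illustration (a hamza carrying a fatha followed by a hamza carrying a sukun is written as an alef with madda), it is not an entry taken from Table 4, and the names CORRECTIONS and correct are hypothetical.

```python
# Hypothetical correction lexicon: anomaly -> correction.
# Illustrative rule: alef-hamza + fatha + alef-hamza + sukun -> alef with madda.
CORRECTIONS = {
    "\u0623\u064E\u0623\u0652": "\u0622",
}


def correct(form, corrections=CORRECTIONS):
    """Replace every known anomaly found in the form with its correction."""
    for anomaly, fix in corrections.items():
        form = form.replace(anomaly, fix)
    return form


# First person singular imperfect of the verb "to eat", before correction.
raw = "\u0623\u064E\u0623\u0652\u0643\u064F\u0644\u064F"
fixed = correct(raw)
print(fixed == "\u0622\u0643\u064F\u0644\u064F")  # True
```

Forms that contain no listed anomaly pass through unchanged, so the correction pass can safely be applied to every generated form.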
The presentation module handles communication with the user, which can be a person or
an application in which our conjugator is integrated. This module performs the
necessary checks on the verb to conjugate and determines the pattern to apply.
The substitution and transformation module applies the transformation rules to
the pattern determined by the previous module.
The correction module applies correction rules to the result provided by the
transformation module to obtain the final form.
The acquisition and control module is used by a linguist to add missing verbs, models and
correction rules to the repository.
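The way these four modules cooperate around the repository can be sketched as a small pipeline. Everything in this sketch is an assumption made for illustration: the function and class names, the feature tuple used as a lookup key, the transliterated verb and mould, and the fact that the transformation rules are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Verbs, models and correction rules (all contents here are hypothetical)."""
    verbs: set = field(default_factory=set)
    models: dict = field(default_factory=dict)       # feature tuple -> mould
    corrections: dict = field(default_factory=dict)  # anomaly -> correction


def acquire(repo, verb=None, model=None, correction=None):
    """Acquisition and control module: lets a linguist add missing data."""
    if verb is not None:
        repo.verbs.add(verb)
    if model is not None:
        features, mould = model
        repo.models[features] = mould
    if correction is not None:
        anomaly, fix = correction
        repo.corrections[anomaly] = fix


def present(repo, verb, features):
    """Presentation module: check the request and pick the pattern to apply."""
    if verb not in repo.verbs:
        raise KeyError(f"unknown verb: {verb!r}")
    return repo.models[features]


def substitute_and_transform(verb, mould):
    """Substitution and transformation module (transformation rules omitted)."""
    consonants = [ch for ch in verb if ch.isupper()]
    return "".join(
        consonants[int(ch) - 1] if ch.isdigit() else ch for ch in mould
    )


def correct(repo, form):
    """Correction module: apply every anomaly -> correction rule."""
    for anomaly, fix in repo.corrections.items():
        form = form.replace(anomaly, fix)
    return form


def conjugate(repo, verb, features):
    """Chain the modules: presentation, substitution/transformation, correction."""
    mould = present(repo, verb, features)
    return correct(repo, substitute_and_transform(verb, mould))


repo = Repository()
features = ("imperfect", "active", 3, "singular", "masculine")
acquire(repo, verb="DaKaLa", model=(features, "Ya1o2u3u"))
print(conjugate(repo, "DaKaLa", features))  # YaDoKuLu
```

Keeping each module as a separate function mirrors the separation described above: the caller only sees conjugate, while a linguist only needs acquire to extend the repository.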
The repository of our conjugator contains all relevant data: verbs, models and correction rules.
These data can be stored in several forms (relational DB, XML, JSON, ...).
We implemented the first version of our conjugator as an Android mobile application, using the
Java language for coding, the SQLite DBMS for managing the repository and XML to define the
mobile interfaces. The mobile application is open source and can be downloaded from the RIME
research team web site ( http://rime.emi.ac.ma/arabic_conjugator/conjugator.apk).
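To make the idea of an SQLite-backed repository concrete, here is a minimal sketch. It is written in Python with the standard sqlite3 module purely for brevity, whereas the application itself is written in Java; the table names, column names, class label and sample rows are all hypothetical and do not describe the actual schema of the conjugator.

```python
import sqlite3

# Hypothetical schema: one table per kind of data held in the repository.
SCHEMA = """
CREATE TABLE verb (
    id         INTEGER PRIMARY KEY,
    form       TEXT NOT NULL UNIQUE,
    verb_class TEXT NOT NULL
);
CREATE TABLE model (
    id          INTEGER PRIMARY KEY,
    verb_class  TEXT NOT NULL,
    aspect      TEXT NOT NULL,
    voice       TEXT NOT NULL,
    person      INTEGER NOT NULL,
    gram_number TEXT NOT NULL,
    gender      TEXT NOT NULL,
    mould       TEXT NOT NULL
);
CREATE TABLE correction_rule (
    id         INTEGER PRIMARY KEY,
    anomaly    TEXT NOT NULL UNIQUE,
    correction TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO verb (form, verb_class) VALUES (?, ?)", ("DaKaLa", "class_1")
)
conn.execute(
    "INSERT INTO model (verb_class, aspect, voice, person, gram_number, gender, mould) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("class_1", "imperfect", "active", 3, "singular", "masculine", "Ya1o2u3u"),
)


def find_mould(conn, verb, aspect, voice, person, gram_number, gender):
    """Look up the mould for a verb and a set of features, or return None."""
    row = conn.execute(
        "SELECT m.mould FROM verb AS v "
        "JOIN model AS m ON m.verb_class = v.verb_class "
        "WHERE v.form = ? AND m.aspect = ? AND m.voice = ? "
        "AND m.person = ? AND m.gram_number = ? AND m.gender = ?",
        (verb, aspect, voice, person, gram_number, gender),
    ).fetchone()
    return row[0] if row else None


print(find_mould(conn, "DaKaLa", "imperfect", "active", 3, "singular", "masculine"))  # Ya1o2u3u
```

Joining verbs to models through the verb class means a new verb of a known class needs only one row, with no change to the stored moulds.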
Figure 10 shows the interface that allows the user to enter the verb, choose the aspect, the voice,
and the mode of conjugation.
Arabic term                          Number
Roots                                6413
Triliteral roots                     5635
Quadriliteral roots                  592
Canonical schemas                    199
Irreducible triliteral verbs         6138
Reducible triliteral verbs           11832
Irreducible quadriliteral verbs      452
Reducible quadriliteral verbs        245
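As a quick consistency check, the four verb counts in the table add up to 18667, the lexicon size quoted in the abstract. The short computation below reproduces that sum; the dictionary name verb_counts is only illustrative.

```python
# Verb counts copied from the table above.
verb_counts = {
    "irreducible triliteral": 6138,
    "reducible triliteral": 11832,
    "irreducible quadriliteral": 452,
    "reducible quadriliteral": 245,
}

total = sum(verb_counts.values())
print(total)  # 18667
```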
Inflected words have an ending that changes depending on the case: nominative,
accusative, apocope, etc.
We generate the inflected forms (the lexical kernel of the textual word), stored in a lexicon
denoted M0C, for each verb in the lexicon M0. The inflected forms are generated as follows:
M0 is the set of verbs
M0 = {v1, v2, ...}
For each m in M0
    if m.VALG = imperfect then
        Inflected_Form1 ← Generate_Accusative(m)
        Inflected_Form2 ← Generate_Apocope(m)
        Add(Inflected_Form1, M0C) {to add Inflected_Form1 to the lexicon M0C}
        Add(Inflected_Form2, M0C) {to add Inflected_Form2 to the lexicon M0C}
    End if
End for
Where: VALG = Grammatical value of the verb (perfect, imperfect, imperative)
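The loop above translates almost directly into Python. The sketch below rests on assumptions that should be read as such: verbs are given in a Latin transliteration, the two generators only handle forms ending in the vowel "u" (replaced by "a" for the accusative and by "o", standing for the absence of a vowel, for the apocope), and the names Verb, generate_accusative, generate_apocope and generate_inflected_forms are hypothetical stand-ins for the operations named in the pseudocode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    form: str   # transliterated conjugated form, e.g. "YaDoKuLu"
    valg: str   # grammatical value: "perfect", "imperfect" or "imperative"


def _swap_final_u(form, ending):
    """Replace the final 'u' of a form; other endings are out of scope here."""
    if not form.endswith("u"):
        raise ValueError(f"sketch only handles forms ending in 'u': {form!r}")
    return form[:-1] + ending


def generate_accusative(verb):
    """Hypothetical generator: final 'u' becomes 'a'."""
    return _swap_final_u(verb.form, "a")


def generate_apocope(verb):
    """Hypothetical generator: final 'u' becomes 'o' (no vowel)."""
    return _swap_final_u(verb.form, "o")


def generate_inflected_forms(m0):
    """Build the lexicon M0C from the imperfect verbs of M0."""
    m0c = []
    for m in m0:
        if m.valg == "imperfect":
            m0c.append(generate_accusative(m))
            m0c.append(generate_apocope(m))
    return m0c


M0 = [Verb("DaKaLa", "perfect"), Verb("YaDoKuLu", "imperfect")]
print(generate_inflected_forms(M0))  # ['YaDoKuLa', 'YaDoKuLo']
```

Only imperfect verbs contribute to M0C, exactly as in the pseudocode; perfect and imperative forms are skipped.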
6. CONCLUSIONS
In this paper, we presented the architecture of a conjugation system for Arabic verbs. It operates
according to a five-step process: determining the class of the verb, determining the model, the
substitution operation, applying the transformation rules, and spelling correction. The tests we
performed gave very satisfactory results.
We used the conjugator to enrich the lexical database with verbs for morphological analysis
using the dictionary-based approach. We also proposed a micro-syntax and an algorithm to
generate the textual words that enrich the same lexical database.
We intend to use our conjugator as the core of a learning environment for Arabic and especially
REFERENCES
[1] Ali Nabil, (1988) Arabic Language and Computer, [in Arabic], Taareeb.
[2] Hlal Yahya, (1979) Learning Methods for Morphosyntactic Analysis (Experienced in the case of
Arabic and French), doctoral thesis, University Paris.
[3] Hlal Yahya, (1987) Generation from the root and pattern, Conference on the progress of linguistics
in Arab countries.
[4] Hlal Yahya, (1990) Morphology and syntax of the Arabic language, In Proceedings of the Arab
School of Science and Technology Applied Arabic Linguistics for Informatics (pp.201).
[5] Mourchid Mohamed, (1999) Generation Morphological and Applications, Specialty thesis of 3rd
round, Mohammed V University in Rabat-Morocco.
[6] Nizar Y. Habash, (2010) Introduction to Arabic Natural Language Processing, Morgan & Claypool
Publishers series.
[7] Abdelhadi Soudi, Antal van den Bosch, & Günter Neumann (2007) Arabic Computational
Morphology, Knowledge-based and Empirical Methods, Springer.
[8] Joseph Dichy, Ali Farghaly, (2003) Roots & Patterns vs. Stems plus Grammar-Lexis Specifications:
on what basis should a multilingual lexical database centred on Arabic be built? Workshop on
Machine Translation for Semitic Languages: issues and approaches New Orleans, USA.
[9] Riyad Al-Shalabi, (2005) Pattern-based Stemmer for Finding Arabic Roots, Information
Technology Journal, 4(1): p. 38-43.
[10] A.Yousfi, (2010) The morphological analysis of Arabic verbs by using the surface patterns, IJCSI
International Journal of Computer Science Issues,7(3(11)): p. 33-36.
AUTHORS
M. MOURCHID Doctorate degree in Computer Science in 1999; Assistant Professor
at the Computer Science Department at the Faculty of Sciences, Ibn Tofail University
in Kenitra, Morocco; Ongoing research interests: Natural Language Processing, Semantic
Web, and information systems.
N. El Faddouli Doctorate degree in Computer Science in 1999; Assistant Professor at
the Computer Science Department at the Mohammadia School of Engineers (EMI); 15
recent publications between 2007 and 2015; Ongoing research interests: e-learning,
Big Data, Natural Language Processing, and information systems.