HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/
WWW.JOURNALOFCOMPUTING.ORG 23
Abstract— Text classification has been widely used to assist users with the discovery of useful information from the Internet.
However, traditional classification methods are based on the word representation, which only accounts for term frequency in the
documents and ignores important relationships between key terms and fields. This paper considers words that are specific to a field, called Field Association (FA) words, together with their ranks (levels). Moreover, this paper presents a Java software system that classifies Arabic text using keywords, FA words, and compound FA words. Furthermore, a comparative study of keywords, FA words, and compound FA words on Arabic text was carried out using experimental results generated by our software. The presented methods are evaluated by simulation on 1,819 files and 16 super-fields.
Index Terms— Field association words; compound Field association words; Arabic Information retrieval; precision; recall;
document classification.
1 INTRODUCTION
performed separately for each class". In this way, the most important words specific to each class are determined.

The concept of Field Association (FA) words is based on the fact that the subject of a text (document field) can usually be identified by looking at certain specific words or phrases in that text. It is natural for people to identify the field of a document when they notice these specific words or phrases. These specific words or phrases are called FA words. An FA word is defined as the minimum word or phrase that serves to identify a particular field [20]. FA words form a limited set of discriminating terms that can specify document fields [4, 5]. For example, “homerun” indicates the sub-field <Baseball> of super-field <Sports>, and “US presidential election” indicates sub-field <Election> of super-field <Politics>. Therefore, “homerun” and “US presidential election” are examples of FA terms.

The aim of this paper is to compare the use of normal keywords with the use of field association words for the Arabic language. In addition, [4] presented compound FA words and applied them to the classification of English text. In this paper we apply them to Arabic documents with respect to the morphological analysis of the Arabic language. Accordingly, this paper presents two methods to classify Arabic documents using FA words and the NB classifier.

The rest of the paper is organized as follows. Section 2 presents Arabic word formation and defines compound FA words in the Arabic language. Section 3 defines the Arabic document field tree. Section 4 identifies field association words, their levels, and how we determine them. Section 5 is a comparison with traditional classification methods. Section 6 illustrates how we classify Arabic documents using FA words and compound FA words. Section 7 presents the experimental evaluation and the comparison with other traditional approaches. Section 8 focuses on the conclusion and future work.

2. ARABIC WORD STRUCTURE

Unlike in the English language, nouns in Arabic can be masculine or feminine. Nouns can be definite, as in (المعلم) (al moalem, which means "the teacher" in English), or indefinite, as in (معلم) (moalem, which means "a teacher" in English). Adding the prefix (الـ) (al, which means "the" in English) makes a difference in meaning. Plurals in Arabic are of three kinds: the masculine plural, the feminine plural, and the broken plural. The plural is formed via suffixes or via pattern modification of the nouns. In the first case, for the masculine plural, the suffix ~een for the accusative and genitive (معلمين) (moalmeen, which means "teachers" in English) or ~oon for the nominative (معلمون) (moalmoon, which means "teachers" in English) is appended to the [...] means "female teacher" in English). The dual is formed by adding (ان) or (ين) at the end of the noun, as in (معلمان) (moaleman, which means "two teachers" in English). In the third case, often referred to as broken plurals, the pattern of the singular noun is dramatically altered. We can recognize these plurals from the patterns. There are 27 patterns for most of the broken nouns.

Another kind of suffixation is the personal pronoun. The personal pronoun can appear in an isolated form or as a suffix attached to nouns, verbs, or prepositions. Certain suffixes are attached at the end of words to make them possessive. The attached suffix can be one letter: for example, when the letter "ي" (y) is attached to the end of the word (بيت) (baiet, which means "home" in English), it forms (بيتي) (baiety, which means "my home" in English). For the plural, two letters are attached to the end of the word: for the masculine, the letters "هم" (hom) are attached, giving (بيتهم) (baiethom, which means "their home" in English), and for the feminine, the letters "هن" (hn), giving (بيتهن) (baiethn, which means "their home" in English). These are the most common modifications to Arabic words. Example 1 shows different patterns for one word, "طفل" (tefl, which means "child" in English), and a summary of Arabic affixes is shown in Table 1. Dictionaries do not store every form of regular words. Most dictionary entries are stored in singular form, except for words that are usually used in the plural, like (كماليات) (kamaleyat, which means “luxuries” in English). Verbs are stored in the perfect form.

Compound FA terms may be classified as permanent or temporary. Permanent compounds are fixed by common usage and can usually be found in a dictionary. Temporary compounds consist of words with additions. Dictionaries or reference books may disagree on the evolutionary stage of a compound or may not include temporary compounds.

Compound FA terms appear more frequently in Arabic text, and some compound FA terms are more restrictive and allow more precise information retrieval. If compound FA terms are divided into single words, the words will be ranked lower and will be at different levels. For example, the compound FA term "أنفلونزا الخنازير" relates to the sub-field <الأمراض> (al amrad, which means "diseases" in English) with high rank. Therefore, using compound FA words is more accurate than using single FA words. Methods for determining compound FA words are explained in [18, 35].

Table 1: Summary of Arabic affixes
Antefixes: وبال, وال, بال, فال, كال, ولل, ال, وب, ول, لل, فس, فب, فل, وس, ك, ف, و, ب, ل
    Prepositions meaning, respectively: and with the, and the, with the, then the, as the, and to (for) the, the, and with, and to (for), then will, then with, then to, [...]
Prefixes: ا, ن, ي, ت
    Letters meaning the conjugation person of verbs in the present tense
Infixes: ا, و
    A letter added to give the meaning of the person of conjugation
Suffixes: تما, يون, تين, تان, ات, ان, ون, ين, وا, تا, تم, تن, نا, ت, ن, ا, ي, و
    Terminations of conjugation for verbs and dual/plural/female marks for nouns
Postfixes: كما, هما, كن, هن, تي, ها, نا, هم, كم, ك, ه, ي
    Pronouns meaning, respectively: your, their, your, their, my, her, our, their, your, his, your, [...]
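As a small illustration of how the affixes in Table 1 combine with a base word, the following Python sketch attaches a few antefixes and postfixes taken from the text to a stem and lists the resulting surface forms. The function name `surface_forms` and the short affix lists are assumptions made for this example only. Plain concatenation is a simplification: it ignores the orthographic changes that real Arabic morphology requires for some words.

```python
from itertools import product

# Small illustrative subsets of the affixes listed in the text (not the full table).
ANTEFIXES = ["", "ال", "و", "ب"]     # none, "the", "and", "with"
POSTFIXES = ["", "ي", "هم", "هن"]    # none, "my", "their" (masc.), "their" (fem.)


def surface_forms(stem, antefixes=ANTEFIXES, postfixes=POSTFIXES):
    """Return candidate surface forms of `stem` built by plain concatenation.

    Hypothetical helper for illustration: it skips the definite article
    combined with a possessive pronoun, since a noun does not carry both.
    """
    forms = []
    for pre, post in product(antefixes, postfixes):
        if pre == "ال" and post:
            continue  # "the" + possessive suffix is not a valid noun form
        forms.append(pre + stem + post)
    return forms


forms = surface_forms("بيت")  # "home", the example word used in the text
print(forms)
```

The forms (بيتي), (بيتهم) and (بيتهن) discussed above all appear in the generated list, next to prefixed variants such as (البيت).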
3. FIELD TREE
A document field is defined as basic and common knowledge useful for human communication [37, 40]. A field tree is a schematic representation of relationships among document fields. Leaf nodes in the field tree correspond to terminal fields, nodes connected to the root are super-fields and other nodes correspond to [...] documents, consists of 15 super-fields and 114 sub-fields. An example of a field tree is given in Figure 1. All FA terms and paths are manually assigned to levels.

4. BACKGROUND WITH FA WORDS
4.1 FA words and levels
A single FA word indicates a minimum unit (word) with semantic meaning that identifies a particular field; e.g., the words "فيروس" (vairoos, which means "virus" in English) and "أنفلونزا" (anfluanza, which means "flu" in English) are single FA words. A compound FA word consists of two or more single FA words; e.g., the terms "أنفلونزا الخنازير" (anfluanza al khanazeer, which means "swine flu" in English), "علوم الحاسب" (oloum el haseb, which means "computer science" in English), and "نظم المعلومات" (nozom el malomat, which means "information systems" in English) are compound FA words.
FA words have their own strength and scope; the scope is ambiguous. In [5] the levels of the scope are defined as follows for field identification.
Definition 1: FA words have different scopes of association with a field; five precision levels are used to classify FA words to document fields:
(Level 1): perfect FA words are associated with one sub-field uniquely.
(Level 2): medium FA words are associated with a few sub-fields of one super-field.
(Level 3): super FA words are associated with one super-field.
(Level 4): multiple FA words are associated with a few sub-fields of several different super-fields.
(Level 5): non-FA words are unable to specify any field.
Example 2: Table 2 shows some examples of FA words with their field paths and levels.

Table 2: An example of the levels of FA words
FA word: "التمريض" (al-tamreed, which means "nursing" in English)
    Field association path: <الطب\علوم الصحة> (al tep / olom al sehha, which means <medicine\health science> in English)
    Level: 1
FA word: "الصيدلة" (al-sydalah, which means "pharmacology" in English)
    Field association paths: <الطب\علوم الصحة> (al tep / olom al sehha, which means <medicine\health science> in English); <الطب\صيدلة> (al tep / al-sydalah, which means <medicine\pharmacology> in English)
    Level: 2

4.2 Determination of FA words
The traditional algorithm [4, 39, 50] that automatically determines the candidates for FA words and their ranks produces misleading redundant (unnecessary) words. In [5] the author introduced normalized word frequency instead of word frequency. In this paper, we use it to extract efficient Arabic FA words and apply it to Arabic document classification.
Definition 2: Let Total_Frequency(<T>) be the total frequency of all words in the terminal field <T>, and let Frequency(w, <T>) be the frequency of the word w in the terminal field <T>. Normalization(w, <T>) is defined as follows:

\[ \mathrm{Normalization}(w, \langle T \rangle) = \frac{\mathrm{Frequency}(w, \langle T \rangle)}{\mathrm{Total\_Frequency}(\langle T \rangle)} \tag{1} \]

The normalized frequency defines how much a specific word is concentrated in a specific field.
Definition 3: For the parent field <S> and the child field <C>, the concentration ratio Concentration(w, <C>) of the FA word w in the field <C> is defined as follows:

\[ \mathrm{Concentration}(w, \langle C \rangle) = \frac{\mathrm{Normalization}(w, \langle C \rangle)}{\mathrm{Normalization}(w, \langle S \rangle)} \tag{2} \]

The following algorithm determines FA words by considering their ranks.
Algorithm 1: FA words determination algorithm
Input: (a) w, candidates for FA words,
(b) Normalization(w, <C>) for w and for field <C>,
(c) Threshold α, to judge FA word ranks,
(d) Field tree.
Output: associated FA words and their ranks for w.
(Step 1): Determination of Perfect FA words
For the root = <S> and the child field = <S/C> of the field tree, the following formula is used to judge whether or not the word w is a Perfect FA word.

\[ \mathrm{Concentration}(w, \langle S/C \rangle) > \alpha \tag{3} \]
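To make Definitions 2 and 3 and the threshold test of Formula (3) concrete, here is a minimal Python sketch. The function names, the dictionary layout of the word counts, and the toy counts are assumptions for illustration and do not come from the paper's corpus. In particular, the sketch assumes that the parent term Normalization(w, <S>) is the sum of the children's normalized frequencies, which keeps the ratio between 0 and 1, in line with the values 0.91 and 0.965 quoted in Example 3. The paper does not state this explicitly.

```python
def normalization(freq, field, word):
    """Formula (1): frequency of `word` in `field` over the field's total word count."""
    counts = freq[field]
    total = sum(counts.values())
    return counts.get(word, 0) / total if total else 0.0


def concentration(freq, children, child, word):
    """Formula (2) for one child field of a parent with the given children.

    ASSUMPTION: the parent's normalized frequency is taken as the sum of
    the children's normalized frequencies, so the ratio lies in [0, 1].
    """
    parent = sum(normalization(freq, c, word) for c in children)
    return normalization(freq, child, word) / parent if parent else 0.0


# Toy word counts per child field of one parent (invented numbers).
freq = {
    "medicine": {"flu": 45, "serum": 30, "other": 925},  # 1000 words in total
    "sport":    {"flu": 1,  "goal": 99,  "other": 400},  # 500 words in total
    "politics": {"flu": 1,  "vote": 49,  "other": 450},  # 500 words in total
}
children = list(freq)
ALPHA = 0.90  # threshold value chosen in Example 3

ratio = concentration(freq, children, "medicine", "flu")
is_candidate = ratio > ALPHA  # test of Formula (3) for this child field
print(round(ratio, 3), is_candidate)
```

With these toy counts the word is concentrated in one child field, so the test passes there and the same check would be repeated one level further down the field tree.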
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 4, APRIL 2011, ISSN 2151-9617
If Formula (3) is fulfilled, <S/C> takes the place of <S> and the same judgment is carried out on the new field <S/C>. By repeating the same determination process, if <S/C> becomes a terminal field, w is determined as a Perfect FA word in the field <S/C>. If the field <S/C> cannot fulfill the condition in Formula (3), then the process enters (Step 2).

(Step 2): Determination of Semi-perfect or Medium FA words
If w is not determined as a Perfect FA word in the field <S/C>, the terminal field has not been reached. Therefore, the field <S> should be a medium field with two or more (m ≥ 2) children fields. From all children fields <S/C_k> (1 ≤ k ≤ m) of the medium field <S>, calculate the average normalized frequency of the word w over the children, as follows:

\[ \frac{1}{m} \sum_{k=1}^{m} \mathrm{Normalization}(w, \langle S/C_k \rangle) \tag{4} \]

The accumulated concentration ratio Concentration(w, <S/C_k>) is taken over the children fields that have higher normalized frequencies than the average value in Formula (4). If the accumulated concentration ratio of these children exceeds α and the children fields <S/C_k> are all terminal fields, w is judged as a Semi-perfect FA word in the fields <S/C_k>. If the accumulated value does not exceed the threshold α, w is determined as a Medium FA word of field <S>. However, if these children fields are not all terminal fields, the process enters (Step 3) and conducts the determination process of Multiple FA words.

(Step 3): Determination of Multiple FA words
Extract the terminal field <S/C> from the k children fields and determine w as a Multiple FA word of the field <S/C>. Except for the terminal fields, the child field <S/C> becomes the root <S> of the field tree, and the process repeats (Step 1) and (Step 2). Then, many medium fields and terminal fields are obtained, and w is judged as a Multiple FA word of the field <S>. (end of algorithm)

[Figure 1: field tree; recoverable node labels: الرياضة (al riyadah, which means "sport" in English), الصحة, التغذية]

Example 3: Consider the FA word candidates "أنفلونزا" (anflwanza, which means "flu" in English), "تغذية" (taghzya, which means "feeding" in English), and "مصل" (masel, which means "serum" in English), as in Figure 1. The number of children fields of <root> is 16. We chose <طب> (tep, which means "medicine" in English), <الصحة> (al seha, which means "health" in English), and <التغذية> (taghzya, which means "feeding" in English) as sub-fields. The threshold value α was chosen to be 0.90.

In (Step 1), suppose that w is "أنفلونزا" and <S> is <root>. The word "أنفلونزا" appears most frequently in the field <طب>, so calculate the concentration ratio of the field <C> = <طب> on the field <S/C> = <root/طب>:
Concentration("أنفلونزا", <الطب>) = 0.91
Repeating the same process, select the terminal field <أنفلونزا الخنازير> (anfluanza al khanazeer, which means "swine flu" in English) in the medium field <الطب>, where "أنفلونزا" appears most frequently. As the determination is made only in the terminal field <C> = <أنفلونزا الخنازير> and the concentration ratio (0.965) exceeds the threshold α (0.90), the word "أنفلونزا" is determined as a Perfect FA word in the terminal field <أنفلونزا الخنازير>.
So, when Algorithm 2 is applied after this algorithm, all documents that are checked and contain the same word as a perfect FA word are returned to the same field. Otherwise, if a document contains the word as a semi-perfect or medium FA word, then one or more children fields will appear.

5. COMPARISON WITH TRADITIONAL CLASSIFICATION
Popular supervised machine learning text classification algorithms include Naïve Bayes, Centroid-based [21], k-Nearest Neighbour (kNN) [31], and Support Vector Machines (SVM) [49]. Centroid-based, kNN and SVM are based on the vector space model, which represents text documents as vectors consisting of features. The Naïve Bayes classifier [21] is a probabilistic model based on applying Bayes’ theorem with strong (naive) independence assumptions.
These algorithms normally use the classical text representation technique [41] that maps a document to a high-dimensional feature vector consisting of a “bag of words”. This representation leads to the inclusion of unimportant features and the loss of important semantic relationships and inflection information [29], resulting in reduced accuracy.
Moreover, the accuracy of a text classifier depends to a large extent upon the classification granularity, and on how well separated the training or test documents belonging to different categories are [32]. It may be relatively easy to classify two distinct categories such as ‘sports’ and ‘politics’, but it may be more difficult to distinguish between similar categories.
To overcome such problems, [3] proposed the use of compound words extracted by morphological analyzers. [32] used automatically extracted summaries rather than the whole documents, while [24] proposed the use of a limited set of automatically selected keywords as features. Recently, [30] investigated the use of ontology for text categorization. However, the problems are still far from being solved.
In this paper, we introduce a text classification methodology based on Field Association (FA) words and compound FA words in the Arabic language. As the FA words of a field collectively store the essence (knowledge) of that field, they are effective for text classification. Accordingly, this paper presents two methods to classify Arabic documents using FA words and the NB classifier.

6. DOCUMENT CLASSIFICATION USING FA WORDS
Text classification techniques are used in many applications, including e-mail filtering, mail routing, spam filtering, news monitoring, sorting through
digitized paper archives, automated indexing of scientific articles, classification of news stories and searching for interesting information on the WWW [6, 11, 13, 23, 25, 26, 43, 44, 45, 48, 49, 52, 53, 54, 55, 57, 58]. The majority of these systems are designed to handle documents written in the English language and are not applicable to documents written in the Arabic language. For example, stemming refers to the process of removing affixes from words. If we remove the affixes from the Arabic word "وسام" (wisaam, which means "accolade" in English), it becomes "سام" (saam, which means "poisonous" in English) after stemming. The meaning of the word changes completely. For more detail about the defects of applying traditional information retrieval methods to the Arabic language, refer to [1]. Developing text classification systems for Arabic documents is a challenging task due to the complex and rich nature of the Arabic language. Previous work on Arabic text classification has used distance-based algorithms [38], learning algorithms [9], and Bayesian classification methods [8] in developing automated text classification systems. In particular, [10] used N-grams for searching Arabic text documents.
In the following, we illustrate the new algorithm and explain how to use FA words in Arabic document classification.
Because all FA words for a document must exist, Algorithm 2 first calls Algorithm 1 to find all FA words. Then the algorithm generates the derivation frame for each FA word by adding all affixes to the FA word; the affixes are summarized in Table 1. Each document that contains such an FA word or any of its derivatives belongs to the same field.
Algorithm 2: FA Words Classification Algorithm
Input: (a) V = {v1, v2, ..., vn} is the set of FA words
(b) D = {d1, d2, ...} is a collection of unclassified documents
Output: F, the classification of D.
Method:
1. Run Algorithm 1 to get the set of FA words
2.
3. Set F = { }
4. Set
5. for each vi ∈ V, do
6. for each k, 1 do
7. if vi ∈ dk, copy dk to Fi,
8.
9. else, go to step 4
10. Return F.
The new idea of using FA words with derivation frames for the Arabic language is better suited to the complexity of Arabic morphology. In addition, it can be applied to earlier techniques such as the vector space model, the probabilistic model, and the language model, to modify them so that they become efficient and suitable for the Arabic language. The aim of this paper is to show the advantage of using FA words with derivation frames for the Arabic language through a comparative study.

7. EXPERIMENTAL
The new method for using FA words can be applied to classify any Arabic document, such as web documents, scientific papers, articles, news and others. In this paper the experiment is performed on a collection of web documents because they are unstructured data and hard to classify.
7.1 Experimental Data
Our experiments trained the system using Arabic documents collected from the Internet. They were mainly collected from the Al-jazeera Arabic news channel, which is the largest Arabic site, the Al-Ahram newspaper, the Al-watan newspaper, Al Akhbar, Al Arabiya, and Wikipedia, the free encyclopedia. The documents are categorized into 16 super-fields and 137 sub-fields. Our corpus contains 1,819 files and is about 26.4 MB in size. For the experimental evaluation, we downloaded source code written in Java from http://nlp.cs.byu.edu/mediawiki/index.php/CS601R:Project_1_Guidelines. In addition, we modified it to suit the new NB algorithm. We also prepared the system according to http://nlp.cs.byu.edu/mediawiki/index.php/How_to_prepare_your_system.
7.2 Preprocess
Before applying the classification algorithm to the test data, some preprocessing of the text is performed. All the experiments are performed after normalizing the text. In normalization, the text is converted to UTF-8 encoding, and punctuation and non-letters are removed. Also, some Arabic letters are normalized, such as:
Replace a final "ؤ" with "ء",
Replace a final "ئ" with "ء",
Replace a final "ت" with "ة",
Replace "إ", "أ" or "آ" with "ا",
Replace "ى" with "ي",
Replace "ة" with "ه",
Replace "وء" with "ؤ",
Replace "ئ" with "ى", and
Replace "اا" with "ا".
In addition, all Arabic text contains redundant or unnecessary words, called stop words. They are very common words that carry little meaning; they serve only a syntactic function and do not indicate subject matter. Stop words have two different impacts on the information retrieval process. They can affect retrieval effectiveness because they have a very high frequency and tend to diminish the impact of frequency differences among less common words. In addition, deleting the stop words changes the document length and affects the weighting process. Identifying a stop word list, or stop list, that contains such words in order to eliminate them from text processing is essential to an information retrieval system. [2] explores the use of stop words and their effect on Arabic information retrieval. A general stop list (the stop lists for all the languages are available at http://www.unine.ch/info/clef) is created, based on the Arabic language structure and characteristics, without any
additions. The word categories used are:
Adverbs.
Conditional pronouns.
Interrogative pronouns.
Prepositions.
Pronouns.
Referral names/determiners.
[...] association words was made. Finally, a classification using compound FA words was made.
Simulation results for classification
Input data: (keywords, text)
Output: classified data according to keywords.
We used about 150 keywords selected by humans from the corpus. Precision, Recall and F-measure are used to estimate the relevance of the presented methods and are defined as follows [8] [35]:
[...]

Name of field | Precision | Recall | F-measure
الاستنساخ (al estensakh, which means "cloning" in English) | 0.67 | 1 | 0.8
الأمراض (al amrad, which means "diseases" in English) | 0.74 | 0.69 | 0.71
البيئة (al beaa, which means "the environment" in English) | 0.72 | 1 | 0.8
[...]

الاستنساخ (al estensakh, which means "cloning" in English) | 0.86 | 1 | 0.92
الأمراض (al amrad, which means "diseases" in English) | 0.83 | 1 | 0.9
البيئة (al beaa, which means "the environment" in English) | 0.75 | 0.9 | 0.83
التكنولوجيا (al tecnologia, which means "technology" in English) | 0.74 | 0.99 | 0.85
جسم الإنسان (gesem al ensaan, which means "the human body" in English) | 0.52 | 1 | 0.7
مقالات علمية (makalat elmiah, which means "scientific articles" in English) | 0.57 | 1 | 0.73
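As a minimal sketch of the three measures named above, the following Python function assumes the standard definitions: precision is the share of documents assigned to a field that truly belong to it, recall is the share of the field's documents that were assigned to it, and the F-measure is the harmonic mean of the two. These definitions are an assumption rather than a quotation from [8] or [35], although they agree with the tabulated values (a precision of 0.67 with a recall of 1 gives an F-measure of about 0.8). The function name and the document counts are hypothetical.

```python
def precision_recall_f(correct, assigned, relevant):
    """Return (precision, recall, F-measure) for one field.

    correct  -- documents assigned to the field that truly belong to it
    assigned -- all documents the classifier assigned to the field
    relevant -- all documents that truly belong to the field
    """
    precision = correct / assigned if assigned else 0.0
    recall = correct / relevant if relevant else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Invented counts: 3 documents assigned to a field, 2 of them correct,
# and the field truly contains 2 documents.
p, r, f = precision_recall_f(correct=2, assigned=3, relevant=2)
print(round(p, 2), round(r, 2), round(f, 2))
```

Because the F-measure is a harmonic mean, a field with perfect recall but low precision, such as the rows with recall 1 above, still receives a noticeably reduced score.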
(1984).
[37] Olshen R. A., Breiman L., Friedman J. H., and Stone C. J. “Classification and regression trees”. London: Chapman & Hall, (1984).
[38] Paice C. D. "Constructing literature abstracts by computer: techniques and prospects". Information Processing and Management, 26, 171–186, (1990).
[39] Peng F., Huang X., Schuurmans D. and Wang S. "Text Classification in Asian Languages without Word Segmentation", In Proceedings of the Sixth International Workshop on Information Retrieval with Asian Languages (IRAL 2003), Association for Computational Linguistics, Sapporo, Japan (2003).
[40] Safavian S. R., and Landgrebe D. “A survey of decision tree classifier methodology”, IEEE Transactions on Systems, Man, and Cybernetics, 21(3), 660–674, (1991).
[41] Salton G., Wong A., Yang C. "A vector space model for automatic indexing", Communications of the ACM, 18(11), 613–620, (1975).
[42] Sawaf H., Zaplo J. and Ney H. "Statistical classification methods for Arabic news articles". Natural Language Processing in ACL2001, Toulouse, France (2001).
[43] Sebastiani F. "Machine learning in automated text categorization", ACM Computing Surveys, Vol. 34, No. 1, pp. 1-47, (2002).
[44] Shin K., Abraham A., Han A. "Enhanced Centroid-Based Classification Technique by Filtering Outliers", In Proceedings of the 9th International Conference on Text, Speech and Dialogue, (2006).
[45] Saha S. and Bandyopadhyay S. "A new multi objective clustering technique based on the concepts of stability and symmetry", Knowledge and Information Systems, Volume 23, Number 1, 1-27, DOI: 10.1007/s10115-009-0204-4, (2010).
[46] Tayli M. and Al-Salamah A. "Building Bilingual Microcomputer Systems". Communications of the ACM, Vol. 33, No. 5, pp. 495-505, (1990).
[47] Tang X. and Han M. "Ternary reversible extreme learning machines: the incremental tri-training method for semi-supervised classification", Knowledge and Information Systems, Volume 23, Number 3, 345-372, DOI: 10.1007/s10115-009-0220-4, (2010).
[48] Tamine-Lechani L., Boughanem M. and Daoud M. "Evaluation of contextual information retrieval effectiveness: overview of issues and research", Knowledge and Information Systems, Volume 24, Number 2, 221-233, DOI: 10.1007/s10115-009-0245-8, (2010).
[49] Tezel S. and Latecki L. "Improving SVM Classification on Imbalanced Data Sets in Distance Spaces", ICDM, pp. 259-267, 2009 Ninth IEEE International Conference on Data Mining, (2009).
[50] Tsuji T., Nigazawa H., Okada M., and Aoe J. "Early Field Recognition by Using Field Association Words", In Proceedings of the 18th International Conference on Computer Processing of Oriental Languages, 2, 301-304, (1999).
[51] Wei F., Li W., Lu Q. and He Y. "A document-sensitive graph model for multi-document summarization", Knowledge and Information Systems, Volume 22, Number 2, 245-259, DOI: 10.1007/s10115-009-0194-2, (2010).
[52] Wang H. and Wang S. “Mining incomplete survey data through classification”, Knowledge and Information Systems, Volume 24, Number 3, 441-465, DOI: 10.1007/s10115-009-0214-2.
[53] Yang Y. and Liu X. “A Re-examination of Text Categorization Methods”, In Proceedings of SIGIR-99, 22nd ACM International Conference on Research and Development in Information Retrieval, Berkeley (1999).
[54] Yang Y. and Pedersen J. O. “A Comparative Study on Feature Selection in Text Categorization”, In Proceedings of the 14th International Conference on Machine Learning, 412–420, (1997).
[55] Yang J., Zhong N., Yao Y. and Wang J. "Peculiarity Analysis for Classifications", ICDM, pp. 607-616, 2009 Ninth IEEE International Conference on Data Mining, (2009).
[56] Zechner K. “Fast generation of abstracts from general relevant sentences”, In Proceedings of the 16th International Conference on Computational Linguistics (COLING’96), pp. 986–989, (1996).
[57] Zhang W. and Yoshida T. “Text classification based on multi-word with support vector machine”, Knowledge-Based Systems, 21(8), 879-886, (2008).
[58] http://trec.nist.gov/pubs/trec10/papers/UMass_TREC10_final.pdf, (2002).

M. E. Abd El-Monsef received the B.S. & Ed. degree in Mathematics from Assuit University, Egypt, in 1968 and the B.S. degree from the Faculty of Science, Assuit University, Egypt, in 1973. He received the M.S. degree in Mathematics from Al Azhar University, Cairo, Egypt, in 1977. He received his Ph.D. degree in Mathematics from Tanta University, Tanta, Egypt, in 1980. He was an assistant professor in the Department of Mathematics, Faculty of Science, Tanta University, from 1984, and a professor of Mathematics in the same department from 1988. He worked as Vice Dean of the Faculty of Science, Tanta University, for postgraduate and research affairs from 1991 to 1996, as Vice Dean for student affairs from 1996 to 1999, and as Dean of the Faculty of Science, Tanta University, from 1999 to 2005. He was Chairman of the Scientific Committee of Promotion to Assistant Professors Post in Mathematics of the Scientific Council of the Egyptian Universities from 2001 to 2004, and a member of the Scientific Committee of Promotion to Professors Post in Mathematics from 2004 to 2008. He is a member of the National Committee for Mathematics, a member of the National Committee for History and Philosophy of Science, a member of the Board of Directors of the Egyptian Mathematical Society, a member of the editorial board of the Journal of the Egyptian Mathematical Society, and a member of the Egyptian Society for the Arabization of Sciences. He was the Editor of the Delta Science Journal. He is the Representative of Tanta University in the League of Islamic Universities and a member of the editorial boards of the international scientific journals Science Echoes and Applied Mathematics & Information Sciences. He is a member of the Supreme Advisory Committee of the Centre for Development of the Delta Region of the Academy of Scientific Research and Technology. He has participated in more than 90 scientific conferences and specialist seminars. He has also supervised over 51 Ph.D. and about 47 Master's theses. He has authored about 100 research papers in the fields of general topology and fuzzy topology, published in prestigious national and international scientific journals. He was awarded the Tanta University Appreciation Award in basic sciences for the year 2001/2002. His research interests include General Topology, Rough Sets, Digital Topology and Fuzzy Sets.

Dr. El-Sayed Atlam received the B.Sc. and M.Sc. degrees in Mathematics from the Faculty of Science, Tanta University, Egypt, in 1990 and 1994, respectively, and the Ph.D. degree in information science and intelligent systems from the University of Tokushima, Japan, in 2002. He was awarded a Japan Society for the Promotion of Science (JSPS) postdoctoral fellowship from 2003 to 2005 in the Department of Information Science & Intelligent Systems, Tokushima University. He is currently an associate professor at the Department of Information Science and Intelligent Systems, University of Tokushima, Japan. He is also an associate professor at the Department of Statistical and Computer Science, Tanta University, Egypt. Dr. Atlam is a member of the Computer Algorithm Series of the IEEE Computer Society Press (CAS) and the Egyptian Mathematical Association (EMA). His research interests include information retrieval, natural language processing and document processing.

Mohammed Amin graduated in mathematics in 1983 at Menoufiya University. He studied computer science from 1986 to 1989 at Ain Shams University in Cairo and received the M.Sc. degree in 1990 and the Ph.D. degree in computer science in 1997 at the University of Gdansk, Poland. He is an associate professor of computer science at the Faculty of Science, Menoufiya University, and a research visitor to the Faculty of Philosophy and Sciences of the Silesian University, Opava, Czech Republic. His research areas are formal languages and their application in compiler design, cooperating/distributed systems, web information retrieval, Petri nets and their applications, and finite automata and cryptography.