A Machine Learning Approach For Classifying Sentiments in Arabic Tweets
Rihab Bouchlaghem
LARODEC, ISG, University of Tunis, Tunisia
rihab.bouchlaghem@isg.rnu.tn

Aymen Elkhelifi
Paris Sorbonne University, France
Aymen.Elkhlifi@paris.sorbonne.fr

Rim Faiz
LARODEC, IHEC, University of Carthage, Tunisia
Rim.Faiz@ihec.rnu.tn
ABSTRACT
Sentiment analysis methods are becoming more and more popular, especially with the growing number of social media users. In this context, this paper presents a sentiment analysis approach that captures the sentiment orientation of Arabic Twitter posts, based on a novel data representation and machine learning techniques. The proposed approach applies a wide range of features: lexical, surface-form, syntactic, etc. We also make use of lexicon features inferred from two Arabic sentiment word lexicons. To build our supervised sentiment analysis system, we use several standard classification methods (Support Vector Machines, K-Nearest Neighbour, Naïve Bayes, Decision Trees, Random Forest) known for their effectiveness on such classification problems.
In our study, the Support Vector Machines classifier outperforms the other supervised algorithms in Arabic Twitter sentiment analysis. Via ablation experiments, we show the positive impact of lexicon-based features on prediction performance.

CCS Concepts
• Computing methodologies➝Artificial intelligence➝Natural language processing➝Language resources • Computing methodologies➝Machine learning approaches.

Keywords
Sentiment analysis; Twitter; Modern Standard Arabic; Supervised classification; Arabic sentiment lexicon

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
WIMS ’16, June 13 - 15, 2016, Nîmes, France.
© 2016 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ISBN 978-1-4503-4056-4/16/06. . . $15.00
DOI: http://dx.doi.org/10.1145/2912845.2912874

1. INTRODUCTION
Sentiment analysis has recently become one of the most rapidly emerging research areas. Its main purpose is to extract users’ sentiments and opinions from user-created content, usually expressed in textual form, by using automatic mining techniques to determine their attitudes with respect to some topic.
This research field has found many useful applications, such as opinionated web search, automatic analysis of product reviews, and the discovery of customers' opinions for marketing purposes. Sentiment analysis is also used in politics, since it makes it possible to predict election results or to gauge public opinion about different policies. Consequently, it is being actively studied by researchers, particularly with the use of machine learning algorithms for various languages. Most existing research on sentiment analysis focuses on English text [1, 2, 3, 4, 5, 6, 7, 8]. Although Arabic is one of the most used languages in the world, only a few studies have dealt with Arabic sentiment analysis [15, 16, 17].
In this work, we study sentiment analysis for Modern Standard Arabic (MSA) Twitter posts from a machine learning perspective. For this purpose, we propose a novel data representation model applying several lexicon-based features generated from two different sentiment word lexicons, in addition to other feature categories (syntactic, linguistic, etc.). To investigate the impact of the proposed feature set, we applied five supervised classification algorithms: Support Vector Machines (SVM), K-Nearest Neighbour (KNN), Naïve Bayes (NB), Decision Trees (DT) and Random Forest (RF). The proposed classification system was tested on Arabic tweets related to recent terrorist acts and organizations in the Arab world. To our knowledge, research on sentiment in texts related to this domain is almost non-existent. The obtained results show that the SVM classifier achieves the best F-score. We also propose an ablation experiment to evaluate the impact of each feature group on the performance of the classification system.
The rest of this paper is organized as follows. We first introduce Twitter, our data source. Then, we describe our data collection, filtering and pre-processing methods. After that, we give a detailed description of the proposed data representation and the applied features. The next section covers the experimental steps and results. Finally, we conclude the paper and point out directions for future work.

2. Twitter, our data source
Our corpus is collected from one of the best-known social networks: Twitter. It is a free microblogging service where a great number of users broadcast their content. Public figures such as celebrities and politicians, media channels and companies are also interested in Twitter, using it to engage with their followers. The Twitter text unit is named a “tweet”, which is a short text easily
disseminated. Tweets have a specific syntax that must respect some conventions, which comprise [4]:
• Limited number of characters: the content of a tweet cannot, in any case, exceed 140 characters,
• Mention: a user inserts “@username” in a tweet in order to mention the corresponding user “username”, for example:
“يارب تنتهي فتنة الخوارج في هذا الشهر الفضيل #ساعة_استجابة @Manar480”
• Reply: a particular case of mention in which the “@username” mention is placed at the beginning of the tweet in order to start a conversation with another user, e.g.:
“@Manar48: يارب تنتهي فتنة الخوارج في هذا الشهر الفضيل #ساعة_استجابة”
• Retweet: a user re-shares a tweet previously posted by another user. The new tweet comprises the RT symbol and the username of the original tweet publisher, followed by the original tweet content.
• Hashtag: a term prefixed with a hash symbol (#), commonly used in social networks. Hashtags tend to represent the topic or the keywords of the tweet [21].

3. Data
Based on the Twitter API (http://twitter4j.org/en/index.html), we developed a tweet collection module and used it to automatically gather tweets about recent major political events and social reforms in the Arab world. We launched collection campaigns at specific time intervals, using suitable keywords to retrieve relevant tweets that are subjective towards specific target entities, events, organizations, etc.
The resulting corpus comprises a great number of duplicated and non-informative tweets. We propose a token-based similarity measure to identify all similar tweets in a given corpus.
Let:
• T1, T2: two pre-processed tweets (after deleting URLs, RT symbols and @usermentions),
• Token(T): a function that returns the tokens of a tweet T (the result of the tokenization task).
We define Sim_Tweet, a similarity measure based on counting the tokens of the tweets, as follows:

$$\mathrm{Sim\_Tweet}(T_1, T_2) = \frac{|\mathrm{Token}(T_1) \cap \mathrm{Token}(T_2)|}{|\mathrm{Token}(T_1) \cup \mathrm{Token}(T_2)|} \qquad (1)$$

The proposed measure helps us in filtering tweets.
From the collected corpus, we selected a subset of tweets to be manually annotated for sentiment polarity. Each tweet in this subset must: hold one sentiment with a clear orientation, be written in MSA, have informative content, etc. If a given tweet satisfies these relevance restrictions, it receives one of these tags: positive, negative or neutral, expressing the tweet owner’s position.
Because of the particularities of the Arabic language, several NLP methods cannot be directly applied to Arabic texts and yield valid output. Thus, the filtered corpus needs to be pre-processed to enable an efficient use of such methods.
Firstly, the tweets of the corpus are normalized to unify all characters that can cause confusion. The normalization task consists, on the one hand, in converting all the various forms of a word to a common form. On the other hand, there are other Arabic characters which must be removed, mainly the shadda, a special symbol used to accentuate the consonant (e.g. “عذّب”, meaning “to torment”), and diacritics, representing short vowels in Arabic texts (e.g. “عَذَّبَ”, “عُذِّبَ”) [21]. We used an existing normalizer to apply most of the Arabic language normalization rules, such as: {ء، ؤ، ئ} → <ء>; {ا، إ، أ، آ} → <ا>.
In the same vein, the last pre-processing task deletes URLs, user mentions and the '#' symbol, after the Twitter-specific features have been extracted.
The corpus is then ready for specific NLP methods. We perform tokenization and POS (Part Of Speech) tagging using the Stanford Parser (http://nlp.stanford.edu/) [22].

4. Data representation
We propose to represent the Arabic tweets of our dataset by applying commonly used text classification features such as n-grams and part-of-speech tag counts, as well as common Twitter-specific features such as user mention and hashtag counts.
We also introduce several sentiment features generated from two sentiment lexicons introduced in [31]. Each tweet is then represented as a feature vector.

4.1.1 Lexicon based features
The sentiment lexicon features are derived from two newly created Arabic sentiment word lexicons.

4.1.1.1 General-purpose lexicon
Following [12], these features are generated from a general-purpose lexicon containing subjective words and their sentiment scores. For each token t occurring in a tweet and present in the lexicon, we use its sentiment score to compute:
• The number of tokens with sentiment score(t) ≠ 0;
• The number of tokens with sentiment score(t) > 0;
• The number of tokens with sentiment score(t) < 0;
• The total sentiment score = $\sum_{t \in \mathrm{tweet}} \mathrm{sentiment\_score}(t)$;
• The maximal score = $\max_{t \in \mathrm{tweet}} \mathrm{sentiment\_score}(t)$.
The general-purpose lexicon introduced in [32] is built from large word-sentiment association lists which are automatically generated from an existing English polarity-annotated corpus, manually filtered, and automatically expanded and translated.

4.1.1.2 Tweet specific lexicon
The proposed tweet-specific sentiment lexicon [32] is built from a gold set of seed words, manually annotated and extracted from the data set, and automatically expanded from 500000 tweets using co-occurrence and coordination computing methods.
We mainly employed two features derived from this lexicon:
• The number of tokens appearing in the positive tweet-specific lexicon;
• The number of tokens appearing in the negative tweet-specific lexicon.

4.1.2 Linguistic features
We explored a set of linguistic features able to handle several Arabic language structures: negation (“لا”, “ليس”, “لم”, “لن”),
intensifiers (“كثيرا”, “جدا”), supplication and questions. Table 1 summarises the purpose of such features and gives related examples.

Table 1. Linguistic features examples
| Features type | Arabic example | English translation | Arabic markers | Markers translation |
|---|---|---|---|---|
| Prayer | “لك الحمد يا الله” | “praise be to you, o GOD” | | |
| Question markers | “فلماذا لا يتعظ بعض الشباب المتحمسين؟” | “Why do some young enthusiasts not learn a lesson?” | من, ماذا, كيف, لماذا | who, what, how, why |

4.1.3 Tweet specific features
As in many existing works, we call Twitter-specific features the commands and conventions used by Twitter users in their posts. We used the following features:
• URLs (or links): the number of links in the tweet,
• User mentions: the number of username mentions in a given tweet. It also indicates whether the tweet replies to other users,
• Presence of the retweet symbol “RT”,
• Number of hashtags,
• Tweet length.

4.1.4 Sentence level features
We employed the following sentence-specific features:
• The number of contiguous sequences of exclamation marks “!!!”, question marks “???”, and both exclamation and question marks “?!?!!”.
• The total number of punctuation marks.
• The number of words with one character repeated more than two times, for example, “ههههههههه”.

4.1.5 N-gram features
We employed character n-grams: the presence or absence of contiguous sequences of 3 characters.

5. Experiments and results
In this section, we first present the supervised classification algorithms we employed to build our classification system. After that, we detail the experiments performed and the results obtained.

5.1. Applied supervised algorithms
In our context, classifying a given tweet according to its sentiment polarity consists in performing multi-class categorization by mapping it to one of the classes positive, negative or neutral.
To perform the classification step, we decided to apply various supervised machine learning algorithms from different statistical approaches. The first one is the Support Vector Machine (SVM), which has proved to be effective on text categorization tasks and robust on large feature spaces. In the preliminary experiments, a linear-kernel SVM showed better performance than SVMs with the other kernels tested. Naïve Bayes (NB) is also applied, as it has yielded performance similar to SVM models [30]. We employed the K Nearest Neighbors (KNN) classifier, which stores all available cases and classifies new cases based on a similarity measure. We tested many K values and concluded that K=6 gives the best F-measure value, as Figure 1 shows.

Figure 1. The obtained KNN classification results when varying the K value
| K value | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|
| F-measure | 0.64 | 0.65 | 0.63 | 0.65 | 0.66 | 0.66 | 0.66 | 0.65 |
| Precision | 0.63 | 0.62 | 0.61 | 0.63 | 0.63 | 0.63 | 0.63 | 0.61 |
| Recall | 0.68 | 0.71 | 0.65 | 0.74 | 0.73 | 0.75 | 0.75 | 0.75 |

The last applied classifiers are Random Forest (RF), which operates by constructing a multitude of decision trees, and the J48 decision tree classifier.
To build our classifiers, we applied the Weka API [29] using 10-fold cross validation. Experiments and results are presented and discussed in Section 5.2.

5.2. Classification results
We used the evaluation measures introduced in [28]. We trained the classifiers listed in the previous section on the set of 2000 annotated tweets using 10-fold cross validation. Table 2 describes the obtained classification results.

Table 2. Classification results
| Classifier | F-score | Precision | Recall |
|---|---|---|---|
| SVM | 70.64% | 69.66% | 72.28% |
| NB | 70.02% | 71.76% | 68.86% |
| KNN | 66.44% | 63% | 73.82% |
| RF | 66.84% | 66.2% | 76.02% |
| J48 | 64.59% | 63.49% | 65.85% |

As Table 2 shows, the best F-score is given by the SVM classifier (70.64%), which outperforms the other classifiers. The second-best F-score, close to that of SVM, is given by the NB algorithm.
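As an illustration of how precision, recall and F-score figures can be obtained for a three-class task, the following minimal Python sketch computes per-class scores from gold and predicted labels and then macro-averages them. It is not the authors' Weka setup: the function names (`per_class_scores`, `macro_average`), the toy labels, and the choice of macro-averaging are assumptions made for this example, since the paper does not state how per-class scores were averaged.

```python
from collections import Counter

# Class names follow the paper's three sentiment tags.
LABELS = ("positive", "negative", "neutral")


def per_class_scores(gold, predicted, labels=LABELS):
    """Return {label: (precision, recall, f_score)} for two parallel label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # p was predicted but is wrong
            fn[g] += 1  # g was the gold label and was missed
    scores = {}
    for label in labels:
        predicted_total = tp[label] + fp[label]
        gold_total = tp[label] + fn[label]
        precision = tp[label] / predicted_total if predicted_total else 0.0
        recall = tp[label] / gold_total if gold_total else 0.0
        denom = precision + recall
        f_score = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f_score)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F-score over all classes."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Toy data, invented for this example only.
gold = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
pred = ["positive", "negative", "negative", "negative", "neutral", "positive"]

scores = per_class_scores(gold, pred)
macro_p, macro_r, macro_f = macro_average(scores)
print(f"P={macro_p:.3f} R={macro_r:.3f} F={macro_f:.3f}")
```

In this toy example the macro-averaged F-score (about 0.656) differs from the harmonic mean of the macro-averaged precision and recall (about 0.693), so an averaged F-score cannot in general be derived from the averaged precision and recall alone.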
5.3. Ablation experiments results
We performed ablation experiments in which we repeat the same classification process using SVM, but remove one feature group at a time. We aim here to detect the impact of each feature group on the classification performance and, then, identify the best data representation.

Table 3. The obtained F-scores during the ablation experiments
| Run | Experiment | F-score | Gain/loss |
|---|---|---|---|
| Run 1 | All features | 70.64% | ------ |
| Run 2 | All - general-purpose lexicon | 70.17% | -0.5% |
| Run 3 | All - tweet-specific lexicon | 69.7% | -0.94% |
| Run 4 | All - Linguistic features | 70.59% | -0.05% |
| Run 5 | All - Syntactic features | 71.95% | +1.31% |
| Run 6 | All - sentence level features | 70.37% | -0.27% |
| Run 7 | All - N-gram features | 65.9% | -4.74% |
| Run 8 | All - Tweet specific features | 70.94% | +0.3% |

Firstly, we found that the n-gram features were the most useful: removing just the character n-gram features results in the largest drop in performance (-4.74%). The results also show that the sentiment lexicon features are useful. Removing the tweet-specific lexicon-based features leads to a drop in F-score of 0.94%, while the use of the general-purpose lexicon-based features enhances the F-score by 0.5%. It is interesting to note that sentence-level features (elongation, punctuation sequences, etc.) contributed to improving the classification performance. Incorporating the linguistic structures of tweets also has a positive impact on performance, since the use of linguistic features improved performance by 0.05%.
Removing the Tweet-specific and syntactic features did not reduce performance (the F-score changed by +0.3% and +1.31%, respectively). This may be because the discriminating information in them was also captured by other features such as character n-grams or lexicon-inferred features.

6. Related work
Several approaches have recently been proposed to study various aspects of sentiment analysis, such as subjectivity classification and the classification of positive and negative language. Machine learning techniques are mainly employed and have been applied to many different kinds of texts, including customer reviews, newspaper headlines, novels, blogs, Twitter posts, etc.
To classify sentiments conveyed by single words, [1] used a supervised learning algorithm that automatically retrieves the polarity of adjectives from conjunction constraints collected from a large corpus. A specific unsupervised learning algorithm based on mutual information was applied in [2] to identify associations between sentences and the words “excellent” and “poor”. In this work, only the singletons “excellent” and “poor” were used as seed words, having respectively positive and negative potential subjectivity. [3] developed a statistical approach for proposition opinion classification, comprising two methods. The first method used the TREC 8, 9, and 11 text collections to compute, for each word, the difference between its frequency in subjective documents and its frequency in documents containing facts. The second method relied on co-occurrence scores to extend a seed list of 1336 subjective adjectives. [4] focused on the classification of subjective nested clauses and proposed a machine learning classification approach with a wide range of syntactic features. [5] proposed an unsupervised classifier for subjective-objective classification. Various supervised methods were proposed in [6] for sentiment analysis of reviews extracted from travel blogs. In that work, the SVM classifier outperformed the Naive Bayes classifier.
Symbolic approaches such as those proposed in [7, 8, 9] provided a better text analysis to represent the grammatical and semantic structure of the analyzed text. However, such methods require significant manual work. In a similar context, [10] proposed a hybrid approach for sentiment classification applying different classifiers in series. The approach was tested on movie reviews, product reviews and MySpace comments. For the rule-based classification, the authors combined existing rules with the ID3 and RIPPER induction algorithms to generate induced rule sets for classification. They also proposed a machine-learning-based method applying the SVM classifier for positive and negative sentiment classification.
Other studies aimed to build subjective lexicons. [11] presented a sentiment lexicon called SentiFul. The authors proposed methods to automatically generate the lexicon using an existing affect database. The generated lexicon was enlarged using direct synonymy relations and morphological modifications.
[12] developed two tweet-specific sentiment lexicons for English and proposed a supervised lexicon-based system for sentiment classification in short messages (tweets and SMS).
Unfortunately, most of these systems are developed for the English language and are not directly usable for other languages. Only a few works try to deal with sentiment analysis for morphologically rich languages such as Arabic. Most of the proposed works dealt with texts from the web and social media. There are two main requirements to improve sentiment analysis effectively in any language and genre: a high-coverage sentiment lexicon and tagged corpora to train the sentiment classifier. [13] exploited web data collected from micro-blogs, forums and online market services to propose YADAC, a multi-genre dialectal Arabic corpus. In [14], the authors described AWATIF, a multi-genre corpus for Modern Standard Arabic subjectivity and sentiment analysis. Three resources were exploited during the extraction process: the Penn Arabic Treebank collection, a set of Wikipedia user talk pages and conversations from online forums. In the same context, [15] developed a new annotated dataset for Arabic tweet subjectivity and sentiment analysis. Furthermore, [16] offered a manually labelled corpus for subjectivity and sentiment analysis, collected from Twitter. [17] presented a sentence-level sentiment analysis system for Modern Standard Arabic (MSA) and the Egyptian dialect. They proposed a lexicon of Arabic sentiment words labelled as positive, negative and neutral. In addition to lexicon-based features, they used many other features. However, most of these resources have not been publicly released yet.
Different approaches have applied language-independent feature selection methods, such as genetic algorithms, to perform sentiment analysis in Arabic texts. [18] applied this method to identify discriminant features for both Arabic and English. The authors used many types of features, except semantic ones because they are language dependent. The proposed system was evaluated on movie review texts. Another way of performing sentiment analysis in Arabic texts consists in applying hybrid classifiers. For example, the work of [19] aimed to experiment
sentiment analysis in a bilingual Arabic-English corpus using SVM and Naive Bayes (NB) classifiers. The applied features are both numeric and lexical. Furthermore, the experiments performed by [15] used NB in order to demonstrate the performance of the baseline they proposed, including various features. Other works used Arabic-specific features in performing sentiment analysis. In this context, [20] proposed an approach for mining customers' Arabic comments which is based on new slang sentiment words extracted from web resources. The authors applied SVM classifiers to decide whether a given comment conveys satisfaction or dissatisfaction.

7. Conclusion and perspective
In this work, we focused on sentiment analysis for MSA using a corpus of Arabic Twitter posts.
We aimed to deal with the complexity of the Arabic language. Therefore, various relevant and rich feature sets were applied to represent Arabic-specific structures (negation, question, etc.). We investigated the impact of exploiting general-purpose and tweet-specific sentiment lexicons on classification performance. We also implemented a variety of features based on surface form and syntactic categories. The obtained results are promising given the challenges of Arabic Natural Language Processing, and this encourages us to continue working on this topic.
For future work, it will be helpful to test other methods for association degree estimation to build the general-purpose lexicon and improve its coverage. It would also be interesting to increase the size of the sentiment analysis training data and identify its impact on classifier performance. In addition, we plan to improve the data representation by exploring other feature groups that may be more discriminative, such as word embeddings.

8. REFERENCES
[1] Hatzivassiloglou, V., McKeown, K. R. Predicting the semantic orientation of adjectives. In: ACL’97, pp. 174–181. Madrid, Spain (1997)
[2] Turney, P. D. Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews. In: ACL’02. Philadelphia (2002)
[3] Bethard, S., Yu, H., Thornton, A., Hatzivassiloglou, V., Jurafsky, D. Automatic extraction of opinion propositions and their holders. In: the Association for the Advancement of Artificial Intelligence (AAAI-04), San Jose, California (2004)
[4] Wilson, T., Wiebe, J., Hwa, R. Just how mad are you? Finding strong and weak opinion clauses. In: the Association for the Advancement of Artificial Intelligence (AAAI-04), San Jose, California (2004)
[5] Wiebe, J., Riloff, E. Finding Mutual Benefit between Subjectivity Analysis and Information Extraction. In: IEEE Transactions on Affective Computing, Vol. 2, No. 4, pp. 175-191 (2011)
[6] Ye, Q., Zhang, Z., Law, R. Sentiment classification of online reviews to travel destinations by supervised machine learning approaches. In: Expert Systems with Applications, Vol. 36, Issue 3, Part 2, pp. 6527–6535 (2009)
[7] Maurel, S., Dini, L. Exploration de corpus pour l’analyse de sentiments. In: DEfi Fouille de Textes, Paris, France, pp. 11-23 (2009)
[8] Vernier, M., Monceaux, L., Daille, B.L. Catégorisation des évaluations dans un corpus de blogs multi-domaine. In: Revue des nouvelles technologies de l'information, pp. 25 (2009)
[9] Chardon, B., Muller, S., Laurent, D., Pradel, C., Séguéla, P. Chaîne de traitement symbolique pour l'analyse d'opinion - l'analyseur d'opinions de Synapse Développement face à Twitter. In: DEfi Fouille de Textes, Caen, France (2015)
[10] Prabowo, R., Thelwall, M. Sentiment analysis: A combined approach. In: Journal of Informetrics, vol. 3, pp. 143-157 (2009)
[11] Neviarouskaya, A., Prendinger, H., Ishizuka, M. SentiFul: A Lexicon for Sentiment Analysis. In: IEEE Transactions on Affective Computing, Vol. 2, No. 1, pp. 22-36 (2011)
[12] Kiritchenko, S., Zhu, X., Mohammad, S. M. Sentiment analysis of short informal texts. In: Journal of Artificial Intelligence Research archive. Vol. 50, Issue 1, pp. 723-762 (2014)
[13] Al-Sabbagh, R., Girju, R. Yadac: Yet another dialectal Arabic corpus. In: the Eight International Conference on Language Resources and Evaluation, Istanbul (2012)
[14] Abdul-Mageed, M., Diab, M. Awatif: A multi-genre corpus for modern standard Arabic subjectivity and sentiment analysis. In: the Eight International Conference on Language Resources and Evaluation, Istanbul (2012)
[15] Mourad, A., Darwish, K. Subjectivity and Sentiment Analysis of Modern Standard Arabic and Arabic Microblogs. In: the 4th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, Association for Computational Linguistic. pp. 55–64, Atlanta, Georgia (2013)
[16] Refaee, E., Rieser, V. An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis. In: the 9th International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland (2014)
[17] Ibrahim, H. S., Abdou, S. M., Gheith, M. Sentiment analysis for Modern Standard Arabic colloquial. In: International Journal on Natural Language Computing (IJNLC) Vol. 4, No. 2 (2015)
[18] Abbasi, A., Chen, H., Salem, A. Sentiment analysis in multiple languages: feature selection for opinion classification in Web forums. In: ACM Transactions on Information Systems, vol. 26, no. 3, article 12. (2008)
[19] Rushdi-Saleh, M., Martin-Valdivia, M., Ureña-López, L., Perea-Ortega, J. Bilingual Experiments with an Arabic-English Corpus for Opinion Mining. In: Recent Advances in Natural Language, pp. 740–745, Hissar, Bulgaria (2011)
[20] A. Soliman, H. T., M. Ali, M., Hedar, A. R., Doss, M. M. Mining social networks’ Arabic slang comments. In: IADIS European Conference on Data Mining 2013 (ECDM'13), Prague, Czech Republic (2013)
[21] Bouchlaghem, R., Elkhelifi, A., Faiz, R. Opinion mining in Microblog Texts using machine learning techniques. In: the Knowledge Discovery and Data Analysis (KDDA 2015), Algiers, Algeria (2015)
[22] Green, S., Manning, C. D. Better Arabic Parsing: Baselines, Evaluations, and Analysis. In: COLING (2010)
[23] Pang, Bo, Lee L., Vaithyanathan, Shivakumar. Thumbs up? Sentiment Classification using Machine Learning Techniques. In: EMNLP (2002)
[24] Heiden, S., Magué, Pinceminb, J. P. Txm : Une plateforme logicielle open-source pour la textométrie conception et développement. In: JADT 2010, pp. 1021–1032 (2010)
[25] Church, K. W., Hanks, P. Word association norms, mutual information, and lexicography. In: Computational Linguistics. 16 (1): 22–29. (1990)
[26] Turney, P., Littman, M. L. Measuring praise and criticism: Inference of semantic orientation from association. In: ACM Transactions on Information Systems, 21(4). (2003)
[27] Hu, M., Liu, B. Mining and summarizing customer reviews. In: the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), pp. 168–177 (2004)
[28] Fellbaum, C., Grabowski, J., Landes, S. Performance and confidence in a semantic annotation task. In: C. Fellbaum, Ed., WordNet: an electronic lexical database, Language, Speech and Communication, The MIT Press. Cambridge, Massachusetts, chapter 9, pp. 216-237. (1998)
[29] Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I. H. The WEKA Data Mining Software: An Update. In: SIGKDD Explorations, vol. 11, Issue 1. (2009)
[30] Morlane-Hondère, F., D’hondt, E. Feature engineering for tweet polarity classification in the 2015 DEFT challenge. In: DEfi Fouille de Textes, Caen, France (2015)
[31] Yi, J., Nasukawa, Bunescu, R., Niblack, W. Sentiment analyzer: Extracting sentiments about a given topic using natural language processing techniques. In: the 3rd IEEE International Conference on Data Mining (ICDM), pp. 427–434. (2003)
[32] Bouchlaghem, R., Elkhelifi, A., Faiz, R. Sentiment analysis in Arabic Twitter posts using supervised methods with combined features. In: Proceedings of the 17th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2016, Konya, Turkey (2016)
