International Journal of Computer Trends and Technology (IJCTT), Volume 4, Issue 8, August 2013

ISSN: 2231-2803 http://www.ijcttjournal.org Page 2699



Classification of Unwanted Messages in Online Social Network Using Machine Learning Algorithms
Padma Priya.B #1, Sathiyakumari.K *2

#1 Research Scholar, *2 Assistant Professor
PSGR Krishnammal College for Women
Bharathiar University
Coimbatore
India



Abstract: In today's technical world, people are very active users
of Online Social Networks (OSNs). They share every detail of their
day-to-day lives and stay in touch with their loved ones no matter
where in the world they live. The main issue is the ability to
control the messages that are posted in a user's private messages
or on a user's wall, so that unwanted messages can be detected and
handled. This work focuses on predicting the emotion of a
particular message or post in various OSNs such as Twitter, blogs,
etc., so that inappropriate messages can be filtered. This paper
focuses on collecting a corpus for sentiment analysis and applying
linguistic analysis and machine learning techniques to predict
emotions accurately. Using the corpus we define distinct emotions
and filter unwanted messages.

Keywords: Online Social Networks (OSN), information filtering,
short text classification, criteria-based personalization

I. INTRODUCTION
Online social networks are one of the standard platforms for
social collaboration. In earlier days, messages were sent
through letters, telephones, emails, etc. Owing to rapid
technical development, people now share the details of their
day-to-day lives through social networking websites.
Continuous communication among people implies a considerable
amount of data transfer, including text, audio and video, which
explicitly depicts information about a person's life.
Interpersonal communication is a growing area in which people
tend to explore themselves, their relationships and social and
cultural artefacts. The huge and dynamic nature of this data
motivates researchers to mine or discover useful information
from online social networks. In online social networks,
information filtering can be used for a more sensitive purpose,
as there is a possibility of posting or commenting with text or
content that is inappropriate. In psychology and philosophy,
emotion is a subjective conscious experience which is
categorized into different types. Here we deal with emotions
that are expressed in text, for example tweets, comments, etc.
The aim of the present work is to propose a system which is
able to classify short text messages into different categories
and filter them accordingly. As learning models we use SVM and
Naive Bayes for classifying emotion. So far, emotion analysis
of text has been done on documents, stories and novels, which
has its own limitations, whereas here we predict the emotion of
user conversations, tweets and comments for a socially safe
environment, since lately people try to misuse their privileges
and sometimes spam messages and vulgar content are posted by
users. First the text is classified into five categories.
Primary emotions such as happy, sad, angry and surprise are
detected, and in the non-neutral group two emotions, vulgar and
offensive, are detected.
The data is collected from Twitter [2]. As we need to find the
emotions of different people and different types of
conversation, Twitter is a suitable medium for data collection.
Conversations from blogs and micro-blogging sites are also
collected. Nearly two thousand tweets are collected, and text
from various online social networks is collected.
II. RELATED WORK
Adil et al. [1] have studied human emotions in text in a
multimodal form which includes visual and acoustic features.
Alec Go et al. [5] have classified tweets as positive,
negative and neutral. Dan Roth et al. [9]. Diana et al. [14]
have used two data sets, SemEval 2007 Task 14 and an
emotion-annotated blog corpus, on which they classify six basic
emotions using SVM and other machine learning algorithms.
Schaffer and Diana 2011 [16].
III. DATA SET
The data set is collected from Twitter. Tweets are collected
from the web [2]. The data set had multilingual tweets.
Foreign-language tweets have been removed from the data set, so
it contains only tweets in English. The resulting data set has
7500 tweets.
TABLE I
EXAMPLE OF REFINED TWEETS
Honesty hurts. :) @im_rahultomar: frankly speaking i donno...
Im a proud human being but when it comes to being Indian i
dont know
@Jiah Khan no more? Unbelievable! She was so young.
Ritu Da, a sensitive artistic mind, a gentle human, considerate
and caring. Gone! Spoke while ago on doing another film
together!

A. Data Annotation
Emotion labeling is reliable and effective if there is more
than one judgment for each label. Five judges have manually
annotated the data. They label each tweet in the data set with
the emotion category to which it belongs; if a tweet fits no
emotion category, it is described as undefined.
B. Measuring Annotations
The interpretation of emotion in text is very subjective, which
leads to disagreement between judges. To measure how far the
judges agree, we use Cohen's kappa. Cohen's kappa is a
statistical measure of inter-annotator agreement that corrects
for agreement expected by chance, which helps in settling the
emotion label of a particular text.
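As an illustration of the agreement measure, Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal Python sketch, not the procedure used in this work. The function name cohens_kappa and the judge labels are hypothetical, and because kappa is defined for two annotators, applying it to the five judges pair by pair is an assumption.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Hypothetical labels from two of the judges for six tweets.
judge_1 = ["happy", "sad", "angry", "happy", "vulgar", "sad"]
judge_2 = ["happy", "sad", "happy", "happy", "vulgar", "angry"]
kappa = cohens_kappa(judge_1, judge_2)
```

A value of 1 means complete agreement and a value near 0 means agreement no better than chance.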
C. Learning Model and Feature Set
Our emotion classifier is based on machine learning
algorithms. First, stop words are removed from the collected
data set and the remaining words are stemmed. The normalized
data thus obtained is used as the feature vector for training
the classifier. The following features are extracted: unigrams,
bigrams, personal pronouns, POS, POS bigrams, WordNet-Affect
emotion lexicon, BoW and Dp. Each word is stemmed using the
Porter stemmer. Personal pronouns, adjectives, POS and POS
bigrams are extracted using the Stanford POS tagger with the
Penn Treebank tag set. The WordNet-Affect emotion lexicon
captures the contextual information of the particular text.
Using these features the emotion of a text is defined. All
proposed features are analyzed in our experiment in order to
find the combination most appropriate for context message
classification.
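The pre-processing and n-gram features described above can be sketched as follows. This is a minimal illustration in Python using only the standard library, not the RapidMiner process used in this work. The stop-word list, the tokenizer and the names tokenize and extract_features are hypothetical, and stemming, POS tagging and the emotion lexicon are left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real list is much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "it", "i"}

def tokenize(text):
    """Lower-case the text and keep runs of letters (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())

def extract_features(text):
    """Return unigram and bigram counts after stop-word removal."""
    tokens = [t for t in tokenize(text) if t not in STOP_WORDS]
    features = Counter(tokens)  # unigram counts (bag of words)
    # Bigrams are built from adjacent remaining tokens, joined by "_".
    features.update(f"{a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features

features = extract_features("She was so young. Unbelievable!")
```

Each tweet becomes a sparse count vector keyed by feature name, which is the form a Naive Bayes or SVM learner can be trained on.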
D. Experiment and Result
This section describes the data collections, classifiers and
other parameters used to conduct the experiments, as well as
the results obtained using the tool. The open-source data
mining tool RapidMiner 5 is used. Two classification algorithms
are used for emotion classification, namely Naive Bayes and
support vector machine. These are implemented and trained using
RapidMiner. RapidMiner is a collection of state-of-the-art
machine learning algorithms and data pre-processing tools. The
robustness of the classifiers is evaluated using 10-fold cross
validation for all the algorithms. Predictive accuracy is used
as the primary performance measure for predicting the emotions
in text. Precision, recall and F-score are the parameters used
in evaluating the predictive accuracy when comparing the
machine learning algorithms. Using these metrics and the
combined features we compare the prediction accuracy of the two
machine learning algorithms.
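The evaluation protocol can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the RapidMiner configuration used in the experiments: the function names k_fold_indices and precision_recall_f and the example labels are hypothetical, and the folds are formed without shuffling for simplicity. For one class, precision = TP / (TP + FP), recall = TP / (TP + FN), and the F-score is the harmonic mean of the two.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for k roughly equal folds."""
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and the number of items")
    indices = list(range(n_items))
    # Fold i takes every k-th item starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def precision_recall_f(true_labels, predicted_labels, target):
    """Precision, recall and F-score for the single class `target`."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == target and p == target for t, p in pairs)
    fp = sum(t != target and p == target for t, p in pairs)
    fn = sum(t == target and p != target for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Hypothetical gold and predicted labels for six tweets.
gold = ["happy", "sad", "happy", "angry", "happy", "sad"]
pred = ["happy", "happy", "happy", "angry", "sad", "sad"]
p, r, f = precision_recall_f(gold, pred, "happy")
```

In a full run, each classifier would be trained on the train indices of a fold, tested on the held-out indices, and the metrics averaged over the ten folds.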

TABLE II
COMBINATION OF FEATURES IN TERMS OF PRECISION, RECALL, F SCORE
FEATURES                               PRECISION  RECALL  F-SCORE
Dp                                     38%        25%     32%
BoW                                    42%        29%     35%
Bigram                                 56%        30%     36%
Unigram                                28%        45%     40%
Pos                                    63%        47%     49%
Pos Bigram                             56%        58%     52%
Dp+BoW                                 65%        59%     57%
Dp+BoW+Bigram                          55%        60%     59%
Dp+BoW+Bigram+Unigram                  67%        64%     60%
Dp+BoW+Bigram+Unigram+Pos Bigram       74%        67%     67%



TABLE III
RESULT OF THE PROPOSED WORK IN TERMS OF PRECISION,
RECALL, F SCORE IN CLASS VALUES
Metrics    Happy  Sad  Angry  Vulgar  Offensive
Precision  87%    53%  66%    65%     58%
Recall     78%    79%  69%    72%     63%
F Score    73%    81%  77%    80%     77%

TABLE IV
PREDICTION ACCURACY COMPARED WITH TWO ALGORITHMS
NAIVE BAYES AND SVM
Classifiers                        Naive Bayes  SVM
Time taken to build model (min)    3            5
Correctly classified instances     732          954
Incorrectly classified instances   115          95
Prediction accuracy                67.64%       75%

The above table shows the comparison of NB and SVM. The NB
algorithm gives lower accuracy compared to SVM.
E. Future Work
In future work we can use other machine learning
algorithms and fuzzy neural networks to create a hybrid of
algorithms in order to acquire more accurate results.
Online social networks can use these text mining and
sentiment analysis techniques to a greater extent so as to
filter unwanted text from the user's wall.

IV. REFERENCES
[1] Adil Alpkocak, AISB 2008 Convention Communication, Jan 1, 2008.
[2] Information gathered from http://infolab.tamu.edu/resources
[3] http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection
Go_Bhayani_Huang_2009,
http://www.stanford.edu/~alecmgo/papers/TwitterDistantSupervision09.pdf,
CS224N Project Report, Stanford
[4] Cecilia Ovesdotter Alm, Dan Roth, Richard Sproat, 01/2005. In
Proceedings of HLT/EMNLP 2005, Human Language Technology
Conference and Conference on Empirical Methods in Natural
Language Processing, Proceedings of the Conference, 6-8 October
2005, Vancouver, British Columbia, Canada
[5] KNOWLEDGE ENGINEERING: PRINCIPLE AND TECHNIQUE,
KEPT 2008 International Conference on Knowledge Engineering
Principles and Techniques Selected Papers, Cluj-Napoca (Romania),
July 2-4 2000
[6] Soumaya Chaffar and Diana Inkpen, "Using a Heterogeneous Dataset
for Emotion Analysis in Text", in Proceedings of the 24th Canadian
Conference on Artificial Intelligence (AI 2011), St-John's, NFL,
Canada, May 2011, pp. 62-67
[7] B. Liu, Sentiment Analysis and Subjectivity, Handbook of Natural
Language Processing, Second Edition (editors: N. Indurkhya and F. J.
Damerau), 2010
[8] B. Pang and L. Lee, Opinion Mining and Sentiment Analysis,
Foundations and Trends in Information Retrieval 2(1-2), pp. 1-135,
2008.
[9] J. Wiebe, T. Wilson, R. Bruce, M. Bell, and M. Martin, Learning
Subjective Language, Computational Linguistics, vol. 30, pp. 277-308,
September 2004
[10] M. Hu and B. Liu, Mining and Summarizing Customer
Reviews, Proceedings of the ACM SIGKDD Conference
on Knowledge Discovery and Data Mining (KDD), pp.
168-177, 2004.
[11] N. Jindal and B. Liu, Opinion Spam and Analysis, Proceedings of
the ACM Conference on Web Search and Data Mining (WSDM), 2008.