Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Abstract – Social networking sites these days are great implements classification, especially in a concrete
source of communication for internet users. So these are implementation, is known as a classifier. The term
important source for understanding the emotions of "classifier" sometimes also refers to the mathematical
people. In this paper, we use data mining techniques for function, implemented by a classification algorithm that
the purpose of classification to perform sentiment maps input data to a category[15].
analysis on the views people have shared in Twitter. We Text mining is the analysis of data contained in natural
collect dataset, i.e. the tweets from twitter that are in language text. It is used to process unstructured (textual)
natual language and apply text mining techniques – information, extract meaningful numeric indices from the
tokenization, stemming etc to convert them into useful text. Generally some information retrieval methods, or
form and then use it for building sentiment classifier natural language processing or some pre processing of text
that is able to predict happy, sad and neutral sentiments is done in order to make it useful for applying data
for a particular tweet. Rapid Miner tool is being used, mining(statistical and machine learning) algorithms. It is
that helps in building the classifier as well as able to most beneficial when -
apply it to the testing dataset. We are using two different 1) Summarizing documents
classifiers and also compare their results in order to find 2) Extracting concepts from text
which one gives better results. 3) Indexing text for use in predictive analytics[10]
In this work, we are using two different classifiers to
Keywords: Sentiment analysis, natural language, data extract the sentiments of tweets people are sharing in twitter
mining, text mining, Twitter. and classify them broadly into 3 categories – happy, sad and
neutral. And also compare the precision and recall of each
I. INTRODUCTION classifier and then also find out which classifier gives the
best result in terms of better precision and recall ratios and
Sentiment analysis refers to the application of natural accuracy.
language processing, computational linguistics, and text II. RELATED WORK
analytics to identify and extract subjective information from A lot of work has been carried out in the field of sentiment
the source materials. Sentiment analysis aims to determine analysis for the live data from the users in order to extract
the attitude of a speaker or a writer towards any topic or the sentiments of common people towards any topic, trend,
incident. It is the computational study of people’s opinions, products etc. The studies mainly focus on extracting useful
appraisals, attitudes, and emotions toward entities, information from the natural language of users and process
individuals, issues, events, topics and their attributes. For it to get the real sentiments. It has gathered interest with the
this, there is need for data mining as well as text mining ever growing use of internet by people to share their
techniques. Data mining, also called knowledge discovery in opinions. Osaimi and Badruddin[1] have worked on the
Databases, is the process of discovering interesting and sentiment analysis of the tweets in Arabic language. They
useful patterns and relationships in large volumes of data. build different classifiers by training them with proper
The field combines tools from statistics and artificial dataset i.e. processed tweets in Arabic language and then
intelligence (such as neural networks and machine learning) analyized the accuracy of these classifiers in order to predict
with database management to analyze large digital the correct sentiments. Pak and Paroubek [2] used the twitter
collections, known as data sets. It is the process of analyzing data as corpus to perform linguistic analysis and build a
data from different perspectives and summarizing it into classifier that is highly efficient. A very broad overview of
useful information. It employs data pre-processing, data the existing work was presented by Pang and Lee[3]. In
analysis, and data interpretation processes in the course of their survey, the authors describe existing techniques and
data analysis. Data mining uses sophisticated mathematical approaches for an opinion-oriented information retrieval.
algorithms[15]. Data classification is the process of Albert Bifet and Eibe Frank[4] have discussed the
organizing data into categories for its most effective and challenges that Twitter data streams pose, focusing on
efficient use. Classification is a data mining function that classification problems, and then consider these streams for
assigns items in a collection to target categories or classes. opinion mining and sentiment analysis.
The goal of classification is to accurately predict the target
class for each case in the data. An algorithm that
669
testing. Tokenize, filter tokens, transform case filter stop
words and stemming operators are used to perform text
mining related operations on the training and testing dataset.
It is important to pre process the dataset because data
available in natural language form cannot be directly used
with data mining techniques. We process the text files so
that we can use them for training the classifier and then
these classifiers can predict labels for testing dataset.
670
Figure 5: Results- Classification of tweets provided in the testing dataset using Naive Bayes Classifier
Figure 7: Results- Classification of tweets provided in the testing dataset using K-NN Classifier
671
Figure 8: Precision and Recall ratio for the classification
VI. REFERENCES
672