
Panama Release Sentiment Analysis

Prologue:

This is a Twitter analysis of public sentiment toward the political leaders, celebrities, heroes and other figures named in the Panama Papers. We mined Twitter data to see how the public responded to the people appearing in the release.

The Panama release (May-10-2016) contains about 42000 names, many of them unknown to the general public. We therefore shortlisted the people appearing in the Panama Papers using the list given by Wikipedia, which contains about 417 names.

We used two programming languages: Java for data mining and R for sentiment analysis. We did not use R to fetch data from Twitter because it did not return sufficient data from the Twitter API. In our case, searching the Twitter web site for Emma Watson + panama showed about 25 tweets in the browser, whereas the same phrase searched from R returned only four tweets. When the same search query was issued from Java, a sufficient number of tweets was returned.

Framework:

Our framework includes four steps:

(i) Mine data from Twitter and save this data in a local database.

The hurdle was deciding which phrase to search for on Twitter to get data about all the entities appearing in the Panama release. Manually analyzing tweets about Panama, we found that most tweets about the Panama release contain the phrase Panama Papers or some #Panama hashtag, regardless of whether the tweet is about a particular entity or is a general comment on the release. So we decided to use the word Panama as the search term. Searching for Panama also returned many tweets about Panama the country (a republic on the Isthmus of Panama; achieved independence from Colombia in 1903) or the Panama Canal, which were garbage to us. We then identified some words that our tweets must not contain. Here is the query that was used to search Twitter.

"panama -summer -itunes -travel -orlaeans -gulf -shore -ocean -jungle -pic -photo -rio -iceland -trip -sky -beautiful -party -hat -tour -canal -food -city -beach -born -florida -visit since:2016-05-10 until:2016-08-14"

Twitter was searched with this query, and the resulting 10692 tweets were saved to the local database.

(ii) Query the local database to get data about a particular entity.

We keep the list of names of the persons appearing in the Panama Papers, taken from Wikipedia, in a file. We searched the local database for tweets about each entity one by one. When the tweets about an entity are obtained from the local database, sentiment analysis is performed on them and the results are stored back in the local DB.

For example, Jackie Chan is in the Wikipedia list, so we queried the local database for tweets that contain the name Jackie Chan. When the tweets are returned, they are forwarded for sentiment analysis.
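Steps (i) and (ii) can be sketched roughly as follows. This is a minimal illustration in Python rather than the Java the authors used, and it does not call Twitter at all: it only assembles the search string quoted in the report and filters an in-memory list that stands in for the local database. The function names `build_query` and `tweets_about` are hypothetical, and the case-insensitive substring match is an assumption, since the report only says the tweets "contain the name".

```python
# Words the report excludes from the search (spelling kept exactly as in the report's query).
EXCLUDED_WORDS = [
    "summer", "itunes", "travel", "orlaeans", "gulf", "shore", "ocean",
    "jungle", "pic", "photo", "rio", "iceland", "trip", "sky", "beautiful",
    "party", "hat", "tour", "canal", "food", "city", "beach", "born",
    "florida", "visit",
]


def build_query(keyword, excluded, since, until):
    """Assemble a search string: keyword, negated words, then the date range."""
    negated = " ".join(f"-{word}" for word in excluded)
    return f"{keyword} {negated} since:{since} until:{until}"


def tweets_about(entity, tweets):
    """Return the stored tweets whose text contains the entity's name.

    Assumption: matching is a case-insensitive substring test.
    `tweets` is a plain list standing in for the local database.
    """
    needle = entity.lower()
    return [text for text in tweets if needle in text.lower()]


if __name__ == "__main__":
    print(build_query("panama", EXCLUDED_WORDS, "2016-05-10", "2016-08-14"))
    stored = [
        "Jackie Chan named in #PanamaPapers",
        "Panama leak is huge",
        "jackie chan again in panama news",
    ]
    print(tweets_about("Jackie Chan", stored))
```

Keeping the excluded words in a list makes it easy to add further noise terms as they are discovered during manual inspection of the results.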
(iii) Perform sentiment analysis on the data returned from the local database.

We performed lexicon-based sentiment analysis in R, as it provides built-in functions that minimize the lines of code.

The scheme relies on two word lists: 2006 positive words and 4783 negative words. These are the words most commonly used on social media for positive and negative expressions.

For sentiment analysis of the tweets about an entity, for example Jackie Chan, we performed the following steps on the tweets:

- Remove all the punctuation marks,
- Remove all the usernames,
- Remove links,
- Remove stop words (the, there, he, she, now, when, etc.) as they have nothing to do with sentiments,
- Split all the tweets on spaces and count the number of positive and the number of negative words in the set of tweets.

The negative count is subtracted from the positive count. If the result is negative, the sentiments about the person are negative; if it is positive, the public sentiments are positive. In this way, sentiments are judged.

(iv) Repeat steps (ii) and (iii) until the list of people named in the Panama Papers is finished.

Results:

Out of our list of 417 entities, 164 were not tweeted about at all and 85 had fewer than five tweets in total; we ignored all of them. The remaining 168 entities, each with more than four tweets in total, were considered in drawing the results.

The following graph shows the sentiment analysis of the top 10 entities named in the Panama Papers, ranked by total number of Panama-related tweets.

[Bar chart: No. of Positive and Negative words. Entities: Nawaz Sharif, Maryam Nawaz, Lee Shing Put, Shahid Nazir, Dan Gertler, Ken Whitney, Mark Thatcher, Michael Mates, Michael Ashcroft, Gul Muhammad Tabba. Axis: 0 to 300. Series: no. of -ve words, no. of +ve words.]

References:

https://en.wikipedia.org/wiki/List_of_people_named_in_the_Panama_Papers

https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html#lexicon
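The lexicon-based scoring of step (iii) can be sketched as follows. This is an illustration in Python, not the authors' R code. The tiny word sets are toy stand-ins for the 2006 positive and 4783 negative words, and the stop-word set holds only the examples the report names. Three further assumptions are made: text is lowercased before matching, links and usernames are stripped before punctuation (so that "@" and "://" are still present when they are matched), and a net score of zero is labelled "neutral", a case the report does not cover.

```python
import re

# Toy stand-ins for the real lexicons (assumption: the real lists are much larger).
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "corrupt", "scandal"}
# Only the stop words the report lists as examples.
STOP_WORDS = {"the", "there", "he", "she", "now", "when"}


def clean(tweet):
    """Strip links, usernames and punctuation, then drop stop words."""
    text = tweet.lower()                       # assumption: case-insensitive matching
    text = re.sub(r"https?://\S+", " ", text)  # remove links
    text = re.sub(r"@\w+", " ", text)          # remove usernames
    text = re.sub(r"[^\w\s]", " ", text)       # remove punctuation
    return [word for word in text.split() if word not in STOP_WORDS]


def score(tweets, positive, negative):
    """Count positive and negative words over a set of tweets.

    Returns (positive_count, negative_count, positive_count - negative_count).
    """
    pos = neg = 0
    for tweet in tweets:
        for word in clean(tweet):
            if word in positive:
                pos += 1
            elif word in negative:
                neg += 1
    return pos, neg, pos - neg


def label(net):
    """Map the net score to a sentiment label (zero -> 'neutral' is an assumption)."""
    if net > 0:
        return "positive"
    if net < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    sample = [
        "@fan Jackie Chan is great! http://t.co/x",
        "He is corrupt, bad scandal now",
    ]
    pos, neg, net = score(sample, POSITIVE_WORDS, NEGATIVE_WORDS)
    print(pos, neg, net, label(net))
```

Because the score is a raw count difference over all of an entity's tweets, entities with many tweets can reach larger absolute scores than entities with few; only the sign is used for the positive or negative judgement.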
