
INFORMATICS INSTITUTE OF TECHNOLOGY

In Collaboration with
UNIVERSITY OF WESTMINSTER (UOW)

BEng/BEng. (Hons) in Software Engineering

Final Year Project 2018/2019


Project Initiation Document
For
Multilingual Sentiment Analysis on Social Media Platforms
By
2014509 - Sashini Piyaratne
Supervised By
Krishnakripa Jayakumar

Signature of Supervisor: ………………………..
Signature of Student: ……………………..

Chapter Overview

The purpose of this chapter is to come to an agreement with the supervisory body about
the scope, features of the prototype, project aim, outline of the project and project timeline.

This chapter will help the author to carry out the research, as it acts as a guide. It includes an introduction to sentiment analysis and how it is performed on multilingual text. The author discusses the problems of multilingual sentiment analysis for social media, solutions to those problems and previous work that has been done in the area. This chapter also includes the aim of the project and the features of the prototype.

Background

In the last decade, social media has seen phenomenal growth, and billions of people across the globe use it daily to share ideas and feedback and to exchange information with each other. Social media channels such as videos and blogs, and social networking sites such as Facebook, Twitter, LinkedIn, Telegram, Google+ and Instagram, have made it easier for people to stay connected and updated irrespective of geographical boundaries and cultural restrictions.

Social media is a proven strategy for digital marketing and advertising. Many businesses have turned to social media to build customer awareness of, and loyalty to, their values, product lines and unique selling propositions, while also obtaining customer feedback and adjusting to market demand.

People also use social media to share their thoughts, facts about their lives, their knowledge of different areas and the experiences they have gone through, and this tendency continues to grow.

They actively participate in various social activities by expressing their opinions and commenting on those activities. When sharing experiences, people use both positive and negative comments to describe their personal feelings and attitudes. These attitudes can be considered sentiments.

People who live in countries such as Sri Lanka use different languages depending on ethnicity and religion, such as English, Sinhala and Tamil, and also mixed-language forms such as Tamil words written in English and Sinhala words written in English (Singlish) to communicate with others. Of these, English and Singlish are the most commonly used.

Sentiment analysis is the field of study that analyzes people’s sentiment from written
language. It can be performed at three different levels.
1. Document level
2. Message level
3. Aspect level

The computational treatment of opinion, sentiment, and subjectivity has recently attracted a great deal of attention, in part because of its potential applications. For instance, information-extraction and question-answering systems could flag statements and queries regarding opinions rather than facts (http://delivery.acm.org/10.1145/1220000/1218990/p271-pang.pdf?ip=203.189.65.141&id=1218990&acc=OPEN&key=4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35%2E6D218144511F3437&__acm__=1540795618_5142ddc2e177333edd76d7204b171ace).

Sentiment analysis systems are being applied in almost every business and social domain because opinions are central to almost all human activities and are key influencers of our behavior. Our beliefs and perceptions of reality, and the choices we make, are largely conditioned on how others see and evaluate the world. For this reason, when we need to make a decision we often seek out the opinions of others. This is true not only for individuals but also for organizations (https://www.morganclaypool.com/doi/abs/10.2200/s00416ed1v01y201204hlt016).

Social media content plays a major role in sentiment analysis, because people mostly use social media to express their feelings and thoughts. Analysing social media content in the different languages people actually use therefore makes it much easier to capture their sentiment.

Problem Domain

Social media is a huge virtual space where people can express and share individual opinions that influence many aspects of life, with implications for marketing and communication alike. People form judgements about the world around them as they live in society. Blogs, online forums and social media sites such as Facebook, Twitter, YouTube and Instagram capture the views, or word of mouth, of people around the world. The communication and availability of these real-time opinions from around the world have brought about a revolution in computational linguistics and social network analysis. Thus, social media is becoming an increasingly important source of information for enterprises.
(https://www.researchgate.net/profile/Saminda_Premaratne/publication/268817500_Sentiment_Analysis_for_Social_Media/links/5478afec0cf293e2da2b2aa0.pdf)

People use social media as a platform to make positive and negative comments about these things, since the purpose of social media is to connect individuals in society and encourage them to share their experiences with others.

Businesses also use social media to promote and market their products and services. Most marketers believe that there is no absolute means of measuring sentiment. According to online strategist Thomas Walker, the effectiveness of a business campaign can be understood by designing and running the campaign so that it connects with the sentiments of as many people as possible. Customers rely on reviews when purchasing a product or service, and social media is also used as a platform for such reviews (https://ieeexplore.ieee.org/document/6425642).

Through this sharing, businesses can collect information about their companies and products, learn how well regarded they are among the public, and thereby make decisions that help them run their businesses effectively.

It is therefore clear that sentiment analysis is a key component of leading, innovative enterprises focused on “Customer Experience Management” and “Customer Relationship Marketing”. Moreover, businesses looking to market their products can use it to identify new opportunities and manage their reputation.

In Sri Lanka, however, most social media posts are written in English, Sinhala or Singlish (Sinhala words written in English). The difficulty in analysing the sentiment of such posts is that they mix languages, for example Singlish (Sinhala words written in English) or Tamil words written in English. Most solutions available on the market today process only a single language, such as Sinhala, English or Tamil, so algorithms built to analyse a single language cannot be applied directly in these scenarios.

This paper presents an approach to developing an algorithm for multilingual sentiment analysis, so that countries such as Sri Lanka, whose national languages are not English, can also take part in sentiment analysis.

Therefore, an algorithm to analyse the sentiment of Singlish content on social media is proposed, together with a dashboard to show the results, which fulfils the main objectives of the project. The problem is addressed with NLP researchers in mind: those who analyse sentiment on social media to obtain various outputs for their research but are stuck when processing multilingual text.

Draft Literature Review

Problem Justification

The majority of current sentiment analysis systems address a single language, usually English. However, with the growth of the internet and social media, users express their feelings and thoughts in different languages. Performing sentiment analysis in only a single language increases the risk of missing essential information in texts written in other languages (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4981629/pdf/12559_2016_Article_9415.pdf). Multilingual sentiment analysis techniques therefore play a major role in analysing data in different languages.

Existing Systems

Since sentiment analysis on social media has been recognized as important, a few pieces of research have been done in this area over the past decade.

Title and Owner: Adobe SocialAnalytics by Adobe Systems Incorporated, powered by Omniture
Introduction: Adobe Social Analytics measures the impact of social media on businesses by understanding how conversations on social networks and online communities influence marketing performance. After capturing and understanding the ongoing conversations, it correlates their impact with key business metrics such as revenue and brand value.
Features:
● Monitor and measure popular social media platforms
○ Facebook: Facebook Measurement, Facebook App Measurement
○ YouTube: Viral Video Measurement
○ Twitter: Twitter Measurement
● Integrate social media data with other digital analytics
Limitations:
● Does not have multilingual sentiment analysis
● No real-time processing

Title and Owner: Brandwatch Sentiment Analysis by Brandwatch
Introduction: Brandwatch is also a sentiment analysis tool, developed by a team of PhD holders in the United Kingdom, and is currently commercially available. The tool assesses whether a sentiment is positive, negative or neutral.
Features:
● Social Media Monitoring: Social Reporting, Sentiment Scoring, Influencer Identification, Social Measurement, Social Listening
● Reports and Dashboards: Competitor Analysis, Follower Analysis, Content Engagement Analysis, Paid Campaign Tracking, Attribution
Limitations:
● Does not have multilingual sentiment analysis
● No real-time processing

Title and Owner: TweetFeel
Introduction: TweetFeel is also a web tool that analyses the sentiment of a given input on the Twitter social media platform. It gathers real-time data on Twitter about the search terms and classifies those tweets into positive and negative categories in real time.
Features:
● Analysis based not just on emoticons, but also on words and phrases
● Build customized stories to monitor exactly how people are talking about each search term
● Ability to create very specific and accurate searches by specifying terms that must be searched and terms that must not be searched
Limitations:
● Does not have multilingual sentiment analysis
● No real-time processing

Title and Owner: Determining the Semantic Orientation of Terms through Gloss Classification, by Andrea Esuli and Fabrizio Sebastiani
Introduction: Sentiment classification is a recent subdiscipline of text classification which is concerned not with the topic a document is about, but with the opinion it expresses.
Limitations:
● Does not have multilingual sentiment analysis
● No real-time processing

Algorithmic Analysis (IT aspect)

Reaching a sound conclusion through analysis requires properly structured sentiment data. There are millions of sentiment-bearing posts on the web, but they are not in a consistent format, which makes it difficult to get the maximum value out of them.

This research will mainly focus on building an algorithm that performs sentiment analysis on social media content in English, Sinhala and Singlish.
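Because the algorithm must handle three kinds of input, a first step could be to decide which analyser each post is sent to. The Java 8 sketch below assumes a simple rule: any character from the Sinhala Unicode script marks a post as Sinhala, and Latin-script text containing a known romanised Sinhala word is treated as Singlish. The `LanguageRouter` class and its `SINGLISH_MARKERS` word list are hypothetical illustrations, not part of any existing library; a real router would need a far larger word list or a trained language classifier.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical routing step: picks which analyser (English, Sinhala or Singlish) a post goes to.
class LanguageRouter {

    enum Language { ENGLISH, SINHALA, SINGLISH }

    // Hypothetical, deliberately tiny list of romanised Sinhala (Singlish) words.
    // A real system would need a far larger list or a trained language classifier.
    private static final Set<String> SINGLISH_MARKERS = new HashSet<>(Arrays.asList(
            "hoda", "naraka", "eka", "nehe", "harima", "kiyala", "oya", "neda"));

    static Language detect(String text) {
        // Any code point from the Sinhala Unicode script marks the post as Sinhala-script text.
        boolean hasSinhalaScript = text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.SINHALA);
        if (hasSinhalaScript) {
            return Language.SINHALA;
        }
        // Otherwise the post is in Latin script: one known Singlish word is enough to flag it.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (SINGLISH_MARKERS.contains(token)) {
                return Language.SINGLISH;
            }
        }
        return Language.ENGLISH;
    }

    public static void main(String[] args) {
        System.out.println(detect("This phone is great"));      // ENGLISH
        System.out.println(detect("Me phone eka harima hoda")); // SINGLISH
        System.out.println(detect("\u0DC4\u0DDC\u0DB3"));        // SINHALA
    }
}
```

A word-list rule like this is cheap but brittle: short posts may contain no marker word at all, which is part of why mixed-language input is treated as a research problem in this project.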

For the English language, there are built-in algorithms that have been implemented over the past decade; Adobe Social Analytics, Brandwatch Sentiment Analysis, Sentiment140, Social Mention and TweetFeel are some of them. Adobe Social Analytics measures the impact of social media on businesses by understanding how conversations on social networks and online communities influence marketing performance [3]. Brandwatch is a sentiment analysis tool developed by a team of PhD holders in the United Kingdom [4]. Sentiment140 is an online tool for analysing the sentiment of posts on the Twitter social network. Social Mention is a social media search and analysis platform that analyses user sentiment on social media [5]. TweetFeel is a web tool that analyses the sentiment of a given input on Twitter (https://www.researchgate.net/profile/Saminda_Premaratne/publication/268817500_Sentiment_Analysis_for_Social_Media/links/5478afec0cf293e2da2b2aa0.pdf).
Among these, Adobe Social Analytics collects real-time data, relates ongoing conversations to past data and monitors behavioral patterns. Sentiment140 also supports Spanish. TweetFeel only considers Twitter feeds, and Social Mention focuses on reviews of products and services. Taken as a whole, the most compatible and feasible library for English sentiment analysis is therefore Adobe Social Analytics.

For the Sinhala language there are few built-in libraries for sentiment analysis; examples include Drupal and an algorithm based on a Naive Bayes classifier. Of these, Drupal is faster and its accuracy is also good, so it can be considered the best library for Sinhala sentiment analysis.
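The Naive Bayes approach mentioned above can be illustrated with a minimal multinomial classifier. The sketch below is a generic textbook version with add-one (Laplace) smoothing, not the specific Sinhala implementation referred to here; the class name `NaiveBayesSentiment`, the whitespace tokeniser and the toy training sentences are assumptions. Because it only splits on whitespace, the same code could be trained on Sinhala-script posts once labelled data is available.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Minimal multinomial Naive Bayes text classifier with add-one (Laplace) smoothing.
class NaiveBayesSentiment {
    private final Map<String, Integer> docsPerLabel = new HashMap<>();
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>(); // label -> token -> count
    private final Map<String, Integer> tokensPerLabel = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Whitespace tokenisation; it works the same way for Latin and Sinhala script.
    private static String[] tokenize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).split("\\s+");
    }

    void train(String text, String label) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Returns the label maximising log P(label) + sum of log P(token | label); null if untrained.
    String classify(String text) {
        String bestLabel = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Integer> entry : docsPerLabel.entrySet()) {
            String label = entry.getKey();
            double score = Math.log((double) entry.getValue() / totalDocs);
            Map<String, Integer> counts = tokenCounts.get(label);
            double denominator = tokensPerLabel.get(label) + vocabulary.size();
            for (String token : tokenize(text)) {
                // Add-one smoothing keeps unseen tokens from zeroing out the probability.
                score += Math.log((counts.getOrDefault(token, 0) + 1.0) / denominator);
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = label;
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        NaiveBayesSentiment nb = new NaiveBayesSentiment();
        nb.train("good great happy", "positive");
        nb.train("great phone", "positive");
        nb.train("bad awful sad", "negative");
        nb.train("awful phone", "negative");
        System.out.println(nb.classify("great happy")); // positive
        System.out.println(nb.classify("awful sad"));   // negative
    }
}
```

Working in log space avoids numerical underflow when a post contains many tokens, since multiplying many small probabilities would quickly round to zero.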

There are no built-in algorithms for Singlish sentiment analysis. This research proposes an algorithm for Singlish sentiment analysis that converts Singlish text into Sinhala and then applies the chosen Sinhala sentiment analysis algorithm.
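In its simplest form, the conversion step could be a word-level dictionary lookup. The sketch below assumes a hypothetical `SinglishTransliterator` with a four-word toy dictionary (for example "hoda" → "හොඳ" and "naraka" → "නරක"); words it does not know, such as English words in the same post, are left unchanged so the mixed post can still be passed on. A practical converter would more likely use character-level transliteration rules, because Singlish spellings vary between writers.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical word-level Singlish-to-Sinhala converter based on a toy dictionary.
class SinglishTransliterator {
    private static final Map<String, String> DICTIONARY = new HashMap<>();

    static {
        // Sinhala script is written with Unicode escapes so the file compiles under any source encoding.
        DICTIONARY.put("hoda", "\u0DC4\u0DDC\u0DB3");         // "good"
        DICTIONARY.put("naraka", "\u0DB1\u0DBB\u0D9A");       // "bad"
        DICTIONARY.put("harima", "\u0DC4\u0DBB\u0DD2\u0DB8"); // "very"
        DICTIONARY.put("eka", "\u0D91\u0D9A");                // "one" / "the"
    }

    // Replaces each known Singlish word with its Sinhala spelling; unknown words
    // (for example English words in the same post) are passed through unchanged.
    static String toSinhala(String singlish) {
        StringJoiner out = new StringJoiner(" ");
        for (String token : singlish.trim().split("\\s+")) {
            out.add(DICTIONARY.getOrDefault(token.toLowerCase(Locale.ROOT), token));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Mixed post: "phone" stays in English, the Singlish words become Sinhala script.
        System.out.println(toSinhala("phone eka harima hoda"));
    }
}
```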

Reflection on research gap (Should be an IT contribution)

English is the language most commonly used in sentiment analysis, but systems also exist for languages such as Sinhala, Tamil, Spanish and Chinese. However, there has been no research on sentiment analysis for mixed-language text such as Singlish. This research will help to gather data and perform sentiment analysis on such mixed-language text as well.
(https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4981629/pdf/12559_2016_Article_9415.pdf)

Motivation

In countries such as Sri Lanka, people use a multitude of languages, such as English, Sinhala and Tamil, and also mixtures of words from those three languages. Although there are sentiment analysis algorithms for the main languages, no such tool is available for mixtures of these languages.

This leaves a gap in sentiment analysis for such mixed languages. The fact that no solution to date solves all of these problems is the major motivation behind the proposed solution.

Project Aim

To design, develop and evaluate a dashboard that shows the sentiment of English, Sinhala and mixed-language (such as Singlish) posts on social media.

Further elaborating, the aim of this project is to create a dashboard that shows the sentiment of text posts on social media written in English, Sinhala and mixtures of languages such as Singlish (Sinhala words written in English), which can be used as a platform to analyse sentiment on social media.

Research Objectives

In order to achieve the aim of the project some of the objectives are defined below.

1. Prepare the project initiation document to outline the overview and background of the
problem, and clearly describe the aim the project is planning to achieve, along with the
project deliverables.
2. Identify the suitable project management strategy that will aid in achieving the aim of the
project while clearly identifying the different stages of the project. (research, design,
development, testing and evaluation)
3. Conduct a literature survey and prepare a review from the research of the following areas,
while performing an in-depth review on the problem domain as well as the existing
solutions.
a. Evaluate and identify the current multilingual sentiment analysis applications.
b. Identify the approaches that have been taken to implement existing systems.
4. Prepare a System Requirement Specification by identifying stakeholders, and perform a
requirement elicitation.
a. Identify stakeholders and design the diagrams.
b. Identify functional and non-functional requirements.
5. Prepare a design specification to elaborate the outline architecture of the project.
6. Prepare an interim report halfway through the project to show the current progress of the
project and the status of the research.
7. Demonstrate the capabilities and the impact of the project in the field of sentiment analysis.
8. Design test cases to test the capabilities of the system, make sure the project works smoothly by testing all the features, and document and analyse all the results of the tests.
9. Document all findings on all aspects of the project, and prepare a final thesis with future work.

Research Methodology

According to Dubois and Gadde (2002), there are two categories of research.
1. Deductive research (a "top-down" approach)
2. Inductive research (a "bottom-up" approach)

The deductive approach narrows a topic down from a broad area. It usually starts with a theory or hypothesis, which is then tested against observations.

The inductive approach moves from specific observations to a broader area. It begins with observations, and theories are proposed towards the latter part of the research process.

Therefore, this project is categorized under the inductive research approach, as its aim is to implement an algorithm for multilingual sentiment analysis. For this research, the author refers to journals, research papers and technical documents published on NLP, machine learning, sentiment analysis, etc.

Rich Picture

Project Requirements and Insight into the Project Scope

Project Requirements

Functional Requirements

● An algorithm for sentiment analysis of English, Sinhala and Singlish text

Non-Functional Requirements

● Accuracy - The sentiment labels produced by the algorithm should be as accurate as possible.
● Performance - Since the analysis runs as a real-time batch job, it should not slow down the system.
● Security - The data collected should be protected.
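One way to make the accuracy requirement measurable, assuming a manually labelled test set of posts is available, is to report the fraction of posts whose predicted label matches the human label. The `SentimentEvaluation` helper below is a hypothetical illustration of that calculation, not an evaluation procedure fixed by this document.

```java
import java.util.Arrays;
import java.util.List;

// Accuracy = correctly labelled posts / all posts in a manually labelled test set.
class SentimentEvaluation {
    static double accuracy(List<String> goldLabels, List<String> predictedLabels) {
        if (goldLabels.isEmpty() || goldLabels.size() != predictedLabels.size()) {
            throw new IllegalArgumentException("label lists must be non-empty and the same length");
        }
        int correct = 0;
        for (int i = 0; i < goldLabels.size(); i++) {
            if (goldLabels.get(i).equals(predictedLabels.get(i))) {
                correct++;
            }
        }
        return (double) correct / goldLabels.size();
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList("positive", "negative", "neutral", "positive");
        List<String> predicted = Arrays.asList("positive", "negative", "positive", "positive");
        System.out.println(accuracy(gold, predicted)); // 0.75
    }
}
```

With four test posts of which three are labelled correctly, accuracy is 3/4 = 0.75. On imbalanced data (for example, mostly positive posts), per-class precision and recall would give a fuller picture than accuracy alone.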

Project Scope

Multilingual sentiment analysis can involve many language combinations, such as English + Sinhala, English + Tamil and English + Spanish, and the approaches taken to build the algorithms also differ. Some of the approaches that have been used are as follows:
● Translating documents from their original source language to English, and then performing sentiment analysis with English-based approaches.
● Translating documents from English to the target language of the sentiment analysis method.
● Making use of a lexicon with sentiment-denoting words for all considered languages.
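The third approach, a single lexicon covering all considered languages, can be sketched as follows. The `MixedLexiconScorer` class, its word list and its weights are hypothetical; the point it illustrates is that English and Singlish words can share one lexicon while language-specific rules still differ, since English puts "not" before the word it negates, whereas Sinhala puts "nehe" after it ("hoda nehe" means "not good").

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical lexicon-based scorer for posts that mix English and Singlish words.
class MixedLexiconScorer {
    // One lexicon covering both languages; words and weights are illustrative only.
    private static final Map<String, Integer> LEXICON = new HashMap<>();

    static {
        LEXICON.put("good", 1);      // English
        LEXICON.put("great", 2);
        LEXICON.put("bad", -1);
        LEXICON.put("terrible", -2);
        LEXICON.put("hoda", 1);      // Singlish: "good"
        LEXICON.put("lassana", 2);   // Singlish: "beautiful"
        LEXICON.put("naraka", -1);   // Singlish: "bad"
    }

    // Sums the polarity of every lexicon word, flipping the sign of negated words.
    static int score(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^a-z]+");
        int total = 0;
        for (int i = 0; i < tokens.length; i++) {
            int polarity = LEXICON.getOrDefault(tokens[i], 0);
            // English negation comes before the word ("not good") ...
            boolean negatedBefore = i > 0 && tokens[i - 1].equals("not");
            // ... whereas Sinhala negation comes after it ("hoda nehe" = "not good").
            boolean negatedAfter = i + 1 < tokens.length && tokens[i + 1].equals("nehe");
            total += (negatedBefore || negatedAfter) ? -polarity : polarity;
        }
        return total;
    }

    static String label(String text) {
        int s = score(text);
        return s > 0 ? "positive" : (s < 0 ? "negative" : "neutral");
    }

    public static void main(String[] args) {
        System.out.println(label("This phone is great"));                 // positive
        System.out.println(label("phone eka hoda nehe"));                 // negative
        System.out.println(label("service eka naraka but food is good")); // neutral
    }
}
```

The last example shows a limitation of plain summation: one negative Singlish word and one positive English word cancel out, so a mixed review is labelled neutral.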

Of all these combinations, the selected one is Sinhala words written in English (Singlish) combined with English words in the same text.

Social media is also a huge area that covers many types of sites, such as Facebook and Twitter as well as blogs and forums. This research will consider only Facebook as the social media platform.

The security of the system is out of the project scope and will not be addressed in this project.

Resource Requirements

The resource requirements are subject to change.

Hardware Requirements
● Core i7 2.13 GHz processor
● 16GB DDR3 RAM

Software Requirements
● IntelliJ IDEA - To build the algorithm
● Java SDK 1.8 - To build the algorithm
● SQL Server Management Studio - To save data
● JavaScript - To develop the dashboard
● Microsoft Word - To compose the document
● Microsoft Project - To draw the timeline
● Survey tool - To conduct the survey for requirement analysis

Contribution
Considering sentiment analysis as the problem area, algorithms have already been implemented for sentiment analysis in single languages. My contribution is to implement an algorithm for multilingual sentiment analysis that considers only English and Sinhala.

Ethical Aspects

Social:
● Impact on an individual's profile when unwanted parties notice the mood the analysis proposes for that individual
● Unexpected conflicts or arousals
● Impact on social media security
● Psychological stress or anxiety, or other harm or negative consequences beyond the risks encountered in normal life

Ethical:
● Privacy breaches
● Eavesdropping
● Ethical conduct of research
● Involvement of sensitive topics
● Deliberately misleading the participants

Legal:
● Data Protection Act violations
● Computer Misuse Act violations
● Data privacy violations
● Personal safety violations

Professional:
● Public interest
● Due regard for the public
● Privacy procedures

Work plan

Document Summary
As stated in the project aim, a dashboard will be developed to show the results of sentiment analysis on English, Sinhala and mixed-language social media posts.

References
https://www.researchgate.net/profile/Saminda_Premaratne/publication/268817500_Sentiment_Analysis_for_Social_Media/links/5478afec0cf293e2da2b2aa0.pdf

[3] Adobe® SocialAnalytics, powered by Omniture®.

[4] Brandwatch. [Online]. http://www.brandwatch.com

[5] Sentiment140. [Online]. http://www.sentiment140.com

https://www.adobe.com/aboutadobe/pressroom/pressreleases/201103/030911AdobeOmnitureSocialAnalytics.html

http://www.omniture.com/offer/1228?s_iid=38573

https://www.g2crowd.com/products/brandwatch/features

https://mashable.com/2009/07/13/tweetfeel/#PSBSnQbjhiqh

https://www.crunchbase.com/organization/tweetfeel#section-overview

https://www.pr.com/press-release/191119

http://ontotext.fbk.eu/Publications/CIKM05-short.pdf
