
Design and Validation of Sentiment Analysis Technique for Blogs

A Research Proposal
By

AJAY KUMAR
Reg. No. 1481022008

Under the Supervision of

Prof. (Dr.) Rama Sushil


Head, Dept. of IT
DIT University, Dehradun

Submitted in partial fulfillment of the requirements for the award of the degree of

Doctor of Philosophy
in
Computer Science & Engineering

DIT UNIVERSITY, DEHRADUN


(State Private University through State Legislature Act No. 10 of 2013, Uttarakhand and Approved by UGC)
Mussoorie-Diversion Road, Dehradun - 248 009 Uttarakhand, INDIA

Session 2016 -17


Department of Computer Science & Engineering
DIT UNIVERSITY, DEHRADUN, UTTARAKHAND

Name of Student : AJAY KUMAR

Date of Enrollment : 13 Sept, 2014

Registration No. : 1481022008

Branch : Computer Science & Engineering (CSE)

Title of Degree : Doctor of Philosophy (PhD)

Batch : 2014-17

Title of Synopsis : Design & Validation of Sentiment Analysis Technique for Blogs

Name of Supervisor : Prof. (Dr.) Rama Sushil


Head, Dept. of IT
DIT University Dehradun
Total no. of pages : 13

Table of Contents

S.No. Sections

1. Introduction

2. Literature Review

3. Identified Problem Gaps & Objectives

4. Motivation

5. Methodology

6. Time Scheduling & PERT Chart

7. Workshops & Conferences participated

8. References

Design and Validation of Sentiment Analysis Technique for Blogs - A PhD synopsis

1. Introduction
This is an era of massive data generated from every corner by heterogeneous organizations as well as
all types of users. Such massive data has been called a Data Tsunami [12]; technically it is called Big
Data. Big Data has become important nowadays because many organizations, both public and private,
have been collecting massive amounts of domain-specific information, which can contain useful
information about problems such as national intelligence, cyber security, fraud detection, marketing and
medical informatics. Companies such as Google and Microsoft are analyzing large volumes of data for
business analysis and prediction, which affects existing and future technology [13].
A key benefit of machine learning is the analysis of, and learning from, massive amounts of supervised
and unsupervised data, making it a valuable tool for Big Data analytics, where raw data is largely
unlabeled and uncategorized. Machine learning can address Big Data problems such as extracting
complex patterns from massive volumes of data, semantic indexing, data tagging, fast information
retrieval, and simplifying discriminative tasks. Big Data analytics may provide a sentiment analysis of
blog data using intelligent computational methods from machine learning. The polarity resulting from
sentiment analysis may be positive, negative or neutral.

Big data is a collection of data sets so large and complex that it becomes difficult to process using
on-hand database management tools or traditional data processing applications.

Big Data is characterized by variety, velocity, volume and veracity [15], where

Volume: the amount of data. IBM research finds that every day we add about 2.5 quintillion bytes
(2.5 × 10^18) of data; Facebook alone adds 500 TB of data on a daily basis; 90% of the world's data
was generated in the last 2 years. Google processes about 1 petabyte of data every year.

Velocity: the rate of data production and processing. The rate of data growth is also astonishing.
Gartner research finds that data is growing at a rate of 800%, of which 80% is unstructured. EMC
research indicates that data growth follows Moore's Law, doubling every 2 years.

Variety: the types of data. The data being added is of various types, ranging from unstructured
feeds, social media data and multimedia data to sensor data, etc.

Veracity: how much the data can be trusted, based on the reliability of its source.

Page 1 of 13
Ajay Kumar, PhD (PT-CSE) Research Scholar

Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective,
innovative forms of information processing for enhanced insight and decision making.

Fig: Various technologies used in Big Data (Predictive Analysis, Visualization, Data Mining,
Machine Learning Algorithm, Unstructured Processing, Natural Language Processing)

The term Big Data symbolizes the analysis and treatment of data repositories of a colossal size, which
traditional data management systems and analytics are unable to deal with [17]. Big Data analysis now
drives nearly every aspect of society, including mobile services, retail, manufacturing, financial services,
life sciences, and physical sciences [11].

Fig: Flowchart for a typical data analytics based solution [5] (stages shown: Data from various
resources, Data cleaning, Data integration, Data filtering, Model training, Validation, Model
scoring, Predictions, Recommendations)


Sentiment means feelings, attitudes, emotions and opinions. It is not fact but a subjective
impression. Sentiment analysis is a form of text mining that extracts or characterizes the sentiment
content of a text unit. Sentiments include for/against, like/dislike, good/bad, etc. Common questions
raised in sentiment analysis include:
- Is this product review positive or negative?
- Is this customer email satisfied or dissatisfied?
- Based on a sample of tweets, how are people responding to this ad campaign/product release?
- How have bloggers' attitudes about the president changed since the election?

Other related tasks include information extraction, which discards the subjective information.
Opinion-based questions may be handled in automated question answering. A further task is
summarization that accounts for multiple viewpoints.

Sentiment analysis can be applied to flame detection, identifying the suitability of a video for
children based on comments, bias identification in news broadcasting, and identifying
appropriate/inappropriate content for ad placement. In Business Intelligence, Sentiment Analysis (SA)
may be applied to a question such as: why aren't consumers buying our laptop? The answer can be
sought in online reviews of the product on Amazon, blogs, tweets, etc. Applications of SA have been
useful across domains such as politics/political science, law/policy making, sociology and
psychology. In general, humans are subjective creatures and opinions are important. Being able to
interact with people on that level has many advantages for information systems.

Challenges in Sentiment Analysis

1. People express opinions in complex ways.
2. In opinion text, lexical content alone can be misleading.
3. Intra-textual and sub-sentential reversal, negation and topic change are common.
4. Rhetorical devices/modes such as sarcasm, irony, implication, etc.
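
Challenges 2 and 3 can be illustrated with a minimal sketch. The word list, weights, negator set and
two-token negation window below are illustrative assumptions, not part of the proposed technique; the
sketch only shows how a purely lexical score can be misled by negation.

```python
import re

# Toy sentiment lexicon and negator list (illustrative assumptions).
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "poor": -1, "hate": -1}
NEGATORS = {"not", "no", "never", "isn't", "don't"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Sum lexicon weights word by word, ignoring context."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def negation_aware_score(text, window=2):
    """Flip the weight of words within `window` tokens after a negator."""
    score = 0
    flips_left = 0
    for tok in tokenize(text):
        if tok in NEGATORS:
            flips_left = window
            continue
        value = LEXICON.get(tok, 0)
        if flips_left > 0:
            value = -value
            flips_left -= 1
        score += value
    return score


def polarity(score):
    """Map a numeric score to positive, negative or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentence = "The camera is not good"
print(polarity(naive_score(sentence)))           # lexical content alone
print(polarity(negation_aware_score(sentence)))  # with negation handling
```

In this sketch the naive lexical score rates "The camera is not good" as positive, while the
negation-aware variant rates it negative. Sarcasm and irony (challenge 4) are handled by neither.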


2. Literature Review

In 2008, Ahmed Abbasi et al. [24] proposed an approach for classifying Web forum opinions in multiple
languages. The utility of stylistic and syntactic features was evaluated for sentiment classification
of English and Arabic content. A hybridized genetic algorithm called the Entropy Weighted Genetic
Algorithm (EWGA) incorporates the information-gain heuristic for key feature selection. EWGA was
evaluated on a movie review data set and on US and Middle Eastern web forum postings. Experiments
using EWGA with SVM achieved accuracies of over 91% on the data sets.

In 2009, Yue Lu et al. [26] discussed the short comments and ratings given by users for sellers,
products, services, etc. On the basis of a rated aspect summary of the short comments, an overall
rating can be concluded for major or minor selections. The authors defined the problem, decomposed
the solution into 3 steps, and demonstrated the effectiveness of their method using eBay sellers'
feedback comments.

In August 2009, Vishal Gupta et al. [19] surveyed various text mining techniques, text mining being
the automatic discovery of previously unknown information from written resources. The main text
mining operations are feature extraction, text-based navigation, search and retrieval, categorization
(supervised classification), clustering (unsupervised classification) and summarization. The authors
also explained trend analysis for documents collected over a period of time, and attribute analysis
of document features and patterns. Visualization builds a graphical representation of the document
collection and the location of specific documents in a 2D/3D representation.

In 2011, A. Abbasi et al. [25] proposed a rule-based multivariate text feature selection method
called Feature Relation Network (FRN) that considers semantic information and also leverages the
syntactic relationships between n-gram features. FRN is intended to enhance sentiment classification
for heterogeneous n-gram features. Comparative experiments were performed on uni-variate,
multivariate and hybrid feature selection methods by incorporating syntactic information about n-gram
relations. FRN was able to select features more efficiently than the multivariate and hybrid
techniques.

In 2012, Akshi Kumar et al. [20] presented a substantial body of research on sentiment analysis,
expounding its basic terminology, tasks and granularity levels since the launch of Web 2.0. The
authors gave an overview of state-of-the-art attempts to study sentiment analysis, and showed
practical and potential applications followed by well-described issues and challenges that will keep
the field dynamic and lively for years to come. They also extended the discussion to product reviews,
cross-domain analysis, blog posts, tweets and web pages.

In February 2014, Muhammad Zubair Asghar et al. [10] identified the most and least commonly used
feature selection techniques, with a view to future work. The authors focused on the opinions or
sentiments expressed by people about products, services, policies and politics, seeking the polarity
of a statement, i.e. whether it is positive or negative, a like or a dislike, shared by users about a
feature of a particular product or service. They also reviewed the substantial work done in text
analytics and feature extraction from statements.

At the LDAV conference in 2014, Hyoji Ha et al. [21] presented a new method to recognize intricate
networks and clusters by combining multi-dimensional scaling (MDS) with a social network graph (SNG),
with the features of each node then made comprehensible using heat map visualization. The authors
used Netizens' movie review data and proceeded as follows: 1) calculated the frequency of sentiment
words in each movie review, 2) designed a heat map visualization to capture the emotions appearing in
each movie's reviews, 3) designed a movie network using MDS and SNG, in terms of nodes and clusters,
in which each node's location reflects the frequencies of sentiment words, 4) used asterism graphics
to facilitate cognitive interpretation.

In April 2014, Walaa Medhat et al. [9] described Sentiment Analysis (SA) as a research field within
text mining. SA is the computational treatment of opinions, sentiments and subjectivity of text.
There are several related fields in SA, viz. transfer learning, emotion detection and building
resources. The authors note continuing enhancement of Sentiment Classification (SC) and Feature
Selection (FS) algorithms. Naive Bayes (NB) and Support Vector Machine (SVM) are the most frequently
used machine learning algorithms for solving sentiment classification problems. In building
resources, the most common lexicon source used is WordNet, which exists in languages other than
English. Using social network sites and micro-blogging sites as a source of data still needs deeper
analysis. More research is needed on content-based sentiment analysis. Using Transfer Learning (TL)
techniques, data related to the domain in question can be used as training data. The use of NLP tools
to reinforce the sentiment analysis process still needs some enhancement.

In July 2014, Haseena Rahmath P. et al. [8] explored the data generated on social media, e-commerce
sites, forums, blogs, etc., where users exchange their views, ideas, suggestions and experiences
about services or products. If the content of the web can be extracted and analyzed properly, it can
act as a key factor in decision making. However, the content is unstructured in nature and written in
natural language. The authors presented opinion mining, or sentiment analysis, as a new research area
that extends data mining to extract and analyze unstructured data automatically. They also gave a
comparative analysis of various techniques, in which the Support Vector Machine (SVM) technique
performed well, with higher accuracy than the other machine learning algorithms. New research can be
explored in the area of sentiment classification and sentiment analysis to handle short sentences,
abbreviations and spam content as well.

In December 2014, Giovanni Acampora et al. [22] explained that e-commerce through mobile as well as
desktop Internet has created a boom in product reviews. The authors introduced an innovative
framework for efficiently analyzing customer sentiments in textual reviews in order to compute their
corresponding numerical ratings, allowing companies to better plan future business activities.

In April 2015, Xing Fang et al. [23] aimed to tackle the problem of sentiment polarity
categorization. The data used for the study are online product reviews collected from amazon.com. The
authors experimented with both sentence-level categorization and review-level categorization for
better outcomes.

In May 2016, Devika M. D. et al. [1] discussed sentiment analysis as an intellectual process of
extricating users' feelings and emotions. Both travelers and customers look to such information for
their understanding. Sentiment analysis serves as a powerful tool for users to extract the needed
information as well as to aggregate the collective sentiments of reviewers' opinions. The authors
compared various techniques used for SA by analyzing various methodologies. Research work may be
carried out on better analysis methods that include semantics by considering n-gram evaluation
instead of word-by-word analysis. Other methods, such as rule-based and lexicon-based methods, may
also be explored.

In April 2016, Vishal A. Kharde et al. [2] discussed sentiment analysis of Twitter data, analyzing
microblogs where opinions are highly unstructured, heterogeneous and either positive, negative or, in
some cases, neutral. The authors also provided a survey and comparative analysis of existing
techniques for opinion mining, such as machine learning and lexicon-based approaches, together with
cross-domain evaluation metrics. They discussed general challenges and the application of various
machine learning algorithms such as Naive Bayes (NB), Max Entropy and Support Vector Machine (SVM) to
Twitter data for sentiment analysis. They showed that SVM and NB have the highest accuracy and can be
regarded as the baseline learning methods, while lexicon-based methods are very effective in some
cases and require little effort in human-labelled documents. They note that the cleaner the data, the
more accurate the results that can be obtained, and that a bigram model gives better accuracy than
other models. They suggested combining machine learning methods with opinion lexicon methods in order
to improve the accuracy of sentiment classification.
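
Several of the surveyed works report Naive Bayes as a baseline classifier and bigram features as
helpful. As a rough illustration only, the sketch below trains a multinomial Naive Bayes classifier
with add-one (Laplace) smoothing on unigram and bigram features. The four training sentences, the
labels and the names `features` and `NaiveBayes` are hypothetical; this is not the method of any cited
paper nor the technique proposed here.

```python
import math
import re
from collections import Counter, defaultdict


def features(text):
    """Return unigram and bigram features of a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.feature_counts = defaultdict(Counter)  # feature counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            feats = features(text)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # log prior
            for feat in features(text):
                if feat in self.vocab:  # features unseen in training are skipped
                    logp += math.log((counts[feat] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical toy training data.
train_texts = ["good phone", "great battery", "not good", "bad screen"]
train_labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("not good"))
print(model.predict("great phone"))
```

With the bigram `not_good` available as a feature, the classifier can separate "not good" from
"good", which a unigram-only model trained on the same data could not do as reliably.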


3. Research Gaps and Objective

1. Blogs contain large feature sets, which cause performance degradation due to computational cost;
proper feature selection methods are therefore essential [25].
2. The performance of clustering-based feature extraction techniques is domain dependent. Work is
needed on cross-domain approaches for the generalized problem [26].
3. Lexicon-structural features have seen limited use in feature extraction algorithms. They consist
of special symbol frequencies, word distributions and word-level lexical features, which are rarely
used in opinion mining [24].
4. Feature selection techniques depend on the accuracy of POS tagging; designing and developing an
efficient non-rule-based POS tagger is an issue to be resolved for non-English languages. Existing
POS taggers include the NLP linguistic parser, Stanford POS tagger, GATE ANNIE POS tagger, CLAWS POS
tagger and HMM-based models.
5. Lemmatization is more accurate than stemming. Lemmatization still needs to be developed and
evaluated for un-segmented languages (those with no clear word boundaries).
6. Redundancy removal is a challenging task. N-gram features are highly redundant, causing redundancy
problems in both uni-variate and bi-variate methods.
7. Because people write in different styles, the same sentence may be read as negative or positive.
Customer comments in free format need a lot of work to mine opinions because of abbreviated and short
words such as cam-camera, pic-picture, gud-good, gr8-great, f9-fine, etc.
8. Identification of spam and fake reviews in blogs is a serious concern.
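
The abbreviated words mentioned in gap 7 suggest a dictionary-based normalization step before any
opinion mining. The sketch below is one possible form of such a step; the mapping contains only the
five pairs listed above, and the function name `normalize` is a hypothetical choice. A real system
would need a much larger, probably learned, dictionary.

```python
import re

# Abbreviation pairs taken from the examples in gap 7.
ABBREVIATIONS = {
    "cam": "camera",
    "pic": "picture",
    "gud": "good",
    "gr8": "great",
    "f9": "fine",
}


def normalize(text):
    """Lowercase, drop punctuation and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in tokens)


print(normalize("Gr8 cam, gud pic!"))
```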

Research Work Plan Objective

In this research, the main objectives are the following:
1. Developing a POS tagger to enhance lemmatization, so that featured data sets can be generated
after removal of noise and stop words from the blogs.
2. Categorizing features using the lexicon-structural method for hot feature selection, using Apriori
association rule mining (or frequent feature mining) on the basis of adjective and adverb feature
indicators.
3. Developing a feature pruning algorithm to clean irrelevant features. In the first phase, a POS
tagger is used. In the next phase, nouns and noun phrases are cleansed. In the final step, the
nearest adjective may be identified for redundancy elimination.
4. Applying text mining methodology to the cleaned data to extract the polarity of the statements.
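
Objective 2 refers to Apriori-style frequent feature mining. As a hedged sketch of the idea, the code
below finds features and feature pairs whose support (the fraction of reviews containing them) meets
a threshold, building pairs only from features that are already frequent, which is the Apriori
pruning principle. It assumes candidate features have already been extracted from each review (for
example by a POS tagger); the reviews, the 0.5 threshold and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_features(transactions, min_support):
    """Return frequent single features and frequent feature pairs.

    transactions: list of lists, one list of candidate features per review.
    min_support:  minimum fraction of reviews that must contain an itemset.
    """
    n = len(transactions)
    # Pass 1: support of single features.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}
    # Pass 2: count only pairs made of frequent features (Apriori pruning).
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {p for p, c in pair_counts.items() if c / n >= min_support}
    return frequent_items, frequent_pairs


# Hypothetical candidate features extracted from four reviews.
reviews = [
    ["battery", "screen"],
    ["battery", "price"],
    ["battery", "screen", "camera"],
    ["screen"],
]
items, pairs = frequent_features(reviews, min_support=0.5)
print(sorted(items))
print(sorted(pairs))
```

Here "battery" and "screen" each appear in three of four reviews and together in two, so both the
single features and the pair meet the 0.5 threshold, while "price" and "camera" are pruned.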


4. Motivation
Motivation for sentiment analysis comes from today's era of hi-tech information technology. Social
media has become central to everyday life: nearly everyone uses social media apps on a mobile phone
or PC. Nowadays all users familiar with the Internet are able to generate their own data. The data
may be a blog, wall post, tweet, picture, selfie, SMS or various types of text messaging, including
multimedia files such as audio, video, pictures, contacts, etc. Organizations as well as Internet
users are thus creating a great deal of data on social media apps, and this data is increasing very
rapidly. Managing it is going to be tough from now on. Big Data is a new concept under which large
data sets can be managed through various tools and software.

We are going to work on the text fed by users into various social networking apps. The text may be
simple text, feedback, movie reviews, political tweets, sports news or TV channel information. We
will be able to find the polarity of a given statement. Polarity means the blog may be positive,
negative or neutral.


5. Research Methodology
A study of blog analysis details the usage of Machine Learning techniques. To meet our research
objective, our research will be divided into various phases.

Phase 1: Blogs collection. We will be collecting the blogs from Twitter.
Phase 2: Preprocessing, such as syntactic and semantic text analysis.
Phase 3: Feature categorization, meaning feature generation and feature identification of words found
in the strings.
Phase 4: Feature selection, which demands statistical calculation to find the count of each entity.
Phase 5: Feature cleansing, in which unwanted and redundant features are reduced.
Phase 6: Text/data mining: application of various algorithms and tools to find the actual text.
Phase 7: Result analysis and interpretation.

The collected data has to be extracted and its noisy data minimized to obtain the mined data.

Flowchart of phases: Blogs Collection, Pre Processing, Feature Categorization, Feature Selection,
Feature Cleansing, Text/Data Mining, Result Analysis & Interpretation
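
A minimal sketch of how Phase 2 (preprocessing) and the counting step of Phase 4 might fit together
is given below. It assumes the posts have already been collected as plain strings; the stop-word
list, the noise patterns (URLs and @-mentions) and the function names are illustrative assumptions
rather than the final design.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "a", "an", "is", "and", "of", "to", "it", "this"}


def preprocess(post):
    """Lowercase a post, strip URLs and @-mentions, drop stop words."""
    post = re.sub(r"https?://\S+|@\w+", " ", post.lower())
    tokens = re.findall(r"[a-z]+", post)
    return [t for t in tokens if t not in STOP_WORDS]


def feature_counts(posts):
    """Count how often each remaining token occurs across all posts."""
    counts = Counter()
    for post in posts:
        counts.update(preprocess(post))
    return counts


# Hypothetical sample posts.
posts = ["The screen is great", "Great screen and battery"]
print(preprocess("@shop The screen is great http://t.co/x #Phone"))
print(feature_counts(posts).most_common())
```

The resulting counts could then feed the feature selection and cleansing phases, for instance by
keeping only tokens above a frequency threshold.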


6. PERT Chart and Time Scheduling


No.  Activities                                        Start Date  Duration   End Date
1.   Course Work: 1st Sem of PhD Program               22/09/14    3 months   31/12/14
       i. Research Methodology (4 credit)
       ii. Cloud Technologies (4 credit)
2.   Course Work: 2nd Sem of PhD Program               19/01/15    6 months   30/06/15
       i. Fuzzy Logic & Genetic Algorithm (4 credit)
       ii. Distributed System (4 credit)
       iii. Seminar (1 credit)
3.   Literature Review & Research Proposal             1/07/15     13 months  01/08/16
     Preparation
4.   RDC-1: Research Proposal Presentation             22/08/16    1 day      27/08/16
5.   Working for fulfillment of Phase-1                28/08/16    4 months   28/12/16
6.   RDC-2: Internal Progress Presentation             1/01/17     1 day      7/01/17
7.   Processing work of Phase-2                        10/01/17    6 months   10/07/17
     Syntactic & Semantic text analysis
8.   RDC-3: Progress Presentation                      1/08/17     1 day      8/08/17
9.   Literature Review for Feature categorization      9/08/17     5 months   1/01/18
     and identification of strings (Phase-3) &
     Working on Feature Selection (Phase-4)
10.  RDC-4: Progress Presentation                      5/01/18     1 day      12/01/18
11.  Working on Feature Cleansing (Phase-5) &          13/01/18    6 months   13/07/18
     Implementation of various Machine Learning
     Algorithm (Phase-6)
12.  RDC-5: Progress Presentation                      20/08/18    1 day      27/08/18
13.  Result Analysis (Phase 7)                         1/09/18     4 months   1/01/19
14.  RDC-6: Final Presentation                         7/01/19     1 day      14/01/19
15.  Thesis writing                                    15/01/19    5 months   15/06/19
16.  Thesis Submission                                 16/06/19    1 day      20/06/19
     Total Time: 4 years 4 months


7. Workshops & Conferences Participated


Faculty Development Programme (FDP) / Workshop / STC attended
1. Participated in a one-week STC on "Open Source Technologies" conducted by NITTTR Chandigarh and
organized by the MCA Department at DIT University, Dehradun, from Aug 28 to Sept 2, 2016.
2. Participated in a 2-week FDP workshop on "Emerging Trends in Computer Science and IT" in
association with CSI Division-I & ISTE Delhi Section, organized by Bharati Vidyapeeth's Institute of
Computer Applications and Management (BVICAM), New Delhi, from May 16-27, 2016.
3. Participated in a 2-day FDP workshop on "Cloud Computing and its Application" conducted by IMS
Unison University, Dehradun, from Feb 6-7, 2016.
4. Participated in a Brain Storming Session on "Skill of Scientific Project Dissertation and paper
writing" of the 10th Uttarakhand State Science & Technology Congress (USSTC-2016) at UCoST, Dehradun,
on Feb 10, 2016.
5. Participated in a one-week AICTE-recognized STC workshop on "Mobile Computing" through ICT
conducted by NITTTR Chandigarh and hosted by the ECE Dept., DIT University, Dehradun, from Jan 25-29,
2016.
6. Participated in a one-week AICTE-recognized STC workshop on "Cloud Computing" via ICT conducted by
NITTTR Chandigarh and hosted by the IT Dept., DIT University, Dehradun, from Oct 12-16, 2015.
7. Participated in a one-day National workshop on "Intellectual Property Rights in Knowledge
Economy" (IPRKE-2015) organized by the MBA Dept., DIT University, Dehradun, in association with the
Uttarakhand State Council for Science and Technology (UCoST), Dehradun, on Mar 30, 2015.


8. References
[1] Devika M. D., Sunitha C., Amal Ganesh, "Sentiment Analysis: A Comparative Study on Different
Approaches", Fourth International Conference on Recent Trends in Computer Science & Engineering
(ICRTCSE 2016), Chennai, India, Procedia Computer Science, Elsevier, DOI:
10.1016/j.procs.2016.05.124, ISSN: 1877-0509, pp 44-49, 2016.
[2] Vishal A. Kharde, S. S. Sonawane, "Sentiment Analysis of Twitter Data: A Survey of Techniques",
International Journal of Computer Applications (IJCA), ISSN 0975-8887, Vol. 139, Issue 11, pp 5-15,
April 2016.
[3] Erton Boci, Susan Thistlethwaite, "A Novel Big Data Architecture in Support of ADS-B Data
Analytic", Exelis, Apr 21, 2015.
[4] Praveen Kshirsagar, "Data Science: the next frontier for business competitiveness", Veravision
Consulting P. Ltd., Pune, CSI Communication, July 2015, page 8.
[5] Pritee Parwekar, Suresh Chandra Satapathy, "Leveraging Bigdata towards enabling analytics based
intrusion detection system in WSN", ANITS, Vishakhapatnam, CSI Communication, June 2015, page 12.
[6] Maryam M. Najafabadi, Flavio Villanustre, Taghi M. Khoshgoftaar, Naeem Seliya, Randall Wald, Edin
Muharemagic, "Deep learning applications and challenges in big data analytics", Springer Journal of
Big Data, 2015, 2:1, DOI: 10.1186/s40537-014-0007-7.
[7] Victoria López, Sara del Río, José Manuel Benítez, Francisco Herrera, "On the use of MapReduce to
build Linguistic Fuzzy Rule Based Classification Systems for Big Data", 2014 IEEE International
Conference on Fuzzy Systems (FUZZ-IEEE), July 6-11, 2014, Beijing, China.
[8] Haseena Rahmath, Tanvir Ahmad, "Sentiment Analysis Techniques - A Comparative Study",
International Journal of Computational Engineering & Management (IJCEM), Vol. 17, Issue 4, pp 25-29,
ISSN (Online): 2230-7893, July 2014.
[9] Walaa Medhat, Ahmed Hassan, Hoda Korashy, "Sentiment Analysis algorithms and Applications: A
Survey", Ain Shams Engineering Journal (www.elsevier.com/locate/asej), 2014, ISSN 2090-4479, pp
1093-1113, DOI: 10.1016/j.asej.2014.04.011, April 19, 2014.
[10] Muhammad Zubair Asghar, Aurangzeb Khan, Shakeel Ahmad, Fazal Masud Kundi, "A Review of Feature
Extraction in Sentiment Analysis", Journal of Basic and Applied Scientific Research, 2014, ISSN
2090-4304, pp 181-186, Feb 18, 2014.
[11] H. V. Jagadish, Johannes Gehrke, Alexandros Labrinidis, Yannis Papakonstantinou, Jignesh M.
Patel, Raghu Ramakrishnan, Cyrus Shahabi, "Big Data and Its Technical Challenges", Review Article,
Communications of the ACM, Vol. 57, No. 7, July 2014, DOI: 10.1145/2611567.
[12] Arantxa Duque Barrachina, Aisling O'Driscoll, "A big data methodology for categorizing technical
support requests using Hadoop and Mahout", Springer Journal of Big Data, 2014, 1:1,
http://www.journalofbigdata.com/content/1/1


[13] D. Laney, "3D Data Management: Controlling Data Volume, Velocity, and Variety", 2001. [Online;
accessed January 2014]
http://blogs.gartner.com/doug-laney/files/2012/01/ad949-3D-Data-Management-Controlling-Data-Volume-Velocity-and-Variety.pdf
[14] Dilpreet Singh, Chandan K. Reddy, "A survey on platforms for big data analytics", Springer
Journal of Big Data, 2014, 1:8, http://www.journalofbigdata.com/content/1/1/8
[15] Shailesh Kumar Shivakumar, "Big Data - A Big game changer", Infosys, CSI Communication, April
2013, page 9.
[16] Lars Marius Garshol, "Introduction to Machine Learning"
(Big-data-101-long-130515154138-phpapp01.pptx), Bouvet, 2012-05-15.
[17] P. Zikopoulos, C. Eaton, D. DeRoos, T. Deutsch, George Lapis, "Understanding Big Data: Analytics
for Enterprise Class Hadoop and Streaming Data", McGraw-Hill, 2011.
[18] http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
[19] Vishal Gupta, Gurpreet Singh Lehal, "A Survey of Text Mining techniques and Applications",
Journal of Emerging Technologies in Web Intelligence (JETWI), Vol. 1, No. 1, pp 60-76, August 2009.
[20] Akshi Kumar, Teeja Mary Sebastian, "Sentiment Analysis: A Perspective on its Past, Present and
Future", I.J. Intelligent Systems and Applications, 2012, 10, pp 1-14, published online September
2012 in MECS (http://www.mecs-press.org/), DOI: 10.5815/ijisa.2012.10.01.
[21] Hyoji Ha, Gi-nam Kim, Wonjoo Hwang, Hanmin Choi, Kyungwon Lee, "CosMovis: Analyzing semantic
network of sentiment words in movie reviews", IEEE 4th Symposium on Large Data Analysis and
Visualization (LDAV), 9-10 Nov 2014, DOI: 10.1109/LDAV.2014.7013215.
[22] Giovanni Acampora, Georgina Cosma, "A hybrid computational intelligence approach for efficiently
evaluating customer sentiments in E-commerce reviews", IEEE 2014 Symposium on Intelligent Agents
(IA), DOI: 10.1109/IA.2014.7009461, 2014.
[23] Xing Fang, Justin Zhan, "Sentiment analysis using product review data", SpringerOpen Journal of
Big Data, 2015, DOI: 10.1186/s40537-015-0015-2.
[24] Ahmed Abbasi, Hsinchun Chen, Arab Salem, "Sentiment Analysis in Multiple Languages: Feature
Selection for Opinion Classification in Web Forums", ACM Transactions on Information Systems, Vol.
26, No. 3, Article 12, June 2008.
[25] A. Abbasi, Stephen France, Zhu Zhang, Hsinchun Chen, "Selecting attributes for sentiment
classification using feature relation networks", IEEE Transactions on Knowledge and Data Engineering,
Vol. 23, Issue 3, pp 447-462, 2011.
[26] Yue Lu, ChengXiang Zhai, Neel Sundaresan, "Rated aspect summarization of short comments", ACM
18th International Conference on World Wide Web, pp 131-140, Madrid, Spain, 2009.
