
PREDICTING THE REVIEWS OF THE RESTAURANT

USING NATURAL LANGUAGE PROCESSING TECHNIQUE TO IMPROVE


RESTAURANT SERVICES

A project report submitted in partial fulfillment of the requirements for the

award of the degree of


BACHELOR OF TECHNOLOGY

IN

COMPUTER SCIENCE AND ENGINEERING

by
D.NAMRATHA (15A31A0566)

CH.MEGHANA (15A31A0564)

S.MOUNICA (15A31A0590)

K.SRUTHI (15A31A0575)

G.POOJA PRIYANKA (15A31A0570)

Under the Esteemed Guidance of


Internal Guide Head of the Department

Mr.T.Soma Sekhar Dr.M.Radhika Mani

Professor Professor & HOD

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

PRAGATI ENGINEERING COLLEGE


(Approved by AICTE & Permanently Affiliated to JNTUK & Accredited by NBA and NAAC)
1-378,ADB Road, Surampalem, E.G.Dist., A.P, Pin-533437.

2018-2019

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING

PRAGATI ENGINEERING COLLEGE


(Approved by AICTE & Permanently Affiliated to JNTUK & Accredited by NBA and NAAC)
1-378, ADB Road, Surampalem, E.G.Dist., A.P, Pin-533437.

CERTIFICATE

This is to certify that the Project Report entitled “Predicting The Reviews Of The
Restaurant Using Natural Language Processing Technique”, that is being submitted by
D.NAMRATHA (15A31A0566), CH.MEGHANA (15A31A0564), S.MOUNICA (15A31A0590),
K.SRUTHI (15A31A0575), G.POOJA PRIYANKA (15A31A0570) in partial fulfillment for the
award of the Degree of Bachelor of Technology in Computer Science and Engineering,
Pragati Engineering College, is a record of bonafide work carried out by them.

Internal Guide Head of the Department

Mr.T.Soma Sekhar Dr.M.Radhika Mani

Professor Professor & HOD

External Examiner

ACKNOWLEDGEMENTS

Undertaking the project work “Predicting The Reviews Of The Restaurant Using Natural Language
Processing Technique” gives us the opportunity to express our special thanks to Dr. P. Krishna
Rao, Chairman of Pragati Engineering College, Surampalem.

We are extremely thankful to our honorable principal Dr. S. Sambhu Prasad, who has shown keen
interest in us and encouraged us by providing all the facilities to complete our project successfully.

We owe our gratitude to our beloved Head of the Department of CSE, Dr. M. Radhika Mani, for
assisting us in completing our project work.

We express our sincere thanks to our guide Mr. T. Soma Sekhar, who has been a source of inspiration
for us throughout our project, for his valuable advice in making our project a success.

We wish to express our sincere thanks to all teaching and non-teaching staff of the Computer Science
and Engineering Department.

D.NAMRATHA (15A31A0566)

CH.MEGHANA (15A31A0564)

S.MOUNICA (15A31A0590)

K.SRUTHI (15A31A0575)

G.POOJA PRIYANKA (15A31A0570)

ABSTRACT
In the era of the web, a huge amount of information is now flowing over the network. Since the
range of web content covers subjective opinion as well as objective information, it is now common
for people to gather information about products and services that they want to buy. However, since
a considerable amount of information exists as text fragments without any kind of
numerical scale, it is hard to classify their evaluation efficiently without reading the full text. Here we
focus on extracting scored ratings from text fragments on the web and suggest various
experiments in order to improve the quality of a classifier. Methodologies such as sentiment analysis
as a text classification problem and sentiment analysis as feature classification, with mathematical
treatment, are explored. Of late, word-of-mouth opinions expressed online have become more valuable, as
people choose restaurants after reading the reviews.

Keywords: Sentiment Analysis, Naive Bayes, Support Vector Machine

CONTENTS

S.NO DESCRIPTION PAGE NO

ACKNOWLEDGEMENTS ...................................................................................................iii

ABSTRACT ............................................................................................................................iv

CONTENTS.............................................................................................................................v

LIST OF FIGURES ................................................................................................................vi

LIST OF TABLES ..................................................................................................................vi

1. INTRODUCTION..............................................................................................................1

2. LITERATURE SURVEY...................................................................................................3

3. SYSTEM ANALYSIS.......................................................................................................5

3.1 EXISTING SYSTEM .........................................................................................................5

3.2 PROPOSED SYSTEM .......................................................................................................6

4. SYSTEM DESIGN ...........................................................................................................8

4.1 SYSTEM ARCHITECTURE ..............................................................................................8

4.2 UML REPRESENTATION .................................................................................................9

5. SYSTEM IMPLEMENTATION.....................................................................................15

5.1 MODULES ..........................................................................................................................15

5.2 SYSTEM REQUIREMENTS ..............................................................................................15

5.3 SOFTWARE ENVIRONMENT ..........................................................................................16

6. SYSTEM TESTING .........................................................................................................18

6.1 TESTING OBJECTIVES .....................................................................................................18

6.2 TEST PLAN .........................................................................................................................18

6.3 TEST CASES .......................................................................................................................20

6.4 EXPERIMENTAL RESULTS .............................................................................................22

7. SCREENSHOTS ...............................................................................................................24

8. CONCLUSION AND FUTURE WORK ........................................................................37

9. REFERENCES ..................................................................................................................38

10. SOURCE CODE ............................................................................................................40

LIST OF FIGURES

S.NO DESCRIPTION PAGE NO

Figure 3-1 Example for bloom filter……...………………………….……..........................6

Figure 4-1 System model architecture……...………………………….……........................8

Figure 4-2 Use Case Diagram for end user ...........................................................................9

Figure 4-3 Use Case Diagram for data consumer..................................................................10

Figure 4-4 Use Case Diagram for attribute............................................................................11

Figure 4-5 Use Case Diagram for cloud server......................................................................11

Figure 4-6 Class Diagram .....................................................................................................12

Figure 4-7 Sequence Diagram for end user............................................................................13

Figure 4-8 Sequence Diagram for data consumer..................................................................13

Figure 4-9 Sequence Diagram attribute.................................................................................14

Figure 4-10 Sequence Diagram for cloud server.....................................................................14

LIST OF TABLES

S.NO DESCRIPTION PAGE NO

Table 6-1 End user login page Test Cases .............................................................................20

Table 6-2 User registration form Test Cases .........................................................................21

Table 6-3 User file uploading Test Cases...............................................................................21

Predicting The Reviews Of The Restaurant Using Natural Language Processing Technique

1. INTRODUCTION

Businesses often want to know what customers think about the quality of their services in order to
improve and increase profits. Restaurant goers may want to learn from others’ experience using
a variety of criteria such as food quality, service, ambience, discounts and value for money. Users may
post their reviews and ratings on businesses and services or simply express their thoughts on other
reviews. Bad (negative) reviews may influence the decisions of potential customers,
e.g., a potential customer may cancel a service and persuade others to do the
same. The question is to quantify how customers and businesses are influenced and how business ratings
change in response to recent feedback.

In this project we use the Naive Bayes algorithm. Naive Bayes is a simple technique for constructing
classifiers: models that assign class labels to problem instances, represented as vectors
of feature values, where the class labels are drawn from some finite set. There is not a
single algorithm for training such classifiers, but a family of algorithms based on a common
principle: all Naive Bayes classifiers assume that the value of a particular feature is independent of
the value of any other feature, given the class variable. For example, a fruit may be considered to be
an apple if it is red, round, and about 10 cm in diameter. A Naive Bayes classifier considers each of
these features to contribute independently to the probability that this fruit is an apple, regardless of
any possible correlations between the color, roundness, and diameter features.
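The independence assumption can be illustrated with a small numeric sketch of the fruit example. The
priors and per-feature likelihoods below are invented for illustration only and are not taken from the
project data; the function name naive_bayes_posterior is likewise hypothetical.

```python
# Hypothetical priors and per-class feature likelihoods (illustrative numbers only).
priors = {"apple": 0.5, "other": 0.5}
likelihoods = {
    "apple": {"red": 0.8, "round": 0.9, "about_10cm": 0.7},
    "other": {"red": 0.3, "round": 0.4, "about_10cm": 0.2},
}


def naive_bayes_posterior(observed, priors, likelihoods):
    """Return P(class | observed features) under the independence assumption."""
    scores = {}
    for label, prior in priors.items():
        score = prior
        for feature in observed:
            # Each feature contributes its own factor, independently of the others.
            score *= likelihoods[label][feature]
        scores[label] = score
    total = sum(scores.values())
    # Normalize so that the class probabilities sum to one.
    return {label: score / total for label, score in scores.items()}


posterior = naive_bayes_posterior(["red", "round", "about_10cm"], priors, likelihoods)
print(posterior)
```

With these assumed numbers the three factors multiply to a much larger score for "apple" than for
"other", even though no correlation between color, roundness and diameter is modeled.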

In this project we used Natural Language Processing (NLP) techniques for pre-processing the
text. NLP is an area of computer science and artificial intelligence concerned with the interactions
between computers and human (natural) languages, in particular how to program computers to
process and analyze large amounts of natural language data. Combined with machine learning, it
allows text to be analyzed and used for predictive analysis.

Scikit-learn is a free software machine learning library for the Python programming language. It is
largely written in Python, with some core algorithms written in Cython for
performance. Cython is a superset of the Python programming language, designed to give C-like
performance with code that is written mostly in Python.

Pragati Engineering College Page 1



Here we focus on the task of sentiment categorization, which takes a segment of unlabeled text and
attempts to classify the text according to overall sentiment. In this project, we apply natural
language processing techniques to classify a set of restaurant reviews based on the number of stars
that each review received. More specifically:

• We develop a classifier to categorize each review from 1-star to 5-stars.
• We implement a set of features that we believe to be relevant to the sentiment
expressed in reviews and analyze their effect on performance, providing insights into what
works and why sentiment categorization can be so difficult.
• We analyze how a review’s conformance to a particular language model can be affected by
the sentiment of the review.
• We experiment with different linguistically motivated models of sentiment expression,
again using the results to improve the performance of our classifier.
• We examine the effects of part-of-speech tagging on our ability to predict sentiment.


2. LITERATURE SURVEY

This section reviews literature on machine learning. In machine learning, naive Bayes classifiers
are a family of simple "probabilistic classifiers" based on applying Bayes' theorem with strong
(naive) independence assumptions between the features.

Naive Bayes has been studied extensively since the 1960s. It was introduced (though not under that
name) into the text retrieval community in the early 1960s, and remains a popular (baseline)
method for text categorization, the problem of judging documents as belonging to one category or
the other (such as spam or legitimate, sports or politics, etc.) with word frequencies as the features.
With appropriate pre-processing, it is competitive in this domain with more advanced methods
including support vector machines.

Bo Pang et al. used machine learning techniques to investigate the effectiveness of classifying
documents by overall sentiment. Their experiments demonstrated that machine learning techniques
outperform human-produced baselines for sentiment analysis on movie review data. The
experimental setup consisted of a movie-review corpus with 700 randomly selected positive-sentiment
and 700 negative-sentiment reviews. Features based on unigrams and bigrams were used for
classification. The learning methods employed were Naive Bayes, maximum entropy classification and
support vector machines. However, the accuracy achieved in
sentiment classification was much lower than that of topic-based categorization.

Zhu et al. proposed aspect-based opinion polling from free-form textual customer reviews. The
aspect-related terms used for aspect identification were learnt using a multi-aspect bootstrapping
method. A proposed aspect-based segmentation model segments each multi-aspect sentence into
single-aspect units, which are then used for opinion polling. Using an opinion polling algorithm, they
tested on real Chinese restaurant reviews, achieving 75.5 percent accuracy in aspect-based opinion
polling tasks. This method is easy to implement and is applicable to other domains such as product or
movie reviews.

Jeonghee Yi et al. proposed a Sentiment Analyzer to extract opinions about a subject from online
documents. The Sentiment Analyzer uses natural language processing techniques: it
finds all references to the subject and determines the sentiment polarity of each reference.
The analysis utilized a sentiment lexicon and a
sentiment pattern database for extraction and association purposes. Online product review articles
for digital cameras and music were analyzed using the system with good results.


3. SYSTEM ANALYSIS

3.1 EXISTING SYSTEM

Many researchers have previously run experiments to classify customer sentiment on different
datasets. Turney (2002) used a semantic orientation algorithm to classify reviews based
on the numbers of positively oriented and negatively oriented phrases in each review. Pang et al.
(2002) used machine learning tools such as Maximum Entropy and Support Vector Machine
(SVM) classifiers to classify movie reviews using a number of simple textual features.

3.1.1 Algorithms used in existing system

3.1.1.1 Semantic Orientation

The classification of a review is predicted by the average semantic orientation of the phrases in the
review that contain adjectives or adverbs. A phrase has a positive semantic orientation when it has
good associations (e.g.,"subtle nuances") and a negative semantic orientation when it has bad
associations (e.g.,"very cavalier"). The semantic orientation of a phrase is calculated as the mutual
information between the given phrase and the word "excellent" minus the mutual information
between the given phrase and the word "poor". A review is classified as recommended if the
average semantic orientation of its phrases is positive.
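The calculation described above can be sketched as follows. This is a minimal illustration, assuming
that mutual information is measured as pointwise mutual information (PMI) over co-occurrence counts;
the counts, the corpus size and the function names are hypothetical.

```python
import math


def pmi(joint, count_a, count_b, total):
    """Pointwise mutual information (base 2) computed from raw counts."""
    return math.log2((joint / total) / ((count_a / total) * (count_b / total)))


def semantic_orientation(counts, total):
    """SO(phrase) = PMI(phrase, 'excellent') - PMI(phrase, 'poor')."""
    return (pmi(counts["with_excellent"], counts["phrase"], counts["excellent"], total)
            - pmi(counts["with_poor"], counts["phrase"], counts["poor"], total))


# Hypothetical co-occurrence counts in a corpus of 10,000 text windows.
TOTAL = 10_000
phrases = {
    "subtle nuances": {"phrase": 40, "excellent": 500, "poor": 500,
                       "with_excellent": 16, "with_poor": 2},
    "very cavalier": {"phrase": 30, "excellent": 500, "poor": 500,
                      "with_excellent": 2, "with_poor": 8},
}

orientations = {p: semantic_orientation(c, TOTAL) for p, c in phrases.items()}
average = sum(orientations.values()) / len(orientations)
# A review is recommended when the average orientation of its phrases is positive.
label = "recommended" if average > 0 else "not recommended"
print(orientations, label)
```

Under these assumed counts "subtle nuances" receives a positive orientation and "very cavalier" a
negative one, and the sign of the average decides the label of the review.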

3.1.1.2 Maximum Entropy

The Max Entropy classifier is a probabilistic classifier which belongs to the class of exponential
models. Unlike the Naive Bayes classifier, the Max Entropy classifier
does not assume that the features are conditionally independent of each other. MaxEnt is based
on the Principle of Maximum Entropy: from all the models that fit the training data, it selects the
one which has the largest entropy. The Max Entropy classifier can be used to solve a large variety
of text classification problems such as language detection, topic classification, sentiment analysis
and more.
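As a rough sketch of what an exponential model looks like at prediction time, the following computes
class probabilities as a normalized exponential of weighted feature sums. The weights are invented
for illustration; in practice they would be fitted to training data, which is not shown here. Note that
overlapping features (the unigrams "not" and "good" together with the bigram "not good") can be
used side by side because no independence between them is assumed.

```python
import math


def maxent_probabilities(features, weights):
    """P(label | features) for an exponential model with per-label feature weights."""
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }
    peak = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(score - peak) for label, score in scores.items()}
    norm = sum(exps.values())
    return {label: e / norm for label, e in exps.items()}


# Hypothetical weights; a trained model would learn these from labelled reviews.
weights = {
    "positive": {"good": 1.2, "not": -0.4, "not good": -2.0},
    "negative": {"good": -0.8, "not": 0.5, "not good": 1.5},
}

probs = maxent_probabilities({"good": 1, "not": 1, "not good": 1}, weights)
print(probs)
```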


3.1.1.3 Support Vector Machine (SVM)

“Support Vector Machine” (SVM) is a supervised machine learning algorithm which can be used
for both classification and regression problems. However, it is mostly used for classification.
In this algorithm, we plot each data item as a point in n-dimensional space (where n is the
number of features) with the value of each feature being the value of a particular
coordinate. Then, we perform classification by finding the hyperplane that best separates the two
classes.
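The sketch below shows only the decision rule of a linear SVM once a separating hyperplane is known;
it does not train one. The hyperplane (w, b) and the four two-dimensional points are assumptions
chosen so that the arithmetic is easy to follow.

```python
import math


def decision_value(point, w, b):
    """Signed score w . x + b of a point relative to the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, point)) + b


def classify(point, w, b):
    """Assign +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(point, w, b) >= 0 else -1


# Hypothetical hyperplane x1 + x2 - 3 = 0 in a two-feature space.
w, b = (1.0, 1.0), -3.0

# Each point (one coordinate per feature) mapped to its true class.
points = {(1.0, 1.0): -1, (0.0, 2.0): -1, (2.0, 2.0): 1, (3.0, 1.0): 1}

predictions = {p: classify(p, w, b) for p in points}
# Width of the margin between the two classes for this hyperplane: 2 / ||w||.
margin_width = 2 / math.sqrt(sum(wi * wi for wi in w))
print(predictions, margin_width)
```

Training an SVM amounts to choosing w and b so that this margin is as wide as possible while the
points stay on the correct side.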

3.1.2 Drawbacks of the existing system

• These approaches were applied to binary (positive or negative) classification, which is not
the case with restaurant reviews rated from 1 to 5 stars.

• From a practical point of view, perhaps the most serious problem with SVMs is the
high algorithmic complexity and extensive memory requirements of the required quadratic
programming in large-scale tasks.

• If a categorical variable has a category in the test data set that was not observed in the training data
set, the model assigns it a zero probability and is unable to make a prediction. This
is often known as “Zero Frequency”.
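The zero-frequency problem, and the usual remedy of additive (Laplace) smoothing, can be shown with
a few lines of code. The toy word counts and the function name word_likelihood are assumptions made
for this illustration.

```python
from collections import Counter


def word_likelihood(word, class_counts, vocabulary_size, alpha=0.0):
    """P(word | class); alpha > 0 applies additive (Laplace) smoothing."""
    total = sum(class_counts.values())
    return (class_counts[word] + alpha) / (total + alpha * vocabulary_size)


# Toy training text for the positive class.
positive_counts = Counter("good food good service great place".split())
# 'delicious' is seen only at test time, never in training.
vocabulary = set(positive_counts) | {"delicious"}

unsmoothed = word_likelihood("delicious", positive_counts, len(vocabulary))
smoothed = word_likelihood("delicious", positive_counts, len(vocabulary), alpha=1.0)
print(unsmoothed, smoothed)
```

Without smoothing the unseen word gets probability zero, which wipes out the whole product of
likelihoods; with alpha = 1 it receives a small non-zero probability instead.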


3.2 PROPOSED SYSTEM

Our proposed system applies natural language processing techniques to classify a set of
restaurant reviews based on the number of stars that each review received. We develop a Naive
Bayes classifier (see 3.2.1.1) to categorize each review from 1-star to 5-stars. We implement a set of features
that we believe to be relevant to the sentiment expressed in reviews and analyze their effect on
performance, providing insights into what works and why sentiment categorization can be so
difficult. We analyze how a review’s conformance to a particular language model can be affected by
the sentiment of the review.

We experiment with different linguistically motivated models of sentiment expression, again using
the results to improve the performance of our classifier. We examine the effects of part-of-speech
tagging on our ability to predict sentiment. We also experimented with different methods of
preprocessing the data. Because the reviews are unstructured in terms of user input, they can
look like anything from a paragraph of well-formatted text to a jumble of seemingly unrelated
words to a run-on sentence with no apparent regard for grammar or punctuation. Our initial pass
over the data simply tokenized the reviews based on whitespace and treated each token as a
unigram, but we were able to improve performance by removing punctuation in addition to the
whitespace and converting all letters to lowercase.
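The difference between the two tokenization passes can be sketched as below. The function names and
the sample review are hypothetical; the report's actual preprocessing code may differ.

```python
import string


def whitespace_tokens(review):
    """Initial pass: split on whitespace only."""
    return review.split()


def normalized_tokens(review):
    """Improved pass: lowercase, strip punctuation, then split on whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return review.lower().translate(table).split()


review = "Good food. The service was good, really Good!"
print(whitespace_tokens(review))
print(normalized_tokens(review))
```

After the first pass the tokens "Good", "good," and "Good!" are three different unigrams; after
normalization they all collapse to the single unigram "good".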

In this way, we treat the occurrences of “good”, “Good”, and “good.” all as the same, which gives
better predictive power to any test set review containing any of these three forms. Before converting
the text into unigrams, stemming was also applied, which means that the various inflected forms (e.g., tenses)
of a word were reduced to a single stem and treated as one word. After the matrix is built, infrequent words
are removed by setting a threshold in order to improve the accuracy. Our matrix therefore includes
the relevant unigrams and bigrams that occur more than the threshold number of times.
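A minimal sketch of building such a count matrix over unigrams and bigrams with a frequency
threshold is given below. Stemming is omitted, the threshold is expressed as a minimum count
(min_count, an assumed parameter name), and the tokenized reviews are toy data.

```python
from collections import Counter


def ngrams(tokens):
    """Unigrams plus adjacent-word bigrams of one tokenized review."""
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def build_matrix(token_lists, min_count=2):
    """Return (vocabulary, rows), keeping n-grams seen at least min_count times overall."""
    totals = Counter(gram for tokens in token_lists for gram in ngrams(tokens))
    vocabulary = sorted(gram for gram, n in totals.items() if n >= min_count)
    rows = []
    for tokens in token_lists:
        counts = Counter(ngrams(tokens))
        # One row per review, one column per retained n-gram.
        rows.append([counts[gram] for gram in vocabulary])
    return vocabulary, rows


reviews = [
    ["food", "was", "not", "good"],
    ["service", "was", "not", "good"],
    ["food", "was", "good"],
]
vocabulary, matrix = build_matrix(reviews, min_count=2)
print(vocabulary)
print(matrix)
```

Here the rare unigram "service" is dropped, while the bigram "not good" is kept as its own column,
which lets a classifier distinguish it from a plain "good".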


3.2.1 Algorithm used in proposed system

3.2.1.1 Naive Bayes

The proposed system uses Naive Bayes. It is a classification technique based on Bayes’ Theorem with
an assumption of independence among predictors. In simple terms, a Naive Bayes classifier
assumes that the presence of a particular feature in a class is unrelated to the presence of any other
feature. For example, a fruit may be considered to be an apple if it is red, round, and about 3 inches
in diameter. Even if these features depend on each other or upon the existence of the other features,
all of these properties are treated as contributing independently to the probability that this fruit is an
apple, and that is why it is known as ‘Naive’.

A Naive Bayes model is easy to build and particularly useful for very large data sets. Despite its
simplicity, Naive Bayes is often competitive with more sophisticated classification methods.
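To make the technique concrete, here is a small from-scratch sketch of a multinomial Naive Bayes text
classifier with add-one smoothing. It is an illustration under stated assumptions, not the project's
implementation: the four training reviews, the two labels and the function names train and predict
are invented for the example.

```python
import math
from collections import Counter, defaultdict


def train(labelled_reviews, alpha=1.0):
    """Fit class priors and smoothed word likelihoods, stored as logarithms."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for tokens, label in labelled_reviews:
        class_docs[label] += 1
        word_counts[label].update(tokens)
    vocabulary = {w for counts in word_counts.values() for w in counts}
    model = {}
    for label in class_docs:
        total = sum(word_counts[label].values()) + alpha * len(vocabulary)
        model[label] = {
            "log_prior": math.log(class_docs[label] / len(labelled_reviews)),
            "log_likelihood": {
                w: math.log((word_counts[label][w] + alpha) / total)
                for w in vocabulary
            },
        }
    return model


def predict(model, tokens):
    """Return the class with the highest log posterior; unseen words are skipped."""
    def score(label):
        params = model[label]
        return params["log_prior"] + sum(
            params["log_likelihood"][w] for w in tokens
            if w in params["log_likelihood"]
        )
    return max(model, key=score)


training = [
    ("the food was great".split(), "positive"),
    ("great service and tasty food".split(), "positive"),
    ("the food was awful".split(), "negative"),
    ("slow service and awful food".split(), "negative"),
]
model = train(training)
print(predict(model, "great tasty food".split()))
print(predict(model, "awful slow".split()))
```

Working with sums of logarithms rather than products of probabilities avoids numerical underflow
when reviews are long.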

3.2.2 Advantages of proposed system

• Good at pattern recognition problems
• Data-driven, and performance is high in many problems
• End-to-end training: little or no domain knowledge is needed in system construction
• Learning of representations: cross-modal processing is possible
• Gradient-based learning: learning algorithm is simple
• Mainly supervised learning methods


4. SYSTEM DESIGN

4.1 SYSTEM ARCHITECTURE

The system architecture is shown in Fig 4-1. It comprises two main modules: an offline
processing module, where the user profiles are generated and the feature extraction and
rating happen, and an online module, which generates real-time
recommendations. The prototype uses user review data from restaurants. The dataset contains
user information, business information and user reviews. These objects are stored in an SQLite3
database. A brief overview of the system is provided in what follows.
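One possible way to hold the three kinds of objects in SQLite is sketched below using Python's built-in
sqlite3 module. The table and column names are hypothetical, since the report does not give the
schema, and an in-memory database is used so that the sketch leaves no files behind.

```python
import sqlite3

# In-memory database for the sketch; a real deployment would use a file path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE businesses (business_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (
        review_id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(user_id),
        business_id INTEGER REFERENCES businesses(business_id),
        stars INTEGER CHECK (stars BETWEEN 1 AND 5),
        text TEXT
    );
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO users VALUES (1, 'example user')")
conn.execute("INSERT INTO businesses VALUES (1, 'example restaurant')")
conn.executemany(
    "INSERT INTO reviews (user_id, business_id, stars, text) VALUES (?, ?, ?, ?)",
    [(1, 1, 5, "Great food"), (1, 1, 2, "Slow service")],
)

# Fetch the star rating and text of every review of one business.
rows = conn.execute(
    "SELECT stars, text FROM reviews WHERE business_id = ? ORDER BY stars", (1,)
).fetchall()
print(rows)
```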

Fig 4-1: System Model


4.2 UML REPRESENTATION

The Unified Modeling Language (UML) is a standard language for specifying, visualizing, constructing
and documenting the artifacts of software systems, as well as for business modeling and other
non-software systems.

The following are the UML diagrams used in this project

4.2.1 Use case diagrams:

A use case diagram in the Unified Modeling Language (UML) is a type of behavioral diagram
defined by and created from a use-case analysis. A use case diagram at its simplest is a
representation of a user's interaction with the system that shows the relationship between the user
and the different use cases in which the user is involved. A use case diagram can identify the
different types of users of a system and the different use cases and will often be accompanied by
other types of diagrams as well. The use cases are represented by either circles or ellipses.

Use case:

In software and systems engineering, a use case is a list of actions or event steps typically defining
the interactions between a role (known in the Unified Modeling Language as an actor) and a system
to achieve a goal. The actor can be a human or other external system. In systems engineering, use
cases are used at a higher level than within software engineering, often representing missions
or stakeholder goals. The detailed requirements may then be captured in the Systems Modeling
Language (SysML) or as contractual statements.


Fig 4-2: Use case diagram for restaurant reviews

Fig 4-2 shows the use case diagram in which the actor is the end user, who can import the data. The
use cases for the end user are splitting the data, training on the data, predicting,
constructing the confusion matrix and calculating the accuracy score. The system
boundary limits the end user to these tasks; the actor is not given
permission to perform anything beyond them.
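Three of these use cases (splitting, constructing the confusion matrix and calculating the accuracy
score) can be sketched in plain Python as follows. The label lists are invented, and the function names
only mirror common conventions; this is not the project's code.

```python
import random


def train_test_split(items, test_fraction=0.25, seed=0):
    """Shuffle a copy of the items and hold out the last test_fraction as test data."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def confusion_matrix(actual, predicted, labels):
    """matrix[i][j] counts items of true class labels[i] predicted as labels[j]."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of predictions on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


# Hypothetical true and predicted labels for six test reviews.
actual = ["neg", "neg", "pos", "pos", "pos", "neg"]
predicted = ["neg", "pos", "pos", "pos", "neg", "neg"]

matrix = confusion_matrix(actual, predicted, ["neg", "pos"])
train_items, test_items = train_test_split(list(range(8)))
print(matrix, accuracy(matrix))
```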


4.2.2 Sequence diagrams:

A sequence diagram in Unified Modeling Language (UML) is a kind of
interaction diagram that shows how processes operate with one another and in what
order. A sequence diagram shows object interactions arranged in time sequence. It depicts the
objects and classes involved in the scenario and the sequence of messages exchanged between the
objects needed to carry out the functionality of the scenario. Sequence diagrams are typically
associated with use case realizations in the Logical View of the system under development.
Sequence diagrams are sometimes called event diagrams or event scenarios.

Fig 4-7: Sequence diagram for restaurant reviews

Fig 4-7 shows the sequence diagram in which the actor is the end user. Here there is a synchronous
process in which the end user can perform all the functions of importing, data cleaning, classifying
and splitting. Moreover, the actor also receives a response for each action performed.


4.2.3 Communication diagrams:

A communication diagram models the interactions between objects or parts in terms of sequenced
messages. Communication diagrams represent a combination of information taken
from Class, Sequence, and Use Case Diagrams, describing both the static structure and dynamic
behavior of a system.

However, communication diagrams use the free-form arrangement of objects and links as used in
Object diagrams. In order to maintain the ordering of messages in such a free-form diagram,
messages are labeled with a chronological number and placed near the link the message is sent
over. Reading a communication diagram involves starting at message 1.0, and following the
messages from object to object.

Fig 4-7: Communication diagram for restaurant reviews


Fig 4-7 shows the communication diagram in which the actor is the end user. Here there is a
synchronous process in which the end user can perform all the functions of importing, data
cleaning, classifying and splitting. Moreover, the actor also receives a response for each action
performed.

4.2.4 Deployment diagrams:

A deployment diagram in the Unified Modeling Language models the physical deployment
of artifacts on nodes. To describe a web site, for example, a deployment diagram would show what
hardware components ("nodes") exist (e.g., a web server, an application server, and a database
server), what software components ("artifacts") run on each node (e.g., web application, database),
and how the different pieces are connected (e.g. JDBC, REST, RMI).

The nodes appear as boxes, and the artifacts allocated to each node appear as rectangles within the
boxes. Nodes may have subnodes, which appear as nested boxes. A single node in a deployment
diagram may conceptually represent multiple physical nodes, such as a cluster of database servers.

[Figure: deployment diagram with the nodes user and system and the artifacts libraries, dataset,
data cleaning, bag of words, features and labels, splitting, naive Bayes algorithm, prediction and accuracy]

Fig 4-7: Deployment diagram for restaurant reviews

Fig 4-7 shows the deployment diagram in which the actor is the end user. Here there is a
synchronous process in which the end user can perform all the functions of importing, data
cleaning, classifying and splitting. Moreover, the actor also receives a response for each action
performed.

4.2.5 Component diagrams:

In Unified Modeling Language (UML), a component diagram depicts how components are wired
together to form larger components or software systems. They are used to illustrate the structure of
arbitrarily complex systems.

A component is something required to execute a stereotype function. Examples of stereotypes in
components include executables, documents, database tables, files, and library files. Components
are wired together by using an assembly connector to connect the required interface of one
component with the provided interface of another component.

[Figure: component diagram with the components numpy, nltk, pandas, matplotlib, server,
user interface and algorithm]

Fig 4-7: Component diagram for restaurant reviews

Fig 4-7 shows the component diagram in which the actor is the end user. Here there is a
synchronous process in which the end user can perform all the functions of importing, data
cleaning, classifying and splitting. Moreover, the actor also receives a response for each action
performed.


4.2.6 Activity diagrams:

Activity diagrams are graphical representations of workflows of stepwise activities and actions with
support for choice, iteration and concurrency. In the Unified Modeling Language, activity diagrams
are intended to model both computational and organizational processes (i.e., workflows), as well as
the data flows intersecting with the related activities. Although activity diagrams primarily show the
overall flow of control, they can also include elements showing the flow of data between activities
through one or more data stores.

[Figure: activity diagram with the activities user, dataset upload, classification (supervised or
unsupervised), features, labels, training and testing, predicting, accuracy score and display results]

Fig 4-7: Activity diagram for restaurant reviews

Fig 4-7 shows the activity diagram in which the actor is the end user. Here there is a synchronous
process in which the end user can perform all the functions of importing, data cleaning, classifying
and splitting. Moreover, the actor also receives a response for each action performed.


5. SYSTEM IMPLEMENTATION

5.1 SYSTEM REQUIREMENTS

5.1.1 HARDWARE REQUIREMENTS

• RAM : 4 GB or higher
• Processor : Intel i3 or above
• Hard Disk : 500 GB minimum

5.1.2 SOFTWARE REQUIREMENTS

• Operating System : Windows family
• Python IDE : Python (2.7.x and above) and PyCharm IDE
• setuptools and pip to be installed for 3.6.x and above

5.2 SOFTWARE ENVIRONMENT

5.2.1 Python

Python is a high-level, interpreted, interactive and object-oriented scripting language. Python is
designed to be highly readable. It frequently uses English keywords where other languages use
punctuation, and it has fewer syntactical constructions than other languages.

5.2.1.1 Features of Python

• Python is Interpreted − Python is processed at runtime by the interpreter. You do not need
to compile your program before executing it. This is similar to Perl and PHP.

• Python is Interactive − You can sit at a Python prompt and interact with the interpreter
directly to write your programs.

• Python is Object-Oriented − Python supports the object-oriented style or technique of
programming that encapsulates code within objects.

• Python is a Beginner's Language − Python is a great language for beginner-level
programmers and supports the development of a wide range of applications, from simple
text processing to WWW browsers to games.

5.2.2 Django Framework

Django is a Python-based free and open-source web framework, which follows the model-view-
template (MVT) architectural pattern. It is maintained by the Django Software Foundation (DSF),
an independent organization established as a non-profit.

Django's primary goal is to ease the creation of complex, database-driven websites. The framework
emphasizes reusability and "pluggability" of components, less code, low coupling, rapid
development, and the principle of don't repeat yourself. Python is used throughout, even for
settings files and data models. Django also provides an optional administrative create, read, update
and delete interface that is generated dynamically through introspection and configured via admin
models.

5.2.2.1 Features of Django Framework

The Django framework includes:

• a lightweight and standalone web server for development and testing
• a form serialization and validation system that can translate between HTML forms and
values suitable for storage in the database
• a template system that utilizes the concept of inheritance borrowed from object-oriented
programming
• a caching framework that can use any of several cache methods
• support for middleware classes that can intervene at various stages of request processing
and carry out custom functions
• an internal dispatcher system that allows components of an application to communicate
events to each other via pre-defined signals
• an internationalization system, including translations of Django's own components into a
variety of languages
• a system for extending the capabilities of the template engine
• an interface to Python's built-in unit test framework


6. SYSTEM TESTING
6.1 TESTING OBJECTIVES

The purpose of testing is to find errors in the system. Testing is the process of trying to discover
every possible fault or weakness in a work product. It provides a way to check the functionality of
components, sub-assemblies, assemblies and/or a finished product. It is the process of exercising
software with the intent of ensuring that the software system meets its requirements and user
expectations and does not fail in an unacceptable manner. There are various types of test. Each
test type addresses a specific testing requirement.

• Identifying defects: defects must first be identified in the product.
• Isolating defects: after identification, defects must be recorded. Isolation means separation;
the physical separation is done by the developer.
• Submitting for correction: it is the responsibility of the test engineer (TE) to send the list of
defects for correction.
• Ensuring the product is defect-free: verify that the defects have really been corrected and
that the product is free of defects.

6.2 Test Plan

The test plan is defined as the key document that explains the overall strategy for testing an
application in an effective, efficient and optimized way. The following testing techniques are
performed.

6.2.1 What is Web Testing?

Web testing is a software testing practice that checks websites or web applications for potential bugs.
It is the complete testing of a web-based application before it is made live.

A web-based system needs to be checked completely from end to end before it goes live for end
users.


By performing website testing, an organization can make sure that the web-based system is
functioning properly and can be accepted by real-time users.

The UI design and functionality are the main concerns of website testing.

6.2.2 Web testing checklists

1) Functionality Testing
2) Usability testing
3) Interface testing
4) Compatibility testing
5) Performance testing
6) Security testing

1) Functionality Testing

Test for – all the links in web pages, database connection, forms used for submitting or getting
information from the user in the web pages, Cookie testing etc.

Check all the links:

• Test the outgoing links from all the pages to the specific domain under test.
• Test all internal links.
• Test links jumping within the same page.
• Test links used to send email to the admin or other users from web pages.
• Test whether there are any orphan pages.
• Finally, check for broken links among all the links mentioned above.
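The link categories above can be told apart mechanically before any request is sent. The following is a minimal sketch, using only the Python standard library, that extracts the links of a page and sorts them into internal, outgoing, same-page and mail links. The names `LinkCollector` and `classify_links` and the example domain are illustrative assumptions and are not part of the project code. Detecting broken links would additionally require requesting each URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def classify_links(html, page_url):
    """Group the links of a page by the categories used in link testing."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlparse(page_url).netloc
    groups = {"internal": [], "outgoing": [], "same_page": [], "mail": []}
    for href in collector.hrefs:
        if href.startswith("mailto:"):
            groups["mail"].append(href)
        elif href.startswith("#"):
            groups["same_page"].append(href)
        else:
            # Resolve relative links against the page URL before comparing hosts.
            absolute = urljoin(page_url, href)
            key = "internal" if urlparse(absolute).netloc == site else "outgoing"
            groups[key].append(absolute)
    return groups


# Hypothetical page content used only to exercise the function.
page = """
<a href="/result/">Result</a>
<a href="#top">Top</a>
<a href="mailto:admin@example.com">Mail admin</a>
<a href="https://www.djangoproject.com/">Django</a>
"""
links = classify_links(page, "http://example.com/home/")
```

Each group can then be handed to the matching check, for example requesting every internal and outgoing URL and reporting those that do not return a successful response.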

Test forms on all pages:


Forms are an integral part of any website. They are used to receive information from users and
to interact with them. The following should be checked in these forms:

• First, check all the validations on each field.
• Check the default values of the fields.
• Enter wrong inputs in the form fields.
• Check any options to create, delete, view or modify forms.

As an example, consider a search engine project that has advertiser and affiliate sign-up steps.
Each sign-up step is different but depends on the other steps, so the sign-up flow must execute
correctly. There are different field validations, such as email IDs and user financial information.
All these validations should be checked in manual or automated web testing.
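Field validation checks of this kind are easy to automate as a table of inputs and expected rejections. The sketch below is illustrative only: the function `validate_signup`, the field names and the deliberately simple e-mail pattern are assumptions, not the validation rules of any particular project.

```python
import re

# Deliberately simple pattern for illustration; production rules are more involved.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate_signup(form, mandatory=("name", "email")):
    """Return a dict of field name -> error message; an empty dict means valid."""
    errors = {}
    for field in mandatory:
        if not form.get(field, "").strip():
            errors[field] = "This field is required."
    email = form.get("email", "").strip()
    if email and not EMAIL_PATTERN.match(email):
        errors["email"] = "Enter a valid e-mail address."
    return errors


# Each test case pairs an input with the set of fields expected to be rejected.
cases = [
    ({"name": "Asha", "email": "asha@example.com"}, set()),
    ({"name": "", "email": "asha@example.com"}, {"name"}),
    ({"name": "Asha", "email": "not-an-email"}, {"email"}),
    ({}, {"name", "email"}),
]
for form, expected in cases:
    assert set(validate_signup(form)) == expected
```

Keeping the cases in a list makes it straightforward to add further wrong inputs, mandatory fields or default-value checks without changing the test loop.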

Cookies Testing:

Cookies are small files stored on the user's machine. They are mainly used to maintain sessions,
chiefly login sessions. Test the application by enabling and disabling cookies in the browser
options.

Test whether the cookies are encrypted before they are written to the user's machine. If you are
testing session cookies (i.e. cookies that expire after the session ends), check the login sessions
and user stats after the session ends. Check the effect on application security of deleting the
cookies.
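Part of cookie testing can be done by inspecting the `Set-Cookie` header that the server sends. The sketch below uses Python's standard `http.cookies` module to report whether a cookie is a session cookie (no `Expires` or `Max-Age`) and whether the `Secure` and `HttpOnly` flags are set. The helper name `inspect_cookie` and the header values are hypothetical, and the two flags are an added assumption about what a tester may want to check; they are not mentioned in the text above.

```python
from http.cookies import SimpleCookie


def inspect_cookie(set_cookie_header, name):
    """Summarise the properties of one cookie from a Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(set_cookie_header)
    morsel = jar[name]
    return {
        # A session cookie carries neither Expires nor Max-Age.
        "is_session_cookie": not (morsel["expires"] or morsel["max-age"]),
        "secure": bool(morsel["secure"]),
        "httponly": bool(morsel["httponly"]),
    }


# Hypothetical header, as a login response might send it.
header = "sessionid=abc123; HttpOnly; Secure; Path=/"
report = inspect_cookie(header, "sessionid")
```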

Validate your HTML/CSS:

If you are optimizing your site for search engines, HTML/CSS validation is especially important.
Mainly validate the site for HTML syntax errors. Check whether the site is crawlable by different
search engines.

Database testing:

Data consistency is also very important in a web application. Check for data integrity and errors
while you edit, delete or modify the forms or perform any DB-related functionality.

Check whether all the database queries execute correctly and whether data is retrieved and updated
correctly. Database testing can also cover the load on the DB, which is addressed under web load
and performance testing below.
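A data integrity check of this kind can be sketched with an in-memory SQLite database from the Python standard library. The `reviews` table and its constraint are hypothetical and serve only to illustrate the checks; the project code in the appendix reads its data from a TSV file.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews ("
    " id INTEGER PRIMARY KEY,"
    " review TEXT NOT NULL,"
    " liked INTEGER NOT NULL CHECK (liked IN (0, 1)))"
)

# Insert, then update, and read back the stored value.
conn.execute("INSERT INTO reviews (review, liked) VALUES (?, ?)", ("Crust is not good.", 1))
conn.execute("UPDATE reviews SET liked = 0 WHERE review = ?", ("Crust is not good.",))
stored = conn.execute("SELECT review, liked FROM reviews").fetchall()

# An out-of-range label must be rejected rather than stored.
try:
    conn.execute("INSERT INTO reviews (review, liked) VALUES (?, ?)", ("Great fries.", 5))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Delete and confirm that no rows are left behind.
conn.execute("DELETE FROM reviews")
remaining = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
```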

In testing the functionality of the websites the following should be tested:

Links
i. Internal Links
ii. External Links
iii. Mail Links
iv. Broken Links

Forms
i. Field validation
ii. Error message for wrong input
iii. Optional and Mandatory fields

Database
Testing will be done on the database integrity.

2) Usability Testing

Usability testing is the process by which the human-computer interaction characteristics of a system
are measured, and weaknesses are identified for correction.

• Ease of learning
• Navigation
• Subjective user satisfaction
• General appearance


Test for navigation:

Navigation means how a user moves through the web pages: the different controls, such as buttons
and boxes, and the links on the pages that are used to reach other pages.

Usability testing includes the following:

• The website should be easy to use.
• The instructions provided should be very clear.
• Check whether the instructions provided serve their purpose.
• The main menu should be provided on each page.
• The site should be consistent throughout.

Content checking:

Content should be logical and easy to understand. Check for spelling errors. Dark colors annoy
users and should not be used in the site theme.

You can follow the standard colors that are used for web pages and content building. These are the
commonly accepted standards concerning annoying colors, fonts, frames etc.

Content should be meaningful. All the anchor text links should work properly. Images should
be placed properly, with proper sizes.

These are some of the basic standards that should be followed in web development. The task is to
validate all of them in UI testing.

Other user information for user help:

Examples are a search option, a sitemap and help files. The sitemap should list all the links in
the website with a proper tree view of navigation. Check all the links on the sitemap.

The “Search on the site” option helps users to find the content pages they are looking for easily
and quickly. These are all optional items; if present, they should be validated.

3) Interface Testing

In web testing, the server side interface should be tested. This is done by verifying that
communication is done properly. Compatibility of the server with software, hardware, network, and
the database should be tested.

The main interfaces are:

• Web server and application server interface
• Application server and database server interface

Check whether all the interactions between these servers are executed and errors are handled
properly. If the database or web server returns an error message for any query by the application
server, the application server should catch these error messages and display them appropriately to
the users.

Check what happens if the user interrupts a transaction midway. Check what happens if the
connection to the web server is reset midway.

4) Compatibility Testing

Compatibility of the website is a very important testing aspect. The following compatibility tests
should be executed:

• Browser compatibility
• Operating system compatibility
• Mobile browsing
• Printing options

Browser compatibility:

Browser compatibility tends to be the most influential part of website testing. Some applications
are very dependent on browsers. Different browsers have different configurations and settings that
the web page should be compatible with.

The website code should be cross-browser compatible. If JavaScript or AJAX calls are used for UI
functionality, security checks or validations, give more emphasis to browser compatibility testing
of the web application.

Test the web application on different browsers such as Internet Explorer, Firefox, Netscape
Navigator, AOL, Safari and Opera, in different versions.

OS compatibility:

Some functionality in the web application may not be compatible with all operating systems. New
technologies used in web development, such as graphic designs and interface calls to different
APIs, may not be available in all operating systems.

Hence test the web application on different operating systems such as Windows, Unix, Mac, Linux
and Solaris, with different OS flavors.

Mobile browsing:

Mobile browsing is expected to keep growing, so test the web pages on mobile browsers.
Compatibility issues may exist on mobile devices as well.

Printing options:

If you are giving page-printing options then make sure fonts, page alignment, page graphics etc.,
are getting printed properly. Pages should fit the paper size or as per the size mentioned in the
printing option.


5) Performance testing

The web application should sustain heavy load. Web performance testing should include:

• Web load testing
• Web stress testing

Test the application's performance at different internet connection speeds.

Web load testing: Test what happens when many users access or request the same page. Can the
system sustain peak load times? The site should handle many simultaneous user requests, large
input data from users, simultaneous connections to the DB, heavy load on specific pages etc.

Web stress testing: Stress generally means stretching the system beyond its specified limits. Web
stress testing is performed to break the site by applying stress, and to check how the system
reacts to stress and how it recovers from crashes. Stress is generally applied to input fields and
to login and sign-up areas.

In web performance testing, website functionality on different operating systems and different
hardware platforms is checked for software and hardware memory leakage errors.

Performance testing can be applied to understand the web site’s scalability or to benchmark the
performance in the environment of third-party products such as servers and middleware for
potential purchase.
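The idea behind a load test can be sketched in a few lines: several simulated users issue requests concurrently and the response times are collected. The code below is a minimal illustration using only the Python standard library. The names `run_load_test` and `fake_page_request` are hypothetical, and the fake request merely sleeps instead of contacting a server; in a real test it would be replaced by a call that fetches the page under test.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(handler, users, requests_per_user):
    """Call `handler` from `users` concurrent threads and collect latencies."""

    def one_user(_user_index):
        latencies = []
        for _ in range(requests_per_user):
            start = time.perf_counter()
            handler()
            latencies.append(time.perf_counter() - start)
        return latencies

    with ThreadPoolExecutor(max_workers=users) as pool:
        per_user = list(pool.map(one_user, range(users)))
    all_latencies = sorted(t for user in per_user for t in user)
    return {
        "requests": len(all_latencies),
        "slowest_seconds": all_latencies[-1],
        "median_seconds": all_latencies[len(all_latencies) // 2],
    }


def fake_page_request():
    """Stand-in for fetching a page; sleeps instead of using the network."""
    time.sleep(0.01)


summary = run_load_test(fake_page_request, users=20, requests_per_user=5)
```

Raising `users` step by step until the slowest response becomes unacceptable gives a rough picture of the peak load the system can sustain; pushing beyond that point corresponds to the stress test described above.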

Connection Speed
Tested on various networks like Dial-Up, ISDN etc.

Load
i. What is the number of users per unit of time?
ii. Check for peak loads and how the system behaves
iii. A large amount of data accessed by the user

Stress
i. Continuous load
ii. Performance of memory, CPU, file handling etc.

6) Security Testing

The following are some of the test cases for web security testing:

• Test by pasting an internal URL directly into the browser address bar without logging in.
Internal pages should not open.
• If you are logged in with a username and password and are browsing internal pages, try
changing URL options directly. For example, if you are checking some publisher site statistics
with publisher site ID = 123, try directly changing the URL site ID parameter to a different
site ID that is not related to the logged-in user. Access to view other users' stats should be
denied for this user.
• Try some invalid inputs in input fields such as login username, password and input text boxes.
Check the system's reaction to all invalid inputs.
• Web directories or files should not be directly accessible unless a download option is given
for them.
• Test the CAPTCHA against automated script logins.
• Test whether SSL is used for security measures. If it is used, the proper message should be
displayed when the user switches from non-secure HTTP:// pages to secure HTTPS:// pages and
vice versa.
• All transactions, error messages and security breach attempts should be logged in log files
somewhere on the web server.
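The first two test cases above (opening an internal page without login, and changing the site ID in the URL) can be expressed as a small table of expected access decisions. The sketch below is illustrative: the ownership table `SITE_OWNERS`, the user names and the function `can_view_stats` are hypothetical stand-ins for the access check of the application under test.

```python
# Hypothetical ownership table: which site IDs each logged-in user may view.
SITE_OWNERS = {"alice": {123}, "bob": {456}}


def can_view_stats(user, site_id):
    """Allow access only to logged-in users who own the requested site ID."""
    if user is None:  # not logged in
        return False
    return site_id in SITE_OWNERS.get(user, set())


# Each case: (logged-in user, site ID taken from the URL, expected decision).
security_cases = [
    (None, 123, False),       # internal URL opened without login
    ("alice", 123, True),     # owner viewing own statistics
    ("alice", 456, False),    # URL parameter changed to another user's site
    ("mallory", 123, False),  # user with no sites at all
]
results = [can_view_stats(user, site_id) for user, site_id, _ in security_cases]
```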

The primary reason for testing the security of a web application is to identify potential
vulnerabilities and subsequently repair them. Techniques include:

• Network Scanning
• Vulnerability Scanning
• Password Cracking


7. SCREENSHOTS


8. CONCLUSION AND FUTURE WORK

Humans are the "gold standard" of sentiment analysis, yet there is always disagreement within a
group of raters on sentiment. Humans generally agree only about 80% of the time. Automatic
sentiment analysis can strive towards this level but, obviously, cannot exceed it.
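Agreement between raters can be measured as the share of items on which two raters give the same label. The sketch below computes this simple percent agreement; the function name and the two label lists are made-up illustrations, not data from this project.

```python
def percent_agreement(labels_a, labels_b):
    """Share of items on which two raters gave the same sentiment label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels from two raters for ten reviews (1 = positive, 0 = negative).
rater_1 = [1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
rater_2 = [1, 0, 1, 1, 1, 0, 0, 1, 1, 1]
agreement = percent_agreement(rater_1, rater_2)  # the raters differ on two items
```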

People and automatic systems both have a place in the process. Automated systems can go
through huge quantities of data, while humans can do a higher-quality job on a smaller
sample. Saying "People are no good because they are not scalable" is probably just as silly as
saying "Automatic systems are no good because they are not as accurate".

Focus on and use the strengths of each as needed for the particular situation. Sentiment analysis
will have a lot to do with social forums and platforms where people freely express opinions. At
present, tweets are one such open medium; if Facebook at some point chooses to make timeline
updates and status messages open to search, they will be a gold mine of real-time sentiment.

Present sentiments hold a key to future events. Put more technically, sentiments represent the
"present value of future events". This value can have deep social, political and monetary
significance. It can be an "expression of opinion about a public figure", "opinions expressed
through tweets before elections" or "the buzz before a movie release"; all of these can be great
cues for things to come.

Therefore, when people comment on present news stories, sentiment analysis can offer a key to
predicting future outcomes, or at least to anticipating them better.


9. REFERENCES

[1] Ariyasriwatana, W., Buente, W., Oshiro, M., & Streveler, D. (2014). Categorizing
health-related cues to action: using Yelp reviews of restaurants in Hawaii. New
Review of Hypermedia and Multimedia, 20(4), 317-340.

[2] Byers, J. W., Mitzenmacher, M., & Zervas, G. (2012, June). The groupon effect on
yelp ratings: a root cause analysis. In Proceedings of the 13th ACM conference on
electronic commerce (pp. 248-265). ACM.

[3] Hicks, A., Comp, S., Horovitz, J., Hovarter, M., Miki, M., & Bevan, J. L. (2012).
Why people use Yelp.com: An exploration of uses and gratifications. Computers in
Human Behavior, 28(6), 2274-2279.

[4] Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. S. (2013, July). What yelp
fake review filter might be doing?. In ICWSM. 6

[5] dos Santos, C. N., & Gatti, M. (2014). Deep Convolutional Neural Networks for
Sentiment Analysis of Short Texts. In COLING (pp. 69-78).

[6] Mullen, T., & Collier, N. (2004, July). Sentiment Analysis using Support Vector
Machines with Diverse Information Sources. In EMNLP (Vol. 4, pp. 412-418).

[7] Kiritchenko, S., Zhu, X., Cherry, C., & Mohammad, S. (2014, August). NRC-
Canada-2014: Detecting aspects and sentiment in customer reviews. In Proceedings of the
8th International Workshop on Semantic Evaluation (SemEval 2014) (pp. 437-442).
Dublin, Ireland: Association for Computational Linguistics and Dublin City University.

[8] Huang, J., Rogers, S., & Joo, E. (2014). Improving restaurants by extracting
subtopics from yelp reviews. iConference 2014 (Social Media Expo).

[9] Shalev-Shwartz, S., Singer, Y., Srebro, N., & Cotter, A. (2011). Pegasos: Primal
estimated sub-gradient solver for svm. Mathematical programming, 127(1), 3-30.

[10] Saif, Hassan, et al. "On stopwords, filtering and data sparsity for sentiment
analysis of Twitter." (2014): 810-817.


APPENDIX


10. SOURCE CODE

VIEWS.PY:

import re

import nltk
import pandas as pd
from django.shortcuts import render
from django.views.generic import TemplateView
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn import metrics
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def result(request):
    # Importing the dataset (tab-separated; quoting=3 is csv.QUOTE_NONE)
    dataset = pd.read_csv('static/Restaurant_Reviews.tsv', delimiter='\t', quoting=3)

    # Cleaning the texts
    nltk.download('stopwords')
    stop_words = set(stopwords.words('english'))  # built once, not once per word
    ps = PorterStemmer()
    corpus = []
    for i in range(0, 1000):
        # Keep letters only, convert to lower case and split into words
        review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
        review = review.lower()
        review = review.split()
        # Drop stop words and reduce the remaining words to their stems
        review = [ps.stem(word) for word in review if word not in stop_words]
        review = ' '.join(review)
        corpus.append(review)

    # Creating the Bag of Words model
    cv = CountVectorizer(max_features=1500)
    X = cv.fit_transform(corpus).toarray()
    y = dataset.iloc[:, 1].values

    # Splitting the dataset into the Training set and Test set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=0)

    # Fitting Naive Bayes to the Training set
    classifier = GaussianNB()
    classifier.fit(X_train, y_train)

    # Predicting the Test set results
    y_pred = classifier.predict(X_test)

    # Accuracy (fraction of correct predictions) and confusion matrix for the template
    d = {'i': metrics.accuracy_score(y_test, y_pred), 'j': metrics.confusion_matrix(y_test, y_pred)}
    return render(request, 'restaurant.html', context=d)


class Home(TemplateView):
    template_name = 'home.html'
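The main steps of the view (cleaning, bag of words, confusion matrix and accuracy) can be illustrated without any third-party library. The following is a simplified sketch and not the project code: the stop-word list is a tiny stand-in for NLTK's English list, stemming is omitted, and the helper names and the label lists are made up for the example.

```python
import re
from collections import Counter

# Tiny stand-in stop-word list; the project uses NLTK's English list instead.
STOPWORDS = {"the", "is", "was", "this", "and", "a"}


def clean(review):
    """Keep letters only, convert to lower case and drop stop words (no stemming)."""
    words = re.sub("[^a-zA-Z]", " ", review).lower().split()
    return [w for w in words if w not in STOPWORDS]


def bag_of_words(corpus):
    """Return the sorted vocabulary and one count vector per document."""
    vocabulary = sorted({word for doc in corpus for word in doc})
    vectors = []
    for doc in corpus:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocabulary])
    return vocabulary, vectors


def confusion_matrix_2x2(y_true, y_pred):
    """Rows are true labels 0/1, columns are predicted labels 0/1."""
    matrix = [[0, 0], [0, 0]]
    for truth, guess in zip(y_true, y_pred):
        matrix[truth][guess] += 1
    return matrix


def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    total = sum(sum(row) for row in matrix)
    return (matrix[0][0] + matrix[1][1]) / total


corpus = [clean("Wow... Loved this place."), clean("Crust is not good.")]
vocabulary, vectors = bag_of_words(corpus)
# Made-up true and predicted labels for four reviews.
cm = confusion_matrix_2x2([1, 0, 0, 1], [1, 0, 1, 1])
```

Each review becomes a vector of word counts over the shared vocabulary, which is the form in which the classifier receives it. The accuracy shown on the result page is the sum of the diagonal of the confusion matrix divided by the number of test reviews.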

restaurant.html:

<!DOCTYPE html>
{% extends 'base.html' %}
{% load staticfiles%}

{% block body_block%}
<title>Restaurant Reviews</title>
<style>

h1{
color: green;
text-align: center;
}
h2{
color: blue;
}
p{
font-family: Arial;
}
</style>
<div class="container">
<div class="jumbotron">
<h1><em>This is Result page!!!</em></h1><br><br>
<h2>Accuracy :{{ i }}</h2><br>
<h2>Confusion Matrix :{{ j }}</h2><br><br>
<p><img src="{% static 'images/restaurant.jpg' %}" align="center"></p>
</div>
</div>
{% endblock%}

home.html:

{% extends 'base.html' %}
{% load staticfiles%}
{% block body_block%}
<title>Restaurant Reviews</title>
<style>
h1{
color: green;
text-align: center;
}
h2{
color: blue;
}
p{
font-family: Arial;
}
</style>

<div class="container">
<div class="jumbotron">
<h1>This is about the restaurant reviews!!</h1><br><br>
<img src="{% static 'images/restaurant1.jpg' %}" align="center"><br><br>

<p>
The purpose of this analysis is to build a prediction model to predict whether a review of the
restaurant is positive or negative.
To do so, we will work on the Restaurant Review dataset and load it into the predictive algorithms
Multinomial Naive Bayes, Bernoulli Naive Bayes and Logistic Regression. In the end, we hope to find
a "best" model for predicting the review's sentiment.
</p>
<p>
However, since a considerable amount of information exists as text fragments without any kind of
numerical scale, it is hard to classify their evaluation efficiently without reading the full
text. Here we will focus on extracting scored ratings from text fragments on the web and suggest
various experiments in order to improve the quality of a classifier.
</p>
</div>
</div>
{% endblock%}

base.html:

<!DOCTYPE html>
{% load staticfiles%}
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
<link rel="stylesheet"
href="https://stackpath.bootstrapcdn.com/bootstrap/4.1.3/css/bootstrap.min.css"
integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO"
crossorigin="anonymous">
</head>
<body>
<div class="container">
<nav class="navbar navbar-expand-lg navbar-light bg-light">
<a class="navbar-brand" href="{% url 'app1:home' %}">Home</a>
<button class="navbar-toggler" type="button" data-toggle="collapse"
data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false"
aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav mr-auto">
<li class="nav-item active">
<a class="nav-link" href="{% url 'app1:result' %}">Restaurant Reviews<span class="sr-only">(current)</span></a>
</li>
</ul>
</div>
</nav>
</div>
{% block body_block%}
{% endblock%}
</body>
</html>


Restaurant_Reviews.tsv:

/* The .tsv file consists of a thousand lines, each holding a review and its label separated by a
tab; some of them are shown below */

Review Liked
Wow... Loved this place. 1
Crust is not good. 0
Not tasty and the texture was just nasty. 0
Stopped by during the late May bank holiday off Rick Steve recommendation and loved it. 1
The selection on the menu was great and so were the prices. 1
Now I am getting angry and I want my damn pho. 0
Honeslty it didn't taste THAT fresh.) 0
The potatoes were like rubber and you could tell they had been made up ahead of time being kept under a warmer. 0
The fries were great too. 1
A great touch. 1
