
R Programming

Topic: Analyzing a dataset


Dataset: Amazon Alexa Reviews
Introduction
• This dataset consists of about 3,150 Amazon customer reviews (input text)
and 5 variables: star ratings, date of review, variation, verified reviews
and feedback. The reviews cover various Amazon Alexa products such as the
Amazon Echo, Echo Dot and Fire TV Stick.
Explanation of variables:
1. Star ratings (num): The ratings customers give to the various Amazon products after
testing and using them, on a scale from 1 to 5.
2. Date of review (num): The date on which the customer reviewed the product.
3. Variation (char): The variant of the product. For example, the black Echo Show and the
white Echo Show are two different variants of the Echo Show.
4. Verified reviews (char): The reviews given by customers or official personnel who
have genuinely tested the various Amazon products and are marked as verified by Amazon.
5. Feedback (num): This variable takes one of two values, 1 or 0.
• 1 indicates positive feedback from the customer.
• 0 indicates negative feedback from the customer.
Amazon Alexa

• Amazon Alexa, known simply as Alexa, is a virtual assistant developed by Amazon,
first used in the Amazon Echo and the Amazon Echo Dot smart speakers developed
by Amazon Lab126.
• It is capable of voice interaction, music playback, making to-do lists, setting alarms,
streaming podcasts, playing audiobooks, and providing weather, traffic, sports, and
other real-time information, such as news.
• Alexa can also control several smart devices, acting as a home
automation system.
Objectives
• Discover insights into consumer reviews and assist with machine learning models.
• Train machine learning models for sentiment analysis.
• Count how many customer reviews are positive and how many are negative.
• Bring out relationships between variables.
• Summarize the data and find patterns.
• Analyze patterns by plotting graphs.
Features of Amazon Alexa
Products of Amazon

• Echo Dot
• Echo Spot
• Echo Plus
• Amazon Echo
• Fire TV Stick
• Echo Show

Based on their experience, the customers give reviews, feedback and
star ratings to the various products they have used and tested.
Structure & Summary function
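The output shown on this slide is not reproduced here, so the following is only a minimal sketch of how str() and summary() might be applied. The data frame is a small made-up stand-in for the real dataset, and its column names (rating, date, variation, verified_reviews, feedback) are assumptions.

```r
# Toy stand-in for the Alexa reviews data (values are made up for illustration).
amazon_alexa <- data.frame(
  rating           = c(5, 5, 4, 1, 5, 2, 3, 5),
  date             = c("30-Jul-18", "30-Jul-18", "31-Jul-18", "31-Jul-18",
                       "31-Jul-18", "29-Jul-18", "29-Jul-18", "30-Jul-18"),
  variation        = c("Charcoal Fabric", "Black", "White", "Black",
                       "Walnut Finish", "White", "Black", "Charcoal Fabric"),
  verified_reviews = c("Love my Echo!", "Great sound", "Works well",
                       "Stopped working", "Music", "Poor speaker",
                       "It is okay", "Love it"),
  feedback         = c(1, 1, 1, 0, 1, 0, 1, 1),
  stringsAsFactors = FALSE
)

# str() shows the type of each column and its first few values.
str(amazon_alexa)

# summary() gives min, quartiles, mean and max for a numeric column.
rating_summary <- summary(amazon_alexa$rating)
print(rating_summary)
```

Calling summary(amazon_alexa) on the whole data frame would summarize every column at once.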
which.min & which.max function
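As a hedged illustration of what this slide likely demonstrated, the sketch below uses which.max() and which.min() to locate the rows with the highest and lowest star rating. The toy data frame and its column names are assumptions, not the original data.

```r
# Toy stand-in for the Alexa reviews data (values are made up for illustration).
amazon_alexa <- data.frame(
  rating    = c(5, 4, 1, 5, 2),
  variation = c("Black", "White", "Black", "Walnut Finish", "White"),
  stringsAsFactors = FALSE
)

# which.max() / which.min() return the position of the FIRST largest / smallest value.
best_row  <- which.max(amazon_alexa$rating)
worst_row <- which.min(amazon_alexa$rating)

# Use the positions to look up the full rows.
print(amazon_alexa[best_row, ])
print(amazon_alexa[worst_row, ])
```

Because only the first match is returned, ties (such as the two 5-star rows here) are not all reported; which(amazon_alexa$rating == max(amazon_alexa$rating)) would list every tied position.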
Subset Function
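A minimal sketch of subset(), assuming a toy data frame with made-up values and assumed column names; it shows filtering rows by a condition and optionally keeping only some columns.

```r
# Toy stand-in for the Alexa reviews data (values are made up for illustration).
amazon_alexa <- data.frame(
  rating    = c(5, 4, 1, 5, 2),
  variation = c("Black", "White", "Black", "Walnut Finish", "White"),
  feedback  = c(1, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Keep only rows with negative feedback, and only two of the columns.
negative_reviews <- subset(amazon_alexa, feedback == 0,
                           select = c(rating, variation))
print(negative_reviews)

# Conditions can be combined, e.g. 5-star reviews of the "Black" variant.
black_five_star <- subset(amazon_alexa, rating == 5 & variation == "Black")
print(black_five_star)
```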
Table function
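The sketch below shows how table() could be used to count reviews per star rating and to cross-tabulate rating against feedback. The data are made up and the column names are assumptions.

```r
# Toy stand-in for the Alexa reviews data (values are made up for illustration).
amazon_alexa <- data.frame(
  rating   = c(5, 5, 4, 1, 5, 2, 3, 5),
  feedback = c(1, 1, 1, 0, 1, 0, 1, 1)
)

# One-way table: number of reviews per star rating.
rating_counts <- table(amazon_alexa$rating)
print(rating_counts)

# Two-way table: star rating against feedback (0 = negative, 1 = positive).
rating_by_feedback <- table(rating   = amazon_alexa$rating,
                            feedback = amazon_alexa$feedback)
print(rating_by_feedback)
```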
Histogram function
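# Plot the distribution of star ratings.
# The column name i..rating is kept from the original slide.
hist(amazon_alexa$i..rating,
     xlab = "Alexa Rating",
     ylab = "Number of Ratings",
     main = "Histogram of Alexa Ratings")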
tapply() function

• Syntax: tapply(X, INDEX, FUN = NULL, ..., default = NA, simplify = TRUE)

1. X = a vector,
2. INDEX = a list of one or more factors, each the same length as X,
3. FUN = the function or operation to apply to each group.

Let's understand this with the data:

Now we summarize the ratings by date to see interesting patterns.
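Summarizing ratings by date with tapply() can be sketched as follows. The toy data frame, its values and its column names are assumptions made for illustration; the original slide's output is not available.

```r
# Toy stand-in for the Alexa reviews data (values are made up for illustration).
amazon_alexa <- data.frame(
  rating = c(5, 5, 4, 1, 5, 2, 3, 5),
  date   = c("30-Jul-18", "30-Jul-18", "31-Jul-18", "31-Jul-18",
             "31-Jul-18", "29-Jul-18", "29-Jul-18", "30-Jul-18"),
  stringsAsFactors = FALSE
)

# X = ratings, INDEX = review date, FUN = mean:
# one average rating per distinct date.
mean_by_date <- tapply(amazon_alexa$rating, amazon_alexa$date, mean)
print(mean_by_date)

# Swapping FUN for length counts the reviews on each date instead.
count_by_date <- tapply(amazon_alexa$rating, amazon_alexa$date, length)
print(count_by_date)
```

The result is a named vector with one entry per date, which can be passed straight to a plotting function such as barplot().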
Regression Analysis
• Regression analysis is a widely used statistical tool for establishing a relationship model
between two variables. One of these variables is called the predictor variable, whose value is
gathered through experiments. The other is called the response variable, whose value is derived
from the predictor variable.
• Syntax: lm(formula, data)
• formula is a symbolic description of the relation between x and y, for example y ~ x.
• data is the data frame containing the variables used in the formula.
• Adjusted R-squared: the adjusted R-squared compares the explanatory power of regression models
that contain different numbers of predictors.
• Interpreting the p-values: a low p-value (< 0.05) indicates that you can reject the null
hypothesis. In other words, a predictor that has a low p-value is likely to be a meaningful
addition to your model because changes in the predictor's value are related to changes in the
response variable.
Linear regression
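The model fitted on this slide is not recoverable, so the following is only a sketch of a linear regression with lm(). The choice of feedback as the response and rating as the predictor is an assumption, as are the made-up values and column names.

```r
# Toy stand-in for the Alexa reviews data (values are made up for illustration).
amazon_alexa <- data.frame(
  rating   = c(5, 5, 4, 1, 5, 2, 3, 5),
  feedback = c(1, 1, 1, 0, 1, 0, 1, 1)
)

# Fit feedback as a linear function of rating (variable choice is an assumption).
fit <- lm(feedback ~ rating, data = amazon_alexa)
fit_summary <- summary(fit)

# Adjusted R-squared of the fitted model.
adj_r2 <- fit_summary$adj.r.squared

# p-value of each coefficient (intercept and slope).
p_values <- fit_summary$coefficients[, "Pr(>|t|)"]

print(coef(fit))
print(adj_r2)
print(p_values)
```

Printing fit_summary directly shows the same quantities in the familiar regression table layout.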
Logistic Regression
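Since feedback takes only the values 0 and 1, a logistic regression can be sketched with glm() and family = binomial. The predictor used here, review_length, is a hypothetical variable (for example, the number of characters in a review), and all values are made up; the model on the original slide is not known.

```r
# Toy data: review_length is a hypothetical predictor, values are made up.
reviews <- data.frame(
  review_length = c(10, 25, 40, 60, 15, 80, 30, 120, 45, 20, 90, 35),
  feedback      = c(1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0)
)

# Logistic regression: models the probability that feedback equals 1.
fit <- glm(feedback ~ review_length, data = reviews, family = binomial)
print(summary(fit)$coefficients)

# Predicted probability of positive feedback for a short and a long review.
new_reviews <- data.frame(review_length = c(20, 100))
predicted <- predict(fit, newdata = new_reviews, type = "response")
print(predicted)
```

With type = "response" the predictions are probabilities between 0 and 1; without it, predict() returns values on the log-odds scale.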
Conclusion

• By analyzing the “Amazon Alexa” dataset, we can give potential customers a clear
understanding of the product by showing them the reviews of customers who
already have Alexa.
• The analysis shows that the star ratings have their maximum frequency at 5 stars
and their minimum frequency at 2 stars, which leads to the conclusion that customers
like the product.
• From the histogram plotted for the star ratings, almost 2500 of the 3150 customers
gave a 5-star rating to the various Amazon products, making it the most frequent
star rating.
• Relationships between the different variables were analyzed.
• The data was summarized and interesting patterns were obtained.
• These patterns were then further analyzed using various plot functions such as
histogram, boxplot, etc.
Thank You!!
