
USAGE OF STATISTICS IN MACHINE LEARNING
SCOPE OF MACHINE LEARNING

[Figure: Machine Learning at the center, surrounded by the fields it draws on: decision theory, game theory, AI, control theory, information theory, biological evolution, probability & statistics, philosophy, optimization, data mining, statistical mechanics, psychology, computational complexity theory, neurophysiology]
WHAT IS MACHINE LEARNING?
MODEL CONFIGURATION

The results of different hyperparameter configurations are interpreted and compared using one of two subfields of statistics, namely:

• Statistical Hypothesis Tests. Methods that quantify the likelihood of observing the result given an assumption or expectation about the result (presented using critical values and p-values).

• Estimation Statistics. Methods that quantify the uncertainty of a result using confidence intervals.
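
As a rough illustration of both approaches, the sketch below compares two hyperparameter configurations evaluated on the same cross-validation folds. The scores are made-up numbers, and the function names (paired_permutation_p_value, bootstrap_ci) are hypothetical helpers written for this example. It uses a sign-flip permutation test and a percentile bootstrap, which are only one possible choice of test and interval.

```python
import random
import statistics


def paired_permutation_p_value(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired score differences."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.fmean(diffs))
    hits = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


def bootstrap_ci(values, level=0.95, n_resamples=10_000, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    low_index = int((1 - level) / 2 * n_resamples)
    high_index = int((1 + level) / 2 * n_resamples) - 1
    return means[low_index], means[high_index]


# Hypothetical per-fold accuracies for two configurations (same folds).
config_a = [0.81, 0.79, 0.84, 0.80, 0.83, 0.82, 0.78, 0.85]
config_b = [0.77, 0.78, 0.80, 0.76, 0.81, 0.79, 0.77, 0.80]

p_value = paired_permutation_p_value(config_a, config_b)
diffs = [a - b for a, b in zip(config_a, config_b)]
ci_low, ci_high = bootstrap_ci(diffs)

print(f"p-value for 'no difference': {p_value:.4f}")
print(f"95% CI for mean accuracy gain: [{ci_low:.3f}, {ci_high:.3f}]")
```

The p-value answers the hypothesis-test question (how surprising is this difference if the configurations were equivalent?), while the interval answers the estimation question (how large is the gain likely to be?).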
Regression and Classification | Supervised Machine Learning

A regression problem is one in which the output variable is a real or continuous value, such as “salary” or “weight”. Many different models can be used; the simplest is linear regression, which tries to fit the data with the hyperplane that best approximates the points.

• Example: estimating a person's age from their answers to a set of questions
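
A minimal sketch of linear regression with a single input, fitted by ordinary least squares. The data (years of experience against salary in thousands) is invented for illustration, and fit_line is a hypothetical helper, not part of any library.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: years of experience -> salary (thousands).
years = [1, 2, 3, 4, 5]
salary = [36, 39, 46, 49, 55]

slope, intercept = fit_line(years, salary)
predicted = slope * 6 + intercept  # prediction for 6 years of experience
print(f"salary = {slope:.2f} * years + {intercept:.2f}")
print(f"predicted salary at 6 years: {predicted:.1f}")
```

The fitted line does not pass through every point; it minimizes the sum of squared vertical distances to them. With several input variables the same idea yields a hyperplane instead of a line.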


USAGE OF STATISTICS ON E-COMMERCE WEBSITES

Judging by Amazon’s success, the recommendation system works. The company reported a 29% sales increase to $12.83 billion.
• Recommended for you
• Frequently bought together
• Your recently viewed items
• Browsing History
• Related to items you viewed
• Customers who bought this also bought
RECOMMENDATION ALGORITHMS
TRADITIONAL COLLABORATIVE FILTERING

1. Each customer is represented as an N-dimensional vector of items.
2. Components are positive for positive reviews and negative for negative reviews.
3. The algorithm generates recommendations based on the few customers who are most similar to the user, where similarity is measured between the vectors A and B of two customers.
4. Each candidate item is ranked by the number of similar customers who have purchased it.
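
The steps above can be sketched as follows. The slide's similarity formula is not preserved in the text, so this sketch assumes cosine similarity between the two customer vectors, which is a common choice. The customers, items, and the recommend helper are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two customer vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, customers, items, k=2):
    """Recommend items bought by the k customers most similar to `user`."""
    target = customers[user]
    neighbours = sorted(
        (name for name in customers if name != user),
        key=lambda name: cosine_similarity(target, customers[name]),
        reverse=True,
    )[:k]
    # Rank items by how many of the similar customers purchased them.
    counts = Counter()
    for name in neighbours:
        for item, value in zip(items, customers[name]):
            if value > 0:
                counts[item] += 1
    owned = {item for item, value in zip(items, target) if value != 0}
    return [item for item, _ in counts.most_common() if item not in owned]


# Hypothetical data: +1 = purchased / liked, -1 = disliked, 0 = no interaction.
items = ["book", "lamp", "mug", "pen"]
customers = {
    "ann": [1, 1, 0, 0],
    "bob": [1, 1, 1, 0],
    "cat": [1, 0, 1, 1],
    "dan": [-1, 0, 0, 1],
}

recommendations = recommend("ann", customers, items)
print(recommendations)
```

Note that every request compares the user against every other customer, so the work grows with the size of the customer base.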

Drawbacks
1. If the algorithm examines only a small customer sample, the selected customers will be less similar to the user.
2. If the algorithm discards the most popular or unpopular items, they will never appear as recommendations, and customers who have purchased only those items will not get recommendations.
CLUSTER MODELS

1. To find customers who are similar to the user, cluster models divide the customer base into many segments.
2. The algorithm’s goal is to assign the user to the segment containing the most similar customers.
3. It then uses the purchases and ratings of the customers in that segment to generate recommendations.
4. A clustering algorithm groups the most similar customers together to form the “clusters” (segments).
5. These algorithms typically start with an initial set of segments, which often contain one randomly selected customer each. They then repeatedly match customers to the existing segments.
6. They have better online scalability and performance than traditional collaborative filtering, because the user is compared against a limited number of segments rather than the whole customer base.
7. In online scalability: CLUSTER MODELS > COLLABORATIVE FILTERING.
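
A minimal sketch of the segment-and-recommend idea, under several simplifying assumptions: the seed customers are fixed rather than randomly selected (to keep the example reproducible), customers are matched to segments in a single greedy pass, each segment is represented by its seed, and similarity is cosine similarity. All names and data are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_segments(customers, seeds):
    """Start one segment per seed, then match every other customer to the closest seed."""
    segments = {seed: [seed] for seed in seeds}
    for name, vector in customers.items():
        if name in segments:
            continue
        best = max(seeds, key=lambda s: cosine(vector, customers[s]))
        segments[best].append(name)
    return segments


def recommend_from_segment(user_vector, customers, segments, items):
    """Assign the user to the closest segment and rank that segment's purchases."""
    best = max(segments, key=lambda s: cosine(user_vector, customers[s]))
    counts = {}
    for name in segments[best]:
        for item, value, owned in zip(items, customers[name], user_vector):
            if value > 0 and owned == 0:
                counts[item] = counts.get(item, 0) + 1
    return sorted(counts, key=counts.get, reverse=True)


# Hypothetical purchase vectors over four items.
items = ["book", "lamp", "mug", "pen"]
customers = {
    "ann": [1, 1, 0, 0],
    "bob": [1, 1, 1, 0],
    "cat": [0, 0, 1, 1],
    "dan": [0, 1, 1, 1],
}

segments = build_segments(customers, seeds=["ann", "cat"])
suggestions = recommend_from_segment([1, 0, 0, 0], customers, segments, items)
print(segments)
print(suggestions)
```

At request time the user is compared with one representative per segment instead of with every customer, which is where the scalability advantage comes from.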

CLUSTER MODELING
SEARCH BASED METHODS

• Based on the user's search history
• Recommend products similar to those the user has already purchased
• Good for customers with few purchases or ratings
• For users with thousands of purchases, the recommendations are either too general or too narrow
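
One way to picture a search-based recommender is as a keyword query built from the user's past purchases. The sketch below assumes each product carries a set of keywords. The catalog, the keywords, and the function name are hypothetical and only illustrate the idea.

```python
def search_based_recommendations(purchased, catalog):
    """Rank unpurchased products by keyword overlap with the user's purchases."""
    # Build the "search query" from the keywords of everything already bought.
    query = set()
    for item in purchased:
        query |= catalog[item]
    scores = {}
    for item, keywords in catalog.items():
        if item in purchased:
            continue
        overlap = len(query & keywords)
        if overlap:
            scores[item] = overlap
    # Highest overlap first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical catalog: product -> descriptive keywords.
catalog = {
    "python_cookbook": {"python", "programming"},
    "learning_python": {"python", "programming", "beginner"},
    "rust_in_action": {"rust", "programming"},
    "garden_design": {"gardening"},
}

results = search_based_recommendations(["python_cookbook"], catalog)
print(results)
```

With only a few purchases the query stays focused. As the purchase history grows, the query accumulates keywords until it matches much of the catalog, which is the "too general" failure described above; restricting the query to a subset of purchases leads to the opposite, "too narrow" failure.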
ITEM-TO-ITEM COLLABORATIVE FILTERING