
Marketing & Retail Analytics –

Market Basket Analysis

- Rajarshi Pandit
+91 98457 43370
raj@ankanalytics.com

@ 2019 Ank Analytics Confidential


"A large US retailer made an interesting discovery: there
was a higher-than-expected correlation between beer
sales and diaper sales between 5:00 pm and 7:00 pm.

The data seemed to indicate that young fathers buying
diapers and baby supplies after work were also likely to
buy beer for themselves (perhaps to counteract the
stress of fatherhood).

The retailer rearranged their stores so that beer and
diapers were displayed next to each other. Sales of both
beer and diapers soared."

Does it make sense to promote them together?



The "beer and diapers" story is a great illustration of the potential
of predictive analytics. Paying attention to associations between
items in the market basket reveals correlations that are not obvious
and would not have otherwise come to light.
Obvious: Bread + Milk.
Not So Obvious: Diapers + Beer.

However, "correlation is not causation." Buying diapers does not
cause the shopper to automatically buy beer (nor vice versa). But
the convenience factor of displaying beer and diapers together can
be used to influence buyer behavior.



Market Basket Analysis: Overview
Market Basket Analysis (MBA)

• Market Basket Analysis (also called MBA) is a modelling technique based
upon the theory that if you buy a certain group of items, you are more (or
less) likely to buy another group of items.
• It is widely used by marketers to identify the combinations of products
or services that customers most frequently buy together.
• It is also called product association analysis.
• Association analysis is mostly done with the "Apriori" algorithm.
• The outcome of this analysis is a set of association rules.
• Marketers use these rules to shape their recommendations.

Answering business questions
Market Basket Analysis: Application

Some applications of Market Basket Analysis


• Cross selling: offer the associated items when a customer buys any item from your store.

• Product placement: items that are associated (such as bread and butter, tissues and
cold medicine, or potato chips and beer) can be placed near each other. Customers who
see them together are more likely to purchase them together.

• Affinity promotion: design promotional events around associated products.

• Customer behavior: associating purchases with demographic and socio-economic data
(such as age, gender and preference) may produce very useful results for marketing.
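
As a rough illustration of the cross-selling idea, the sketch below looks up which associated items to offer for a given basket. The rules and confidence values are made up for the example, and `recommend` is a hypothetical helper, not part of any library.

```python
# Hypothetical rules: (items already in the basket, item to offer, confidence).
# The confidence values are invented for illustration only.
rules = [
    (frozenset({"bread"}), "butter", 0.62),
    (frozenset({"tissue"}), "cold medicine", 0.41),
    (frozenset({"potato chips"}), "beer", 0.55),
    (frozenset({"bread", "butter"}), "jam", 0.48),
]

def recommend(basket, rules, top_n=3):
    """Return up to `top_n` items to offer, strongest rule first."""
    basket = set(basket)
    best = {}
    for antecedent, consequent, confidence in rules:
        # A rule applies when the basket holds its whole left-hand side
        # and the customer has not already picked the suggested item.
        if antecedent <= basket and consequent not in basket:
            best[consequent] = max(confidence, best.get(consequent, 0.0))
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in ranked[:top_n]]

suggestions = recommend({"bread", "potato chips"}, rules)
print(suggestions)
```

Ranking by confidence is only one possible choice; a store might equally rank by lift or by margin.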



Market Basket Analysis: Definitions

✓ The objective of MBA is to find all frequent item sets and then generate strong
association rules from the frequent item sets.

Definitions
– Transaction: a set of items (item set).

– Support: how often the items in an association occur together, as a
percentage of all transactions.

– Confidence: how trustworthy a discovered pattern is, measured as the share of
transactions containing the left-hand side of a rule that also contain its right-hand side.

– Frequent item set: an item set that satisfies the minimum support.

– Strong association rules: rules that satisfy both a minimum support threshold and
a minimum confidence threshold.
Market Basket Analysis: Definitions

Mathematical Definition

Item set: $I = \{i_1, i_2, \ldots, i_n\}$ (n items in total)

Transaction: $t_n = \{i_j, i_k, \ldots, i_n\}$ (the basket contains items j, k, ..., n)

Association rule: $X \Rightarrow Y$ with $X = \{i_1, i_2\}$ and $Y = \{i_k\}$, that is, $\{i_1, i_2\} \Rightarrow \{i_k\}$
(customers who bought items 1 and 2 are most likely to buy item k)

Support, Confidence & Lift are measures of the association rule.
Market Basket Analysis: Calculations/ Estimation

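Example: transactions/baskets from customers who bought more than one item

Transaction ID   Items
1                Apple, Banana, Cherry, Durian
2                Apple, Durian
3                Banana, Durian
4                Durian, Banana, Cherry
5                Banana, Durian
6                Apple, Banana
7                Apple, Cherry, Durian

Rule: customers who bought Banana (X) will most probably buy Durian (Y).

• Support is the fraction of transactions in which all the items of both sets X and Y are bought together*

$$\mathrm{support}(X \rightarrow Y) = P(X \cap Y) = \frac{n(X \cap Y)}{N}$$

- Support: 4 / 7 = 57%

• Confidence is the % of customers who bought the items in set X and also bought the items in set Y**

$$\mathrm{confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{n(X \cap Y)}{n(X)}$$

- Confidence: 4 / 5 = 80%

• Lift is the ratio by which the confidence of a rule exceeds the expected confidence, where the expected confidence is the support of Y on its own

$$\mathrm{lift}(X \rightarrow Y) = \frac{\mathrm{confidence}(X \rightarrow Y)}{P(Y)}$$

- Lift: 80% / 86% ≈ 0.93 (Durian appears in 6 of the 7 baskets, so P(Y) = 6/7 ≈ 86%)

With a minimum confidence of 60%, the rule passes the confidence threshold, but a lift below 1 means Banana buyers are slightly less likely than the average customer to buy Durian.

*Probability that a transaction contains every item of X and of Y (the union of the two item sets)
**Conditional probability of set Y given set X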
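
The three measures can be checked by hand on the seven example baskets. The following is a minimal sketch in plain Python (not the R workflow used later in the deck). It assumes the usual definition of lift, namely confidence divided by the support of Y. The helper names `count` and `rule_metrics` are illustrative.

```python
from fractions import Fraction

# The seven example baskets (Transaction IDs 1 to 7).
transactions = [
    {"Apple", "Banana", "Cherry", "Durian"},
    {"Apple", "Durian"},
    {"Banana", "Durian"},
    {"Durian", "Banana", "Cherry"},
    {"Banana", "Durian"},
    {"Apple", "Banana"},
    {"Apple", "Cherry", "Durian"},
]

def count(itemset, baskets):
    """Number of baskets that contain every item in `itemset`."""
    return sum(1 for basket in baskets if itemset <= basket)

def rule_metrics(x, y, baskets):
    """Return (support, confidence, lift) of the rule X -> Y as exact fractions."""
    n = len(baskets)
    both = count(x | y, baskets)                 # baskets containing X and Y together
    support = Fraction(both, n)
    confidence = Fraction(both, count(x, baskets))
    expected_confidence = Fraction(count(y, baskets), n)  # P(Y), assumed definition
    return support, confidence, confidence / expected_confidence

support, confidence, lift = rule_metrics({"Banana"}, {"Durian"}, transactions)
print(f"support={float(support):.2f} confidence={float(confidence):.2f} lift={float(lift):.2f}")
# prints: support=0.57 confidence=0.80 lift=0.93
```

Exact fractions are used so that the values 4/7, 4/5 and 14/15 can be compared without rounding error.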
Market Basket Analysis (MBA)

• Process Steps

1. Organize Data
   • Data should be organized at transaction basket level
2. Set Support & Confidence Threshold
   • Set threshold limits: Support > x%, Confidence > y%
3. Calculate Support, Confidence & Lift
   • Use the 'apriori' algorithm to estimate Support, Confidence & Lift
4. Identify Association Rule
   • Set threshold limits
   • Can be defined based on the distribution of the support & confidence

Validate analytically identified association rules using category knowledge and
business sense
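
The first step, organizing data at transaction basket level, can be sketched as follows. The input layout (one row per scanned item, keyed by a transaction id) is an assumption about how point-of-sale data typically arrives. The rows and the helper `to_baskets` are hypothetical.

```python
from collections import defaultdict

# Hypothetical point-of-sale rows: (transaction_id, item).
line_items = [
    (1, "Apple"), (1, "Banana"),
    (2, "Apple"), (2, "Durian"),
    (3, "Banana"), (3, "Durian"), (3, "Banana"),  # same item scanned twice
]

def to_baskets(rows):
    """Group line items into one set of distinct items per transaction."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)  # a set ignores repeated scans
    return dict(baskets)

baskets = to_baskets(line_items)
for transaction_id, items in baskets.items():
    print(transaction_id, sorted(items))
```

Storing each basket as a set discards quantities, which matches the basket-level view that support and confidence are computed on.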
Market Basket Analysis: Apriori
Market Basket Analysis – apriori algorithm

• The Apriori algorithm is mostly used to execute market basket analysis.
• It identifies frequent item sets based on Support and then selects the
representative rules by using a Confidence threshold.

Example
Thresholds: Support – 3, Confidence – 75%

Transactions
Transaction   Items
1             A C D F G
2             A B C D F
3             C D E
4             A D F
5             A C D E F
6             B C D E F G

The algorithm builds 1-element sets, then 2-element sets, then 3-element sets, dropping any set with Support < threshold at each step.

3-Element Sets
Item Set    Support
{A,C,D}     3
{A,C,F}     3
{A,D,F}     4
{C,D,E}     3
{C,D,F}     4
All item sets are "Frequent Item Sets".

4-Element Sets
Item Set    Support
{A,C,D,F}   3
We won't be able to make any other 4-element sets as the subsets are infrequent.

Representative Sets (used to generate Representative Rules)
Representative Sets   Support
{A,C,D,F}             3
{A,D,F}               4
{C,D,E}               3
{C,D,F}               4
{C,D}                 5
{D,F}                 5
*Sets ACD & ACF are not part of the representative sets as they are subsets of set ACDF and have the same support value.
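
The level-wise search can be reproduced with a short plain-Python sketch on the six example transactions. This is an illustrative implementation, not the `arules` one. It reads the "representative sets" as frequent sets of two or more items with no frequent superset of equal support, an interpretation inferred from the footnote about ACD and ACF. The function names are assumptions.

```python
from itertools import combinations

# The six example transactions; minimum support is a count of 3.
transactions = [set("ACDFG"), set("ABCDF"), set("CDE"),
                set("ADF"), set("ACDEF"), set("BCDEFG")]
MIN_SUPPORT = 3

def support_count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def apriori(min_support):
    """Level-wise search: return {frozenset: support count} for all frequent sets."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support_count(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while level:
        frequent.update({s: support_count(s) for s in level})
        k += 1
        previous = set(level)
        # Join: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset must be frequent, then check support.
        level = [c for c in candidates
                 if all(frozenset(sub) in previous for sub in combinations(c, k - 1))
                 and support_count(c) >= min_support]
    return frequent

def representative_sets(frequent):
    """Keep sets of 2+ items with no frequent superset of equal support."""
    return {s: n for s, n in frequent.items()
            if len(s) >= 2
            and not any(s < t and n == m for t, m in frequent.items())}

frequent = apriori(MIN_SUPPORT)
for itemset, n in sorted(representative_sets(frequent).items(),
                         key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), n)
```

On this data the loop prints ACDF 3, ADF 4, CDE 3, CDF 4, CD 5 and DF 5, one per line, matching the representative sets in the example.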
Market Basket Analysis – apriori algorithm
Thresholds: Support – 3, Confidence – 75%
Representative Rules – select the rules that satisfy the confidence threshold. Rules marked * have Confidence < threshold and are dropped.

{C,D,E}
Rules      Confidence
C-->DE     60% *
D-->CE     50% *
E-->CD     100%
CD-->E     60% *

{C,D,F}
Rules      Confidence
C-->DF     80%
D-->CF     67% *
F-->CD     80%

{A,D,F}
Rules      Confidence
A-->DF     100%
D-->AF     67% *
F-->AD     80%

{C,D}
Rules      Confidence
C-->D      100%
D-->C      83%

{D,F}
Rules      Confidence
D-->F      83%
F-->D      100%

{A,C,D,F}
Rules      Confidence
A-->CDF    75%   (Support of {A,C,D,F} / Support of {A} → 3/4 = 75%)
CF-->AD    75%   (Support of {A,C,D,F} / Support of {C,F} → 3/4 = 75%)

Final Set of Rules
Rules      Confidence
A-->CDF    75%
CF-->AD    75%
A-->DF     100%
F-->AD     80%
E-->CD     100%
C-->DF     80%
F-->CD     80%
C-->D      100%
D-->C      83%
D-->F      83%
F-->D      100%

You can't create rules that contain any infrequent set.
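
The confidence of each candidate rule can be recomputed directly from the six example transactions. The sketch below is a plain-Python check, not the output of `arules`. The candidate list simply mirrors the rules shown for each representative set, written as one letter per item. The helper names are illustrative.

```python
from fractions import Fraction

# The six example transactions and the 75% confidence threshold.
transactions = [set("ACDFG"), set("ABCDF"), set("CDE"),
                set("ADF"), set("ACDEF"), set("BCDEFG")]
MIN_CONFIDENCE = Fraction(3, 4)

def support_count(items):
    """Number of transactions containing every item in `items`."""
    return sum(1 for t in transactions if set(items) <= t)

def confidence(antecedent, consequent):
    """confidence(X -> Y) = support(X and Y together) / support(X)."""
    both = set(antecedent) | set(consequent)
    return Fraction(support_count(both), support_count(antecedent))

# Candidate rules as (antecedent, consequent) strings.
candidates = [
    ("C", "DE"), ("D", "CE"), ("E", "CD"), ("CD", "E"),  # from {C,D,E}
    ("C", "DF"), ("D", "CF"), ("F", "CD"),               # from {C,D,F}
    ("A", "DF"), ("D", "AF"), ("F", "AD"),               # from {A,D,F}
    ("C", "D"), ("D", "C"),                              # from {C,D}
    ("D", "F"), ("F", "D"),                              # from {D,F}
    ("A", "CDF"), ("CF", "AD"),                          # from {A,C,D,F}
]

# Keep only the rules whose confidence reaches the threshold.
final_rules = {(x, y): confidence(x, y) for x, y in candidates
               if confidence(x, y) >= MIN_CONFIDENCE}
for (x, y), c in final_rules.items():
    print(f"{x}-->{y}  {float(c):.0%}")
```

Eleven of the sixteen candidates pass the 75% threshold; the threshold is treated as inclusive, since A-->CDF sits exactly at 75%.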
Market Basket Analysis

• Identify top selling products/items (Item Frequency Plot)
• Use the apriori algorithm to identify the frequent item sets and representative
association rules.

Transaction Data Set
Min Support 50%
Min Confidence 50%



Market Basket Analysis Case Study: Market Basket Analysis for an Organic Grocery Store

Problem:
• Mom & Pop's Whole Foods opened to great acclaim from the local community. Its customers praise M&P for
the great selection of high-quality, locally sourced organic foods and products.
• However, a year after opening, M&P is still far from profitable. The owners have asked for our help with
analyzing their data to improve marketing and operational effectiveness.

Methodology
• We will use "Market Basket Analysis", also known as affinity or association analysis, to better understand M&P
customer buying preferences. Market Basket analysis addresses the questions: "What goes with what? Which
items are ordered or purchased together?"
• We will take the following steps:
– Group transactions by product category, then graph item frequencies in a bar plot.
– Find all product category association rules and then use a "Bubble Matrix" to show the intensity and
frequency of each association.
– Find the top 6 best-selling product categories. For each best-seller, narrow down association rules to just
the top 5, and create a network graph visualization to help M&P improve its product marketing and sales
results.

Case Study Source: Michelle Darling - UCSC Ext. Intro to Data Analysis - Fall 2013

Available Data:
• 30 days' worth of point-of-sale retail data (9,835 "market basket" transactions).

• Each market basket represents a list of items on a grocery store receipt. For example, these are the
contents of market baskets 9832-9834:
– 9832 {cooking chocolate}
– 9833 {chicken, citrus fruit, other vegetables, butter, yogurt, frozen dessert, domestic eggs, rolls/buns, rum,
cling film/bags}
– 9834 {semi-finished bread, bottled water, soda, bottled beer}

• There are 169 product items for sale. Products are categorized in a 2-level hierarchy. For example:
– Level 1: drinks (Category)
– Level 2: beer (Sub-Category)
– Product Items: bottled beer, canned beer (Item)

• Unfortunately, no information is available about:
– Date/Time, Customer Information
– Product Quantities


Step 1: Aggregate Data, Determine Frequency/Support

We'd like to answer the question: What types of products are purchased most frequently? Rather
than finding the frequency of all 169 product items, we'd like to roll up the data into product
categories.

[Figure: item frequency plot by category; bar labels 38, 24, 21, 16, 15, 13, 12, 11, 11, 8]

✓ But there are only 10 categories, which would not be too informative.
✓ So we will aggregate using Sub-categories (55) instead.
✓ Also, insights at category level are just good to know.
✓ To really understand customer buying preferences, we need to look at the
itemFrequencyPlot at Sub-category level.
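
Rolling items up to a higher level of the hierarchy amounts to replacing each item by its parent label before counting. The sketch below shows the idea in plain Python. Only the beer entries come from the hierarchy example in the case study. The other labels and the three baskets are invented for illustration.

```python
from collections import Counter

# Item -> sub-category. The beer rows follow the hierarchy example;
# the "non-alcoholic drinks" label is an assumption.
item_to_subcategory = {
    "bottled beer": "beer",
    "canned beer": "beer",
    "bottled water": "non-alcoholic drinks",
    "soda": "non-alcoholic drinks",
}

# Invented baskets at item level.
baskets = [
    {"bottled beer", "canned beer", "soda"},
    {"bottled water", "soda"},
    {"canned beer"},
]

def roll_up(basket, mapping):
    """Replace each item by its sub-category; duplicates collapse in the set."""
    return {mapping[item] for item in basket}

rolled = [roll_up(basket, item_to_subcategory) for basket in baskets]
frequency = Counter(sub for basket in rolled for sub in basket)
print(frequency.most_common())
```

Because each rolled-up basket is a set, a basket holding both bottled and canned beer counts once toward "beer", which keeps the frequency interpretable as a share of baskets.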


Item Frequency Bar Plot by Sub-Category

Only 32 out of 55 possible Sub-categories are shown because we
specified support=0.025.

Support refers to the prevalence of an item. Each item is evaluated for the
proportion of market baskets in which it appears. Support = 0.025 means
that the item must appear in at least 2.5% of market baskets, i.e. at least
1 in every 40.

Candidates for further study: the arrows indicate 6 Sub-categories
that are worth further study because these products occur more frequently
than the other product sub-categories.
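
The effect of a minimum support of 0.025 (1 basket in 40) can be illustrated with a small frequency filter. This is a plain-Python sketch of the idea, not the `itemFrequencyPlot` function itself. The 80 synthetic baskets and the helper names are assumptions made for the example.

```python
from collections import Counter
from fractions import Fraction

def item_support(baskets):
    """Share of baskets containing each item, as exact fractions."""
    n = len(baskets)
    counts = Counter(item for basket in baskets for item in set(basket))
    return {item: Fraction(c, n) for item, c in counts.items()}

def frequent_items(baskets, min_support):
    """Keep only the items whose support reaches `min_support`."""
    threshold = Fraction(str(min_support))  # 0.025 -> 1/40, no float rounding
    return {item: s for item, s in item_support(baskets).items() if s >= threshold}

# Synthetic data: 80 baskets in total.
baskets = [{"vegetables"}] * 77 + [{"vegetables", "rum"}] * 2 + [{"soap"}]
shown = frequent_items(baskets, 0.025)
for item, s in sorted(shown.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {float(s):.4f}")
```

Here "rum" appears in 2 of 80 baskets (exactly 1 in 40) and is kept, while "soap" appears in 1 of 80 and is filtered out.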


Conclusion:

Market Basket Analysis is a great example of the power of data mining. Paying attention to associations between items
in the market basket (or retail receipt, or online shopping cart) reveals correlations that are not obvious and would not
have otherwise come to light.
Obvious: {Bread} => {Milk}.
Not So Obvious: {fruit, vinegar/oils} => {dairy products}.

However, it's important to add the disclaimer that "correlation is not causation." Buying {fruit, vinegar/oils} does not
cause the shopper to automatically buy {dairy products} (nor vice versa). But the convenience factor of displaying those
products together can be used to influence buyer behavior.
None of the association rules seems particularly intuitive or counter-intuitive. The key is that the rules are
based on data rather than intuition ("Based on my extensive experience as a grocer, I think we should do this") or
opinion ("As a customer, I think your store should do this…"). Data-driven analysis can offer greater accuracy and predictive
power than intuitive decision-making or focus groups.

Market Basket Analysis – Analysis Development using R

• Load required libraries (‘arules’ & ‘arulesViz’)
• Read data into R (from csv)
• Data understanding and EDA
– Understand data structure and variables
– Convert/Modify variables
– Clean data
• Exploratory analysis for business understanding
• Data preparation for executing ‘apriori’ algorithm
• Application of apriori algorithm for identifying significant association rules
• Interpretation of significant rules and their use



Market Basket Analysis – Case Study

Case Study: Organic Market Basket Recommendation by Michelle Darling


Data: Groceries (built into the ‘arules’ package)

R Packages – Required packages to execute Market Basket Analysis in R:
• arules – "Association rules." Analyzes transaction data and patterns,
such as frequent itemsets and association rules.
• arulesViz – "Association rules visualization." An extension to arules
which delivers visualization techniques for association rules and
itemsets.
• RColorBrewer – provides palettes for drawing nice maps shaded
according to a variable.



Market Basket Analysis (MBA)

Ref:
• Market Basket Analysis Tutorial --- Kardi Teknomo
• Market Basket Analysis with
✓ http://www.salemmarafi.com/code/market-basket-analysis-with-r/
✓ https://select-statistics.co.uk/blog/market-basket-analysis-understanding-customer-behaviour/
