- Rajarshi Pandit
+91 98457 43370
raj@ankanalytics.com
The retailer rearranged their stores so that beer and diapers were displayed next to each other. Sales of both beer and diapers soared."
Does it make sense to promote them together?
• Product placement: items that are associated (such as bread and butter, tissues and
cold medicine, or potato chips and beer) can be placed near each other. Customers who
see them together are more likely to purchase them together.
• Customer behavior: associating purchases with demographic and socio-economic data
(such as age, gender and preference) may produce very useful results for marketing.
✓ The objective of MBA is to find all frequent item sets and then generate strong
association rules from those frequent item sets.
Definitions
– Transaction: a set of items (an item set).
– Frequent item set: an item set that satisfies the minimum support threshold.
– Strong association rules: rules that satisfy both a minimum support threshold and
a minimum confidence threshold.
@ 2019 Ank Analytics Confidential 6
Market Basket Analysis: Definitions
Market Basket Analysis (MBA)
Mathematical Definition
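The formulas from this slide did not survive extraction. As a hedged reconstruction, the standard definitions of the three measures named in the process steps (support, confidence and lift) can be written as follows. The symbols are assumptions introduced here: T is the set of all transactions, X and Y are disjoint item sets, and support is taken as a count, which matches the way thresholds are stated later in this deck (for example, "Support 3").

```latex
\begin{align}
\operatorname{supp}(X) &= \bigl|\{\, t \in T : X \subseteq t \,\}\bigr| \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y) / |T|}
  = \frac{|T| \,\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)\,\operatorname{supp}(Y)}
\end{align}
```

A lift above 1 indicates that X and Y occur together more often than would be expected if they were independent; a lift of 1 indicates no association.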
• Process Steps
– Organize Data
– Set Support & Confidence Threshold
– Calculate Support, Confidence & Lift
– Identify Association Rule
Representative Sets
Item Set      Support
{A,C,D,F}     3
{A,D,F}       4
{C,D,E}       3
{C,D,F}       4
{C,D}         5
{D,F}         5
All of these item sets are "Frequent Item Sets".
* Sets {A,C,D} and {A,C,F} are not part of the representative sets, as they are subsets of
{A,C,D,F} and have the same support value.

4 Element Sets
Item Set      Support
{A,C,D,F}     3
We won't be able to make any other 4 element sets, as their subsets are infrequent.

Generate Representative Rules
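The level-wise search behind the frequent and representative sets can be sketched in Python. This is a minimal illustration, not the deck's own implementation. The transaction list is hypothetical: the original transactions are not shown in this section, so six baskets were constructed here purely so that the support counts agree with the table above. The function names are also assumptions. The "representative" filter keeps frequent sets that have no superset with the same support (commonly called closed item sets), restricted to sets of two or more items as in the table.

```python
from itertools import combinations

# HYPOTHETICAL transactions, constructed only to reproduce the slide's supports.
transactions = [
    {"A", "C", "D", "E", "F"},
    {"A", "C", "D", "F"},
    {"A", "C", "D", "F"},
    {"A", "D", "F"},
    {"C", "D", "E", "F"},
    {"C", "D", "E"},
]
MIN_SUPPORT = 3  # support threshold, expressed as a count


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def apriori(transactions, min_support):
    """Return {frequent item set: support count}, built level by level."""
    items = sorted(set().union(*transactions))
    current = [frozenset([i]) for i in items]
    current = [s for s in current if support(s, transactions) >= min_support]
    frequent = {}
    k = 1
    while current:
        frequent.update({s: support(s, transactions) for s in current})
        k += 1
        # Join step: merge frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        current = [
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
            and support(c, transactions) >= min_support
        ]
    return frequent


def representative_sets(frequent, min_size=2):
    """Keep sets that have no frequent superset with the same support."""
    return {
        s: n for s, n in frequent.items()
        if len(s) >= min_size
        and not any(s < other and n == m for other, m in frequent.items())
    }


frequent = apriori(transactions, MIN_SUPPORT)
representative = representative_sets(frequent)
for itemset, count in sorted(representative.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

With these hypothetical baskets, {A,C,D} and {A,C,F} are frequent but are filtered out because {A,C,D,F} has the same support, which mirrors the footnote above.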
Market Basket Analysis – apriori algorithm
Thresholds: Support = 3, Confidence = 75%
Representative Rules: select the rules that satisfy the confidence threshold.
Rules marked * have confidence < threshold.

{C,D,E}
Rules       Confidence
C-->DE      60% *
D-->CE      50% *
E-->CD      100%
CD-->E      60% *

{C,D,F}
Rules       Confidence
C-->DF      80%
D-->CF      67% *
F-->CD      80%

{A,D,F}
Rules       Confidence
A-->DF      100%
D-->AF      67% *
F-->AD      80%

{C,D}
Rules       Confidence
C-->D       100%
D-->C       83%

{D,F}
Rules       Confidence
D-->F       83%
F-->D       100%

Final Set of Rules
Rules       Confidence
A-->CDF     75%
CF-->AD     100%
A-->DF      100%
F-->AD      80%
E-->CD      100%
C-->DF      80%
F-->CD      80%
C-->D       100%
D-->C       83%
D-->F       83%
F-->D       100%

Example: Confidence of A-->CDF = Support of {A,C,D,F} / Support of {A} = 3/4 = 75%
You can't create rules that contain any infrequent set.
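The confidence calculation above can be repeated for every representative set. The following is a minimal sketch under stated assumptions: the set supports are copied from the representative sets table, while the single-item supports (A=4, C=5, D=6, E=3, F=5) are not listed on the slides and are inferred here from the confidences shown (for example, C-->D at 100% with support 5 implies Support {C} = 5). The function name is hypothetical, and only rules with a single item on the left-hand side are generated.

```python
# Support counts of the representative sets, as listed on the slide.
representative_sets = {
    frozenset("ACDF"): 3,
    frozenset("ADF"): 4,
    frozenset("CDE"): 3,
    frozenset("CDF"): 4,
    frozenset("CD"): 5,
    frozenset("DF"): 5,
}
# ASSUMPTION: single-item supports inferred from the slide's confidences.
item_support = {"A": 4, "C": 5, "D": 6, "E": 3, "F": 5}
MIN_CONFIDENCE = 0.75


def single_antecedent_rules(itemsets, item_support, min_confidence):
    """Return {rule text: confidence} for rules of the form item-->rest."""
    rules = {}
    for itemset, set_support in itemsets.items():
        for item in sorted(itemset):
            consequent = "".join(sorted(itemset - {item}))
            # confidence = support(whole set) / support(antecedent)
            confidence = set_support / item_support[item]
            if confidence >= min_confidence:
                rules[f"{item}-->{consequent}"] = confidence
    return rules


rules = single_antecedent_rules(representative_sets, item_support, MIN_CONFIDENCE)
for rule, confidence in sorted(rules.items()):
    print(f"{rule}: {confidence:.0%}")
```

Rules such as C-->DE (60%) and D-->CE (50%) fall below the 75% threshold and are dropped, while A-->CDF is kept at exactly 75%.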
Market Basket Analysis
Problem:
• Mom & Pop's Whole Foods opened to great acclaim from the local community. Its customers praise M&P for
the great selection of high-quality, locally sourced organic foods and products.
• However, a year after opening, M&P is still far from profitable. The owners have asked for our help with
analyzing their data to improve marketing and operational effectiveness.
Methodology
• We will use "Market Basket Analysis", also known as affinity or association analysis, to better understand M&P
customer buying preferences. Market Basket analysis addresses the questions: "What goes with what? Which
items are ordered or purchased together?"
• We will take the following steps:
• Group transactions by product category, then graph item frequencies in a bar plot.
• Find all product category association rules and then use a "Bubble Matrix" to show the intensity and
frequency of each association.
• Find the top 6 best-selling product categories. For each best-seller, narrow down association rules to just
the top 5, and create a network graph visualization to help M&P improve its product marketing and sales
results.
Available Data:
• 30 days' worth of point-of-sale retail data (9,835 "market basket" transactions).
• Each market basket represents a list of items on a grocery store receipt. For example, these are the
contents of market baskets 9832-9834:
• 9832 {cooking chocolate}
• 9833 {chicken, citrus fruit, other vegetables, butter, yogurt, frozen dessert, domestic eggs, rolls/buns, rum,
cling film/bags}
• 9834 {semi-finished bread, bottled water, soda, bottled beer}
• There are 169 product items for sale. Products are categorized in a 2-level hierarchy. For example:
• Level 1: drinks (Category)
• Level 2: beer (Sub-Category)
• Product Items: bottled beer, canned beer (Item)
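The first methodology step (item frequencies, then a roll-up by category) can be illustrated on the three sample baskets listed above. This is a sketch only: the item-to-category mapping contains just the beer entries given in the hierarchy example, anything else is counted under a placeholder label "unmapped", and the function name is an assumption introduced for illustration.

```python
from collections import Counter

# The three sample baskets quoted in the data description.
baskets = {
    9832: ["cooking chocolate"],
    9833: ["chicken", "citrus fruit", "other vegetables", "butter", "yogurt",
           "frozen dessert", "domestic eggs", "rolls/buns", "rum",
           "cling film/bags"],
    9834: ["semi-finished bread", "bottled water", "soda", "bottled beer"],
}

# PARTIAL mapping: only the hierarchy example from the text is known.
item_to_category = {"bottled beer": "drinks", "canned beer": "drinks"}

# How often each product item appears across all baskets.
item_counts = Counter(item for items in baskets.values() for item in items)


def category_frequency(baskets, item_to_category):
    """Count the baskets that contain at least one item of each category."""
    counts = Counter()
    for items in baskets.values():
        categories = {item_to_category.get(item, "unmapped") for item in items}
        counts.update(categories)
    return counts


category_counts = category_frequency(baskets, item_to_category)
print(item_counts.most_common(5))
print(category_counts)
```

On the full data set, the same counts would feed the bar plot of item frequencies; each basket contributes at most once per category, so the result reads as "number of baskets containing the category".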
Conclusion:
Market Basket Analysis is a great example of the power of data mining. Paying attention to associations between items
in the market basket (or retail receipt, or online shopping cart) reveals correlations that are not obvious and would not
have otherwise come to light.
Obvious: {Bread} => {Milk}.
Not So Obvious: {fruit, vinegar/oils} => {dairy products}.
However, it's important to add the disclaimer that "correlation is not causation." Buying {fruit, vinegar/oils} does not
cause the shopper to automatically buy {dairy products} (nor vice versa). But the convenience of displaying those
products together can be used to influence buyer behavior.
None of the association rules seems to be particularly intuitive or counter-intuitive. The key is that the rules are
based on data rather than intuition ("Based on my extensive experience as a grocer, I think we should do this") or
opinion ("As a customer, I think your store should do this…"). Data-driven analysis can offer greater accuracy and
predictive power than intuitive decision-making or focus groups.