[Figure: the knowledge discovery process. Stages shown: Databases; Data Cleaning and Data Integration; Data Warehouse; Selection and Transformation; Task-relevant Data; Data Mining; Pattern Evaluation]
Data Mining - Market Basket Analysis (1/15/2018)
Market Basket Analysis
Analysis of customer buying habits by finding associations
and correlations between the different items that
customers place in their "shopping basket"
Customer1: milk, eggs, sugar, bread
Customer2: milk, eggs, cereal, bread
Customer3: eggs, sugar
• Find:
• Groups of items which are frequently purchased together
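As a rough illustration of "items frequently purchased together", the sketch below counts how many of the three customer baskets contain each pair of items. The threshold of two baskets is an assumption chosen for this tiny example, and the names `pair_counts` and `frequent_pairs` are illustrative, not from the slides.

```python
from collections import Counter
from itertools import combinations

# Baskets as listed on the slide, one set of items per customer.
baskets = [
    {"milk", "eggs", "sugar", "bread"},   # Customer1
    {"milk", "eggs", "cereal", "bread"},  # Customer2
    {"eggs", "sugar"},                    # Customer3
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk").
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)

# Assumed threshold: a pair is "frequent" if it occurs in at least 2 baskets.
frequent_pairs = {pair: n for pair, n in counts.items() if n >= 2}
for pair, n in sorted(frequent_pairs.items()):
    print(pair, n)
```

Counting every pair directly works for a handful of items, but the number of pairs (and larger groups) grows quickly with the number of distinct items.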
Trivial:
"Customers who purchase maintenance agreements are very likely to purchase large appliances."
Inexplicable/unexpected:
"When a new hardware store opens, one of the most sold items is toilet rings."
Method:
L1 = find_frequent_1-itemsets(D);
for (k = 2; Lk-1 ≠ ∅; k++) {
    Ck = apriori_gen(Lk-1, min_sup);
    for each transaction t ∈ D { // scan D for counts
        Ct = subset(Ck, t); // get the subsets of t that are candidates
        for each candidate c ∈ Ct
            c.count++;
    }
    Lk = {c ∈ Ck | c.count ≥ min_sup}
}
return L = ∪k Lk;
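The level-wise loop above can be sketched in Python as follows. This is a minimal sketch, not an optimized implementation: `min_sup` is assumed to be an absolute count, itemsets are represented as `frozenset`s, and candidates are generated by taking unions of pairs of frequent (k-1)-itemsets rather than by a prefix-based join. The `demo` transactions are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    min_sup is an absolute count, matching c.count >= min_sup in the pseudocode.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan of D: count the candidates contained in each transaction,
        # then keep only those that reach the minimum support.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_sup}

    # L1: frequent 1-itemsets.
    level = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(level)
    k = 2
    while level:  # continue while L(k-1) is non-empty
        prev = set(level)
        # Simplified candidate generation: unions of two (k-1)-itemsets that
        # have exactly k items and whose (k-1)-subsets are all frequent.
        candidates = {
            a | b
            for a in prev
            for b in prev
            if len(a | b) == k
            and all(frozenset(s) in prev for s in combinations(a | b, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent

# Hypothetical transactions, for illustration only.
demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
result = apriori(demo, min_sup=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Each pass over the transactions corresponds to one "scan D" step; the loop stops as soon as a level produces no frequent itemsets.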
Frequent Sets with Apriori
procedure apriori_gen(Lk-1: frequent (k-1)-itemsets; min_sup: minimum support)
for each itemset l1 ∈ Lk-1
    for each itemset l2 ∈ Lk-1
        if (l1[1] = l2[1]) ∧ (l1[2] = l2[2]) ∧ … ∧ (l1[k-2] = l2[k-2])
           ∧ (l1[k-1] < l2[k-1]) then {
            c = l1 ⋈ l2; // join step: generate candidates
            if has_infrequent_subset(c, Lk-1) then
                delete c; // prune step: remove unfruitful candidate
            else add c to Ck;
        }
return Ck;
C4={abcd}
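The join and prune steps of apriori_gen can be sketched in Python as below. Itemsets are assumed to be sorted tuples, and the `min_sup` parameter is omitted because the pseudocode never uses it inside the procedure. The input `L3` is an assumed example, not taken from the slides; it was chosen because it yields a single candidate 4-itemset.

```python
from itertools import combinations

def has_infrequent_subset(c, prev):
    """True if some (k-1)-subset of candidate c is missing from L(k-1)."""
    return any(s not in prev for s in combinations(c, len(c) - 1))

def apriori_gen(prev_frequent):
    """Generate candidate k-itemsets Ck from the frequent (k-1)-itemsets."""
    prev = set(prev_frequent)
    ordered = sorted(prev)
    candidates = []
    for l1 in ordered:
        for l2 in ordered:
            # Join step: first k-2 items equal, last item of l1 < last item of l2.
            if l1[:-1] == l2[:-1] and l1[-1] < l2[-1]:
                c = l1 + (l2[-1],)
                # Prune step: drop c if any (k-1)-subset is not frequent.
                if not has_infrequent_subset(c, prev):
                    candidates.append(c)
    return candidates

# Assumed input for illustration (not from the slides).
L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]
C4 = apriori_gen(L3)
print(C4)
```

With this input the join produces abcd (from abc and abd) and acde (from acd and ace); acde is pruned because its subset ade is not in `L3`.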
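Database D
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

C1 (after scanning D)
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1 (after pruning C1; {4} is dropped, so the minimum support count is 2)
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (generated from L1)
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

C2 (after scanning D)
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2 (after pruning C2)
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3 (generated from L2)
itemset
{2 3 5}

L3 (after scanning D)
itemset   sup
{2 3 5}   2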
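Because database D has only five distinct items, the worked example can be cross-checked by brute force, enumerating every possible itemset and counting its support directly. This is a sanity check rather than Apriori itself, and it assumes a minimum support count of 2, inferred from the pruning of {4}.

```python
from itertools import combinations

# Database D from the worked example (TID -> items).
D = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
}
MIN_SUP = 2  # assumed absolute support count, inferred from the pruning of {4}

def support(itemset):
    """Number of transactions in D that contain every item of itemset."""
    return sum(1 for items in D.values() if set(itemset) <= items)

all_items = sorted(set().union(*D.values()))

# frequent[size] maps each frequent itemset of that size to its support count.
frequent = {
    size: {
        c: support(c)
        for c in combinations(all_items, size)
        if support(c) >= MIN_SUP
    }
    for size in range(1, len(all_items) + 1)
}

for size, itemsets in frequent.items():
    if itemsets:
        print(f"size {size}: {itemsets}")
```

Brute force examines all 31 non-empty itemsets here; Apriori reaches the same frequent itemsets while counting far fewer candidates, which is the point of the pruning step.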
Rule 2 to remember:
For frequent itemset generation, the support threshold is used.
For association rule generation, the confidence threshold is used.
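To make the second half of this rule concrete, the sketch below derives association rules from one frequent itemset and keeps those that meet a confidence threshold, using confidence(A ⇒ B) = support(A ∪ B) / support(A). The function name `rules_from_itemset` and the support counts in `support` are hypothetical, chosen only for illustration.

```python
from itertools import combinations

def rules_from_itemset(itemset, support, min_conf):
    """Return (antecedent, consequent, confidence) triples for one itemset.

    support maps frozenset -> support count and must contain the itemset
    and every non-empty proper subset of it.
    """
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            # confidence(A => B) = support(A u B) / support(A)
            conf = support[itemset] / support[a]
            if conf >= min_conf:
                rules.append((set(a), set(itemset - a), conf))
    return rules

# Hypothetical support counts, for illustration only.
support = {
    frozenset({"a"}): 4,
    frozenset({"b"}): 3,
    frozenset({"a", "b"}): 3,
}
rules = rules_from_itemset({"a", "b"}, support, min_conf=0.8)
for antecedent, consequent, conf in rules:
    print(antecedent, "=>", consequent, f"confidence={conf:.2f}")
```

Both directions of a rule share the same support, but their confidences differ because the denominators differ; here a ⇒ b has confidence 0.75 and is filtered out, while b ⇒ a has confidence 1.0.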
Subjective measures
(Silberschatz & Tuzhilin, KDD95)
A rule (pattern) is interesting if it is
unexpected (surprising to the user);
and/or
actionable (the user can do something with it)
RULES:
nationality = French ⇒ income = high [50%, 100%]
income = high ⇒ nationality = French [50%, 75%]
age = 50 ⇒ nationality = Italian [33%, 100%]
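Assuming the bracketed pair is [support, confidence] (the usual convention; the slides do not label it), the first two rules show why the same itemset can yield different confidences in each direction. A short worked derivation:

```latex
\begin{align*}
\mathrm{conf}(A \Rightarrow B) &= \frac{\mathrm{sup}(A \cup B)}{\mathrm{sup}(A)}
  \quad\Longrightarrow\quad
  \mathrm{sup}(A) = \frac{\mathrm{sup}(A \cup B)}{\mathrm{conf}(A \Rightarrow B)} \\[4pt]
\mathrm{sup}(\text{nationality} = \text{French}) &= \frac{0.50}{1.00} = 0.50 \\[4pt]
\mathrm{sup}(\text{income} = \text{high}) &= \frac{0.50}{0.75} \approx 0.667
\end{align*}
```

Under that reading, half of the records are French and all of them have high income, while about two thirds of the records have high income and only three quarters of those are French.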
age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
TID Items
T1 {1110, 1210, 2110, 2210}
T2 {1110, 2110, 2220, 3230}
T3 {1120, 1222, 2210, 4113}
T4 {1110, 1210}
T5 {1110, 1222, 2110, 2210, 4113}
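The slides do not explain the item codes in this table. One common reading, assumed here, is that each digit position encodes one level of a concept hierarchy, so that a code can be generalized by keeping its leading digits. Under that assumption, the sketch below counts support at a chosen hierarchy level; `generalize` and `level_support` are illustrative names.

```python
from collections import Counter

# Encoded transaction table from the slide.
transactions = {
    "T1": ["1110", "1210", "2110", "2210"],
    "T2": ["1110", "2110", "2220", "3230"],
    "T3": ["1120", "1222", "2210", "4113"],
    "T4": ["1110", "1210"],
    "T5": ["1110", "1222", "2110", "2210", "4113"],
}

def generalize(code, level):
    """Keep the first `level` digits and mask the rest with '*'.

    Assumes each digit position is one hierarchy level (not stated in the slides).
    """
    return code[:level] + "*" * (len(code) - level)

def level_support(transactions, level):
    """Count, per generalized item, the transactions that contain it."""
    counts = Counter()
    for items in transactions.values():
        # A set, so each generalized item counts once per transaction.
        counts.update({generalize(code, level) for code in items})
    return counts

level1 = level_support(transactions, 1)
level2 = level_support(transactions, 2)
print(sorted(level1.items()))
print(sorted(level2.items()))
```

Support can only stay the same or grow as items are generalized, which is why multi-level mining often uses different support thresholds at different levels.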