
Mining Association Rules

Dr. Shahzad Faisal


Lecture Source: Vasileios Megalooikonomou
Dept. of Computer and Information Sciences
Temple University

(based on notes by Jiawei Han and Micheline Kamber)


Association Mining?
• Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures
among sets of items or objects in transaction databases, relational
databases, and other information repositories.
– Association rule mining finds interesting associations and correlation
relationships among large sets of data items. Association rules show
attribute value conditions that occur frequently together in a given data
set.
• Applications:
– Basket data analysis, cross-marketing, catalog design, loss-leader
analysis, clustering, classification, etc.
• Examples.
– Rule form: “Body → Head [support, confidence]”.
– buys(x, “bread”) → buys(x, “butter”) [0.5%, 60%]
– major(x, “CS”) ∧ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Association Rules: Basic Concepts
• Given: (1) database of transactions, (2) each transaction
is a list of items (purchased by a customer in a visit)
• Find: all rules that correlate the presence of one set of
items with that of another set of items
– E.g., 98% of people who purchase tires and auto accessories
also get automotive services done
• Applications
– * → Maintenance Agreement (What should the store do to
boost Maintenance Agreement sales?)
– Home Electronics → * (What other products should the store
stock up on?)
– Attached mailing in direct marketing
Transactional Data
Market basket example:
Basket1: {bread, cheese, milk}
Basket2: {apple, eggs, salt, yogurt}
…
Basketn: {biscuit, eggs, milk}
Definitions:
– An item: an article in a basket, or an attribute-value pair
– A transaction: items purchased in a basket; it may have
TID (transaction ID)
– A transactional dataset: A set of transactions

Spring 2005 CSE 572, CBS 598 by H. Liu 4


Itemsets and Association Rules
• An itemset is a set of items.
– E.g., {milk, bread, cereal} is an itemset.
• A k-itemset is an itemset with k items.
• Given a dataset D, an itemset X has a (frequency)
count in D
• An association rule is about relationships between
two disjoint itemsets X and Y:
X → Y
• It presents the pattern: when X occurs, Y also occurs
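The support count of an itemset can be sketched in a few lines. The snippet below is a minimal illustration (not part of the original lecture) using the market-basket example from the previous slide:

```python
# A minimal sketch of computing (absolute) support counts;
# itemsets and transactions are Python sets.
transactions = [
    {"bread", "cheese", "milk"},           # Basket1
    {"apple", "eggs", "salt", "yogurt"},   # Basket2
    {"biscuit", "eggs", "milk"},           # Basketn
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

print(support_count({"milk"}, transactions))          # 2
print(support_count({"eggs", "milk"}, transactions))  # 1
```

The subset test `itemset <= t` directly mirrors the definition: a transaction supports X when it contains all of X's items.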
Use of Association Rules

• Association rules do not represent any sort of
causality or correlation between the two itemsets.
– X → Y does not mean X causes Y; no causality is implied

• Association rules assist in marketing, targeted
advertising, floor planning, inventory control,
churn management, homeland security, …


Basic Concepts: Frequent Patterns

Tid   Items bought
10    Milk, Nuts, Diaper, Butter, Bread
20    Bear toy, Coffee, Diaper
30    Bear toy, Diaper, Eggs
40    Nuts, Eggs, Milk, Butter, Bread

• (absolute) support, or support count, of X: frequency or
number of occurrences of an itemset X
• (relative) support, s, is the fraction of transactions that
contain X (i.e., the probability that a transaction contains X)
• An itemset X is frequent if X’s support is no less than a
minsup threshold

(Figure: Venn diagram of customers who buy diapers, customers
who buy the bear toy, and customers who buy both.)
Mining Association Rules—An Example

Min. support 50%, min. confidence 50%

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A,C}              50%

For rule A → C:
support = support(A ∪ C) = 50%
confidence = support(A ∪ C) / support(A) = 66.6%
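The support and confidence computation above can be written out as a short sketch (an illustration added here, not part of the original slides), using the four transactions from the table:

```python
# Support and confidence for the rule A -> C on the example transactions.
transactions = [
    {"A", "B", "C"},   # TID 2000
    {"A", "C"},        # TID 1000
    {"A", "D"},        # TID 4000
    {"B", "E", "F"},   # TID 5000
]

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

sup_rule = support({"A", "C"})                     # 2/4 = 0.5 -> 50%
conf_rule = support({"A", "C"}) / support({"A"})   # 0.5 / 0.75 ~ 0.667
print(sup_rule, conf_rule)
```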
The Apriori Algorithm — Example

Database D (min. support count = 2):
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D → C1:         Prune → L1:
itemset  sup         itemset  sup
{1}      2           {1}      2
{2}      3           {2}      3
{3}      3           {3}      3
{4}      1           {5}      3
{5}      3

Generate C2 from L1, scan D:   Prune → L2:
itemset  sup                   itemset  sup
{1 2}    1                     {1 3}    2
{1 3}    2                     {2 3}    2
{1 5}    1                     {2 5}    3
{2 3}    2                     {3 5}    2
{2 5}    3
{3 5}    2

Generate C3 from L2: {2 3 5}. Scan D → L3: {2 3 5}, sup 2
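The level-wise procedure above can be condensed into a compact sketch. This is an illustrative simplification (added here, not from the original slides): it performs the join step but omits the subset-based candidate pruning of full Apriori, letting the counting pass eliminate infrequent candidates instead.

```python
# Compact Apriori sketch reproducing the example database above.
def apriori(transactions, minsup):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    items = {i for t in transactions for i in t}
    candidates = [frozenset([i]) for i in items]   # C1
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= minsup}  # Lk
        frequent.update(level)
        # Join step: Lk joined with itself -> candidate (k+1)-itemsets
        keys = list(level)
        candidates = list({a | b for a in keys for b in keys
                           if len(a | b) == k + 1})
        k += 1
    return frequent

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
freq = apriori(D, minsup=2)
print(freq[frozenset({2, 3, 5})])  # 2, matching L3 above
```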
Lift

If a rule had a lift of 1, it would imply that the probability of occurrence of the
antecedent and that of the consequent are independent of each other. When
two events are independent of each other, no rule can be drawn involving those
two events.

If the lift is > 1, it tells us the degree to which those two occurrences
are dependent on one another, and makes those rules potentially useful for
predicting the consequent in future data sets.
Multi-dimensional Association
▪ Single-dimensional rules:

buys(X, “milk”) → buys(X, “bread”)

▪ Multi-dimensional rules: ≥ 2 dimensions or predicates

▪ Inter-dimension assoc. rules (no repeated predicates)

age(X, “19-25”) ∧ occupation(X, “student”) → buys(X, “coke”)

▪ Hybrid-dimension assoc. rules (repeated predicates)

age(X, “19-25”) ∧ buys(X, “popcorn”) → buys(X, “coke”)

▪ Categorical Attributes

▪ finite number of possible values, no ordering among values

▪ Quantitative Attributes

▪ numeric, implicit ordering among values


Example: Generating Rules from an Itemset

▪ Frequent itemset from golf data:


Humidity = Normal, Windy = False, Play = Yes

▪ Seven potential rules:


If Humidity = Normal and Windy = False then Play = Yes
If Humidity = Normal and Play = Yes then Windy = False
If Windy = False and Play = Yes then Humidity = Normal
If Humidity = Normal then Windy = False and Play = Yes
If Windy = False then Humidity = Normal and Play = Yes
If Play = Yes then Humidity = Normal and Windy = False
If True then Humidity = Normal and Windy = False and Play = Yes
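The seven rules above are exactly the splits of the 3-itemset into a (possibly empty) antecedent and a non-empty consequent; a k-itemset yields 2^k − 1 such rules. A short sketch of the enumeration (an illustration added here, not from the original slides):

```python
from itertools import combinations

def rules_from_itemset(itemset):
    """All antecedent -> consequent splits of an itemset, including the
    empty antecedent ("If True then ..."), excluding the empty consequent."""
    items = sorted(itemset)
    rules = []
    for r in range(len(items)):           # antecedent sizes 0 .. k-1
        for ante in combinations(items, r):
            cons = tuple(i for i in items if i not in ante)
            rules.append((ante, cons))
    return rules

rs = rules_from_itemset({"Humidity=Normal", "Windy=False", "Play=Yes"})
print(len(rs))  # 7 potential rules
```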
Rules for the weather data

▪ Rules with support > 1 and confidence = 100%:

     Association rule                                      Sup.  Conf.
 1   Humidity=Normal ∧ Windy=False → Play=Yes              4     100%
 2   Temperature=Cool → Humidity=Normal                    4     100%
 3   Outlook=Overcast → Play=Yes                           4     100%
 4   Temperature=Cool ∧ Play=Yes → Humidity=Normal         3     100%
 ... ...                                                   ...   ...
 58  Outlook=Sunny ∧ Temperature=Hot → Humidity=High       2     100%

▪ In total: 3 rules with support four, 5 with support
three, and 50 with support two
Example of Discovering Rules

TID    List of Item_IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Let us consider the 3-itemset {I1, I2, I5} with a support of 2/9 ≈ 22.2%.
Let us generate all the association rules from this itemset:

I1 ∧ I2 → I5   confidence = 2/4 = 50%
I1 ∧ I5 → I2   confidence = 2/2 = 100%
I2 ∧ I5 → I1   confidence = 2/2 = 100%
I1 → I2 ∧ I5   confidence = 2/6 = 33%
I2 → I1 ∧ I5   confidence = 2/7 = 29%
I5 → I1 ∧ I2   confidence = 2/2 = 100%
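The confidences above can be checked mechanically; the sketch below (an illustration added here, not from the original slides) encodes the nine transactions and recomputes a few of them:

```python
# The nine transactions from the table, as sets of item IDs.
D = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def count(itemset):
    """Support count of an itemset in D."""
    return sum(1 for t in D if itemset <= t)

def confidence(antecedent, consequent):
    """conf(A -> C) = count(A u C) / count(A)."""
    return count(antecedent | consequent) / count(antecedent)

print(confidence({"I1", "I2"}, {"I5"}))   # 2/4 = 0.5
print(confidence({"I1", "I5"}, {"I2"}))   # 2/2 = 1.0
print(confidence({"I1"}, {"I2", "I5"}))   # 2/6 ~ 0.33
```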
Applications
▪ Market basket analysis
▪ Store layout, client offers
▪ This analysis is applicable whenever a customer purchases multiple
items together
▪ telecommunication (each customer is a transaction containing the set
of phone calls)
▪ weather analysis (each time interval is a transaction containing the set
of observed events)
▪ credit cards
▪ banking services
▪ medical treatments
▪ Finding unusual events
▪ WSARE – What is Strange About Recent Events
▪ …
