
Chapter 10

ASSOCIATION RULE
By:
Aris D.(13406054)
Ricky A.(13406058)
Nadia FR. (13406069)
Amirah K.(13406070)
Paramita AW.(13406091)
Bahana W.(13406102)
Introduction
• Affinity Analysis
  The study of attributes or characteristics that "go together".

• Market Basket Analysis
  The method uncovers rules for quantifying the relationship between two or more attributes:

  "If antecedent, then consequent"

Affinity Analysis & Market Basket Analysis
• Example:
  A supermarket may find that of the 1000 customers shopping on a Thursday night, 200 bought diapers, and of the 200 who bought diapers, 50 bought beer.

  The association rule:
  "If buy diapers, then buy beer",
  with support of 50/1000 = 5%
  and confidence of 50/200 = 25%
Affinity Analysis & Market Basket Analysis (2)
Examples of business & research questions:
• Investigating the proportion of subscribers to your
company’s cell phone plan that respond positively to an
offer of a service upgrade
• Examining the proportion of children whose parents
read to them who are themselves good readers
• Predicting degradation in telecommunications networks
• Finding out which items in a supermarket are purchased
together & which are never purchased together
• Determining the proportion of cases in which a new drug
will exhibit dangerous side effects
Affinity Analysis & Market Basket Analysis (3)
• The number of possible association rules grows exponentially in the number of attributes.
• If there are k binary (yes/no) attributes, then there are k · 2^(k−1) possible association rules.
• Example: a convenience store that sells 100 items. Possible association rules = 100 · 2^99 ≈ 6.4 × 10^31.
• The a priori algorithm (a priori = "from what comes before") reduces the search problem to a more manageable size.
Notation for Data Representation in Market Basket Analysis
• A farmer sells the item set I = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}.
• A customer puts a subset of I into a basket, e.g. {broccoli, corn}.
• The subset does not keep track of how much of each item is purchased, just the name of the item.
Transactional Data Format
Tabular Data Format
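The two layouts can be sketched in a few lines of Python; this is illustrative only, with made-up baskets drawn from the farmer's item set rather than the tables in the book.

```python
# Illustrative only: the same basket data in the two layouts named above.

# Transactional format: one record per transaction, listing the items it contains.
transactional = [
    {"broccoli", "corn"},
    {"asparagus", "beans", "squash"},
]

# Tabular (flag) format: one row per transaction, one 0/1 column per item in I.
items = ["asparagus", "beans", "broccoli", "corn", "green peppers", "squash", "tomatoes"]
tabular = [[1 if item in basket else 0 for item in items] for basket in transactional]

for row in tabular:
    print(row)
```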
Support, Confidence, Frequent Itemsets, & the Apriori Property
• Example:
  D : the set of transactions represented in Table 10.1
  T : a transaction in D, representing a set of items
  I : the set of all items
  Itemset A : {beans, squash}
  Itemset B : {asparagus}

  THEN …
  An association rule takes the form "if A, then B" (A ⇒ B), where A and B are proper subsets of I and are mutually exclusive.
Table of Transactions Made
Support and Confidence
• Support, s, is the proportion of transactions in D that contain both A and B:

  support = P(A ∩ B)
          = (number of transactions containing both A and B) / (total number of transactions)

• Confidence, c, is a measure of the accuracy of the rule:

  confidence = P(B | A) = P(A ∩ B) / P(A)
             = (number of transactions containing both A and B) / (number of transactions containing A)

• Analysts prefer rules with HIGH support AND HIGH confidence.
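As a minimal sketch (hypothetical transactions, mirroring the earlier diapers/beer example rather than Table 10.1), the two measures can be computed directly from transaction counts:

```python
def support_confidence(transactions, antecedent, consequent):
    """Compute support = P(A and B) and confidence = P(B | A) for the rule A => B."""
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)                   # transactions containing A
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)   # transactions containing A and B
    support = n_ab / n
    confidence = n_ab / n_a if n_a else 0.0
    return support, confidence

# Hypothetical data echoing the diapers/beer example: 1000 baskets, 200 with diapers,
# 50 of those also containing beer.
baskets = [{"diapers", "beer"}] * 50 + [{"diapers"}] * 150 + [{"milk"}] * 800
print(support_confidence(baskets, {"diapers"}, {"beer"}))   # (0.05, 0.25)
```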
Frequent Itemset
• Definition…
  An itemset is a set of items contained in I, and a k-itemset is an itemset containing k items.
  e.g.: {beans, squash} is a 2-itemset
• The itemset frequency…
  is the number of transactions that contain the particular itemset.
• A frequent itemset…
  is an itemset that occurs at least a certain minimum number of times, having itemset frequency ≥ φ.
  Example:
  Set φ = 4; then itemsets that occur four or more times are said to be frequent.
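A short Python sketch, with made-up baskets, of counting k-itemset frequencies and keeping those that meet the threshold φ:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets_of_size(transactions, k, phi):
    """Return all k-itemsets whose frequency (number of containing transactions) is >= phi."""
    counts = Counter()
    for t in transactions:
        for combo in combinations(sorted(t), k):
            counts[frozenset(combo)] += 1
    return {itemset for itemset, freq in counts.items() if freq >= phi}

# Hypothetical baskets: {beans, squash} appears 4 times, so it is frequent at phi = 4.
baskets = [{"beans", "squash"}] * 4 + [{"corn", "squash"}] * 3 + [{"asparagus"}] * 2
print(frequent_itemsets_of_size(baskets, 2, phi=4))  # {frozenset({'beans', 'squash'})}
```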
The Apriori Property
• Mining Association Rules
  It is a two-step process:
  1. Find all frequent itemsets (all itemsets with frequency ≥ φ).
  2. From the frequent itemsets, generate association rules satisfying the minimum support and confidence conditions.

• The Apriori property states that if an itemset Z is not frequent, then adding another item A to the itemset Z will not make Z more frequent. This helpful property significantly reduces the search space for the a priori algorithm.
How does the Apriori Algorithm Work?

• Part 1: Generating Frequent Itemsets


• Part 2: Generating Association Rules
Generating Frequent Itemsets
• Example:
  Let φ = 4, so that an itemset is frequent if it occurs four or more times in D.

  F1 = {asparagus, beans, broccoli, corn, green peppers, squash, tomatoes}

  F2: first, construct a set Ck of candidate k-itemsets by joining Fk-1 with itself; then prune Ck using the a priori property.
  Ck for k = 2 consists of all the combinations of vegetables in Table 10.4.
  F3: not much different from the steps for F2, but with k = 3.
Table 10.3 (pg.183)
Table 10.4 (pg. 185)
• However, consider s = {beans, corn, squash}:
  the subset {corn, squash} has frequency 3 < 4 = φ, so that {corn, squash} is not frequent.
  By the a priori property, therefore, {beans, corn, squash} cannot be frequent; it is pruned and does not appear in F3.

  The same holds for s = {beans, squash, tomatoes}, since the frequency of one of its subsets is < 4.
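A compact Python sketch of the level-wise join-and-prune procedure described above; it is not Clementine's implementation, and the transaction data and threshold are placeholders supplied by the caller.

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, phi):
    """Generate all frequent itemsets by level-wise join and a priori pruning."""
    def freq(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # F1: frequent 1-itemsets
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items if freq(frozenset([i])) >= phi}
    frequent = set(current)
    k = 2
    while current:
        # Join step: candidate k-itemsets built from the frequent (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: drop any candidate with an infrequent (k-1)-subset (a priori property)
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        current = {c for c in candidates if freq(c) >= phi}
        frequent |= current
        k += 1
    return frequent

# Example call: apriori_frequent_itemsets(list_of_item_sets, phi=4)
```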
Generating Association Rules
1. Generate all subsets ss of each frequent itemset s.
2. For the candidate association rule R : ss ⇒ (s − ss), generate R if it fulfills the minimum confidence requirement.
   (s − ss) is the set s without ss.
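A minimal sketch of step 2, generating candidate rules ss ⇒ (s − ss) from a frequent itemset s and keeping those that meet the minimum confidence; the function and variable names are illustrative, not from the book.

```python
from itertools import combinations

def rules_from_itemset(s, transactions, min_confidence):
    """Generate rules ss => (s - ss) from frequent itemset s that meet the
    minimum confidence requirement."""
    s = frozenset(s)
    n_s = sum(1 for t in transactions if s <= t)
    rules = []
    for r in range(1, len(s)):                       # proper, non-empty subsets ss of s
        for ss in map(frozenset, combinations(s, r)):
            n_ss = sum(1 for t in transactions if ss <= t)
            confidence = n_s / n_ss if n_ss else 0.0  # P(s - ss | ss)
            if confidence >= min_confidence:
                rules.append((set(ss), set(s - ss), confidence))
    return rules

# Example call: rules_from_itemset({"beans", "squash"}, list_of_item_sets, min_confidence=0.8)
```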
Example: Two Antecedents

• Total transactions = 14
• Transactions including asparagus and beans = 5
• Transactions including asparagus and squash = 5
• Transactions including beans and squash = 6
Ranked by Support × Confidence

• Minimum confidence = 80%


Clementine Generating Association Rules
Clementine Generating Association Rules (2)
• In Clementine, "support" means the proportion of occurrences of the antecedent, which differs from the definition given earlier.
• The first column indicates the number of times the antecedent occurs.
• To find the actual "support" using Clementine, multiply its support by the confidence.
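A tiny illustration with hypothetical numbers (not taken from the Clementine output shown on the slide):

```python
# Hypothetical Clementine-style values, for illustration only.
clementine_support = 0.357   # proportion of transactions containing the antecedent
confidence = 0.80            # P(consequent | antecedent)

# "Actual" support in the textbook sense, P(antecedent and consequent):
actual_support = clementine_support * confidence
print(actual_support)        # 0.2856
```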
Extension From Flag Data to General Categorical Data

- Association rules are not only for flag (Boolean) data.
- The a priori algorithm can also be applied to categorical data.
Example using Clementine
• Recall the normalized adult data set from Chapters 6 and 7.
Information-Theoretic Approach: Generalized Rule Induction Method
Why GRI?
• The a priori algorithm is not well equipped to handle numerical attributes; they require discretization.
• Discretization can lead to loss of information.
• GRI can handle both categorical and numerical variables as inputs, but still requires a categorical variable as the output.
Generalized Rule Induction Method (2)
J-Measure

  J = p(x) · [ p(y|x) · ln( p(y|x) / p(y) ) + (1 − p(y|x)) · ln( (1 − p(y|x)) / (1 − p(y)) ) ]

• p(x): probability of the value of x (the antecedent)
• p(y): probability of the value of y (the consequent)
• p(y|x): conditional probability of y given that x has occurred
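The J-measure translates directly into a small Python helper; this is a sketch following the slide's notation, not part of the GRI implementation itself.

```python
from math import log

def j_measure(p_x, p_y, p_y_given_x):
    """J = p(x) * [ p(y|x) ln(p(y|x)/p(y)) + (1 - p(y|x)) ln((1 - p(y|x))/(1 - p(y))) ]"""
    return p_x * (p_y_given_x * log(p_y_given_x / p_y)
                  + (1 - p_y_given_x) * log((1 - p_y_given_x) / (1 - p_y)))
```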
Generalized Rule Induction Method (3)
• The J-measure quantifies the "interestingness" of a rule.
• In GRI, the user specifies how many association rules should be reported.
• If the "interestingness" of a new rule exceeds the current minimum J in the rule table, the new rule is inserted and the rule with the minimum J is eliminated.
Application of GRI
p(x) : female, never married
p(x) = 0.1463
Application of GRI (2)
p(y) : work class = private
p(y) = 0.6958
Application of GRI (3)
p(y|x) : work class = private;
given : female, never married

p(y|x) = conditional probability = 0.763


Application of GRI
Calculation:

  J = p(x) · [ p(y|x) · ln( p(y|x) / p(y) ) + (1 − p(y|x)) · ln( (1 − p(y|x)) / (1 − p(y)) ) ]
    = 0.1463 · [ 0.763 · ln( 0.763 / 0.6958 ) + 0.237 · ln( 0.237 / 0.3042 ) ]
    = 0.1463 · [ 0.763 · ln(1.0966) + 0.237 · ln(0.7791) ]
    = 0.001637
When not to use Association Rules
• Association rules from the a priori algorithm can be chosen based on:
  ▫ Confidence
  ▫ Confidence difference
  ▫ Confidence ratio

• Association rules need to be applied with care, because the results are sometimes unreliable.
When not to use Association Rules (2)
Association rules chosen by the a priori algorithm, based on confidence
• Applying this association rule reduces the probability of selecting the desired records compared with random selection.
• Even though the rule is useless, the software still reports it, probably because the default ranking mechanism for the a priori algorithm is confidence.
• We should never simply believe the computer output without making the effort to understand the models and mechanisms underlying the results.
When not to use Association Rules (3)
Association rules chosen by the a priori algorithm, based on confidence
When not to use Association Rules (4)
Association rules chosen by the a priori algorithm, based on confidence difference

• A random selection from the database would have provided more effective results than applying the (useless) association rule.

• This rule provides the greatest increase in confidence from the prior to the posterior.

• The evaluation measure is the absolute difference between the prior and posterior confidences.
When not to use Association Rules (5)
Association rules chosen by the a priori algorithm, based on confidence difference
When not to use Association Rules (6)
Association rules chosen by the a priori algorithm, based on confidence ratio

• Some analysts prefer to use the confidence ratio to evaluate potential rules.

• The confidence difference criterion yielded the very same rules as did the confidence ratio criterion.
When not to use Association Rules (7)
Association rules chosen by the a priori algorithm, based on confidence ratio

• Example:
  If Marital_Status = Divorced, then Sex = Female, with p(y) = 0.3317 and p(y|x) = 0.60.
Do Association Rules Represent Supervised or Unsupervised Learning?
• Supervised learning:
  ▫ The target variable is prespecified.
  ▫ The algorithm is provided with a rich collection of examples where possible associations between the target variable and the predictor variables may be uncovered.
• Unsupervised learning:
  ▫ No target variable is identified explicitly.
  ▫ The algorithm searches for patterns and structure among all the variables.

• Association rules are generally used for unsupervised learning, but can also be applied to supervised learning for classification tasks.
Local Patterns Versus Global Models
• Model: a global description or explanation of a data set.
• Pattern: an essential local feature of the data.
• Association rules are well suited to uncovering local patterns in data.
• Applying the "if" clause drills down deep into the data set, uncovering a hidden local pattern that might be relevant.
• Finding local patterns is one of the most important goals in data mining. It can lead to new profitable initiatives.
