
Chapter 5
Mining Association Rules

Arif Djunaidy
e-mail: arif@its-sby.edu
URL: www.its-sby.edu/~arif
Outline
What is association rules mining?
The Apriori algorithm
Iceberg Queries
Methods to improve Apriori's efficiency
Mining frequent patterns without candidate
generation
Interestingness measurements
Multiple-level association rules mining
What Is Association Rules Mining?
Association rule mining:
Finding frequent patterns, associations, correlations, or causal structures
among sets of items or objects in transaction databases, relational
databases, and other information repositories.
Applications:
Basket data analysis, cross-marketing, catalog design, clustering,
classification, etc.
Examples:
buys(x, computer) ⇒ buys(x, software) [2%, 75%]
age(x, mature) ∧ takes(x, DM) ⇒ grade(x, A) [5%, 75%]
Association Rules Mining: Basic Principle
Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other
items in the transaction
Also known as market basket analysis
Market-Basket transactions
TID  Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke

Example of Association Rules:
{Diaper} ⇒ {Beer}
{Milk, Bread} ⇒ {Eggs, Coke}
{Beer, Bread} ⇒ {Milk}
Implication means co-occurrence,
not causality!
Definition: Frequent Itemset
Itemset
A collection of one or more items
Example: {Milk, Bread, Diaper}
k-itemset
An itemset that contains k items
Support count (σ)
Frequency of occurrence of an itemset
E.g. σ({Milk, Bread, Diaper}) = 2
Support
Fraction of transactions that contain an
itemset
E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
An itemset whose support is greater
than or equal to a minsup threshold
TID  Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke

Definition: Association Rule

Association Rule
An implication expression of the form X ⇒ Y, where X and Y are itemsets
Example: {Milk, Diaper} ⇒ {Beer}

Rule Evaluation Metrics
Support (s)
Fraction of transactions that contain both X and Y
Confidence (c)
Measures how often items in Y appear in transactions that contain X

Example, for the rule {Milk, Diaper} ⇒ {Beer}:
s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67

TID  Items
1  Bread, Milk
2  Bread, Diaper, Beer, Eggs
3  Milk, Diaper, Beer, Coke
4  Bread, Milk, Diaper, Beer
5  Bread, Milk, Diaper, Coke
Association Rule Mining Task
Given a set of transactions T, the goal of association
rule mining is to find all rules having
support ≥ minsup threshold
confidence ≥ minconf threshold

High confidence = strong pattern
High support = occurs often
Less likely to be a random occurrence
Larger potential benefit from acting on the rule
Application 1 (Retail Stores)
Real market baskets
chain stores keep TBs of customer purchase info
Value?
how typical customers navigate stores
positioning tempting items
suggests cross-sell opportunities, e.g., a hamburger sale
while raising the ketchup price

High support needed, or no $$$
Application 2 (Information Retrieval)
Scenario 1
baskets = documents
items = words in documents
frequent word-groups = linked concepts.
Scenario 2
items = sentences
baskets = documents containing sentences
frequent sentence-groups = possible plagiarism
Application 3 (Web Search)
Scenario 1
baskets = web pages
items = outgoing links
pages with similar references → about the same topic
Scenario 2
baskets = web pages
items = incoming links
pages with similar in-links → mirrors, or the same
topic
Mining Association Rules
Example of Rules:

{Milk, Diaper} ⇒ {Beer} (s=0.4, c=0.67)
{Milk, Beer} ⇒ {Diaper} (s=0.4, c=1.0)
{Diaper, Beer} ⇒ {Milk} (s=0.4, c=0.67)
{Beer} ⇒ {Milk, Diaper} (s=0.4, c=0.67)
{Diaper} ⇒ {Milk, Beer} (s=0.4, c=0.5)
{Milk} ⇒ {Diaper, Beer} (s=0.4, c=0.5)
TID  Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke

Observations:
All the above rules are binary partitions of the same itemset:
{Milk, Diaper, Beer}
Rules originating from the same itemset have identical support but
can have different confidence
Thus, we may decouple the support and confidence requirements
Mining Association Rules
Goal: find all association rules such that
support ≥ s
confidence ≥ c
Reduction to the frequent itemsets problem:
Find all frequent itemsets X
Given X = {A_1, ..., A_k}, generate all rules X − A_j ⇒ A_j
Confidence = sup(X) / sup(X − A_j)
Support = sup(X)
Exclude rules whose confidence is too low
Observe: X − A_j is also frequent, so its support is already known
Finding all frequent itemsets is the hard part!
Association Rule Mining: A Road Map
Boolean vs. quantitative associations (based on the types of
values handled)
buys(x, WINDOWS 2K) ∧ buys(x, SQLServer) ⇒ buys(x, DBMiner) [0.2%, 50%]
age(x, 30..39) ∧ income(x, 42..48K) ⇒ buys(x, PC) [1%, 75%]
Single-dimension vs. multiple-dimensional associations (see
examples above)
Single-level vs. multiple-level analysis
How are association rules mined from
large databases?
Association rule mining is a two-step process.
1. Find all frequent itemsets:
By definition, each of these itemsets occurs at least as frequently as a
predetermined minimum support count.
2. Generate strong association rules from the frequent
itemsets:
By definition, these rules must satisfy minimum support and minimum
confidence.

Itemset Lattice: An Example
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE

Given m items, there are 2^m - 1 possible candidate itemsets
Scale of Problem
WalMart
sells m=100,000 items
tracks n=1,000,000,000 baskets
Web
several billion pages
approximately one new word per page
Exponential number of itemsets:
m items → 2^m - 1 possible itemsets
Cannot possibly examine all itemsets for large m
Even itemsets of size 2 may be too many
m = 100,000 → 5 trillion item pairs
Frequent Itemsets in SQL
DBMSs are poorly suited to association rule mining
Star schema
Sales Fact
Transaction ID degenerate dimension
Item dimension
Finding frequent 3-itemsets:

SELECT Fact1.ItemID, Fact2.ItemID, Fact3.ItemID, COUNT(*)
FROM SalesFact Fact1
JOIN SalesFact Fact2
  ON Fact1.TID = Fact2.TID
 AND Fact1.ItemID < Fact2.ItemID
JOIN SalesFact Fact3
  ON Fact1.TID = Fact3.TID
 AND Fact2.ItemID < Fact3.ItemID
GROUP BY Fact1.ItemID, Fact2.ItemID, Fact3.ItemID
HAVING COUNT(*) > 1000;
Finding frequent k-itemsets requires joining k copies of fact table
Joins are non-equijoins
Impossibly expensive!
Association Rules and Data Warehouses
Typical procedure:
Use data warehouse to apply filters
Mine association rules for certain regions, dates
Export all fact rows matching filters to flat file
Sort by transaction ID
Items in same transaction are grouped together
Perform association rule mining on flat file
An alternative:
Database vendors are beginning to add specialized data mining
capabilities
Efficient algorithms for common data mining tasks are built in to the
database system
Decision trees, association rules, clustering, etc.
Not standardized yet
Finding Frequent Pairs
Frequent 2-sets:
already the hard case
focus on pairs for now, later extend to k-sets
Naïve algorithm:
Count all m(m−1)/2 item pairs (m = number of distinct items)
Single pass scanning all baskets
A basket of size b increments b(b−1)/2 counters
Failure?
if memory < m(m−1)/2 counters
m = 100,000 → 5 trillion item pairs
The naïve algorithm is impractical for large m
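For illustration, a sketch of this naive counting pass in Python, using a dictionary of counters rather than the triangular array a memory-conscious implementation would prefer:

from collections import Counter
from itertools import combinations

baskets = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
]

pair_counts = Counter()
for basket in baskets:                             # single pass over baskets
    for pair in combinations(sorted(basket), 2):   # b(b-1)/2 pairs per basket
        pair_counts[pair] += 1

print(pair_counts.most_common(3))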

Pruning Candidate Itemsets
Monotonicity principle:
If an itemset is frequent, then all of its subsets must also
be frequent

The monotonicity principle holds due to the following
property of the support measure:

∀ X, Y: (X ⊆ Y) ⇒ s(X) ≥ s(Y)

Contrapositive:
If an itemset is infrequent, then all of its supersets must
also be infrequent
Illustrating the Monotonicity Principle

[Figure: the itemset lattice from before; once one itemset is found to be infrequent, all of its supersets are pruned from the search.]
Mining Frequent Itemsets: the Key Step
The Apriori principle:
Any subset of a frequent itemset must be frequent

Find the frequent itemsets: the sets of items that have
minimum support
A subset of a frequent itemset must also be a frequent itemset
i.e., if {A, B} is a frequent itemset, then both {A} and {B} must be
frequent itemsets
Iteratively find frequent itemsets with cardinality from 1 to k
(k-itemset)
Use the frequent itemsets to generate association rules.
The Apriori Algorithm
Join step: C_k is generated by joining L_{k-1} with itself
Prune step: any (k-1)-itemset that is not frequent cannot be a
subset of a frequent k-itemset

Pseudo-code:
C_k: candidate itemsets of size k
L_k: frequent itemsets of size k

L_1 = {frequent items};
for (k = 1; L_k != ∅; k++) do begin
  C_{k+1} = candidates generated from L_k;
  for each transaction t in database do
    increment the count of all candidates in C_{k+1} that are contained in t
  L_{k+1} = candidates in C_{k+1} with min_support
end
return ∪_k L_k;
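A compact Python sketch of this pseudo-code, assuming the transactions fit in memory; running it on the four-transaction database of the next slide reproduces the L1, L2 and L3 shown there:

from itertools import combinations

def apriori(transactions, min_sup):
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}

    def count(candidates):
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    L = {frozenset([i]) for i in items
         if sum(1 for t in transactions if i in t) >= min_sup}
    frequent, k = dict(count(L)), 2
    while L:
        # join step: unions of L_{k-1} itemsets that give k-itemsets
        C = {a | b for a in L for b in L if len(a | b) == k}
        # prune step: drop candidates with an infrequent (k-1)-subset
        C = {c for c in C
             if all(frozenset(s) in L for s in combinations(c, k - 1))}
        counts = count(C)
        L = {c for c, n in counts.items() if n >= min_sup}
        frequent.update({c: counts[c] for c in L})
        k += 1
    return frequent

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, sup in sorted(apriori(db, 2).items(),
                           key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), sup)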
The Apriori Algorithm: Example (min_sup = 2)

Database D:
TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D to count C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1: {1}:2, {2}:3, {3}:3, {5}:3

Generate C2 from L1, scan D to count: {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

Generate C3 from L2, scan D to count: {2 3 5}:2
L3: {2 3 5}:2
Generating Association Rules from Frequent Itemsets

From L_2 = {1, 3}:
1 ⇒ 3: sup(1∪3) = 2, conf(1⇒3) = sup(1∪3)/sup(1) = 2/2 = 100%
3 ⇒ 1: sup(1∪3) = 2, conf(3⇒1) = sup(1∪3)/sup(3) = 2/3 = 67%
From L_3 = {2, 3, 5}:
2∪3 ⇒ 5: sup(2∪3∪5) = 2, conf = sup(2∪3∪5)/sup(2∪3) = 2/2 = 100%
2 ⇒ 3∪5: sup(2∪3∪5) = 2, conf = sup(2∪3∪5)/sup(2) = 2/3 = 67%
2∪5 ⇒ 3: sup(2∪3∪5) = 2, conf = sup(2∪3∪5)/sup(2∪5) = 2/3 = 67%
3∪5 ⇒ 2: sup(2∪3∪5) = 2, conf = sup(2∪3∪5)/sup(3∪5) = 2/2 = 100%
3 ⇒ 2∪5: sup(2∪3∪5) = 2, conf = sup(2∪3∪5)/sup(3) = 2/3 = 67%
5 ⇒ 2∪3: sup(2∪3∪5) = 2, conf = sup(2∪3∪5)/sup(5) = 2/3 = 67%
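The same enumeration can be scripted. This sketch hard-codes the support counts from the example database and prints each rule derived from {2, 3, 5} with its confidence:

from itertools import combinations

sup = {frozenset(s): n for s, n in [
    ((2,), 3), ((3,), 3), ((5,), 3),
    ((2, 3), 2), ((2, 5), 3), ((3, 5), 2),
    ((2, 3, 5), 2),
]}

itemset = frozenset({2, 3, 5})
for r in range(1, len(itemset)):             # every non-empty proper LHS
    for lhs in map(frozenset, combinations(itemset, r)):
        conf = sup[itemset] / sup[lhs]
        print(f"{set(lhs)} -> {set(itemset - lhs)}: conf = {conf:.0%}")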
How to Generate Candidates?
Suppose the items in L_{k-1} are listed in an order
Step 1: self-joining L_{k-1}

insert into C_k
select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1}
from L_{k-1} p, L_{k-1} q
where p.item_1 = q.item_1, ..., p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}

Step 2: pruning

forall itemsets c in C_k do
  forall (k-1)-subsets s of c do
    if (s is not in L_{k-1}) then delete c from C_k
Example of Generating Candidates
L_3 = {abc, abd, acd, ace, bcd}
Self-joining: L_3 * L_3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L_3
C_4 = {abcd}
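A Python sketch of the two steps for exactly this example, keeping itemsets as sorted tuples so the join condition (equal prefixes, smaller last item) is easy to express:

from itertools import combinations

L3 = [tuple("abc"), tuple("abd"), tuple("acd"), tuple("ace"), tuple("bcd")]

# Step 1: self-join L3 with itself
joined = [p + (q[-1],) for p in L3 for q in L3
          if p[:-1] == q[:-1] and p[-1] < q[-1]]

# Step 2: prune candidates that have a 3-subset not in L3
L3_set = set(L3)
C4 = [c for c in joined
      if all(s in L3_set for s in combinations(c, 3))]

print(joined)  # [('a','b','c','d'), ('a','c','d','e')]
print(C4)      # [('a','b','c','d')]  (acde pruned: ade not in L3)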
Iceberg Queries
Iceberg query: compute an aggregate over one attribute (or a set of
attributes) only for those groups whose aggregate value is above a
certain threshold
Example:
select P.custID, P.itemID, sum(P.qty)
from purchase P
group by P.custID, P.itemID
having sum(P.qty) >= 10
Compute iceberg queries efficiently by Apriori:
First compute lower dimensions
Then compute higher dimensions only when all the lower ones
are above the threshold
Iceberg Queries (Cont.)
Generate cust_list, a list of customers who bought three or
more items in total, for example:
select P.cust_ID
from Purchases P
group by P.cust_ID
having SUM(P.qty) >= 3;

Generate item_list, a list of items that were purchased by
any customer in quantities of three or more, for example:
select P.item_ID
from Purchases P
group by P.item_ID
having SUM(P.qty) >= 3;
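A Python sketch of the overall strategy on toy data (the rows and the threshold of 3 are illustrative): pair counts are accumulated only for customers and items that individually pass the threshold, which is safe because quantities are positive, so a pair's sum can never exceed its customer's or item's total.

from collections import Counter

purchases = [  # (cust_ID, item_ID, qty) rows of the Purchases table
    ("c1", "i1", 2), ("c1", "i2", 4), ("c2", "i1", 5),
    ("c2", "i2", 1), ("c3", "i1", 1),
]
THRESHOLD = 3

cust_qty, item_qty = Counter(), Counter()
for cust, item, qty in purchases:            # lower dimensions first
    cust_qty[cust] += qty
    item_qty[item] += qty
cust_list = {c for c, q in cust_qty.items() if q >= THRESHOLD}
item_list = {i for i, q in item_qty.items() if q >= THRESHOLD}

# A (cust, item) group can reach the threshold only if both of its
# one-dimensional aggregates did, so all other pairs are skipped.
pair_qty = Counter()
for cust, item, qty in purchases:
    if cust in cust_list and item in item_list:
        pair_qty[cust, item] += qty

print({p: q for p, q in pair_qty.items() if q >= THRESHOLD})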

Is Apriori Fast Enough?
Performance Bottlenecks
The core of the Apriori algorithm:
Use frequent (k-1)-itemsets to generate candidate frequent k-itemsets
Use database scans and pattern matching to collect counts for the
candidate itemsets
The bottleneck of Apriori: candidate generation
Huge candidate sets:
10^4 frequent 1-itemsets will generate 10^7 candidate 2-itemsets
To discover a frequent pattern of size 100, e.g., {a_1, a_2, ..., a_100}, one
needs to generate 2^100 ≈ 10^30 candidates
Multiple scans of the database:
Needs (n + 1) scans, where n is the length of the longest pattern
Methods to Improve Apriori's Efficiency
Transaction reduction:
A transaction that does not contain any frequent k-itemset is
useless in subsequent scans, because it cannot contain any
frequent (k+1)-itemset. Such a transaction can therefore be
removed from further consideration.
Partitioning:
Any itemset that is potentially frequent in DB must be frequent
in at least one of the partitions of DB
Partitioning
Phase I (1 scan): divide the transactions in D into n partitions and
find the frequent itemsets local to each partition
Combine all local frequent itemsets to form the candidate itemsets
Phase II (1 scan): find the global frequent itemsets in D among the
candidates
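A Python sketch of the two phases on the earlier four-transaction example; the local threshold is scaled down in proportion to the partition size, and the subset enumeration is brute force purely for illustration:

from itertools import combinations

def local_frequent(partition, min_sup):
    # brute force: enumerate every subset of every transaction
    seen = set()
    for t in partition:
        for k in range(1, len(t) + 1):
            seen.update(map(frozenset, combinations(sorted(t), k)))
    return {s for s in seen
            if sum(1 for t in partition if s <= t) >= min_sup}

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_sup, n = 2, 2

# Phase I: split D and mine each partition with a scaled-down threshold
parts = [db[i::n] for i in range(n)]
local_sup = max(1, min_sup * len(parts[0]) // len(db))
candidates = set().union(*(local_frequent(p, local_sup) for p in parts))

# Phase II: one scan of the full database verifies the candidates
frequent = {s for s in candidates
            if sum(1 for t in db if s <= t) >= min_sup}
print(sorted(map(sorted, frequent)))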
Scan once Algorithm (Support count: 3)
Item:          a  b  c  d  e
Transaction 1  1  1  0  1  1
Transaction 2  0  1  1  0  1
Transaction 3  1  1  0  1  1
Transaction 4  1  1  1  0  1
Transaction 5  1  1  1  1  1
Transaction 6  0  1  1  1  0

Table: Boolean relational database D
Scan once Algorithm
Figure: A complete itemset tree for the five items a, b, c, d and e of the
database shown in the table:

Level 0 (C(5,1)): a b c d e
Level 1 (C(5,2)): ab ac ad ae bc bd be cd ce de
Level 2 (C(5,3)): abc abd abe acd ace ade bcd bce bde cde
Level 3 (C(5,4)): abcd abce abde acde bcde
Level 4 (C(5,5)): abcde
[Table: the support count of each candidate itemset, accumulated across transactions T1-T6 of database D in a single scan.]
Mining Frequent Patterns Without
Candidate Generation
Compress a large database into a compact, Frequent-
Pattern tree (FP-tree) structure
highly condensed, but complete for frequent pattern mining
avoid costly database scans
Develop an efficient, FP-tree-based frequent pattern
mining method
A divide-and-conquer methodology: decompose mining tasks
into smaller ones
Avoid candidate generation: sub-database test only!
Construct FP-tree from a Transaction DB
min_support = 0.5 (i.e., support count >= 3 over the 5 transactions)

TID  Items bought               (Ordered) frequent items
100  {f, a, c, d, g, i, m, p}   {f, c, a, m, p}
200  {a, b, c, f, l, m, o}      {f, c, a, b, m}
300  {b, f, h, j, o}            {f, b}
400  {b, c, k, s, p}            {c, b, p}
500  {a, f, c, e, l, p, m, n}   {f, c, a, m, p}

Header table (item : frequency): f:4, c:4, a:3, b:3, m:3, p:3

Resulting FP-tree:

{}
+- f:4
|  +- c:3
|  |  +- a:3
|  |     +- m:2
|  |     |  +- p:2
|  |     +- b:1
|  |        +- m:1
|  +- b:1
+- c:1
   +- b:1
      +- p:1
Steps:
1. Scan DB once, find frequent
1-itemset (single item
pattern)
2. Order frequent items in
frequency descending order
3. Scan DB again, construct
FP-tree
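A minimal Python sketch of these three steps, with illustrative class and field names. Ties in item frequency are broken alphabetically here, so the global order comes out c, f, a, b, m, p instead of the slide's f, c, a, b, m, p; the tree shape differs slightly but the mined patterns do not.

from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_fp_tree(transactions, min_count):
    # Step 1: one scan to find the frequent single items
    freq = Counter(i for t in transactions for i in t)
    freq = {i: n for i, n in freq.items() if n >= min_count}
    # Step 2: order them by descending frequency (ties: alphabetical)
    order = sorted(freq, key=lambda i: (-freq[i], i))
    # Step 3: second scan inserts each transaction's ordered frequent items
    root, header = Node(None, None), {i: [] for i in order}
    for t in transactions:
        node = root
        for item in (i for i in order if i in t):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)   # node-link list for this item
            child.count += 1
            node = child
    return root, header

db = ["f a c d g i m p", "a b c f l m o", "b f h j o",
      "b c k s p", "a f c e l p m n"]
root, header = build_fp_tree([set(t.split()) for t in db], min_count=3)
print({i: sum(n.count for n in nodes) for i, nodes in header.items()})
# {'c': 4, 'f': 4, 'a': 3, 'b': 3, 'm': 3, 'p': 3}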
Benefits of the FP-tree Structure
Completeness:
preserves complete information for frequent pattern mining
Compactness
reduces irrelevant information: infrequent items are gone
frequency-descending ordering: more frequent items are more likely
to be shared
never larger than the original database (not counting node-links
and counts)
Mining Frequent Patterns Using FP-tree
General idea (divide-and-conquer)
Recursively grow frequent pattern path using the FP-tree
Method
For each item, construct its conditional pattern-base, and then its
conditional FP-tree
Repeat the process on each newly created conditional FP-tree
Until the resulting FP-tree is empty, or it contains only one path
(single path will generate all the combinations of its sub-paths, each of
which is a frequent pattern)
Major Steps to Mine FP-tree
1) Construct conditional pattern base for each
node in the FP-tree
2) Construct conditional FP-tree from each
conditional pattern-base
3) Recursively mine conditional FP-trees and
grow frequent patterns obtained so far
Step 1: From FP-tree to Conditional
Pattern Base
Starting at the frequent header table in the FP-tree
Traverse the FP-tree by following the link of each frequent item
Accumulate all of transformed prefix paths of that item to form a
conditional pattern base
Conditional pattern bases
item cond. pattern base
c f:3
a fc:3
b fca:1, f:1, c:1
m fca:2, fcab:1
p fcam:2, cb:1
Step 2: Construct Conditional FP-tree
For each pattern-base
Accumulate the count for each item in the base
Construct the FP-tree for the frequent items of the pattern
base
m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree:

{}
+- f:3
   +- c:3
      +- a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
Mining Frequent Patterns by
Creating Conditional Pattern-Bases
Item  Conditional pattern-base    Conditional FP-tree
f     Empty                       Empty
c     {(f:3)}                     {(f:3)}|c
a     {(fc:3)}                    {(f:3, c:3)}|a
b     {(fca:1), (f:1), (c:1)}     Empty
m     {(fca:2), (fcab:1)}         {(f:3, c:3, a:3)}|m
p     {(fcam:2), (cb:1)}          {(c:3)}|p
Single FP-tree Path Generation
Suppose an FP-tree T has a single path P
The complete set of frequent pattern of T can be
generated by enumeration of all the combinations of the
sub-paths of P
Example: the m-conditional FP-tree is the single path
{} - f:3 - c:3 - a:3,
and enumerating the combinations of its sub-paths yields all frequent
patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam
Principles of Frequent Pattern Growth
Pattern growth property
Let α be a frequent itemset in DB, B be α's conditional pattern
base, and β be an itemset in B. Then α ∪ β is a frequent
itemset in DB iff β is frequent in B.
"abcdef" is a frequent pattern, if and only if
"abcde" is a frequent pattern, and
f is frequent in the set of transactions containing "abcde"
Why Is Frequent Pattern Growth Fast?
Our performance study shows
FP-growth is an order of magnitude faster than Apriori, and is
also faster than tree-projection
Reasoning
No candidate generation, no candidate test
Use compact data structure
Eliminate repeated database scan
Basic operation is counting and FP-tree building
Interestingness Measurements
Objective measures
Two popular measurements:
support; and
confidence

Subjective measures (Silberschatz & Tuzhilin,
KDD95)
A rule (pattern) is interesting if
it is unexpected (surprising to the user); and/or
actionable (the user can do something with it)
Criticism to Support and Confidence
Example 1: (Aggarwal & Yu, PODS98)
Among 5000 students
3000 play basketball
3750 eat cereal
2000 both play basket ball and eat cereal
play basketball ⇒ eat cereal [40%, 66.7%] is misleading, because the overall
percentage of students eating cereal is 75%, which is higher than 66.7%.
play basketball ⇒ not eat cereal [20%, 33.3%] is far more accurate, although
with lower support and confidence

basketball not basketball sum(row)
cereal 2000 1750 3750
not cereal 1000 250 1250
sum(col.) 3000 2000 5000
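Working through the numbers in the table confirms the criticism; the ratio of the rule's confidence to the overall rate of cereal eaters (its lift) is below 1:

both, basketball, cereal, total = 2000, 3000, 3750, 5000

support = both / total                # 0.40
confidence = both / basketball        # 0.667
lift = confidence / (cereal / total)  # 0.667 / 0.75 = 0.89 < 1
print(support, round(confidence, 3), round(lift, 2))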
Criticism to Support and Confidence (Cont.)

Example 2:
X and Y: positively correlated
X and Z: negatively correlated
Yet the support and confidence of X ⇒ Z dominate
We need a measure of dependent or correlated events:

corr(A,B) = P(A∪B) / (P(A) · P(B))

P(B|A) / P(B) is also called the lift of the rule A ⇒ B

X  1 1 1 1 0 0 0 0
Y  1 1 0 0 0 0 0 0
Z  0 1 1 1 1 1 1 1

Rule   Support  Confidence
X⇒Y    25%      50%
X⇒Z    37.50%   75%
Other Interestingness Measures: Interest
Interest (correlation, lift):

P(A∪B) / (P(A) · P(B))

takes both P(A) and P(B) into consideration
P(A∪B) = P(A) · P(B) if A and B are independent events
A and B are negatively correlated if the value is less than 1; otherwise A
and B are positively correlated

X  1 1 1 1 0 0 0 0
Y  1 1 0 0 0 0 0 0
Z  0 1 1 1 1 1 1 1

Itemset  Support  Interest
X,Y      25%      2
X,Z      37.50%   0.9
Y,Z      12.50%   0.57
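The interest column can be reproduced directly from the eight transactions above; note X,Z evaluates to about 0.86, which the slide rounds to 0.9:

X = (1, 1, 1, 1, 0, 0, 0, 0)
Y = (1, 1, 0, 0, 0, 0, 0, 0)
Z = (0, 1, 1, 1, 1, 1, 1, 1)

def interest(a, b):
    n = len(a)
    p_ab = sum(x and y for x, y in zip(a, b)) / n   # P(A U B)
    return p_ab / ((sum(a) / n) * (sum(b) / n))

print(round(interest(X, Y), 2))   # 2.0  -> positively correlated
print(round(interest(X, Z), 2))   # 0.86 -> negatively correlated
print(round(interest(Y, Z), 2))   # 0.57 -> negatively correlated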
Multiple-Level Association Rules
Items often form a hierarchy.
Items at the lower levels are
expected to have lower
support.
Rules regarding itemsets at
appropriate levels could be
quite useful.
Transaction database can be
encoded based on dimensions
and levels
We can explore shared multi-
level mining
[Figure: item hierarchy. All splits into Computer and Printer; Computer into Desktop and Laptop (brands such as IBM and Compaq); Printer into B/W and Color (brands such as HP and Sony).]

TID  Items
T1   {111, 121, 211, 221}
T2   {111, 211, 222, 323}
T3   {112, 122, 221, 411}
T4   {111, 121}
T5   {111, 122, 211, 221, 413}
Mining Multi-Level Associations
A top-down, progressive deepening approach:
First find high-level strong rules:
computer ⇒ printer [20%, 60%]
Then find their lower-level, weaker rules:
desktop ⇒ printer [6%, 50%]
Variations in mining multiple-level association rules:
Level-crossed association rules:
desktop ⇒ Sony color printer
Association rules with multiple, alternative hierarchies:
desktop ⇒ color printer
Uniform Support
Multi-level mining with uniform support:

Level 1 (min_sup = 5%): Computer [support = 10%]
Level 2 (min_sup = 5%): Desktop [support = 6%], Laptop [support = 4%]

With a uniform threshold, Laptop (4% < 5%) is pruned at level 2.
Reduced Support
Multi-level mining with reduced support:

Level 1 (min_sup = 5%): Computer [support = 10%]
Level 2 (min_sup = 3%): Desktop [support = 6%], Laptop [support = 4%]

With the reduced threshold, Laptop (4% >= 3%) now survives at level 2.
Multi-Dimensional Association: Concepts
Single-dimensional rules:
buys(X, milk) ⇒ buys(X, bread)

Multi-dimensional rules:
Inter-dimension association rules (no repeated predicates):
age(X, 19-25) ∧ occupation(X, student) ⇒ buys(X, coke)
Hybrid-dimension association rules (repeated predicates):
age(X, 19-25) ∧ buys(X, popcorn) ⇒ buys(X, coke)
Summary
Association rule mining
probably the most significant contribution from the
database community in KDD
A large number of papers have been published
Many interesting issues have been explored
An interesting research direction
Association analysis in other types of data: spatial
data, multimedia data, time series data, etc.
End of Chapter 5
