ASSOCIATION RULE MINING

Generating Association Rules from Frequent Itemsets


Strong association rules satisfy both minimum
support and minimum confidence levels
$$\text{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\text{support\_count}(A \cup B)}{\text{support\_count}(A)}$$
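A minimal Python sketch of the confidence formula, assuming the transactions are held in memory as Python sets. The transaction list `TRANSACTIONS` and the helper names `support_count` and `confidence` are hypothetical and chosen only for illustration.

```python
# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def confidence(a, b, transactions):
    """confidence(A => B) = support_count(A ∪ B) / support_count(A)."""
    return support_count(set(a) | set(b), transactions) / support_count(a, transactions)


print(confidence({"I1", "I2"}, {"I5"}, TRANSACTIONS))  # 2 / 4 -> 0.5
```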
Association rules
For each frequent itemset l, generate all nonempty proper subsets of l
For every nonempty proper subset s of l, output the rule s ⇒ (l − s)
if support_count(l) / support_count(s) >= min_conf
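The rule-generation loop can be sketched in Python as below, assuming the support counts of l and of all its subsets are already available (by the Apriori property every subset of a frequent itemset is frequent, so a miner will have counted them). `generate_rules` and the `SUP` table are hypothetical; the counts in `SUP` are illustrative values, not data from this document.

```python
from itertools import combinations


def generate_rules(l, sup_count, min_conf):
    """Yield (s, l - s, confidence) for each nonempty proper subset s of the
    frequent itemset l whose confidence reaches min_conf.

    sup_count maps frozensets to support counts and must cover l and all of
    its subsets.
    """
    l = frozenset(l)
    for size in range(1, len(l)):  # proper subsets only, so l - s is nonempty
        for s in map(frozenset, combinations(sorted(l), size)):
            conf = sup_count[l] / sup_count[s]
            if conf >= min_conf:
                yield s, l - s, conf


# Illustrative (hypothetical) support counts for {I1, I2, I5} and its subsets.
SUP = {frozenset(k): v for k, v in [
    (("I1",), 6), (("I2",), 7), (("I5",), 2),
    (("I1", "I2"), 4), (("I1", "I5"), 2), (("I2", "I5"), 2),
    (("I1", "I2", "I5"), 2),
]}

for s, rest, conf in generate_rules({"I1", "I2", "I5"}, SUP, 0.7):
    print(sorted(s), "=>", sorted(rest), f"{conf:.0%}")
# ['I5'] => ['I1', 'I2'] 100%
# ['I1', 'I5'] => ['I2'] 100%
# ['I2', 'I5'] => ['I1'] 100%
```

With these made-up counts, the other three candidate rules fall below 70% (for example I1 ⇒ I2 ∧ I5 has confidence 2/6) and are not output.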

Example
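l = {I1, I2, I5}; confidence threshold: 70%
Nonempty proper subsets: {I1, I2}, {I1, I5}, {I2, I5}, {I1}, {I2}, {I5}
Candidate rules (each is output only if its confidence is at least 70%):
I1 ∧ I2 ⇒ I5
I1 ∧ I5 ⇒ I2
I2 ∧ I5 ⇒ I1
I1 ⇒ I2 ∧ I5
I2 ⇒ I1 ∧ I5
I5 ⇒ I1 ∧ I2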
Improving the Efficiency of Apriori
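Hash-based technique
Transaction reduction
A transaction that does not contain any frequent k-itemset cannot contain any frequent (k+1)-itemset, so it can be skipped in later scans
Partitioning
Sampling
Dynamic itemset counting
New candidate itemsets can be added at different start points during a scan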
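Transaction reduction can be sketched as a filter applied before the next database scan. This is a minimal sketch with hypothetical names; `L3` holds the frequent 3-itemsets of the hypothetical transaction list below at a minimum support count of 2.

```python
from itertools import combinations

# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def reduce_transactions(transactions, frequent_k, k):
    """Keep only transactions that contain at least one frequent k-itemset.

    A transaction with no frequent k-itemset cannot contain a frequent
    (k+1)-itemset, because every k-subset of a frequent (k+1)-itemset is
    itself frequent (Apriori property).
    """
    frequent_k = {frozenset(s) for s in frequent_k}
    return [
        t for t in transactions
        if any(frozenset(c) in frequent_k for c in combinations(sorted(t), k))
    ]


# Frequent 3-itemsets of TRANSACTIONS at a minimum support count of 2.
L3 = [{"I1", "I2", "I3"}, {"I1", "I2", "I5"}]
remaining = reduce_transactions(TRANSACTIONS, L3, 3)
print(len(remaining))  # 3: only these can contain a frequent 4-itemset
```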

Hash-Based Technique

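The hash-based idea, as commonly described for Apriori, is: while scanning to count 1-itemsets, also hash every 2-itemset of each transaction into a bucket and increment that bucket's count; a 2-itemset whose bucket count is below min_sup cannot be frequent and is dropped from the candidate set C2. The sketch below assumes a hypothetical hash function `hash_pair` on the items' positions and a hypothetical table of 13 buckets; neither comes from the source.

```python
from collections import Counter
from itertools import combinations

# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def hash_pair(x, y, order, n_buckets):
    """Hypothetical bucket function on the items' 1-based positions."""
    return (10 * order[x] + order[y]) % n_buckets


def candidate_pairs(transactions, min_sup, n_buckets):
    items = sorted(set().union(*transactions))
    order = {item: pos for pos, item in enumerate(items, start=1)}
    item_counts, bucket_counts = Counter(), Counter()
    # One scan: count single items and hash every 2-itemset into a bucket.
    for t in transactions:
        item_counts.update(t)
        for x, y in combinations(sorted(t), 2):
            bucket_counts[hash_pair(x, y, order, n_buckets)] += 1
    l1 = [i for i in items if item_counts[i] >= min_sup]
    # A bucket count is an upper bound on the support of every pair in it,
    # so a pair whose bucket count is below min_sup cannot be frequent.
    return [(x, y) for x, y in combinations(l1, 2)
            if bucket_counts[hash_pair(x, y, order, n_buckets)] >= min_sup]


print(candidate_pairs(TRANSACTIONS, min_sup=2, n_buckets=13))
```

Under these assumptions, 4 of the 10 pairs of frequent items are pruned before they are ever counted. A smaller table causes more collisions and prunes less, but never removes a frequent pair.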
Partition: Scan Database Only Twice
Any itemset that is potentially frequent in DB must
be frequent in at least one of the partitions of DB
Scan 1: partition database and find local frequent
patterns
Scan 2: count the union of the local frequent patterns over the whole database to find the global frequent patterns
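A minimal sketch of the two-scan partition idea, assuming a relative (fractional) minimum support so that the same threshold applies to every partition. The local miner is a brute-force stand-in for Apriori that only enumerates itemsets of up to `max_size` items; all names are hypothetical.

```python
from itertools import combinations

# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def frequent_itemsets(transactions, min_sup, max_size=3):
    """Brute-force miner (stand-in for Apriori); min_sup is a fraction."""
    counts = {}
    for t in transactions:
        for size in range(1, min(len(t), max_size) + 1):
            for c in combinations(sorted(t), size):
                counts[c] = counts.get(c, 0) + 1
    n = len(transactions)
    return {c for c, k in counts.items() if k / n >= min_sup}


def partition_mine(transactions, n_parts, min_sup, max_size=3):
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Scan 1: the union of locally frequent itemsets is the global candidate set.
    candidates = set()
    for part in parts:
        candidates |= frequent_itemsets(part, min_sup, max_size)
    # Scan 2: count each candidate once over the whole database.
    n = len(transactions)
    return {c for c in candidates
            if sum(1 for t in transactions if set(c) <= t) / n >= min_sup}


result = partition_mine(TRANSACTIONS, n_parts=3, min_sup=2 / 9)
print(len(result))  # 13
```

Because any globally frequent itemset must be locally frequent in at least one partition, the second scan only has to confirm candidates; it can never miss one.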
Sampling for Frequent Patterns
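Select a sample of the original database and mine frequent patterns within the sample using Apriori
Use a support threshold lower than min_sup on the sample, to reduce the chance of missing globally frequent patterns
Scan the database once to verify the frequent itemsets found in the sample
Scan the database again to find frequent patterns that were missed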
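A rough sketch of sampling followed by verification, assuming relative supports and a seeded random sample. To keep it short, both full-database passes are simulated by one brute-force count (`count_itemsets`), which a real implementation would avoid for the "missed patterns" pass; all names and parameter values are hypothetical.

```python
import random
from itertools import combinations

# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def count_itemsets(transactions, max_size=3):
    """Brute-force support counts of all itemsets with up to max_size items."""
    counts = {}
    for t in transactions:
        for size in range(1, min(len(t), max_size) + 1):
            for c in combinations(sorted(t), size):
                counts[c] = counts.get(c, 0) + 1
    return counts


def sample_mine(transactions, min_sup, sample_size, lowered_sup, seed=0):
    """min_sup and lowered_sup are relative supports (fractions)."""
    sample = random.Random(seed).sample(transactions, sample_size)
    # Mine the sample with a lowered threshold to miss fewer global patterns.
    local = {c for c, k in count_itemsets(sample).items()
             if k / sample_size >= lowered_sup}
    n = len(transactions)
    full = count_itemsets(transactions)  # stand-in for the full-database scans
    # Scan 1: keep the sample's patterns that are frequent in the whole database.
    verified = {c for c in local if full[c] / n >= min_sup}
    # Scan 2: globally frequent patterns that the sample missed.
    missed = {c for c, k in full.items() if k / n >= min_sup} - verified
    return verified, missed


verified, missed = sample_mine(TRANSACTIONS, 2 / 9, sample_size=5, lowered_sup=0.2)
print(len(verified), len(missed))
```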

Bottleneck of Frequent-pattern Mining


Multiple database scans are costly
Mining long patterns needs many passes of scanning
and generates lots of candidates
To find the frequent itemset {i1, i2, …, i100}:
# of scans: 100
# of candidates: $2^{100} - 1 \approx 1.27 \times 10^{30}$
Bottleneck: candidate-generation-and-test
Avoid candidate generation
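The candidate count in the example above can be checked with Python's standard `math.comb` (Python 3.8+). The sum runs over every nonempty subset of the 100 items, all of which Apriori would have to generate and test.

```python
from math import comb

# C(100,1) + C(100,2) + ... + C(100,100) candidate itemsets.
n_candidates = sum(comb(100, k) for k in range(1, 101))
assert n_candidates == 2**100 - 1
print(f"{n_candidates:.2e}")  # 1.27e+30
```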

Mining Frequent Patterns Without Candidate Generation
FP-Growth
Divide-and-conquer technique
FP-Tree
Grow long patterns from short ones using local
frequent items
FP-tree from a Transaction Database - Example
FP-Growth
For each frequent length-1 pattern (suffix pattern):
Construct its conditional pattern base (a sub-database consisting of the set of prefix paths that co-occur with the suffix)
Construct its conditional FP-tree and mine it recursively
Generate frequent patterns by concatenating the suffix with the patterns mined from the conditional FP-tree
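Because an FP-tree stores each transaction's frequent items in descending-support order, the prefix paths leading to a suffix item are exactly the ordered prefixes of the transactions that contain it. The sketch below uses that equivalence to compute a conditional pattern base directly from an in-memory transaction list, without building the tree. The names are hypothetical, and breaking support ties by item name is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def conditional_pattern_base(transactions, suffix, min_sup):
    """Map each prefix path (a tuple in descending-support order) that
    co-occurs with `suffix` to its count."""
    counts = Counter(item for t in transactions for item in t)
    order = sorted((i for i in counts if counts[i] >= min_sup),
                   key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(order)}
    base = Counter()
    if suffix not in rank:  # an infrequent item has no pattern base
        return base
    for t in transactions:
        if suffix in t:
            path = sorted((i for i in t if i in rank), key=rank.__getitem__)
            prefix = tuple(path[:path.index(suffix)])
            if prefix:
                base[prefix] += 1
    return base


print(conditional_pattern_base(TRANSACTIONS, "I5", min_sup=2))
```

For the suffix I5 this gives the two prefix paths (I2, I1) and (I2, I1, I3), each with count 1.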
FP-Growth Algorithm
Input:
A transaction database D; minimum support threshold min_sup
Output:
The complete set of frequent patterns
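Construction of FP-Tree
Scan the database once, collect the set of frequent items F with their support counts, and sort F in descending order of support
Create the root of the FP-tree, labeled "null"
For each transaction Trans, select its frequent items and sort them in the order of F; call the sorted list [p|P], where p is the first element and P is the remaining list, and call insert_tree([p|P], T) with T the root
insert_tree([p|P], T):
If T has a child N with N.item = p, increment N's count by 1
Otherwise create a new node N with count 1, set its parent link to T, and link it to the other nodes carrying the same item via the node-link structure
If P is non-empty, call insert_tree(P, N) recursively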
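The construction steps above can be sketched in Python as follows, keeping `insert_tree` recursive to mirror the pseudocode. Assumptions: node-links are kept as a per-item list in a `header` dict instead of a linked chain, support ties are broken by item name, and all names and data are hypothetical.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: item, count, parent link and children keyed by item."""

    def __init__(self, item, parent, count):
        self.item = item
        self.parent = parent
        self.count = count
        self.children = {}


def insert_tree(items, node, header):
    """insert_tree([p|P], T): `items` is the sorted list [p|P], `node` is T."""
    p, rest = items[0], items[1:]
    child = node.children.get(p)
    if child is not None:  # T has a child N with N.item == p
        child.count += 1
    else:  # new node with count 1; its parent link is T
        child = FPNode(p, node, 1)
        node.children[p] = child
        header[p].append(child)  # node-link, kept here as a per-item list
    if rest:  # P is non-empty
        insert_tree(rest, child, header)


def build_fp_tree(transactions, min_sup):
    # Scan 1: frequent items F, sorted by descending support (ties by name).
    counts = Counter(item for t in transactions for item in t)
    f_list = sorted((i for i in counts if counts[i] >= min_sup),
                    key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    root = FPNode(None, None, 0)  # root labeled "null"
    header = {item: [] for item in f_list}
    # Scan 2: insert each transaction's frequent items in F order.
    for t in transactions:
        items = sorted((i for i in t if i in rank), key=rank.__getitem__)
        if items:
            insert_tree(items, root, header)
    return root, header, f_list


# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
root, header, f_list = build_fp_tree(TRANSACTIONS, min_sup=2)
print(f_list)  # ['I2', 'I1', 'I3', 'I4', 'I5']
print({item: n.count for item, n in root.children.items()})  # {'I2': 7, 'I1': 2}
```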

Algorithm
Procedure FP_growth(Tree, α)
If Tree contains a single path P then
for each combination β of the nodes in P, generate pattern β ∪ α with
support = minimum support count of the nodes in β
else for each item a_i in the header table of Tree
{
generate pattern β = a_i ∪ α with support = a_i.support_count
construct β's conditional pattern base and β's
conditional FP-tree Tree_β
if Tree_β ≠ ∅ then call FP_growth(Tree_β, β)
}
Initial call: FP_growth(FP-tree, null)
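A compact Python sketch of FP_growth, assuming the same kind of in-memory transaction list. Each conditional pattern base is passed on as weighted prefix paths and turned into a new FP-tree. The single-path shortcut is omitted (the general branch still finds the same patterns, only less efficiently), node-links are plain lists in a header dict, and all names are hypothetical.

```python
from collections import Counter

# Hypothetical in-memory transaction list, for illustration only.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent, self.count, self.children = item, parent, 0, {}


def build_tree(weighted, min_sup):
    """Build an FP-tree from (items, count) pairs; return (header, f_list)."""
    counts = Counter()
    for items, c in weighted:
        for item in items:
            counts[item] += c
    f_list = sorted((i for i in counts if counts[i] >= min_sup),
                    key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(f_list)}
    root, header = FPNode(None, None), {item: [] for item in f_list}
    for items, c in weighted:
        node = root
        for p in sorted((i for i in items if i in rank), key=rank.__getitem__):
            if p not in node.children:
                node.children[p] = FPNode(p, node)
                header[p].append(node.children[p])
            node = node.children[p]
            node.count += c
    return header, f_list


def fp_growth(weighted, min_sup, alpha=()):
    """Yield (pattern, support_count) for every frequent pattern extending alpha."""
    header, f_list = build_tree(weighted, min_sup)
    for a_i in reversed(f_list):  # header items, least frequent first
        beta = (a_i,) + alpha  # beta = a_i ∪ alpha
        yield frozenset(beta), sum(n.count for n in header[a_i])
        # beta's conditional pattern base: the prefix path above each a_i node.
        base = []
        for n in header[a_i]:
            path, p = [], n.parent
            while p.item is not None:  # stop at the null root
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, n.count))
        if base:  # build and mine Tree_beta (an empty tree yields nothing)
            yield from fp_growth(base, min_sup, beta)


patterns = list(fp_growth([(t, 1) for t in TRANSACTIONS], min_sup=2))
print(len(patterns))  # 13
```

The top-level call with an empty `alpha` plays the role of FP_growth(FP-tree, null).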
Features
Finds long frequent patterns by looking for shorter
ones recursively
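Items are stored in descending frequency order: the more frequently an item occurs, the more likely its node is shared by many transactions
Main-memory-based FP-tree
Efficient and scalable
Typically faster than Apriori: no candidate generation, and the database is scanned only twice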
