FP-Tree FP-Growth

Mining Frequent Patterns without Candidate Generation
a paper by Jiawei Han, Jian Pei and Yiwen Yin
School of Computing Science
Simon Fraser University
Presented by Maria Cutumisu
Department of Computing Science
University of Alberta

This paper proposes:
A novel frequent pattern tree structure: FP-tree
An efficient FP-tree-based mining method: FP-growth
This approach is very efficient due to:
Compression of a large database into a smaller data structure
Pattern fragment growth mining method
Partitioning-based divide-and-conquer search method

FP-tree: Design and Construction
To ensure that the tree structure is compact, only frequent length-1 items will have nodes in the tree
More frequently occurring items will have better chances of sharing nodes than the others
Example: a transaction database
Transaction ID   Items Bought              (Ordered) Frequent Items
100              f, a, c, d, i, m, p       f, c, a, m, p
200              a, b, c, f, l, m, o       f, c, a, b, m
300              b, f, h, j, o             f, b
400              b, c, k, s, p             c, b, p
500              a, f, c, e, l, p, m, n    f, c, a, m, p

The corresponding FP-tree
Transactions sharing an identical itemset can be merged into one with the number of occurrences registered as count.
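
The ordered column of the example table can be reproduced with a short Python sketch. It assumes a minimum support count of 3, which is not stated on the slide but is consistent with the ordered items shown. The names DB, MIN_SUPPORT and TIE_ORDER are illustrative only. Items with equal support can be ordered either way, so TIE_ORDER is an assumption chosen purely to match the order used in the table.

```python
from collections import Counter

# Example database from the slide: transaction ID -> items bought.
DB = {
    100: ["f", "a", "c", "d", "i", "m", "p"],
    200: ["a", "b", "c", "f", "l", "m", "o"],
    300: ["b", "f", "h", "j", "o"],
    400: ["b", "c", "k", "s", "p"],
    500: ["a", "f", "c", "e", "l", "p", "m", "n"],
}
MIN_SUPPORT = 3        # assumed absolute support threshold
TIE_ORDER = "fcabmp"   # assumed tie-break, chosen to match the table

# Scan 1: count in how many transactions each item occurs.
support = Counter(item for items in DB.values() for item in set(items))

# Keep the frequent items and sort them by descending support (the list L).
frequent = [i for i in support if support[i] >= MIN_SUPPORT]
L = sorted(frequent, key=lambda i: (-support[i], TIE_ORDER.find(i)))

# Rewrite each transaction: drop infrequent items, order the rest by L.
rank = {item: pos for pos, item in enumerate(L)}
ordered = {
    tid: sorted((i for i in items if i in rank), key=rank.get)
    for tid, items in DB.items()
}

for tid, items in ordered.items():
    print(tid, ", ".join(items))
```

With these assumptions the supports are f:4, c:4, a:3, b:3, m:3, p:3, and every other item falls below the threshold.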
An FP-tree is a tree structure which consists of:
One root labeled as "null"
A set of item prefix sub-trees with each node formed by three fields: item-name, count, node-link
A frequent-item header table with two fields for each entry: item-name, head of node-link

FP-tree construction algorithm
Input: a transaction database DB and a minimum support threshold ε
Output: Its frequent pattern tree, FP-tree
Method: The FP-tree is constructed in the following steps:
1. Scan DB once:
    Collect the set of frequent items F and their supports
    Sort F in support descending order as L, the list of frequent items
2. Create a root of an FP-tree, T, and label it as "null"
    For each transaction Trans in DB do the following:
    select and sort the frequent items in Trans according to the order of L
    let the sorted frequent item list in Trans be [p|P], where p is the first element and P is the remaining list. Call insert_tree([p|P], T)
Note: insert_tree([p|P], T) is performed as follows:
IF T has a child N such that N.item_name = p.item_name, then increment N's count by 1
ELSE create a new node N, and let its count be 1, its parent link be linked to T, and its node-link be linked to the nodes with the same item_name via the node-link structure
IF P is nonempty, call insert_tree(P, N) recursively

Analysis
Two scans of the DB are necessary: the first collects the set of frequent items and the second constructs the FP-tree.
The cost of inserting a transaction Trans into the FP-tree is O(|Trans|), where |Trans| is the number of frequent items in Trans.
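
The construction steps and insert_tree can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the class name Node and the helpers build_fp_tree and show are hypothetical, insert_tree is written as a loop rather than recursively, and a new node is prepended to its item's node-link chain (the slides do not fix the position in the chain). The input is assumed to be the already ordered frequent-item lists of the example.

```python
class Node:
    """FP-tree node: item-name, count, node-link, plus parent/child links."""
    def __init__(self, item, parent=None):
        self.item = item        # item-name (None stands for the "null" root)
        self.count = 0          # transactions sharing the prefix ending here
        self.node_link = None   # next node carrying the same item-name
        self.parent = parent
        self.children = {}      # item-name -> child Node


def insert_tree(items, node, header):
    """Insert one ordered frequent-item list below `node` (iteratively)."""
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = Node(item, parent=node)
            node.children[item] = child
            # Prepend the new node to the node-link chain of this item.
            child.node_link = header.get(item)
            header[item] = child
        child.count += 1
        node = child


def build_fp_tree(ordered_transactions):
    """Second scan: insert every ordered transaction under a null root."""
    root = Node(None)
    header = {}   # header table: item-name -> head of node-link chain
    for items in ordered_transactions:
        insert_tree(items, root, header)
    return root, header


# Ordered frequent items of the five example transactions.
ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, header = build_fp_tree(ordered)


def show(node, depth=0):
    """Print the tree as indented item:count lines."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)


show(root)
```

For the example, the root gets two children, f with count 4 and c with count 1, and the three occurrences of p are spread over two nodes that are reachable through the header table entry for p.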
FP-tree contains the complete information for frequent pattern mining.
The size of the FP-tree is bounded by the size of the database, but due to frequent items sharing, the size of the tree is usually much smaller than its original database.
High compaction is achieved by placing more frequent items closer to the root (being thus more likely to be shared).

FP-growth: the FP-tree-based mining method
Starts from a frequent length-1 pattern
Examines only its conditional pattern base
Constructs its (conditional) FP-tree
Performs mining recursively on the tree
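
As a small illustration of what a conditional pattern base looks like, the sketch below reads it off the ordered example transactions directly: for a given item it collects the prefixes that precede the item, with their counts. This is a shortcut for illustration only; FP-growth gathers the same prefixes by following node-links in the FP-tree. The function name conditional_pattern_base and the minimum support of 3 are assumptions.

```python
from collections import Counter

# Ordered frequent items of the five example transactions.
ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
MIN_SUPPORT = 3   # assumed absolute support threshold


def conditional_pattern_base(item, transactions):
    """Map each non-empty prefix that precedes `item` to its count."""
    base = Counter()
    for items in transactions:
        if item in items:
            prefix = tuple(items[:items.index(item)])
            if prefix:
                base[prefix] += 1
    return base


base_p = conditional_pattern_base("p", ordered)

# Items that remain frequent inside the base make up p's conditional FP-tree.
counts = Counter()
for prefix, n in base_p.items():
    for i in prefix:
        counts[i] += n
frequent_with_p = {i: n for i, n in counts.items() if n >= MIN_SUPPORT}

print(dict(base_p))
print(frequent_with_p)
```

For p, the base holds the prefix f, c, a, m twice and the prefix c, b once. Only c reaches the threshold, so the only frequent pattern that extends p is {c, p} with support 3.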
FP-growth algorithm
Input: FP-tree constructed using DB and a minimum support threshold ε
Output: The complete set of frequent patterns
Method: Call FP-growth(FP-tree, null)

Procedure FP-growth(Tree, α)
IF Tree contains a single path P
THEN for each combination β of the nodes in the path P DO
    generate pattern β ∪ α with support = minimum support of nodes in β
ELSE for each a_i in the header of Tree DO
    generate pattern β = a_i ∪ α with support = a_i.support;
    construct β's conditional pattern base and FP-tree Tree_β
    IF Tree_β is not empty THEN Call FP-growth(Tree_β, β)
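
The procedure can be sketched end to end in Python. This is a hedged illustration under several assumptions rather than the paper's implementation: all names (Node, build_tree, single_path, fp_growth, mine) are hypothetical, the header table keeps a plain list of nodes per item instead of node-link pointers, ties in support are broken alphabetically, and the minimum support is an absolute count.

```python
from collections import Counter, defaultdict
from itertools import combinations


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(weighted_paths, min_support):
    """Build an FP-tree from (items, count) pairs; return (root, header)."""
    support = Counter()
    for items, n in weighted_paths:
        for item in set(items):
            support[item] += n
    # Sort key per frequent item: descending support, then alphabetical.
    rank = {i: (-s, i) for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted_paths:
        node = root
        for item in sorted({i for i in items if i in rank}, key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = Node(item, node)
                header[item].append(child)   # list stands in for node-links
            child.count += n
            node = child
    return root, header


def single_path(root):
    """Return the list of nodes if the tree is one path, else None."""
    path, node = [], root
    while node.children:
        if len(node.children) > 1:
            return None
        node = next(iter(node.children.values()))
        path.append(node)
    return path


def fp_growth(root, header, alpha, min_support, out):
    path = single_path(root)
    if path is not None:
        # Each non-empty combination beta of path nodes yields beta ∪ alpha.
        for k in range(1, len(path) + 1):
            for beta in combinations(path, k):
                items = frozenset(n.item for n in beta) | alpha
                out[items] = min(n.count for n in beta)
        return
    for item, nodes in header.items():
        beta = alpha | {item}
        out[beta] = sum(n.count for n in nodes)
        # Conditional pattern base: prefix path of each node with its count.
        base = []
        for n in nodes:
            prefix, p = [], n.parent
            while p.item is not None:
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, n.count))
        sub_root, sub_header = build_tree(base, min_support)
        if sub_root.children:   # conditional FP-tree is not empty
            fp_growth(sub_root, sub_header, beta, min_support, out)


def mine(transactions, min_support):
    """Return a dict mapping each frequent itemset to its support."""
    root, header = build_tree([(t, 1) for t in transactions], min_support)
    out = {}
    fp_growth(root, header, frozenset(), min_support, out)
    return out


DB = [
    ["f", "a", "c", "d", "i", "m", "p"],
    ["a", "b", "c", "f", "l", "m", "o"],
    ["b", "f", "h", "j", "o"],
    ["b", "c", "k", "s", "p"],
    ["a", "f", "c", "e", "l", "p", "m", "n"],
]
patterns = mine(DB, 3)
print(len(patterns))
```

On the five example transactions with an assumed minimum support of 3, this sketch reports 18 frequent patterns, among them {c, p} and {f, c, a, m}, each with support 3. Because ties are broken alphabetically, the tree places c above f, which differs from the item order on the slides but does not change the set of patterns found.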
Analysis of the FP-growth algorithm
Finds the complete set of frequent itemsets
Efficient because:
    it works on a reduced set of pattern bases
    it performs mining operations less costly than generation and test:
        prefix count adjustment
        counting
        pattern fragment concatenation

Search technique: partitioning-based divide-and-conquer
Used instead of the Apriori-like bottom-up generation of frequent itemsets combinations
Reduces the size of the conditional pattern base generated at the subsequent level of search and of its corresponding FP-tree
Transforms the problem of finding long frequent patterns to looking for shorter ones and then concatenating the suffix.
Employs the least frequent items as suffix, which offers a good selectivity.

Performance comparison with other algorithms
TreeProjection is the supporting algorithm of another novel tree structure: lexicographic tree
Comparative analysis of the FP-growth with Apriori and TreeProjection algorithms shows that FP-growth outperforms both of them
Improvements: how to design a disk-resident FP-tree
Cluster FP-tree nodes by path and by item prefix sub-tree
B+-tree for FP-tree not fitting into main memory
Group access mode mining to reduce the I/O cost
Release space of the conditional pattern base or conditional FP-tree after usage
Remove the node-links of the FP-tree

Performance improvements
Materialization of an FP-tree
Incremental updates of an FP-tree
FP-tree mining with item constraints
FP-tree mining of other frequent patterns
Advantages of the FP-growth mining method:
Efficient and scalable for both long and short frequent patterns; the running memory requirements of FP-growth increase linearly when the support threshold goes down
An order of magnitude faster than the Apriori algorithm
Faster than recently reported new frequent pattern mining methods

Drawbacks:
The tree does not achieve maximal compactness all the time.
For the databases with mostly short transactions, the reduction ratio of the tree with respect to the database is not very high.
The FP-tree does not always fit into the main memory.
Conclusions
FP-growth method has satisfactory performance when tested in large industrial databases
It is open to a lot of research issues
Due to compression, sometimes large databases (order of gigabytes) containing many long patterns may generate FP-trees which fit in main memory
