Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
{}
Header Table
Conditional pattern bases
Item frequency head f:4 c:1 item cond. pattern base
f 4 c f:3
c 4 c:3 b:1 b:1
a 3 a fc:3
b 3 a:3 p:1 b fca:1, f:1, c:1
m 3
p 3 m fca:2, fcab:1
m:2 b:1
p fcam:2, cb:1
p:2 m:1
m-conditional pattern base:
fca:2, fcab:1
{}
Header Table
f:4 c:1 {} All frequent patterns
Item frequency head relate to m
f 4 m,
c:3 b:1 b:1 f:3
c 4
fm, cm, am,
a 3 c:3
b 3 a:3 p:1 fcm, fam, cam,
m 3 a:3 fcam
p 3 m:2 b:1
m-conditional FP-tree
p:2 m:1
GENERALIZED SEQUENTIAL PATTERN MINING
ALGORITHM
1. Initially, every item in DB is a candidate of
length-1.
2. For each level (i.e., sequences of length-k) do
2.1 Scan database to collect support count for each
candidate sequence.
2.2 Generate candidate length-(k+1) sequences from
length-k frequent sequences using Apriori.
3. Repeat until no frequent sequence or no
candidate can be found.
Cand Sup
<a> 3
Seq. ID Sequence
10 <(bd)cb(ac)> <b> 5
20 <(bf)(ce)b(fg)> <c> 4
30 <(ah)(bf)abf> <d> 3
40 <(be)(ce)d> <e> 3
50 <a(bd)bcb(ade)>
<f> 2
Minimum support =2 <g> 1
<h> 1
Length-1 Candidates
<a> <b> <c> <d> <e> <f>
<a> <aa> <ab> <ac> <ad> <ae> <af>
<b> <ba> <bb> <bc> <bd> <be> <bf>
<c> <ca> <cb> <cc> <cd> <ce> <cf>
<d> <da> <db> <dc> <dd> <de> <df>
<e> <ea> <eb> <ec> <ed> <ee> <ef>
<f> <fa> <fb> <fc> <fd> <fe> <ff>
<a> <b> <c> <d> <e> <f>
<a> <(ab)> <(ac)> <(ad)> <(ae)> <(af)>
<b> <(bc)> <(bd)> <(be)> <(bf)>
<c> <(cd)> <(ce)> <(cf)>
<d> <(de)> <(df)>
Length-2 Candidates
<e> <(ef)>
<f>
5th scan: 1 cand. <(bd)cba> Cand. cannot pass
1 length-5 seq. pat. sup. threshold
2nd scan: 51 cand. <aa> <ab> … <af> <ba> <bb> … <ff> <(ab)> … <(ef)>
19 length-2 seq. pat.
1st scan: 8 cand. <a> <b> <c> <d> <e> <f> <g> <h>
6 length-1 seq. pat.
Seq. ID Sequence
min_sup =2 10 <(bd)cb(ac)>
20 <(bf)(ce)b(fg)>
30 <(ah)(bf)abf>
40 <(be)(ce)d>
50 <a(bd)bcb(ade)>
Security(credit card fraud)
Global climate modeling
Business
Disaster Management
[1] Florian Verhein, Frequent Pattern Growth (FP-Growth)
Algorithm, 2008.