
Fast discovery of sequential patterns in large databases using effective time-indexing

Information Sciences (2008) 4228-4245
Ming-Yen Lin, Suh-Yin Lee and Sheng-Shun Wang
National Chiao Tung University, Taiwan

Advisor: Prof. Huang, Jen-Peng
Student: Tu, Jing-Guo

Outline

Introduction
Related work
Definition

An example

Performance analysis and experimental evaluation
Conclusions

Introduction

When the time constraints between the elements of a sequential pattern are not specified, some uninteresting patterns may appear.
For example, without specifying the maximum time gap, one may find a pattern < (b, d, e) (a, f) >, which means that an itemset containing a and f occurs after an itemset containing b, d, and e.
However, the pattern could be insignificant if the time interval between the two itemsets is too long, such as several months.

[Figure: timeline of purchases (PC, printer, then ink and paper), with the time gap marked "?"]


Related work

Sequential pattern mining

GSP (Apriori)
DELISP

Definition
Definition .1 (frequent item)
An item x is called a frequent item in a sequence database DB if the support of the 1-sequence <(x)> is greater than or equal to minsup.
Definition .2 (type-1, type-2, prefix, stem)

Pattern        Type     Prefix    Stem
< (a) (b) >    Type-1   < (a) >   b
< (a, b) >     Type-2   < (a) >   b

Definition
Definition .3 ( it , lst , let )

TIdx = [ it : lst : let ] = [ initial-time : last start-time : last end-time ]

Time indexes of the item a:

Transaction   sequence                            TIdx
T1            < 1(a) 2(b) 9(d) 15(c) >            [ 1:1:1 ]
T2            < 1(a) 2(b) 9(d) 15(c) 21(a) >      [ 1:1:1 , 21:21:21 ]

Time indexes of longer patterns:

sequence                            pattern     TIdx
< 1(a) 2(b) 9(d) 15(c) >            (a) (b)     [ 1:2:2 ]
< 1(a) 2(b) 9(d) 25(c) 28(a) >      (a) (c)     [ 1:25:25 ]
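
The time index of a single item can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the representation of a data sequence as a list of (time, itemset) pairs and the function name item_time_indexes are hypothetical and do not come from the paper. For a single item, the initial time, last start-time and last end-time all coincide with the transaction time.

```python
def item_time_indexes(seq, item):
    """Return one (it, lst, let) triple per transaction of `seq` containing `item`.

    `seq` is a list of (time, itemset) pairs; this layout is an assumption
    of the sketch, not the data structure used in the paper.
    """
    return [(t, t, t) for t, items in seq if item in items]


# The two transactions T1 and T2 from the slide above.
T1 = [(1, {"a"}), (2, {"b"}), (9, {"d"}), (15, {"c"})]
T2 = T1 + [(21, {"a"})]

print(item_time_indexes(T1, "a"))  # [(1, 1, 1)]
print(item_time_indexes(T2, "a"))  # [(1, 1, 1), (21, 21, 21)]
```

The output reproduces the TIdx entries [ 1:1:1 ] and [ 1:1:1 , 21:21:21 ] listed for the item a.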

Definition

Time-constraints
swin = sliding time-window
mingap = minimum time gap
maxgap = maximum time gap
duration = constraint time window
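
To make the gap and duration constraints concrete, here is a minimal Python sketch that checks them on the timestamps of consecutive pattern elements. It rests on simplifying assumptions that are not taken from the paper: every element is matched by a single transaction (so swin plays no role), all bounds are treated as inclusive, and the function name satisfies_gaps is hypothetical. The numeric values are the ones used in the worked example of these slides.

```python
def satisfies_gaps(times, mingap, maxgap, duration):
    """Check mingap, maxgap and duration on a non-empty list of element timestamps.

    Assumption of this sketch: one transaction per element (no sliding
    window) and inclusive bounds.
    """
    gaps_ok = all(
        mingap <= later - earlier <= maxgap
        for earlier, later in zip(times, times[1:])
    )
    # The whole occurrence must also fit inside the duration window.
    return gaps_ok and (times[-1] - times[0] <= duration)


print(satisfies_gaps([6, 10], mingap=3, maxgap=15, duration=25))   # True
print(satisfies_gaps([5, 45], mingap=3, maxgap=15, duration=25))   # False, gap 40 exceeds maxgap
print(satisfies_gaps([17, 18], mingap=3, maxgap=15, duration=25))  # False, gap 1 is below mingap
```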

Definition
Lemma .1 ( type1 )
let_i + mingap ≤ VTP ≤ lst_i + maxgap

VTP = valid time periods

Ex: < (b) (e) >

Transaction   sequence                                  TIdx
C2            < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >      [ 10:17:17 ]

duration = 25: it + duration = 10 + 25 = 35
maxgap = 15: lst + maxgap = 17 + 15 = 32
mingap = 3: let + mingap = 17 + 3 = 20

20 ≤ VTP ≤ 32, so only the transaction 24(c,d) falls in the VTP.
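
As a rough illustration of Lemma 1, the following Python sketch computes the type-1 VTP from a time index and lists the transactions that fall inside it. The names vtp_type1 and transactions_in, and the (time, itemset) list layout, are assumptions made for this illustration; the bounds are treated as inclusive.

```python
def vtp_type1(tidx, mingap, maxgap):
    """Valid time period for type-1 extensions: [let + mingap, lst + maxgap]."""
    it, lst, let = tidx
    return (let + mingap, lst + maxgap)


def transactions_in(seq, vtp):
    """Transactions of `seq` whose time lies inside the (inclusive) VTP."""
    lo, hi = vtp
    return [(t, items) for t, items in seq if lo <= t <= hi]


# Sequence C2 and the time index [10:17:17] of <(b)(e)> from the slide.
C2 = [(6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"})]
vtp = vtp_type1((10, 17, 17), mingap=3, maxgap=15)

print(vtp)                                       # (20, 32)
print([t for t, _ in transactions_in(C2, vtp)])  # [24]
```

Only the transaction at time 24 needs to be inspected for stems, which is how the time index avoids rescanning the whole sequence.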

Definition
Lemma .2 ( type2 )
let_i - swin ≤ VTP ≤ minimum of { lst_i + swin , it_i + duration }

Ex: < (b) (e) >

Transaction   sequence                                  TIdx
C2            < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >      [ 10:17:17 ]

[Figure: timeline of C2 with the markers 10, 17, 24 and 35]
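
Lemma 2 can be sketched in the same way. The function name vtp_type2 is hypothetical and the bounds are again assumed to be inclusive. The time indexes below are those of the item a in C1 and C2 of the worked example, with swin = 2 and duration = 25 as used there.

```python
def vtp_type2(tidx, swin, duration):
    """Valid time period for type-2 extensions:
    [let - swin, min(lst + swin, it + duration)]."""
    it, lst, let = tidx
    return (let - swin, min(lst + swin, it + duration))


print(vtp_type2((5, 5, 5), swin=2, duration=25))  # (3, 7)
print(vtp_type2((6, 6, 6), swin=2, duration=25))  # (4, 8)
```

Both results agree with the "3 VTP 7" and "4 VTP 8" periods that appear in the example.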

An example

min_Sup = 2

Tran. ID   sequences
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

Time-constraints
swin = 2
mingap = 3
maxgap = 15
duration = 25

<( a )> -TIdx

Tran. ID   TIdx
C1         [ 5:5:5 , 31:31:31 ]
C2         [ 6:6:6 , 18:18:18 ]
C4         [ 5:5:5 ]
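
The first mining step, finding the frequent items of Definition .1, can be sketched for this database as follows. The dictionary layout of DB and the function names item_supports and frequent_items are assumptions of this sketch rather than anything prescribed by the paper.

```python
from collections import Counter

# The example database: each sequence is a list of (time, itemset) pairs.
DB = {
    "C1": [(3, {"c"}), (5, {"a", "f"}), (18, {"b"}), (31, {"a"}), (45, {"f"})],
    "C2": [(6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"})],
    "C3": [(1, {"b"}), (20, {"b", "g"}), (27, {"e"}), (34, {"d", "g"}), (35, {"g"})],
    "C4": [(5, {"a"}), (10, {"d"}), (21, {"c", "d"}), (26, {"e"})],
}


def item_supports(db):
    """Count, for every item, the number of data sequences that contain it."""
    support = Counter()
    for seq in db.values():
        # Each item counts at most once per sequence.
        seen = set().union(*(items for _, items in seq))
        support.update(seen)
    return support


def frequent_items(db, minsup):
    """Items whose support is greater than or equal to minsup."""
    return sorted(x for x, s in item_supports(db).items() if s >= minsup)


print(frequent_items(DB, minsup=2))  # ['a', 'b', 'c', 'd', 'e']
```

The items f and g each occur in only one sequence, so they fall below min_Sup = 2.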

An example
Extending <( a )> in C1

Tran. ID   sequences                              TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >      [ 5:5:5 , 31:31:31 ]

Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Time index [ 5:5:5 ] (it + duration = 5 + 25 = 30)
1. Type-1: let + mingap ≤ VTP ≤ lst + maxgap, so 8 ≤ VTP ≤ 20
   18(b) falls in the VTP: <( a )( b )> 1
2. Type-2: let - swin ≤ VTP ≤ minimum of { lst + swin , it + duration }, so 3 ≤ VTP ≤ 7
   3(c) and 5(a,f) fall in the VTP: <( a ,c )> 1, <( a ,f )> 1

Time index [ 31:31:31 ] (it + duration = 31 + 25 = 56)
1. Type-1: let + mingap ≤ VTP ≤ lst + maxgap, so 34 ≤ VTP ≤ 46
   45(f) falls in the VTP: <( a )( f )> 1

Patterns found in C1
<( a )( b )> 1
<( a )( f )> 1
<( a ,c )> 1
An example
Extending <( a )> in C2

Tran. ID   sequences                                  TIdx
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >       [ 6:6:6 , 18:18:18 ]

Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Time index [ 6:6:6 ]
1. Type-1: let + mingap ≤ VTP ≤ lst + maxgap, so 9 ≤ VTP ≤ 21
   10(b), 17(e) and 18(a) fall in the VTP: <( a )( b )> 1, <( a )( e )> 1, <( a )( a )> 1
2. Type-2: let - swin ≤ VTP ≤ minimum of { lst + swin , it + duration }, so 4 ≤ VTP ≤ 8
   6(a,c) falls in the VTP: <( a ,c )> 1

Time index [ 18:18:18 ]
1. Type-1: let + mingap ≤ VTP ≤ lst + maxgap, so 21 ≤ VTP ≤ 33
   24(c,d) falls in the VTP: <( a )( c )> 1, <( a )( d )> 1

Patterns found in C2
<( a )( a )> 1
<( a )( b )> 1
<( a )( c )> 1
<( a )( d )> 1
<( a )( e )> 1
<( a ,c )> 1

An example
Extending <( a )> in C4

Tran. ID   sequences                          TIdx
C4         < 5(a) 10(d) 21(c,d) 26(e) >       [ 5:5:5 ]

Time-constraints: swin = 2, mingap = 3, maxgap = 15, duration = 25

Time index [ 5:5:5 ]
1. Type-1: let + mingap ≤ VTP ≤ lst + maxgap, so 8 ≤ VTP ≤ 20
   10(d) falls in the VTP: <( a )( d )> 1

An example
min_Sup=2

Support counts of the patterns extended from <( a )> over C1, C2 and C4

<( a )( a )> 1
<( a )( b )> 2
<( a )( c )> 1
<( a )( d )> 2
<( a )( e )> 1
<( a ,c )> 2

With min_Sup = 2, the frequent patterns are <( a )( b )>, <( a )( d )> and <( a ,c )>.
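
The counting walked through above can be approximated with the Python sketch below. It is a simplified illustration, not the METISP algorithm itself: it handles only a single-item prefix, and it does not pre-filter infrequent items or enforce any item ordering for type-2 extensions. It therefore also produces a few support-1 candidates (for example with the stem f) that the slides do not all list. All function names and the data layout are assumptions of this sketch.

```python
from collections import Counter

SWIN, MINGAP, MAXGAP, DURATION = 2, 3, 15, 25

DB = {
    "C1": [(3, {"c"}), (5, {"a", "f"}), (18, {"b"}), (31, {"a"}), (45, {"f"})],
    "C2": [(6, {"a", "c"}), (10, {"b"}), (17, {"e"}), (18, {"a"}), (24, {"c", "d"})],
    "C3": [(1, {"b"}), (20, {"b", "g"}), (27, {"e"}), (34, {"d", "g"}), (35, {"g"})],
    "C4": [(5, {"a"}), (10, {"d"}), (21, {"c", "d"}), (26, {"e"})],
}


def item_time_indexes(seq, item):
    """(it, lst, let) triples of a single item: one per containing transaction."""
    return [(t, t, t) for t, items in seq if item in items]


def extensions(seq, prefix_item):
    """Set of (type, stem) pairs that extend <(prefix_item)> within one sequence."""
    found = set()
    for it, lst, let in item_time_indexes(seq, prefix_item):
        lo1, hi1 = let + MINGAP, lst + MAXGAP                  # Lemma 1 (type-1)
        lo2, hi2 = let - SWIN, min(lst + SWIN, it + DURATION)  # Lemma 2 (type-2)
        for t, items in seq:
            if lo1 <= t <= hi1:
                found.update(("type-1", x) for x in items)
            if lo2 <= t <= hi2:
                found.update(("type-2", x) for x in items if x != prefix_item)
    return found


def count_extensions(db, prefix_item):
    """Support of every extension: each sequence contributes at most once."""
    support = Counter()
    for seq in db.values():
        support.update(extensions(seq, prefix_item))
    return support


support = count_extensions(DB, "a")
frequent = sorted(p for p, s in support.items() if s >= 2)
print(frequent)  # [('type-1', 'b'), ('type-1', 'd'), ('type-2', 'c')]
```

The three frequent results correspond to <( a )( b )>, <( a )( d )> and <( a ,c )>, each with support 2.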

An example
min_Sup=2

<( a ,c )> -TIdx

Tran. ID   sequences                                  TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >          [ 3:3:5 ]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >       [ 6:6:6 ]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >       -
C4         < 5(a) 10(d) 21(c,d) 26(e) >               -

Time-constraints
swin = 2
mingap = 3
maxgap = 15
duration = 25

<( a ,c )( b )> 2

An example
min_Sup=2

<( a ,c )( b )> -TIdx

Tran. ID   sequences                                  TIdx
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >          [ 3:3:18 ]
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >       [ 6:6:10 ]
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >       -
C4         < 5(a) 10(d) 21(c,d) 26(e) >               -

Time-constraints
swin = 2
mingap = 3
maxgap = 15
duration = 25

No more patterns can be formed.

An example

Min_Sup=2

Tran. ID   sequences
C1         < 3(c) 5(a,f) 18(b) 31(a) 45(f) >
C2         < 6(a,c) 10(b) 17(e) 18(a) 24(c,d) >
C3         < 1(b) 20(b,g) 27(e) 34(d,g) 35(g) >
C4         < 5(a) 10(d) 21(c,d) 26(e) >

Frequent patterns, grouped by first item

a: a, (a,c), (a)(b), (a)(d), (a,c)(b)
b: b, (b)(a), (b)(d), (b)(e), (b)(e)(d)
c: c, (c)(b), (c)(e), (c)(b)(a)
d: d
e: e, (e)(d)

Dealing with extra-large databases

Performance analysis and experimental evaluation

Average number of transactions per data-sequence = 10
Average number of items per transaction = 2.5
Average size of potentially sequential patterns = 4
Average size of potentially frequent itemsets = 1.25
Number of data sequences in database = 100k


Conclusions

This paper has presented METISP, a time-indexing algorithm for mining sequential patterns with various time constraints, including minimum-, maximum-, and exact-gaps, sliding time-windows, and durations. METISP effectively shrinks the search space of potential patterns.
