
Fast Random Walk with Restart and Its Applications

Hanghang Tong, Christos Faloutsos and Jia-Yu (Tim) Pan

ICDM 2006, Dec. 18-22, Hong Kong

Motivating Questions
Q: How to measure the relevance?
A: Random walk with restart
Q: How to do it efficiently?
A: This talk tries to answer!

Random walk with restart

[Figure: example graph with 12 nodes; each node is labeled with its relevance score with respect to the query node, Node 4]

Ranking vector $\vec{r}_4$ (query: Node 4)

Node 1     0.13
Node 2     0.10
Node 3     0.13
Node 4     0.22
Node 5     0.13
Node 6     0.05
Node 7     0.05
Node 8     0.08
Node 9     0.04
Node 10    0.03
Node 11    0.04
Node 12    0.02

Nearby nodes, higher scores
More red, more relevant

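Automatic Image Caption

Q: Given images captioned {Sea, Sun, Sky, Wave} and {Cat, Forest, Grass, Tiger}, what is the caption {?, ?, ?} of a test image?
A: RWR! [Pan KDD2004]

[Figure: graph with Region, Image and Keyword nodes; the keywords are Sea, Sun, Sky, Wave, Cat, Forest, Tiger, Grass, and the test image is attached through its regions]

[Figure: the same graph with the caption found for the test image: {Grass, Forest, Cat, Tiger}]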

Neighborhood Formulation

Q: What is the most related conference to ICDM?
A: RWR! [Sun ICDM2005]

[Figure: graph of Conference and Author nodes]

NF: example

Center-Piece Subgraph (CePS)

Q: [Figure: original graph; black: query nodes]
A: RWR! [Tong KDD 2006]
[Figure: the resulting CePS]

CePS: Example

Other Applications
Content-based Image Retrieval [He]
Personalized PageRank [Jeh], [Widom], [Haveliwala]
Anomaly Detection (for node; link) [Sun]
Link Prediction [Getoor], [Jensen]
Semi-supervised Learning [Zhu], [Zhou]

Roadmap
Background
RWR: Definitions
RWR: Algorithms

Basic Idea
FastRWR
Pre-Compute Stage
On-Line Stage

Experimental Results
Conclusion

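Computing RWR

$$\vec{r}_i = c\,\tilde{W}\,\vec{r}_i + (1 - c)\,\vec{e}_i$$

Ranking vector $\vec{r}_i$ (n x 1)
Restart p: $1 - c$
Adjacency matrix $\tilde{W}$ (n x n, column-normalized)
Starting vector $\vec{e}_i$ (n x 1)

Example: 12-node graph, query node 4, $c = 0.9$, $1 - c = 0.1$

$$\vec{r}_4 = (0.13,\ 0.10,\ 0.13,\ 0.22,\ 0.13,\ 0.05,\ 0.05,\ 0.08,\ 0.04,\ 0.03,\ 0.04,\ 0.02)^T$$

$$\vec{e}_4 = (0,\ 0,\ 0,\ 1,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0,\ 0)^T$$

$$\tilde{W} = \left(\begin{array}{cccccccccccc}
0 & 1/3 & 1/3 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 1/4 & 0 & 0 & 0 & 0 \\
1/3 & 1/3 & 0 & 1/3 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1/3 & 0 & 1/3 & 0 & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1/3 & 0 & 1/2 & 1/2 & 1/4 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 0 & 1/2 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1/4 & 1/2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1/3 & 0 & 0 & 1/4 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/2 & 0 & 1/3 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/4 & 0 & 1/3 & 0 & 1/2 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1/3 & 1/3 & 0
\end{array}\right)$$

[Figure: the 12-node example graph]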
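The fixed-point equation on this slide can be checked numerically. The following is a minimal sketch, not the authors' implementation: the edge list `EDGES` is read off the non-zero pattern of the 12 x 12 matrix on this slide (an assumption about the underlying graph), and the helper names `normalized_adjacency` and `rwr_direct` are made up for illustration. It solves $(I - c\tilde{W})\,\vec{r}_i = (1 - c)\,\vec{e}_i$ directly with NumPy.

```python
import numpy as np

# Undirected edges (1-based node ids), read off the non-zero pattern of the
# 12 x 12 matrix on this slide. Treat this list as an assumption.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def normalized_adjacency(edges, n):
    """Column-normalized adjacency matrix: every column sums to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = 1.0
        A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def rwr_direct(W, i, c=0.9):
    """Solve r = c W r + (1 - c) e_i as the system (I - c W) r = (1 - c) e_i."""
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    return np.linalg.solve(np.eye(n) - c * W, (1.0 - c) * e)


W = normalized_adjacency(EDGES, 12)
r4 = rwr_direct(W, i=3)  # query node 4 has zero-based index 3
print(np.round(r4, 2))
```

Because every column of the matrix sums to 1, the scores in the ranking vector sum to 1 as well, and the query node itself receives the highest score.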

Beyond RWR: Maxwell Equation for Web! [Chakrabarti]

SM Learning [Zhou, Zhu]
RL in CBIR [He]
P-PageRank [Haveliwala]
RWR [Pan, Sun]
PageRank [Haveliwala]

Fast RWR Finds the Root Solution!

Q: Given query i, how to solve it?

$$\vec{r}_i = 0.9\,\tilde{W}\,\vec{r}_i + 0.1\,\vec{e}_i, \qquad \vec{r}_i = \ ?$$

[Figure: the equation written out with the 12 x 12 normalized adjacency matrix of the example graph and the ranking vector left unknown]

OnTheFly:

[Figure: animation of the iteration $\vec{r}_i \leftarrow 0.9\,\tilde{W}\,\vec{r}_i + 0.1\,\vec{e}_i$ on the example graph; the scores are updated step by step until they settle at the ranking vector]

No pre-computation / light storage
Slow on-line response: O(mE)
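The on-the-fly approach can be sketched as a plain power iteration. This is an illustrative sketch under stated assumptions, not the authors' code: `EDGES` is the edge list read off the example matrix, the names `normalized_adjacency` and `rwr_on_the_fly` and the stopping rule (`tol`, `max_iter`) are hypothetical. The O(mE) on the slide is presumably m iterations over E edges with a sparse matrix; the dense NumPy product used here is only for brevity.

```python
import numpy as np

# Undirected edges (1-based), assumed from the example matrix in the talk.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def normalized_adjacency(edges, n):
    """Column-normalized adjacency matrix: every column sums to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = 1.0
        A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def rwr_on_the_fly(W, i, c=0.9, tol=1e-10, max_iter=10_000):
    """Iterate r <- c W r + (1 - c) e_i until the L1 change drops below tol."""
    n = W.shape[0]
    e = np.zeros(n)
    e[i] = 1.0
    r = e.copy()
    for m in range(1, max_iter + 1):
        r_next = c * (W @ r) + (1.0 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next, m
        r = r_next
    return r, max_iter


W = normalized_adjacency(EDGES, 12)
r4, iterations = rwr_on_the_fly(W, i=3)  # query node 4
print(iterations, np.round(r4, 2))
```

Nothing is stored between queries, but every query pays for all the iterations again, which is the slow on-line response the slide refers to.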

PreCompute

R = [r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12]

[Figure: 12 x 12 matrix R whose columns are the pre-computed ranking vectors of the example graph, one column per query node]

[Haveliwala]

PreCompute:

[Figure: the ranking vector for the query is read off as one column of the pre-computed matrix and shown on the example graph]

Fast on-line response
Heavy pre-computation/storage cost: O(n^3) pre-computation, O(n^2) storage
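The pre-compute approach trades the per-query iteration for one big inversion. A minimal sketch, assuming the same example graph (`EDGES` is read off the example matrix) and using hypothetical helper names `precompute_all` and `query`:

```python
import numpy as np

# Undirected edges (1-based), assumed from the example matrix in the talk.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]


def normalized_adjacency(edges, n):
    """Column-normalized adjacency matrix: every column sums to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = 1.0
        A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def precompute_all(W, c=0.9):
    """Off-line: invert Q = I - c W once; column i of the result is r_i."""
    n = W.shape[0]
    Q_inv = np.linalg.inv(np.eye(n) - c * W)  # one n x n inversion
    return (1.0 - c) * Q_inv                  # n x n numbers to store


def query(R, i):
    """On-line: answering a query is a single column lookup."""
    return R[:, i]


W = normalized_adjacency(EDGES, 12)
R = precompute_all(W)
r4 = query(R, 3)  # query node 4
print(np.round(r4, 2))
```

The on-line step is as fast as it can be, but the dense inverse is what produces the O(n^3) time and O(n^2) storage noted on the slide.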

Q: How to Balance?

[Figure: trade-off between Off-line cost and On-line cost]


Basic Idea

Find Community
Fix the remaining
Combine

[Figure: the 12-node example graph split into communities, then the remaining links between communities, then the combined graph with the RWR scores on its nodes]

Pre-computational stage
Q: Efficiently compute and store $Q^{-1}$
A: A few small matrix inversions instead of ONE BIG one

On-Line Query Stage

Q: Efficiently recover one column of $Q^{-1}$
A: A few matrix-vector multiplications instead of MANY

[Figure: the query vector $\vec{e}_i$ (1 at the query node, 0 elsewhere) is mapped to the ranking vector $\vec{r}_i$]


Pre-compute Stage
P1: B_Lin Decomposition
P1.1 partition
P1.2 low-rank approximation

P2: Q matrices
P2.1 computing $Q_1^{-1}$ (for each partition)
P2.2 computing $\tilde{\Lambda}$ (for concept space)

P1.1: partition

[Figure: the example graph before and after partitioning, with within-partition links and cross-partition links marked]

P1.1: W1 (within-partition links) is block-diagonal

[Figure: the example graph with only the within-partition links kept]

P1.2: LRA for W2 (cross-partition links)

[Figure: the example graph with only the cross-partition links kept]

$$W_2 \approx U S V$$
|S| << |W2|

[Figure: the normalized adjacency matrix as a sum: block-diagonal part + low-rank part]
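The two decomposition steps can be illustrated on the example graph. This is a sketch under explicit assumptions: the three-way split in `PARTITIONS` is a hypothetical partition chosen by eye, `EDGES` is read off the example matrix, and a truncated SVD (`numpy.linalg.svd`) stands in for the low-rank approximation step, which may differ from the method used in the talk.

```python
import numpy as np

# Undirected edges (1-based), assumed from the example matrix in the talk.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
# Hypothetical partition of the 12 nodes into 3 communities (1-based ids).
PARTITIONS = [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11, 12]]


def normalized_adjacency(edges, n):
    """Column-normalized adjacency matrix: every column sums to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = 1.0
        A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def split_by_partition(W, partitions):
    """Return (W1, W2): within-partition and cross-partition parts of W."""
    n = W.shape[0]
    label = np.empty(n, dtype=int)
    for k, block in enumerate(partitions):
        for node in block:
            label[node - 1] = k
    same = label[:, None] == label[None, :]
    W1 = np.where(same, W, 0.0)  # block-diagonal when nodes are grouped
    return W1, W - W1


def low_rank(W2, t):
    """Rank-t truncated SVD, W2 ~= U S V (stand-in for the LRA step)."""
    U, s, Vt = np.linalg.svd(W2)
    return U[:, :t], np.diag(s[:t]), Vt[:t, :]


W = normalized_adjacency(EDGES, 12)
W1, W2 = split_by_partition(W, PARTITIONS)
U, S, V = low_rank(W2, t=4)
print(np.count_nonzero(W2), np.abs(W2 - U @ S @ V).max())
```

With this partition only three links cross partition boundaries, touching four nodes, so a rank-4 factorization already reproduces W2 up to rounding; on a large graph the rank would be chosen much smaller than the size of W2.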

P2.1: Computing $Q_1^{-1}$

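Comparing $Q_1^{-1}$ and $Q^{-1}$

Computing Time
100,000 nodes; 100 partitions
Computing $Q_1^{-1}$ is 10,000x faster!

Storage Cost
100x saving!

[Figure: $Q_1^{-1}$ is block-diagonal, with blocks $Q_{1,1}^{-1}, Q_{1,2}^{-1}, \dots, Q_{1,k}^{-1}$]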
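The two ratios follow from simple arithmetic. The sketch below assumes equal-size partitions, cubic cost for a dense inversion and quadratic storage for a dense matrix (the O(n^3) and O(n^2) figures from the PreCompute slide); the variable names are illustrative.

```python
n = 100_000      # nodes (from the slide)
k = 100          # partitions (from the slide)
block = n // k   # assumed: all partitions have the same size

# One n x n inversion versus k inversions of (n/k) x (n/k) blocks.
inversion_speedup = n**3 // (k * block**3)
# One dense n x n matrix versus k dense (n/k) x (n/k) blocks.
storage_saving = n**2 // (k * block**2)

print(inversion_speedup, storage_saving)  # 10000 100
```

In general the time ratio is k^2 and the storage ratio is k, so more partitions make the block-diagonal part cheaper, at the price of more cross-partition links to approximate.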

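[Figure: $Q^{-1}$ compared with the block-diagonal $Q_1^{-1}$; the portions still to be fixed are shown in green]

Q: How to fix the green portions?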

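P2.2: Computing $\tilde{\Lambda}$

[Figure: the block-diagonal $Q_1^{-1}$ (blocks $Q_{1,1}^{-1}, Q_{1,2}^{-1}, \dots, Q_{1,k}^{-1}$), the matrix U, and the partitioned example graph]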

We have:
Communities

Bridges

SM Lemma says:

$$Q^{-1} = Q_1^{-1} + c\,Q_1^{-1}\,U\,\tilde{\Lambda}\,V\,Q_1^{-1}$$
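The identity is an instance of the Sherman-Morrison (Woodbury) lemma. As a sketch of where it comes from, assume $Q = I - cW$, write the normalized adjacency matrix as a block-diagonal part $W_1$ plus a low-rank part $U S V$ with $S$ invertible, and set $Q_1 = I - cW_1$. The explicit form given for $\tilde{\Lambda}$ is what the lemma yields under these assumptions, not something legible on the slide itself.

```latex
\begin{aligned}
Q &= I - c\,W \approx I - c\,W_1 - c\,U S V = Q_1 - c\,U S V,
  \qquad Q_1 = I - c\,W_1, \\
(Q_1 - c\,U S V)^{-1} &= Q_1^{-1} + c\,Q_1^{-1}\,U\,\tilde{\Lambda}\,V\,Q_1^{-1},
  \qquad \tilde{\Lambda} = \left(S^{-1} - c\,V Q_1^{-1} U\right)^{-1}, \\
\vec{r}_i &= (1 - c)\,Q^{-1}\,\vec{e}_i
  \approx (1 - c)\left(Q_1^{-1}\vec{e}_i
  + c\,Q_1^{-1}\,U\,\tilde{\Lambda}\,V\,Q_1^{-1}\vec{e}_i\right).
\end{aligned}
```

Only $Q_1^{-1}$ (block-diagonal), $U$, $V$ and the small matrix $\tilde{\Lambda}$ need to be stored; the full $Q^{-1}$ is never formed.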


On-Line Stage

Q: Given the pre-computation and a query $\vec{e}_i$ (1 at the query node, 0 elsewhere), how to get the result $\vec{r}_i$?
A: SM lemma

On-Line Query Stage

q1:
q2:
q3:
q4:
q5:
q6:

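The contents of the six query steps did not survive in this transcript. The sketch below shows one plausible sequence read off the SM-lemma formula; whether it matches the slide's q1 to q6 one for one is an assumption. `EDGES` and `PARTITIONS` are assumed as in the earlier examples on the 12-node graph, the function names are hypothetical, and a truncated SVD stands in for the low-rank approximation step.

```python
import numpy as np

# Undirected edges (1-based), assumed from the example matrix in the talk.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 8), (3, 4), (4, 5), (5, 6),
         (5, 7), (5, 8), (6, 7), (8, 9), (8, 11), (9, 10), (10, 11),
         (10, 12), (11, 12)]
# Hypothetical partition of the 12 nodes into 3 communities (1-based ids).
PARTITIONS = [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10, 11, 12]]


def normalized_adjacency(edges, n):
    """Column-normalized adjacency matrix: every column sums to 1."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u - 1, v - 1] = 1.0
        A[v - 1, u - 1] = 1.0
    return A / A.sum(axis=0, keepdims=True)


def precompute(W, partitions, c=0.9, t=4):
    """Off-line stage: per-partition inverses, low-rank factors, Lambda."""
    n = W.shape[0]
    W1 = np.zeros((n, n))
    Q1_inv = np.zeros((n, n))
    for block in partitions:
        idx = np.array(block) - 1
        sub = np.ix_(idx, idx)
        W1[sub] = W[sub]
        # One small inversion per partition instead of one big inversion.
        Q1_inv[sub] = np.linalg.inv(np.eye(len(idx)) - c * W[sub])
    U, s, Vt = np.linalg.svd(W - W1)  # cross-partition links
    U, S, V = U[:, :t], np.diag(s[:t]), Vt[:t, :]
    Lam = np.linalg.inv(np.linalg.inv(S) - c * V @ Q1_inv @ U)
    return Q1_inv, U, Lam, V


def online_query(i, Q1_inv, U, Lam, V, c=0.9):
    """On-line stage: a handful of matrix-vector products."""
    e = np.zeros(Q1_inv.shape[0])
    e[i] = 1.0
    r0 = Q1_inv @ e   # within-community part
    r = V @ r0        # project onto the low-rank space
    r = Lam @ r       # apply the small pre-computed matrix
    r = U @ r         # map back to node space
    r = Q1_inv @ r    # spread within communities again
    return (1.0 - c) * (r0 + c * r)


W = normalized_adjacency(EDGES, 12)
pre = precompute(W, PARTITIONS)
r4 = online_query(3, *pre)  # query node 4
print(np.round(r4, 2))
```

On this toy graph the rank-4 factorization of the cross-partition part is exact, so the result agrees with solving the full system; with a smaller rank the answer becomes an approximation, which is the quality versus cost trade-off measured in the experiments.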


Experimental Setup
Dataset
DBLP/authorship
Author-Paper
315k nodes
1,800k edges

Approx. Quality: Relative Accuracy


Application: Center-Piece Subgraph

Query Time vs. Pre-Compute Time

[Figure: plot of Log Query Time against Log Pre-compute Time]

Quality: 90%+
On-line: Up to 150x speedup
Pre-computation: Two orders of magnitude saving

Query Time vs. Pre-Storage

[Figure: plot of Log Query Time against Log Storage]

Quality: 90%+
On-line: Up to 150x speedup
Pre-storage: Three orders of magnitude saving


Conclusion
FastRWR
Reasonable quality preservation (90%+)
150x speed-up: query time
Orders of magnitude saving: pre-compute & storage

More in the paper
The variant of FastRWR and theoretical justification
Implementation details
normalization, low-rank approximation, sparse
More experiments
Other datasets, other applications

Q&A

Thank you!
htong@cs.cmu.edu
www.cs.cmu.edu/~htong