
Link Prediction in Social Networks

"Prediction is very difficult, especially if it's about the future."

- Niels Bohr, Nobel laureate in Physics

Submitted By:
Sourav Sharma – 378/CO/15

• A social network is a collection of social actors (people,
organizations, co-workers, etc.) and the connections or
relations between those actors.
• Social networks are best represented by graphs in which
vertices (nodes) represent the social actors and edges (links)
represent the relations between those actors.
• Relationships between people keep evolving over time: new
connections appear and old ones sometimes disappear, so the
edges in a social network keep changing with time.
Example of a Social Network
• Real-world examples of social networks include sets of
friends, family members and followers, with edges defining the
relations between them, or sets of authors or scientists, with
edges defining their relations (co-authors, research partners,
etc.) to others in the network.
Problem Statement

Given a snapshot of a social network at time t (or the network's evolution
between t1 and t2), we seek to accurately predict the edges that will be
added to the network during the interval from time t (or t2) to a
given future time t'.
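
A minimal sketch of this setup, assuming the network is available as a list of timestamped edges. The record layout, the function name `split_by_time` and the toy data are illustrative assumptions, not part of the original work. Edges seen up to time t form the training snapshot, and edges that first appear in the interval (t, t'] are the ones a predictor should recover.

```python
def split_by_time(timed_edges, t, t_future):
    """Split (u, v, timestamp) records into a training snapshot and new edges.

    Edges are undirected, so each pair is stored in sorted order.
    """
    def norm(u, v):
        return (u, v) if u <= v else (v, u)

    # Snapshot of the network at time t.
    train = {norm(u, v) for u, v, ts in timed_edges if ts <= t}
    # Edges observed in (t, t_future].
    later = {norm(u, v) for u, v, ts in timed_edges if t < ts <= t_future}
    # Only edges that did not already exist count as new.
    return train, later - train


# Hypothetical interaction log: (node, node, timestamp).
timed_edges = [
    ("a", "b", 1),
    ("b", "c", 2),
    ("a", "c", 5),  # new edge formed after t = 3
    ("b", "a", 6),  # repeat of an existing edge, not a new link
    ("c", "d", 9),  # outside the prediction interval
]

train, new_edges = split_by_time(timed_edges, t=3, t_future=8)
print(sorted(train))
print(sorted(new_edges))
```

Repeated interactions between already-connected nodes are removed from the target set, since the task is to predict edges that are added to the network.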
The Problem

• So, given a social network, can we predict what the future network
will look like? Or which new edges are likely to be formed in the future?
• This is known as the link prediction problem in social networks, and
it is a part of social network mining or analysis.
Will nodes 33 and 28 become friends in the future?
What about nodes 27 and 4?
Does network structure contain enough information to predict
what new links will form in the future?
Detailed Introduction

• The link prediction problem can also be extended to inferring
missing (erased or broken) links in a network. For example, in a
meeting of three people, the (missing) name of the third participant
can be predicted if the other two people and their past meetings
in the network are known.
• Real-world examples of link prediction include suggesting friends and
followers on social networking websites, suggesting relevant
products to customers in e-commerce, and suggesting that
scientists or researchers work together based on their fields.
Who To Follow?
What will Facebook friendships look like tomorrow?
• Friendship and dating: This is the most popular use case of link
prediction. Many friendship and dating websites use it to suggest
friends, family members or dating partners to their users.
• E-commerce: Online shopping websites use link prediction methods to
suggest products to their users.
• Work organizations: By analyzing employee data, organizations
can build systems that suggest the best-matched project or
team for each employee.
• Public security: It is very hard to predict future crimes and the
individuals behind them, but analyzing networks of criminals and
terrorists can reveal links that have not been observed so far, which
may help in predicting the nature, place and participants of a crime.
• Research and collaboration: Research in a field is often a long and
tedious process, which can be made more interesting and more successful
by bringing together scientists with common interests and goals.
Methods for Link Prediction

• Take the input graph from the training period.
• Pick a pair of nodes (x, y).
• Assign a connection weight score(x, y).
• Make a list of pairs in descending order of score.
• The score is a measure of proximity.
• Any ideas for measures?
• Common Neighbors: This method calculates a score based on
the number of common neighbors (mutual friends, followers,
interests or other features) of nodes x and y. The higher the number
of common neighbors, the higher the chance that x and y will have an
edge (relation or collaboration) in the future.
• CN(x, y) = |Nx ∩ Ny|, where Nx is the set of neighbors of x
(figure example: CN = 3).
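
The scoring procedure, with Common Neighbors as the proximity measure, can be sketched as follows. This is a minimal illustration under stated assumptions: the graph is stored as a dictionary of neighbor sets, the function names are hypothetical, and the toy graph is made up so that x and y share three neighbors.

```python
from itertools import combinations


def common_neighbors(adj, x, y):
    """CN(x, y): number of nodes adjacent to both x and y."""
    return len(adj[x] & adj[y])


def rank_candidates(adj, score):
    """Score every pair of nodes without an existing edge, highest score first."""
    scored = [
        (x, y, score(adj, x, y))
        for x, y in combinations(sorted(adj), 2)
        if y not in adj[x]  # skip pairs that are already connected
    ]
    return sorted(scored, key=lambda item: item[2], reverse=True)


# Hypothetical training graph as node -> set of neighbors (undirected).
adj = {
    "x": {"a", "b", "c"},
    "y": {"a", "b", "c", "d"},
    "a": {"x", "y"},
    "b": {"x", "y"},
    "c": {"x", "y"},
    "d": {"y"},
}

ranking = rank_candidates(adj, common_neighbors)
print(ranking[0])  # ('x', 'y', 3)
```

Passing the score as a function keeps the ranking step unchanged when a different proximity measure is substituted.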
• Jaccard Coefficient: The Jaccard coefficient is the probability
that both x and y have a feature f, for a feature f selected at random
from the set of all neighbors of x or y.
• This method may look similar to Common Neighbors, but it normalizes
the number of common neighbors by the total number of neighbors of x and
y, so its predictions are not always the same.
• JC(x, y) = |Nx ∩ Ny| / |Nx ∪ Ny|
• Adamic–Adar (frequency-weighted common neighbors): Adamic and Adar
proposed this formula, which weighs common neighbors with smaller
degree more heavily. This means that if x and y share one or more common
neighbors that are not very popular in the network, the chances of x
having an edge with y in the future are higher.
• AA(x, y) = ∑(z ∈ Nx ∩ Ny) 1 / log |Nz|
• Preferential Attachment: This method assumes that the probability that
a new edge involves node x is directly proportional to the number of
neighbors of x. In the real world, it matches the idea that the rich
get richer: the more friends a person currently has, the higher the
chances of making new friends in the future.
• PA(x, y) = dx · dy, where dx = |Nx| and dy = |Ny| are the degrees of x and y.
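
To compare the Jaccard, Adamic–Adar and preferential attachment scores on the same input, here is a small sketch. The helper names and the toy graph are assumptions made for illustration, and the natural logarithm is assumed for the Adamic–Adar weight.

```python
import math
from collections import defaultdict


def build_adjacency(edges):
    """Build node -> set of neighbors for an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard(adj, x, y):
    """JC(x, y) = |Nx ∩ Ny| / |Nx ∪ Ny| (0 if both nodes are isolated)."""
    union = adj[x] | adj[y]
    return len(adj[x] & adj[y]) / len(union) if union else 0.0


def adamic_adar(adj, x, y):
    """AA(x, y): sum over common neighbors z of 1 / log |Nz|.

    A common neighbor is adjacent to both x and y, so its degree is at
    least 2 and the logarithm is strictly positive.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj[x] & adj[y])


def preferential_attachment(adj, x, y):
    """PA(x, y) = dx * dy, the product of the two degrees."""
    return len(adj[x]) * len(adj[y])


# Hypothetical graph: "hub" is a popular node, "q" is an unpopular one.
edges = [
    ("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
    ("u", "a"), ("u", "b"),
    ("v", "q"), ("w", "q"),
]
adj = build_adjacency(edges)

for x, y in [("hub", "u"), ("c", "d"), ("v", "w")]:
    print(x, y,
          round(jaccard(adj, x, y), 3),
          round(adamic_adar(adj, x, y), 3),
          preferential_attachment(adj, x, y))
```

In this toy graph the pairs (c, d) and (v, w) each have exactly one common neighbor and the same Jaccard coefficient, but Adamic–Adar scores (v, w) higher because its shared neighbor q has a lower degree than hub. Preferential attachment ignores shared neighbors entirely and favors (hub, u).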
• Katz score: Measures the number of paths between two nodes,
attenuated by their length.
• Hitting time: The expected time for a random walk starting at x
to reach y.
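
As an illustration of the Katz score only, the sketch below sums beta^l times the number of length-l paths from x to y, truncated at a maximum length. The truncation, the parameter names `beta` and `max_len`, and the toy graph are assumptions of this sketch. Paths are counted as walks here, so nodes may repeat.

```python
def katz_truncated(adj, x, y, beta=0.1, max_len=4):
    """Truncated Katz score: sum over l = 1..max_len of beta**l * walks_l(x, y)."""
    # walks[v] = number of walks of the current length from x to v.
    walks = {x: 1}
    score = 0.0
    for length in range(1, max_len + 1):
        extended = {}
        for v, count in walks.items():
            for w in adj[v]:
                extended[w] = extended.get(w, 0) + count
        walks = extended
        # Longer walks contribute less because beta < 1.
        score += (beta ** length) * walks.get(y, 0)
    return score


# Hypothetical path graph a - b - c - d.
adj = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

print(katz_truncated(adj, "a", "c", beta=0.5, max_len=3))  # 0.25
print(katz_truncated(adj, "a", "d", beta=0.5, max_len=3))  # 0.125
```

The pair (a, c) is two steps apart and (a, d) three, so the attenuation gives the closer pair the higher score.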
Our Work In Link Prediction

• Given a snapshot of a social network (pictured), we used a method to
calculate prediction scores for each of the 10 nodes in the picture.
• We ranked a new connection (one that does not currently exist) higher
when the two nodes share less popular common neighbors in the network.
• We used the following formula:
AA(x, y) = ∑(z ∈ Nx ∩ Ny) 1 / log |Nz|
Node       Candidate   Score
Adam       Dexter      1.4426950408889634
           Irvin       0.9102392266268373
           Blake       0.9102392266268373
Blake      Irvin       1.8204784532536746
           Adam        0.9102392266268373
           Clayton     0.9102392266268373
Clayton    Irvin       0.9102392266268373 *
           Blake       0.9102392266268373 *
Dexter     Irvin       2.8853900817779268
           Adam        1.4426950408889634
Edward     Geoffrey    2.164042561333445
           Jacob       1.4426950408889634
           Hilton      0.7213475204444817
           Ford        0.7213475204444817
Ford       Hilton      1.631586747071319
           Jacob       0.9102392266268373
           Edward      0.7213475204444817
           Geoffrey    0.7213475204444817
Geoffrey   Edward      2.164042561333445
           Hilton      0.7213475204444817
           Ford        0.7213475204444817
Hilton     Ford        1.631586747071319
           Jacob       0.9102392266268373
           Edward      0.7213475204444817
           Geoffrey    0.7213475204444817
Irvin      Dexter      2.8853900817779268
           Blake       1.8204784532536746
           Adam        0.9102392266268373
           Clayton     0.9102392266268373
* same score for both

• In the table above, the higher the score, the higher the chance that
the two nodes will connect in the future. For example, Blake is more
likely to connect with Irvin (score 1.82) than with Adam (score 0.91).
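
The reported scores are consistent with the Adamic–Adar formula evaluated with the natural logarithm: 1/ln 2 ≈ 1.4427, 1/ln 3 ≈ 0.9102 and 1/ln 4 ≈ 0.7213. The check below is a sketch. The degree lists are one decomposition that reproduces the values and are an assumption, since the graph itself is not given in the text.

```python
import math

# Scores copied from the table above.
reported = {
    ("Adam", "Dexter"): 1.4426950408889634,
    ("Blake", "Irvin"): 1.8204784532536746,
    ("Dexter", "Irvin"): 2.8853900817779268,
    ("Edward", "Geoffrey"): 2.164042561333445,
    ("Ford", "Hilton"): 1.631586747071319,
}

# Assumed degrees of the common neighbors for each pair (hypothetical:
# chosen because they reproduce the reported values).
assumed_degrees = {
    ("Adam", "Dexter"): [2],
    ("Blake", "Irvin"): [3, 3],
    ("Dexter", "Irvin"): [2, 2],
    ("Edward", "Geoffrey"): [2, 4],
    ("Ford", "Hilton"): [3, 4],
}

for pair, degrees in assumed_degrees.items():
    # Adamic-Adar: one term 1 / ln(degree) per common neighbor.
    recomputed = sum(1.0 / math.log(d) for d in degrees)
    print(pair, recomputed, math.isclose(recomputed, reported[pair]))
```

Under this reading, Dexter and Irvin score highest because they share two common neighbors that each have only two connections.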
• The results are presented as:

1. Factor improvement of the proposed predictors over:
   • Random predictor
   • Graph distance predictor
   • Common neighbors predictor
2. Relative performance vs. the above predictors
3. Common predictions
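
The text does not define "factor improvement". One common reading, assumed here, follows the evaluation style of Liben-Nowell and Kleinberg: take as many top-ranked predictions as there are new edges, count how many are correct, and divide that hit rate by the hit rate a random predictor would be expected to achieve. The function name, arguments and sample data below are hypothetical.

```python
def factor_over_random(ranked_pairs, new_edges, num_candidates):
    """Ratio of a predictor's top-n hit rate to the expected random hit rate.

    ranked_pairs   -- candidate pairs, best prediction first
    new_edges      -- set of frozensets, the edges that actually formed
    num_candidates -- number of node pairs a random predictor chooses from
    """
    n = len(new_edges)
    hits = sum(1 for pair in ranked_pairs[:n] if frozenset(pair) in new_edges)
    predictor_rate = hits / n
    # A random guess is correct with probability n / num_candidates.
    random_rate = n / num_candidates
    return predictor_rate / random_rate


# Made-up example: 10 candidate pairs, 2 edges actually formed.
ranked_pairs = [("a", "d"), ("b", "e"), ("a", "e"), ("c", "e"), ("d", "e")]
new_edges = {frozenset(("a", "d")), frozenset(("c", "e"))}

factor = factor_over_random(ranked_pairs, new_edges, num_candidates=10)
print(factor)
```

The same function can be used to compare against the graph distance or common neighbors predictors by dividing the two hit rates instead of using the random baseline.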
Factor Improvement By Our Work
Relative Performance vs Random Predictions
Results on Datasets
Conclusion
• We discussed that social networks are made up of social actors and the
connections or relations between them, and that these networks are
best represented by graphs in which nodes are social actors and
edges are the connections between them. Social networks are
often very large and grow at a fast rate.
• Social network graphs contain so much data that processing and
analyzing them is a challenging task. In link prediction methods, we
take a small snapshot of these large networks to analyze and predict
future edges between the nodes.
• We have also seen various methods for solving the link prediction
problem, but none of them can be called the best, because they all
produce incorrect predictions too. They can be improved further by
incorporating additional steps such as refining the dataset or
drawing on psychological studies.
Future Scope
• Although much effort has gone into developing link prediction
algorithms, there is still much room for improvement.
• Predictions can be made more accurate by refining the edges, for
example by assigning a large weight to recent edges and a small weight
to old ones, and by drawing on psychology to better handle the
human characteristics present in the network.
• Social networks are very large, and processing them takes a long
time, which increases further as the network grows; hence faster
algorithms are needed. In addition, supervised algorithms
based on classification and regression can be developed, which
might result in better predictions.
References
• David Liben-Nowell and Jon Kleinberg. The link prediction problem
for social networks. 2004.
• Lada A. Adamic and Eytan Adar. Friends and neighbors on
the web. Social Networks, 25(3):211–230, July 2003.
• A. L. Barabási, H. Jeong, Z. Néda, E. Ravasz,
A. Schubert, and T. Vicsek. Evolution of the social network
of scientific collaborations. Physica A, 311(3–4):590–614.
• Sergey Brin and Lawrence Page. The anatomy of a large-scale
hypertextual Web search engine. Computer
Networks and ISDN Systems, 30(1–7):107–117, 1998.