
17BCE2135 17BCE0980 G2+TG2

SOCIAL NETWORKING IN INTELLIGENT HEALTH CARE SYSTEMS -
USING LINK PREDICTION TO ESTIMATE THE CHANCES OF OCCURRENCE OF GENETIC DISORDERS
Project Report (CSE3021 – Social and Information Networks)

submitted by

MOHITH.J
(17BCE2135)

NAMAN UPADHYAY
(17BCE0980)

in partial fulfillment for the award of the degree of

B.TECH
in

COMPUTER SCIENCE AND ENGINEERING

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING

NOVEMBER 2019


CONTENTS

SL.NO TOPIC

1 ABSTRACT
2 INTRODUCTION
3 PROBLEM STATEMENT
4 LITERATURE SURVEY
5 PRELIMINARY DESIGN OF THE SYSTEM
6 DETAILED DESIGN OF THE SYSTEM
7 IMPLEMENTATION
8 RESULTS AND DISCUSSION
9 CONCLUSION


ABSTRACT

Genetic diseases are disorders caused by a change, or mutation, in an individual’s
DNA. They may be inherited by a person from his or her parents, in which case
the person is born with the disease even if it is not noticeable at first. Some
disorders, however, are not inherited but develop spontaneously when
disease-causing mutations occur during cell division.

Link prediction is an important task in link mining. It aims to predict whether a
link will exist between two nodes, based on the attribute information of the
nodes and the existing links that have been observed. Link prediction is used not
only in social networks but can also be applied in other fields.

The main aim of this project is to predict the chance that a person will be
diagnosed with a genetic disease. To do so, factors such as the person’s family
history, the environmental conditions in which he/she lives, and various others
will be taken into consideration.

The firefly algorithm will be used for this purpose. Following the approach of
the firefly algorithm, the likelihood of a particular person being diagnosed with
a certain kind of genetic disease can be estimated. A person is more likely to be
diagnosed if he/she is genetically related to another person who was diagnosed
with that disease. Various environmental factors, such as pollution and
radioactive exposure, can also be responsible for the occurrence of the disease.
Taking all of the above into consideration, we propose a method to estimate the
probability of a person being diagnosed, using link prediction.


INTRODUCTION

This project focuses on predicting the chances of occurrence of a disease in a
person by analyzing the family tree and a graph of environmental factors
responsible for the onset of the disease. For the family tree, every person in the
family is considered as a node in a binary tree. The person under consideration
is the root of the tree and his/her ancestors are the remaining nodes: at every
node, the two parents of that person are represented as the node's two children.
Only the parents of the person, their parents, and so on are considered, because
they have a direct connection to the genetic make-up of the person. Every person
in the tree is either diagnosed with a particular disease or not. If the status of a
person regarding the disease is unknown, the person is considered not diagnosed
by default, since not being diagnosed is the more likely case. Using this graph
and the firefly algorithm, we predict the chances of occurrence of the disease in
the person.
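The ancestry tree described above can be sketched as a complete binary tree stored in an array. This is a minimal sketch under our own assumptions, not the pointer-based binary search tree used in the Implementation section: the person sits at index 0, the parents of the member at index i sit at indices 2i+1 and 2i+2, and an unknown diagnosis (represented here with std::optional) defaults to "not diagnosed". The names FamilyTree, parents_of, generation and diagnosed are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical array-based family tree: index 0 is the person under
// consideration; the parents of the member at index i are at 2i+1 and 2i+2.
struct FamilyTree {
    // std::nullopt means the member's diagnosis status is unknown.
    std::vector<std::optional<bool>> status;

    // Indices of the two parents of member i.
    static std::pair<std::size_t, std::size_t> parents_of(std::size_t i) {
        return {2 * i + 1, 2 * i + 2};
    }

    // Generation of member i = number of edges from the root
    // (0 for the person, 1 for parents, 2 for grandparents, ...).
    static int generation(std::size_t i) {
        int g = 0;
        for (std::size_t n = i + 1; n > 1; n /= 2) ++g;
        return g;
    }

    // Unknown status is treated as "not diagnosed", the more likely case.
    bool diagnosed(std::size_t i) const {
        return status.at(i).value_or(false);
    }
};
```

With 31 entries, this layout holds the person plus four generations of ancestors (2 + 4 + 8 + 16 = 30), the same family size used later in the report.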

The Firefly Algorithm was first developed by Xin-She Yang at Cambridge
University in late 2007 and 2008, based on the flashing patterns and behavior of
fireflies. In essence, the algorithm uses the following three idealized rules:
1. Fireflies are unisex, so one firefly will be attracted to other fireflies
regardless of their sex.
2. Attractiveness is proportional to brightness, and both decrease as the distance
between fireflies increases. Thus, for any two flashing fireflies, the less bright
one will move towards the brighter one. If no firefly is brighter than a particular
firefly, it moves randomly.
3. The brightness of a firefly is determined by the landscape of the objective
function.

In this project, the brightness of a firefly (a node) is treated as binary, i.e., the
firefly is either luminous or it is not, just as a person is either diagnosed with the
disease or not. The distance between nodes in the family tree is analogous to the
distance between fireflies.

Therefore, the chance of a person being "attracted" to a disease depends on the
distances between that person and the diagnosed people in the family tree: each
diagnosed relative contributes in inverse proportion to his/her distance from the
person, so closer relatives weigh more. A formula based on this logic is developed
and used to predict the chances of occurrence of the disease in that person.
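For reference, the second rule takes a concrete form in Yang's original, continuous formulation of the firefly algorithm: the attractiveness at distance r is beta(r) = beta0 * exp(-gamma * r^2), and a dimmer firefly at x_i moves towards a brighter one at x_j by beta(r) * (x_j - x_i) plus a small random step scaled by alpha. The one-dimensional sketch below illustrates only that standard update; the default values beta0 = 1 and gamma = 1 and the function names are illustrative assumptions. This project does not use this update directly, but instead uses binary brightness with a 1/distance weighting.

```cpp
#include <cmath>
#include <random>

// Attractiveness as a function of distance r in the standard firefly
// algorithm: beta(r) = beta0 * exp(-gamma * r^2).
double attractiveness(double r, double beta0 = 1.0, double gamma = 1.0) {
    return beta0 * std::exp(-gamma * r * r);
}

// One move of a dimmer firefly at position xi towards a brighter one at xj
// (one-dimensional for brevity). alpha scales a uniform random step in
// [-0.5, 0.5); with alpha = 0 the move is deterministic.
double move_towards(double xi, double xj, double alpha, std::mt19937& rng) {
    std::uniform_real_distribution<double> u(0.0, 1.0);
    double beta = attractiveness(std::fabs(xj - xi));
    return xi + beta * (xj - xi) + alpha * (u(rng) - 0.5);
}
```

With alpha = 0, a firefly at 0 pulled by a brighter one at 1 moves to exp(-1), about 0.37, so the pull weakens quickly as distance grows.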


PROBLEM STATEMENT

The main aim of this project is to predict the chance that a person will be
diagnosed with a genetic disease. The project analyses the person’s family tree
and determines the percentage chance of the person being diagnosed with a
genetic disorder.


LITERATURE SURVEY

[1] Firefly Algorithm for Optimization Problem

Link: https://www.researchgate.net/publication/259472546_Firefly_Algorithm_for_Optimization_Problem

This paper reviews the applications of the Firefly Algorithm (FA) in various
domains of optimization problems. Optimization is the process of determining the
best solution to make something as functional and effective as possible by
minimizing or maximizing the parameters involved in the problem. Several
categories of optimization problems, such as discrete, chaotic and
multi-objective problems, have been addressed in the literature with approaches
inspired by the behavior of fireflies. The review found that FA was mostly applied
by researchers to optimization problems in the Computer Science and Engineering
domain, and that some of these applications enhanced FA or hybridized it with
other techniques to achieve better performance. In addition, in most of the
reviewed cases, FA outperformed other metaheuristic algorithms.

[2] An Efficient Link Prediction Technique in Social Networks based on Node
Neighborhoods

Link: https://thesai.org/Downloads/Volume9No6/Paper_37-An_Efficient_Link_Prediction_Technique.pdf

The unparalleled success of social networking sites such as Facebook, LinkedIn
and Twitter has modernized and transformed the way people communicate with
each other. Nowadays, a huge amount of information is shared by online users
through these sites. Various online friendship sites, such as Facebook and Orkut,
allow online friends to share their thoughts or opinions, comment on others’
timelines or photos, and, most importantly, meet new online friends who were
known to them before. However, the question remains as to how to quickly
expand one’s online network by including more and more new friends. One easy
method for this is the list of ‘Suggested Friends’ provided by these online social
networking sites. To suggest friends, links must be predicted for each online user
by studying the structural properties of the network. Link prediction is one of the
key research directions in social network analysis and has attracted much
attention in recent years. This paper discusses a novel, efficient link prediction
technique, Link-Gyp, along with many other commonly used existing prediction
techniques for suggesting friends to online users of a social network, and carries
out experimental evaluations to compare the techniques. The authors’ results on
three real social network datasets show that the Link-Gyp link prediction
technique yields more accurate results than several existing link prediction
techniques.
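Classic neighbourhood-based scores such as common neighbours and the Jaccard coefficient illustrate how link prediction can use the structural properties of a network. The sketch below computes both for a pair of nodes in a small undirected graph; it is an illustrative assumption and is not the Link-Gyp technique described in the paper. The Graph alias and the helper names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Undirected graph as adjacency sets; node ids are 0..n-1.
using Graph = std::vector<std::set<int>>;

void add_edge(Graph& g, int u, int v) {
    g[u].insert(v);
    g[v].insert(u);
}

// Number of neighbours shared by u and v.
std::size_t common_neighbours(const Graph& g, int u, int v) {
    std::vector<int> shared;
    std::set_intersection(g[u].begin(), g[u].end(), g[v].begin(), g[v].end(),
                          std::back_inserter(shared));
    return shared.size();
}

// Size of the intersection of the two neighbourhoods divided by the size of
// their union, or 0 when both neighbourhoods are empty.
double jaccard(const Graph& g, int u, int v) {
    std::vector<int> all;
    std::set_union(g[u].begin(), g[u].end(), g[v].begin(), g[v].end(),
                   std::back_inserter(all));
    if (all.empty()) return 0.0;
    return static_cast<double>(common_neighbours(g, u, v)) / all.size();
}
```

For example, if node 0 is linked to nodes 2 and 3 and node 1 is linked to nodes 2, 3 and 4, the pair (0, 1) shares two neighbours out of three, giving a Jaccard score of 2/3; a higher score marks the pair as a stronger ‘Suggested Friends’ candidate.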

[3] Link Prediction using Supervised Learning

Link: https://archive.siam.org/meetings/sdm06/workproceed/Link%20Analysis/12.pdf

Social network analysis has attracted much attention in recent years, and link
prediction is a key research direction within this area. In this work, the authors
study link prediction as a supervised learning task. Along the way, they identify a
set of features that are key to the superior performance under the supervised
learning setup. The identified features are very easy to compute and, at the same
time, surprisingly effective in solving the link prediction problem. The authors
also explain the effectiveness of the features from their class density
distribution. They then compare different classes of supervised learning
algorithms in terms of their prediction performance using various performance
metrics, such as accuracy, precision-recall, F-values and squared error, with
5-fold cross-validation. Their results on two practical social network datasets
show that most of the well-known classification algorithms (decision tree, k-NN,
multilayer perceptron, SVM, RBF network) predict links with strong performance,
but SVM outperforms all of them by a narrow margin across the different
performance measures. Moreover, ranking the features with popular
feature-ranking algorithms shows that a small subset of features always plays a
significant role in the link prediction task.


PRELIMINARY DESIGN OF THE SYSTEM


DETAILED DESIGN OF THE SYSTEM


In the algorithm used in this project, the people in the family are considered as
nodes in a binary tree. Every node has a key, and the keys are chosen so that the
tree satisfies the binary search tree property. The person being considered is the
root of the tree. Each node is also associated with a disease attribute, which is
set to 1 if that person is/was diagnosed with the particular disease and 0
otherwise. total_distance is the sum of the reciprocals of the distances from every
node other than the root to the root. total_impact_distance is the sum of the
reciprocals of the distances to the root from every node whose disease attribute
equals 1. The chance of the person being diagnosed with the disease is the ratio of
total_impact_distance to total_distance, multiplied by 100 to obtain a percentage.
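Written as a formula, the description above gives the expression below. The symbols are our own notation, not taken from the original design: d_i is the distance (number of generations) of ancestor i from the root, x_i is its disease attribute (0 or 1), and N is the set of the 30 ancestors in the 31-member tree. The worked example assumes the complete five-generation tree of the Implementation section with the two diagnosed ancestors of the commented-out members1 sequence there (at distances 3 and 4).

```latex
\text{chance} = 100 \times \frac{\text{total\_impact\_distance}}{\text{total\_distance}}
             = 100 \times \frac{\sum_{i \in N} x_i / d_i}{\sum_{i \in N} 1 / d_i}

% Complete tree: 2 ancestors at d = 1, 4 at d = 2, 8 at d = 3, 16 at d = 4
\sum_{i \in N} \frac{1}{d_i} = \frac{2}{1} + \frac{4}{2} + \frac{8}{3} + \frac{16}{4} = \frac{32}{3}

% Diagnosed ancestors at d = 3 and d = 4 only
\text{chance} = 100 \times \frac{1/3 + 1/4}{32/3} = 100 \times \frac{7}{128} \approx 5.47\%
```

So a diagnosed great-grandparent and a diagnosed great-great-grandparent alone give a chance of about 5.47%, whereas a single diagnosed parent would give 100 × 3/32 ≈ 9.38%.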

Algorithm:
members_of_family[31];
disease[31]; //initialize: disease[i] = 1 if member i is/was diagnosed, else 0
//Build a binary tree with members_of_family; members_of_family[0] is the root.
total_distance = 0;
total_impact_distance = 0;
for( i from 1 to 30):
    dist = 1 / (distance from members_of_family[0] to members_of_family[i]);
    total_distance += dist;
    if( disease[i] == 1)
        total_impact_distance += dist;
chance = (total_impact_distance / total_distance) * 100;

This algorithm currently considers only the family history of a person to predict
the occurrence of a disease.
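A compact, runnable version of the algorithm above is sketched below. Like the implementation, it assumes a complete 31-member tree, but it stores the tree in an array (parents of index i at 2i+1 and 2i+2) so that the distance of member i from the root is simply floor(log2(i + 1)) and no explicit tree has to be built. The function name chance_of_occurrence is hypothetical.

```cpp
#include <array>
#include <cstddef>

// disease[i] is 1 if family member i is/was diagnosed, 0 otherwise.
// Index 0 is the person under consideration (the root); the parents of
// member i are assumed to be stored at indices 2i+1 and 2i+2.
double chance_of_occurrence(const std::array<int, 31>& disease) {
    double total_distance = 0.0, total_impact_distance = 0.0;
    for (std::size_t i = 1; i < disease.size(); ++i) {
        // Distance from the root = generation of member i = floor(log2(i + 1)).
        int d = 0;
        for (std::size_t n = i + 1; n > 1; n /= 2) ++d;
        double weight = 1.0 / d;  // reciprocal of the distance
        total_distance += weight;
        if (disease[i] == 1) total_impact_distance += weight;
    }
    return (total_impact_distance / total_distance) * 100.0;
}
```

Because the keys in the Implementation section are inserted level by level, index i lands at the same generation there as in this array layout, so both versions should agree; for the members1 sequence (members 8 and 30 diagnosed) this gives about 5.47%.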


IMPLEMENTATION
#include <iostream>
#include <cstdlib>
using namespace std;

// One node of the family tree, i.e. one family member.
// key     - label that orders the members as a binary search tree
// disease - 1 if the member is/was diagnosed with the disease, 0 otherwise
struct Family_tree
{
    struct Family_tree *left, *right;
    int key, disease;
};

// Creating a new node for a family member
struct Family_tree* newMember(int key, int disease)
{
    struct Family_tree* ptr = new Family_tree;
    ptr->key = key;
    ptr->disease = disease;
    ptr->left = ptr->right = nullptr;
    return ptr;
}

// Inserting the family member into the family tree
// (standard BST insertion; duplicate keys are ignored)
struct Family_tree* insert(struct Family_tree* root, int key, int disease)
{
    if (!root)
        root = newMember(key, disease);
    else if (root->key > key)
        root->left = insert(root->left, key, disease);
    else if (root->key < key)
        root->right = insert(root->right, key, disease);
    return root;
}

// Calculating the distance (number of edges) of the node with key x from the root.
// Assumes that x is present in the tree.
int distance(struct Family_tree* root, int x)
{
    if (root->key == x)
        return 0;
    else if (root->key > x)
        return distance(root->left, x) + 1;
    else
        return distance(root->right, x) + 1;
}
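// Releasing the memory held by the family tree
void destroy(struct Family_tree* root)
{
    if (!root)
        return;
    destroy(root->left);
    destroy(root->right);
    delete root;
}

// Builds the family tree for one disease sequence and prints the chance of occurrence.
// members[i] is the disease attribute (0/1) of the member with key keys[i];
// index 0 is the person under consideration (the root).
void implement(int members[31], int keys[31])
{
    struct Family_tree* root = nullptr;

    cout << "The sequence: " << endl;
    for (int i = 0; i < 31; i++)
    {
        cout << members[i] << " ";
    }
    cout << endl;

    for (int i = 0; i < 31; i++)
    {
        root = insert(root, keys[i], members[i]);
    }

    // Sum of 1/distance over all ancestors and over diagnosed ancestors only.
    // The loop starts at 1 so that the root (distance 0) is skipped.
    double total_distance = 0, total_impact_distance = 0;
    for (int i = 1; i < 31; i++)
    {
        double dist = 1 / (double)distance(root, keys[i]);
        total_distance += dist;
        if (members[i] == 1)
        {
            total_impact_distance += dist;
        }
    }
    double chance = (total_impact_distance / total_distance) * 100;
    cout << "The chances of occurrence of the disease are: " << chance << "%" << endl;

    destroy(root);
}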

int main()
{
    // Keys inserted level by level, so the BST becomes a complete binary tree of
    // 5 generations: index 0 is the person, 1-2 the parents, 3-6 the grandparents, etc.
    int keys[31] = {16, 8, 24, 4, 12, 20, 28, 2, 6, 10, 14, 18, 22, 26, 30,
                    1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31};

    // 10 random disease sequences; the person (index 0) is always undiagnosed
    int members[10][31];
    for (int i = 0; i < 10; i++)
    {
        members[i][0] = 0;
        for (int j = 1; j < 31; j++)
        {
            members[i][j] = rand() % 2;
        }
        implement(members[i], keys);
    }
    /*int members1[31] = {0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1};
    implement(members1,keys);*/
    return 0;
}


RESULTS AND DISCUSSION

Ten random sequences were generated using the rand() function. The chances of
getting the genetic disorder were calculated using the firefly-based method
discussed before. The following results were obtained for the 10 randomly
generated sequences.

The percentage chances of occurrence of the disease are determined based on
the family tree.


CONCLUSION
In this project, we have discussed the prediction of a genetic disorder for a
person based on his/her family tree. We have used the concept of link prediction
in social and information networks, together with the firefly algorithm, to
predict the chances of occurrence of the disease. The scope of the project can be
extended by considering environmental factors such as pollution, radioactive
exposure, the cleanliness of the person, and exposure to harmful gases or
chemicals, as well as other health factors.
