
A

TECHNICAL SEMINAR
PRESENTATION
On
“MACHINE LEARNING IN GENOMIC
MEDICINE”
Table of contents
•What is machine learning?
•What is genomic medicine?
•Working of ML
•Categories of ML
•Heterogeneous data
•Feature selection
•Missing data
•Applications
•Conclusion
•Future scope
What is Machine Learning (ML)?

ML is an application of artificial
intelligence (AI) that enables a
system to learn and improve
automatically from experience without
being explicitly programmed.
What is Genomic Medicine?
Genomic medicine, sometimes also
known as personalized medicine, is a
way to customize medical care to
your body's unique genetic makeup.
Nearly every cell in the body contains
DNA, the molecule you inherit from
your parents that determines how your
body looks and functions.
Working of ML
Consider training an algorithm to recognize
transcription start sites (TSSs). The process
proceeds in three stages:
1. An algorithm is developed.
2. The algorithm is given a large collection of TSS
and non-TSS sequences. The annotation indicating
which is which is known as the label. The
algorithm processes these labeled sequences
and stores a model.
3. Unlabeled sequences are given to the
algorithm, and it uses the model to predict
a label (“TSS” or “not TSS”) for each sequence.
Fig. 3 -Machine Learning
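The three stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sequences are invented toy data, the features are dinucleotide frequencies, and the nearest-centroid classifier is a stand-in chosen for brevity, not the method used in any cited work. The names `featurize`, `train` and `predict` are hypothetical.

```python
from collections import Counter
from itertools import product

K = 2  # assumed k-mer length for this toy example
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def featurize(seq):
    """Turn a DNA string into a vector of k-mer frequencies."""
    windows = len(seq) - K + 1
    counts = Counter(seq[i:i + K] for i in range(windows))
    return [counts[kmer] / max(1, windows) for kmer in KMERS]

def train(labeled_sequences):
    """Stage 2: process labeled sequences and store a model (one centroid per label)."""
    sums, sizes = {}, Counter()
    for seq, label in labeled_sequences:
        acc = sums.setdefault(label, [0.0] * len(KMERS))
        for i, value in enumerate(featurize(seq)):
            acc[i] += value
        sizes[label] += 1
    return {label: [v / sizes[label] for v in acc] for label, acc in sums.items()}

def predict(model, seq):
    """Stage 3: assign the label whose centroid is closest to the sequence."""
    vec = featurize(seq)
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))

# Hypothetical labeled training set (toy sequences, not real annotations).
training = [
    ("TATAAAAGGCTATAAA", "TSS"),
    ("GGTATATAAGCTATAT", "TSS"),
    ("GCGCGGCCGCGGGCCG", "not TSS"),
    ("CCGGCGCGCCGGGCGC", "not TSS"),
]
model = train(training)

# Unlabeled sequences whose labels are predicted from the stored model.
unlabeled = ["ATATAAATATAAGG", "GCCGCGGGCGCC"]
predictions = [predict(model, seq) for seq in unlabeled]
print(predictions)
```

A real TSS predictor would use far more training data and richer features; the point here is only the split between a training step that stores a model and a prediction step that applies it.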
Categories of Machine Learning
•Supervised learning: The algorithm learns
from examples with known labels, so it is
applicable only when a labeled training set is
available.
•Unsupervised learning: The algorithm takes
as input only the unlabeled data and the desired
number of different labels to assign. It
then automatically partitions the
genome into segments and assigns a label to
each segment, with the goal of assigning the
same label to segments that have similar data.
•Semi-supervised learning: A mixture of
the above two. The algorithm receives a collection
of data points, but only a subset of those points
have associated labels. In gene finding, for example,
the learning procedure begins by constructing an
initial gene-finding model based solely on the
labeled subset of the training data. The model is
then used to scan the genome, and tentative labels
are assigned throughout the genome. These tentative
labels can then be used to improve the learned model,
and the procedure iterates until no new genes
are found.
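The unsupervised setting can be illustrated with a small clustering sketch. It assumes each genome segment has already been summarized by a single signal value (the numbers below are invented), and it uses one-dimensional k-means as a simple stand-in for the segmentation methods used in practice. The function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, k, iterations=20):
    """Cluster 1-D values into k groups; return (labels, centers)."""
    # Deterministic start: pick initial centers spread across the sorted values.
    ordered = sorted(values)
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value takes the label of its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers

# Hypothetical per-segment signal values (for example, read coverage).
signal = [0.1, 0.2, 0.1, 5.0, 5.2, 4.9, 0.3, 0.2, 9.8, 10.1]

# Only the data and the desired number of labels are supplied.
labels, centers = kmeans_1d(signal, k=3)
print(labels)
```

Segments with similar signal receive the same label even though no labeled examples were provided; what each label means biologically still has to be interpreted afterwards.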
Generative & Discriminative Models

• Generative models build a full model of the
distribution of features in each of the two
classes and then compare how those two
distributions differ from one another.
• Discriminative models focus on accurately
modeling just the boundary between the two
classes.
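In probabilistic terms, the distinction is commonly written as follows. This is a standard textbook formulation rather than something stated on the slide; the symbols $x$ (feature vector) and $y$ (class label) are notation introduced here.

```latex
% Generative: model the feature distribution of each class, then classify via Bayes' rule
\hat{y} = \arg\max_{y} \; p(x \mid y)\, p(y)

% Discriminative: model only what separates the classes, i.e. the conditional distribution
\hat{y} = \arg\max_{y} \; p(y \mid x)
```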
Heterogeneous data
Consider, for example, the task of assigning
Gene Ontology terms to genes. For a given term,
a wide variety of data types might be relevant,
including the amino acid sequence of the gene's
protein product. Such data sets are difficult to
analyze jointly because of their heterogeneity.
Figure 2: Three ways to accommodate heterogeneous data in machine
learning
Methods to handle heterogeneous data
1. The most straightforward way to solve this problem
is to transform each type of data into vector format
prior to processing (Figure A).
2. Alternatively, each type of data can be encoded
using a kernel function (Figure B), with one kernel
for each data type.
3. Finally, a probability model can explicitly represent
the diverse data types in the model itself (Figure C).
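The kernel approach (method 2) can be sketched as follows. The example assumes two data types per gene, a protein sequence and an expression vector, and combines one kernel per type by simple addition. The gene records, the choice of a 2-mer spectrum kernel, and the unweighted sum are all illustrative assumptions.

```python
from collections import Counter

def spectrum_kernel(seq_a, seq_b, k=2):
    """Sequence kernel: inner product of k-mer count vectors."""
    counts_a = Counter(seq_a[i:i + k] for i in range(len(seq_a) - k + 1))
    counts_b = Counter(seq_b[i:i + k] for i in range(len(seq_b) - k + 1))
    return float(sum(counts_a[kmer] * counts_b[kmer] for kmer in counts_a))

def linear_kernel(x, y):
    """Vector kernel: plain dot product."""
    return float(sum(a * b for a, b in zip(x, y)))

def combined_kernel(gene_a, gene_b):
    """One kernel per data type, combined here by an unweighted sum."""
    return (spectrum_kernel(gene_a["protein"], gene_b["protein"])
            + linear_kernel(gene_a["expression"], gene_b["expression"]))

# Hypothetical genes: invented sequences and expression values.
genes = [
    {"protein": "MKTAYIAK", "expression": [1.0, 0.5]},
    {"protein": "MKTAYLAK", "expression": [0.9, 0.4]},
    {"protein": "GGSGGSGG", "expression": [0.0, 2.0]},
]

# Gram matrix that a kernel method (for example an SVM) could consume.
gram = [[combined_kernel(a, b) for b in genes] for a in genes]
for row in gram:
    print(row)
```

Because a sum of valid kernels is itself a valid kernel, each data type keeps its natural representation and no single vector encoding has to be forced on all of them.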
Feature selection
In feature selection, a supervised learning
algorithm is given a large set of features and
automatically decides to ignore some of them,
focusing on the subset of features most
relevant to the task at hand.
Possible goals of feature selection
• To identify a very small set of features that yields
the best possible classifier.
• To identify all and only the genes whose
expression is actually relevant to the task at
hand.
• To train the most accurate possible classifier.
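A very simple filter-style selection can be sketched as below. It assumes a binary classification task, scores each gene by the absolute difference between its mean expression in the two classes, and keeps the top-scoring genes. The expression values are invented, and this score is only one of many possible criteria; `select_features` is a hypothetical helper.

```python
def select_features(rows, labels, n_keep):
    """Return indices of the n_keep features that best separate the two classes."""
    scores = []
    for j in range(len(rows[0])):
        positives = [row[j] for row, y in zip(rows, labels) if y == 1]
        negatives = [row[j] for row, y in zip(rows, labels) if y == 0]
        # Score: absolute difference between the class means of feature j.
        score = abs(sum(positives) / len(positives) - sum(negatives) / len(negatives))
        scores.append((score, j))
    top = sorted(scores, reverse=True)[:n_keep]
    return sorted(j for _, j in top)

# Hypothetical expression matrix: 4 samples (rows) x 4 genes (columns).
expression = [
    [5.0, 1.0, 0.2, 3.0],
    [4.8, 1.2, 0.1, 2.9],
    [1.0, 1.1, 2.0, 3.1],
    [1.2, 0.9, 2.2, 3.0],
]
labels = [1, 1, 0, 0]  # e.g. 1 = disease, 0 = healthy (assumed encoding)

selected = select_features(expression, labels, n_keep=2)
print(selected)
```

Which goal is being pursued matters: a small `n_keep` targets a compact classifier, whereas finding all relevant genes would call for a threshold on the score instead of a fixed count.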
MISSING DATA
Missing values can come from a variety of
sources, such as defective cells or unmapped
genome positions. Missing data values can be
divided into two types:
• (1) values that are missing at random or for
reasons unrelated to the task at hand, such as
defective microarray cells
• (2) values whose absence provides information
about the task at hand, such as saturated
detectors.
Handling missing data
• Impute the missing values; the simplest approach
replaces every missing value with zero.
• Include in the model information about the
missingness of each data point, that is, whether
the value was observed or not.
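Both strategies can be combined in one small sketch: missing entries are filled with zero, and an indicator feature is appended for each position so that a model can still see which values were absent. Representing missing values as `None` and the helper name `impute_with_indicators` are assumptions made for this example.

```python
def impute_with_indicators(row, fill=0.0):
    """Replace missing entries (None) with `fill` and append missingness flags."""
    values = [fill if v is None else v for v in row]
    # 1.0 marks a value that was missing, 0.0 a value that was observed.
    indicators = [1.0 if v is None else 0.0 for v in row]
    return values + indicators

# Hypothetical measurements for one sample; None marks a missing value.
sample = [1.5, None, 0.3]
features = impute_with_indicators(sample)
print(features)  # [1.5, 0.0, 0.3, 0.0, 1.0, 0.0]
```

The indicator columns matter mostly for the second type of missing value, where the absence itself carries information about the task.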
Applications of ML in Genomics

•Gene editing: Gene editing is a
method of making specific alterations to DNA
at the cellular or organism level.

•Direct-to-consumer genomics: Companies
are using machine learning to achieve greater
depth in the interpretation of genetic
information, such as how an individual’s genes
may affect their weight.

•Clinical workflow: There are often gaps in the
patient data available to the different members
of a healthcare team serving a patient. Machine
learning can be used to improve the efficiency
of the clinical workflow.

•Genome sequencing: ML can be used to
identify patterns within high-volume genetic
data sets, which may help predict an
individual’s probability of developing certain
diseases or help inform the design of potential
therapies.
Conclusion
•The field of machine learning is concerned with the development
and application of computer algorithms that improve with
experience.
•Machine learning methods can be divided into supervised,
semi-supervised and unsupervised methods.
•Prior information can be added to a model in order to train the
model more effectively given limited data, limit the complexity of
the model, or incorporate data not used by the model directly.
•Choosing an appropriate performance measure depends strongly
upon the application task. Machine learning methods are most
effective when they optimize an appropriate performance measure.
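The point about performance measures can be made concrete with a small sketch. It assumes a heavily imbalanced task (5 positives among 100 examples, invented for illustration) and a trivial classifier that always predicts the negative class. The function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Defined as 0.0 here when the denominator is zero (an assumed convention).
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical imbalanced data: 5 true positives among 100 examples.
y_true = [1] * 5 + [0] * 95
always_negative = [0] * 100  # classifier that never predicts the positive class

scores = evaluate(y_true, always_negative)
print(scores)
```

Accuracy looks high here even though no positive example is found, which is why a measure such as recall or precision may suit rare-event tasks better.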
Future scope
• Newborn genetic screening tools: Data
collected at birth can help in the early
detection of disease.
• Agriculture: Genomics can help improve soil
quality and crop yield.
• Pharmacogenomics: Studies how an
individual's genes affect their response to drugs.
THANK YOU
