
Prof. Silvio Tosatto
Room 90, 5th floor east, Vallisneri
Email: silvio.tosatto@unipd.it
Tel: 049-827-6269

Structural Bioinformatics
A.Y. 2017/2018

BioComputing UP,
Dipartimento di Scienze Biomediche,
Università di Padova
URL: http://protein.bio.unipd.it/
Bioinformatics
What is it?

[Figure: Bioinformatics shown between Computer Science and Molecular Biology]

1. Biological information can be studied with approaches typical of information theory (aka computer science).

2. Understanding biology requires computer science.

3. Computer science as a tool for biological data management.


The biological data deluge is upon us…
PROTEINS
>IGF1R_HUMAN
MKSGSGGGSPTSLWGLLFLSAALSLWPTSGEICGPGIDIRNDYQQLKRLENCTVIEGYLH
ILLISKAEDYRSYRFPKLTVITEYLLLFRVAGLESLGDLFPNLTVIRGWKLFYNYALVIF
EMTNLKDIGLYNLRNITRGAIRIEKNADLCYLSTVDWSLILDAVSNNYIVGNKPPKECGD
LCPGTMEEKPMCEKTTINNEYNYRCWTTNRCQKMCPSTCGKRACTENNECCHPECLGSCS
APDNDTACVACRHYYYAGVCVPACPPNTYRFEGWRCVDRDFCANILSAESSDSEGFVIHD
GECMQECPSGFIRNGSQSMYCIPCEGPCPKVCEEEKKTKTIDSVTSAQMLQGCTIFKGNL
LINIRRGNNIASELENFMGLIEVVTGYVKIRHSHALVSLSFLKNLRLILGEEQLEGNYSF
YVLDNQNLQQLWDWDHRNLTIKAGKMYFAFNPKLCVSEIYRMEEVTGTKGRQSKGDINTR
NNGERASCESDVLHFTSTTTSKNRIIITWHRYRPPDYRDLISFTVYYKEAPFKNVTEYDG
QDACGSNSWNMVDVDLPPNKDVEPGILLHGLKPWTQYAVYVKAVTLTMVENDHIRGAKSE
ILYIRTNASVPSIPLDVLSASNSSSQLIVKWNPPSLPNGNLSYYIVRWQRQPQDGYLYRH
NYCSKDKIPIRKYADGTIDIEEVTENPKTEVCGGEKGPCCACPKTEAEKQAEKEEAEYRK
VFENFLHNSIFVPRPERKRRDVMQVANTTMSSRSRNTTAADTYNITDPEELETEYPFFES
RVDNKERTVISNLRPFTLYRIDIHSCNHEAEKLGCSASNFVFARTMPAEGADDIPGPVTW
EPRPENSIFLKWPEPENPNGLILMYEIKYGSQVEDQRECVSRQEYRKYGGAKLNRLNPGN
YTARIQATSLSGNGSWTDPVFFYVQAKTGYENFIHLIIALPVAVLLIVGGLVIMLYVFHR
KRNNSRLGNGVLYASVNPEYFSAADVYVPDEWEVAREKITMSRELGQGSFGMVYEGVAKG
VVKDEPETRVAIKTVNEAASMRERIEFLNEASVMKEFNCHHVVRLLGVVSQGQPTLVIME
LMTRGDLKSYLRSLRPEMENNPVLAPPSLSKMIQMAGEIADGMAYLNANKFVHRDLAARN
CMVAEDFTVKIGDFGMTRDIYETDYYRKGGKGLLPVRWMSPESLKDGVFTTYSDVWSFGV
VLWEIATLAEQPYQGLSNEQVLRFVMEGGLLDKPDNCPDMLFELMRMCWQYNPKMRPSFL
EIISSIKEEMEPGFREVSFYYSEENKLPEPEELDLEPENMESVPLDPSASSSSLPLPDRH
SGHKAENGPGPGVLVLRASFDERQPYAHMNGGRKNERALPLPQSSTC
Example: Insulin-like growth factor 1 receptor (IGF1R)
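The record above is in FASTA format: a header line starting with ">" followed by one or more lines of sequence. As a minimal sketch (not part of the original slides), the format can be read with plain Python as below. The function name `read_fasta` is hypothetical, and the example uses only the first two sequence lines of the record for brevity. In practice, BioPython's `Bio.SeqIO` module can be used for the same purpose.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {identifier: sequence} dict."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # The identifier is the first word after '>'.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            records[name] = ""
        elif name is not None:
            # Sequence lines are concatenated without line breaks.
            records[name] += line
    return records


# Truncated copy of the record shown on the slide (first two lines only).
example = """>IGF1R_HUMAN
MKSGSGGGSPTSLWGLLFLSAALSLWPTSGEICGPGIDIRNDYQQLKRLENCTVIEGYLH
ILLISKAEDYRSYRFPKLTVITEYLLLFRVAGLESLGDLFPNLTVIRGWKLFYNYALVIF
"""

records = read_fasta(example.splitlines())
for identifier, sequence in records.items():
    print(identifier, len(sequence), sequence[:10])
```

A dictionary keyed by identifier keeps the sketch short; it assumes identifiers are unique within the file.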
IGF1R structure
Sequence features: Signal peptide, Transmembrane, Disorder
Receptor L domain: Receptor ligand binding
Fibronectin 3 domain: Cell adhesion
Protein tyrosine kinase domain: Signal transduction
Future: Systems biology
Proteins

Sequence → Structure → Function (?!)

MERPEPELIRQSWRAVSRSPLEHGTV
LFARLFALEPDLLPLFQYNCRQFSSP
EDCLSSPEFLDHIRKVMLVIDAAVTN
VEDLSSLEEYLASLGRKHRAVGVKLS
SFSTVGESLLYMLEKCLGPAFTPATR
AAWSQLYGAVVQAMSRGWDGE

[Figure labels: ligand, active site]


In silico methods

Sequence → Structure → Function (?!)

• Comparative Modelling & Docking
• Fold Recognition
• Database
• 3D characteristics (from structure)
• Alignments
• 1D predictions (sequence based)
Bioinformatics to understand disease-associated aspects of
protein structure and function

Interpretation of disease-associated sequence variants (e.g. p.C152F)

Design and interpretation of experimental studies

>protein
MPRRAENWDEAEVGAEEAGVE
EYGPEEDGGEESGAEESGPEE
SGPEELGAEEEMEAGRPRVLR
SVNSREPSQVIFCNRSPRVVLP
VWLNFDGEPQPYPTLPPGTGR
RIHSYRGHLWLFRAGTHDGLLV
NQTELFVPSLNVDGQPIFANITL
PVYTLKERCLQVVRSLVKPENY
RRLDIRSLYEDLEDHPNVQKDL
ERLTQERIAHQRMGD
Bioinformatics to understand disease-associated aspects of
protein structure and function
• Alignment and identification of homologous sequences
• Exploration of protein interaction network
• Characterization of the primary protein architecture
• Prediction of secondary and tertiary structure
• Analysis of binding sites for proteins and ligands
Course structure
Topics

• Sequence alignments, searches & databases


• Structure analysis & prediction
• Functional analysis & databases
• Protein interactions

BioPython

http://biopython.org/
Useful information
All the course material is available on the E-learning site:
http://elearning.unipd.it/dsb/
Register for the “Bioinformatics & Computational Biology” course of L.Bio.Mol.
• Slides (previous year).
• Lecture notes (“dispense”).

Most of the software used during exercises is available from:
http://protein.bio.unipd.it/

Books (not mandatory):
“Bioinformatica” (A. Tramontano, Zanichelli)
“Introduction to Bioinformatics” (A. Lesk, Oxford University Press)


Practicals

Assistant: Dr. Damiano Piovesan (damianopiovesan@gmail.com)

Practical sessions will take place once per week, from 14.30 to 17.30.
• 4 practicals: sequence, structure (x 2), non-globular proteins
• Online resources
• BioPython examples

• Final exam: coding project

Office hours: by appointment, or Tuesday afternoon, from 15.30 to 17.00.
Evolution

“The time will come, I believe, though I shall not live to see it, when we shall
have fairly true genealogical trees of each great kingdom of Nature.”
Charles Darwin
Evolution

Molecular evolution:
What does it mean???

Evolution
Molecular Evolution is the study of changes occurring in DNA and in its products.

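[Figure: tree of toy sequences diverging from the ancestor AAAAAAAAA by successive changes. Sequences shown: ACAAAAAAAA, AAAAAADAA, ACAAAAARAA, ADAAAADAA, AAAACADAA, ACAAATAAAA, AEAAAADAA, ACAAQTAAAA, AEASAADAA, ACAAATAAAW, AEAAAADAW]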
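One simple way to quantify the changes between the toy sequences above is to count the positions at which two sequences of equal length differ (the Hamming distance). The sketch below is an illustration added here, not a method from the slides. It uses only the nine-letter sequences from the figure and makes no assumption about the branching order of the tree. The function name `hamming` is hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


ancestor = "AAAAAAAAA"
# Nine-letter sequences taken from the slide; the ten-letter ones would
# need an alignment first, because they contain an extra residue.
descendants = ["AAAAAADAA", "ADAAAADAA", "AEAAAADAA", "AEASAADAA", "AEAAAADAW"]

distances = {seq: hamming(ancestor, seq) for seq in descendants}
for seq, dist in distances.items():
    print(seq, dist)
```

The count grows as substitutions accumulate along a lineage. Sequences of different length cannot be compared this way, which is one motivation for sequence alignment.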
Alternative methods
Example: Alternative viewpoints on proteins

• Evolutionary model
– Descriptive
– “Knowledge-based”

• Physical model
– Predictive
– Optimization (“Ab initio”)
Alternative methods
• Many of the methods we will cover (e.g. patterns and neural networks) are knowledge-based: they require background knowledge of the field of study, in the form of training sets on which predictions are built.
– They use previous knowledge to interpret new cases; no new knowledge is generated.
– They cannot predict situations that differ from those already observed.

• As an alternative (e.g. alignments), predictions can be built from simple observations and sets of rules (to be defined). These methods are called optimization methods and can generate new, not yet observed solutions.
– They create hypotheses that explain a behaviour, e.g. of Nature.
– In bioinformatics they are often called de novo or ab initio.
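To make the optimization idea concrete, here is a minimal sketch of global pairwise alignment scoring by dynamic programming (the Needleman-Wunsch recurrence). This particular algorithm and the scoring rules used here (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the slides do not specify them. The function name `nw_score` is hypothetical. Given only the rules, the procedure searches all possible alignments and returns the best score, without any training set.

```python
def nw_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Return the optimal global alignment score of a and b.

    Illustrative scoring rules (assumed, not from the slides):
    +1 for a match, -1 for a mismatch, -1 for each gap position.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] with an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


# Toy sequences in the style of the Evolution slides.
print(nw_score("AAAAAAAAA", "AAAAAADAA"))   # one substitution
print(nw_score("AAAAAAAAA", "ACAAAAAAAA"))  # one inserted residue
```

Only two rows of the score matrix are kept because this sketch returns the score alone; recovering the alignment itself would require storing the full matrix and tracing back through it.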