Nives Skunca
Slides prepared by Dr. Christophe Dessimoz
19/21 September 2012
This week
Course introduction
Basic Biology
[Diagram: the modelling cycle. Labels: Reality/Nature; observation; perturbation; Catalogue; Model f(x); formulate/select; estimate; validate on real data; validate by simulation; prediction; take it apart (in vitro); recreate life (synthetic biology).]
Learning Outcomes
Topics
Molecular Genetics
Gene Evolution
Genome Evolution
Mass Spectrometry
Codon Bias
Modeling
Dynamic programming
Markov models
Least squares
Maximum Likelihood
Optimization
Heuristics
Simulation
Organization
Lecture
Exercises:
Teaching Assistants
Stefan Zoller
Nives Skunca
Date | Topic | Lecturer
Sept. 19/21 | Course Introduction; Basic Molecular Biology | NS
Sept. 26/28 | Markov models/String Alignment I | GHG
Oct. 3/5 | String Alignment II (indels, estimating distances) | GHG
Oct. 10/12 | Substitution Matrices | GHG
Oct. 17/19 | Approximate Alignment Methods; Statistics of Pairwise Alignments | GHG
Oct. 24/26 | Phylogeny I | GHG
Oct. 31/Nov. 2 | Phylogeny II | GHG
Nov. 7/9 | Phylogeny III | GHG
Nov. 14/16 | Multiple Sequence Alignments | AS
Nov. 21/23 | Synthetic Evolution; Evaluation of Estimators | DD/GHG
Nov. 28/30 | Current research; Mass profiling | Guests/GHG
Dec. 5/7 | | NS
Dec. 12/14 | | SZ
Dec. 19/21 | | GHG
Written Exam
Course Homepage
http://www.cbrg.ethz.ch/education/CompBiol
Course details
Schedule
Slides
Exercises
Darwin
Biorecipes
www.biorecipes.com
A collection of real
problems with coded
solutions in the
Darwin language
Other materials
Basic Biology
Slides of this part are largely
based on material from
Dr. Gina Cannarozzi
Basic Principles
10 µm
Cryptomonadales
Encyclopedia of Life
(eol.org)
So what is life?
Living organisms undergo metabolism,
maintain homeostasis, possess a capacity
to grow, respond to stimuli, reproduce
and, through natural selection, adapt to
their environment in successive
generations.
Inside a Cell
Prokaryote
http://www.osovo.com/diagram/prokaryoticcelldiagram.htm
~2 µm
Eukaryote
http://www.biologycorner.com/resources/cell.gif
10-30 µm
Relevant components
Genome
chromosome
chromatin
histone
Genes consist of
regulatory regions,
introns, exons, and
untranslated regions
http://www.scfbio-iitd.res.in/tutorial/geneorganization.html
Escherichia coli
1 circular chromosome
1 plasmid (multiple copies)
~4.6 million base pairs
~3.9 million coding bases (85%)
4132 protein-coding genes
172 RNA genes (tRNA, rRNA, etc.)
578 pseudogenes
Homo sapiens
23 chromosome pairs
DNA
Deoxyribonucleic acid
Double helix
Backbones: phosphate and deoxyribose, directed (5' → 3'), antiparallel
Helical pitch: 34 Å (3.4 nm)
Rise per base pair: 3.3 Å (0.33 nm)
Wikipedia
DNA Bases
PuRines
PYrimidines
C-G: 3 H-bonds
A-T: 2 H-bonds
Wikipedia
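The pairing rules and the antiparallel orientation of the two strands can be illustrated with a short Python sketch. The function names `reverse_complement` and `hydrogen_bonds` are chosen here for illustration and do not come from the slides; the sketch assumes an unambiguous DNA string over A, C, G, T.

```python
# Watson-Crick pairing as listed on the slide: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction.

    The strands are antiparallel, so the partner is the complement
    of the input read backwards.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def hydrogen_bonds(strand: str) -> int:
    """Count the H-bonds in the duplex: 3 per C-G pair, 2 per A-T pair."""
    return sum(3 if base in "CG" else 2 for base in strand.upper())


if __name__ == "__main__":
    print(reverse_complement("ATGC"))  # GCAT
    print(hydrogen_bonds("ATGC"))      # 2 + 2 + 3 + 3 = 10
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.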
Hydrogen Bond
Central dogma of
molecular biology
Wikipedia
DNA Replication
Wikipedia
Movie time!
Replication visualized:
http://www.wehi.edu.au/education/wehitv/molecular_visualisations_of_dna/
End of day 1
RNA
http://www.pdb.org/pdb/static.do?p=education_discussion/molecule_of_the_month/pdb15_2.html
Transcription
Roger Kornberg
Nobel Prize Chemistry 2006
Post-transcriptional
modifications (Eukaryotes)
5' cap
Poly-A tail
Splicing (removal of introns)
Alternative Splicing
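Splicing and alternative splicing can be sketched as simple string operations. The toy pre-mRNA, the exon coordinates, and the function name `splice` below are made up for illustration and do not describe a real gene.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the selected exons (0-based, end-exclusive coordinates).

    Everything outside the listed exons, i.e. the introns, is dropped.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: exon 1, intron 1, exon 2, intron 2, exon 3.
PRE_MRNA = "AUGGCC" + "GUAAGUUUCAG" + "AAAGGG" + "GUCCCAG" + "UUUUAA"
EXON_1, EXON_2, EXON_3 = (0, 6), (17, 23), (30, 36)

if __name__ == "__main__":
    # Constitutive splicing keeps all three exons.
    print(splice(PRE_MRNA, [EXON_1, EXON_2, EXON_3]))  # AUGGCCAAAGGGUUUUAA
    # Alternative splicing: exon 2 is skipped, giving a second mRNA.
    print(splice(PRE_MRNA, [EXON_1, EXON_3]))          # AUGGCCUUUUAA
```

The same pre-mRNA yields two different mature transcripts depending on which exons are retained, which is the point of alternative splicing.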
Translation
Wikimedia Commons
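Transcription and translation can be sketched in a few lines of Python. The sketch assumes the standard genetic code and an input that is the coding strand written 5'->3'; the names `transcribe` and `translate` are illustrative and not taken from the course material.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the order T, C, A, G
# at each position (the third position varies fastest). '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding strand: same sequence with U instead of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base, stopping at the first stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTTTTAA")
    print(mrna)             # AUGGCCUUUUAA
    print(translate(mrna))  # MAF
```

The sketch ignores start-codon search, reading frames, and organism-specific code tables.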
Proteins
Encoded in DNA
Functions of Proteins
...
Amino
Acids
Only sidechains
differ (red)
Sidechains have
diverse chemical
properties
(charge, size, pH,
hydrophobicity, ...)
Wikimedia Commons
Peptide Bond
G. Cannarozzi
Proteins
have a 3D
structure
Wikimedia Commons
Biological sequences
How are they identified?
Where are they stored?
Unidentified protein
extracted from gel
Proteomics
MDISTLTASEEIE
MEIDAEEIEIMAT
IDLAEDLISLFM
DDMFSSIDLESI
NFEIFNSSDIDSI
NIDLESIEEIEIMF
EEIEIMATIFNSS
DIDIMMDIMMD
SINFEIFNSSDIDI
MMDATIDLAED
LISLFMDDMFSS
IDLESINFEIFNSS
. . . AEDLISLFMDDM . . .
Determine mass
using MS (Mass
Spectrometry)
Determine amino
acid sequence and
compare with sequence database
Sequence
Database
Jiang Long, Science Creative
Quarterly Image Bank
Protein Identified
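The database comparison step can be illustrated with a toy lookup. This is only a sketch: the two database entries are assembled from sequence fragments shown on the slide, the entry names are hypothetical, and exact substring matching stands in for the scoring that real identification tools use.

```python
def find_matches(peptide: str, database: dict[str, str]) -> list[str]:
    """Return the names of all database entries that contain the peptide."""
    return [name for name, sequence in database.items() if peptide in sequence]


# Toy database (hypothetical entry names, sequences pieced together
# from the fragments on the slide).
DATABASE = {
    "toy_protein_1": "MMDATIDLAEDLISLFMDDMFSSIDLESINFEIFNSS",
    "toy_protein_2": "MDISTLTASEEIEMEIDAEEIEIMAT",
}

if __name__ == "__main__":
    # Peptide sequence determined from the unidentified protein.
    print(find_matches("AEDLISLFMDDM", DATABASE))  # ['toy_protein_1']
```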
[Plot: x-axis Year, 2000 to 2012; y-axis ticks 0 to 2.0]
Getting Sequences
Ensembl
...
e.g. GenBank
File
Evolution
Darwinian Evolution
Population bottleneck
Founder effect
Species Evolution
Diane Dodd's fruit fly experiment
Speciation: the
evolutionary process by
which new species arise
Genome Rearrangements
e.g. Human vs. Dog
Krzywinski et al. Circos: an information aesthetic for comparative genomics. Genome Research (2009) vol. 19 (9) pp. 1639-45
Example: recombination
among E. coli strains
Gene Evolution
Point mutations
Purines
Pyrimidines
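Point mutations are commonly classified by whether they stay within a base class. In standard terminology (not spelled out on the slide), a purine-to-purine or pyrimidine-to-pyrimidine change is a transition, and a change between the two classes is a transversion. A minimal sketch, with illustrative function names:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(old: str, new: str) -> str:
    """Classify a single-base change as 'transition' or 'transversion'."""
    if old == new:
        raise ValueError("not a substitution: both bases are identical")
    pair = {old, new}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def count_substitutions(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count (transitions, transversions) between two aligned, gap-free sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        if classify_substitution(a, b) == "transition":
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


if __name__ == "__main__":
    print(classify_substitution("A", "G"))        # transition
    print(classify_substitution("A", "C"))        # transversion
    print(count_substitutions("AACGT", "GACTA"))  # (1, 2)
```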
Insertion/deletion
Lateral Gene
Transfer
Wikipedia
http://www.scq.ubc.ca/attack-of-the-superbugs-antibiotic-resistance/
Recombination
Gene Evolution
Evolutionary Distances
How can we quantify the amount of evolution
between two subjects?
Desirable properties
distance estimable without knowing history
metric properties (e.g. triangle inequality)
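As one simple illustration of these properties, the fraction of differing sites between two aligned sequences (often called the p-distance) can be computed from the sequences alone and obeys the triangle inequality. The sketch below is an assumption-laden example, not the distance estimator used in the course; the function name `p_distance` and the three toy sequences are made up.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned positions at which the two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


if __name__ == "__main__":
    x, y, z = "ACGTACGT", "ACGTACGA", "ACCTACGA"
    d_xy, d_yz, d_xz = p_distance(x, y), p_distance(y, z), p_distance(x, z)
    print(d_xy, d_yz, d_xz)      # 0.125 0.125 0.25
    print(d_xz <= d_xy + d_yz)   # triangle inequality holds: True
```

The p-distance needs no knowledge of the evolutionary history, but it counts only observed differences, so repeated mutations at the same site go unnoticed.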
Markovian Evolution
Markov model: every site evolves independently; the probability of a
mutation depends only on the present state (no memory); the mutation
probabilities are expressed by a transition matrix.
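Rows and columns are in the order A, C, G, T:

$$
M_1 = \begin{pmatrix}
0.900 & 0.033 & 0.033 & 0.033 \\
0.033 & 0.900 & 0.033 & 0.033 \\
0.033 & 0.033 & 0.900 & 0.033 \\
0.033 & 0.033 & 0.033 & 0.900
\end{pmatrix}
$$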
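Because the process has no memory, the probabilities after k steps are given by the k-th power of the transition matrix. The sketch below uses the matrix M1 from the slide (0.900 on the diagonal, 0.033 elsewhere, states ordered A, C, G, T). The helper names are illustrative, and reading rows as the present state and columns as the next state is an assumption (it makes no difference here because M1 is symmetric). The rounded entries make each row sum to 0.999 rather than 1.

```python
STATES = "ACGT"

# One-step transition matrix M1 as printed on the slide (rounded values).
M1 = [[0.900 if i == j else 0.033 for j in range(4)] for i in range(4)]


def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two square matrices of equal size."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m: list[list[float]], steps: int) -> list[list[float]]:
    """Return m raised to a non-negative integer power (steps of the chain)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(steps):
        result = mat_mul(result, m)
    return result


def transition_probability(m: list[list[float]], src: str, dst: str) -> float:
    """Look up the probability of going from base src to base dst."""
    return m[STATES.index(src)][STATES.index(dst)]


if __name__ == "__main__":
    m2 = mat_pow(M1, 2)
    # A -> A in two steps: stay twice, or leave and return via C, G or T.
    # 0.9 * 0.9 + 3 * 0.033 * 0.033 = 0.813267
    print(round(transition_probability(m2, "A", "A"), 6))
```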
http://gi.cebitec.uni-bielefeld.de/people/boecker/bilder/tree_of_life_new.gif
Augustin Augier,
Arbre Botanique
(1801)
rRNA was used by Woese (1987) to group early life forms into
three kingdoms
[Figure: tree of species; leaves are labelled with short species codes (e.g. HUMAN, PANTR, MACMU, CANFA, BOVIN, DANRE, XENTR, CAEEL); most labels are not recoverable from the extraction. Legend:]
Eukaryota
Archaea
Bacteria
Planctomycetes
Fusobacteria
candidate division TG1
Dictyoglomi
Verrucomicrobia
Aquificae
Acidobacteria
Deinococcus-Thermus
Thermotogae
Chloroflexi
Chlamydiae
Chlorobi
Bacteroidetes
Spirochaetes
Tenericutes
Cyanobacteria
Clostridia
Bacilli
Lactobacillales
Actinobacteria
Proteobacteria