
Phylogeny and Systematics

Phylogeny
(Descent with modification: evolutionary tree of the elephant family, based on fossil evidence)

Phylogeny is the evolutionary history of a species or group of related species over geologic time.


A. Fossil record and geologic time
1. Sedimentary rocks are the richest source of fossils.
   a. The fossil record refers to the order in which fossils appear within layers of rock that mark the passing of geologic time.
   b. Organic substances in dead organisms typically decay rapidly. Parts that are rich in minerals (e.g., teeth, bones) may become fossils.

2. Paleontologists use many methods to date fossils.
   a. Relative dating

i. Fossils near the surface are relatively recent, while those that are deeper are relatively older.

ii. Geologists have established a geologic time scale that reflects a consistent sequence of historical periods.
Those periods are grouped into four eras: Precambrian, Paleozoic, Mesozoic, and Cenozoic.

   b. Absolute dating: age is given in years, instead of relative terms (before/after, early/late).
      i. Radiometric dating is the measurement of radioactive isotopes found in fossils and rocks, to determine age.

The half-life of an isotope is the number of years it takes for 50% of the original sample to decay.
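The half-life definition above can be turned into a small calculation. This is a minimal sketch, not part of the original notes: the function name `radiometric_age` is illustrative, and the carbon-14 half-life of 5730 years is a commonly quoted value that is assumed here, not taken from the source.

```python
import math


def radiometric_age(fraction_remaining: float, half_life_years: float) -> float:
    """Estimate age in years from the fraction of the parent isotope still present."""
    if not 0.0 < fraction_remaining <= 1.0:
        raise ValueError("fraction_remaining must be in (0, 1]")
    # Each half-life halves the sample: N / N0 = (1/2) ** (t / half_life),
    # so t = half_life * log2(N0 / N).
    return half_life_years * math.log2(1.0 / fraction_remaining)


C14_HALF_LIFE = 5730.0  # years; assumed value for carbon-14

print(radiometric_age(0.5, C14_HALF_LIFE))   # one half-life has elapsed
print(radiometric_age(0.25, C14_HALF_LIFE))  # two half-lives have elapsed
```

A sample that retains half of its original isotope is one half-life old; a sample that retains a quarter is two half-lives old.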

3. The fossil record is substantial, but does not provide a complete evolutionary history.
   a. The fossil record usually tells us about abundant, widespread organisms with hard shells or skeletons.
4. Phylogeny has a biogeographic basis in continental drift.
   a. Moving continents isolate populations, allowing for evolution to occur.
   b. About 250 million years ago all continents were connected as Pangaea.
   c. Pangaea broke apart about 180 million years ago.

Crustal plate boundaries

San Andreas fault

   b. Permian extinction
      i. 90% of marine species went extinct.
      ii. Pangaea formed and some species began competing with each other for the first time.
      iii. The mass extinction was caused by volcanic eruptions and climate changes.
   c. Cretaceous extinction
      i. Dinosaurs went extinct.
      ii. An asteroid (or comet) hit the Earth and created a cloud of debris that blocked out sunlight for months. Temperatures dropped and plants died.

1997 Core Tiny Creatures Tell a Big Story


65 million years ago the curtain came down on the Age of Dinosaurs when a cataclysmic event led to mass extinctions. This interval of abrupt change in Earth's history, called the K/T Boundary, closed the Cretaceous (K) Period and opened the Tertiary (T) Period. This 40 centimeter slice of seafloor supports the hypothesis that an asteroid collision devastated terrestrial and marine environments. It also shows a record of flourishing marine life before the event, followed by mass extinction and then evolution of new species and slow recovery of surviving life forms after the event.

Foraminifera are single-celled organisms that have inhabited the oceans for more than 500 million years. Both living and fossil foraminifera come in a variety of shapes and sizes and occur in many different marine environments. Their abundance, wide distribution, and sensitivity to environmental variations make them good indicators of past climate change.

Post-impact foraminifera from the Tertiary Period. Only tiny, less ornate foraminifera survived; a few new species evolved.

Tektites (glassy material condensed from the hot vapor cloud produced by the impact) rained down and accumulated in a distinctive layer within the core (SEM image).

Pre-impact foraminifera from the Cretaceous Period. Large, ornate foraminifera flourished.

Examples of phylogenetic trees

Pace (2001) described a tree of life based on small subunit rRNA sequences.
Pace, N. R. (1997) Science 276, 734-740

This tree shows the main three branches described by Woese and colleagues.

Chlamydiae

Fig. 1. Phylogeny of chlamydiae. 16S rRNA-based neighbor-joining tree showing the affiliation of environmental and pathogenic chlamydiae with major bacterial phyla. Arrow, to outgroup. Scale bar, 10% estimated evolutionary distance.

Eukaryotes
(Baldauf et al., 2000)

Evolutionary processes include:

[Figure: from an ancestor to a species genome. Labels: expansion* (duplication, HGT); phylogeny* (genesis); loss; exchange* (HGT); deletion*. Panels: original version, actual version.]

Hurles M (2004) Gene Duplication: The Genomic Trade in Spare Parts. PLoS Biol 2(7): e206.

Homolog - Paralog - Ortholog


Homologs: A1, B1, A2, B2
Paralogs: A1 vs B1 and A2 vs B2
Orthologs: A1 vs A2 and B1 vs B2

[Figure: gene tree leading from the ancestral gene O through genes A and B to A1 and B1 in Species-1 (S1) and A2 and B2 in Species-2 (S2). Panel label: sequence analysis.]

Molecular evolution

GACGACCATAGACCAGCATAG
GACTACCATAGA-CTGCAAAG
*** ******** * *** **

GACGACCATAGACCAGCATAG
GACTACCATAGAC-TGCAAAG
*** *********  *** **

Two possible positions for the indel

Molecular Phylogenetic Analysis


Study of evolutionary relationships between genes and species.
The actual pattern of evolutionary history is the phylogeny, or evolutionary tree, which we try to estimate.
A tree is a mathematical structure used to model the actual evolutionary history of a group of sequences or organisms.

Molecular Phylogeny Analysis


Specifying the history of gene evolution is one of the most important aims of the current study of molecular evolution.
From a given set of aligned sequences, molecular phylogeny methods propose phylogenetic trees (inferred trees) that aim to reconstruct the history of successive divergences that took place, during evolution, between the sequences considered and their common ancestor. These trees may not be the same as the true tree.
Reconstruction of phylogenetic trees is a statistical problem: a reconstructed tree is an estimate of a true tree with a given topology and given branch lengths.

The accuracy of this estimate should be statistically established.

In practice, phylogenetic analyses usually generate phylogenetic trees with accurate parts and imprecise parts.

Nucleotide, amino-acid sequences


-GGAGCCATATTAGATAGA-    Gly Ala Ile Leu Asp Arg
-GGAGCAATTTTTGATAGA-    Gly Ala Ile Phe Asp Arg

There are 3 different DNA positions but only one different amino acid position: 2 of the nucleotide substitutions are therefore synonymous and one is non-synonymous.

DNA yields more phylogenetic information than proteins. The nucleotide sequences of a pair of homologous genes have a higher information content than the amino acid sequences of the corresponding proteins, because mutations that result in synonymous changes alter the DNA sequence but do not affect the amino acid sequence. (But amino-acid sequences are more efficiently aligned)
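The synonymous versus non-synonymous count for the two example sequences can be checked with a short script. This is a sketch under stated assumptions: the codon table lists only the codons that occur in the example (standard genetic code), the function names are illustrative, and the count is made per differing codon, which matches a per-nucleotide count here because each differing codon differs at a single base.

```python
# Only the codons that occur in the two example sequences (standard genetic code).
CODON_TABLE = {
    "GGA": "Gly", "GCC": "Ala", "GCA": "Ala",
    "ATA": "Ile", "ATT": "Ile", "TTA": "Leu",
    "TTT": "Phe", "GAT": "Asp", "AGA": "Arg",
}


def codons(seq):
    """Split a coding sequence into consecutive triplets."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def classify_differences(seq_a, seq_b):
    """Return (synonymous, non_synonymous) counts over differing codons."""
    synonymous = non_synonymous = 0
    for codon_a, codon_b in zip(codons(seq_a), codons(seq_b)):
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1      # DNA changed, amino acid unchanged
        else:
            non_synonymous += 1  # DNA change also changes the amino acid
    return synonymous, non_synonymous


seq1 = "GGAGCCATATTAGATAGA"
seq2 = "GGAGCAATTTTTGATAGA"
print(classify_differences(seq1, seq2))  # (2, 1)
```

The two synonymous changes (GCC/GCA and ATA/ATT) are visible only at the DNA level, which is the extra information the paragraph above refers to.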

Phenetics and Cladistics


Phenetics (Michener and Sokal, 1957): pheneticists argued that classifications should encompass as many variable characters as possible, these characters being analysed by rigorous mathematical methods. Such methods (e.g., distance-based methods) place greater emphasis on the relationships among data sets than on the paths they have taken to arrive at their current states.

Cladistics (Hennig, 1966): emphasizes the need for large datasets but differs from phenetics in that it does not give equal weight to all characters. Cladists are generally more interested in evolutionary pathways than in relationships (e.g., maximum parsimony).

Key features of DNA-based phylogenetic trees


[Figure: an unrooted tree of four sequences A, B, C and D, with its external nodes, internal nodes and branches labelled, shown beside the rooted trees that can be derived from it, each with a hypothetical ancestor at the root.]

Rooted and Unrooted trees


An important distinction in phylogenetics is between trees that make an inference about a common ancestor and the direction of evolution, and those that do not.

In rooted trees a single node is designated as a common ancestor, and a unique path leads from it through evolutionary time to any other node.
Unrooted trees only specify the relationships between nodes and say nothing about the direction in which evolution occurred. Roots can usually be assigned to unrooted trees through the use of an outgroup.

Key features of DNA-based phylogenetic trees


The numbers of possible rooted (N_R) and unrooted (N_U) trees for n sequences are given by:

$$N_R = \frac{(2n-3)!}{2^{n-2}\,(n-2)!}$$

$$N_U = \frac{(2n-5)!}{2^{n-3}\,(n-3)!}$$

Note that only one of all possible trees can be the true tree that represents the phylogenetic relationships among the sequences.

n     N_R         N_U
2     1           1
3     3           1
4     15          3
5     105         15
10    34459425    2027025
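The tabulated counts can be reproduced directly from the two formulas. The sketch below uses illustrative function names; the special case for n = 2 unrooted trees is an assumption made to match the table, since the formula would otherwise require the factorial of a negative number.

```python
from math import factorial


def rooted_trees(n: int) -> int:
    """N_R = (2n-3)! / (2^(n-2) * (n-2)!), for n >= 2."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    return factorial(2 * n - 3) // (2 ** (n - 2) * factorial(n - 2))


def unrooted_trees(n: int) -> int:
    """N_U = (2n-5)! / (2^(n-3) * (n-3)!), for n >= 3; 1 for n = 2."""
    if n < 2:
        raise ValueError("need at least 2 sequences")
    if n == 2:
        return 1  # the formula is undefined here; two sequences give one tree
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


for n in (2, 3, 4, 5, 10):
    print(n, rooted_trees(n), unrooted_trees(n))
```

The rapid growth (over 34 million rooted trees for only 10 sequences) is why exhaustive tree searches quickly become impractical.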

Gene tree - Species tree


[Figure: a gene tree (genes A to E, branching at mutation events) drawn beside a species tree (species A to E, branching at speciation events).]

These two events, mutation and speciation, are not expected to occur at the same time, so a gene tree does not necessarily coincide with the species tree.

Gene tree - Species tree


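[Figure: a gene tree and a species tree drawn along a time axis; duplication events in the gene tree (genes A and B) are marked separately from the speciation events in the species tree.]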

Tree construction: how to proceed?


1. Consider the set of sequences to analyse;
2. Align these sequences "properly";
3. Apply phylogenetic tree-building methods;
4. Evaluate the obtained phylogenetic tree statistically.

Methodology:
1- Multiple alignment;
2- Bootstrapping;
3- Consensus tree construction and evaluation.

Alignment is an essential preliminary to tree construction

GACGACCATAGACCAGCATAG
GACTACCATAGA-CTGCAAAG
*** ******** * *** **

GACGACCATAGACCAGCATAG
GACTACCATAGAC-TGCAAAG
*** *********  *** **

Two possible positions for the indel

If errors in indel placement are made in a multiple alignment, then the tree reconstructed by phylogenetic analysis is unlikely to be correct.

Steps in Multiple Sequence Alignments


A common strategy of several popular multiple sequence alignment algorithms is to:
1- generate a pairwise distance matrix based on all possible pairwise alignments between the sequences being considered;
2- use a statistically based approach to construct an initial tree;
3- realign the sequences progressively in order of their relatedness according to the inferred tree;
4- construct a new tree from the pairwise distances obtained in the new multiple alignment;
5- repeat the process if the new tree is not the same as the previous one.

Steps in multiple alignment
A- Pairwise alignment. Example: 4 sequences, A, B, C, D. 6 pairwise comparisons, then cluster analysis (similarity tree grouping B with D and A with C).
B- Multiple alignment following the tree from A:
1. Align the most similar pair, adding gaps to optimise the alignment;
2. Align the next most similar pair;
3. Align the alignments, preserving gaps; a new gap is added to optimise the alignment of (BD) with (AC).

Procedure
An efficient procedure consists of aligning the amino-acid sequences and using the resulting alignment as a template for the corresponding nucleotide sequences. Alignment is then guaranteed at the codon level.
1. Align the protein sequences of a family using ClustalW.

2. Align the corresponding DNA sequences, using as template the amino-acid alignment obtained in step 1.
Note: clean the multiple alignment of gaps common to the majority of the sequences considered.
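Step 2 of this procedure, threading each DNA sequence onto its aligned protein, can be sketched as follows. This is an illustration only: the function name `thread_dna` and the toy sequences are hypothetical, and the sketch assumes each coding sequence contains exactly three bases per residue (no stop codon, no frameshift).

```python
def thread_dna(aligned_protein: str, dna: str, gap: str = "-") -> str:
    """Lay an unaligned coding sequence onto its aligned protein, codon by codon."""
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(dna) != 3 * n_residues:
        raise ValueError("DNA length must be 3 x the number of residues")
    codon_iter = iter(dna[i:i + 3] for i in range(0, len(dna), 3))
    # A gap in the protein becomes a three-base gap; a residue consumes one codon.
    return "".join(gap * 3 if aa == gap else next(codon_iter)
                   for aa in aligned_protein)


# Hypothetical toy data: a protein alignment and the matching coding sequences.
protein_alignment = {"seq1": "MA-K", "seq2": "MAGK"}
coding_sequences = {"seq1": "ATGGCTAAA", "seq2": "ATGGCCGGTAAA"}

for name, aligned in protein_alignment.items():
    print(name, thread_dna(aligned, coding_sequences[name]))
```

Because gaps are inserted three bases at a time, codons are never split, which is what guarantees alignment at the codon level.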

Phylogenetic tree construction methods


A phylogenetic tree is characterised by its topology (form) and its length (sum of its branch lengths).
Each node of a tree is an estimate of the ancestor of the elements included in this node.
There are 3 main classes of phylogenetic methods for constructing phylogenies from sequence data:
Methods directly based on sequences:
- Maximum parsimony: find a phylogenetic tree that explains the data with as few evolutionary changes as possible.
- Maximum likelihood: find a tree that maximizes the probability of the genetic data given the tree.
Methods indirectly based on sequences:
- Distance-based methods (e.g., Neighbour Joining (NJ)): find a tree such that the branch lengths of the paths between sequences (species) fit a matrix of pairwise distances between sequences.

Parsimony
The concept of parsimony is at the heart of all character-based methods of phylogenetic reconstruction.

The 2 fundamental ideas of biological parsimony are:

1- mutations are exceedingly rare events (?);

2- the more unlikely events a model invokes, the less likely the model is to be correct. As a result, the relationship that requires the smallest number of mutations to explain the current state of the sequences being considered is the relationship that is most likely to be correct.

Parsimony
Informative and uninformative sites: for a parsimony approach, the positions of a multiple sequence alignment fall into two categories in terms of their information content: those that have information (informative) and those that do not (uninformative).
Example:

Position   Seq 1   Seq 2   Seq 3   Seq 4
1          G       G       G       G
2          G       G       G       A
3          G       G       A       T
4          G       A       T       C
5          G       G       A       A
6          G       T       G       T

In general, for a position to be informative, regardless of how many sequences are aligned, it has to have at least 2 different nucleotides, and each of these nucleotides has to be present at least twice.

Position 1 is invariant and therefore uninformative, because all trees invoke the same number of mutations (0); position 2 is uninformative because 1 mutation occurs in all three possible trees; position 3 likewise, because 2 mutations occur; position 4 requires 3 mutations in all possible trees.

Positions 5 and 6 are informative, because one of the trees invokes only one mutation and the other 2 alternative trees both require 2 mutations.
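The mutation counts quoted for the six positions can be verified with a small script. This is a sketch, not the notes' own method: it applies Fitch's set-intersection rule to the three possible four-sequence trees, and the names `is_informative`, `fitch_changes`, `SITES` and `TREES` are illustrative.

```python
from collections import Counter

# One string per alignment position (1-6); characters are sequences 1-4.
SITES = ["GGGG", "GGGA", "GGAT", "GATC", "GGAA", "GTGT"]

# The three possible unrooted trees for four sequences, as two pairs of
# 0-based sequence indices: ((1,2),(3,4)), ((1,3),(2,4)), ((1,4),(2,3)).
TREES = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]


def is_informative(column):
    """At least two different nucleotides, each present at least twice."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


def fitch_changes(column, tree):
    """Minimum number of substitutions for one position on one 4-sequence tree."""
    changes = 0
    node_sets = []
    for i, j in tree:
        common = {column[i]} & {column[j]}
        if not common:                       # the pair disagrees: one change
            changes += 1
            common = {column[i], column[j]}
        node_sets.append(common)
    if not (node_sets[0] & node_sets[1]):    # change on the central branch
        changes += 1
    return changes


for position, column in enumerate(SITES, start=1):
    costs = [fitch_changes(column, tree) for tree in TREES]
    print(position, column, costs, is_informative(column))
```

Positions 1 to 4 give the same cost on every tree (0, 1, 2 and 3 changes), so they cannot discriminate between trees; positions 5 and 6 each favour one tree with a single change.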

Krane &

[Figure: for each of the six positions, the three possible unrooted trees relating sequences 1 to 4, with the nucleotide at each node and the inferred substitutions marked.]

Maximum Parsimony (Fitch, 1977)


The parsimony criterion consists of determining the minimum number of changes (substitutions) required to transform a sequence into its nearest neighbour. The maximum parsimony algorithm searches for the minimum number of genetic events (nucleotide substitutions or amino-acid changes) to infer the most parsimonious tree from a set of sequences. The best tree is the one that needs the fewest changes.
Problems:
1. within practical computational limits, this often leads to tens or more "equally most parsimonious trees", which makes it difficult to justify the choice of a particular tree;
2. a long computation time is needed to construct a tree.

Maximum Parsimony (Fitch, 1977),...


The Maximum parsimony method takes account of information pertaining to character variation in each position of the sequence multiple alignment, to recreate the series of nucleotide changes. The assumption, possibly erroneous, is that evolution follows the shortest possible route and that the correct phylogenetic tree is therefore the one that requires the minimum number of nucleotide changes to produce the observed differences between the sequences. Trees are therefore constructed at random and the nucleotide changes that they involve calculated until all possible topologies have been examined and the one requiring the smallest number of steps identified. This is presented as the most likely inferred tree.

Maximum likelihood
This approach is a purely statistical method. Probabilities are considered for every individual nucleotide substitution in a set of aligned sequences.
Example: consider three sequences that carry C, T and G at the same site (..C.., ..T.., ..G..). Since transitions (exchanging a purine for a purine or a pyrimidine for a pyrimidine) are observed roughly 3 times as often as transversions (exchanging a purine for a pyrimidine or vice versa), it can reasonably be argued that the sequences with C and T are more likely to be closely related to each other than either is to the sequence with G.

The calculation of probabilities is complicated by the fact that the sequence of the common ancestor of the sequences considered is unknown. Furthermore, multiple substitutions may have occurred at one or more sites, and sites are not necessarily independent or equivalent.

Still, objective criteria can be applied to calculating the probability for every site and for every possible tree that describes the relationships of the sequences in a multiple alignment.
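The transition/transversion reasoning in the C, T, G example can be made concrete with a short classifier. This is only an illustration of the distinction, not a likelihood calculation; the function name `substitution_type` is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(x: str, y: str) -> str:
    """Classify the difference between two nucleotides at one site."""
    if x == y:
        return "none"
    pair = {x, y}
    # Purine <-> purine or pyrimidine <-> pyrimidine is a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# The three bases from the example site.
for x, y in [("C", "T"), ("C", "G"), ("T", "G")]:
    print(x, y, substitution_type(x, y))
```

Only the C/T pair is separated by a transition, the more frequently observed kind of change, which is why those two sequences are the more plausible close relatives.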

Distance matrix methods (NJ, ...)
Convert sequence data into a set of discrete pairwise distance values, arranged into a matrix. Distance methods fit a tree to this matrix.
$D_{ij}$ = the distance between sequences i and j;
$d_{ij}$ = the sum of the branch lengths on the tree path from i to j.
The phylogeny estimates the distance for each pair as the sum of the branch lengths on the path from one sequence to the other through the tree. A measure of how close the tree is to D is given by the least-squares criterion:

$$\sum_{i,j} \frac{(D_{ij} - d_{ij})^2}{D_{ij}^2}$$
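The least-squares criterion can be evaluated for a candidate tree once its path lengths are known. The sketch below uses hypothetical distances and an illustrative function name, and it assumes each pair of sequences is counted once in the sum.

```python
def least_squares_fit(observed, tree_paths):
    """Sum over pairs of (D_ij - d_ij)^2 / D_ij^2; smaller means a closer fit."""
    total = 0.0
    for pair, big_d in observed.items():
        small_d = tree_paths[pair]           # path length through the tree
        total += (big_d - small_d) ** 2 / big_d ** 2
    return total


# Hypothetical observed distances D_ij and tree path lengths d_ij.
observed = {("A", "B"): 0.10, ("A", "C"): 0.30, ("B", "C"): 0.30}
tree_paths = {("A", "B"): 0.10, ("A", "C"): 0.33, ("B", "C"): 0.27}

print(least_squares_fit(observed, tree_paths))  # close to 0.02
print(least_squares_fit(observed, observed))    # 0.0 for a perfect fit
```

Dividing by the squared observed distance weights each pair by relative error, so large distances do not dominate the fit.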

The tree topology is constructed by using a cluster analysis method (such as the NJ method).
Advantages: 1. easy to perform; 2. fast calculation; 3. suited to sequences with high similarity scores.
Drawbacks: 1. all sites are generally treated equally (differences in substitution rates are not taken into account); 2. not applicable to distantly related sequences; 3. some of the information is lost, particularly that pertaining to the identities of the ancestral and derived nucleotides at each position.

The choice of the outgroup


Most phylogenetic methods construct unrooted trees.

It is best to root such trees on biological grounds.


The most widely used technique is to include, in the set of sequences to be analysed, a sequence that is related to the sequences considered without belonging to the same family (the outgroup). The aim is to normalize the branches of the unrooted tree relative to the length of the branch leading to the outgroup.

Evaluation of different methods


None of the previous methods of phylogenetic reconstruction guarantees that it yields the one true tree that describes the evolutionary history of a set of aligned sequences.

There is at present no statistical method for comparing trees obtained from different phylogenetic methods; nevertheless, many attempts have been made to compare the relative consistency of the existing methods. The consistency depends on many factors, including the topology and branch lengths of the real tree, the transition/transversion rate and the variability of the substitution rates. In practice, one infers phylogeny between sequences that do not generally meet the assumptions of the method.
One expects that if sequences have strong phylogenetic relationships, different methods will result in the same phylogenetic tree.

Statistical evaluation of the obtained phylogenetic tree


The accuracy depends on the multiple sequence alignment considered. ML estimates branch lengths, their degree of significance and their confidence limits.

At present only sampling techniques make it possible to test the topology of a phylogenetic tree:
Bootstrapping: it consists of drawing columns from a sample of aligned sequences, with replacement, until one gets a data set of the same size as the original one (usually some columns are sampled several times and others are left out).

Bootstrapping
Constructs a new multiple alignment at random from the real alignment, with the same size. Note that the same column can be sampled more than once, and consequently some columns are not sampled.
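Column resampling with replacement can be sketched in a few lines. The function name `bootstrap_alignment` and the three toy sequences are illustrative assumptions; the random seed is fixed only to make the run repeatable.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement; rows stay in register."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw as many column indices as the alignment has columns.
    picks = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[i] for i in picks) for row in alignment]


# Toy alignment of three sequences, used only for illustration.
alignment = ["ATAGCCATA", "ATACCCATG", "ATACCCATA"]

rng = random.Random(0)
for replicate_number in range(3):
    print(replicate_number, bootstrap_alignment(alignment, rng))
```

In a full analysis, a tree would be built from each replicate (for example 100 of them) and the replicate trees summarised in a consensus tree.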
[Figure: a three-sequence alignment and bootstrap replicates built by resampling its columns.]

Methodology
1. Consider the set of sequences to analyse;
2. Align these sequences "properly";
3. Apply phylogenetic tree-building methods;
4. Evaluate the obtained phylogenetic tree statistically.

1- Multiple alignment;
2- Bootstrapping (100 samples);
3- Apply phylogenetic tree-building methods;
4- Consensus tree construction and evaluation.

Example: The tree of life


Pace (2001) described a tree of life based on small subunit rRNA sequences.
Pace, N. R. (1997) Science 276, 734-740

This tree shows the main three branches described by Woese and colleagues.

B. Systematics: connecting classification to phylogeny
Systematics: the study of biological diversity in an evolutionary context, including taxonomy and phylogenetics.
1. Taxonomy uses a hierarchical classification system.
   a. Review the Linnaean (binomial) system of classification: genus and species.
   b. Review hierarchical classification: Kingdom, Phylum, Class, Order, Family, Genus, Species.
      - A named taxonomic unit at any level is called a taxon.

   c. Phylogenetic trees are used to bring different taxonomic schemes together, and to show the connection between classification and phylogeny.

2. Modern phylogenetic systematics is based on cladistic analysis.
   a. A phylogenetic diagram (tree) is also called a cladogram.
   b. Each branch in the tree is called a clade.
   c. Monophyletic pertains to a taxon that is derived from a single ancestral species. This is the only legitimate type of grouping in a cladogram.

   d. Polyphyletic pertains to a taxon whose members were derived from two or more ancestors not common to all members.
   e. Paraphyletic pertains to a taxon that excludes some members that share a common ancestor with members included in the taxon.

3. Constructing cladograms
a. Identify homologies: shared characteristics derived from one ancestor.
NOTE: Analogous structures may look similar to one another, but are not derived from a common ancestor. These are in contrast to homologous structures. Example of an analogous structure in two distantly related plants.

When two organisms have analogous structures, this is an example of convergent evolution: the independent development of similarity between species due to similar selection pressures.

b. When constructing a cladogram, the greater the number of homologous parts between two organisms, the more closely related they are.
c. The classification scheme must reflect these similarities.

These similarities can be either:


- Shared primitive characters, i.e. homologous characters that are shared by more than one taxon, e.g. the backbone is shared by mammals and reptiles.
- Shared derived characters, i.e. an evolutionary novelty that is unique to a particular clade. The more derived characters a species has, the more evolutionarily unique it is.

Example of how to construct a cladogram:

1. Select the species for which you want to make a cladogram. These are called the ingroup. They have shared primitive and derived characters.
2. Select an outgroup: a species that is closely related to the species under study. The outgroup has a shared primitive character that is common to all species.
3. Construct a character table and tabulate the data. The more shared characters, the more closely related the species are.
4. Construct a cladogram based on the number of shared characters.
For example: Figure 25.11 (p. 497), constructing a cladogram. The outgroup here, the lancelet, has a notochord, the shared primitive character. The ingroup is five vertebrates.
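Steps 3 and 4 can be sketched with a small character table. The table below is an illustrative assumption, not a transcription of the cited figure: the five vertebrates, the five derived characters and their 0/1 values are hypothetical, with the lancelet as outgroup.

```python
CHARACTERS = ["vertebral column", "hinged jaws", "four walking legs",
              "amniotic egg", "hair"]

# 1 = derived character present, 0 = absent (illustrative values only).
CHARACTER_TABLE = {
    "lancelet":   (0, 0, 0, 0, 0),  # outgroup
    "lamprey":    (1, 0, 0, 0, 0),
    "tuna":       (1, 1, 0, 0, 0),
    "salamander": (1, 1, 1, 0, 0),
    "turtle":     (1, 1, 1, 1, 0),
    "leopard":    (1, 1, 1, 1, 1),
}


def shared_derived(taxon_a, taxon_b):
    """Number of derived characters present in both taxa."""
    return sum(a & b for a, b in zip(CHARACTER_TABLE[taxon_a],
                                     CHARACTER_TABLE[taxon_b]))


def branching_order():
    """Taxa sorted by number of derived characters, earliest branch first."""
    return sorted(CHARACTER_TABLE, key=lambda taxon: sum(CHARACTER_TABLE[taxon]))


print(branching_order())
print(shared_derived("turtle", "leopard"), shared_derived("lamprey", "leopard"))
```

Sorting by the number of derived characters gives a branching order only because this toy table is perfectly nested; real character data usually conflict and call for a parsimony search.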

4. Phylogeny can be inferred also from molecular data


a. DNA and RNA sequences can be compared to determine phylogeny. Example to follow. Note that each change in a nucleic acid = one evolutionary event! The more events, the more distantly related the species are. Fewer events mean that the species are more closely related.

5. The principle of parsimony helps systematists reconstruct phylogeny
   a. Phylogenies can be extremely complicated.
   b. The principle of parsimony states that a theory about nature should be the simplest explanation that is consistent with the facts.
      - Keep it simple.
      - Sometimes called Occam's razor.
   c. A phylogenetic tree is a hypothesis. There may be many possible trees, but the simplest one is probably the most accurate.

Parsimony and the analogy-versus-homology pitfall.

Isolation of full length Cp-sHSP gene


[Figure: structure of the Cp-sHSP gene in A. stolonifera, S. alterniflora, C. album NY, C. album MS, A. americana, F. wislizenii and A. retroflexus. Labelled features: HSE and TATA elements, transcription start (+1), transit peptide, Met-rich region, consensus regions I and II (con I, con II). Segment lengths are labelled 100 bp, 390 bp, 436 bp, 421 bp and ? bp.]

Alignment of the derived amino acid sequence


[Figure: multiple alignment of the derived amino acid sequences (alignment positions 1 to 265, in three blocks) for A. stolonifera 1, 2 and 3, A. thaliana, F. hygrometrica, G. max, H. vulgare, L. esculentum, N. sylvestris, N. tabacum, O. sativa, P. hybrida, P. sativum, T. aestivum, Z. mays, A. americana, C. album MS, C. album NY, F. wislizenii and S. alterniflora, with a consensus line.]

Phylogenetic comparisons of Cp-sHsp Based on full length genes

Phylogenetic comparisons of Cp-sHsp Based on conserved region
