repetitive elements within introns. In animal genomes, intron-mediated rearrangements have contributed importantly to the evolution of novel chimaeric genes by so-called 'exon shuffling.' On the scale of millions to hundreds of millions of years, homologous genes may diverge by loss and gain of introns. Loss of an intron may occur by way of reverse transcription and recombinational reincorporation of a spliced gene product. Insertion of introns by transposition has been observed experimentally for group I, group II, and spliceosomal introns. For group I and II introns, 'homing' to (intronless) allelic sites is also observed.

See also: Eukaryotic Genes; Pre-mRNA Splicing

Invariants, Phylogenetic

W Fitch
Copyright © 2001 Academic Press
doi: 10.1006/rwgn.2001.0710

Phylogenetic invariants is a method first proposed by Lake (1987). The 'invariants' derive from the fact that the addition and subtraction of the numbers of certain nucleotide distribution patterns are expected to remain constant (at zero) for all incorrect phylogenies, and thus can be used to distinguish among alternative phylogenetic trees. It is a property that is used on nucleotide sequences taken four at a time. For example, suppose that we had four such sequences that are homologously aligned from left to right, one under the other:

...AGA...
...AGT...
...CTT...
...CTA...

so that for any position in the alignment the four nucleotides produce a (vertical) pattern such as AACC. This might suggest that the first two sequences are sister sequences, meaning that they are more closely related to each other than either of them is to the second two sequences (see Figure 1A). There are 256 possible patterns, but some of them carry the same information. For example, the same relationship would be inferred if the pattern were GGTT. The method restricts itself to positions that have exactly two purines (A and/or G) and two pyrimidines (C and/or T) in their pattern, as all the examples used here do. Their relationship is shown by the tree in Figure 1A, where the arrowhead indicates that only a single transversion mutation is required to explain the observed nucleotides at the tips of this tree. (A transversion is the historical change from a purine to a pyrimidine or from a pyrimidine to a purine; all other interchanges are called transitions.)

On the other hand, a pattern such as ACCA would suggest that sequences 1 and 4 were sisters rather than sequences 1 and 2 (see Figure 1B). The two relationships (trees) cannot both be true, but if sequences 1 and 2 really are the true sister sequences, then this third pattern can only have arisen by virtue of two transversions having occurred during the history of these sequences (see Figure 1C).

However, we can estimate how often the misleading case in Figure 1C arises. Note that in Figure 1D we have shown only three of the four nucleotides in the pattern. What could the fourth nucleotide be? As we only consider those patterns with two purines and two pyrimidines, it must be a pyrimidine. Which one? If we assume that there is no bias as to which nucleotide the mutation is to, then it can be either C (as in Figure 1C) or T (as in Figure 1E) with equal probability. But that means that, for the wrong tree, the number of occurrences of a pattern like that in Figure 1C should be the same as the number for the pattern like that in Figure 1E. Hence, subtracting those two numbers should give a number not statistically different from zero for the two tree structures that are wrong. (The third possible tree is for the pattern ACAC, which suggests that sequences 1 and 3 are sisters.)

There are more details to the method, but the preceding gives the spirit of it. It is a method that is guaranteed to give the correct answer given sufficient lengths of the sequences being compared. This virtue, however, is more than offset by the answer to the question of how long the sequences must be to get that correct answer. It turns out that the sequences
[Figure 1: Panels A to E, each a tree joining sequences 1 to 4. Tip nucleotides for sequences 1, 2, 3, 4 are: (A) A, A, C, C; (B) A, C, C, A; (C) A, C, C, A; (D) A, C, ?, A; (E) A, C, T, A.]
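
The pattern counting described in the text can be sketched in a few lines of Python. This is only an illustration of the simplified idea given above (read off vertical patterns, keep those with exactly two purines and two pyrimidines, and subtract the count of one pattern from the count of another); it is not Lake's full statistic and includes no significance test. The function names column_patterns, is_informative, suggested_sisters, and pattern_difference are hypothetical and chosen here for readability.

```python
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def column_patterns(seqs):
    """Return the vertical four-letter pattern at each aligned position."""
    if len(seqs) != 4 or len({len(s) for s in seqs}) != 1:
        raise ValueError("expected four aligned sequences of equal length")
    return ["".join(col).upper() for col in zip(*seqs)]


def is_informative(pattern):
    """True if the pattern has exactly two purines and two pyrimidines."""
    n_purines = sum(base in PURINES for base in pattern)
    n_pyrimidines = sum(base in PYRIMIDINES for base in pattern)
    return n_purines == 2 and n_pyrimidines == 2


def suggested_sisters(pattern):
    """Return (1, k): sequence k shares sequence 1's base class (purine or pyrimidine)."""
    if not is_informative(pattern):
        raise ValueError("pattern must have two purines and two pyrimidines")
    first_is_purine = pattern[0] in PURINES
    for i in (1, 2, 3):
        if (pattern[i] in PURINES) == first_is_purine:
            return (1, i + 1)


def pattern_difference(seqs, pattern_a, pattern_b):
    """Count of pattern_a minus count of pattern_b over the informative columns."""
    counts = Counter(p for p in column_patterns(seqs) if is_informative(p))
    return counts[pattern_a] - counts[pattern_b]


# The three-column alignment fragment used as the example in the text.
example = ["AGA", "AGT", "CTT", "CTA"]
patterns = column_patterns(example)
```

For the example alignment, the columns read AACC, GGTT, and ATTA; the first two point to sequences 1 and 2 as sisters and the third to sequences 1 and 4. On real data one would call pattern_difference with a pair of patterns such as those in Figure 1C and Figure 1E (for instance "ACCA" and "ACTA") and ask whether the result differs from zero by more than chance, a step this sketch leaves out.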