Genomes

A genome is the completely (or almost completely) determined DNA sequence of the genetic
material (chromosomes as well as any plasmids, mitochondrial DNA, etc) of an organism.
The word is somewhat of a misnomer: a genome is not the same as 'all genes'; it is rather
the 'sequence of all DNA', within which all genes can be found.
The first genome of a free-living organism (viruses aside) was that of Haemophilus
influenzae, published in 1995 (Fleischmann et al., Science (1995) vol 269, pp 496-512).
Why are complete genomes interesting?
The most basic answer to that question is that we want to know the complete set of genes
that an organism has.
The genome of an organism is, in a certain sense, the blueprint for that organism.
Many observations and experiments in biology involve mutants and mutations, and knowing
the complete set of genes for an organism can help with the analysis.
For example, we may want to be sure that a knocked-out gene does not have a backup copy
somewhere in the genome.
Knowing the complete genome for an organism is only the first step in the complete
mapping of the constituents and processes of the organism.
The complete genome is a necessary (but not sufficient) requirement for understanding an
organism.
And yet another answer to the question is emerging: the availability of more and more
complete genomes allows entirely new kinds of comparisons to be made between
organisms.
New types of analysis can be applied to old questions in biology, involving problems in
evolutionary history relating to the interactions between species.
The cost and effort required to sequence a genome, especially a bacterial genome, are
rapidly diminishing as new and improved technologies and tools are developed.
We will soon find ourselves in a situation where the availability of a genome is going to be
considered a basic requirement for working on a specific organism.
How many genomes have been completed?


TIGR Microbial Database, published microbial genomes, and projects in progress.
Lists maintained by TIGR, The Institute for Genomic Research. This institute was
founded by Craig Venter, and was the first to sequence the complete genome of a
bacterium, Haemophilus influenzae, in 1995.
EBI Completed Genomes web site, links, resources. Contains sequence data for
organelles (mitochondria), phages and viruses.
Genome Monitoring Table by Stephan Beck and Peter Sterk at EBI.
NCBI Genomic Biology web site, with links and search resources.
GOLD (Genomes OnLine Database), maintained by the company Integrated
Genomics, which sells annotation services to companies that have in-house
genome projects (primarily bacterial genomes).
| Organism | Type | Genome size (Mb) | Number of genes | Links | Comment |
|---|---|---|---|---|---|
| Haemophilus influenzae | Bacterial | 1.83 | 1850 | Haemophilus influenzae page at TIGR | The first genome of a free-living organism. 1995 |
| Escherichia coli | Bacterial | 4.64 | 4289 | E. coli Genome Project, University of Wisconsin-Madison | The most studied bacterium. 1997 |
| Rickettsia prowazekii | Bacterial | 1.11 | 834 | | The first genome to be sequenced in Sweden (Siv Andersson, Uppsala). 1998 |
| Methanococcus jannaschii | Archaeal | 1.66 | 1750 | Methanococcus jannaschii page at TIGR | The first sequenced Archaea. 1996 |
| Saccharomyces cerevisiae | Eukaryote | 12.1 | 6294 | SGD, MIPS yeast DB | The first sequenced eukaryote. 1997 |
| Caenorhabditis elegans | Eukaryote, nematode | 97 | 18,424 | WormBase, C. elegans Genome Project | The first sequenced multicellular organism. 1998 |
| Drosophila melanogaster | Eukaryote, insect | 137 | 13,601 | BDGP, Flybase | Celera Corp, publicly available. 2000 |
| Arabidopsis thaliana | Eukaryote, plant | 125 | 25,498 | The Arabidopsis Information Resource | The first plant. 2000 |
| Homo sapiens | Eukaryote, primate | 3,000 | 50,000 ? | HGP at Sanger, HGP at Oak Ridge, Ensembl | Rough draft exists. Not yet finished, except for chromosomes 21 and 22. |
The analysis of a genome
Define the location of genes (coding sequences, regulatory regions): gene prediction
(identification).
o Gene prediction ab initio using software based on rules and patterns. Find
Open Reading Frames (ORFs), with additional criteria for a good start sequence
for a gene. This is considered reasonably easy for bacteria, but is very difficult
for eukaryotes.
o Gene identification through alignment with known proteins and EST sequences
(Expressed Sequence Tags; mRNA sequences).
o Gene prediction through similarity with proteins or ESTs in other organisms.
o Gene prediction through comparison with other genomes; conserved regions
are probably coding or regulatory regions. This is called synteny, and is very
promising for analysis of higher eukaryote genomes.
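The ORF search mentioned under ab initio prediction can be illustrated with a minimal sketch. It assumes the standard genetic code (start codon ATG; stop codons TAA, TAG, TGA), scans only the forward strand, and uses a hypothetical `min_codons` threshold and a made-up test sequence. Real gene finders also scan the reverse complement, allow alternative start codons, and apply statistical models, none of which is attempted here.

```python
# Minimal ORF scan. Assumptions: standard genetic code, forward strand only.
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=3):
    """Return (start, end) pairs, 0-based, end exclusive, stop codon included.

    min_codons is a hypothetical length filter counting codons before the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume the search after this stop codon
    return sorted(orfs)

# Made-up sequence: ATG AAA TTT GGG TAA embedded in reading frame 2.
dna = "CCATGAAATTTGGGTAACC"
print(find_orfs(dna))
```

The sketch also hints at why eukaryotes are harder: an uninterrupted reading frame is a usable signal only when genes are not split by introns.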
Annotation of the genes: Compare with genes/proteins of known function in other
organisms. This is essentially the same as labelling the gene.
Functional classification. Broad groups of functional characterization, such as
'ribosomal proteins', 'nucleotide metabolism', 'signal transduction'.
Metabolic pathways.
o Are any common pathways missing?
o Are there 'gaps' (missing enzymes) in some pathways?
o Compare identified pathways with the life style of the organism.
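The pathway questions above reduce, in their simplest form, to comparing the enzymes a pathway requires with the enzymes found in the annotation. The following is a minimal sketch of that comparison; the pathway names, enzyme names, and the `pathway_gaps` helper are all hypothetical placeholders, not data from any real organism or database.

```python
# Hypothetical pathway definitions: ordered lists of required enzymes.
pathways = {
    "pathway_A": ["enzyme_1", "enzyme_2", "enzyme_3"],
    "pathway_B": ["enzyme_4", "enzyme_5"],
    "pathway_C": ["enzyme_6", "enzyme_7"],
}

# Hypothetical set of enzymes identified by annotating the genome.
annotated = {"enzyme_1", "enzyme_3", "enzyme_4", "enzyme_5"}

def pathway_gaps(pathways, annotated):
    """Map each pathway to the list of its enzymes absent from the annotation."""
    return {
        name: [enzyme for enzyme in steps if enzyme not in annotated]
        for name, steps in pathways.items()
    }

gaps = pathway_gaps(pathways, annotated)
for name, missing in gaps.items():
    if not missing:
        status = "complete"
    elif len(missing) == len(pathways[name]):
        status = "pathway missing"
    else:
        status = "gap: " + ", ".join(missing)
    print(name, "->", status)
```

An apparent gap does not prove that the enzyme is absent: the gene may have been missed by prediction, or an unrelated protein may perform the same step.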
Evolutionary history
o Internal genome duplications can sometimes be detected.
o Gene decay can sometimes be characterized: genes that are on their 'way out'
after duplication, or because the life style of the organism has changed.
o Horizontal gene transfer: genes that have been acquired from another
organism.

Comparative Genomics
It is now possible to investigate which sets of genes are common to many different
organisms, or groups of organisms. Is there a common core of genes necessary for all
life? Is that core sufficient for life?
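Once genes have been grouped into families shared across genomes, the question of a common core becomes a set intersection. The sketch below assumes that grouping has already been done, which is the difficult part in practice; the organism names and family identifiers are hypothetical.

```python
# Hypothetical gene-family identifiers found in each genome.
genomes = {
    "organism_A": {"fam1", "fam2", "fam3", "fam4"},
    "organism_B": {"fam1", "fam2", "fam5"},
    "organism_C": {"fam1", "fam2", "fam3", "fam6"},
}

def core_families(genomes):
    """Families present in every genome."""
    return set.intersection(*genomes.values())

def unique_families(genomes, name):
    """Families present in the named genome and in no other."""
    others = set().union(
        *(families for other, families in genomes.items() if other != name)
    )
    return genomes[name] - others

print(sorted(core_families(genomes)))
print(sorted(unique_families(genomes, "organism_A")))
```

A core computed this way shows only which genes are shared. Whether that core is necessary or sufficient for life cannot be read off from the intersection.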

If one looks at a specific, and yet fundamental, component such as the ribosome and
protein synthesis, can one say whether this system has changed fundamentally
through evolution, or has it stayed basically the same throughout?
Have there been inventions during evolution in such a fundamental system?
Which genes are necessary for multicellular life forms; which sets of genes are found
only in multicellular organisms and not in unicellular ones?
The rate of horizontal gene transfer (genes that have jumped the species barrier)
among bacteria can now be investigated. How often, and under what circumstances do
bacteria exchange genes? Has anything similar happened with higher organisms?
Where and how have new genes emerged in evolutionary history? Can precursors
of some gene families be found in distant relatives of a species?
The problem of identifying and characterizing orthologous genes versus paralogous
genes becomes easier to address (but not necessarily solve).
o Orthologues are genes that have diverged from a common ancestor because
of a speciation event.
o Paralogues are genes that have diverged as the result of a gene duplication
event.
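One widely used heuristic for proposing candidate orthologues between two genomes is the reciprocal best hit: gene a in genome A and gene b in genome B are paired if each is the other's highest-scoring match. This heuristic is not described in the text above and is shown only as an illustration. The gene names and similarity scores are hypothetical, and the scores are assumed to come from some prior sequence comparison.

```python
def best_hits(scores):
    """scores: {(query, subject): score}. Return {query: best-scoring subject}.

    On ties, the first pair encountered wins (an arbitrary choice).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
          ("a2", "b1"): 120, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 195, ("b1", "a2"): 118,
          ("b2", "a1"): 149, ("b2", "a2"): 88}

print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this made-up data, a2 and b2 are left unpaired. Such leftovers may be paralogues, or the heuristic may simply have failed, which is one reason the problem is easier to address than to solve.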
