DNA Sequencing
OUTLINE
DIRECT SEQUENCING
Manual Sequencing
Automated Fluorescent Sequencing
PYROSEQUENCING
BISULFITE DNA SEQUENCING
BIOINFORMATICS
THE HUMAN GENOME PROJECT
OBJECTIVES
Compare and contrast the chemical (Maxam/Gilbert) and
the chain termination (Sanger) sequencing methods.
List the components and the molecular reactions that
occur in chain termination sequencing.
Discuss the advantages of dye primer and dye terminator
sequencing.
Derive a text DNA sequence from raw sequencing data.
Describe examples of alternative sequencing methods,
such as bisulfite sequencing and pyrosequencing.
Define bioinformatics and describe electronic systems
for the communication and application of sequence
information.
Recount the events of the Human Genome Project.
Direct Sequencing
The importance of knowing the order, or sequence, of
nucleotides on the DNA chain was appreciated in the earliest days of molecular analysis. Elegant genetic experiments with microorganisms detected molecular changes
indirectly at the nucleotide level.
Indirect methods of investigating nucleotide sequence
differences are still in use. Molecular techniques, from
Southern blot to the mutation detection methods described in Chapter 9, are aimed at identifying nucleotide
changes. Without knowing the nucleotide sequence of the
targeted areas, results from many of these methods would
be difficult to interpret; in fact, some methods would not
be useful at all. Direct determination of the nucleotide
sequence, or DNA sequencing, is the most definitive
molecular method to identify genetic lesions.
Manual Sequencing
Direct determination of the order, or sequence, of
nucleotides in a DNA polymer is the most specific and
direct method for identifying genetic lesions (mutations)
or polymorphisms, especially when looking for changes
affecting only one or two nucleotides. Two types of
sequencing methods have been used most extensively:
the Maxam-Gilbert method (2) and the Sanger method (3).
Advanced Concepts
To make a radioactive sequencing template, 32P-labeled
ATP can be added to the 5' end of a fragment, using
T4 polynucleotide kinase, or the 3' end, using terminal transferase plus alkaline hydrolysis to remove
excess adenylic acid residues. Double-stranded fragments labeled only at one end are also produced by
using restriction enzymes to cleave a labeled fragment asymmetrically; the cleaved products are then
isolated by gel electrophoresis. Alternatively, denatured single strands are labeled separately, or a
"sticky" end of a restriction site is filled in, incorporating radioactive nucleotides with DNA polymerase.
piperidine, the single-stranded DNA will break at specific nucleotides (Table 10.1).
After the reactions, the piperidine is evaporated, and
the contents of each tube are dried and resuspended in
formamide for gel loading. The fragments are then separated by size on a denaturing polyacrylamide gel
(Chapter 5). The denaturing conditions (formamide, urea,
and heat) prevent the single strands of DNA from hydrogen bonding with one another or folding up so that they
Figure 10-1 Chemical sequencing proceeds in four separate reactions in which the labeled DNA fragment is selectively broken at specific nucleotides. (DMS = dimethylsulphate; FA = formic acid; H = hydrazine; H+S = hydrazine + salt)
Chapter 10
Table 10.1
Base Modifier       Reaction                   Bases Cleaved
Dimethylsulphate    Methylates G               G
Formic acid         Protonates purines         G+A
Hydrazine           Splits pyrimidine rings    T+C
Hydrazine + salt    Splits only C rings        C
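The cleavage specificities in Table 10.1 are enough to reconstruct a sequence from the four chemical reactions. The following is a minimal sketch, not a description of any laboratory software: the function names are hypothetical, and it assumes an idealized gel in which every fragment from the 5'-labeled strand is resolved.

```python
# Cleavage specificity of the four chemical reactions (Table 10.1).
REACTIONS = {
    "G": {"G"},
    "G+A": {"G", "A"},
    "C+T": {"C", "T"},
    "C": {"C"},
}


def cleavage_ladder(strand):
    """Return labeled fragment lengths for each reaction.

    The strand is labeled at its 5' end. Cleavage destroys the modified
    base, so a break at 0-based index i leaves a labeled fragment of
    length i.
    """
    return {
        name: [i for i, base in enumerate(strand) if base in targets]
        for name, targets in REACTIONS.items()
    }


def read_ladder(ladder, length):
    """Read the gel from the shortest fragment to the longest."""
    bases = []
    for n in range(length):
        if n in ladder["G"]:
            bases.append("G")   # band in both G and G+A lanes
        elif n in ladder["G+A"]:
            bases.append("A")   # band in G+A only
        elif n in ladder["C"]:
            bases.append("C")   # band in both C and C+T lanes
        elif n in ladder["C+T"]:
            bases.append("T")   # band in C+T only
        else:
            bases.append("N")   # no band resolved
    return "".join(bases)


strand = "GATTCAGC"  # illustrative sequence, not taken from the text
print(read_ladder(cleavage_ladder(strand), len(strand)))
```

A band that appears in both the G and G+A lanes is read as G, and one that appears only in G+A is read as A; the same logic separates C from T.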
Advanced Concepts
Polyacrylamide gels from 6% to 20% are used for
sequencing. Bromophenol blue and xylene cyanol
loading dyes are used to monitor the migration of
the fragments. Run times range from 1-2 hours for short
fragments (up to 50 bp) to 7-8 hours for longer fragments (more than 150 bp).
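[Figure: chemical sequencing gel with lanes G+A, C+T, and C. Bases read from the gel, top to bottom: G C T T T A G A A T A T C G A G C A T G C C A (5' label at the bottom).]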
Section 2
Advanced Concepts
Because of extensive use of M13, a primer that
hybridizes to M13 sequences could be used to
sequence any fragment. This primer, the M13 universal primer, is still used in some applications,
even though the M13 method of template preparation is no longer practical.
[Figure labels: M13 bacteriophage (ssDNA); insert fragment to be sequenced; recombinant M13 bacteriophages; lawn of bacteria with M13 plaques containing ssDNA.]
Figure 10-4 Manual dideoxy sequencing requires a single-stranded version of the fragment to be sequenced (template). Sequencing is primed with a short synthetic piece of
DNA complementary to bases just before the region to be
sequenced (primer). The sequence of the template will be
determined by extension of the primer in the presence of
dideoxynucleotides.
Figure 10-6 DNA replication (left) is terminated by the absence of the 3' hydroxyl group on the
dideoxyguanosine nucleotide (ddG, right). The resulting fragment ends in ddG. (Figure labels: dNTP, ddNTP, nitrogen base, growing strand, template strand.)
Advanced Concepts
PCR products are often used as sequencing templates. It is important that the amplicons to be used
as sequencing templates are free of residual components of the PCR reaction, especially primers and
nucleotides. These reactants can interfere with the
sequencing reaction and lower the quality of the
sequencing ladder. PCR amplicons can be cleaned
by adherence and washing on solid phase (column
or bead) matrices, alcohol precipitation, or enzymatic digestion with alkaline phosphatase. Alternatively, amplicons can be run on an agarose gel and
the bands eluted. The latter method provides not
only a clean template but also confirmation of the
product being sequenced. It is especially useful
when the PCR reactions are not completely free of
misprimed bands or primer dimers (see Chapter 7).
Sequencing buffer is usually provided with the sequencing enzyme and contains ingredients necessary for the
polymerase activity. Mixtures of all four dNTPs and one
of the four ddNTPs are then added to each tube, with a
different ddNTP in each of the four tubes.
The ratio of ddNTPs:dNTPs is critical for generation
of a readable sequence. If the concentration of ddNTPs is
too high, polymerization will terminate too frequently
and too early along the template, so only short fragments are produced. If the ddNTP concentration is
too low, termination will occur infrequently or not at all. In the
early days of sequencing, optimal ddNTP:dNTP
ratios were determined empirically (by experimenting
with various ratios). Modern sequencing reagent mixes
have preoptimized nucleotide mixes.
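The effect of the ddNTP:dNTP ratio can be illustrated with a simple probability model. This sketch assumes, for illustration only, that the polymerase incorporates a ddNTP and its matching dNTP with equal efficiency, so the chance of termination at each matching template position is ratio / (1 + ratio). Real enzymes discriminate between the two, and the function name is hypothetical.

```python
def termination_profile(ratio, n_sites):
    """Fraction of chains that terminate at each successive site.

    ratio   -- ddNTP:dNTP concentration ratio for one base
    n_sites -- number of template positions where that base is added

    Under the equal-efficiency assumption, termination at site k
    (0-based) requires k extensions followed by one termination,
    which gives a geometric distribution.
    """
    p = ratio / (1.0 + ratio)
    return [p * (1.0 - p) ** k for k in range(n_sites)]


for ratio in (1.0, 0.05, 0.001):
    profile = termination_profile(ratio, 50)
    print(
        f"ratio {ratio}: first site {profile[0]:.4f}, "
        f"site 50 {profile[-1]:.2e}, total terminated {sum(profile):.3f}"
    )
```

With a high ratio almost all chains stop within the first few sites, which leaves the longer fragments too faint to read. With a very low ratio only a small fraction of chains terminate at all, so every band is weak.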
With the addition of DNA polymerase enzyme to the
four tubes, the reaction begins. After about 20 minutes,
the reactions are terminated by addition of a stop buffer.
The stop buffer consists of 20 mM EDTA to chelate
cations and stop enzyme activity, formamide to denature
the products of the synthesis reaction, and gel loading
dyes (bromophenol blue and/or xylene cyanol). It is
important that all four reactions be carried out for equal
time. Maintaining equal reaction times will provide consistent band intensities in all four lanes of the gel
sequence, which facilitates final reading of the sequence.
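[Figure: products of the four chain termination reactions for the synthesized strand dAdGdCdTdGdCdCdCdG]
ddA reaction: ddA
ddC reaction: dAdGddC; dAdGdCdTdGddC; dAdGdCdTdGdCddC; dAdGdCdTdGdCdCddC
ddG reaction: dAddG; dAdGdCdTddG; dAdGdCdTdGdCdCdCddG
ddT reaction: dAdGdCddT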
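The way the four ddNTP reactions combine into a readable ladder can be sketched in a few lines. The example below uses the nine-base product shown above (AGCTGCCCG); the function names are hypothetical, and the model assumes that every possible terminated fragment is present and resolved.

```python
def sanger_fragments(product):
    """Group the length of every terminated fragment by its 3' ddNTP.

    Each prefix of the synthesized strand ends in a dideoxynucleotide,
    so a prefix of length n appears in the lane for its last base.
    """
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length in range(1, len(product) + 1):
        lanes[product[length - 1]].append(length)
    return lanes


def read_gel(lanes):
    """Read the four lanes from the smallest fragment to the largest."""
    by_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(by_length[n] for n in sorted(by_length))


lanes = sanger_fragments("AGCTGCCCG")
print(lanes)
print(read_gel(lanes))
```

The sequence read this way is that of the synthesized strand, which is complementary to the template.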
Advanced Concepts
Manganese (Mn2+) may be added to the sequencing
reaction to promote equal incorporation of all dNTPs
by the polymerase enzyme. Equal incorporation
of the dNTPs makes for uniform band intensities
on the sequencing gel, which eases interpretation of
the sequence. Manganese increases the relative
incorporation of ddNTPs as well, which will
enhance the reading of the first part of the sequence
by increasing intensity of the smaller bands on the
gel. Modified nucleotides, deaza-dGTP and deoxyinosine triphosphate (dITP), are also added to
sequencing reaction mixes to deter secondary structure in the synthesized fragments. Additives such as
Mn2+, deaza-dGTP, and dITP are supplied in preoptimized concentrations in commercial sequencing
buffers.
Advanced Concepts
Fluorescent dyes used for automated sequencing
include fluorescein and rhodamine dyes and Bodipy
(4,4-difluoro-4-bora-3a,4a-diaza-s-indacene) dye
derivatives that are recognized by commercial
detection systems. Automated sequence readers
excite the dyes with a laser and detect the emitted
fluorescence at predetermined wavelengths. More
advanced methods have been proposed to enhance
the distinction between the dyes for more accurate
determination of the sequence.
[Figure labels: gel electrophoresis; capillary electrophoresis.]
Figure 10-10 Fluorescent sequencing chemistries. Dye primer sequencing uses labeled primers (A). The products of all four
reactions are resolved together in one lane of a gel or in a capillary. Using dye terminators (B), only one reaction tube is necessary, since the fragments can be distinguished directly by the dideoxynucleotides on their 3' ends. (Figure labels: automated dye primer sequencing; dye primer; dye terminators; ddATP, ddCTP, ddGTP, ddTTP; ethanol precipitation; dye terminator removal; completed sequencing reaction.)
Electrophoresis and Sequence Interpretation
Both dye primer and dye terminator sequencing reactions
are loaded onto a slab gel or capillary gel in a single lane.
The fluorescent dye colors, rather than lane assignment,
distinguish which nucleotide is at the end of each fragment. Running all four reactions together not only
increases throughput but also eliminates lane-to-lane
migration variations that affect accurate reading of the
sequence. The fragments migrate through the gel according to size and pass in turn by a laser beam and a detec-
Advanced Concepts
DNA sequences with high G/C content are sometimes difficult to read due to intrastrand hybridization in the template DNA. Reagent preparations that
include 7-deaza-dGTP (2'-deoxy-7-deazaguanosine
triphosphate) or deoxyinosine triphosphate instead
of standard dGTP improve the resolution of bands in
regions that exhibit GC band compressions, that is,
bunching of bands close together so that they are not
resolved, followed by several bands running farther
apart.
Figure 10-11 Electropherogram showing a dye blob at the beginning of a sequence (positions 9-15). The sequence read
around this area is not accurate.
[Figure text sequences: good sequence (left) GATTCTGAATTAGCTGTATCG; poor sequence (right) GATTCTGGAATTNGCTGTATCG.]
Figure 10-12 Examples of good sequence quality (left) and poor sequence quality (right). Note the clean baseline on the
good sequence; that is, only one color peak is present at each nucleotide position. Automatic sequence reading software will
not accurately call a poor sequence. Compare the text sequences above the two scans.
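The idea that a clean baseline has only one color peak at each position can be expressed as a simple base-calling rule. This is a minimal sketch and not the algorithm of any commercial reader: the 0.25 threshold, the function name, and the peak heights are all illustrative assumptions.

```python
def call_bases(peaks, max_secondary=0.25):
    """Call one base per position from four-channel peak heights.

    peaks         -- list of dicts mapping "A", "C", "G", "T" to peak height
    max_secondary -- hypothetical cutoff: if the second-highest peak
                     exceeds this fraction of the highest, call "N"
    """
    calls = []
    for channels in peaks:
        ranked = sorted(channels.items(), key=lambda kv: kv[1], reverse=True)
        (top_base, top), (_, second) = ranked[0], ranked[1]
        if top <= 0 or second / top > max_secondary:
            calls.append("N")       # noisy baseline or mixed peaks
        else:
            calls.append(top_base)  # one dominant color peak
    return "".join(calls)


# Made-up peak heights: clean A, overlapping A/G, clean T.
example = [
    {"A": 900, "C": 20, "G": 10, "T": 15},
    {"A": 400, "C": 30, "G": 380, "T": 10},
    {"A": 5, "C": 10, "G": 8, "T": 700},
]
print(call_bases(example))
```

A position with two strong peaks is reported as N here. A reader looking for heterozygous mutations would instead inspect such positions, since they may represent a true mixed base rather than noise.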
Pyrosequencing
Chain termination sequencing is the most widely used
method to determine DNA sequence. Other methods have
been developed that yield the same information but not
with the throughput capacity of the chain termination
method. Pyrosequencing is an example of a method
designed to determine a DNA sequence without having to
make a sequencing ladder. This procedure relies on the
Figure 10-13 Sequencing of a heterozygous G>T mutation in codon 12 of the ras gene. The normal codon sequence is GGT (left). The heterozygous mutation (G/T) (center) is confirmed in the
reverse sequence (C/A) (right).
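Confirming a heterozygous call on the opposite strand amounts to taking the complement of each base at the mixed position. A minimal sketch follows; the function names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the sequence of the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def complement_site(bases):
    """Bases expected on the opposite strand at a mixed position."""
    return {COMPLEMENT[base] for base in bases}


# A G/T mixed position on the forward strand should appear as C/A
# on the reverse strand.
print(sorted(complement_site({"G", "T"})))
print(reverse_complement("GGT"))
```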
[Figure text sequence: GTATGCAGAAAATCTTAGAGTGTCCCATCTGGTAAGTCAGC. Mixed-base calls in the lower panel include W, S, M, Y, K, and R.]
Figure 10-14 187delAG mutation in the BRCA1 gene. This heterozygous dinucleotide deletion is evident in the lower
panel where, at the site of the mutation, two sequences are overlaid: the normal sequence and the normal sequence
minus two bases.
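The overlay of a normal sequence with the same sequence minus two bases can be simulated to show why mixed-base calls appear downstream of a heterozygous deletion. This sketch uses the standard two-base ambiguity codes; the function name and the short test sequence are illustrative assumptions, not data from the figure.

```python
# Two-base ambiguity codes, plus the four unambiguous bases.
PAIR_CODES = {"AG": "R", "CT": "Y", "AC": "M", "GT": "K", "CG": "S", "AT": "W"}
IUPAC = {frozenset(pair): code for pair, code in PAIR_CODES.items()}
IUPAC.update({frozenset(base): base for base in "ACGT"})


def overlay_deletion(normal, start, n_deleted):
    """Overlay a normal allele with an allele carrying a deletion.

    normal    -- normal sequence
    start     -- 0-based index of the first deleted base
    n_deleted -- number of deleted bases
    Returns the mixed sequence a reader would see, position by position,
    over the length of the shorter (deleted) allele.
    """
    mutant = normal[:start] + normal[start + n_deleted:]
    return "".join(IUPAC[frozenset((a, b))] for a, b in zip(normal, mutant))


# Hypothetical ten-base stretch with an AG deletion at index 4.
print(overlay_deletion("TTAGAGTGTC", 4, 2))
```

Upstream of the deletion the two alleles agree and single bases are called. From the deletion onward the alleles are out of register, so most positions carry two different bases.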
Table 10.2 Software Programs Commonly Used to Analyze and Apply Sequence Data
Software          Name                                 Application
BLAST             Basic Local Alignment Search Tool    Compares an input sequence with all sequences in a selected database
GRAIL                                                  Finds gene-coding regions in DNA sequences
FASTA                                                  Rapid alignment of pairs of sequences by sequence patterns rather than individual nucleotides
Phred                                                  Reads bases from original trace data and recalls the bases, assigning quality values to each base
Polyphred         Polyphred
Phrap
TIGR Assembler
Factura           Factura
SeqScape          SeqScape
Assign            Assign
Matchmaker        Matchmaker
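The quality values that Phred assigns to each base are conventionally expressed on a logarithmic scale, Q = -10 log10(P), where P is the estimated probability that the base call is wrong. The table above does not give this formula; it is added here as the widely used convention, and the function names are hypothetical.

```python
import math


def phred_quality(p_error):
    """Quality value for an estimated base-call error probability."""
    return -10.0 * math.log10(p_error)


def error_probability(quality):
    """Estimated error probability for a quality value."""
    return 10.0 ** (-quality / 10.0)


for q in (10, 20, 30, 40):
    print(f"Q{q}: about 1 error in {round(1 / error_probability(q))} calls")
```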
Advanced Concepts
Pyrosequencing requires a single-stranded sequencing template. Methods using streptavidin-conjugated
beads have been devised to easily prepare the template. First the region of DNA to be sequenced is
PCR-amplified with one of the PCR primers covalently attached to biotin. The amplicons are then
immobilized onto the beads and the nonbiotinylated
strand denatured with NaOH. After several washings
to remove all other reaction components, the
sequencing primer is added and annealed to the pure
single-stranded DNA template.
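[Figure: pyrosequencing reactions. Step 1: polymerase incorporates a dNTP and releases PPi. Step 2: sulfurylase converts APS + PPi to ATP, and the ATP drives the conversion of luciferin to oxyluciferin, producing light that is recorded over time. Step 3: apyrase degrades the remaining dNTP and ATP.]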
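Because light is produced only when the added nucleotide is incorporated, the order of nucleotide additions and the size of each signal together spell out the sequence. The sketch below is an idealized model with hypothetical function names: it assumes a fixed cyclic dispensation order and a signal exactly proportional to the number of bases incorporated.

```python
def pyrogram(synthesized, dispensation_order="ACGT", max_cycles=50):
    """Simulate the signal produced by each nucleotide addition.

    synthesized        -- strand built by the polymerase, 5' to 3'
    dispensation_order -- assumed cyclic order of nucleotide additions
    Returns a list of (nucleotide, signal) pairs, where the signal is
    the number of consecutive bases incorporated in that addition.
    """
    signals = []
    pos = 0
    for _ in range(max_cycles):
        for nt in dispensation_order:
            run = 0
            while pos < len(synthesized) and synthesized[pos] == nt:
                run += 1
                pos += 1
            signals.append((nt, run))
            if pos == len(synthesized):
                return signals
    return signals


def read_pyrogram(signals):
    """Convert (nucleotide, signal) pairs back into a sequence."""
    return "".join(nt * count for nt, count in signals)


signals = pyrogram("GGATC")
print(signals)
print(read_pyrogram(signals))
```

A run of identical bases gives one larger peak rather than several separate ones, which is why long homopolymer stretches are harder to quantify by this approach.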
est are purified from the gel. The purified fragments are
denatured with heat (97°C for 5 minutes) and exposed to
bisulfite solution (sodium bisulfite, NaOH, and hydroquinone) for 16-20 hours. During this incubation, the
cytosines in the reaction are deaminated, converting them
to uracils, whereas the 5-methyl cytosines are unchanged.
After the reaction, the treated template is cleaned, precipitated, and resuspended for use as a template for PCR
amplification. The PCR amplicons are then sequenced
in a standard chain termination method. Methylation is
detected by comparing the treated sequence with an untreated sequence and noting where in the treated sequence
C/G base pairs are not changed to U/G; that is,
the sequence will be altered relative to controls at the
unmethylated C residues (Fig. 10-16).
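The comparison of treated and untreated sequences can be sketched as follows. The function names are hypothetical, and the methylated position in the example is an arbitrary choice for illustration. After PCR amplification the uracils are read as thymines, so a real comparison would look for C retained versus C changed to T.

```python
def bisulfite_convert(seq, methylated_positions=()):
    """Convert unmethylated cytosines to uracil.

    methylated_positions -- 0-based indices of 5-methyl cytosines,
                            which are left unchanged
    """
    methylated = set(methylated_positions)
    return "".join(
        "U" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq)
    )


def methylated_cytosines(untreated, treated):
    """Positions where a C in the untreated sequence is still a C."""
    return [
        i
        for i, (before, after) in enumerate(zip(untreated, treated))
        if before == "C" and after == "C"
    ]


untreated = "GTCAGCTATCTATCGTGCA"
print(bisulfite_convert(untreated))            # no methylation
treated = bisulfite_convert(untreated, [13])   # hypothetical methylated C
print(treated, methylated_cytosines(untreated, treated))
```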
Nonsequencing detection methods have also been
devised to detect DNA methylation, such as using restriction enzymes to detect restriction sites generated or
destroyed by the C>U changes. Other methods use PCR
primers that will bind only to the converted or nonconverted sequences so that the presence or absence of
PCR product indicates the methylation status. These
methods, however, are not always applicable to detection of methylation in unexplored sequences. As the role
[Figure 10-16: untreated sequence ...GTCAGCTATCTATCGTGCA... and bisulfite-treated sequence ...GTUAGUTATUTATUGTGUA...]
Bioinformatics
Information technology has had to encompass the vast
amount of data arising from the growing numbers of
sequence discovery methods, especially direct sequencing and array technology. This deluge of information
requires careful storage, organization, and indexing of
large amounts of data. Bioinformatics is the merger of
biology with information technology. Part of the practice
in this field is biological analysis in silico; that is, by
computer rather than in the laboratory. Bioinformatics
dedicated specifically to handling sequence information
is sometimes termed computational biology. A list of
some of the terms used in bioinformatics is shown in
Table 10.3. The handling of the mountains of data being
generated requires continual renewal of stored data, and a
number of databases are available for this purpose.
Definition
Identity
Alignment
Local alignment
Multiple sequence alignment
Optimal alignment
Conservation
Similarity
Algorithm
Domain
Motif
Gap
Homology
Orthology
Paralogy
Query
Annotation
Interface
GenBank
PubMed
SwissProt
Standard expression of sequence data is important for
the clear communication and organized storage of
sequence data. In some cases, such as in heterozygous
mutations, there may be more than one base or mixed
bases at the same position in the sequence. Polymorphic
or heterozygous sequences are written as consensus
sequences, or a family of sequences with proportional
representation of the polymorphic bases. The International Union of Pure and Applied Chemistry and the International Union of Biochemistry and Molecular Biology
(IUB) have assigned a universal nomenclature for mixed,
degenerate, or wobble bases (Table 10.4). The base designations in the IUB code are used to communicate consensus sequences and for computer input of polymorphic
sequence data.
Table 10.4
Symbol    Bases         Mnemonic
A         Adenine       Adenine
C         Cytosine      Cytosine
G         Guanine       Guanine
T         Thymine       Thymine
U         Uracil        Uracil
R         A, G          puRine
Y         C, T          pYrimidine
M         A, C          aMino
K         G, T          Keto
S         C, G          Strong (3 H bonds)
W         A, T          Weak (2 H bonds)
H         A, C, T       Not G
B         C, G, T       Not A
V         A, C, G       Not T
D         A, G, T       Not C
N         A, C, G, T    aNy
X, ?      Unknown       A or C or G or T
O, -      Deletion
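For computer input, the mixed-base designations in Table 10.4 map naturally onto a lookup table. The sketch below shows one possible encoding; the dictionary and function names are hypothetical, and the X, ?, O, and - symbols are left out for brevity.

```python
# Symbol -> bases it can represent (Table 10.4).
IUB_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT", "N": "ACGT",
}
SYMBOL_FOR = {frozenset(bases): symbol for symbol, bases in IUB_CODES.items()}


def consensus_symbol(bases):
    """Return the single symbol that represents a set of observed bases."""
    return SYMBOL_FOR[frozenset(bases)]


def matches(consensus, sequence):
    """True if every base in sequence is allowed by the consensus."""
    return len(consensus) == len(sequence) and all(
        base in IUB_CODES[symbol] for symbol, base in zip(consensus, sequence)
    )


print(consensus_symbol("GT"))        # heterozygous G/T position
print(matches("GCTKGT", "GCTTGT"))   # T is allowed at the K position
print(matches("GCTKGT", "GCTAGT"))   # A is not
```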
Organism                    Genome Size (Mb)    Estimated Number of Genes
Epstein-Barr virus          0.17                80
Mycoplasma genitalium       0.58                470
Haemophilus influenzae      1.8                 1740
Escherichia coli K-12       4.6                 4377
E. coli O157                5.4                 5416
Saccharomyces cerevisiae    12.5                5770
Drosophila melanogaster     180                 13,000
Caenorhabditis elegans      97                  19,000
Arabidopsis thaliana        100                 25,000
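One simple way to compare the genomes in this table is to compute gene density, the estimated number of genes per megabase. The sketch below uses only the values from the table; the dictionary and function names are hypothetical.

```python
# Organism -> (genome size in Mb, estimated number of genes), from the table.
GENOMES = {
    "Epstein-Barr virus": (0.17, 80),
    "Mycoplasma genitalium": (0.58, 470),
    "Haemophilus influenzae": (1.8, 1740),
    "Escherichia coli K-12": (4.6, 4377),
    "E. coli O157": (5.4, 5416),
    "Saccharomyces cerevisiae": (12.5, 5770),
    "Drosophila melanogaster": (180, 13000),
    "Caenorhabditis elegans": (97, 19000),
    "Arabidopsis thaliana": (100, 25000),
}


def genes_per_mb(size_mb, genes):
    """Estimated number of genes per megabase of genome."""
    return genes / size_mb


for organism, (size_mb, genes) in GENOMES.items():
    print(f"{organism}: {genes_per_mb(size_mb, genes):.0f} genes per Mb")
```

The bacterial genomes in the table carry several hundred to about a thousand genes per megabase, whereas the multicellular organisms listed carry far fewer.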
[Figure labels: whole genome; known regions of individual chromosomes; random reads; assembly; anchoring; genome assembly.]
Chromosome    Completion Date
22            December 1999
21            May 2000
20            December 2001
14            January 2003
Y             June 2003
7             July 2003
6             October 2003
13            March 2004
19            March 2004
10            May 2004
9             May 2004
5             September 2004
16            December 2004
X             March 2005
2             April 2005
4             April 2005
8             January 2006
11            March 2006
12            March 2006
17            April 2006
3             April 2006
1             May 2006
STUDY
QUESTIONS
References
1. Amos J, Grody W. Development and integration of
molecular genetic tests into clinical practice: The US
experience. Expert Review of Molecular Diagnostics
2004;4(4):465-77.
2. Maxam A, Gilbert W. Sequencing end-labeled DNA
with base-specific chemical cleavage. Methods in
Enzymology 1980;65:499-560.
3. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain terminating inhibitors. Proceedings