Many DNA and RNA molecules are recognized by proteins that interact
preferentially with a specific DNA sequence or a particular RNA
molecule. I address here the structural basis by which these proteins
recognize their target nucleic acid and show in what ways recognition of
RNA and DNA is both similar and different. Sequence-specific DNA-
binding proteins interact with duplex DNA that is in B-form. RNA
molecules, on the other hand, invariably consist of duplex regions, often
stacked one on another, that are A-form, as well as regions of single-
stranded loops and bulges, making possible a more complex and richly
varied three-dimensional shape than can be assumed by duplex DNA.
Presently, the crystallographic and nuclear magnetic resonance (NMR)
structural database of proteins complexed with DNA is very large,
revealing some patterns and general conclusions about the source of
sequence-specific DNA recognition (for reviews, see Steitz 1990; Har-
rison 1991; Pabo and Sauer 1992). On the other hand, the structural data-
base for RNA-binding proteins, particularly in complex with RNA, is
very meager indeed, so that any generalizations made may soon be over-
turned by the next structure determination of an RNA-protein complex.
Nevertheless, some patterns of similarity and difference in the structural
basis of nucleic acid recognition by proteins can be seen at this time.
Structural, biochemical, and molecular genetic studies of protein-nucleic
acid complexes have established at least three important sources
of sequence specificity in protein-nucleic acid interactions: (1) Direct
hydrogen bonding and van der Waals interaction between protein side
chains and the exposed edges of base pairs provide structural com-
plementarity to the correct, but not to the incorrect, sequences. The inter-
actions are primarily, but not exclusively, in the major groove of B-DNA
The RNA World © 1993 Cold Spring Harbor Laboratory Press 0-87969-380-0
For conditions see www.cshlpress.com/copyright.
220 T.A. Steitz
and to both the minor groove and the major groove at the end of a helix
or at a bulge in RNA structures. (2) The sequence-dependent bendability
or deformability of duplex DNA or RNA molecules provides sequence
selectivity by virtue of the ability of some nucleic acid sequences to take
up a particular structure required for binding to a protein at a lower free
energy cost than other sequences. (3) Bases of RNA that are in single-
stranded regions or in bulges can be directly recognized by pockets on
the protein that are complementary to these bases in shape and hydrogen-
bonding capabilities.
Let us first consider the problem confronting proteins interacting with ei-
ther duplex DNA or the duplex portion of an RNA molecule. The three-
dimensional structure of double-stranded DNA is highly polymorphic
(Kennard and Hunter 1989), but variations of two forms, A-form and B-
form, are of relevance to the proteins of interest here. Figure 1 shows an
important difference between A-form and B-form DNA. In B-DNA, the
major groove is wide enough to accommodate either an α-helix or an
antiparallel β-ribbon, and the functional groups on the exposed edges of
the base pairs can be directly contacted by side chains of the protein. The
minor groove, on the other hand, is deep and narrow (5.8 Å wide) and
thus less accessible to secondary structures such as an α-helix. For RNA,
which is always A-form, the opposite is true. The minor groove is shallow
and broad (10-11 Å wide), whereas the major groove is very deep
and narrow (4 Å) (Delarue and Moras 1989). The width of the minor
groove in B-DNA varies depending on its base composition; AT-rich
sequences have a narrower minor groove (3.5 Å) than GC-rich sequences
(Yoon et al. 1988). Where adequate information is available, it appears
(as might be expected) that in general most DNA-binding proteins direct-
ly decode DNA sequences via interactions in the major groove, although
some important exceptions are known. Escherichia coli integration host
factor and eukaryotic TFIID appear to recognize sequences by interac-
tions in the minor groove, although co-crystal structures of these proteins
interacting with DNA are not yet available.
Whereas on the basis of RNA structure alone one might expect
proteins to discriminate among duplex RNA molecules by interactions
with sequences via the minor groove, examples of interaction between
protein and RNA in both the major and minor groove are now known
(Rould et al. 1989; Ruff et al. 1991). Although it is true that the edges of
base pairs are inaccessible in the major groove of A-form RNA in the
central portion of a long duplex, most naturally occurring RNA mole-
Protein Recognition of RNA and DNA 221
Figure 1 Structures of A-form (top) and B-form (bottom) DNA in space-filling
representation showing differences in major and minor groove widths and
shapes. In the models on the left, the helix axes are parallel to the page; on the
right, the helix axes have been tilted up by 32° to show the groove shapes. Bases
are colored blue, phosphorus atoms are green, and all other atoms are white. The
edges of the bases are easily accessible from the major groove of B-DNA and
the minor or shallow groove of A-DNA (or RNA). (m) Minor groove; (M) major
groove. (Reprinted, with permission, from Steitz 1990.)
Figure 2 Hydrogen-bond donors and acceptors presented by Watson-Crick pairs
to the major groove and the minor groove (adapted from Lewis et al. 1985). The
symbols for hydrogen-bond donors (hourglasses) and acceptors (diamonds)
(Woodbury et al. 1980) show a varied pattern presented by the base pairs into
the major groove and a poor information array presented into the minor groove.
Although it is possible to distinguish among AT, TA, GC, and CG in the major
groove, functional groups in the minor groove allow easy discrimination only
between AT- and GC-containing base pairs. (Open circle) Methyl group.
(Reprinted, with permission, from Steitz 1990.)
base pair of the acceptor stem between nucleotides U1 and A72 (Rould et
al. 1989). For glutaminyl-tRNA synthetase (GlnRS) recognition in charging
of tRNA, it is important that this base pair not be GC (Yarus et al.
1977). The added free energy cost of breaking the GC base pair makes
tRNAs containing a GC at 1-72 less suitable for proper binding to the
enzyme, reducing kcat/Km by about 10-fold (Jahn et al. 1991). In a second
example, the 3',5'-exonuclease active site of E. coli DNA polymerase I
is observed to denature duplex DNA and bind four single-stranded
nucleotides at the 3' terminus (Freemont et al. 1988). In a competition
between the duplex-binding polymerase active site and the single-strand-
binding exonuclease active site for the 3' end of the primer strand,
duplex DNA containing a mismatched base pair will bind to the exonuclease
site with greater frequency than a correctly matched duplex,
thus enhancing the editing out of mismatched base pairs (Joyce and Steitz
1987; Freemont et al. 1988).
Sequence recognition in RNA also arises from the sequence-dependent
ability of single-stranded RNA to take up the conformation required
for protein binding, as occurs in the single-stranded acceptor end
of tRNAGln (Fig. 4). The observed interaction between the N2 of G73
and the backbone phosphate of A72 is not possible for the other three
bases (Rould et al. 1989), consistent with the observation that changing
G73 to A, C, or U reduces the kcat/Km for charging by one, three, and
four orders of magnitude, respectively (Jahn et al. 1991). Furthermore,
two non-Watson-Crick base pairs are formed at the end of the anticodon
stem in tRNAGln (see Fig. 8), producing a structure that is recognized by
the synthetase (Rould et al. 1991). Other bases unable to make these
non-Watson-Crick base pairs would not allow formation of the structure
being recognized and bound to this enzyme. Although binding of
tRNAAsp to its cognate synthetase results in a very major change in the
conformation of the anticodon loop (Ruff et al. 1991), it has not yet been
published whether any part of this structural change involves alterations
in RNA-RNA interactions that are dependent on the RNA sequence,
as occurs with GlnRS.
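The reductions in kcat/Km quoted above can be put on a free energy scale. The sketch below is an illustration only and is not taken from the cited studies: it assumes the usual transition-state relation, in which a fold reduction in kcat/Km corresponds to an apparent penalty of RT ln(fold), and it assumes a temperature of 298.15 K. The function name and the labels are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_reduction(fold, temperature_k=298.15):
    """Apparent free energy penalty (kcal/mol) for a fold reduction in kcat/Km.

    Assumes ddG = RT * ln(fold); the default temperature is an assumption,
    not a value given in the text.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return R_KCAL * temperature_k * math.log(fold)


# Fold reductions quoted in the text (Jahn et al. 1991).
examples = [
    ("GC at 1-72", 10),
    ("G73 to A", 1e1),
    ("G73 to C", 1e3),
    ("G73 to U", 1e4),
]

for label, fold in examples:
    print(f"{label}: {ddg_from_fold_reduction(fold):.2f} kcal/mol")
```

Under these assumptions each order of magnitude in kcat/Km is worth roughly 1.4 kcal/mol, so the four-orders-of-magnitude effect at G73 is about four times the penalty attributed to a GC pair at 1-72.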
Figure 4 Conformation of the end of the acceptor stem and the 3' strand in
tRNAGln bound to GlnRS (from Rould and Steitz 1992). The expected base pair
between U1 and A72 is broken by Leu-136, which packs against the guanine of
the G2-C71 base pair. The 2-amino group of guanine 73 hydrogen-bonds to the
phosphate backbone, stabilizing the hairpin conformation of the 3' strand into
the active site. Cytosine 74 binds into a tight pocket in the protein, allowing the
bases of nucleotides 73, 75, and 76 to stack.
it is also making at least two other hydrogen bonds with obligate donors
or acceptors on the protein and is sequestered from bulk solvent. In this
circumstance, the two unsatisfied water H-bond donors/acceptors
directed toward the nucleic acid become obligate donors/acceptors and
consequently become part of the H-bonding template surface of the
protein to which the nucleic acid must be complementary for optimal
binding (Fig. 5). In the trp repressor-DNA complex, there are three water
molecules per half operator bound in the major groove between the
protein and the DNA bases; at least two of them appear to be making
Figure 5 Schematic drawing showing how a water molecule can be specifically
oriented by interactions with the protein, turning it into a surrogate side chain.
For example, here two obligate proton donors from the protein bind a water
molecule such that it requires H-bond acceptors on the nucleic acid.
hydrogen bonds that specify base pairs 5, 6, and 7 from the dyad axis
(Otwinowski et al. 1988; Steitz 1990). In this case, water molecules are
playing the role of "honorary" protein side chains. In the GlnRS complex
with tRNA, two buried water molecules are an integral part of the
hydrogen-bonding matrix presented in the shallow groove of the tRNA
acceptor stem (Rould et al. 1989; Rould and Steitz 1992). Hydrogen
bonds between these two water molecules, as well as both a buried
carboxylate of Asp-235 and a backbone amide of residue 183, serve to
orient one hydrogen-bond donor of water toward the O2 of cytosine 71
and one acceptor toward the N2 of guanine (Fig. 6).
As pointed out by Seeman et al. (1976), there are fewer features presented
by base pairs in the minor groove that allow discrimination among the
two base pairs in their two orientations (Fig. 2). The hydrogen-bond
acceptors (N3 on guanine and adenine and O2 on cytosine and thymine)
occur in almost the identical place in the minor groove for all four bases.
Only the exocyclic N2 of guanine distinguishes AT from GC and perhaps
GC from CG. Furthermore, the minor groove of B-DNA is in general
too narrow to accommodate an α-helix or too deep for bases to be
reached by side chains alone.
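The difference in information content between the two grooves can be made concrete with a small symbolic encoding. This is a simplified sketch, not a structural model: each Watson-Crick pair is reduced to the ordered string of groups it presents across a groove (A, hydrogen-bond acceptor; D, donor; M, thymine methyl; H, nonpolar hydrogen). The letter codes and the function name are illustrative assumptions. Because the encoding ignores where each group sits in space, it cannot capture the possibility, noted above, that the position of the guanine N2 might distinguish GC from CG.

```python
# Groups met in order across each groove, reading the pair from one strand.
# A = H-bond acceptor, D = H-bond donor, M = methyl, H = nonpolar hydrogen.
MAJOR_GROOVE = {
    "AT": "ADAM",  # A N7, A N6-H, T O4, T methyl
    "TA": "MADA",
    "GC": "AADH",  # G N7, G O6, C N4-H, C H5
    "CG": "HDAA",
}
MINOR_GROOVE = {
    "AT": "AHA",  # A N3, A C2-H, T O2
    "TA": "AHA",
    "GC": "ADA",  # G N3, G N2-H, C O2
    "CG": "ADA",
}


def distinguishable_classes(groove):
    """Group base pairs that present an identical pattern in a groove."""
    classes = {}
    for pair, pattern in groove.items():
        classes.setdefault(pattern, []).append(pair)
    return sorted(sorted(group) for group in classes.values())


print("major:", distinguishable_classes(MAJOR_GROOVE))
print("minor:", distinguishable_classes(MINOR_GROOVE))
```

In this encoding the major groove yields four classes, one per base pair and orientation, whereas the minor groove yields only two, AT-containing and GC-containing pairs.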
Figure 6 View of the recognition interface between GlnRS and base pairs
G2-C71 and G3-C70 of tRNAGln (from Rould and Steitz 1992). Asp-235 directly
bonds to the 2-amino group of guanine 3 via the minor groove. The backbone
carbonyl of Pro-181 is rigidly directed to hydrogen-bond to the 2-amino group
of guanine 2. A network of water molecules between the protein and the minor
groove of the tRNA, only two of which are shown here, appears to enforce a
requirement for GC base pairs at these positions. The hydrophobic environment
formed by the proline, phenylalanine, isoleucine, and the underside of the ribose
sugars enhances the strength and specificity of these direct and water-mediated
hydrogen bonds.
There are ways, however, in which the interactions in the minor groove
can be made sequence-specific. For example, the sequence preferences
exhibited in the DNase I cleavage of DNA arise from its interactions in
the minor groove (Suck et al. 1988). The side chain of a tyrosine observed
to bind in the minor groove will fit into the normal-width minor
groove but not into the narrower minor groove that characterizes AT-rich
sequences.
Biochemical evidence for several sequence-specific DNA-binding
proteins implies that they interact with DNA via the minor groove, al-
though direct structural visualization of such an interaction has not yet
been achieved. Yang and Nash (1989) have argued on the basis of
methylation protection studies that E. coli integration host factor (IHF)
interacts in the minor groove. IHF has significant sequence similarity
with E. coli Hu protein, whose crystal structure (Tanaka et al. 1984)
shows two long antiparallel β-loops, one from each subunit of the dimer,
which form outstretched arms that create a large cleft sufficient in size to
accommodate duplex DNA. The model for an IHF-DNA complex (Yang
and Nash 1989) is based on one for Hu-DNA (Tanaka et al. 1984) and
places the antiparallel β-loops in the minor groove in a manner proposed
earlier for antiparallel β-strands (Carter and Kraut 1974; Church et al.
1977). The recently determined crystal structure of Arabidopsis thaliana
TFIID similarly portrays a protein with pseudo-dyad symmetry and a
twisted, antiparallel β-sheet forming a cleft of size sufficient to
accommodate B-DNA (Nikolov et al. 1992). Biochemical data likewise point to
minor groove interaction by this protein (Lee et al. 1991; Starr and
Hawley 1991), although the structural basis of this interaction is not yet
established.
Forsyth and Schimmel 1992). The alanine synthetase has been clearly
shown to recognize base pair 3-70, which is GU in tRNAAla. Replacing
G3 by an inosine, which lacks the N2, dramatically reduces charging of a
minihelix (Musier-Forsyth and Schimmel 1992).
Figure 7 Stereo view of the binding pockets for the anticodon bases C34, U35, and G36 (dark) in the GlnRS-tRNAGln complex.
Each nucleotide is recognized primarily by a single short polypeptide segment in the enzyme (light). In each case, an arginine or
lysine from the polypeptide anchors the nucleotide by its phosphate group, allowing peptide backbone and side chains of the seg-
ment to specifically recognize the base (from Rould and Steitz 1992).
Figure 8 Stereo view of the two novel non-Watson-Crick base pairs that extend
the anticodon stem of tRNAGln when complexed to GlnRS, showing the water
network between these bases and the sugar-phosphate backbone. Asp-370
directly contacts both base pairs via the minor groove.
SUMMARY
REFERENCES
Aggarwal, A.K., D.W. Rodgers, M. Drottar, M. Ptashne, and S.C. Harrison. 1988.
Recognition of a DNA operator by the repressor of phage 434: A view at high resolution.
Science 242: 99-107.
Carter, C.W. and J. Kraut. 1974. A proposed model for interaction of polypeptides with
RNA. Proc. Natl. Acad. Sci. 71: 283-287.
Church, G.M., J.L. Sussman, and S.-H. Kim. 1977. Secondary structure complementarity
between DNA and proteins. Proc. Natl. Acad. Sci. 74: 1458-1462.
Conley, J., H. Uemura, F. Yamao, J. Rogers, and D. Söll. 1988. E. coli glutaminyl tRNA
synthetase: A single amino acid replacement relaxes tRNA specificity. Protein
Sequences Data Anal. 1: 479-485.
Delarue, M. and D. Moras. 1989. RNA structure. Nucleic Acids Mol. Biol. 3: 182-196.
Drew, H.R. and A.A. Travers. 1984. DNA structural variations in the E. coli tyrT
promoter. Cell 37: 491-502.
Frederick, C.A., J. Grable, M. Melia, C. Samudzi, L. Jen-Jacobsen, B.-C. Wang, P.
Greene, H.W. Boyer, and J.M. Rosenberg. 1984. Kinked DNA in crystalline complex
with EcoRI endonuclease. Nature 309: 327-331.
Freemont, P.S., J.M. Friedman, L.S. Beese, M.R. Sanderson, and T.A. Steitz. 1988. Co-
crystal structure of an editing complex of Klenow fragment with DNA. Proc. Natl.
Acad. Sci. 85: 8924-8928.
Gartenberg, M.R. and D.M. Crothers. 1988. DNA sequence determinants of CAP-
induced bending and protein binding affinity. Nature 333: 824-829.
Harrison, S.C. 1991. A structural taxonomy of DNA-binding domains. Nature 353:
715-719.
Hou, Y.-M. and P. Schimmel. 1988. A simple structural feature is a major determinant of
the identity of a transfer RNA. Nature 333: 140-145.
Hou, Y.-M., C. Francklyn, and P. Schimmel. 1989. Molecular dissection of a transfer
RNA and the basis for its identity. Trends Biochem. Sci. 14: 233-237.
Jahn, M., J. Rogers, and D. Söll. 1991. Anticodon and acceptor stem nucleotides in
tRNAGln are major recognition elements for E. coli glutaminyl-tRNA synthetase.
Nature 352: 258-260.
Joyce, C.M. and T.A. Steitz. 1987. DNA polymerase I: From crystal structure to function
via genetics. Trends Biochem. Sci. 12: 288-292.
Kennard, O. and W.N. Hunter. 1989. Oligonucleotide structure: A decade of results from
single crystal X-ray diffraction studies. Q. Rev. Biophys. 22: 327-379.
Kiesewetter, S., G. Ott, and M. Sprinzl. 1990. The role of modified purine 64 in
initiator/elongator discrimination of tRNAiMet from yeast and wheat germ. Nucleic Acids
Res. 18: 4677-4682.
Koudelka, G.B., S.C. Harrison, and M. Ptashne. 1987. Effect of non-contacted bases on
the affinity of 434 operator for 434 repressor and cro. Nature 326: 886-888.
Lee, D.K., M. Horikoshi, and R.G. Roeder. 1991. Interaction of TFIID in the minor
groove of the TATA element. Cell 67: 1241-1250.
Lewis, M., J. Wang, and C. Pabo. 1985. Structure of the operator binding domain of
lambda repressor. In Biological macromolecules and assemblies, vol. 2 (ed. F.A.
Jurnak and A. McPherson), pp. 266-287. Wiley, New York.
Matthews, B.W. 1988. No code for recognition. Nature 335: 294-295.
McClain, W.H. and K. Foss. 1988. Changing the identity of a tRNA by introducing a
G-U wobble pair near the 3' acceptor end. Science 240: 793-796.