
Protein Structure

Protein folding: a primer


As we saw in an earlier lecture, proteins are linear chains of amino acid residues linked by peptide bonds. However, these linear chains fold up into remarkably complex three-dimensional (3D) structures which show little of the regularity that is present in DNA and RNA. Consequently, it has proven very difficult to predict the 3D structures of proteins, and they must instead be determined using experimental methods. Currently, we know the structures of perhaps one thousand nonhomologous proteins. These structures are remarkably diverse, varying greatly in shape and surface properties, which reflects the functional diversity of proteins. The folding of a polypeptide into a compact structure is accompanied by a large decrease in entropy (disorder), which is thermodynamically unfavorable. Why, then, do proteins fold? The native conformation of a protein is maintained by a large number of weak, noncovalent interactions (H-bonds, ionic and hydrophobic interactions) that act cooperatively to offset the unfavorable reduction in entropy. These noncovalent interactions provide barely enough enthalpic contribution to overcome the large decrease in protein entropy: the free-energy difference between the folded and denatured states of a protein is typically only about 20–60 kJ mol⁻¹, which amounts to just a handful of hydrogen bonds.

[Figure: Folding. Unfolded protein: disordered (high entropy), few noncovalent (enthalpic) interactions. Folded protein: highly ordered (low entropy), many noncovalent interactions. ΔG = 20–60 kJ mol⁻¹.]
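The stability range above can be put in perspective with the standard relation ΔG° = −RT ln K. The short Python sketch below converts a stability of 20 or 60 kJ mol⁻¹ into an equilibrium constant K = [folded]/[unfolded] and the corresponding fraction of unfolded molecules. It assumes a simple two-state (folded ⇌ unfolded) model and a temperature of 25 °C; both are illustrative assumptions, and the function name `folding_equilibrium` is hypothetical.

```python
import math

R = 8.314462618     # gas constant, J K^-1 mol^-1
T_DEFAULT = 298.15  # assumed temperature (25 °C), in K

def folding_equilibrium(stability_kj_per_mol: float,
                        temperature: float = T_DEFAULT) -> tuple[float, float]:
    """Two-state folded <-> unfolded model (an assumption).

    stability_kj_per_mol is G(unfolded) - G(folded), a positive number.
    Returns (K, fraction_unfolded), where K = [folded]/[unfolded]
    = exp(stability / RT).
    """
    k = math.exp(stability_kj_per_mol * 1000.0 / (R * temperature))
    fraction_unfolded = 1.0 / (1.0 + k)
    return k, fraction_unfolded

# Apply the model to both ends of the 20-60 kJ/mol range quoted in the text.
for stability in (20.0, 60.0):
    k, f_unfolded = folding_equilibrium(stability)
    print(f"{stability:.0f} kJ/mol: K = {k:.1e}, fraction unfolded = {f_unfolded:.1e}")
```

Under these assumptions, 20 kJ mol⁻¹ gives K of roughly 3 × 10³ (about one molecule in 3000 unfolded at any moment), while 60 kJ mol⁻¹ gives roughly 3 × 10¹⁰.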

The Anfinsen paradigm


Since folded proteins are only marginally more stable than their denatured state, they can usually be unfolded (denatured) by high temperatures and by chemicals such as urea and guanidine hydrochloride (GnHCl). An amazing discovery, made by Anfinsen in the 1960s (and for which he was awarded a Nobel Prize in 1972), was that the enzyme ribonuclease, following complete denaturation with 6 M GnHCl and 1 mM mercaptoethanol (to reduce the disulfide bonds), refolded to its native form and regained its enzymatic activity when the GnHCl and mercaptoethanol were removed by dialysis. Thus, the Anfinsen paradigm was established: the information required for correct folding of the protein is contained within its amino acid sequence.

Lecture 6: Protein structure

[Figure: The Anfinsen experiment. Folded protein → unfolded protein (unfolding with GnHCl and mercaptoethanol); unfolded protein → folded protein (dialysis).]

The Levinthal paradox


Many proteins fold into their native conformation in just a few seconds. So how does a protein find its native conformation? You might guess that it randomly searches through conformational space, exploring all possible configurations, until it stumbles upon the most energetically favorable structure. Cyrus Levinthal was the first person to attempt to estimate how long such a conformational search might take. Let us imagine a 100-residue protein in which each residue can assume only one of three discrete conformations, clearly a gross simplification of the real situation, in which each residue can access many different φ, ψ, and ω dihedral angles. The number of possible protein conformations is then 3¹⁰⁰ ≈ 5 × 10⁴⁷. Now let us assume that the protein can explore new conformations at the same rate at which single bonds can reorient (10¹³ structures per second), no doubt a gross overestimate of the true rate at which it can explore conformational space. The time required to search all of conformational space would therefore be 5 × 10⁴⁷ / 10¹³ s⁻¹ = 5 × 10³⁴ s, or about 1.6 × 10²⁷ years! Given that this is vastly greater than the age of the universe (about 14 billion years, or roughly 4 × 10¹⁷ s), it should be clear that there is no way a protein folds by searching all of conformational space for its native conformation. The fact that proteins fold into unique structures directed only by their sequence, yet cannot do so by randomly searching conformational space, is known as the Levinthal paradox. Dr Peng will help to resolve the paradox in a later lecture!
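Levinthal's arithmetic is easy to reproduce. The Python sketch below uses the same assumptions as the text (100 residues, three states per residue, 10¹³ conformations sampled per second) plus a 365.25-day year; the function name `levinthal_search_time` is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed 365.25-day year

def levinthal_search_time(n_residues: int = 100,
                          states_per_residue: int = 3,
                          rate_per_second: float = 1e13) -> tuple[int, float, float]:
    """Time to enumerate every conformation, as in Levinthal's estimate.

    Returns (number of conformations, time in seconds, time in years).
    """
    n_conformations = states_per_residue ** n_residues  # exact integer
    seconds = n_conformations / rate_per_second         # float division
    return n_conformations, seconds, seconds / SECONDS_PER_YEAR

n_conf, t_seconds, t_years = levinthal_search_time()
print(f"{n_conf:.1e} conformations -> {t_seconds:.1e} s = {t_years:.1e} years")
```

Python's arbitrary-precision integers hold 3¹⁰⁰ exactly, so rounding only enters at the final division. Changing the arguments shows how quickly the estimate grows with chain length.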


Hierarchy of protein structure


“Perhaps the most remarkable features of the molecule are its complexity and its lack of symmetry. The arrangement seems to be almost completely lacking in the kind of regularities which one instinctively anticipates, and is more complicated than has been predicted by any theory of protein structure.”
(John Kendrew, 1958)

The above statement was made by Sir John Kendrew about the crystal structure of myoglobin (for which he shared the Nobel Prize in Chemistry in 1962). Four decades and several thousand protein structures later, we continue to be amazed by the beauty and dazzling complexity of protein structures, which are much more diverse than those of nucleic acids. In order to make sense of this structural complexity, it helps to consider protein structure in terms of the hierarchy proposed by the Danish biochemist Kai Linderstrøm-Lang:

1. Primary structure: the amino acid sequence of the protein.
2. Secondary structure: local, regular structures (such as α helices and β strands) that are stabilized by short-range backbone hydrogen bonds. You will sometimes see the term supersecondary structure applied to patterns of secondary structure that occur frequently in proteins. One common recurring pattern is the β-α-β motif, which has a segment of β structure, an intervening α helix, and a second β strand that is hydrogen-bonded to the first.


3. Tertiary structure: the tertiary structure defines the way in which the various elements of secondary structure come together in three dimensions, stabilized by interactions between residues that are often far apart in the amino acid sequence. The tertiary structure often consists of domains. These are a common feature of many globular proteins, particularly those with a molecular mass greater than 20 kDa. The domains of a protein interact with one another to varying degrees, but less so than the structural elements within a domain. Larger proteins are often folded so that each domain is ~17 kDa. For example, in E. coli, the enzymes phosphoribosyl anthranilate isomerase and indoleglycerol phosphate synthase are actually separate domains of a single polypeptide chain. The enzyme pyruvate kinase has four structurally diverse domains.

Some protein domains have been conserved throughout the course of protein evolution and are mixed and matched with other domains in a wide variety of proteins with different functions. These evolutionarily conserved domains are known as protein modules.


4. Quaternary structure: this term applies only to proteins in which two or more separate polypeptide chains (subunits) associate to form an oligomeric structure, stabilized mainly by noncovalent interactions.

How can the structure of a protein be described?


One could completely describe the three-dimensional structure of a protein by placing it in a Cartesian coordinate system and listing (x, y, z) coordinates for each atom in the protein. Indeed, this is how experimentally determined protein structures are stored electronically in the Protein Data Bank (see http://www.rcsb.org/pdb/). However, the structure of a protein can be described more succinctly by listing the angles of rotation (known as torsion or dihedral angles) of each of the bonds in the protein. For example, the backbone conformation of an amino acid residue can be specified by listing the torsion angles φ (rotation around the N–Cα bond), ψ (rotation around the Cα–C' bond), and ω (rotation around the C'–N bond). These torsion angles are illustrated below.

The zero position for φ is defined with the N–H group trans to the Cα–C' bond, and that for ψ with the Cα–N bond trans to the C=O bond. A full description of the three-dimensional structure of a protein also requires knowledge of the sidechain torsion angles (χ).
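The torsion-angle definitions above can be made concrete by computing a dihedral angle from the (x, y, z) coordinates of four atoms, as stored in PDB files. The Python sketch below uses the common atan2-based formula and the IUPAC sign convention (positive when, viewed along the central bond, the near bond turns clockwise to eclipse the far bond). The function names and the toy coordinates are illustrative, not taken from any particular library.

```python
import math

Vec = tuple[float, float, float]

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle in degrees (-180 to 180) about the p1-p2 bond.

    0 deg: p0 and p3 eclipsed (cis); +/-180 deg: trans. Positive when, looking
    from p1 towards p2, the near bond p1-p0 turns clockwise to eclipse p2-p3.
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Backbone angles of residue i (atom names as in the text):
#   phi   = dihedral(C'(i-1), N(i), CA(i), C'(i))
#   psi   = dihedral(N(i), CA(i), C'(i), N(i+1))
#   omega = dihedral(CA(i), C'(i), N(i+1), CA(i+1))
# Toy coordinates (not a real protein) with the central bond along z:
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis arrangement
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans arrangement
```

Using atan2 rather than an arccosine keeps the sign of the angle and avoids precision problems near 0° and 180°.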

Are there any restrictions on the conformations which proteins can adopt?
Yes, there are numerous conformations that are highly unfavorable. For a start, the peptide-bond torsion angle (ω) is generally restricted to 180°. The C–N bond has partial double-bond character, due to resonance between the two forms shown below:

O=C–N–H  ↔  ⁻O–C=N⁺–H


The length of the C–N bond (1.32 Å) is intermediate between that of a C–N single bond (1.49 Å) and that of a C=N double bond (1.29 Å). The partial double-bond character restricts rotation around the C–N bond, such that the favored arrangement is for the O, C, N, and H atoms to be coplanar, with the O and H atoms trans to each other; this corresponds to a torsion angle of ω = 180°. The cis conformation, which in proteins generally occurs only for X-Pro peptide bonds, corresponds to ω = 0°. Furthermore, not all possible combinations of φ and ψ angles are available to proteins, as many lead to clashes between atoms in adjacent residues. For all residues except glycine, such steric restrictions drastically reduce the number of energetically favorable conformations. The possible combinations of φ and ψ angles that do not lead to clashes can be plotted on a conformation map known as a Ramachandran plot, named after the Indian biochemist G.N. Ramachandran (1922–2001), who first calculated these maps. The left-hand panel in the figure below shows the Ramachandran plot calculated for the allowed conformations of alanylalanine (Ala-Ala). The red areas represent backbone conformations (combinations of φ and ψ) for which no steric hindrance exists. The yellow areas represent conformations for which some steric hindrance exists, but which may be possible if the distortion can be compensated for by interactions elsewhere in the protein. The white areas represent energetically unfavorable conformations. The right-hand panel in the figure below shows the actual (φ, ψ) combinations for residues in several highly refined protein crystal structures, superimposed on the theoretical Ramachandran plot (green). Clearly, there is a good match between experiment and theory.
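A structure-checking script might flag unusual peptide bonds by comparing each ω value with the two planar arrangements described above (trans, ω ≈ 180°; cis, ω ≈ 0°). The Python sketch below does this; the 30° tolerance and the function names are illustrative assumptions, not a standard.

```python
def wrap_degrees(angle: float) -> float:
    """Map any angle onto the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def classify_omega(omega: float, tolerance: float = 30.0) -> str:
    """Label a peptide-bond torsion angle as 'trans', 'cis' or 'other'.

    trans: omega close to 180 deg; cis: omega close to 0 deg.
    The 30-degree tolerance is an illustrative choice, not a standard.
    """
    w = wrap_degrees(omega)
    if abs(abs(w) - 180.0) <= tolerance:
        return "trans"
    if abs(w) <= tolerance:
        return "cis"
    return "other"

for omega in (178.5, -176.0, 4.0, 95.0):
    print(omega, classify_omega(omega))
```

Wrapping first means that values such as −176° and 184° are treated as the same near-trans conformation.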

There are also energetically favored conformations for the amino acid sidechains (i.e., preferred values of the χ dihedral angles). This stems from the fact that the most energetically favored conformations about a bond between two tetrahedrally coordinated carbon atoms are the so-called staggered conformations, in which the substituents on one carbon atom lie midway between those on the other carbon atom when viewed down the C–C bond. There are three possible staggered conformations (rotamers), related by 120° rotations around the C–C bond. For alanine, the three rotamers are energetically equivalent because the three substituents on the sidechain carbon are equivalent (i.e., hydrogen atoms). However, this is not the case when the substituents differ; in this case, one rotamer may be energetically favored over the others. For example, in the case of valine, the first rotamer shown below is energetically favored because both methyl groups are close to the small hydrogen atom, thus minimizing steric repulsion. In the other two rotamers, one of the methyls is close to the much larger carbon or nitrogen substituent on the α-carbon atom.
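Because the staggered positions lie at dihedral angles of 60°, 180°, and −60° (equivalently 300°), a measured sidechain χ angle can be assigned to its nearest rotamer by comparing angular distances. The Python sketch below is a minimal version of that idea; the function names are hypothetical, and it deliberately ignores which particular substituents are used to define χ.

```python
# Staggered positions: dihedral angles 120 degrees apart.
STAGGERED = (60.0, 180.0, -60.0)

def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    return abs((a - b + 180.0) % 360.0 - 180.0)

def nearest_staggered(chi: float) -> float:
    """Return the staggered position (60, 180 or -60 deg) closest to chi."""
    return min(STAGGERED, key=lambda s: angular_distance(chi, s))

for chi in (55.0, 170.0, 290.0):
    print(chi, "->", nearest_staggered(chi))
```

Measuring distances modulo 360° is what lets 290° be recognized as close to −60°.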

Protein secondary structure


Linus Pauling (1901–1994; Nobel Prize in Chemistry, 1954; Nobel Peace Prize, 1962) and Robert Corey (1897–1971) surmised in the 1950s (i.e., well before any protein structures had been experimentally determined) that the most stable protein folds would be those that maximized hydrogen bond formation while still maintaining normal bond lengths and bond angles and avoiding unfavorable steric overlap. They showed that there are only two polypeptide folds that adhere to this rule, and they christened them α helices and β sheets. In both of these motifs, the backbone torsion angles of the polypeptide (φ, ψ, and ω) are kept constant from one residue to the next, resulting in a regular repeating structure.

α helices

In an α helix, the polypeptide backbone is folded in such a way that the C=O group of each amino acid residue is hydrogen-bonded to the N–H group of the residue four positions further along the chain; for example, the C=O group of residue 1 bonds to the N–H group of residue 5, and so on. The exceptions are at the N- and C-terminal ends of the helix. At the N-terminal end, the N–H groups of residues 1 to 4 have no available C=O groups with which they can interact, so they remain unbonded. Similarly, at the C-terminal end of the helix, the last four C=O groups remain unbonded. If the helix is very long, the lack of bonding at the ends has a negligible effect on its overall stability. However, short α-helical segments are less stable because the end effects are relatively more important.
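The i → i+4 bonding pattern, and the four unpaired N–H and C=O groups at each end, can be enumerated directly. In the Python sketch below, residues are numbered from 1 and the helix is assumed to be ideal (every possible i → i+4 bond is formed); the function names are hypothetical.

```python
def alpha_helix_hbonds(n_residues: int) -> list[tuple[int, int]]:
    """Backbone H-bonds of an ideal alpha helix, residues numbered 1..n.

    Each pair (i, i + 4) means: C=O of residue i is bonded to N-H of residue i + 4.
    """
    return [(i, i + 4) for i in range(1, n_residues - 3)]

def unpaired_groups(n_residues: int) -> tuple[list[int], list[int]]:
    """Residues whose N-H, and residues whose C=O, have no partner in the helix."""
    bonds = alpha_helix_hbonds(n_residues)
    nh_bonded = {j for _, j in bonds}
    co_bonded = {i for i, _ in bonds}
    residues = range(1, n_residues + 1)
    free_nh = [r for r in residues if r not in nh_bonded]
    free_co = [r for r in residues if r not in co_bonded]
    return free_nh, free_co

for n in (8, 40):
    bonds = alpha_helix_hbonds(n)
    free_nh, free_co = unpaired_groups(n)
    print(f"{n} residues: {len(bonds)} H-bonds, free N-H {free_nh}, "
          f"free C=O {free_co}, fraction bonded {len(bonds) / n:.2f}")
```

For an 8-residue helix only half of the C=O groups are bonded (4 of 8), whereas for a 40-residue helix 36 of 40 are, which is the end effect described above.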

The backbone of the α helix winds around the long axis, as shown in the figure below. The hydrogen bonds are all aligned approximately parallel to this axis, and the sidechains protrude outwards. Since each peptide unit has a small dipole, and these dipoles are all aligned in parallel, the helix has a large macrodipole that is positive at the N-terminal end. Helices often prefer to interact in an antiparallel manner, so that their macrodipoles interact favorably.


Each residue in the α helix is spaced 1.5 Å from the next along the helical axis, and 3.6 residues are required to make a complete helical turn. Although both left-handed and right-handed α helices are theoretically possible (see figure opposite), right-handed helices are energetically favored for L-amino acids. The amino acid sidechains project outwards, roughly orthogonal to the long axis of the helix (see panel A below), and, with the exception of proline, they do not interfere sterically or prevent helix H-bond formation. However, recall that proline is an imino acid in which the sidechain is bonded to the backbone nitrogen atom, thereby forming a ring. This ring causes some steric hindrance in α helices and, because the backbone nitrogen of proline has no attached hydrogen atom, it cannot act as an H-bond donor. Thus, proline is not well tolerated in α helices except in the first turn: since the backbone N–H groups of residues 1 to 4 of the helix have no C=O partners anyway, proline is no different from any other residue in these positions.
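The two numbers above also give the helix pitch (the rise per turn): 1.5 Å × 3.6 = 5.4 Å. The Python sketch below uses the same values to estimate the length and number of turns of an ideal α helix; it counts 1.5 Å per residue, which slightly overestimates the end-to-end distance, and the function name is hypothetical.

```python
RISE_PER_RESIDUE = 1.5    # Å along the helix axis (value from the text)
RESIDUES_PER_TURN = 3.6   # value from the text
PITCH = RISE_PER_RESIDUE * RESIDUES_PER_TURN  # Å per turn, i.e. 5.4 Å

def helix_dimensions(n_residues: int) -> tuple[float, float]:
    """Approximate (length in Å, number of turns) of an ideal alpha helix.

    Counts 1.5 Å per residue, so the length slightly overestimates the
    distance between the first and last residues.
    """
    return n_residues * RISE_PER_RESIDUE, n_residues / RESIDUES_PER_TURN

length, turns = helix_dimensions(18)
print(f"pitch = {PITCH:.1f} Å; 18 residues: {length:.1f} Å, {turns:.1f} turns")
```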

It is often informative to plot the amino acid sequences of α helices in the form of a helical wheel diagram. Since an α helix has 3.6 residues per turn, adjacent residues are separated by 360°/3.6 = 100° on a schematic helical wheel, as illustrated in panel B. The helical wheel essentially shows what you would see if you looked down the helical axis from one end of the helix at a compressed (i.e., two-dimensional) image of the helix. Certain amino acids have weak but definite preferences for or against being in α helices. These preferences are summarized in Table 1.
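The 100° spacing translates directly into wheel coordinates. The Python sketch below places residue k of a sequence (counting from 0) at (100k mod 360)°, with the first residue at 0° on a unit circle; the orientation, the example sequence, and the function name `helical_wheel` are arbitrary choices for illustration.

```python
import math

DEGREES_PER_RESIDUE = 360.0 / 3.6  # 100 degrees, from 3.6 residues per turn

def helical_wheel(sequence: str) -> list[tuple[str, float, float, float]]:
    """Return (residue, angle in degrees, x, y) for each residue.

    The first residue is placed at 0 degrees; x and y lie on a unit circle,
    as if viewing the helix down its axis.
    """
    wheel = []
    for k, residue in enumerate(sequence):
        angle = (k * DEGREES_PER_RESIDUE) % 360.0
        rad = math.radians(angle)
        wheel.append((residue, angle, math.cos(rad), math.sin(rad)))
    return wheel

for residue, angle, x, y in helical_wheel("LKKLLKLLKKLL"):
    print(f"{residue} {angle:5.0f} deg  ({x:+.2f}, {y:+.2f})")
```

For example, residues 1, 4, 5, and 8 of a helix land at 0°, 300°, 40°, and 340°, all within a 100° arc, so they cluster on one side of the wheel.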


Table 1. Tendency of Amino Acid Residues to Form Helices
Helix formers: Glu, Ala, Leu, His, Met, Gln, Trp, Val, Phe, Lys, Ile
Helix breakers: Pro, Gly, Tyr, Asn
Indifferent residues: Asp, Thr, Ser, Arg, Cys
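Table 1 can be encoded as three sets of one-letter codes, for example to tally how many residues of a segment fall into each category. Given how weak these preferences are, such a tally is only a crude descriptor and not a secondary-structure prediction method; the set names, the function name, and the example sequence below are hypothetical.

```python
# Categories copied from Table 1, as one-letter codes.
HELIX_FORMERS = set("EALHMQWVFKI")  # Glu Ala Leu His Met Gln Trp Val Phe Lys Ile
HELIX_BREAKERS = set("PGYN")        # Pro Gly Tyr Asn
INDIFFERENT = set("DTSRC")          # Asp Thr Ser Arg Cys

def helix_tendency_counts(sequence: str) -> dict[str, int]:
    """Count residues in each Table 1 category (case-insensitive)."""
    counts = {"former": 0, "breaker": 0, "indifferent": 0, "unknown": 0}
    for residue in sequence.upper():
        if residue in HELIX_FORMERS:
            counts["former"] += 1
        elif residue in HELIX_BREAKERS:
            counts["breaker"] += 1
        elif residue in INDIFFERENT:
            counts["indifferent"] += 1
        else:
            counts["unknown"] += 1  # not one of the 20 standard one-letter codes
    return counts

print(helix_tendency_counts("MEELLKKAGNPY"))
```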

β-sheet structures

The second major regular repeating structure, the β structure, differs from the α helix in that the polypeptide chains are almost completely extended, and hydrogen bonding occurs between polypeptide strands rather than within a single strand. In parallel β sheets, adjacent strands run in the same direction (i.e., N-terminal to C-terminal), whereas in antiparallel β sheets they run in opposite directions. In antiparallel sheets, the hydrogen bonds are relatively linear but alternate between widely spaced and narrowly spaced pairs, as illustrated below.

In parallel sheets, the hydrogen bonds are evenly spaced, but they are angled across the gap between the strands, as illustrated in the figure overleaf. The sidechains in β sheets alternate between protruding above and below the plane of the sheet. One consequence of this arrangement is that where β sheets are stacked closely against one another, the residues on the contacting surfaces must have relatively small sidechains, such as those of alanine and glycine; large, bulky sidechains there would cause steric interference, which destabilizes the stacking.
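The alternation of sidechains means that, in an idealized strand, residues at odd positions point to one face of the sheet and residues at even positions to the other. The Python sketch below splits a strand sequence accordingly; which face counts as "above" is arbitrary, the strand is assumed to be regular along its whole length, and the function name and example sequence are hypothetical.

```python
def strand_faces(sequence: str) -> tuple[str, str]:
    """Split an idealized beta-strand sequence by sheet face.

    Sidechains alternate above and below the sheet, so residues 1, 3, 5, ...
    share one face and residues 2, 4, 6, ... share the other.
    """
    return sequence[0::2], sequence[1::2]

face_a, face_b = strand_faces("VKVTVEV")
print("face A:", face_a, "| face B:", face_b)
```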

β sheets are generally twisted, often highly so (see panel A below). Note that β sheets often contain a mixture of parallel and antiparallel strands, as illustrated in panel B below.


Turns
β turns, also known as reverse turns or hairpin turns, are another common structural motif found in proteins. β turns consist of four residues, with a hydrogen bond between the carbonyl oxygen (C=O) of residue i and the amide hydrogen (N–H) of residue i+3, and they allow the polypeptide chain to change direction abruptly. β turns are commonly found as the connections between two antiparallel β strands, hence their name. When a β turn is used as a connection between strands, its first residue is actually the last residue of the first β strand, while its fourth residue is actually the first residue of the connected antiparallel strand (thus, in the histogram opposite, n = 2 corresponds to a β turn). A β hairpin is formed by connecting two antiparallel β strands by a β turn.

There are two main types of β turns: type I and type II. They differ mainly in the orientation of the central peptide bond with respect to the sidechain of residue 3. In type I β turns (the more common type), the carbonyl oxygen of the central peptide bond and R₃ point in opposite directions, thus avoiding steric hindrance. In type II β turns, however, these groups point in the same direction, so the small sidechain of glycine is favored at position 3 (residue i+2), as this minimizes steric hindrance. Glycine is found at this position in ~60% of all type II turns.

