Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
.
Hence any structure determined using this method does not include any
long-range information at all. This is not a signicant problem for globular
proteins, where there is usually a high density of distance restraints asso-
ciated with the atoms in the core of the molecule.
Since an NOE gives only an estimate of the distance between two
atoms, it is common practice to assign each NOE to two or more bins,
for example, between 1.8 and 2.7 A
, and between
1.8 and 5.0 A
, ring pla-
narity, and proline puckering, as well as checks for atoms that may be too
close to each other (bumps). The chirality of atom groups is checked by
calculating the improper dihedral angles, which should not deviate
greatly from 35
structure of gene V
binding protein (PDB code 2gn5); on the right is a backbone representation of a
higher-resolution structure (1.8 A
P
n
i1
d
2
i
n
v
u
u
u
t
where d
i
is the distance between two equivalent heavy atoms, i, and there are
n equivalent atoms.
A nontrivial but critical part of the rmsd calculation is to take residue
symmetries into account. For example, a phenylalanine side chain has a
twofold symmetry, which leads chemically identical atoms with dierent
names to occupy the same space. This concerns the following residues
(equivalent atoms are given in parentheses): tyrosine (Co, Cc); phenylalanine
(Co, Cc; valine C,1. C,2); leucine (Co1. Co2); arginine (NH1, NH2); aspar-
tate (Oo1. Oo2); glutamate (Oc1, Oc2). To nd the correct rmsd, one must
swap the indices of specied atoms 1 and 2 and choose a resulting cong-
uration with lower rmsd.
The rmsd works reasonably well if there is no plastic deformation of
the overall structure and all the individual atomic deviations are of the same
order of magnitude. Otherwise, the rmsd value will be dominated by the
largest deviation in a frequently unimportant region (e.g., a surface lysine
side chain) and will not be interpretable.
To address the problem of plastic deformations or domain rearrange-
ments, some methods attempt to identify a subset of aligned pairs that
produce the best rmsd after superposition. For example, Lackner et al.
304 Marsden and Abagyan
2003 by Taylor & Francis Group, LLC
(80) recently have devised a method (ProSup) able to accurately superim-
pose structures that are remotely related in terms of structure while allowing
the quality of a number of structural ts to be ranked purely on the basis of
the number of aligned residue pairs. A lter is also used on aligned residue
pairs whose side chains are incompatible (e.g., in beta-strands that are mis-
aligned by one residue). ProSup supplies a range of possible structural
alignments, which may be useful in the derivation of new, empirically
based error detection methods. ProSup was applied to the results of
CASP3 (81).
5.2 Backbone and Ca Coordinate rmsd
This measure also relies on a global static superposition of all the equivalent
atoms at the same time. The backbone conformation is a more structurally
conservative feature than the side chains. Therefore, measuring the rmsd of
the backbone is a better indicator of signicant structural deformations.
However, the foregoing limitations concerning the relative domain shifts
and plastic deformations still apply. In addition, movements of large exible
loops will create a problem with the backbone rmsd, which is equivalent, in
essence, to the long side-chain deviations that plague heavy-atom rmsd, i.e.,
the loop deviation will dominate the overall backbone rmsd.
Both backbone and all-heavy-atom measures are global in nature and
do not discriminate between evenly and unevenly distributed deformations.
Secondly, in the case of an uneven distribution of the deformation, it does
not tell us where the deformation occurs. Analysis of the individual con-
tributions to the total rmsd resolves the problem.
5.3 Local (Fragment) Coordinate rmsd
The main limitation of the previous two measures is their reliance on a
global superposition of all equivalent atoms. However, frequently this
superposition does not make sense, e.g., in a multidomain protein where
the constituent domains shift with respect to each other without changing
their internal structure. Superimposing only a protein fragment, e.g., 1025
residues, before calculating the local rmsd partially resolves this problem.
The choice of the window size will depend upon the scale of the structural
changes of interest. The net result of this calculation is an array of deviation
values that can be mapped onto the protein backbone.
5.4 Fraction-Correct Measure Based on Local rmsd Prole
Predictions of larger proteins frequently lead to a mosaic pattern of correct
and completely wrong topologies. The best way to characterize the overall
Identifying Errors in 3D Protein Models 305
2003 by Taylor & Francis Group, LLC
quality of such a model is to dene what fraction of the topology is correct.
To do that, we need to postulate a threshold for the local rmsd value. If a
cuto value is applied to the error prole calculated by the fragment rmsd
calculation just described, one can then easily calculate the ratio of the
residues below the threshold.
5.5 Angular rmsd
The previous calculations were based on optimal superposition of equivalent
pairs of atoms followed by the measurement of the deviations of absolute
Cartesian positions of atoms. However, the use of the internal coordinates,
such as torsion angles of the backbone and side chains, avoids the need for
structural superposition and is free from the eects of plastic deformations.
For example, the backbone angular rmsd is calculated as:
rmsd
X
n
i1
2
i
[
2
i
n
v
u
u
u
t
where n is the number of equivalent residues,
i
is the dierence between
the phi angles of the equivalent pair of residues, i, and [
i
is the dierence
between the psi angles of the same equivalent pair of residues.
A similar approach can be applied to the internal coordinates describ-
ing secondary structure topology (82).
5.6 Distance rmsd
Interatomic distances are the third wayand probably the most physical in
natureto characterize protein conformation. Therefore, analyzing changes
of those distances leads us to the distance rmsd measure. A straightforward
measure relying on interatomic distances is the all-atom distance rmsd.
However, the large distances are not as signicant in terms of local error
and can be ltered out. Contact maps and contact map dierences can be
derived to evaluate the conformational changes and errors. A matrix with all
CoCo distances is calculated for each of the two structures. Then the two
matrices can be compared and the dierence contact map derived. A single
number can also be derived to characterize the deviation.
5.7 Contact-Area Difference
Contact-area dierence (CAD) (28) is a normalized sum of absolute dier-
ences of residueresidue contact surface areas calculated between the model
and the true structure. Since this approach reects not only backbone
topology but also side-chain packing, and is smooth, continuous, threshold
306 Marsden and Abagyan
2003 by Taylor & Francis Group, LLC
free, and not aected by plastic deformation, it can be successfully used to
detect dierences between two structures that backbone rmsd is sometimes
unable to discover. And CAD can also be used with a xed window length
to create a prole showing local regions of a model that are signicantly
dierent compared to the true structure.
5.8 Static rmsd
In both global and local rmsd, the same set of atoms is used for super-
position and calculation of the deviations. However, some applications,
e.g., measuring loop errors or peptide placement errors or position of a
small domain with respect to a larger domain, call for a dierent calculation.
In this case the superposition is rst performed over the atoms that are
dierent from those in the area where an error or deviation is expected.
Then the rmsd is calculated only over the atoms of interest. This measure
is called static rmsd, for lack of a better name.
5.9 Relative Displacement Error
The relative displacement error is a measure that can be used to evaluate the
accuracy of a loops that can be partially correct. In this case, the local
prole is calculated with a static rmsd procedure, threshold (similar to the
fraction-correct method earlier) is dened, and the number of correctly
predicted loop residues is calculated.
6 OUTLOOK
The move toward more automated and higher-throughput methods of pro-
tein structure prediction and experimental determination provides new chal-
lenges to the protein modeler. On these large scales of data throughput, it is
not possible to carefully hand-inspect each stage of the structure determina-
tion of each protein. This lack of quality-control will inevitably lead to
less accurate models than those models that are hand-determined by a per-
son with an experienced eye.
The current lack of consistent error annotation for all generated or
predicted models is akin to the lack of expiration dates on food product-
syou might end up drinking milk that is far past its best, with nasty
consequences! Similarly, a model should be labeled similar to: Protein
structure health warning: the use of fragments 1024 and 152200 is hazar-
dous for your predictions and may lead to a sudden lack of employment!
Therefore it is becoming increasingly important that new methods for
error detection and prediction be created that can automatically annotate
model structures, giving an indication of the local reliability of the model.
Identifying Errors in 3D Protein Models 307
2003 by Taylor & Francis Group, LLC
Better statistics on erroneous structures, new, and more accurate, force
elds, better global optimization algorithms, and combination approaches
will lead to a better, more useful models.
ACKNOWLEDGEMENTS
We would like to thank Maxim Totrov, Sergei Batalov, and Wen Hwa Lee
for interesting discussions and help in the preparation of this chapter. We
would like to thank Molsoft LLC for providing the ICM package with
which most gures in this chapter were generated. Finally we would like
to thank the NIH, DOE, and the Wellcome Trust for their nancial support.
REFERENCES
1. Ferentz, A.E. and G. Wagner. NMR spectroscopy: a multifaceted approach
to macromolecular structure. Q Rev Biophys 33(1):2965, 2000.
2. Jones, T.A. and G.J. Kleywegt. CASP3 comparative modeling evaluation.
Proteins 37(S3):3046, 1999.
3. Bru nger, A.T., P.D. Adams, and L.M. Rice. Recent developments for the
ecient crystallographic renement of macromolecular structures. Curr
Opin Struct Biol 8(5):60611, 1998.
4. Adams, P.D. and R.W. Grosse-Kunstleve. Recent developments in software
for the automation of crystallographic macromolecular structure determina-
tion. Curr Opin Struct Biol 10(5):564568, 2000.
5. Stevens, R.C. High-throughput protein crystallization. Curr Opin Struct Biol
10(5):558563, 2000.
6. Xu, Y., J. Wu, D. Gorenstein, and W. Braun. Automated 2D NOESY assign-
ment and structure calculation of Crambin(S22/I25) with the self-correcting
distance geometry-based NOAH/DIAMOD programs. J Magn Reson
136(1):7685, 1999.
7. Mumenthaler, C., P. Guntert, W. Braun, and K. Wuthrich. Automated com-
bined assignment of NOESY spectra and three-dimensional protein structure
determination. J Biomol NMR 10(4):351362, 1997.
8. Abagyan, R., S. Batalov, T. Cardozo, M. Totrov, J. Webber, and Y. Zhou.
Homology modeling with internal coordinate mechanics: deformation zone
mapping and improvements of models via conformational search. Proteins
Suppl(1):2937, 1997.
9. Cardozo, T., M. Totrov, and R. Abagyan. Homology modeling by the ICM
method. Proteins 23(3):403414, 1995.
10. Guex, N. and M.C. Peitsch. SWISS-MODEL and the Swiss-PdbViewer: an
environment for comparative protein modeling. Electrophoresis
18(15):27142723, 1997.
11. Sali, A. and T.L. Blundell. Comparative protein modeling by satisfaction of
spatial restraints. J Mol Biol 234(3):779815, 1993.
308 Marsden and Abagyan
2003 by Taylor & Francis Group, LLC
12. Sanchez, R. and A. Sali. Large-scale protein structure modeling of the
Saccharomyces cerevisiae genome. Proc Natl Acad Sci USA,
95(23):1359713602, 1998.
13. Berman, H.M., T.N. Bhat, P.E. Bourne, Z. Feng, G. Gilliland, H. Weissig,
and J. Westbrook. The Protein Data Bank and the challenge of structural
genomics. Nat Struct Biol 7 Suppl:957959, 2000.
14. Hooft, R.W., G. Vriend, C. Sander, and E.E. Abola. Errors in protein
structures. Nature 381(6580):272, 1996.
15. Howell, P.L., S.C. Almo, M.R. Parsons, J. Hajdu, and G.A. Petsko. Structure
determination of turkey egg-white lysozyme using Laue diraction data. Acta
Crystallogr B 48(Pt 2):200207, 1992.
16. Novotny, J., R. Bruccoleri, and M. Karplus. An analysis of incorrectly folded
protein models. Implications for structure predictions. J Mol Biol
177(4):787818, 1984.
17. Novotny, J., A.A. Rashin, and R.E. Bruccoleri. Criteria that discriminate
between native proteins and incorrectly folded models. Proteins 4(1):1930,
1988.
18. Kocher, J.P., M.J. Rooman, and S.J. Wodak. Factors inuencing the ability
of knowledge-based potentials to identify native sequence-structure matches. J
Mol Biol 235(5):15981613, 1994.
19. Jones, D.T. and J.M. Thornton. Potential energy functions for threading.
Curr Opin Struct Biol 6(2):210216, 1996.
20. Bryant, S.H. and S.F. Altschul. Statistics of sequencestructure threading.
Curr Opin Struct Biol 5(2):236244, 1995.
21. Wodak, S.J. and M.J. Rooman. Factors inuencing the ability of knowledge-
based potentials to identify native sequence-structure matches. Curr. Opin.
Struct. Biol. 3:247259, 1993.
22. Sippl, M.J. Recognition of errors in three-dimensional structures of proteins.
Proteins, 17(4):355362, 1993.
23. Wuthrich, K. NMR of Proteins and Nucleic Acids. New York: Wiley,
1986.
24. Jones, T.A., J.Y. Zou, S.W. Cowan, and Kjeldgaard. Improved methods for
binding protein models in electron density maps and the location of errors in
these models. Acta Crystallogr A 47(Pt 2):110119, 1991.
25. Kleywegt, G.J. Validation of protein crystal structures. Acta Crystallogr D
Biol Crystallogr 56(Pt 3):249265, 2000.
26. Kleywegt, G.J. Validation of protein models from C-alpha coordinates alone.
J Mol Biol, 273(2):371386, 1997.
27. Ratnaparkhi, G.S., S. Ramachandran, J.B. Udgaonkar, and R. Varadarajan.
Discrepancies between the NMR and X-ray structures of uncomplexed
barstar: analysis suggests that packing densities of protein structures deter-
mined by NMR are unreliable. Biochemistry 37(19):69586966, 1998.
28. Abagyan, R.A. and M.M. Totrov. Contact area dierence (CAD): a robust
measure to evaluate accuracy of protein models. J Mol Biol 268(3):678685,
1997.
Identifying Errors in 3D Protein Models 309
2003 by Taylor & Francis Group, LLC
29. Kuszewski, J., A.M. Gronenborn, and G.M. Clore. Improving the packing
and accuracy of NMR structures with a pseudopotential for the radius of
gyration. J Am Chem Soc 121(10):23372338, 1999.
30. Gronenborn, A.M. and G.M. Clore. Structures of protein complexes by multi-
dimensional heteronuclear magnetic resonance spectroscopy. Crit Rev
Biochem Molec Biol 30(5):351385, 1997.
31. Rost, B. Twilight zone of protein sequence alignments. Protein Eng
12(2):8594, 1999.
32. Chothia, C. and A.M. Lesk. The relation between the divergence of sequence
and structure in proteins. Embo J 5(4):823826, 1986.
33. Network, E.-D.V. Who checks the checkers? Four validation tools applied to
eight atomic resolution structures. EU 3-D Validation Network. J Mol Biol
276(2):417436, 1998.
34. Guntert, P., C. Mumenthaler, and K. Wuthrich. Torsion angle dynamics for
NMR structure calculation with the new program DYANA. J Mol Biol
273(1):283298., 1997.
35. Engh, R.A. and R. Huber. Accurate bond and angle parameters for X-ray
protein structure renement. Acta Cryst A47:400404, 1991.
36. Pontius, J., J. Richelle, and S.J. Wodak. Deviations from standard atomic
volumes as a quality measure for protein crystal structures. J Mol Biol
264(1):121136, 1996.
37. Laskowski, R.A., J.A. Rullmannn, M.W. MacArthur, R. Kaptein, and J.M.
Thornton. AQUA and PROCHECK-NMR: programs for checking the quality
of protein structures solved by NMR. J Biomol NMR, 8(4):477486, 1996.
38. Colovos, C. and T.O. Yeates. Verication of protein structures: patterns of
nonbonded atomic interactions. Protein Sci 2(9):15111519, 1993.
39. Luthy, R., J.U. Bowie, and D. Eisenberg. Assessment of protein models with
three-dimensional proles. Nature 356(6364):8385, 1992.
40. Eisenberg, D., R. Luthy, and J.U. Bowie. VERIFY3D: assessment of protein
models with three-dimensional proles. Methods Enzymol 277:396404, 1997.
41. Sippl, M.J. Calculation of conformational ensembles from potentials of mean
force. An approach to the knowledge-based prediction of local structures in
globular proteins. J Mol Biol 213(4):859883, 1990.
42. Melo, F. and E. Feytmans. Assessing protein structures with a nonlocal
atomic interaction energy. J Mol Biol 277(5):1141-1152, 1998.
43. Ben-Naim, A. Statistical potentials extracted from protein structures: are these
meaningful potentials? J. Chem. Phys. 107(9):36983706, 1997.
44. Miwa, J.M., I. Ibanez-Tallon, G.W. Crabtree, R. Sanchez, A. Sali, L.W. Role,
and N. Heintz. lynx1, an endogenous toxin-like modulator of nicotinic acet-
ylcholine receptors in the mammalian CNS. Neuron 23(1):105114, 1999.
45. Maiorov, V. and R. Abagyan. Energy strain in three-dimensional protein
structures. Fold Des 3(4):259269, 1998.
46. Abagyan, R. and M. Totrov. Biased probability Monte Carlo conformational
searches and electrostatic calculations for peptides and proteins. J Mol Biol
235(3):9831002, 1994.
310 Marsden and Abagyan
2003 by Taylor & Francis Group, LLC
47. Abagyan, R.A., M. Totrov, and D. Kuznetsov. ICMA new method for
protein modeling and design: applications to docking and structure predicition
from the distorted native conformation. J. Comp. Chem. 15(5):488506, 1993.
48. Branden, C.I. and T.A. Jones. Between objectivity and subjectivity. Nature
343(6260):687689, 1990.
49. Read, R.J. Improved Fourier coecients for maps using phases from partial
structures with errors. Acta Cryst. A42:140149, 1986.
50. Jones, T.A., J.-Y. Zou, C.S. W., and K. M. Improved methods for building
protein models in electron density maps and the location of errors in these
models. Acta Cryst. A47:110119, 1991.
51. Zhou, J. and S.L. Mowbray. An evaluation of the use of databases in protein
structure renement. Acta Cryst. D50:237249, 1993.
52. Vaguine, A.A., J. Richelle, and S.J. Wodak. SFCHECK: a unied set of
procedures for evaluating the quality of macromolecular structurefactor
data and their agreement with the atomic model. Acta Cryst D55:191205,
1999.
53. Hooft, R.W., C. Sander, and G. Vriend. Positioning hydrogen atoms by
optimizing hydrogen-bond networks in protein structures. Proteins
26(4):363376, 1996.
54. Bru nger, A.T. XPLOR Version 3.1. A system for X-ray Crystallography and
NMR. New Haven, CT: Yale University Press, 1992.
55. Bru nger, A.T., P.D. Adams, G.M. Clore, W.L. DeLano, P. Gros, R.W.
Grosse-Kunstleve, J.-S. Jiang, J. Kuszewski, M. Nilges, N.S. Pannu, R.J.
Read, L.M. Rice, S. T., and W.G. L. Crystallography NMR system: a new
software suite for macromolecular structure determination. Acta Cryst.
D54:922937, 1998.
56. Gronwald, W., R. Kirchhofer, A. Gorler, W. Kremer, B. Ganslmeier, K.P.
Neidig, and H.R. Kalbitzer. RFAC, a program for automated NMR R-factor
estimation. J Biomol NMR 17(2):137151, 2000.
57. Lipari, G. and A. Szabo. Model-Free approach to the interpretation of
nuclear magnetic resonance relaxation in macromoloecule. 2. Analysis of
experimental results. J Am Chem Soc 104(17):45590-4570, 1982.
58. Lipari, G. and A. Szabo. Model-free approach to the interpretation of nuclear
magnetic resonance relaxation in macromoloecule. 1. Theory and range of
validity. J Am Chem Soc, 104(17):45464559, 1982.
59. Tjandra, N. and A. Bax. Direct measurement of distances and angles in bio-
molecules by NMR in a dilute liquid crystalline medium. Science
278(5340):11111114, 1997.
60. Tjandra, N., J.G. Omichinski, A.M. Gronenborn, G.M. Clore, and A. Bax.
Use of dipolar 1H-15N and 1H-13C couplings in the structure determination
of magnetically oriented macromolecules in solution. Nat Struct Biol
4(9):732738, 1997.
61. Zweckstetter, M. and A. Bax. Predicition of sterically induced alignment in a
dilute liquid crystalline phase: aid to protein structure determination by
NMR. J Am Chem Soc 122:37913792, 2000.
Identifying Errors in 3D Protein Models 311
2003 by Taylor & Francis Group, LLC
62. Tjandra, N., D.S. Garrett, A.M. Gronenborn, A. Bax, and G.M. Clore.
Dening long range order in NMR structure determination from the depen-
dence of heteronuclear relaxation times on rotational diusion anisotropy.
Nat Struct Biol 4(6):443449, 1997.
63. Schwalbe, H., S.B. Grimshaw, A. Spencer, M. Buck, J. Boyd, C.M. Dobson,
C. Redeld, and L.J. Smith. A rened solution structure of hen lysozyme
determined using residual dipolar coupling data. Protein Sci 10(4):677688,
2001.
64. Skolnick, J., A. Kolinski, and A.R. Ortiz. MONSSTER: a method for folding
globular proteins with a small number of distance restraints. J Mol Biol
265(2):217241, 1997.
65. Linge, J.P. and M. Nilges. Inuence of nonbonded parameters on the quality
of NMR structures: a new force eld for NMR structure calculation. J
Biomolec NMR 13(1):5159, 1999.
66. Bateman, A., E. Birney, R. Durbin, S.R. Eddy, R.D. Finn, and E.L.
Sonnhammer. Pfam 3.1: 1313 multiple alignments and prole HMMs match
the majority of proteins. Nucleic Acids Res 27(1):260262, 1999.
67. Abagyan, R.A. and S. Batalov. Do aligned sequences share the same fold? J
Mol Biol 273(1):355368, 1997.
68. Feng, Z.K. and M.J. Sippl. Optimum superimposition of protein structures:
ambiguities and implications. Fold Des 1(2):123132, 1996.
69. Godzik, A. The structural alignment between two proteins: is there a unique
answer? Protein Sci, 5(7):13251338, 1996.
70. Fiser, A., R.K. Do, and A. Sali. Modeling of loops in protein structures.
Protein Sci 9(9):17531773., 2000.
71. Martin, A.C., M.W. MacArthur, and J.M. Thornton. Assessment of com-
parative modeling in CASP2. Proteins Suppl(1):1428, 1997.
72. Cardozo, T., S. Batalov, and R.A. Abagyan. Estimating local backbone struc-
tural deviation in homology models. Computers Chemistry 24:1331, 2000.
73. Moult, J., J.T. Pedersen, R. Judson, and K. Fidelis. A large-scale experiment
to assess protein structure prediction methods. Proteins 23(3):iiv., 1995.
74. Moult, J., T. Hubbard, S.H. Bryant, K. Fidelis, and J.T. Pedersen. Critical
assessment of methods of protein structure prediction (CASP): round II.
Proteins Suppl(1):26, 1997.
75. Moult, J., T. Hubbard, K. Fidelis, and J.T. Pedersen. Critical assessment of
methods of protein structure prediction (CASP): round III. Proteins
Suppl(3):26, 1999.
76. Mosimann, S., R. Meleshko, and M.N. James. A critical assessment of com-
parative molecular modeling of tertiary structures of proteins. Proteins
23(3):301317, 1995.
77. Venclovas, C., A. Zemla, K. Fidelis, and J. Moult. Criteria for evaluating
protein structures derived from comparative modeling. Proteins Suppl(1):7
13., 1997.
78. McLachlan, A.D. Gene duplications in the structural evolution of chymotryp-
sin. J.Mol. Biol. 128:4979, 1979.
312 Marsden and Abagyan
2003 by Taylor & Francis Group, LLC
79. Kabsch, W. A solution for the best rotation to relate two sets of vectors. Acta
Cryst. A32:922, 1976.
80. Lackner, P., W.A. Koppensteiner, M.J. Sippl, and F.S. Domingues. ProSup: a
rened tool for protein structure alignment. Protein Eng 13(11):745752,
2000.
81. Lackner, P., W.A. Koppensteiner, F.S. Domingues, and M.J. Sippl.
Automated large-scale evaluation of protein structure predictions. Proteins
37(S3):714, 1999.
82. Abagyan, R.A. and V.N. Maiorov. An automatic search for similar spatial
arrangements of alpha-helices and beta-strands in globular proteins. J
Biomolec Strut Design 6(6):10451060, 1989.
Identifying Errors in 3D Protein Models 313
2003 by Taylor & Francis Group, LLC