¹National Cancer Institute/Food & Drug Administration, Clinical Proteomics Program; ²Division of Rheumatology, Johns Hopkins University School of Medicine

Vincent A. Fusaro, BS, Computer Scientist, ¹National Cancer Institute/Food & Drug Administration, Clinical Proteomics Program; John H. Stone, MD, MPH, Associate Professor of Medicine, Director, The Johns Hopkins Vasculitis Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.

Please direct correspondence and reprint requests to: John H. Stone, MD, MPH, The Johns Hopkins Vasculitis Center, 5501 Hopkins Bayview Circle, Baltimore, Maryland 21224, USA.
E-mail: jstone@jhmi.edu

Received on August 12, 2003; accepted in revised form on September 27, 2003.

Clin Exp Rheumatol 2003; 21 (Suppl. 32): S3-S14.

Copyright CLINICAL AND EXPERIMENTAL RHEUMATOLOGY 2003.

Key words: Proteomics, vasculitis, mass spectrometry.

ABSTRACT
The vocabulary of proteomics and the swiftly-developing, technological nature of the field constitute substantial barriers to clinical investigators. In recent years, mass spectrometry has emerged as the most promising technique in this field. The purpose of this review is to introduce the field of mass spectrometry-based proteomics to clinical investigators, to explain many of the relevant terms, to introduce the equipment employed in this field, and to outline approaches to asking clinical questions using a proteomic approach. Examples of clinical applications of proteomic techniques are provided from the fields of cancer and vasculitis research, with an emphasis on a pattern recognition approach.

Sir William Osler (1912) (1)
"In the capillary lake into which the arterial stream widens, the current slows and the pressure lessens In the brief fraction of a second the business of life is transacted, for here is the mart or exchange in which the raw and the manufactured articles from the intestinal and hepatic shops are spread out for sale".

Introduction
We live in both a remarkable period in the history of science and a time of unprecedented opportunity in clinical investigation. As the quote from Osler indicates, clinicians have long recognized that the critical mechanisms of both health and disease play themselves out at the level of the microvasculature. The essential manufactured articles to which Osler refers were recognized even in his day as proteins. As the genetic code's effectors, proteins determine the phenotype not only of each cell, but also of every tissue and organ (and ultimately the entire organism). Although the concept that "Genes are destiny" is true with regard to some disorders, in many others the identification of candidate genes has revealed disappointingly little about disease biology, patients' response to therapy, the impact of lifestyle changes on health, and other important issues. Such types of information are reflected more reliably in the levels of mRNA and even more so in the specific types and quantities of the proteins themselves that are expressed (Fig. 1).
In theory all disease processes, even those based in single organs, lead to
Fig. 1. Diagram of assorted cellular processes leading to phenotype and the relationship of these
processes to proteomics.
REVIEW Mass spectrometry-based proteomics for serum analysis / V.A. Fusaro & J.H. Stone
Fig. 2 . Prototypical proteomic profile illustrating several glossary terms. The profile is the readout from a single patient sample analyzed by tandem mass
spectrometry.
perturbations within the serum. As discussed in this article, proteomic studies in ovarian and prostate cancer support this concept (2,3). The application of proteomic techniques to human serum may also have particular relevance to inflammatory vascular conditions such as systemic vasculitis, a group of disorders in which the site of pathology (the blood vessel wall) is in direct contact with the serum. The ability to make accurate inferences about the state of pathology (or health) within organ systems by examining the fluid that perfuses them has several major potential advantages. First, because traces of the molecular footprints of disease are expected to equilibrate (even at sub-minute quantities) in the serum, the strong possibility of sampling error that often accompanies tissue biopsy is reduced substantially. Second, findings in the serum represent the sum of disease processes in organs, even those in which clinical involvement is unrecognized. This may be particularly relevant to multi-organ system diseases such as vasculitis. Finally, because phlebotomy can be repeated essentially as often as needed, serum investigations provide relative ease of sampling compared to the biopsy of solid organs.

The Proteome and Proteomics: Working definitions
The words "proteome" and "proteomics" did not even exist ten years ago. One may guess, from the burgeoning of terms that end in "-omics", that proteomics is the study of the proteome. But what does this term mean? A proteome may be considered to be all of the proteins linked to a given set of genes (genome). These proteins include not only those translated directly from genes but also those modified after translation. All proteins present in a cell or organism at a given time comprise its proteome. Moreover, investigators also refer to subproteomes, which may be restricted to specific biological compartments, e.g., the inner mitochondrial membrane. Proteomics is the application of tools from fields as diverse as clinical medicine, molecular biology, mass spectrometry, and bioinformatics to explore the separation, identification, and characterization of proteins, and to shape this wealth of information into new knowledge. Thus, proteomics is not a single discipline but rather a collection of highly-specialized forms of expertise, all of which may be brought to bear on many types of clinical problems.

Glossary of major terms
We present below a glossary of terms for which the meanings are not intuitively clear, despite their frequent use in the proteomics literature. Beginning with this glossary will orient the reader, even if the context of all the terms is not apparent initially. The definitions of the terms included often contain other terms that are defined elsewhere in the glossary. These other terms are highlighted in bold. Figure 2 illustrates several of the glossary terms and other concepts in this review.
Abundance: The proteomics literature refers to "low abundance" proteins (e.g., cytokines) and "high abundance" proteins (e.g., albumin or immunoglobulins). The term abundance simply means concentration. The development of methods by which low abundance proteins may be studied in the setting of high abundance proteins that dwarf them by many orders of magnitude is one of the greatest conundrums confronting proteomics today.
Analytes: Proteins and peptides contained within the clinical sample to be analyzed.
Dynamic range: Refers to the proteins and peptides within the proteome that are defined by a set of certain charac-
Fig. 4. Peak map (lower part of the figure, in red) showing how numerous analytes may be found around individual m/z values.
plementary.
Proteomic mass fingerprint (PMF): Refers to a unique combination of ions whose overall intensity differences can segregate different states (e.g., samples from patients with cancer from those of patients who do not have cancer). As discussed below, a PMF consisting of 5 ions has been shown to discriminate patients who have ovarian cancer from those who are at high risk but who are cancer-free (2).
Resolution: Refers to the ability of a mass spectrometer (or, more specifically, of its mass analyzer) to distinguish between discrete analytes with similar characteristics (see, for example, the peak map in Fig. 4). In general, tandem MS techniques have greater resolution than their predecessors, although their dynamic ranges may be considerably narrower.
SELDI: Abbreviation for surface-enhanced laser desorption/ionization, another type of platform for MS studies. The SELDI technique performs protein separation based on the analytes' surface charge. First applied to clinical medicine in the late 1990s, SELDI represents a breakthrough in protein separation techniques because of its superiority (compared to two-dimensional gel electrophoresis, 2-DE) in the detection of low MW ions and ions of basic charge.
Tandem mass spectrometry: Also referred to as MS/MS (and, when coupled to liquid chromatography, as LC-MS/MS). Ions of a particular m/z value are selected by a first mass analyzer, and then fragmented in a collision cell (Fig. 3B). The masses of the ion fragments are subsequently read out by a second time-of-flight mass analyzer. A sequence as short as 5 amino acid residues may be sufficient to identify an entire protein provided that the sequence is not derived from a highly-conserved motif.
Time-of-flight: Refers to the length of time required for proteins and peptides ionized from the surface of a protein chip to travel through the MS chamber to the detector plate. Time of flight is
Table I. Functional groups of blood proteins*.

Proteins secreted by solid tissues that act in serum
* Largely produced in the intestines and liver
* Include the classic serum proteins (e.g., albumin)
Immunoglobulin
Long-distance receptor ligands
* Classic peptide and protein hormones (e.g., erythropoietin and insulin)
Local receptor ligands
* Cytokines
* Mediate local interactions and are subsequently diluted into serum at ineffective levels
* Native MW usually < kidney filtration cut-off
Temporary passengers
* Non-hormone proteins that traverse the serum transiently en route to the site of their primary function (e.g., proteins secreted elsewhere but sequestered in lysosomes)
Tissue leakage products
* Released into serum as a result of cell death/damage (e.g., troponin, creatine kinase)
Aberrant secretions
* Tumor-associated proteins or secretions from other abnormal tissues
Foreign proteins
* Proteins related to infectious agents

*Adapted from (21).

off of the glomerular filtration apparatus (approximately 45 kDa). In theory, proteins below this MW should be lost in the urine on the first pass through the circulation. In order to remain in the serum, these proteins must either be part of larger protein complexes or possess other retention mechanisms. One likely explanation is that many low MW proteins are bound to albumin and/or other high abundance proteins. Simple stoichiometry dictates that most small, low abundance peptides within the serum will be bound to larger, charged species that are far more numerous. This point has profound implications for any attempts to fractionate serum specimens, simply because the removal of high abundance proteins almost certainly means that lower abundance proteins (peptides) are removed as well.
Volume of data
Some approaches to the challenge of
large quantities of data produced by
proteomic analyses are outlined in the
discussion of Bioinformatics, below.
Disease-specific challenges
Each individual disease poses its own set of challenges in the design of proteomic studies. Some of these challenges may include: 1) disease heterogeneity; 2) recognition of different stages of disease; 3) the collection of sufficient numbers of patients to provide adequate statistical power; 4) the timing of sampling with regard to disease activity; and 5) the impact of treatment on proteomic profiles. Just as technological limits create challenges to studying the proteome, the frequent lack of well-characterized clinical populations is another major hurdle that must be overcome before the promise of proteomics can be realized. In the rush to embrace technology, there is a risk of overlooking the requirement for clean clinical phenotyping.

Fig. 5. Protein chips. A. Two examples of protein chips. Each contains 8 spots (one spot for each clinical sample). The surfaces of the spots on the two types of chips shown contain different types of chip chemistries. B. Protein chip being inserted into a mass spectrometer for its encounter with the laser.

Limitations of older approaches to protein separation
For three decades, the mainstay of proteomic analysis has been 2-DE (6-8). In 2-DE, separation in the first dimension is achieved by isoelectric focusing according to the proteins' isoelectric point (pI). Proteins are then resolved orthogonally in the second dimension by their relative molecular mass, typically by SDS-PAGE. This approach has two major limitations as a tool for proteomics: firstly, 2-DE is ineffective at distinguishing low-abundance proteins; and secondly, 2-DE analyses underrepresent basic and membrane proteins. In
Fig. 6. Positions on protein chip spots. Each spot is divided into tiny coordinates (the 20, 50, and 80 marks shown in the figure) known as positions that are
used to direct the laser to strike at precisely the same point on each spot. The right side of the figure shows the proteomic profile generated by the sample on
the spot.
attempts to overcome these shortcomings, 2-DE analyses are now often coupled with MS technology; that is, spots of interest are selected, digested, and then analyzed by MS. Even so, this application has a limited dynamic range and is generally effective at the identification of only high abundance proteins. Because of its shortcomings, replacement of 2-DE has become in large measure the Holy Grail of proteomics. Although the disappearance of 2-DE is not likely to happen soon, developments of the past few years have made MS unrivaled for its accuracy in mass detection, its ability to address complex protein mixtures, its amenability to automation, and its high throughput capabilities.

Mass spectrometry: The basic components

The machines
With all MS instruments, peptides are ionized from samples using either a MALDI technique (from a solid state sample) or ESI (directly from the liquid phase). Generic MS instruments are depicted in Figures 3A and B. Most of the following discussion focuses on chip-based techniques of SELDI and tandem MS. The specific instrument to which we refer in the studies described below is the ABI Q-STAR Hybrid LC/MS/MS (Applied Biosystems; Foster City, CA) for SELDI processing. The upper limit of detection for this instrument is an m/z ratio of 12,000.

The chips
The SELDI protein chips are rectangular aluminum plates (Fig. 5) with approximate dimensions of 3" x 1/2" x 1/4". Each chip has 8 spots, one spot for each clinical sample. Furthermore, each spot has many positions (Fig. 6) that are not visible to the naked eye but which can be used to program the laser to strike precisely the same coordinates on each spot. There are many varieties of protein chips, each containing on their spot surfaces different substrates that are designed to target different dynamic ranges of peptides. Some substrates capture proteins with weakly positive charges, whereas others have affinities for metal ions such as nickel or copper. Because of the overlapping dynamic ranges that they target, different chip surfaces may be complementary. The chip essentially performs a protein separation on its surface, and samples can be pre-processed on the basis of size exclusion, pH, pI, and other features to further isolate proteins of interest. In general, the same protein chips used for SELDI analyses may also be used with tandem MS platforms.

The matrix
In the processing of samples, a matrix is added to the chip surface after the application of the sample. The matrix forms a crystalline layer on top of the sample. The matrix crystals help transfer the laser energy to the sample, thereby aiding the ionization process and ultimately inducing the analytes to fly down the TOF tube. Without the addition of matrix, virtually no analytes become ionized.

The laser
Ionization occurs when energy is transferred from a laser beam to the sample. As noted, in the interests of sample-to-sample consistency, the laser pulses may be directed to precisely the same position on each spot. In the analysis of a clinical sample by MS, the laser may be fired at the sample thousands of times a second. Each firing of the laser at the sample and the resultant data on the m/z ratios of analytes are termed an acquisition.

The mass detector
Airborne ions strike a detector that records the presence of a "hit". As shown in Table II, which contains hypothetical data, during the first laser pulse (acquisition #1) ionic species at m/z values of 3000, 5500 and 9800 hit the detector. During the second pulse, only a species with an m/z value of 3000 hit the detector; and so on. This type of data collection is multiplied and averaged for all of the ionized analytes from a given sample. The intensities of these ion species, ultimately reflected to some degree in the height of peaks on a proteomic profile (Fig. 2), are the sums of all the hits during the total acquisition time.

Table II. Data from a hypothetical mass spectrometry data collection.
Acquisition Ionic Species Detected
3000 m/z 5500 m/z 9800 m/z
Laser Fire #1 1 1 1
Laser Fire #2 1
Laser Fire #3 1 1
Laser Fire #4 1 1
Total Intensity 4 2 2

An important but counterintuitive point is that even tandem mass spectrometers are, at best, only semi-quantitative instruments. For both MALDI and ESI platforms, the relationship between the amount of a given analyte present and the measured signal intensity is complex and non-linear. The reasons for this phenomenon remain poorly understood.

Robots for sample processing
Swift advances in robotic instrumentation have led to tremendous increases in both the throughput capabilities and reproducibility of MS. Robots can be programmed to perform the entire chip preparation, including pre-treatment of the chip, sample application, and application of matrix. The advantages of robots include not only speed but also consistency in sample processing. Until recently, the typical time required to prepare 96 SELDI samples in the laboratory of the NCI-FDA Clinical Proteomics Program was approximately 3.5 to 4 hours. Even this represented a dramatic improvement over the time that would be required to process a comparable number of samples by hand. Recent breakthroughs in instrumentation have now decreased this time to 1.5 hours, and even greater throughput should be possible in the future.

Mass spectrometry platforms
MALDI is used most often for the analysis of comparatively simple peptide mixtures. Pharmaceutical companies have used the MALDI platform for years in the development of new drugs, specifically in the area of protein identification. The standard procedure has been to query ion fragments identified against protein libraries to determine (when possible) the identity of their parent proteins.
The development of the SELDI platform offers several substantial advantages over two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) (9-11). First, SELDI-TOF can describe entire populations of ions within serum simultaneously. Second, SELDI-TOF analysis is capable of detecting proteins that are smaller than 10,000 Daltons (Da), as well as proteins that are basically charged. Proteins in this lower MW range are of tremendous biologic potential because they include cleaved or aberrantly shed proteins or peptides that may reflect essential features of a disease. Until recently, these molecules were below the level of detection (12).
A significant disadvantage of SELDI is that the technique does not provide a sequence-based identification, because there are many proteins close to a given m/z ratio (Fig. 4). The protein peaks representing potential markers cannot be identified without significant additional effort. Tandem MS measurements now provide the means to characterize specific post-translational modifications and to identify structural differences between related proteins, differentially modified proteins, and protein isoforms (13-15). Individual proteins can be identified through the analysis of collision-induced spectra, which provide information about peptide sequences. Collision-induced spectra are scanned against comprehensive protein sequence databases (using a variety of possible algorithms). A peptide sequence tag approach identifies a short amino acid sequence from the peak pattern that, coupled with information about mass, permits determination of the peptide's origin (16). A technique known as "stable isotope labeling" now permits quantification of peptide levels by MS/MS (17, 18).

The profiling of protein signatures
The notion of a peptide mass fingerprint (PMF) has existed for several decades. In concept, the PMF is very simple: every disease will create characteristic changes within the proteome that permit the identification, staging, and other profiling of that specific disease. A disease's PMF may be used to differentiate that disorder from other diseases and from states of health. The combination of mass spectrometry and proteomics has become the method of choice for analyzing these differences. The technique described below possesses the advantage of not requiring fractionation, and so avoids the consequent risk of removing low MW peptides of interest. All analytes within the dynamic range of the SELDI platform are potentially analyzable, provided that they become bound to the chip surface and ionized by the laser.

Disease-specific examples of functional proteomics

Ovarian and prostate cancer
Using SELDI-TOF analyses of sera, investigators have developed a method to distinguish the presence or absence of neoplasia within the ovary and prostate (2,3). These studies indicate that low-MW proteomic patterns exist in serum that reflect the pathologic state of the ovary and the prostate. Moreover, these patterns can predict the presence of ovarian cancer (including Stage I disease) and early prostate cancer with a high degree of reliability. Within the sera of these cancer patients, the use of novel bioinformatics techniques has identified optimal proteomic patterns that distinguish patients with these types of malignancies from relevant control groups. The flow diagram in Figure 7 provides an overview of this approach to proteomics. This approach is based on the simultaneous analysis of a pattern of proteins or peptide fragments, rather than reliance upon a pre-defined set of biomarkers. The optimal discriminatory pattern identified for ovarian cancer consisted of relative abundances of proteins at 5 different MWs (534, 989, 2111, 2251, and 2465 Da) (2). In contrast, the optimal discriminatory pattern identified for prostate cancer consisted of relative
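
In computational terms, the ovarian cancer pattern quoted above is a five-element vector of relative intensities read from a spectrum at fixed m/z values. The following is a minimal sketch of how such a vector might be extracted, not the published method: the function name `pattern_vector`, the nearest-point lookup, the scaling to the largest of the five values, and the toy spectrum are all assumptions made for illustration.

```python
# The five m/z values reported for the ovarian cancer pattern (2).
OVARIAN_MZ = (534, 989, 2111, 2251, 2465)

def pattern_vector(spectrum, targets=OVARIAN_MZ):
    """Return relative intensities at the data points nearest each target m/z.

    spectrum: list of (m/z, intensity) pairs for one sample.
    Scaling to the largest selected intensity is an assumed normalization.
    """
    raw = []
    for t in targets:
        # Pick the measured point whose m/z lies closest to the target.
        _, intensity = min(spectrum, key=lambda p: abs(p[0] - t))
        raw.append(intensity)
    top = max(raw)
    return [x / top if top else 0.0 for x in raw]

# Hypothetical spectrum (invented values, for illustration only).
spectrum = [
    (534.2, 10.0), (700.0, 99.0), (989.1, 5.0),
    (2110.8, 20.0), (2251.3, 0.0), (2464.9, 2.5),
]
vector = pattern_vector(spectrum)
```

Each patient sample then reduces to one such vector, which is the form of input that a pattern recognition algorithm compares across clinical groups.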
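
The hit counting described for Table II, in which the intensity at each m/z value is the sum of detector hits over all acquisitions, can be sketched in a few lines. Laser fires #1 and #2 follow the text. The assignment of hits for fires #3 and #4 is an assumption, chosen only so that the totals agree with the table (4, 2 and 2), and the function name `total_intensity` is hypothetical.

```python
from collections import Counter

# Each acquisition (laser fire) lists the m/z values that hit the detector.
acquisitions = [
    [3000, 5500, 9800],  # laser fire #1 (as described in the text)
    [3000],              # laser fire #2 (as described in the text)
    [3000, 5500],        # laser fire #3 (assumed assignment)
    [3000, 9800],        # laser fire #4 (assumed assignment)
]

def total_intensity(fires):
    """Sum detector hits per m/z value over all acquisitions."""
    counts = Counter()
    for hits in fires:
        counts.update(hits)
    return dict(counts)

totals = total_intensity(acquisitions)
print(totals)
```

A real instrument accumulates thousands of acquisitions per second, but the principle is the same: peak height reflects an accumulated count, not a direct concentration measurement.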
Bioinformatics
A variety of computational tools have
been designed or adapted to mine the
large amounts of data generated in pro-
teomic analyses. Detailed discussions
of these techniques are beyond the
scope of this review (and, frankly, be-
yond the interest of most clinical inves-
tigators). What follows is an overview
of the most common tools for analyz-
ing proteomic data, with an emphasis
on approaches to detecting patterns that
segregate one state from another.
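
To see why specialized tools are needed rather than exhaustive search, the sketch below counts how many combinations of m/z values a brute-force search would have to test. The figures used (about 40,000 data points per low resolution spectrum, about 350,000 per high resolution spectrum, and a computer performing 40 trillion calculations per second) are those quoted in this review. Treating each combination as a single calculation is a simplifying assumption, and the function name `brute_force_seconds` is hypothetical.

```python
from math import comb

CALCS_PER_SECOND = 40e12              # 40 trillion calculations/second
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def brute_force_seconds(n_points, k):
    """Seconds needed to test every k-subset of n_points m/z values,
    assuming one calculation per combination."""
    return comb(n_points, k) / CALCS_PER_SECOND

# Low resolution: choose 5 of ~40,000 m/z values.
low_res_days = brute_force_seconds(40_000, 5) / SECONDS_PER_DAY
# High resolution: choose 10 of ~350,000 m/z values.
high_res_years = brute_force_seconds(350_000, 10) / SECONDS_PER_YEAR

print(f"5 of 40,000 m/z values: about {low_res_days:.0f} days")
print(f"10 of 350,000 m/z values: about {high_res_years:.1e} years")
```

Under these assumptions the low resolution search takes roughly eight months and the high resolution search on the order of 10^27 years, which is the scale of problem that pattern-detection algorithms are meant to avoid.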
The sheer mountain of data points acquired by MS techniques can be overwhelming in size, complexity, dimensionality (i.e., number of data points), and computational requirement. A typical data file from a single sample generated from a low resolution MS technique (e.g., SELDI) has approximately 40,000 data points and a size of 800 KB. By way of comparison, a typical e-mail message is approximately 8 KB. Thus, the data contained within the file on one sample is equivalent to that contained within 100 e-mail messages (complex routing information and all). Even more daunting are files generated by high resolution instruments (e.g., tandem MS), which have approximately 350,000 data points and sizes of 5 MB for each sample. Each data point represents one m/z value and its corresponding intensity. The challenge lies in trying to identify the feature or features that differentiate one state from another. With very small sample sizes (e.g., n < 30) it may be possible to inspect the samples visually and discern differences in peak intensities. As the example below shows, however, this approach is not practical for large numbers of samples.
Suppose that one is attempting to identify a combination of 5 m/z values to segregate two disease states (e.g., active Takayasu's arteritis versus remission) using a low resolution mass spectrometer. In addition, assume that every possible combination of analytes will be analyzed, and that (hopefully) one has access to the world's fastest supercomputer, which can perform 40 trillion calculations/second. Under such conditions, completion of the analysis would require nearly 9 months! Moreover, with data generated by a high resolution mass spectrometer and a combination of 10 m/z values designated to segregate the same two states, the analysis would take 6 x 10^27 years. Clearly, the brute force method is not practical for such analyses. The field of bioinformatics is charged with parsing solutions to these challenges.

Fig. 10. Conceptual drawing of the concept of a clustering algorithm.

Specific bioinformatic approaches: Focus on clustering
Several computational strategies have been employed in proteomics analyses to date. In our own work, we have used an approach called "clustering". We discuss this strategy in some detail below. Other strategies employed include decision tree analysis, support vector machines, principal component analysis, and neural networks.
This practical example helps illustrate the clustering approach. Suppose that one would like to detect a proteomic fingerprint that segregates active giant cell arteritis (GCA) from remission and that one has a total of 246 samples (90 remission samples and 156 samples from patients with active GCA). The samples will be run on a high resolution mass spectrometer. One therefore anticipates approximately 86 million data points (246 samples x 350,000 data points/sample), comprising a potential total quantity of data of 1.2 GB. In order to produce a validated model, the samples must be divided into a training set and a testing set. As the names imply, the training data are used to build the model, and the testing data to validate it. The optimal model would have 100% sensitivity and specificity when applied to the testing data. For this example, we randomly divide the samples in half: 45 remission and 78 active GCA samples for both the training and testing phases. Figure 10 illustrates the concept of clustering of patient samples in multi-dimensional space according to the number of features examined (i.e., the specific number of ions (m/z values) used to discriminate clinical subsets). We would like to detect a model that categorizes all remission samples and all active GCA samples into their own distinct groups. Figure 10 shows an example of the K-nearest neighbor (Knn) method. This figure shows two features, A and B, that are used to segregate the two groups. These features represent any m/z value. The samples are then plotted according to their corresponding intensity value for that particular feature. As Figure 10 indicates, the active GCA samples appear to cluster in the lower portion of the graph. This means that, in general, active GCA samples have an intensity that is higher for Feature B, but lower for Feature A compared to the remission samples. The power of clustering comes when an unknown sample is then put through the model, as shown by the
black dot. The "K" in Knn represents the number of samples used to predict an unknown sample. In this case K = 3, which means that the 3 nearest neighbors are used to classify the sample. Thus, the black dot would be classified as a sample from a GCA patient, because 2 of the 3 nearest neighbors are samples from GCA patients (20). If the algorithm is trained correctly, clustering can be a very powerful prediction tool because of its natural ability to generalize.

Sources of variability in proteomic studies and critical quality control measures
During the past few years, fantastic claims have been made for the derivation of diagnostic tests through proteomics. Most of these claims have not or will not stand up under further scrutiny (testing in new populations of patients, etc.). Unfortunately, relatively few papers have focused on quality control issues in proteomics and on methods of qualifying samples for analysis in the first place. Rigorous quality control efforts are essential to every stage, from the collection of samples to operating the MS instrument to the statistical analysis of data. The prediction power of bioinformatics algorithms is directly related to the quality of the data going in. The potential sources of error in proteomic studies include (but are not limited to):
* Flawed procedures for the collection of sera. As discussed above, allowing samples to sit for too long before processing is the cardinal offense in this category.
* Improper calibration of instruments. Standard operating procedures must be developed for the calibration of all MS instruments, which are inherently

calibration samples can then be compared against custom models designed by individual laboratories.
* Failure to use control samples. Control samples should be run with every study. In the case of protein chips, at least one control sample should be placed randomly on a spot for each chip. The control sample can be used to track the process variability from sample preparation to mass spectrometer acquisition.
* Failure to assign samples randomly to training or testing sets. Samples should be randomized to either the training or testing phases. Clustering algorithms and other means of parsing proteomics data are very good at finding any difference between groups of interest. Without randomization of samples, differences detected between two sets of samples may have little to do with biologic plausibility and more to do with systematic handling differences in the samples.

Bench to bedside collaborations
The variety of skills needed to conduct cutting edge translational research in proteomics today calls for collaboration among individuals with expertise in many disparate fields. Indeed, the collaborative nature of proteomics investigations is a paradigm for the manner in which much good science is conducted today. The most productive work will derive from the joint efforts of scientists familiar with the type of rigorous laboratory techniques required, computer scientists who can design new bioinformatics approaches for this field, and clinical investigators who know what questions are relevant to patient care. For this third group of investigators, a thorough understanding of the disease of interest, the ability to

References
1. OSLER W: An address on high pressure. Brit Med J 1912; 2: 1.
2. PETRICOIN EF III, ARDEKANI A, HITT BA et al.: A bioinformatics analytical method reveals proteomic signatures of ovarian cancer in serum. Lancet 2002; 359: 572-7.
3. PETRICOIN EF III, ORNSTEIN DK, PAWELETZ CP et al.: Serum proteomic patterns for detection of prostate cancer. J Natl Cancer Inst 2002; 94: 1576-8.
4. PUTNAM FW: The Plasma Proteins Structure, Function, and Genetic Control. Academic Press, New York, 1975-87, pp. 1-55.
5. ADKINS JN, VARNUM SM, AUBERRY KJ et al.: Toward a human blood serum proteome: Analysis by multi-dimensional separation coupled with mass spectrometry. Mol Cell Proteom 2003; 1: 947-55.
6. GORG A, WEISS W: Analytical IPG-Dalt. Methods Mol Biol 1999; 112: 189-95.
7. GORG A, OBERMAIER C, BOGUTH G et al.: The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis 2000; 21: 1037-53.
8. GYGI SP, CORTHALS GL, ZHANG Y et al.: Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc Natl Acad Sci USA 2000; 17: 9390-5.
9. RICHTER R, SCHULZ-KNAPPE P, SCHRADER M et al.: Composition of the peptide fraction in human blood plasma: database of circulating human peptides. J Chromatogr Biomed Sci Appl 1999; 726: 25-35.
10. LEUNG S-M, ZEYDA T, THATCHER B: A new and rapid method for phenotype screening using Seldi Proteinchip arrays demonstrated on serum from knockout and wild type mice. Mol Biol Cell 1998; 9 (Suppl.): 351a.
11. PAWELETZ CP, GILLESPIE JW, ORNSTEIN DK et al.: Rapid protein display profiling of cancer progression directly from human tissue using a protein biochip. Drug Development Res 2000; 49: 34-42.
12. WILKINS MR, WILLIAMS KL, APPEL RD, HOCHSTRASSER DF (Eds.): Proteome Research: New Frontiers in Functional Genomics. New York, Springer-Verlag, 1997.
13. AEBERSOLD R, MANN M: Mass spectrometry-based proteomics. Nature 2003; 422: 198-207.
14. STEEN H, KUSTER B, FERNANDEZ M et al.: Detection of tyrosine phosphorylated peptides by precursor ion scanning quadrupole TOF mass spectrometry in positive ion mode.
Anal Chem 2001; 73: 1440-8.
finicky. provide reliable data on well-character- 15. BALDWIN MA, MEDZIHRADSZKYKF, LOCK
* Faulty protein chips or faulty individ - ized patient cohorts, and a sufficient CM et al.: Matrix-assisted laser desorption/
ual spots on chips. In many cases, understanding of the technical issues of ionization coupled with quadrupole/orthogo-
nal acceleration time-of-flight mass spectro-
quality control within the industry proteomics are all essential to effective metry for protein discovery, identification
that produces commercially available collaborations. and structural analysis. Anal Chem 2001; 73:
protein chips and other implements 1707-20.
has been poor. Application of a cali- Acknowledgement 16. MANN M, WILM MS : Error tolerant identifi-
cation of peptides in sequence databases by
bration sample to one spot on each The authors thank Dr. E.F. Petricoin, III, peptide sequence tags. Anal Chem 1994; 66:
protein chip may help overcome this of the NCI/FDA Clinical Proteomics 4390-9.
problem. The spectra derived from Program for many helpful discussions. 17. CONRADS TP, ISSAQ HJ, VEENSTRA TD:
New tools for quantitative phosphoproteome analysis. Biochem Biophys Res Commun 2002; 290: 885-90.
18. ONG SE, BLAGOEV B, KRATCHMAROVA I et al.: Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 2002; 1: 376-86.
19. STONE JH, HOFFMAN GS, MERKEL PA et al.: The Birmingham Vasculitis Activity Score for Wegener's Granulomatosis (BVAS for WG): A disease-specific vasculitis activity index. Arthritis Rheum 2001; 44: 912-20.
20. PUNCH WF et al.: Further research on feature selection and classification using genetic algorithms. Proceedings of the International Conference on Genetic Algorithms 1993, University of Illinois, pages 557-64.
21. ANDERSON NL, ANDERSON NG: The human plasma proteome. Mol Cell Proteomics 2002; 1: 845-67.
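
As an illustration of the randomization step recommended under the sources of error (assigning samples randomly to training or testing sets), the following Python sketch shuffles sample identifiers with a seeded random number generator and splits them into two sets. It is a minimal sketch under stated assumptions, not the authors' procedure: the function name assign_sets, the sample labels, the 50/50 split, and the seed value are all hypothetical choices made for illustration.

```python
import random


def assign_sets(sample_ids, test_fraction=0.5, seed=0):
    """Randomly assign each sample ID to the training or the testing set.

    Hypothetical helper for illustration; not taken from the article.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(sample_ids)
    # A seeded generator makes the assignment reproducible and auditable.
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return {"testing": shuffled[:n_test], "training": shuffled[n_test:]}


# Hypothetical sample identifiers, for illustration only.
samples = [f"serum_{i:03d}" for i in range(1, 21)]
split = assign_sets(samples, test_fraction=0.5, seed=2003)
print(len(split["training"]), len(split["testing"]))
```

Recording the seed alongside the study data allows the assignment to be reproduced later. In practice, investigators may also wish to randomize separately within the disease and control groups, so that each set contains samples from both.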