
Review

Mass spectrometry-based proteomics and analyses of serum: A primer for the clinical investigator

V.A. Fusaro1, J.H. Stone2

1National Cancer Institute/Food & Drug Administration, Clinical Proteomics Program; 2Division of Rheumatology, Johns Hopkins University School of Medicine

Vincent A. Fusaro, BS, Computer Scientist, 1National Cancer Institute/Food & Drug Administration, Clinical Proteomics Program; John H. Stone, MD, MPH, Associate Professor of Medicine, Director, The Johns Hopkins Vasculitis Center, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.

Please direct correspondence and reprint requests to: John H. Stone, MD, MPH, The Johns Hopkins Vasculitis Center, 5501 Hopkins Bayview Circle, Baltimore, Maryland 21224, USA.
E-mail: jstone@jhmi.edu

Received on August 12, 2003; accepted in revised form on September 27, 2003.

Clin Exp Rheumatol 2003; 21 (Suppl. 32): S3-S14.

Copyright CLINICAL AND EXPERIMENTAL RHEUMATOLOGY 2003.

Key words: Proteomics, vasculitis, mass spectrometry.

ABSTRACT
The vocabulary of proteomics and the swiftly-developing, technological nature of the field constitute substantial barriers to clinical investigators. In recent years, mass spectrometry has emerged as the most promising technique in this field. The purpose of this review is to introduce the field of mass spectrometry-based proteomics to clinical investigators, to explain many of the relevant terms, to introduce the equipment employed in this field, and to outline approaches to asking clinical questions using a proteomic approach. Examples of clinical applications of proteomic techniques are provided from the fields of cancer and vasculitis research, with an emphasis on a pattern recognition approach.

Sir William Osler (1912) (1)
"In the capillary lake into which the arterial stream widens, the current slows and the pressure lessens. In the brief fraction of a second the business of life is transacted, for here is the mart or exchange in which the raw and the manufactured articles from the intestinal and hepatic shops are spread out for sale".

Introduction
We live in both a remarkable period in the history of science and a time of unprecedented opportunity in clinical investigation. As the quote from Osler indicates, clinicians have long recognized that the critical mechanisms of both health and disease play themselves out at the level of the microvasculature. The essential "manufactured articles" to which Osler refers were recognized even in his day as proteins. As the genetic code's effectors, proteins determine the phenotype not only of each cell, but also of every tissue and organ (and ultimately the entire organism). Although the concept that "Genes are destiny" is true with regard to some disorders, in many others the identification of candidate genes has revealed disappointingly little about disease biology, patients' response to therapy, the impact of lifestyle changes on health, and other important issues. Such types of information are reflected more reliably in the levels of mRNA and even more so in the specific types and quantities of the proteins themselves that are expressed (Fig. 1).
In theory all disease processes, even those based in single organs, lead to

Fig. 1. Diagram of assorted cellular processes leading to phenotype and the relationship of these processes to proteomics.

REVIEW Mass spectrometry-based proteomics for serum analysis / V.A. Fusaro & J.H. Stone

Fig. 2. Prototypical proteomic profile illustrating several glossary terms. The profile is the readout from a single patient sample analyzed by tandem mass spectrometry.

perturbations within the serum. As discussed in this article, proteomic studies in ovarian and prostate cancer support this concept (2,3). The application of proteomic techniques to human serum may also have particular relevance to inflammatory vascular conditions such as systemic vasculitis, a group of disorders in which the site of pathology, the blood vessel wall, is in direct contact with the serum. The ability to make accurate inferences about the state of pathology (or health) within organ systems by examining the fluid that perfuses them has several major potential advantages. First, because traces of the molecular footprints of disease are expected to equilibrate (even at sub-minute quantities) in the serum, the strong possibility of sampling error that often accompanies tissue biopsy is reduced substantially. Second, findings in the serum represent the sum of disease processes in organs, even those in which clinical involvement is unrecognized. This may be particularly relevant to multi-organ system diseases such as vasculitis. Finally, because phlebotomy can be repeated essentially as often as needed, serum investigations provide relative ease of sampling compared to the biopsy of solid organs.

The Proteome and Proteomics: Working definitions
The words "proteome" and "proteomics" did not even exist ten years ago. One may guess, from the burgeoning of terms that end in "-omics", that proteomics is the study of the proteome. But what does this term mean? A proteome may be considered to be all of the proteins linked to a given set of genes (genome). These proteins include not only those translated directly from genes but also those modified after translation. All proteins present in a cell or organism at a given time comprise its proteome. Moreover, investigators also refer to "subproteomes", which may be restricted to specific biological compartments, e.g., the inner mitochondrial membrane. Proteomics is the application of tools from fields as diverse as clinical medicine, molecular biology, mass spectrometry, and bioinformatics to explore the separation, identification, and characterization of proteins, and to shape this wealth of information into new knowledge. Thus, proteomics is not a single discipline but rather a collection of highly-specialized forms of expertise, all of which may be brought to bear on many types of clinical problems.

Glossary of major terms
We present below a glossary of terms for which the meanings are not intuitively clear, despite their frequent use in the proteomics literature. Beginning with this glossary will orient the reader, even if the context of all the terms is not apparent initially. The definitions of the terms included often contain other terms that are defined elsewhere in the glossary. These other terms are highlighted in bold. Figure 2 illustrates several of the glossary terms and other concepts in this review.
Abundance: The proteomics literature refers to "low abundance" proteins (e.g., cytokines) and "high abundance" proteins (e.g., albumin or immunoglobulins). The term abundance simply means concentration. The development of methods by which low abundance proteins may be studied in the setting of high abundance proteins that dwarf them by many orders of magnitude is one of the greatest conundrums confronting proteomics today.
Analytes: Proteins and peptides contained within the clinical sample to be analyzed.
Dynamic range: Refers to the proteins and peptides within the proteome that are defined by a set of certain charac-


teristics, e.g., molecular weight (MW), charge, abundance, or other features. Approaches to proteomic analyses may be evaluated in part by the dynamic range of the proteome to which they provide access. With regard to MW, for example, some proteomic techniques (see SELDI) that are highly effective in analyses of ions and peptides in the range of 700-12,000 Daltons (Da) may have little utility at MW beyond this range.
Electrospray ionization (ESI): A technique commonly used to volatilize and ionize proteins or peptides for mass spectrometric analysis. ESI ionizes analytes out of solution. It is readily coupled, therefore, to liquid-based protein separation tools such as liquid chromatography (LC). Integrated systems of LC and mass spectrometry (LC-MS), now based on ESI, are the preferred technique for analyzing complex samples.
Fractionation: A step in the preparation of samples for some types of proteomic analysis. Fractionation involves the use of a variety of techniques to remove certain proteins from a sample (e.g., high abundance proteins). In some cases, the removal of interference by such proteins facilitates the analysis of other proteins of interest (e.g., those of lower abundance) that are theoretically more pertinent to the disease of interest. Fractionation must be distinguished from separation, which is the differentiation of proteins and peptides from each other that usually occurs (by mass spectrometry or another technique) after fractionation has been performed.
Intensity: Refers to the height of a peak at a given mass-to-charge ratio in a proteomic profile (Fig. 2). The intensity is the number of times an ion of a particular mass:charge ratio strikes the analyte detector during a data acquisition period. An important (but counterintuitive) point is that the intensity of a given peak correlates poorly with the quantity of ion in a specimen.
Ions: Strictly speaking, in mass spectrometry the analytes are typically ions (charged particles) rather than full proteins or peptide fragments. In their ionized state, analytes may be separated by the mass spectrometer on the basis of their mass:charge ratios.
MALDI: An abbreviation for matrix-assisted laser desorption/ionization, a traditional platform for mass spectrometry. MALDI consists of a stainless steel plate onto which the sample is spotted directly. With the MALDI technique, analytes are sublimated (i.e., taken directly from the solid to the gaseous phase) and ionized out of a dry, crystalline matrix by laser pulses. Protein separation using affinity columns or other fractionation techniques is usually performed before the application of mass spectrometry by MALDI.
Mass-to-charge ratio: (Abbreviated m/z). The ratio of the mass of an ionized peptide or protein to its overall charge. The m/z ratio comprises the X-axis (Fig. 2) on the output of proteomic spectra from mass spectrometers, and corresponds loosely to MW. Mass spectrometrists often speak of an ion's "mass", when technically they are referring to its m/z ratio. This convenient way of speaking is really only accurate in the case of singly-charged ions.
Mass spectrometry (MS): (Fig. 3A and B) An instrument that measures the masses of individual molecules that have been converted to ions, i.e. that are electrically charged. In general, a mass spectrometer has three components: 1) a chamber that holds the ion source (the clinical sample from which analytes are ionized via laser); 2) a detector that registers the number of ions at each m/z value; and, 3) a mass analyzer that measures the m/z ratio of the ionized analytes. A mass spectrometer has the ability to analyze samples processed on a variety of platforms, including MALDI, SELDI, and ESI.
Peptide mass mapping: One method of identifying proteins whose masses have been determined by MS. The identity of proteins is established by matching the analytes' calculated masses with the lists of all peptide masses at entries in publicly-accessible databases (e.g., SWISS-PROT or TrEMBL). A more cutting-edge method of protein identification, which exploits the capabilities of tandem MS (Fig. 3B), is the analysis of collision-induced spectra, described below. Because neither peptide mass mapping nor tandem MS is capable of identifying all peptides or proteins, the two approaches are com-

Fig. 3. Schematic diagrams of mass spectrometers.
A. Standard low-resolution mass spectrometer, typical of those used in MALDI and SELDI analyses.
B. Tandem mass spectrometer. The principal distinguishing feature from low-resolution instruments is the presence of a collision cell in which peptides are broken apart by collision with an inert gas, permitting in many cases sequence analysis and parent protein identification.
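The matching step that underlies peptide mass mapping, as defined in the glossary, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "database" is a small in-memory dictionary, the protein names and peptide masses are invented (they are not taken from SWISS-PROT or TrEMBL), and candidates are ranked simply by the number of observed masses that fall within a fixed tolerance. Real search tools use far more elaborate scoring.

```python
# Minimal sketch of peptide mass mapping: compare observed peptide masses
# with the expected peptide masses listed for each database entry.
# All protein names and masses below are hypothetical.

HYPOTHETICAL_DB = {
    "protein_A": [842.51, 1045.56, 1479.80, 2211.10],
    "protein_B": [927.49, 1045.56, 1638.86],
    "protein_C": [515.30, 788.42, 1296.68],
}

def count_matches(observed, expected, tol_da=0.2):
    """Count observed masses lying within tol_da (Da) of any expected mass."""
    return sum(
        any(abs(obs - exp) <= tol_da for exp in expected)
        for obs in observed
    )

def rank_candidates(observed, db, tol_da=0.2):
    """Return (name, matches) pairs, best-matching entry first."""
    scores = [(name, count_matches(observed, masses, tol_da))
              for name, masses in db.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical observed masses; the last one matches no entry.
observed_masses = [842.6, 1045.5, 1479.7, 3000.0]
ranking = rank_candidates(observed_masses, HYPOTHETICAL_DB)
print(ranking)
```

The unmatched mass in the example reflects the glossary's caveat that peptide mass mapping cannot identify every peptide, which is why it is paired with tandem MS.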


plementary.
Proteomic mass fingerprint (PMF): Refers to a unique combination of ions whose overall intensity differences can segregate different states (e.g., samples from patients with cancer from those of patients who do not have cancer). As discussed below, a PMF consisting of 5 ions has been shown to discriminate patients who have ovarian cancer from those who are at high risk but who are cancer-free (2).
Resolution: Refers to the ability of a mass spectrometer (or, more specifically, of its mass analyzer) to distinguish between discrete analytes with similar characteristics (see, for example, the peak map in Fig. 4). In general, tandem MS techniques have greater resolution than their predecessors, albeit their dynamic ranges may be considerably narrower.
SELDI: Abbreviation for surface-enhanced laser desorption/ionization, another type of platform for MS studies. The SELDI technique performs protein separation based on the analytes' surface charge. First applied to clinical medicine in the late 1990s, SELDI represents a breakthrough in protein separation techniques because of its superiority (compared to two-dimensional gel electrophoresis, 2-DE) in the detection of low MW ions and ions of basic charge.
Tandem mass spectrometry: Also referred to as MS/MS (and, when coupled to liquid chromatography, as LC-MS/MS). Ions of a particular m/z value are selected by a first mass analyzer, and then fragmented in a collision cell (Fig. 3B). The masses of the ion fragments are subsequently read out by a second time-of-flight mass analyzer. A sequence as short as 5 amino acid residues may be sufficient to identify an entire protein provided that the sequence is not derived from a highly-conserved motif.
Time-of-flight: Refers to the length of time required for proteins and peptides ionized from the surface of a protein chip to travel through the MS chamber to the detector plate. Time of flight is

Fig. 4. Peak map (lower part of the figure, in red) showing how numerous analytes may be found around individual m/z values.
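The time-of-flight definition can be made concrete with the textbook relation for an idealized linear TOF analyzer. An ion of mass m and charge ze, accelerated through a potential U, acquires kinetic energy zeU and then drifts a length L in time t = L * sqrt(m / (2zeU)). The sketch below applies that relation in both directions. The accelerating voltage (20 kV) and tube length (1 m) are illustrative assumptions and are not parameters of the instruments described in this review. Real instruments are also calibrated empirically rather than from first principles.

```python
import math

E_CHARGE = 1.602176634e-19   # elementary charge in coulombs
DALTON = 1.66053906660e-27   # one dalton in kilograms

def flight_time(mz, accel_voltage=20_000.0, tube_length=1.0):
    """Drift time in seconds for an ion with the given m/z (Da per charge).

    Idealized model: all ions start at rest and acquire energy z*e*U.
    accel_voltage (volts) and tube_length (metres) are assumed values.
    """
    mass_per_charge = mz * DALTON / E_CHARGE  # kilograms per coulomb
    return tube_length * math.sqrt(mass_per_charge / (2.0 * accel_voltage))

def mz_from_time(t, accel_voltage=20_000.0, tube_length=1.0):
    """Invert the relation: recover m/z (Da per charge) from a drift time."""
    mass_per_charge = 2.0 * accel_voltage * t ** 2 / tube_length ** 2
    return mass_per_charge * E_CHARGE / DALTON

for mz in (3000.0, 5500.0, 9800.0):
    print(f"m/z {mz:7.0f} -> {flight_time(mz) * 1e6:.1f} microseconds")
```

Because flight time grows with the square root of m/z, an ion with four times the m/z of another takes twice as long to reach the detector under this model.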


abbreviated TOF, as in SELDI-TOF or MALDI-TOF or quadrupole-TOF (Q-TOF). The fundamental principle that permits MS to separate analytes is the fact that small ions fly faster than large ones. The ions' m/z ratios may be calculated from the time that each requires to reach the detector plate. Differences in TOF permit the distinction and, in many cases, the identification (by tandem MS) of different peptides.

The proteome's inherent challenges
Examinations of a blood substance called albumin began as early as the 1830s (4). Thus, in some ways only the name "proteomics" is new. Our appreciation of the depth and complexity of the proteome continues to evolve with the development of new techniques for studying it. Studying the proteome has numerous challenges, some inherent to the nature of protein mixtures themselves and others more specific to the potential application (i.e., the disease or clinical question of interest).

Complexity
Until very recently, the concept of "one gene, one protein" was regarded as fundamental to biology. We now recognize that this concept is a remarkable underestimation of the proteome's complexity. Because of splicing, processing, post-translational modifications, and other events that occur once proteins have been made, the proteome is considerably more complex than the genome. In contrast to the 30,000-50,000 genes that comprise the human genome, serum probably contains millions of polypeptide species, spanning a staggering concentration range of 10 orders of magnitude. A sobering fact today is that even with the most robust MS techniques, only about 500 proteins have been identified to date (5). (These include many of the proteins used today in clinical evaluations, e.g., creatine kinase, troponin, and aspartate and alanine aminotransferase). A list of the broad functional groups of blood proteins known currently is shown in Table I. Contrary to the status of the human genome, a full description of the human proteome is a task for which completion is not even nearly in sight. Moreover, in contrast to the "shotgun sequencing" approach that permitted the rapid completion of the Human Genome Project, there is not yet a clear road map or set of techniques for mapping the entire proteome.

Protein binding
More than half of the known proteins are smaller than the presumed size cut-off of the glomerular filtration apparatus (approximately 45 kDa). In theory, proteins below this MW should be lost in the urine on the first pass through the circulation. In order to remain in the serum, these proteins must either be part of larger protein complexes or possess other retention mechanisms. One likely explanation is that many low MW proteins are bound to albumin and/or other high abundance proteins. Simple stoichiometry dictates that most small, low abundance peptides within the serum will be bound to larger, charged species that are far more numerous. This point has profound implications for any attempts to fractionate serum specimens, simply because the removal of high abundance proteins almost certainly means that lower abundance proteins (peptides) are removed as well.

Dynamic range
Among the high abundance proteins, serum albumin has a concentration of 35-50 mg/ml. This single protein accounts for fully 55-60% of all proteins within the serum. In contrast, at the low abundance end the concentration of interleukin-6 (between 1 and 5 pg/ml) is smaller by a factor of 10^10. Comparing the concentration of IL-6 in the circulation to that of albumin is analogous to comparing the mass of a single human being to the combined mass of the entire human population (now nearly 7 billion people). The task of measuring masses across such an enormous concentration range constitutes one of the major challenges to complete description of the proteome. Even LC/MS/MS, the most versatile method for unbiased protein discovery, has a maximal dynamic range of only 10^4. Independent fractionation methods expand the possible dynamic range by only an additional 10^2 or so.

Timing
The proteome is vibrant, changing continually in response to its environment even after removal of a clinical specimen from a patient. In order to study the proteome accurately, therefore, samples must be processed in a swift and uniform fashion. Failure to process

Table I. Functional groups of blood proteins*.
Proteins secreted by solid tissues that act in serum
  * Largely produced in the intestines and liver
  * Include the classic serum proteins (e.g., albumin)
Immunoglobulin
Long-distance receptor ligands
  * Classic peptide and protein hormones (e.g., erythropoietin and insulin)
Local receptor ligands
  * Cytokines
  * Mediate local interactions and are subsequently diluted into serum at ineffective levels
  * Native MW usually < kidney filtration cut-off
Temporary passengers
  * Non-hormone proteins that traverse the serum transiently en route to the site of their primary function (e.g., proteins secreted elsewhere but sequestered in lysosomes)
Tissue leakage products
  * Released into serum as a result of cell death/damage (e.g., troponin, creatine kinase)
Aberrant secretions
  * Tumor-associated proteins or secretions from other abnormal tissues
Foreign proteins
  * Proteins related to infectious agents
*Adapted from (21).
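The arithmetic behind the dynamic range problem can be checked directly. The sketch below assumes an albumin concentration of 35-50 mg/ml (the value consistent with the roughly 10^10-fold contrast described in the text) and the IL-6 range of 1-5 pg/ml, and compares the resulting span with the approximately 10^4 range attributed to LC/MS/MS plus the additional 10^2 or so from fractionation. Variable names are illustrative.

```python
import math

PG_PER_MG = 1e9  # 1 mg = 10^9 pg

# Concentration ranges in pg/ml (albumin assumed to be 35-50 mg/ml).
albumin_pg_per_ml = (35 * PG_PER_MG, 50 * PG_PER_MG)
il6_pg_per_ml = (1.0, 5.0)

def orders_of_magnitude(high, low):
    """Base-10 logarithm of the ratio between two concentrations."""
    return math.log10(high / low)

# Narrowest and widest gaps between the two proteins.
narrowest = orders_of_magnitude(albumin_pg_per_ml[0], il6_pg_per_ml[1])
widest = orders_of_magnitude(albumin_pg_per_ml[1], il6_pg_per_ml[0])

# Instrument coverage quoted in the text, in orders of magnitude.
lc_ms_ms_range = 4
fractionation_gain = 2
covered = lc_ms_ms_range + fractionation_gain

print(f"albumin vs IL-6: {narrowest:.1f} to {widest:.1f} orders of magnitude")
print(f"covered by LC/MS/MS plus fractionation: about {covered}")
print(f"shortfall: about {narrowest - covered:.1f} to {widest - covered:.1f}")
```

Under these assumptions the albumin-to-IL-6 gap is close to 10 orders of magnitude, whereas the combined methods cover about 6, leaving roughly 4 orders of magnitude unaddressed.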


samples quickly permits ongoing post-translational modifications to alter their detected phenotypes. MS can detect differences between samples processed immediately and those stored overnight in a refrigerator before processing, and also between samples that have been thawed only once before analysis and those that have been thawed several times.

Volume of data
Some approaches to the challenge of large quantities of data produced by proteomic analyses are outlined in the discussion of Bioinformatics, below.

Disease-specific challenges
Each individual disease poses its own set of challenges in the design of proteomic studies. Some of these challenges may include: 1) disease heterogeneity; 2) recognition of different stages of disease; 3) the collection of sufficient numbers of patients to provide adequate statistical power; 4) the timing of sampling with regard to disease activity; and 5) the impact of treatment on proteomic profiles. Just as technological limits create challenges to studying the proteome, the frequent lack of well-characterized clinical populations is another major hurdle that must be overcome before the promise of proteomics can be realized. In the rush to embrace technology, there is a risk of overlooking the requirement for clean clinical phenotyping.

Limitations of older approaches to protein separation
For three decades, the mainstay of proteomic analysis has been 2-DE (6-8). In 2-DE, separation in the first dimension is achieved by isoelectric focusing according to the proteins' isoelectric point (pI). Proteins are then resolved orthogonally in the second dimension by their relative molecular mass, typically by SDS-PAGE. This approach has two major limitations as a tool for proteomics: firstly, 2-DE is ineffective at distinguishing low-abundance proteins; and secondly, 2-DE analyses underrepresent basic and membrane proteins. In

Fig. 5. Protein chips. A. Two examples of protein chips. Each contains 8 spots (one spot for each clinical sample). The surfaces of the spots on the two types of chips shown contain different types of chip chemistries. B. Protein chip being inserted into a mass spectrometer for its encounter with the laser.

Fig. 6. Positions on protein chip spots. Each spot is divided into tiny coordinates (the 20, 50, and 80 marks shown in the figure) known as positions that are used to direct the laser to strike at precisely the same point on each spot. The right side of the figure shows the proteomic profile generated by the sample on the spot.


attempts to overcome these shortcomings, 2-DE analyses are now often coupled with MS technology; that is, spots of interest are selected, digested, and then analyzed by MS. Even so, this application has a limited dynamic range and is generally effective at the identification of only high abundance proteins. Because of its shortcomings, replacement of 2-DE has become in large measure the Holy Grail of proteomics. Although the disappearance of 2-DE is not likely to happen soon, developments of the past few years have made MS unrivaled for its accuracy in mass detection, its ability to address complex protein mixtures, its amenability to automation, and its high throughput capabilities.

Mass spectrometry: The basic components
The machines
With all MS instruments, peptides are ionized from samples using either a MALDI technique (from a solid state sample) or ESI (directly from the liquid phase). Generic MS instruments are depicted in Figures 3A and B. Most of the following discussion focuses on chip-based techniques of SELDI and tandem MS. The specific instrument to which we refer in the studies described below is the ABI Q-STAR Hybrid LC/MS/MS (Applied Biosystems; Foster City, CA) for SELDI processing. The upper limit of detection for this instrument is an m/z ratio of 12,000.

The chips
The SELDI protein chips are rectangular aluminum plates (Fig. 5) with approximate dimensions of 3" x 1/2" x 1/4". Each chip has 8 spots, one spot for each clinical sample. Furthermore, each spot has many positions (Fig. 6) that are not visible to the naked eye but which can be used to program the laser to strike precisely the same coordinates on each spot. There are many varieties of protein chips, each containing on their spot surfaces different substrates that are designed to target different dynamic ranges of peptides. Some substrates capture proteins with weakly positive charges, whereas others have affinities for metal ions such as nickel or copper. Because of the overlapping dynamic ranges that they target, different chip surfaces may be complementary. The chip essentially performs a protein separation on its surface, and samples can be pre-processed on the basis of size exclusion, pH, pI, and other features to further isolate proteins of interest. In general, the same protein chips used for SELDI analyses may also be used with tandem MS platforms.

The matrix
In the processing of samples, a matrix is added to the chip surface after the application of the sample. The matrix forms a crystalline layer on top of the sample. The matrix crystals help transfer the laser energy to the sample, thereby aiding the ionization process and ultimately inducing the analytes to fly down the TOF tube. Without the addition of matrix, virtually no analytes become ionized.

The laser
Ionization occurs when energy is transferred from a laser beam to the sample. As noted, in the interests of sample-to-sample consistency, the laser pulses may be directed to precisely the same position on each spot. In the analysis of a clinical sample by MS, the laser may be fired at the sample thousands of times a second. Each firing of the laser at the sample and the resultant data on the m/z ratios of analytes are termed an acquisition.

The mass detector
Airborne ions strike a detector that records the presence of a hit. As shown in Table II, which contains hypothetical data, during the first laser pulse (acquisition #1) ionic species at m/z values of 3000, 5500 and 9800 hit the detector. During the second pulse, only a species with an m/z value of 3000 hit the detector; and so on. This type of data collection is multiplied and averaged for all of the ionized analytes from a given sample. The intensities of these ion species, ultimately reflected to some degree in the height of peaks on a proteomic profile (Fig. 2), are the sums of all the hits during the total acquisition time.
An important but counterintuitive point is that even tandem mass spectrometers are, at best, only semi-quantitative instruments. For both MALDI and ESI platforms, the relationship between the amount of a given analyte present and the measured signal intensity is complex and non-linear. The reasons for this phenomenon remain poorly understood.

Robots for sample processing
Swift advances in robotic instrumentation have led to tremendous increases in both the throughput capabilities and reproducibility of MS. Robots can be programmed to perform the entire chip preparation, including pre-treatment of the chip, sample application, and application of matrix. The advantages of robots include not only speed but also consistency in sample processing. Until recently, the typical time required to prepare 96 SELDI samples in the laboratory of the NCI-FDA Clinical Proteomics Program was approximately 3.5 to 4 hours. Even this represented a dramatic improvement over the time that would be required to process a comparable number of samples by hand. Recent breakthroughs in instrumentation have now decreased this time to 1.5 hours, and even greater throughput should be possible in the future.

Mass spectrometry platforms
MALDI is used most often for the analysis of comparatively simple peptide mixtures. Pharmaceutical companies have used the MALDI platform for years in the development of new drugs, specifically in the area of protein iden-

Table II. Data from a hypothetical mass spectrometry data collection.
Acquisition: ionic species detected at 3000 m/z, 5500 m/z, and 9800 m/z
Laser Fire #1: 1 1 1
Laser Fire #2: 1
Laser Fire #3: 1 1
Laser Fire #4: 1 1
Total Intensity: 4 2 2
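The way intensities accumulate over successive acquisitions in Table II can be expressed as a simple tally. In the sketch below, laser fires #1 and #2 follow the description in the text. Which of the 5500 or 9800 species was detected in fires #3 and #4 is an assumption, chosen only so that the totals reproduce those of the table (4, 2, and 2).

```python
from collections import Counter

# Each acquisition (laser firing) lists the m/z values that hit the detector.
# Fires #1 and #2 follow the text; the second hit in fires #3 and #4 is
# assumed, chosen to reproduce the totals given in Table II.
acquisitions = [
    [3000, 5500, 9800],  # Laser Fire #1
    [3000],              # Laser Fire #2
    [3000, 5500],        # Laser Fire #3 (assumed assignment)
    [3000, 9800],        # Laser Fire #4 (assumed assignment)
]

def total_intensity(acqs):
    """Sum detector hits per m/z value across all acquisitions."""
    counts = Counter()
    for hits in acqs:
        counts.update(hits)
    return dict(counts)

intensity = total_intensity(acquisitions)
for mz in sorted(intensity):
    print(f"m/z {mz}: total intensity {intensity[mz]}")
```

The tally counts detector hits only. As the text emphasizes, these totals are at best semi-quantitative and should not be read as proportional to the amount of each analyte in the sample.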


tification. The standard procedure has been to query ion fragments identified against protein libraries to determine (when possible) the identity of their parent proteins.
The development of the SELDI platform offers several substantial advantages over two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) (9-11). First, SELDI-TOF can describe entire populations of ions within serum simultaneously. Second, SELDI-TOF analysis is capable of detecting proteins that are smaller than 10,000 Daltons (Da), as well as proteins that are basically charged. The group of proteins in this lower MW range are of tremendous biologic potential because they contain cleaved or aberrantly shed proteins or peptides that may reflect essential features of a disease. Until recently, these molecules were below the level of detection (12).
A significant disadvantage of SELDI is that the technique does not provide a sequence-based identification, because there are many proteins close to a given m/z ratio (Fig. 4). The protein peaks representing potential markers cannot be identified without significant additional effort. Tandem MS measurements now provide the means to characterize specific post-translational modifications and to identify structural differences between related proteins, differentially modified proteins, and protein isoforms (13-15). Individual proteins can be identified through the analysis of collision-induced spectra, which provide information about peptide sequences. Collision-induced spectra are scanned against comprehensive protein sequence databases (using a variety of possible algorithms). A peptide sequence tag approach identifies a short amino acid sequence from the peak pattern that, coupled with information about mass, permits determination of the peptide's origin (16). A technique known as "stable isotope labeling" now permits quantification of peptide levels by MS/MS (17, 18).

The profiling of protein signatures
The notion of a peptide mass fingerprint (PMF) has existed for several decades. In concept, the PMF is very simple: every disease will create characteristic changes within the proteome that permit the identification, staging, and other profiling of that specific disease. A disease's PMF may be used to differentiate that disorder from other diseases and from states of health. The combination of mass spectrometry and proteomics has become the method of choice for analyzing these differences. The technique described below possesses the advantage of not requiring fractionation and the consequent risk of removing low MW peptides of interest. All analytes within the dynamic range of the SELDI platform are potentially analyzable, provided that they become bound to the chip surface and ionized by the laser.

Disease-specific examples of functional proteomics
Ovarian and prostate cancer
Using SELDI-TOF analyses of sera, investigators have developed a method to distinguish the presence or absence of neoplasia within the ovary and prostate (2,3). These studies indicate that low-MW proteomic patterns exist in serum that reflect the pathologic state of the ovary and the prostate. Moreover, these patterns can predict the presence of ovarian cancer (including Stage I disease) and early prostate cancer with a high degree of reliability. Within the sera of these cancer patients, the use of novel bioinformatics techniques has identified optimal proteomic patterns that distinguish patients with these types of malignancies from relevant control groups. The flow diagram in Figure 7 provides an overview of this approach to proteomics. This approach is based on the simultaneous analysis of a pattern of proteins or peptide fragments, rather than reliance upon a pre-defined set of biomarkers. The optimal discriminatory pattern identified for ovarian cancer consisted of relative abundances of proteins at 5 different MWs (534, 989, 2111, 2251, and 2465 Da) (2). In contrast, the optimal discriminatory pattern identified for prostate cancer consisted of relative

Fig. 7. Schematic diagram of the approach to pattern recognition in proteomic studies.

abundances of proteins at 7 different MWs (2092, 2367, 2582, 3080, 4819, 5439, and 18220 Da), all distinct from those that segregated ovarian cancer. Most strikingly, the serum proteomic analyses were able to make the critical distinction between two different types of pathology in the prostate: frank prostate cancer and benign prostatic hypertrophy (3). Confirmation of this approach using a tandem MS platform is now being performed in the context of a multi-center clinical study.

Systemic vasculitis: Wegener's granulomatosis
Preliminary work indicates that these techniques are also highly relevant to inflammatory diseases of blood vessels. We have performed a series of early studies in Wegener's granulomatosis (WG). Using WCX-2 chips (Ciphergen Biosystems, Fremont, CA), we have analyzed 16 sera from eight WG patients, one sample from a period of active disease and another from remission for each patient. All patients had severe disease (defined as WG that constitutes an immediate threat to the patient's life or to vital organ function) at the time of initial sampling. Remission samples were obtained between 9 and 15 months after the start of treatment. The Birmingham Vasculitis Activity Scores for WG (19) for the patients had a mean of 8 (range: 4-14) during active disease, and were zero for all patients during remission.
The chip preparation protocol used (for WCX2 chips; Ciphergen Biosystems, Fremont, CA) was designed to examine low MW proteins, particularly those with m/z ratios of less than 15,000. Figure 8 shows the serum proteomic profiles from baseline and remission in one of the patients, scanning all proteins with m/z ratios between 1,000 and 20,000. Even on visual inspection, major differences between these two profiles are evident at this magnification. First, in the baseline sample there is a narrow band of intensity around the m/z value 7,000 that is absent at remission. Second, the two peaks just below 15,000 are substantially more intense at baseline than in the remission sample. Third, in the remission sample, there is a broad range of intensity in the 2,500-3,000 range that is absent in the baseline sample. The contrasts in the proteomic profiles between states of active disease and remission, apparent even to visual inspection at this low power, are even more striking in magnified views (Fig. 9). In all 8 patients there was the consistent emergence of a peak in the region of MW 10,500 Da as the patients' clinical status changed from active disease to remission. The range of MWs tested in these protein chip assays represents only a small portion of the entire serum protein spectrum. Other MW ranges can be evaluated by slight alterations in the chip preparation techniques. Although these findings are preliminary, they underscore the potential of this technology when applied to well-characterized patient groups.

Fig. 8. Serum proteomic profiles from baseline and remission in a patient with Wegener's granulomatosis. Even on visual inspection, major differences between these two profiles are evident. Note the narrow band of intensity around the m/z value 7,000 in the baseline sample that is absent at remission. There are also two peaks just below 15,000 which are substantially more intense at baseline than in the remission sample. Finally, in the remission sample there is a broad range of intensity in the 2,500-3,000 range that is absent in the baseline sample.

Fig. 9. A magnified view of the m/z ratios between 15,000 and 17,000 in the baseline and remission serum proteomic profiles from patient D. Two unequivocal peaks are evident in the baseline sample, near 15,500 and 16,200. These peaks are completely absent in the sample from remission.
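
The visual comparison of baseline and remission profiles described for Figures 8 and 9 could, in principle, also be made numerically by averaging intensity within chosen m/z windows. The sketch below is only an illustration under stated assumptions: the spectra are synthetic Gaussian peaks placed near the m/z values mentioned in the text, not the authors' data, and the function names, window limits, peak widths, and peak heights are all hypothetical.

```python
import math

def gaussian_peak(mz, center, width, height):
    """Intensity contributed at `mz` by one idealized Gaussian peak."""
    return height * math.exp(-((mz - center) ** 2) / (2.0 * width ** 2))

def synthetic_spectrum(peaks, mz_values):
    """Build a toy spectrum from a list of (center, width, height) peaks."""
    return [sum(gaussian_peak(mz, c, w, h) for c, w, h in peaks)
            for mz in mz_values]

def window_mean(mz_values, intensities, low, high):
    """Mean intensity over all points whose m/z lies in [low, high]."""
    selected = [i for mz, i in zip(mz_values, intensities) if low <= mz <= high]
    return sum(selected) / len(selected)

# Hypothetical m/z grid and peak lists (synthetic, for illustration only).
MZ = list(range(1000, 20001, 10))
BASELINE = synthetic_spectrum(
    [(7000, 40, 50.0), (15500, 60, 30.0), (16200, 60, 25.0)], MZ)
REMISSION = synthetic_spectrum(
    [(2750, 150, 20.0), (10500, 50, 35.0)], MZ)

# Windows loosely matching the regions discussed in the text.
WINDOWS = {
    "~7,000": (6800, 7200),
    "2,500-3,000": (2500, 3000),
    "15,000-17,000": (15000, 17000),
}

def compare(windows=WINDOWS):
    """Return {window label: (baseline mean, remission mean)}."""
    return {
        label: (window_mean(MZ, BASELINE, lo, hi),
                window_mean(MZ, REMISSION, lo, hi))
        for label, (lo, hi) in windows.items()
    }

if __name__ == "__main__":
    for label, (base, rem) in compare().items():
        print(f"{label:>14}: baseline={base:8.3f}  remission={rem:8.3f}")
```

With real spectra, the two intensity lists would be replaced by measured values on a shared m/z axis; the windowed means merely summarize what the eye picks out in the figures.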

Bioinformatics
A variety of computational tools have been designed or adapted to mine the large amounts of data generated in proteomic analyses. Detailed discussions of these techniques are beyond the scope of this review (and, frankly, beyond the interest of most clinical investigators). What follows is an overview of the most common tools for analyzing proteomic data, with an emphasis on approaches to detecting patterns that segregate one state from another.
The sheer mountain of data points acquired by MS techniques can be overwhelming in size, complexity, dimensionality (i.e., number of data points), and computational requirement. A typical data file from a single sample generated from a low resolution MS technique (e.g., SELDI) has approximately 40,000 data points and a size of 800 KB. By way of comparison, a typical e-mail message is approximately 8 KB. Thus, the data contained within the file on one sample is equivalent to that contained within 100 e-mail messages (complex routing information and all). Even more daunting are files generated by high resolution instruments (e.g., tandem MS), which have approximately 350,000 data points and sizes of 5 MB for each sample. Each data point represents one m/z value and its corresponding intensity. The challenge lies in trying to identify the feature or features that differentiate one state from another. With very small sample sizes (e.g., n < 30) it may be possible to inspect the samples visually and discern differences in peak intensities. As the example below shows, however, this approach is not practical for large numbers of samples.
Suppose that one is attempting to identify a combination of 5 m/z values to segregate two disease states (e.g., active Takayasu's arteritis versus remission) using a low resolution mass spectrometer. In addition, assume that every possible combination of analytes will be analyzed, and that (hopefully) one has access to the world's fastest supercomputer, which can perform 40 trillion calculations/second. Under such conditions, completion of the analysis would require nearly 9 months! Moreover, with data generated by a high resolution mass spectrometer and a combination of 10 m/z values designated to segregate the same two states, the analysis would take 6 x 10^27 years. Clearly, the brute force method is not practical for such analyses. The field of bioinformatics is charged with finding solutions to these challenges.

Specific bioinformatic approaches: Focus on clustering
Several computational strategies have been employed in proteomics analyses to date. In our own work, we have used an approach called "clustering". We discuss this strategy in some detail below. Other strategies employed include decision tree analysis, support vector machines, principal component analysis, and neural networks.
This practical example helps illustrate the clustering approach. Suppose that one would like to detect a proteomic fingerprint that segregates active giant cell arteritis (GCA) from remission and that one has a total of 246 samples (90 remission samples and 156 samples from patients with active GCA). The samples will be run on a high resolution mass spectrometer. One therefore anticipates approximately 86 million data points (246 samples x 350,000 data points/sample), comprising a potential total quantity of data of 1.2 GB. In order to produce a validated model, the samples must be divided into a training set and a testing set. As the names imply, the training data are used to build the model, and the testing data to validate it. The optimal model would have 100% sensitivity and specificity when applied to the testing data. For this example, we randomly divide the samples in half: 45 remission and 78 active GCA samples for both the training and testing phases.

Fig. 10. Conceptual drawing of a clustering algorithm.

Figure 10 illustrates the concept of clustering of patient samples in multidimensional space according to the number of features examined (i.e., the specific number of ions (m/z values) used to discriminate clinical subsets). We would like to detect a model that categorizes all remission samples and all active GCA samples into their own distinct groups. Figure 10 shows an example of the K-nearest neighbor (Knn) method. This figure shows two features, A and B, that are used to segregate the two groups. Each feature can be any m/z value. The samples are then plotted according to their corresponding intensity value for that particular feature. As Figure 10 indicates, the active GCA samples appear to cluster in the lower portion of the graph. This means that, in general, active GCA samples have an intensity that is higher for Feature B, but lower for Feature A compared to the remission samples. The power of clustering comes when an unknown sample is then put through the model, as shown by the

black dot. The "K" in Knn represents the number of samples used to predict an unknown sample. In this case K = 3, which means that the 3 nearest neighbors are used to classify the sample. Thus, the black dot would be classified as a sample from a GCA patient, because 2 of the 3 nearest neighbors are samples from GCA patients (20). If the algorithm is trained correctly, clustering can be a very powerful prediction tool because of its natural ability to generalize.

Sources of variability in proteomic studies and critical quality control measures
During the past few years, fantastic claims have been made for the derivation of diagnostic tests through proteomics. Most of these claims have not stood up, or will not stand up, under further scrutiny (testing in new populations of patients, etc.). Unfortunately, relatively few papers have focused on quality control issues in proteomics and on methods of qualifying samples for analysis in the first place. Rigorous quality control efforts are essential to every stage, from the collection of samples to operating the MS instrument to the statistical analysis of data. The prediction power of bioinformatics algorithms is directly related to the quality of the data going in. The potential sources of error in proteomic studies include (but are not limited to):
* Flawed procedures for the collection of sera. As discussed above, allowing samples to sit for too long before processing is the cardinal offense in this category.
* Improper calibration of instruments. Standard operating procedures must be developed for the calibration of all MS instruments, which are inherently finicky.
* Faulty protein chips or faulty individual spots on chips. In many cases, quality control within the industry that produces commercially available protein chips and other implements has been poor. Application of a calibration sample to one spot on each protein chip may help overcome this problem. The spectra derived from calibration samples can then be compared against custom models designed by individual laboratories.
* Failure to use control samples. Control samples should be run with every study. In the case of protein chips, at least one control sample should be placed randomly on a spot for each chip. The control sample can be used to track the process variability from sample preparation to mass spectrometer acquisition.
* Failure to assign samples randomly to training or testing sets. Samples should be randomized to either the training or testing phases. Clustering algorithms and other means of parsing proteomics data are very good at finding any difference between groups of interest. Without randomization of samples, differences detected between two sets of samples may have little to do with biologic plausibility and more to do with systematic handling differences in the samples.

Bench to bedside collaborations
The variety of skills needed to conduct cutting edge translational research in proteomics today calls for collaboration among individuals with expertise in many disparate fields. Indeed, the collaborative nature of proteomics investigations is a paradigm for the manner in which much good science is conducted today. The most productive work will derive from the joint efforts of scientists familiar with the type of rigorous laboratory techniques required, computer scientists who can design new bioinformatics approaches for this field, and clinical investigators who know what questions are relevant to patient care. For this third group of investigators, a thorough understanding of the disease of interest, the ability to provide reliable data on well-characterized patient cohorts, and a sufficient understanding of the technical issues of proteomics are all essential to effective collaborations.

Acknowledgement
The authors thank Dr. E.F. Petricoin, III, of the NCI/FDA Clinical Proteomics Program for many helpful discussions.

References
1. OSLER W: An address on high pressure. Brit Med J 1912; 2: 1.
2. PETRICOIN EF III, ARDEKANI A, HITT BA et al.: A bioinformatics analytical method reveals proteomic signatures of ovarian cancer in serum. Lancet 2002; 359: 572-7.
3. PETRICOIN EF III, ORNSTEIN DK, PAWELETZ CP et al.: Serum proteomic patterns for detection of prostate cancer. J Natl Cancer Inst 2002; 94: 1576-8.
4. PUTNAM FW: The Plasma Proteins Structure, Function, and Genetic Control. Academic Press, New York, 1975-87, pp. 1-55.
5. ADKINS JN, VARNUM SM, AUBERRY KJ et al.: Toward a human blood serum proteome: Analysis by multi-dimensional separation coupled with mass spectrometry. Mol Cell Proteom 2003; 1: 947-55.
6. GORG A, WEISS W: Analytical IPG-Dalt. Methods Mol Biol 1999; 112: 189-95.
7. GORG A, OBERMAIER C, BOGUTH G et al.: The current state of two-dimensional electrophoresis with immobilized pH gradients. Electrophoresis 2000; 21: 1037-53.
8. GYGI SP, CORTHALS GL, ZHANG Y et al.: Evaluation of two-dimensional gel electrophoresis-based proteome analysis technology. Proc Natl Acad Sci USA 2000; 17: 9390-5.
9. RICHTER R, SCHULZ-KNAPPE P, SCHRADER M et al.: Composition of the peptide fraction in human blood plasma: database of circulating human peptides. J Chromatogr Biomed Sci Appl 1999; 726: 25-35.
10. LEUNG S-M, ZEYDA T, THATCHER B: A new and rapid method for phenotype screening using Seldi Proteinchip arrays demonstrated on serum from knockout and wild type mice. Mol Biol Cell 1998; 9 (Suppl.): 351a.
11. PAWELETZ CP, GILLESPIE JW, ORNSTEIN DK et al.: Rapid protein display profiling of cancer progression directly from human tissue using a protein biochip. Drug Development Res 2000; 49: 34-42.
12. WILKENS MR, WILLIAMS KL, APPEL RD, DF H (Eds.): Proteome Research: New Frontiers in Functional Genomics. New York, Springer-Verlag, 1997.
13. AEBERSOLD R, MANN M: Mass spectrometry-based proteomics. Nature 2003; 422: 198-207.
14. STEEN H, KUSTER B, FERNANDEZ M et al.: Detection of tyrosine phosphorylated peptides by precursor ion scanning quadrupole TOF mass spectrometry in positive ion mode. Anal Chem 2001; 73: 1440-8.
15. BALDWIN MA, MEDZIHRADSZKY KF, LOCK CM et al.: Matrix-assisted laser desorption/ionization coupled with quadrupole/orthogonal acceleration time-of-flight mass spectrometry for protein discovery, identification and structural analysis. Anal Chem 2001; 73: 1707-20.
16. MANN M, WILM MS: Error tolerant identification of peptides in sequence databases by peptide sequence tags. Anal Chem 1994; 66: 4390-9.
17. CONRADS TP, ISSAQ HJ, VEENSTRA TD: New tools for quantitative phosphoproteome analysis. Biochem Biophys Res Commun 2002; 290: 885-90.
18. ONG SE, BLAGOEV B, KRATCHMAROVA I et al.: Stable isotope labeling by amino acids in cell culture, SILAC, as a simple and accurate approach to expression proteomics. Mol Cell Proteomics 2002; 1: 376-86.
19. STONE JH, HOFFMAN GS, MERKEL PA et al.: The Birmingham Vasculitis Activity Score for Wegener's Granulomatosis (BVAS for WG): A disease-specific vasculitis activity index. Arthritis Rheum 2001; 44: 912-20.
20. PUNCH WF et al.: Further research on feature selection and classification using genetic algorithms. Proceedings of the International Conference on Genetic Algorithms 1993, University of Illinois, pages 557-64.
21. ANDERSON NL, ANDERSON NG: The human plasma proteome. Mol Cell Proteom 2002; 1: 845-67.
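
Supplementary sketch: the brute force estimate. The Bioinformatics section states that exhaustively testing every combination of 5 m/z values from a low resolution spectrum would take nearly 9 months, and every combination of 10 values from a high resolution spectrum about 6 x 10^27 years. The short calculation below is one way to reproduce the order of magnitude of those figures. It assumes, hypothetically, that each combination costs exactly one calculation, which the text does not state; the function and constant names are illustrative only.

```python
import math

CALCS_PER_SECOND = 40e12              # "40 trillion calculations/second" (from the text)
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

def brute_force_seconds(n_points, k, rate=CALCS_PER_SECOND):
    """Seconds to evaluate every k-subset of n_points m/z values,
    assuming (hypothetically) one calculation per combination."""
    return math.comb(n_points, k) / rate

# Low resolution: 5 m/z values chosen from ~40,000 data points.
low_res_days = brute_force_seconds(40_000, 5) / SECONDS_PER_DAY

# High resolution: 10 m/z values chosen from ~350,000 data points.
high_res_years = brute_force_seconds(350_000, 10) / SECONDS_PER_YEAR

# Data volume for the GCA example: 246 samples at 350,000 points and 5 MB each.
total_points = 246 * 350_000
total_megabytes = 246 * 5

if __name__ == "__main__":
    print(f"low resolution : {low_res_days:.0f} days")
    print(f"high resolution: {high_res_years:.2e} years")
    print(f"data points    : {total_points:,} (~{total_megabytes} MB)")
```

Under this one-calculation-per-combination assumption the low resolution case comes out at roughly eight months and the high resolution case at roughly 6 x 10^27 years, the same order as the figures quoted in the text.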

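
Supplementary sketch: training/testing split and K-nearest neighbor classification. The clustering example in the text divides 90 remission and 156 active GCA samples in half at random, then classifies an unknown sample by the majority label among its K = 3 nearest neighbors. The code below is a minimal illustration of those steps, not the authors' software. The two-feature intensities are synthetic, and every function name, seed, and distribution parameter is a hypothetical choice made for the example; Euclidean distance is also an assumption, since the text does not name a distance measure.

```python
import math
import random
from collections import Counter

def split_half(samples, seed=0):
    """Randomly split (features, label) samples in half within each label."""
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted({lab for _, lab in samples}):
        group = [s for s in samples if s[1] == label]
        rng.shuffle(group)
        half = len(group) // 2
        train.extend(group[:half])
        test.extend(group[half:])
    return train, test

def knn_predict(train, point, k=3):
    """Label `point` by majority vote among its k nearest training samples
    (Euclidean distance; an assumption of this sketch)."""
    nearest = sorted(train, key=lambda s: math.dist(s[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def sensitivity_specificity(train, test, positive="active", k=3):
    """Sensitivity and specificity of the Knn model on the testing set."""
    tp = fn = tn = fp = 0
    for features, label in test:
        predicted = knn_predict(train, features, k)
        if label == positive:
            tp += int(predicted == positive)
            fn += int(predicted != positive)
        else:
            tn += int(predicted != positive)
            fp += int(predicted == positive)
    return tp / (tp + fn), tn / (tn + fp)

def toy_samples(seed=1):
    """Synthetic intensities for two hypothetical features (A, B).
    Active samples are drawn lower on A and higher on B, as in Fig. 10."""
    rng = random.Random(seed)
    remission = [((rng.gauss(8, 1), rng.gauss(3, 1)), "remission")
                 for _ in range(90)]
    active = [((rng.gauss(3, 1), rng.gauss(8, 1)), "active")
              for _ in range(156)]
    return remission + active

if __name__ == "__main__":
    train, test = split_half(toy_samples())
    sens, spec = sensitivity_specificity(train, test)
    print(f"training: {len(train)}  testing: {len(test)}")
    print(f"sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Splitting within each label keeps the 45/78 proportions of the worked example in both sets. On real spectra the feature tuples would hold measured intensities at the selected m/z values, and K would be tuned on the training data only.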