
World J. Surg. Vol. 29, No. 5, May 2005: 557–560


DOI: 10.1007/s00268-005-7912-z

How to Analyze an Article


John D. Urschel, M.D.
Department of Surgery, Division of Cardiothoracic Surgery, Tufts University School of Medicine, Tufts-New England Medical Center,
750 Washington Street, Boston, Massachusetts, 02111, USA
Published Online: April 21, 2005
Abstract. In clinical research, investigators generalize from study samples
to populations, and in evidence-based medicine practitioners apply population-level evidence to individual patients. The validity of these processes is assessed through critical appraisal of published articles. Critical
appraisal is therefore a core component of evidence-based medicine
(EBM). The purpose of critical appraisal is not to criticize for
criticism's sake. Instead, it is an exercise in assigning a value to an article.
A checklist approach to article appraisal is outlined, and common pitfalls
of analysis are highlighted. Relevant questions are posed for each section
of an article (introduction, methods, results, discussion). The approach is
applicable to most clinical surgical research articles, even those of a
nonrandomized nature. Issues specific to evidence-based surgical practice, in contrast to evidence-based medicine, are introduced.

Critical appraisal of a clinical research article is an essential feature of evidence-based medicine (EBM) and of evidence-based surgical
practice. Nevertheless, surgical trainees and some students of
EBM occasionally lose sight of this fact or misunderstand the
purpose of critical appraisal. Trainees wonder if they can leave
the critique of research to others and simply read an expert review. This strategy may serve the generalist reasonably well (even
this is debatable), but it is not acceptable for a serious practitioner
of surgery [16]. The "E" in EBM stands for evidence, not expert
opinion. The lessons of medical history point to the fallibility of
expert opinion, especially when it is not rigorously derived from
published evidence. Whereas some students question the practical usefulness of critical appraisal, others embrace it with
excessive enthusiasm. In their zeal to find fault in published
papers and to criticize for criticism's sake, they fail to assess the
value of an article. Determining the value of an article is the
essence of critical appraisal [1]. All articles have flaws. The real
question is: Given the flaws, how valuable is this article to the
practice of EBM?
Correspondence to: John D. Urschel, M.D., e-mail: jurschel@buffalo.edu

Research in surgery yields a variety of article types, ranging
from simple case reports to randomized controlled trials (RCTs)
and meta-analyses of RCTs (Table 1). Between these extremes
are the ever-prevalent case-series and nonrandomized comparative
studies of varying validity. The specialty of surgery has been
criticized for relying excessively, for far too long, on case-series
and the expert opinion derived from them, and for being slow to adopt the
RCT [2–4]. A detailed exploration of this problem is outside the
scope of this article, but the four major reasons for the relative
infrequency of RCTs in surgery deserve mention. First, surgeons
tend to be deeply attached to their own surgical viewpoint or
technique, an attachment that usually exceeds a medical physician's affinity for a particular drug. Second, many surgical questions cannot be addressed in RCTs because the necessary
community equipoise, or management uncertainty, does not exist.
Third, surgeons are usually skilled in one operative approach to
any given problem, but they are rarely equally proficient in two
competitive operative approaches; this makes RCTs of different
operations difficult. Finally, patients do not mind having the
choice of a perioperative antibiotic (or some similar medical
intervention) left to chance, but they are understandably reluctant
to leave the decision to operate to chance. Although I do not seek
to make excuses for the lack of RCTs in surgery, the realities of
the situation should be considered. Therefore, there is still a role,
albeit a diminishing one, for a carefully conducted case-series in
surgery. Whereas medical practitioners of EBM can often simply
dismiss case-series from consideration, surgeons do not currently
have this luxury. We still must critically analyze case-series while
at the same time encouraging the performance of more sophisticated research studies [6].
The editors of this World Journal of Surgery issue on EBM for
surgeons have commissioned several articles on the critical appraisal of articles, highlighting the central role of critical appraisal
in EBM. The issue contains articles devoted to the analysis of
therapeutic studies, studies of diagnostic tests, prognostic studies,
and systematic reviews. Critical appraisal of these various forms
of research publications has much in common. Readers familiar
with the Users' Guides publication series [7] and subsequent
textbook [8] will be well versed in three basic questions of article
appraisal: Are the results valid? What are the results? How can I
apply the results to patient care? At least two of the articles in this
issue follow this established format. However, there are other
published checklists and appraisal approaches that are also useful
[1, 9–11]. To maintain some balance in presentation in this issue,
and to permit a generic approach to appraisal that is broadly

applicable to RCTs and lesser research publications, an approach
is outlined that differs from the Users' Guides format popularized
at McMaster University (the reader is referred to the papers by
Bhandari and colleagues in this issue). I admit some difficulty
with this departure from the familiar [12].

Table 1. Hierarchy of clinical surgical research.
Meta-analyses and systematic reviews of multiple randomized controlled trials
Randomized controlled trials
Nonrandomized comparative studies with intent of a fair comparison
    Prospective (concurrent) cohort studies
    Retrospective (historic) cohort studies
    Case-control studies
Nonrandomized comparative studies without consideration of fair comparison
    Case-series attempting to compare two dissimilar groups of patients
    Case-series attempting to compare contemporary patients with historic controls from a previous era
Noncomparative observational case-series
Case reports

Article Appraisal Checklists
Checklists for the critical appraisal of surgical articles are outlined in Tables 2 to 5. The checklists are organized into four main categories that correspond to the usual format of a research article: introduction, methods, results, and discussion [1]. Within each category there are two to four basic questions.
Appraisal of the Introduction
The first question to ask when reading an article is: "Why was the study done?" (Table 2). Whereas RCTs are usually motivated by a desire to answer a serious research question, case-series are often an exercise in publishing for the sake of publishing, or even publishing to improve an institution's marketing position. The second question to ask about an article's introduction is: "Are the aims clearly stated?" A plausible and focused research goal suggests that the study was well thought out before data were collected. Conversely, a vague goal or no goal at all usually indicates that data collection and analysis preceded the formulation of a research question. The introduction section of an article provides the reader with an early estimate of the paper's value; a good question does not guarantee good research, but a poor question precludes it.

Table 2. Critical appraisal checklist: introduction.
Question: Why was the study done?
Be wary of: Case-series may be a veiled form of advertising for a profit-seeking organization.
Question: Are the aims clearly stated?
Be wary of: Preliminary unfocused data dredging, with study goals formulated after data analysis (to give the appearance of a legitimate research question).

Appraisal of Methods
The article's methods section provides information on the internal validity of the study. In the Users' Guides approach, for example, the methodology questions are asked under the heading "Are the results valid?" In other words, if the methodology is not sound, the results will not be valid. This is an important concept. "Are the numbers of patients sufficient?" is the first question in the methods section (Table 3). In an RCT, for example, the reader should look for an explicit sample size justification. Of course, this question comes up again in the discussion section, where the possibility of a type II error in the study should be considered (finding no evidence of a difference between sample groups when a difference really exists in the population groups).

Table 3. Critical appraisal checklist: methods.
Question: Are the numbers of patients sufficient?
Be wary of: Lack of evidence for a treatment effect is not the same as evidence of no effect (study underpowered).
Question: Are the measurements valid and reliable?
Be wary of: Hospital records are not created with research in mind, and measurements found in hospital records are suspect.
Question: Are the outcomes clinically relevant?
Be wary of: Convenient surrogates for important outcomes may not be valid.
Question: Are the statistical approaches sensible?
Be wary of: Unnecessarily complex methods may be designed to deceive. Data dredging leads to spurious associations. "Best test" seeking behavior overstates significance.

The next question is: "Are the measurements valid and reliable?" A valid measure is one that measures what it is supposed to measure; a reliable measure is one that gives a similar result when applied on more than one occasion [1]. Published articles often fail to mention shortcomings in this area or minimize their importance. Readers should be especially skeptical of clinical measurements obtained from hospital records. Hospital records are not designed for research, and many measurements that are acceptable for clinical care are not valid as research measurements. In vascular and plastic surgery, for example, Doppler measurements are often used to assess tissue perfusion. This serves a useful clinical purpose, but the measurements, when viewed in a research context, may not be valid or reliable.
"Are the outcomes clinically relevant?" is the next question in the methods section. The really important outcomes are often difficult to measure. Therefore investigators may select outcomes that are easy to assess and then argue that these outcomes are clinically relevant in their own right or are useful surrogates for other outcomes. Serum albumin, for example, is a simple outcome to assess, but it may not be a good surrogate for nutritional status in acutely ill patients.
The final question in the methods section is: "Are the statistical approaches sensible?" Surgeons should have a basic understanding of statistical methods; we need to understand standard approaches to common statistical problems. That basic knowledge allows the reader to evaluate, in a general sense, the suitability of the reported statistical approach [13]. If a study's statistical methods seem unduly complex or depart too far from the norm, the reader might wonder if this represents an intentional attempt at statistical deception. Readers should also be wary of two common forms of disingenuous statistical manipulation: data dredging and "best test" seeking behavior. With data dredging, the investigator tests for multiple possible associations in the data and hopes to find something significant. Of course, if enough possible associations are examined, something is bound to turn up by chance alone. For example, an investigator exploring 20 possible associations may find, by chance, one that seems to meet an arbitrary definition of statistical significance (p = 0.05). Data dredging gives rise to spurious associations.
"Best test" seeking behavior is similar to data dredging, but here the investigator seeks out good tests instead of good associations. In other words, the investigator runs the data through many different statistical tests and then reports the statistical methods that give the most pleasing results. Whereas data dredging leads to spurious associations, "best test" seeking behavior overstates the statistical significance of an association. Unfortunately, modern computer software packages facilitate both data dredging and "best test" seeking behavior.
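The arithmetic behind the 20-association example can be made concrete. The sketch below is a minimal illustration that assumes k independent tests, each run at alpha = 0.05, with no true associations in the data; these are simplifying assumptions, not features of any real study. It computes the chance of at least one spurious "significant" result, 1 - (1 - alpha)^k, and checks it by simulation, relying on the fact that a valid test's p value is uniformly distributed when the null hypothesis is true. The function names are hypothetical. The Bonferroni adjustment (testing each association at alpha/k) is shown only as one standard, conservative remedy; it is not proposed in this article.

```python
import random


def familywise_error(k: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among k independent tests when every null is true."""
    return 1 - (1 - alpha) ** k


def simulate_dredging(k: int, alpha: float = 0.05, studies: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: each simulated 'study' tests k null associations.

    Under a true null hypothesis a valid p value is uniform on [0, 1],
    so each individual test comes out 'significant' with probability alpha.
    """
    rng = random.Random(seed)
    dredged_hits = sum(
        any(rng.random() < alpha for _ in range(k))  # did this study find at least one spurious result?
        for _ in range(studies)
    )
    return dredged_hits / studies


if __name__ == "__main__":
    k = 20
    print(f"Expected false positives: {k * 0.05:.1f}")
    print(f"P(at least one), analytic:  {familywise_error(k):.3f}")
    print(f"P(at least one), simulated: {simulate_dredging(k):.3f}")
    print(f"Bonferroni threshold per test: {0.05 / k:.4f}")
    print(f"P(at least one) with Bonferroni: {familywise_error(k, 0.05 / k):.3f}")
```

With k = 20, the expected number of spurious associations is one (20 x 0.05), and the analytic chance of at least one is about 0.64, roughly two in three. Testing each association at 0.05/20 = 0.0025 brings that chance back just under 0.05, at the cost of reduced power for any real association.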

Appraisal of Results
"Are the basic data properly described?" is the first question (Table 4). Basic data include important patient characteristics such as age, sex, weight, socioeconomic status, performance status, and disease stage. They also include basic data on the medical environment, such as size and type of hospital (teaching, community, private, public), referral patterns, specialist or generalist practice, and hospital resources. These basic data may seem mundane, but they are extremely important. A fair comparison of two groups of patients hinges on the similarity of the two groups before intervention. Even randomization in an RCT does not guarantee that the two groups are similar. Randomization prevents the groups from being dissimilar in a systematically biased way, but it does not prevent dissimilarity by chance. Irrespective of publication type, the reader cannot make a judgment on group similarity without the basic data. Basic data also help the reader in another respect: the reader cannot generalize the study findings to his or her surgical practice without considering the study's patient and hospital characteristics. The issue of generalizing research findings is critically important for surgeons (see below).
The question "Do the numbers add up?" may seem too obvious for inclusion in this checklist, but (sadly) it remains an important question for the reader. A quick glance at the tables and graphs, while reading through the text, may reveal inconsistencies in the numbers. All articles have flaws, and errors do occur, but the real worry for the reader is the extent of the error. If there is obvious sloppiness in the paper, might there be even more sloppiness and error in the underlying study?
The next question, "Are the measures of effect, and statistical significance, properly presented?", is important. The related Users' Guides questions are "How large was the treatment effect?" and "How precise was its estimate?" With these questions the reader evaluates the magnitude of the difference between two patient groups and the possibility that it is explained by chance alone (statistical significance). The reader should be wary if the authors quietly state a modest absolute difference between groups and then go on to use measures of effect (e.g., relative risk reduction) that express the absolute difference as a proportion of the control group's risk [12, 14]. If, for example, a new drug reduces the risk of a perioperative complication from 6% in the control group to 3% in the treatment group, the absolute risk reduction is 3% (the number needed to treat is 1/0.03, or about 33; see Dr. Trainer's article) and the relative risk reduction is 50%. A novice reader may be unduly impressed by the 50% relative risk reduction.
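The 6% versus 3% example follows directly from the definitions in the footnote to Table 4. The minimal sketch below, with a hypothetical function name, computes all three measures from the two group risks. Note that 1/0.03 is 33.3, which the text reports as about 33; many EBM texts round the NNT up to the next whole patient (here 34), a convention assumed only for the extra `NNT_whole` field.

```python
import math


def risk_measures(control_risk: float, treatment_risk: float) -> dict:
    """Effect measures as defined in the footnote to Table 4 (risks given as proportions)."""
    arr = control_risk - treatment_risk   # absolute risk reduction
    rrr = arr / control_risk              # relative risk reduction: ARR as a fraction of control risk
    nnt = 1 / arr                         # number needed to treat to prevent one event
    # Strip floating-point noise before rounding the NNT up to a whole patient.
    nnt_whole = math.ceil(round(nnt, 9))
    return {"ARR": arr, "RRR": rrr, "NNT": nnt, "NNT_whole": nnt_whole}


m = risk_measures(0.06, 0.03)
print(f"ARR {m['ARR']:.1%}, RRR {m['RRR']:.0%}, NNT {m['NNT']:.1f} (rounded up: {m['NNT_whole']})")
```

The absolute difference is only three percentage points, yet the same data yield a relative risk reduction of 50%; reporting the ARR and NNT alongside the RRR keeps the headline figure in proportion.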

Table 4. Critical appraisal checklist: results.
Question: Are the basic data properly described?
Be wary of: If basic data are not provided, there is no way of telling if the two groups are similar (fair comparison). Generalization, a key step in EBM, is not possible if we do not have basic patient data.
Question: Do the numbers add up?
Be wary of: Sloppiness, when present, is usually not confined to the easily identifiable errors (iceberg analogy).
Question: Are the measures of effect and statistical significance properly presented?
Be wary of: Relative risk reduction may be impressive, but what is the absolute risk reduction (and NNT)?
Question: What is the main finding, and could it be erroneous?
Be wary of: If the main finding comes from an unplanned subgroup analysis, it may be wrong (data dredging). Bias and confounders may give a spurious result.
Absolute risk reduction = risk in control group minus risk in treatment group.
Relative risk reduction = absolute risk reduction divided by risk in control group, expressed as percent.
NNT: number needed to treat (1/absolute risk reduction); EBM: evidence-based medicine.

After considering the suitability of the article's measure of effect, the reader should look at the presentation of statistical significance or, stated differently, the precision of the estimated measure of effect. Confidence intervals are preferred; traditional p values convey related information, but in a more opaque way. A 95% confidence interval (CI) is typically reported in surgical journals for the same reason that a p value of 0.05 is considered significant: it is an arbitrary but convenient threshold. A 95% CI is constructed so that, over many repetitions of a study, 95% of such intervals would include the true value. Unfortunately, statistical significance is still poorly presented in many articles. Readers should be wary of statements or tables that simplistically report "not significant" or, alternatively, "significant, p < 0.05." There is no excuse for this type of imprecision in surgical reporting.
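To show what a bare "not significant" hides, the sketch below computes a 95% CI for the absolute risk reduction using the simple Wald (normal-approximation) formula. The trial sizes are hypothetical: the same 6% versus 3% complication risks as in the drug example in this section, first with 200 patients per arm and then with 1,000. The Wald interval is an assumption chosen for brevity; score-based intervals (e.g., Newcombe's method) behave better when events are rare. `risk_difference_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_c: int, n_c: int, events_t: int, n_t: int, level: float = 0.95):
    """Wald CI for the absolute risk reduction (control risk minus treatment risk)."""
    p_c, p_t = events_c / n_c, events_t / n_t
    arr = p_c - p_t
    se = math.sqrt(p_c * (1 - p_c) / n_c + p_t * (1 - p_t) / n_t)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return arr, arr - z * se, arr + z * se


# Hypothetical trials: 6% vs 3% complication risk with 200 and 1,000 patients per arm.
for events_c, events_t, n in [(12, 6, 200), (60, 30, 1000)]:
    arr, lo, hi = risk_difference_ci(events_c, n, events_t, n)
    print(f"n = {n} per arm: ARR {arr:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

With 200 patients per arm the interval runs from about -1% to +7%: the trial is compatible with no benefit and with a substantial one, which is the underpowered situation flagged in Table 3 rather than evidence of no effect. With 1,000 patients per arm the same point estimate has a lower limit of about +1%.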
The last question in the results section is "What is the main finding, and could it be erroneous?" The reader should be wary of main findings that do not directly follow from the main research question. If, for example, a paper reported a trial of lymphadenectomy versus no lymphadenectomy for malignancy, the anticipated main finding would concern survival in patients treated by lymphadenectomy. However, it would not be unusual for the article to emphasize a different finding, such as improved survival in just a subgroup of patients undergoing lymphadenectomy. Subgroup analyses may be valid if the analysis was planned a priori (in contrast with subgroup analyses that emerge from data dredging). Nevertheless, subgroup analyses should be viewed with caution, especially if they form the basis of the article's main finding [15].
The reader should also consider the possibility of a major error in the findings due to bias or the presence of a confounder. Bias, at any point in a study, can systematically (rather than randomly) shift the results away from the truth. A confounder is a third variable, often unidentified, that is responsible for an apparent, but false, association between two study variables. Good researchers strive to eliminate bias and to understand confounders.
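A small numerical sketch shows how a confounder can manufacture, or even reverse, an apparent association. The counts below are entirely hypothetical: operation A is used mostly in late-stage patients, who have worse survival regardless of treatment. Within each stage A has the higher survival, yet pooling the strata makes B look better, a reversal usually called Simpson's paradox. Stratifying by stage exposes the problem here only because stage was recorded; an unidentified confounder cannot be handled this way.

```python
# Hypothetical counts: (survivors, patients) for each operation within each disease stage.
COUNTS = {
    ("A", "early"): (45, 50),
    ("A", "late"): (90, 200),
    ("B", "early"): (160, 200),
    ("B", "late"): (15, 50),
}


def stratum_survival(operation: str, stage: str) -> float:
    """Survival within one disease stage (the confounder held fixed)."""
    survivors, patients = COUNTS[(operation, stage)]
    return survivors / patients


def crude_survival(operation: str) -> float:
    """Survival pooled over stages, i.e., ignoring the confounder."""
    survivors = sum(s for (op, _), (s, _) in COUNTS.items() if op == operation)
    patients = sum(n for (op, _), (_, n) in COUNTS.items() if op == operation)
    return survivors / patients


for stage in ("early", "late"):
    print(f"{stage:>5}: A {stratum_survival('A', stage):.0%} vs B {stratum_survival('B', stage):.0%}")
print(f"crude: A {crude_survival('A'):.0%} vs B {crude_survival('B'):.0%}")
```

Operation A does better within early-stage (90% vs 80%) and late-stage (45% vs 30%) patients, but the crude comparison (54% vs 70%) favors B because A's patients are concentrated in the worse-prognosis stratum.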


Table 5. Critical appraisal checklist: discussion.
Question: Are the results fairly considered against a background of previously published data?
Be wary of: Results only discussed within the context of supportive published data. Results nicely confirm authors' previously published position.
Question: What are the implications for my practice?
Be wary of: It may not be possible to generalize the study results to a different treatment environment.
Question: Do I possess surgical skills similar to those of the reporting surgeons?
Be wary of: The skill with which an operation is performed may be more important than the specifics of the operation itself.

Appraisal of Discussion
"Are the results fairly considered against a background of previously published data?" (Table 5). The authors should present their results in a balanced way, but this is often not done. Readers should be wary of articles that cite only supportive data. Similarly, readers should ask how the findings fit into the framework of any previous publications by the same authors. Some authors champion the same opinion, unwaveringly, in publication after publication.
A key issue in EBM relates to the process of generalizing research results to individual patients. The question can be stated as "What are the implications for my practice?" For medical practitioners, this is a question of patient characteristics and health care environment. The reader assesses the basic data in the article (see above) and asks if the patients and health care environment are similar to his or her own. If they are, the article's findings are probably applicable to the physician's practice. For surgeons, however, there is an additional dimension to this process of generalization: individual surgeon skill. "Do I possess skills similar to those of the reporting surgeons?" This is a difficult issue for surgeons to confront [12, 16]. Patients would not be well served if surgeons abandoned operative techniques with which they were successful in an attempt to adopt the latest "best" technique. There must be a cautious transition to new surgical techniques. In some cases, the evidence may even suggest that the surgeon refer specific patients to another center. That is an especially difficult aspect of evidence-based surgery, and one that our medical colleagues have trouble understanding. Few physicians have seen their professional livelihood altered by the arrival of a new prescription medicine, but the same cannot be said for the impact of new procedures on established surgeons. In part, it is the differences between evidence-based medicine and evidence-based surgery that make this World Journal of Surgery issue on evidence-based surgery so timely.
References
1. Crombie IK. The pocket guide to critical appraisal. London: BMJ Books, 2002
2. Horton R. Surgical research or comic opera: questions, but few answers. Lancet 1996;347:984–985
3. Lee JS, Urschel DM, Urschel JD. Is general thoracic surgical practice evidence based? Ann. Thorac. Surg. 2000;70:429–431
4. McLeod RS. Issues in surgical randomized controlled trials. World J. Surg. 1999;23:1210–1214
5. Urschel JD, Urschel DM, Mannella SM, et al. Duration of knowledge in general thoracic surgery. Ann. Thorac. Surg. 2001;71:337–339
6. Law S, Wong J. Use of controlled randomized trials to evaluate new technologies and new operative procedures in surgery. J. Gastrointest. Surg. 1998;2:494–495
7. Oxman AD, Sackett DL, Guyatt GH. Users' guides to the medical literature. I. How to get started. J. A. M. A. 1993;270:2093–2095
8. Guyatt G, Rennie D. Users' guides to the medical literature: a manual for evidence-based clinical practice. Chicago: AMA Press, 2002
9. Greenhalgh T. How to read a paper: getting your bearings (deciding what the paper is about). B. M. J. 1997;315:243–246
10. Greenhalgh T. How to read a paper. London: BMJ Books, 1997
11. Jadad A. Randomised controlled trials. London: BMJ Books, 1998
12. Urschel JD, Goldsmith CH, Tandan VR, et al. Users' guide to evidence-based surgery: how to use an article evaluating surgical interventions. Can. J. Surg. 2001;44:95–100
13. Greenhalgh T. How to read a paper: statistics for the non-statistician. I. Different types of data need different statistical tests. B. M. J. 1997;315:364–366
14. Antes G, Galandi D, Bouillon B. What is evidence-based medicine? Langenbecks Arch. Surg. 1999;384:409–416
15. Oxman AD, Guyatt GH. A consumer's guide to subgroup analyses. Ann. Intern. Med. 1992;116:78–84
16. Sauerland S, Lefering R, Neugebauer EAM. The pros and cons of evidence-based surgery. Langenbecks Arch. Surg. 1999;384:423–431
