⏐ PUBLIC HEALTH MATTERS ⏐

The Weight of Scientific Evidence in Policy and Law


| Sheldon Krimsky, PhD

The term “weight of evidence” (WOE) appears in regulatory rules and decisions. However, there has been little discussion about the meaning, variations of use, and epistemic significance of WOE for setting health and safety standards. This article gives an overview of the role of WOE in regulatory science, discusses alternative views about the methodology underlying the concept, and places WOE in the context of the Supreme Court’s decision in Daubert v Merrell Dow Pharmaceuticals, Inc (1993). I argue that whereas the WOE approach to evaluating scientific evidence is gaining favor among regulators, its applications in judicial processes may be in conflict with some interpretations of how the Daubert criteria for judging reliable evidence should be applied. (Am J Public Health. 2005;95:S129–S136. doi: 10.2105/AJPH.2004.044727)

In the narratives describing the historical development of natural science, nothing captures the drama of discovery as effectively as the “crucial experiment” (an experimentum crucis). For it is such an experiment, according to most historical accounts, that finally resolves competing explanations and/or theories, bringing to a close contested schools of thought. For example, history of science texts tell us that it was a crucial experiment that put to rest the theory of spontaneous generation in favor of the germ theory of disease and that launched a critical blow to the Phlogiston theory of combustion. Also widely acclaimed as a crucial experiment in the early part of the 20th century were the measurements made by British physicists, among them Sir Arthur Eddington, of the sun’s rays during a solar eclipse. From their measurements they concluded that light bends in a gravitational field, which provided evidence in support of Einstein’s over Newton’s theory of light.1 Such experiments have gained iconic status in the history of science.

But there is a significant and lively debate among philosophers and historians on whether it is meaningful to talk about “crucial experiments” in science. Sir Karl Popper believed that crucial experiments could play a role in falsifying scientific theories (“It should be noted that I mean by a crucial experiment one that is designed to refute a theory (if possible) and more especially one which is designed to bring about a decision between two competing theories by refuting (at least) one of them—without of course, proving the other.”)2 In contrast, Pierre Duhem and Thomas Kuhn were leading voices against the view that scientists are influenced by “crucial experiments” in deciding between competing paradigms.

Notwithstanding this debate, I believe there are influential or determinative experiments that crystallize a new scientific consensus, particularly in fields like physics, chemistry, and engineering.

However, it is very rare to find determinative experiments in environmental health sciences. A single, well-constructed experiment almost never resolves a critical issue on the cause of a disease, especially but not exclusively, diseases resulting from exposure to toxic substances. Rothman provides an example where the etiology of “toxic shock syndrome” was resolved through a crucial experiment.3 As long as we do not permit controlled experiments where we would intentionally harm a human subject, when there are no possible benefits to them, for the mere sake of scientific inquiry, no single experiment can provide decisive data on the effects of a foreign substance on a human group. In so far as we depend on a number of experiments, some with greater statistical or explanatory power than others and information from diverse forms of evidence, we need to have some way of aggregating or weighing the results across different modalities of evidence.

The term “weight of evidence” (WOE) is used to characterize a process or method in which all scientific evidence that is relevant to the status of a causal hypothesis is taken into account. In criminal law, juries are given the responsibility to decide the WOE in regards to guilt or innocence. Judges weigh the evidence of legal precedent in justifying their rulings. Clinicians use a form of WOE in making diagnoses, and judges may defer to it when they offer opinions on the reliability of evidence. In the policy sectors of government, regulatory agencies or risk analysis panels use WOE to assess the total value of the scientific evidence that a substance may be dangerous to human health. Sometimes the term is used as if there were some algorithm or rational decision process by which the “weighing of evidence” is accomplished. Other times, the term WOE refers to nothing more than a subjective assessment on the part of a reviewer who takes relevant data from a given body of published research into consideration in order to ascertain whether a hypothesis is more likely to be true than false.

A distinction has been made between WOE and “strength of evidence” (SOE).4 The latter is associated with the gravitas and relevance of information related to a specific indicator, such as the number of tumors produced in animals. In contrast, WOE includes all varieties of evidence, positive and negative, mechanistic and nonmechanistic, in vivo and in vitro, as well as human and animal studies. In risk assessment, the trend has been to widen the lens of relevant empirical and theoretical evidence, thus moving from approaches that utilize “strength of evidence” to those that utilize WOE. In this article I shall speak exclusively of WOE and assume that it encompasses the use of strength of evidence.

The WOE approach has been introduced into ecological risk assessment since the early 1990s in response to the need for better risk analyses of Superfund sites and impacted natural ecosystems.5,6 One consensus report on WOE defined it as “the process by which multiple measurement endpoints are related to an assessment endpoint to evaluate whether a significant risk of harm is posed to

Supplement 1, 2005, Vol 95, No. S1 | American Journal of Public Health Krimsky | Peer Reviewed | Public Health Matters | S129

the environment.”7 In his widely cited book Ecological Risk Assessment, Suter notes the significance of WOE in evaluating different classes of evidence generated by alternative ecological models. He wrote, “the separate lines of evidence must be evaluated, organized in some coherent fashion, and explained to the risk manager so that a weight of evidence evaluation can be made.”8

A number of benefits have been attributed to a WOE framework in regulatory decisions. Walker9 cites three objectives of a WOE analysis: (1) it provides a “clear and transparent framework” for evaluating the evidence in a risk determination; (2) it offers regulatory agencies a consistent and standardized approach to evaluating toxic substances; and (3) it helps to identify the discretionary assumptions in risk determinations from experts. However, in selecting a WOE approach a certain number of nontestable a priori assumptions must be adopted, which may narrow the scope of scientific opinion and consensus on how different modalities of evidence should be aggregated, thereby failing to meet Walker’s objective.

I begin with the observation that there is virtually no discussion in the scientific literature of the epistemic meaning of WOE. We cannot tell whether it is used as a methodology, a heuristic, a ranking system, or simply a subjective process of setting a causal threshold for cumulative indirect evidence. In the spirit of these questions, this article will do the following: (1) discuss the problem of aggregating different forms of evidence; (2) review uses of WOE in health science publications; (3) examine some applications of WOE by federal agencies; and (4) discuss how WOE enters judicial proceedings, particularly in the context of the admissibility of expert witnesses.

In this discussion, I shall argue that the concept of WOE, as it is currently applied in the health sciences, largely involves a qualitative approach to rating and assessing the aggregation of different forms of scientific evidence in relationship to a causal hypothesis. Currently, qualitative or quantitative frameworks that guide the use of a WOE method are more or less a priori heuristics that adopt certain norms about the status and relevance of alternative types of information, but their application largely depends on the tacit expertise of scientific evaluators. Moreover, no canonical frameworks for weighing scientific evidence have emerged. When experts use the term WOE in publications or in the courtroom, they are almost always referring to the outcome of a process in which scientists, working as individuals or in groups, examine a body of relevant scientific studies on the relationship between a compound and a disease outcome. These scientists, operating within an accepted framework, apply their tacit knowledge of a field to reach a “yea,” “nay,” or “probabilistic conclusion” about the relationship between the compound and a disease outcome. Most applications of WOE in support of public policy that are cited in the literature seem to (by inference or lack of specification) use a process methodology that is low on transparency and high on subjectivity.

MODALITIES OF EVIDENTIARY SUPPORT

If the modality of evidence considered for evaluating the human health effects of a chemical compound was of one type, let us say epidemiological studies, then the WOE might refer to how many studies support the hypothesis about health risks, what the individual power of a study is, or what the combined power of all the studies is in a meta-analysis. But each modality of evidentiary support is limited. For example, some scientists argue that epidemiological studies cannot demonstrate causation or mechanism, but only association.10 Controlled animal studies do not yield direct information about people. Comparison of chemical structure between suspected and known toxins (known as structure activity analysis) does not provide information on how the chemicals function in a live organism. The term WOE has come to mean not only a determination of the statistical and explanatory power of any individual study (or the combined power of all the studies) but the extent to which different types of studies converge on the hypothesis. The WOE approach has become likened to “triangulation,” namely, approaching the target question from many directions. Where no single epistemic modality (by which I mean a specific method or technique for acquiring information) can yield the definitive answer to an environmental health question, we refer to multiple epistemic modalities. The problem is: how does the evidence from these modalities add up? Does the accumulated data from several epistemic modalities militate against the insufficiency or shortcomings of evidence from a single epistemic modality?

A similar problem is presented in decision analysis. Multiattribute Utility Theory applies to cases where there are different dimensions of value associated with outcomes that, on the face of it, are not reducible to a common metric.11 Thus, a decision to build a dam will have both positive and negative impacts of a social and ecological variety. These attributes are incommensurable, such as the additional hydropower gained by the dam and the loss of fish spawning in the river. In Multiattribute Utility Theory, a decision analyst develops a ranking and a utility function for the attributes and then undertakes an empirical investigation to determine the actual value of those attributes (how many fish will be lost and how much energy will be produced). Thus, the final outcome of applying Multiattribute Utility Theory is the aggregation of incommensurable variables through the adoption of a numerical schema.

For evaluating the human health effects of a chemical agent, there are different modalities of evidence, including human epidemiology, wildlife studies, experimental laboratory animal studies with rodents, primate studies, in vitro cell studies, and chemical structure activity analysis. Each type of study may provide some evidence, but each has its limitations. Human epidemiology may be valued highly for its relevance but less so for its scientific power, especially if the findings are unrelated to a postulated or known biological mechanism. Experimental animal studies may be dependable for the mechanistic knowledge they offer but questionable for their relevance to human cases.

If a chemical were known to be one of the causal agents responsible for a human disease, then we would expect a series of evidentiary pathways to converge on that conclusion. The chemical might manifest genotoxic or gross chromosomal effects in human cells studied in vitro. Or the chemical might be associated with wildlife abnormalities. But not all of the
evidence may be consistent with the result. It is possible that the chemical may be harmless to certain species and yet cause disease in others. Nevertheless, we gain confidence when one epistemic modality (rodent studies) is consistent with the results of other epistemic modalities (epidemiological studies) that make up the architectonic of evidence.

When we do not know whether a chemical causes a human disease but have the type of circumstantial evidence we would expect to acquire if the substance were known to cause the disease, then, building on a coherence theory of truth, the weight of the circumstantial or related evidence elevates our confidence in the hypothesis connecting the substance to the disease.

But how can we aggregate the evidence from a variety of modalities in a WOE approach, when no single study is definitive, and we cannot justifiably reach a conclusion from the limited evidence that a specific compound is likely the cause of human illness? Aggregating evidence across different epistemic modalities is like adding incommensurables. It can only be done if a priori constructs provide a basis for developing a common metric. More evidence, albeit inconclusive, may mean you are closer to demonstrating causality, but you cannot know by how much. And sometimes, different modalities of evidence do not converge on a single hypothesis and may even be inconsistent.

USES OF THE TERM WOE IN HEALTH SCIENCES

Usually WOE methods are applied when no single study and no individual modality of evidence (e.g., animal studies, human studies, in vitro, etc.) is conclusive in demonstrating a cause-effect relationship. Other times it may be used even when there is a solid epidemiological study showing a large increased risk from the exposure to some substance in order to build a stronger argument for regulation. Alternatively, WOE has been introduced to assess the “strengths and weaknesses of various measurements, and of the nature of uncertainty associated with each of them.”12 However, while the term is applied quite liberally in the regulatory literature, the methodology behind it is rarely explicated. We might be told that the decision to regulate was decided on the WOE rather than a crucial study demonstrating causality. Without an explication of how evidence is “weighed” or “weighted,” the claim WOE seems to be coming out of a “black box” of scientific judgment.13 One article that uses the term WOE in its title does not refer to the term elsewhere in the text.14 Other articles assign scaling factors or qualitative terms to the evidentiary attributes.

A report issued by the US Agency for Toxic Substances and Disease Registry (ATSDR) of the Department of Health and Human Services stated that a necessary and reasonable alternative to causal determinations when establishing policy “may be a critical assessment of the overall “weight of evidence” of available science to serve as a surrogate of ‘causality.’” The implication is that when causality is out of reach, we must use a surrogate called WOE. The ATSDR states: “ ‘The weight-of-evidence’ approach is an assessment method that includes reviewing site-specific doses, epidemiologic studies, and chemical-specific toxicity data to evaluate exposures and potential health effects in a community.”15

In law, when direct material evidence of a crime or direct eyewitness testimony is not available, the term “circumstantial evidence” is used. This type of evidence comes in “bundles” and eventually must be “weighed” by the jury in its role of determining guilt or innocence. Each piece of the “bundle” of circumstantial evidence is insufficient to make a case. It is the entire “bundle” that convinces the jury. The concept of “circumstantial evidence” has a counterpart in environmental health.

The ATSDR uses the metaphor of the microscope as the rationale for applying the WOE approach to examining the human effects of polychlorinated biphenyls, by aggregating the results of disparate studies.

“Each of the studies, whether an epidemiologic study, a laboratory study, or the findings of wildlife biologists, could be compared to the lens of a microscope. Like the lens of a microscope, they can vary in terms of their resolving power and quality. They are also focused on different populations at different points in time . . . . Despite the limits and weaknesses of individual pieces of research, the collective weight of evidence indicates that certain polychlorinated biphenyl/dioxin-like compounds found in fish in the Great Lakes-St. Lawrence basin and elsewhere can cause neurobehavioral deficits.”16

The concept of WOE is used widely but rarely explicated in the scientific and policy literature. Menzie et al.17 state that, “although the term ‘weight-of-evidence’ is used frequently in ecological risk assessment, there is no consensus on its definition or how it should be applied.” Often when WOE is cited, it is assumed that readers know what it means. Sometimes it is used to signify that evidence must reach a certain critical threshold before it can support regulation. Other times it refers to a process that examines both positive and negative studies and determines by the number and strength of the studies whether a causal relationship can be inferred.

As regulatory bodies and scientific review panels depend increasingly on WOE methods, questions surrounding their use will inevitably enter litigation either in torts or contested regulations, where the elusive methodology behind WOE is ripe for Daubert challenges. Therefore, it is important to understand how WOE is being interpreted and what, if any, criteria are implicit or explicit in its application.

After an extensive review of the appearance of WOE in public health studies and regulatory documents, I have uncovered what I believe are four general uses of the term.

Intensive Literature Review

This interpretation of WOE takes the form of an intensive literature review, including some qualitative discussion of the studies, without assigning any weights to the studies. In the words of one medical group, “the more inclusive method of literature review involves assessing the ‘weight of evidence’ . . . the importance of the findings from each piece of research should be judged: this is termed ‘Signal.’ This is then balanced by the strength of the evidence or design weaknesses (termed ‘Noise’).”18 Those who use the term WOE in this context assume that the reviewers have applied their expertise in interpreting both the quantity (number of positive studies) and the quality (statistical power) of the evidence without any explicit reference to a methodology. Readers may justifiably assume that the
reviewers are basing their interpretation of the aggregate value of the selected studies on their years of experience and tacit knowledge, rather than a fully developed analytical framework.19

Seat-of-the-Pants Qualitative Assessment

According to this view, WOE is a vague term that scientists use when they apply implicit, qualitative, and/or subjective criteria to evaluate a body of evidence. Experts cite the general grounds for their opinion, but no specific parameters or methodologies are given for how the evidence is weighed. Thus, one might see general statements such as: A decision was made based on WOE standards, such as number of studies, strength of association, breadth and consistency of evidence, correlational power, and biological plausibility. A number of papers use the term WOE in the title without explaining a methodology or process that is used to do the weighting.

Sometimes the application of WOE involves a taxonomic presentation of studies. An example can be found in a 2001 study of “disinfection by-products.”20 These are the potential human hazards of chlorination. The authors created a table of evidence, which listed the summary data of studies for each adverse reproductive effect focusing on sample size, exposure assessment, relative risk, and odds ratios. They describe as the goal of the paper “to view the totality of the evidence in order to judge the overall weight of evidence concerning ‘disinfection by-products’ and reproductive and developmental effects.”21 After commenting on the categories listed in their taxonomy (odds ratios, uncertainties, and statistical significance), the authors conclude that the weight of evidence shows that low birth weight is not associated with “disinfection by-products” exposure. But the outcome they reach is not logically or rigorously derived by a methodology. The justification for the use of WOE could be enhanced if criteria for weighing the evidence were established at the outset.

Aggregating Diverse Evidentiary Modalities

In this particular use of WOE, an effort is made to aggregate the evidence through some combination of qualitative and/or quantitative techniques. For example, ATSDR incorporates an assessment method that includes reviewing site-specific doses, epidemiological studies, and toxicity data. A dose level injurious to humans is found from different types of research protocols.

The World Health Organization’s Global Assessment of Endocrine Disrupting Chemicals uses “overall strength of evidence” as a qualitative evaluation of the outcome of concern and an exposure to a substance, assessing the strength of association as weak, moderate, or strong based on the qualitative values of each of five evaluation factors.22

Calabrese et al.23 have proposed a quantitative ranking scheme to evaluate the endocrine effects of chemicals on human health. In their scheme endocrine disruption is considered a multistage process, where they assume the probability of achieving the end result, namely a clinical endocrine pathology, rises as one progresses through the process. The authors identify five levels of evidence that correspond with the stages of the multistage process, level 1 being the weakest and level 5 the strongest. Then they introduce a point system based on a geometric progression (a + ar + ar² + ar³ + ar⁴), which is normalized to 10 points when stage 5 is reached. Stages 1-4 are weighted as 0.6, 1.3, 2.5, and 5.0, respectively. The causal chain is neither linear nor deterministic. Stage 3 will not always reach stage 5, but only does so at a certain probability. Therefore, by attaching a weight to each stage, one is essentially assigning probability estimates to the evidence. Thus, these weights represent the probabilities that the specific stage will proceed to the next stage.

In theory it is possible to come up with weighting factors that are empirically verifiable. Let us suppose we are trying to determine whether a chemical is a human endocrine disruptor (that it will adversely affect the human endocrine system) and that there are five stages in the causal chain. If we had evidence that the chemical induced stage 5 effects, then we can declare the substance a human endocrine disruptor. Let us assume we have evidence the chemical induced a stage 3 effect. If we had a toxicological database with thousands of entries that allowed us to calculate the percentage of those chemicals that induced a stage 3 effect and the frequency among those that also induced a stage 5 effect, we would have an empirically based system to develop weighting factors.

However, there is no generally accepted rationale for such a priori weightings within a discipline. And if there were an accepted framework of weightings, the selection would be premised on achieving consistency among expert evaluators rather than on some consensus about causality.

WOE in Hypothesis Testing

Sometimes the term WOE refers to a methodology used for selecting between two competing hypotheses. In this context, authors refer to WOE in the quantitative evaluation of a hypothesis relative to the null hypothesis, based on a priori evidence.24 It is common to find Bayesian methods of analysis used, where the probability of a hypothesis is based on current evidence and prior probabilities. This use of WOE is discussed in a published report that examines whether a DNA profile of a suspect is unique in the population.25 A suspect’s DNA is compared to the DNA found at the crime scene. The comparison is presented in the form of a probability estimate that the suspect’s DNA and the DNA found at the crime scene are a perfect match. The weight of evidence is synonymous with the probability estimate.26

THE FEDERAL AGENCY USE OF WOE

US Federal agencies, as well as international agencies like the International Joint Commission,27 have begun to incorporate WOE in both their internal risk assessment analysis and in their advisory processes where they engage with external experts. The approaches taken are usually qualitative and avoid compressing all of the data to some WOE numerical value. The ATSDR uses a WOE approach to evaluate the synergistic effects of chemical mixtures.28 The ATSDR describes the objectives of and factors to consider in a WOE analysis in its Public Health Assessment Guidance Manual, without providing any details on how evidence is actually “weighed” or scaled.29

“A weight-of-evidence analysis involves the balanced review and integration of relevant
exposure, toxicologic, epidemiologic, medical, and health outcome data to help determine whether exposure to contaminant levels under site-specific conditions might result in harmful effects. . . . The goal of the weight-of-evidence analysis is to decide whether or not harmful effects might be possible in the exposed population by weighing the scientific evidence and by keeping site-specific doses in perspective.”30

The Occupational Safety and Health Administration (OSHA) has incorporated WOE in its regulations. In OSHA’s air contaminants standard the agency stated:

In response to those commenters who argued that none of the studies described by OSHA presented sufficient dose-response data to be used as a basis for establishing a limit, the Agency emphasizes that it is not relying on any single study to determine that wood dust presents a significant risk of material health impairment. Instead, OSHA is making this determination on the basis of the findings in the dozens of studies reporting on the respiratory, irritant, allergic, and carcinogenic properties of wood dust. The Agency finds the results of these studies biologically plausible and their findings reproducible and consistent. It is true that some of these studies, like all human studies, have limitations of sample size, involve confounding exposures, have exposure measurement problems, and often do not produce the kind of dose-response data that can be obtained when experimental animals are subjected to controlled laboratory conditions. What the large group of studies being relied upon by OSHA to establish the significance of the risk associated with exposure to wood dust do show is that the overall weight of evidence that such exposures are harmful and cause loss of functional capacity and material impairment of health is convincing beyond a reasonable doubt.31

The EPA has used WOE in the assessment of Superfund sites, endocrine disruptors, and carcinogens. In its 1986 carcinogen assessment guidelines, the EPA introduced the term WOE to describe how it combined tumor findings in animals and humans as the principal elements of its WOE analysis to ascertain the carcinogenicity rating of a compound. In subsequent years, the EPA expanded its framework for a WOE evaluation of carcinogenicity by including a wider range of evidentiary sources beyond rodent and human epidemiological studies. In its recent policy document, “Proposed Guidelines for Carcinogen Risk Assessment”32 the EPA stated that the agency would include structure-activity relationships (computer models of chemical substances) of other carcinogenic agents, modes of action of carcinogenic agents at cellular and subcellular levels, and knowledge of toxicokinetic and metabolic processes, in addition to the more conventional sources of evidence.

In 1986, the EPA issued a summary ranking of five grades for possible carcinogenic agents (A through E, A signifies that a chemical is a human carcinogen, B a probable human carcinogen, etc., until we get to E, not a carcinogen). In 1996, the EPA replaced the letters with three designations: known/likely a human carcinogen, cannot be determined, and not likely a human carcinogen. The change in the carcinogen guidelines accompanied a more expansive view of the acceptable sources of evidence, which the agency defines as a WOE approach. The EPA referred to a WOE evaluation as a “collective evaluation of all pertinent information so that the full impact of biological plausibility and coherence are adequately considered.”33

The EPA notes that for a WOE approach, no single “weighing factor” determines the overall weight; moreover, “the factors are not scored numerically by adding pluses and minuses.”34 The factors are judged in combination, and there is no algorithm to aggregate the modalities and quality of evidence. The EPA does provide a guidance document that indicates when the weight goes up or down. Evidence is weighted more highly when time between exposure and outcome is short; there are consistent results in independent studies; a strong association exists between a compound and an effect; there are reliable exposure data; there is a dose-response relationship; there are no biases and confounding factors; there is a high level of statistical significance; and positive results are found in multiple species, sites, and sexes. The agency wrote: “Generally, the weight of human evidence increases with the number of adequate studies that show comparable results on populations exposed to the same agent under different conditions.”35 These qualitative weighting factors are consistent with the Bradford-Hill criteria for inferring causation.36

As previously noted, the EPA defined three descriptors for carcinogenicity (I, known/likely; II, cannot be determined; and III, not likely) and asserted that: “Applying a descriptor is a matter of judgment and cannot be reduced to a formula.”37

What happens when you bring scientists together and ask them to apply a WOE qualitative heuristic and reach a conclusion on whether a substance is, is likely, or is unlikely to be harmful? Several studies have evaluated expert panels’ use of WOE to determine whether there is consistency and convergence on the application of the criteria.38 Some panel studies have introduced weighting factors for specific evidentiary modalities (e.g., in one case, studies that show direct mechanistic evidence for an effect receive a ranking of “1.0,” whereas mechanistic data on related compounds receive a ranking of “0.71.”) and measured the degree of consensus among experts.39 The results in the study were mixed. The six teams of experts could not always agree on the direction of the interaction effect of two chemicals after reviewing and ranking the same data and applying the same a priori ranking scheme.

One of the key factors behind the reliability of science is the accuracy and replicability of measurement. The term WOE may suggest that a measurement is involved, but that is a false implication of the term. Weighing the evidence, in the way it is carried out by regulatory bodies, is based on human judgment. Such judgments are rarely, if ever, tested for interrater reliability. Those who are considered experts in “weighing” evidence are considered so because they have a good grasp of the type and variety of evidence that, according to standards in their discipline, are sufficient to justify a claim of cause and effect.

WOE IN LEGAL TESTIMONY

In law and public policy, three standards of evidence are generally recognized: preponderance, clear and convincing, and beyond a reasonable doubt. By preponderance of evidence, it is usually meant that a hypothesis under consideration need only be proven more trustworthy (more probable) than its negation. Most civil proceedings use a preponderance of evidence as a standard of proof.

A higher standard is found in the phrase “clear and convincing evidence.” The

supporting evidence under this standard has to have more than a marginal edge over the alternative hypothesis. It has been described as evidentiary support “sufficiently strong to command the unhesitating assent of every reasonable mind.”40

Finally, evidentiary criterion that meets the standard “beyond a reasonable doubt” is the highest burden and the one used in criminal trials to minimize false positives (convicting an innocent person).

In Daubert v Merrell Dow Pharmaceuticals, Inc, the US Supreme Court issued a ruling clarifying standards for federal judges on the admissibility of expert testimony in the courtroom. According to the Daubert standard, admissible expert testimony must meet a standard of relevancy and reliability. Moreover, some judges apply the standard to each study on which the expert relies, as well as the expert’s overall conclusions. This interpretation of Daubert would have each study stand on its own. McGarity calls this interpretation of Daubert the “corpuscular approach to expert testimony.”41 He writes:

“If the plaintiff fails to establish the relevance and scientific reliability of a sufficient number of individual studies, the trial judge will exclude the expert’s testimony and (in the absence of other relevant and reliable expert testimony on causation) grant the defendant’s motion for summary judgment before the jury ever enters the picture.”42

If McGarity is correct on how Daubert has been applied, then we will begin to witness a divergence between judicial and regulatory approaches to evidence. In regulation, the strands of evidence are not assumed to stand by themselves. Rather, they are seen as pieces of a puzzle. McGarity notes: “corpuscular approach effectively prevents the expert in toxic tort cases from applying the ‘weight-of-evidence’ approach that regulatory agencies universally employ in addressing the risks that toxic substances pose to human beings.”43 He likens the WOE approach in risk assessment to the jury’s role in civil trials in weighing the quality and credibility of various testimonies.

Because there is no algorithm or canonical methodology for determining WOE in circumstances where no single study is definitive and there is no determinative experiment that can foster a consensus on causality, experts will exercise their judgment on the strength of evidentiary support when a subset of the pieces of the puzzle are assembled. The term puzzle solving is an apt metaphor for the practice of science. Thomas Kuhn used it in his classic study The Structure of Scientific Revolutions to describe the role of scientists engaged in normal research problems. “Bringing a normal research problem to a conclusion is achieving the anticipated in a new way, and it requires the solution to all sorts of complex instrumental, conceptual, and mathematical puzzles. The man who succeeds proves himself the expert puzzle-solver.”44 The metaphor has also been cited by Susan Haack in connection with the Daubert decision: “ . . . scientists are like a bunch of people working, sometimes in cooperation with each other, sometimes in competition, on this or that part of a vast crossword. . . .”45

Two experts may easily disagree on the WOE. Who should decide whether the WOE has been met for a given hypothesis when there are contested views? After the corpuscular interpretation of Daubert, a judge applies the reliability standard to the admissibility of every piece of evidence in expert testimony without seeing it as part of the entire evidentiary record. By disqualifying the evidence as unreliable on its own weight, jurors may never hear the total weight of scientific evidence. McGarity concludes: “It is not at all clear that lay judges have the wherewithal to distinguish unreliable expert testimony from reliable testimony based on scientific studies that have been ‘deconstructed’ by paid industry consultants.”46

When an agency reports, “according to a WOE determination chemical X causes (does not cause) a human disease,” a number of possible presuppositions are implicit in the decision process including:

• a socially constructed heuristic for classifying studies or evaluating data,47
• an a priori numerical weighting scheme, and
• a constructed consensus from a panel of scientists through an interactive consultative process, such as the Delphi Process.

Studies that have measured the variance in expert judgments on the use of WOE in evaluating a hypothesis demonstrate that the application of WOE is not strictly a science but depends on the experience, as well as other tacit factors associated with the expert, such as their familiarity with or financial connection to the substance being evaluated. Experts who apply a WOE analysis to evaluate the human health hazards of a substance draw from their personal knowledge of similar compounds; situate the properties of the compound in a ranking system; and, based on the diversity and quality of the evidence, reach an informed, albeit subjective, judgment on whether the likelihood that the substance is the cause of a human disease is strong, moderate, or weak (e.g., the substance is a human carcinogen, a reproductive toxicant, or an endocrine disruptor).48 Without an accepted canonical methodology or standard of weighing and combining information streams, and because subjective factors inevitably shape the outcome of the process, judges may not be in any better position than jurors to decide which WOE analysis used by expert witnesses is more credible or reliable.

CONCLUSIONS

As a metaphor, the term WOE turns a cognitive and subjective process, as in the case of juries “weighing the evidence,” into something that connotes a purely rational and objective process. If we add the term “scientific” to the phrase, as in “weight of scientific evidence,” it suggests even more precision by drawing its symbolic meaning from the terms “weighing” (from the weights and measures) and “science” (the most dependable self-correcting system for fixing belief). In this metaphor there is a triple dose of constructed rationality. Our first realization is that the “weighing instrument” for “weighing evidence” is human cognition, which has never been calibrated to the task. In fact, “weighing evidence” has little if anything in common with weights and measures. Secondly, evidence for a hypothesis generally appears in gradations, with the exception of the evidence from a crucial experiment. Generally, there is more or less evidence or conflicting evidence, or more or less uncertainty in the evidence. The approach that uses WOE applies a
method that treats evidence as a continuous variable and turns it into a dichotomous (below or above the threshold) or triadic variable: “yes,” “no,” or “probably.” (I am indebted to Susan Haack for suggesting this point.) Third, the process of assigning values (qualitative or quantitative) to different evidentiary modalities or to studies of different quality within the same modality is generally constructed a priori (independent of empirically based evidence) for each specific case. Where frameworks or models have been developed for this purpose, they have not been standardized.49

Writing about the environmental etiology of childhood diseases, DeBaun and Gurney highlight the essential role of a conceptual framework for weighing the evidence. “Informed recommendations require systematic assessments of the weight of evidence from available studies and placement of the studies into a conceptual framework that allows for available data to be reviewed in the context of epidemiology principles of causal inference.”50 Presuppositions within these frameworks about the value of different forms of evidence may bias the outcome of a WOE analysis. For example, some WOE approaches give higher weight to mechanistic information over epidemiological data. Where mechanistic knowledge may be unavailable for a particular substance, the value of excellent human epidemiological data may be reduced in the weighing schema because of a priori assumptions about evidence.

The use of all the relevant evidence for assessing the health effects of a substance is certainly an advance over restricting assessment to a few choice evidentiary modalities, where information derived from these modalities is scarce or the results highly uncertain. A legal process that rejects the use of WOE or restricts its utilization seems to be at odds with current practices in regulatory science, where knowledge about a potentially hazardous product is pursued through a triangulation of evidentiary streams. Moreover, the same legal processes that acknowledge the value of WOE must also acknowledge that its use is not a rigorous science and, therefore, must be open to public view and interpretation. When WOE is used consistently and uniformly by a regulatory body, it enables that body to develop a strong comparative approach for assessing the potential health and environmental effects of products. On the other hand, the transparency of WOE will enable jurors and stakeholders to fully grasp the norms and a priori assumptions that enter into the analysis. The Daubert decision and subsequent related procedures should neither serve as an excuse for “disbarring” WOE analysis in risk assessment nor prevent jurors from learning about the value and limitations that it may bring to litigation.

About the Author
The author is with the Department of Urban and Environmental Policy and Planning at Tufts University.
Request for reprints should be sent to Sheldon Krimsky, PhD, Department of Urban and Environmental Policy and Planning, Tufts University, Medford, MA 02155 (e-mail: sheldon.krimsky@tufts.edu).
This article was accepted July 27, 2004.

Acknowledgments
This work was supported in part by the Project on Scientific Knowledge and Public Policy.
Special thanks to the SKAPP Planning Committee, especially David Ozonoff, and participants at the Coronado Conference in 2003 for their constructive comments on an earlier version of the paper.

References
1. A. S. Eddington, Space, Time and Gravitation. (Cambridge, UK: Cambridge University Press, 1920).
2. K. R. Popper, The Logic of Scientific Discovery. (New York: Harper, 1959), 277.
3. K. Rothman. Causation and Causal Inference in Epidemiology. Draft paper delivered to the Coronado Conference on Scientific Evidence and Public Policy, March 3-4, 2003.
4. C. C. Willhite. “Weight-of-Evidence versus Strength-of-Evidence in Toxicologic Hazard Identification: Di(2-Ethylhexyl)Phthalate (DEHP),” Toxicology 160 (2001): 219-226.
5. J. M. Culp, R. B. Lowell, and K. J. Cash. “Integrating Mesocosm Experiments with Field and Laboratory Studies to Generate Weight-of-Evidence Risk Assessments for Large Rivers.” Environmental Toxicology and Chemistry 19 (2000): 1167-1173.
6. L. W. Hall and J. M. Giddings. “The Need for Multiple Lines of Evidence for Predicting Site-Specific Ecological Effects.” Human and Ecological Risk Assessment 6 (2000): 679-710.
7. C. Menzie, M. H. Henning, J. Cura, et al. “A Weight-of-Evidence Approach for Evaluating Ecological Risks: Report of the Massachusetts Weight-of-Evidence Work Group.” Human Ecological Risk Assessment 2 (1996): 277-304.
8. G. W. Suter II, ed. Ecological Risk Assessment. (Chelsea, MI: Lewis Pub. Co, 1993), 86.
9. V. R. Walker, “Risk Characterization and the Weight of Evidence: Adapting Gatekeeping Concepts from the Courts.” Risk Analysis 16 (1996): 793-799.
10. G. E. Dallal, Chief, Biostatistics Unit, The Little Handbook of Statistical Practice (The Jean Mayer USDA Human Nutrition Research Center on Aging, Tufts University), available at http://www.tufts.edu/~gdallal/LHSP.HTM.
11. R. A. Chechile “Probability, utility, and decision trees in environmental decision analysis,” in Environmental Decision Making: A Multidisciplinary Perspective. (New York: Van Nostrand, 1991), 64-91.
12. Menzie, 1.
13. M. A. Ibrahim, G. G. Bond, T. A. Burke et al. “Weight of the Evidence on the Human Carcinogenicity of 2,4-D.” Environmental Health Perspectives 96 (1991): 213-222.
14. R. L. Cooper and R. J. Kavlock. “Endocrine Disruptors and Reproductive Development: A Weight of Evidence Overview.” Journal of Endocrinology 152 (1997): 159-166.
15. Agency for Toxic Substances and Disease Registry (ATSDR), “The Assessment Process: An Interactive Learning Program,” available at http://www.atsdr.cdc.gov/training/public-health-assessment-overview/html/module2/sv18.html. Accessed March 24, 2005.
16. Agency for Toxic Substances and Disease Registry (ATSDR), “The Assessment Process: An Interactive Learning Program,” available at http://www.atsdr.cdc.gov/training/public-health-assessment-overview/html/module2/sv18.html. Accessed March 24, 2005.
17. Menzie.
18. A. Edwards, G. Elwyn, K. Hood, and S. Rollnick “Judging the Weight of Evidence in Systematic Reviews: Introducing Rigour into the Qualitative Overview Stage by Assessing Signal and Noise.” Journal of Evaluation in Clinical Practice 6 (2000): 177-184.
19. R. L. Cooper and R. J. Kavlock “Endocrine Disruptors and Reproductive Development: A Weight-of-Evidence Review.” Journal of Endocrinology 152 (1997): 159-166.
20. C. G. Graves, G. M. Matanoski, and R. G. Tardiff “Weight of Evidence for an Association between Adverse Reproductive and Developmental Effects and Exposure to Disinfection By-Products: A Critical Review.” Regulatory Toxicology and Pharmacology 34 (2001): 103-124.
21. Ibid, 110.
22. World Health Organization, IPCS Global Assessment of the State of the Science of Endocrine Disruptors: Chapter 7. “Causal Criteria for Assessing Endocrine Disruptors—a Proposed Framework,” 123-128 http://www.who.int/ipcs/publications/new_issues/endocrine_disruptors/en/. Accessed March 24, 2005.
23. E. J. Calabrese, L. A. Baldwin, P. T. Kostecki, et al. “A Toxicologically Based Weight-of-Evidence Methodology for the Relative Ranking of Chemicals of Endocrine Disruption Potential.” Regulatory Toxicology and Pharmacology 26 (1997): 36-40.
24. E. P. Smith, I. Lipkovich, and K. Ye. Weight of Evidence (WOE): Quantitative Estimate of Probability of Impact. Working Paper. February 10, 2002.
25. D. J. Balding “When Can a DNA Profile Be Regarded as Unique?” Science and Justice 39 (1999): 257-260.


26. I. W. Evett, L. A. Forman, G. Jackson, et al. “DNA Profiling: a Discussion of Issues Relating to the Reporting of Very Small Match Probabilities.” Criminal Law Review (May 2000): 341-355.
27. International Joint Commission (IJC). Sixth Biennial Report on Great Lakes Water Quality. Washington, DC: International Joint Commission, 1992.
28. H.R. Pohl, N. Roney, M. Fay, et al. “Site-Specific Consultation for a Chemical Mixture.” Toxicology and Industrial Health 15 (1999): 470-479.
29. Agency for Toxic Substances and Disease Registry (ATSDR). Public Health Assessment Guidance Manual, Ch. 8, Health effects evaluation: weight-of-evidence analysis, available at http://www.atsdr.cdc.gov/HAC/PHAmanual/ch8p1.html. Accessed June 5, 2004.
30. Agency for Toxic Substances and Disease Registry (ATSDR), Chapter 8, http://www.atsdr.cdc.gov/HAC/PHAmanual/ch8p1.html, pp. 2-3. Accessed March 24, 2005.
31. 29CFR Par. 1910. Air Contaminants, Sec. VI. Health Effects Discussion and Determination of Final Pel, January 1989.
32. Environmental Protection Agency. Proposed Guidelines for Carcinogen Risk Assessment. Federal Register 61(79):17960-18011 (April 23, 1996). Hereafter, EPA 1996.
33. Ibid., 17981.
34. Ibid.
35. Ibid.
36. A. Bradford-Hill, “The Environment and Disease: Association or Causation?” Proc. Royal Soc. Med. 58 (1965): 295-300.
37. EPA 1996, 17985.
38. M. E. Anderson, M. E. Meek, G. A. Boorman, et al. “Lessons Learned in Applying the U.S. EPA Proposed Cancer Guidelines to Specific Compounds.” Toxicological Sciences 53 (2000): 159-172.
39. M. M. Muntaz, P. Furkin, G. I. Diamond, et al. “Exercises in the Use of Weight-of-Evidence Approach for Chemical-Mixture Interactions.” Journal of Clean Technology, Environmental Toxicology, and Occupational Medicine 5 (1996): 339-345.
40. V. R. Walker “Risk Characterization and the Weight of Evidence: Adapting Gatekeeping Concepts from the Courts.” Risk Analysis 16 (1996): 793-799.
41. T. O. McGarity “Proposal for Linking Culpability and Causation to Ensure Corporate Accountability for Toxic Risks.” William and Mary Environmental Law and Policy Review, Fall 2001.
42. McGarity, 7.
43. Ibid., 8.
44. T. S. Kuhn. The Structure of Scientific Revolutions. (Chicago: University of Chicago Press, 1962), 36.
45. S. Haack “An Epistemologist in the Bramble-Bush: at the Supreme Court with Mr. Joiner.” Journal of Health, Politics, Policy & Law, 26 (2001): 217-248.
46. McGarity, 12.
47. A Bayesian statistical approach to WOE is given in: E. P. Smith, I. Lipkovich, and K. Ye. Weight of evidence (WOE): Quantitative estimation of probability impact. Unpublished manuscript. (Blacksburg, VA: Department of Statistics, Virginia Tech., 2002).
48. The ATSDR endorses “the use of a narrative statement incorporating “weight-of-evidence” conclusions in lieu of alphanumeric designations alone in conveying qualitative conclusions regarding carcinogenicity,” available at: http://www/atsdr.cdc.gov/cancer.html, p. 9. Accessed March 24, 2005.
49. Op. cit. Ibrahim et al. 1991, p. 219.
50. M. R. DeBaun and J. G. Gurney “Environmental Exposure and Cancer in Children: a Conceptual Framework for the Pediatrician.” Pediatric Clinics of North America 48 (2001): 1125.
