
COMMENT

ILLUSTRATION BY DAVID PARKINS

No publication without
confirmation
Jeffrey S. Mogil and Malcolm R. Macleod propose a new kind of paper that combines
the flexibility of basic research with the rigour of clinical trials.

Concern over the reliability of published biomedical results grows unabated. Frustration with this 'reproducibility crisis' is felt by everyone pursuing new disease treatments: from clinicians and would-be drug developers who want solid foundations for the preclinical research they build on, to basic scientists who are forced to devote more time and resources to newly imposed requirements for rigour, reporting and statistics. Tightening rigour across all experiments will decrease the number of false positive findings, but comes with the risk of reducing experimental efficiency and creativity.

Bolder ideas are needed. What we propose here is a compromise between the need to trust conclusions in published papers and the freedom for basic scientists to explore and innovate [1]. Our proposal is a new type of paper for animal studies of disease therapies or preventions: one that incorporates an independent, statistically rigorous confirmation of a researcher's central hypothesis. We call this large confirmatory study a 'preclinical trial'. These would be more formal and rigorous than the typical preclinical testing conducted in academic labs, and would adopt many practices of a clinical trial.

We believe that this requirement would push researchers to be more sceptical of their own work. Instead of striving to convince reviewers and editors to publish a paper in prestigious outlets, they would be questioning whether their hypotheses could stand up in a large, confirmatory animal study. Such a trial would allow much more flexibility in earlier hypothesis-generating experiments, which would be published in the same paper as the confirmatory study. If the idea catches on, there will be fewer high-profile papers hailing new therapeutic strategies, but much more confidence in their conclusions.

The confirmatory study would have

23 February 2017 | Vol 542 | Nature | 409

© 2017 Macmillan Publishers Limited, part of Springer Nature. All rights reserved.

three features. First, it would adhere to the highest levels of rigour in design (such as blinding and randomization), analysis and reporting. Second, it would be held to a higher threshold of statistical significance, such as using P values of P<0.01 instead of the currently standard P<0.05. Third, it would be performed by an independent laboratory or consortium. This exceeds the requirements currently proposed by various checklists and funders, but would apply only to the final, crucial confirmatory experiment.

Unlike clinical studies, most preclinical research papers describe a long chain of experiments, all incrementally building support for the same hypothesis. Such papers often include more than a dozen separate in vitro and animal experiments, with each one required to reach statistical significance. We argue that, as long as there is a final, impeccable study that confirms the hypothesis, the earlier experiments in this chain do not need to be held to the same rigid statistical standard.

This would represent a big shift in how scientists produce papers, but we think that the integrity of biomedical research could benefit from such radical thinking.

FINAL CONFIRMATION
For hypotheses with clear clinical implications, the logical confirmatory experiment almost always involves animal studies, in which the effect of a treatment strategy or a genetic mutation is assessed in mice or rats. The execution of these studies is often poor [2]. For example, behavioural testing such as gauging the extent of pain or paralysis falls outside the core competency of most molecular biology labs. Large variability or questionable baseline measures cloud results [3]. In addition, most studies conducted today have low statistical power [4] and a high risk of bias [5]. Many journals, including this one, have promoted guidelines such as those framed by the ARRIVE initiative [6]. The impact of these publishing policies is being investigated [7] but is not yet clear.

Under our proposal, a protocol for the confirmatory study would be set out in advance, specifying the hypothesis, the key outcome measures and the plan for statistical analysis. Enough animals should be studied so that a positive statistical test means that the hypothesis is very likely to be correct (see 'The maths of predictive value'). Sample sizes for this crucial experiment would need to go up; we estimate around sixfold. Overall, however, the subsequent savings in both animals and money are likely to be substantial; fewer people would waste resources following up on weak papers. This would get new drugs to market more quickly.

GETTING IT DONE
Who will conduct these hypothesis-testing experiments, and why would they want to? Preclinical trials should be run by researchers with strong expertise in the relevant animal models, and we believe that some will decide to specialize in performing confirmatory experiments for colleagues. Another option would be to establish dedicated animal-testing facilities, analogous to genomics and bioinformatics core facilities. These provide high-quality services and have become a crucial part of the scientific enterprise. Additionally, consortia might be set up to conduct such studies, and to develop and deepen the methodologies used in them.

Specialized confirmatory labs would increase the quality of animal studies, and free the labs that did the initial experiments to focus on their core expertise. We think that government funders and industry partners, which have spent billions of dollars on disappointing clinical trials, would be prepared to shift resources to support such an improved system, perhaps by offering dedicated grants.

Confirmatory labs would be less dependent on positive results than the original researchers, a situation that should promote the publication of null and negative results. They would be rewarded by authorship on published papers, service fees, or both. They would also be more motivated to build a reputation for quality and competence than to achieve a particular finding.

For findings with immediate clinical applications (that is, a potential treatment that might go into human testing), we propose an extra 'generalizability study' to follow the confirmatory phase (see 'Publication with confirmation'). This would be designed to assess how widely applicable the treatment might be, and to boost confidence that it will work across a range of situations. One strategy is to repeat the confirmatory study across multiple sites, with built-in biological variability. By broadening the circumstances in which the hypothesis is tested (animal age, strain, sex, health, co-morbidity, precise assay used, drug administration, timing of outcome assessments), such studies are more likely to provide clinically useful information and to survive replication attempts.

Generalizability studies would probably be beyond an individual lab's capabilities and require multicentre consortia, but principles and tools to support them are already in place. The Multi-PART consortium (www.dcn.ed.ac.uk/multipart) has established a web-based system that allows the design, execution and assessment of studies across an unlimited number of centres. Its plans include multicentre testing of interventions that increase oxygen delivery to brain regions affected by stroke. In collaboration with the International League Against Epilepsy, it also plans to test potential new epilepsy drugs.

PUBLICATION WITH CONFIRMATION
Our proposed paper would be accepted by journals only if it included a preclinical trial following best clinical-research practices. For therapies that might later be tested in humans, all three study types are recommended.

Exploratory studies
Who: Original researchers
Why: To generate hypotheses
Aims: To maximize efficiency and exploration
Features: High flexibility, no mandatory statistics
Publishing venue: Preprint server or informal means. Formal publication requires confirmatory study

Confirmatory study (preclinical trial)
Who: Separate team, core facility or consortium
Why: To test hypotheses
Aims: To avoid false positive findings
Features: High rigour and a predefined statistical analysis plan (P<0.01)
Publishing venue: New category of journal article, recognized as having high impact

Generalizability study
Who: Multicentre consortium
Why: To test broader application of hypotheses
Aims: To judge readiness for clinical translation
Features: High rigour, built-in variability in animal subjects and assays
Publishing venue: New category of journal article, recognized as having exceptionally high impact

ENJOY THE EXPERIMENT
With a system in place for rigorous hypothesis testing, other formalities become less necessary. Any experiment in the exploratory stage could be performed without formal statistical hypothesis testing. No P-value thresholds would need to be reached; results sections might display only a central estimate, such as mean or median, and a measure of the spread of the data or, ideally, the individual data points themselves. This is in line with recommendations that the American Statistical Association made last year (see go.nature.com/2kbqkxu) that P values alone are not good measures of evidence for a hypothesis. Sample sizes should be large enough to give investigators confidence in the direction of effect, and small enough to save time and money. Complete reporting and attention to confounding variables are still essential; researchers should not exclude animals from results without mentioning them, and they should avoid methods that would introduce bias or batch effects.


Wouldn't this lowered bar increase the number of false positives? We think not. Because investigators would be required to 'put up or shut up' and formally submit to a preclinical trial, they would be more comprehensive and careful with their exploratory work. They would have an incentive to do the experiments that might disprove their hypothesis at an early stage. Conversely, researchers would not feel obliged to perform experiments that they consider uninformative, as is too often the case today. Even more importantly, they would not need to increase sample size until each and every P value dropped below 0.05. The efficiencies gained by this change should more than overcome the resources needed to conduct a preclinical trial.

Importantly, our proposal would preserve the fun of doing exploratory science. In this new system, the costs of poor science (for example, being seduced by a rogue finding or being too cavalier in experimental design) are borne by the initial researchers. If they cut corners, cherry-pick data or eschew blinding in their experiments, they harm their chances of their hypothesis surviving the rigorous testing proposed in a preclinical trial. We predict that data fraud would decrease as well, because the need for every experiment to reach an arbitrary statistical threshold would be rendered moot. Reviewers would focus on statistics in the confirmatory study. For graduate students and postdocs, coveted publications would depend less on particular results in early experiments, and more on the strength of their overall hypotheses. Eventually, the incentive system would subtly shift to reward greater confidence and caution in conclusions: researchers would be rewarded more for the marathon than for the sprint.

This system would slow the rate of publications, but not the pace of discovery. Scientific priority could be established by the date on which an experimental plan was agreed (essentially registered) between the original researchers and those performing the confirmatory study. Furthermore, if published studies are more reliable and public confidence in science is boosted, a somewhat slower publication process seems acceptable. We trust that reviewers and tenure committees will find appropriate ways to credit papers that include confirmation.

WHAT NEXT?
This proposal does not fix everything that is currently broken in translational medicine, including false conclusions drawn from inappropriate animal models, unappreciated variables (such as animal microbiomes or the sex of experimenters) and publication bias. But we believe it is worth a try.

It is not practical to expect the community to change direction in step and as one. Four things could help. Journals should make space for papers that include confirmatory experiments along with exploratory work. (They could eventually prioritize them or even make confirmatory experiments a requirement.) Tenure and faculty-assessment committees should find ways to credit such work. Funders could develop schemes to pilot this approach, and those who run clinical trials should demand greater confidence in the premise underlying human studies. With even some of these incentives in place, scientists will lead the charge.

Jeffrey S. Mogil is a basic neuroscientist at McGill University in Montreal, Canada. Malcolm R. Macleod is a clinical neuroscientist at the University of Edinburgh, UK.
e-mails: jeffrey.mogil@mcgill.ca; malcolm.macleod@ed.ac.uk

1. Kimmelman, J., Mogil, J. S. & Dirnagl, U. PLoS Biol. 12, e1001863 (2014).
2. Macleod, M. R. et al. PLoS Biol. 13, e1002273 (2015).
3. Perrin, S. Nature 507, 423–425 (2014).
4. Button, K. S. et al. Nature Rev. Neurosci. 14, 365–376 (2013).
5. Head, M. L., Holman, L., Lanfear, R., Kahn, A. T. & Jennions, M. D. PLoS Biol. 13, e1002106 (2015).
6. Kilkenny, C., Browne, W. J., Cuthill, I. C., Emerson, M. & Altman, D. G. PLoS Biol. 8, e1000412 (2010).
7. Cramond, F. et al. Scientometrics 108, 315–328 (2016).

SIGNIFICANT SAMPLES
The maths of predictive value

How likely is it that a hypothesis is correct? This is best answered by positive predictive value (PPV), not by P values, as is commonly thought. The PPV reflects the probability that a positive result is truly positive. It is determined by P values (calculated after results are collected) and statistical power (which should be calculated before a study begins). Statistical power describes the chance a study can detect some predetermined (and presumably meaningful) effect size, such as the difference between a treatment and control group.

In this example, it is assumed that 250 potential therapies went through preclinical testing. (Results later showed that 50 work (green) and 200 do not (red).) Ratios change depending on the fraction of promising molecules that actually work.

STATUS QUO: Most studies have a statistical power of only 20% and a P value of 0.05, meaning many more false findings (PPV of 50%). This reflects a sample size of about 10 mice per study.
10 promising molecules found
10 false positives found
40 undetected
190 true negative results (rarely published)
20 preclinical studies showed promise and were published, but 10 (50%) were false positives.

PROPOSED STANDARDS: To achieve a PPV of 95%, study results would need a P value of 0.01 and a large enough sample size to reach 80% statistical power (typically >75 mice per study).
40 promising molecules found
2 false positives found
10 undetected
198 true negative results
42 studies showed promise and were published, and only 2 (5%) were false positives.
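The arithmetic in the box 'The maths of predictive value' can be reproduced with a short calculation. The sketch below is illustrative only: the function name `screening_outcomes` and its parameter names are not from the article. It assumes, as the box does, that each of the 250 candidates is tested once, that 50 of them truly work, and that the P-value threshold acts as the false-positive rate among the 200 that do not.

```python
def screening_outcomes(n_candidates, n_effective, power, alpha):
    """Expected outcomes when every candidate therapy is tested once.

    power: chance that a truly effective therapy gives a positive result.
    alpha: chance that an ineffective therapy gives a positive result
           (the P-value threshold, treated here as the false-positive rate).
    All names are hypothetical; they are not taken from the article.
    """
    n_ineffective = n_candidates - n_effective
    true_positives = power * n_effective
    false_positives = alpha * n_ineffective
    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "undetected": n_effective - true_positives,
        "true_negatives": n_ineffective - false_positives,
        # PPV: share of positive results that are truly positive.
        "ppv": true_positives / (true_positives + false_positives),
    }


# The two scenarios described in the box.
status_quo = screening_outcomes(250, 50, power=0.20, alpha=0.05)
proposed = screening_outcomes(250, 50, power=0.80, alpha=0.01)

for label, result in (("status quo", status_quo), ("proposed", proposed)):
    print(f"{label}: PPV = {result['ppv']:.1%}")
```

Under these assumptions the status quo yields 10 true and 10 false positives, and the proposed standards yield 40 true and 2 false positives, matching the counts in the box. Lowering `n_effective` while keeping power and threshold fixed reduces the PPV, which is the box's point that the ratios depend on the fraction of promising molecules that actually work.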
