Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
Oliver a n d Boyd
E d i n b u r g h : Tweeddale C o u r t
London: 3ga Welbeck Street, W. I
I
PREFACE TO FIRST EDITION
i
I N 1925 the author wrote a book (Statistical Methods
for Research Workers) with the object of supplying
practical experimenters and, incidentally, teachers of
mathematical statistics, with a connected account of
the applications in laboratory work of some of the
more recent advances in statistical theory. Some of
the new methods, such as the analysis of variance,
were found to be so intimately related with problems
of experimental design that a considerable part of
the eighth chapter was devoted to the technique of
agricultural experimentation, and these sections have
been progressively enlarged with subsequent editions,
in response to frequent requests for a fuller treatment
of the subject. The design of experiments is, however,
too large a subject, and of too great importance to the
general body of scientific workers, for any incidental
treatment to be adequate. A clear grasp of simple
and standardised statistical procedures will, as the
reader may satisfy himself, go far to elucidate the
principles of experimentation ; but these procedures
are themselves only the means to a more important
end. Their part is to satisfy the requirements of sound
and intelligible experimental design, and to supply the
machinery for unambiguous interpretation. T o attain
a clear grasp of these requirements we need to study
designs which have been widely successful in many
fields, and to examine their structure in relation to the
PRINTED A N D PUBLISHED IN G R E A T BRITAIN
BY OLIVER AND BOYD L T D . , EDINBURGH
requirements of valid inference.
T h e examples chosen in this book are aimed at
vii
26 THE PRINCIPLES OF EXPERIMENTATION
decided to do so, in order that it may be seen that the which M r Galton has h a d the great kindness to draw
superiority of the crossed plants over the self-fertilised up for me."
does not commonly depend on the presence of two or 15. Galton's Method of Interpretation
three extra fine plants on the one side, or of a few very I I
I have examined the measurements of the plants with
poor plants o n the other side. Although several care, and by many statistical methods, to find out how far the
observers have insisted in general terms on the offspring means of the several sets represent constant realities, such as
from intercrossed varieties being superior to either would come out the same so long as the general conditions of
parent-form, no precise measurements have beeA given ; growth remained unaltered. The principal methods that were
a n d I have met with no observations on the effects of adopted are easily explained by selecting one of the shorter
crossing a n d self-fertilising the individuals of the same series of plants, say of 2 e a mays, for an example.
variety. Moreover, experiments of this kind require " The observations as I received them are shown in columns
so much t i m e m i n e having been continued during I I. and I I I., where they certainly have no prima^facie appearance
eleven years-that they are not likely soon to be of regularity. But as soon as we arrange them in the order of
repeated. their mag~itudes,as in columns IV. and V., the case is materially
altered. We now see, with few exceptions, that the largest
" A s only a moderate number of crossed a n d self- plant on the crossed side in each pot exceeds the largest plant
fertilised plants were measured, it was of great importance
on the self-fertilised side, that the second exceeds the second,
to me to learn how far the averages were trustworthy. the third the third, and so on. Out of the fifteen cases in the
I therefore asked M r Galton, who has had much experi- table, there are only two exceptions to this rule.' We may
ence in statistical researches, to examine some of my therefore confidently affirm that a crossed series will always
tables of measurements, seven in number, namely be found to exceed a self-fertilised series, within the range
- of the
those of Zpomiza, Digitalis, Reseda lutea, Viola, conditions under which the present experiment has been made.
Limnanthes, Petunia, and Zea. I may premise that 11
Next as regards the numerical estimate of this excess.
if we took b y chance a dozen or score of men belonging I The mean values of the several groups are so discordant, as
to two nations a n d measured them, it would I presume is shown in the table just given, that a fairly precise numerical
be very rash to form a n y judgment from such small estimate seems impossible. But the consideration arises,
numbers on their average heights. But the case is whether the difference between pot and pot may not be of
somewhat different with m y crossed a n d self-fertilised I much the same order of importance as that of the other
conditions upon which the growth of the plants has been
plants, a s they were of exactly the same age, were
modified. If so, and only on that condition, it would follow
subjected from first to last to the same conditions, and that when all the measurements, either of the crossed or the
were descended from the same parents. When only self-fertilised plants, were combined into a single series, that
from two to six pairs of plants were measured, the I series would be statistically regular. The experiment is tried
results a r e manifestly of little or no value, except in so in columns VII. and VIII., where the regularity is abundantly
far as they confirm a n d are confirmed by experiments clear, and justifies us in considering its mean as perfectly reliable
made on a larger scale with other species. I will now * Galton evidently did not notice that this is true also before rearrange-
give the report on the seven tables of measurements, ment.
EXPERIMENT O N G;RO;WTH RATE
I GALTON'S METHOD OF INTERPRETATION
---- *bWim
0000 V I N
Pot. Crossed. Self-fert. Difference.
tative effects which conform satisfactorily to a theoretical With respect to these differences our null hypothesis
distribution known as the normal law of frequency of asserts that they are normally distributed about a mean
error. I t is this circumstance that makes it appropriate value at zero, and we have to test whether our 15
to choose, as the null hypothesis to be tested, one for observed differences are compatible with the supposition
which an exact statistical criterion is available, namely that they are a sample from such a population.
that the two groups of measurements are samples
TABLE 3
drawn from the same normal population. On the basis
Dzferences i n e i g h t h of an inch 6etzpreen cross- and
of this hypothesis we may proceed to compare the self-fertilised plants of the same pair
average difference in height, between the cross-fertilised
and the self-fertilised plants, with such differences as
might be expected between these averages, in view of
the observed discrepancies between the heights of
plants of like origin.
T h e calculations needed to make a rigorous test of
We must now see how the adoption of the method
the null hypothesis stated above involve no more than
of pairing determines the details of the arithmetical
the sum, and th,e sum of the squares, of these numbers.
procedure, so as to lead to an unequivocal interpreta-
The sum is 314, and, since there are 15 plants, the
tion. The pairing procedure, as indeed was its purpose,
has equalised any differences in soil conditions, illumina- mean difference is 2 0 3 in favour of the cross-fertilised
tion, air-currents, etc., in which the several pairs of 1S-
plants. The sum of the squares is 26,518, and from
individuals may differ. Such differences having been
this is deducted the product of the total and the mean,
eliminated from the experimental comparisons, and
or 6573, leaving 19,945 for the sum of squares of devia-
contributing nothing to the real errors of our experiment,
tions from the mean, representing discrepancies among
must, for this reason, be eliminated likewise from our
the differences observed in the 15 pairs. T h e algebraic
estimate of error, upon which we are to judge what
fact here used is that
differences between the means are compatible with the
null hypothesis, and what differences are so great as to S ( X - ~=
) ~S(x2)-2S(x)
be incompatible with it. We are therefore not con- where S stands for summation over the sample, and
cerned with the differences in height among plants of f for the mean value of the observed differences, x.
like origin, but only with differences in height between We may make from this measure of the discrepancies
members of the same pair, and with the discrepancies an estimate of a quantity known as the variance of an
among these differences observed in different pairs. individual difference, by dividing by 14, one less than
Our first step, therefore, will be to subtract from the the number of pairs observed. Equally, and what is
height of each cross-fertilised plant the height of the more immediately required, we may make an estimate
self-fertilised plant belonging to the same pair. The of the variance of the mean of 15 such pairs, by dividing
differences are shown below in eighths of an inch. again by r 5, a process which yields 94.976 as the estimate.
38 E X P E R I M E N T ON GROWTH RATE FALLACIOUS METHODS 39
The square root of the variance is known as the standard can suffice by themselves to establish the point at issue,
error, and it is by the ratio which our observed mean and those which are of little value except in so far as
difference bears to its standard error that we shall judge they confirm or are confirmed by other experiments of
of its significance. Dividing our difference, 20.933, a like nature. In particular, it is to be noted that
I
by its standard error 9.746, we find this ratio (which is Darwin recognised that the reliability of the result
usually denoted by t) to be 2.148. must be judged by the consistency of the superiority
The object of these calculations has been to obtain of the crossed plants over the self-fertilised, and not
from the data a quantity measuring the average differ- only on the difference of the averages, which might
ence in height between the cross-fertilised and the self- depend, as he says, on the presence of two or three
fertilised plants, in terms of the observed discrepancies extra-fine plants on the one side, or of a few very poor
among these differences ; and which, moreover, shall plants on the other side ; and that therefore the pre-
be distributed in a known manner when the null hypo- sentation of the experimental evidence depended essen-
thesis is true. The mathematical distribution for our tially on giving the measurements of each independent
present problem was discovered by " Student " in plant, and could not be assessed from the mere averages.
1908, and depends only upon the number of independent I t may be noted also that Galton's scepticism of the
comparisons (or the number of degrees of freedom) value of the probable error, deduced from only 15 pairs
available for calculating the estimate of error. With of observations, though, as it turned out, somewhat
, excessive, was undoubtedly right in principle. The
15 observed differences we have among them 14 inde-
pendent discrepancies, and our degrees of freedom are standard error (of which the probable error is only a
14. The available tables of the distribution of t show conventional fraction) can only be estimated with con-
that for 14 degrees of freedom the value 2.145 is exceeded siderable uncertainty from so small a sample, and,
T
by chance, either in the positive or negative direction, prior to " Student's " solution of the problem, it was
in exactly 5 per cent. of random trials. The observed i
by no means clear to what extent this uncertainty would
value o f t , 2.148, thus just exceeds the 5 per cent. point, invalidate the test of significance. From " Student's "
and the experimental result may be judged significant, work it is now known that the cause for anxiety was
though barely so. not so great as it might have seemed. Had the standard
error been known with certainty, or derived from an
i
18. Fallacious Use of Statistics effectively infinite number of observations, the 5 per
We may now see that Darwin's judgment was cent. value of t would have been 1.960. When our
perfectly sound, in judging that it was of importance estimate is based upon only 15 differences, the 5 per
to learn how far the averages were trustworthy, and cent. value, as we have seen, is 2.145, or less than
, 10 per cent. greater. Even using the inexact theory
that this could be done by a statistical examination of
the tables of measurements of individual plants, though available at the time, a calculation of the probable
not of their averages. The example chosen, in fact, error would have provided a valuable guide to the
falls just on the border-line between those results which interpretation of the results.
g5.;aJ 7, 2 5 +ucd 2
a l l urn+,
"
$ ' " o'-c
--d3cz g.5 g.$
75
,o c0, , Ccd E
0 42
c I+ . @ QJ
aJuoI-,3 4.S.2 c
h a t: o , I - ,V)U + , 0 7
I-,
3 g u.2
42 0 c a o
0
5L C03 a EJ C
b= ' a, C-U5>
rn cc,
I-,
% -Q.5 c
, G
2
4 2
'- 0 C Q g.5 7G- 3 cd
Q)
5% 23 0 0 I-,$
I-,
7 ah.?
u c g
5 g . S - hg ~
I-,
8O g" 5. 2. r (2 %
f M22 7 I-,, 2 g
- 7,
;cdo-Q,
a o o >=
e- h"0 cd
a2 42
. S c ~ z 8
.2 7 c
+,
d'"
cd'Z
"3
,cdaJ
$
I+
,% E
a-
a 2 6$8gI-,
d u d 0
c
2 E - k ---
7 QJ
0
u
"cdC.2
"bd.-"E I-, (d
Qg,g; I-,
a a J = 2 0 0 .5
2
. c
Q , O
>
+ 03
a
a c
0 . i
G
8 .- :
0
:
cd aJ
-
0
I-,
Q,
q 2-L
2cdu8
2 a
.-c .- u . 2
&2C"-
cd $ Q.2
- -2s
c
z'a5 I-,
t:
0
.5 5 2 cd
* 2 c
%Ss:
rnka.2
d Y Q Q
+i-g z.5
r3 $ 4 2
Ea
&='a
0~
. 3
$
a.3 a
X C ~
,Q)oaJ
E
c z"5
cdg .2 M
+ r n C ~
0 '" .2 c
-3e2 .l2.; "2 22
42
c.2
42
aJ
75 .sY a, Z. B
42 EXPERIMENT ON GROWTH RATE VALIDITY 43
estimate, this estimate will be vitiated, and will be as by tossing a coin, which site shall be occupied by the
incapable of providing a correct statement as to the crossed and which by the self-fertilised plant, we shall
frequency with which our real error will exceed any be assigning by the same act whether this particular
assigned quantity ; and such a statement of frequency ingredient of error shall appear in our average with a
is the sole purpose for which the estimate is of any use. positive or a negative sign. Since each particular
Nevertheless, though its logical necessity is easily error has thus an equal and independent chance of
apprehended, the question of the validity of the estimates being positive or negative, the error of our average
of error used in tests of significance was for long ignored, will necessarily be distributed in a sampling distribution,
and is still often overlooked in practice. One reason centred a t zero, which will be symmetrical in the sense
for this is that standardised methods of statistical analysis that to each possible pjositive error there corresponds
have been taken over ready-made from a mathematical an equal negative error, which, as our procedure guaran-
theory, into which questions of experimental detail do tees, will in fact occur with equal probability.
not explicitly enter. In consequence the assumptions Our estimate of error is easily seen to depend only
which enter implicitly into the bases of the theory have on the same fifteen ingredients, and the arithmetical
not been brought prominently under the notice of processes of summation, subwaction and division may
practical experimenters. A second reason is that it has be designed, and have in fact been designed, so as to
not until recently been recognised that any simple provide the estimate appropriate to the system of
precaution would supply an absolute guarantee of the chances which our method of choosing sites had imposed
validity of the calculations. on the data. This is to say much more than merely
In the experiment under consideration, apart from that the experiment is unbiased, for we might still call
chance differences in the selection of seeds, the sole the experiment unbiased if the whole of the cross-
source of the experimental error in the average of our fertilised plants had been assigned to the west side of
fifteen differences lies in the differences in soil fertility, the pots, and the self-fertilised plants to the east side,
illumination, evaporation, etc., which make the site of by a single toss of the coin. That this would be in-
each crossed plant more or less favourable to growth sufficient to ensure the validity of our estimate may
than the site assigned to the corresponding self-fertilised be easily seen ; for it might well be that some unknown
plant. I t is for this reason that every precaution, such circumstance, such as the incidence of different illumina-
as mixing the soil, equalising the watering and orienting tion a t different times of the day, or the desiccating
the pot so as to give equal illumination, may be expected action of the air-currents prevalent in the greenhouse,
to increase the precision of the experiment. If, now, might systematically favour all the plants on one side
when the fifteen pairs of sites have been chosen, and in over those on the other. The effect of any such pre-
so doing all the differences in environmental circum- vailing cause would then be confounded with the
stances, to which the members of the different pairs advantage, real or apparent, of cross-breeding over
will be exposed during the course of the experiment, inbreeding, and would be eliminated from our estimate
have been predetermined, we then assign a t random, of error, which is based solely on the discrepancies
44 EXPERIMENT ON GROWTH RATE GENERAL TEST 45
between the differences shown by different pairs of plants. distributed population. This is the type of null hypo-
Randomisation properly carried out, in which each thesis which experimenters, rightly in the author's
pair of plants are assigned their positions independently opinion, usually consider it appropriate to test, for
a t random, ensures that the estimates of error will take reasons not only of practical convenience, but because
proper care of all such causes of different growth rates, the unique properties of the normal distribution make
and relieves the experimenter from the anxiety of it alone suitable for general application. There has,
considering and estimating the magnitude of the in- however, in recent years, been a tendency for theoretical
numerable causes by which his data may be disturbed. statisticians, not closely in touch with the requirements
The one flaw in Darwin's procedure was the absence of experimental data, to stress the element of normality,
of randomisation. in the hypothesis tested, as though it were a serious
Had the same measurements been obtained from limitation to the test applied. I t is, indeed, demonstrable
1
pairs of plants properly randomised the experiment that, as a test of this hypothesis, the exactitude of
would, as we have shown, have fallen on the verge of " Student's " t test is absolute. I t may, nevertheless,
significance. Galton was led greatly to overestimate be legitimately asked whether we should obtain a
its conclusiveness through the major error of attempting materially different result were it possible to test the
to estimate the reliability of the comparisons by re- wider hypothesis which merely asserts that the two
arranging the two series in order of magnitude. His series are drawn from the same population, without
discussion shows, in other respects, an over-confidence specifying that this is normally distributed.
in the power of statistical methods to remedy the In these discussions it seems to have escaped recogni-
irregularities of the actual data. In particular, the tion that the physical act of randomisation, which, as
attempt mentioned by Darwin to improve on the simple has been shown, is necessary for the validity of any
averages of the two series " by a more correct method test of significance, affords the means, in respect of any
. . . by including the heights, as estimated in accord- particular body of data, of examining the wider hypo-
ance with statistical rules, of a few plants which died thesis in which no normality of distribution is implied.
before they were measured," seems to go far beyond The arithmetical procedure of such an examination is
the limits of justifiable inference, and is one of many tedious, and we shall only give the results of its appli-
indications that the logic of statistical induction was in cation in order to show the possibility of an independent
its infancy, even a t a time when the technique of accurate check on the more expeditious methods in common use.
experimentation had already been notably advanced. On the hypothesis that the two series of seeds are
random samples from identical populations, and that
21. Test of a Wider Hypothesis their sites have been assigned to members of each pair
I t has been mentioned that " Student's " t test, in independently at random, the 15 differences of Table 3
conformity with the classicaI theory of errors, is appro- would each have occurred with equal frequency with a
priate to the null hypothesis that the two groups of positive or with a negative sign. Their sum, taking
measurements are samples drawn from the same normally account of the two negative signs which have actually
46 EXPERIMENT ON GROWTH RATE GENERAL TEST 47
occurred, is 314, and we may ask how many of the 216 a result very nearly equivalent to that obtained using
numbers, which may be formed by giving each com- the t test with the hypothesis of a normally distributed
ponent alternatively a positive and a negative sign, population. Slight as it is, indeed, the difference
exceed this value. Since ex hy9othesi each of these between the tests of these two hypotheses is partly due
216 combinations will occur by chance with equal to the continuity of the t distribution, which effectively
frequency, a knowledge of how many of them are equal counts only half of the 28 cases which give a total of
to or greater than the value actually observed affords a exactly 314, as being as great as or greater than the
direct arithmetical test of the significance of this value. observed value.
I t is easy to see that if there were no negative signs, Both tests prove that, in about 5 per cent. of trials,
or only one, every possible combination would exceed samples from the same;batch of seed would show differ-
314, while if the negative signs are 7 or mme, every ences just as great, and as regular, as those observed ;
possible combination will fall short of this value. The so that the experimental evidence is scarcely sufficient
distribution of the cases, when there are from 2 to 6 to stand alone. In conjunction with other experiments,
negative values, is shown in the following table :- however, showing a consistent advantage of cross-
fertilised seed, the experiment has considerable weight ;
TABLE 4 since only once in 40 trials would a chance deviation
Number of combinations of dzferences, positive or negative, have been observed both so large, and in the right
which exceed or fall short of the total observed direction.
How entirely appropriate to the present problem
Number of negative
values. >314 = 314 (314 Total. is the use of the distribution of t, based on the theory
,
- of errors, when accurately carried out, may be seen
0 . I ... ... I by inserting an adjustment, which effectively allows for
I 15 ... ... I5 the discontinuity of the measurements. This adjustment
2 . 94 I 10
3 . : '63 3 189
105
455 is not usually of practical importance, with the t test,
4 . 302 II 1,052 1,365 and is only given here to show the close similarity of
5 . 138 I2 2,853 3,003
6 . 22 I 4,982 the results of testing the two hypotheses, in one of
7 or more . ... ... 22,819
S*OO5
22,819 which the errors are distributed according to the normal
law, whereas in the other they may be distributed in
Total . - 835 28 31,905 32,968 any conceivable manner. The adjustment consists "
in calculating the value of t as though the total difference
between the two sets of measurements were less than
In just 863 cases out of 32,768 the total deviation
that actually observed by half a unit of grouping;
will have a positive value as great as or greater than
that observed. In an equal number of cases it will + This adjustment is an extension to the distribution of t of ' ~ a t e s '
have as great a negative value. The two groups together adjustment for continuity, which is of greater importance in the distribution
of xa, for which it was developed.
constitute 5.267 per cent. of the possibilities available,
48 E X P E R I M E N T O N G R O W T H RATE G E N E R A L TEST 49
i.e. as if it were 313 instead of 314, since the possible
knowledge of the material. In inductive logic, however,
values advance by steps of 2. T h e value of t is then an erroneous assumption of ignorance is not innocuous ;
found to be 2.139 instead of 2.148. The following
it often leads to manifest absurdities. Experimenters
table shows the effect of the adjustment on the test of should remember that they and their colleagues usually
significance, and its relation to the test of the more know more about the kind of material they are dealing
general hypothesis. with than do the authors of text-books written without
TABLE 5 such personal experience, and that a more complex,
Probability of a Positive
Difference exceeding that or less intelligible, test is not likely to serve their purpose
1. observed. better, in any sense, than those of proved value in their
unadjusted
Normal hypothesis{adjusted
.. 2.148 2.485 per cent.
own subject. i
2.139 2'529 ,,
General hypothesis . 2.634 ,,
REFERENCES AND OTHER READING
T h e difference between the two hypotheses
.- is thus C. DARWIN(1876). The effects of cross- and self-fertilisation in the
equivalent to little more than a probability of one in a vegetable kingdom. John Murray, London.
thousand. R. A. FISHER(1925). Applications of " Student's " distribution.
Metron, v. 90-104.
21.1. " Non-parametric " Tests R. A. FISHER(1925-1963). Statistical methods for research workers.
In recent years tests using the physical act of Chap. V., §§ 23-24.
R. A. FISHER (1956, 1959). Statistical methods and scientific
randomisation to supply (on the Null Hypothesis) a inference. Oliver and Boyd Ltd., Edinburgh.
frequency distribution, have been largely advocated " STUDENT" (1908). The probable error of a mean. Biometrika,
under the name of " Non-parametric " tests. Some- vi. 1-25.
what extravagant claims have often been made on their
behalf. T h e example of this Chapter, published in
1935, was by many years the first of its class. T h e reader
will realise that it was in no sense put forward to super-
sede the common and expeditious tests based on the
Gaussian theory of errors. T h e utility of such non-
parametric tests consists in their being able to supply
confirmation whenever, rightly or, more often, wrongly,
it is suspected that the simpler tests have been appre-
ciably injured by departures from normality.
They assume less knowledge, or more ignorance, of
the experimental material than do the standald tests,
and this has been an attraction to some mathematicians
who often discuss experimentation without personal
k,~-,
.w
9'2
BIRTHS AND DEATHS IN LONDON,
-m
'
WEEKEMIMC) SATUSDAY, NOYEMBEE
- [No.48.
Nov. 19.
- - --
Xew Bridge.
C h h - TheThamea,atBat&rnea- 7 122147 22 18
Kent TheBavms-ingent
Warnid-
Inndon - Thc'hu&,it-
The river Ime, at Lee
- 18
72
26
134300
277700
484694
30
R4
144
22
SO
33
Bridge.
*(l)'fambethlrna The !lhmea, at Thsmes 1 846563 211 61
(2)ktB.wt
Sortthwk
*(1)&dmrk
- Diilrndat-
The Thamea at Battaws
T h b ! t h m a , r t m
- 8
0
11826i
17806
111
19
94
101
(2) b t * the Ravemibourne in Kent,
ud ditahes aidwalk
*h1~aar(I8luLed#h8n~)tbemredhtriotrye~pp~d
t WbOgo O m p a U .
C48.1 3c
. - a m applisd
chie4ly by the m p m t i t e
'WaterCompurie;~.
Deaths
Ele~tion Di?atbsfrom to 100,000
in feet C h o h IdubiEants*
e:y in 18w&
t
!m.
ze I
ending
Nov. 19.
~ X D O H - - . - - - 39 2362236
-1
698
8
SO
5
* ( l ) 5 p 8 & i d d 8@l@8 at h p t d and 80 166956
(2) New M. EQIvaod, two orterian
Nml&irer - - r a h , and Nem Ever.
At C!hadwell 8 -
Herttbrddh fwm river
in 76 684468 55 9
--
Bkw Bridge.
C h b
Eat
-- !rheTham+bt~~ - 3 122147 22 18
aLondon
TheILaav~loWhU
-
wertlldiddkSE!x- T h c U a t m
The river I&. at Lee
- 18
79
26
134300
977700
434694
: SO
84
144
22
SO
33
-d
(2) gsnt*
I the h w k a r o s i n Kent,
PldltobDldralL.
'~thnrea#r(nurted~muosria)tbewllve~ua~pp~d~~o~
1 , I I
fAQ 1 3c