
BY THE SAME AUTHOR:
Statistical Methods for Research Workers (1925-13th Ed. 1963)
Statistical Methods and Scientific Inference (1956, 1959)
The Theory of Inbreeding (1949, 1965)
WITH F. YATES:
Statistical Tables (1938-4th Ed. 1963)
All published by Oliver and Boyd Ltd., Edinburgh

The Design of Experiments

Sir Ronald A. Fisher, Sc.D., F.R.S.
Honorary Research Fellow, Division of Mathematical Statistics,
C.S.I.R.O., University of Adelaide; Foreign Associate, United States
National Academy of Sciences, and Foreign Honorary Member,
American Academy of Arts and Sciences; Foreign Member of the
Swedish Royal Academy of Sciences, and the Royal Danish Academy
of Sciences and Letters; Member of the Pontifical Academy;
Member of the German Academy of Sciences (Leopoldina) ; formerly
Galton Professor, University of London, and Arthur Balfour Professor
of Genetics, University of Cambridge

Oliver and Boyd
Edinburgh: Tweeddale Court
London: 39a Welbeck Street, W.1

PREFACE TO FIRST EDITION

IN 1925 the author wrote a book (Statistical Methods
for Research Workers) with the object of supplying
practical experimenters and, incidentally, teachers of
mathematical statistics, with a connected account of
the applications in laboratory work of some of the
more recent advances in statistical theory. Some of
the new methods, such as the analysis of variance,
were found to be so intimately related with problems
of experimental design that a considerable part of
the eighth chapter was devoted to the technique of
agricultural experimentation, and these sections have
been progressively enlarged with subsequent editions,
in response to frequent requests for a fuller treatment
of the subject. The design of experiments is, however,
too large a subject, and of too great importance to the
general body of scientific workers, for any incidental
treatment to be adequate. A clear grasp of simple
and standardised statistical procedures will, as the
reader may satisfy himself, go far to elucidate the
principles of experimentation ; but these procedures
are themselves only the means to a more important
end. Their part is to satisfy the requirements of sound
and intelligible experimental design, and to supply the
machinery for unambiguous interpretation. To attain
a clear grasp of these requirements we need to study
designs which have been widely successful in many
fields, and to examine their structure in relation to the
requirements of valid inference.
The examples chosen in this book are aimed at

PRINTED AND PUBLISHED IN GREAT BRITAIN
BY OLIVER AND BOYD LTD., EDINBURGH

THE PRINCIPLES OF EXPERIMENTATION

such an assessment would be inadmissible and irrelevant in judging the state of the scientific evidence; moreover, accurately assessable prior information is ordinarily known to be lacking. Such differences between the logical situations should be borne in mind whenever we see tests of significance spoken of as "Rules of Action". A good deal of confusion has certainly been caused by the attempt to formalise the exposition of tests of significance in a logical framework different from that for which they were in fact first developed.

REFERENCES AND OTHER READING
R. A. FISHER (1925-1963). Statistical methods for research workers. Chap. III., §§ 15-19.
R. A. FISHER (1926). The arrangement of field experiments. Journal of Ministry of Agriculture, xxxiii. 503-513.

A HISTORICAL EXPERIMENT ON GROWTH RATE

13. WE have illustrated a psycho-physical experiment, the result of which depends upon judgments, scored "right" or "wrong," and may be appropriately interpreted by the method of the classical theory of probability. This method rests on the enumeration of the frequencies with which different combinations of right or wrong judgments will occur, on the hypothesis to be tested. We may now illustrate an experiment in which the results are expressed in quantitative measures, and which is appropriately interpreted by means of the theory of errors.
In the introductory remarks to his book on " The
effects of cross and self-fertilisation in the vegetable
kingdom," Charles Darwin gives an account of the
considerations which guided him in the design of his
experiments and in the presentation of his data, which
will serve well to illustrate the principles on which
biological experiments may be made conclusive. The
passage is of especial interest in illustrating the extremely
crude and unsatisfactory statistical methods available
at the time, and the manner in which careful attention
to commonsense considerations led to the adoption of
an experimental design, in itself greatly superior to
these methods of interpretation.
14. Darwin's Discussion of the Data
"I long doubted whether it was worth while to give the measurements of each separate plant, but have decided to do so, in order that it may be seen that the superiority of the crossed plants over the self-fertilised does not commonly depend on the presence of two or three extra fine plants on the one side, or of a few very poor plants on the other side. Although several observers have insisted in general terms on the offspring from intercrossed varieties being superior to either parent-form, no precise measurements have been given; and I have met with no observations on the effects of crossing and self-fertilising the individuals of the same variety. Moreover, experiments of this kind require so much time, mine having been continued during eleven years, that they are not likely soon to be repeated.
"As only a moderate number of crossed and self-fertilised plants were measured, it was of great importance to me to learn how far the averages were trustworthy. I therefore asked Mr Galton, who has had much experience in statistical researches, to examine some of my tables of measurements, seven in number, namely those of Ipomoea, Digitalis, Reseda lutea, Viola, Limnanthes, Petunia, and Zea. I may premise that if we took by chance a dozen or score of men belonging to two nations and measured them, it would I presume be very rash to form any judgment from such small numbers on their average heights. But the case is somewhat different with my crossed and self-fertilised plants, as they were of exactly the same age, were subjected from first to last to the same conditions, and were descended from the same parents. When only from two to six pairs of plants were measured, the results are manifestly of little or no value, except in so far as they confirm and are confirmed by experiments made on a larger scale with other species. I will now give the report on the seven tables of measurements, which Mr Galton has had the great kindness to draw up for me."

15. Galton's Method of Interpretation
"I have examined the measurements of the plants with care, and by many statistical methods, to find out how far the means of the several sets represent constant realities, such as would come out the same so long as the general conditions of growth remained unaltered. The principal methods that were adopted are easily explained by selecting one of the shorter series of plants, say of Zea mays, for an example.
"The observations as I received them are shown in columns II. and III., where they certainly have no prima facie appearance of regularity. But as soon as we arrange them in the order of their magnitudes, as in columns IV. and V., the case is materially altered. We now see, with few exceptions, that the largest plant on the crossed side in each pot exceeds the largest plant on the self-fertilised side, that the second exceeds the second, the third the third, and so on. Out of the fifteen cases in the table, there are only two exceptions to this rule.* We may therefore confidently affirm that a crossed series will always be found to exceed a self-fertilised series, within the range of the conditions under which the present experiment has been made.
* Galton evidently did not notice that this is true also before rearrangement.
"Next as regards the numerical estimate of this excess. The mean values of the several groups are so discordant, as is shown in the table just given, that a fairly precise numerical estimate seems impossible. But the consideration arises, whether the difference between pot and pot may not be of much the same order of importance as that of the other conditions upon which the growth of the plants has been modified. If so, and only on that condition, it would follow that when all the measurements, either of the crossed or the self-fertilised plants, were combined into a single series, that series would be statistically regular. The experiment is tried in columns VII. and VIII., where the regularity is abundantly clear, and justifies us in considering its mean as perfectly reliable.
I have protracted these measurements, and revised them in the usual way, by drawing a curve through them with a free hand, but the revision barely modifies the means derived from the original observations. In the present, and in nearly all the other cases, the difference between the original and revised means is under 2 per cent. of their value. It is a very remarkable coincidence that in the seven kinds of plants, whose measurements I have examined, the ratio between the heights of the crossed and of the self-fertilised ranges in five cases within very narrow limits. In Zea mays it is as 100 to 84, and in the others it ranges between 100 to 76 and 100 to 86.

TABLE 2
Pot. Crossed. Self-fert. Difference.
(Rows for Pots I. to IV.; the numerical entries are illegible in this copy.)

"The determination of the variability (measured by what is technically called the 'probable error') is a problem of more delicacy than that of determining the means, and I doubt, after making many trials, whether it is possible to derive useful conclusions from these few observations. We ought to have measurements of at least fifty plants in each case, in order to be in a position to deduce fair results. . . ."

"Mr Galton sent me at the same time graphical representations which he had made of the measurements, and they evidently form fairly regular curves. He appends the words 'very good' to those of Zea and Limnanthes. He also calculated the average height of the crossed and self-fertilised plants in the seven tables by a more correct method than that followed by me, namely by including the heights, as estimated in accordance with statistical rules, of a few plants which
died before they were measured; whereas I merely added up the heights of the survivors, and divided the sum by their number. The difference in our results is in one way highly satisfactory, for the average heights of the self-fertilised plants, as deduced by Mr Galton, is less than mine in all the cases excepting one, in which our averages are the same; and this shows that I have by no means exaggerated the superiority of the crossed over the self-fertilised plants."

16. Pairing and Grouping
It is seen that the method of comparison adopted by Darwin is that of pitting each self-fertilised plant against a cross-fertilised one, in conditions made as equal as possible. The pairs so chosen for comparison had germinated at the same time, and the soil conditions in which they grew were largely equalised by planting in the same pot. Necessarily they were not of the same parentage, as it would be difficult in maize to self-fertilise two plants at the same time as raising a cross-fertilised progeny from the pair. However, the parents were presumably grown from the same batch of seed. The evident object of these precautions is to increase the sensitiveness of the experiment, by making such differences in growth rate as were to be observed as little as possible dependent from environmental circumstances, and as much as possible, therefore, from intrinsic differences due to their mode of origin.
The method of pairing, which is much used in modern biological work, illustrates well the way in which an appropriate experimental design is able to reconcile two desiderata, which sometimes appear to be in conflict. On the one hand we require the utmost uniformity in the biological material, which is the subject of experiment, in order to increase the sensitiveness of each individual observation; and, on the other, we require to multiply the observations so as to demonstrate so far as possible the reliability and consistency of the results. Thus an experimenter with field crops may desire to replicate his experiments upon a large number of plots, but be deterred by the consideration that his facilities allow him to sow only a limited area on the same day. An experimenter with small mammals may have only a limited supply of an inbred and highly uniform stock, which he believes to be particularly desirable for experimental purposes. Or, he may desire to carry out his experiments on members of the same litter, and feel that his experiment is limited by the size of the largest litter he can obtain. It has indeed frequently been argued that, beyond a certain moderate degree, further replication can give no further increase in precision, owing to the increasing heterogeneity with which, it is thought, it must be accompanied. In all these cases, however, and in the many analogous cases which constantly arise, there is no real dilemma. Uniformity is only requisite between the objects whose response is to be contrasted (that is, objects treated differently). It is not requisite that all the parallel plots under the same treatment shall be sown on the same day, but only that each such plot shall be sown so far as possible simultaneously with the differently treated plot or plots with which it is to be compared. If, therefore, only two kinds of treatments are under examination, pairs of plots may be chosen, one plot for each treatment; and the precision of the experiment will be given its highest value if the members of each pair are treated closely alike, but will gain nothing from similarity of treatment applied to different pairs, nor lose anything if the conditions in these are somewhat varied. In the same way, if the numbers of animals
available from any inbred line are too few for adequate replication, the experimental contrasts in treatments may be applied to pairs of animals from different inbred lines, so long as each pair belongs to the same line. In these two cases it is evident that the principle of combining similarity between controls to be compared, with diversity between parallels, may be extended to cases where three or more treatments are under investigation. The requirement that animals to be contrasted must come from the same litter limits, not the amount of replication, but the number of different treatments that can be so tested. Thus we might test three, but not so easily four or five treatments, if it were necessary that each set of animals must be of the same sex and litter. Paucity of homogeneous material limits the number of different treatments in an experiment, not the number of replications. It may cramp the scope and comprehensiveness of an experimental enquiry, but sets no limit to its possible precision.

17. "Student's" t Test *
Owing to the historical accident that the theory of errors, by which quantitative data are to be interpreted, was developed without reference to experimental methods, the vital principle has often been overlooked that the actual and physical conduct of an experiment must govern the statistical procedure of its interpretation. In using the theory of errors we rely for our conclusion upon one or more estimates of error, derived from the data, and appropriate to the one or more sets of comparisons which we wish to make. Whether these estimates are valid, for the purpose for which we intend them, depends on what has been actually done. It is possible, and indeed it is all too frequent, for an experiment to be so conducted that no valid estimate of error is available. In such a case the experiment cannot be said, strictly, to be capable of proving anything. Perhaps it should not, in this case, be called an experiment at all, but be added merely to the body of experience on which, for lack of anything better, we may have to base our opinions. All that we need to emphasise immediately is that, if an experiment does allow us to calculate a valid estimate of error, its structure must completely determine the statistical procedure by which this estimate is to be calculated. If this were not so, no interpretation of the data could ever be unambiguous; for we could never be sure that some other equally valid method of interpretation would not lead to a different result.
* A full account of this test in more varied applications, and the tables for its use, will be found in Statistical Methods for Research Workers. Its originator, who published anonymously under the pseudonym "Student," possesses the remarkable distinction that, without being a professed mathematician, but a research chemist, he made early in life this revolutionary refinement of the classical theory of errors.
The object of the experiment is to determine whether the difference in origin between inbred and cross-bred plants influences their growth rate, as measured by height at a given date; in other words, if the numbers of the two sorts of plants were to be increased indefinitely, our object is to determine whether the average heights, to which these two aggregates of plants will tend, are equal or unequal. The most general statement of our null hypothesis is, therefore, that the limits to which these two averages tend are equal. The theory of errors enables us to test a somewhat more limited hypothesis, which, by wide experience, has been found to be appropriate to the metrical characters of experimental material in biology. The disturbing causes which introduce discrepancies in the means of measurements of similar material are found to produce quantitative effects which conform satisfactorily to a theoretical distribution known as the normal law of frequency of error. It is this circumstance that makes it appropriate to choose, as the null hypothesis to be tested, one for which an exact statistical criterion is available, namely that the two groups of measurements are samples drawn from the same normal population. On the basis of this hypothesis we may proceed to compare the average difference in height, between the cross-fertilised and the self-fertilised plants, with such differences as might be expected between these averages, in view of the observed discrepancies between the heights of plants of like origin.
We must now see how the adoption of the method of pairing determines the details of the arithmetical procedure, so as to lead to an unequivocal interpretation. The pairing procedure, as indeed was its purpose, has equalised any differences in soil conditions, illumination, air-currents, etc., in which the several pairs of individuals may differ. Such differences having been eliminated from the experimental comparisons, and contributing nothing to the real errors of our experiment, must, for this reason, be eliminated likewise from our estimate of error, upon which we are to judge what differences between the means are compatible with the null hypothesis, and what differences are so great as to be incompatible with it. We are therefore not concerned with the differences in height among plants of like origin, but only with differences in height between members of the same pair, and with the discrepancies among these differences observed in different pairs. Our first step, therefore, will be to subtract from the height of each cross-fertilised plant the height of the self-fertilised plant belonging to the same pair. The differences are shown below in eighths of an inch. With respect to these differences our null hypothesis asserts that they are normally distributed about a mean value at zero, and we have to test whether our 15 observed differences are compatible with the supposition that they are a sample from such a population.

TABLE 3
Differences in eighths of an inch between cross- and self-fertilised plants of the same pair

The calculations needed to make a rigorous test of the null hypothesis stated above involve no more than the sum, and the sum of the squares, of these numbers. The sum is 314, and, since there are 15 pairs, the mean difference is 20 14/15 in favour of the cross-fertilised plants. The sum of the squares is 26,518, and from this is deducted the product of the total and the mean, or 6573, leaving 19,945 for the sum of squares of deviations from the mean, representing discrepancies among the differences observed in the 15 pairs. The algebraic fact here used is that

$$S(x-\bar{x})^2 = S(x^2) - \bar{x}\,S(x)$$

where $S$ stands for summation over the sample, and $\bar{x}$ for the mean value of the observed differences, $x$. We may make from this measure of the discrepancies an estimate of a quantity known as the variance of an individual difference, by dividing by 14, one less than the number of pairs observed. Equally, and what is more immediately required, we may make an estimate of the variance of the mean of 15 such pairs, by dividing again by 15, a process which yields 94.976 as the estimate.
The square root of the variance is known as the standard error, and it is by the ratio which our observed mean difference bears to its standard error that we shall judge of its significance. Dividing our difference, 20.933, by its standard error 9.746, we find this ratio (which is usually denoted by t) to be 2.148.
The object of these calculations has been to obtain from the data a quantity measuring the average difference in height between the cross-fertilised and the self-fertilised plants, in terms of the observed discrepancies among these differences; and which, moreover, shall be distributed in a known manner when the null hypothesis is true. The mathematical distribution for our present problem was discovered by "Student" in 1908, and depends only upon the number of independent comparisons (or the number of degrees of freedom) available for calculating the estimate of error. With 15 observed differences we have among them 14 independent discrepancies, and our degrees of freedom are 14. The available tables of the distribution of t show that for 14 degrees of freedom the value 2.145 is exceeded by chance, either in the positive or negative direction, in exactly 5 per cent. of random trials. The observed value of t, 2.148, thus just exceeds the 5 per cent. point, and the experimental result may be judged significant, though barely so.

18. Fallacious Use of Statistics
We may now see that Darwin's judgment was perfectly sound, in judging that it was of importance to learn how far the averages were trustworthy, and that this could be done by a statistical examination of the tables of measurements of individual plants, though not of their averages. The example chosen, in fact, falls just on the border-line between those results which can suffice by themselves to establish the point at issue, and those which are of little value except in so far as they confirm or are confirmed by other experiments of a like nature. In particular, it is to be noted that Darwin recognised that the reliability of the result must be judged by the consistency of the superiority of the crossed plants over the self-fertilised, and not only on the difference of the averages, which might depend, as he says, on the presence of two or three extra-fine plants on the one side, or of a few very poor plants on the other side; and that therefore the presentation of the experimental evidence depended essentially on giving the measurements of each independent plant, and could not be assessed from the mere averages.
It may be noted also that Galton's scepticism of the value of the probable error, deduced from only 15 pairs of observations, though, as it turned out, somewhat excessive, was undoubtedly right in principle. The standard error (of which the probable error is only a conventional fraction) can only be estimated with considerable uncertainty from so small a sample, and, prior to "Student's" solution of the problem, it was by no means clear to what extent this uncertainty would invalidate the test of significance. From "Student's" work it is now known that the cause for anxiety was not so great as it might have seemed. Had the standard error been known with certainty, or derived from an effectively infinite number of observations, the 5 per cent. value of t would have been 1.960. When our estimate is based upon only 15 differences, the 5 per cent. value, as we have seen, is 2.145, or less than 10 per cent. greater. Even using the inexact theory available at the time, a calculation of the probable error would have provided a valuable guide to the interpretation of the results.
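
As a rough check on the figures quoted in Sections 17 and 18, the ratio t and the comparison of the two 5 per cent. points can be recomputed from the numbers given in the text. This sketch takes the sum 314, the variance 94.976 and the tabulated values 2.145 and 1.960 directly from the passage. It does not compute the t distribution itself, and the variable names are illustrative.

```python
import math

# Quantities quoted in the text.
mean_diff = 314 / 15      # mean difference, eighths of an inch (20.933)
var_mean = 94.976         # estimated variance of the mean of 15 differences

std_error = math.sqrt(var_mean)   # standard error of the mean difference
t = mean_diff / std_error         # the ratio usually denoted by t

# 5 per cent. points as quoted in the text (taken from tables, not computed here).
T_5PCT_14_DF = 2.145      # 14 degrees of freedom
T_5PCT_INFINITE = 1.960   # standard error known with certainty

significant = abs(t) > T_5PCT_14_DF
# How much larger the 14-d.f. point is than the infinite-sample point.
relative_increase = T_5PCT_14_DF / T_5PCT_INFINITE - 1

print("standard error:", round(std_error, 3))
print("t:", round(t, 3))
print("exceeds the 5 per cent. point:", significant)
print("relative increase of the 5 per cent. point:", round(relative_increase, 4))
```

The last quantity makes concrete the remark that the 5 per cent. value for 14 degrees of freedom is less than 10 per cent. greater than 1.960.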
estimate, this estimate will be vitiated, and will be incapable of providing a correct statement as to the frequency with which our real error will exceed any assigned quantity; and such a statement of frequency is the sole purpose for which the estimate is of any use. Nevertheless, though its logical necessity is easily apprehended, the question of the validity of the estimates of error used in tests of significance was for long ignored, and is still often overlooked in practice. One reason for this is that standardised methods of statistical analysis have been taken over ready-made from a mathematical theory, into which questions of experimental detail do not explicitly enter. In consequence the assumptions which enter implicitly into the bases of the theory have not been brought prominently under the notice of practical experimenters. A second reason is that it has not until recently been recognised that any simple precaution would supply an absolute guarantee of the validity of the calculations.
In the experiment under consideration, apart from chance differences in the selection of seeds, the sole source of the experimental error in the average of our fifteen differences lies in the differences in soil fertility, illumination, evaporation, etc., which make the site of each crossed plant more or less favourable to growth than the site assigned to the corresponding self-fertilised plant. It is for this reason that every precaution, such as mixing the soil, equalising the watering and orienting the pot so as to give equal illumination, may be expected to increase the precision of the experiment. If, now, when the fifteen pairs of sites have been chosen, and in so doing all the differences in environmental circumstances, to which the members of the different pairs will be exposed during the course of the experiment, have been predetermined, we then assign at random, as by tossing a coin, which site shall be occupied by the crossed and which by the self-fertilised plant, we shall be assigning by the same act whether this particular ingredient of error shall appear in our average with a positive or a negative sign. Since each particular error has thus an equal and independent chance of being positive or negative, the error of our average will necessarily be distributed in a sampling distribution, centred at zero, which will be symmetrical in the sense that to each possible positive error there corresponds an equal negative error, which, as our procedure guarantees, will in fact occur with equal probability.
Our estimate of error is easily seen to depend only on the same fifteen ingredients, and the arithmetical processes of summation, subtraction and division may be designed, and have in fact been designed, so as to provide the estimate appropriate to the system of chances which our method of choosing sites had imposed on the data. This is to say much more than merely that the experiment is unbiased, for we might still call the experiment unbiased if the whole of the cross-fertilised plants had been assigned to the west side of the pots, and the self-fertilised plants to the east side, by a single toss of the coin. That this would be insufficient to ensure the validity of our estimate may be easily seen; for it might well be that some unknown circumstance, such as the incidence of different illumination at different times of the day, or the desiccating action of the air-currents prevalent in the greenhouse, might systematically favour all the plants on one side over those on the other. The effect of any such prevailing cause would then be confounded with the advantage, real or apparent, of cross-breeding over inbreeding, and would be eliminated from our estimate of error, which is based solely on the discrepancies
between the differences shown by different pairs of plants. Randomisation properly carried out, in which each pair of plants are assigned their positions independently at random, ensures that the estimates of error will take proper care of all such causes of different growth rates, and relieves the experimenter from the anxiety of considering and estimating the magnitude of the innumerable causes by which his data may be disturbed. The one flaw in Darwin's procedure was the absence of randomisation.
Had the same measurements been obtained from pairs of plants properly randomised the experiment would, as we have shown, have fallen on the verge of significance. Galton was led greatly to overestimate its conclusiveness through the major error of attempting to estimate the reliability of the comparisons by rearranging the two series in order of magnitude. His discussion shows, in other respects, an over-confidence in the power of statistical methods to remedy the irregularities of the actual data. In particular, the attempt mentioned by Darwin to improve on the simple averages of the two series "by a more correct method . . . by including the heights, as estimated in accordance with statistical rules, of a few plants which died before they were measured," seems to go far beyond the limits of justifiable inference, and is one of many indications that the logic of statistical induction was in its infancy, even at a time when the technique of accurate experimentation had already been notably advanced.

21. Test of a Wider Hypothesis
It has been mentioned that "Student's" t test, in conformity with the classical theory of errors, is appropriate to the null hypothesis that the two groups of measurements are samples drawn from the same normally distributed population. This is the type of null hypothesis which experimenters, rightly in the author's opinion, usually consider it appropriate to test, for reasons not only of practical convenience, but because the unique properties of the normal distribution make it alone suitable for general application. There has, however, in recent years, been a tendency for theoretical statisticians, not closely in touch with the requirements of experimental data, to stress the element of normality, in the hypothesis tested, as though it were a serious limitation to the test applied. It is, indeed, demonstrable that, as a test of this hypothesis, the exactitude of "Student's" t test is absolute. It may, nevertheless, be legitimately asked whether we should obtain a materially different result were it possible to test the wider hypothesis which merely asserts that the two series are drawn from the same population, without specifying that this is normally distributed.
In these discussions it seems to have escaped recognition that the physical act of randomisation, which, as has been shown, is necessary for the validity of any test of significance, affords the means, in respect of any particular body of data, of examining the wider hypothesis in which no normality of distribution is implied. The arithmetical procedure of such an examination is tedious, and we shall only give the results of its application in order to show the possibility of an independent check on the more expeditious methods in common use.
On the hypothesis that the two series of seeds are random samples from identical populations, and that their sites have been assigned to members of each pair independently at random, the 15 differences of Table 3 would each have occurred with equal frequency with a positive or with a negative sign. Their sum, taking account of the two negative signs which have actually
occurred, is 314, and we may ask how many of the 2^15 numbers, which
may be formed by giving each component alternatively a positive and
a negative sign, exceed this value. Since ex hypothesi each of these
2^15 combinations will occur by chance with equal frequency, a
knowledge of how many of them are equal to or greater than the value
actually observed affords a direct arithmetical test of the
significance of this value.

It is easy to see that if there were no negative signs, or only one,
every possible combination would exceed 314, while if the negative
signs are 7 or more, every possible combination will fall short of
this value. The distribution of the cases, when there are from 2 to
6 negative values, is shown in the following table:

TABLE 4
Number of combinations of differences, positive or negative,
which exceed or fall short of the total observed

Number of negative
values.                >314    =314      <314     Total.
0                         1     ...       ...          1
1                        15     ...       ...         15
2                        94       1        10        105
3                       263       3       189        455
4                       302      11     1,052      1,365
5                       138      12     2,853      3,003
6                        22       1     4,982      5,005
7 or more               ...     ...    22,819     22,819

Total                   835      28    31,905     32,768

In just 863 cases out of 32,768 the total deviation will have a
positive value as great as or greater than that observed. In an
equal number of cases it will have as great a negative value. The
two groups together constitute 5.267 per cent. of the possibilities
available, a result very nearly equivalent to that obtained using
the t test with the hypothesis of a normally distributed population.
Slight as it is, indeed, the difference between the tests of these
two hypotheses is partly due to the continuity of the t
distribution, which effectively counts only half of the 28 cases
which give a total of exactly 314, as being as great as or greater
than the observed value.

Both tests prove that, in about 5 per cent. of trials, samples from
the same batch of seed would show differences just as great, and as
regular, as those observed; so that the experimental evidence is
scarcely sufficient to stand alone. In conjunction with other
experiments, however, showing a consistent advantage of
cross-fertilised seed, the experiment has considerable weight; since
only once in 40 trials would a chance deviation have been observed
both so large, and in the right direction.

How entirely appropriate to the present problem is the use of the
distribution of t, based on the theory of errors, when accurately
carried out, may be seen by inserting an adjustment, which
effectively allows for the discontinuity of the measurements. This
adjustment is not usually of practical importance, with the t test,
and is only given here to show the close similarity of the results
of testing the two hypotheses, in one of which the errors are
distributed according to the normal law, whereas in the other they
may be distributed in any conceivable manner. The adjustment
consists* in calculating the value of t as though the total
difference between the two sets of measurements were less than that
actually observed by half a unit of grouping; i.e. as if it were 313
instead of 314, since the possible values advance by steps of 2. The
value of t is then found to be 2.139 instead of 2.148. The following
table shows the effect of the adjustment on the test of
significance, and its relation to the test of the more general
hypothesis.

* This adjustment is an extension to the distribution of t of Yates'
adjustment for continuity, which is of greater importance in the
distribution of χ², for which it was developed.

TABLE 5
                                          Probability of a Positive
                                          Difference exceeding that
                                   t.     observed.
Normal hypothesis    unadjusted   2.148   2.485 per cent.
                     adjusted     2.139   2.529 per cent.
General hypothesis                 ...    2.634 per cent.

The difference between the two hypotheses is thus equivalent to
little more than a probability of one in a thousand.

21.1. "Non-parametric" Tests

In recent years tests using the physical act of randomisation to
supply (on the Null Hypothesis) a frequency distribution, have been
largely advocated under the name of "Non-parametric" tests. Somewhat
extravagant claims have often been made on their behalf. The example
of this Chapter, published in 1935, was by many years the first of
its class. The reader will realise that it was in no sense put
forward to supersede the common and expeditious tests based on the
Gaussian theory of errors. The utility of such non-parametric tests
consists in their being able to supply confirmation whenever,
rightly or, more often, wrongly, it is suspected that the simpler
tests have been appreciably injured by departures from normality.

They assume less knowledge, or more ignorance, of the experimental
material than do the standard tests, and this has been an attraction
to some mathematicians who often discuss experimentation without
personal knowledge of the material. In inductive logic, however, an
erroneous assumption of ignorance is not innocuous; it often leads
to manifest absurdities. Experimenters should remember that they and
their colleagues usually know more about the kind of material they
are dealing with than do the authors of text-books written without
such personal experience, and that a more complex, or less
intelligible, test is not likely to serve their purpose better, in
any sense, than those of proved value in their own subject.

REFERENCES AND OTHER READING

C. DARWIN (1876). The effects of cross- and self-fertilisation in
    the vegetable kingdom. John Murray, London.
R. A. FISHER (1925). Applications of "Student's" distribution.
    Metron, v. 90-104.
R. A. FISHER (1925-1963). Statistical methods for research workers.
    Chap. V., §§ 23-24.
R. A. FISHER (1956, 1959). Statistical methods and scientific
    inference. Oliver and Boyd Ltd., Edinburgh.
"STUDENT" (1908). The probable error of a mean. Biometrika,
    vi. 1-25.
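
The arithmetic of Section 21, described there as tedious, is light work for a machine, and a short sketch may help the reader check Tables 4 and 5. It is an illustration only, not the author's own procedure. The 15 differences (in eighths of an inch) are an assumption taken from Table 3, which this section cites but does not reproduce; the function names are hypothetical, and the tail area of t is obtained by a simple Simpson's-rule integration rather than from printed tables.

```python
from itertools import product
from math import gamma, pi, sqrt

# Darwin's 15 paired differences (crossed minus self-fertilised), in eighths
# of an inch. ASSUMPTION: values as in Table 3, not reproduced in this section.
DIFFS = [49, -67, 8, 16, 6, 23, 28, 41, 14, 29, 56, 24, 75, 60, -48]


def randomisation_counts(diffs):
    """Count sign assignments whose total is >, = or < the observed total."""
    observed = sum(diffs)
    sizes = [abs(d) for d in diffs]
    greater = equal = less = 0
    # Every one of the 2^n sign patterns is equally likely on the null hypothesis.
    for signs in product((1, -1), repeat=len(sizes)):
        total = sum(s * x for s, x in zip(signs, sizes))
        if total > observed:
            greater += 1
        elif total == observed:
            equal += 1
        else:
            less += 1
    return greater, equal, less


def t_from_total(diffs, total):
    """Student's t computed as though the differences summed to `total`."""
    n = len(diffs)
    sum_sq = sum(d * d for d in diffs)
    variance = (sum_sq - total * total / n) / (n - 1)
    return (total / n) / sqrt(variance / n)


def t_upper_tail(t, df, steps=2000):
    """One-sided tail P(T > t), by Simpson's rule on the t density from 0 to t."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    area = pdf(0) + pdf(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 - area * h / 3


greater, equal, less = randomisation_counts(DIFFS)
p_general = (greater + equal) / 2 ** len(DIFFS)
t_unadjusted = t_from_total(DIFFS, sum(DIFFS))
# Possible totals advance by steps of 2, so half a unit of grouping is 1.
t_adjusted = t_from_total(DIFFS, sum(DIFFS) - 1)

print("counts (>, =, <):", greater, equal, less)
print(f"general hypothesis: {100 * p_general:.3f} per cent")
for label, t in (("unadjusted", t_unadjusted), ("adjusted", t_adjusted)):
    tail = t_upper_tail(t, len(DIFFS) - 1)
    print(f"normal hypothesis, {label}: t = {t:.3f}, {100 * tail:.3f} per cent")
```

Run as written, the three counts should correspond to the bottom row of Table 4, and the printed percentages to the last column of Table 5, up to rounding in the final figure.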

Figure 10.2. Weekly Return of Births and Deaths (26 November 1853).