ABSTRACT
Wendt, I. and Carl, C., 1991. The statistical distribution of the mean squared weighted deviation. Chem. Geol. (Isot. Geosci. Sect.), 86: 275-285.
The probability distribution of the mean squared weighted deviation (MSWD) is derived and its dependence on the degrees of freedom $f$ is shown. The expectation (or mean) value of the MSWD is 1 and is not a function of $f$. However, the $\pm 1\sigma$ range about the expectation value of the MSWD decreases with increasing $f$. The standard deviation of the MSWD is $\sigma = (2/f)^{1/2}$. If MSWD $> 1 + 2(2/f)^{1/2}$, there is only a < 5% probability that the data define an isochron. The MSWD may be used as a criterion for accepting or rejecting the assumption of an isochron only if the analytical errors $\sigma_x$ and $\sigma_y$ are well known.
1. Introduction

When calculating regression lines (isochrons, discordia, etc.) or regression planes (or discordia planes) for empirical data with standard deviations $\sigma_i$, the weighted least mean squares method is used to obtain the best-fit line (or plane). Parameters $a$ and $b$ in the equation for a straight line $y = ax + b$ are calculated by this method. The standard deviations ($\sigma_a$ and $\sigma_b$) are calculated for these parameters from the individual errors ($\sigma_{x_i}$ and $\sigma_{y_i}$) (York, 1969; Wendt, 1986). For $n$ points, the calculated linear regression has $(n - 2)$ degrees of freedom. The procedure is similar for a discordia plane $y = ax + b + cz$, which has $(n - 3)$ degrees of freedom.

A mean squared weighted deviation (MSWD) or a mean value for chi [where $\bar{\chi} = (\mathrm{MSWD})^{1/2}$] is calculated as follows:

$\mathrm{MSWD} = f^{-1} \sum (\Delta y_i^2 / \sigma_i^2)$    (1)

where $f$ = degree of freedom and

$\Delta y_i = Y_i - aX_i - b$    (2)

is the deviation of the $i$th point from the best-fit line in the y-direction and

$\sigma_i^2 = \sigma_{y_i}^2 + a^2 \sigma_{x_i}^2$    (3)

the square of the error in $\Delta y_i$ when coordinates $X_i$ and $Y_i$ are not correlated. If $X_i$ and $Y_i$ are correlated, e.g. as in a Wetherill discordia, then eq. 3 must be expanded by a term that takes this correlation into consideration (York, 1969; Ludwig, 1980). For three dimensions (Wendt, 1984) or for non-linear functions, eqs. 2 and 3 also must be expanded accordingly.

The MSWD is commonly used as a statistical test of the validity of a regression line. As shown later it should average about 1 when the observed deviations from the regression line or plane are within analytical error and there is no additional scatter (geological error) due to inhomogeneous samples, no common initial $^{87}$Sr/$^{86}$Sr ratio, Rb or Sr gain or loss between
TABLE I

Fig. 1. MSWD frequency distributions for $f$ (= degree of freedom) = 2, 3, 4, 7, 10, 15 and 20.

$Z_f^0 = (f/2)^{f/2} [\Gamma(f/2)]^{-1}$    (9)

where the gamma function is:

$\Gamma(t) = \int_0^\infty u^{t-1} e^{-u}\,du$

$P_f(x) = \int_0^x Z_f(x)\,dx = 1 - e^{-fx/2} \sum_{v=0}^{f/2-1} \frac{(f/2)^v}{v!} x^v$    (11a)

and for odd values of $f$:

$P_f(x) = 2\,\mathrm{erf}[(fx)^{1/2}] - (2f/\pi)^{1/2} e^{-fx/2} \sum_{v=1}^{(f-1)/2} \frac{f^{v-1}\, 2^v\, v!}{(2v)!} x^{v-1/2}$    (11b)
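The frequency distribution and its cumulative form can be evaluated numerically. The following is a minimal Python sketch, assuming the density has the form $Z_f(x) = Z_f^0 x^{f/2-1} e^{-fx/2}$ (consistent with eq. 9 and with the $Z_{10}(x)$ quoted in Section 8). The function names `mswd_pdf` and `mswd_cdf` are illustrative and do not come from the paper. For odd $f$ the sketch uses Python's standard `math.erf` with argument $(fx/2)^{1/2}$; treating this as equivalent to the $2\,\mathrm{erf}[(fx)^{1/2}]$ term of eq. 11b (read as a Gaussian integral) is an assumption.

```python
import math


def mswd_pdf(x, f):
    """Assumed density Z_f(x) = Z_f0 * x**(f/2 - 1) * exp(-f*x/2), for x > 0."""
    if x <= 0.0:
        return 0.0  # the sketch only evaluates the density for x > 0
    # log of the normalisation Z_f0 = (f/2)**(f/2) / Gamma(f/2), eq. 9
    log_norm = (f / 2.0) * math.log(f / 2.0) - math.lgamma(f / 2.0)
    return math.exp(log_norm + (f / 2.0 - 1.0) * math.log(x) - f * x / 2.0)


def mswd_cdf(x, f):
    """Cumulative distribution P_f(x) for integer f (even: eq. 11a)."""
    if x <= 0.0:
        return 0.0
    c = f * x  # corresponding chi-square value
    if f % 2 == 0:
        # eq. 11a: finite sum over v = 0 .. f/2 - 1
        total = sum((c / 2.0) ** v / math.factorial(v) for v in range(f // 2))
        return 1.0 - math.exp(-c / 2.0) * total
    # odd f: standard-erf form, sum over k = 1 .. (f - 1)/2
    total = 0.0
    double_factorial = 1.0  # (2k - 1)!!
    for k in range(1, (f - 1) // 2 + 1):
        double_factorial *= 2 * k - 1
        total += c ** (k - 0.5) / double_factorial
    return (math.erf(math.sqrt(c / 2.0))
            - math.sqrt(2.0 / math.pi) * math.exp(-c / 2.0) * total)


if __name__ == "__main__":
    f = 10
    limit = 1.0 + 2.0 * math.sqrt(2.0 / f)
    print(f"P_f(1 + 2*sqrt(2/f)) for f = {f}: {mswd_cdf(limit, f):.3f}")
```

For $f = 10$ the value of `mswd_cdf` at $1 + 2(2/f)^{1/2}$ comes out at about 0.96, in line with the < 5% criterion stated in the abstract.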
[Figure: cumulative frequency (%, probability scale) versus x = MSWD, for x from 0 to 3.0.]
as can be shown by partial integration when $Z_f(x)$ as given by eq. 7 is used.

6. The variance of the MSWD

$\int_0^\infty x^2 Z_f(x)\,dx \Big/ \int_0^\infty Z_f(x)\,dx$

or, combined with eq. 14:

$\int_0^\infty x^{f/2+1} e^{-fx/2}\,dx = (1 + 2/f) \int_0^\infty x^{f/2-1} e^{-fx/2}\,dx$    (15)
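The factor $(1 + 2/f)$ in eq. 15 is the mean of $x^2$; together with a mean of 1 it gives the variance $2/f$ quoted in the abstract. The short Python sketch below checks this numerically. It assumes the density $Z_f(x) \propto x^{f/2-1} e^{-fx/2}$ and uses the standard gamma-integral result for its moments, which is not quoted from the paper; the function names are illustrative.

```python
import math


def mswd_raw_moment(k, f):
    """k-th raw moment of x for a density proportional to x**(f/2-1) * exp(-f*x/2).

    Uses the gamma-integral identity
    E[x**k] = Gamma(f/2 + k) / (Gamma(f/2) * (f/2)**k).
    """
    half_f = f / 2.0
    return math.exp(math.lgamma(half_f + k) - math.lgamma(half_f)
                    - k * math.log(half_f))


def mswd_mean_and_sigma(f):
    """Return (mean, standard deviation) of the MSWD for f degrees of freedom."""
    mean = mswd_raw_moment(1, f)         # expected to equal 1
    mean_square = mswd_raw_moment(2, f)  # expected to equal 1 + 2/f (eq. 15)
    return mean, math.sqrt(mean_square - mean * mean)


if __name__ == "__main__":
    for f in (2, 4, 10, 20):
        mean, sigma = mswd_mean_and_sigma(f)
        print(f, round(mean, 6), round(sigma, 6), round(math.sqrt(2.0 / f), 6))
```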
Fig. 3. Maximum of MSWD frequency distribution, $\pm 1\sigma$, $+2\sigma$ and 1%, 5%, 10%, 50%, 90%, 95% and 99% range of MSWD as a function of $f$.

(after division by $f$). Also shown in Fig. 3 is the value of $x$ for the maximum ($x_{max}$) of the distribution function $Z_f(x)$, as well as the values for $\bar{x} + 1\sigma$, $\bar{x} - 1\sigma$ and $\bar{x} + 2\sigma$. Due to the nonsymmetry of the distribution function:

$x_{max} < x_{50} < \bar{x} = 1$

Both approach 1 asymptotically with increasing $f$ (whereby the MSWD distribution approaches a normal distribution). It can also be seen in Figs. 2 and 3 that for $f > 3$, $x = 1 + (2/f)^{1/2}$ corresponds nearly to the normal distribution for a confidence interval of ~84%; similarly, $x = 1 + 2(2/f)^{1/2}$ for a confidence interval of ~96% and $x = 1 - (2/f)^{1/2}$ for ~16% [due to the boundary condition at zero, this is not the case for $x = 1 - 2(2/f)^{1/2}$].

The very simple relationship of eq. 17 thus permits a decision about the acceptance or rejection of a hypothesis, e.g. in the case of an isochron, an MSWD-value of $> [1 + 2(2/f)^{1/2}]$, where $f = n - 2$ (where $n$ = the number of points), indicates that with > 95% probability the isochron is an "errorchron".

In the reverse case, it may be concluded that a repeatedly obtained MSWD-value of $< [1 - (2/f)^{1/2}]$ is a certain indication that the analytical errors assumed for the measured values were too large. It should be stressed here that all of the decision criteria derived from MSWD-values have a real basis only when the analytical error used for their calculation corresponds to the actual error. This assumes a statistically proper error analysis. This can be done, for example, by statistical analysis of a large number of replicate analyses of the sample and a standard. The statistical error for an isotope ratio obtained by mass spectrometric measurement, for instance, is not the analytical error that should be used for this purpose; it represents only the minimum error.

7. The (MSWD)$^{1/2}$ distribution

Instead of the MSWD also the quantity $z = (\mathrm{MSWD})^{1/2}$ (a factor of excess scattering)
280 I. WENDT AND C. CARL
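TABLE II

f    $Z_f(z)$    $z_{max}$    $\bar{z}$    $\sigma_z$    $P_f(z)$
1    $(2/\pi)^{1/2} e^{-z^2/2}$    0    0.7979    0.6028    $2\,\mathrm{erf}(z)$
2    $2z e^{-z^2}$    0.7071    0.8862    0.4633    $1 - e^{-z^2}$
3    $4.146 z^2 e^{-3z^2/2}$    0.8165    0.9213    0.3888    $2\,\mathrm{erf}(3^{1/2} z) - (2 \cdot 3/\pi)^{1/2} z e^{-3z^2/2}$
4    $8 z^3 e^{-2z^2}$    0.8660    0.9399    0.3414    $1 - (1 + 2z^2) e^{-2z^2}$
5    $14.87 z^4 e^{-5z^2/2}$    0.8944    0.9515    0.3076    $2\,\mathrm{erf}(5^{1/2} z) - (2 \cdot 5/\pi)^{1/2} (z + \frac{5}{1 \cdot 3} z^3) e^{-5z^2/2}$
6    $27 z^5 e^{-3z^2}$    0.9129    0.9594    0.2821    $1 - (1 + \frac{6}{2} z^2 + \frac{6^2}{2 \cdot 4} z^4) e^{-3z^2}$
7    $48.27 z^6 e^{-7z^2/2}$    0.9258    0.9650    0.2623    $2\,\mathrm{erf}(7^{1/2} z) - (2 \cdot 7/\pi)^{1/2} (z + \frac{7}{1 \cdot 3} z^3 + \frac{7^2}{1 \cdot 3 \cdot 5} z^5) e^{-7z^2/2}$
8    $85.33 z^7 e^{-4z^2}$    0.9354    0.9693    0.2459    $1 - (1 + \frac{8}{2} z^2 + \frac{8^2}{2 \cdot 4} z^4 + \frac{8^3}{2 \cdot 4 \cdot 6} z^6) e^{-4z^2}$
9    $149.6 z^8 e^{-9z^2/2}$    0.9428    0.9727    0.2322    $2\,\mathrm{erf}(9^{1/2} z) - (2 \cdot 9/\pi)^{1/2} (z + \frac{9}{1 \cdot 3} z^3 + \frac{9^2}{1 \cdot 3 \cdot 5} z^5 + \frac{9^3}{1 \cdot 3 \cdot 5 \cdot 7} z^7) e^{-9z^2/2}$
10    $260.4 z^9 e^{-5z^2}$    0.9487    0.9753    0.2209    $1 - (1 + \frac{10}{2} z^2 + \frac{10^2}{2 \cdot 4} z^4 + \frac{10^3}{2 \cdot 4 \cdot 6} z^6 + \frac{10^4}{2 \cdot 4 \cdot 6 \cdot 8} z^8) e^{-5z^2}$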
may be used as a quality check of a regression. The distribution of $z$ is given by:

$Z_f(z) = Z_f^0\, z^{(f-1)} e^{-(f/2) z^2}$

with

$Z_f^0 = (2f)^{1/2} (f/2)^{(f-1)/2} / \Gamma(f/2)$    (18)

For the expected (mean) values one obtains:

$\bar{z} = (\pi/(2f))^{1/2}\, \frac{1}{2^{f-2}}\, \frac{\Gamma(f)}{[\Gamma(f/2)]^2}$    (19)

This rather inconvenient formula can be approximated, applying Stirling's formula, by:

$\bar{z} \approx 1 - 1/(4f)$    (19a)

and, since $\overline{z^2} = 1$,

$\sigma_z \approx (1/(2f))^{1/2}$    (19b)

These approximations are sufficiently precise for $f > 4$. $Z_f(z)$, $z_{max}$, $\bar{z}$, $\sigma_z = (1 - \bar{z}^2)^{1/2}$ and $P_f(z)$ are listed for $f = 1, \ldots, 10$ in Table II.

The distribution function of $z$ is less skew and more symmetrical than the distribution function of the MSWD. The frequency distributions $Z_f(z)$ for $f$ = 2, 4 and 10 are shown in Fig. 5 on p. 283.

For the cumulative frequency distribution $P_f(z)$ one obtains by successive partial integration for even values of $f$:

$P_f(z) = 1 - e^{-fz^2/2} \sum_{v=0}^{f/2-1} \frac{(f/2)^v}{v!} z^{2v}$    (20a)

and for odd values of $f$:

$P_f(z) = 2\,\mathrm{erf}(f^{1/2} z) - (2f/\pi)^{1/2} e^{-fz^2/2} \sum_{v=0}^{(f-3)/2} \frac{f^v\, 2^{v+1} (v+1)!}{(2v+2)!} z^{2v+1}$    (20b)

8. Numerical test of regression equations

The best fit of a linear array of data points is usually performed by least-squares methods, i.e. minimising the weighted squares of the residuals $\Delta_i = (Y_i - aX_i - b)$ of a straight line $y = ax + b$:

$\mathrm{MSWD} = (n-2)^{-1} \sum w_i (Y_i - aX_i - b)^2 \to \min$    (21)

where the weighting factors $w_i$ are the inverse
THE STATISTICAL DISTRIBUTION OF THE MSWD 281
squared errors of the residuals $\Delta_i$ due to errors in $x_i$ and $y_i$:

$w_i = (\sigma_{y_i}^2 + a^2 \sigma_{x_i}^2)^{-1}$    (22)

for uncorrelated errors $\sigma_x$ and $\sigma_y$ (for correlated errors a term $-2 a r_{xy} \sigma_{x_i} \sigma_{y_i}$ has to be added). The differentiation of eq. 21 with respect to $a$ and $b$ yields a cubic equation (Madansky, 1959; McIntyre et al., 1960; York, 1966, 1967, 1969; Brooks et al., 1968)

which yields the slope $a$ for the best-fit line and $b = \bar{y} - a\bar{x}$.

It can be shown (Wendt, 1976) that also the well-known simple equations:

$a^* = (\overline{xy} - \bar{x}\bar{y}) / (\overline{x^2} - \bar{x}^2)$
$b^* = \bar{y} - a^*\bar{x}$
$\sigma_{a^*}^2 = (\sum w_i)^{-1} (\overline{x^2} - \bar{x}^2)^{-1}$
$\sigma_{b^*}^2 = \overline{x^2}\, \sigma_{a^*}^2$    (24)
$r_{ab} = -\bar{x} / (\overline{x^2})^{1/2}$
$\mathrm{MSWD} = (n-2)^{-1} \sum w_i [(\overline{y^2} - \bar{y}^2) - a^2 (\overline{x^2} - \bar{x}^2)]$

with $w_i$ as given by eq. 22 and $\bar{x} = \sum w_i x_i / \sum w_i$, etc., yield very good approximations $a^*$ and $b^*$ for $a$ and $b$, and the difference between the true optimal value $a$ and the approximation $a^*$ is:

$|a - a^*| \ll \sigma_a$

i.e. it is insignificant. The correlation coefficient $r_{ab}$ between $a$ and $b$ is of importance for $b$ vs. $a$ (or vs. $t$) plots and for the error envelopes of the best-fit line. If $\delta y$ is the deviation of the error hyperbola from the line $y = ax + b$, then:

(25)

TABLE III
12 data points exactly on the line $y = 0.004x + 0.7$

Sample    x        y
1         5.0      0.720
2         15.0     0.760
3         25.0     0.800
4         35.0     0.840
5         45.0     0.880
6         55.0     0.920
7         65.0     0.960
8         75.0     1.000
9         85.0     1.040
10        95.0     1.080
11        105.0    1.120
12        115.0    1.160
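Eqs. 24 can be applied directly to the data of Table III. The Python sketch below is one possible reading, not the authors' program: it assumes weights $w_i = 1/(\sigma_{y_i}^2 + a^2 \sigma_{x_i}^2)$ for uncorrelated errors, relative errors of 1% in $x$ and 0.1% in $y$ (the values used in the numerical test of this section), and a simple repetition in which the slope from the previous pass is used to recompute the weights. The names `weighted_mean`, `eqs24` and `fit_line` are hypothetical.

```python
import math

# Table III: 12 data points exactly on the line y = 0.004 x + 0.7
X = [5.0 + 10.0 * i for i in range(12)]
Y = [0.720, 0.760, 0.800, 0.840, 0.880, 0.920,
     0.960, 1.000, 1.040, 1.080, 1.120, 1.160]


def weighted_mean(values, w):
    """Weighted mean, e.g. x_bar = sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * v for wi, v in zip(w, values)) / sum(w)


def eqs24(x, y, w):
    """Evaluate the simple equations (24) for given weights w."""
    n = len(x)
    xm, ym = weighted_mean(x, w), weighted_mean(y, w)
    xym = weighted_mean([xi * yi for xi, yi in zip(x, y)], w)
    x2m = weighted_mean([xi * xi for xi in x], w)
    y2m = weighted_mean([yi * yi for yi in y], w)
    sxx = x2m - xm * xm
    a = (xym - xm * ym) / sxx
    b = ym - a * xm
    var_a = 1.0 / (sum(w) * sxx)
    return {
        "a": a,
        "b": b,
        "sigma_a": math.sqrt(var_a),
        "sigma_b": math.sqrt(x2m * var_a),
        "r_ab": -xm / math.sqrt(x2m),
        "mswd": sum(w) * ((y2m - ym * ym) - a * a * sxx) / (n - 2),
    }


def fit_line(x, y, sx, sy, iterations=3):
    """Start unweighted, then recompute the weights from the current slope."""
    result = eqs24(x, y, [1.0] * len(x))
    for _ in range(iterations):
        a = result["a"]
        # assumed weights for uncorrelated errors
        w = [1.0 / (syi ** 2 + a * a * sxi ** 2) for sxi, syi in zip(sx, sy)]
        result = eqs24(x, y, w)
    return result


if __name__ == "__main__":
    sx = [0.01 * xi for xi in X]    # assumed 1% error in x
    sy = [0.001 * yi for yi in Y]   # assumed 0.1% error in y
    for key, value in fit_line(X, Y, sx, sy).items():
        print(f"{key:8s} {value:.6g}")
```

Because the points lie exactly on the line, the sketch returns the slope 0.004 and intercept 0.7 with an MSWD of essentially zero; the standard deviations and $r_{ab}$ depend only on the assumed errors.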
TABLE IV
TABLE V
Comparison of calculated and "experimental" mean values for MSWD, $\sigma$(MSWD) and (MSWD)$^{1/2}$

Fig. 4. Cumulative frequency distribution of slope $a$ for $f$ = 10 (data points 1 to 12), $f$ = 4 (data points 1 to 6 and 7 to 12) and $f$ = 2 (data points 1 to 4, 5 to 8 and 9 to 12). Symbols: result of numerical test with 1000 samples for each group of data points; solid lines: theoretical $1\sigma$ curves according to eqs. 24.
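A numerical test of this kind can be sketched in a few lines of Python. The sketch below is not the authors' program: it assumes Gaussian random errors of 1% in $x$ and 0.1% in $y$ around the 12 points of Table III, weights $w_i = 1/(\sigma_{y_i}^2 + a^2 \sigma_{x_i}^2)$, and a fixed random seed chosen only for reproducibility. The names `fit_line` and `simulate` are hypothetical.

```python
import math
import random

# Table III: 12 points exactly on y = 0.004 x + 0.7
X_TRUE = [5.0 + 10.0 * i for i in range(12)]
Y_TRUE = [0.004 * x + 0.7 for x in X_TRUE]


def fit_line(x, y, sx, sy, iterations=3):
    """Weighted straight-line fit; returns slope a, intercept b and MSWD."""
    n = len(x)
    w = [1.0] * n  # first pass is unweighted
    a = b = 0.0
    for _ in range(iterations + 1):
        sw = sum(w)
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        xym = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y)) / sw
        x2m = sum(wi * xi * xi for wi, xi in zip(w, x)) / sw
        a = (xym - xm * ym) / (x2m - xm * xm)
        b = ym - a * xm
        # assumed weights for uncorrelated errors, using the current slope
        w = [1.0 / (syi ** 2 + a * a * sxi ** 2) for sxi, syi in zip(sx, sy)]
    mswd = sum(wi * (yi - a * xi - b) ** 2
               for wi, xi, yi in zip(w, x, y)) / (n - 2)
    return a, b, mswd


def simulate(n_sets=1000, rel_sx=0.01, rel_sy=0.001, seed=1):
    """Perturb the exact points n_sets times and collect slopes and MSWDs."""
    rng = random.Random(seed)
    sx = [rel_sx * x for x in X_TRUE]
    sy = [rel_sy * y for y in Y_TRUE]
    slopes, mswds = [], []
    for _ in range(n_sets):
        x = [rng.gauss(xt, s) for xt, s in zip(X_TRUE, sx)]
        y = [rng.gauss(yt, s) for yt, s in zip(Y_TRUE, sy)]
        a, _, mswd = fit_line(x, y, sx, sy)
        slopes.append(a)
        mswds.append(mswd)
    mean = sum(mswds) / n_sets
    std = math.sqrt(sum((m - mean) ** 2 for m in mswds) / (n_sets - 1))
    return mean, std, slopes


if __name__ == "__main__":
    mean, std, slopes = simulate()
    f = len(X_TRUE) - 2
    print(f"mean MSWD  {mean:.3f}  (expected 1)")
    print(f"sigma MSWD {std:.3f}  (expected {math.sqrt(2.0 / f):.3f})")
    print(f"mean slope {sum(slopes) / len(slopes):.6f}")
```

With 12 points ($f$ = 10) the simulated mean and standard deviation of the MSWD can be compared with the expected values 1 and $(2/f)^{1/2}$; sorting `slopes` gives a cumulative distribution of the kind plotted in Fig. 4.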
the weighting factors) and obtains a first improved value for $a$. After two or three repetitions a stable final value is reached.

Fig. 5. Results of numerical test: frequency distribution and cumulative frequency distribution for $f$ = 10, 4 and 2 for MSWD (a) and (MSWD)$^{1/2}$ (b). Histograms and symbols: experimental class populations based on 1000 samples; solid lines: theoretical distribution curves according to the equations derived in this paper.

In order to test the equations derived in this paper for the MSWD and its standard deviation as well as eqs. 24 for the parameters of the
best-fit line we have taken 12 equally spaced data pairs $(x, y)$ located exactly on a straight line with $a = 4 \cdot 10^{-3}$ and $b = 0.7000$ (Table III), which were overprinted by errors $\sigma_x = \pm 1\%$ and $\sigma_y = \pm 0.1\%$ by computer simulation with the random number method a thousand times. Thus 1000 sets of these 12 x-y pairs ($f$ = 10) were created which differ only by their statistical errors. From these 1000 sets, 1000 results for $a$, $b$, $r_{ab}$, MSWD and (MSWD)$^{1/2}$ and their means and standard deviations were calculated. Splitting the 12 data pairs into two groups of 6 data each, either points 1 to 6 and 7 to 12, or samples 1, 3, 5, ..., 9, 11 and samples 2, 4, ..., 10, 12, respectively, four different data sets with $f$ = 4 are obtained. These four groups have the same $\sigma$(MSWD) but different $\sigma_a$-, $\sigma_b$- and $r_{ab}$-values. Splitting the 12 data pairs into three groups: 1 to 4, 5 to 8 and 9 to 12, or samples (1, 4, 7, 10), (2, 5, 8, 11) and (3, 6, 9, 12), respectively, one obtains 6 groups with $f$ = 2. The results of the statistical evaluation of the 1000 variations of the above-mentioned groups are compiled in Tables IV and V, where also the "experimentally" obtained values for $\sigma_a$, $\sigma_b$, $r_{ab}$ and $\sigma$(MSWD) are compared with the corresponding values calculated using eqs. 24.

The calculated $a$-values show a normal distribution (Fig. 4) in agreement with the standard deviation $\sigma_a$ as given in eqs. 24.

In Fig. 5a the one thousand MSWD-values and in Fig. 5b the one thousand (MSWD)$^{1/2}$-values, grouped into classes of width 0.2, are presented as histograms, while the solid curves represent the MSWD and (MSWD)$^{1/2}$ frequency distributions $Z_f(x)$ and $Z_f(z)$

$Z_{10}(x) = 130.21\, x^4 e^{-5x}$
$Z_{10}(z) = 260.4\, z^9 e^{-5z^2}$
$Z_4(x) = 4x e^{-2x}$
$Z_4(z) = 8z^3 e^{-2z^2}$
$Z_2(x) = e^{-x}$
$Z_2(z) = 2z e^{-z^2}$

as derived in this paper. Also in Fig. 5 the cumulative theoretical frequency distributions (solid curves) are compared with the "experimental" values obtained by computer simulation. In all cases the data fit the theoretical curves sufficiently well.

9. Conclusions

The MSWD or $\bar{\chi} = (\mathrm{MSWD})^{1/2}$ may be used as a statistical test for regression of experimental data with a theoretical curve if the overall analytical errors of the data are known. The expected (mean) value of the MSWD is 1.000 and the $\pm 1\sigma$ range $= \pm (2/f)^{1/2}$ depends on the degree of freedom $f$, while for $\bar{\chi} = (\mathrm{MSWD})^{1/2}$ the average value is approximately $1 - 1/(4f)$ and the $\pm 1\sigma$ range is about $\pm (1/(2f))^{1/2}$. These simple formulae may serve as a criterion to decide whether a set of measured data represents an isochron or a discordia, or to reject this hypothesis if:

$\mathrm{MSWD} > 1 + 2(2/f)^{1/2}$

or if:

$\bar{\chi} > 1 - 1/(4f) + 2(1/(2f))^{1/2}$

respectively.

References

Bronstein, I.N. and Semendjajew, K.S., 1979. Taschenbuch der Mathematik. Deutsch, Frankfurt/M., 19. Auflage.
Brooks, C., Wendt, I. and Harre, W., 1968. A two-error regression treatment and its application to Rb-Sr ages and initial Sr87/Sr86 ratios of younger Variscan granitic rocks from the Schwarzwald Massif, Southwest Germany. J. Geophys. Res., 73(18): 6071-6084.
Ludwig, K.R., 1980. Calculation of uncertainties of U-Pb isotope data. Earth Planet. Sci. Lett., 46: 212-220.
Madansky, A., 1959. The fitting of straight lines when both variables are subject to error. J. Am. Stat. Assoc., 54(285): 173-203.
McIntyre, G.A., Brooks, C., Compston, W. and Turek, A., 1960. The statistical assessment of Rb-Sr isochrons. J. Geophys. Res., 71: 5459.
Wendt, I., 1976. A simplified calculation of regression lines with errors in x- and y-direction. Bundesanst. Geowiss. Rohstoffe, Hannover, Open File Rep., Feb. 1976, Archiv No. 106 524.
Wendt, I., 1984. A three-dimensional U-Pb discordia plane to evaluate samples with common lead of unknown isotopic composition. Isot. Geosci., 2: 1-12.
Wendt, I., 1986. Radiometrische Methoden in der Geochronologie. Clausthaler Tekton. Hefte, 23: 1-170.
York, D., 1966. Least-squares fitting of a straight line. Can. J. Phys., 44: 1079-1086.
York, D., 1967. The best isochron. Earth Planet. Sci. Lett., 2: 479-482.
York, D., 1969. Least-squares fitting of a straight line with correlated errors. Earth Planet. Sci. Lett., 5: 320-324.