
Journal of Hydrology, 58 (1982) 11-27. Elsevier Scientific Publishing Company, Amsterdam. Printed in The Netherlands

SOME METHODS FOR TESTING THE HOMOGENEITY OF RAINFALL RECORDS

T.A. BUISHAND

Royal Netherlands Meteorological Institute (K.N.M.I.), De Bilt (The Netherlands)

(Received June 25, 1981; accepted for publication August 19, 1981)

ABSTRACT

Buishand, T.A., 1982. Some methods for testing the homogeneity of rainfall records. J. Hydrol., 58: 11-27.

Cumulative deviations from the mean are often used in the analysis of homogeneity. Features of five tests on the cumulative deviations are discussed. Some of these tests have optimal properties in testing the null hypothesis of homogeneity against a shift in the mean at an unknown point. Together with the classical von Neumann ratio, the tests were applied to the annual amounts of 30-yr. rainfall records in The Netherlands. For a large number of records, strong indications of a change in the mean were found. There were only small differences between the various test-statistics with respect to the number of records for which the null hypothesis was rejected.

INTRODUCTION

Homogeneous rainfall records are often required in hydrologic design. However, rainfall data over different periods are frequently not comparable, because the measured amount of rainfall depends on such factors as the type, height and exposure of the raingauge, which have not always been the same. Therefore many meteorological institutes maintain an archive with information on the raingauge sites and the instruments used. Unfortunately, it is often not possible to specify the nature of changes in the mean amount of rainfall from the station documentation. This is partly because it is not always known how a change in the instrument or in the raingauge site may influence the measured amount of rainfall, and partly because it is highly questionable whether the station information gives a complete picture of the raingauge site during the period that the station has been in operation.

Because of the uncertainty about possible changes, graphical methods are often used in climatology and hydrology to obtain some insight into the homogeneity of a record. A popular tool is the double-mass curve (Searcy and Hardison, 1960), which is obtained by plotting the cumulative amounts of the station under consideration against the cumulative amounts of a set of neighbouring stations. Under conditions of homogeneity the plotted points tend to fall along a straight line. Instead of the double-mass curve, one can also plot the cumulative deviations from some average value. The cumulative deviations have the advantage that changes in the mean amount of rainfall are more easily recognized (Craddock, 1979). The graph of the cumulative deviations is sometimes called a residual mass curve.

Though graphs are useful for the detection of shifts in the mean, it is usually not obvious how real changes can be distinguished from purely random fluctuations. Therefore it is always necessary to test the significance of departures from homogeneity by statistical methods. Common statistical techniques in climatology and hydrology are reviewed in a publication on climatic change by the World Meteorological Organization (W.M.O., 1966). It is a surprising fact, however, that these statistical tests are not based on some characteristic of the cumulative sums in the graphical analysis.

The intention of this paper is to discuss some tests on the cumulative deviations. These tests are compared with the classical von Neumann ratio. A study was made of the properties of the test-statistics for a simple model with a shift in the mean. Further, the usefulness of the tests was investigated for annual rainfall totals in The Netherlands for the period 1951-1980. First some features of the test-statistics are derived, and then the application to the rainfall data is discussed.

STATISTICAL ANALYSIS OF HOMOGENEITY

The introduction emphasized the need for statistical techniques to test the homogeneity of rainfall records. Suppose that one wants to test the homogeneity of a sequence Y_1, Y_2, ..., Y_n. Under the null hypothesis H0 it is usually assumed that the Y_i's have the same mean. The form of the alternative hypothesis H1 is generally rather vague, since often no reliable prior information is available about possible changes in the mean. Usually, some assumptions are made on the joint distribution of the Y_i's. Most tests require that the Y_i's be independent. This is not a serious restriction, since the tests are usually performed on consecutive seasonal or annual values, which in many countries are approximately independent. The distributions of the test-statistics in this paper are derived for the case that the Y_i's are stochastically independent and have a normal distribution. The tests can still be applied, however, when there are slight departures from normality.

In the literature about testing the homogeneity of rainfall records, hardly any attention is paid to the distribution of test-statistics under the alternative hypothesis. Generally, no information is given on the probability of rejecting the null hypothesis in relation to the magnitude of changes in the mean. In this paper, properties of test-statistics are illustrated for the case that the Y_i's are normally distributed with mean:

$$E(Y_i) = \begin{cases} \mu, & i = 1, \dots, m \\ \mu + \Delta, & i = m+1, \dots, n \end{cases} \qquad (1)$$

and variance:

$$\operatorname{var} Y_i = \sigma_Y^2$$
The model assumes a jump in the mean of magnitude Δ after m observations. In the sequel, examples are given of the probability of rejecting H0 as a function of Δ. Some remarks are also made on the estimation of the change-point m.

The von Neumann ratio


The well-known von Neumann ratio is defined by:

$$N = \frac{\sum_{i=1}^{n-1} (Y_i - Y_{i+1})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \qquad (2)$$

in which Ȳ stands for the average of the Y_i's. Under the null hypothesis of a constant mean it can be shown that E(N) = 2. For a non-homogeneous record the mean of N tends to be smaller than 2. A table of percentage points of N for normally distributed samples is given by Owen (1962). The von Neumann ratio is closely related to the first-order serial correlation coefficient (W.M.O., 1966). A comprehensive study of the effect of changes in the mean on the correlogram was made by Yevjevich and Jeng (1969).
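To make eq. 2 concrete, the ratio can be computed in a few lines. The following is a minimal Python sketch. The function name and the short example series (a hypothetical drop in the mean after the fifth value) are illustrative assumptions and are not taken from the paper.

```python
from statistics import fmean


def von_neumann_ratio(y):
    """Von Neumann ratio N of eq. 2 for a sequence y_1, ..., y_n."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = fmean(y)
    # Numerator: squared differences between successive observations.
    num = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1))
    # Denominator: squared deviations from the sample mean.
    den = sum((v - ybar) ** 2 for v in y)
    return num / den


# Hypothetical series with a drop in the mean after the fifth value.
shifted = [10, 11, 9, 10, 11, 5, 4, 6, 5, 4]
print(round(von_neumann_ratio(shifted), 3))  # 0.637, well below E(N) = 2
```

A jump keeps the successive differences small while it inflates the deviations from the overall mean, so N drops below 2. The significance of that drop still has to be judged against percentage points such as those of Owen (1962).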

Cumulative deviations
Tests for homogeneity can be based on the adjusted partial sums or cumulative deviations from the mean:
$$S_0^* = 0; \qquad S_k^* = \sum_{i=1}^{k} (Y_i - \bar{Y}), \quad k = 1, \dots, n \qquad (3)$$

Note that S*_n = 0. For a homogeneous record one may expect the S*_k's to fluctuate around zero, since there is no systematic pattern in the deviations of the Y_i's from their average value Ȳ. On the other hand, when Δ is negative in eq. 1 most values of S*_k are positive, because the Y_i's tend to be larger than Ȳ if i ≤ m and smaller than Ȳ if i > m. A typical example is given in Fig. 1. For Δ positive the S*_k's tend to be negative. Rescaled adjusted partial sums are obtained by dividing the S*_k's by the sample standard deviation:

$$S_k^{**} = S_k^* / D_Y, \quad k = 0, \dots, n \qquad (4)$$

with

$$D_Y^2 = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

Fig. 1. Non-homogeneous time series with adjusted partial sums.


The values of the S**_k's are not influenced by a linear transformation of the data. For instance, if the amount of rainfall is expressed in metres instead of in millimetres, the S*_k's are diminished by a factor of 1000 but the S**_k's remain unchanged. Therefore tests of homogeneity are based on the rescaled adjusted partial sums S**_k. A statistic that is sensitive to departures from homogeneity is:

$$Q = \max_{0 \le k \le n} \left| S_k^{**} \right| \qquad (5)$$

High values of Q indicate a change in level. Critical values for the test-statistic can be found in Table I. The percentage points in this table are based on 19,999 synthetic sequences of Gaussian random numbers. For n → ∞ the critical values of Q/√n can be obtained from a table of the Kolmogorov-Smirnov goodness-of-fit statistic (see the Appendix).
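Eqs. 3 to 5 translate directly into code. The Python sketch below builds the adjusted partial sums, rescales them by D_Y (divisor n, as defined above) and takes the largest absolute value. The function names and the example series are hypothetical.

```python
import math
from statistics import fmean


def adjusted_partial_sums(y):
    """S*_0, ..., S*_n of eq. 3 (cumulative deviations from the mean)."""
    ybar = fmean(y)
    sums = [0.0]
    for v in y:
        sums.append(sums[-1] + (v - ybar))
    return sums


def rescaled_partial_sums(y):
    """S**_k = S*_k / D_Y of eq. 4, with D_Y computed using divisor n."""
    n = len(y)
    ybar = fmean(y)
    d_y = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    return [s / d_y for s in adjusted_partial_sums(y)]


def q_statistic(y):
    """Q = max over k of |S**_k| (eq. 5)."""
    return max(abs(s) for s in rescaled_partial_sums(y))


shifted = [10, 11, 9, 10, 11, 5, 4, 6, 5, 4]
q = q_statistic(shifted)
print(round(q / math.sqrt(len(shifted)), 2))  # 1.52, above the 99% point 1.29 for n = 10 in Table I
```

Because D_Y scales with the data, Q is unchanged when every value is, for example, multiplied by 1000 and shifted by a constant. This is the invariance that motivates using S**_k rather than S*_k.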
TABLE I

Percentage points of Q/√n and R/√n

n       Q/√n                   R/√n
        90%    95%    99%      90%    95%    99%
10      1.05   1.14   1.29     1.21   1.28   1.38
20      1.10   1.22   1.42     1.34   1.43   1.60
30      1.12   1.24   1.46     1.40   1.50   1.70
40      1.13   1.26   1.50     1.42   1.53   1.74
50      1.14   1.27   1.52     1.44   1.55   1.78
100     1.17   1.29   1.55     1.50   1.62   1.86
∞       1.22   1.36   1.63     1.62   1.75   2.00

Another statistic that can be used for testing homogeneity is the range:

$$R = \max_{0 \le k \le n} S_k^{**} - \min_{0 \le k \le n} S_k^{**} \qquad (6)$$

The range is an important quantity in studies on the storage capacity of reservoirs. Much work has been done on its statistical properties in relation to the famous Hurst phenomenon (Gomide, 1978). Shifts in the mean usually give rise to high values of the range. A figure with percentage points of the distribution of R under the null hypothesis is given by Wallis and O'Connell (1973). Some percentage points are also given in Table I, since it is not convenient to determine critical values from a graph.
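Table I was obtained from synthetic Gaussian sequences. The Python sketch below illustrates that kind of Monte Carlo calculation for Q/√n and R/√n under H0. The seed, the standard-library generator and the rank convention (the p-point taken as the order statistic of rank p(reps + 1)) are assumptions; the paper does not state them. Simulated values will therefore differ slightly from Table I.

```python
import math
import random
from statistics import fmean


def rescaled_partial_sums(y):
    """S**_0, ..., S**_n of eqs. 3 and 4 (D_Y uses divisor n)."""
    n = len(y)
    ybar = fmean(y)
    d_y = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    sums, s = [0.0], 0.0
    for v in y:
        s += v - ybar
        sums.append(s / d_y)
    return sums


def q_and_r(y):
    """Q of eq. 5 and the range R of eq. 6."""
    s = rescaled_partial_sums(y)
    return max(abs(v) for v in s), max(s) - min(s)


def simulated_percentage_points(n, reps=19_999, probs=(0.90, 0.95, 0.99), seed=1):
    """Monte Carlo percentage points of Q/sqrt(n) and R/sqrt(n) under H0."""
    rng = random.Random(seed)
    q_vals, r_vals = [], []
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        q, r = q_and_r(y)
        q_vals.append(q / math.sqrt(n))
        r_vals.append(r / math.sqrt(n))
    q_vals.sort()
    r_vals.sort()
    points = {}
    for p in probs:
        # Assumed convention: rank p * (reps + 1), e.g. 19,000 of 19,999 for p = 0.95.
        idx = round(p * (reps + 1)) - 1
        points[p] = (q_vals[idx], r_vals[idx])
    return points


for p, (q_crit, r_crit) in simulated_percentage_points(30, reps=1_999).items():
    print(f"{p:.0%}: Q/sqrt(n) = {q_crit:.2f}, R/sqrt(n) = {r_crit:.2f}")
```

Since S*_0 = 0, the maximum of S**_k is at least 0 and the minimum at most 0. The range therefore always satisfies R ≥ Q, which is consistent with the larger percentage points of R/√n in Table I.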

Worsley's likelihood ratio test


Consider again eq. 1 and assume that one wishes to test Δ = 0 against Δ ≠ 0. If the position of the change-point m is known, Student's t-test can be used. When no information about m is available, the test can be based on:

$$W = \max_{1 \le k \le n-1} |t_k| \qquad (7)$$

where t_k denotes Student's t for testing a difference in mean between the first k and the last (n - k) observations. Critical values for the test-statistic can be obtained from a paper by Worsley (1979). The test is equivalent to the likelihood ratio test. It is also possible to give a relation between W and the weighted adjusted partial sums:

$$Z_k^* = [k(n-k)]^{-1/2} S_k^*, \quad k = 1, \dots, n-1 \qquad (8)$$

The largest weights are given to S*_1 and S*_{n-1}. The weights are relatively small for k in the neighbourhood of n/2. From eq. A-3 in the Appendix it is seen that the variance of Z*_k does not depend on k. Dividing Z*_k by the sample standard deviation gives the weighted rescaled adjusted partial sums Z**_k. Let

$$V = \max_{1 \le k \le n-1} \left| Z_k^{**} \right| \qquad (9)$$

then some algebra shows (Worsley, 1979):

$$W = \frac{(n-2)^{1/2}\, V}{(1 - V^2)^{1/2}} \qquad (10)$$

So W is a strictly increasing function of V, which means that tests on V and W are equivalent.
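Eq. 10 can be checked numerically. The Python sketch below computes W directly as the largest |t_k| from pooled-variance two-sample t-tests and compares it with the value obtained from V via eq. 10. The helper names and the example series are assumptions made for this illustration.

```python
import math
from statistics import fmean


def split_t(y, k):
    """Student's t (pooled variance) for the first k versus the last n - k values."""
    n = len(y)
    a, b = y[:k], y[k:]
    ma, mb = fmean(a), fmean(b)
    ss_within = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    s2 = ss_within / (n - 2)
    return (ma - mb) / math.sqrt(s2 * (1 / k + 1 / (n - k)))


def worsley_w(y):
    """W of eq. 7: the largest |t_k| over all split points k = 1, ..., n - 1."""
    return max(abs(split_t(y, k)) for k in range(1, len(y)))


def v_statistic(y):
    """V of eq. 9: the largest |Z**_k| over k = 1, ..., n - 1."""
    n = len(y)
    ybar = fmean(y)
    d_y = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    s = best = 0.0
    for k in range(1, n):
        s += y[k - 1] - ybar                                        # S*_k
        best = max(best, abs(s) / (math.sqrt(k * (n - k)) * d_y))   # |Z**_k|
    return best


def w_from_v(v, n):
    """Eq. 10."""
    return math.sqrt(n - 2) * v / math.sqrt(1 - v * v)


y = [10, 11, 9, 10, 11, 5, 4, 6, 5, 4]
print(abs(worsley_w(y) - w_from_v(v_statistic(y), len(y))) < 1e-9)  # True
```

The identity rests on the fact that (Z**_k)² equals the between-group sum of squares divided by the total sum of squares for a split after observation k. As a result, t_k² = (n - 2)(Z**_k)² / (1 - (Z**_k)²).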


Bayesian procedures
Bayesian procedures for the detection of changes in the mean have been developed by Chernoff and Zacks (1964) and Gardner (1969). In the derivation of Bayesian tests it is assumed that the variance σ_Y² is known. Gardner's statistic for a two-sided test on a shift in the mean at an unknown point can be written as:

$$\sum_{k=1}^{n-1} p_k \left( S_k^* / \sigma_Y \right)^2 \qquad (11)$$

where p_k denotes the prior probability that the shift occurs just after the kth observation (k = 1, ..., n - 1). When the standard deviation is not known, σ_Y can be replaced by the sample standard deviation. For p_k independent of k (uniform prior distribution) one obtains:

$$U = \frac{1}{n(n+1)} \sum_{k=1}^{n-1} \left( S_k^{**} \right)^2 \qquad (12)$$

and for p_k proportional to 1/[k(n - k)] one obtains:

$$A = \sum_{k=1}^{n-1} \left( Z_k^{**} \right)^2 \qquad (13)$$

Large values of these test-statistics indicate departures from homogeneity. Critical values for U and A are given in Table II. The percentage points in this table are based on 19,999 synthetic sequences of Gaussian random numbers. The limiting distributions of U and A are those of certain test-statistics of the Cramér-von Mises type: the statistic U/n corresponds asymptotically to Smirnov's ω², and the statistic A to the Anderson-Darling statistic (see the Appendix).

TABLE II

Percentage points of U and A

n       U                         A
        90%     95%     99%       90%    95%    99%
10      0.336   0.414   0.575     1.90   2.31   3.14
20      0.343   0.447   0.662     1.93   2.44   3.50
30      0.344   0.444   0.691     1.92   2.42   3.70
40      0.341   0.448   0.693     1.91   2.44   3.66
50      0.342   0.452   0.718     1.92   2.48   3.78
100     0.341   0.457   0.712     1.92   2.48   3.82
∞       0.347   0.461   0.743     1.93   2.49   3.86
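Both Bayesian statistics are sums of squared partial sums and can be accumulated in a single pass. The Python sketch below evaluates eq. 12 and eq. 13 using Z**_k = S**_k / [k(n - k)]^{1/2}. The function name and example series are hypothetical.

```python
import math
from statistics import fmean


def bayesian_u_and_a(y):
    """U of eq. 12 and A of eq. 13 from the rescaled adjusted partial sums."""
    n = len(y)
    ybar = fmean(y)
    d_y = math.sqrt(sum((v - ybar) ** 2 for v in y) / n)
    s = 0.0
    sum_s2 = 0.0  # sum of (S**_k)^2 over k = 1, ..., n - 1
    sum_z2 = 0.0  # sum of (Z**_k)^2 over k = 1, ..., n - 1
    for k in range(1, n):
        s += y[k - 1] - ybar           # S*_k
        s_rescaled = s / d_y           # S**_k
        sum_s2 += s_rescaled ** 2
        sum_z2 += s_rescaled ** 2 / (k * (n - k))   # (Z**_k)^2
    return sum_s2 / (n * (n + 1)), sum_z2


shifted = [10, 11, 9, 10, 11, 5, 4, 6, 5, 4]
u, a = bayesian_u_and_a(shifted)
print(round(u, 3), round(a, 2))
```

For this hypothetical series both values exceed the 99% points for n = 10 in Table II. A gives relatively more weight to partial sums near the end-points, in the same way as W.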


The power of tests on homogeneity


The probability of detecting changes in the mean of a sequence Y_1, Y_2, ..., Y_n by statistical methods depends on how serious these changes are. When only a small change occurs during a short period of the sample record, there is little chance that the tests will indicate non-homogeneity. On the other hand, a useful test-statistic should be able to indicate all relevant departures from homogeneity. A study on the power of tests for a change in level at an unknown point was made by Sen and Srivastava (1975). These authors compared the likelihood ratio statistic with Bayesian procedures. In this paper the power of N, Q and W is discussed for testing Δ = 0 against Δ ≠ 0. For a particular test-statistic the probability of rejecting H0 depends on the significance level α, the value of Δ, the standard deviation σ_Y, the number of observations n and the position of the change-point m. The dependence on Δ and σ_Y can be combined into one single parameter: Δ' = Δ/σ_Y. The power of N, Q and W was investigated for α = 0.05 and n = 30. Comparisons between these test-statistics were based on their power function:

$$P(\Delta', m) = \Pr(H_0 \text{ is rejected} \mid \Delta', m) \qquad (14)$$
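The power function of eq. 14 can be estimated by the kind of simulation described below. The Python sketch generates 1,999 standard normal sequences of length 30, reads the critical values from the ordered null statistics and re-evaluates the statistics after adding Δ' to the last n - m values of the same sequences. The seed, the rank convention and the helper names are assumptions. The W test is carried out through V, which is equivalent by eq. 10. N is rejected for small values.

```python
import math
import random
from statistics import fmean


def n_q_v(y):
    """Von Neumann ratio N (eq. 2), Q (eq. 5) and V (eq. 9) of one sequence."""
    n = len(y)
    ybar = fmean(y)
    ss = sum((v - ybar) ** 2 for v in y)
    d_y = math.sqrt(ss / n)
    n_ratio = sum((y[i] - y[i + 1]) ** 2 for i in range(n - 1)) / ss
    s = q = v = 0.0
    for k in range(1, n):          # S*_0 = S*_n = 0 add nothing to the maxima
        s += y[k - 1] - ybar       # S*_k
        q = max(q, abs(s) / d_y)
        v = max(v, abs(s) / (d_y * math.sqrt(k * (n - k))))
    return n_ratio, q, v


def simulated_power(delta_prime, m, n=30, reps=1999, alpha=0.05, seed=1):
    """Estimate P(delta', m) of eq. 14 for N, Q and W by simulation."""
    rng = random.Random(seed)
    base = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(reps)]
    null = [n_q_v(y) for y in base]
    lo = round(alpha * (reps + 1)) - 1          # rank of the lower alpha-point
    hi = round((1 - alpha) * (reps + 1)) - 1    # rank of the upper alpha-point
    n_crit = sorted(s[0] for s in null)[lo]     # small N indicates a change
    q_crit = sorted(s[1] for s in null)[hi]
    v_crit = sorted(s[2] for s in null)[hi]     # W is increasing in V (eq. 10)
    # Same random numbers, with delta' added to the last n - m values.
    alt = [n_q_v(y[:m] + [x + delta_prime for x in y[m:]]) for y in base]
    return {
        "N": sum(s[0] < n_crit for s in alt) / reps,
        "Q": sum(s[1] > q_crit for s in alt) / reps,
        "W": sum(s[2] > v_crit for s in alt) / reps,
    }


for d in (0.0, 1.0, 2.0):
    print(d, simulated_power(d, m=15))
```

Reusing the same random numbers for every Δ' (common random numbers) makes the simulated power curves smooth and directly comparable between statistics. This is the design choice described in the text.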

If Δ = 0, then P(Δ', m) = α = 0.05; for Δ ≠ 0 and m fixed, the power function increases monotonically with the absolute value of Δ'. As |Δ'| grows, P(Δ', m) tends to 1, that is, H0 is rejected with probability 1. To obtain the power of N, Q and W, 1,999 sequences of 30 pseudo-random numbers were generated from a standard normal distribution. For each sequence the statistics N, Q and W were calculated, and the critical values were then read from the ordered samples of the computed statistics. The powers were based on the same set of random numbers: the test-statistics were calculated again after adding a constant Δ' to the last (30 - m) numbers of each sequence.

Fig. 2. Simulated powers of the statistics N, Q and W for testing a change of level in the middle of a sequence (α = 0.05). [Plot of P(Δ', m) against Δ' for m = 15, n = 30.]

Fig. 3. Simulated powers of the statistics N, Q and W for testing a change of level near the beginning of a sequence (α = 0.05). [Plot of P(Δ', m) against Δ' for m = 5, n = 30.]

Simulated power functions of N, Q and W for m = 15 and m = 5 (which is equivalent to m = 25) are given in Figs. 2 and 3, respectively. Since the power functions are symmetric in Δ', only non-negative values of Δ' are considered. The figures show that the von Neumann ratio N is less powerful than Q and W both for m = 5 and m = 15. This is not surprising, since N is not based on a specific form of the alternative hypothesis, whereas Q and W are designed specifically for testing a change in level at an unknown point. For other departures from homogeneity N could be more powerful than Q and W. Q is superior to W for m = 15, while the opposite holds for m = 5. In general, for m in the neighbourhood of n/2 the statistic Q is more powerful than W. On the other hand, W is more sensitive to changes at the beginning and at the end of the sequence. This is a consequence of the large values of the weights [k(n - k)]^{-1/2} near the end-points. The power of the Bayesian statistic U is comparable with that of Q, and the power function of A is somewhat similar to that of W. For a single change in the mean the range R is less powerful than Q, but for two change-points the range usually gives a better test. A case with two change-points is discussed by Buishand (1981).

Estimation of the position of a change-point

Graphs of cumulative deviations are often used to determine the position of change-points. It is then assumed that something has happened at points where the cumulative sum plot shows a clear change of slope. For the model in eq. 1 the position of the maximum of |S*_k| or |Z*_k| can be taken as an estimate of the change-point m. Let K be the value of k for which |S*_k| reaches its maximum, i.e. Q = |S**_K|. In the same way M is chosen such that V = |Z**_M|. Asymptotic properties of the statistic M were derived by Hinkley (1970). Because M converges slowly to its asymptotic distribution, Hinkley's results are not applicable to most hydrological sequences.
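The estimators K and M defined above only require locating a maximum. The Python sketch below returns both for a single sequence. Dividing by D_Y does not move the maxima, so the unscaled S*_k and Z*_k are used. The function name and the two hypothetical example series are assumptions.

```python
import math
from statistics import fmean


def change_point_estimates(y):
    """Return (K, M): the k maximizing |S*_k| and the k maximizing |Z*_k|.

    An estimate k places the shift just after observation k.
    """
    n = len(y)
    ybar = fmean(y)
    s = 0.0
    k_est = m_est = 1
    best_s = best_z = -1.0
    for k in range(1, n):
        s += y[k - 1] - ybar                    # S*_k
        z = abs(s) / math.sqrt(k * (n - k))     # |Z*_k|, eq. 8
        if abs(s) > best_s:
            best_s, k_est = abs(s), k
        if z > best_z:
            best_z, m_est = z, k
    return k_est, m_est


print(change_point_estimates([10, 11, 9, 10, 11, 5, 4, 6, 5, 4]))  # (5, 5)
print(change_point_estimates([0, 0, 5, 5, 5, 5, 5, 5, 5, 5]))      # (2, 2)
```

Both estimates agree for these clear-cut examples. Their sampling distributions differ, however, as discussed below for noisy sequences.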

Fig. 4. Distribution of the index for which |S*_k| reaches its maximum under the condition that the null hypothesis is rejected at the 5% level (n = 30, |Δ'| = 1.5). [Histograms of Pr(K = k) against k for m = 5 and m = 15.]

Fig. 5. Distribution of the index for which |Z*_k| reaches its maximum under the condition that the null hypothesis is rejected at the 5% level (n = 30, |Δ'| = 1.5). [Histograms of Pr(M = k) against k for m = 5 and m = 15.]

For n = 30 the probability distributions of K and M were obtained from the generated samples on which Figs. 2 and 3 were based. Only those samples for which the null hypothesis of a constant mean was rejected at the 5% level were taken into account. Figs. 4 and 5 give the distributions of K and M in the situation that |Δ'| = 1.5. The distributions are given for two positions of the change-point: m = 5 and m = 15. The peak in the empirical distributions of K and M always coincides with the position of the change-point m. For m = 15 the statistic K is less dispersed than M, whereas M is superior if m = 5. The distribution of K is highly skewed when the change in the mean occurs at the beginning of the sequence. This can be roughly explained as follows. Fig. 6 gives the means of S*_k and Z*_k (obtained from eq. A-1 in the Appendix) for m = 5 and Δ = -1.5. The mean of S*_k rises quickly to its maximum at k = 5, but for k > 5 it drops only slowly. From the figure one reads, for instance, E(S*_10) > E(S*_k) for k ≤ 3. Also, eq. A-2 gives var(S*_10) > var(S*_k) for these k, and consequently Pr(S*_10 > s) > Pr(S*_k > s) for s sufficiently large. Since high values of S*_k are relatively unlikely at the beginning of the sequence, S*_k is very unlikely to reach its maximum for k < 5. Therefore the distribution of K is positively skewed in this situation. This is not the case for the statistic M: for the weighted adjusted partial sums Z*_k the curve of the mean is rather symmetric near the peak at k = 5, and the variance does not depend on k. So in the situation of a single change-point, the index for which Z*_k reaches its maximum (or minimum) has a rather symmetric distribution. When there are two change-points, the positions of the maximum and minimum of the Z*_k's may have very skewed distributions (Buishand, 1981). In Figs. 4 and 5 the magnitude of the jump in the mean was 1.5σ_Y.
For larger jumps in the mean the distributions of K and M are more concentrated around the position of the change-point m. When there is only a small change, the estimates of m are widely scattered.

Fig. 6. Mean of S*_k and Z*_k for n = 30, m = 5 and Δ = -1.5. [Panels: E(S*_k) and E(Z*_k) plotted against k.]
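The curves in Fig. 6 follow from eqs. A-1, A-2 and 8. The Python sketch below evaluates them for n = 30, m = 5 and Δ = -1.5, taking σ_Y = 1 as an assumption. It reproduces the comparison of the means and variances at k = 3 and k = 10 used in the argument above. The function names are hypothetical.

```python
import math


def expected_partial_sums(n, m, delta):
    """E(S*_k) for k = 0, ..., n from eq. A-1."""
    return [
        -k * (n - m) * delta / n if k <= m else -m * (n - k) * delta / n
        for k in range(n + 1)
    ]


def variance_partial_sums(n, sigma=1.0):
    """var(S*_k) for k = 0, ..., n from eq. A-2."""
    return [k * (n - k) * sigma**2 / n for k in range(n + 1)]


def expected_weighted_sums(n, m, delta):
    """E(Z*_k) = [k(n - k)]^(-1/2) E(S*_k) for k = 1, ..., n - 1 (eq. 8)."""
    es = expected_partial_sums(n, m, delta)
    return [es[k] / math.sqrt(k * (n - k)) for k in range(1, n)]


es = expected_partial_sums(30, 5, -1.5)
var_s = variance_partial_sums(30)
ez = expected_weighted_sums(30, 5, -1.5)
print(es[3], es[5], es[10])                     # 3.75 6.25 5.0
print(round(var_s[3], 2), round(var_s[10], 2))  # 2.7 6.67
print(1 + ez.index(max(ez)))                    # 5: E(Z*_k) peaks at k = m
```

E(S*_k) rises steeply to 6.25 at k = m and then falls only slowly. Both the mean and the variance at k = 10 exceed those at k = 3, which is why K is rarely found before the change-point.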

APPLICATION TO RAINFALL DATA

In the climatological network of the Royal Netherlands Meteorological Institute (K.N.M.I.) there are about 320 stations with daily rainfall registrations (about 1 gauge per 100 km²). The data from 1951 onwards are available on magnetic tape. The homogeneity of the records from 264 stations was investigated for the period 1951-1980. Stations with long interruptions in the observations were not taken into account. To obtain a 30-yr. sequence for each station, missing data (e.g. due to a change of observer or the damaging floods in February 1953) were supplemented from nearby stations.

The use of year-by-year differences


For the analysis of homogeneity the country was divided into five regions (Fig. 7). In the flat regions I, II, III and IV there is little variation in the local rainfall climate. Differences in the mean amount of rainfall are more pronounced in the small hilly region V. Fig. 8 gives for each region the annual means over consecutive 5-yr. periods. The figure shows that the early 1950's were rather dry. The very wet 1960's were followed by the dry 1970's. The statistical tests were applied to the sequence of year-by-year differences:
$$Y_i = X_i - R_i \qquad (15)$$

with X_i the amount of rainfall in year i for the station under consideration, and R_i the average amount of rainfall in year i for the other stations in the region. In general, regional means are hardly sensitive to changes in the site of individual rainfall stations. Local changes in the observations of the station under consideration therefore affect the means of the X_i's and the Y_i's in the same way. But since σ_Y < σ_X, the Y_i's are preferred for testing homogeneity. In The Netherlands the standard height of the raingauge is 0.40 m, and the climatological conditions are such that a station relocation or a change in the exposure of the gauge may lead to a decrease or increase in the annual mean of 5-10%. Fig. 8 shows that the annual mean is about 800 mm in all regions. Since σ_Y is on average 45 mm, the standardized shift Δ' is nearly 1 for a change in the mean of 5%. For this value of Δ', Fig. 2 shows that for m = 15 the probability of rejecting the null hypothesis varies from 0.27 to 0.67, depending on the test-statistic used. These probabilities are much smaller for a change in the mean near the end-points of the sequence. For m = 5, Fig. 3 shows that the probability of rejecting H0 still differs substantially from 1 for jumps in the mean of 10% (Δ' ≈ 2). For each test-statistic, the number of records for which the null hypothesis of a constant mean was rejected at a given significance level was counted. The results are given in Table III for the 5% level and in Table IV for the 1% level. Despite the low power of the test-statistics for relevant changes in the mean, there are many records with strong statistical evidence of non-homogeneity.

Fig. 7. Geographical regions used in the analysis of homogeneity. [Map of The Netherlands showing regions I to V.]

Fig. 8. Average annual amounts over 5-yr. periods for the regions in Fig. 7. [One panel per region; vertical axis 700 to 900 mm, horizontal axis 1950 to 1980.]

TABLE III

Results of tests on homogeneity (α = 0.05)

Region   Total number   Number of stations for which H0 is rejected
         of stations    N     Q     R     W     U     A
I        66             16    21    22    23    17    16
II       71             22    25    31    20    23    22
III      53             17    10    10    11    11    10
IV       64             19    17    16    16    13    12
V        10             5     0     1     0     0     0
Total    264            79    73    80    70    64    60

TABLE IV

Results of tests on homogeneity (α = 0.01)

Region   Total number   Number of stations for which H0 is rejected
         of stations    N     Q     R     W     U     A
I        66             9     11    10    7     9     9
II       71             10    16    14    10    6     7
III      53             3     6     4     4     4     4
IV       64             6     8     11    6     6     6
V        10             0     0     0     0     0     0
Total    264            28    41    39    27    25    26
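Eq. 15 and the standardized shift quoted above involve only simple arithmetic. The Python sketch below forms the year-by-year differences and repeats the calculation of Δ' for 5% and 10% changes of an 800 mm mean with σ_Y = 45 mm. The function name, the three-year example and the use of an unweighted arithmetic mean for R_i are assumptions for illustration.

```python
from statistics import fmean


def year_by_year_differences(station, others):
    """Y_i = X_i - R_i of eq. 15.

    station: annual amounts X_i (mm) of the station under consideration.
    others:  one list of annual amounts per other station in the region;
             R_i is taken as their unweighted mean in year i (an assumption).
    """
    return [x - fmean(year) for x, year in zip(station, zip(*others))]


# Hypothetical three-year example with two neighbouring stations.
print(year_by_year_differences([820, 760, 905], [[800, 750, 880], [790, 770, 900]]))
# [25.0, 0.0, 15.0]

# Standardized shifts: 5% and 10% of an 800 mm mean, sigma_Y = 45 mm.
for change in (0.05, 0.10):
    print(change, round(change * 800 / 45, 2))  # 0.89 and 1.78
```

The resulting Δ' values of about 0.9 and 1.8 correspond to the "nearly 1" and "≈ 2" used when reading Figs. 2 and 3.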

Discussion of the results


Table III shows that for each test-statistic there are about 70 significant values at the 5% level. Under the null hypothesis of a constant mean for all 264 records, the expected number of significant values is 13. Provided that correlation between the Y_i's from different stations can be neglected, twenty significant values would be highly unusual. This last assumption is questionable, since there is always some correlation between the year-by-year differences of nearby stations. For instance, let A and B be two neighbouring stations in the same region, and suppose that in a particular year the amount of rainfall at station A lies above the regional average. Then it is very likely that the annual amount of its neighbour B is also higher than the regional mean. Due to this correlation, the number of significant values in a particular region may be much larger than the expected number under the null hypothesis. However, when all records are homogeneous it is very unlikely that this occurs in nearly all regions, as is the case here. So it can be concluded that many records are not homogeneous.

There are only small differences between the various test-statistics with respect to the number of significant values. The fact that this number is relatively high for the statistics N and R indicates that departures from homogeneity do not always consist of a single shift in the mean.

Though for a number of records there is statistical evidence of changes in the mean, it is often not possible to correct these records. Sensible corrections require knowledge of the causes of differences in the mean. For ten records with serious departures from homogeneity, the station history was examined carefully to find a reason for these departures. Only in five cases was some indication found of a decrease or increase in the mean amount of rainfall. In one of these the situation of the raingauge site had been gradually improved. In the four others there was a marked change in the slope of the cumulative sum plot coinciding with the date of a station relocation. But even for three of these four stations it was not quite clear why the change of location resulted in a considerable decrease or increase in the mean amount of rainfall.
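The expected count of 13 and the claim that much larger counts would be unusual can be made concrete with a binomial calculation. The Python sketch below assumes that the 264 tests are independent, each rejecting with probability 0.05 under H0. As the text points out, this assumption is questionable for neighbouring stations, so the tail probabilities are only indicative.

```python
from math import comb


def binomial_upper_tail(n, p, k):
    """Pr(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


n_records, alpha = 264, 0.05
print(round(n_records * alpha, 1))  # 13.2 expected significant values under H0
for k in (20, 60, 70):
    print(k, binomial_upper_tail(n_records, alpha, k))
```

Under independence, counts of 60 to 80, as observed in Table III, would be practically impossible if all records were homogeneous.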

Other methods for testing homogeneity


Sometimes the sequence of ratios X_i/R_i is preferred for testing homogeneity (W.M.O., 1966). Instead of testing for a constant ratio between two quantities, one can also test for a constant difference between their logarithms. The tests for homogeneity were repeated with the logarithms of the annual amounts, which gave the same results as the tests on the original annual amounts. The tests were also done with a partition of The Netherlands into fifteen regions instead of five. For most stations the results were nearly identical. There were, however, a few stations for which one subdivision indicated serious departures from homogeneity whereas the other subdivision did not.

SUMMARY AND CONCLUSIONS

Characteristics of cumulative deviations from the mean can be used to test the homogeneity of rainfall records. As a first example, two tests on the rescaled adjusted partial sums were introduced. Weighted cumulative deviations were discussed to emphasize changes near the end-points of the sequence. It was pointed out that Worsley's likelihood ratio test for a shift in the mean in normal populations is equivalent to a test on the weighted adjusted partial sums. Some attention was paid to Bayesian procedures for testing a change in level. The resulting test-statistics are simple quadratic forms of the rescaled adjusted partial sums.

Simulation (data generation) showed that tests on the cumulative deviations are superior to the classical von Neumann ratio for a model with only one change in the mean. The tests were applied to annual data for 264 rainfall stations in The Netherlands. There was strong evidence of departures from homogeneity. The von Neumann ratio gave nearly the same results as the tests on the cumulative deviations.

ACKNOWLEDGEMENTS

The author wishes to thank his colleagues of the Climatological Branch for proposing this subject. He would also like to express his sincere gratitude to Messrs. A. Denkema and A.C. Patist for their work with the rainfall data.

APPENDIX

Properties of adjusted partial sums


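When the Y_i's have a normal distribution, the adjusted partial sums are also normally distributed. For the model in eq. 1 it can readily be shown that:

$$E(S_k^*) = \begin{cases} -\dfrac{k(n-m)}{n}\,\Delta, & k = 0, \dots, m \\[6pt] -\dfrac{m(n-k)}{n}\,\Delta, & k = m+1, \dots, n \end{cases} \qquad (\text{A-1})$$

and

$$\operatorname{var}(S_k^*) = \frac{k(n-k)}{n}\,\sigma_Y^2, \quad k = 0, \dots, n \qquad (\text{A-2})$$

So for the weighted adjusted partial sums Z*_k one obtains:

$$\operatorname{var}(Z_k^*) = \frac{1}{n}\,\sigma_Y^2, \quad k = 1, \dots, n-1 \qquad (\text{A-3})$$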
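Eqs. A-1 and A-2 can be checked by simulation. The Python sketch below generates sequences from the model of eq. 1 with μ = 0, n = 30, m = 5, Δ = -1.5 and σ_Y = 1. It compares the empirical mean and variance of each S*_k with the formulas. The parameter values, the seed and the number of replications are assumptions chosen for illustration.

```python
import random
from statistics import fmean


def simulate_partial_sum_moments(n=30, m=5, delta=-1.5, sigma=1.0, reps=20_000, seed=7):
    """Empirical mean and variance of S*_k, k = 0, ..., n, under eq. 1 with mu = 0."""
    rng = random.Random(seed)
    samples = [[] for _ in range(n + 1)]
    for _ in range(reps):
        # Observations m+1, ..., n (0-based index >= m) are shifted by delta.
        y = [rng.gauss(0.0, sigma) + (delta if i >= m else 0.0) for i in range(n)]
        ybar = fmean(y)
        s = 0.0
        samples[0].append(0.0)
        for k in range(1, n + 1):
            s += y[k - 1] - ybar
            samples[k].append(s)
    means = [fmean(c) for c in samples]
    variances = [sum((x - mu) ** 2 for x in c) / len(c) for c, mu in zip(samples, means)]
    return means, variances


n, m, delta = 30, 5, -1.5
means, variances = simulate_partial_sum_moments(n, m, delta)
theory_mean = [-k * (n - m) * delta / n if k <= m else -m * (n - k) * delta / n
               for k in range(n + 1)]                       # eq. A-1
theory_var = [k * (n - k) / n for k in range(n + 1)]         # eq. A-2 with sigma_Y = 1
for k in (3, 5, 10, 20):
    print(k, round(means[k], 2), theory_mean[k], round(variances[k], 2), round(theory_var[k], 2))
```

The deterministic shift changes only the means. The variances are the same as under H0, which is why eq. A-2 does not involve Δ.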

Asymptotic properties of the sequence {S*_k} in the situation that Δ = 0 are used in heuristic derivations of the limiting distributions of the Kolmogorov-Smirnov and the Cramér-von Mises statistics (Doob, 1949; Anderson and Darling, 1952). Let:

$$\tilde{Q} = \max_{0 \le k \le n} \left| S_k^* \right| / \sigma_Y \qquad (\text{A-4})$$

and

$$\tilde{R} = \left\{ \max_{0 \le k \le n} S_k^* - \min_{0 \le k \le n} S_k^* \right\} / \sigma_Y \qquad (\text{A-5})$$

The limiting distribution of Q̃/√n is the same as that of the Kolmogorov-Smirnov statistic (Doob, 1949). A derivation of the asymptotic distribution of R̃ is given by Feller (1951). The distribution of the quadratic form in eq. 11 was investigated by Anderson and Darling (1952) to derive the limiting distribution of the Cramér-von Mises statistic.
Properties of rescaled adjusted partial sums

Because of the sample standard deviation in the denominator of eq. 4, the rescaled adjusted partial sums do not have a normal distribution. When Δ = 0, the squares of the weighted rescaled adjusted partial sums Z**_k have a beta distribution with parameters 1/2 and (n - 2)/2 (Anis and Lloyd, 1976). Therefore:

$$\operatorname{var}(Z_k^{**}) = E\left[(Z_k^{**})^2\right] = \frac{1}{n-1}, \quad k = 1, \dots, n-1 \qquad (\text{A-6})$$

and

$$\operatorname{var}(S_k^{**}) = E\left[(S_k^{**})^2\right] = \frac{k(n-k)}{n-1}, \quad k = 0, \dots, n \qquad (\text{A-7})$$

Substitution of this expression for E[(S**_k)²] in the right-hand side of eq. 12 gives E(U) = 1/6 for all n. So U/n and Smirnov's ω² have the same mean. In the same way it can be shown that E(A) = 1, which corresponds to the mean of the Anderson-Darling statistic. Since for independent normal variates the sample standard deviation D_Y converges with probability 1 to σ_Y, the statistics Q and R have the same limiting distributions as Q̃ and R̃, respectively. The asymptotic distributions of U/n and A are identical to those of Smirnov's ω² and the Anderson-Darling statistic.
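The two expectations can be confirmed with exact rational arithmetic. The Python sketch below sums the moments of eqs. A-6 and A-7 over k = 1, ..., n - 1, with the normalization of eqs. 12 and 13. The sum of k(n - k) over these k equals n(n - 1)(n + 1)/6, so E(U) reduces to 1/6 for every n. The function names are illustrative.

```python
from fractions import Fraction


def expected_u(n):
    """E(U) from eq. 12 and eq. A-7: sum of k(n-k)/(n-1), divided by n(n+1)."""
    total = sum(Fraction(k * (n - k), n - 1) for k in range(1, n))
    return total / (n * (n + 1))


def expected_a(n):
    """E(A) from eq. 13 and eq. A-6: n - 1 terms, each with expectation 1/(n-1)."""
    return sum(Fraction(1, n - 1) for _ in range(1, n))


print([expected_u(n) for n in (10, 30, 100)])  # [Fraction(1, 6), Fraction(1, 6), Fraction(1, 6)]
print([expected_a(n) for n in (10, 30, 100)])  # [Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]
```

Exact fractions avoid any rounding, so the identity E(U) = 1/6 is verified for each n exactly rather than approximately.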

REFERENCES

Anderson, T.W. and Darling, D.A., 1952. Asymptotic theory of certain "goodness of fit" criteria based on stochastic processes. Ann. Math. Stat., 23: 193-212.
Anis, A.A. and Lloyd, E.H., 1976. The expected value of the adjusted rescaled Hurst range of independent normal summands. Biometrika, 63: 111-116.
Buishand, T.A., 1981. The analysis of homogeneity of long-term rainfall records in The Netherlands. R. Neth. Meteorol. Inst. (K.N.M.I.), De Bilt, Sci. Rep. No. 81-7.
Chernoff, H. and Zacks, S., 1964. Estimating the current mean of a normal distribution which is subjected to changes in time. Ann. Math. Stat., 35: 999-1018.
Craddock, J.M., 1979. Methods of comparing annual rainfall records for climatic purposes. Weather, 34: 332-346.
Doob, J.L., 1949. Heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Stat., 20: 393-403.
Feller, W., 1951. The asymptotic distribution of the range of sums of independent random variables. Ann. Math. Stat., 22: 427-432.
Gardner Jr., L.A., 1969. On detecting changes in the mean of normal variates. Ann. Math. Stat., 40: 116-126.
Gomide, F.L.S., 1978. Markovian inputs and the Hurst phenomenon. J. Hydrol., 37: 23-45.
Hinkley, D.V., 1970. Inference about the change-point in a sequence of random variables. Biometrika, 57: 1-17.
Owen, D.B., 1962. Handbook of Statistical Tables. Addison-Wesley, Reading, Mass.
Searcy, J.K. and Hardison, C.H., 1960. Double-mass curves. In: Manual of Hydrology: Part 1, General Surface Water Techniques. U.S. Geol. Surv., Water-Supply Pap., 1541-B, Washington, D.C., 31-59.
Sen, A. and Srivastava, M.S., 1975. On tests for detecting change in mean. Ann. Stat., 3: 98-108.
Wallis, J.R. and O'Connell, P.E., 1973. Firm reservoir yield: How reliable are historic hydrological records? Hydrol. Sci. Bull., 18: 347-365.
W.M.O. (World Meteorological Organization), 1966. Climatic change. World Meteorol. Org., Geneva, Tech. Note 79.
Worsley, K.J., 1979. On the likelihood ratio test for a shift in location of normal populations. J. Am. Stat. Assoc., 74: 365-367.
Yevjevich, V. and Jeng, R.J., 1969. Properties of non-homogeneous hydrologic series. Colo. State Univ., Fort Collins, Colo., Hydrol. Pap. 32.
