JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide
range of content in a trusted digital archive. We use information technology and tools to increase productivity and
facilitate new forms of scholarship. For more information about JSTOR, please contact support@jstor.org.
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at
https://about.jstor.org/terms
The University of Chicago Press is collaborating with JSTOR to digitize, preserve and
extend access to The Journal of Geology
This content downloaded from 168.176.5.118 on Fri, 14 Sep 2018 00:34:00 UTC
All use subject to https://about.jstor.org/terms
VOLUME 62 NUMBER 1
ARTHUR N. STRAHLER
Columbia University
ABSTRACT
Statistical analysis plays an essential part in quantitative geomorphic investigations using numerical
data obtained by random sampling. The sample is represented graphically by a histogram and is described
in terms of the arithmetic mean, standard deviation, variance, and form of distribution curve, whether
normal, log-normal, or other. From the sample it is possible to estimate the standard deviation of the
population and the standard deviation of the sample means. A difference between two sample means may
be tested by deriving the statistic t, which will indicate the probability of obtaining the observed difference
by chance alone if no real difference in population exists. Differences in sample variances can be tested,
using the statistic F, which is the ratio of the larger to the smaller variance. Where three or more sample
means are to be compared, analysis of variance using the statistic F affords a means of testing the hypothesis
that no significant difference exists. The interrelationship between two sets of variables may be determined
by regression analysis, in which equations of both linear and nonlinear types are fitted to the data and
tested by a t-test under the hypothesis that no significant departure from a zero trend actually exists.
Where no functional relationship is to be established, the degree of correlation between two sets of data is
determined in terms of a correlation coefficient which can be tested by a t-test under the hypothesis that no
correlation actually exists. Examples of applications of each of the above methods in current geomorphic
research show that statistical analysis is a powerful tool in investigation of fundamental problems of land-
form development.
ing of the fundamental concepts and ways, upon the creative imagination and
what material they are used upon or as to the scientific conclusions drawn from the data read from
them. Although no one seems to have made the claim that petrologists in general were using the
microscope as an end in itself or that "you can prove anything with a microscope," the author has
heard similar opinions expressed more than once in reference to statistics in geology. Second, there
is a prevalent belief among geologists that statistical analysis can be applied to any and all types
of geological investigations and that it can be substituted for more conventional or older methods
(which, by inference, are now outmoded). Like any specialized tool, statistical analysis has its
appropriate uses but is not applicable under many circumstances. In time, it is to be hoped, the
geologist will accept mathematical statistics as he now accepts the Brunton, the alidade, the X-ray
camera, or the microscope: as an instrument of routine scientific operations but at the same time as
a means of acquiring unique, invaluable information to supplement direct visual observation and to
improve the basis of his reasoning on fundamental geologic problems.

Statistical methods cover a rather wide range of kinds of data analysis, each of which may be thought
of as a specialized tool in itself, appropriate to a particular problem that may arise in an
investigation. Topics treated below concern random sampling, frequency-distribution analysis of
sample data, testing of differences in sample characteristics, analysis of variance, and regression
and correlation. Only a few of the commoner, simpler procedures are discussed.

Geomorphology, like most other phases of geology, deals with forms that are complex and variable.
Hillside slopes, stream patterns, dune forms, cinder cones, or glacial troughs are never quite the
same from one example to the next. Perhaps this variability and complexity have been the chief
deterrents to the development of a quantitative morphology. On the other hand, all geomorphologists
recognize the general tendency of forms produced by the same group of processes within a given area
to show a general uniformity which provides the basis for statements concerning the facies, or
texture, of the region and for the demarcation of physiographic subdivisions. Such form assemblages
provide an excellent subject for a phase of statistical treatment known as frequency-distribution
analysis.

Suppose, for example, that the investigator is interested in steepness of slope as related to stage
of development of an area, climatic and vegetative factors, or physical properties of the soil and
bedrock. In a fluvially dissected region he is confronted with a maze of small valley units, each
with steep valley walls merging at the top into rounded divides and summits and at the base into
gentle slopes of valley floors. The problem is to quantify the slope attributes in such a way as to
give various representative numerical values to the region. These values can then be used to make
comparisons between this and other regions or to relate slope values to some other categories of
values in this same area.

The first step in this investigation is to define the attribute or attributes to be sampled. Any
particular class of attributes will be called a population. The
STATISTICAL ANALYSIS IN GEOMORPHIC RESEARCH 3
FIG. 1.-A, portion of Emporium, Pennsylvania, Quadrangle topographic map (U.S. Geological Survey,
1:24,000), showing Lucore Hollow. This square is about 4,000 feet across. B, random distribution of 100
points through which slopes were measured.
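
The random placement of points shown in figure 1, B, can be imitated with a pseudorandom number generator standing in for a printed table of random numbers. The Python sketch below is illustrative only: the 100-by-100 grid, the function name `random_grid_points`, and the seed value are assumptions made for the example, not details taken from the paper.

```python
import random


def random_grid_points(n_points=100, grid_size=100, seed=None):
    """Pick n_points grid squares at random and return their center points.

    Each point pairs two independent random integers (column, row), which
    mimics pairing two sets of two-digit numbers from a random-number table.
    The grid size of 100 is an assumption for illustration.
    """
    rng = random.Random(seed)
    points = []
    for _ in range(n_points):
        col = rng.randrange(grid_size)  # first random two-digit number, 0-99
        row = rng.randrange(grid_size)  # second random two-digit number, 0-99
        # The center of the chosen grid square is the reference point.
        points.append((col + 0.5, row + 0.5))
    return points


points = random_grid_points(seed=1954)
print(len(points), points[:3])
```

Because every square has the same chance of selection on every draw, the investigator's preconceptions cannot influence where the slope readings are taken.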
The principles of statistics are particularly designed to tell the investigator whether he has taken
a representative sample of sufficient size to yield reliable indexes, yet not much larger than
necessary. At this initial stage the geomorphologist will want to know how to avoid deceiving himself
and others by taking a sample that is prejudiced or biased in favor of a preconceived notion of the
outcome.

Figure 1, A, shows a small square of map from which it is decided to take a sample representative of
the slope-steepness conditions over the entire ground surface. Although one could cover this area
with a grid of intersecting lines and read the slope at each grid intersection,

table of random numbers is used. Such tables are found in the appendixes of most general statistics
textbooks (for example, Dixon and Massey, 1951, p. 290-294) or in volumes of statistical tables
(Fisher and Yates, 1938). Following instructions given with the tables, two sets of 100 pairs of
digits are drawn at random from the tables and are paired to provide a set of grid co-ordinates. The
center point of the grid square is then used as the reference point from which to measure the degree
of slope.

Figure 1, B, shows the actual distribution of the 100 random points. Their distribution is not, of
course, uniform, as in a grid; but we are assured that it is truly randomized and that, as the number of
sample points thus determined approached infinity, their distribution would approach uniform density.

Having now located the 100 sample points, the maximum slope angle on a 100-foot segment of line
orthogonal to the contour through each point is measured by estimating from the contours the total
drop in elevation along

the spread of the values. Tabulation in classes and graphic presentation in the form of the histogram
(fig. 2) are useful primarily for rough visual appraisal. Selection of the limits of the classes does
not appreciably influence the mathematical parameters and tests used later.

Table 1 tells us the number of slope readings which fell into each group or class. The mid-value of
the class is given in the table, and it is inferred that the dividing limits between successive
classes are located midway between the values given. This is shown on the histogram, in which each
bar represents a class; the

TABLE 1
FREQUENCY DISTRIBUTION OF SINES OF SLOPE ANGLES, LUCORE HOLLOW, PENNSYLVANIA

Class Mid-values         Observed     Percentage Frequency    Cumulative Percentage
(Sine of Slope Angle)    Frequency    of Occurrence (A)       Frequency (B)
0.20                      2            2                        2
0.24                      4            4                        6
0.28                     21           21                       27
0.32                     21           21                       48
0.36                     25           25                       73
0.40                     10           10                       83
0.44                      8            8                       91
0.48                      5            5                       96
0.52                      4            4                      100

s² = 0.00513; s = 0.0716.
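
The two values printed beneath table 1 (0.00513 and 0.0716) can be checked directly from the grouped frequencies. The following Python sketch is a minimal illustration, assuming that each variate is represented by its class mid-value and that the estimate of variance uses N - 1 in the denominator (an assumption that reproduces the printed figures); the variable names are hypothetical.

```python
import math

# Class mid-values (sine of slope angle) and observed frequencies from table 1.
mid_values = [0.20, 0.24, 0.28, 0.32, 0.36, 0.40, 0.44, 0.48, 0.52]
frequencies = [2, 4, 21, 21, 25, 10, 8, 5, 4]

n = sum(frequencies)  # total number of variates

# Arithmetic mean of the grouped data: sum of (frequency x mid-value) / N.
mean = sum(f * x for f, x in zip(frequencies, mid_values)) / n

# Sum of squared deviations from the mean, weighted by class frequency.
sum_sq_dev = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mid_values))

# Assumption: the estimated variance divides by N - 1 rather than N.
s_squared = sum_sq_dev / (n - 1)
s = math.sqrt(s_squared)

# Cumulative percentage frequency, column (B) of table 1.
cumulative = []
running = 0
for f in frequencies:
    running += f
    cumulative.append(100.0 * running / n)

print(n, round(mean, 4), round(s_squared, 5), round(s, 4))
print(cumulative)
```

Dividing the same sum of squared deviations by N instead of N - 1 gives a slightly smaller variance, which does not round to the printed value.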
compare histograms of samples of different numbers of variates, at the same time keeping the total
area of the bars of each graph the same.

The histograms shown in figure 2 are in two forms: the first, A, is simply the ordinary
frequency-distribution histogram; the second, B, is a cumulative frequency-distribution histogram.
The data used in the two forms of graph are shown in table 1 under columns A and B. The cumulative
form is popular in sedimentation, soil mechanics, and hydrology. Note that a smooth-line curve should
never be used instead of the block or step form shown here.

We are now in a position to determine each of the several indexes, or "statistics," which describe
various attributes of the sample. First of all, one will note the spread of values in the sample,
the range of the distribution. One will also see which class contains the largest number of variates,
this being termed the modal class (fig. 2). Much more important will be the statistic that measures
the center of mass of the distribution, the arithmetic mean. This value is easily obtained by summing
the variates and dividing by the total number of variates. The symbolic statement is as follows:

\[ \bar{x} = \frac{\sum x}{N} \qquad (1) \]

of mechanics, this means that if one were to draw the histogram on stiff cardboard and carefully cut
it out, the histogram would balance perfectly on a horizontal knife-edge coinciding with the line
representing the mean. The sum of the turning moments (product of weight times distance from fulcrum)
is 0, those on one side balancing those on the other. Statisticians have given the term first moment
of the distribution to the sum of the deviations of a distribution, and this is expressed in the
following symbolic way:

\[ \frac{\sum (x - \bar{x})}{N} \qquad (2) \]

where the terms are as defined in equation (1). Interestingly enough, if any other reference value is
selected instead of the arithmetic mean, the summed deviations will not total 0; hence we can define
the arithmetic mean as that value about which the summed deviations always total 0.

The next problem is to find a statistic that will indicate the extent to which the individual
variates fail to coincide with the mean, i.e., the dispersion qualities of the distribution. The
standard deviation of the sample is used for this purpose and consists of the square root of the sum
of the individual squared deviations divided by the number of variates. The symbolic statement is
Another statistic which is of prime importance in statistical work is variance, which is merely the
square of the standard deviation and has the simple statement

\[ \sigma^2 = \frac{\sum (x - \bar{x})^2}{N} \qquad (4) \]

where σ² is the variance and the other terms are as defined in equation (1). Standard deviation is
therefore simply the square root of variance. It is an inter-

slopes. Naturally, the means and standard deviations are not the same in these two samples, because
they are taken from widely different areas, different in geologic structure, relief, vegetative
cover, and climatic environment. Up to this point, statistical methods have merely provided clearly
defined, meaningful ways of describing quantitatively certain attributes of sample data, but they
have not been applied to the solution of a geomorphic problem.
where Y is the ordinate, or height of the curve at any given value of x; σ is standard deviation, as
in equation (3); e is the base of natural logs (2.71828 . . .); and x and x̄ are as in equation (1).
One way in which a close approach to the normal curve may be obtained is to compile a

velopment have resulted in failure to achieve perfect uniformity.

Other types of geomorphic data tend to have an unsymmetrical distribution, known in statistics as a
skewed distribution. In these cases the highest column of the histogram (the modal class) is
frequently located to the left-hand, or lower-value, side of the mean. A good example of a skewed
distribution is shown in figure 4, A, a sample of lengths of first-order streams obtained by Victor C.
Miller (1953). If, by plotting the logarithms of the lengths on the abscissa, the distribution
becomes symmetrical, agreeing closely with the normal curve of error, the distribution is described
as log-normal. This has been done in figure 4, B. In the case of the stream lengths in figure 4, B,
the log-normal distribution shows that extremely long streams are relatively common but that
extremely short streams are rare.

FIG. 4.-A, histogram showing arithmetic distribution of lengths of first-order streams developed
on Copper Ridge dolomite, Virginia (Miller, 1953). (See table 2 for data.) B, histogram showing
distribution of logarithms of stream lengths. Frequencies computed and grouped from logs of variates.

bution, or some other form? Irregularities are conspicuous in the frequency distributions used thus
far as examples. Are such irregularities inherent in the population distribution, or could we expect
such irregularities by chance in samples of the limited size used here? Perhaps this question could
be answered by drawing many more samples from the same populations, superimposing them, and
noting the replacement of the variations by a smoothed average curve. Unfortunately, the
geomorphologist may be unable to obtain further data in his field or mapwork, owing to limitations of
time and funds. The field data of the summer may be all he has to work with in preparing his report.

Two methods are used to ascertain whether a distribution is normal or log-normal. The first of these
is a quick method for general guidance. The grouped data are plotted on a special type of graph paper
known as probability paper (fig. 5). First the percentage of variates in the classes of the frequency
distribution are set down in cumulative form, as in table 2, column B. These data are next plotted on
the probability paper: cumulative percentage frequency on the ordinate, class mid-values on the
abscissa. By means of a cleverly devised arrangement of spacing on the ordinate of the probability
paper, any normal probability distribution yields a straight, sloping line of points on this paper.
As figure 5 shows, the cumulative slope data taken at Bernalillo, New Mexico, fall close to a
straight line (line C), whereas those of
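
The spacing of the ordinate on probability paper corresponds to the inverse of the cumulative normal distribution, so the straight-line check can be imitated numerically. The Python sketch below is an illustration under stated assumptions: it uses the cumulative percentages of table 1 (Lucore Hollow) as a stand-in, because the Bernalillo data and table 2 are not reproduced in this excerpt; it plots against class mid-values as the text describes; and it uses a correlation coefficient as a rough measure of straightness, which is a convenience of the example rather than the paper's procedure.

```python
import math
from statistics import NormalDist

# Class mid-values and cumulative percentages from table 1, column (B).
# The last class (100 per cent) is omitted: it has no finite position
# on a probability scale.
mid_values = [0.20, 0.24, 0.28, 0.32, 0.36, 0.40, 0.44, 0.48]
cumulative_pct = [2, 6, 27, 48, 73, 83, 91, 96]

# Probability-paper ordinate: the standard normal deviate whose cumulative
# probability equals the observed cumulative percentage.
std_normal = NormalDist()  # mean 0, standard deviation 1
ordinates = [std_normal.inv_cdf(p / 100.0) for p in cumulative_pct]


def pearson_r(xs, ys):
    """Correlation coefficient, used here only as a measure of straightness."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(mid_values, ordinates)
print([round(z, 2) for z in ordinates], round(r, 3))
```

A value of r close to 1 indicates that the plotted points lie nearly on a straight line, which is what a normal distribution would produce on probability paper.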
represents a distribution with the same

0.9500    14    28
1.0500    28    56
age probability from which the geomorphologist accepts or rejects the hypothesis of normal
distribution which he has set up. In using the prepared tables,

to slope frequency distributions by the author (Strahler, 1950, p. 683-685).

SIGNIFICANCE OF SAMPLE
sample would favor the high-value variates. Could this not cause the mean of one sample to be lower
than the mean of the other sample, even though their population means are actually in the reverse
order of magnitude? Fortunately, statistics provides an analysis of the theory of such chance
variations in successive samples and offers tests to reassure the investigator of the strength or
weakness of his conclusions.

ESTIMATES OF POPULATION CHARACTERISTICS

morphologist has taken a sample of 59 slope angles (see fig. 3). If he takes more samples of 59
readings each, the means of these samples will not be the same. If a vast number of these samples
were accumulated, their means would show a normal distribution curve (even if the population is not
exactly normally distributed). Treating the collection of sample means as a population in itself,
N is small, the deviation in means will be great.

of development of the region has advanced only slightly.
The value of t is thus a measure of the ratio of difference in means to the product of an average
standard deviation and an inverse measure of sample size. The denominator of the t-statistic is
actually an estimate of the standard deviation of the population of differences in these means. The
pooled estimate of variance is an average, or combined value of the estimated standard deviations, s,
of each sample; its computation is explained in statistics texts (Croxton and Cowden, 1946, p.
330-331; Dixon and Massey, 1951, p. 102-103). When t is computed, it is possible to refer to a "table
of values of t" (Croxton and Cowden, 1946, p. 875) to read the percentage probability of our
rejecting the null hypothesis when it is actually correct. Having already set a critical value of
probability (generally termed α) at a conservative level, say 0.01, we retain or abandon our
hypothesis according to whether the observed probability is greater or less than 0.01. In the case of
Schumm's slope samples, the value of t was computed as follows:

\[ t = \frac{49.1^\circ - 48.8^\circ}{4.64 \sqrt{(1/154) + (1/149)}} = 0.561 . \]

The t-table is consulted under the last row, "infinite number of degrees of freedom," because there
are 301 degrees of freedom in this case (N1 + N2 - 2), and anything over 120 is considered as
infinite. The value of t shows a percentage of close to 50, which is vastly greater than the critical
level of 1 per cent. In ordinary language this means that if pairs of samples of this size were taken
repeatedly from the same population, differences of means this large (0.3°) or larger would be
expected 50 per cent of the time. In other words, the observed difference could easily be the result
of chance alone and is not significant. Consequently, the null hypothesis is retained. The
geomorphologist can only say "we have no reason to doubt that parallel slope retreat is occurring
here."

From this test it should be obvious that statistical analysis, far from proving anything one wants,
actually introduces an ultra-conservative attitude of restraint in the drawing of conclusions. The
odds of being right or wrong are [stated] forthrightly for all to see. The scientist cannot uphold an
unwarranted conclusion on the strength of his opinion or prestige alone. Of course, he can introduce
a bias into his sampling, whether consciously or subconsciously, to insure that the statistical test
will prove favorable to his preconceived theory. This last doubt can be largely eliminated by
randomized sampling or by having samples taken by two or more disinterested persons and compared by
the same type of test as that described above.

For an example of a comparison of sample means which gave a significant difference, resulting in
discard of the [null] hypothesis, a case is cited from the author's research on slope characteristics
in the Verdugo Hills, southern California (Strahler, 1950, p. 810-813). A different statistical
technique was used in the author's previously published paper and is replaced herewith by the t-test
described below.

Two slope samples were taken (table 3; fig. 8) in a maturely dissected mountain mass. One group of
slope readings (sample A) was taken from valley-side slopes at whose base accumulated talus and slope
wash indicated that considerable time had elapsed since stream corrasion had been active against the
slope base. The other group of slope readings (sample B) was taken from valley-side slopes at whose
base stream corrasion
had recently been active. The question was: Does the mean angle of the first sample differ
significantly from the mean of the second? If so, the data support the thesis that a slope, if left
to weather and waste without basal cutting, tends to decline in angle rather than to retreat in
parallel planes.

Before undertaking the t-test, it was determined by other tests that the variances of the samples do
not differ significantly and that the distributions do not depart significantly from a normal form.
This was done because the t-test is based upon the assumption that populations are normally
distributed and that the variances are equal. The sample means differ by about 6.6°; the value of t is
over 10. In combination with 202 degrees of freedom (N1 + N2 - 2) the probability of obtaining so
great a difference or one greater by chance alone when the true difference is actually zero is very
much less than 0.001. The null hypothesis, that no difference exists, is therefore rejected, and the
difference is regarded as significant. The thesis of declining slope retreat is thus favored by the
field data.
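
The arithmetic of the t-test can be illustrated with the figures quoted for Schumm's slope samples. The Python sketch below is an assumption-laden illustration, not the author's worksheet: it takes the sample means as 49.1° and 48.8° (a difference of 0.3°), the pooled standard deviation as 4.64, and the sample sizes as 154 and 149, and it approximates the t-table's "infinite degrees of freedom" row with the standard normal distribution. The function name `pooled_t` is hypothetical.

```python
import math
from statistics import NormalDist


def pooled_t(mean1, mean2, pooled_s, n1, n2):
    """t = difference in means / (pooled s * sqrt(1/N1 + 1/N2))."""
    return (mean1 - mean2) / (pooled_s * math.sqrt(1.0 / n1 + 1.0 / n2))


# Figures quoted in the text for Schumm's two slope samples.
t = pooled_t(49.1, 48.8, 4.64, 154, 149)
degrees_of_freedom = 154 + 149 - 2

# With more than 120 degrees of freedom the t-table is read on its
# "infinite" row, which is the normal distribution; the probability is
# two-sided (a difference this large in either direction).
probability = 2.0 * (1.0 - NormalDist().cdf(abs(t)))

alpha = 0.01  # critical probability chosen in advance
retain_null = probability > alpha

print(round(t, 3), degrees_of_freedom, round(probability, 2), retain_null)
```

The computed t differs from the printed 0.561 only in the third decimal place, presumably because the means are rounded here to one decimal; the probability is on the order of one-half, far above the 0.01 critical level, so the null hypothesis is retained.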
TABLE 3
SAMPLE*
33.75 35.75 37.75 39.75 41.75 43.75 45.75 47.75 49.75 51.75 53.75 x̄ s N
A 4 3 10 11 4 0 1 38.23° 2.70° 33
FIG. 8.-Comparison of valley-wall slope samples taken in Verdugo Hills (Strahler, 1950). A, slopes
protected at base by talus and slope wash. B, slopes actively corraded at base. (See table 3 for
data.) Ordinate: frequency, f. Abscissa: slope angle, degrees.

SIGNIFICANCE OF DIFFERENCE IN SAMPLE VARIANCES

Arithmetic means are not the only kinds of statistics which may differ between samples. It is
possible that two samples may have almost identical means (not significantly different), yet may have
different degrees of dispersion, as shown by marked differences in the estimated standard deviation, s.

An example in geomorphic research was encountered by Victor C. Miller (1953) in studying the
influence of lithology on slope steepness. Slope steepness in two areas, one of shale (Athens
formation), the other of interbedded sandstones and shales (Pennington formation), was measured by
taking a sample
As with sample means, sample variances may be expected to differ from

TABLE 4
SAMPLE*
17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 x̄ s N

FIG. 9.-Comparison of slope samples in homogeneous clastic rocks in western Virginia (Miller,
1953). A, Athens formation. B, Pennington formation. (See table 4 for data.)

are equal:

\[ \sigma_1^2 = \sigma_2^2 , \]

where σ₁² is the variance of the first sample population and σ₂² is the variance of the second sample
population. A statistic termed "F" is then computed, using

This value of F is somewhat larger than the critical value at the probability of 0.01 (1 per cent);
hence the difference may be regarded as significant and the null hypothesis be discarded. The ob-
proximately the same except for the steep-dip mean (col. 4), which drops to less than half the other
values. We shall be interested in knowing whether this diversity in means is significant or could be
expected to occur readily by chance. On the scarp slope (row B) the means are quite similar, and we
shall wish to test the hypothesis that no real difference exists on this side.

The technique used in this type of testing is termed analysis of variance and is of the simplest form
in which only one variable of classification is introduced, i.e., variation in dip of strata. Details
of the testing procedure are readily available in elementary textbooks of statistics (Dixon and
Massey, 1951, p. 119-127). First, a null hypothesis is set up to the effect that the means of all
samples are the same and that they represent one and the same population. Expressed symbolically,

\[ \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu , \]

where μ refers to a population mean and the subscripts denote the four samples. Short-cutting both
the theory and computational instructions, it must suffice here to say that a statistic known as F is
computed, which is a ratio between the variance of the sample means and the average variance of the
individual variates in the samples:

\[ F = \frac{\text{Variance of the sample means}}{\text{Average variance within the samples}} \qquad (11) \]

Prepared tables (Dixon and Massey, 1951, p. 310-313) may next be consulted to obtain the probability
percentages as-

sis of variance of the stream-length-versus-dip series.

In the case of the dip slope, row A (table 5),

F = 11.24.

In the numerator there are 3 degrees of freedom (one less than the number of samples); in the
denominator there are 243 degrees of freedom (four less than the total number of variates in all four
samples). For this combination of values the table yields the information that the probability is
very much less than 0.005. Having previously decided that a critical probability of 0.05 will be used
to accept or reject the null hypothesis, we at once reject the hypothesis. What we are saying is that
there is much less than one chance in 200 of obtaining, by chance alone, a difference in sample means
as great as those observed if the samples actually come from the same population. We therefore regard
the difference in means as significant. It is an easy step for the geomorphologist to attribute the
difference to control by dip of the strata, although statistics offers no aid in assigning causes to
significant differences. Obviously, the steep-dip segment gives shorter streams because there is a
narrower outcrop of resistant strata between ridge crest and base than where the dip is low.

A similar analysis of variance of the stream lengths on the scarp slope (row B of table 5) yields a
value of F which indicates a percentage probability of well over 0.05. The observed differences in
means are readily expectable through
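
The F ratio of equation (11) can be sketched for the one-way case as follows. This Python example is illustrative only: the stream-length data of table 5 are not reproduced in this excerpt, so the four small groups below are hypothetical numbers chosen to make the arithmetic easy to follow, and the function name `one_way_f` is an assumption. The sketch uses the usual mean-square form, in which the variance of the sample means is weighted by sample size.

```python
def one_way_f(groups):
    """Return (F, numerator df, denominator df) for a one-way analysis of variance."""
    k = len(groups)                          # number of samples
    n_total = sum(len(g) for g in groups)    # total number of variates
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the sample means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the individual variates about their own sample mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1           # one less than the number of samples
    df_within = n_total - k      # total variates less the number of samples
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical data: four samples, the last with a much lower mean,
# loosely mimicking the low steep-dip mean described in the text.
samples = [[4, 5, 6], [5, 6, 7], [4, 6, 8], [1, 2, 3]]
f_ratio, df_num, df_den = one_way_f(samples)
print(round(f_ratio, 3), df_num, df_den)
```

With four samples the numerator always has 3 degrees of freedom, matching the text; the resulting F would then be compared with a prepared table at the chosen critical probability.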
In the operations thus far discussed no account has been taken of the interrelation of two variables,
of which one might be the cause, the other the effect. Establishment of the existence of a
significant difference in two or more sample means tells nothing of the behavior of associated or
causative factors, even though the geomorphologist is eager to guess at causes of significant
differences.

Where two attributes are measured simultaneously at a given place or time and similar measurements
are repeated many times, it is possible to establish or deny an association between attributes by
statistical methods and to test the reliability of the observed degree of association.

Two types of association are distinguished: (a) regression and (b) correlation. The first is most
commonly applied to an association of two variable attributes wherein one is clearly a cause, the
other an effect. The two variables are represented mathematically by the statement

\[ y = f(x) , \qquad (12) \]

where y is the dependent variable, or effect, x is the independent variable, or cause, and f( ) is a
symbolic statement of "a function of." In correlation the two attributes vary together, as in
regression; but it is the degree of association which is established rather than the function that
relates them. One attribute may be the cause of the other, or both attributes may be the results of a
common cause, which may or may not be known.

For example, if mean size of sand grains on dune faces shows a close correspondence with slope angle
of the dune face, we consider the size of particle to be the cause, the dune slope to be an effect.
Not only does an understanding of dy-

slope could induce a change in particle dimensions. Here the analysis deals with regression, and an
attempt is made to define the function relating y and x. Suppose, however, that the widths and
lengths of many barchan dunes are measured and it appears that an increase in width is generally
accompanied by an increase in length. Because neither dimension can act dynamically (mechanically) to
regulate the other, cause and effect cannot be assigned. Both dimensions may be controlled by a third
factor, which is the cause, perhaps length of time of dune development or strength of wind. One does
not need to define a function relating the two attributes but merely seeks to establish the existence
of the correlation beyond the possibility of a pure chance relationship.

Examples of both regression and correlation studies will illustrate the application of the principles
to geomorphic research. An example of a regression analysis in which a strong trend was evident is
based upon experimental geomorphic data of Van Burkalow (1945, p. 679), who examined the influence of
particle size upon angle of repose with a view toward determining the factors which control the slope
angle in natural fragmental materials, such as dune sands, talus fragments, or cinders. By means of
repeated laboratory experiments under controlled conditions, Van Burkalow obtained a series of 12
pairs of values relating diameter of lead shot (inches) to mean angle of repose slope (degrees). The
last pair of the series seemed anomalous and was eliminated in the regression analysis because
examination had shown a large number of flattened grain surfaces which might be expected to produce
an abnormally high angle. Thus there remained 11
Lead shot diameter         Mean angle of repose
(hundredths of an inch)    (degrees)*
          5                    19.55
          6                    18.20
          7                    17.50
          8                    16.81
          9                    15.76
         10                    14.81
         11                    14.13
         12                    13.71
         13                    13.26
         14                    12.84
         15                    12.75

* Data from A. Van Burkalow (1945, p. 679).

[Figure 10, A: linear regression. Axis label: lead shot diameter (hundredths of an inch).]
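The eleven pairs tabulated above can be used to check the figures quoted for this example (b = -0.69, standard error of estimate 0.48, t of about -15.1). The following is a minimal sketch, assuming an ordinary least-squares fit of Y = a + bX and a standard error of estimate computed with N - 2 in the denominator; that divisor is an assumption, chosen because it reproduces the quoted value of 0.48. The function name `linear_regression_t` and the dictionary keys are illustrative and do not come from the paper.

```python
import math

# Lead shot diameter (hundredths of an inch) and mean angle of repose (degrees),
# as tabulated from Van Burkalow (1945, p. 679).
DIAMETER = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
ANGLE = [19.55, 18.20, 17.50, 16.81, 15.76, 14.81,
         14.13, 13.71, 13.26, 12.84, 12.75]


def linear_regression_t(x, y):
    """Fit Y = a + bX by least squares and test the hypothesis b = 0."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    # Sum of squares of x about its mean, and sum of cross products.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    b = sxy / sxx                 # regression coefficient (slope)
    a = y_mean - b * x_mean       # intercept

    # Sum of squared deviations (y - Y) of observed points from the fitted line.
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    scatter = math.sqrt(ss_resid / n)        # scatter, as in equation 14
    s_yx = math.sqrt(ss_resid / (n - 2))     # standard error of estimate (assumed N - 2)
    t = b * math.sqrt(sxx) / s_yx            # t statistic, as in equation 16

    return {"a": a, "b": b, "scatter": scatter, "s_yx": s_yx,
            "root_sxx": math.sqrt(sxx), "t": t, "df": n - 2}


result = linear_regression_t(DIAMETER, ANGLE)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The degrees of freedom returned (`df`) are two less than the number of pairs, which is the value used when consulting a table of the t-distribution.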
illustrated by short lines connecting the individual points with the regression line. The length of each line is equal to (y - Y), where y is the ordinate of the observed point and Y is the ordinate on the fitted regression, both y-values being associated with the same x-value. In this way the deviations of observed points from the fitted line can be measured. The measure of dispersion is analogous to the standard deviation in frequency distribution and is defined as

$$\sigma_{y.x} = \sqrt{\frac{\sum (y - Y)^2}{N}}, \tag{14}$$

where $\sigma_{y.x}$ is the scatter, Y and y are as explained above, and N is the number of

In considering a way of testing the significance of the regression, three principal factors enter in: (1) the degree of scatter, (2) the slope of the regression line, and (3) the number of points used to determine the line. We realize intuitively that if scatter is small, the observed trend is not likely to be due to chance, whereas a large scatter would suggest that the trend might be due to chance alone. It can also be appreciated that if the regression line has a very low slope, so as to approach parallelism with the X-axis, the trend is a weak one and might be due to chance. From this it can be reasoned that the significance of a regression might be measured by a statistic which varies directly as the slope of the regression line and inversely as the amount of scatter. We set up the null hypothesis that the regression coefficient is equal to zero. To test this, the statistic used is t, and is derived here as follows:

$$t = \frac{b\,\sqrt{\sum (x - \bar{x})^2}}{s_{y.x}} \tag{16}$$

The term b in the numerator is the regression coefficient, or slope of the regression line; the term $s_{y.x}$ in the denominator is the standard error of estimate. Both have been explained above. In addition, there is in the numerator the term $\sqrt{\sum (x - \bar{x})^2}$, in which $\bar{x}$ is the mean value of all the individual x-values in the series. This is therefore a measure of the dispersion of the observed x-values about a mean x-value.

The value of t in the illustrative case was computed as follows:

$$t = \frac{(-0.69)(10.49)}{0.48} = -15.1 \text{ approx.}$$

We refer to the table of t-distributions under 9 degrees of freedom (two less than the number of pairs in the regression), and find that the probability, P, is very much less than 0.001. In other words, there is only a remote possibility of obtaining the observed regression by chance alone if similar numbers of pairs are repeatedly drawn. The null hypothesis is rejected, and the slope of the regression is considered significant. Note that the door is left open to the remote possibility of obtaining this assemblage of paired observations by chance alone when no trend actually exists, and we shall never be absolutely positive that such is not the case here. The scientist must gamble in the final analysis, but he wants to be sure he has the odds overwhelmingly in his favor.

By contrast, the second illustration of a linear-regression study in geomorphic research takes us to an opposite extreme of probability. In studying the amount of summer erosion on badland slopes, Stanley Schumm (1953) drove a large number of long dowel rods into the clay slopes along profile lines extending from divide to slope base. The rods were driven flush with the slope and were spaced 1 foot apart along the profile lines. In all, there were 16 profile lines and a total of 113 rods, limiting the number to those which were embedded along essentially straight parts of the slope. It is not possible here to explain the procedures fully, but 6 weeks later the depth of material removed from the slope by summer rains was measured by the extent to which the ends of the dowel rods projected above the surface. Most of the depths were between 0.4 and 1.6 inches.

A matter of particular geomorphic interest was to determine whether or not the slopes had (1) reclined in angle, (2) had steepened in angle, or (3) had maintained a parallel attitude. This could be determined if depth of erosion were correlated with distance downslope from the divide, since, if the slope had reclined to a lower angle, the trend of removal must be one decreasing from top to base. In order to compare distances on the 16 different slopes, Schumm used a dimensionless value: percentage of total distance from base to top. The next step was to plot the 113 points on a regression diagram (fig. 11) and to fit a regression line to the points in the manner already described. The equation is as follows:

$$Y = 0.92 + 0.00021X.$$

Note that the trend or slope is very slight, as indicated by the exceedingly low value of the regression coefficient and by the fact that the regression line seems
almost parallel with the X-axis. The standard error of estimate, $s_{y.x}$, is quite high (0.35 inch). Intuitively it seems most unlikely that under these conditions the regression is a significant one. A t-test of significance, following the method described above, yielded a probability of 0.80, or 80 per cent. Thus in 80 times out of 100 we should expect so great a trend or a greater one through chance alone if similar sets of measurements were repeatedly made in this area. The conclusion, so far as the geomorphic interpretation is concerned, is that we

NONLINEAR (LOGARITHMIC) REGRESSION

The regression previously discussed assumed that a linear equation (Y = a + bX) gave the best possible description of the relationship between dependent and independent variables. This is not always so; other functions may be better. One function that is commonly encountered in geomorphic research is the logarithmic function,

$$\log Y = \log a + b \log X, \tag{17}$$

in which the terms are the same as in linear regression except that the logarithm of the term is used where indicated. Another way of stating this same function is

$$Y = aX^{b}, \tag{18}$$

which is called a power function, inasmuch as the independent variable is raised to a power b. Plotted on log-log graph paper, a logarithmic or power function is a straight line.

Figure 10, B, is a log-log plot of Van Burkalow's lead-shot-repose-angle data. Notice that the points fall close to the fitted logarithmic function and show no tendency to produce a concave-up trend,

log() = ax , (20a)

or quadratic equations of the form

$$Y = a + bX + cX^{2} \tag{21}$$

or

$$Y = a + bX + cX^{2} + dX^{3} \tag{22}$$
may be fitted to the data. Of these last-mentioned forms, the exponential (semi-logarithmic) equation is remarkably well adapted to many geomorphic forms and is used extensively in stream profiles and other fluvial forms (Krumbein, 1937).

CORRELATION

In correlation studies we are concerned simply with demonstrating the existence (or lack of existence) of a relationship between two sets of numerical values taken in pairs. An example in geomorphic research is afforded by the work of Kenneth G. Smith (1953) in the Big Badlands of South Dakota. Smith systematically measured the lengths and the slope angles of the steep badland slopes bordered by pediments. A plot of slope length versus slope angle is shown in figure 12. There were 134 pairs of variates altogether. Examination of the scatter diagram, figure 12, might indicate that a significant correlation is doubtful because the degree of scatter is great. The trend appears to be an increasing one but is very steep, almost paralleling the Y-axis. The degree to which correlation exists may be represented by the statistic r, termed the correlation coefficient (Goulden, 1939, p. 65-77; Croxton and Cowden, 1946, p. 653-654; Dixon and Massey, 1951, p. 162-165), which is defined as follows:

$$r = \sqrt{b_{yx}\, b_{xy}} \tag{23}$$

The term b means "regression coefficient," or slope of a regression line, as used previously in the regression analysis. The modified symbol $b_{yx}$ refers to the regression coefficient that results when x is assumed to be the independent variable; $b_{xy}$ is the regression coefficient when y is assumed to be the independent variable.⁴ That is to say, one can fit two regression lines to the correlation data and thus determine two possible regression coefficients. In the perfect case, with all points lying in a straight line, the product of the coefficients would be unity, and the scatter would be zero.

FIG. 12. Correlation of slope length with slope angle on steep badland slopes bordering miniature pediments in the lower Brule formation, Big Badlands, South Dakota (Smith, 1953). Axis label: slope angle, degrees.

In the slope-length-slope-angle correlation shown in figure 12, the correlation coefficient, r, is 0.485. Because an ideal, or perfect, correlation is represented by a value of r = 1.0, it may seem likely that we are dealing with a questionable correlation. A test of significance of the correlation makes use of the statistic t, defined as follows:

$$t = \frac{r\,\sqrt{N - m}}{\sqrt{1 - r^{2}}}, \tag{24}$$

⁴ In the event that the values of $b_{yx}$ and $b_{xy}$ are negative, indicating that y decreases as x increases, a negative sign should be applied to r.
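Equations 23 and 24 can be sketched in a few lines of code. The example below is only an illustration under stated assumptions: it fits both regression lines by ordinary least squares, forms r as the square root of the product of the two coefficients, and applies the sign rule of footnote 4. The constant m is taken as 2 here, which is an assumption, since the text above does not define it. The function names are hypothetical.

```python
import math


def correlation_from_regressions(x, y):
    """Correlation coefficient r from the two regression coefficients (equation 23)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n

    sxx = sum((xi - x_mean) ** 2 for xi in x)
    syy = sum((yi - y_mean) ** 2 for yi in y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    b_yx = sxy / sxx   # slope when x is taken as the independent variable
    b_xy = sxy / syy   # slope when y is taken as the independent variable

    # Both coefficients share the sign of sxy, so their product is never negative.
    r = math.sqrt(b_yx * b_xy)

    # Footnote 4: apply a negative sign when the coefficients are negative.
    return -r if b_yx < 0 else r


def t_for_correlation(r, n, m=2):
    """t statistic for a correlation coefficient (equation 24); m = 2 is assumed."""
    return r * math.sqrt(n - m) / math.sqrt(1.0 - r * r)


# Smith's slope-length and slope-angle figures: r = 0.485 from 134 pairs.
print(t_for_correlation(0.485, 134))
```

Note that `t_for_correlation` is undefined for a perfect correlation (r = 1 or r = -1), because the denominator of equation 24 becomes zero.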
REFERENCES CITED
CROXTON, F. E., and COWDEN, D. J., 1946, Applied general statistics: New York, Prentice-Hall, Inc.
DIXON, W. J., and MASSEY, F. J., JR., 1951, Introduction to statistical analysis: New York, McGraw-Hill Book Co., Inc.
FISHER, R. A., and YATES, F., 1938, Statistical tables for biological, agricultural, and medical research: Edinburgh, Oliver and Boyd.
GOULDEN, C. H., 1939, Methods of statistical analysis: New York, John Wiley and Sons, Inc.
KRUMBEIN, W. C., 1937, Sediments and exponential curves: Jour. Geology, v. 45, p. 577-601.
MILLER, VICTOR C., 1953, A quantitative geomorphic study of drainage basin characteristics in the Clinch Mountain area of Virginia and Tennessee: Unpublished doctoral dissertation, Columbia University.
SCHUMM, STANLEY A., 1953, Erosion measured on badland slopes: Unpublished manuscript of paper read at Am. Geophys. Union meetings, Washington, D.C., May, 1953. (Abstract printed in 34th Annual Meeting program, p. 348.)
SMITH, KENNETH G., 1953, Erosional processes and landforms in Badlands National Monument, South Dakota: Unpublished doctoral dissertation, Columbia University.
STRAHLER, A. N., 1950, Equilibrium theory of erosional slopes approached by frequency distribution analysis: Am. Jour. Sci., v. 248, p. 673-696, 800-814.
VAN BURKALOW, A., 1945, Angle of repose and angle of sliding friction: an experimental study: Bull. Geol. Soc. America, v. 56, p. 669-708.