
Statistical Analysis in Geomorphic Research

Author(s): Arthur N. Strahler


Source: The Journal of Geology, Vol. 62, No. 1 (Jan., 1954), pp. 1-25
Published by: The University of Chicago Press
Stable URL: https://www.jstor.org/stable/30080861
Accessed: 14-09-2018 00:34 UTC


The University of Chicago Press is collaborating with JSTOR to digitize, preserve and
extend access to The Journal of Geology

This content downloaded from 168.176.5.118 on Fri, 14 Sep 2018 00:34:00 UTC
All use subject to https://about.jstor.org/terms
VOLUME 62 NUMBER 1

THE JOURNAL OF GEOLOGY


January 1954

STATISTICAL ANALYSIS IN GEOMORPHIC RESEARCH¹

ARTHUR N. STRAHLER
Columbia University

ABSTRACT

Statistical analysis plays an essential part in quantitative geomorphic investigations using numerical
data obtained by random sampling. The sample is represented graphically by a histogram and is described
in terms of the arithmetic mean, standard deviation, variance, and form of distribution curve, whether
normal, log-normal, or other. From the sample it is possible to estimate the standard deviation of the
population and the standard deviation of the sample means. A difference between two sample means may
be tested by deriving the statistic t, which will indicate the probability of obtaining the observed difference
by chance alone if no real difference in population exists. Differences in sample variances can be tested,
using the statistic F, which is the ratio of the larger to the smaller variance. Where three or more sample
means are to be compared, analysis of variance using the statistic F affords a means of testing the hypothesis
that no significant difference exists. The interrelationship between two sets of variables may be determined
by regression analysis, in which equations of both linear and nonlinear types are fitted to the data and
tested by a t-test under the hypothesis that no significant departure from a zero trend actually exists.
Where no functional relationship is to be established, the degree of correlation between two sets of data is
determined in terms of a correlation coefficient which can be tested by a t-test under the hypothesis that no
correlation actually exists. Examples of applications of each of the above methods in current geomorphic
research show that statistical analysis is a powerful tool in investigation of fundamental problems of
landform development.

¹ Manuscript received June 28, 1953.

INTRODUCTION

This paper is directed to professional geomorphologists and graduate students who are fully familiar with conventional geological research methods using field study and map interpretation but who have had little or no contact with quantitative research methods involving sampling and statistical testing. If the reader gains from this paper some understanding of the fundamental concepts and aims of statistical analysis as they apply directly to his geomorphic problems, he may be impelled to develop further knowledge of the subject through widely available texts, courses of instruction, or expert consultants.

Despite its wide permeation into virtually every field of modern scientific research, the position of statistics is frequently completely misunderstood by geologists. In the first place, statistical analysis is not an end in itself; it is a versatile and powerful tool for use in an intermediate stage of certain quantitative investigations. The selection and statement of the problem depend, as always, upon the creative imagination and far-sightedness of the investigator; the interpretation of the analyzed data depends, as always, upon the judgment and intellectual honesty of the investigator. Statistical methods may be put in the same category as the Brunton compass, the petrographic microscope, the spectroscope, or the X-ray camera. These mechanisms exercise no judgment either as to what material they are used upon or as to the scientific conclusions drawn from the data read from them. Although no one seems to have made the claim that petrologists in general were using the microscope as an end in itself or that "you can prove anything with a microscope," the author has heard similar opinions expressed more than once in reference to statistics in geology. Second, there is a prevalent belief among geologists that statistical analysis can be applied to any and all types of geological investigations and that it can be substituted for more conventional or older methods (which, by inference, are now outmoded). Like any specialized tool, statistical analysis has its appropriate uses but is not applicable under many circumstances. In time, it is to be hoped, the geologist will accept mathematical statistics as he now accepts the Brunton, the alidade, the X-ray camera, or the microscope, as an instrument of routine scientific operations but at the same time as a means of acquiring unique, invaluable information to supplement direct visual observation and to improve the basis of his reasoning on fundamental geologic problems.

Statistical methods cover a rather wide range of kinds of data analysis, each of which may be thought of as a specialized tool in itself, appropriate to a particular problem that may arise in an investigation. Topics treated below concern random sampling, frequency-distribution analysis of sample data, testing of differences in sample characteristics, analysis of variance, and regression and correlation. Only a few of the commoner, simpler procedures are discussed.

FREQUENCY-DISTRIBUTION ANALYSIS

RANDOM SAMPLING

Geomorphology, like most other phases of geology, deals with forms that are complex and variable. Hillside slopes, stream patterns, dune forms, cinder cones, or glacial troughs are never quite the same from one example to the next. Perhaps this variability and complexity have been the chief deterrents to the development of a quantitative morphology. On the other hand, all geomorphologists recognize the general tendency of forms produced by the same group of processes within a given area to show a general uniformity which provides the basis for statements concerning the facies, or texture, of the region and for the demarcation of physiographic subdivisions. Such form assemblages provide an excellent subject for a phase of statistical treatment known as frequency-distribution analysis.

Suppose, for example, that the investigator is interested in steepness of slope as related to stage of development of an area, climatic and vegetative factors, or physical properties of the soil and bedrock. In a fluvially dissected region he is confronted with a maze of small valley units, each with steep valley walls merging at the top into rounded divides and summits and at the base into gentle slopes of valley floors. The problem is to quantify the slope attributes in such a way as to give various representative numerical values to the region. These values can then be used to make comparisons between this and other regions or to relate slope values to some other categories of values in this same area.

The first step in this investigation is to define the attribute or attributes to be sampled. Any particular class of attributes will be called a population. The
population decided upon in this case is the slope of the ground surface over the entire area within a given boundary. Because the slope is to be measured only along selected short lengths of line over the surface, there is an infinite number of measurements which can be made, and we can regard the population as infinitely large. Because only a small number of measurements can be taken within reasonable limits of time, the next step is to devise a technique of sampling.

FIG. 1.-A, portion of Emporium, Pennsylvania, Quadrangle topographic map (U.S. Geological Survey, 1:24,000), showing Lucore Hollow. This square is about 4,000 feet across. B, random distribution of 100 points through which slopes were measured.

The principles of statistics are particularly designed to tell the investigator whether he has taken a representative sample of sufficient size to yield reliable indexes, yet not much larger than necessary. At this initial stage the geomorphologist will want to know how to avoid deceiving himself and others by taking a sample that is prejudiced or biased in favor of a preconceived notion of the outcome.

Figure 1, A, shows a small square of map from which it is decided to take a sample representative of the slope-steepness conditions over the entire ground surface. Although one could cover this area with a grid of intersecting lines and read the slope at each grid intersection, we [...] If a random sample is to be taken, the sampling method must be such that there is an equally good possibility that the slope measurement will fall at any conceivable point over this surface.

The map square is scaled into 100 units on a side, and these are numbered from 00 to 99, inclusive, on each side. There are thus 10,000 possible small squares from which we shall draw a sample of 100 at random. To do this, a table of random numbers is used. Such tables are found in the appendixes of most general statistics textbooks (for example, Dixon and Massey, 1951, p. 290-294) or in volumes of statistical tables (Fisher and Yates, 1938). Following instructions given with the tables, two sets of 100 pairs of digits are drawn at random from the tables and are paired to provide a set of grid co-ordinates. The center point of the grid square is then used as the reference point from which to measure the degree of slope.

Figure 1, B, shows the actual distribution of the 100 random points. Their distribution is not, of course, uniform, as in a grid; but we are assured that it is truly randomized and that, as the number of
sample points thus determined approached infinity, their distribution would approach uniform density.

Having now located the 100 sample points, the maximum slope angle on a 100-foot segment of line orthogonal to the contour through each point is measured by estimating from the contours the total drop in elevation along the length of the line segment. Dividing this drop by 100 (the horizontal distance in feet) gives the tangent of the slope angle. It has been decided, however, to study the distribution of the sine of slope angle rather than the tangent; so the tangent figures are transformed into equivalent sines from a table of trigonometric functions. The result is a list of 100 sine values in the order in which they were read. This list constitutes the raw data.

SAMPLE STATISTICS

For convenience in handling, the raw data are grouped into classes for study, as shown in table 1. The size of the classes depends upon the size of the sample and the spread of the values. Tabulation in classes and graphic presentation in the form of the histogram (fig. 2) are useful primarily for rough visual appraisal. Selection of the limits of the classes does not appreciably influence the mathematical parameters and tests used later.

TABLE 1
FREQUENCY DISTRIBUTION OF SINES OF SLOPE ANGLES, LUCORE HOLLOW, PENNSYLVANIA

Class mid-value          Observed     Percentage frequency    Cumulative percentage
(sine of slope angle)    frequency    of occurrence (A)       frequency (B)
0.20                       2             2                       2
0.24                       4             4                       6
0.28                      21            21                      27
0.32                      21            21                      48
0.36                      25            25                      73
0.40                      10            10                      83
0.44                       8             8                      91
0.48                       5             5                      96
0.52                       4             4                     100
Total                    N = 100       100

$\bar{x} = 0.35$; $\sigma^2 = 0.00508$; $\sigma = 0.0713$; $s^2 = 0.00513$; $s = 0.0716$.

FIG. 2.-A, frequency distribution histogram of slope sines read from random points on figure 1. B, cumulative form of histogram; same data as in A. (See table 1 for data.)
[Panels: (A) Frequency Distribution Histogram; (B) Cumulative Frequency Histogram. Abscissa of both panels: sine of slope angle, 0 to 1.0.]

Table 1 tells us the number of slope readings which fell into each group or class. The mid-value of the class is given in the table, and it is inferred that the dividing limits between successive classes are located midway between the values given. This is shown on the histogram, in which each bar represents a class; the height of the bar represents the number of individual readings (which we shall henceforth call variates) falling within that class. But, instead of giving the actual numbers of variates, we have converted them into percentage of the total number of variates. This will enable us to
compare histograms of samples of different numbers of variates, at the same time keeping the total area of the bars of each graph the same.

The histograms shown in figure 2 are in two forms: the first, A, is simply the ordinary frequency-distribution histogram; the second, B, is a cumulative frequency-distribution histogram. The data used in the two forms of graph are shown in table 1 under columns A and B. The cumulative form is popular in sedimentation, soil mechanics, and hydrology. Note that a smooth-line curve should never be used instead of the block or step form shown here.

We are now in a position to determine each of the several indexes, or "statistics," which describe various attributes of the sample. First of all, one will note the spread of values in the sample, that is, the range of the distribution. One will also see which class contains the largest number of variates, this being termed the modal class (fig. 2). Much more important will be the statistic that measures the center of mass of the distribution: the arithmetic mean. This value is easily obtained by summing the variates and dividing by the total number of variates. The symbolic statement is as follows:

$$\bar{x} = \frac{\sum x}{N} \tag{1}$$

where $\bar{x}$ is the arithmetic mean; $\sum$ means "sum of"; $x$ is the value of an individual variate; and $N$ is the total number of variates in the sample.

The arithmetic mean, referred to hereafter as simply the "mean," has some remarkable properties. If one subtracts the mean from the value of a variate, he obtains the deviation of the variate from the mean. If all deviations (some negative, some positive) are summed algebraically, the sum will always be 0. In terms of mechanics, this means that if one were to draw the histogram on stiff cardboard and carefully cut it out, the histogram would balance perfectly on a horizontal knife-edge coinciding with the line representing the mean. The sum of the turning moments (product of weight times distance from fulcrum) is 0, those on one side balancing those on the other. Statisticians have given the term first moment of the distribution to the sum of the deviations of a distribution, and this is expressed in the following symbolic way:

$$\frac{\sum (x - \bar{x})}{N} \tag{2}$$

where the terms are as defined in equation (1). Interestingly enough, if any other reference value is selected instead of the arithmetic mean, the summed deviations will not total 0; hence we can define the arithmetic mean as that value about which the summed deviations always total 0.

The next problem is to find a statistic that will indicate the extent to which the individual variates fail to coincide with the mean, i.e., the dispersion qualities of the distribution. The standard deviation of the sample is used for this purpose and consists of the square root of the mean of the individual squared deviations. The symbolic statement is

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{N}} \tag{3}$$

where $\sigma$ is the standard deviation and the other terms are as defined in equation (1). The purpose of squaring the deviations is to get rid of the positive and negative signs which would otherwise give a sum of 0. One might say that standard deviation is the average "distance" out on either side of the mean that the variates occur on the histogram.
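The random co-ordinate procedure described under RANDOM SAMPLING can be sketched in Python. This is an illustration only: a seeded pseudo-random generator stands in for the printed table of random numbers, and the function names, the seed, and the 40-foot square width (taken from the statement that the map square is about 4,000 feet across) are assumptions introduced here, not part of the original method.

```python
import random


def draw_grid_sample(n_points=100, grid_size=100, seed=1954):
    """Draw n_points random (column, row) grid squares, each index 00-99.

    Assumption: random.Random replaces the printed random-number table;
    the seed is arbitrary and only makes the illustration repeatable.
    """
    rng = random.Random(seed)
    # One set of two-digit numbers per axis, paired into grid co-ordinates.
    columns = [rng.randrange(grid_size) for _ in range(n_points)]
    rows = [rng.randrange(grid_size) for _ in range(n_points)]
    return list(zip(columns, rows))


def square_center(column, row, square_feet=40.0):
    """Center of a grid square, in feet from the map corner.

    Assumption: a map about 4,000 feet across divided into 100 units
    gives squares roughly 40 feet on a side.
    """
    return ((column + 0.5) * square_feet, (row + 0.5) * square_feet)


points = draw_grid_sample()
centers = [square_center(c, r) for c, r in points]
print(len(points), "sample points; first three squares:", points[:3])
```

Because each pair is drawn independently, the same square can occasionally be selected twice, just as it can when reading from a printed table.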
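As a worked check on equations (1) to (3), the grouped data of table 1 can be run through a short Python sketch. It assumes that every variate lies at its class mid-value, which is an approximation implied by grouping; the function names are hypothetical. Under that assumption the mean and standard deviation come out near 0.35 and 0.071, in agreement with the values printed beneath table 1 to within rounding, and the first moment is zero apart from floating-point error.

```python
import math

# Class mid-values (sine of slope angle) and observed frequencies from table 1.
MID_VALUES = [0.20, 0.24, 0.28, 0.32, 0.36, 0.40, 0.44, 0.48, 0.52]
FREQUENCIES = [2, 4, 21, 21, 25, 10, 8, 5, 4]


def grouped_mean(mids, freqs):
    """Equation (1), with every variate placed at its class mid-value."""
    return sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)


def grouped_first_moment(mids, freqs):
    """Equation (2): mean deviation about the arithmetic mean (should be 0)."""
    mean = grouped_mean(mids, freqs)
    return sum(f * (x - mean) for x, f in zip(mids, freqs)) / sum(freqs)


def grouped_std(mids, freqs):
    """Equation (3): square root of the mean squared deviation."""
    mean = grouped_mean(mids, freqs)
    squared = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs))
    return math.sqrt(squared / sum(freqs))


mean = grouped_mean(MID_VALUES, FREQUENCIES)
first_moment = grouped_first_moment(MID_VALUES, FREQUENCIES)
sigma = grouped_std(MID_VALUES, FREQUENCIES)
print(f"N = {sum(FREQUENCIES)}, mean = {mean:.4f}, sigma = {sigma:.4f}")
print(f"first moment = {first_moment:.1e}")
```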

Another statistic which is of prime importance in statistical work is variance, which is merely the square of the standard deviation and has the simple statement

$$\sigma^2 = \frac{\sum (x - \bar{x})^2}{N} \tag{4}$$

where $\sigma^2$ is the variance and the other terms are as defined in equation (1). Standard deviation is therefore simply the square root of variance. It is an interesting fact that the sum of squares of deviations with respect to the mean is a minimum. If any other value were used instead of the mean for the calculation of the individual deviations of the variates, the sum of the squares would be larger.

FIG. 3.-Normal curve fitted to grouped data. The sample consists of maximum valley-wall slopes in dissected fan gravels near Bernalillo, New Mexico.
[Abscissa: slope angle in degrees, with a second scale in standard deviations on either side of the mean; ordinate: percentage frequency.]

In figures 2 and 3 the values of the arithmetic mean, $\bar{x}$, and the standard deviation, $\sigma$, are given for two samples of slopes: the first taken from a topographic map by the random co-ordinate method already explained; the second taken in the field with an Abney hand level sighted down maximum valley-wall slopes. Naturally, the means and standard deviations are not the same in these two samples, because they are taken from widely different areas, different in geologic structure, relief, vegetative cover, and climatic environment. Up to this point, statistical methods have merely provided clearly defined, meaningful ways of describing quantitatively certain attributes of sample data, but they have not been applied to the solution of a geomorphic problem.

NORMAL DISTRIBUTIONS

A matter of considerable interest in geomorphic research, as in other fields, is the nature of the frequency distribution. The question is: What characteristic or model form is taken by a frequency distribution when samples are drawn at random from a particular population, such as slope angles or stream lengths or stream basin areas? Can we tell from the form of the sample distribution what form the distribution for the entire population would take, if we could measure it completely?

A great variety of distribution forms is possible; but in sampling of various classes of information in natural science investigations, certain characteristic forms are repeatedly encountered. Two that have been of special interest in geomorphic research are the normal distribution (or Gaussian distribution) and the log-normal distribution. These are discussed at length in all elementary statistics texts (Croxton and Cowden, 1946, p. 265-304; Dixon and Massey, 1951, p. 47-66). Briefly, the normal distribution is represented on a frequency-distribution diagram by a symmetrical, bell-shaped curve (see fig. 3) and is described by the equation

$$Y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \bar{x})^2}{2\sigma^2}} \tag{5}$$
where $Y$ is the ordinate, or height of the curve at any given value of $x$; $\sigma$ is standard deviation, as in equation (3); $e$ is the base of natural logs (2.71828 . . .); and $x$ and $\bar{x}$ are as in equation (1). One way in which a close approach to the normal curve may be obtained is to compile a large number of "precision" measurements of some dimension, such as the length of a steel bar. Because of errors inherent in the measuring device and operator's judgment, the length readings will not be precisely the same but will tend to group closely around the true value (which we have no means of obtaining by a single direct measurement). The larger errors will tend to be relatively few, decreasing with greater departures from the true value; the smaller errors will tend to be more numerous as they become smaller. The term normal curve of error is thus frequently used for this distribution. An illustration of the normal curve is shown in figure 3, where it has been applied to slope data. When the fitness of such a curve is established for a given class of data, we may reason that the natural values tend to approach or crowd about a particular value most representative of the data but that chance variations in development have resulted in failure to achieve perfect uniformity.

Other types of geomorphic data tend to have an unsymmetrical distribution, known in statistics as a skewed distribution. In these cases the highest column of the histogram (the modal class) is frequently located to the left-hand, or lower-value, side of the mean. A good example of a skewed distribution is shown in figure 4, A, a sample of lengths of first-order streams obtained by Victor C. Miller (1953). If, by plotting the logarithms of the lengths on the abscissa, the distribution becomes symmetrical, agreeing closely with the normal curve of error, the distribution is described as log-normal. This has been done in figure 4, B. In the case of the stream lengths in figure 4, B, the log-normal distribution shows that extremely long streams are relatively common but that extremely short streams are rare.

FIG. 4.-A, histogram showing arithmetic distribution of lengths of first-order streams developed on Copper Ridge dolomite, Virginia (Miller, 1953). (See table 2 for data.) B, histogram showing distribution of logarithms of stream lengths. Frequencies computed and grouped from logs of variates.
[Panels: (A) Arithmetic Distribution, abscissa: stream length in hundredths of miles; (B) Logarithmic Distribution, abscissa: logarithm of stream length.]

TESTS OF GOODNESS OF FIT

How can the investigator make a good guess as to whether the population from which he has drawn his sample follows a normal distribution, a log-normal distribution, or some other form? Irregularities are conspicuous in the frequency distributions used thus far as examples. Are such irregularities inherent in the population distribution, or could we expect such irregularities by chance in samples of the limited size used here? Perhaps this question could be answered by drawing many more samples from the same populations, superimposing them, and
noting the replacement of the variations by a smoothed average curve. Unfortunately, the geomorphologist may be unable to obtain further data in his field or mapwork, owing to limitations of time and funds. The field data of the summer may be all he has to work with in preparing his report.

FIG. 5.-Cumulative frequencies plotted on probability paper. Lines A and B represent same sample as in figure 4 and table 2. Line C represents same sample as in figure 3.
[Abscissa scales: stream lengths in hundredths of miles (line A); slope angle in degrees (line C).]

Two methods are used to ascertain whether a distribution is normal or log-normal. The first of these is a quick method for general guidance. The grouped data are plotted on a special type of graph paper known as probability paper (fig. 5). First the percentage of variates in the classes of the frequency distribution are set down in cumulative form, as in table 2, column B. These data are next plotted on the probability paper, with cumulative percentage frequency on the ordinate and class mid-values on the abscissa. By means of a cleverly devised arrangement of spacing on the ordinate of the probability paper, any normal probability distribution yields a straight, sloping line of points on this paper. As figure 5 shows, the cumulative slope data taken at Bernalillo, New Mexico, fall close to a straight line (line C), whereas those of
stream lengths (line A) form a broadly curved trend. Obviously, the slope data may well represent a normally distributed population; but the stream-length data are not likely to represent a normal distribution.

Because of the broadly curving trend of points of the arithmetic plot of stream-length data, we may suspect a log-normal distribution. When the logarithms of stream lengths are grouped cumulatively and plotted on probability paper, as shown in figure 5, line B, the points now lie close to a straight line.

TABLE 2
FREQUENCY DISTRIBUTION OF LENGTHS OF FIRST-ORDER STREAMS ON COPPER RIDGE FORMATION, VIRGINIA

A. ARITHMETIC SCALE

Class mid-value             Percentage       Cumulative percentage
(hundredths of miles)       frequency (A)    frequency (B)
 7.5                        28                28
12.5                        42                70
17.5                        16                86
22.5                        10                96
27.5                         2                98
32.5                         0                98
37.5                         2               100

Class interval = 5; $\bar{x}$ = 13.9.

B. LOG SCALE

Class mid-value                  Percentage       Cumulative percentage
(log of hundredths of miles)     frequency (A)    frequency (B)
0.7500                            4                 4
0.8500                           10                14
0.9500                           14                28
1.0500                           28                56
1.1500                           16                72
1.2500                           14                86
1.3500                           10                96
1.4500                            2                98
1.5500                            2               100

Class interval = 0.1000; $\bar{x}$ = 1.1052 (anti-log = 12.74).

Having now satisfied ourselves that one of these two samples seems to resemble a normal distribution and the other a log-normal distribution, we still do not know exactly how to draw the normal curves to fit these data, nor are we at all sure whether the apparent resemblance could be due to chance. These samples are small. Would other samples drawn from the same population have different forms?

To answer these questions a rigorous method of curve-fitting and testing is readily available (Croxton and Cowden, 1946, p. 271-287; Dixon and Massey, 1951, p. 61-63, 190-191). In figure 3 a normal curve has been fitted to the Bernalillo slope sample, following conventional methods described in the statistics texts. The smooth normal curve represents a distribution with the same mean and standard deviation as the sample; but it is not otherwise influenced by the sample frequency distribution.

The next step brings us to the first encounter with probability principles. A question is posed as follows: What is the likelihood (probability) that so poor a fit (or worse) of normal curve to sample data could be obtained by chance alone if the population from which the sample is drawn has a normal distribution? To ascertain this degree of probability, a statistic known as chi-square ($\chi^2$) is computed (Croxton and Cowden, 1946, p. 286-287; Dixon and Massey, 1951, p. 190-191):

$$\chi^2 = \sum \frac{(f - F)^2}{F} \tag{6}$$

where $\chi^2$ is the statistic "chi-square"; $f$ is the observed frequency in a given class; and $F$ is the computed frequency in the given class taken from the fitted normal curve. When $\chi^2$ has been calculated, a table is consulted to obtain the percentage probability from which the geomorphologist accepts or rejects the hypothesis of normal distribution which he has set up. In using the prepared tables, one must determine an accessory value, termed the number of degrees of freedom. Without attempting to explain the principles involved, it may be noted that this number is three less than the number of classes in the frequency distribution in the particular operation used here.

In figure 3 there are 7 classes; the number of degrees of freedom is 4, and the value of $\chi^2$ was found to be 0.631. This combination yields a probability of 0.97. Stated in words, it may now be said that so poor a fit or worse would be expected 97 times out of 100 if the population is normally distributed; hence the fit is a very good one, and we have no reason to doubt that the slope population from which this sample was drawn is a normal one. On the other hand, a very low probability (say 1 per cent or less) would cause us to doubt that the distribution is normal. An arbitrary probability level of 5 or 2 per cent is set by the investigator, depending on how large a chance he wishes to take of rejecting the hypothesis when it is actually true; and the percentages yielded by the $\chi^2$ test are accordingly interpreted as favorable or unfavorable to his hypothesis. The log-normal distribution may be tested in a similar manner by using the logarithms of the variates in place of the arithmetic values.

In passing, it may be noted that some samples are somewhat lopsided or skewed; others show an unusually peaked (leptokurtic) form or an unusually flat-topped (platykurtic) form. To determine whether this degree of malformation might readily be expected from a sample drawn from a normally distributed population, tests of skewness and kurtosis are available. Such tests have been applied to slope frequency distributions by the author (Strahler, 1950, p. 683-685).

SIGNIFICANCE OF SAMPLE MEAN DIFFERENCES

STATEMENT OF THE PROBLEM

Thus far we have dealt with the study of one sample with a view to describing its mean, standard deviation, and variance and to determining whether or not the sample could readily come from a normally distributed population having identical values of the afore-mentioned statistics. These determinations are merely descriptions in quantitative form. The next application of frequency-distribution statistics involves two sets of sample data. The geomorphologist will often take samples in two quite different regions or from two parts of the same homogeneous region or at two successive dates in the same place. He may be comparing such attributes as dune dimensions (are the dunes higher or wider in one place or another?) or beach slopes (are beach slopes steeper when wave steepness is greater?), or slopes of cinder cones (does particle size affect slope of the cone?). This might seem offhand to present no operational problem. One simply measures the property in each area, determines the means of each, and sees which is the larger.

But at this point a suspicion unhappily enters: just because one sample mean is greater than the other, can we be sure that the populations from which the samples were drawn are truly different? Most geomorphic attributes have a considerable dispersion. In any particular sample of 25, 50, or 100 variates the investigator might just happen, by chance, to include many more of the low-valued variates than is proportionate to their actual occurrence. Perhaps the next
sample would favor the high-value variates. Could this not cause the mean of one sample to be lower than the mean of the other sample, even though their population means are actually in the reverse order of magnitude? Fortunately, statistics provides an analysis of the theory of such chance variations in successive samples and offers tests to reassure the investigator of the strength or weakness of his conclusions.

ESTIMATES OF POPULATION CHARACTERISTICS

Without undertaking a complete development of the subject of sampling principles, it may suffice to consider two new statistics relating to the sample and population. When a small sample is taken from an infinite population, the standard deviation, σ, of the sample will tend to be smaller than the standard deviation of the population from which it is drawn. It is possible to estimate the standard deviation of the population from the sample by the equation

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{N - 1}}, \tag{7}$$

where s is the standard deviation of the population estimated from the sample and the other terms are as defined in equation (3). When using in the denominator one less than the total number of variates, the value of s will, of course, always be greater than σ. When the sample is large, this difference is small. There is thus an automatic compensation for sample size. The statistic s² is, correspondingly, the variance of the population estimated from the sample.

The second statistic of interest in testing the reliability of sample means is a measure of the expected variations in means drawn repeatedly from the same population. Let us say that a geomorphologist has taken a sample of 59 slope angles (see fig. 3). If he takes more samples of 59 readings each, the means of these samples will not be the same. If a vast number of these samples were accumulated, their means would show a normal distribution curve (even if the population is not exactly normally distributed). Treating the collection of sample means as a population in itself, we would find that the mean of the means would coincide with the mean of the population but that the standard deviation of the means would be very small by comparison. This is shown graphically in figure 6.

FIG. 6. Comparison of normal distribution fitted to sample and normal distribution of sample means. (Horizontal axis: slope angle, degrees.)

The standard deviation (estimated) of sample means is obtained by the equation

$$s_{\bar{x}} = \frac{s}{\sqrt{N}}, \tag{8}$$

where $s_{\bar{x}}$ is the estimated standard deviation of the means² for a given sample size; s is the estimated standard deviation of the population; and N is the number of variates in the sample.

² This statistic is commonly called the standard error of the means.

It is obvious that if N is large, i.e., if a large sample size is used, the means will show a small deviation; whereas, where


N is small, the deviation in means will be great.

THE t-TEST

With the foregoing two statistics in mind, we turn next to a test of difference between two sample means. In studying the problem of parallel slope retreat versus reclining retreat in valley-wall slopes in badlands at Perth Amboy, Stanley Schumm (1953) compared two field samples, one of which had been taken in 1948 by Strahler, the second of which Schumm had taken himself in 1952. Both samples were taken in the same geographical area; the same instruments were used in both surveys. In the period of four years between readings, a reclining slope development, as postulated by Davis' general scheme of "normal" cycle development, might be revealed.

Histograms of the two samples are shown superimposed in a single graph in figure 7. The means of the two samples differ only slightly (0.3° less in 1952 than in 1948), but this seems to confirm reclining retreat. In such a short period the change in angle would necessarily be slight, we might argue, because the stage of development of the region has advanced only slightly.

FIG. 7. Comparison of slope samples taken four years apart on same slopes in badlands at Perth Amboy, New Jersey (Schumm, 1953). A, 1948: N = 154, x̄ = 49.1°, s = 3.6°. B, 1952: N = 149, x̄ = 48.8°, s = 3.5°. (Horizontal axis: slope angle, degrees.)

Now, a Walther Penck advocate would counter this interpretation with the opinion that such a small difference in slope means as this could easily result from chance variations in the samples, even though there are about 150 variates in each sample, enough to fix the population mean rather accurately.

To test the difference in sample means, a null hypothesis is set up as follows: "There is actually no real difference in the two slope populations from which the samples were drawn; thus, in effect we are dealing with only one slope population."³ Expressed symbolically,

$$\mu_1 = \mu_2 = \mu,$$

where $\mu_1$ is the population mean of the first sample, $\mu_2$ is the population mean of the second sample, and $\mu$ is the population mean of the combined populations. Under the null hypothesis we take the scientifically more conservative proposition, i.e., that no noteworthy, or significant, difference exists; and we will hold to this hypothesis until grounds are presented for abandoning it.

The null hypothesis is tested by obtaining the value of a statistic known as "t," where

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{(1/N_1) + (1/N_2)}}, \tag{9}$$

in which $\bar{x}_1$ is the mean of the first sample, $\bar{x}_2$ is the mean of the second sample, $s_p$ is a statistic, called the pooled estimate of variance, $N_1$ is the number of variates in the first sample, and $N_2$ is the number of variates in the second sample.

³ This test requires the assumption that the populations are normally distributed and that the standard deviations of the two populations are the same. We have no reason to doubt the latter, because the values of s are almost identical.
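Before the t statistic is interpreted, the two estimates defined in equations (7) and (8) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the slope angles are invented values, not data from any of the surveys cited.

```python
import math

def sample_std(values):
    """Standard deviation of the sample itself (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)

def estimated_population_std(values):
    """Equation (7): estimate s of the population standard deviation (divisor N - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))

def standard_error_of_mean(values):
    """Equation (8): s divided by the square root of N."""
    return estimated_population_std(values) / math.sqrt(len(values))

# Hypothetical slope angles in degrees (illustrative only, not from the paper).
angles = [44.0, 47.5, 49.0, 50.5, 46.0, 52.0, 48.5, 45.5]
sigma = sample_std(angles)
s = estimated_population_std(angles)
se = standard_error_of_mean(angles)
print(f"sigma = {sigma:.2f}, s = {s:.2f}, standard error = {se:.2f}")
```

With so small a sample the gap between σ and s is easy to see; lengthening the list shrinks both that gap and the standard error.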


The value of t is thus a measure of the ratio of difference in means to the product of an average standard deviation and an inverse measure of sample size. The denominator of the t-statistic is actually an estimate of the standard deviation of the population of differences in these means. The pooled estimate of variance is an average, or combined value of the estimated standard deviations, s, of each sample; its computation is explained in statistics texts (Croxton and Cowden, 1946, p. 330-331; Dixon and Massey, 1951, p. 102-103). When t is computed, it is possible to refer to a "table of values of t" (Croxton and Cowden, 1946, p. 875) to read the percentage probability of our rejecting the null hypothesis when it is actually correct. Having already set a critical value of probability (generally termed α) at a conservative level, say 0.01, we retain or abandon our hypothesis according to whether the observed probability is greater or less than 0.01. In the case of Schumm's slope samples, the value of t was computed as follows:

$$t = \frac{49.1^\circ - 48.8^\circ}{4.64 \sqrt{(1/154) + (1/149)}} = 0.561.$$

The t-table is consulted under the last row, "infinite number of degrees of freedom," because there are 301 degrees of freedom in this case (N1 + N2 - 2), and anything over 120 is considered as infinite. The value of t shows a percentage of close to 50, which is vastly greater than the critical level of 1 per cent. In ordinary language this means that if pairs of samples of this size were taken repeatedly from the same population, differences of means this large (0.3°) or larger would be expected 50 per cent of the time. In other words, the observed difference could easily be the result of chance alone and is not significant. Consequently, the null hypothesis is retained. The geomorphologist can only say "we have no reason to doubt that parallel slope retreat is occurring here."

From this test it should be obvious that statistical analysis, far from proving anything one wants, actually introduces an ultra-conservative attitude of restraint in the drawing of conclusions. The odds of being right or wrong are stated forthrightly for all to see. The scientist cannot uphold an unwarranted conclusion on the strength of his opinion or prestige alone. Of course, he can introduce a bias into his sampling, whether consciously or subconsciously, to insure that the statistical test will prove favorable to his preconceived theory. This last doubt can be largely eliminated by randomized sampling or by having samples taken by two or more disinterested persons and compared by the same type of test as that described above.

For an example of a comparison of sample means which gave a significant difference, resulting in discard of the null hypothesis, a case is cited from the author's research on slope characteristics in the Verdugo Hills, southern California (Strahler, 1950, p. 810-813). A different statistical technique was used in the author's previously published paper and is replaced herewith by the t-test described below.

Two slope samples were taken (table 3; fig. 8) in a maturely dissected mountain mass. One group of slope readings (sample A) was taken from valley-side slopes at whose base accumulated talus and slope wash indicated that considerable time had elapsed since stream corrasion had been active against the slope base. The other group of slope readings (sample B) was taken from valley-side slopes at whose base stream corrasion


had recently been active. The question was: Does the mean angle of the first sample differ significantly from the mean of the second? If so, the data support the thesis that a slope, if left to weather and waste without basal cutting, tends to decline in angle rather than to retreat in parallel planes.

Before undertaking the t-test, it was determined by other tests that the variances of the samples do not differ significantly and that the distributions do not depart significantly from a normal form. This was done because the t-test is based upon the assumption that populations are normally distributed and that the variances are equal. The sample means differ by about 6.6°; the value of t is over 10. In combination with 202 degrees of freedom (N1 + N2 - 2) the probability of obtaining so great a difference or one greater by chance alone when the true difference is actually zero is very much less than 0.001. The null hypothesis, that no difference exists, is therefore rejected, and the difference is regarded as significant. The thesis of declining slope retreat is thus favored by the field data.

TABLE 3

FREQUENCY DISTRIBUTIONS OF VALLEY-SIDE SLOPE ANGLES, VERDUGO HILLS, CALIFORNIA

Class mid-values in degrees of slope (a dash marks a class with no entry)

SAMPLE*  33.75  35.75  37.75  39.75  41.75  43.75  45.75  47.75  49.75  51.75  53.75       x̄       s     N
A            4      3     10     11      4      0      1      -      -      -      -  38.23°   2.70°    33
B            -      -      4     12     27     47     37     19     17      4      4  44.82°   3.27°   171

* A, slopes with protected bases; B, slopes with actively corraded bases.
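The Verdugo Hills comparison can be retraced from the summary values of table 3 alone. The sketch below applies equation (9); the pooled standard deviation is computed with the usual textbook weighting by degrees of freedom, which is an assumption here, since the paper refers the reader to statistics texts for that step. Function names are hypothetical.

```python
import math

def pooled_std(s1, n1, s2, n2):
    """Pooled estimate of the common standard deviation (textbook formula, assumed)."""
    return math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))

def t_statistic(mean1, s1, n1, mean2, s2, n2):
    """Equation (9): difference in means over its estimated standard deviation."""
    sp = pooled_std(s1, n1, s2, n2)
    t = (mean1 - mean2) / (sp * math.sqrt(1.0 / n1 + 1.0 / n2))
    return t, n1 + n2 - 2

# Summary values from table 3: sample B (corraded bases), then sample A (protected bases).
t, dof = t_statistic(44.82, 3.27, 171, 38.23, 2.70, 33)
print(f"t = {t:.2f} with {dof} degrees of freedom")
```

The result agrees with the statement in the text that t is over 10 with 202 degrees of freedom; the probability itself would still be read from a t-table, as the paper does.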

FIG. 8. Comparison of valley-wall slope samples taken in Verdugo Hills (Strahler, 1950). A, slopes protected at base by talus and slope wash. B, slopes actively corraded at base. (See table 3 for data. Horizontal axis: slope angle, degrees; vertical axis: frequency, f.)

SIGNIFICANCE OF DIFFERENCE IN SAMPLE VARIANCES

Arithmetic means are not the only kinds of statistics which may differ between samples. It is possible that two samples may have almost identical means (not significantly different), yet may have different degrees of dispersion, as shown by marked differences in the estimated standard deviation, s.

An example in geomorphic research was encountered by Victor C. Miller (1953) in studying the influence of lithology on slope steepness. Slope steepness in two areas, one of shale (Athens formation), the other of interbedded sandstones and shales (Pennington formation), was measured by taking a sample


of 100 readings of maximum valley-side slopes. Table 4 shows the slope data grouped into classes. Figure 9 combines the histograms of the two samples. Whereas the means of the two samples differ by a relatively small amount, the estimated standard deviations differ by a relatively large amount. The Pennington sandy-shale sample has a markedly wider dispersion, and this is shown in the histograms by the longer tails of the distribution, which include two additional classes at each end.

TABLE 4

FREQUENCY DISTRIBUTIONS OF VALLEY-SIDE SLOPE ANGLES, ATHENS AND PENNINGTON FORMATIONS, VIRGINIA

Class mid-values in degrees of slope (a dash marks a class with no entry)

SAMPLE*  17  19  21  23  25  27  29  31  33  35  37  39  41  43  45       x̄       s     N
A         -   -   -   5   3  14  12  21   9  17  12   4   3   -   -  31.82°   4.42°   100
B         -   1   3   4   2   9   9  11  17   9  17   7   5   4   2  33.12°   5.78°   100

* A, Athens shale; B, Pennington, interbedded shales and sandstones.

FIG. 9. Comparison of slope samples in homogeneous clastic rocks in western Virginia (Miller, 1953). A, Athens formation. B, Pennington formation. (See table 4 for data. Horizontal axis: slope angle, degrees.)

As with sample means, sample variances may be expected to differ from sample to sample within the same population, simply because of chance variations. Consequently, we shall want to test the difference in standard deviations in a manner analogous to tests of sample mean differences. A null hypothesis is first set up (Dixon and Massey, 1951, p. 88-90), in which it is assumed that the variances (squared standard deviations) are equal:

$$\sigma_1^2 = \sigma_2^2,$$

where $\sigma_1^2$ is the variance of the first sample population and $\sigma_2^2$ is the variance of the second sample population. A statistic termed "F" is then computed, using the equation

$$F = \frac{s_1^2}{s_2^2}, \tag{10}$$

where $s_1^2$ is the estimated variance (squared estimated standard deviation) of the first sample and $s_2^2$ is the corresponding variance for the second sample. In the case of the two slope samples,

$$F = \frac{(5.78)^2}{(4.42)^2} = \frac{33.41}{19.54} = 1.71.$$

This value of F is somewhat larger than the critical value at the probability of 0.01 (1 per cent); hence the difference may be regarded as significant and the null hypothesis be discarded. The ob-


served difference in spread of slope angles is not considered as due to chance alone. At this point the geomorphologist seeks a rational explanation for the observed difference, and one is readily available. As might be deduced from the differences in lithology, the slopes in the relatively homogeneous black shale of the Athens formation would tend to be highly uniform, whereas the alternation of resistant sandy layers with weaker shales in the Pennington formation would produce a relatively greater number of exceptionally steep and exceptionally gentle slopes.

The investigator might have been motivated initially by deductive methods, in which the expected results were outlined in advance, to proceed to a sampling and testing program as outlined above. Or the sampling and testing may have motivated the investigator inductively to seek a cause of the observed differences. The statistical method is thus a part of both deductive and inductive methods; but the distinction is largely academic, because both deduction and induction normally proceed mixed through any scientific investigation.

ANALYSIS OF VARIANCE

From two sample means and their possible significant differences we progress to statistical tests involving three or more sample means. Suppose that some geomorphic attribute has been sampled at several different places, where somewhat different geological conditions prevail. We shall be interested in knowing whether the list of samples includes different populations or whether, instead, despite certain observed differences in the sample means, the magnitude of these observed differences could readily be due to chance variations in sampling.

TABLE 5

LENGTHS IN MILES OF FIRST-ORDER STREAMS ON CLINCH MOUNTAIN, VIRGINIA

                    Low Dip       Medium-low Dip   Medium Dip    Steep Dip
                      (1)              (2)            (3)           (4)

A, dip slope        x̄ = 0.255     x̄ = 0.309       x̄ = 0.301     x̄ = 0.129
                    s = 0.20      s = 0.18        s = 0.18      s = 0.05
                    N = 50        N = 131         N = 27        N = 39

B, scarp slope      x̄ = 0.207     x̄ = 0.196       x̄ = 0.229     x̄ = 0.217
                    s = 0.15      s = 0.12        s = 0.12      s = 0.12
                    N = 108       N = 282         N = 57        N = 53

An example in point is from the research work of Victor C. Miller (1953). In investigating the effect of angle of dip of strata of Clinch Mountain, Virginia, upon the form of first-order drainage basins, the ridge was divided into segments designated as "low dip," "medium-low dip," "medium dip," and "steep dip." In each segment the length of the first-order stream channels on the ridge flanks was measured. One set of samples was taken from the dip slope, the other from the scarp slope (obsequent slope). Now it would be expected that the length of streams on the dip slope would tend to be strongly influenced by the degree of dip, being longer where dip is lower. But the lengths on the scarp slope might not show such control. The data are summarized in table 5. Note that the means of the dip-slope samples (row A) are ap-


proximately the same except for the steep-dip mean (col. 4), which drops to less than half the other values. We shall be interested in knowing whether this diversity in means is significant or could be expected to occur readily by chance. On the scarp slope (row B) the means are quite similar, and we shall wish to test the hypothesis that no real difference exists on this side.

The technique used in this type of testing is termed analysis of variance and is of the simplest form in which only one variable of classification is introduced, i.e., variation in dip of strata. Details of the testing procedure are readily available in elementary textbooks of statistics (Dixon and Massey, 1951, p. 119-127). First, a null hypothesis is set up to the effect that the means of all samples are the same and that they represent one and the same population. Expressed symbolically,

$$\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu,$$

where $\mu$ refers to a population mean and the subscripts denote the four samples. Short-cutting both the theory and computational instructions, it must suffice here to say that a statistic known as F is computed, which is a ratio between the variance of the sample means and the average variance of the individual variates in the samples:

$$F = \frac{\text{Variance of the sample means}}{\text{Average variance within the samples}}. \tag{11}$$

Prepared tables (Dixon and Massey, 1951, p. 310-313) may next be consulted to obtain the probability percentages associated with a particular value of F and a particular number of "degrees of freedom" (combination of number of samples and number of variates). Let us see what results were obtained by Miller in analysis of variance of the stream-length-versus-dip series.

In the case of the dip slope, row A (table 5),

$$F = 11.24.$$

In the numerator there are 3 degrees of freedom (one less than the number of samples); in the denominator there are 243 degrees of freedom (four less than the total number of variates in all four samples). For this combination of values the table yields the information that the probability is very much less than 0.005. Having previously decided that a critical probability of 0.05 will be used to accept or reject the null hypothesis, we at once reject the hypothesis. What we are saying is that there is much less than one chance in 200 of obtaining, by chance alone, a difference in sample means as great as those observed if the samples actually come from the same population. We therefore regard the difference in means as significant. It is an easy step for the geomorphologist to attribute the difference to control by dip of the strata, although statistics offers no aid in assigning causes to significant differences. Obviously, the steep-dip segment gives shorter streams because there is a narrower outcrop of resistant strata between ridge crest and base than where the dip is low.

A similar analysis of variance of the stream lengths on the scarp slope (row B of table 5) yields a value of F which indicates a percentage probability of well over 0.05. The observed differences in means are readily expectable through chance variations in sampling alone; the null hypothesis is sustained; there is no reason to suspect an influence of steepness of dip upon stream length on this side of the ridge.
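Both uses of F described above, the variance ratio of equation (10) and the analysis of variance of equation (11), can be sketched from the published summary values. The one-way formulation below (variance among means weighted by sample size, over the pooled within-sample variance) is the standard textbook form and is assumed here rather than taken from Miller's worksheets; function names are hypothetical. Because tables 4 and 5 give rounded means and standard deviations, the analysis-of-variance result should land near the reported 11.24 rather than reproduce it exactly.

```python
def variance_ratio(s_a, s_b):
    """Equation (10): ratio of the larger estimated variance to the smaller."""
    v_a, v_b = s_a ** 2, s_b ** 2
    return max(v_a, v_b) / min(v_a, v_b)

def anova_f(groups):
    """Equation (11) from (mean, s, N) triples: variance among means over variance within."""
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(mean * n for mean, _, n in groups) / total_n
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    ms_between = sum(n * (mean - grand_mean) ** 2 for mean, _, n in groups) / df_between
    ms_within = sum((n - 1) * s ** 2 for _, s, n in groups) / df_within
    return ms_between / ms_within, df_between, df_within

# Miller's slope samples (table 4): s = 5.78 (Pennington) and s = 4.42 (Athens).
f_slopes = variance_ratio(5.78, 4.42)

# Dip-slope stream lengths (table 5, row A) as (mean, s, N).
row_a = [(0.255, 0.20, 50), (0.309, 0.18, 131), (0.301, 0.18, 27), (0.129, 0.05, 39)]
f_dip, df1, df2 = anova_f(row_a)

print(f"variance ratio F = {f_slopes:.2f}")
print(f"analysis of variance F = {f_dip:.2f} with {df1} and {df2} degrees of freedom")
```

Swapping in row B of table 5 gives the scarp-slope comparison; in either case the probability is then looked up in an F-table for the stated degrees of freedom.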


LINEAR REGRESSION

USE OF REGRESSION

In the operations thus far discussed no account has been taken of the interrelation of two variables, of which one might be the cause, the other the effect. Establishment of the existence of a significant difference in two or more sample means tells nothing of the behavior of associated or causative factors, even though the geomorphologist is eager to guess at causes of significant differences.

Where two attributes are measured simultaneously at a given place or time and similar measurements are repeated many times, it is possible to establish or deny an association between attributes by statistical methods and to test the reliability of the observed degree of association.

Two types of association are distinguished: (a) regression and (b) correlation. The first is most commonly applied to an association of two variable attributes wherein one is clearly a cause, the other an effect. The two variables are represented mathematically by the statement

$$y = f(x), \tag{12}$$

where y is the dependent variable, or effect, x is the independent variable, or cause, and f( ) is a symbolic statement of "a function of." In correlation the two attributes vary together, as in regression; but it is the degree of association which is established rather than the function that relates them. One attribute may be the cause of the other, or both attributes may be the results of a common cause, which may or may not be known.

For example, if mean size of sand grains on dune faces shows a close correspondence with slope angle of the dune face, we consider the size of particle to be the cause, the dune slope to be an effect. Not only does an understanding of dynamics make this seem reasonable, but it would be absurd to think that change in slope could induce a change in particle dimensions. Here the analysis deals with regression, and an attempt is made to define the function relating y and x.

Suppose, however, that the widths and lengths of many barchan dunes are measured and it appears that an increase in width is generally accompanied by an increase in length. Because neither dimension can act dynamically (mechanically) to regulate the other, cause and effect cannot be assigned. Both dimensions may be controlled by a third factor, which is the cause, perhaps length of time of dune development or strength of wind. One does not need to define a function relating the two attributes but merely seeks to establish the existence of the correlation beyond the possibility of a pure chance relationship.

Examples of both regression and correlation studies will illustrate the application of the principles to geomorphic research. An example of a regression analysis in which a strong trend was evident is based upon experimental geomorphic data of Van Burkalow (1945, p. 679), who examined the influence of particle size upon angle of repose with a view toward determining the factors which control the slope angle in natural fragmental materials, such as dune sands, talus fragments, or cinders. By means of repeated laboratory experiments under controlled conditions, Van Burkalow obtained a series of 12 pairs of values relating diameter of lead shot (inches) to mean angle of repose slope (degrees). The last pair of the series seemed anomalous and was eliminated in the regression analysis because examination had shown a large number of flattened grain surfaces which might be expected to produce an abnormally high angle. Thus there remained 11


pairs of values, shown in table 6. The data are shown graphically in figure 10, A, plotted on ordinary arithmetic scales with the independent variable (cause) on the abscissa, or X-axis, and the dependent variable (effect) on the ordinate, or Y-axis. There is no doubt that increasing particle diameter is consistently associated with decreasing slope angle. This might be expected because the proportion of grain surfaces in contact to mass is less with larger sizes. Nevertheless, we will want to establish the particular function that best fits these data and test the relationship for possible occurrence through chance alone.

TABLE 6

RELATION OF ANGLE OF REPOSE TO LEAD-SHOT DIAMETER*

Diameter of Lead Shot         Angle of Slope of Repose
(In Hundredths of Inches)     (Degrees)

         5                         19.55
         6                         18.20
         7                         17.50
         8                         16.81
         9                         15.76
        10                         14.81
        11                         14.13
        12                         13.71
        13                         13.26
        14                         12.84
        15                         12.75

* Data from A. Van Burkalow (1945, p. 679).

FIG. 10. A, linear regression of angle of repose on diameter of lead shot (Van Burkalow, 1945). (See table 6 for data.) B, logarithmic regression of same data as in A. (Horizontal axes: A, lead shot diameter in hundredths of inches; B, logarithm of diameter.)

FITTED REGRESSION EQUATION

The first step in regression analysis is to find that regression line which best fits the series of points (Goulden, 1939, p. 52-64; Croxton and Cowden, 1946, p. 654-673; Dixon and Massey, 1951, p. 153-172). Such a line is shown in figure 10, A. Because we are fitting a straight line to the points, using arithmetic scales on both the ordinate and the abscissa, the regression is described as linear and will be represented by an equation of the form

$$Y = a + bX, \tag{13}$$

in which a and b are the two constants which determine the position and slope of the regression line. When a is varied, the line is displaced up or down on the graph, in a motion parallel with the Y-axis. When b is varied, the slope of the line changes. If the sign of b is plus, the line slopes upward to the right, indicating that as X is increased, Y also increases. When the sign is minus, the line slopes down to the right, indicating an inverse relationship in which the values of Y decrease as X is increased. The term b is called the regression coefficient.
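The constants a and b of equation (13) can be obtained from the table 6 data with the ordinary least-squares formulas, sketched below in Python. The same pass also yields the standard error of estimate and the t statistic that the paper goes on to discuss for this regression. Function and variable names are hypothetical, and the closed-form least-squares expressions are the standard textbook ones, assumed rather than quoted from the paper.

```python
import math

# Table 6: shot diameter (hundredths of an inch) and angle of repose (degrees).
diameters = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
angles = [19.55, 18.20, 17.50, 16.81, 15.76, 14.81, 14.13, 13.71, 13.26, 12.84, 12.75]

def fit_line(xs, ys):
    """Least-squares constants a and b of Y = a + bX, plus the sum of squares of x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b, sxx

def standard_error_of_estimate(xs, ys, a, b):
    """Scatter of observed y about the fitted line, with divisor N - 2."""
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(residual_ss / (len(xs) - 2))

a, b, sxx = fit_line(diameters, angles)
s_yx = standard_error_of_estimate(diameters, angles, a, b)
t = b * math.sqrt(sxx) / s_yx  # slope times dispersion of x, over the scatter

print(f"Y = {a:.1f} + ({b:.2f})X, S_y.x = {s_yx:.2f}, t = {t:.1f}")
```

Run on the eleven pairs of table 6, the sketch agrees with the rounded figures quoted in the paper for the fitted line, the standard error of estimate, and t.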


Fitting of the regression is done according to a method of least squares, the details of which are readily available in statistical texts. In figure 10, A, the regression line has been fitted to the lead-shot-slope-angle data and has the equation

$$Y = 22.3 - 0.69X. \tag{13a}$$

It is evident that the points form a concave-up curve which is not ideally fitted by the straight line, but it is the best possible under the supposition that the relationship between shot diameter and slope is basically a straight line.

Although in the example given here one would scarcely need to be concerned as to whether or not the trend of points in the regression could be due to a chance occurrence of values, mathematical statistics offers a means of testing this possibility. As with the frequency-distribution study of sample data, there is a measure of dispersion of the individual points about a central value. In regression this is a measure of the departures of the points above or below the regression line, measured in vertical lines parallel with the Y-axis. In figure 10, A, this is illustrated by short lines connecting the individual points with the regression line. The length of each line is equal to (y - Y), where y is the ordinate of the observed point and Y is the ordinate on the fitted regression, both y-values being associated with the same x-value. In this way the deviations of observed points from the fitted line can be measured. The measure of dispersion is analogous to the standard deviation in frequency distribution and is defined as

$$\sigma_{y.x} = \sqrt{\frac{\sum (y - Y)^2}{N}}, \tag{14}$$

where $\sigma_{y.x}$ is the scatter, Y and y are as explained above, and N is the number of points, or pairs. An important point is that the method of fitting the regression line by least squares has yielded that one line about which the scatter is the least of all possible lines.

It will be recalled that in a frequency-distribution analysis of a sample the statistic σ described the standard deviation in the sample, but it was necessary to derive the statistic s in order to estimate the standard deviation of the population from which the sample was taken. Similarly, in order to derive an estimate of the scatter of the population of y-values about the regression line, we derive the standard error of estimate denoted by the symbol $S_{y.x}$, where

$$S_{y.x} = \sqrt{\frac{\sum (y - Y)^2}{N - 2}}. \tag{15}$$

In the lead-shot-slope-angle regression the standard error of estimate was 0.48°, meaning that roughly two-thirds of the y-values would be expected to fall within about ½° of slope angle represented by the regression line for any particular specified shot diameter.

TEST OF SIGNIFICANCE

In considering a way of testing the significance of the regression, three principal factors enter in: (1) the degree of scatter, (2) the slope of the regression line, and (3) the number of points used to determine the line. We realize intuitively that if scatter is small, the observed trend is not likely to be due to chance, whereas a large scatter would suggest that the trend might be due to chance alone. It can also be appreciated that if the regression line has a very low slope, so as to approach parallelism with the X-axis, the trend is a weak one and might be due to chance. From this it can be reasoned that the significance of a regression might be measured by a statistic which varies di-


rectly as the slope of the regression line and inversely as the amount of scatter. We set up the null hypothesis that the regression coefficient is equal to zero. To test this, the statistic used is t, and is derived here as follows:

$$t = \frac{b \sqrt{\sum (x - \bar{x})^2}}{S_{y.x}}. \tag{16}$$

The term b in the numerator is the regression coefficient, or slope of the regression line; the term $S_{y.x}$ in the denominator is the standard error of estimate. Both have been explained above. In addition, there is in the numerator the term $\sqrt{\sum (x - \bar{x})^2}$, in which $\bar{x}$ is the mean value of all the individual x-values in the series. This is therefore a measure of the dispersion of the observed x-values about a mean x-value.

The value of t in the illustrative case was computed as follows:

$$t = \frac{(-0.69)(10.49)}{0.48} = -15.1 \text{ approx.}$$

We refer to the table of t-distributions under 9 degrees of freedom (two less than the number of pairs in the regression), and find that the probability, P, is very much less than 0.001. In other words, there is only a remote possibility of obtaining the observed regression by chance alone if similar numbers of pairs are repeatedly drawn. The null hypothesis is rejected, and the slope of the regression is considered significant. Note that the door is left open to the remote possibility of obtaining this assemblage of paired observations by chance alone when no trend actually exists, and we shall never be absolutely positive that such is not the case here. The scientist must gamble in the final analysis, but he wants to be sure he has the odds overwhelmingly in his favor.

By contrast, the second illustration of a linear-regression study in geomorphic research takes us to an opposite extreme of probability. In studying the amount of summer erosion on badland slopes, Stanley Schumm (1953) drove a large number of long dowel rods into the clay slopes along profile lines extending from divide to slope base. The rods were driven flush with the slope and were spaced 1 foot apart along the profile lines. In all, there were 16 profile lines and a total of 113 rods, limiting the number to those which were embedded along essentially straight parts of the slope. It is not possible here to explain the procedures fully, but 6 weeks later the depth of material removed from the slope by summer rains was measured by the extent to which the ends of the dowel rods projected above the surface. Most of the depths were between 0.4 and 1.6 inches.

A matter of particular geomorphic interest was to determine whether the slopes had (1) reclined in angle, (2) steepened in angle, or (3) maintained a parallel attitude. This could be determined if depth of erosion were correlated with distance downslope from the divide, since, if the slope had reclined to a lower angle, the trend of removal must be one decreasing from top to base. In order to compare distances on the 16 different slopes, Schumm used a dimensionless value: percentage of total distance from base to top. The next step was to plot the 113 points on a regression diagram (fig. 11) and to fit a regression line to the points in the manner already described. The equation is as follows:

$$Y = 0.92 + 0.00021X.$$

Note that the trend or slope is very slight, as indicated by the exceedingly low value of the regression coefficient and by the fact that the regression line seems

almost parallel with the X-axis. The standard error of estimate, $s_{y.x}$, is quite high (0.35 inch). Intuitively it seems most unlikely that under these conditions the regression is a significant one. A t-test of significance, following the method described above, yielded a probability of 0.80, or 80 per cent. Thus, if no real trend existed, in 80 times out of 100 we should expect so great a trend or a greater one through chance alone if similar sets of measurements were repeatedly made in this area. The conclusion, so far as the geomorphic interpretation is concerned, is that we have no reason to doubt that the slopes are retreating in a parallel manner. We have not ruled out the possibility that reclining or steepening retreat is actually occurring, but the data do not suggest it. A geomorphologist who chooses to think that slopes retreat in parallel planes may continue to think so, in so far as this evidence bears on the question.

FIG. 11.-Linear regression of depth of erosion on percentage of distance from top to bottom of slope (Schumm, 1953). Horizontal axis: Distance (%).

NONLINEAR (LOGARITHMIC) REGRESSION

The regression previously discussed assumed that a linear equation (Y = a + bX) gave the best possible description of the relationship between dependent and independent variables. This is not always so; other functions may be better. One function that is commonly encountered in geomorphic research is the logarithmic function,

$$\log Y = \log a + b \log X, \tag{17}$$

in which the terms are the same as in linear regression except that the logarithm of the term is used where indicated. Another way of stating this same function is

$$Y = aX^{b}, \tag{18}$$

which is called a power function, inasmuch as the independent variable is raised to a power b. Plotted on log-log graph paper, a logarithmic or power function is a straight line.

Figure 10, B, is a log-log plot of Van Burkalow's lead-shot-repose-angle data. Notice that the points fall close to the fitted logarithmic function and show no tendency to produce a concave-up trend, as they do on the linear, or arithmetic, plot (fig. 10, A).

Unquestionably, the logarithmic analysis provides the better fit, and this can be tested statistically. In figure 10, B, the fitted function is

$$\log Y = 1.57 - 0.39 \log X \tag{19}$$

or

$$Y = 36.8X^{-0.39}. \tag{19a}$$

According to the requirements of the data, an exponential (semilogarithmic) function of the form

$$y = y_0 e^{ax} \tag{20}$$

or

$$\log_e\left(\frac{y}{y_0}\right) = ax, \tag{20a}$$

or quadratic or cubic equations of the form

$$Y = a + bX + cX^2 \tag{21}$$

or

$$Y = a + bX + cX^2 + dX^3 \tag{22}$$
may be fitted to the data. Of these last-mentioned forms, the exponential (semilogarithmic) equation is remarkably well adapted to many geomorphic forms and is used extensively in stream profiles and other fluvial forms (Krumbein, 1937).

CORRELATION

In correlation studies we are concerned simply with demonstrating the existence (or lack of existence) of a relationship between two sets of numerical values taken in pairs. An example in geomorphic research is afforded by the work of Kenneth G. Smith (1953) in the Big Badlands of South Dakota. Smith systematically measured the lengths and the slope angles of the steep badland slopes bordered by pediments. A plot of slope length versus slope angle is shown in figure 12. There were 134 pairs of variates altogether. Examination of the scatter diagram, figure 12, might indicate that a significant correlation is doubtful because the degree of scatter is great. The trend appears to be an increasing one but is very steep, almost paralleling the Y-axis. The degree to which correlation exists may be represented by the statistic r, termed the correlation coefficient (Goulden, 1939, p. 65-77; Croxton and Cowden, 1946, p. 653-654; Dixon and Massey, 1951, p. 162-165), which is defined as follows:

$$r = \sqrt{b_{yx}\, b_{xy}} \tag{23}$$

The term b means "regression coefficient," or slope of a regression line, as used previously in the regression analysis. The modified symbol $b_{yx}$ refers to the regression coefficient that results when x is assumed to be the independent variable; $b_{xy}$ is the regression coefficient when y is assumed to be the independent variable.⁴ That is to say, one can fit two regression lines to the correlation data and thus determine two possible regression coefficients. In the perfect case, with all points lying in a straight line, the product of the coefficients would be unity, and the scatter would be zero.

⁴ In the event that the values of $b_{yx}$ and $b_{xy}$ are negative, indicating that y decreases as x increases, a negative sign should be applied to r.

FIG. 12.-Correlation of slope length with slope angle on steep badland slopes bordering miniature pediments in the lower Brule formation, Big Badlands, South Dakota (Smith, 1953). Horizontal axis: Slope angle, degrees.

In the slope-length versus slope-angle correlation shown in figure 12, the correlation coefficient, r, is 0.485. Because an ideal, or perfect, correlation is represented by a value of r = 1.0, it may seem likely that we are dealing with a questionable correlation. A test of significance of the correlation makes use of the statistic t, defined as follows:

$$t = \frac{r\sqrt{N - m}}{\sqrt{1 - r^2}}, \tag{24}$$
where r is the correlation coefficient, N is the number of pairs of variates, and m is the number of degrees of freedom lost (two in this case). Substituting in equation (24), we obtain

$$t = \frac{0.485\sqrt{134 - 2}}{\sqrt{1 - (0.485)^2}} = \frac{5.57}{0.875} = 6.4 \text{ approx.}$$

The t-distribution table shows the probability to be very much less than 0.001 (0.1 per cent). The correlation is thus definitely significant, because, if similar sets of sample pairs were drawn repeatedly from this population, the chances of obtaining as good a correlation by chance alone would be extremely small (much less than 1 chance in 1,000) if no correlation existed. The geomorphologist may therefore conclude that an increase in slope angle is, in general, associated with an increase in slope length in the area in which the samples were taken. What possible cause is involved is left for him to decide in the light of other data and experience. In this instance Smith formulated a reasonable hypothesis to explain the observed correlation. The statistical test served to show beyond question that a correlation was present, and this would have been very much in doubt from mere visual appraisal of the scatter diagram. The relationship would have completely escaped a field observer relying solely upon unaided eyesight. It is in such phases of the investigation that systematic quantitative sampling and testing can serve to bring forward new data in an otherwise limited and unproductive field study.

SUMMARY STATEMENT

Only a few of the most simple operations and tests of statistics have been brought into this discussion. Computational instructions and explanations of theory have necessarily been omitted, because these are within easy reach of all researchers.

The geomorphologist who seeks to go beyond mere verbal descriptions of scenery to a rigorous, quantitative approach will invariably be taking limited samples from vastly larger populations. Statistical methods offer means of developing unbiased, objective sampling procedures and of extracting the maximum possible information from the numerical data thus recorded. Testing procedures guard against the use of unreliable data as the basis of drawing unwarranted conclusions. On the other hand, the statistical tests may give increased confidence in stating conclusions where significant differences and trends can be demonstrated. As quantitative analysis opens up vast new areas of study of land forms and the processes involved in their development, the methods of statistics will become an indispensable part of geomorphic research.

ACKNOWLEDGMENTS.-The author is indebted to the Statistical Consulting Service of the Department of Mathematical Statistics of Columbia University, and to Mr. M. Vernon Johns, Jr., of that department for advice on the application of various tests to geomorphic problems. The unpublished research work of several of the author's colleagues and graduate students has been drawn upon for illustrative examples; individual acknowledgments are inserted at appropriate places in the text. The greater part of the illustrative examples are from a research project supported by the Geography Branch of the Office of Naval Research, Project no. NR-089-042, under contract N6-ONR-271, Task Order 30, with Columbia University. It is intended that this paper will serve as a general guide in current and future studies under this project, which has as its purpose the quantitative investigation of the erosional land forms.
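
The two significance tests worked through in the text, equation (16) for the regression coefficient and equation (24) for the correlation coefficient, can be checked with a short computation. The Python sketch below is an illustration only and is not part of the original study. The function names `regression_t` and `correlation_t` are hypothetical. The critical values of t at P = 0.001 are assumptions taken from standard two-tailed t-tables rather than from this paper: 4.781 for 9 degrees of freedom, and 3.373 for 120 degrees of freedom, the latter used as a conservative stand-in for the 132 degrees of freedom of the correlation example.

```python
import math

# Two-tailed critical values of t at P = 0.001, assumed from standard tables.
T_CRIT_001_DF9 = 4.781     # 9 degrees of freedom (regression example)
T_CRIT_001_DF120 = 3.373   # 120 d.f., conservative stand-in for 132 d.f.


def regression_t(b, root_sum_sq_x_dev, s_yx):
    """Equation (16): t = b * sqrt(sum((x - xbar)^2)) / s_yx.

    b                  regression coefficient (slope of the regression line)
    root_sum_sq_x_dev  sqrt(sum((x - xbar)^2)), dispersion of the x-values
    s_yx               standard error of estimate
    """
    return b * root_sum_sq_x_dev / s_yx


def correlation_t(r, n_pairs, m=2):
    """Equation (24): t = r * sqrt(N - m) / sqrt(1 - r^2).

    r        correlation coefficient
    n_pairs  number of pairs of variates, N
    m        degrees of freedom lost (two in the case discussed)
    """
    return r * math.sqrt(n_pairs - m) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    # Values quoted in the text for the first regression illustration.
    t_reg = regression_t(b=-0.69, root_sum_sq_x_dev=10.49, s_yx=0.48)
    print(f"regression t  = {t_reg:.1f}")    # -15.1
    print("P < 0.001:", abs(t_reg) > T_CRIT_001_DF9)

    # Values quoted in the text for Smith's slope-length and slope-angle data.
    t_cor = correlation_t(r=0.485, n_pairs=134)
    print(f"correlation t = {t_cor:.1f}")    # 6.4
    print("P < 0.001:", abs(t_cor) > T_CRIT_001_DF120)
```

Run as a script, the sketch reproduces the values t = -15.1 and t = 6.4 quoted in the text. Both exceed the assumed tabled values in absolute magnitude, which agrees with the statement that the probability is very much less than 0.001 in each case.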


REFERENCES CITED

CROXTON, F. E., and COWDEN, D. J., 1946, Applied general statistics: New York, Prentice-Hall, Inc.
DIXON, W. J., and MASSEY, F. J., JR., 1951, Introduction to statistical analysis: New York, McGraw-Hill Book Co., Inc.
FISHER, R. A., and YATES, F., 1938, Statistical tables for biological, agricultural, and medical research: Edinburgh, Oliver and Boyd.
GOULDEN, C. H., 1939, Methods of statistical analysis: New York, John Wiley and Sons, Inc.
KRUMBEIN, W. C., 1937, Sediments and exponential curves: Jour. Geology, v. 45, p. 577-601.
MILLER, VICTOR C., 1953, A quantitative geomorphic study of drainage basin characteristics in the Clinch Mountain area of Virginia and Tennessee: Unpublished doctoral dissertation, Columbia University.
SCHUMM, STANLEY A., 1953, Erosion measured on badland slopes: Unpublished manuscript of paper read at Am. Geophys. Union meetings, Washington, D.C., May, 1953. (Abstract printed in 34th Annual Meeting program, p. 348.)
SMITH, KENNETH G., 1953, Erosional processes and landforms in Badlands National Monument, South Dakota: Unpublished doctoral dissertation, Columbia University.
STRAHLER, A. N., 1950, Equilibrium theory of erosional slopes approached by frequency distribution analysis: Am. Jour. Sci., v. 248, p. 673-696, 800-814.
VAN BURKALOW, A., 1945, Angle of repose and angle of sliding friction: an experimental study: Bull. Geol. Soc. America, v. 56, p. 669-708.
