Costner (1965) showed that many of the measures of association most commonly used in the social sciences may be interpreted as indicating the proportional reduction in error achieved by predicting categories, pair orders, ranks, or values
based on the bivariate distribution of observations as opposed to the distribution
for a single variable or a condition of independence between two variables. Although not an exhaustive list, the following may be considered proportional
reduction in error measures: Goodman and Kruskal's lambda (Guttman's coefficient of relative predictability), Goodman and Kruskal's tau-b, Goodman and
Kruskal's gamma, Yule's Q, Pearson's r², and the correlation ratio (eta squared) (Costner, 1965); Somers' d_yx and d_xy (Somers, 1968; Costner, 1968); Kendall's tau
(Wilson, 1969); the square of Spearman's rho (ρ) (Mueller et al., 1970); and
Freeman's theta (Crittenden and Montgomery, 1980).
The most significant advantage to be derived from this conceptual approach
to measures of association is that it would elucidate the interpretation of research
findings which focus on the analysis of bivariate relationships. That is, instead of
the previously standard interpretation of observed measures of association as indicating either no relationship or a relationship whose degree is abstractly evaluated as weak, moderate, or strong (cf., Davis, 1971; Gehring, 1978; Levin, 1977;
Ott, Mendenhall, and Larson, 1978), the proportional reduction in error approach provides a clear conceptual basis for the interpretation of such results.
Proportional reduction in error has been increasingly included as a heuristic
device in statistics textbooks (cf., Blalock, 1972; Leonard, 1976; Loether and
McTavish, 1974; Mueller, Schuessler, and Costner, 1977; Ott, Mendenhall, and
Larson, 1978; Reynolds, 1977). Additionally, it continues to be used in the development of new measures of association. For example, Crittenden and Montgomery (1980) recently introduced two asymmetric measures of association (nu
and iota) for cases involving a nominal level independent variable and an ordinal
level dependent variable. But the proportional reduction in error interpretation
has not been used widely in reports of research findings. This is probably because the terminology is cumbersome and because a unique interpretation is
required for each measure. For example, it is rarely reported that a lambda value
of .57 indicates that errors committed when predicting the marginal
modal category for the dependent variable for all observations are reduced by 57
percent by predicting the conditional modal category for the dependent variable
within categories of the independent variable. Similarly, it is not frequently reported that a gamma value of .62 indicates that errors committed when predicting pair orders by random guessing are reduced by 62 percent by predicting the
order of the most prevalent type of ordered pair observed.
© 1981 by The Sociological Quarterly. All rights reserved. 0038-0253/81/1400-0413$00.75
* The author is grateful to Herbert L. Costner and anonymous reviewers for helpful comments on drafts of
this manuscript. Frederick J. Kviz's address is: University of Illinois-Medical Center, 2121 West Taylor
Street, Chicago, Illinois 60680.
This paper argues for a single, convenient interpretation of proportional reduction in error measures in terms already familiar to social scientists. Specifically,
all proportional reduction in error measures may be interpreted as indicating the
percentage of variation explained in a manner similar to that which is typically
employed in the interpretation of the square of the bivariate linear correlation
coefficient (Pearson's r²). This interpretation is derived directly from the proportional reduction in error computational format that is common to all such
measures.
\[ \mathrm{PRE} = \frac{E_1 - E_2}{E_1} \]
where PRE = proportional reduction in error,
E1 = prediction errors committed using prediction rule 1, and
E2 = prediction errors committed using prediction rule 2.
The magnitude of a proportional reduction in error measure indicates the proportion of prediction errors committed using prediction rule 1 that are eliminated
by switching to prediction rule 2.
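The computational format just described can be sketched in a few lines of Python. This is only an illustration: the function name and the error counts (40 and 10) are hypothetical and do not come from the article.

```python
def proportional_reduction_in_error(e1: float, e2: float) -> float:
    """Return PRE = (E1 - E2) / E1.

    e1: prediction errors committed using prediction rule 1
    e2: prediction errors committed using prediction rule 2
    """
    if e1 <= 0:
        # With no rule-1 errors there is nothing to reduce, so PRE is undefined.
        raise ValueError("E1 must be positive")
    return (e1 - e2) / e1


# Hypothetical counts: rule 1 commits 40 errors, rule 2 commits 10.
pre = proportional_reduction_in_error(40, 10)
print(f"PRE = {pre:.2f}")  # PRE = 0.75
```

In this made-up case, switching to rule 2 eliminates 75 percent of the errors committed under rule 1.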
pair order, rank, or value. Variation is not unique to interval level variables and
its measurement is not restricted to the computation of the variance, even at the
interval level.
Furthermore, deviation from a predicted category, pair order, rank, or value
may be generally termed prediction error. Therefore, prediction error, as defined
in proportional reduction in error terminology, is equivalent to variation as the
term may be generally defined. This identity was pointed out by Senter (1969:
429), who noted that, contrary to everyday usage, the statistical definition of
error is equivalent to variation. Senter maintained, therefore, that "Reduction
in error, statistically speaking, means reduction in variability" (emphasis in original; also see Leonard, 1976:326).
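Furthermore, because total variation may be partitioned into explained and unexplained variation (Total Variation = Explained Variation + Unexplained
Variation), explained variation can be expressed
\[ \text{Explained Variation} = \text{Total Variation} - \text{Unexplained Variation} \qquad (3) \]
and the proportion of variation explained is
\[ \frac{\text{Explained Variation}}{\text{Total Variation}} \qquad (4) \]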
(R_x × ρ) + ((N + 1)/2)(1 - ρ), and R, =
Goodman and Kruskal's gamma, probably the most often used measure of association between ordinal variables, indicates the proportional reduction in error
when predicting ordered pairs of observations first by random guessing (rule 1)
and then by examination of the relative preponderance of same- and reverse-ordered pairs (also referred to as concordant and discordant pairs, or agreements
and inversions, respectively) actually observed (rule 2). Total variation (E1) is
equal to one-half the total number of ordered pairs because the probability of
making a correct prediction by random guessing for a set of dichotomous categories is .5. According to rule 2, each pair is predicted to be either same- or
reverse-ordered depending on which type of pair occurs most frequently, and unexplained variation (E2) is equal to the number of same- or reverse-ordered
pairs, whichever is smaller. Furthermore, this interpretation may be extended to
Yule's Q (which is identical to gamma for a 2x2 cross-classification), Somers'
d_yx and d_xy, and Kendall's tau once ties are taken into account, because their structure differs only slightly from that of gamma regarding the treatment of tied pairs.
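The two prediction rules for gamma can be illustrated with a short Python sketch. The helper names and the six paired observations are hypothetical and are used only to show that the proportional reduction in error computed from E1 and E2 equals the absolute value of gamma. Pairs tied on either variable are ignored, as in gamma itself.

```python
from itertools import combinations


def count_pairs(x, y):
    """Count same-ordered and reverse-ordered pairs, skipping tied pairs."""
    same = reverse = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            same += 1
        elif product < 0:
            reverse += 1
    return same, reverse


def gamma_as_pre(x, y):
    """Return (gamma, PRE), with E1 and E2 defined as in the text."""
    same, reverse = count_pairs(x, y)
    untied = same + reverse
    if untied == 0:
        raise ValueError("no untied pairs; gamma is undefined")
    e1 = 0.5 * untied        # rule 1: random guessing errs on half the pairs
    e2 = min(same, reverse)  # rule 2: predict the more prevalent pair type
    gamma = (same - reverse) / untied
    return gamma, (e1 - e2) / e1


# Hypothetical ordinal scores for six observations.
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 1, 3, 2, 3]
gamma, pre = gamma_as_pre(x, y)
print(f"gamma = {gamma:.3f}, PRE = {pre:.3f}")
```

The sign of gamma carries the direction of the relationship, while its absolute value carries the proportion of pair-order prediction errors eliminated.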
Finally, an example of the applicability of the percentage of variation explained interpretation for nominal level variables is Goodman and Kruskal's
lambda, which measures total variation as the number of observations which are
not located within the modal category for the predicted variable. In other words,
E1 = N - f_m, where N = the total number of observations and f_m = the number of observations located within the modal category. If this expression is divided
by N, the result is equivalent to the variation ratio. Unexplained variation (E2)
in the computation of lambda is equal to the total number of observations which
are not located within the modal category of the predicted variable when predictions are made within each category of a second variable.
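As a rough illustration of this computation, the following Python sketch derives E1, E2, and lambda from a cross-classification. The function name and the 2x2 table of counts are hypothetical. Rows are assumed to be categories of the independent variable and columns categories of the dependent (predicted) variable.

```python
def lambda_as_pre(table):
    """Return (E1, E2, lambda) for a table of counts.

    table: list of rows; each row holds the counts for one category of the
    independent variable across the categories of the dependent variable.
    """
    n = sum(sum(row) for row in table)
    column_totals = [sum(column) for column in zip(*table)]
    # E1: observations outside the marginal modal category (N - f_m).
    e1 = n - max(column_totals)
    # E2: observations outside the conditional modal category of each row.
    e2 = sum(sum(row) - max(row) for row in table)
    if e1 == 0:
        raise ValueError("no variation in the dependent variable")
    return e1, e2, (e1 - e2) / e1


# Hypothetical counts for 70 observations.
table = [
    [30, 10],
    [5, 25],
]
e1, e2, lam = lambda_as_pre(table)
print(f"E1 = {e1}, E2 = {e2}, lambda = {lam:.2f}")  # E1 = 35, E2 = 15, lambda = 0.57
```

Under the interpretation proposed here, a lambda of .57 would be read as 57 percent of the variation in the dependent variable being explained by the independent variable.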
Discussion
The percentage of variation explained interpretation is most appropriate for
asymmetrical measures of association, for which the direction of prediction from
an independent variable to a dependent variable is unambiguous. Caution must
be exercised in the case of symmetrical measures (e.g., Goodman and Kruskal's
gamma, Kendall's tau, and Pearson's r²) because a result may be interpreted in
either of two directions; that is, as indicating the percentage of variation in the
first variable that is explained by the second variable or as the percentage of variation in the second variable that is explained by the first variable. Detailed discussions of this problem as it pertains to the interpretation of Pearson's r², and
which may be generalized to other symmetrical proportional reduction in error
measures, are presented in many texts (cf., Blalock, 1972; Korin, 1975; Loether
and McTavish, 1974; Mueller, Schuessler, and Costner, 1977).
It is also important to guard against the inappropriate use of this interpretation with certain correlation coefficients, such as Pearson's r, eta, and Spearman's
rho, which are not proportional reduction in error measures. The squares of these
coefficients, however, are proportional reduction in error measures and may
therefore be interpreted as indicating the percentage of variation explained. In
a typical Pearson correlation analysis, for example, the most often reported result
is the correlation coefficient, r. The main disadvantage of squared coefficients
such as r² is that they are always positive values and therefore do not indicate
the direction of a relationship. But although coefficients such as r, which may
range in value from -1.00 to 1.00, indicate direction, there is no convenient
conceptual basis for interpreting them as indicators of the strength of a relationship. Although many researchers do attempt to consider the size of r to evaluate
the strength of a relationship, this can be seriously misleading because the relationship between r and the percent of variation explained is not linear. As a
result, rather high values of r may be observed when much less than 50 percent
of the variation is explained. For example, when r = .50, only 25 percent of the
variation is explained; when r = .60, 36 percent of the variation is explained;
and even when r is as high as .70, only 49 percent of the variation is explained.
Therefore, both values, r and r², should be reported for a Pearson correlation
analysis. Similarly, both the unsquared and squared coefficients should be reported for analyses in which eta and Spearman's rho are computed.
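The nonlinear relationship between r and the percentage of variation explained can be checked with a short Python sketch. The function name is hypothetical; the three values of r are those used in the text.

```python
import math


def percent_variation_explained(r: float) -> float:
    """Return 100 * r**2 for a correlation coefficient r in [-1, 1]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    return 100.0 * r ** 2


for r in (0.50, 0.60, 0.70):
    print(f"r = {r:.2f}: {percent_variation_explained(r):.0f} percent of variation explained")

# Smallest |r| at which half of the variation is explained: sqrt(.50), about .71.
print(f"r needed for 50 percent: {math.sqrt(0.5):.2f}")
```

Reporting r alongside r² keeps the direction of the relationship while making the strength directly readable.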
The reader is especially cautioned regarding possible confusion between the
terms "variation" and "variance." Variation is a general term referring to the spread
or dispersion of observations on any variable and, as described earlier in this
paper, may be measured by various methods according to the level of measurement and the purpose at hand. Variance is one particular method for measuring
variation among interval level observations. Therefore, the terminology "percentage of variance explained," which is most familiar from Pearson correlation analysis, may be substituted when, and only when, prediction errors have been
measured as variance from a mean value. In all other cases, the general term
"variation" applies.
A major advantage of the proportional reduction in error interpretation is that
it communicates information regarding the nature or form of a relationship in
accordance with the specification of the prediction rules and prediction errors
for each measure. Although the alternate interpretation as percentage of variation explained is less precise in this regard, it nevertheless provides a conceptually
useful and convenient universal interpretive approach for all proportional reduction in error measures.
Just as Costner (1965) argued that the proportional reduction in error interpretation obviates the arbitrary approach to interpreting observed values of a
measure of association in terms of broad levels of degree of strength, so does
the percentage of variation explained interpretation. Furthermore, a more useful
evaluation of the strength of a relationship is provided by the latter approach. For
example, because it is desirable to explain a majority, if not all, of the variation
observed for a variable, a conceptual rather than arbitrary cutting-point for distinguishing between a weak and strong relationship, as indicated by a proportional reduction in error measure, might be set at ±.50. This would define a
strong relationship as one where at least 50 percent of the variation is explained
and a weak relationship as one where less than 50 percent of the variation is
explained.
This is only a suggested guideline, however. Additionally, evaluations of
strength must also consider the quality of the data and the purpose of the analysis. That is, when exploring an area in which little or no research has been conducted previously, or valid and reliable measurement methods have not been
developed, it would be wise for the investigator not to ignore relatively weak relationships because they may indicate areas that warrant further investigation. In
contrast, when in an area in which a considerable amount of research has already
been reported, and when using measurement methods whose validity and reliability are well established, the investigator may focus only on relationships that
are relatively strong.
The percentage of variation explained interpretation contributes greatly to the
evaluation of the substantive significance of research findings where proportional