2 MARCH, 1959
Psychological Bulletin
CONVERGENT AND DISCRIMINANT VALIDATION BY THE
MULTITRAIT-MULTIMETHOD MATRIX1
DONALD T. CAMPBELL
Northwestern University
AND DONALD W. FISKE
University of Chicago
[Table 1: hypothetical multitrait-multimethod matrix for Traits A, B, and C as measured by Methods 1, 2, and 3. The cell values are not recoverable from this copy.]
and may not be met even when the validity coefficients are of substantial size. In Table 1, all of the validity values meet this requirement. A third common-sense desideratum is that a variable correlate higher with an independent effort to measure the same trait than with measures designed to get at different traits which happen to employ the same method. For a given variable, this involves comparing its values in the validity diagonals with its values in the heterotrait-monomethod triangles. For variables A1, B1, and C1, this requirement is met to some degree. For the other variables, A2, A3 etc., it is not met and this is probably typical of the usual case in individual differences research, as will be discussed in what follows. A fourth desideratum is that the same pattern of trait interrelationship be shown in all of the heterotrait triangles of both the monomethod and heteromethod blocks. The hypothetical data in Table 1 meet this requirement to a very marked degree, in spite of the different general levels of correlation involved in the several heterotrait triangles. The last three criteria provide evidence for discriminant validity.

Before examining the multitrait-multimethod matrices available in the literature, some explication and justification of this complex of requirements seems in order.

Convergence of independent methods: the distinction between reliability and validity. Both reliability and validity concepts require that agreement between measures be demonstrated. A common denominator which most validity concepts share in contradistinction to reliability is that this agreement represent the convergence of independent approaches. The concept of independence is indicated by such phrases as "external variable," "criterion performance," "behavioral criterion" (American Psychological Association, 1954, pp. 13-15) used in connection with concurrent and predictive validity. For construct validity it has been stated thus: "Numerous successful predictions dealing with phenotypically diverse 'criteria' give greater weight to the claim of construct validity than do ... predictions involving very similar behavior" (Cronbach & Meehl, 1955, p. 295). The importance of independence recurs in most discussions of proof. For example, Ayer, discussing a historian's belief about a past event, says "if these sources are numerous and independent, and if they agree with one another, he will be reasonably confident that their account of the matter is correct" (Ayer, 1954, p. 39). In discussing the manner in which abstract scientific concepts are tied to operations, Feigl speaks of their being "fixed" by "triangulation in logical space" (Feigl, 1958, p. 401).

Independence is, of course, a matter of degree, and in this sense, reliability and validity can be seen as regions on a continuum. (Cf. Thurstone, 1937, pp. 102-103.) Reliability is the agreement between two efforts to measure the same trait through maximally similar methods. Validity is represented in the agreement between two attempts to measure the same trait through maximally different methods. A split-half reliability is a little more like a validity coefficient than is an immediate test-retest reliability, for the items are not quite identical. A correlation between dissimilar subtests is probably a reliability measure, but is still closer to the region called validity.

Some evaluation of validity can take place even if the two methods
are not entirely independent. In Table 1, for example, it is possible that Methods 1 and 2 are not entirely independent. If underlying Traits A and B are entirely independent, then the .10 minimum correlation in the heterotrait-heteromethod triangles may reflect method covariance. What if the overlap of method variance were higher? All correlations in the heteromethod block would then be elevated, including the validity diagonal. The heteromethod block involving Methods 2 and 3 in Table 1 illustrates this. The degree of elevation of the validity diagonal above the heterotrait-heteromethod triangles remains comparable and relative validity can still be evaluated. The interpretation of the validity diagonal in an absolute fashion requires the fortunate coincidence of both an independence of traits and an independence of methods, represented by zero values in the heterotrait-heteromethod triangles. But zero values could also occur through a combination of negative correlation between traits and positive correlation between methods, or the reverse. In practice, perhaps all that can be hoped for is evidence for relative validity, that is, for common variance specific to a trait, above and beyond shared method variance.

Discriminant validation. While the usual reason for the judgment of invalidity is low correlations in the validity diagonal (e.g., the Downey Will-Temperament Test [Symonds, 1931, p. 337ff]) tests have also been invalidated because of too high correlations with other tests purporting to measure different things. The classic case of the social intelligence tests is a case in point. (See below and also [Strang, 1930; R. Thorndike, 1936].) Such invalidation occurs when values in the heterotrait-heteromethod triangles are as high as those in the validity diagonal, or even where within a monomethod block, the heterotrait values are as high as the reliabilities. Loevinger, Gleser, and DuBois (1953) have emphasized this requirement in the development of maximally discriminating subtests.

When a dimension of personality is hypothesized, when a construct is proposed, the proponent invariably has in mind distinctions between the new dimension and other constructs already in use. One cannot define without implying distinctions, and the verification of these distinctions is an important part of the validational process. In discussions of construct validity, it has been expressed in such terms as "from this point of view, a low correlation with athletic ability may be just as important and encouraging as a high correlation with reading comprehension" (APA, 1954, p. 17).

The test as a trait-method unit. In any given psychological measuring device, there are certain features or stimuli introduced specifically to represent the trait that it is intended to measure. There are other features which are characteristic of the method being employed, features which could also be present in efforts to measure other quite different traits. The test, or rating scale, or other device, almost inevitably elicits systematic variance in response due to both groups of features. To the extent that irrelevant method variance contributes to the scores obtained, these scores are invalid.

This source of invalidity was first noted in the "halo effects" found in ratings (Thorndike, 1920). Studies of individual differences among laboratory animals resulted in the recognition of "apparatus factors," usually more dominant than psychological process factors (Tryon, 1942). For paper-and-pencil tests, methods variance has been noted under such terms as "test-form factors" (Vernon: 1957, 1958) and "response sets" (Cronbach: 1946, 1950; Lorge, 1937). Cronbach has stated the point particularly clearly: "The assumption is generally made . . . that what the test measures is determined by the content of the items. Yet the final score . . . is a composite of effects resulting from the content of the item and effects resulting from the form of the item used" (Cronbach, 1946, p. 475). "Response sets always lower the logical validity of a test. . . . Response sets interfere with inferences from test data" (p. 484).

While E. L. Thorndike (1920) was willing to allege the presence of halo effects by comparing the high obtained correlations with common sense notions of what they ought to be (e.g., it was unreasonable that a teacher's intelligence and voice quality should correlate .63) and while much of the evidence of response set variance is of the same order, the clear-cut demonstration of the presence of method variance requires both several traits and several methods. Otherwise, high correlations between tests might be explained as due either to basic trait similarity or to shared method variance. In the multitrait-multimethod matrix, the presence of method variance is indicated by the difference in level of correlation between the parallel values of the monomethod block and the heteromethod blocks, assuming comparable reliabilities among all tests. Thus the contribution of method variance in Test A1 of Table 1 is indicated by the elevation of $r_{A_1B_1}$ above $r_{A_1B_2}$, i.e., the difference between .51 and .22, etc.

The distinction between trait and method is of course relative to the test constructor's intent. What is an unwanted response set for one tester may be a trait for another who wishes to measure acquiescence, willingness to take an extreme stand, or tendency to attribute socially desirable attributes to oneself (Cronbach: 1946, 1950; Edwards, 1957; Lorge, 1937).

MULTITRAIT-MULTIMETHOD MATRICES IN THE LITERATURE

Multitrait-multimethod matrices are rare in the test and measurement literature. Most frequent are two types of fragment: two methods and one trait (single isolated values from the validity diagonal, perhaps accompanied by a reliability or two), and heterotrait-monomethod triangles. Either type of fragment is apt to disguise the inadequacy of our present measurement efforts, particularly in failing to call attention to the preponderant strength of methods variance. The evidence of test validity to be presented here is probably poorer than most psychologists would have expected.

One of the earliest matrices of this kind was provided by Kelley and Krey in 1934. Peer judgments by students provided one method, scores on a word-association test the other. Table 2 presents the data for the four most valid traits of the eight he employed. The picture is one of strong method factors, particularly among the peer ratings, and almost total invalidity. For only one of the eight measures, School Drive, is the value in the validity diagonal (.16!) higher than all of the heterotrait-heteromethod values. The absence of discriminant validity is further indicated by the tendency of the values in the monomethod triangles to approximate the reliabilities.

An early illustration from the ani-
TABLE 2
PERSONALITY TRAITS OF SCHOOL CHILDREN FROM KELLEY'S STUDY

                          Peer Ratings                  Association Test
                          A1     B1     C1     D1       A2     B2     C2     D2
Peer Ratings
  Courtesy         A1    (.82)
  Honesty          B1     .74   (.80)
  Poise            C1     .63    .65   (.74)
  School Drive     D1     .76    .78    .65   (.89)
Association Test
  Courtesy         A2     .13    .14    .10    .14     (.28)
  Honesty          B2     .06    .12    .16    .08      .27   (.38)
  Poise            C2     .01    .08    .10    .02      .19    .37   (.42)
  School Drive     D2     .12    .15    .14    .16      .27    .32    .18   (.36)
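
The contrast between regions of Table 2 can be made concrete by averaging the correlations in each region. The following is only an illustrative sketch: the function name `block_means` and the region labels are assumptions introduced here, not part of the original analysis, and simple averaging is one crude way among several to summarize a block.

```python
from collections import defaultdict
from statistics import mean

# Lower triangle of Table 2, reliabilities on the diagonal. Method 1 is the
# peer ratings, method 2 the association test; traits are A, B, C, D.
LABELS = ["A1", "B1", "C1", "D1", "A2", "B2", "C2", "D2"]
ROWS = [
    [.82],
    [.74, .80],
    [.63, .65, .74],
    [.76, .78, .65, .89],
    [.13, .14, .10, .14, .28],
    [.06, .12, .16, .08, .27, .38],
    [.01, .08, .10, .02, .19, .37, .42],
    [.12, .15, .14, .16, .27, .32, .18, .36],
]


def block_means(rows, labels):
    """Average the correlations falling in each region of the matrix."""
    cells = defaultdict(list)
    for i, row in enumerate(rows):
        for j, r in enumerate(row):
            # Each label is a trait letter followed by a method digit.
            (trait_i, method_i), (trait_j, method_j) = labels[i], labels[j]
            if i == j:
                region = f"reliability, method {method_i}"
            elif trait_i == trait_j:
                region = "validity diagonal"
            elif method_i == method_j:
                region = f"heterotrait-monomethod, method {method_i}"
            else:
                region = "heterotrait-heteromethod"
            cells[region].append(r)
    return {region: mean(values) for region, values in cells.items()}


means = block_means(ROWS, LABELS)
for region, value in sorted(means.items()):
    print(f"{region:36s} {value:.2f}")
```

On these data the heterotrait-monomethod average for the peer ratings comes to about .70, against about .13 on the validity diagonal, which is the pattern of strong method factors and weak validity described in the text.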
                                                           Memory        Comprehension   Vocabulary
                                                           A1     B1     A2     B2       A3     B3
Memory
  Social Intelligence (Memory for Names & Faces)    A1    ( )
  Mental Alertness (Learning Ability)               B1    .31    ( )
Comprehension
  Social Intelligence (Sense of Humor)              A2    .30    .31    ( )
  Mental Alertness (Comprehension)                  B2    .29    .38    .48    ( )
Vocabulary
  Social Intelligence (Recog. of Mental State)      A3    .23    .35    .31    .35      ( )
  Mental Alertness (Vocabulary)                     B3    .30    .58    .40    .48      .47    ( )

                                                           Social Content         Abstract Content
                                                           A1     B1     C1       A2     B2     C2
Social Content
  Memory (Memory for Names and Faces)               A1    ( )
  Comprehension (Sense of Humor)                    B1    .30    ( )
  Vocabulary (Recognition of Mental State)          C1    .23    .31    ( )
Abstract Content
  Memory (Learning Ability)                         A2    .31    .31    .35      ( )
  Comprehension                                     B2    .29    .48    .35      .38    ( )
  Vocabulary                                        C2    .30    .40    .47      .58    .48    ( )
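
The discriminant comparison described earlier (a validity value should exceed the heterotrait values in its own row and column of the heteromethod block) can be sketched for the first of the two matrices above, with Social Intelligence as Trait A and Mental Alertness as Trait B. The helper names `r` and `discriminant_record` are hypothetical, chosen only for this illustration.

```python
from itertools import combinations

# Lower triangle of the Social Intelligence (A) / Mental Alertness (B) matrix;
# methods 1, 2, 3 are Memory, Comprehension, Vocabulary. Reliabilities are
# not reported in the table, so the diagonal is omitted.
LABELS = ["A1", "B1", "A2", "B2", "A3", "B3"]
ROWS = [
    [],
    [.31],
    [.30, .31],
    [.29, .38, .48],
    [.23, .35, .31, .35],
    [.30, .58, .40, .48, .47],
]

# Symmetric lookup keyed by the unordered pair of variable labels.
R = {}
for i, row in enumerate(ROWS):
    for j, value in enumerate(row):
        R[frozenset((LABELS[i], LABELS[j]))] = value


def r(x, y):
    """Correlation between two variables, in either order."""
    return R[frozenset((x, y))]


def discriminant_record(trait, traits="AB", methods="123"):
    """For each pair of methods, report whether the trait's validity value
    exceeds every heterotrait value in its row and column of that block."""
    record = {}
    others = [t for t in traits if t != trait]
    for m, n in combinations(methods, 2):
        validity = r(trait + m, trait + n)
        rivals = [r(trait + m, other + n) for other in others]
        rivals += [r(other + m, trait + n) for other in others]
        record[(m, n)] = validity > max(rivals)
    return record


for trait in "AB":
    print(trait, discriminant_record(trait))
```

With these values the Social Intelligence validities fail the comparison in every heteromethod block, while the Mental Alertness validities pass in every block.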
Sociometric Observation
Free Behavior
  Shows solidarity      A1    ( )
  Gives suggestion      B1    .25   ( )
  Gives opinion         C1    .13   .24   ( )
  Gives orientation     D1   -.14   .26   .52   ( )
  Shows disagreement    E1    .34   .41   .27   .02   ( )
Role Playing
  Shows solidarity      A2    .43   .43   .08   .10   .29   ( )
  Gives suggestion      B2    .16   .32   .00   .24   .07   .37   ( )
  Gives opinion         C2    .15   .27   .60   .38   .12   .01   .10   ( )
  Gives orientation     D2   -.12   .24   .44   .74   .08   .04   .18   .40   ( )
  Shows disagreement    E2    .51   .36   .14  -.12   .50   .39   .27   .23  -.11   ( )
Projective Test
Shows solidarity
Gives suggestion
Gives opinion
Gives orientation
A,
B,
c,
D,
.20
.05
.31
-.01
.17
.21
.30
.09
.16
.05
.12
.08
.13 -.02
.30
.08
.13
.26
.35 -.05
.17
.10
.25
.12
.19
.03 - .00
.30
.19 -.02
.17
.06
.15 -.04
.19 .33
.22
.30
.53
.00
(
.31
.37
)
.32 ( )
.63 ( )
.29 .32 ( )
Shows disagreement E, .13 .18 .10 .14 .19 .22 .28 .02 .04 .23 .27 .51 .47 .30 ( )
TABLE 9
MAYO'S INTERCORRELATIONS BETWEEN OBJECTIVE AND RATING
MEASURES OF INTELLIGENCE AND EFFORT
(N = 166)
sumed that the measures are labeled in correspondence with the correlations expected, i.e., in correspondence with the traits that the tests are alleged to diagnose. Note that in Table 8, Gives Opinion is the best projective test predictor of both free behavior and role playing Shows Disagreement. Were a proper theoretical rationale available, these values might be regarded as validities.

Mayo (1956) has made an analysis of test scores and ratings of effort and intelligence, to estimate the contribution of halo (a kind of methods variance) to ratings. As Table 9 shows, the validity picture is ambiguous. The method factor or halo effect for ratings is considerable although the correlation between the two ratings (.66) is well below their reliabilities (.84 and .85). The objective measures share no appreciable apparatus overlap because they were independent operations. In spite of Mayo's argument that the ratings have some valid trait variance, the .46 heterotrait-heteromethod value seriously depreciates the otherwise impressive .46 and .40 validity values.

Cronbach (1949, p. 277) and Vernon (1957, 1958) have both discussed the multitrait-multimethod matrix shown in Table 10, based upon data originally presented by H. S. Conrad. Using an approximative technique, Vernon estimates that 61% of the systematic variance is due to a general factor, that 21½% is due to the test-form factors specific to verbal or to pictorial forms of items, and that but 11½% is due to the content factors specific to electrical or to mechanical contents. Note that for the purposes of estimating validity, the interpretation of the general factor, which he estimates from the .49 and .45 heterotrait-heteromethod values, is equivocal. It could represent desired competence variance, representing components common to both electrical and mechanical skills, perhaps resulting from general industrial shop experience, common ability components, overlapping learning situations, and the like. On the other hand, this general factor could represent overlapping method factors, and be due to the presence in both tests of multiple choice item format, IBM answer sheets, or the heterogeneity of the Ss in conscientiousness, test-taking motivation, and test-taking sophistication. Until methods that are still more different and traits that are still more independent are introduced into the validation matrix, this general factor remains uninterpretable. From this standpoint it can be seen that 21½% is a very minimal estimate of the total test-form variance in the tests, as it represents only test-form components specific to the verbal or the pictorial items, i.e., test-form components which the two forms do not share. Similarly, and more hopefully, the 11½% content variance is a very minimal estimate of the total true trait variance of the tests, representing only the true trait variance which electrical and mechanical knowledge do not share.

TABLE 10
MECHANICAL AND ELECTRICAL FACTS MEASURED BY VERBAL AND PICTORIAL ITEMS

                              Verbal Items      Pictorial Items
                              A1     B1         A2     B2
Verbal Items
  Mechanical Facts     A1    (.89)
  Electrical Facts     B1     .63   (.71)
Pictorial Items
  Mechanical Facts     A2     .61    .45       (.82)
  Electrical Facts     B2     .49    .51        .64   (.67)

Carroll (1952) has provided data on the Guilford-Martin Inventory of Factors STDCR and related ratings which can be rearranged into the matrix of Table 11. (Variable R has been inverted to reduce the number of negative correlations.) Two of the methods, Self Ratings and Inventory scores, can be seen as sharing method variance, and thus as having an inflated validity diagonal. The more independent heteromethod blocks involving Peer Ratings show some evidence of discriminant and convergent validity, with validity diagonals averaging .33 (Inventory × Peer Ratings) and .39 (Self Ratings × Peer Ratings) against heterotrait-heteromethod control values averaging .14 and .16. While not intrinsically impressive, this picture is nonetheless better than most of the validity matrices here assembled. Note that the Self Ratings show slightly higher validity diagonal elevations than do the Inventory scores, in spite of the much greater length and undoubtedly higher reliability of the latter. In addition, a method factor seems almost totally lacking for the Self Ratings, while strongly present for the Inventory, so that the Self Ratings come off much the best if true trait variance is expressed as a proportion of total reliable variance (as Vernon [1958] suggests). The method factor in the STDCR Inventory is undoubtedly enhanced by scoring the same item in several scales, thus contributing correlated error variance, which could be reduced without loss of reliability by the simple expedient of adding more equivalent items and scoring each item in only one scale. It should be noted that Carroll makes explicit use of the comparison of the validity diagonal with the heterotrait-heteromethod values as a validity indicator.

RATINGS IN THE ASSESSMENT STUDY OF CLINICAL PSYCHOLOGISTS

The illustrations of multitrait-multimethod matrices presented so far give a rather sorry picture of the validity of the measures of individual differences involved. The typical case shows an excessive amount of
TABLE 11
GUILFORD-MARTIN FACTORS STDCR AND RELATED RATINGS
                            A1    B1    C1    D1    E1    A2    B2    C2    D2    E2    A3    B3    C3    D3    E3
Staff Ratings
  Assertive          A1  (.89)
  Cheerful           B1    .37 (.85)
  Serious            C1   -.24  -.14 (.81)
  Unshakable Poise   D1    .25   .46   .08 (.84)
  Broad Interests    E1    .35   .19   .09   .31 (.92)
Teammate Ratings
  Assertive          A2    .71   .35  -.18   .26   .41 (.82)
  Cheerful           B2    .39   .53  -.15   .38   .29   .37 (.76)
  Serious            C2   -.27  -.31   .43  -.06   .03  -.15  -.19 (.70)
  Unshakable Poise   D2    .03  -.05   .03   .20   .07   .11   .23   .19 (.74)
  Broad Interests    E2    .19   .05   .04   .29   .47   .33   .22   .19   .29 (.76)
Self Ratings
  Assertive          A3    .48   .31  -.22   .19   .12   .46   .36  -.15   .12   -23   ( )
  Cheerful           B3    .17   .42  -.10   .10  -.03   .09   .24  -.25  -.11  -.03   .23   ( )
  Serious            C3   -.04  -.13   .22  -.13  -.05  -.04  -.11   .31   .06   .06  -.05  -.12   ( )
  Unshakable Poise   D3    .13   .27  -.03   .22  -.04   .10   .15   .00   .14  -.03   .16   .26   .11   ( )
  Broad Interests    E3    .37   .15  -.22   .09   .26   .27   .12  -.07   .05   .35   .21   .15   .17   .31   ( )
loading on the first recurrent factor.)

The picture presented in Table 12 is, we believe, typical of the best validity in personality trait ratings that psychology has to offer at the present time. It is comforting to note that the picture is better than most of those previously examined. Note that the validities for Assertive exceed heterotrait values of both the monomethod and heteromethod triangles. Cheerful, Broad Interests, and Serious have validities exceeding the heterotrait-heteromethod values with two exceptions. Only for Unshakable Poise does the evidence of validity seem trivial. The elevation of the reliabilities above the heterotrait-monomethod triangles is further evidence for discriminant validity.

A comparison of Table 12 with the full matrix shows that the procedure of having but one variable to represent each factor has enhanced the appearance of validity, although not necessarily in a misleading fashion. Where several variables are all highly loaded on the same factor, their "true" level of intercorrelation is high. Under these conditions, sampling errors can depress validity diagonal values and enhance others to produce occasional exceptions to the validity picture, both in the heterotrait-monomethod matrix and in the heteromethod-heterotrait triangles. In this instance, with an N of 124, the sampling error is appreciable, and may thus be expected to exaggerate the degree of invalidity.

Within the monomethod sections, errors of measurement will be correlated, raising the general level of values found, while within the heteromethods block, measurement errors are independent, and tend to lower the values both along the validity diagonal and in the heterotrait triangles. These effects, which may also be stated in terms of method factors or shared confounded irrelevancies, operate strongly in these data, as probably in all data involving ratings. In such cases, where several variables represent each factor, none of the variables consistently meets the criterion that validity values exceed the corresponding values in the monomethod triangles, when the full matrix is examined.

To summarize the validation picture with respect to comparisons of validity values with other heteromethod values in each block, Table 13 has been prepared. For each trait and for each of the three heteromethod blocks, it presents the value of the validity diagonal, the highest heterotrait value involving that trait, and the number out of the 42 such heterotrait values which exceed the validity diagonal in magnitude. (The number 42 comes from the grouping of the 21 other column values and the 21 other row values for the column and row intersecting at the given diagonal value.)

On the requirement that the validity diagonal exceed all others in its heteromethod block, none of the traits has a completely perfect record, although some come close. Assertive has only one trivial exception in the Teammate-Self block. Talkative has almost as good a record, as does Imaginative. Serious has but two inconsequential exceptions and Interest in Women three. These traits stand out as highly valid in both self-description and reputation. Note that the actual validity coefficients of these four traits range from but .22 to .82, or, if we concentrate on the Teammate-Self block as most certainly representing independent methods, from but .31 to .46. While these are the best traits, it seems that most of the traits have far above
TABLE 13
VALIDITIES OF TRAITS IN THE ASSESSMENT STUDY OF CLINICAL PSYCHOLOGISTS,
AS JUDGED BY THE HETEROMETHOD COMPARISONS
chance validity. All those having 10 or fewer exceptions have a degree of validity significant at the .001 level as crudely estimated by a one-tailed sign test.³ All but one of the variables meet this level for the Staff-Teammate block, all but four for the Staff-Self block, all but five for the most independent block, Teammate-Self. The exceptions to significant validity are not parallel from column to column, however, and only 13 of 22 variables have .001 significant validity in all three blocks. These are indicated by an asterisk in Table 13.
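
The cutoff of 10 exceptions out of 42 comparisons can be checked against the binomial distribution that underlies a one-tailed sign test. This is a minimal sketch under the stated null hypothesis (each comparison value independently has an even chance of exceeding the validity value); the function name `sign_test_tail` is an assumption made for illustration.

```python
from math import comb


def sign_test_tail(exceptions: int, comparisons: int = 42) -> float:
    """One-tailed probability of observing this many exceptions or fewer
    when each comparison value independently has a 50% chance of exceeding
    the validity value."""
    favorable = sum(comb(comparisons, k) for k in range(exceptions + 1))
    return favorable / 2 ** comparisons


for exceptions in (10, 11):
    print(exceptions, f"{sign_test_tail(exceptions):.5f}")
```

Under these assumptions the tail probability for 10 exceptions is about .0005, below the .001 level, while for 11 exceptions it is about .0014, which is why 10 is the largest count that still qualifies.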
³ If we take the validity value as fixed (ignoring its sampling fluctuations), then we can determine whether the number of values larger than it in its row and column is less than expected on the null hypothesis that half the values would be above it. This procedure requires the assumption that the position (above or below the validity value) of any one of these comparison values is independent of the position of each of the others, a dubious assumption when common methods and trait variance are present.

This highly significant general level of validity must not obscure the meaningful problem created by the occasional exceptions, even for the best variables. The excellent traits of Assertive and Talkative provide a case in point. In terms of Fiske's original analysis, both have high loadings on the recurrent factor "Confident self-expression" (represented by Assertive in Table 12). Talkative also had high loadings on the recurrent factor of Social Adaptability (represented by Cheerful in Table 12). We would expect, therefore, both high correlation between them and significant discrimination as well. And even at the common sense level, most psychologists would expect fellow psychologists to discriminate validly between assertiveness (nonsubmissiveness) and talkativeness. Yet in the Teammate-Self block, Assertive rated by self correlates .48 with Talkative by teammates, higher than either of their validities in this block, .43 and .46.

In terms of the average values of the validities and the frequency of exceptions, there is a distinct trend for the Staff-Teammate block to show the greatest agreement. This can be attributed to several factors. Both represent ratings from the external point of view. Both are averaged over three judges, minimizing individual biases and undoubtedly increasing reliabilities. Moreover, the Teammate ratings were available to the Staff in making their ratings. Another effect contributing to the less adequate convergence and discrimination of Self ratings was a response set toward the favorable pole which greatly reduced the range of these measures (Fiske, 1949, p. 342). Inspection of the details of the instances of invalidity summarized in Table 13 shows that in most instances the effect is attributable to the high specificity and low communality for the self-rating trait. In these instances, the column and row intersecting at the low validity diagonal are asymmetrical as far as general level of correlation is concerned, a fact covered over by the condensation provided in Table 13.

The personality psychologist is initially predisposed to reinterpret self-ratings, to treat them as symptoms rather than to interpret them literally. Thus, we were alert to instances in which the self ratings were not literally interpretable, yet nonetheless had a diagnostic significance when properly "translated." By and large, the instances of invalidity of self-descriptions found in this assessment study are not of this type, but rather are to be explained in terms of an absence of communality for one of the variables involved. In general, where these self descriptions are interpretable at all, they are as literally interpretable as are teammate descriptions. Such a finding may, of course, reflect a substantial degree of insight on the part of these Ss.

The general success in discriminant validation coupled with the parallel factor patterns found in Fiske's earlier analysis of the three intramethod matrices seemed to justify an inspection of the factor pattern validity in this instance. One possible procedure would be to do a single analysis of the whole 66 × 66 matrix. Other approaches focused upon separate factoring of heteromethods blocks, matrix by matrix, could also be suggested. Not only would such methods be extremely tedious, but in addition they would leave undetermined the precise comparison of factor-pattern similarity. Correlating factor loadings over the population of variables was employed for this purpose by Fiske (1949) but while this provided for the identification of recurrent factors, no single over-all index of factor pattern similarity was generated. Since our immediate interest was in confirming a pattern of interrelationships, rather than in describing it, an efficient short cut was available: namely to test the similarity of the sets of heterotrait values by correlation coefficients in which each entry represented the size values of the given heterotrait coefficients in two different matrices. For the full matrix, such correlations would be based upon the N of the 22 × 21/2 or 231 specific heterotrait combinations. Correlations were computed between the Teammate and Self monomethods matrices, selected as maximally independent. (The values to follow were computed from the original correlation matrix and are somewhat higher than that which would be obtained from a reflected matrix.) The similarity between the two monomethods matrices was .84, corroborating the factor-pattern similarity between these matrices described more fully by Fiske in his parallel factor analyses of them. To carry this mode of analysis into the heteromethod block, this block was treated as though divided into two by the validity diagonal, the above diagonal values and the below diagonal representing the maximally independent validation of the heterotrait correlation pattern. These two correlated .63, a value which, while lower, shows an impressive degree of confirmation. There remains the question as to whether this pattern upon which the two heteromethod-heterotrait triangles agree is the same one found in common between the two monomethod triangles. The intra-Teammate matrix correlated with the two heteromethod triangles .71 and .71. The intra-Self matrix correlated with the two .57 and .63. In general, then, there is evidence for validity of the intertrait relationship pattern.

DISCUSSION

Relation to construct validity. While the validational criteria presented are explicit or implicit in the discussions of construct validity (Cronbach & Meehl, 1955; APA, 1954), this paper is primarily concerned with the adequacy of tests as measures of a construct rather than with the adequacy of a construct as determined by the confirmation of theoretically predicted associations with measures of other constructs. We believe that before one can test the relationships between a specific trait and other traits, one must have some confidence in one's measures of that trait. Such confidence can be supported by evidence of convergent and discriminant validation. Stated in different words, any conceptual formulation of trait will usually include implicitly the proposition that this trait is a response tendency which can be observed under more than one experimental condition and that this trait can be meaningfully differentiated from other traits. The testing of these two propositions must be prior to the testing of other propositions to prevent the acceptance of erroneous conclusions. For example, a conceptual framework might postulate a large correlation between Traits A and B and no correlation between Traits A and C. If the experimenter then measures A and B by one method (e.g., questionnaire) and C by another method (such as the measurement of overt behavior in a situation test), his findings may be consistent with his hypotheses solely as a function of method variance common to his measures of A and B but not to C.

The requirements of this paper are intended to be as appropriate to the relatively atheoretical efforts typical of the tests and measurements field as to more theoretical efforts. This emphasis on validational criteria appropriate to our present atheoretical level of test construction is not at all
incompatible with a recognition of the desirability of increasing the extent to which all aspects of a test and the testing situation are determined by explicit theoretical considerations, as Jessor and Hammond have advocated (Jessor & Hammond, 1957).

Relation to operationalism. Underwood (1957, p. 54) in his effective presentation of the operationalist point of view shows a realistic awareness of the amorphous type of theory with which most psychologists work. He contrasts a psychologist's "literary" conception with the latter's operational definition as represented by his test or other measuring instrument. He recognizes the importance of the literary definition in communicating and generating science. He cautions that the operational definition "may not at all measure the process he wishes to measure; it may measure something quite different" (1957, p. 55). He does not, however, indicate how one would know when one was thus mistaken.

The requirements of the present paper may be seen as an extension of the kind of operationalism Underwood has expressed. The test constructor is asked to generate from his literary conception or private construct not one operational embodiment, but two or more, each as different in research vehicle as possible. Furthermore, he is asked to make explicit the distinction between his new variable and other variables, distinctions which are almost certainly implied in his literary definition. In his very first validational efforts, before he ever rushes into print, he is asked to apply the several methods and several traits jointly. His literary definition, his conception, is now best represented in what his independent measures of the trait hold distinctively in common. The multitrait-multimethod matrix is, we believe, an important practical first step in avoiding "the danger . . . that the investigator will fall into the trap of thinking that because he went from an artistic or literary conception . . . to the construction of items for a scale to measure it, he has validated his artistic conception" (Underwood, 1957, p. 55). In contrast with the single operationalism now dominant in psychology, we are advocating a multiple operationalism, a convergent operationalism (Garner, 1954; Garner, Hake, & Eriksen, 1956), a methodological triangulation (Campbell, 1953, 1956), an operational delineation (Campbell, 1954), a convergent validation.

Underwood's presentation and that of this paper as a whole imply moving from concept to operation, a sequence that is frequent in science, and perhaps typical. The same point can be made, however, in inspecting a transition from operation to construct. For any body of data taken from a single operation, there is a subinfinity of interpretations possible; a subinfinity of concepts, or combinations of concepts, that it could represent. Any single operation, as representative of concepts, is equivocal. In an analogous fashion, when we view the Ames distorted room from a fixed point and through a single eye, the data of the retinal pattern are equivocal, in that a subinfinity of hexahedrons could generate the same pattern. The addition of a second viewpoint, as through binocular parallax, greatly reduces this equivocality, greatly limits the constructs that could jointly account for both sets of data. In Garner's (1954) study, the fractionation measures from a single method were equivocal: they could have been a function of the stimulus distance being fractionated, or they
could have been a function of the comparison stimuli used in the judgment process. A multiple, convergent operationalism reduced this equivocality, showing the latter conceptualization to be the appropriate one, and revealing a preponderance of methods variance. Similarly for learning studies: in identifying constructs with the response data from animals in a specific operational setup there is equivocality which can operationally be reduced by introducing transposition tests, different operations so designed as to put to comparison the rival conceptualizations (Campbell, 1954).

Garner's convergent operationalism and our insistence on more than one method for measuring each concept depart from Bridgman's early position that "if we have more than one set of operations, we have more than one concept, and strictly there should be a separate name to correspond to each different set of operations" (Bridgman, 1927, p. 10). At the current stage of psychological progress, the crucial requirement is the demonstration of some convergence, not complete congruence, between two distinct sets of operations. With only one method, one has no way of distinguishing trait variance from unwanted method variance. When psychological measurement and conceptualization become better developed, it may well be appropriate to differentiate conceptually between Trait-Method Unit A1 and Trait-Method Unit A2, in which Trait A is measured by different methods. More likely, what we have called method variance will be specified theoretically in terms of a set of constructs. (This has in effect been illustrated in the discussion above in which it was noted that the response set variance might be viewed as trait variance, and in the rearrangement of the social intelligence matrices of Tables 4 and 5.) It will then be recognized that measurement procedures usually involve several theoretical constructs in joint application. Using obtained measurements to estimate values for a single construct under this condition still requires comparison of complex measures varying in their trait composition, in something like a multitrait-multimethod matrix. Mill's joint method of similarities and differences still epitomizes much about the effective experimental clarification of concepts.

The evaluation of a multitrait-multimethod matrix. The evaluation of the correlation matrix formed by intercorrelating several trait-method units must take into consideration the many factors which are known to affect the magnitude of correlations. A value in the validity diagonal must be assessed in the light of the reliabilities of the two measures involved: e.g., a low reliability for Test A2 might exaggerate the apparent method variance in Test A1. Again, the whole approach assumes adequate sampling of individuals: the curtailment of the sample with respect to one or more traits will depress the reliability coefficients and intercorrelations involving these traits. While restriction of range over all traits produces serious difficulties in the interpretation of a multitrait-multimethod matrix and should be avoided whenever possible, the presence of different degrees of restriction on different traits is the more serious hazard to meaningful interpretation.

Various statistical treatments for multitrait-multimethod matrices might be developed. We have considered rough tests for the elevation
of a value in the validity diagonal above the comparison values in its row and column. Correlations between the columns for variables measuring the same trait, variance analyses, and factor analyses have been proposed to us. However, the development of such statistical methods is beyond the scope of this paper. We believe that such summary statistics are neither necessary nor appropriate at this time. Psychologists today should be concerned not with evaluating tests as if the tests were fixed and definitive, but rather with developing better tests. We believe that a careful examination of a multitrait-multimethod matrix will indicate to the experimenter what his next steps should be: it will indicate which methods should be discarded or replaced, which concepts need sharper delineation, and which concepts are poorly measured because of excessive or confounding method variance. Validity judgments based on such a matrix must take into account the stage of development of the constructs, the postulated relationships among them, the level of technical refinement of the methods, the relative independence of the methods, and any pertinent characteristics of the sample of Ss. We are proposing that the validational process be viewed as an aspect of an ongoing program for improving measuring procedures and that the "validity coefficients" obtained at any one stage in the process be interpreted in terms of gains over preceding stages and as indicators of where further effort is needed.

The design of a multitrait-multimethod matrix. The several methods and traits included in a validational matrix should be selected with care. The several methods used to measure each trait should be appropriate to the trait as conceptualized. Although this view will reduce the range of suitable methods, it will rarely restrict the measurement to one operational procedure.

Wherever possible, the several methods in one matrix should be completely independent of each other: there should be no prior reason for believing that they share method variance. This requirement is necessary to permit the values in the heteromethod-heterotrait triangles to approach zero. If the nature of the traits rules out such independence of methods, efforts should be made to obtain as much diversity as possible in terms of data-sources and classification processes. Thus, the classes of stimuli or the background situations, the experimental contexts, should be different. Again, the persons providing the observations should have different roles or the procedures for scoring should be varied.

Plans for a validational matrix should take into account the difference between the interpretations regarding convergence and discrimination. It is sufficient to demonstrate convergence between two clearly distinct methods which show little overlap in the heterotrait-heteromethod triangles. While agreement between several methods is desirable, convergence between two is a satisfactory minimal requirement. Discriminative validation is not so easily achieved. Just as it is impossible to prove the null hypothesis, or that some object does not exist, so one can never establish that a trait, as measured, is differentiated from all other traits. One can only show that this measure of Trait A has little overlap with those measures of B and C, and no dependable generalization beyond B and C can be made. For example, social poise could probably
REFERENCES
AMERICAN PSYCHOLOGICAL ASSOCIATION. Technical recommendations for psychological tests and diagnostic techniques. Psychol. Bull. Suppl., 1954, 51, Part 2, 1-38.
ANDERSON, E. E. Interrelationship of drives in the male albino rat. I. Intercorrelations of measures of drives. J. comp. Psychol., 1937, 24, 73-118.
AYER, A. J. The problem of knowledge. New York: St Martin's Press, 1956.
BORGATTA, E. F. Analysis of social interaction and sociometric perception. Sociometry, 1954, 17, 7-32.
BORGATTA, E. F. Analysis of social interaction: Actual, role-playing, and projective. J. abnorm. soc. Psychol., 1955, 51, 394-405.
BRIDGMAN, P. W. The logic of modern physics. New York: Macmillan, 1927.
BURWEN, L. S., & CAMPBELL, D. T. The generality of attitudes toward authority and nonauthority figures. J. abnorm. soc. Psychol., 1957, 54, 24-31.
CAMPBELL, D. T. A study of leadership among submarine officers. Columbus: Ohio State Univer. Res. Found., 1953.
CAMPBELL, D. T. Operational delineation of "what is learned" via the transposition experiment. Psychol. Rev., 1954, 61, 167-174.
CAMPBELL, D. T. Leadership and its effects upon the group. Monogr. No. 83. Columbus: Ohio State Univer. Bur. Business Res., 1956.
CARROLL, J. B. Ratings on traits measured by a factored personality inventory. J. abnorm. soc. Psychol., 1952, 47, 626-632.
CHI, P.-L. Statistical analysis of personality rating. J. exp. Educ., 1937, 5, 229-245.
CRONBACH, L. J. Response sets and test validity. Educ. psychol. Measmt, 1946, 6, 475-494.
CRONBACH, L. J. Essentials of psychological testing. New York: Harper, 1949.
CRONBACH, L. J. Further evidence on response sets and test design. Educ. psychol. Measmt, 1950, 10, 3-31.
CRONBACH, L. J., & MEEHL, P. E. Construct validity in psychological tests. Psychol. Bull., 1955, 52, 281-302.
EDWARDS, A. L. The social desirability variable in personality assessment and research. New York: Dryden, 1957.
FEIGL, H. The mental and the physical. In H. Feigl, M. Scriven, & G. Maxwell (Eds.), Minnesota studies in the philosophy of science. Vol. II. Concepts, theories and the mind-body problem. Minneapolis: Univer. Minnesota Press, 1958.
FISKE, D. W. Consistency of the factorial structures of personality ratings from different sources. J. abnorm. soc. Psychol., 1949, 44, 329-344.
GARNER, W. R. Context effects and the validity of loudness scales. J. exp. Psychol., 1954, 48, 218-224.
GARNER, W. R., HAKE, H. W., & ERIKSEN, C. W. Operationism and the concept of perception. Psychol. Rev., 1956, 63, 149-159.
JESSOR, R., & HAMMOND, K. R. Construct validity and the Taylor Anxiety Scale. Psychol. Bull., 1957, 54, 161-170.
KELLEY, T. L., & KREY, A. C. Tests and measurements in the social sciences. New York: Scribner, 1934.
KELLY, E. L., & FISKE, D. W. The prediction of performance in clinical psychology. Ann Arbor: Univer. Michigan Press, 1951.
LOEVINGER, J., GLESER, G. C., & DuBOIS, P. H. Maximizing the discriminating power of a multiple-score test. Psychometrika, 1953, 18, 309-317.
LORGE, I. Gen-like: Halo or reality? Psychol. Bull., 1937, 34, 545-546.
MAYO, G. D. Peer ratings and halo. Educ. psychol. Measmt, 1956, 16, 317-323.

STRANG, R. Relation of social intelligence to certain other factors. Sch. & Soc., 1930, 32, 268-272.
SYMONDS, P. M. Diagnosing personality and conduct. New York: Appleton-Century, 1931.
THORNDIKE, E. L. A constant error in psychological ratings. J. appl. Psychol., 1920, 4, 25-29.
THORNDIKE, R. L. Factor analysis of social and abstract intelligence. J. educ. Psychol., 1936, 27, 231-233.
THURSTONE, L. L. The reliability and validity of tests. Ann Arbor: Edwards, 1937.
TRYON, R. C. Individual differences. In F. A. Moss (Ed.), Comparative Psychology. (2nd ed.) New York: Prentice-Hall, 1942. Pp. 330-365.
UNDERWOOD, B. J. Psychological research. New York: Appleton-Century-Crofts, 1957.
VERNON, P. E. Educational ability and psychological factors. Address given to the Joint Education-Psychology Colloquium, Univer. of Illinois, March 29, 1957.
VERNON, P. E. Educational testing and test-form factors. Princeton: Educational Testing Service, 1958. (Res. Bull. RB-58-3.)

Received June 19, 1958.