
Journal of Educational Psychology

2004, Vol. 96, No. 1, 56–67

Copyright 2004 by the American Psychological Association, Inc.


0022-0663/04/$12.00 DOI: 10.1037/0022-0663.96.1.56

This document is copyrighted by the American Psychological Association or one of its allied publishers.
This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Explaining Paradoxical Relations Between Academic Self-Concepts and
Achievements: Cross-Cultural Generalizability of the Internal/External
Frame of Reference Predictions Across 26 Countries
Herbert W. Marsh
University of Western Sydney

Kit-Tai Hau
The Chinese University of Hong Kong

The internal/external frame of reference (I/E) model explains a seemingly paradoxical pattern of relations
between math and verbal self-concepts and corresponding measures of achievement, extends social
comparison theory, and has important educational implications. In a cross-cultural study of nationally
representative samples of 15-year-olds from 26 countries (total N = 55,577), I/E predictions were
supported in that (a) math and verbal achievements were highly correlated, but math and verbal
self-concepts were nearly uncorrelated; (b) math achievement had positive effects on math self-concept,
but negative effects on verbal self-concept; and (c) verbal achievement had positive effects on verbal
self-concept, but negative effects on math self-concept. Supporting the cross-cultural generalizability of
predictions, multigroup structural equation models demonstrated good support for the generalizability of
results across 26 countries participating in the Programme for International Student Assessment project
sponsored by the Organisation for Economic Co-operation and Development.

The importance of self-concept as a relevant outcome variable is
evident in diverse settings, including social psychology, personality, education, child development, mental and physical health,
social services, organizations, industry, and sport. For example,
educational policy statements throughout the world list self-concept enhancement as a central goal of education and an important vehicle for addressing social inequities experienced by disadvantaged groups. In self-concept research, support for the
construct validity of major instruments and the main theoretical
models has been based largely on responses by students from
Western countries, particularly English-speaking students in the
United States, Australia, and Canada.
The purpose of the present investigation is to evaluate the
cross-cultural generalizability of relations between math and verbal self-concepts and math and verbal achievements. The internal/external
frame of reference (I/E) model was developed to explain
what initially seemed to be paradoxical patterns of relations between math and verbal self-concepts and corresponding areas of
achievement. In the present investigation, we evaluate the cross-cultural generalizability of predictions based on the I/E model,
using a large cross-national data set consisting of nationally
representative samples of 15-year-olds from 26 countries.

Multidimensional Academic Self-Concepts and Their Relation to Achievement
Historically, self-concept measurement, theory, research, and
application emphasized a largely atheoretical, global component of
self-concept, and reviewers noted the lack of theoretical models for
defining and interpreting the construct (e.g., Shavelson, Hubner, &
Stanton, 1976; Wells & Marwell, 1976; Wylie, 1979). In an
attempt to remedy this situation, Shavelson et al. (1976) reviewed
existing research and self-concept instruments and provided a
theoretical definition and model of self-concept that has had a
profound influence on subsequent research (see review by Marsh
& Hattie, 1996). In the Shavelson et al. model, self-concept is
posited to be a multidimensional, hierarchical construct. Global
self-concept, at the apex of the hierarchy, is divided into nonacademic (e.g., social, physical, emotional) and academic components. Of particular relevance to the present investigation, academic self-concept is divided into self-concepts in particular
content areas such as math and verbal self-concepts. Support for
the construct validity of self-concept interpretations and its multidimensionality requires that (a) academic outcomes are more
highly correlated with academic self-concept than with global and
nonacademic components of self-concept, and (b) achievement in
particular domains is more highly correlated with academic self-concepts in the matching domain (e.g., math achievement and
math self-concept) than with self-concepts in nonmatching domains
(e.g., math achievement and general or verbal self-concept).

Herbert W. Marsh, SELF Research Centre, University of Western Sydney, Sydney, New South Wales, Australia; Kit-Tai Hau, The Chinese
University of Hong Kong, Hong Kong, China.
This research was funded in part by grants from the Australian Research
Council. We thank Codula Artelt, Jurger Bamert, Oliver Ludtke, Ken
Rowe, Wolfram Schulz, and Ulrich Trautwein for comments and suggestions on earlier versions of this article.
Correspondence concerning this article should be addressed to Herbert
W. Marsh, SELF Research Centre, University of Western Sydney, Bankstown Campus, Locked Bag 1797 Penrith South, Sydney, New South Wales
1797, Australia. E-mail: h.marsh@uws.edu.au

Stimulated in part by the Shavelson et al. (1976) model and
subsequent research (see Marsh & Hattie, 1996), there is an
ongoing debate about the relative usefulness of unidimensional
perspectives that emphasize a single, relatively unidimensional,
global domain of self-concept (sometimes referred to as self-esteem) and multidimensional perspectives based on multiple,
relatively distinct components of self-concept. For example, Suls
(1993) noted the extreme divergence of claims in related issues in
chapters written by Marsh (1993) and Brown (1993) that appeared
in his monograph, concluding, "both Brown and Marsh, who cite
strong support for their viewpoints, cannot be right; or, at minimum, a new integrative theory is needed to reconcile the two
approaches" (p. x). Marsh and Craven (1997, p. 191) subsequently
argued that

If the role of self-concept research is to better understand the complexity of self in different contexts, to predict a wide variety of
behaviors, to provide outcome measures for diverse interventions, and
to relate self-concept to other constructs, then the specific domains of
self-concept are more useful than a general domain.

Support for this claim is particularly strong in educational psychology, where academic self-concept is substantially related to a
wide variety of academic outcomes, whereas global and nonacademic measures of self-concept are much less highly correlated
with these outcomes (Marsh, 1993).

Frame of Reference Effects: Social Comparison Theory

As emphasized in classical social-comparison-theory approaches to self-evaluation (e.g., Diener & Fujita, 1997; Festinger,
1954; Morse & Gergen, 1970; Suls, 1977), even after individuals
obtain information from various sources about their levels of
accomplishment, their self-perceptions must be compared with
some standard or frame of reference in order to form a self-appraisal. Thus, for example, to the extent that individuals have
different frames of reference, the same objective indicators of
academic achievement will lead to different academic self-concepts. In the typical application of social comparison theory
(e.g., Diener & Fujita, 1997; Marsh, 1993; Morse & Gergen, 1970;
Suls, 1977), it is assumed that individuals evaluate their own
performance in comparison with the performances of others
through social comparison processes. Particularly in educational
settings, there is good support for how social comparison processes
associated with well-established group membership (the school
one attends) affect self-concept. Diener and Fujita (1997) referred
to this as situationally imposed or forced comparisons as opposed
to a more flexible situation in which individuals have considerable
freedom to consciously select or construct a comparison target so
as to maximize various goals. Diener and Fujita emphasized that
schools closely approximate a "total environment" (where the
frame of reference affecting judgment is limited to the immediate
context) implicit in the forced comparison. The school is a total
environment in that there are so many inherent constraints and a
natural emphasis on social comparison of achievement levels of
classmates in a school setting. Similarly, educational psychologists
(e.g., Covington, 1992; Marsh, 1993; Marsh, Kong, & Hau, 2000;
Marshall & Weinstein, 1984; also see Goethals & Darley, 1987)
emphasized the extreme salience of achievement as a reference
point within a school setting, particularly when the outcome measure is academic self-concept. Diener and Fujita noted that
Marsh's research studies (e.g., Marsh, 1993; Marsh & Craven,
1997; Marsh & Parker, 1984) are one of the best validated examples in support of social comparison theory, demonstrating that
imposed comparisons do have a substantial, lasting impact. As
emphasized by Diener and Fujita, Marsh's research clearly supports this traditional application of social comparison theory. However, Marsh (1986) found that this classic approach to social
comparison theory was not able to explain the seemingly paradoxical pattern of relations he found between math and verbal self-concept measures and corresponding measures of math and verbal
achievement, the focus of the present investigation. To explain
these results, he developed the I/E model that is an extension of
social comparison theory.

Internal/External Frame of Reference (I/E) Model

The Shavelson et al. (1976) model assumes that there is a strong
higher order academic self-concept such that there is a substantial
correlation between verbal and math self-concepts. This prediction
also follows from the typically large correlation between math and
verbal academic achievements (typically .50 to .80, depending on
how achievement is measured). Early research, however, demonstrated that math and verbal self-concepts were much more differentiated than the corresponding achievement scores (Marsh, 1986).
In contrast to the expectation of high correlations between math
and verbal self-concepts, math and verbal self-concepts were
nearly uncorrelated. Furthermore, this near-zero correlation was
consistent across different measures of the math and verbal self-concepts and a diversity of settings (Marsh, 1986; Marsh & Craven, 1997; Marsh & Yeung, 1998). Hence, it seems that individuals with good mathematics skills also tend to have good verbal
skills and vice versa, but people think of themselves as either
"math" persons or "verbal" persons but not both.
Theoretical rationale for the I/E model. The I/E model (for
further discussion, see Marsh, 1986, 1990, 1993; Marsh, Byrne, &
Shavelson, 1988; Marsh & Yeung, 2001) was initially developed
to explain why math and verbal self-concepts are almost uncorrelated even though corresponding areas of academic achievement
are substantially correlated. According to the I/E model, academic
self-concept in a particular domain (e.g., math or verbal self-concepts) is formed in relation to two comparison processes or
frames of reference.

1. The external (normative) reference is the typical social
comparison process in which students compare their self-perceived performances in a particular school subject
with the perceived performances of other students in the
same school subject and other external standards of actual achievement levels (e.g., normative comparisons,
school grades, class rankings, etc.). If they perceive
themselves to be able in relation to other students and
objective indicators of achievement, then they should
have a high academic self-concept in that school subject.

2. The internal (ipsative-like) reference is a comparison
process in which students compare their own performance in one particular school subject with their own
performance in other school subjects. If, for example,
one's best school subject is mathematics, then this student should have a positive math self-concept that is
higher than this student's verbal self-concept. Similarly,
according to this internal comparison process, a student
may have a favorable math self-concept if math is this
student's best subject even if this student is not particularly good at math relative to other students and external
standards. It is this internal comparison process that
represents an extension to traditional social comparison
theory.


To clarify how these two processes operate, it is useful to
consider students who accurately perceive themselves to be below
average in both verbal and math skills (an external comparison),
but who are better at mathematics than verbal and other school
subjects (an internal comparison). Their math skills are below
average relative to other students and objective indicators of math
achievement (the external comparison), and this should lead to a
below-average math self-concept. However, their math skills are
above average relative to their other school subjects (an internal
comparison), and this should lead to an above-average math self-concept. Depending on how these two processes are weighted in
the formation of self-concept, these students may have an average
or even above-average math self-concept even though they have
below-average math skills. The I/E model also predicts that these
students would have better math self-concepts than other students
who did equally poorly at mathematics but who did better in all
other school subjects (i.e., math was their worst subject). Similarly,
a student who is very bright in all school subjects may have an
average or even below-average math self-concept if the student
perceived mathematics to be his or her worst subject.
The external comparison process should result in substantial
positive correlations between math and verbal self-concepts because math and verbal achievements are substantially positively
correlated. However, the ipsative, internal comparison process
should result in a negative correlation between math and verbal
self-concepts because the average correlation among ipsative
scores is necessarily negative (i.e., an increase in any one score
must result in the counterbalancing decrease in average of the
remaining scores if they are ipsative). Both of these processes,
however, affect self-concept responses. Hence, the joint operation
of these processes, depending on the relative weight given to
internal and external comparisons, is consistent with the near-zero
correlation between math and verbal self-concepts that led to the
development of the I/E model. It is, however, important to emphasize that support for the I/E model does not require the correlation between math and verbal self-concepts to be zero, but only
that it be substantially less than the typically substantial positive
correlation between math and verbal achievement.
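
The joint operation of the two comparison processes can be illustrated with a small simulation. The following is only a sketch under stated assumptions, not the authors' analysis: the linear combination rule, the weight `w_internal`, the achievement correlation of .70, and the noise level are all hypothetical values chosen for illustration. With `w_internal = 0` only the external process operates; with a positive weight the internal (difference-based) process pulls the correlation between the two self-concepts toward zero even though the achievements remain substantially correlated.

```python
import numpy as np


def simulate_self_concepts(n=20000, r_ach=0.7, w_internal=0.5, noise_sd=1.0, seed=0):
    """Simulate achievements and self-concepts under a toy I/E rule.

    All parameter values are illustrative assumptions, not study estimates.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, r_ach], [r_ach, 1.0]]
    ach = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    math_ach, verbal_ach = ach[:, 0], ach[:, 1]

    # External comparison: self-concept tracks achievement in the same domain.
    # Internal comparison: self-concept also tracks how that domain compares
    # with the student's own achievement in the other domain.
    math_sc = (math_ach + w_internal * (math_ach - verbal_ach)
               + rng.normal(0.0, noise_sd, n))
    verbal_sc = (verbal_ach + w_internal * (verbal_ach - math_ach)
                 + rng.normal(0.0, noise_sd, n))
    return math_ach, verbal_ach, math_sc, verbal_sc


def corr(x, y):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


if __name__ == "__main__":
    for w in (0.0, 0.5):
        m, v, msc, vsc = simulate_self_concepts(w_internal=w)
        print(f"w_internal={w}: r(achievements)={corr(m, v):.2f}, "
              f"r(self-concepts)={corr(msc, vsc):.2f}")
```

In this sketch the size of the self-concept correlation depends entirely on the assumed weight, which mirrors the point that the I/E model requires only a correlation substantially smaller than the achievement correlation, not one of exactly zero.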
Domain specificity is a critical feature of the I/E model. However, stronger tests of the I/E model are possible when math and
verbal achievements are related to math and verbal self-concepts
(see Figure 1A). The external comparison process predicts that
good math skills lead to higher math self-concepts and that good
verbal skills lead to higher verbal self-concepts. According to the
internal comparison process, however, good math skills should
lead to lower verbal self-concepts (once the positive effects of
good verbal skills are controlled), that is, "The better I am at
mathematics, the poorer I am at verbal subjects (relative to my
good math skills)." Similarly, better verbal skills should lead to
lower math self-concept (once the positive effects of good math
skills are controlled). In models used to test this prediction (Figure
1A), the horizontal paths leading from math achievement to math
self-concept and from verbal achievement to verbal self-concept
(the gray horizontal lines in Figure 1A) are predicted to be substantially positive. However, the
cross paths leading from math achievement to verbal self-concept
and from verbal achievement to math self-concept (the dark lines
in Figure 1A) are predicted to be negative. Although consistent
with the I/E model, it is these negative cross paths (the negative
effects of verbal achievement on math self-concept and of math
achievement on verbal self-concept) that initially appeared to be
paradoxical and that provide the critical test of the I/E model.

Figure 1. Predicted (Panel A) and actual (Panel B) results based on the
internal/external frame of reference model. In Panel A, the horizontal
(positive) paths are predicted to be substantial and positive, whereas
the cross (negative) paths are predicted to be smaller and negative. In
Panel B, the actual results based on the total-group (TG) analysis (Model
TG1 in Table 1) and the multiple-group (MG) analysis (Model MG15 in
Table 1) are consistent with predictions.

It is also important to clarify what is actually being tested in the
I/E model. Typically, math and verbal achievements are substantially positively correlated with each other (the typical big-G that
underlies almost all measures of achievement) and typically each
is positively correlated with both math and verbal self-concepts.
Math achievement is more correlated with math self-concept than
verbal self-concept (and verbal achievement is more correlated
with verbal self-concept than math self-concept). The critical prediction for the I/E model, however, is in terms of the path coefficients. In other words, it is the effect of math achievement on
verbal self-concept after controlling for the effect of verbal
achievement. Hence, once we control for the positive effect of
verbal achievement (and the big-G component shared by math and
verbal achievement) on verbal self-concept, then the unique component of math achievement is negatively associated with verbal
self-concept. Thus, the size of the negative effect of math achievement on verbal self-concept should be a function of the discrepancy between the math and verbal achievement scores. Hence, the
operative construct is a residual score that is conceptually like the
difference score (without some of the statistical problems associated with raw difference scores). Thus, a B average in mathematics
may induce an average or even below-average mathematics self-concept for the student who earns As in most other school subjects,
but may lead to an above-average math self-concept for the student
who earns Cs in other subjects. In the language of path analysis, it
is the direct effect of math achievement on verbal self-concept that
is predicted to be negative, not the correlation between math
achievement and verbal self-concept.
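
The distinction between a zero-order correlation and a direct effect can be made concrete with a short sketch. It assumes hypothetical simulated data and uses ordinary least squares on standardized observed scores, which is a simplification of the latent-variable path models discussed in the text. The generating coefficients (0.6 and -0.2) and the helper names `make_toy_data` and `standardized_paths` are illustrative assumptions only.

```python
import numpy as np


def standardize(x):
    """Convert scores to z scores (mean 0, standard deviation 1)."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()


def standardized_paths(math_ach, verbal_ach, self_concept):
    """Regress one self-concept on both achievements (all standardized).

    Returns (path from math achievement, path from verbal achievement).
    """
    X = np.column_stack([standardize(math_ach), standardize(verbal_ach)])
    y = standardize(self_concept)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[0]), float(beta[1])


def make_toy_data(n=20000, r_ach=0.7, seed=1):
    """Hypothetical data: positive matching path, negative cross path."""
    rng = np.random.default_rng(seed)
    cov = [[1.0, r_ach], [r_ach, 1.0]]
    ach = rng.multivariate_normal([0.0, 0.0], cov, size=n)
    math_ach, verbal_ach = ach[:, 0], ach[:, 1]
    verbal_sc = 0.6 * verbal_ach - 0.2 * math_ach + rng.normal(0.0, 1.0, n)
    return math_ach, verbal_ach, verbal_sc


if __name__ == "__main__":
    m, v, vsc = make_toy_data()
    r_zero_order = float(np.corrcoef(m, vsc)[0, 1])
    path_math, path_verbal = standardized_paths(m, v, vsc)
    print(f"r(math ach, verbal SC) = {r_zero_order:.2f}")
    print(f"path math ach -> verbal SC = {path_math:.2f}")
    print(f"path verbal ach -> verbal SC = {path_verbal:.2f}")
```

In data generated this way, math achievement correlates positively with verbal self-concept because it shares variance with verbal achievement, yet its path coefficient is negative once verbal achievement is controlled.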
Empirical support for the I/E model. It is not surprising, of
course, that good verbal skills are associated with good verbal
self-concepts and that good math skills are associated with good
math self-concepts (the positive paths in Figure 1A). More surprising, even paradoxical, are the negative paths from verbal
achievement to math self-concept and from math achievement to
verbal self-concept (i.e., being more mathematically able detracts
from verbal self-concept, whereas being more verbally able detracts from math self-concept). In a review of 13 studies that
considered students of different ages and different academic
achievement indicators, Marsh (1986) reported that (a) correlations
between indicators of verbal and math achievement were substantial (.42 to .94), (b) correlations between measures of verbal and
math self-concepts were much smaller (.10 to .19), (c) path
coefficients from verbal achievement to verbal self-concept and
from math achievement to math self-concept were all significantly
positive, and (d) path coefficients from math achievement to verbal
self-concept and from verbal achievement to math self-concept
were significantly negative.
This pattern of results consistent with the I/E model was subsequently replicated for responses to each of three different self-concept instruments by Canadian high school students (Marsh,
Byrne, et al., 1988), for the nationally representative sample of
U.S. high school students in the High School and Beyond Study
(Marsh, 1989), for the nationally representative sample of U.S.
high school students in the National Longitudinal Study (Marsh,
1994b), and for academically gifted students in the United States
(Plucker & Stocking, 2001; Williams & Montgomery, 1995) and
China (Lee, Yeung, Low, & Jin, 2000). Although Bong (1998) was
unsuccessful in her attempt to develop items that differentiated
between the internal and external comparison processes, Marsh
and Yeung's (2001) reanalysis of her results demonstrated that
there was clear support for the I/E predictions based on responses
to her alternative set of items. Across these different studies,
support for the I/E predictions has been shown to generalize well
across different sets of items used to infer self-concept and a
variety of different indicators of achievement (e.g., test scores,
school grades, teacher ratings).
Although much of this support is based on responses by students
from the United States, Canada, and Australia where the native
language is English, there is also some support for the cross-cultural or cross-nationality generalizability of these results where
verbal self-concept is based on a native language other than
English (e.g., Norway: Skaalvik & Rankin, 1995; Skaalvik &
Valas, 2001; China: Dai, 2001, 2002; Kong, 2000; Lee et al., 2000;
Marsh, Kong, & Hau, 2001; Yeung & Lau, 1998; Yeung & Lee,
1999; Germany: Marsh & Koller, 2003; Moeller & Koller, 2001a,
2001b; United Arab Emirates: Abu-Hilal, 2002; Abu-Hilal &
Bahri, 2000). Thus, for example, Abu-Hilal (2002, p. 2) concluded
that his research with Arabic students confirmed previous findings for the I/E model with western samples, thus adding more to
the universality of the model.
Particularly interesting research in Germany (e.g., Moeller &
Koller, 2001a) provided support for the I/E model in a true experimental
study with randomly assigned students, demonstrating how experimentally manipulated feedback on achievement in one subject
area had an inverse effect on self-concept in a different area. The
authors concluded that as shown experimentally for the first time,
dimensional comparison information can have inverse effects on
task-related cognitions in other domains (p. 833). Importantly,
they demonstrated simultaneous support for both the internal comparison process (based on experimentally manipulated feedback
about the relative performance in two different tasks) and the
external comparison process (based on experimentally manipulated feedback about performance relative to other students). This
research is important because it uses a true experimental design with random assignment to groups to support the causal interpretation of the
path models that are the basis of the I/E model.
Whereas a growing number of studies have found support for
the I/E model in different countries and cultural groups, each was
based on results from a single country and a methodology (e.g.,
achievement indicators, instrumentation, translation, selection and
representativeness of the sample, and statistical analysis) that is
largely idiosyncratic to the particular study. In their critique of
research in this area, Marsh and Yeung (2001) noted the need to
pursue cross-national and cross-cultural comparisons systematically in order to evaluate more fully the generalizability of support
based on the I/E model. The present investigation in which the
same materials (with appropriate translation) were used in representative samples from different countries is clearly stronger in
terms of cross-cultural comparisons than any previous research.

Value of Cross-Cultural Comparisons


Cross-cultural comparisons provide researchers with a valuable
heuristic basis to test the external validity and generalizability of
their measures, theories, and models. Matsumoto (2001, p. 9)
argued that cultural differences challenge mainstream theoretical
notions about the nature of people and force us to rethink basic
theories of personality, perception, cognition, emotion, development, social psychology, and the like in fundamental and profound
ways. In their influential overview of cross-cultural research,
Segall, Lonner, and Berry (1998, p. 1102) stated that cross-cultural
research's three complementary goals were
to transport and test our current psychological knowledge and perspectives by using them in other cultures; to explore and discover new
aspects of the phenomenon being studied in local cultural terms; and
to integrate what has been learned from these first two approaches in
order to generate more nearly universal psychology, one that has
pan-human validity.

Similarly, Sue (1999) argued that researchers have not taken
sufficient advantage of cross-cultural comparisons that allow researchers to test the external validity of their interpretations and to
gain insights about the applicability of their theories and models.
In cross-cultural research, there is an ongoing and growing
interest in the relation between culture and the self. This is an
inevitable consequence of the symbolic construction of self-image
by using the meaning system that is culture (Kashima, 1995).
Indeed, Singelis (2000) argued that this emphasis on self has been
the primary basis for other disciplines increasingly embracing
cross-cultural perspectives. However, there exists a schism between the overarching cultural relativist and universalist perspectives of cross-cultural research (Kagitcibasi & Poortinga, 2000).
The broad cultural relativist (idiographic, emic, indigenous, qualitative) perspective emphasizes the uniqueness of the individual
case that defies comparison. In contrast, the broad universalist
(nomothetic, etic, positivist, quantitative) perspective emphasizes

what is common between cultures with an emphasis on theoretical
predictions, replicability of methods, and empirical testing. Because neither of these apparently mutually exclusive metatheoretical positions is defensible in the extreme, there is a need to bridge
this dichotomy. Thus, for example, Kashima (1995) noted the need
for future research to identify universal as well as culture-specific
antecedents and consequences of self-conceptions. In an emerging
consensus on the cultural-mind relation, Kashima (2000) emphasized a system of reciprocal effects whereby agents generate
culture but are also shaped by culture. Similarly, Kitayama and
Markus (1999) argued for the mutual construction of culture and
self.
In their taxonomy of cross-cultural research, Van de Vijer and
Leung (2000) discussed generalizability studies with a strong theoretical framework for generating testable hypotheses and an emphasis on the universality of structures and theoretical propositions. Within this context, they noted the need to use new multiple-group structural equation modeling approaches that allow
researchers to make fine-grained comparisons of factor structures
and patterns of relations among multiple constructs in different
cultural groups. In this framework, there is a focus on similarities
as well as differences and an emphasis on the explanation of
observed differences. Because of the traditional focus on null
hypothesis testing, there is an unfortunate tendency to provide
elaborate interpretations for (sometimes very small) differences
and largely to ignore similarities that may argue for generalization
across cultures. From this perspective, cross-cultural research can
be seen as an extremely important variation on the traditional
multimethod approach to evaluation of construct validity. Van de
Vijer and Leung emphasized that the endemic problems of replicability in cross-culture research will improve with greater emphasis on theory development and testing coupled with the more
appropriate use of new statistical tools. When there are parallel
data from more than one group (the multiple cultural groups in
cross-cultural research), it is possible to test the invariance of the
solution by requiring any one, any set, or all parameter estimates
to be the same in two or more groups. Byrne (2003) argued that
this analysis is particularly appropriate for making cross-cultural
comparisons.

Method
Data Source and Sample
The present investigation was based on the Programme for International Student Assessment (PISA) database compiled by the Organisation for Economic Co-operation and Development (OECD), which consists of nationally representative responses by 15-year-olds collected in 32 countries in the year
2000 (see Adams & Wu, 2002; OECD, 2001a, 2001b, for a description of
the database and variables). The PISA database was collected in response
to the need for internationally comparable evidence of student performance
and related competencies within a common framework that is internationally agreed on. Selection of the measures was made on the basis of advice
from substantive and statistical expert panels and results from extensive
pilot studies. Substantial efforts and resources were devoted to achieving
cultural and linguistic breadth in the assessment materials, stringent
quality-assurance mechanisms were applied in the translation of materials
into different languages, and data were collected under independently
supervised test conditions. Paper-and-pencil assessments consisted of a
combination of multiple-choice items and written responses. Whereas all
students completed some reading assessment items (which were the focus
of the 2000 data collection), only random samples of students completed
mathematics or science assessments. In addition, countries were given the
option of collecting materials on a Cross Curriculum Competencies (CCC)
questionnaire that included the academic self-concept items that are the
focus of the present investigation. A total of 26 of 32 participating countries chose to do so. Although the response rate in the Netherlands was
lower than recommended to ensure a nationally representative sample and
comparability with other countries (OECD, 2001b), the Netherlands was
included in the present investigation. The data for the other 25 countries
provided nationally representative samples of 15-year-old students in each
of these countries.
The present investigation was based on students who had both mathematics and reading achievement test scores and who completed the math
and verbal self-concept items (for a more detailed description of the
achievement tests and the self-concept items, see Adams & Wu, 2002).
Measures included three measures of reading achievement, a single measure of mathematics, three math self-concept items, and three verbal
self-concept items. The self-concept items were from the highly regarded
Self Description Questionnaire II (Byrne, 1996; also see Marsh, 1990,
1992, 1993). Although 97,384 students completed math and reading assessments, only 59,332 also completed any of the self-concept items and
55,577 had complete data for all 10 variables considered here (i.e., the
sample after listwise deletion for missing data, the basis of the present
investigation). As recommended in the database documentation (OECD,
2001a, 2001b), all analyses of the PISA data should be weighted to obtain
unbiased estimates of population parameters. For purposes of the present
investigation, the effective sample size for each country was set equal to
the number of cases for that country prior to weighting so that the weighted
sample size was the same as the unweighted sample size (i.e., the average
weight across all cases was 1.0; but also see Kaplan & Ferguson, 1999, and
Stapleton, 2002, for further discussion on relative and effective weighting).
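
The two data-handling steps described above (listwise deletion, then rescaling the weights so that each country's weighted sample size equals its unweighted sample size) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the record layout, the field names (`country`, `weight`, `scores`), and the toy values are all hypothetical.

```python
from collections import defaultdict

def listwise_delete(records):
    """Keep only records in which no score is missing (None)."""
    return [r for r in records if all(v is not None for v in r["scores"])]

def rescale_weights(records):
    """Rescale weights within each country so they sum to the country's n.

    After rescaling, the mean weight in every country is 1.0, so the
    weighted sample size equals the unweighted sample size.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        totals[r["country"]] += r["weight"]
        counts[r["country"]] += 1
    return [
        {**r, "weight": r["weight"] * counts[r["country"]] / totals[r["country"]]}
        for r in records
    ]

# Hypothetical toy records (not PISA data).
records = [
    {"country": "A", "weight": 120.0, "scores": [510, 0.3]},
    {"country": "A", "weight": 80.0, "scores": [480, None]},  # dropped: missing score
    {"country": "A", "weight": 60.0, "scores": [530, -0.1]},
    {"country": "B", "weight": 15.0, "scores": [495, 0.8]},
    {"country": "B", "weight": 45.0, "scores": [470, 0.2]},
]

complete = listwise_delete(records)
rescaled = rescale_weights(complete)
```

Rescaling within country leaves the relative weights of students inside a country unchanged while preventing the population-sized raw weights from inflating the apparent sample size.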

Statistical Analysis
Structural equation models (SEMs) were estimated with LISREL 8
(Jöreskog & Sörbom, 1993) using maximum likelihood estimation (for
further discussion of SEM, see Bollen, 1989; Byrne, 1998; Jöreskog &
Sörbom, 1993). Following Marsh, Balla, and Hau (1996) and Marsh, Balla,
and McDonald (1988), we emphasize the Tucker-Lewis index (TLI), the
relative noncentrality index (RNI), and the root-mean-square error of
approximation (RMSEA) to evaluate goodness of fit, but also present the
chi-square test statistic and an evaluation of parameter estimates. Whereas tests
of statistical significance and indices of fit aid in the evaluation of the fit
of a model, there is ultimately a degree of subjectivity and professional
judgment in the selection of a best model (Marsh, Balla, et al., 1988).
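
For readers who want to see how the three indices relate to the chi-square statistic, the following sketch uses the standard textbook definitions. It is not LISREL output. The RMSEA function assumes the common single-group form with N - 1 in the denominator, and the function names are hypothetical. The article does not report the null-model chi-square, so TLI and RNI are given only as functions.

```python
import math

def tli(chi2_t, df_t, chi2_n, df_n):
    """Tucker-Lewis index from target (t) and null (n) model chi-squares."""
    ratio_n = chi2_n / df_n
    ratio_t = chi2_t / df_t
    return (ratio_n - ratio_t) / (ratio_n - 1.0)

def rni(chi2_t, df_t, chi2_n, df_n):
    """Relative noncentrality index."""
    return 1.0 - (chi2_t - df_t) / (chi2_n - df_n)

def rmsea(chi2, df, n):
    """Single-group RMSEA point estimate (N - 1 convention, assumed)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

# Total-group model TG1: chi-square and df from Table 2, N from the Table 2 note.
rmsea_tg1 = rmsea(5026.06, 30, 55582)
```

Rounded to two decimals, `rmsea_tg1` agrees with the RMSEA of .05 reported for Model TG1 in Table 2.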
When there are parallel data from more than one group (the 26 countries in
this study), it is possible to test the invariance of the solution by
requiring any one, any set, or all parameter estimates to be the same in two
or more groups. Byrne (2003) argued that this analysis is particularly
appropriate for making cross-cultural comparisons. In applying this approach,
there is a well-developed methodology in which the goodness of fit
of alternative models is compared, including the least restrictive model
that does not require any of the parameter estimates to be the same in
different groups and the most restrictive model that requires all parameter
estimates to be the same in the different groups (e.g., Byrne, 1998; Marsh,
1994a; but also see Cheung & Rensvold, 1999). In preliminary analyses,
we tested the a priori baseline model separately for each of the 26 groups
and found that the goodness of fit was excellent for each country considered
separately (see subsequent discussion of Table 3). Typically, the
minimal condition for factorial invariance is the equivalence of all factor
loadings in the multiple groups, and this is one of the first tests of
invariance in the sequence. There is no clear consensus in recommendations
about the ordering of subsequent invariance constraints (e.g., Bentler,
1988; Bollen, 1989; Byrne, 1998; Jöreskog & Sörbom, 1993), although
Bentler (1988) and Byrne (1998) noted that the equality of parameters
associated with measurement errors is typically the least important
hypothesis to test and is unlikely to be met in most applications. Whereas under
highly restrictive conditions it is possible to test for the statistical
significance of differences between two nested models (e.g., models with and
without a particular set of invariance constraints), the concerns about
reliance on the chi-square test of statistical significance as a measure of fit
for a single model are even greater than for the comparison of two different
models. In applied research with real data, the null hypothesis of complete
invariance (no differences between groups in parameter estimates) is
always false and will be rejected given a sufficiently large sample size.
Hence, we again emphasize differences in
goodness-of-fit indexes, but also present the chi-square test statistic. More
important, however, in the present investigation, the four path coefficients
used to test the I/E model (Figure 1) are of critical importance and are the
focus of specific models designed to evaluate the invariance of these
parameters across the 26 countries. Hence, the comparison of results across
different countries when these path coefficients are not constrained to be
invariant is potentially more useful than goodness-of-fit indexes.
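
The contrast between the chi-square difference and the fit indexes can be illustrated with two nested models whose values are reported in Table 2 (MG1, with no invariance constraints, and MG2, with invariant factor loadings). The sketch below only does the subtraction. The dictionary layout and function name are hypothetical.

```python
# Chi-square, df, and TLI values as reported in Table 2.
fit = {
    "MG1": {"chi2": 5784.36, "df": 780, "tli": 0.97},  # no invariance constraints
    "MG2": {"chi2": 7650.34, "df": 930, "tli": 0.97},  # invariant factor loadings
}

def chi2_difference(restricted, baseline):
    """Return (delta chi-square, delta df) for two nested models."""
    return (restricted["chi2"] - baseline["chi2"],
            restricted["df"] - baseline["df"])

delta_chi2, delta_df = chi2_difference(fit["MG2"], fit["MG1"])
delta_tli = fit["MG2"]["tli"] - fit["MG1"]["tli"]
```

A chi-square difference of roughly 1,866 on 150 degrees of freedom would be judged statistically significant by any conventional criterion, yet the TLI is unchanged to two decimal places. With more than 55,000 cases, this is the pattern the paragraph above anticipates.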
In preliminary analyses, traditional coefficient alpha estimates of
reliability of scale scores were computed separately for each country. These
were consistently high across the 26 groups for reading achievement (M =
.86, SD = .03), math self-concept (M = .88, SD = .02), and verbal
self-concept (M = .74, SD = .07). Inspection of the item-total correlations
for each indicator (not shown) indicated that the one negatively worded
(reverse-scored) item in the verbal self-concept scale contributed less
positively to reliability than the other two positively worded verbal
self-concept items (all math self-concept items were positively worded) and
that this pattern of results was consistent across the 26 countries. In the
PISA database, mathematics achievement was represented by a single
score. Therefore, for purposes of structural equation models in the present
investigation, we set its reliability at .90, a conservative value in relation to
the corresponding reliability estimate for reading that provides a more
realistic estimate than assuming that mathematics achievement is measured
without error (see discussion by Jöreskog & Sörbom, 1993).
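
Fixing the reliability of a single-indicator factor implies particular values for its standardized loading and uniqueness. The sketch below shows the usual conversion (loading = square root of the reliability, uniqueness = 1 minus the reliability). The function name is hypothetical, and the calculation illustrates the principle rather than the LISREL setup used by the authors.

```python
import math

def single_indicator_constraints(reliability):
    """Return (standardized loading, uniqueness) implied by a fixed reliability."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    loading = math.sqrt(reliability)
    uniqueness = 1.0 - reliability
    return loading, uniqueness

# Reliability assumed for the single mathematics achievement score.
loading, uniqueness = single_indicator_constraints(0.90)
```

Rounded to two decimals, these values (.95 and .10) match the loading and uniqueness shown for MAch in the total-group solution of Table 1.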

Results and Discussion


We began with an evaluation of the results based on the total
group of 55,582 participants (Tables 1 and 2). The solution for this
total-group model (TG1) was well defined and the goodness of fit
(TLI = .97; see Table 2) was very good. Most important, however,
were tests of the four path coefficients that were central to the
evaluation of support for the I/E model. As predicted, the two
horizontal paths relating math achievement to math self-concept
(.44) and relating reading achievement to verbal self-concept (.47)
were substantial and positive, whereas the two cross paths leading
from reading achievement to math self-concept (-.20) and
mathematics achievement to verbal self-concept (-.26) were negative.
Also of relevance is the observation that the (zero-order) correlation
between math and verbal achievement factors (r = .78) was
very large, whereas the corresponding correlation between math
and verbal self-concept factors (r = .10) was substantially lower.
Hence, results based on the total sample clearly support the main
predictions for the I/E model (see Figure 1).
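
The near-zero correlation between the two self-concept factors can be recovered from the path coefficients themselves. The sketch below computes the model-implied correlation from the total-group estimates in Table 1, with the cross paths entered as negative, as stated in the text. The residual covariance is entered as printed (.11), and treating it as positive is an assumption. The variable and function names are hypothetical.

```python
import math

# Standardized estimates for the total-group model (TG1), taken from Table 1.
# Rows: outcomes (MSC, VSC); columns: predictors (MAch, VAch).
paths = [[0.44, -0.20],
         [-0.26, 0.47]]
r_ach = 0.78              # correlation between MAch and VAch
resid_var = [0.90, 0.90]  # residual variances of MSC and VSC
resid_cov = 0.11          # residual covariance of MSC and VSC (sign assumed positive)

phi = [[1.0, r_ach], [r_ach, 1.0]]

def implied_cov(i, j):
    """Model-implied covariance between outcomes i and j (0 = MSC, 1 = VSC)."""
    explained = sum(
        paths[i][a] * phi[a][b] * paths[j][b]
        for a in range(2) for b in range(2)
    )
    residual = resid_var[i] if i == j else resid_cov
    return explained + residual

r_sc = implied_cov(0, 1) / math.sqrt(implied_cov(0, 0) * implied_cov(1, 1))
```

Under these inputs, the positive and negative contributions of achievement to the covariance nearly cancel. The implied self-concept correlation rounds to the reported r = .10 even though the two achievement factors correlate .78.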

Cross-Cultural Generalizability: Tests of Invariance Over Countries
A critical question, for which these data are uniquely appropriate,
is how well the results generalize across the 26 different
countries. To pursue this question, we conducted
multigroup confirmatory factor analyses (CFAs) and SEMs in which we
constrained different parameters to be invariant across the 26
groups (Table 2). We began with a set of CFA models to evaluate
the invariance of the measurement component of the model and
then focused specifically on structural equation models to evaluate
the I/E model.

Table 1
Parameter Estimates for Total-Group Solution (Model TG1) and
Multiple-Group Solution (MG15)

                        Total-group solution          Multiple-group solution
Factor              MAch  VAch   MSC   VSC  Uniq      MAch  VAch   MSC   VSC
Factor loadings
  MAch               .95   .00   .00   .00   .10       .94   .00   .00   .00
  VAch1              .00   .85   .00   .00   .28       .00   .83   .00   .00
  VAch2              .00   .89   .00   .00   .22       .00   .87   .00   .00
  VAch3              .00   .78   .00   .00   .39       .00   .77   .00   .00
  MSC1               .00   .00   .84   .00   .29       .00   .00   .84   .00
  MSC2               .00   .00   .85   .00   .27       .00   .00   .85   .00
  MSC3               .00   .00   .83   .00   .30       .00   .00   .83   .00
  VSC1               .00   .00   .00   .55   .70       .00   .00   .00   .60
  VSC2               .00   .00   .00   .72   .48       .00   .00   .00   .71
  VSC3               .00   .00   .00   .83   .31       .00   .00   .00   .81
Path coefficients
  MAch               .00   .00   .00   .00             .00   .00   .00   .00
  VAch               .00   .00   .00   .00             .00   .00   .00   .00
  MSC                .44  -.20   .00   .00             .48  -.19   .00   .00
  VSC               -.26   .47   .00   .00            -.19   .45   .00   .00
Variance-covariances
  MAch              1.00                              1.00
  VAch               .78  1.00                         .76  1.00
  MSC                .00   .00   .90                   .00   .00   .87
  VSC                .00   .00   .11   .90             .00   .00   .04   .89

Note. All parameter estimates are presented in completely standardized
form. The total-group (TG) solution is based on Model TG1 (Table 2) and
the multiple-group (MG) solution is based on MG15 (with invariant factor
loadings, path coefficients, and factor variance-covariances, but freely
estimated uniquenesses for each of the 26 countries). MAch = math
achievement; VAch = verbal achievement; MSC = math self-concept;
VSC = verbal self-concept; Uniq = uniqueness.

In the baseline multiple-group model (MG1), no invariance
constraints were imposed and parameters for the a priori model
were fit separately to data from each country. The fit indexes for
this model (e.g., TLI = .97) were very good. Separate analyses of
this baseline model conducted with each country indicated that the
fit was good in each of the 26 countries considered separately.
Thus, for example, the 26 RNIs varied from a low of .968 to a high
of .992 (M RNI = .980; see subsequent discussion of Table 3). In
the first test of invariance (Model MG2), factor loadings were
constrained to be equal across the 26 groups. Again, the fit indexes
were very good and differed little from those based on the totally
noninvariant solution (MG1). This supports the appropriateness of
the measures across the 26 groups and satisfies the minimum
requirement for factorial invariance. In each of the subsequent
CFA models (MG3-MG6 in Table 2), the invariance of the factor
loadings was imposed in combination with the invariance of
additional sets of parameters: factor variances, factor covariances,
and uniquenesses. Although the imposition of these added invariance
constraints resulted in small decrements in fit, even the highly
restrictive Model MG6 of total invariance (i.e., requiring every
parameter to be the same in all 26 groups) provided a good fit to
the data that differed only slightly from Model MG1, which had no
invariance constraints. These results support the cross-cultural
generalizability of the measures and the relations among them
across these 26 countries.

Table 2
Goodness of Fit for I/E Model Fit to the Total Group and Multiple (Country) Groups

Model            χ²     df   RNI   TLI  RMSEA  Model description
Total sample
  TG1      5,026.06     30   .98   .97    .05  Full I/E model
Multiple-group CFA
  MG1      5,784.36    780   .98   .97    .05  CFA INV none; Free FL, FV, FC, Uniq.
  MG2      7,650.34    930   .97   .97    .06  CFA INV FL; Free FV, FC, Uniq.
  MG3      9,846.64  1,030   .97   .96    .06  CFA INV FL, FV; Free FC, Uniq.
  MG4      9,070.34  1,005   .97   .96    .06  CFA INV FL, FC; Free FV, Uniq.
  MG5     12,515.47  1,180   .96   .96    .07  CFA INV FL, FC, FV; Free Uniq.
  MG6     18,513.56  1,405   .93   .95    .08  CFA INV FL, FC, FV, Uniq. (total invariance)
Multiple-group SEM
  MG7      9,497.48  1,030   .97   .96    .06  SEM INV FL, PC; Free FV, FC, Uniq.
  MG8      8,078.61    980   .97   .97    .06  SEM INV FL, PC-; Free FV, FC, Uniq., PC+
  MG9      8,273.18    980   .97   .97    .06  SEM INV FL, PC+; Free FV, FC, Uniq., PC-
  MG10    10,577.99  1,080   .96   .96    .06  SEM INV FL, FV, FC; Free PC, Uniq.
  MG11    10,520.63  1,080   .96   .96    .06  SEM INV FL, FC, PC; Free FV, Uniq.
  MG12    11,445.96  1,130   .96   .96    .07  SEM INV FL, FV, PC; Free FC, Uniq.
  MG13    11,234.89  1,130   .96   .96    .06  SEM INV FL, FV, FC, one PC set; Free other PC set, Uniq.
  MG14    11,153.97  1,130   .96   .96    .06  SEM INV FL, FV, FC, one PC set; Free other PC set, Uniq.
  MG15    12,515.47  1,180   .96   .96    .07  SEM INV FL, FV, FC, PC; Free Uniq.

Note. N = 55,582. In Model TG1 (see parameter estimates in Table 1) the internal/external frame of reference (I/E) model was fit to the total group,
whereas for Models MG1-MG13 the I/E model was fit separately for each of the 26 groups representing different countries. For Models MG2-MG13, some
combination of parameters is required to be invariant across the 26 groups (countries). RNI = relative noncentrality index; TLI = Tucker-Lewis index;
RMSEA = root-mean-square error of approximation; CFA = confirmatory factor analysis; SEM = structural equation model; FL = factor loading; FC =
factor covariances; FV = factor variances; PC = path coefficient; PC+ = horizontal path coefficients predicted to be positive (see Figure 1); PC- = cross
path coefficients predicted to be negative (see Figure 1); Uniq. = uniqueness; MG = multiple group; TG = total group; INV = invariant; Free = freely
estimated (not constrained to be invariant).

Models MG7-MG15 focus specifically on the structural component
of the model, that is, the path coefficients that are critical to tests
of predictions based on the I/E model (see Figure 1). In Model
MG7, the path coefficients and factor loadings were required to be
the same in each of the 26 groups. Although there was a very small
decrement in fit (TLI = .96) relative to the model with only factor
loadings invariant (MG2), the fit was still very good. These tests of
the invariance of the four path coefficients provided a global test
of the invariance of the two path coefficients predicted to be
positive and the two path coefficients predicted to be negative. In
Model MG8, the horizontal (positive) path coefficients were freely
estimated in each group, whereas the cross (negative) path
coefficients were required to be the same in all 26 groups. In Model
MG9, the negative path coefficients were freely estimated and the
positive paths were invariant across groups. In both models, the
goodness of fit improved a small amount (both TLIs are .97), but
the differences were small. These results demonstrated that the
magnitudes, as well as the direction, of the path coefficients
were consistent across the 26 different countries.
In Models MG10-MG12, we evaluated the effects on goodness
of fit associated with invariance constraints on factor variances and
factor covariances in the I/E model. Whereas these additional
invariance constraints produced some decrement in fit, even Model
MG15 (which required that all four path coefficients, all four
factor variances, and both factor covariances be the same across
the 26 countries) provided a good fit to the data.

In summary, even the extremely demanding model with complete invariance of all parameters provided a good fit to the data.
Because no one of these multiple-group models stood out as
clearly the best model, we evaluated parameter estimates based
on several of these models.
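
The degrees of freedom reported in Table 2 follow a simple bookkeeping rule: each parameter held equal across the 26 groups replaces 26 free estimates with one, adding 25 degrees of freedom to the unconstrained baseline (26 groups × 30 df). The sketch below reproduces the df for several of the SEM models. The counts of four path coefficients, four factor variances, and two factor covariances come from the text. The count of six freely estimated factor loadings is an assumption inferred from the reported df, and the names used are hypothetical.

```python
N_GROUPS = 26
DF_PER_GROUP = 30  # df of the a priori model in a single group (Table 2, TG1)

# Number of parameters in each constrained set for the SEM models.
# "FL": 6 is an assumption inferred from the reported df, not stated in the text.
PARAMS = {"FL": 6, "PC": 4, "PC+": 2, "PC-": 2, "FV": 4, "FC": 2}

def expected_df(constrained_sets):
    """df when the listed parameter sets are held equal across all groups."""
    baseline = N_GROUPS * DF_PER_GROUP
    gained = sum(PARAMS[s] for s in constrained_sets) * (N_GROUPS - 1)
    return baseline + gained

df_mg1 = expected_df([])                         # no constraints
df_mg2 = expected_df(["FL"])                     # invariant loadings
df_mg7 = expected_df(["FL", "PC"])               # loadings and all four paths
df_mg8 = expected_df(["FL", "PC-"])              # loadings and cross paths only
df_mg15 = expected_df(["FL", "FV", "FC", "PC"])  # everything except uniquenesses
```

Under these counts the function returns 780, 930, 1,030, 980, and 1,180, which are the df reported in Table 2 for Models MG1, MG2, MG7, MG8, and MG15.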

Cross-Cultural Generalizability: Evaluation of Parameter Estimates
To evaluate further support for the cross-cultural generalizability
of the results, we examined parameter estimates
based on Model MG15 (Table 1). Because factor loadings,
path coefficients, and factor variances and covariances were
invariant (the same) across the 26 groups, it was only necessary to
present one set of parameter estimates (rather than separate sets of
parameter estimates for each of the 26 groups). Because uniqueness
terms were not held invariant across groups in this model, the
26 separate sets of uniqueness terms were not presented in order to
conserve space (but see earlier discussion of reliability estimates;
also see Table 3). Of particular importance in this highly restrictive
multigroup model, MG15, were the cross (negative) paths leading
from reading achievement to math self-concept (-.19) and from
mathematics achievement to verbal self-concept (-.19). In addition
to providing global support for the I/E model, the invariance of
these parameter estimates provided remarkably strong support for
the cross-cultural generalizability of predictions based on the I/E
model.


Table 3
Reliability Estimates, Goodness-of-Fit Indexes, and Selected Parameter Estimates for Each Country

                              Reliability        Factor corr                Path coefficients                Goodness of fit
Country                   N  VAch   MSC   VSC  MAch-VAch  MSC-VSC  MAch→MSC  VAch→MSC  MAch→VSC  VAch→VSC     χ²(30)   RNI   TLI
Total                55,582   .87   .88   .74       .76*     .06*      .48*     -.19*     -.19*      .45*  12,515.47  .957  .957
1. Australia          2,642   .87   .86   .78       .77*     .08*      .41*     -.16*     -.19*      .39*     181.96  .988  .982
2. Austria            2,380   .87   .88   .81       .76*     .07*      .47*     -.25*     -.26*      .56*     181.57  .987  .981
3. Belgium-Flemish    1,962   .86   .86   .71       .81*     .11*      .34*     -.24*     -.20*      .26*     178.97  .982  .974
4. Brazil             2,218   .81   .85   .63       .69*     .14*      .23*     -.07      -.08       .25*     256.04  .970  .955
5. Czech Republic     2,698   .84   .85   .75       .74*     .08*      .51*     -.22*     -.17*      .48*     395.87  .968  .952
6. Denmark            2,087   .87   .86   .77       .78*     .10*      .65*     -.18*     -.16*      .48*     350.22  .969  .953
7. Finland            2,576   .84   .93   .80       .71*     .28*      .70*      .06       .04       .46*     257.61  .984  .976
8. Germany            2,502   .87   .90   .81       .79*     .12*      .62*     -.45*     -.41*      .60*     185.00  .989  .983
9. Hungary            2,550   .86   .87   .67       .76*     .08*      .43*     -.15*     -.17*      .49*     318.31  .974  .961
10. Iceland           1,720   .86   .91   .78       .75*     .31*      .68*     -.09*      .08       .40*     194.20  .982  .973
11. Ireland           2,041   .86   .87   .79       .79*     .11*      .53*     -.22*     -.44*      .47*     146.14  .988  .981
12. Italy             2,678   .86   .88   .81       .73*     .06*      .49*     -.21*     -.35*      .62*     199.84  .987  .981
13. Korea             2,705   .78   .89   .68       .77*     .13*      .58*     -.19*     -.04       .48*     115.65  .992  .988
14. Latvia            1,920   .88   .85   .66       .61*     .09*      .37*     -.20*     -.14*      .47*     212.20  .978  .966
15. Liechtenstein       153   .83   .85   .76       .76*     .06       .41*     -.27*     -.37*      .55*      50.11  .969  .954
16. Luxembourg        1,441   .88   .88   .75       .74*     .01       .40*     -.30*     -.27*      .57*     152.48  .982  .973
17. Mexico            2,275   .82   .83   .55       .76*     .52*      .14*     -.03      -.10*      .20*     184.50  .980  .970
18. Netherlands       1,282   .84   .89   .74       .84*     .07*      .89*     -.75*     -.25*      .34*     103.33  .988  .981
19. New Zealand       1,809   .88   .89   .80       .80*     .07*      .80*     -.38*     -.53*      .68*     198.64  .983  .975
20. Norway            2,050   .88   .90   .74       .73*     .14*      .72*     -.17*     -.20*      .59*     312.61  .974  .961
21. Portugal          2,378   .89   .87   .73       .79*     .06*      .55*     -.28*     -.18*      .48*     334.03  .974  .961
22. Russia            3,398   .84   .87   .67       .68*     .29*      .22*      .12*     -.16*      .49*     446.98  .971  .957
23. Sweden            2,282   .85   .88   .76       .81*     .18*      .58*     -.20*     -.08*      .37*     243.33  .980  .971
24. Switzerland       2,982   .88   .88   .76       .76*     .20*      .54*     -.39*     -.28*      .42*     274.55  .983  .975
25. Scotland          1,211   .87   .88   .82       .82*     .12*      .69*     -.27*     -.38*      .45*      85.45  .991  .987
26. United States     1,642   .89   .86   .76       .84*     .11*      .42*     -.15*     -.20*      .55*     224.79  .977  .965
M                             .86   .88   .74       .76      .06       .51      -.22      -.21       .47      222.47  .980  .971
SD                            .03   .02   .07       .05      .17       .18       .16       .14       .12       95.27  .007  .011
Mdn                           .86   .88   .76       .74      .08       .52      -.20      -.20       .48      199.24  .982  .973
25th percentile               .84   .86   .70       .76      .07       .41      -.27      -.30       .40      172.35  .974  .961
75th percentile               .88   .89   .79       .79      .14       .66      -.15      -.13       .55      284.06  .987  .981

Note. All parameter estimates are presented in completely standardized form. Factor correlations for each country are based on MG3 (Table 2), and the path
coefficients are based on MG10 (Table 2). The total results based on all 26 countries are based on Model MG15 (Table 1; also see Table 2) in which only
uniquenesses were allowed to differ from country to country. The a priori model (see Figure 1) was fit separately to responses from each country and the
total sample, and goodness of fit for each analysis is summarized by the chi-square test statistic, relative noncentrality index (RNI), and the Tucker-Lewis
index (TLI). MAch = math achievement; VAch = verbal achievement; MSC = math self-concept; VSC = verbal self-concept; corr = correlation; MG =
multiple group.
* p < .05.

Of critical importance to the present investigation were the four
path coefficients relating the two achievement test scores to the
corresponding self-concept measures. Although the fit was good
for models that required these path coefficients to be the same
across the 26 countries, this highly restrictive model produced a
small decrement in fit. In order to evaluate the extent of variation
in different countries, path coefficients from Model MG10 (which
allowed the path coefficients to be estimated separately in each
country) are presented for all 26 countries in Table 3.
Horizontal (positive) paths from math achievement to math
self-concept and from verbal achievement to verbal self-concept
were predicted to be substantial and positive. All 52 of these path
coefficients were statistically significant and positive. The means
of the two sets of path coefficients were .51 (SD = .18) and .47
(SD = .12), respectively. Cross (negative) paths from math
achievement to verbal self-concept and from verbal achievement
to math self-concept were predicted to be negative and less
substantial. Across these 52 path coefficients, one was small but
significantly positive (.12), seven were nonsignificant, and the
remaining 44 were significantly negative. The means of the two
sets of path coefficients were -.21 (SD = .14) and -.22 (SD = .16),
respectively. These results for horizontal and cross paths were
similar to those based on the total sample and those based on
Model MG15 in which path coefficients were required to be the
same across the 26 groups (see Table 1). In all 26 countries, the
absolute sizes of the cross (negative) paths were consistently much
smaller than those of the horizontal (positive) paths.
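
The summary statistics quoted above can be checked directly against the country-level values in Table 3. The sketch below does this for the path from math achievement to math self-concept. The list simply transcribes the 26 country values from that column, and the variable names are hypothetical.

```python
from statistics import mean, stdev

# Math achievement -> math self-concept paths for the 26 countries (Table 3),
# in the order the countries are listed.
mach_to_msc = [
    0.41, 0.47, 0.34, 0.23, 0.51, 0.65, 0.70, 0.62, 0.43, 0.68, 0.53, 0.49, 0.58,
    0.37, 0.41, 0.40, 0.14, 0.89, 0.80, 0.72, 0.55, 0.22, 0.58, 0.54, 0.69, 0.42,
]

m = mean(mach_to_msc)
sd = stdev(mach_to_msc)  # sample SD; the population SD rounds to the same value
```

Rounded to two decimals, the results reproduce the reported mean of .51 and standard deviation of .18.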
An important feature of the relations between multidimensional
achievement scores and multidimensional academic self-concept
scores was that academic self-concept scores were substantially
less highly correlated (more highly differentiated) than the
corresponding academic achievement scores. More specifically, the
I/E model predicts that correlations between math and verbal
achievement scores should be substantial and substantially larger
than those between math and verbal self-concept. Although support
for the I/E model does not require math and verbal self-concept
measures to be uncorrelated, much of the research reviewed
earlier has found these two self-concept scores to be nearly
uncorrelated. In order to evaluate the cross-cultural generalizability
of this pattern of results, the two correlations were presented
separately for each country in Table 3 (based on results from
Model MG3 in which factor loadings and factor variances were
invariant across countries, but factor correlations were not).
Consistent with a priori predictions, in every country the correlation
between the two self-concept scores (M = .06, SD = .17) was
much smaller than the correlation between the two
achievement scores (M = .76, SD = .05).

Potential Limitations of the Present Investigation


The present investigation, because of the strength of the PISA
data, is probably the strongest cross-cultural study of academic
self-concept ever undertaken and certainly the strongest
cross-cultural study of relations between academic self-concept and
academic achievement posited by the I/E model. Particular
strengths include the large, nationally representative samples from
each of 26 countries, the careful selection and construction of
measures based on advice from substantive and statistical expert
panels appointed by the OECD, the careful translation of materials
in each of the participating countries, the precise nature of a priori
predictions based on a strong theoretical model, and the application of a sophisticated test of invariance of parameter estimates
resulting from the structural equation models applied to data from
each of the 26 countries.
There are, however, some potentially important limitations in
our statistical analyses that dictate caution in the interpretation of
the results. The PISA study has a three-level design (Level 1 =
students, Level 2 = school, Level 3 = country), but we chose to
ignore Level 2. We justified this decision in several ways. First, all
of our variables were measured at the individual student level and
our substantive focus was on the country level. Second, for the
PISA data, as is the case more generally (see Marsh & Rowe,
1996), there was very little variation in self-concept responses at
the school level even though there was substantial variation at the
school level for achievement scores (see Footnote 1). Because
essentially all of the parameter estimates of interest in the present
investigation involved relations (correlations or path coefficients)
with one or the other of the self-concept variables, parameter
estimates and their standard errors were not likely to be
substantially affected by ignoring the school level (see related
discussion by Marsh & Rowe, 1996). Finally, although there is rapid
development of statistical packages in this area, we are aware of no
commercially available packages that would allow us to evaluate a
three-level model, use multiple indicators to infer latent variables,
and test the invariance of the factor structure across the 26 countries.
There is also a potential concern about missing data in our
analyses, particularly for the self-concept responses that were our
main focus. A substantial portion of students did not have both
math and verbal test scores because of the design of the PISA
study (in which students were given different achievement tests at
random). However, our use of listwise deletion is appropriate
under these circumstances in which missing values are determined
randomly by design. In addition, there were entire countries or
regions within countries (e.g., the non-Flemish part of Belgium
and all of the United Kingdom except for Scotland) that chose not
to participate in the CCC component of PISA that contained the
self-concept items, and these countries were excluded from our
analyses. Within some of the countries, however, there were entire
schools that chose not to participate in the survey component of
PISA that contained the self-concept items and these were also
excluded from the present investigation. In some cases, these
represented entire regions (e.g., only the Flemish part of Belgium
participated and in the United Kingdom only Scotland participated
in the CCC option of PISA), which were excluded. However, for
students who completed any of the self-concept items, there were
very few missing responses (1.6%). Whereas use of listwise
deletion for missing data was a reasonable strategy for students who
had some nonmissing self-concept responses given the very
small amount of missing data, the exclusion of entire schools that
did not participate in the CCC option of PISA may compromise the
representativeness of the sample.
It could be argued that a majority of the countries are Western
countries; that the sample included only one country from Central
America (Mexico), South America (Brazil), and Asia (Korea); and
that the sample included no African countries at all. Hence, it is
relevant to focus on results from Mexico, Brazil, and Korea. In all
three countries, the correlation between math and verbal
achievement factors was substantially larger than the corresponding
correlation between math and verbal self-concept factors, thus
supporting the I/E model. However, the correlation between the two
academic self-concept factors in one of these countries, Mexico
(r = .52), was clearly larger than in the other 25 countries. In all
three countries, paths between matching achievement and
self-concept factors (the horizontal paths in Figure 1) were positive,
thus supporting the I/E model. However, in Brazil and Mexico,
these paths were smaller than in most of the other countries. In all
three countries, all of the critical paths between nonmatching
self-concept and achievement factors (the diagonal, cross paths in
Figure 1) were negative, thus supporting the I/E model. However,
only two of these six paths were statistically significant (i.e., four
are negative, but do not differ significantly from zero). Whereas
results based on these three countries provided some support for
the I/E model, the strength of this support was weaker than for the
other countries. It is, however, also important to emphasize that
there is strong support for the I/E model in non-Western countries
summarized earlier, particularly in Chinese research (Dai, 2001,
2002; Kong, 2000; Lee et al., 2000; Marsh et al., 2001; Yeung &
Lau, 1998; Yeung & Lee, 1999). In summary, whereas the results
of the present investigation provide strong support for the I/E
model, there is a need to further test the generalizability of the
results in non-Western countries (particularly those in Asia, South
America, Central America, and Africa) in order to more fully
evaluate any claims that support for the I/E model is universal.
More generally, the I/E model is likely to be dependent on the
existence of a formal school system that emphasizes math and
verbal school subjects and would not be likely to be supported in
cultures where there was no formal school system.
1 We conducted a three-level (Level 1 = individual student, Level 2 =
school, Level 3 = country) variance components model for math achievement,
verbal achievement, math self-concept, and verbal self-concept. The
amount of variance explained by school was 5% and 3% for verbal and
math self-concepts, respectively, but 21% and 27% for math and verbal
achievement scores, respectively.

Summary and Implications


The extreme domain specificity of academic self-concepts that
led to the development of the I/E model and was demonstrated so
convincingly in the present investigation has important
implications for any lingering debates about the relative importance of
unidimensional and multidimensional perspectives of self-concept.
Clearly, the relation between self-concepts in particular academic
areas and corresponding areas of academic achievement cannot be
adequately understood if researchers rely solely on global
measures of general self-concept or self-esteem. Indeed, our results
demonstrate that not even global measures of academic
self-concept are sufficient to understand the interplay between
self-perceptions in different academic domains that is the basis of the
internal comparison process in the I/E model.
The present investigation also has potentially important theoretical implications for social comparison theory. The theoretical
basis for the I/E model extends social comparison theory, positing
an internal comparison process in addition to the more typical
external comparison process. Specifically, students not only use
the performances of other students to form their self-concepts in a
particular school subject (the external comparison process), they
also use their own performances in other school subjects as a
second basis of comparison (the internal comparison process).
Although there is clear support for this pattern of results in relation
to academic achievements and academic self-concepts, it is also
relevant to ask whether similar frame-of-reference effects exist in
other areas as well. We suggest that the implications probably have
much broader generality. To illustrate this broader generality with
a hypothetical example, it is relevant to consider two athletes: (a)
a weekend sports enthusiast who is reasonably good at golf, tennis,
and a variety of other sports, but who is best at golf (with a
handicap of 10) and (b) a professional tennis player who is also a
good golfer (with a handicap of 2). Asked how good they were at
golf, it would be reasonable for the professional tennis player to
say "pretty good" (because she is so much better at tennis),
whereas the weekend sports enthusiast might say "good" (because
golf is her best sport). Objectively, the professional tennis player is
a better golfer, but if asked to complete self-concept ratings for golf and
tennis, the weekend sports enthusiast may have as high or even a
higher self-concept of golf than the professional athlete. As illustrated by this example, the critical feature of the internal comparison frame of reference is the use of accomplishments in one arena
as a basis of comparison for evaluating accomplishments in another arena.
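The joint operation of the external and internal comparison processes can be illustrated with a minimal simulation sketch. It is not the analysis reported in this article: the function names (`pearson`, `simulate_ie`) and all coefficient values are hypothetical, chosen only to mimic the predicted pattern (highly correlated achievements, a positive own-domain effect, and a negative cross-domain effect). With an achievement correlation of .80, an own-domain weight of .6, and a cross-domain weight of -.3, the implied population correlation between the two simulated self-concepts is zero.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def simulate_ie(n=20000, ach_corr=0.8, own=0.6, cross=-0.3,
                noise_sd=0.5, seed=0):
    """Simulate standardized achievements and I/E-style self-concepts.

    All parameter values are illustrative assumptions, not estimates
    from the PISA data analyzed in the article.
    """
    rng = random.Random(seed)
    shared = math.sqrt(ach_corr)        # weight of the common ability factor
    unique = math.sqrt(1.0 - ach_corr)  # weight of the domain-specific part
    math_ach, verbal_ach, math_sc, verbal_sc = [], [], [], []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)
        m = shared * g + unique * rng.gauss(0.0, 1.0)
        v = shared * g + unique * rng.gauss(0.0, 1.0)
        math_ach.append(m)
        verbal_ach.append(v)
        # Own-domain achievement raises self-concept (positive weight);
        # achievement in the other domain lowers it (negative weight).
        math_sc.append(own * m + cross * v + noise_sd * rng.gauss(0.0, 1.0))
        verbal_sc.append(own * v + cross * m + noise_sd * rng.gauss(0.0, 1.0))
    return math_ach, verbal_ach, math_sc, verbal_sc


if __name__ == "__main__":
    m_ach, v_ach, m_sc, v_sc = simulate_ie()
    print("achievement correlation:", round(pearson(m_ach, v_ach), 2))
    print("self-concept correlation:", round(pearson(m_sc, v_sc), 2))
```

Under these assumed weights, the simulated achievements correlate strongly whereas the simulated self-concepts are nearly uncorrelated, which is the seemingly paradoxical pattern that the I/E model is designed to explain.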
Marsh and Roche (1996) applied this logic in the evaluation of
performing arts (PA) self-concepts in dance, music, and drama.
For students not attending a PA school and non-PA students in a
PA school, there were modest, positive correlations among the
four PA self-concept scales. PA students specializing in dance,
music, or drama within a PA high school, however, were likely to
compare their relative competencies in the different PA domains in
forming their PA self-concepts in each domain as well as making
comparisons with others. Thus, for example, PA dance students
who were also good at drama and music were likely to have lower
music and drama self-concepts than non-PA students who were
equally able in music and drama, not because PA dance students
were less skilled at music and drama than non-PA students, but
because their music and drama skills were not nearly as good as
their dance skills. Consistent with a priori predictions based on the
I/E model, Marsh and Roche found that (a) dance, music, and
drama self-concepts were positively correlated for non-PA students, but uncorrelated for PA students and (b) PA students had
high self-concepts in their specialty area, but had much lower
self-concepts in their nonspecialty areas, lower even than non-PA
students who did not specialize in any areas of PA.
The extreme domain specificity of academic self-concepts that
led to the development of the I/E model also has practical implications for teachers and parents and for educational practice.
Teachers, in order to understand the academic self-concepts of
their students in different content areas, must understand the implications of the I/E model. When teachers were asked to infer the
self-concepts of their students (see discussion by Marsh & Craven,
1997), their responses reflected primarily the external comparison
process so that teachers' inferences were not nearly so domain
specific as responses by their students; students who were bright in
one area tended to be seen as having good academic self-concepts
in all areas, whereas students who were not bright in one area were
seen as having poor academic self-concept in all areas. Similarly,
Dai (2002) reported that inferred self-concept ratings by parents
reflected primarily the external comparison process typically emphasized in social comparison research, but not the internal comparison process that is the unique feature of the I/E model. In
contrast to inferred self-concept ratings by significant others
(teachers and parents), students' academic self-concepts in different domains are extremely differentiated. Hence, understanding
the implications of the I/E model will allow significant others to
better understand children and to infer children's self-concepts
more accurately. Thus, for example, our results demonstrate that
even bright students may have an average or below-average self-concept in their weakest school subject that may seem paradoxical
in relation to their good achievement (good relative to other
students, but not to their own performance in other school subjects). Similarly, even poor students may have an average or
above-average self-concept in their best school subject that may
seem paradoxical in relation to their below-average achievement in
that subject. Particularly for poorer students, understanding these
principles should assist teachers and parents in giving positive
feedback that is credible to students.
In summary, there exists a very strong and growing body of
support for the I/E model. In evaluating this support, it is important
to establish the limits of the model's generalizability. Tests of the
cross-cultural support for predictions from a theoretical model
developed in one culture to another culture provide an important
basis for testing this generalizability. Importantly, previous tests of
the I/E model have been conducted primarily in Western countries
and, typically, in those where the native language is English.
Although there exists support for the I/E predictions in some
non-Western countries, particularly China, I/E studies typically
have been based on ad hoc samples within a single country and had
idiosyncratic design features that hindered comparisons across
different countries. In this respect, the results of the present
investigation, based on large, representative samples of students
from 26 different countries and common materials, provide a
much stronger test of the cross-cultural generalizability of predictions based on the I/E model than any previous research. Because
there was good support both for the predictions based on the I/E
model and for the generalizability of these results across the 26
countries, the results clearly support the construct validity of the
I/E model and its cross-cultural generalizability. Although it may
be premature to claim that the predicted pattern of results is
universal, the results of our research clearly extend the cross-cultural generalizability of support for the I/E model.


References
Abu-Hilal, M. M. (2002, April). Frame of reference model of self-concept
and locus of control: A cross gender study in the United Arab Emirates.
Paper presented at the 87th Annual Convention of the American Educational Research Association, New Orleans, LA.
Abu-Hilal, M. M., & Bahri, T. M. (2000). Self concept: The generalizability of research on the SDQ, Marsh/Shavelson model and I/E frame of
reference model to the United Arab Emirates students. Social Behavior
& Personality, 28, 309–322.
Adams, R., & Wu, M. (2002). PISA 2000 Technical Report. Retrieved May
5, 2003, from Organisation for Economic Co-Operation and Development Web site: http://www.pisa.oecd.org/tech/intro.htm
Bentler, P. M. (1988). Theory and implementation of EQS. A structural
equations program. Los Angeles: BMDP Statistical Software.
Bollen, K. (1989). Structural equations with latent variables. New York:
Wiley.
Bong, M. (1998). Tests of the internal/external frames of reference model
with subject-specific academic self-efficacy and frame-specific academic self-concepts. Journal of Educational Psychology, 90, 102–110.
Brown, J. D. (1993). Self-esteem and self-evaluation: Feeling is believing.
In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp.
59–98). Hillsdale, NJ: Erlbaum.
Byrne, B. (1996). Measuring self-concept across the life span: Issues and
instrumentation. Washington, DC: American Psychological Association.
Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS,
and SIMPLIS: Basic concepts, applications and programming. Mahwah,
NJ: Erlbaum.
Byrne, B. M. (2003). Measuring self-concept across culture: Issues, caveats, and practice. In H. W. Marsh, R. G. Craven, & D. McInerney (Eds.),
International advances in self research (Vol. 1, pp. 291–313). Greenwich, CT: Information Age.
Cheung, G. W., & Rensvold, R. B. (1999). Testing factorial invariance
across groups: A reconceptualization and proposed new method. Journal
of Management, 25, 1–27.
Covington, M. V. (1992). Making the grade: A self-worth perspective on
motivation and school reform. New York: Cambridge University Press.
Dai, D. Y. (2001). A comparison of gender differences in academic
self-concept and motivation between high-ability and average Chinese
adolescents. Journal of Secondary Gifted Education, 13, 22–32.
Dai, D. Y. (2002). Incorporating parent perceptions: A replication and
extension study of the internal–external frame of reference model of
self-concept development. Journal of Adolescent Research, 17, 617–645.
Diener, E., & Fujita, F. (1997). Social comparison and subjective well-being. In B. P. Buunk & F. X. Gibbons (Eds.), Health, coping, and
well-being: Perspectives from social comparison theory (pp. 329–358).
Mahwah, NJ: Erlbaum.
Festinger, L. (1954). A theory of social comparison processes. Human
Relations, 7, 117–140.
Goethals, G. R., & Darley, J. M. (1987). Social comparison theory:
Self-evaluation and group life. In B. Mullen & G. R. Goethals (Eds.),
Theories of group behavior (pp. 21–47). New York: Springer-Verlag.
Joreskog, K. G., & Sorbom, D. (1993). LISREL 8: Structural equation
modeling with the SIMPLIS command language. Chicago: Scientific
Software International.
Kagitcibasi, C., & Poortinga, Y. H. (2000). Cross-cultural psychology:
Issues and overarching themes. Journal of Cross-Cultural Psychology,
31, 129–147.
Kaplan, D., & Ferguson, A. J. (1999). On the utilization of sample weights
in latent variable models. Structural Equation Modeling, 6, 305–321.

Kashima, Y. (1995). Introduction to the special section on culture and self.
Journal of Cross-Cultural Psychology, 26, 603–605.
Kashima, Y. (2000). Conceptions of culture and person for psychology.
Journal of Cross-Cultural Psychology, 31, 14–32.
Kitayama, S., & Markus, H. R. (1999). Yin and yang of Japanese self: The
cultural psychology of personality coherence. In D. Cervone & Y. Shoda
(Eds.), The coherence of personality: Social cognitive bases of personality consistency, variability, and organization (pp. 242–302). New
York: Guilford Press.
Kong, C. (2000). Chinese students' self-concept: Structure, frame of reference, and relation with academic achievement. Dissertation Abstracts
International Section A: Humanities and Social Sciences, 61(3-A), 880.
Lee, M. F., Yeung, A. S., Low, R., & Jin, P. (2000). Academic self-concept
of talented students: Factor structure and applicability of the internal/
external frame of reference model. Journal for the Education of the
Gifted, 23, 343–367.
Marsh, H. W. (1986). Verbal and math self-concepts: An internal/external
frame of reference model. American Educational Research Journal, 23,
129–149.
Marsh, H. W. (1989). Sex differences in the development of verbal and
math constructs: The High School and Beyond study. American Educational Research Journal, 26, 191–225.
Marsh, H. W. (1990). A multidimensional, hierarchical self-concept: Theoretical and empirical justification. Educational Psychology Review, 2,
77–172.
Marsh, H. W. (1992). Self Description Questionnaire, II. Sydney, New
South Wales, Australia: University of Western Sydney. (Original work
published 1990)
Marsh, H. W. (1993). Academic self-concept: Theory, measurement, and
research. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4,
pp. 59–98). Hillsdale, NJ: Erlbaum.
Marsh, H. W. (1994a). Confirmatory factor analysis models of factorial
invariance: A multifaceted approach. Structural Equation Modeling, 1,
5–34.
Marsh, H. W. (1994b). Using the National Educational Longitudinal Study
of 1988 to evaluate theoretical models of self-concept: The Self-Description Questionnaire. Journal of Educational Psychology, 86,
439–456.
Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of
incremental fit indices: A clarification of mathematical and empirical
processes. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced
structural equation modeling: Issues and techniques (pp. 315–353).
Hillsdale, NJ: Erlbaum.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit
indexes in confirmatory factor analysis: The effect of sample size.
Psychological Bulletin, 103, 391–410.
Marsh, H. W., Byrne, B. M., & Shavelson, R. (1988). A multifaceted
academic self-concept: Its hierarchical structure and its relation to academic achievement. Journal of Educational Psychology, 80, 366–380.
Marsh, H. W., & Craven, R. (1997). Academic self-concept: Beyond the
dustbowl. In G. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 131–198). Orlando, FL: Academic Press.
Marsh, H. W., & Hattie, J. (1996). Theoretical perspectives on the structure
of self-concept. In B. A. Bracken (Ed.), Handbook of self-concept (pp.
38–90). New York: Wiley.
Marsh, H. W., & Koller, O. (2003). Unification of two theoretical models
of relations between academic self-concept and achievement. In H. W.
Marsh, R. G. Craven, & D. McInerney (Eds.), International advances in
self research (Vol. 1, pp. 17–47). Greenwich, CT: Information Age.
Marsh, H. W., Kong, C.-K., & Hau, K.-T. (2000). Longitudinal multilevel
models of the big-fish–little-pond effect on academic self-concept:
Counterbalancing contrast and reflected-glory effects in Hong Kong
high schools. Journal of Personality and Social Psychology, 78, 337–349.

Marsh, H. W., Kong, C., & Hau, K. (2001). Extension of the internal/
external frame of reference model of self-concept formation: Importance
of native and nonnative languages for Chinese students. Journal of
Educational Psychology, 93, 543–553.
Marsh, H. W., & Parker, J. (1984). Determinants of student self-concept:
Is it better to be a relatively large fish in a small pond even if you don't
learn to swim as well? Journal of Personality and Social Psychology, 47,
213–231.
Marsh, H. W., & Roche, L. A. (1996). Structure of artistic self-concepts for
performing arts and non-performing arts students in a performing arts
high school: Setting the stage with multigroup confirmatory factor
analysis. Journal of Educational Psychology, 88, 461–477.
Marsh, H. W., & Rowe, K. J. (1996). The negative effects of school-average ability on academic self-concept: An application of multilevel
modelling. Australian Journal of Education, 40, 65–87.
Marsh, H. W., & Yeung, A. S. (1998). Longitudinal structural equation
models of academic self-concept and achievement: Gender differences
in the development of math and English constructs. American Educational Research Journal, 35, 705–738.
Marsh, H. W., & Yeung, A. S. (2001). An extension of the internal/external
frame of reference model: A response to Bong (1998). Multivariate
Behavioral Research, 36, 389–420.
Marshall, H. H., & Weinstein, R. S. (1984). Classroom factors affecting
students' self-evaluations. Review of Educational Research, 54, 301–326.
Matsumoto, D. (2001). Cross-cultural psychology in the 21st century. In
J. S. Halonen & S. F. Davis (Eds.), The many faces of psychological research in the 21st century (chap. 5). Retrieved December 18,
2001, from the Teaching of Psychology Web site: http://teachpsych.lemoyne.edu/teachpsych/faces/script/ch05.htm.
Moeller, J., & Koller, O. (2001a). Dimensional comparisons: An experimental approach to the internal/external frame of reference model.
Journal of Educational Psychology, 93, 826–835.
Moeller, J., & Koller, O. (2001b). Frame of reference effects following the
announcement of exam results. Contemporary Educational Psychology,
26, 277–287.
Morse, S., & Gergen, K. J. (1970). Social comparison, self-consistency,
and the concept of self. Journal of Personality and Social Psychology,
16, 148–156.
Organisation for Economic Co-operation and Development. (2001a).
Knowledge and skills for life: Results from the first OECD programme
for international student assessment (PISA) 2000. Paris, France: Author.
Organisation for Economic Co-operation and Development. (2001b). PISA
international data base (Tech. Rep.). Paris, France: Author.
Plucker, J. A., & Stocking, V. B. (2001). Looking outside and inside:
Self-concept development of gifted adolescents. Exceptional Children,
67, 535–548.

Segall, M. H., Lonner, W. J., & Berry, J. W. (1998). Cross-cultural
psychology as a scholarly discipline: On the flowering of culture in
behavioral research. American Psychologist, 53, 1101–1110.
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Validation of
construct interpretations. Review of Educational Research, 46, 407–441.
Singelis, T. M. (2000). Some thoughts on the future of cross-cultural social
psychology. Journal of Cross-Cultural Psychology, 31, 76–91.
Skaalvik, E. M., & Rankin, R. J. (1995). A test of the internal/external
frame of reference model at different levels of math–verbal self-perception. American Educational Research Journal, 32, 161–184.
Skaalvik, E. M., & Valas, H. (2001). Achievement and self-concept in
mathematics and verbal arts: A study of relations. In R. J. Riding & S. G.
Rayner (Eds.), Self perception (pp. 221–238). Westport, CT: Ablex.
Stapleton, L. M. (2002). The incorporation of sample weights into multilevel structural equation models. Structural Equation Modeling, 9, 475–502.
Sue, S. (1999). Science, ethnicity and bias: Where have we gone wrong?
American Psychologist, 54, 1070–1077.
Suls, J. M. (1977). Social comparison theory and research: An overview
from 1954. In J. M. Suls & R. L. Miller (Eds.), Social comparison
processes: Theoretical and empirical perspectives (pp. 1–20). Washington, DC: Hemisphere.
Suls, J. (1993). Preface. In J. Suls (Ed.), Psychological perspectives on the
self (Vol. 4, pp. ix–xi). Hillsdale, NJ: Erlbaum.
Van de Vijver, F. J. R., & Leung, K. (2000). Methodological issues in
psychological research on culture. Journal of Cross-Cultural Psychology, 31, 33–51.
Wells, L. E., & Marwell, G. (1976). Self-esteem: Its conceptualization and
measurement. Beverly Hills, CA: Sage.
Williams, J. E., & Montgomery, D. (1995). Using frame of reference
theory to understand the self-concept of academically able students.
Journal for the Education of the Gifted, 18, 400–409.
Wylie, R. C. (1979). The self-concept (Vol. 2). Lincoln: University of
Nebraska Press.
Yeung, A. S., & Lau, I.-C.-K. (1998, July). The internal/external frame of
reference in the self-concept development of higher education students.
Paper presented at the Conference of the Higher Education Research and
Development Society of Australasia, Auckland, New Zealand. (ERIC
Document Reproduction Service No. ED423772)
Yeung, A. S., & Lee, F. L. (1999). Self-concept of high school students in
China: Confirmatory factor analysis of longitudinal data. Educational &
Psychological Measurement, 59, 431–450.

Received March 7, 2003


Revision received July 24, 2003
Accepted July 28, 2003