
Oct 21, 2016


Explaining Paradoxical Relations Between Academic Self-Concepts and Achievements


0022-0663/04/$12.00 DOI: 10.1037/0022-0663.96.1.56

This document is copyrighted by the American Psychological Association or one of its allied publishers.

This article is intended solely for the personal use of the individual user and is not to be disseminated broadly.

Achievements: Cross-Cultural Generalizability of the Internal/External

Frame of Reference Predictions Across 26 Countries

Herbert W. Marsh

Kit-Tai Hau

The internal/external frame of reference (I/E) model explains a seemingly paradoxical pattern of relations

between math and verbal self-concepts and corresponding measures of achievement, extends social

comparison theory, and has important educational implications. In a cross-cultural study of nationally

representative samples of 15-year-olds from 26 countries (total N = 55,577), I/E predictions were

supported in that (a) math and verbal achievements were highly correlated, but math and verbal

self-concepts were nearly uncorrelated; (b) math achievement had positive effects on math self-concept,

but negative effects on verbal self-concept; and (c) verbal achievement had positive effects on verbal

self-concept, but negative effects on math self-concept. Supporting the cross-cultural generalizability of

predictions, multigroup structural equation models demonstrated good support for the generalizability of

results across 26 countries participating in the Programme for International Student Assessment project

sponsored by the Organisation for Economic Co-operation and Development.

evident in diverse settings, including social psychology, personality, education, child development, mental and physical health,

social services, organizations, industry, and sport. For example,

educational policy statements throughout the world list self-concept enhancement as a central goal of education and an important vehicle for addressing social inequities experienced by disadvantaged groups. In self-concept research, support for the

construct validity of major instruments and the main theoretical

models has been based largely on responses by students from

Western countries, particularly English-speaking students in the

United States, Australia, and Canada.

The purpose of the present investigation is to evaluate the

cross-cultural generalizability of relations between math and verbal self-concepts with math and verbal achievements. The internal/

external frame of reference (I/E) model was developed to explain

what initially seemed to be paradoxical patterns of relations between math and verbal self-concepts and corresponding areas of

achievement. In the present investigation, we evaluate the cross-cultural generalizability of predictions based on the I/E model,

using a large cross-national sample consisting of large, nationally

representative samples of 15-year-olds from 26 countries.

Relation to Achievement

Historically, self-concept measurement, theory, research, and

application emphasized a largely atheoretical, global component of

self-concept, and reviewers noted the lack of theoretical models for

defining and interpreting the construct (e.g., Shavelson, Hubner, &

Stanton, 1976; Wells & Marwell, 1976; Wylie, 1979). In an

attempt to remedy this situation, Shavelson et al. (1976) reviewed

existing research and self-concept instruments and provided a

theoretical definition and model of self-concept that has had a

profound influence on subsequent research (see review by Marsh

& Hattie, 1996). In the Shavelson et al. model, self-concept is

posited to be a multidimensional, hierarchical construct. Global

self-concept, at the apex of the hierarchy, is divided into nonacademic (e.g., social, physical, emotional) and academic components. Of particular relevance to the present investigation, academic self-concept is divided into self-concepts in particular

content areas such as math and verbal self-concepts. Support for

the construct validity of self-concept interpretations and its multidimensionality requires that (a) academic outcomes are more

highly correlated with academic self-concept than with global and

nonacademic components of self-concept, and (b) achievement in

particular domains is more highly correlated with academic self-concepts in the matching domain (e.g., math achievement and

math self-concept) than self-concepts in nonmatching domains

(e.g., math achievement and general or verbal self-concept).

Stimulated in part by the Shavelson et al. (1976) model and

subsequent research (see Marsh & Hattie, 1996), there is an

ongoing debate about the relative usefulness of unidimensional

perspectives that emphasize a single, relatively unidimensional,

global domain of self-concept (sometimes referred to as self-esteem) and multidimensional perspectives based on multiple,

relatively distinct components of self-concept. For example, Suls

Herbert W. Marsh, SELF Research Centre, University of Western Sydney, Sydney, New South Wales, Australia; Kit-Tai Hau, The Chinese

University of Hong Kong, Hong Kong, China.

This research was funded in part by grants from the Australian Research

Council. We thank Codula Artelt, Jurger Bamert, Oliver Ludtke, Ken

Rowe, Wolfram Schulz, and Ulrich Trautwein for comments and suggestions on earlier versions of this article.

Correspondence concerning this article should be addressed to Herbert

W. Marsh, SELF Research Centre, University of Western Sydney, Bankstown Campus, Locked Bag 1797 Penrith South, Sydney, New South Wales

1797, Australia. E-mail: h.marsh@uws.edu.au


chapters written by Marsh (1993) and Brown (1993) that appeared

in his monograph, concluding, "both Brown and Marsh, who cite

strong support for their viewpoints, cannot be right; or, at minimum, a new integrative theory is needed to reconcile the two

approaches" (p. x). Marsh and Craven (1997, p. 191) subsequently

argued that


comparison theory was not able to explain the seemingly paradoxical pattern of relations he found between math and verbal self-concept measures and corresponding measures of math and verbal

achievement, the focus of the present investigation. To explain

these results, he developed the I/E model that is an extension of

social comparison theory.


If the role of self-concept research is to better understand the complexity of self in different contexts, to predict a wide variety of

behaviors, to provide outcome measures for diverse interventions, and

to relate self-concept to other constructs, then the specific domains of

self-concept are more useful than a general domain.

Support for this claim is particularly strong in educational psychology, where academic self-concept is substantially related to a

wide variety of academic outcomes, whereas global and nonacademic measures of self-concept are much less highly correlated

with these outcomes (Marsh, 1993).

As emphasized in classical social-comparison-theory approaches to self-evaluation (e.g., Diener & Fujita, 1997; Festinger,

1954; Morse & Gergen, 1970; Suls, 1977), even after individuals

obtain information from various sources about their levels of

accomplishment, their self-perceptions must be compared with

some standard or frame of reference in order to form a self-appraisal. Thus, for example, to the extent that individuals have

different frames of reference, the same objective indicators of

academic achievement will lead to different academic self-concepts. In the typical application of social comparison theory

(e.g., Diener & Fujita, 1997; Marsh, 1993; Morse & Gergen, 1970;

Suls, 1977), it is assumed that individuals evaluate their own

performance in comparison with the performances of others

through social comparison processes. Particularly in educational

settings, there is good support for how social comparison processes

associated with well-established group membership (the school

one attends) affect self-concept. Diener and Fujita (1997) referred

to this as situationally imposed or forced comparisons as opposed

to a more flexible situation in which individuals have considerable

freedom to consciously select or construct a comparison target so

as to maximize various goals. Diener and Fujita emphasized that

schools closely approximate a total environment (where the

frame of reference affecting judgment is limited to the immediate

context) implicit in the forced comparison. The school is a total

environment in that there are so many inherent constraints and a

natural emphasis on social comparison of achievement levels of

classmates in a school setting. Similarly, educational psychologists

(e.g., Covington, 1992; Marsh, 1993; Marsh, Kong, & Hau, 2000;

Marshall & Weinstein, 1984; also see Goethals & Darley, 1987)

emphasized the extreme salience of achievement as a reference

point within a school setting, particularly when the outcome measure is academic self-concept. Diener and Fujita noted that

Marsh's research studies (e.g., Marsh, 1993; Marsh & Craven,

1997; Marsh & Parker, 1984) are one of the best validated examples in support of social comparison theory, demonstrating that

imposed comparisons do have a substantial, lasting impact. As

emphasized by Diener and Fujita, Marsh's research clearly supports this traditional application of social comparison theory. However, Marsh (1986) found that this classic approach to social

higher order academic self-concept such that there is a substantial

correlation between verbal and math self-concepts. This prediction

also follows from the typically large correlation between math and

verbal academic achievements (typically .50 to .80, depending on

how achievement is measured). Early research, however, demonstrated that math and verbal self-concepts were much more differentiated than the corresponding achievement scores (Marsh, 1986).

In contrast to the expectation of high correlations between math

and verbal self-concepts, math and verbal self-concepts were

nearly uncorrelated. Furthermore, this near-zero correlation was

consistent across different measures of the math and verbal self-concepts and a diversity of settings (Marsh, 1986; Marsh & Craven, 1997; Marsh & Yeung, 1998). Hence, it seems that individuals with good mathematics skills also tend to have good verbal

skills and vice versa, but people think of themselves as either

math persons or verbal persons but not both.

Theoretical rationale for the I/E model. The I/E model (for

further discussion, see Marsh, 1986, 1990, 1993; Marsh, Byrne, &

Shavelson, 1988; Marsh & Yeung, 2001) was initially developed

to explain why math and verbal self-concepts are almost uncorrelated even though corresponding areas of academic achievement

are substantially correlated. According to the I/E model, academic

self-concept in a particular domain (e.g., math or verbal self-concepts) is formed in relation to two comparison processes or

frames of reference.

1. The external comparison is a social comparison process in which students compare their self-perceived performances in a particular school subject with the perceived performances of other students in the same school subject and with other external standards of actual achievement levels (e.g., normative comparisons, school grades, class rankings). If they perceive themselves to be able in relation to other students and objective indicators of achievement, then they should have a high academic self-concept in that school subject.

2. The internal comparison is a process in which students compare their own performance in one particular school subject with their own performance in other school subjects. If, for example, one's best school subject is mathematics, then this student should have a positive math self-concept that is higher than this student's verbal self-concept. Similarly, according to this internal comparison process, a student may have a favorable math self-concept if math is this student's best subject even if this student is not particularly good at math relative to other students and external standards. It is this internal comparison process that represents an extension to traditional social comparison theory.


consider students who accurately perceive themselves to be below

average in both verbal and math skills (an external comparison),

but who are better at mathematics than verbal and other school

subjects (an internal comparison). Their math skills are below

average relative to other students and objective indicators of math

achievement (the external comparison), and this should lead to a

below-average math self-concept. However, their math skills are

above average relative to their other school subjects (an internal

comparison), and this should lead to an above-average math self-concept. Depending on how these two processes are weighted in

the formation of self-concept, these students may have an average

or even above-average math self-concept even though they have

below-average math skills. The I/E model also predicts that these

students would have better math self-concepts than other students

who did equally poorly at mathematics but who did better in all

other school subjects (i.e., math was their worst subject). Similarly,

a student who is very bright in all school subjects may have an

average or even below-average math self-concept if the student

perceived mathematics to be his or her worst subject.

The external comparison process should result in substantial

positive correlations between math and verbal self-concepts because math and verbal achievements are substantially positively

correlated. However, the ipsative, internal comparison process

should result in a negative correlation between math and verbal

self-concepts because the average correlation among ipsative

scores is necessarily negative (i.e., an increase in any one score

must result in the counterbalancing decrease in average of the

remaining scores if they are ipsative). Both of these processes,

however, affect self-concept responses. Hence, the joint operation

of these processes, depending on the relative weight given to

internal and external comparisons, is consistent with the near-zero

correlation between math and verbal self-concepts that led to the

development of the I/E model. It is, however, important to emphasize that support for the I/E model does not require the correlation between math and verbal self-concepts to be zero, but only

that it be substantially less than the typically substantial positive

correlation between math and verbal achievement.
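The claim that a purely internal (ipsative) comparison forces a negative correlation can be checked with a small numerical sketch. The scores below are hypothetical and the helper names `pearson` and `ipsatize` are introduced here for illustration only; they are not part of the original study. With only two subjects, expressing each score as a deviation from the student's own mean makes the two deviations mirror images of each other, so their correlation is exactly -1 even though the raw scores are strongly positively correlated.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ipsatize(rows):
    """Express each score as a deviation from that student's own mean."""
    return [[s - sum(row) / len(row) for s in row] for row in rows]


# Hypothetical (math, verbal) achievement scores for five students.
scores = [(40, 35), (55, 60), (62, 58), (70, 77), (85, 80)]

# External frame: raw scores compared across students.
raw_r = pearson([m for m, _ in scores], [v for _, v in scores])

# Internal frame: each score relative to the student's own average.
ips = ipsatize(scores)
ips_r = pearson([row[0] for row in ips], [row[1] for row in ips])

print(f"raw correlation:      {raw_r:.3f}")
print(f"ipsative correlation: {ips_r:.3f}")
```

Any observed correlation between the two self-concepts would then fall somewhere between these two extremes, depending on how heavily each comparison process is weighted.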

Domain specificity is a critical feature of the I/E model. However, stronger tests of the I/E model are possible when math and

verbal achievements are related to math and verbal self-concepts

(see Figure 1A). The external comparison process predicts that

good math skills lead to higher math self-concepts and that good

verbal skills lead to higher verbal self-concepts. According to the

internal comparison process, however, good math skills should

lead to lower verbal self-concepts (once the positive effects of

good verbal skills are controlled), that is, The better I am at

mathematics, the poorer I am at verbal subjects (relative to my

good math skills). Similarly, better verbal skills should lead to

lower math self-concept (once the positive effects of good math

skills are controlled). In models used to test this prediction (Figure

1A), the horizontal paths leading from math achievement to math

self-concept and from verbal achievement to verbal self-concept

(the gray horizontal lines in Figure 1A) are predicted to be substantially positive (as marked in Figure 1A). However, the

cross paths leading from math achievement to verbal self-concept

and from verbal achievement to math self-concept (the dark lines

in Figure 1A) are predicted to be negative. Although consistent

with the I/E model, it is these negative cross paths, the negative

effects of verbal achievement on math self-concept and of math

[Figure 1. Internal/external frame of reference model. Panel A: the horizontal paths are predicted to be substantial and positive, whereas the cross paths are predicted to be smaller and negative. Panel B: the actual results based on the total-group (TG) analysis (Model TG1 in Table 1) and the multiple-group (MG) analysis (Model MG15 in Table 1) are consistent with predictions.]

paradoxical and that provide the critical test of the I/E model.
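The pattern of predicted paths in Figure 1A can be summarized as a pair of structural equations. The notation is an assumption introduced here for illustration and does not appear in the original text: MA and VA denote math and verbal achievement, MSC and VSC denote math and verbal self-concept, the β terms are path coefficients, and the ζ terms are residuals.

```latex
\begin{align}
\mathrm{MSC} &= \beta_{MM}\,\mathrm{MA} + \beta_{MV}\,\mathrm{VA} + \zeta_{M},\\
\mathrm{VSC} &= \beta_{VM}\,\mathrm{MA} + \beta_{VV}\,\mathrm{VA} + \zeta_{V},\\
\text{I/E predictions:}\quad & \beta_{MM} > 0,\; \beta_{VV} > 0 \ \text{(horizontal paths)};\qquad
\beta_{MV} < 0,\; \beta_{VM} < 0 \ \text{(cross paths)}.
\end{align}
```

In this notation, the horizontal paths reflect the external comparison, and the negative cross paths reflect the internal comparison once the matching achievement is controlled.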

It is also important to clarify what is actually being tested in the

I/E model. Typically, math and verbal achievements are substantially positively correlated with each other (the typical big-G that

underlies almost all measures of achievement) and typically each

is positively correlated with both math and verbal self-concepts.

Math achievement is more correlated with math self-concept than

verbal self-concept (and verbal achievement is more correlated

with verbal self-concept than math self-concept). The critical prediction for the I/E model, however, is in terms of the path coefficients. In other words, it is the effect of math achievement on

verbal self-concept after controlling for the effect of verbal

achievement. Hence, once we control for the positive effect of

verbal achievement (and the big-G component shared by math and

verbal achievement) on verbal self-concept, then the unique component of math achievement is negatively associated with verbal

self-concept. Thus, the size of the negative effect of math achievement on verbal self-concept should be a function of the discrepancy between the math and verbal achievement scores. Hence, the

operative construct is a residual score that is conceptually like the

difference score (without some of the statistical problems associated with raw difference scores). Thus, a B average in mathematics

may induce an average or even below-average mathematics self-concept for the student who earns As in most other school subjects,

but may lead to an above-average math self-concept for the student

who earns Cs in other subjects. In the language of path analysis, it

is the direct effect of math achievement on verbal self-concept that


achievement and verbal self-concept.
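The distinction drawn above between zero-order correlations and path coefficients can be illustrated with the standard formula for standardized regression weights with two correlated predictors. This is a simplified observed-variable sketch, not the latent-variable models used in the study, and the three correlations are hypothetical values chosen only to show the sign change. The function name `standardized_paths` is likewise an assumption.

```python
def standardized_paths(r_y1, r_y2, r_12):
    """Standardized weights for regressing y on two correlated predictors.

    r_y1, r_y2: correlations of the outcome with predictors 1 and 2.
    r_12: correlation between the two predictors.
    """
    denom = 1.0 - r_12 ** 2
    beta_1 = (r_y1 - r_y2 * r_12) / denom
    beta_2 = (r_y2 - r_y1 * r_12) / denom
    return beta_1, beta_2


# Hypothetical correlations (not taken from the study).
r_math_verbal_ach = 0.80   # math achievement with verbal achievement
r_math_ach_msc = 0.50      # math achievement with math self-concept
r_verbal_ach_msc = 0.30    # verbal achievement with math self-concept

beta_math, beta_verbal = standardized_paths(
    r_math_ach_msc, r_verbal_ach_msc, r_math_verbal_ach
)

print(f"path from math achievement to math self-concept:   {beta_math:.3f}")
print(f"path from verbal achievement to math self-concept: {beta_verbal:.3f}")
```

Under these assumed values, verbal achievement correlates positively with math self-concept (.30), yet its path coefficient is negative once math achievement is controlled, which is the form of result the I/E model predicts.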

Empirical support for the I/E model. It is not surprising, of

course, that good verbal skills are associated with good verbal

self-concepts and that good math skills are associated with good

math self-concepts (the positive paths in Figure 1A). More surprising, even paradoxical, are the negative paths from verbal

achievement to math self-concept and from math achievement to

verbal self-concept (i.e., being more mathematically able detracts

from verbal self-concept, whereas being more verbally able detracts from math self-concept). In a review of 13 studies that

considered students of different ages and different academic

achievement indicators, Marsh (1986) reported that (a) correlations

between indicators of verbal and math achievement were substantial (.42 to .94), (b) correlations between measures of verbal and

math self-concepts were much smaller (.10 to .19), (c) path

coefficients from verbal achievement to verbal self-concept and

from math achievement to math self-concept were all significantly

positive, and (d) path coefficients from math achievement to verbal

self-concept and from verbal achievement to math self-concept

were significantly negative.

This pattern of results consistent with the I/E model was subsequently replicated for responses to each of three different self-concept instruments by Canadian high school students (Marsh,

Byrne, et al., 1988), for the nationally representative sample of

U.S. high school students in the High School and Beyond Study

(Marsh, 1989), for the nationally representative sample of U.S.

high school students in the National Longitudinal Study (Marsh,

1994b), and for academically gifted students in the United States

(Plucker & Stocking, 2001; Williams & Montgomery, 1995) and

China (Lee, Yeung, Low, & Jin, 2000). Although Bong (1998) was

unsuccessful in her attempt to develop items that differentiated

between the internal and external comparison processes, Marsh

and Yeung's (2001) reanalysis of her results demonstrated that

there was clear support for the I/E predictions based on responses

to her alternative set of items. Across these different studies,

support for the I/E predictions has been shown to generalize well

across different sets of items used to infer self-concept and a

variety of different indicators of achievement (e.g., test scores,

school grades, teacher ratings).

Although much of this support is based on responses by students

from the United States, Canada, and Australia where the native

language is English, there is also some support for the cross-cultural or cross-nationality generalizability of these results where

verbal self-concept is based on a native language other than

English (e.g., Norway: Skaalvik & Rankin, 1995; Skaalvik &

Valas, 2001; China: Dai, 2001, 2002; Kong, 2000; Lee et al., 2000;

Marsh, Kong, & Hau, 2001; Yeung & Lau, 1998; Yeung & Lee,

1999; Germany: Marsh & Koller, 2003; Moeller & Koller, 2001a,

2001b; United Arab Emirates: Abu-Hilal, 2002; Abu-Hilal &

Bahri, 2000). Thus, for example, Abu-Hilal (2002, p. 2) concluded

that his research with Arabic students confirmed previous findings for the I/E model with western samples, thus adding more to

the universality of the model.

Particularly interesting research in Germany (e.g., Moeller &

Koller, 2001a) provided support for the I/E model in a true experimental

study with randomly assigned students, demonstrating how experimentally manipulated feedback on achievement in one subject

area had an inverse effect on self-concept in a different area. The

authors concluded that as shown experimentally for the first time,


task-related cognitions in other domains (p. 833). Importantly,

they demonstrated simultaneous support for both the internal comparison process (based on experimentally manipulated feedback

about the relative performance in two different tasks) and the

external comparison process (based on experimentally manipulated feedback about performance relative to other students). This

research is important, using a true experimental design with random assignment to groups to support the causality in the causal

path models that are the basis of the I/E model.

Whereas a growing number of studies have found support for

the I/E model in different countries and cultural groups, each was

based on results from a single country and a methodology (e.g.,

achievement indicators, instrumentation, translation, selection and

representativeness of the sample, and statistical analysis) that is

largely idiosyncratic to the particular study. In their critique of

research in this area, Marsh and Yeung (2001) noted the need to

pursue cross-national and cross-cultural comparisons systematically in order to evaluate more fully the generalizability of support

based on the I/E model. The present investigation in which the

same materials (with appropriate translation) were used in representative samples from different countries is clearly stronger in

terms of cross-cultural comparisons than any previous research.

Cross-cultural comparisons provide researchers with a valuable

heuristic basis to test the external validity and generalizability of

their measures, theories, and models. Matsumoto (2001, p. 9)

argued that cultural differences challenge mainstream theoretical

notions about the nature of people and force us to rethink basic

theories of personality, perception, cognition, emotion, development, social psychology, and the like in fundamental and profound

ways. In their influential overview of cross-cultural research,

Segall, Lonner, and Berry (1998, p. 1102) stated that cross-cultural

research's three complementary goals were

to transport and test our current psychological knowledge and perspectives by using them in other cultures; to explore and discover new

aspects of the phenomenon being studied in local cultural terms; and

to integrate what has been learned from these first two approaches in

order to generate more nearly universal psychology, one that has

pan-human validity.

sufficient advantage of cross-cultural comparisons that allow researchers to test the external validity of their interpretations and to

gain insights about the applicability of their theories and models.

In cross-cultural research, there is an ongoing and growing

interest in the relation between culture and the self. This is an

inevitable consequence of the symbolic construction of self-image

by using the meaning system that is culture (Kashima, 1995).

Indeed, Singelis (2000) argued that this emphasis on self has been

the primary basis for other disciplines increasingly embracing

cross-cultural perspectives. However, there exists a schism between the overarching cultural relativist and universalist perspectives of cross-cultural research (Kagitcibasi & Poortinga, 2000).

The broad cultural relativist (idiographic, emic, indigenous, qualitative) perspective emphasizes the uniqueness of the individual

case that defies comparison. In contrast, the broad universalist

(nomothetic, etic, positivist, quantitative) perspective emphasizes


predictions, replicability of methods, and empirical testing. Because neither of these apparently mutually exclusive metatheoretical positions is defensible in the extreme, there is a need to bridge

this dichotomy. Thus, for example, Kashima (1995) noted the need

for future research to identify universal as well as culture-specific

antecedents and consequences of self-conceptions. In an emerging

consensus on the cultural-mind relation, Kashima (2000) emphasized a system of reciprocal effects whereby agents generate

culture but are also shaped by culture. Similarly, Kitayama and

Markus (1999) argued for the mutual construction of culture and

self.

In their taxonomy of cross-cultural research, Van de Vijver and

Leung (2000) discussed generalizability studies with a strong theoretical framework for generating testable hypotheses and an emphasis on the universality of structures and theoretical propositions. Within this context, they noted the need to use new multiple-group structural equation modeling approaches that allow

researchers to make fine-grained comparisons of factor structures

and patterns of relations among multiple constructs in different

cultural groups. In this framework, there is a focus on similarities

as well as differences and an emphasis on the explanation of

observed differences. Because of the traditional focus on null

hypothesis testing, there is an unfortunate tendency to provide

elaborate interpretations for (sometimes very small) differences

and largely to ignore similarities that may argue for generalization

across cultures. From this perspective, cross-cultural research can

be seen as an extremely important variation on the traditional

multimethod approach to evaluation of construct validity. Van de

Vijver and Leung emphasized that the endemic problems of replicability in cross-culture research will improve with greater emphasis on theory development and testing coupled with the more

appropriate use of new statistical tools. When there are parallel

data from more than one group (the multiple cultural groups in

cross-cultural research), it is possible to test the invariance of the

solution by requiring any one, any set, or all parameter estimates

to be the same in two or more groups. Byrne (2003) argued that

this analysis is particularly appropriate for making cross-cultural

comparisons.

Method

Data Source and Sample

The present investigation was based on the Programme for International Student Assessment (PISA) database compiled by the Organisation for Economic Co-operation and Development (OECD), which consists of nationally representative responses by 15-year-olds collected in 32 countries in the year

2000 (see Adams & Wu, 2002; OECD, 2001a, 2001b, for a description of

the database and variables). The PISA database was collected in response

to the need for internationally comparable evidence of student performance

and related competencies within a common framework that is internationally agreed on. Selection of the measures was made on the basis of advice

from substantive and statistical expert panels and results from extensive

pilot studies. Substantial efforts and resources were devoted to achieving

cultural and linguistic breadth in the assessment materials, stringent

quality-assurance mechanisms were applied in the translation of materials

into different languages, and data were collected under independently

supervised test conditions. Paper-and-pencil assessments consisted of a

combination of multiple-choice items and written responses. Whereas all

students completed some reading assessment items (which were the focus

of the 2000 data collection), only random samples of students completed

option of collecting materials on a Cross Curriculum Competencies (CCC)

questionnaire that included the academic self-concept items that are the

focus of the present investigation. A total of 26 of 32 participating countries chose to do so. Although the response rate in the Netherlands was

lower than recommended to ensure a nationally representative sample and

comparability with other countries (OECD, 2001b), the Netherlands was

included in the present investigation. The data for the other 25 countries

provided nationally representative samples of 15-year-old students in each

of these countries.

The present investigation was based on students who had both mathematics and reading achievement test scores and who completed the math

and verbal self-concept items (for a more detailed description of the

achievement tests and the self-concept items, see Adams & Wu, 2002).

Measures included three measures of reading achievement, a single measure of mathematics, three math self-concept items, and three verbal

self-concept items. The self-concept items were from the highly regarded

Self Description Questionnaire II (Byrne, 1996; also see Marsh, 1990,

1992, 1993). Although 97,384 students completed math and reading assessments, only 59,332 also completed any of the self-concept items and

55,577 had complete data for all 10 variables considered here (i.e., the

sample after listwise deletion for missing data, the basis of the present

investigation). As recommended in the database documentation (OECD,

2001a, 2001b), all analyses of the PISA data should be weighted to obtain

unbiased estimates of population parameters. For purposes of the present

investigation, the effective sample size for each country was set equal to

the number of cases for that country prior to weighting so that the weighted

sample size was the same as the unweighted sample size (i.e., the average

weight across all cases was 1.0; but also see Kaplan & Ferguson, 1999, and

Stapleton, 2002, for further discussion on relative and effective weighting).
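
The rescaling described above can be illustrated with a minimal sketch. It assumes only that each case carries a country label and a raw sampling weight; the function name and the toy data are hypothetical and are not taken from the PISA database.

```python
from collections import defaultdict


def rescale_weights(countries, raw_weights):
    """Rescale raw weights so they average 1.0 within each country.

    After rescaling, the weights in a country sum to that country's
    unweighted number of cases.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for country, weight in zip(countries, raw_weights):
        totals[country] += weight
        counts[country] += 1
    return [
        weight * counts[country] / totals[country]
        for country, weight in zip(countries, raw_weights)
    ]


# Hypothetical toy data: two countries with very different raw weight scales.
countries = ["A", "A", "A", "B", "B"]
raw_weights = [10.0, 30.0, 20.0, 400.0, 600.0]
scaled = rescale_weights(countries, raw_weights)
```

Relative differences among cases within a country are preserved, but no country gains or loses influence merely because its raw weights are on a larger scale.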

Statistical Analysis

Structural equation models (SEMs) were conducted with LISREL 8

(Joreskog & Sorbom, 1993) using maximum likelihood estimation (for

further discussion of SEM, see Bollen, 1989; Byrne, 1998; Joreskog &

Sorbom, 1993). Following Marsh, Balla, and Hau (1996) and Marsh, Balla,

and McDonald (1988), we emphasize the Tucker-Lewis index (TLI), the

relative noncentrality index (RNI), and root-mean-square error of approximation (RMSEA) to evaluate goodness of fit, but also present the chi-square test statistic and an evaluation of parameter estimates. Whereas tests

of statistical significance and indices of fit aid in the evaluation of the fit

of a model, there is ultimately a degree of subjectivity and professional

judgment in the selection of a best model (Marsh, Balla, et al., 1988).
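
As a rough illustration of how these indexes relate to the chi-square statistic, the sketch below uses the commonly cited formulas for the RNI, TLI, and RMSEA. These are assumptions about the standard definitions rather than a reproduction of the LISREL output; in particular, the RMSEA shown is the single-group point estimate with N - 1 in the denominator, and the null-model chi-square in the example is hypothetical because it is not reported here.

```python
import math


def rni(chi2, df, chi2_null, df_null):
    """Relative noncentrality index from target- and null-model chi-squares."""
    return 1.0 - (chi2 - df) / (chi2_null - df_null)


def tli(chi2, df, chi2_null, df_null):
    """Tucker-Lewis index from target- and null-model chi-squares."""
    null_ratio = chi2_null / df_null
    return (null_ratio - chi2 / df) / (null_ratio - 1.0)


def rmsea(chi2, df, n):
    """Single-group RMSEA point estimate (truncated at zero)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Hypothetical target model and null model (10 indicators give df_null = 45).
toy_rni = rni(60.0, 30, 1000.0, 45)
toy_tli = tli(60.0, 30, 1000.0, 45)

# Chi-square, df, and N reported for the total-group model (TG1) in Table 2.
tg1_rmsea = rmsea(5026.06, 30, 55582)
```

Under these assumptions the TG1 value comes out at about .055, in line with the RMSEA of .05 reported in Table 2.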

When there are parallel data from more than one group (the 26 countries in this study), it is possible to test the invariance of the solution by

requiring any one, any set, or all parameter estimates to be the same in two

or more groups. Byrne (2003) argued that this analysis is particularly

appropriate for making cross-cultural comparisons. In applying this approach, there is a well-developed methodology in which the goodness of fit

of alternative models is compared, including the least restrictive model

that does not require any of the parameter estimates to be the same in

different groups and the most restrictive model that requires all parameter

estimates to be the same in the different groups (e.g., Byrne, 1998; Marsh,

1994a; but also see Cheung & Rensvold, 1999). In preliminary analyses,

we tested the a priori baseline model separately for each of the 26 groups

and found that the goodness of fit was excellent for each country considered separately (see subsequent discussion of Table 3). Typically, the

minimal condition for factorial invariance is the equivalence of all factor

loadings in the multiple groups, and this is one of the first tests of

invariance in the sequence. There is no clear consensus in recommendations about the ordering of subsequent invariance constraints (e.g., Bentler,

1988; Bollen, 1989; Byrne, 1998; Joreskog & Sorbom, 1993), although

Bentler (1988) and Byrne (1998) noted that the equality of parameters

associated with measurement errors is typically the least important hypothesis to test and is unlikely to be met in most applications. Whereas under

highly restrictive conditions it is possible to test for the statistical significance of differences between two nested models (e.g., models with and

without a particular set of invariance constraints), the concerns about

reliance on the chi-square test of statistical significance as a measure of fit

for a single model are even greater than for the comparison of two different

models. In applied research with real data, the null hypothesis of complete

invariance (no differences between groups in parameter estimates) is

always false and will lead to rejection of the null hypothesis when based on

a sufficiently large sample size. Hence, we again emphasize differences in

goodness-of-fit indexes, but also present the chi-square test statistic. More

important, however, in the present investigation, the four path coefficients

used to test the I/E model (Figure 1) are of critical importance and are the

focus of specific models designed to evaluate the invariance of these

parameters across the 26 countries. Hence, the comparison of results across

different countries when these path coefficients are not constrained to be

invariant is potentially more useful than goodness-of-fit indexes.
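
The point about nested-model comparisons in very large samples can be made concrete with a short sketch based on the chi-square and df values reported for Models MG1 and MG2 in Table 2. The helper function and the ratio it returns are illustrative assumptions, not part of the reported analyses.

```python
def compare_nested(restricted, baseline):
    """Return chi-square and df differences between two nested models.

    Each model is a (chi_square, df) tuple; `restricted` adds constraints
    to `baseline`, so it has the larger df.
    """
    d_chi2 = restricted[0] - baseline[0]
    d_df = restricted[1] - baseline[1]
    return d_chi2, d_df, d_chi2 / d_df


mg1 = (5784.36, 780)   # no invariance constraints (Table 2)
mg2 = (7650.34, 930)   # factor loadings invariant (Table 2)
d_chi2, d_df, ratio = compare_nested(mg2, mg1)
```

The chi-square difference of about 1,866 on 150 df is far above its expected value under the null hypothesis (the df itself), even though the TLI is .97 for both models. This is why the differences in fit indexes are emphasized over the significance test.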

In preliminary analyses, traditional coefficient alpha estimates of reliability of scale scores were computed separately for each country. These

were consistently high across the 26 groups for reading achievement

(M = .86, SD = .03), math self-concept (M = .88, SD = .02), and verbal

self-concept (M = .74, SD = .07). Inspection of the item-total correlations

for each indicator (not shown) indicated that the one negatively worded

(reverse-scored) item in the verbal self-concept scale contributed less

positively to reliability than the other two positively worded verbal self-concept items (all math self-concept items were positively worded) and

that this pattern of results was consistent across the 26 countries. In the

PISA database, mathematics achievement was represented by a single

score. Therefore, for purposes of structural equation models in the present

investigation, we set its reliability at .90, a conservative value in relation to

the corresponding reliability estimate for reading that provides a more

realistic estimate than assuming that mathematics achievement is measured

without error (see discussion by Joreskog & Sorbom, 1993).
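
A minimal sketch of the two reliability steps follows. The alpha function uses the standard formula with sample variances, and the item responses are hypothetical. The second function shows what fixing the reliability of a single standardized indicator implies for its loading and uniqueness, under the usual assumption that reliability equals the squared standardized loading.

```python
import math
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of item-score lists (one list per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1.0 - item_var_sum / variance(totals))


def single_indicator(reliability):
    """Standardized loading and uniqueness implied by a fixed reliability."""
    return math.sqrt(reliability), 1.0 - reliability


# Hypothetical responses of five students to three items.
items = [
    [1, 2, 3, 4, 5],
    [2, 2, 3, 5, 5],
    [1, 3, 3, 4, 4],
]
alpha = cronbach_alpha(items)
loading, uniqueness = single_indicator(0.90)
```

With the reliability set at .90, the implied loading is about .95 and the uniqueness is .10, which agrees with the values shown for math achievement in the total-group solution of Table 1.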

We began with an evaluation of the results based on the total

group of 55,582 participants (Tables 1 and 2). The solution for this

total-group model (TG1) was well defined and the goodness of fit

(TLI = .97; see Table 1) was very good. Most important, however,

were tests of the four path coefficients that were central to the

evaluation of support for the I/E model. As predicted, the two

horizontal paths relating math achievement to math self-concept

(.44) and relating reading achievement to verbal self-concept (.47)

were substantial and positive, whereas the two cross paths leading

from reading achievement to math self-concept (-.20) and mathematics achievement to verbal self-concept (-.26) were negative.

Also of relevance is the observation that the (zero-order) correlation between math and verbal achievement factors (r = .78) was

very large, whereas the corresponding correlation between math

and verbal self-concept factors (r = .10) was substantially lower.

Hence, results based on the total sample clearly support the main

predictions for the I/E model (see Figure 1).
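
The seemingly paradoxical pattern can be traced through the path model with a short sketch. It assumes a completely standardized solution in which all four factors have unit variance, applies the usual path-tracing rules, and takes the residual covariance between the two self-concept factors (.11) from the total-group solution in Table 1. The function and its argument names are illustrative.

```python
def implied_correlations(a, b, c, d, r_ach, psi):
    """Model-implied correlations for the I/E path model (standardized).

    a: MAch -> MSC, b: VAch -> MSC, c: MAch -> VSC, d: VAch -> VSC,
    r_ach: correlation between MAch and VAch,
    psi: residual covariance between MSC and VSC.
    """
    return {
        "MAch-MSC": a + b * r_ach,
        "VAch-MSC": b + a * r_ach,
        "MAch-VSC": c + d * r_ach,
        "VAch-VSC": d + c * r_ach,
        "MSC-VSC": a * c + b * d + r_ach * (a * d + b * c) + psi,
    }


# Total-group estimates (Model TG1) as reported in the text and Table 1.
tg1 = implied_correlations(a=0.44, b=-0.20, c=-0.26, d=0.47, r_ach=0.78, psi=0.11)
```

Under these assumptions the implied correlation between math and verbal self-concept is about .10, matching the value reported above. The implied zero-order correlations between each achievement factor and the nonmatching self-concept remain slightly positive (about .11 and .14) even though the corresponding cross paths are negative, because the two achievement factors are so highly correlated.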

Countries

A critical question, for which these data are uniquely appropriate, is how well the results generalize across the 26 different

countries? In order to pursue this question, we conducted multigroup confirmatory factor analyses (CFAs) and SEMs in which we

constrained different parameters to be invariant across the 26

groups (Table 2). We began with a set of CFA models to evaluate

the invariance of the measurement component of the model and

Table 1

Parameter Estimates for Total-Group Solution (Model TG1) and Multiple-Group Solution (MG15)

                          Total-group solution                Multiple-group solution
Factor            MAch    VAch     MSC     VSC    Uniq       MAch    VAch     MSC     VSC
Factor loadings
  MAch             .95     .00     .00     .00     .10        .94     .00     .00     .00
  VAch1            .00     .85     .00     .00     .28        .00     .83     .00     .00
  VAch2            .00     .89     .00     .00     .22        .00     .87     .00     .00
  VAch3            .00     .78     .00     .00     .39        .00     .77     .00     .00
  MSC1             .00     .00     .84     .00     .29        .00     .00     .84     .00
  MSC2             .00     .00     .85     .00     .27        .00     .00     .85     .00
  MSC3             .00     .00     .83     .00     .30        .00     .00     .83     .00
  VSC1             .00     .00     .00     .55     .70        .00     .00     .00     .60
  VSC2             .00     .00     .00     .72     .48        .00     .00     .00     .71
  VSC3             .00     .00     .00     .83     .31        .00     .00     .00     .81
Path coefficients
  MAch             .00     .00     .00     .00                .00     .00     .00     .00
  VAch             .00     .00     .00     .00                .00     .00     .00     .00
  MSC              .44    -.20     .00     .00                .48    -.19     .00     .00
  VSC             -.26     .47     .00     .00               -.19     .45     .00     .00
Variance-covariances
  MAch            1.00                                       1.00
  VAch             .78    1.00                                .76    1.00
  MSC              .00     .00     .90                        .00     .00     .87
  VSC              .00     .00     .11     .90                .00     .00     .04     .89

Note. All parameter estimates are presented in completely standardized form. The total-group (TG) solution is based on Model TG1 (Table 2) and the multiple-group (MG) solution is based on MG15 (with invariant factor loadings, path coefficients, and factor variance covariances, but freely estimated uniquenesses for each of the 26 countries). MAch = math achievement; VAch = verbal achievement; MSC = math self-concept; VSC = verbal self-concept; Uniq = uniqueness.

the I/E model.

In the baseline multiple-group model (MG1), no invariance

constraints were imposed and parameters for the a priori model

were fit separately to data from each country. The fit indexes for

this model (e.g., TLI = .97) were very good. Separate analyses of

this baseline model conducted with each country indicated that the

fit was good in each of the 26 countries considered separately.

Thus, for example, the 26 RNIs varied from a low of .968 to a high

of .992 (M RNI = .980; see subsequent discussion of Table 3). In

the first test of invariance (model MG2), factor loadings were

constrained to be equal across the 26 groups. Again, the fit indexes

were very good and differed little from those based on the totally

noninvariant solution (MG1). This supports the appropriateness of

the measures across the 26 groups and satisfies the minimum

requirement for factorial invariance. In each of the subsequent

CFA models (MG3-MG6 in Table 2), the invariance of the factor

loadings was imposed in combination with the invariance of additional sets of parameters: factor variances, factor covariances,

and uniquenesses. Although the imposition of these added invariance constraints resulted in small decrements in fit, even the highly

restrictive Model MG6 of total invariance (i.e., requiring every

parameter to be the same in all 26 groups) provided a good fit to

the data that differed only slightly from Model MG1, which had no

invariance constraints. These results support the cross-cultural

Table 2

Goodness of Fit for I/E Model Fit to the Total Group and Multiple (Country) Groups

Model            χ²      df    RNI    TLI    RMSEA   Model description
Total sample
TG1        5,026.06      30    .98    .97    .05
Multiple-group CFA
MG1        5,784.36     780    .98    .97    .05     CFA: No INV
MG2        7,650.34     930    .97    .97    .06     CFA: INV = FL; Free = FV, FC, Uniq.
MG3        9,846.64    1030    .97    .96    .06     CFA: INV = FL, FV; Free = FC, Uniq.
MG4        9,070.34    1005    .97    .96    .06     CFA: INV = FL, FC; Free = FV, Uniq.
MG5       12,515.47    1180    .96    .96    .07     CFA: INV = FL, FC, FV; Free = Uniq.
MG6       18,513.56    1405    .93    .95    .08     CFA: INV = FL, FC, FV, Uniq. (total invariance)
Multiple-group SEM
MG7        9,497.48    1030    .97    .96    .06     SEM: INV = FL, PC; Free = FV, FC, Uniq.
MG8        8,078.61     980    .97    .97    .06     SEM: INV = FL, PC-; Free = FV, FC, Uniq., PC+
MG9        8,273.18     980    .97    .97    .06     SEM: INV = FL, PC+; Free = FV, FC, Uniq., PC-
MG10      10,577.99    1080    .96    .96    .06     SEM: INV = FL, FV, FC; Free = PC, Uniq.
MG11      10,520.63    1080    .96    .96    .06     SEM: INV = FL, FC, PC; Free = FV, Uniq.
MG12      11,445.96    1130    .96    .96    .07     SEM: INV = FL, FV, PC; Free = FC, Uniq.
MG13      11,234.89    1130    .96    .96    .06     SEM: INV = FL, FV, FC, PC; Free = PC, Uniq.
MG14      11,153.97    1130    .96    .96    .06     SEM: INV = FL, FV, FC, PC; Free = PC, Uniq.
MG15      12,515.47    1180    .96    .96    .07     SEM: INV = FL, FV, FC, PC; Free = Uniq.

Note. N = 55,582. In Model TG1 (see parameter estimates in Table 1) the internal/external frame of reference (I/E) model was fit to the total group, whereas for Models MG1-MG13 the I/E model was fit separately for each of the 26 groups representing different countries. For Models MG2-MG13, some combination of parameters is required to be invariant across the 26 groups (countries). RNI = relative noncentrality index; TLI = Tucker-Lewis index; RMSEA = root-mean-square error of approximation; CFA = confirmatory factor analysis; SEM = structural equation model; FL = factor loading; FC = factor covariances; FV = factor variances; PC = path coefficient; PC+ = horizontal path coefficients predicted to be positive (see Figure 1); PC- = cross path coefficients predicted to be negative (see Figure 1); Uniq. = uniqueness; MG = multiple group; TG = total group; INV = invariant; Free = freely estimated (not constrained to be invariant).

across these 26 countries.

Models MG7-MG15 focus specifically on the structural component of the model: the path coefficients that are critical to tests

of predictions based on the I/E model (see Figure 1). In Model

MG7, the path coefficients and factor loadings were required to be

the same in each of the 26 groups. Although there was a very small

decrement in fit (TLI = .96) relative to the model with only factor

loadings invariant (MG2), the fit was still very good. These tests of

the invariance of the four path coefficients provided a global test

of the invariance of the two path coefficients predicted to be

positive and the two path coefficients predicted to be negative. In

Model MG8, the horizontal (positive) path coefficients were freely

estimated in each group, whereas the cross (negative) path coefficients were required to be the same in all 26 groups. In Model

MG9, the negative path coefficients were freely estimated and the

positive paths were invariant across groups. In both models, the

goodness of fit improved a small amount (both TLIs are .97), but

the differences were small. These results demonstrated that the

magnitudes, as well as the direction, of the path coefficients

were consistent across the 26 different countries.

In Models MG10-MG12, we evaluated the effects on goodness

of fit associated with invariance constraints on factor variances and

factor covariances in the I/E model. Whereas these additional

invariance constraints produced some decrement in fit, even Model

MG15 (which required that all four path coefficients, all four

factor variances, and both factor covariances were the same across

the 26 countries) provided a good fit to the data.

In summary, even the extremely demanding model with complete invariance of all parameters provided a good fit to the data.

Because no one of these multiple-group models stood out as

clearly the best model, we evaluated parameter estimates based

on several of these models.

Estimates

In order to evaluate further support for the cross-cultural generalizability of the results, we further evaluated parameter estimates based on Model MG15 (Table 1). Because factor loadings,

path coefficients, and factor variances and covariances were invariant (the same) across the 26 groups, it was only necessary to

present one set of parameter estimates (rather than separate sets of

parameter estimates for each of the 26 groups). Because uniqueness terms were not held invariant across groups in this model, the

26 separate sets of uniqueness terms were not presented in order to

conserve space (but see earlier discussion of reliability estimates;

also see Table 3). Of particular importance in this highly restrictive

multigroup model, MG15, were the cross (negative) paths leading

from reading achievement to math self-concept (-.19) and from

mathematics achievement to verbal self-concept (-.19). In addition

to providing global support for the I/E model, the invariance of

these parameter estimates provided remarkably strong support for

the cross-cultural generalizability of predictions based on the I/E

model.

Table 3

Reliability Estimates, Goodness-of-Fit Indexes, and Selected Parameter Estimates for Each Country

                               Reliability         Factor corr     Path coefficients               Goodness of fit
                                                   MAch-  MSC-     MAch to VAch to MAch to VAch to
Country                    N   VAch  MSC   VSC     VAch   VSC      MSC     MSC     VSC     VSC        χ²(30)   RNI    TLI
Total                 55,582   .87   .88   .74     .76*   .06*     .48*    .19*    .19*    .45*    12,515.47   .957   .957
1. Australia           2,642   .87   .86   .78     .77*   .08*     .41*    .16*    .19*    .39*       181.96   .988   .982
2. Austria             2,380   .87   .88   .81     .76*   .07*     .47*    .25*    .26*    .56*       181.57   .987   .981
3. Belgium-Flemish     1,962   .86   .86   .71     .81*   .11*     .34*    .24*    .20*    .26*       178.97   .982   .974
4. Brazil              2,218   .81   .85   .63     .69*   .14*     .23*    .07     .08     .25*       256.04   .970   .955
5. Czech Republic      2,698   .84   .85   .75     .74*   .08*     .51*    .22*    .17*    .48*       395.87   .968   .952
6. Denmark             2,087   .87   .86   .77     .78*   .10*     .65*    .18*    .16*    .48*       350.22   .969   .953
7. Finland             2,576   .84   .93   .80     .71*   .28*     .70*    .06     .04     .46*       257.61   .984   .976
8. Germany             2,502   .87   .90   .81     .79*   .12*     .62*    .45*    .41*    .60*       185.00   .989   .983
9. Hungary             2,550   .86   .87   .67     .76*   .08*     .43*    .15*    .17*    .49*       318.31   .974   .961
10. Iceland            1,720   .86   .91   .78     .75*   .31*     .68*    .09*    .08     .40*       194.20   .982   .973
11. Ireland            2,041   .86   .87   .79     .79*   .11*     .53*    .22*    .44*    .47*       146.14   .988   .981
12. Italy              2,678   .86   .88   .81     .73*   .06*     .49*    .21*    .35*    .62*       199.84   .987   .981
13. Korea              2,705   .78   .89   .68     .77*   .13*     .58*    .19*    .04     .48*       115.65   .992   .988
14. Latvia             1,920   .88   .85   .66     .61*   .09*     .37*    .20*    .14*    .47*       212.20   .978   .966
15. Liechtenstein        153   .83   .85   .76     .76*   .06      .41*    .27*    .37*    .55*        50.11   .969   .954
16. Luxembourg         1,441   .88   .88   .75     .74*   .01      .40*    .30*    .27*    .57*       152.48   .982   .973
17. Mexico             2,275   .82   .83   .55     .76*   .52*     .14*    .03     .10*    .20*       184.50   .980   .970
18. Netherlands        1,282   .84   .89   .74     .84*   .07*     .89*    .75*    .25*    .34*       103.33   .988   .981
19. New Zealand        1,809   .88   .89   .80     .80*   .07*     .80*    .38*    .53*    .68*       198.64   .983   .975
20. Norway             2,050   .88   .90   .74     .73*   .14*     .72*    .17*    .20*    .59*       312.61   .974   .961
21. Portugal           2,378   .89   .87   .73     .79*   .06*     .55*    .28*    .18*    .48*       334.03   .974   .961
22. Russia             3,398   .84   .87   .67     .68*   .29*     .22*    .12*    .16*    .49*       446.98   .971   .957
23. Sweden             2,282   .85   .88   .76     .81*   .18*     .58*    .20*    .08*    .37*       243.33   .980   .971
24. Switzerland        2,982   .88   .88   .76     .76*   .20*     .54*    .39*    .28*    .42*       274.55   .983   .975
25. Scotland           1,211   .87   .88   .82     .82*   .12*     .69*    .27*    .38*    .45*        85.45   .991   .987
26. United States      1,642   .89   .86   .76     .84*   .11*     .42*    .15*    .20*    .55*       224.79   .977   .965
M                              .86   .88   .74     .76    .06      .51     .22     .21     .47        222.47   .980   .971
SD                             .03   .02   .07     .05    .17      .18     .16     .14     .12         95.27   .007   .011
Mdn                            .86   .88   .76     .74    .08      .52     .20     .20     .48        199.24   .982   .973
25th percentile                .84   .86   .70     .76    .07      .41     .27     .30     .40        172.35   .974   .961
75th percentile                .88   .89   .79     .79    .14      .66     .15     .13     .55        284.06   .987   .981

Note. All parameter estimates are presented in completely standardized form. Factor correlations for each country are based on MG3 (Table 2), and the path coefficients are based on MG10 (Table 2). The total results based on all 26 countries are based on Model MG15 (Table 1; also see Table 2) in which only uniquenesses were allowed to differ from country to country. The a priori model (see Figure 1) was fit separately to responses from each country, and the total sample and goodness of fit for each analysis is summarized by the chi-square test statistic, relative noncentrality index (RNI), and the Tucker-Lewis index (TLI). MAch = math achievement; VAch = verbal achievement; MSC = math self-concept; VSC = verbal self-concept; corr = correlation; MG = multiple group.

* p < .05.

path coefficients relating the two achievement test scores to the

corresponding self-concept measures. Although the fit was good

for models that required these path coefficients to be the same

across the 26 countries, this highly restrictive model produced a

small decrement in fit. In order to evaluate the extent of variation

in different countries, path coefficients from Model MG10 (which

allowed the path coefficients to be estimated separately in each

country) are presented for all 26 countries in Table 3.

Horizontal (positive) paths from math achievement to math

self-concept and from verbal achievement to verbal self-concept

were predicted to be substantial and positive. All 52 of these path

coefficients were statistically significant and positive. The means

of the two sets of path coefficients were .51 (SD = .18) and .47

(SD = .12), respectively. Cross (negative) paths from math

achievement to verbal self-concept and from verbal achievement

to math self-concept were predicted to be negative and less substantial. Across these 52 path coefficients, one was small but

remaining 44 were significantly negative. The means of the two

sets of path coefficients were -.22 (SD = .16) and -.21 (SD = .14),

respectively. These results for horizontal and cross paths were

similar to those based on the total sample and those based on

Model MG15 in which path coefficients were required to be the

same across the 26 groups (see Table 1). In all 26 countries, the

absolute sizes of the cross (negative) paths were consistently much

smaller than the systematically larger horizontal (positive) paths.

An important feature of the relations between multidimensional

achievement scores and multidimensional academic self-concept

scores was that academic self-concept scores were substantially

less highly correlated (more highly differentiated) than the corresponding academic achievement scores. More specifically, the

I/E model predicts that correlations between math and verbal

achievement scores should be substantial and substantially larger

than those between math and verbal self-concept. Although support for the I/E model does not require math and verbal self-concept

measures to be uncorrelated, much of the research reviewed earlier has found these two self-concept scores to be nearly

uncorrelated. In order to evaluate the cross-cultural generalizability of this pattern of results, the two correlations were presented

separately for each country in Table 3 (based on results from

Model MG3 in which factor loading and factor variances were

invariant across countries, but factor correlations were not). Consistent with a priori predictions, in every country the correlation

between the two self-concept scores (M = .06, SD = .17) was

consistently much smaller than the correlation between the two

achievement scores (M = .76, SD = .07).

The present investigation, because of the strength of the PISA

data, is probably the strongest cross-cultural study of academic

self-concept ever undertaken and certainly the strongest cross-cultural study of relations between academic self-concept and

academic achievement posited by the I/E model. Particular

strengths include the large, nationally representative samples from

each of 26 countries, the careful selection and construction of

measures based on advice from substantive and statistical expert

panels appointed by the OECD, the careful translation of materials

in each of the participating countries, the precise nature of a priori

predictions based on a strong theoretical model, and the application of a sophisticated test of invariance of parameter estimates

resulting from the structural equation models applied to data from

each of the 26 countries.

There are, however, some potentially important limitations in

our statistical analyses that dictate caution in the interpretation of

the results. The PISA study has a three-level design (Level 1 =

students, Level 2 = school, Level 3 = country), but we chose to

ignore Level 2. We justified this decision in several ways. First, all

of our variables were measured at the individual student level and

our substantive focus was on the country level. Second, for the

PISA data, as is the case more generally (see Marsh & Rowe,

1996), there was very little variation in self-concept responses at

the school level even though there was substantial variation at the

school level for achievement scores.¹ Because essentially all of the

parameter estimates of interest in the present investigation involved relations (correlations or path coefficients) with one or the

other of the self-concept variables, parameter estimates and their

standard errors were not likely to be substantially affected by

ignoring the school level (see related discussion by Marsh &

Rowe, 1996). Finally, although there is rapid development of

statistical packages in this area, we are aware of no commercially

available packages that would allow us to evaluate a three-level

model, use multiple indicators to infer latent variables, and to test

the invariance of the factor structure across the 26 countries.
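
To illustrate what a small or large school-level share of variance means, the following sketch computes the between-school proportion of variance with the classical one-way ANOVA estimator. It is a simplified two-level illustration that assumes equally sized schools and ignores the country level, so it is not the three-level variance components model used for the present data. The function name and the toy scores are hypothetical.

```python
from statistics import mean


def school_variance_share(schools):
    """ANOVA estimate of the between-school share of variance.

    `schools` is a list of equally sized lists of student scores.
    """
    n = len(schools[0])                      # students per school
    k = len(schools)                         # number of schools
    grand = mean(score for school in schools for score in school)
    ms_between = n * sum((mean(s) - grand) ** 2 for s in schools) / (k - 1)
    ms_within = sum(
        (score - mean(s)) ** 2 for s in schools for score in s
    ) / (k * (n - 1))
    var_between = max((ms_between - ms_within) / n, 0.0)
    return var_between / (var_between + ms_within)


# Hypothetical scores: schools differ strongly in the first data set
# and not at all in the second.
high_share = school_variance_share([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
no_share = school_variance_share([[1, 2, 3], [1, 2, 3]])
```

When the share is close to zero, as in the second data set, treating students as if they were not clustered within schools has little effect on the estimates.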

There is also a potential concern about missing data in our

analyses, particularly for the self-concept responses that were our

main focus. A substantial portion of students did not have both

math and verbal test scores because of the design of the PISA

study (in which students were given different achievement tests at

random). However, our use of listwise deletion is appropriate

under these circumstances in which missing values are determined

randomly by design. In addition, there were entire countries or

regions within countries (e.g., the non-Flemish part of Belgium

and all of the United Kingdom except for Scotland) that chose not

to participate in the CCC component of PISA that contained the

analyses. Within some of the countries, however, there were entire

schools that chose not to participate in the survey component of

PISA that contained the self-concept items and these were also

excluded from the present investigation. In some cases, these

represented entire regions (e.g., only the Flemish part of Belgium

participated and in the United Kingdom only Scotland participated

in the CCC option of PISA), which were excluded. However, for

students who completed any of the self-concept items, there were

very few missing responses (1.6%). Whereas use of listwise deletion for missing data was a reasonable strategy for students who

had some nonmissing self-concept responses given the very

small amount of missing data, the exclusion of entire schools that

did not participate in the CCC option of PISA may compromise the

representativeness of the sample.

It could be argued that a majority of the countries are western

countries; that the sample included only one country from Central

America (Mexico), South America (Brazil), and Asia (Korea); and

that the sample included no African countries at all. Hence, it is

relevant to focus on results from Mexico, Brazil, and Korea. In all

three countries, the correlation between math and verbal achievement factors was substantially larger than the corresponding correlation between math and verbal self-concept factors, thus supporting the I/E model. However, the correlation between the two

academic self-concept factors in one of these countries, Mexico

(r = .52), was clearly larger than in the other 25 countries. In all

three countries, paths between matching achievement and self-concept factors (the horizontal paths in Figure 1) were positive,

thus supporting the I/E model. However, in Brazil and Mexico,

these paths were smaller than in most of the other countries. In all

three countries, all of the critical paths between nonmatching

self-concept and achievement factors (the diagonal, cross paths in

Figure 1) were negative, thus supporting the I/E model. However,

only two of these six paths were statistically significant (i.e., four

are negative, but do not differ significantly from zero). Whereas

results based on these three countries provided some support for

the I/E model, the strength of this support was weaker than for the

other countries. It is, however, also important to emphasize that

there is strong support for the I/E model in non-Western countries

summarized earlier, particularly in Chinese research (Dai, 2001,

2002; Kong, 2000; Lee et al., 2000; Marsh et al., 2001; Yeung &

Lau, 1998; Yeung & Lee, 1999). In summary, whereas the results

of the present investigation provide strong support for the I/E

model, there is a need to further test the generalizability of the

results in non-Western countries, particularly those in Asia, South

America, Central America, and Africa, in order to more fully

evaluate any claims that support for the I/E model is universal.

More generally, the I/E model is likely to be dependent on the

existence of a formal school system that emphasizes math and

verbal school subjects and would not be likely to be supported in

cultures where there was no formal school system.

¹ We conducted a three-level (Level 1 = individual student, Level 2 =

school, Level 3 = country) variance components model for math achievement, verbal achievement, math self-concept, and verbal self-concept. The

amount of variance explained by school was 5% and 3% for verbal and

math self-concepts, respectively, but 21% and 27% for math and verbal

achievement scores, respectively.

The extreme domain specificity of academic self-concepts that

led to the development of the I/E model and was demonstrated so

convincingly in the present investigation has important implications for any lingering debates about the relative importance of

unidimensional and multidimensional perspectives of self-concept.

Clearly, the relation between self-concepts in particular academic

areas and corresponding areas of academic achievement cannot be

adequately understood if researchers rely solely on global measures of general self-concept or self-esteem. Indeed, our results

demonstrate that not even global measures of academic self-concept are sufficient to understand the interplay between self-perceptions in different academic domains that is the basis of the

internal comparison process in the I/E model.

The present investigation also has potentially important theoretical implications for social comparison theory. The theoretical

basis for the I/E model extends social comparison theory, positing

an internal comparison process in addition to the more typical

external comparison process. Specifically, students not only use

the performances of other students to form their self-concepts in a

particular school subject (the external comparison process), they

also use their own performances in other school subjects as a

second basis of comparison (the internal comparison process).

Although there is clear support for this pattern of results in relation

to academic achievements and academic self-concepts, it is also

relevant to ask whether similar frame-of-reference effects exist in

other areas as well. We suggest that the implications probably have

much broader generality. To illustrate this broader generality with

a hypothetical example, it is relevant to consider two athletes: (a)

a weekend sports enthusiast who is reasonably good at golf, tennis,

and a variety of other sports, but who is best at golf (with a

handicap of 10) and (b) a professional tennis player who is also a

good golfer (with a handicap of 2). Asked how good they were at

golf, it would be reasonable for the professional tennis player to

say "pretty good" (because she is so much better at tennis),

whereas the weekend sports enthusiast might say "good" (because

golf is her best sport). Objectively, the professional tennis player is

a better golfer, but if asked to complete self-concept measures for golf and

tennis, the weekend sports enthusiast may have as high or even a

higher self-concept of golf than the professional athlete. As illustrated by this example, the critical feature of the internal comparison frame of reference is the use of accomplishments in one arena

as a basis of comparison for evaluating accomplishments in another arena.

Marsh and Roche (1996) applied this logic in the evaluation of

performing arts (PA) self-concepts in dance, music, and drama.

For students not attending a PA school and non-PA students in a

PA school, there were modest, positive correlations among the

four PA self-concept scales. PA students specializing in dance,

music, or drama within a PA high school, however, were likely to

compare their relative competencies in the different PA domains in

forming their PA self-concepts in each domain as well as making

comparisons with others. Thus, for example, PA dance students

who were also good at drama and music were likely to have lower

music and drama self-concepts than non-PA students who were

equally able in music and drama, not because PA dance students

were less skilled at music and drama than non-PA students, but

because their music and drama skills were not nearly as good as

their dance skills. Consistent with a priori predictions based on the

I/E model, Marsh and Roche found that (a) dance, music, and

drama self-concepts were positively correlated for non-PA students, but uncorrelated for PA students and (b) PA students had

high self-concepts in their specialty area, but had much lower

self-concepts in their nonspecialty areas, lower even than non-PA

students who did not specialize in any areas of PA.

The extreme domain specificity of academic self-concepts that

led to the development of the I/E model also has practical implications for teachers and parents and for educational practice.

Teachers, in order to understand the academic self-concepts of

their students in different content areas, must understand the implications of the I/E model. When teachers were asked to infer the

self-concepts of their students (see discussion by Marsh & Craven,

1997), their responses reflected primarily the external comparison

process so that teachers' inferences were not nearly so domain

specific as responses by their students; students who were bright in

one area tended to be seen as having good academic self-concepts

in all areas, whereas students who were not bright in one area were

seen as having poor academic self-concept in all areas. Similarly,

Dai (2002) reported that inferred self-concept ratings by parents

reflected primarily the external comparison process typically emphasized in social comparison research, but not the internal comparison process that is the unique feature of the I/E model. In

contrast to inferred self-concept ratings by significant others

(teachers and parents), students' academic self-concepts in different domains are extremely differentiated. Hence, understanding

the implications of the I/E model will allow significant others to

better understand children and to infer children's self-concepts

more accurately. Thus, for example, our results demonstrate that

even bright students may have an average or below-average self-concept in their weakest school subject that may seem paradoxical

in relation to their good achievement (good relative to other

students, but not to their own performance in other school subjects). Similarly, even poor students may have an average or

above-average self-concept in their best school subject that may

seem paradoxical in relation to their below-average achievement in

that subject. Particularly for poorer students, understanding these

principles should assist teachers and parents in giving positive

feedback that is credible to students.

In summary, there exists a very strong and growing body of

support for the I/E model. In evaluating this support, it is important

to establish the limits of the model's generalizability. Tests of the

cross-cultural support for predictions from a theoretical model

developed in one culture to another culture provide an important

basis for testing this generalizability. Importantly, previous tests of

the I/E model have been conducted primarily in Western countries

and, typically, in those where the native language is English.

Although there exists support for the I/E predictions in some

non-Western countries, particularly China, I/E studies typically

have been based on ad hoc samples within a single country and had

idiosyncratic design features that hindered comparisons across

different countries. In this respect, the results of the present

investigation, based on large, representative samples of students

from 26 different countries and common materials, provide a

much stronger test of the cross-cultural generalizability of predictions based on the I/E model than any previous research. Because

there was good support both for the predictions based on the I/E

model and for the generalizability of these results across the 26

countries, the results clearly support the construct validity of the

I/E model and its cross-cultural generalizability. Although it may

universal, the results of our research clearly extend the cross-cultural generalizability of support for the I/E model.

References

Abu-Hilal, M. M. (2002, April). Frame of reference model of self-concept

and locus of control: A cross gender study in the United Arab Emirates.

Paper presented at the 87th Annual Convention of the American Educational Research Association, New Orleans, LA.

Abu-Hilal, M. M., & Bahri, T. M. (2000). Self concept: The generalizability of research on the SDQ, Marsh/Shavelson model and I/E frame of

reference model to the United Arab Emirates students. Social Behavior

& Personality, 28, 309–322.

Adams, R., & Wu, M. (2002). PISA 2000 Technical Report. Retrieved May

5, 2003, from Organisation for Economic Co-Operation and Development Web site: http://www.pisa.oecd.org/tech/intro.htm

Bentler, P. M. (1988). Theory and implementation of EQS. A structural

equations program. Los Angeles: BMDP Statistical Software.

Bollen, K. (1989). Structural equations with latent variables. New York:

Wiley.

Bong, M. (1998). Tests of the internal/external frames of reference model

with subject-specific academic self-efficacy and frame-specific academic self-concepts. Journal of Educational Psychology, 90, 102–110.

Brown, J. D. (1993). Self-esteem and self-evaluation: Feeling is believing.

In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4, pp.

59–98). Hillsdale, NJ: Erlbaum.

Byrne, B. (1996). Measuring self-concept across the life span: Issues and

instrumentation. Washington, DC: American Psychological Association.

Byrne, B. M. (1998). Structural equation modeling with LISREL, PRELIS,

and SIMPLIS: Basic concepts, applications and programming. Mahwah,

NJ: Erlbaum.

Byrne, B. M. (2003). Measuring self-concept across culture: Issues, caveats, and practice. In H. W. Marsh, R. G. Craven, & D. McInerney (Eds.),

International advances in self research (Vol. 1, pp. 291–313). Greenwich, CT: Information Age.

Cheung, G. W., & Rensvold, R. B. (1999). Testing factorial invariance

across groups: A reconceptualization and proposed new method. Journal

of Management, 25, 1–27.

Covington, M. V. (1992). Making the grade: A self-worth perspective on

motivation and school reform. New York: Cambridge University Press.

Dai, D. Y. (2001). A comparison of gender differences in academic

self-concept and motivation between high-ability and average Chinese

adolescents. Journal of Secondary Gifted Education, 13, 22–32.

Dai, D. Y. (2002). Incorporating parent perceptions: A replication and

extension study of the internal–external frame of reference model of

self-concept development. Journal of Adolescent Research, 17, 617–645.

Diener, E., & Fujita, F. (1997). Social comparison and subjective well-being. In B. P. Buunk & F. X. Gibbons (Eds.), Health, coping, and

well-being: Perspectives from social comparison theory (pp. 329–358).

Mahwah, NJ: Erlbaum.

Festinger, L. (1954). A theory of social comparison processes. Human

Relations, 7, 117–140.

Goethals, G. R., & Darley, J. M. (1987). Social comparison theory:

Self-evaluation and group life. In B. Mullen & G. R. Goethals (Eds.),

Theories of group behavior (pp. 21–47). New York: Springer-Verlag.

Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation

modeling with the SIMPLIS command language. Chicago: Scientific

Software International.

Kagitcibasi, C., & Poortinga, Y. H. (2000). Cross-cultural psychology:

Issues and overarching themes. Journal of Cross-Cultural Psychology,

31, 129–147.

Kaplan, D., & Ferguson, A. J. (1999). On the utilization of sample weights

in latent variable models. Structural Equation Modeling, 6, 305–321.

Journal of Cross-Cultural Psychology, 26, 603–605.

Kashima, Y. (2000). Conceptions of culture and person for psychology.

Journal of Cross-Cultural Psychology, 31, 14–32.

Kitayama, S., & Markus, H. R. (1999). Yin and yang of Japanese self: The

cultural psychology of personality coherence. In D. Cervone & Y. Shoda

(Eds.), The coherence of personality: Social cognitive bases of personality consistency, variability, and organization (pp. 242–302). New

York: Guilford Press.

Kong, C. (2000). Chinese students' self-concept: Structure, frame of reference, and relation with academic achievement. Dissertation Abstracts

International Section A: Humanities and Social Sciences, 61(3-A), 880.

Lee, M. F., Yeung, A. S., Low, R., & Jin, P. (2000). Academic self-concept

of talented students: Factor structure and applicability of the internal/

external frame of reference model. Journal for the Education of the

Gifted, 23, 343–367.

Marsh, H. W. (1986). Verbal and math self-concepts: An internal/external

frame of reference model. American Educational Research Journal, 23,

129–149.

Marsh, H. W. (1989). Sex differences in the development of verbal and

math constructs: The High School and Beyond study. American Educational Research Journal, 26, 191–225.

Marsh, H. W. (1990). A multidimensional, hierarchical self-concept: Theoretical and empirical justification. Educational Psychology Review, 2,

77–172.

Marsh, H. W. (1992). Self Description Questionnaire, II. Sydney, New

South Wales, Australia: University of Western Sydney. (Original work

published 1990)

Marsh, H. W. (1993). Academic self-concept: Theory, measurement, and

research. In J. Suls (Ed.), Psychological perspectives on the self (Vol. 4,

pp. 59–98). Hillsdale, NJ: Erlbaum.

Marsh, H. W. (1994a). Confirmatory factor analysis models of factorial

invariance: A multifaceted approach. Structural Equation Modeling, 1,

5–34.

Marsh, H. W. (1994b). Using the National Educational Longitudinal Study

of 1988 to evaluate theoretical models of self-concept: The Self-Description Questionnaire. Journal of Educational Psychology, 86,

439–456.

Marsh, H. W., Balla, J. R., & Hau, K. T. (1996). An evaluation of

incremental fit indices: A clarification of mathematical and empirical

processes. In G. A. Marcoulides & R. E. Schumacker (Eds.), Advanced

structural equation modeling: Issues and techniques (pp. 315–353).

Hillsdale, NJ: Erlbaum.

Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness-of-fit

indexes in confirmatory factor analysis: The effect of sample size.

Psychological Bulletin, 103, 391–410.

Marsh, H. W., Byrne, B. M., & Shavelson, R. (1988). A multifaceted

academic self-concept: Its hierarchical structure and its relation to academic achievement. Journal of Educational Psychology, 80, 366–380.

Marsh, H. W., & Craven, R. (1997). Academic self-concept: Beyond the

dustbowl. In G. Phye (Ed.), Handbook of classroom assessment: Learning, achievement, and adjustment (pp. 131–198). Orlando, FL: Academic Press.

Marsh, H. W., & Hattie, J. (1996). Theoretical perspectives on the structure

of self-concept. In B. A. Bracken (Ed.), Handbook of self-concept (pp.

38–90). New York: Wiley.

Marsh, H. W., & Köller, O. (2003). Unification of two theoretical models

of relations between academic self-concept and achievement. In H. W.

Marsh, R. G. Craven, & D. McInerney (Eds.), International advances in

self research (Vol. 1, pp. 17–47). Greenwich, CT: Information Age.

Marsh, H. W., Kong, C.-K., & Hau, K.-T. (2000). Longitudinal multilevel

models of the big-fish–little-pond effect on academic self-concept:

Counterbalancing contrast and reflected-glory effects in Hong Kong

high schools. Journal of Personality and Social Psychology, 78, 337–349.

Marsh, H. W., Kong, C., & Hau, K. (2001). Extension of the internal/

external frame of reference model of self-concept formation: Importance

of native and nonnative languages for Chinese students. Journal of

Educational Psychology, 93, 543–553.

Marsh, H. W., & Parker, J. (1984). Determinants of student self-concept:

Is it better to be a relatively large fish in a small pond even if you don't

learn to swim as well? Journal of Personality and Social Psychology, 47,

213–231.

Marsh, H. W., & Roche, L. A. (1996). Structure of artistic self-concepts for

performing arts and non-performing arts students in a performing arts

high school: Setting the stage with multigroup confirmatory factor

analysis. Journal of Educational Psychology, 88, 461–477.

Marsh, H. W., & Rowe, K. J. (1996). The negative effects of school-average ability on academic self-concept: An application of multilevel

modelling. Australian Journal of Education, 40, 65–87.

Marsh, H. W., & Yeung, A. S. (1998). Longitudinal structural equation

models of academic self-concept and achievement: Gender differences

in the development of math and English constructs. American Educational Research Journal, 35, 705–738.

Marsh, H. W., & Yeung, A. S. (2001). An extension of the internal/external

frame of reference model: A response to Bong (1998). Multivariate

Behavioral Research, 36, 389–420.

Marshall, H. H., & Weinstein, R. S. (1984). Classroom factors affecting

students' self-evaluations. Review of Educational Research, 54, 301–326.

Matsumoto, D. (2001). Cross-cultural psychology in the 21st century. In

J. S. Halonen & S. F. Davis (Eds.), The many faces of psychological research in the 21st century (chap. 5). Retrieved December 18,

2001, from the Teaching of Psychology Web site: http://teachpsych.lemoyne.edu/teachpsych/faces/script/ch05.htm.

Moeller, J., & Köller, O. (2001a). Dimensional comparisons: An experimental approach to the internal/external frame of reference model.

Journal of Educational Psychology, 93, 826–835.

Moeller, J., & Köller, O. (2001b). Frame of reference effects following the

announcement of exam results. Contemporary Educational Psychology,

26, 277–287.

Morse, S., & Gergen, K. J. (1970). Social comparison, self-consistency,

and the concept of self. Journal of Personality and Social Psychology,

16, 148–156.

Organisation for Economic Co-operation and Development. (2001a).

Knowledge and skills for life: Results from the first OECD programme

for international student assessment (PISA) 2000. Paris, France: Author.

Organisation for Economic Co-operation and Development. (2001b). PISA

international data base (Tech. Rep.). Paris, France: Author.

Plucker, J. A., & Stocking, V. B. (2001). Looking outside and inside:

Self-concept development of gifted adolescents. Exceptional Children,

67, 535–548.

psychology as a scholarly discipline: On the flowering of culture in

behavioral research. American Psychologist, 53, 1101–1110.

Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Validation of

construct interpretations. Review of Educational Research, 46, 407–441.

Singelis, T. M. (2000). Some thoughts on the future of cross-cultural social

psychology. Journal of Cross-Cultural Psychology, 31, 76–91.

Skaalvik, E. M., & Rankin, R. J. (1995). A test of the internal/external

frame of reference model at different levels of math and verbal self-perception. American Educational Research Journal, 32, 161–184.

Skaalvik, E. M., & Valas, H. (2001). Achievement and self-concept in

mathematics and verbal arts: A study of relations. In R. J. Riding & S. G.

Rayner (Eds.), Self perception (pp. 221–238). Westport, CT: Ablex.

Stapleton, L. M. (2002). The incorporation of sample weights into multilevel structural equation models. Structural Equation Modeling, 9, 475–502.

Sue, S. (1999). Science, ethnicity and bias: Where have we gone wrong?

American Psychologist, 54, 1070–1077.

Suls, J. M. (1977). Social comparison theory and research: An overview

from 1954. In J. M. Suls & R. L. Miller (Eds.), Social comparison

processes: Theoretical and empirical perspectives (pp. 1–20). Washington, DC: Hemisphere.

Suls, J. (1993). Preface. In J. Suls (Ed.), Psychological perspectives on the

self (Vol. 4, pp. ix–xi). Hillsdale, NJ: Erlbaum.

Van de Vijver, F. J. R., & Leung, K. (2000). Methodological issues in

psychological research on culture. Journal of Cross-Cultural Psychology, 31, 33–51.

Wells, L. E., & Marwell, G. (1976). Self-esteem: Its conceptualization and

measurement. Beverly Hills, CA: Sage.

Williams, J. E., & Montgomery, D. (1995). Using frame of reference

theory to understand the self-concept of academically able students.

Journal for the Education of the Gifted, 18, 400–409.

Wylie, R. C. (1979). The self-concept (Vol. 2). Lincoln: University of

Nebraska Press.

Yeung, A. S., & Lau, I.-C.-K. (1998, July). The internal/external frame of

reference in the self-concept development of higher education students.

Paper presented at the Conference of the Higher Education Research and

Development Society of Australasia, Auckland, New Zealand. (ERIC

Document Reproduction Service No. ED423772)

Yeung, A. S., & Lee, F. L. (1999). Self-concept of high school students in

China: Confirmatory factor analysis of longitudinal data. Educational &

Psychological Measurement, 59, 431–450.

Revision received July 24, 2003

Accepted July 28, 2003