
WD6138.127-136 2/18/98 7:28 AM Page 127

Psychology in the Schools, Vol. 35(2), 1998


© 1998 John Wiley & Sons, Inc.

CCC 0033-3085/98/020127-09

RELIABILITY AND VALIDITY OF THE MATH ESSENTIAL SKILL SCREENER–ELEMENTARY VERSION (MESS-E)
Bradley T. Erford, Donna L. Bagley, James A. Hopper, Ramona M. Lee,
Kathleen A. Panagopulos, and Denise B. Preller
Loyola College in Maryland
The Math Essential Skill Screener–Elementary Version (MESS-E) is a screener devised to identify primary grade students at risk for math difficulties. Item analysis, interitem consistency, test–retest reliability, decision efficiency, and construct validity of the MESS-E were studied using four independent samples of boys and girls in grades 1–3 (aged 6–8). Item analysis revealed a median item difficulty of .64 and a median item discrimination of .75. Interitem consistency was .92 (n = 171) and .94 (n = 711), while 30-day test–retest reliability was .86 (n = 125). Exploratory factor analysis indicated a one-factor solution accounting for 37% of observed variance. LISREL 7 confirmatory factor analysis procedures determined that the one-factor model fit the standardization sample data poorly (goodness-of-fit index = .729, χ² to df ratio = 9.91). The MESS-E yielded concurrent validity coefficients (n = 171) of .74 with the Woodcock–Johnson: Tests of Achievement–Revised (WJ-R) Math Cluster, .80 with the Wide-Range Achievement Test–Revised (WRAT-R) Arithmetic subtest, and .73 with the KeyMath-R Operations Area standard scores. A diagnostic efficiency study yielded a total predictive value (TPV) of .93, sensitivity = .98, specificity = .88, positive predictive power (PPP) = .89, negative predictive power (NPP) = .98, and incremental validity = 39%. The MESS-E displayed a slight tendency to overidentify children potentially at risk for math difficulties. © 1998 John Wiley & Sons, Inc.

Identifying students at risk for academic failure is receiving increasing attention among educators and has become a controversial topic. At-risk students are those in danger of failing to obtain an
adequate level of educational skills (Slavin & Madden, 1989). But while factors such as socioeconomic status (SES), poor attendance, limited English proficiency, and geographic location may be
helpful in identifying subgroups of students at risk (Slavin & Madden, 1989; Speece & Cooper,
1990), the accuracy of such stereotyped, indirect academic predictors is suspect. For example, while
some low SES students are at risk, not all such children can be characterized as in danger of academic failure. Direct measures of academic performance, such as academic achievement tests, are
likely to yield more accurate identification of children in need of academic intervention. Furthermore, referring to a child nonspecifically as "at risk" may misrepresent the child's potential deficiencies. Therefore, it is more meaningful to refer to a child who performs poorly in math as "at risk for math failure." This more specific identification schema acknowledges that children deficient in one or more academic skill areas are not necessarily skill deficient in all academic domains.
Research generally supports the conclusion that early intervention with at-risk children significantly increases the chances of the child's subsequent academic success (Campbell & Ramsey, 1994; Haskins, 1989; Karweit, 1988; Madden, Slavin, Karweit, Dolan, & Wasik, 1993; Reynolds, 1993). Most children arriving in first grade lacking a basic math knowledge base soon fall behind the class as increasingly difficult math instruction is presented and built upon. A poor start can lead to perpetual underachievement during the child's entire academic career (Schweinhart & Weikart, 1986).
The high incidence of academic failure and dropouts is causing many school systems to realize
that changes must occur to accommodate the special needs of at-risk youth (Herguert, 1991). Such
changes could begin with developing screening procedures to determine which children are at risk
for failure. While screening tests generally assess a more limited sample of behavior and are held to
a lower standard of reliability, these tests can provide the examiner with a time- and cost-efficient
method for identifying students at risk or estimating future performance (Salvia & Ysseldyke, 1995).
Correspondence concerning this article should be addressed to Bradley T. Erford, Department of Education, Loyola College in Maryland, 4501 N. Charles Street, Baltimore, MD 21210-2699.


Erford, Bagley, Hopper, Lee, Panagopulos, and Preller

Unfortunately, few commercially available math skills screening tests exist that can be administered
quickly in mass screening programs. The challenges in developing such a screening device include
making a test amenable to group administration that assesses all math skills essential to age-appropriate early elementary education, while including enough items to maintain reliability without sacrificing the goal of brief administration time.
Research indicates there are at least seven essential math skills to be mastered in the primary
grades. These skills include writing numbers (Buffington, Heber, & Wilson, 1985; Burton, 1982;
Greenes, Schulman, & Spungin, 1993; Smith & Ginsburg, 1973; White, 1994), addition, subtraction
(Carl, 1989; Cowan & Clary, 1978; Edwards, Nichols, & Sharpe, 1972; MSPP, 1990), telling time
(Charles et al., 1995; Irington, 1989; MSPP, 1990; Smith, 1987), and recognizing fractional concepts
(Carl, 1989; Cowan & Clary, 1978; Edwards et al., 1972; MSPP, 1990; Orfan & Vogeli, 1987). Young
students also should be able to apply money concepts, including counting, making change, and reading and writing money amounts (Charles et al., 1995; MSPP, 1990; Orfan & Vogeli, 1987; Sullivan,
1981; Swartz, 1981), and solve word problems using basic addition and subtraction skills (Charles
et al., 1995; Matz & Leier, 1992; MSPP, 1990; Orfan & Vogeli, 1987).
The MESS-E (Erford, Vitali, Haas, & Boykin, 1995) is a group or individually administered screening test devised to identify primary grade students at risk for math difficulties. It was designed to reliably and validly measure the seven essential math skills listed above. The purpose of this study was to examine the interitem consistency, test–retest reliability, decision efficiency, and concurrent and construct validity of the MESS-E. Numerous tests of math skills are currently used as screening and diagnostic instruments in the identification of children at risk for developmental math problems. Among these, the Wide-Range Achievement Test–Revised (WRAT-R; Jastak & Wilkinson, 1984) Arithmetic subtest is one of the most commonly used academic screening tests (Jastak & Wilkinson, 1984). In addition, the Woodcock–Johnson: Tests of Achievement, Math Cluster (WJ-R; Woodcock & Johnson, 1989), and KeyMath–Revised (Connolly, 1988) are commonly used as diagnostic assessments that help identify children at risk for math problems. These tests were specifically chosen for the concurrent validation of the MESS-E.
Method: Study One
Four independent samples were used in the four studies that follow.
Participants
Participants were 171 children (87 girls, 84 boys) aged six (n = 60), seven (n = 65), and eight (n = 46), from three schools located in the central Maryland area. Eighty percent of the students were White, 14% were African-American, 3% were Hispanic-American, and 3% were Asian-American. Seventy-seven percent of the students were from urban/suburban settings (defined as communities larger than 2,500 people), while 23% were from rural settings.
Procedure
The lead author and a trained assistant group-administered the MESS-E, WJ-R Math Cluster, WRAT-R Arithmetic subtest, and KeyMath-R Operations Area to the 171 participants in their respective classrooms. The tests were administered in a counterbalanced sequence to control for systematic variation due to order of administration. Counterbalancing was achieved by randomly ordering the protocols for the four tests for each class group using a table of random digits. All protocols were scored according to standardization specifications.
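The random ordering of test protocols can be sketched in Python as follows. This is a minimal illustration, assuming a seeded pseudorandom generator in place of the printed table of random digits; the function name and class labels are hypothetical and not part of the study's materials.

```python
import random

# The four instruments administered in Study One.
TESTS = ["MESS-E", "WJ-R Math Cluster", "WRAT-R Arithmetic", "KeyMath-R Operations"]


def random_test_orders(class_groups, tests=TESTS, seed=None):
    """Give each class group its own randomly ordered sequence of tests.

    A seeded pseudorandom generator stands in for the table of random
    digits (an assumption made for illustration).
    """
    rng = random.Random(seed)
    orders = {}
    for group in class_groups:
        order = list(tests)  # copy so the master list is never shuffled in place
        rng.shuffle(order)   # uniform random permutation for this group
        orders[group] = order
    return orders


# Hypothetical class labels, not taken from the study.
for group, order in random_test_orders(["Class A", "Class B", "Class C"], seed=1998).items():
    print(group, "->", ", ".join(order))
```

Because each class group receives an independent shuffle, order effects are spread randomly across groups rather than balanced position by position; a Latin-square assignment would be the stricter alternative.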
Analysis
Standard scores for each test were used in computation of Pearson correlation coefficients.
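The correlational analyses in Studies One and Two rest on the Pearson product-moment coefficient. A minimal Python sketch follows; the score lists are invented for illustration and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares about each mean.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical standard scores for five children (not study data).
mess_e = [85, 92, 100, 108, 115]
wrat_r = [88, 90, 103, 104, 118]
print(round(pearson_r(mess_e, wrat_r), 2))
```

The same function applies to Study Two's test–retest analysis by passing the total raw scores from the first and second administrations as the two lists.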


MESS-E Reliability and Validity


Instruments
Math Essential Skills Screener–Elementary Version (MESS-E). The MESS-E was ". . . designed to be used as a screening instrument for the identification of children at risk for potential academic failure, to facilitate remedial program decisions, and to generate more accurate referrals for deep testing" (Erford et al., 1995, p. 1). The MESS-E is a 27-item math skills screening test that can be group or individually administered by educational professionals and trained paraprofessionals and scored in approximately 10–15 minutes. The MESS-E was normed on 711 children aged 6–8 (grades 1–3). The MESS-E consists of seven activities appropriate to primary grade mathematical conceptual development. Writing Numerals involves having the child respond to an oral prompt by writing several one-, two-, and three-digit numerals. The Addition and Subtraction tasks require the child to compute five addition and five subtraction problems with and without regrouping. The Time and Money tasks require identification of time and money concepts and demonstration of appropriate written conventions. The single Fraction item requires identification of a fractional portion of a group. Word Problems involve solving simple addition and subtraction problems. Each item is scored as right or wrong, and item scores are summed to arrive at a total raw score. Raw scores can then be converted to standard scores (M = 100, SD = 15), percentile ranks, age- or grade-equivalents, and interpretive ranges (pass, borderline, refer). The MESS-E is one of nine screeners comprising the Essential Skills Screener.
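The published MESS-E converts raw scores through its own norm tables, which are not reproduced here. As a rough approximation only, the sketch below assumes a linear z-score rescaling to the M = 100, SD = 15 metric and a normal-curve percentile rank; the norm-group mean and SD passed in are hypothetical placeholders, and the ≤ 25th-percentile flag borrows the cutoff used in Study Four rather than the manual's pass/borderline/refer ranges.

```python
import math


def standard_score(raw, norm_mean, norm_sd, mean=100.0, sd=15.0):
    """Linearly rescale a raw score to the M = 100, SD = 15 metric.

    norm_mean and norm_sd are the raw-score mean and SD of the relevant
    age or grade norm group (hypothetical values in this sketch).
    """
    z = (raw - norm_mean) / norm_sd
    return mean + sd * z


def percentile_rank(ss, mean=100.0, sd=15.0):
    """Percentile rank of a standard score, assuming normally distributed scores."""
    z = (ss - mean) / sd
    return 100.0 * 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical norm-group values for one age band.
ss = standard_score(raw=14, norm_mean=18.0, norm_sd=5.0)
pr = percentile_rank(ss)
print(f"standard score {ss:.0f}, percentile rank {pr:.0f}")
# Study Four treated a percentile rank <= 25 as a positive (deficient) screen.
print("flag for follow-up" if pr <= 25 else "no flag")
```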
KeyMath–Revised: A Diagnostic Inventory of Essential Mathematics. The KeyMath-R (Connolly, 1988) is an individually administered test designed to assess understanding and application of basic mathematical skills. The test comprises three primary areas, each containing various subtests: Basic Concepts (Numeration, Rational Numbers, and Geometry), Operations (Addition, Subtraction, Multiplication, Division, and Mental Computation), and Applications (Measurement, Time and Money, Estimation, Interpreting Data, and Problem Solving). Subtest scores are reported as scaled scores, which are then converted into an Area standard score. Area scores can be combined to arrive at a total test score. The KeyMath-R was standardized on 1,798 children in grades kindergarten through nine. The KeyMath-R takes about 1 hr to administer and score. For the purpose of this study, only the Operations Area subtests of Form A were administered, because they could be group-administered. Thus, the following technical summary focuses on Operations Area results. The Operations Area alternate-forms stability coefficient (n = 356) was r = .82. A split-half reliability analysis using the standardization sample resulted in a coefficient of .92. Connolly also reported in the manual that item response theory calculations were applied to standardization data and resulted in acceptable levels of reliability. Criterion-related validity studies resulted in correlations of .62 between the KeyMath-R Operations Area and Comprehensive Tests of Basic Skills (CTBS; CTB/McGraw-Hill, 1989) Total Mathematics (n = 121), and .77 with the Iowa Tests of Basic Skills (ITBS; Hoover, Hieronymus, Frisbie, & Dunbar, 1993) Total Mathematics (n = 107).
Wide-Range Achievement Test–Revised (WRAT-R) Level 1 Arithmetic Subtest. The WRAT-R Arithmetic subtest (Jastak & Wilkinson, 1984) is an individually or group-administered test designed to assess basic math computation skills by having the child calculate a series of math computation problems presented in approximately increasing order of difficulty. The Arithmetic subtest score can be reported as a standard score, percentile rank, and grade equivalent. The WRAT-R was standardized on 5,600 individuals aged 5 to adult. The Arithmetic subtest takes about 15 minutes to administer and score. Test–retest reliability of the Arithmetic subtest (n = 81) was reported in the manual to be .94 for children in grades one through four. A criterion-related validity study (n = 1,000) resulted in a correlation of .66 between the WRAT-R Arithmetic subtest and the Peabody Individual Achievement Test (PIAT; Markwardt, 1970) Arithmetic subtest. A combination of several additional studies (n = 400) resulted in a correlation of .81 with the California Achievement Test (CTB/McGraw-Hill, 1992) Arithmetic score.


Woodcock–Johnson: Tests of Achievement–Revised (WJ-R) Mathematics Cluster. The WJ-R Mathematics Cluster (Woodcock & Johnson, 1989) is an individually administered cluster of two subtests, Calculation and Applied Problems. These subtests were designed to assess basic computation and math problem-solving skills. Each subtest can be reported individually as a standard score, percentile rank, and grade equivalent, then combined to represent a Mathematics Cluster score, expressed as a standard score or percentile rank. The WJ-R was standardized on 6,359 individuals aged 2 to adult. The two math subtests take about 20–35 minutes total to administer and score. Median internal consistency coefficients of the Calculation and Applied Problems subtests and the Mathematics Cluster for the standardization sample were reported in the manual to be .93, .91, and .95, respectively. Criterion-related validity studies (n = 70 nine-year-old children) resulted in significant correlations between the WJ-R Mathematics Cluster score and the Basic Achievement Skills Individual Screener (BASIS; Psychological Corporation, 1983) Math subtest (r = .71), Peabody Individual Achievement Test (PIAT; Markwardt, 1970) Math subtest (r = .41), Wide-Range Achievement Test–Revised (WRAT-R; Jastak & Wilkinson, 1984) Arithmetic subtest (r = .63), and Kaufman Test of Educational Achievement (K-TEA; Kaufman & Kaufman, 1985) Math Composite (r = .83).
Method: Study Two
Participants
Participants were 125 children (63 girls, 62 boys) aged six (n = 40), seven (n = 44), and eight (n = 41), from two elementary schools located in the central Maryland area. Seventy-nine percent of the students were White, 15% were African-American, 4% were Hispanic-American, and 2% were Asian-American. Seventy-eight percent of the students were from suburban settings (defined as communities larger than 2,500 people), whereas 22% were from rural settings.
Procedure
Trained evaluators group-administered the MESS-E to the 125 participants in their respective classrooms, then readministered it after 29–31 days. All protocols were scored according to standardization specifications.
Analysis
Total raw scores for each administration were used in the computation of Pearson correlation coefficients to derive an estimate of the MESS-E's test–retest reliability.
Method: Study Three
Participants
Participants were 711 children (383 girls, 328 boys) in grades one (n = 234), two (n = 265), and three (n = 212) from the standardization sample. Seventy-eight percent of the students were White and 22% were Nonwhite. Eighty-two percent of the students were from suburban settings (defined as communities larger than 2,500 people), whereas 18% were from rural settings.
Procedure
The MESS-E was administered by trained assistants to the 711 participants in the standardization sample in groups of 15–29. All protocols were scored according to standardization specifications.
Analysis
Item raw scores were used in subsequent item analysis (item difficulty and discrimination; Anastasi, 1988), calculations of interitem consistency, and exploratory factor analysis (EFA). EFA involved subjecting the data to principal components analysis. Confirmatory factor analysis (CFA) procedures were also conducted using LISREL 7 (Joreskog & Sorbom, 1989) to determine how well the derived exploratory model fit the standardization sample data.
Method: Study Four
Participants
Participants were 100 children (61 boys, 39 girls) in grades one (n = 12), two (n = 36), and three (n = 52) from seven elementary schools. Seventy-one percent of the children were White, 20% African-American, 7% Hispanic-American, and 2% Asian-American. Eighty-seven percent were from suburban/urban settings (communities larger than 2,500 people), whereas 13% were from rural communities. Twenty percent had fathers who had not graduated from high school, 52% had fathers with a high school diploma, 23% had fathers with some college, and 5% had fathers with a college degree.
Procedure
Fifty children (grade one, n = 6; grade two, n = 18; grade three, n = 26) had been identified by their classroom teachers as having no perceived developmental math skill deficiencies, and 50 (grade one, n = 6; grade two, n = 18; grade three, n = 26) were identified by their teachers as having perceived developmental math skill deficiencies. Each child was administered the MESS-E by evaluators blind to the child's developmental math categorization.
Analysis
Data were analyzed using decision theory. The efficiency of diagnostic decisions can best be understood by exploring a test's incremental validity (Anastasi, 1988), sensitivity and specificity (Gutterman, O'Brien, & Young, 1987), and total predictive value (TPV), positive predictive power (PPP), and negative predictive power (NPP; Widiger, Hurt, Frances, Clarkin, & Gilmore, 1984). Each of these indices was computed from this sample's results.
Results and Discussion
Item Analysis
MESS-E item difficulties, using the percentage passing technique (Anastasi, 1988), ranged from
0.19 to 1.00, with a median score of 0.64. Item discriminations, using the extreme groups method
(Anastasi, 1988), ranged from 0.01 to 0.97, with a median discrimination of 0.75. Seventy-four percent of the MESS-E items met or exceeded the 0.50 item discrimination criterion, meaning the items
discriminate very well among high- and low-performing individuals.
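Both item statistics can be computed directly from a matrix of right/wrong item scores. The sketch below assumes, as is common with the extreme-groups method, upper and lower groups made up of the top and bottom 27% of examinees by total score; the study does not state the group size it used, and the response matrix is hypothetical.

```python
def item_difficulty(responses):
    """Percentage-passing difficulty: the proportion of examinees passing each item.

    `responses` holds one list of 0/1 item scores per examinee.
    """
    n = len(responses)
    k = len(responses[0])
    return [sum(person[i] for person in responses) / n for i in range(k)]


def item_discrimination(responses, fraction=0.27):
    """Extreme-groups index D = p(upper group) - p(lower group) for each item.

    The upper and lower groups are the top and bottom `fraction` of
    examinees ranked by total score; 27% is a common convention, assumed here.
    """
    ranked = sorted(responses, key=sum)        # lowest total score first
    g = max(1, round(fraction * len(ranked)))  # size of each extreme group
    p_low = item_difficulty(ranked[:g])
    p_high = item_difficulty(ranked[-g:])
    return [hi - lo for hi, lo in zip(p_high, p_low)]


# Hypothetical right/wrong scores for ten children on four items.
responses = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0],
    [1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0],
]
print("difficulty:", item_difficulty(responses))
print("discrimination:", [round(d, 2) for d in item_discrimination(responses)])
```

Under the 0.50 criterion used above, an item would count as adequately discriminating when its D value is at least 0.50.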
Reliability
Interitem consistency of the MESS-E items for the 171 participants in Study One was computed using the Kuder–Richardson Formula 20 (KR-20), resulting in a total scale coefficient of .92. Interitem consistencies for participants in grades 1–3 were .85, .89, and .91, respectively. Interitem consistency of the standardization sample was .94. Interitem consistencies for standardization sample participants in grades 1–3 were .87, .90, and .93, respectively.
The overall 30-day test–retest reliability (Pearson) coefficient for the 125 participants in Study Two was .86. Test–retest correlations for grades 1–3 were .84, .86, and .87, respectively. Therefore, the MESS-E's reliability, both interitem consistency and test–retest, was well within acceptable limits for a screening test.
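KR-20 estimates interitem consistency for dichotomously scored items from each item's pass rate p_i (with q_i = 1 - p_i) and the variance of total scores: KR-20 = [k/(k - 1)] × [1 - Σ p_i q_i / σ²_X]. A minimal sketch follows, assuming the population (divide-by-n) variance of total scores and a hypothetical response matrix; texts that use the sample variance will report slightly different values.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for items scored 0/1.

    KR-20 = k / (k - 1) * (1 - sum(p_i * q_i) / var_total), where p_i is the
    proportion passing item i, q_i = 1 - p_i, and var_total is the population
    (divide-by-n) variance of examinees' total scores.
    """
    n = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 requires at least two items")
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")
    sum_pq = 0.0
    for i in range(k):
        p = sum(person[i] for person in responses) / n  # proportion passing item i
        sum_pq += p * (1 - p)
    return k / (k - 1) * (1 - sum_pq / var_total)


# Hypothetical 0/1 scores for ten children on four items (not study data).
responses = [
    [1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0],
    [1, 1, 1, 1], [1, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0],
]
print(round(kr20(responses), 3))
```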
Concurrent Validity
Concurrent validation of the MESS-E involved the calculation of Pearson correlations between the standard scores of the MESS-E, WJ-R Math Cluster, WRAT-R Arithmetic subtest, and KeyMath-R Operations Area. These results are presented in Table 1. The MESS-E displayed a high degree of relationship with the WJ-R Math Cluster (r = .74), WRAT-R Arithmetic subtest (r = .80), and KeyMath-R Operations Area (r = .73), indicating that the tests measure similar constructs (p < .001).

Table 1
Pearson Correlation Coefficients Between the MESS-E, WJ-R Math Cluster, WRAT-R1 Arithmetic Subtest, and KeyMath Operations Area Standard Scores

                           MESS-E    WJ-R Math    WRAT-R1       KeyMath-R
                                     Cluster      Arithmetic    Operations
MESS-E                     1.00
WJ-R Math Cluster           .74      1.00
WRAT-R1 Arithmetic          .80       .67         1.00
KeyMath-R Operations        .73       .49          .66          1.00

Note. n = 171; all correlations are significant at p < .001.
Construct Validity
An exploratory principal components analysis (PCA) was applied to the MESS-E standardization data to identify the construct(s) that may underlie the MESS-E. PCA yielded eigenvalues associated with unrotated factors (Thompson, 1989) of 11.01, 2.16, 1.63, 1.49, 1.14, and 1.00. Cattell's (1966) scree test was applied to these six eigenvalues, and subsequent extractions of one through three factors were conducted. Each principal components structure was rotated to the varimax criterion. The one-factor extraction yielded the most interpretable and parsimonious results, accounting for 37% of the variance among item responses.
A factor structure coefficient of .30 or above (9% of variance in common between a variable and a factor) was considered salient for the interpretation of factor loadings (Tabachnik & Fidell, 1983). Only three items fell below the .30 criterion. The factor structure coefficients and variance accounted for by the analysis are reported in Table 2. The results of the PCA led to the conclusion that the MESS-E measures a unidimensional construct: math skill.
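The salience count and the eigenvalue reported in Table 2 can be checked against the tabled loadings. For a single principal component, each item's communality (h²) is its squared loading, and the component's eigenvalue is the sum of those squared loadings. The sketch below transcribes the Table 2 loadings; any small gap from the tabled eigenvalue reflects two-decimal rounding of the loadings.

```python
# Item-factor associations (loadings) transcribed from Table 2, grouped by task.
LOADINGS = {
    "Writing numerals": [.11, .26, .15, .45, .75, .75],
    "Addition": [.42, .71, .78, .73, .76],
    "Subtraction": [.64, .67, .72, .50, .50],
    "Time": [.81, .84, .85],
    "Money": [.72, .75, .64],
    "Fractions": [.37],
    "Word problems": [.49, .71, .52, .56],
}
SALIENT = 0.30  # salience cutoff adopted from Tabachnik & Fidell (1983)


def below_cutoff(loadings, cutoff=SALIENT):
    """Return (task, item letter, loading) for every loading under the cutoff."""
    flagged = []
    for task, values in loadings.items():
        for index, value in enumerate(values):
            if value < cutoff:
                flagged.append((task, "abcdef"[index], value))
    return flagged


all_loadings = [v for values in LOADINGS.values() for v in values]
# With one component, each item's communality h^2 is its squared loading,
# and the component's eigenvalue is the sum of the squared loadings.
eigenvalue = sum(v * v for v in all_loadings)

print("items:", len(all_loadings))
print("below .30:", below_cutoff(LOADINGS))
print("sum of squared loadings:", round(eigenvalue, 2))
```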
LISREL 7 procedures (Joreskog & Sorbom, 1989) were applied to the standardization sample data to determine the one-factor model's fit. The resulting goodness-of-fit index (GFI) of .729 (adjusted GFI = .684) and χ² to df ratio of 9.91 indicated a poor fit of the data to the factor model. Clark et al. (1994) recommended a GFI of at least .900 and a χ² to df ratio of less than 2.00 to indicate adequate model fit.
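The fit decision reduces to two threshold comparisons. A minimal sketch applying the Clark et al. (1994) cutoffs to the reported LISREL values; the function name is illustrative only.

```python
def adequate_fit(gfi, chi2_df_ratio, min_gfi=0.900, max_ratio=2.00):
    """Clark et al. (1994) rule of thumb cited in the text: adequate fit
    requires GFI >= .900 and a chi-square to df ratio below 2.00."""
    return gfi >= min_gfi and chi2_df_ratio < max_ratio


# Values reported for the one-factor MESS-E model.
print(adequate_fit(gfi=0.729, chi2_df_ratio=9.91))  # False: both criteria are missed
```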
This combination of exploratory and confirmatory factor analytic results indicated that even though a single-factor model appeared to be most parsimonious, it fit the data rather poorly. This likely resulted because the MESS-E comprises only 27 items drawn from seven built-in skill areas. Thus, even though the screening test was built around seven skill areas, the items were so highly correlated that only a single statistical dimension arose from the exploratory analysis. Developing more items in each skill area would likely give rise to a multidimensional factor solution, but this, of course, would substantially increase the time needed to administer the screener.
Table 2
PCA One-Factor Extraction for the MESS-E Standardization Sample

Task/item               Item-factor association   h²
1. Writing numerals
   a. 1                 .11                       .01
   b. 6                 .26                       .07
   c. 0                 .15                       .02
   d. 23                .45                       .20
   e. 205               .75                       .57
   f. 711               .75                       .57
2. Addition
   a. 4 + 3             .42                       .18
   b. 9 + 6             .71                       .51
   c. 24 + 55           .78                       .61
   d. 15 + 37           .73                       .53
   e. 36 + 54           .76                       .57
3. Subtraction
   a. 5 - 2             .64                       .41
   b. 15 - 7            .67                       .45
   c. 13 - 2            .72                       .52
   d. 31 - 16           .50                       .25
   e. 80 - 24           .50                       .25
4. Time
   a. 2:00              .81                       .66
   b. 5:30              .84                       .71
   c. 10:25             .85                       .71
5. Money
   a. $.22              .72                       .52
   b. $.95              .75                       .57
   c. $1.69             .64                       .41
6. Fractions
   a. one-half          .37                       .13
7. Word problems
   a. 5 + 3             .49                       .24
   b. 7 - 3             .71                       .51
   c. 2 + 4             .52                       .28
   d. 8 + 4 - 2         .56                       .31
Eigenvalue              10.80
% of variance           37.2

Decision Reliability
Results of the decision reliability study are presented in Table 3. The criterion for inclusion in the normal or deficient math skills group was a decision made by each child's teacher, who had instructed the child for at least 4 months. On the MESS-E, deficient and normal performance were defined as percentile ranks of ≤ 25 and > 25, respectively. Eighty-eight percent of the normal children were accurately identified using the MESS-E, while 98% of the math skills deficiency group were accurately identified, leading to an overall TPV of 0.93. The MESS-E's sensitivity, or the proportion of the test's true positives out of all those identified by the criterion measure, was 0.98. Specificity, or the proportion of the test's true negatives out of all those not identified by the criterion measure, was 0.88. Thus, the MESS-E displays a slight tendency to overidentify children at risk. PPP, the proportion of those with a diagnosis out of all those identified by the screening test, was .89, while NPP, the proportion of those without a diagnosis out of all those not identified by the screening test, was .98. The MESS-E's incremental validity on this sample was 39%, meaning that evaluators will make 39% more accurate decisions using the MESS-E than without it. Thus, the MESS-E displayed very respectable decision reliability for this sample. The overidentification tendency is acceptable in a screening test, as opposed to underidentification, because the purpose of screening tests is to identify likely at-risk students for further diagnostic assessment. Under these circumstances, it is better for the MESS-E to identify too many potentially at-risk students than to miss some students who are truly skill-deficient.

Table 3
MESS-E Decision Reliability of 100 Children Classified Normal or Math Skill-Deficient by Their Teachers

                                   Teacher report decision
MESS-E decision                    Normal                     Deficient
Deficient (≤ 25th percentile)      False positives            Valid rejections
                                   Inaccurate decision        Accurate decision
                                   Overidentifications        n = 49
                                   n = 6
Normal (> 25th percentile)         Valid acceptances          False negatives
                                   Accurate decision          Inaccurate decision
                                   n = 44                     Underidentifications
                                                              n = 1
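All of the decision-efficiency indices follow from the four cells of Table 3: 49 teacher-identified deficient children flagged by the MESS-E, 1 missed, 6 teacher-identified normal children flagged, and 44 normal children passed. The Python sketch below computes them. Incremental validity is computed here as PPP minus the base rate of teacher-identified deficiency (.50 in this sample); that formula is an assumption, chosen because it reproduces the reported 39%.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Screening decisions cross-tabulated against the criterion (teacher) decision."""
    tp: int  # criterion deficient, screener deficient (valid rejections)
    fn: int  # criterion deficient, screener normal (false negatives)
    fp: int  # criterion normal, screener deficient (false positives)
    tn: int  # criterion normal, screener normal (valid acceptances)

    @property
    def n(self):
        return self.tp + self.fn + self.fp + self.tn

    def sensitivity(self):
        return self.tp / (self.tp + self.fn)

    def specificity(self):
        return self.tn / (self.tn + self.fp)

    def ppp(self):
        return self.tp / (self.tp + self.fp)

    def npp(self):
        return self.tn / (self.tn + self.fn)

    def tpv(self):
        """Total predictive value: proportion of all decisions that were correct."""
        return (self.tp + self.tn) / self.n

    def incremental_validity(self):
        """PPP minus the base rate of criterion-deficient children (assumed formula)."""
        base_rate = (self.tp + self.fn) / self.n
        return self.ppp() - base_rate


table3 = TwoByTwo(tp=49, fn=1, fp=6, tn=44)
for name in ("sensitivity", "specificity", "ppp", "npp", "tpv", "incremental_validity"):
    print(f"{name}: {getattr(table3, name)():.2f}")
```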
Overall, the MESS-E displayed strong reliability and validity on this study's samples for its stated purpose as a screening test. The MESS-E's quick, group administration format and assessment of wide-ranging skills give it some advantages over other available screening instruments, such as the WRAT-R (Jastak & Wilkinson, 1984) or the BASIS (Psychological Corporation, 1983). An experienced evaluator can screen an entire class using the MESS-E in 1 hr with substantial confidence in accuracy. School programs could incorporate the MESS-E into a screening process with reading and writing instruments, such as the Reading Essential Skills Screener–Elementary Version (RESS-E) and the Writing Essential Skills Screener–Elementary Version (WESS-E; Erford et al., 1995), to enhance procedures for early identification of academic skill deficiencies.
Further research using the MESS-E is needed to replicate these findings and extend validation to special needs populations and other samples. For example, this study assessed participants of primarily middle to upper-middle SES. Because lower SES students appear more susceptible to at-risk status than middle SES students (Slavin & Madden, 1989; Speece & Cooper, 1990), replication of these studies with lower SES students appears particularly important. Predictive criterion-related validation studies are greatly needed at this time. With further study, the MESS-E could emerge as a reliable measure of math skills useful in early identification of students at risk for math difficulties.
References
Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.
Buffington, A. V., Heber, N. R., & Wilson, P. S. (1985). Mathematics. Columbus, OH: Charles E. Merrill.
Burton, G. M. (1982). Writing numerals: Suggestions for helping children. Academic Therapy, 17, 415–424.
Campbell, F. A., & Ramsey, C. T. (1994). Effects of early intervention on intellectual and academic achievement: A follow-up study of children from low-income families. Child Development, 65, 684–698.
Carl, I. M. (1989). Essential mathematics for the twenty-first century: The position of the National Council of Supervisors of Mathematics. Arithmetic Teacher, 37(1), 44–46.
Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276.
Charles, R. I., Harcourt, L., Brummett, D. C., Barnett, C. S., Wortzman, R., & Kelly, B. (1995). Quest 2000: Exploring mathematics. New York: Addison-Wesley.
Clark, D. B., Turner, S. M., Beidel, D. C., Donovan, J. E., Kirisci, L., & Jacob, R. G. (1994). Reliability and validity of the Social Phobia and Anxiety Inventory for adolescents. Psychological Assessment, 6(2), 135–140.
Connolly, A. J. (1988). KeyMath-R: A Diagnostic Inventory of Essential Mathematics. Circle Pines, MN: American Guidance Service.


Cowan, R. E., & Clary, R. C. (1978). Identifying and teaching essential mathematical skills-items. Mathematics Teacher, 71, 130–133.
CTB/McGraw-Hill (1989). The Comprehensive Tests of Basic Skills. Monterey, CA: Author.
CTB/McGraw-Hill (1992). The California Achievement Tests. Monterey, CA: Author.
Edwards, E. L., Jr., Nichols, E. D., & Sharpe, G. H. (1972). Mathematical competencies and skills essential for enlightened citizens. Mathematics Teacher, 65, 671–677.
Erford, B. T., Vitali, G. J., Haas, R., & Boykin, R. R. (1995). Manual for the Essential Skills Screener: At-risk identification (ESS). East Aurora, NY: Slosson.
Greenes, C., Schulman, L., & Spungin, R. (1993). Developing sense about numbers. Arithmetic Teacher, 40, 279–284.
Gutterman, E. M., O'Brien, J. D., & Young, J. G. (1987). Structured diagnostic interviews for children and adolescents: Current status and future directions. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 621–630.
Haskins, R. (1989). Beyond metaphor: The efficacy of early childhood education. American Psychologist, 44(2), 274–282.
Herguert, L. F. (1991). School resources for at-risk youth. Equity and Excellence, 25, 10–14.
Hoover, H. D., Hieronymus, A. N., Frisbie, D. A., & Dunbar, S. B. (1993). The Iowa Tests of Basic Skills. Chicago:
Riverside.
Irington, J. (1989). Walk around the clock: Third grade students learn to tell time. Orlando, FL: Nova University (ERIC
Document Reproduction Service No. ED 323 009).
Jastak, S., & Wilkinson, G. (1984). Manual for the Wide-Range Achievement Test–Revised (WRAT-R). Wilmington, DE: Jastak Associates.
Joreskog, K. G., & Sorbom, D. (1989). LISREL 7: A guide to the program and applications (2nd ed.). Chicago, IL: SPSS.
Karweit, N. (1988). A research study: Effective preprimary programs and practices. Principal, 67(5), 18–21.
Kaufman, A. S., & Kaufman, N. L. (1985). Kaufman Test of Educational Achievement. Circle Pines, MN: American Guidance Service.
Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L. J., & Wasik, B. A. (1993). Success for all: Longitudinal effects of a restructuring program for inner-city elementary schools. American Educational Research Journal, 30(1), 123–148.
Markwardt, F. C., Jr. (1970). The Peabody Individual Achievement Test. Circle Pines, MN: American Guidance Service.
Maryland School Performance Program (MSPP; 1990). Learning outcomes in mathematics, reading, writing/
language usage, social studies and science for Maryland School Performance Assessment program [Guide]. Baltimore,
MD: Maryland State Department of Education.
Matz, K. A., & Leier, C. (1992). Word problems and the language connection. Arithmetic Teacher, 39(8), 14–17.
Orfan, L. J., & Vogeli, B. J. (1987). Mathematics. Atlanta, GA: Silver, Burdett, & Ginn.
Psychological Corporation (1983). Basic Achievement Skills Individual Screener. San Antonio, TX: Author.
Reynolds, B. E. (1993). The algorists vs. the abacists: An ancient controversy on the use of calculators. The College Mathematics Journal, 24(3), 218–223.
Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston, MA: Houghton Mifflin.
Schweinhart, L. J., & Weikart, D. P. (1986). Early childhood development programs: A public investment opportunity. Educational Leadership, 44(3), 4–12.
Slavin, R. E., & Madden, N. A. (1989). What works for students at risk: A research synthesis. Educational Leadership, 46(5), 4–13.
Smith, D. E., & Ginsburg, J. (1973). Numbers and numerals. Washington, DC: National Educational Association. (ERIC Document Reproduction Service No. ED 077 712).
Smith, P. E. (1987). One hand at a time: Using a one-handed clock to teach time telling skills. Palo Alto, CA: Dale Seymour Publications (ERIC Document Reproduction Service No. ED 286 716).
Speece, D. L., & Cooper, D. H. (1990). Ontogeny of school failure: Classification of first grade children. American Educational Research Journal, 27, 119–140.
Sullivan, K. W. (1981). Money: A key to mathematical success. Arithmetic Teacher, 29(3), 34–35.
Swartz, K. (1981). High finance at Trilby school. Arithmetic Teacher, 29(3), 51–53.
Tabachnik, B. G., & Fidell, L. S. (1983). Using multivariate statistics. New York: Harper & Row.
Thompson, B. (1989). Prerotation and postrotation eigenvalues shouldn't be confused: A reminder. Measurement and Evaluation in Counseling and Development, 22(3), 114–116.
White, S. (1994). Overview of the NAEP assessment frameworks [Guide]. Washington, DC: National Center for Education Statistics.
Widiger, T. A., Hurt, S. W., Frances, A., Clarkin, J. F., & Gilmore, M. (1984). Diagnostic efficiency and DSM-III. Archives of General Psychiatry, 41(4), 1005–1012.
Woodcock, R. W., & Johnson, M. B. (1989). Woodcock–Johnson: Tests of Achievement–Revised. Allen, TX: DLM.