
© 1998 John Wiley & Sons, Inc. CCC 0033-3085/98/020127-09

THE MATH ESSENTIAL SKILL SCREENER–ELEMENTARY VERSION (MESS-E)

Bradley T. Erford, Donna L. Bagley, James A. Hopper, Ramona M. Lee, Kathleen A. Panagopulos, and Denise B. Preller

Loyola College in Maryland

The Math Essential Skill Screener–Elementary Version (MESS-E) is a screener devised to identify primary grade students at risk for math difficulties. Item analysis, interitem consistency, test–retest reliability, decision efficiency, and construct validity of the MESS-E were studied using four independent samples of boys and girls in grades 1–3 (aged 6–8). Item analysis revealed a median item difficulty of .64 and a median item discrimination of .75. Interitem consistency was .92 (n = 171) and .94 (n = 711), while 30-day test–retest reliability was .86 (n = 125). Exploratory factor analysis indicated a one-factor solution accounting for 37% of observed variance. LISREL 7 confirmatory factor analysis procedures determined that the one-factor model fit the standardization sample data poorly (goodness-of-fit index = .729, χ² to df ratio = 9.91). The MESS-E yielded concurrent validity coefficients (n = 171) of .74 with the Woodcock–Johnson Tests of Achievement–Revised (WJ-R) Math Cluster, .80 with the Wide Range Achievement Test–Revised (WRAT-R) Arithmetic subtest, and .73 with the KeyMath-R Operations Area standard scores. A diagnostic efficiency study yielded a total predictive value (TPV) of .93, sensitivity = .98, specificity = .88, positive predictive power (PPP) = .89, negative predictive power (NPP) = .98, and incremental validity = 39%. The MESS-E displayed a slight tendency to overidentify children potentially at risk for math difficulties.

Identifying students at risk for academic failure is receiving increasing attention among educators and has become a controversial topic. At-risk students are those in danger of failing to obtain an

adequate level of educational skills (Slavin & Madden, 1989). But while factors such as socioeconomic status (SES), poor attendance, limited English proficiency, and geographic location may be

helpful in identifying subgroups of students at risk (Slavin & Madden, 1989; Speece & Cooper,

1990), the accuracy of such stereotyped, indirect academic predictors is suspect. For example, while

some low SES students are at risk, not all such children can be characterized as in danger of academic failure. Direct measures of academic performance, such as academic achievement tests, are

likely to yield more accurate identification of children in need of academic intervention. Furthermore, to refer nonspecifically to a child as at risk may lead to a misrepresentation of potential deficiencies. Therefore, it is more meaningful to refer to a child who performs poorly in math as at risk

for math failure. This more specific identification schema acknowledges that children deficient in

one or more academic skill areas are not necessarily skill deficient in all academic domains.

Research generally supports the conclusion that early intervention with at-risk children significantly increases the chances of the child's subsequent academic success (Campbell & Ramsey, 1994;

Haskins, 1989; Karweit, 1988; Madden, Slavin, Karweit, Dolan, & Wasik, 1993; Reynolds, 1993).

Most children arriving in first grade lacking a basic math knowledge base soon fall behind the class

as increasingly difficult math instruction is presented and built upon. A poor start can lead to perpetual underachievement during the child's entire academic career (Schweinhart & Weikart, 1986).

The high incidence of academic failure and dropouts is causing many school systems to realize

that changes must occur to accommodate the special needs of at-risk youth (Herguert, 1991). Such

changes could begin with developing screening procedures to determine which children are at risk

for failure. While screening tests generally assess a more limited sample of behavior and are held to

a lower standard of reliability, these tests can provide the examiner with a time- and cost-efficient

method for identifying students at risk or estimating future performance (Salvia & Ysseldyke, 1995).

Correspondence concerning this article should be addressed to Bradley T. Erford, Department of Education, Loyola College in Maryland, 4501 N. Charles Street, Baltimore, MD 21210-2699.


Unfortunately, few commercially available math skills screening tests exist that can be administered

quickly in mass screening programs. The challenges in developing such a screening device include

making a test amenable to group administration that assesses all math skills essential to age-appropriate early elementary education, while including enough items to maintain reliability without sacrificing the goal of brief administration time.

Research indicates there are at least seven essential math skills to be mastered in the primary

grades. These skills include writing numbers (Buffington, Heber, & Wilson, 1985; Burton, 1982;

Greenes, Schulman, & Spungin, 1993; Smith & Ginsburg, 1973; White, 1994), addition, subtraction

(Carl, 1989; Cowan & Clary, 1978; Edwards, Nichols, & Sharpe, 1972; MSPP, 1990), telling time

(Charles et al., 1995; Irington, 1989; MSPP, 1990; Smith, 1987), and recognizing fractional concepts

(Carl, 1989; Cowan & Clary, 1978; Edwards et al., 1972; MSPP, 1990; Orfan & Vogeli, 1987). Young

students also should be able to apply money concepts, including counting, making change, and reading and writing money amounts (Charles et al., 1995; MSPP, 1990; Orfan & Vogeli, 1987; Sullivan,

1981; Swartz, 1981), and solve word problems using basic addition and subtraction skills (Charles

et al., 1995; Matz & Leier, 1992; MSPP, 1990; Orfan & Vogeli, 1987).

The MESS-E (Erford, Vitali, Haas, & Boykin, 1995) is a group or individually administered

screening test devised to identify primary grade students at risk for math difficulties. It was designed

to reliably and validly measure the above-mentioned seven essential math skills. The purpose of this

study was to examine the interitem consistency, test–retest reliability, decision efficiency, and concurrent and construct validity of the MESS-E. Numerous tests of math skills are currently used as screening and diagnostic instruments in the identification of children at risk for developmental math problems. Among these, the Wide Range Achievement Test–Revised (WRAT-R; Jastak & Wilkinson, 1984) Arithmetic subtest is one of the most commonly used academic screening tests (Jastak & Wilkinson, 1984). In addition, the Woodcock–Johnson Tests of Achievement–Revised, Math Cluster (WJ-R; Woodcock & Johnson, 1989), and KeyMath–Revised (Connolly, 1988) are commonly used as diagnostic assessments, helpful in identifying children at risk for math problems. These tests were

specifically chosen for concurrent validation of the MESS-E.

Method–Study One

Four independent samples were used in the following four studies.

Participants

Participants were 171 children (87 girls, 84 boys) aged six (n = 60), seven (n = 65), and eight (n = 46), from three schools located in the central Maryland area. Eighty percent of the students were White, 14% were African-American, 3% were Hispanic-American, and 3% were Asian-American. Seventy-seven percent of the students were from urban/suburban settings (defined as communities larger than 2,500 people) while 23% were from rural settings.

Procedure

The lead author and a trained assistant group administered the MESS-E, WJ-R Math Cluster,

WRAT-R Arithmetic subtest, and KeyMath-R Operations Area to the 171 participants in their

respective classrooms. The tests were administered in a counterbalanced sequence to eliminate

systematic variations due to order of administration. Counterbalancing was achieved by randomly

ordering protocols for the four tests for each of the class groups using a table of random digits. All

protocols were scored according to standardization specifications.
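The counterbalancing step described above can be sketched in software. This is an illustrative stand-in for the authors' table of random digits, not their actual procedure; the function name and seed are assumptions:

```python
import random

TESTS = ["MESS-E", "WJ-R Math Cluster", "WRAT-R Arithmetic", "KeyMath-R Operations"]

def counterbalanced_orders(tests, n_groups, seed=1998):
    """Return one randomly ordered test battery per class group.

    A software stand-in for the printed table of random digits; the
    seed value is arbitrary and only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    return [rng.sample(tests, len(tests)) for _ in range(n_groups)]

for order in counterbalanced_orders(TESTS, 3):
    print(order)
```

Randomizing the administration order per group, rather than using one fixed order, prevents fatigue or practice effects from accruing systematically to any one test.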

Analysis

Standard scores for each test were used in computation of Pearson correlation coefficients.
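The Pearson product-moment coefficient used in this analysis can be computed directly from two score lists; the standard scores below are hypothetical, not the study's data:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical standard scores for six examinees on two of the measures:
mess_e = [85, 92, 100, 108, 115, 120]
wj_r = [88, 95, 98, 110, 112, 118]
print(round(pearson_r(mess_e, wj_r), 2))
```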


Instruments

Math Essential Skills Screener–Elementary Version (MESS-E). The MESS-E was ". . . designed to be used as a screening instrument for the identification of children at risk for potential academic failure, to facilitate remedial program decisions, and to generate more accurate referrals for deep testing" (Erford et al., p. 1). The MESS-E is a 27-item math skills screening test that can be group or individually administered by educational professionals and trained paraprofessionals and scored in approximately 10–15 minutes. The MESS-E was normed on 711 children, aged 6–8 (grades 1–3). The MESS-E comprises seven activities appropriate to primary grade mathematical conceptual development. Writing Numerals involves having the child respond to an oral prompt by writing several one-, two-, and three-digit numerals. The Addition and Subtraction tasks require the child to compute five addition and five subtraction problems with and without regrouping. Time and Money tasks necessitate identification of time and money concepts and demonstration of appropriate written conventions. The one Fraction item mandates identification of a fractional portion of a group. Word Problems involve solving simple addition and subtraction problems. Each item is scored as right/wrong and tallied to arrive at a total raw score. Raw scores can then be converted to standard scores (M = 100, SD = 15), percentile ranks, age- or grade-equivalents, and interpretive ranges (pass, borderline, refer). The MESS-E is one of nine screeners comprising the Essential Skills Screener.
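The raw-to-standard-score conversion (M = 100, SD = 15) amounts to a linear z-transformation against norm-group statistics. The norm mean and SD below are made-up placeholders; the MESS-E's actual norm tables live in the manual:

```python
def standard_score(raw, norm_mean, norm_sd):
    """Deviation-based standard score: z-transform the raw score against the
    norm group, then rescale to M = 100, SD = 15 and round to an integer."""
    z = (raw - norm_mean) / norm_sd
    return round(100 + 15 * z)

# Hypothetical grade-norm statistics (NOT taken from the MESS-E manual):
print(standard_score(22, norm_mean=18, norm_sd=4))  # one SD above the norm mean
```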

KeyMath–Revised: A Diagnostic Inventory of Essential Mathematics. The KeyMath-R (Connolly, 1988) is an individually administered test designed to assess understanding and application of basic mathematical skills. The test comprises three primary areas, each containing various subscales: Basic Concepts (Numeration, Rational Numbers, and Geometry), Operations (Addition, Subtraction, Multiplication, Division, and Mental Computation), and Applications (Measurement, Time and Money, Estimation, Interpreting Data, and Problem Solving). Each subtest is reported as a scaled score; scaled scores are then converted into an Area standard score. Area scores can be combined to arrive at a total test score. The KeyMath-R was standardized on 1,798 children in grades kindergarten through nine. The KeyMath-R takes about 1 hr to administer and score. For the purpose of this study, only the Operations Area subtests of Form A were assessed because they could be group-administered. Thus, the following technical summary will focus on Operations Area results. The Operations Area alternate-forms stability coefficient (n = 356) was r = .82. A split-half reliability analysis using the standardization sample resulted in a coefficient of .92. Connolly also reported in the manual that item response theory calculations were applied to standardization data and resulted in acceptable levels of reliability. Criterion-related validity studies resulted in correlations of .62 between the KeyMath-R Operations Area and Comprehensive Tests of Basic Skills (CTBS; CTB/McGraw-Hill, 1989) Total Mathematics (n = 121), and .77 with the Iowa Tests of Basic Skills (ITBS; Hoover, Hieronymus, Frisbie, & Dunbar, 1993) Total Mathematics (n = 107).

Wide Range Achievement Test–Revised (WRAT-R) Level 1 Arithmetic Subtest. The WRAT-R Arithmetic subtest (Jastak & Wilkinson, 1984) is an individually or group administered test designed to assess basic math computation skills by having the child calculate a series of math computation problems presented in approximately increasing order of difficulty. The Arithmetic subtest score can be reported as a standard score, percentile rank, and grade equivalent. The WRAT-R was standardized on 5,600 individuals aged five to adult. The Arithmetic subtest takes about 15 minutes to administer and score. Test–retest reliability of the Arithmetic subtest (n = 81) was reported in the manual to be .94 for children in grades one through four. A criterion-related validity study (n = 1,000) resulted in a correlation of .66 between the WRAT-R Arithmetic subtest and Peabody Individual Achievement Test (PIAT; Markwardt, 1970) Arithmetic subtest. A combination of several additional studies (n = 400) resulted in a correlation of .81 with the California Achievement Test (CTB/McGraw-Hill, 1992) Arithmetic score.

Woodcock–Johnson Tests of Achievement–Revised (WJ-R) Mathematics Cluster. The WJ-R Mathematics Cluster (Woodcock & Johnson, 1989) is an individually administered cluster of two subtests, Calculation and Applied Problems. These subtests were designed to assess basic computation and math problem-solving skills. Each subtest can be reported individually as a standard score, percentile rank, and grade equivalent, then combined to represent a Mathematics Cluster score, expressed as a standard score or percentile rank. The WJ-R was standardized on 6,359 individuals aged 2 to adult. The two math subtests take about 20–35 minutes total to administer and score. Median internal consistency coefficients of the Calculation and Applied Problems subtests and Mathematics Cluster for the standardization sample were reported in the manual to be .93, .91, and .95, respectively. Criterion-related validity studies (n = 70 nine-year-old children) resulted in significant correlations between the WJ-R Mathematics Cluster score and Basic Achievement Skills Individual Screener (BASIS; Psychological Corporation, 1983) Math subtest (r = .71), Peabody Individual Achievement Test (PIAT; Markwardt, 1970) Math subtest (r = .41), Wide Range Achievement Test–Revised (WRAT-R; Jastak & Wilkinson, 1984) Arithmetic subtest (r = .63), and Kaufman Test of Educational Achievement (K-TEA; Kaufman & Kaufman, 1985) Math Composite (r = .83).

Method–Study Two

Participants

Participants were 125 children (63 girls, 62 boys) aged six (n = 40), seven (n = 44), and eight (n = 41), from two elementary schools located in the central Maryland area. Seventy-nine percent of the students were White, 15% were African-American, 4% were Hispanic-American, and 2% were Asian-American. Seventy-eight percent of the students were from suburban settings (defined as communities larger than 2,500 people) whereas 22% were from rural settings.

Procedure

Trained evaluators group administered the MESS-E protocols to the 125 participants in their

respective classrooms, then readministered the MESS-E after 29–31 days. All protocols were scored

according to standardization specifications.

Analysis

Total raw scores for each administration were used in the computation of Pearson correlation

coefficients to derive an estimate of the MESS-E's test–retest reliability.

Method–Study Three

Participants

Participants were 711 children (383 girls, 328 boys) in grades one (n = 234), two (n = 265), and three (n = 212) from the standardization sample. Seventy-eight percent of the students were White and 22% were Nonwhite. Eighty-two percent of the students were from suburban settings (defined as communities larger than 2,500 people), whereas 18% were from rural settings.

Procedure

The MESS-E was administered to the 711 participants in the standardization sample in groups

of 15–29 by trained assistants. All protocols were scored according to standardization specifications.

Analysis

Item raw scores were used in subsequent item analysis (item difficulty and discrimination; Anastasi, 1988), calculations of interitem consistency, and exploratory factor analysis (EFA). EFA


involved subjecting the data to principal components analysis. Confirmatory factor analysis (CFA)

procedures were also conducted using LISREL 7 (Joreskog & Sorbom, 1989) to determine how well

the derived exploratory model fit the standardization sample data.

Method–Study Four

Participants

Participants were 100 children (61 boys, 39 girls) in grades one (n = 12), two (n = 36), and three (n = 52) from seven elementary schools. Seventy-one percent of the children were White, 20% African-American, 7% Hispanic-American, and 2% Asian-American. Eighty-seven percent were from suburban/urban settings (communities larger than 2,500 people) whereas 13% were from rural communities. Twenty percent had fathers who had not graduated from high school, 52% had fathers with a high school degree, 23% with some college, and 5% with a college degree.

Procedure

Fifty children (grade one, n = 6; grade two, n = 18; and grade three, n = 26) had been identified by their classroom teachers as having no perceived developmental math skill deficiencies, and 50 (grade one, n = 6; grade two, n = 18; and grade three, n = 26) were identified by their teachers as having perceived developmental math skill deficiencies. Each child was administered the MESS-E by evaluators blind to the child's developmental math categorization.

Analysis

Data were analyzed using decision theory and interpretation. The efficiency of diagnostic decisions can be understood best by exploring a test's incremental validity (Anastasi, 1988); sensitivity and specificity (Gutterman, O'Brien, & Young, 1987); and total predictive value (TPV), positive predictive power (PPP), and negative predictive power (NPP; Widiger, Hurt, Frances, Clarkin, & Gilmore, 1984). Each of these indices was computed using this sample's results.
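All five indices derive from the four cells of a 2×2 decision table. The function below is an illustrative sketch; the example cell counts are those implied by the sensitivity and specificity figures reported in the Results (50 children per criterion group), not an independent data set:

```python
def decision_indices(tp, fp, fn, tn):
    """Diagnostic efficiency indices from a 2x2 screening decision table.

    tp: valid acceptances, fp: false positives (overidentifications),
    fn: false negatives (underidentifications), tn: valid rejections.
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "PPP": tp / (tp + fp),                    # positive predictive power
        "NPP": tn / (tn + fn),                    # negative predictive power
        "TPV": (tp + tn) / (tp + fp + fn + tn),   # total predictive value
    }

# Cell counts implied by the Results section (50 deficient, 50 normal children):
indices = decision_indices(tp=49, fp=6, fn=1, tn=44)
print({k: round(v, 2) for k, v in indices.items()})
```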

Results and Discussion

Item Analysis

MESS-E item difficulties, using the percentage passing technique (Anastasi, 1988), ranged from

0.19 to 1.00, with a median score of 0.64. Item discriminations, using the extreme groups method

(Anastasi, 1988), ranged from 0.01 to 0.97, with a median discrimination of 0.75. Seventy-four percent of the MESS-E items met or exceeded the 0.50 item discrimination criterion, meaning the items

discriminate very well between high- and low-performing individuals.
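Both item statistics can be sketched in a few lines. The percentage-passing index follows Anastasi's definition; the top-and-bottom-27% split for the extreme-groups method is a common convention assumed here, not a detail taken from the article:

```python
def item_difficulty(responses):
    """Percentage-passing difficulty: proportion of examinees answering correctly."""
    return sum(responses) / len(responses)

def item_discrimination(item, totals, frac=0.27):
    """Extreme-groups discrimination: pass rate in the top-scoring group minus
    pass rate in the bottom-scoring group (group size = frac of the sample)."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(1, int(len(totals) * frac))
    low = [item[i] for i in order[:k]]     # lowest total scorers
    high = [item[i] for i in order[-k:]]   # highest total scorers
    return item_difficulty(high) - item_difficulty(low)

# Toy data: 10 examinees' 0/1 responses to one item, plus their total scores.
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
totals = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(item_difficulty(item), item_discrimination(item, totals))
```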

Reliability

Interitem consistency of the MESS-E items for the 171 participants in study one was computed using the Kuder–Richardson Formula 20 (KR-20), resulting in a total scale coefficient of .92. Interitem consistencies for participants in grades 1–3 were .85, .89, and .91, respectively. Interitem consistency of the standardization sample was .94. Interitem consistencies for standardization sample participants in grades 1–3 were .87, .90, and .93, respectively.

The overall 30-day test–retest reliability (Pearson) coefficient for the 125 subjects in study two was .86. Test–retest correlations for grades 1–3 were .84, .86, and .87, respectively. Therefore, the MESS-E's reliability, both interitem consistency and test–retest, was well within acceptable limits for a screening test.
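The KR-20 coefficient reported above can be computed directly from a persons × items matrix of dichotomous scores. This is a generic sketch of the formula, not the authors' code:

```python
def kr20(matrix):
    """Kuder-Richardson Formula 20 for right/wrong (1/0) item scores.

    matrix: list of per-person item-score lists, all the same length.
    """
    n_persons = len(matrix)
    n_items = len(matrix[0])
    # p_j: proportion passing item j; the p*q terms are the item variances
    p = [sum(person[j] for person in matrix) / n_persons for j in range(n_items)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    totals = [sum(person) for person in matrix]
    mean = sum(totals) / n_persons
    var_total = sum((t - mean) ** 2 for t in totals) / n_persons
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)

# Toy matrix: 4 examinees x 3 items with perfectly consistent responses,
# so KR-20 should come out at 1.0.
scores = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(kr20(scores))
```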

Concurrent Validity

Concurrent validation of the MESS-E involved the calculation of Pearson correlations between the standard scores of the MESS-E, WJ-R Math Cluster, WRAT-R Arithmetic subtest, and KeyMath-R Operations Area. These results are presented in Table 1. The MESS-E displayed a high degree of relationship with the WJ-R Math Cluster (r = .74), WRAT-R Arithmetic subtest (.80), and KeyMath-R Operations Area (.73), indicating that the tests measure similar constructs (p < .001).

Table 1
Pearson Correlation Coefficients Between the MESS-E, WJ-R Math Cluster, WRAT-R Arithmetic Subtest, and KeyMath-R Operations Area Standard Scores

                        MESS-E    WJ-R Math Cluster    WRAT-R Arithmetic    KeyMath-R Operations
MESS-E                   1.00
WJ-R Math Cluster         .74          1.00
WRAT-R Arithmetic         .80           .67                 1.00
KeyMath-R Operations      .73           .49                  .66                  1.00

Construct Validity

An exploratory principal components analysis (PCA) was applied to the MESS-E standardization data to identify the construct(s) that may underlie the MESS-E. The PCA yielded eigenvalues associated with unrotated factors (Thompson, 1989) of 11.01, 2.16, 1.63, 1.49, 1.14, and 1.00. Cattell's (1966) scree test was applied to these six eigenvalues, and subsequent extractions of one through three factors were conducted. Each principal components structure was rotated to the varimax criterion. The one-factor extraction yielded the most interpretable and parsimonious results, accounting for 37% of the variance among item responses.

A factor structure coefficient of .30 (9% of variance in common between a variable and a factor) and above was considered salient for the interpretation of factor loadings (Tabachnik & Fidell, 1983). The .30 criterion was compromised only three times. The factor structure coefficients and variance accounted for by the analysis are reported in Table 2. The results of the PCA lead to the conclusion that the MESS-E measures a unidimensional construct: math skill.

LISREL 7 procedures (Joreskog & Sorbom, 1989) were applied to the standardization sample data to determine the one-factor model's fit. The resulting goodness-of-fit index (GFI) of .729 (adjusted GFI = .684) and χ² to df ratio of 9.91 indicated a poor fit of the data to the factor model. Clark et al. (1994) recommended a GFI of at least .900 and a χ² to df ratio of less than 2.00 to indicate adequate model fit.

This combination of exploratory and confirmatory factor analytic results indicated that even though a single-factor model appeared to be most parsimonious, it fit the data rather poorly. This likely resulted because the MESS-E comprises only 27 items from seven built-in skill areas. Thus, even though seven dimensions underlie the screening test, the items were so highly correlated that only a single statistical dimension arose from the exploratory analysis. Developing more items in each skill area would likely give rise to a multidimensional factor solution, but this, of course, would substantially increase the time needed to administer the screener.

Table 2
PCA One-Factor Extraction for the MESS-E Standardization Sample

Task/item            Item–factor association     h²
1. Writing numerals
   a. 1                      .11                 .01
   b. 6                      .26                 .07
   c. 0                      .15                 .02
   d. 23                     .45                 .20
   e. 205                    .75                 .57
   f. 711                    .75                 .57
2. Addition
   a. 4 + 3                  .42                 .18
   b. 9 + 6                  .71                 .51
   c. 24 + 55                .78                 .61
   d. 15 + 37                .73                 .53
   e. 36 + 54                .76                 .57
3. Subtraction
   a. 5 − 2                  .64                 .41
   b. 15 − 7                 .67                 .45
   c. 13 − 2                 .72                 .52
   d. 31 − 16                .50                 .25
   e. 80 − 24                .50                 .25
4. Time
   a. 2:00                   .81                 .66
   b. 5:30                   .84                 .71
   c. 10:25                  .85                 .72
5. Money
   a. $.22                   .72                 .52
   b. $.95                   .75                 .57
   c. $1.69                  .64                 .41
6. Fractions
   a. one-half               .37                 .13
7. Word Problems
   a. 5 + 3                  .49                 .24
   b. 7 − 3                  .71                 .51
   c. 2 + 4                  .52                 .28
   d. 8 + 4 − 2              .56                 .31
Eigenvalue = 10.80; % of variance = 37.2

Decision Reliability

Results of the decision reliability study are presented in Table 3. The criterion for inclusion in the normal or deficient math skills group was a decision made by each child's teacher, who had instructed the children for at least 4 months. The criterion for deficiency or normality as measured by the MESS-E was a percentile rank of ≤ 25 or > 25, respectively. Eighty-eight percent of the normal children were accurately identified using the MESS-E, while 98% of the math skills deficiency group were accurately identified, leading to an overall TPV of 0.93. The MESS-E's sensitivity, or the proportion of the test's true positives out of all those identified by the criterion measure, was 0.98. Specificity, or the proportion of the test's true negatives out of all those not identified by the criterion measure, was 0.88. Thus, the MESS-E displays a slight tendency to overidentify children at risk. PPP, the proportion of those with a diagnosis out of all those identified by the screening test, was .89, while NPP, the proportion of those without a diagnosis not identified by the screening test, was .98. The MESS-E's incremental validity on this sample was 39%, meaning that evaluators will make 39% more accurate decisions using the MESS-E than without it. Thus, the MESS-E displayed very respectable decision reliability for this sample. The overidentification tendency is acceptable in a screening test, as opposed to underidentification, because the purpose of screening tests is to identify likely at-risk students for further diagnostic assessment. Under these circumstances, it is better for the MESS-E to identify too many potentially at-risk students than to miss some students who are truly skill-deficient.

Table 3
MESS-E Decision Reliability for 100 Children Classified as Normal or Math Skill-Deficient by Their Teachers

                                 Teacher report decision
MESS-E decision     Deficient (≤ 25th percentile)            Normal (> 25th percentile)
Deficient           Valid acceptances                        False positives
                    (accurate decisions), n = 49             (inaccurate decisions,
                                                             overidentifications), n = 6
Normal              False negatives                          Valid rejections
                    (inaccurate decisions,                   (accurate decisions), n = 44
                    underidentifications), n = 1

Overall, the MESS-E displayed strong reliability and validity on this study's samples for its stated purpose as a screening test. The MESS-E's quick, group administration format and assessment of wide-ranging skills give it some advantages over other available screening instruments, such as the WRAT-R (Jastak & Wilkinson, 1984) or the BASIS (Psychological Corporation, 1983). An experienced evaluator can screen an entire class using the MESS-E in 1 hr with substantial confidence in accuracy. School programs could incorporate the MESS-E into a screening process with reading and writing instruments, such as the Reading Essential Skills Screener–Elementary Version (RESS-E) and the Writing Essential Skills Screener–Elementary Version (WESS-E; Erford et al., 1995), to enhance procedures for early identification of academic skill deficiencies.

Further research using the MESS-E is needed to replicate these findings and extend validation

to special needs populations and other samples. For example, this study assessed participants of primarily middle to upper-middle SES. Because lower SES students appear more susceptible to at-risk status than middle-class students (Slavin & Madden, 1989; Speece & Cooper, 1990), replication of these studies with lower SES students appears particularly important. Predictive criterion-related validation studies are greatly needed at this time. With further study, the MESS-E could

emerge as a reliable measure of math skills useful in early identification of students at risk for math

difficulties.

References

Anastasi, A. (1988). Psychological testing (6th ed.). New York: Macmillan.

Buffington, A. V., Heber, N. R., & Wilson, P. S. (1985). Mathematics. Columbus, OH: Charles E. Merrill.

Burton, G. M. (1982). Writing numerals: Suggestions for helping children. Academic Therapy, 17, 415–424.

Campbell, F. A., & Ramsey, C. T. (1994). Effects of early intervention on intellectual and academic achievement: A follow-up study of children from low-income families. Child Development, 65, 684–698.

Carl, I. M. (1989). Essential mathematics for the twenty-first century: The position of the National Council of Supervisors of Mathematics. Arithmetic Teacher, 37(1), 44–46.

Cattell, R. B. (1966). The scree test for the number of factors. Multivariate Behavioral Research, 1, 245–276.

Charles, R. I., Harcourt, L., Brummett, D. C., Barnett, C. S., Wortzman, R., & Kelly, B. (1995). Quest 2000: Exploring mathematics. New York: Addison-Wesley.

Clark, D. B., Turner, S. M., Beidel, D. C., Donovan, J. E., Kirisci, L., & Jacob, R. G. (1994). Reliability and validity of the Social Phobia and Anxiety Inventory for adolescents. Psychological Assessment, 6(2), 135–140.

Connolly, A. J. (1988). KeyMath–Revised: A Diagnostic Inventory of Essential Mathematics. Circle Pines, MN: American Guidance Service.

Cowan, R. E., & Clary, R. C. (1978). Identifying and teaching essential mathematical skills-items. Mathematics Teacher, 71, 130–133.

CTB/McGraw-Hill (1989). The Comprehensive Tests of Basic Skills. Monterey, CA: Author.

CTB/McGraw-Hill (1992). The California Achievement Tests. Monterey, CA: Author.

Edwards, E. L., Jr., Nichols, E. D., & Sharpe, G. H. (1972). Mathematical competencies and skills essential for enlightened citizens. Mathematics Teacher, 65, 671–677.

Erford, B. T., Vitali, G. J., Haas, R., & Boykin, R. R. (1995). Manual for the Essential Skills Screener: At-risk identification (ESS). East Aurora, NY: Slosson.

Greenes, C., Schulman, L., & Spungin, R. (1993). Developing sense about numbers. Arithmetic Teacher, 40, 279–284.

Gutterman, E. M., O'Brien, J. D., & Young, J. G. (1987). Structured diagnostic interviews for children and adolescents: Current status and future directions. Journal of the American Academy of Child and Adolescent Psychiatry, 26, 621–630.

Haskins, R. (1989). Beyond metaphor: The efficacy of early childhood education. American Psychologist, 44(2), 274–282.

Herguert, L. F. (1991). School resources for at-risk youth. Equity and Excellence, 25, 10–14.

Hoover, H. D., Hieronymus, A. N., Frisbie, D. A., & Dunbar, S. B. (1993). The Iowa Tests of Basic Skills. Chicago, IL: Riverside.

Irington, J. (1989). Walk around the clock: Third grade students learn to tell time. Orlando, FL: Nova University. (ERIC Document Reproduction Service No. ED 323 009)

Jastak, S., & Wilkinson, G. (1984). Manual for the Wide Range Achievement Test–Revised (WRAT-R). Wilmington, DE: Jastak Associates.

Joreskog, K. G., & Sorbom, D. (1989). LISREL 7: A guide to the program and applications (2nd ed.). Chicago, IL: SPSS.

Karweit, N. (1988). A research study: Effective preprimary programs and practices. Principal, 67(5), 18–21.

Kaufman, A. S., & Kaufman, N. L. (1985). Kaufman Test of Educational Achievement. Circle Pines, MN: American Guidance Service.

Madden, N. A., Slavin, R. E., Karweit, N. L., Dolan, L. J., & Wasik, B. A. (1993). Success for all: Longitudinal effects of a restructuring program for inner-city elementary schools. American Educational Research Journal, 30(1), 123–148.

Markwardt, F. C., Jr. (1970). The Peabody Individual Achievement Test. Circle Pines, MN: American Guidance Service.

Maryland School Performance Program (MSPP; 1990). Learning outcomes in mathematics, reading, writing/language usage, social studies and science for Maryland School Performance Assessment Program [Guide]. Baltimore, MD: Maryland State Department of Education.

Matz, K. A., & Leier, C. (1992). Word problems and the language connection. Arithmetic Teacher, 39(8), 14–17.

Orfan, L. J., & Vogeli, B. J. (1987). Mathematics. Atlanta, GA: Silver, Burdett, & Ginn.

Psychological Corporation (1983). Basic Achievement Skills Individual Screener. San Antonio, TX: Author.

Reynolds, B. E. (1993). The algorists vs. the abacists: An ancient controversy on the use of calculators. The College Mathematics Journal, 24(3), 218–223.

Salvia, J., & Ysseldyke, J. E. (1995). Assessment (6th ed.). Boston, MA: Houghton Mifflin.

Schweinhart, L. J., & Weikart, D. P. (1986). Early childhood development programs: A public investment opportunity. Educational Leadership, 44(3), 4–12.

Slavin, R. E., & Madden, N. A. (1989). What works for students at risk: A research synthesis. Educational Leadership, 46(5), 4–13.

Smith, D. E., & Ginsburg, J. (1973). Numbers and numerals. Washington, DC: National Education Association. (ERIC Document Service No. ED 077 712)

Smith, P. E. (1987). One hand at a time: Using a one-handed clock to teach time telling skills. Palo Alto, CA: Dale Seymour Publications. (ERIC Document Reproduction Service No. ED 286 716)

Speece, D. L., & Cooper, D. H. (1990). Ontogeny of school failure: Classification of first grade children. American Educational Research Journal, 27, 119–140.

Sullivan, K. W. (1981). Money: A key to mathematical success. Arithmetic Teacher, 29(3), 34–35.

Swartz, K. (1981). High finance at Trilby school. Arithmetic Teacher, 29(3), 51–53.

Tabachnik, B. G., & Fidell, L. S. (1983). Using multivariate statistics. New York: Harper & Row.

Thompson, B. (1989). Prerotation and postrotation eigenvalues shouldn't be confused: A reminder. Measurement and Evaluation in Counseling and Development, 22(3), 114–116.

White, S. (1994). Overview of the NAEP assessment frameworks [Guide]. Washington, DC: National Center for Education Statistics.

Widiger, T. A., Hurt, S. W., Frances, A., Clarkin, J. F., & Gilmore, M. (1984). Diagnostic efficiency and DSM-III. Archives of General Psychiatry, 41(4), 1005–1012.

Woodcock, R. W., & Johnson, M. B. (1989). Woodcock–Johnson Tests of Achievement–Revised. Allen, TX: DLM.