Norm-referenced and criterion-referenced tests as indicators of success in the classroom

Norm-referenced and criterion-referenced tests serve a variety of purposes because of the array of educational situations that exist in today's schools. Testing can rank students against one another or against some sociocultural norm, or it can be based on performance criteria that focus on assessing particular understandings or skill sets. Ideally, a combination of both testing types exists in a way that is valid, reliable, and fair. Because many classrooms contain students with different socioeconomic and cultural backgrounds, however, testing becomes quite a challenge. Therefore, to assure that all students receive the most appropriate feedback, a variety of testing techniques is needed so that proper decisions and actions can be made that best suit the learner.

Virtually all students have taken some kind of standardized test by the time they enter high school or college. Moreover, many standardized tests (i.e., high-stakes tests) are used as a condition of graduation, admission, or financial aid. Because these tests are used to rank or compare students, they are often referred to as norm-referenced tests (NRTs) (Kubiszyn and Borich, 2007). NRTs are commonly used when stakeholders are interested in the central tendency of a group of students' results, as when descriptive statistics are used to find the mean, median, and mode of a particular data set. When tests are used to diagnose or to gauge a student's aptitude, inferences are made based on how students compare with one another or with some other sample defined by a social norm. Since results are "objective" – test items usually have right and wrong answers – and since many students can be tested at once, NRTs are typically better suited to non-instructional decisions.
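
To make the central-tendency summaries mentioned above concrete, the short Python sketch below (not part of the original essay; the scores are hypothetical) computes the mean, median, and mode of a set of NRT results using only the standard library.

```python
# A minimal sketch, using hypothetical scores, of the descriptive statistics
# (mean, median, mode) stakeholders often examine when summarizing a group's
# norm-referenced test results.
from statistics import mean, median, mode

scores = [62, 68, 74, 74, 74, 79, 81, 85, 88, 90]  # hypothetical class results

print("mean:", mean(scores))      # arithmetic average of the scores
print("median:", median(scores))  # middle value of the ordered scores
print("mode:", mode(scores))      # most frequently occurring score
```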

In addition to NRTs being used externally to rank students (e.g., the SAT and ACT), teachers often use NRTs to test students in the classroom. Multiple-choice, true-false, matching, and essay questions are common item types that fall under this category. Test results are gathered, averaged, and ranked so that teachers can make their best inference about the level to which a student has understood the material, acquired the necessary skills, or developed the intended dispositions, based on the goals and objectives of the classroom. Instructional decisions are then often made from these results, either by reviewing past material that students continue to struggle with or by moving on to new material in the curriculum. Having framed the NRT first as an external instrument, such as the ACT, and then as an internal instrument used by teachers in their classrooms, one can see a noticeable difference in why it is used in each circumstance: the former is used to make decisions regarding achievement, while the latter is used to make decisions regarding instruction. This distinction is important when discussing a second type of test, one based on criteria.

Instead of ranking students against some norm, a second testing method bases student performance on whether certain criteria are met. Kubiszyn and Borich (2007) define a criterion-referenced test (CRT) as one that "tells us about a student's level of proficiency in or mastery of some skill or set of skills" (p. 66). Wiggins and McTighe (2005) similarly promote assessing the six facets of understanding (explanation, interpretation, application, perspective, empathy, and self-knowledge) when testing students on what they know and the dispositions they possess. In other words, CRTs can give teachers greater insight into instructional adjustments when student performances are assessed against performance criteria. Rubrics are often used to qualitatively assess performances and products. Arter and McTighe (2001) distinguish between holistic and analytical trait rubrics when they state that "A holistic rubric gives a single score or rating for an entire product or performance based on an overall impression of a student's work," whereas "an analytical trait rubric divides a product or performance into essential traits or dimensions so that they can be judged separately – one analyzes a product or performance for essential traits" (p. 18). Communicating these "essential traits" to students establishes what constitutes a "good" or "bad" performance or product and is essential in setting expectations between teacher and student. Indeed, CRTs are particularly well suited to assessing understandings, knowledge, skills, and dispositions, because the resulting inferences inform adjustments both to instruction and to students' learning tactics.
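
To make the holistic/analytic distinction concrete, here is a minimal sketch; the trait names and the four-point scale are hypothetical rather than drawn from Arter and McTighe. It shows how an analytical trait rubric records a separate judgment per trait while a holistic rubric records a single overall rating.

```python
# Hypothetical analytical-trait rubric: each essential trait of the performance
# is judged separately on a 1-4 scale.
analytic_scores = {
    "explanation": 3,
    "interpretation": 4,
    "application": 2,
    "use_of_evidence": 3,
}

# Hypothetical holistic rubric: one overall impression of the same work.
holistic_score = 3

total = sum(analytic_scores.values())
maximum = 4 * len(analytic_scores)
print(f"analytic profile: {analytic_scores} -> {total}/{maximum}")
print(f"holistic rating: {holistic_score}/4")
```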

Regardless of the test being administered, reliability, validity, and "absence-of-bias" (Popham, 2008, p. 73) determine how confidently an instrument supports proper inferences about a student's achievement. Reliability is of high concern in NRTs because the many versions of the ACT, for example, are expected to contain test items that measure the same content. Similarly, the same ACT form should yield similar results (i.e., a high correlation coefficient) if students retake the exam without being exposed to a learning intervention in the interim. The validity of a test pertains to the three Cs: "content, criterion, and construct" (Popham, 2008, p. 53). Content validity addresses how well test items represent the concepts covered in the curriculum. Criterion validity in NRTs concerns how accurately test items predict future behavior (e.g., whether ACT and SAT scores forecast subsequent academic success or failure), while criterion validity in CRTs concerns how well rubric traits predict a student's future performance. The final C, construct validity, has to do with how a student's performance over time is gauged against criteria that are aligned to the curriculum. Finally, absence-of-bias centers on whether test items present information fairly; that is, items should not favor or disadvantage a certain group of people based on socioeconomic status, race, ethnic background, gender, or sexual orientation.
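
As a rough illustration of the test-retest idea above, the sketch below (hypothetical scores, not drawn from Popham) estimates reliability as the Pearson correlation between two administrations of the same form to the same students.

```python
# A minimal test-retest reliability sketch: correlate two administrations of the
# same form to the same students with no instruction in between. Values near 1.0
# suggest consistent results; the scores here are hypothetical.
from statistics import correlation  # Pearson's r (Python 3.10+)

first_sitting = [21, 25, 30, 18, 27, 33, 24]
second_sitting = [22, 24, 31, 17, 28, 32, 25]

r = correlation(first_sitting, second_sitting)
print(f"test-retest reliability (Pearson r): {r:.3f}")
```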


NRTs and CRTs should not be considered dichotomous; they are two different but complementary approaches to assessing students. Ranking and comparing students has a purpose when the goal is to measure achievement and to predict future academic success. Conversely, testing understandings, knowledge, skills, and dispositions against performance and product criteria plays a vital role in making inferences that inform instructional decisions and adjustments to student learning tactics. For tests to be valid, reliable, and free of bias, test designers should conduct a variety of reviews to assure that tests measure curricular aims, are reliable within and across versions of an exam, and do not discriminate against groups based on age, race, gender, socioeconomic status, or sexual orientation. Tests are the link between the written and the taught curriculum, between the ideal and the reality of what schools are for all their stakeholders. Thus, to continue developing and improving the feedback that tests provide to all of these stakeholders, a collaborative effort is needed to bring together a community of practice that addresses these important aspects of testing and assessment.


References

Arter, J. and McTighe, J. (2001). Scoring rubrics in the classroom: Using performance criteria for assessing and improving student performance. Thousand Oaks, CA: Corwin Press.

Kubiszyn, T. and Borich, G. (2007). Educational testing and measurement: Classroom application and practice. Hoboken, NJ: Wiley and Jossey-Bass Education.

Popham, W. (2008). Classroom assessment: What teachers need to know. New York: Pearson.

Wiggins, G. and McTighe, J. (2005). Understanding by design. Alexandria, VA: ASCD.
