CHAPTER 3
PRINCIPLES OF GOOD
ASSESSMENT
INTRODUCTION
Assessment is a critical component of any educational program. It involves the
selection, collection, and interpretation of information about students' performance and
program adequacy. Assessment is thus an educational process directed toward the
improvement of instruction and the assurance of students' success, and it may be
accomplished through various procedures.
Assessment is a critical and integral part of the teaching and learning process. It
cannot be isolated or treated as a stand-alone educational process. A major part of
assessment takes place in the classroom during instruction. Through assessment,
teachers gather information about and insight into their students and their learning.
Assessment is therefore embedded in every aspect of classroom thinking, planning,
and action. Its importance cannot be overlooked. Hence, as educators we need guiding
principles for executing the assessment process in teaching and learning. In this chapter
we will examine these guiding principles.
OUTCOMES
At the end of this chapter, students will be able to:
1. explain the principles of good assessment in mathematics instruction; and
2. systematically construct a good test.
3.1
1. The assessment of student learning begins with educational values: what kinds of
learning are intended for the learners, and how to go about achieving those values.
2. Assessment is most effective when it reflects an understanding of learning as
multidimensional, integrated, and revealed in performance over time.
3. Assessment works best when the programs it seeks to improve have clear and explicit
purposes; with clear and implementable goals, assessment can be more focused and useful.
4. Assessment requires attention to outcomes, but equally important are the student
experiences that lead to those outcomes; to improve outcomes, special attention must
be given to student experience along the way, including curricula, teaching and
interactions.
5. Assessment works best when it is ongoing and not episodic; improvement of teaching
and learning is best fostered when assessment entails a linked series of activities
undertaken over time.
6. Assessment fosters wider improvement when representatives from across the educational
community are involved.
7. Assessment makes a difference when it begins with issues of use and illuminates
questions that people really care about.
8. Assessment is most likely to lead to improvement when it is part of a larger set of
conditions that promote change.
9. Through assessment, educators meet responsibilities to students and to the public.
2. The mathematics is embedded in worthwhile problems that are part of the students'
world. This also means that the mathematics students are learning is engaging,
educative and authentic.
3. Methods of assessment should be such that they enable students to reveal what they
know, rather than what they do not know.
4. A balanced assessment plan should include multiple and varied opportunities or
formats for students to demonstrate and document their achievements.
5. Tasks should address all goals of the curriculum, hence covering different levels of
mathematical thinking.
6. Grading criteria should be public and consistently applied, and should include
examples of earlier grading showing exemplary work and work that is less than
exemplary.
7. The assessment process, including scoring and grading, should be open to students.
8. The quality of a task is not defined by its accessibility to objective scoring,
reliability, or validity in the traditional sense, but by its authenticity, fairness
and the extent to which it meets the above principles.
Does the assessment cover the mathematics topics that have been taught and focused on
during class activities?
Does the assessment method (for example, using a portfolio for an algebra topic) allow
the teacher to make valid decisions about his or her instruction and assessment?
Do the assessment questions allow students to demonstrate the performance that the
teacher wants to assess?
Does the assessment cover the important aspects of what the teacher wants to assess?
Are the scoring procedures clear, consistent and unbiased?
Are the directions and the wording of the mathematics items clear enough that students
will know what is expected in their answers?
Does the teacher present items of varying difficulty, with sufficient numbers of easy
questions as well as problem-solving items, in order to assess the problem-solving
performance of the students?
Figure 3.4: Some examples of questions that help to determine the
validity of mathematics classroom assessment
Besides the questions covered in Figure 3.4, can you think of other questions
that could determine the validity of classroom assessment?
Validity is concerned with this general question: to what extent will this assessment
information help me make an appropriate decision?
Validity refers to the decisions that are made from assessment information, not to the
assessment approach itself. It is not appropriate to say that assessment information is
valid unless the decisions or groups it is valid for are identified. Assessment
information valid for one decision or group of pupils is not necessarily valid for
another decision or group.
Validity is a matter of degree; it does not exist on an all-or-nothing basis. Think of
assessment validity in terms of categories: highly valid, moderately valid, and invalid.
3.1.2 Issues of Reliability
Obtaining too small a sample of behavior or of the intended learning outcomes to permit
students to show consistent or stable performance.
Figure 3.6: Some of the possible influences on assessment reliability
Reliability is not concerned with the appropriateness of the assessment information collected,
only with its consistency, stability, or typicality. Appropriateness of assessment information is
a validity concern.
Reliability does not exist on an all-or-nothing basis, but in degrees: high, moderate,
or low. Some types of assessment information are more reliable than others.
Reliability is a necessary but insufficient condition for validity. An assessment that provides
inconsistent, atypical results cannot be relied upon to provide information useful for decision
making.
Figure 3.7: Key aspects of assessment reliability
3.1.3
1. Informing students about teacher expectations and assessments before beginning
teaching and assessment.
2. Describing for pupils what they are to be assessed on before the actual assessment.
3. Being cautious about making snap judgments and labeling pupils with emotional labels
(e.g., disinterested, at-risk, slow learner) before you have spent time with them.
4. Avoiding stereotyping pupils (e.g., "Kids from that part of town are troublemakers";
"Students who dress that way have no interest in school").
5. Avoiding terms and examples that may be offensive to students of different gender,
race, religion, culture, or nationality.
6. Respecting learners' diversity or disabilities and ensuring that pupil participation
and interaction are not limited on the basis of diversity or disability.
Figure 3.8: Ethical issues and responsibilities when assessing
3.2
Tests and other assessment tools serve a variety of uses in schools, mainly in
making educational decisions concerning the teaching process, the learning process,
selection, placement, certification of mastery, aptitude scores, and attitude
tendencies of students. Validity is an important aspect of assessment. Figure 3.9
shows suggested ways to ensure validity.
Measure the objectives or learning outcomes of the course (whether they have been
achieved).
Sample the students' abilities on a majority of the objectives or learning outcomes.
Carefully match the test with the course objectives, content and teaching approaches.
Increase the sample of learning objectives, and hence the content areas and levels of
questions included in any given test; use a test blueprint or test specification table
for this purpose.
Use test methods that are appropriate for the objectives specified.
Ensure adequate security and supervision in conducting the test to avoid cheating.
Figure 3.9: Ways to ensure validity
Decide on the different types of test you are going to use (refer to Chapter 2).
Use the test blueprint, starting with the lowest cognitive level and the first content
area, and construct the test items.
1. Check on the content: decide on the proportions of content coverage according to
whether the test is a monthly test, a mid-semester test or an end-of-semester test.
Most importantly, be sure of the topics or coverage to be assessed.
2. Check on the learning objectives: decide on the proportions of learning objectives
based on the different cognitive or taxonomy levels (refer to the earlier module
SBEM3303 Kaedah Pengajaran Matematik).
3. For example, students should be able to define the radius, diameter and
circumference of a circle in their own words. Distinguish and identify the levels of
such objectives based on Bloom's taxonomy. These objectives can also be classified as
Knows (K), Understands (U), and Applies, Analyzes, Synthesizes and Evaluates (A).
4. Check to ensure the incorporation of the different contents covered and the
different levels of the learning taxonomy.
5. Discuss and agree on the test blueprint with your peers before constructing the
test items.
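As a rough sketch of the steps above, a test blueprint can be kept as a simple table of topics against cognitive levels, with row and column totals checked before item writing begins. The topic names and item counts here are illustrative only, not taken from any particular syllabus.

```python
# A minimal test-blueprint sketch: topics vs. cognitive levels.
# Topic names and item counts are illustrative only.
blueprint = {
    "Whole numbers":    {"Knowledge": 2, "Comprehension": 1, "Application": 1},
    "Fractions":        {"Knowledge": 1, "Comprehension": 2, "Application": 1},
    "Linear equations": {"Knowledge": 1, "Comprehension": 1, "Application": 2},
}

levels = ["Knowledge", "Comprehension", "Application"]

# Row totals: number of test items per topic.
topic_totals = {t: sum(cells.values()) for t, cells in blueprint.items()}

# Column totals: number of test items per cognitive level.
level_totals = {lv: sum(cells[lv] for cells in blueprint.values()) for lv in levels}

grand_total = sum(topic_totals.values())
print(topic_totals)   # items per topic
print(level_totals)   # items per level
print(grand_total)    # 12
```

Checking the totals this way makes it easy to see whether the blueprint matches the intended proportions of content coverage and cognitive levels.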
Figure 3.12: A sample test blueprint. The table lists 24 topics (Whole numbers;
Squaring of numbers and factorization; Fractions; Decimal numbers; Percentage;
Negative numbers; Measurement; Angle and parallel lines; Polygon; Perimeter and areas;
Solids and volume; Algebraic functions; Linear equations; Algebra formula; Rate and
ratio; Plane geometry and coordinates; Loci; Circle; Transformation; Statistics;
Index; Linear inequalities; Graphs and functions; Trigonometry) against the cognitive
levels Knowledge, Comprehension and Application, with the number of test items in
each cell together with row and column totals.
Figure 3.13 shows the factors to consider in determining the number of test items to use.
In your opinion, what steps were taken to create the sample test
blueprint shown in Figure 3.12?
Time available: this depends on the type of test (short test, regular test or final
examination).
Type of test items used: multiple-choice items require more time than true-false or
short-answer questions.
Keep in mind that it is desirable to give all students an opportunity to complete the test.
To interpret (gauge) students' performance, it is wise to use at least 10 test items
per learning outcome.
Figure 3.13: Factors to Consider in Determining the Number of Test Items to Use
Knowledge: 10 to 25%
Comprehension: 20 to 35%
Application: 20 to 25%
Analysis, synthesis and evaluation: 10 to 15%
1. What should test items do? What is the purpose of the test?
2. The number of test items needed for a power test or achievement test can be based on:
   - the type of item used: short-answer or essay items require a longer time, and
     therefore fewer test items;
   - the ability level of the students: a test for a slower class should be shorter
     than one for an advanced class;
   - the length and complexity of the items: more stimulus material means fewer test
     items;
   - the type or level of objective being tested: recall items require a shorter period;
   - the amount of computation involved in the test.
3. The typical student will require 30 to 45 seconds to read and answer a simple
factual MCQ or T-F item.
4. The typical student will require 75 to 100 seconds to read and answer a fairly
complex MCQ requiring problem solving.
5. Use verbs describing the behavior listed in the learning objectives: for behaviors
such as recall and comprehend, use T-F, matching or MCQ items.
6. Use verbs describing the behavior listed in the learning objectives: for behaviors
such as apply, analyze and organize, use MCQ or essay items.
Figure 3.15: Writing test items
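The timing rules of thumb above (30 to 45 seconds for a simple factual MCQ or T-F item, 75 to 100 seconds for a complex problem-solving MCQ) can be turned into a rough planning estimate. This is only a sketch: the per-item seconds are the midpoints of the quoted ranges, and the function name is my own.

```python
# Rough test-length estimate from the per-item timing rules of thumb above.
# Midpoints of the quoted ranges: simple items ~37.5 s, complex items ~87.5 s.
def estimated_minutes(n_simple, n_complex):
    seconds = n_simple * 37.5 + n_complex * 87.5
    return seconds / 60

# For example, 20 simple factual items and 10 problem-solving items:
print(round(estimated_minutes(20, 10), 1))  # about 27.1 minutes
```

Such an estimate helps check that all students will have an opportunity to complete the test within the time available.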
Figure 3.16 shows the considerations for writing test items based on level of difficulty.
1. The item difficulty is determined by dividing the number of students getting the
correct answer by the total number of students:
   - if 40% of students answered correctly, then the item difficulty is 40% or .40;
   - if 75% of students answered correctly, then the item difficulty is 75% or .75.
2. In general, the difficulty of an item should be halfway between the chance score
(the proportion of correct answers obtainable by guessing) and 100%.
3. In general, an MCQ with five choices should show approximately a .60 level of
difficulty.
4. In general, an MCQ with four choices should show approximately a .62 level of
difficulty.
5. In general, a true-false item with two choices should show approximately a .75
level of difficulty.
6. Difficult items should be passed by 30% to 40% of the students, and some easy items
by 80% to 90% of the students.
Figure 3.16: The considerations for writing test items based on level of difficulty
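The difficulty calculations above can be sketched directly. The helper names below are my own; `target_difficulty` implements the rule that an item's difficulty should lie halfway between the chance score and 100%.

```python
# Item difficulty: proportion of students answering correctly.
def item_difficulty(n_correct, n_total):
    return n_correct / n_total

# Target difficulty: halfway between the chance score and 100%.
# For an MCQ with k options, the chance score is 1/k.
def target_difficulty(chance):
    return (chance + 1.0) / 2

print(item_difficulty(40, 100))            # 0.4
print(target_difficulty(1 / 5))            # 0.6   (five-option MCQ)
print(round(target_difficulty(1 / 4), 3))  # 0.625 (four-option MCQ, quoted as ~.62)
print(target_difficulty(1 / 2))            # 0.75  (true-false)
```

The computed targets reproduce the .60, .62 and .75 figures quoted in the list above.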
Figure 3.17 shows the other considerations when writing test items.
Would you take into account all considerations for writing test items based on
level of difficulty? If not, which consideration would you omit?
Ease of scoring
Ease of cheating
Sequencing of items
Test directions
Marking schemes
3.3
RELIABILITY
Reliability is a measure of the consistency and precision with which a test measures
what it is supposed to measure. Theoretically, a reliable test should produce the same
results if administered to the same students on two separate occasions. Statistically,
splitting the test into two parts and assuming that these parts are equivalent is
acceptable: a reliable test shows a high degree of correlation between students'
performance in each half of the test.
Figure 3.18 shows ways to improve reliability.
Ensuring that questions are clear and suitable for the level of the students.
Developing a marking scheme of high quality (explicit and agreed criteria, checking of
marks, several skilled examiners).
When using less reliable test methods, increasing the number of questions, observations
or the examination time.
Figure 3.18: Ways to improve reliability
The resulting test scores are correlated, and this correlation coefficient provides a measure
of stability; that is, it indicates how stable the test results are over the given period of
time.
If the results are highly stable, those pupils who scored high on one administration of
the test will tend to score high on the other administration, and the remaining pupils
will tend to stay in their same relative positions on both administrations.
The correlation coefficient may vary from a perfect positive relationship, indicated
by 1.00, down to a zero relationship, indicated by 0.00.
Measures of stability in the .80s and .90s are commonly reported for standardized tests
of aptitude and achievement over occasions within the same year.
One important factor to keep in mind when interpreting measures of stability is the time
interval between tests.
If this time interval is short, say a day or two, the constancy of the results will be inflated
because pupils will remember some of their answers from the first test.
If the time interval is long, say about a year, the results will be influenced not only
by the instability of the testing procedure but also by actual changes in the pupils
occurring over that period of time.
In general, the longer the time interval is between test and retest, the more the results
will be influenced by changes in the pupil characteristics being measured and the smaller
the reliability coefficient will be.
The best time interval between test administrations will depend largely on the use to be
made of the results.
If, for example, college admission test scores can be submitted as part of an application
to college several years after the test was taken, then stability over several years is
quite important.
But stability over a long period of time is neither important nor desirable for a unit test in
a course designed to assess mastery of certain concepts and readiness to move on to
new material.
Thus, for some decisions we are interested in reliability coefficients based on a long
interval between test and retest, and for others, reliability coefficients based on a short
interval may be sufficient.
The important thing is to seek evidence of stability that fits the particular interpretation to
be made.
Most teachers will not find it possible to compute test-retest reliability coefficients
for their own classroom tests.
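For those who do wish to try, the test-retest computation described above is just a correlation between two administrations of the same test to the same pupils. The sketch below computes the Pearson correlation by hand so it needs no external libraries; the score data are illustrative.

```python
import math

# Pearson correlation between two administrations of the same test.
def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative scores for six pupils on two administrations.
first  = [55, 62, 70, 48, 80, 66]
second = [58, 60, 73, 50, 78, 69]
print(round(pearson_r(first, second), 2))  # close to 1: highly stable results
```

A coefficient in the .80s or .90s, as commonly reported for standardized tests, would indicate that pupils keep roughly the same relative positions on both administrations.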
CHAPTER 3
However, in choosing standardized tests, the stability of scores serves as one important
criterion.
The test manual should provide evidence of stability, indicating the time interval between
tests and any unusual experiences the group members might have had between testing.
Information concerning the stability of test scores also has implications for the use of
test results from school records and for the frequency of retesting.
When using any test score from permanent records, one should check the date of testing
and the stability data available to determine whether the results are still dependable. If
there is doubt and the decision is important, retesting is in order.
The two forms of the test are administered to the same group of pupils in close
succession, and the resulting test scores are correlated.
Thus, it indicates the degree to which both forms of the test are measuring the same
aspects of behavior.
The equivalent-forms method tells us nothing about the long-term stability of the pupil
characteristic being measured but, rather, reflects short-term constancy of pupil
performance and the extent to which the test represents an adequate sample of the
characteristic being measured.
In achievement testing, for example, there are thousands of questions that might be
asked in a particular test. But because of time limits and other restricting factors, only
some of the possible questions can be used.
The questions included in the test should provide an adequate sample of the possible
questions in the area.
The easiest way to estimate whether a test measures an adequate sample of the content
is to construct two forms of the test and correlate the results. A high correlation
indicates that both forms provide similar results and are therefore probably reliable
samples of the content being measured.
This method overcomes the problem of the time interval between tests.
However, its use is limited because two or more forms of the test must be made
available.
This method is the most rigorous test of reliability because it accounts for the
stability of the testing procedure, the constancy of the pupil characteristic being
measured, and the representativeness of the sample of tasks.
The test is administered to a group of learners in the usual manner and then the set
of items is divided into halves for scoring purposes.
To split the test so that equivalent halves are available, the usual procedure is to
score the even-numbered and the odd-numbered items separately.
This indicates the degree to which consistent results are obtained from the two halves
of the test.
To estimate the reliability of scores based on the full-length test, the Spearman-Brown
formula is usually applied:
Reliability of full test = (2 x reliability of half test) / (1 + reliability of half test)
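The split-half procedure and the Spearman-Brown correction above can be sketched as follows. The 0/1 item scores are illustrative, and the function names are my own; the correlation between odd- and even-item half scores is corrected up to full-test length.

```python
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def split_half_reliability(item_scores):
    """item_scores: one list of 0/1 item scores per pupil."""
    odd  = [sum(row[0::2]) for row in item_scores]  # odd-numbered items
    even = [sum(row[1::2]) for row in item_scores]  # even-numbered items
    r_half = pearson_r(odd, even)
    # Spearman-Brown correction to full-length reliability:
    return 2 * r_half / (1 + r_half)

# Illustrative 0/1 item scores for four pupils on a six-item test.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
]
print(round(split_half_reliability(scores), 2))  # 0.73
```

Note how the corrected coefficient (0.73) is higher than the raw half-test correlation (about 0.58), reflecting that longer tests are more reliable.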
The split-half method is similar to the equivalent-form method as it indicates the extent
to which the sample of test items is a dependable sample of the content being measured.
A high correlation between scores of the two sets of test denotes the equivalence of the
two halves.
Split-half reliability tends to be higher than equivalent-forms reliability because,
in the split-half method, sources of inconsistency arise less often: the administration
is based on a single test.
Inconsistencies such as different forms, speed of work, fatigue, and test content are
better controlled.
This method provides a measure of internal consistency without, however, splitting the
test in half for scoring purposes.
It measures the extent to which the items within one form of the test have in common
with one another.
The Kuder-Richardson estimate can be thought of as the average of all possible
split-half coefficients for the group tested.
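The Kuder-Richardson idea can be sketched as below, using KR-20, the usual variant of the formula for items scored 0 or 1; the data are illustrative and the function name is my own.

```python
# KR-20 internal-consistency estimate for dichotomously scored (0/1) items.
def kr20(item_scores):
    n = len(item_scores)                   # number of pupils
    k = len(item_scores[0])                # number of items
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n  # variance of total scores
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n
        pq += p * (1 - p)
    return (k / (k - 1)) * (1 - pq / var)

# Illustrative 0/1 item scores for four pupils on a six-item test.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 1],
]
print(kr20(scores))  # 0.6
```

Because it needs only a single administration and a simple formula, this kind of estimate is easy to compute, which helps explain the method's widespread use.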
The simplicity of applying this method has led to its widespread use in determining
reliability.
However, such internal consistency measures are not appropriate for speeded tests.
For speeded tests, reliability obtained by the test-retest or equivalent-forms method
should be used.
This poses no great problem for teacher-made tests, since they are usually power tests.
For standardized tests, however, time limits are seldom so liberal that all students
manage to complete the test; thus the Kuder-Richardson method is appropriate only when
there is no evidence that speed of work is a factor.
Another limitation of internal consistency procedures is that they do not indicate the
constancy of learners' responses from one session to another. The time interval between
sessions is not taken into account, so such procedures do not indicate the extent to
which test results are generalizable over different periods of time.
3.4
PRACTICALITY
This pertains to whether the test is practical in terms of time and
resources. Can the results be interpreted accurately? Can the test be administered,
marked and graded? Does the test take too much time? Does administering and
grading the test need special resources? The following are some general considerations
when preparing a test.
Begin writing items far enough in advance that you will have time to revise them.
Match items to intended outcomes at the proper difficulty level to provide a valid measure
of instructional objectives.
Be sure each item deals with ONLY ONE IDEA OR ASPECT of the content area and not
with trivia.
Group items according to item type so students do not continuously shift response
patterns.
Be sure that each item is independent of all other items. The answer to one item
should not be required as a condition for answering the next item, nor should a hint to
one answer be unintentionally embedded in another item.
Be sure that each item has one correct or best answer on which experts would agree.
Avoid quoting directly from textual materials. Besides, taken out of context, direct quotes
from the text are often ambiguous.
The stem of the item should clearly formulate a problem. Include as much of the item as
possible in the stem, keeping the response options as short as possible. However,
include only the material needed to make the problem clear and specific.
Be concise; don't add extraneous information. Be sure that there is one and only one
correct or clearly best answer.
Include from three to five options (two to four distractors plus one correct answer) to
optimize testing for knowledge rather than encouraging guessing.
Use the option "none of the above" sparingly, and only when the keyed answer can be
classified unequivocally as right or wrong. Don't use this option when asking for a best
answer.
Avoid using the phrase "all of the above". It is usually the correct answer and makes the
item too easy for students with partial information.
Scoring is highly objective, requiring only a count of the number of correct responses.
Multiple-choice items can be written so that students must discriminate among options
that vary in degree of correctness. This allows students to select the best alternative
and avoids the absolute judgments found in T-F tests.
If not carefully written, multiple-choice questions can sometimes have more than one
defensible correct answer.
The desired method of marking true or false should be clearly explained before students
begin the test.
Construct statements that are definitely true or definitely false, without additional
qualifications. If opinion is used, attribute it to some source.
Keep true and false statements at approximately the same length, and be sure that
there are approximately equal numbers of true and false items.
Avoid using double-negative statements. They take extra time to decipher and are difficult
to interpret.
T-F questions tend to be short; hence more material can be covered than with any other
item format. Thus, T-F items tend to be used when a great deal of content has been
covered.
T-F questions take less time to construct. But avoid taking statements directly from the
text and modifying them slightly to create an item.
Scoring is easier with T-F questions. But avoid having students write "true" or "false"
or a "T" or "F". Instead, have them circle the T or F provided for each item.
T-F questions presume that the answer to the question or issue is unequivocally true or
false. It would be unfair to ask the student to guess at the teacher's criteria for
evaluating the truth of a statement.
T-F questions allow for and sometimes encourage a high degree of guessing. Generally,
longer examinations are needed to compensate for this.
Matching questions can be more efficient than multiple-choice questions because they
avoid repetition of options in measuring associations.
Rank the test papers in order from the highest to the lowest score.
Select the 10 papers with the highest total scores and the 10 papers with the lowest
total scores.
For each test item, tabulate the number of students in the upper and lower groups who
selected each alternative.
The item difficulty is then computed as:
p = (R / T) x 100
where R is the number of students who selected the correct answer and T is the total
number of students in the two groups.
Some guidelines:
0 to 25%: difficult item
25 to 75%: moderately difficult item
75 to 100%: easy item
The discrimination index is computed as:
D = (R_upper - R_lower) / n
where R_upper and R_lower are the numbers of students in the upper and lower groups who
answered the item correctly, and n is the number of students in each group. For example:
D = (10 - 4) / 10 = .60
This indicates good discriminating power.
There are three types of discrimination indices:
1. Positive discrimination index: those who did well on the overall test chose the
correct answer for a particular item more often than those who did poorly on the
overall test.
2. Negative discrimination index: those who did poorly on the overall test chose the
correct answer for a particular item more often than those who did well on the overall
test.
3. Zero discrimination index: those who did well and those who did poorly on the overall
test chose the correct answer for a particular item with equal frequency.
The corresponding ranges of the index are:
Negative discrimination: -1.00 to -0.25
Zero (negligible) discrimination: -0.25 to +0.25
Positive discrimination: +0.25 to +1.00
As a rough guide to item quality:
+0.40 and above: very good item
+0.20 to +0.39: reasonably good item
+0.10 to +0.19: marginal item, usually needing revision
Below +0.10: poor item, to be revised or discarded
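The upper/lower-group analysis above reduces to two short formulas. The function names below are my own, and `n_group` corresponds to the 10 highest- and 10 lowest-scoring papers in the procedure described earlier.

```python
# Item difficulty and discrimination from upper/lower group counts,
# as in the 10-highest / 10-lowest paper procedure described above.
def discrimination_index(r_upper, r_lower, n_group):
    """D = (R_upper - R_lower) / n, with n pupils in each group."""
    return (r_upper - r_lower) / n_group

def difficulty_percent(r_upper, r_lower, n_group):
    """p = (R / T) x 100 over the two groups combined."""
    return 100 * (r_upper + r_lower) / (2 * n_group)

# The worked example from the text: 10 upper-group and 4 lower-group
# students answered the item correctly.
print(discrimination_index(10, 4, 10))  # 0.6
print(difficulty_percent(10, 4, 10))    # 70.0
```

Running both formulas over every item of a test quickly flags items that fall into the marginal or poor bands above.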
Exercise 3.1
SUMMARY
In this chapter we have discussed matters pertaining to the principles of assessment in
mathematics instruction, mainly the construction of achievement tests. Basically, the
process is:
Instructional procedures are implemented that lead to the achievement of the
instructional objectives.
A test blueprint is drawn up to ensure that each important content and process area is
adequately sampled by the appropriate number and kind of test items.
Test items are written. Their format, number and level are determined in part by the
objectives, in part by the test blueprint, and in part by the teacher's judgment.
Test items are reviewed and, where necessary, edited or replaced by a panel of
validators.
The test is assembled and reproduced, with care taken to ensure that copies are legible.
Items that look marginal are subjected to quantitative and qualitative analysis.
Test papers are returned to students and notes are made on deficient or problematic
items.