
TABLE OF SPECIFICATIONS PROTOTYPE

LEVEL                     OBJECTIVE                           ITEM NUMBERS         NO.      %
1. Knowledge              Identify subject-verb               1, 3, 5, 7, 9          5    16.67 %
2. Comprehension          Forming appropriate verb forms      2, 4, 6, 8, 10         5    16.67 %
3. Application            Determining subject and predicate   11, 13, 15, 17, 19     5    16.67 %
4. Analysis               Formulating rules on agreement      12, 14, 16, 18, 20     5    16.67 %
5. Synthesis/Evaluation   Writing sentences observing rules   Part II           10 pts    33.32 %
                          on subject-verb agreement
TOTAL                                                                              30      100 %
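
The percentage column is simply each level's item count (or point value) divided by the total of 30. A minimal Python sketch of that arithmetic is shown below; the level names and counts come from the prototype table, while the variable names are only illustrative (note that the prototype rounds the last entry to 33.32 % so the column totals exactly 100 %).

```python
# Compute the percentage column of a Table of Specifications.
# Counts follow the prototype: four 5-item levels plus a 10-point Part II.
levels = {
    "Knowledge": 5,
    "Comprehension": 5,
    "Application": 5,
    "Analysis": 5,
    "Synthesis/Evaluation": 10,
}

total = sum(levels.values())  # 30

for level, count in levels.items():
    share = count / total * 100
    print(f"{level:<22} {count:>3}  {share:5.2f} %")

print(f"{'TOTAL':<22} {total:>3}  100.00 %")
```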
Constructing the test items

The actual construction of the test items


follows the TOS. As a general rule, it is advised that
the actual number of items to be constructed in the
draft should be double the desired number of items.
For instance, if there are five (5) knowledge level
items to be included in the final test form, then at
least ten (10) knowledge level items should be
included in the draft.
Item analysis and try-out

The test draft is tried out on a group of pupils or
students. The purpose of this try-out is to determine:

a) item characteristics through item analysis,

b) the characteristics of the test itself: validity, reliability, and
practicality.
CONSTRUCTING A TRUE-FALSE TEST

Here are some rules of thumb in


constructing true-false items:

RULE 1. Do not give a hint


(inadvertently) in the body of the
question.

Example:

The Philippines gained its independence


in 1898 and therefore celebrated its
centennial year in 2000.
RULE 2
Avoid using the words “always”, “never”, “often” and other adverbs
that tend to make a statement either always true or always false.

EXAMPLE:
Christmas always falls on a Sunday because it is a Sabbath day.

RULE 3
Avoid long sentences as these tend to be “true”. Keep sentences
short.

EXAMPLE:
Tests need to be valid, reliable, and useful, although it would
require a great amount of time and effort to ensure that tests
possess these test characteristics.
RULE 4
Avoid trick statements with some minor
misleading word or spelling anomaly,
misplaced phrases, etc. A wise student who
does not know the subject matter may detect
this strategy and thus get the answer
correctly.

EXAMPLE:
True or False. The Principle of our school
is Mr. Albert P. Panadero.
RULE 5
Avoid quoting verbatim from reference
materials or textbooks. This practice sends the
wrong signal to the student that it is necessary
to memorize the textbook word for word and thus,
acquisition of higher level thinking skills is
not given due importance.

RULE 6
Avoid specific determiners or give-away
qualifiers. Students quickly learn that strongly
worded statements are more likely to be false
than true, for example, statements with “never”,
“no”, “all”, “always”. Moderately worded
statements are more likely to be true than false.
Statements with “many”, “often”, “sometimes”,
“generally”, “frequently”, or “some” should be
avoided.
RULE 7
With true or false questions,
avoid a grossly disproportionate
number of either true or false
statements or even patterns in the
occurrence of true and false
statements.
MULTIPLE CHOICE TESTS

A generalization of the true-false


test, the multiple choice type of test
offers the student more than two (2)
options per item to choose from. Each item
in a multiple choice test consists of two
parts:

a) the stem,
b) the options.
Guidelines in Constructing Multiple
Choice Items

1. Do not use unfamiliar words, terms and


phrases.

EXAMPLE

What would be the system reliability of a


computer system whose slave and peripherals are
connected in parallel circuits and each one has
a known time to failure probability of 0.05?
2. Do not use modifiers that are vague and whose
meanings can differ from one person to the next
such as: much, often, usually, etc.

EXAMPLE
Much of the process of photosynthesis takes
place in the:
a. bark
b. leaf
c. stem
3. Avoid complex or awkward word arrangements.
Also, avoid use of negatives in the stem as this
may add unnecessary comprehension difficulties.

EXAMPLE
(Poor)
As President of the Republic of the
Philippines, Corazon Cojuangco Aquino would stand
next to which President of the Philippine
republic subsequent to the 1986 EDSA Revolution?

(Better)
Who was the President of the Philippines
after Corazon C. Aquino?
4. Do not use negatives or double negatives
as such statements tend to be confusing. It
is best to use simpler sentences rather than
sentences that would require expertise in
grammatical construction.

EXAMPLE
(Poor)
Which of the following will not cause
inflation in the Philippine economy?
(Better)
Which of the following will cause inflation
in the Philippine economy?
5. Each item stem should be as short as possible; otherwise
you risk testing more for reading and comprehension skills.

6. Distracters should be equally plausible and attractive.

EXAMPLE

The short story “May Day’s Eve” was written by which Filipino
author?
a. Jose Garcia Villa
b. Nick Joaquin
c. Genoveva Edrosa Matute
d. Robert Frost
e. Edgar Allan Poe

7. All multiple choice options should be grammatically


consistent with the stem.
8. The length, explicitness, or degree of
technicality of alternatives should not be the
determinants of the correctness of the answer.
The following is an example of this rule:

EXAMPLE:
If the three angles of two triangles are
congruent, then the triangles are:

a. congruent whenever one of the sides of


triangles are congruent
b. similar
c. equiangular and therefore, must also be
congruent
d. equilateral if they are equiangular
9. Avoid stems that reveal the answer to another
item

10. Avoid alternatives that are synonymous with others or that
include or overlap others.

EXAMPLE
What causes ice to transform from solid state to
liquid state?
a. change in temperature
b. changes in pressure
c. change in the chemical composition
d. change in heat levels

11. Avoid presenting sequenced items in the same


order as in the text.
12. Avoid use of assumed qualifiers that many
examinees may not be aware of.

13. Avoid use of unnecessary words or phrases,


which are not relevant to the problem at hand
(unless such discriminating ability is the primary
intent of the evaluation). The item’s value is
particularly damaged if the unnecessary material is
designed to distract or mislead. Such items test
the student’s reading comprehension rather than
knowledge of the subject matter.
EXAMPLE. The side opposite the thirty degree
angle in a right triangle is equal to half the length of
the hypotenuse. If the sine of a 30-degree angle is 0.5
and its hypotenuse is 5, what is the length of the
side opposite the 30-degree angle?
a. 2.5
b. 3.5
c. 5.5
d. 1.5
14. Avoid use of non-relevant sources of
difficulty such as requiring a complex
calculation when only knowledge of a
principle is being tested

15. Avoid extreme specificity requirements


in responses.

16. Include as much of the item as possible


in the stem. This allows less repetition and
shorter choice options.
17. Use the “None of the above” option only when the keyed
answer is totally correct. When choice of the “best” response is
intended, “none of the above” is not appropriate, since the
implication has already been made that the correct response
may be partially inaccurate.

18. Note that the use of “all of the above” may allow credit for
partial knowledge. In a multiple option item, (allowing only one
option choice) if a student only knew that two (2) options were
correct, he could then deduce the correctness of “all of the
above”. This assumes you are allowed only one correct choice.
19. Having compound response choices may
purposefully increase difficulty of an item.

20. The difficulty of a multiple choice item may be


controlled by varying the homogeneity or degree of
similarity of the responses. The more homogeneous, the
more difficult the item.
EXAMPLE.
(Less Homogeneous)
Thailand is located in:
a. Southeast Asia
b. Eastern Europe
c. South America
d. East Africa
e. Central America

(More Homogeneous)
Thailand is located next to:
a. Laos & Kampuchea
b. India & China
c. China & Malaya
d. Laos & China
e. India & Malaya
MATCHING TYPE AND SUPPLY TYPE ITEMS
The matching type items may be considered as modified
multiple choice type items where the choices progressively
reduce as one successfully matches the items on the left with
the items on the right.
EXAMPLE: Match the items in column A with the items in column B.

   COLUMN A                COLUMN B
c  1. Magellan             a. First President of the Republic
d  2. Jacinto              b. National Hero
b  3. Rizal                c. Discovered the Philippines
f  4. Lapu-Lapu            d. Brain of the Katipunan
a  5. Aguinaldo            e. The great painter
                           f. Ruler of Mactan
Another useful device for testing lower order thinking
skills is the supply type of tests. Like the multiple choice
test, the items in this kind of test consist of a stem and a
blank where the students would write the correct
answer.

EXAMPLE. The study of life and living organisms


is called _________.
Supply type tests depend heavily on the way that the
stems are constructed. These tests allow for one and
only one answer and, hence, often test only the
students’ knowledge. It is, however, possible to
construct supply type tests that will test higher order
thinking, as the following example shows:

EXAMPLE. Write an appropriate synonym for


each of the following. Each blank corresponds to a
letter:
Metamorphose: _ _ _ _ _ _
Flourish: _ _ _ _
ESSAYS
Essays, classified as non-objective tests, allow
for the assessment of higher order thinking skills.
Such tests require students to organize their
thoughts on a subject matter in coherent sentences
in order to inform an audience. In an essay test, students
are required to write one or more paragraphs on a
specific topic.
Essay questions can be used to measure attainment of a
variety of objectives. Stecklein (1995) has listed 14
types of abilities that can be measured by essay items:

 Comparisons between two or more things
 The development and defense of an opinion
 Questions of cause and effect
 Explanations of meanings
 Summarizing of information in a designated area
 Analysis
 Knowledge of relationships
 Illustrations of rules, principles, procedures, and applications
 Applications of rules, laws, and principles to new situations
 Criticisms of the adequacy, relevance, or correctness of a concept, idea, or information
 Formulation of new questions and problems
 Reorganization of facts
 Discrimination between objects, concepts, or events
 Inferential thinking
The following are rules of thumb which
facilitate the scoring of essays:

RULE 1
Phrase the directions in such a way that
students are guided on the key concepts to be
included.

EXAMPLE
Write an essay on the topic:
“Plant Photosynthesis” using the
following keywords and phrases: chlorophyll,
sunlight, water, carbon dioxide, oxygen, by-
product, stomata.
RULE 2
Inform the students on the criteria to be used for
grading their essays. This rule allows the students to
focus on relevant and substantive materials rather than
on peripheral and unnecessary facts and bits of
information.

EXAMPLE
Write an essay on the topic:
“Plant Photosynthesis” using the keywords indicated.
You will be graded according to the following criteria:

 Coherence
 Accuracy of statements
 Use of keywords
 Clarity
 Extra points for innovative presentation of ideas
RULE 3
Put a time limit on the essay test.

RULE 4
Decide on your essay grading system prior to getting the
essays of your students.

RULE 5
Evaluate all of the students’ answers to one question before
proceeding to the next question.

RULE 6
Evaluate answers to essay questions without knowing the
identity of the writer.

RULE 7
Whenever possible, have two or more persons grade each
answer.
ITEM ANALYSIS
AND VALIDATION
ITEM ANALYSIS
There are two important characteristics
of an item that will be of interest to
the teacher.
These are:

(a) item difficulty
(b) the discrimination index.

ITEM DIFFICULTY = number of students with the correct answer / total
number of students
The following arbitrary rule is often used in the
literature:

RANGE OF DIFFICULTY INDEX    INTERPRETATION      ACTION
0 – 0.25                     difficult           revise or discard
0.26 – 0.75                  right difficulty    retain
0.76 and above               easy                revise or discard
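
The difficulty formula and the rule-of-thumb table above can be applied mechanically. The Python sketch below is hypothetical: the function names and the worked figures are invented for illustration, and the thresholds simply mirror the table.

```python
def item_difficulty(num_correct: int, num_students: int) -> float:
    """Proportion of students who answered the item correctly."""
    return num_correct / num_students

def difficulty_action(p: float) -> str:
    """Apply the rule-of-thumb table above."""
    if p <= 0.25:
        return "difficult - revise or discard"
    elif p <= 0.75:
        return "right difficulty - retain"
    else:
        return "easy - revise or discard"

# Example: 18 of 40 students answered the item correctly.
p = item_difficulty(18, 40)       # 0.45
print(p, difficulty_action(p))    # 0.45 right difficulty - retain
```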
Index of discrimination = DU – DL
EXAMPLE. Obtain the index of discrimination
of an item if the upper 25% of the class had a
difficulty index of 0.60 (i.e. 60% of the upper 25%
got the correct answer) while the lower 25% of the
class had a difficulty index of 0.20.
Here, DU = 0.60 while DL = 0.20, thus index of
discrimination = .60 - .20 = .40
The following rule of thumb is used:

INDEX RANGE      INTERPRETATION                               ACTION
-1.0 – -0.50     can discriminate but item is questionable    discard
-0.55 – 0.45     non-discriminating                           revise
0.46 – 1.0       discriminating item                          include
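
A similar hypothetical sketch can apply the discrimination formula and the ranges above. The function names are illustrative; the worked figures are the DU = 0.60, DL = 0.20 example from the text, which falls in the table's "revise" range.

```python
def discrimination_index(p_upper: float, p_lower: float) -> float:
    """Index of discrimination: difficulty index of the upper 25%
    minus difficulty index of the lower 25% (DU - DL)."""
    return p_upper - p_lower

def discrimination_action(d: float) -> str:
    """Apply the rule-of-thumb ranges listed above."""
    if d <= -0.50:
        return "can discriminate but item is questionable - discard"
    elif d <= 0.45:
        return "non-discriminating - revise"
    else:
        return "discriminating item - include"

# Worked example from the text: DU = 0.60, DL = 0.20.
d = discrimination_index(0.60, 0.20)   # 0.40
print(d, discrimination_action(d))     # 0.40 non-discriminating - revise
```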
At the end of the Item Analysis report, test items are
listed according to their degrees of difficulty (easy,
medium, hard) and discrimination (good, fair, poor).
These distributions provide a quick overview of the
test, and can be used to identify items which are not
performing well and which can perhaps be
improved or discarded.
SUMMARY

The item-analysis procedure for norm-referenced tests provides the
following information:
1. the difficulty of the item
2. the discriminating power of the item
3. the effectiveness of each alternative
Benefits derived from Item Analysis
1. it provides useful information for class
discussion of the test
2. it provides data which helps students
improve their learning
3. it provides insights and skills that lead to the
preparation of better tests in the future
INDEX OF DIFFICULTY

P = (Ru + RL) / T x 100

Where:
Ru – the number in the upper group who answered the item correctly
RL – the number in the lower group who answered the item correctly
T – the total number who tried the item
INDEX OF ITEM DISCRIMINATING POWER

D = (Ru – RL) / (½T)

Where:
P – percentage who answered the item correctly (index of difficulty)
R – number who answered the item correctly
T – total number who tried the item

Example: P = 8/20 x 100 = 40%
The smaller the percentage figure the more difficult the
item.
Estimate the item discriminating power using the
formula below:
D = (Ru – RL) / (½T) = (6 – 2) / 10 = 0.40
The discriminating power of an item is reported as a
decimal fraction; maximum discriminating power is
indicated by an index of 1.00.
Maximum discrimination is usually found at the 50
percent level of difficulty
The difficulty index may be interpreted as follows:
0.00 – 0.20 = very difficult
0.21 – 0.80 = moderately difficult
0.81 – 1.00 = very easy
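
Putting the two upper/lower-group formulas together, a short hypothetical Python sketch reproduces the worked figures above (Ru = 6, RL = 2, T = 20); the function names are only illustrative.

```python
def index_of_difficulty(ru: int, rl: int, t: int) -> float:
    """P = (Ru + RL) / T x 100, as a percentage."""
    return (ru + rl) / t * 100

def discriminating_power(ru: int, rl: int, t: int) -> float:
    """D = (Ru - RL) / (T / 2)."""
    return (ru - rl) / (t / 2)

# Worked example from the text: Ru = 6, RL = 2, T = 20.
print(index_of_difficulty(6, 2, 20))    # 40.0 (percent)
print(discriminating_power(6, 2, 20))   # 0.4
```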
VALIDATION
After performing the item analysis and revising the
items which need revision, the next step is to
validate the instrument. The purpose of validation is
to determine the characteristics of the whole test
itself, namely, the validity and reliability of the test.
Validation is the process of collecting and analyzing
evidence to support the meaningfulness and
usefulness of the test.
VALIDITY
Validity is the extent to which a test measures what
it purports to measure; it also refers to the
appropriateness, correctness, meaningfulness, and
usefulness of the specific decisions a teacher
makes based on the test results.
Criterion-related evidence of validity refers to the
relationship between scores obtained using the
instrument and scores obtained using one or more
other tests (often called the criterion).

Construct-related evidence of validity refers to the


nature of the psychological construct or
characteristic being measured by the test.
In order to obtain evidence of criterion-related validity,
the teacher usually compares scores on the test in
question with the scores on some other independent
criterion test which presumably already has high
validity.
Example. If a test is designed to measure mathematics
ability of students and it correlates highly with a
standardized mathematics achievement test (external
criterion), then we say we have high criterion-related
evidence of validity.
In particular, this type of criterion-related validity is
called its concurrent validity. Another type of criterion-
related validity is called predictive validity wherein the
test scores in the instrument are correlated with
scores on a later performance (criterion measure) of
the students.
Example. The mathematics ability test constructed
by the teacher may be correlated with the students’ later
performance in a Division-wide mathematics
achievement test.
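
Criterion-related evidence, whether concurrent or predictive, is typically quantified with a correlation coefficient between the two sets of scores. The sketch below computes a Pearson r in Python; the score lists are invented purely for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: teacher-made math test vs. standardized test (criterion).
teacher_test = [12, 18, 25, 30, 22, 15, 28, 20]
criterion    = [40, 55, 70, 82, 65, 48, 78, 60]

# A high r is taken as criterion-related evidence of validity.
print(round(pearson_r(teacher_test, criterion), 2))
```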
Apart from the use of correlation coefficient in
measuring criterion-related validity, Gronlund
suggested using the so-called expectancy table. This
table is easy to construct and consists of the test
(predictor) categories listed on the left hand side and
the criterion categories listed horizontally along the top
of the chart.
Example. Suppose that a mathematics achievement
test is constructed and the scores are categorized as
high, average, and low. The criterion measure used is
the final average grades of the students in high school:
very good, good, and needs improvement.
The two way table lists down the number of
students falling under each of the possible pairs of
(test, grade) as shown below:
GRADE POINT AVERAGE

Test Score     Very Good     Good     Needs Improvement
High               20          10              5
Average            10          25              5
Low                 1          10             14
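
Building such an expectancy table is a cross-tabulation of (test category, grade category) pairs. The Python sketch below is hypothetical: the pair counts are chosen to reproduce the table above, whereas in practice they would come from actual student records.

```python
from collections import Counter

# Hypothetical (test category, grade category) pairs, one per student.
pairs = (
    [("High", "Very Good")] * 20 + [("High", "Good")] * 10 + [("High", "Needs Improvement")] * 5
    + [("Average", "Very Good")] * 10 + [("Average", "Good")] * 25 + [("Average", "Needs Improvement")] * 5
    + [("Low", "Very Good")] * 1 + [("Low", "Good")] * 10 + [("Low", "Needs Improvement")] * 14
)

counts = Counter(pairs)
rows = ["High", "Average", "Low"]
cols = ["Very Good", "Good", "Needs Improvement"]

# Print the expectancy table: test categories down the side, grades across the top.
print(f"{'Test Score':<12}" + "".join(f"{c:>20}" for c in cols))
for r in rows:
    print(f"{r:<12}" + "".join(f"{counts[(r, c)]:>20}" for c in cols))
```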
RELIABILITY
Reliability refers to the consistency of the scores
obtained – how consistent they are for each
individual from one administration of an instrument
to another and from one set of items to another.
Reliability and validity are related concepts: if an
instrument is unreliable, it cannot yield valid
outcomes. As reliability improves, validity may
improve (or it may not).
The following table is a standard followed almost
universally in educational tests and measurement.
RELIABILITY      INTERPRETATION
.90 and above    Excellent reliability; at the level of the best standardized tests
.80 – .90        Very good for a classroom test
.70 – .80        Good for a classroom test; in the range of most. There are
                 probably a few items which could be improved.
.60 – .70        Somewhat low. This test needs to be supplemented by other
                 measures (e.g., more tests) to determine grades. There are
                 probably some items which could be improved.
.50 – .60        Suggests need for revision of the test, unless it is quite short
                 (ten or fewer items). The test definitely needs to be
                 supplemented by other measures (e.g., more tests) for grading.
.50 or below     Questionable reliability. This test should not contribute
                 heavily to the course grade, and it needs revision.
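
The slides do not specify how the reliability coefficient itself is computed. For a dichotomously scored classroom test, one common estimate is the Kuder-Richardson formula 20 (KR-20); the sketch below is offered only as an assumption to show where a figure in the table above might come from, and the sample data are invented.

```python
def kr20(item_matrix):
    """KR-20 reliability estimate for a 0/1-scored item matrix
    (rows = students, columns = items). Using KR-20 here is an
    assumption; the slides do not name a specific coefficient."""
    n_students = len(item_matrix)
    k = len(item_matrix[0])
    totals = [sum(row) for row in item_matrix]
    mean_total = sum(totals) / n_students
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_students
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_matrix) / n_students
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)

# Tiny illustrative data set: 5 students x 4 items (1 = correct, 0 = wrong).
scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
# Compare the result against the interpretation table above.
print(round(kr20(scores), 2))  # 0.8
```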
