
ITEM ANALYSIS
A technique to improve test items and instruction
TEST DEVELOPMENT PROCESS

1. Review National and Professional Standards
2. Convene National Advisory Committee
3. Develop Domain, Knowledge and Skills Statements
4. Conduct Needs Analysis
5. Construct Table of Specifications
6. Develop Test Design
7. Develop New Test Questions
8. Review Test Questions
9. Assemble Operational Test Forms
10. Produce Printed Test Materials
11. Administer Tests
12. Conduct Item Analysis
13. Standard Setting Study
14. Set Passing Standard
WHAT IS ITEM ANALYSIS?

• a process that examines student responses to individual test items in order to assess the quality of those items and of the test as a whole;
• valuable for improving items that will be used again in later tests and for eliminating ambiguous or misleading items;
• valuable for increasing instructors' skill in test construction; and
• valuable for identifying specific areas of course content that need greater emphasis or clarity.
SEVERAL PURPOSES

1. More diagnostic information on students

• Classroom level:
• identify the questions most students found very difficult or guessed on, and reteach those concepts
• identify the questions everyone got right, and stop spending time on those areas
• find the wrong answers students are choosing, and identify common misconceptions

• Individual level:
• isolate the specific errors each student made
2. Build future tests and revise test items to make them better

• you learn how much work goes into writing good questions
• SHOULD NOT REUSE WHOLE TESTS --> diagnostic teaching means responding to the needs of your students, so over a few years build up a test bank and choose the tests that fit each class
• you can spread difficulty levels across your blueprint (TOS)
3. Part of continuing professional development

• doing occasional item analysis will help you become a better test writer
• it documents just how good your evaluation is
• useful for dealing with parents or administrators if there is ever a dispute
• once you can bring out these statistics, parents and administrators are far more likely to accept why some students failed.
CLASSICAL ITEM ANALYSIS STATISTICS

• Reliability (test-level statistic)
• Difficulty (item-level statistic)
• Discrimination (item-level statistic)
TEST LEVEL STATISTIC

Quality of the Test

• Reliability and Validity
• Reliability: consistency of measurement
• Validity: truthfulness of response
• Overall Test Quality
• Individual Item Quality
RELIABILITY

• refers to the extent to which the test is likely to produce consistent scores.

Characteristics:
1. The intercorrelations among the items -- the stronger and more numerous the positive intercorrelations among items, the greater the reliability.
2. The length of the test -- a test with more items will have a higher reliability, all other things being equal.
3. The content of the test -- generally, the more diverse the subject matter tested and the testing techniques used, the lower the reliability.
4. The heterogeneity of the group of test takers -- the more heterogeneous the group, the greater the spread of scores and the higher the reliability.
TYPES OF RELIABILITY

• Stability
1. Test-Retest
2. Inter-rater / Observer / Scorer
• applicable mostly to essay questions
• use Cohen's Kappa statistic

• Equivalence
3. Parallel-Forms / Equivalent
• used to assess the consistency of the results of two tests constructed in the same way from the same content domain.

• Internal Consistency
• used to assess the consistency of results across items within a test.
4. Split-Half
5. Kuder-Richardson Formula 20 / 21
• correlation is determined from a single administration of a test through a study of score variances
6. Cronbach's Alpha (α)
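For reference (the slides do not spell the formulas out), KR-20 and Cronbach's alpha are computed as:

$$\mathrm{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right) \qquad \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right)$$

where k is the number of items, p_i is the proportion answering item i correctly, q_i = 1 - p_i, σ_i² is the variance of item i, and σ_X² is the variance of total scores. KR-20 is the special case of alpha for dichotomously scored (0/1) items.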
RELIABILITY INDICES AND INTERPRETATION

.91 and above: Excellent reliability; at the level of the best standardized tests.
.81 - .90: Very good for a classroom test.
.71 - .80: Good for a classroom test; in the range of most. There are probably a few items which could be improved.
.61 - .70: Somewhat low. This test needs to be supplemented by other measures (e.g., more tests) to determine grades. There are probably some items which could be improved.
.51 - .60: Suggests need for revision of the test, unless it is quite short (ten or fewer items). The test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
.50 or below: Questionable reliability. This test should not contribute heavily to the course grade, and it needs revision.
TEST ITEM STATISTICS

• Item Difficulty: percent answering correctly
• Item Discrimination: how well the item "functions"; how "valid" the item is, based on the total test score criterion
WHAT IS A WELL-FUNCTIONING TEST ITEM?

• How many students got it correct? (DIFFICULTY)
• Which students got it correct? (DISCRIMINATION)
THREE KINDS OF INFORMATION ON THE QUALITY OF TEST ITEMS

• Item difficulty: measures whether an item was too easy or too hard.
• Item discrimination: measures whether an item discriminated between students who knew the material well and students who did not.
• Effectiveness of alternatives: determines whether distractors (incorrect but plausible answers) tend to be marked by the less able students and not by the more able students.
ITEM DIFFICULTY

• Item difficulty is simply the percentage of students who answer an item correctly. In this case, it is also equal to the item mean.

Difficulty = (number of students answering correctly / total number of students) x 100

• The item difficulty index ranges from 0 to 100; the higher the value, the easier the question.
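Computationally, the difficulty index is just the column mean of the 0/1 scoring matrix described later under Steps in Item Analysis. A minimal Python sketch (the data here are hypothetical):

```python
import numpy as np

# Hypothetical 0/1 scores: rows = students, columns = items
responses = np.array([
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
])

# Difficulty index = percentage of students answering each item correctly
difficulty = responses.mean(axis=0) * 100
print(difficulty)  # [80. 60. 60. 80.]
```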
ITEM DIFFICULTY LEVEL: DEFINITION

The percentage of students who answered the item correctly.

• High (Difficult): <= 30%
• Medium (Moderate): > 30% and < 80%
• Low (Easy): >= 80%
ITEM DIFFICULTY LEVEL: SAMPLE

Number of students who answered each item = 50

Item No. | No. Correct Answers | % Correct | Difficulty Level
1        | 15                  | 30        | High
2        | 25                  | 50        | Medium
3        | 35                  | 70        | Medium
4        | 45                  | 90        | Low
ITEM DIFFICULTY LEVEL: QUESTIONS / DISCUSSION

• Is a test that nobody failed too easy?
• Is a test on which nobody got 100% too difficult?
• Should items that are "too easy" or "too difficult" be thrown out?
ITEM DISCRIMINATION

• Traditionally computed using high- and low-scoring groups (upper 27% and lower 27%).
• Computerized analyses provide a more accurate assessment of the discriminating power of items, since they account for all responses rather than just those of the high- and low-scoring groups.
• Equivalent to the point-biserial correlation. It provides an estimate of the degree to which an individual item measures the same thing as the rest of the items.
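Both approaches can be sketched in a few lines of Python (hypothetical data; the point-biserial here is the "corrected" variant, correlating each item with the total score excluding that item):

```python
import numpy as np

def discrimination_27(responses):
    """Traditional index: proportion correct in the upper 27% of
    total scorers minus proportion correct in the lower 27%."""
    totals = responses.sum(axis=1)
    order = np.argsort(totals)
    n = max(1, int(round(0.27 * len(totals))))
    lower, upper = responses[order[:n]], responses[order[-n:]]
    return upper.mean(axis=0) - lower.mean(axis=0)

def point_biserial(responses):
    """Correlate each 0/1 item with the rest score (total minus the
    item). Undefined (nan) for items everyone got right or wrong."""
    totals = responses.sum(axis=1)
    return np.array([
        np.corrcoef(responses[:, i], totals - responses[:, i])[0, 1]
        for i in range(responses.shape[1])
    ])

# Hypothetical 0/1 scores: rows = students, columns = items
responses = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])
print(discrimination_27(responses))
print(point_biserial(responses))
```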
WHAT IS ITEM DISCRIMINATION?

• Generally, students who did well on the exam should select the correct answer to any given item on the exam.
• The Discrimination Index distinguishes, for each item, between the performance of students who did well on the exam and students who did poorly.
INDICES OF DIFFICULTY AND DISCRIMINATION (BY HOPKINS AND ANTES)

Index          | Difficulty     | Discrimination
0.86 and above | Very Easy      | To be discarded
0.71 - 0.85    | Easy           | To be revised
0.30 - 0.70    | Moderate       | Very good items
0.15 - 0.29    | Difficult      | To be revised
0.14 and below | Very Difficult | To be discarded

ITEM DISCRIMINATION:
QUESTIONS / DISCUSSION
• What factors could contribute to
low item discrimination
between the two groups of
students?
• What is a likely cause for a
negative discrimination index?
ITEM ANALYSIS PROCESS
SAMPLE TOS

Section        | Remember | Understand | Apply | Total
A (1,3,7,9)    | 4        | 6          | 10    | 20
B (2,5,8,11,15)| 5        | 5          | 4     | 14
C (6,17,21)    | 3        | 7          | 6     | 16
Total          | 12       | 18         | 20    | 50
STEPS IN ITEM ANALYSIS

1. Code the test items:
- 1 for correct and 0 for incorrect
- vertical columns: item numbers
- horizontal rows: respondents/students
TEST ITEMS

No. | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 ... 50
1   | 1 0 1 1 1 0 0 0 0 1  1  1  0  0  0 0 0 1 1
2   | 1 1 0 1 1 1 0 1 1 1  0  1  1  1  1 0 1 1 1
3   | 0 0 0 1 0 0 0 1 0 0  0  1  1  1  1 1 1 1 0
4   | 0 1 0 0 0 1 0 0 0 1  0  0  1  0  0 0 1 0 0
5   | 1 0 1 1 1 0 1 1 1 0  1  1  0  1  1 1 0 1 0
6   | 1 1 1 1 1 1 1 1 1 1  1  1  1  0  1 1 1 0 1
7   | 0 0 1 0 0 0 1 0 0 0  1  0  0  0  1 0 0 0 1
8   | 1 1 0 1 1 1 0 1 1 1  0  1  1  0  0 0 1 0 0
2. In SPSS:

Analyze > Scale > Reliability Analysis; move the item variables into the Items box; click Statistics, check "Scale if item deleted", then click OK.
****** Method 1 (space saver) will be used for this analysis ******

R E L I A B I L I T Y   A N A L Y S I S  -  S C A L E  (A L P H A)

Item-total Statistics

            Scale Mean      Scale Variance    Corrected Item-    Alpha
            if Item Deleted if Item Deleted   Total Correlation  if Item Deleted
VAR00001    14.4211         127.1053          .9401              .9502
VAR00002    14.6316         136.8440          .7332              .9542
...
VAR00022    14.4211         129.1410          .7311              .9513
VAR00023    14.4211         127.1053          .4401              .9502
VAR00024    14.6316         136.8440          -.0332             .9542
...
VAR00047    14.4737         128.6109          .8511              .9508
VAR00048    14.4737         128.8252          .8274              .9509
VAR00049    14.0526         130.6579          .5236              .9525
VAR00050    14.2105         127.8835          .7533              .9511

Reliability Coefficients
N of Cases = 57.0          N of Items = 50
Alpha = .9533
3. In the output:

• Alpha appears at the bottom.
• The corrected item-total correlation is the point-biserial correlation, used as the basis for the index of test reliability.
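For readers without SPSS, a minimal Python sketch (assuming a 0/1 scoring matrix like the one above, with at least three items) reproduces the same statistics: Cronbach's alpha, the corrected item-total correlation, and alpha if item deleted.

```python
import numpy as np

def cronbach_alpha(x):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = x.shape[1]
    return k / (k - 1) * (1 - x.var(axis=0, ddof=1).sum()
                          / x.sum(axis=1).var(ddof=1))

def item_total_statistics(x):
    """Per item: corrected item-total correlation (item vs. total score
    excluding the item) and alpha if the item were deleted, mirroring
    SPSS's 'Scale if item deleted' output."""
    totals = x.sum(axis=1)
    stats = []
    for i in range(x.shape[1]):
        rest = np.delete(x, i, axis=1)
        r_it = np.corrcoef(x[:, i], totals - x[:, i])[0, 1]
        stats.append((r_it, cronbach_alpha(rest)))
    return stats

# Hypothetical 0/1 scores: rows = students, columns = items
responses = np.array([
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 1, 0],
])
print(cronbach_alpha(responses))
for r_it, alpha_del in item_total_statistics(responses):
    print(f"r_it = {r_it:+.4f}   alpha if deleted = {alpha_del:.4f}")
```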
4. Count the number of items discarded and fill in the summary item analysis table.
TEST ITEM RELIABILITY ANALYSIS SUMMARY (SAMPLE)

Test            | Level of Difficulty | Number of Items | % | Item Numbers
Math (50 items) | Very Easy           | 1               | 2 | 1
                | Easy                | 2               | 4 | 2, 5
                | Moderate            | 10              | 20| 3, 4, 10, 15, ...
                | Difficult           | 30              | 60| 6, 7, 8, 9, 11, ...
                | Very Difficult      | 7               | 14| 16, 24, 32, ...
5. Count the number of items retained, based on the cognitive domains in the TOS, and compute the percentage retained per domain.
Section    | Remember (N / Retained) | Understand (N / Retained) | Apply (N / Retained)
A          | 4 / 1                   | 6 / 3                     | 10 / 3
B          | 5 / 3                   | 5 / 3                     | 4 / 2
C          | 3 / 2                   | 7 / 4                     | 6 / 3
Total      | 12 / 6                  | 18 / 10                   | 20 / 8
% Retained | 50%                     | 56%                       | 40%
Overall: 24/50 = 48%
• Realistically: do item analysis on your most important tests
• end-of-unit tests, final exams --> summative evaluation
• common exams with other teachers (departmentalized exams)
• common exams give you a bigger sample to work with, which is good
• and they let you make sure that questions other teachers prepared are working for your class
ITEM ANALYSIS is one area where even a lot of otherwise very good classroom teachers fall down:
• they think they're doing a good job;
• they think they're doing good evaluation;
• but without doing item analysis, they don't really know.
ITEM ANALYSIS is not an end in itself:
• there is no point unless you use it to revise items, and
• to help students on the basis of the information you get out of it.
END OF PRESENTATION...

THANK YOU FOR LISTENING...
HAVE A RELIABLE AND ENJOYABLE DAY...
