Report On Descriptive Statistics and Item Analysis

Report on Descriptive Statistics and Item Analysis of
Objective Test Items
Report on Descriptive Statistics and Item Analysis of Objective Test Items on
Data Extracted From the Grade 12 Final English Second Language Exam
2008.
by
Stephan Freysen
Prof. T Kuhn
CIA 722
7 April 2008
Acknowledgements
I would like to express extreme gratitude to the Gauteng Department of
Education for the professional and cooperative manner in which they dealt.
The data gathering for this report would have been far more gruelling
had it not been for the selfless assistance that Mr. Y Zafir and Ms. L Bongani
provided me with.
I would also like to thank Prof. Knoetze for tabulating the test data. Thanks
to Prof. Kuhn for setting up a template with formulas. It has been a great
help.
Descriptive Abstract
This report is written so that judgement can be passed on the reliability of
the multiple-choice test in the Grade 12 English Second Language final
exam.
Table of contents
Acknowledgements
Descriptive abstract
List of Tables
List of Figures
Terminology list
1. Introduction and purpose
2. Test analysis
3. Item analysis
4. Conclusion
Bibliography
Appendix A: Test Data
List of tables
Table 2.1: Tabulated Test Scores
Table 2.2: Measure of Central Tendency
Table 2.3: Frequency Distribution
Table 2.4: Test scores with pq values
Table 3.1: Item Difficulty Indices
Table 3.2: Item Discrimination Indices
List of Figures
Figure 2.1: Histogram of Frequency
Figure 2.2: Polygon of Frequency
Figure 2.3: Ogive of Frequency
Figure 4.1: Percentage of acceptability
Terminology List
Descriptive Statistics: The term used to refer to the mode, median and mean.
Difficulty Index: “Proportion of students who answered the item correctly.” Borich & Kubiszyn (2007: 205)
Discrimination Index: “Measure of the extent to which a test item discriminates or differentiates between students who do well on the overall test and those who do not do well on the overall test.” Borich & Kubiszyn (2007: 205)
Mean: The average of a set of numbers.
Median: The score that splits the distribution in half.
Mode: The score that appears most frequently in a set of scores.
Quantitative Item Analysis: “A numerical method for analyzing test items employing student response alternatives or options.”
Reliability: Refers to the internal consistency of a test.
Standard Deviation: “The estimate of variability that accompanies the mean in describing a distribution.”
1. Introduction
As we have all experienced, objective test items are a very popular tool for
testing knowledge. One of the most popular objective test item types is the
multiple-choice format. According to Borich & Kubiszyn (2007: 116),
multiple-choice items are unique in that they allow you to measure
knowledge at higher levels of Bloom’s taxonomy than other objective test
items. This presents a problem, as assessors often do not consider any
academic guidelines when setting these questions. As a result, the items
differ vastly from one another in difficulty and often present
unrealistic discrimination indices. Borich & Kubiszyn (2007: 205)
The purpose of this report is to analyse the multiple-choice test item data
that was extracted from the final English second language grammar exam of
2008. This will be achieved through analysis of the measures of central
tendency and variability of the data. The first part of the analysis will
examine the question (test) as a whole. The second part will consist of
individual item analysis.
The data comprise the answers that twenty-five learners gave to twenty
questions. This is a small sample group, but it should provide enough
information about the multiple-choice section of the exam to offer an
overview of the test’s reliability. The findings in the report will be used to
determine whether the multiple-choice test items in the exam were of
adequate and fair difficulty.
2. Test Analysis
Descriptive Test Analysis
In quantitative analysis, the first step is to tabulate the raw test scores.
According to Borich & Kubiszyn (2007: 204), this type of analysis is ideal
for multiple-choice tests.
Table 2.1 lists the test scores sorted in ascending order.
Learner Percentage of items correct
L19 20
L1 35
L17 40
L24 40
L7 45
L15 45
L22 45
L6 50
L21 50
L10 55
L18 60
L4 65
L8 65
L9 65
L23 65
L5 70
L12 70
L13 80
L14 80
L20 80
L25 80
L2 85
L3 85
L11 95
L16 95
Table 2.1: Tabulated Test Scores
Table 2.1 lets us identify the lower, middle and higher scores. Given the
40% cutoff for passing, only two learners failed this test, while eight
learners obtained a distinction.
The measures of central tendency for the test scores in table 2.1 can be
seen in table 2.2.
Mean    Median    Mode      Standard Deviation
62.6    65        65, 80    19.35
Table 2.2: Measure of Central Tendency
The distribution is bimodal, as the scores of 65% and 80% each occur four
times. Most scores are above the mean.
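The measures in table 2.2 can be recomputed directly from the scores in table 2.1. The sketch below uses Python's standard `statistics` module. Using the population standard deviation (`pstdev`) rather than the sample version is an assumption, chosen because it agrees with the reported value of 19.35.

```python
import statistics

# Percentage scores of the 25 learners, copied from table 2.1.
scores = [20, 35, 40, 40, 45, 45, 45, 50, 50, 55, 60, 65, 65, 65, 65,
          70, 70, 80, 80, 80, 80, 85, 85, 95, 95]

mean = statistics.mean(scores)
median = statistics.median(scores)
modes = statistics.multimode(scores)   # every score tied for most frequent
std_dev = statistics.pstdev(scores)    # population SD (assumption)

print(f"mean={mean:.1f} median={median} modes={modes} sd={std_dev:.2f}")
```

Working from the raw list keeps the check independent of any spreadsheet template.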
The next step is to group the scores in table 2.1 into intervals. This is done
in order to determine a simple frequency distribution. In table 2.3, one can
see the intervals, the lower and upper limits of the intervals, the frequency
and the cumulative frequency.
Lower limit    Upper limit    Mid value    Interval    Frequency    Cumulative frequency
20             26             23           20-26       1            1
27             33             30           27-33       0            1
34             40             37           34-40       3            4
41             47             44           41-47       3            7
48             54             51           48-54       2            9
55             61             58           55-61       2            11
62             68             65           62-68       4            15
69             75             72           69-75       2            17
76             82             79           76-82       4            21
83             89             86           83-89       2            23
90             96             93           90-96       2            25
Table 2.3: Frequency Distribution
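The grouping step can be sketched in a few lines of Python. The interval width of 7 and the starting value of 20 are read off the interval limits in table 2.3; the variable names are illustrative only.

```python
from itertools import accumulate

# Percentage scores of the 25 learners, copied from table 2.1.
scores = [20, 35, 40, 40, 45, 45, 45, 50, 50, 55, 60, 65, 65, 65, 65,
          70, 70, 80, 80, 80, 80, 85, 85, 95, 95]

START, WIDTH, N_INTERVALS = 20, 7, 11   # intervals 20-26, 27-33, ..., 90-96

rows = []
for i in range(N_INTERVALS):
    lower = START + i * WIDTH
    upper = lower + WIDTH - 1
    mid = (lower + upper) // 2
    freq = sum(lower <= s <= upper for s in scores)  # scores inside the interval
    rows.append((lower, upper, mid, freq))

frequencies = [row[3] for row in rows]
cumulative = list(accumulate(frequencies))           # running total of frequencies

for (lower, upper, mid, freq), cum in zip(rows, cumulative):
    print(f"{lower}-{upper}  mid={mid}  f={freq}  cf={cum}")
```

The final cumulative frequency should equal the number of learners, which is a quick sanity check on the table.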
Graphic Representation
The frequency can be graphically represented as follows:
Figure 2.1: Histogram of Frequency
In figure 2.1, we can see that one learner scored between 20% and 26%.
Three learners scored between 34% and 40%. As the cutoff for passing is
40%, this graph unfortunately does not show how many of those three
passed. Three more learners scored between 41% and 47%. Two learners
scored between 48% and 54%. Another two learners scored between 55%
and 61%. Four learners scored between 62% and 68%. Two learners scored
between 69% and 75%. Four learners scored between 76% and 82%. Four
learners scored above 82%, so at least four of the learners obtained
distinctions. If we consider table 2.3 once again, we can see that, although
the graph is accurate, the detail of the distribution is still unclear, because
each interval covers a range of seven percentage points.
Figure 2.2: Polygon of Frequency
In figure 2.2, the mid value of each interval is plotted on the
horizontal axis. The graph corresponds with figure 2.1, so we can trust
that the analysis based on figure 2.1 is reliable.
Figure 2.3: Ogive of Frequency
Figure 2.3 plots the cumulative frequency against the upper limits of the
intervals. This curve is also consistent with figures 2.2 and 2.1.
The distribution shown in all three graphs is mesokurtic and negatively
skewed. This implies that the sample group did well in the multiple-choice
test. According to Borich & Kubiszyn (2007: 257), there can be multiple
reasons for this: for example, the sample group might have been of high
intelligence, the test may have been too easy, or the time constraints for
the test may have been too lenient.
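Reliability Coefficient
“Another way of estimating the internal consistency of a test is through one
of the Kuder-Richardson methods.” Borich & Kubiszyn (2007: 321)
For the purpose of this analysis, we will use the KR-20 method, as it is the
more accurate way of determining the reliability of a test. Borich & Kubiszyn
(2007: 322)
The formula for this test is:
$$r = \frac{k}{k-1}\left(1 - \frac{\sum pq}{\sigma^2}\right)$$
where k is the number of items, p is the proportion of learners who answered
an item correctly, q = 1 - p, and σ² is the variance of the test scores (Var in
table 2.4). In table 2.4, Part 1 is k/(k-1), Part 2 is 1 - Σpq/Var, and r is
their product.
From the data found in table 2.4, we can determine the reliability coefficient.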
1 mark for each correct answer
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Total % Level
1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 1 7 35 L
1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 0 17 85 U
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 17 85 U
1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 0 13 65 U
1 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 0 14 70 U
1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 10 50 U
0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 0 9 45 L
1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0 1 0 13 65 U
1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 13 65 U
1 1 0 0 1 1 1 0 0 0 1 0 0 1 0 1 1 1 1 0 11 55 U
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 19 95 U
1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 0 1 0 14 70 U
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 16 80 U
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 16 80 U
1 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 9 45 L
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 19 95 U
0 1 0 0 1 0 1 0 1 0 1 0 1 1 1 0 0 0 0 0 8 40 L
1 1 0 1 1 0 1 0 0 0 1 1 0 1 1 1 1 0 1 0 12 60 U
0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 4 20 L
1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 0 16 80 U
1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 0 10 50 U
0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 0 9 45 L
1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0 1 0 13 65 U
1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 1 8 40 L
1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 16 80 U
p 0.84 0.88 0.68 0.48 0.84 0.68 0.44 0.521739 0.52 0.333333 0.92 0.76 0.6 0.84 0.8 0.916667 0.625 0.333333 0.52 0.64
q 0.16 0.12 0.32 0.52 0.16 0.32 0.56 0.478261 0.48 0.666667 0.08 0.24 0.4 0.16 0.2 0.083333 0.375 0.666667 0.48 0.36
pq 0.1344 0.1056 0.2176 0.2496 0.1344 0.2176 0.2464 0.249527 0.2496 0.222222 0.0736 0.1824 0.24 0.1344 0.16 0.076389 0.234375 0.222222 0.2496 0.2304 3.830336
Var 389.8333
Part 1 1.052632
Part 2 0.990174
r 1.042289
Table 2.4: Test scores with pq values
The answer, r = 1.042, lies above 1, outside the range of 0 to 1 within which
a reliability coefficient can be interpreted, so it cannot be taken to show
that the test is reliable. Since the KR-20 value falls outside this range by
only a small amount, the reliability is probably not far out, but the test is
still too easy.
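As a cross-check, KR-20 can be computed directly from a matrix of 0/1 item scores. The sketch below is a minimal illustration rather than the template used for table 2.4. It assumes the variance is taken over raw totals (number of items correct), so that it is on the same scale as Σpq, and it uses the population variance; textbooks differ on that last point. The `demo` data are hypothetical.

```python
def kr20(responses):
    """KR-20 for a list of learners, each a list of 0/1 item scores."""
    n = len(responses)        # number of learners
    k = len(responses[0])     # number of items

    # p = proportion of learners answering each item correctly; q = 1 - p
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(pi * (1 - pi) for pi in p)

    # Population variance of the raw totals (assumption: raw marks, not %)
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    variance = sum((t - mean_total) ** 2 for t in totals) / n

    return (k / (k - 1)) * (1 - sum_pq / variance)


# Hypothetical 4-learner, 3-item example
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(demo), 3))
```

Feeding the 25 rows of table 2.4 into `kr20` would give a coefficient on this raw-score basis.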
3. Item Analysis
Difficulty Index
Table 3.1 shows that, according to the difficulty indices, seven of the
twenty questions were unacceptable because they were too easy. These are
Questions 1, 2, 5, 11, 14, 15 and 16. Questions 6 and 12 were a bit easy
and the rest of the questions were of acceptable difficulty.
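Question    Difficulty    Rating
Q1          .84           Unacceptable (too easy)
Q2          .88           Unacceptable (too easy)
Q3          .68           Acceptable
Q4          .48           Acceptable
Q5          .84           Unacceptable (too easy)
Q6          .68           Easy
Q7          .44           Acceptable
Q8          .52           Acceptable
Q9          .52           Acceptable
Q10         .33           Acceptable
Q11         .92           Unacceptable (too easy)
Q12         .76           Easy
Q13         .60           Acceptable
Q14         .84           Unacceptable (too easy)
Q15         .80           Unacceptable (too easy)
Q16         .92           Unacceptable (too easy)
Q17         .62           Acceptable
Q18         .33           Acceptable
Q19         .52           Acceptable
Q20         .64           Acceptable
Table 3.1: Item Difficulty Indices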
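The difficulty index defined in the terminology list is simply the proportion of learners who chose the keyed answer. A minimal sketch follows; the function name and the four sample answer rows are hypothetical, and only the first three key letters (C, B, D) are taken from Appendix A.

```python
def difficulty_indices(key, answers):
    """Proportion of learners answering each item correctly."""
    n = len(answers)
    return [
        sum(row[i] == key[i] for row in answers) / n
        for i in range(len(key))
    ]


key = ["C", "B", "D"]            # first three key letters from Appendix A
answers = [                      # hypothetical responses from four learners
    ["C", "B", "D"],
    ["C", "B", "A"],
    ["C", "A", "B"],
    ["B", "B", "D"],
]

p_values = difficulty_indices(key, answers)
print(p_values)
```

This sketch divides by the full number of learners; how blank answers should be counted is left open here.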
Discrimination Index
In table 3.2, we can see that there are six items with a low discrimination
index. These items will have to be revised. It is also interesting to note
that the items with unacceptable difficulty indices largely coincide with
the items with unacceptable discrimination indices, and that the items with
acceptable difficulty indices coincide with those with acceptable
discrimination indices.
Question Discrimination Rating
Q1 0.16 Negative
Q2 0.12 Negative
Q3 0.32 Positive
Q4 0.52 Positive
Q5 0.16 Negative
Q6 0.32 Positive
Q7 0.56 Positive
Q8 0.48 Positive
Q9 0.48 Positive
Q10 0.67 Positive
Q11 0.08 Negative
Q12 0.24 Positive
Q13 0.40 Positive
Q14 0.16 Negative
Q15 0.20 Negative
Q16 0.08 Positive
Q17 0.38 Positive
Q18 0.67 Positive
Q19 0.48 Positive
Q20 0.36 Positive
Table 3.2: Item Discrimination Indices
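For reference, a discrimination index of the kind defined in the terminology list is commonly computed by comparing an upper-scoring and a lower-scoring group: D = (U - L) / n, where U and L are the numbers in each group who answered the item correctly and n is the size of one group. The sketch below assumes equal-sized groups taken from the top and bottom of the ranked totals; the function name and `demo` data are hypothetical.

```python
def item_discrimination(responses, item, group_size):
    """D = (upper-group correct - lower-group correct) / group size."""
    # Rank learners by total score, highest first
    ranked = sorted(responses, key=sum, reverse=True)
    upper = ranked[:group_size]
    lower = ranked[-group_size:]

    upper_correct = sum(row[item] for row in upper)
    lower_correct = sum(row[item] for row in lower)
    return (upper_correct - lower_correct) / group_size


# Hypothetical 0/1 item scores for four learners on three items
demo = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
d_values = [item_discrimination(demo, i, group_size=2) for i in range(3)]
print(d_values)
```

A positive D means that more high scorers than low scorers answered the item correctly; a negative D means the reverse.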
4. Conclusion
Figure 4.1: Percentage of acceptability
In this report on the 2008 Grade 12 English second language exam, the
conclusion can be drawn that the multiple-choice test was rather easy. The
analysis of the frequency distribution, standard deviation, discrimination
indices, difficulty indices and the reliability coefficient supports this
conclusion. Items 1, 2, 5, 11, 14 and 15 will need revision before this test
can be graded as reliable. As seen in figure 4.1, 76% of the test is
reliable and the other 24% of the test is too easy. The items mentioned
were all rather easy and are therefore not really suitable for a final
exam.
Bibliography
Borich, G. & Kubiszyn, T. (2007). Educational Testing and Measurement: Classroom Application and Practice. NJ: John Wiley & Sons, Inc.
Knoetze, J. (2007). Test Data. Retrieved April 1, 2008, from
http://www.jknoetze.co.za/CIA_722/testdata.xls
Appendix A: Test Data
Key C B D D B C D A C B A C B D A A C D B C
St No Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20
1 C B B A C D A D D A D A A A A C B D B
2 C B D D B D A A C B A C B D A A C D B C
3 C B D D B C D A C B A C B D A A C B D C
4 C B D B B C B A C B A C A D C A C B C C
5 C B D C B C B A C D A C B D A A A B B C
6 C A D D C C A D C D A C A D A A A B D C
7 B B A B B C B B D D A C B D C A A D D C
8 C B D B B C B D B C A C B D A A C A B A
9 C B D A B C D D B D A C B D A A C B D A
10 C B B A B C D C D C A B A D D A C D B C
11 C B D D B C D A C B A C B D A A C D B C
12 C B D D B C D D D A A C A D A A C B B D
13 C B D A B C D A C B A C B D A A A B B C
14 C B D A B C D A C B A C B D A A A B C
15 C B D D B B A A B D A C D A A C B B D D
16 C B D D B C D A C B A C B D A A C D B C
17 B B C C B A D D C A D B D A C A D
18 C B B D B A D D D D A C A D A A C B B C
19 D C A D B A B A D C C D A A D B B B A B
20 C B D D B C D A C A C D B D A A C D B C
21 C A D D C C A D C D A C A D A A A B D C
22 B B A B B C B B D D A C B D C A A D D C
23 C B D B B C B D B C A C B D A A C A B A
24 C B B A C D A D D A D A A A A C B D B
25 C B D D B D A A C B A C B D A A C D B C
