Microsoft Word - Report On Descriptive Statistics and Item Analysis

Report on Descriptive Statistics and Item Analysis of
Objective Test Items

Report on Descriptive Statistics and Item Analysis of Objective Test Items on
Data Extracted From the Grade 12 Final English Second Language Exam
2008.
by
Stephan Freysen
Prof. T Kuhn
CIA 722
7 April 2008
Acknowledgements
I would like to express extreme gratitude to the Gauteng Department of
Education for their professional and cooperative manner.
The data-gathering for this report would have been far more gruelling
had it not been for the selfless assistance that Mr. Y Zafir and Ms. L Bongani
provided me with.
I would also like to thank Prof. Knoetze for tabulating the test data. Thanks
to Prof. Kuhn for setting up a template with formulas. It has been a great
help.
Descriptive Abstract
This report is written so that judgement can be passed on the reliability of
the multiple-choice test in the Grade 12 English second language final
exam.
Table of contents
Acknowledgements
Descriptive abstract
List of Tables
List of Figures
Terminology list
1. Introduction and purpose
2. Test analysis
2.1 Descriptive Test Analysis
2.2 Graphic Representation
2.3 Reliability Coefficient
3. Item analysis
3.1 Difficulty Index
3.2 Discrimination Index
4. Conclusion
Bibliography
Appendix A: Test Data
List of tables
Table 2.1: Tabulated Test Scores
Table 2.2: Measure of Central Tendency
Table 2.3: Frequency Distribution
Table 2.4: Test scores with pq values
Table 3.1: Item Difficulty Indices
Table 3.2: Item Discrimination Indices
List of Figures
Figure 2.1: Histogram of Frequency
Figure 2.2: Polygon of Frequency
Figure 2.3: Ogive of Frequency
Figure 4.1: Percentage of acceptability
Terminology List
Descriptive Statistics: The term used to refer to the mode, median and mean.
Difficulty Index: “Proportion of students who answered the item correctly.” Borich & Kubiszyn (2007: 205)
Discrimination Index: “Measure of the extent to which a test item discriminates or differentiates between students who do well on the overall test and those who do not do well on the overall test.” Borich & Kubiszyn (2007: 205)
Mean: The average of a set of numbers.
Median: The score that splits the distribution in half. Borich & Kubiszyn (2007: 259)
Mode: The score that appears most frequently in a set of scores. Borich & Kubiszyn (2007: 264)
Quantitative Item Analysis: “A numerical method for analyzing test items employing student response alternatives or options.” Borich & Kubiszyn (2007: 205)
Reliability: Refers to the internal consistency of a test. Borich & Kubiszyn (2007: 318)
Standard Deviation: “The estimate of variability that accompanies the mean in describing a distribution.” Borich & Kubiszyn (2007: 272)
1. Introduction
As we have all experienced, objective test items are a very popular tool for testing
knowledge. One of the most popular objective test item types is the multiple-choice
format. According to Borich & Kubiszyn (2007: 116), the uniqueness of
multiple-choice items is that they allow you to measure knowledge at
higher levels in Bloom’s taxonomy than other objective test items. This presents a
problem, as assessors often do not consider any academic guidelines when setting these
questions. As a result, the items differ vastly from one another in
difficulty and often present unrealistic discrimination indices.
The purpose of this report is to analyse the multiple-choice test item data that was
extracted from the final English second language grammar exam of 2008. This will
be achieved through analysis of the measures of central tendency and variability of
the data. The first part of the analysis considers the test as a whole. The second
part consists of individual item analysis.
The data comprises the answers of twenty-five learners to twenty questions.
This is a small sample group, but it should provide enough critique on the
multiple-choice section of the exam to offer a detailed overview of the test’s
reliability. The findings in the report will be used to determine whether the
multiple-choice test items in the exam were of adequate and fair difficulty.
2. Test Analysis
2.1 Descriptive Test Analysis
In quantitative analysis, the first step is to tabulate the raw test scores. According
to Borich & Kubiszyn (2007: 204), this type of analysis is ideal for multiple-choice
tests.
Table 2.1 lists the test scores in ascending order.
Table 2.1: Tabulated Test Scores
Learner Percentage of items correct
L19 15
L1 30
L17 35
L24 40
L7 45
L15 50
L22 50
L6 55
L21 55
L10 60
L18 65
L4 65
L8 65
L9 65
L23 70
L5 70
L12 75
L13 85
L14 85
L20 85
L25 85
L2 90
L3 90
L11 100
L16 100
Table 2.1 lets us identify the lower, middle and higher scores. Considering the
40% cut-off for passing, only three students failed this test, while eight students
obtained a distinction.
The measures of central tendency for the test scores in table 2.1 can be seen in
table 2.2.
Table 2.2: Measure of Central Tendency
Mean    Median    Mode      Standard Deviation
65.2    65        65, 85    21.7
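The figures in table 2.2 can be checked directly against the scores in table 2.1. The following is a minimal sketch using Python's standard `statistics` module; the use of the population standard deviation (dividing by N) is an assumption, chosen because it reproduces the reported 21.7.

```python
import statistics

# Percentage scores from table 2.1, in ascending order.
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60, 65, 65, 65, 65,
          70, 70, 75, 85, 85, 85, 85, 90, 90, 100, 100]

mean = statistics.mean(scores)
median = statistics.median(scores)
# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(scores)
# Population standard deviation (divides by N); an assumption, see the lead-in.
std_dev = statistics.pstdev(scores)

print(f"mean={mean:.1f} median={median} modes={modes} sd={std_dev:.1f}")
```

Using `multimode` rather than `mode` matters here, because more than one score shares the highest frequency.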
Scores of 65% and 85% each occur four times, which makes the distribution
bimodal. Eleven of the twenty-five scores lie above the mean. The next step is to
group the scores in table 2.1 into intervals in order to determine a simple frequency
distribution. Table 2.3 shows the intervals, the lower and upper limits of
the intervals, the frequency and the cumulative frequency.
Table 2.3: Frequency Distribution
Learner  Scores  Lower limit  Upper limit  Mid value  Interval  Frequency  Cumulative frequency
L19      15      15           22           18.5       15-22     1          1
L1       30      23           30           26.5       23-30     1          2
L17      35      31           38           34.5       31-38     1          3
L24      40      39           46           42.5       39-46     2          5
L7       45      47           54           50.5       47-54     2          7
L15      50      55           62           58.5       55-62     3          10
L22      50      63           70           66.5       63-70     6          16
L6       55      71           78           74.5       71-78     1          17
L21      55      79           86           82.5       79-86     4          21
L10      60      87           94           90.5       87-94     2          23
L18      65      95           102          98.5       95-102    2          25
L4       65
L8       65
L9       65
L23      70
L5       70
L12      75
L13      85
L14      85
L20      85
L25      85
L2       90
L3       90
L11      100
L16      100
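The grouping in table 2.3 can be sketched in a few lines. The interval width of 8 and the starting lower limit of 15 are read off the table; the variable names are illustrative only.

```python
# Percentage scores from table 2.1.
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60, 65, 65, 65, 65,
          70, 70, 75, 85, 85, 85, 85, 90, 90, 100, 100]

WIDTH = 8    # interval width read off table 2.3
START = 15   # lower limit of the first interval

rows = []
cumulative = 0
for lower in range(START, 96, WIDTH):   # lower limits 15, 23, ..., 95
    upper = lower + WIDTH - 1
    frequency = sum(lower <= s <= upper for s in scores)
    cumulative += frequency
    mid_value = (lower + upper) / 2
    rows.append((lower, upper, mid_value, frequency, cumulative))

for lower, upper, mid_value, frequency, cumulative in rows:
    print(f"{lower}-{upper}  mid={mid_value}  f={frequency}  cf={cumulative}")
```

The last cumulative frequency should equal the number of learners, which is a quick sanity check on the grouping.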
2.2 Graphic Representation
In Figure 2.1, we can see that one learner scored between 20% and 26%. Three
learners scored between 34% and 47%. As the cut-off for passing is 40%, this
graph shows that between one and three of these students passed. Two more
learners scored between 41% and 47% and another two learners scored between
48% and 54%. Another three learners scored between 55% and 62%. Six learners
scored between 63% and 70%. One learner scored between 71% and 78%. Four
learners scored between 79% and 86% and four learners scored above that.
Between four and eight of the learners achieved distinctions. If we consider table
2.3 once again, we can see that although the graph is accurate, the detail of the
distribution is still unclear, due to the large gap in scores implied by the intervals.
[Figure: Frequency Histogram. Horizontal axis: Intervals (20-26, 27-33, 34-40, 41-47, 48-54, 55-62, 63-70, 71-78, 79-86, 87-94, 95-102); vertical axis: frequency, 0 to 7.]
Figure 2.1: Histogram of Frequency
In figure 2.2, the average of the interval is depicted on the horizontal axis. We can
see that the graph correlates with figure 2.1 and can thus trust that the data
analysis done in figure 2.1 is reliable.
[Figure: Frequency Polygon. Horizontal axis: Middle Values (0 to 120); vertical axis: f (0 to 7); Series1 with a linear trend line.]
Figure 2.2: Polygon of Frequency
Figure 2.3 concentrates on the upper values of the intervals. This curve also
correlates with figures 2.2 and 2.1.
[Figure: Frequency Ogive. Horizontal axis: Upper Values (0 to 120); vertical axis: f (0 to 7). Plotted points (upper value, f): (22, 1), (30, 1), (38, 1), (46, 2), (54, 2), (62, 3), (70, 6), (78, 1), (86, 4), (94, 2), (102, 2).]
Figure 2.3: Ogive of Frequency
All three graphs are leptokurtic and negatively skewed. This implies that the
sample group did well in the multiple-choice test. According to Borich &
Kubiszyn (2007: 257), there can be multiple reasons for this: the sample group
might have been of high intelligence, the test may have been too easy, or the
time constraints for the test may have been too lenient.
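The direction of the skew can also be checked numerically rather than read off the graphs. The sketch below computes the moment coefficient of skewness for the scores in table 2.1; the choice of this particular coefficient (third central moment divided by the cubed population standard deviation) is an assumption, as the report does not name one.

```python
import statistics

# Percentage scores from table 2.1.
scores = [15, 30, 35, 40, 45, 50, 50, 55, 55, 60, 65, 65, 65, 65,
          70, 70, 75, 85, 85, 85, 85, 90, 90, 100, 100]

n = len(scores)
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)   # population standard deviation

# Moment coefficient of skewness: third central moment over sd cubed.
skewness = sum((s - mean) ** 3 for s in scores) / (n * sd ** 3)

# A value below zero indicates a negative skew (longer tail of low scores).
print(f"skewness = {skewness:.3f}")
```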
2.3 Reliability Coefficient.
“Another way of estimating the internal consistency of a test is through one of the
Kuder-Richardson methods.” Borich & Kubiszyn (2007: 321)
For the purpose of this analysis, we will use the KR20 method, as it is the more
accurate way of determining the reliability of a test (Borich & Kubiszyn, 2007:
322).
The formula for this test is:
$$KR_{20} = \frac{k}{k-1}\left(1 - \frac{\sum pq}{\sigma^{2}}\right)$$
From the data found in table 2.4, we can determine the reliability coefficient.
$$KR_{20} = \frac{20}{20-1}\left(1 - \frac{2.830336}{240668}\right)$$
$$KR_{20} = 1.05 \times (-0.0000118)$$
$$KR_{20} = -0.00001239$$
The answer is a negative value, which can be interpreted to mean that the test is not
reliable. Since the KR20 is equal to a very small negative amount, it is safe to
assume that the reliability is not far out, but the test is still too easy.
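As a cross-check on the hand calculation, the KR20 formula can be evaluated directly from table 2.4. This is a minimal sketch under two stated assumptions: the variance is the sample variance of the raw totals out of 20 (so that it is on the same scale as the sum of pq), and the p values are taken as listed in row p of the table. The result therefore need not agree with the figure obtained above, which uses a different variance term.

```python
import statistics

# Proportion of correct answers per item, Q1 to Q20 (row p of table 2.4).
p_values = [0.84, 0.88, 0.68, 0.48, 0.84, 0.68, 0.44, 0.521739, 0.52, 0.333333,
            0.92, 0.76, 0.60, 0.84, 0.80, 0.916667, 0.625, 0.333333, 0.52, 0.64]

# Raw totals out of 20 (column Total of table 2.4), learners L1 to L25.
totals = [6, 18, 18, 14, 15, 11, 10, 13, 13, 12, 20, 14, 17,
          17, 9, 20, 8, 13, 3, 17, 11, 10, 13, 7, 17]


def kr20(p_values, totals):
    """Kuder-Richardson 20: (k / (k - 1)) * (1 - sum(pq) / variance)."""
    k = len(p_values)
    sum_pq = sum(p * (1 - p) for p in p_values)
    # Assumption: sample variance of raw totals, not of percentages.
    variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - sum_pq / variance)


print(f"KR20 = {kr20(p_values, totals):.3f}")
```

Keeping the variance and the sum of pq in the same units matters: a variance computed on percentages is 25 times larger than one computed on raw totals out of 20.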
Based on the diminutive magnitude of the answer to the KR20, the KR21 method
was used as well to verify the reliability of the test.
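$$KR_{21} = \frac{k}{k-1}\left(1 - \frac{M(k-M)}{k\sigma^{2}}\right)$$
$$KR_{21} = \frac{20}{20-1}\left(1 - \frac{65.2(20-65.2)}{434^{2}}\right)$$
$$KR_{21} = \frac{20}{19}\left(1 - \frac{65.2(45.2)}{188356}\right)$$
$$KR_{21} = 1.05\left(\frac{64.2(45.2)}{188356}\right)$$
$$KR_{21} = 0.015$$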
Since the outcome of this formula is a positive value, it complicates the decision of
whether the test is acceptable or not. The reason for this contradiction may lie
in the fact that the KR20 is more accurate than the KR21 (Borich & Kubiszyn, 2007:
322). Since both formulas provide small answers, it is probably safe to assume
that the test lies on the border of reliability. We therefore need to
analyse the difficulty and discrimination indices of each item individually.
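The KR21 formula needs only the number of items, the mean and the variance, so it can be sketched even more briefly. As an assumption, the mean and sample variance below are computed from the raw totals out of 20 in table 2.4 rather than from percentages, so the outcome need not match the hand calculation above.

```python
import statistics

# Raw totals out of 20 (column Total of table 2.4), learners L1 to L25.
totals = [6, 18, 18, 14, 15, 11, 10, 13, 13, 12, 20, 14, 17,
          17, 9, 20, 8, 13, 3, 17, 11, 10, 13, 7, 17]


def kr21(totals, k):
    """Kuder-Richardson 21: (k / (k - 1)) * (1 - M(k - M) / (k * variance))."""
    mean = statistics.mean(totals)
    # Assumption: sample variance of raw totals, on the same scale as k.
    variance = statistics.variance(totals)
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * variance))


print(f"KR21 = {kr21(totals, k=20):.3f}")
```

KR21 treats all items as equally difficult, which is why it generally gives a lower estimate than KR20 on the same data.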
Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20 Total % Level
L1 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 0 0 6 30 L
L2 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 18 90 U
L3 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 18 90 U
L4 1 1 1 0 1 1 0 1 1 1 1 1 0 1 0 1 1 0 0 1 14 70 U
L5 1 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 1 1 15 75 U
L6 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 1 11 55 U
L7 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 1 10 50 U
L8 1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0 1 0 13 65 U
L9 1 1 1 0 1 1 1 0 0 0 1 1 1 1 1 1 1 0 0 0 13 65 U
L10 1 1 0 0 1 1 1 0 0 0 1 0 0 1 0 1 1 1 1 1 12 60 U
L11 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 20 100 U
L12 1 1 1 1 1 1 1 0 0 0 1 1 0 1 1 1 1 0 1 0 14 70 U
L13 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 17 85 U
L14 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 17 85 U
L15 1 1 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 0 9 45 L
L16 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 20 100 U
L17 0 1 0 0 1 0 1 0 1 0 1 0 1 1 1 0 0 0 0 0 8 40 L
L18 1 1 0 1 1 0 1 0 0 0 1 1 0 1 1 1 1 0 1 1 13 65 U
L19 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 3 15 L
L20 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 1 1 1 17 85 U
L21 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 0 0 0 1 11 55 U
L22 0 1 0 0 1 1 0 0 0 0 1 1 1 1 0 1 0 1 0 1 10 50 L
L23 1 1 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0 1 0 13 65 U
L24 1 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 1 0 1 0 7 35 L
L25 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 0 1 17 85 U
p 0.84 0.88 0.68 0.48 0.84 0.68 0.44 0.521739 0.52 0.333333 0.92 0.76 0.6 0.84 0.8 0.916667 0.625 0.333333 0.52 0.64
q 0.16 0.12 0.32 0.52 0.16 0.32 0.56 0.478261 0.48 0.666667 0.08 0.24 0.4 0.16 0.2 0.083333 0.375 0.666667 0.48 0.36
pq 0.1344 0.1056 0.2176 0.2496 0.1344 0.2176 0.2464 0.249527 0.2496 0.222222 0.0736 0.1824 0.24 0.1344 0.16 0.076389 0.234375 0.222222 0.2496 0.2304 3.830336
Var 490.5833
Part 1 1.052632
Part 2 0.993963
Table 2.4: Test scores with pq values
3. Item Analysis
3.1 Difficulty Index
When considering table 3.1, we find that the difficulty indices demonstrate that
seven of the twenty questions were unacceptable because they were too easy.
These include Questions 1, 2, 5, 11, 14, 15 and 16. Questions 6 and 12 were a bit
easy and the rest of the questions were of acceptable difficulty.
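Table 3.1: Item Difficulty Indices
Question Difficulty Rating
Q1 .84 Unacceptable (too easy)
Q2 .88 Unacceptable (too easy)
Q3 .68 Acceptable
Q4 .48 Acceptable
Q5 .84 Unacceptable (too easy)
Q6 .68 Easy
Q7 .44 Acceptable
Q8 .52 Acceptable
Q9 .52 Acceptable
Q10 .33 Acceptable
Q11 .92 Unacceptable (too easy)
Q12 .76 Easy
Q13 .60 Acceptable
Q14 .84 Unacceptable (too easy)
Q15 .80 Unacceptable (too easy)
Q16 .92 Unacceptable (too easy)
Q17 .62 Acceptable
Q18 .33 Acceptable
Q19 .52 Acceptable
Q20 .64 Acceptable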
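The difficulty index follows directly from its definition in the terminology list: the proportion of learners who answered the item correctly. The sketch below applies it to two columns of table 2.4 (Q1 and Q4); the function name is illustrative, and blank answers are simply counted as incorrect here, which is an assumption.

```python
# 1 = correct, 0 = incorrect, learners L1 to L25 (columns Q1 and Q4 of table 2.4).
responses = {
    "Q1": [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
           1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1],
    "Q4": [0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
           0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1],
}


def difficulty_index(item_scores):
    """Proportion of learners who answered the item correctly."""
    return sum(item_scores) / len(item_scores)


for question, item_scores in responses.items():
    print(f"{question}: p = {difficulty_index(item_scores):.2f}")
```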
3.2 Discrimination Index
In table 3.2, we can see that there are six items with a low discrimination index.
These items will have to be revised. It is also rather interesting to note the
correlation between the unacceptable difficulty indices and the unacceptable
discrimination indices as well as the correlation between the acceptable difficulty
indices and the acceptable discrimination indices.
Table 3.2: Item Discrimination Indices
Question Discrimination Rating
Q1 0.16 Negative
Q2 0.12 Negative
Q3 0.32 Positive
Q4 0.52 Positive
Q5 0.16 Negative
Q6 0.32 Positive
Q7 0.56 Positive
Q8 0.48 Positive
Q9 0.48 Positive
Q10 0.67 Positive
Q11 0.08 Negative
Q12 0.24 Positive
Q13 0.40 Positive
Q14 0.16 Negative
Q15 0.20 Negative
Q16 0.08 Positive
Q17 0.38 Positive
Q18 0.67 Positive
Q19 0.48 Positive
Q20 0.36 Positive
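For illustration, one common way of expressing discrimination is the difference between the proportion of the upper group and the proportion of the lower group who answered an item correctly. The sketch below applies that formulation to Q1, using the Level column of table 2.4 (U or L) to form the groups. It is an assumption that this formulation suits the data; it is not claimed to be the procedure behind table 3.2, so the values need not match.

```python
# Q1 answers (1 = correct) for learners L1 to L25, from table 2.4.
q1 = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,
      1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1]

# Level column of table 2.4 for L1 to L25: U = upper group, L = lower group.
levels = list("L" + "U" * 13 + "LULULUULULU")


def discrimination_index(item_scores, levels):
    """Proportion correct in the upper group minus that in the lower group."""
    upper = [s for s, lvl in zip(item_scores, levels) if lvl == "U"]
    lower = [s for s, lvl in zip(item_scores, levels) if lvl == "L"]
    return sum(upper) / len(upper) - sum(lower) / len(lower)


d_q1 = discrimination_index(q1, levels)
print(f"Q1: D = {d_q1:.2f}")
```

Proportions are used instead of raw counts because the upper and lower groups in table 2.4 are not the same size.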
4. Conclusion
[Figure: Pie chart with the labels Reliability and Acceptable; segments of 24% and 76%.]
Figure 4.1: Percentage of acceptability
In this report on the 2008 Grade 12 English second language exam, the
assumption can be made that the multiple-choice test was rather easy. The
thorough analysis of the frequency, standard deviation, discrimination indices,
difficulty indices and the reliability coefficient supports this assumption.
Items 1, 2, 5, 11, 14 and 15 will need revision so that this test may be graded as
reliable. Consider that 76% of the test, as seen in figure 4.1, is reliable and the
other 24% of the test is too easy. The questions mentioned were all rather easy
and therefore not really suitable for a final exam.
Bibliography
Borich, G. & Kubiszyn, T. (2007). Educational Testing and Measurement: Classroom Application and Practice. NJ: John Wiley & Sons, Inc.
Knoetze, J. (2007). Test Data. Retrieved April 1, 2008, from http://www.jknoetze.co.za/CIA_722/testdata.xls
Appendix A: Test Data
Key C B D D B C D A C B A C B D A A C D B C
St No Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16 Q17 Q18 Q19 Q20
1 C B B A C D A D D A D A A A A C B D B
2 C B D D B D A A C B A C B D A A C D B C
3 C B D D B C D A C B A C B D A A C B D C
4 C B D B B C B A C B A C A D C A C B C C
5 C B D C B C B A C D A C B D A A A B B C
6 C A D D C C A D C D A C A D A A A B D C
7 B B A B B C B B D D A C B D C A A D D C
8 C B D B B C B D B C A C B D A A C A B A
9 C B D A B C D D B D A C B D A A C B D A
10 C B B A B C D C D C A B A D D A C D B C
11 C B D D B C D A C B A C B D A A C D B C
12 C B D D B C D D D A A C A D A A C B B D
13 C B D A B C D A C B A C B D A A A B B C
14 C B D A B C D A C B A C B D A A A B C
15 C B D D B B A A B D A C D A A C B B D D
16 C B D D B C D A C B A C B D A A C D B C
17 B B C C B A D D C A D B D A C A D
18 C B B D B A D D D D A C A D A A C B B C
19 D C A D B A B A D C C D A A D B B B A B
20 C B D D B C D A C A C D B D A A C D B C
21 C A D D C C A D C D A C A D A A A B D C
22 B B A B B C B B D D A C B D C A A D D C
23 C B D B B C B D B C A C B D A A C A B A
24 C B B A C D A D D A D A A A A C B D B
25 C B D D B D A A C B A C B D A A C D B C
