
RELIABILITY

PREPARED BY:

1. PUTRI AMIRAH BINTI MEGAT AZAMUDDIN (824728)
2. NURAQLILI HANUM BINTI ABDULLAH (824762)
3. BALQIS BINTI HASINI (824695)

Content Outline
• Concept of Reliability
• Methods of Assessing Reliability
  - Test-retest method
  - Parallel form method
  - Single administration
    o Split-half method
    o Kuder Richardson-20
    o Cronbach's alpha
  - Intra- & Inter-rater reliability
• Index of Reliability Coefficient
• Factors Affecting Reliability
  - Length of test
  - Range of ability
  - Scorer's Objectivity
  - Quality of test items
• Measurement Error
  - Sources of measurement error
    o Examinee
    o Examiner
    o Examination/Test
Concept of Reliability

CONCEPT OF RELIABILITY
• The degree of consistency among the information from the assessment.

• The score of a test is considered reliable when we get the same results repeatedly.

• Reliability is also discussed in terms of the degree of consistency between two measures of the
same thing (i.e., correlation) (Mehrens & Lehman, 1987).

• A measure of how stable, dependable, trustworthy, and consistent a test is in measuring the same thing each time (Worthen et al., 1993).

CONSISTENCY = RELIABILITY

CONCEPT OF RELIABILITY cont...
• We cannot expect assessment results (i.e., scores) to be perfectly consistent. There are numerous factors other than the quality being measured that may influence assessment results.

• In general, the more consistent our assessment results are from one measurement to another, the less error there will be and, consequently, the greater the reliability.

Methods of Assessing Reliability

TEST-RETEST METHOD
• The same test is administered twice to the same group of pupils with a given time interval between the two administrations of the test.
• A high correlation between the two sets of scores indicates that the scores are reliable.
• A retest interval of no more than six months gives an accurate index of reliability.
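A minimal sketch (in Python, with hypothetical scores; not from the slides) of how the test-retest coefficient can be computed as the correlation between the two administrations:

```python
# Test-retest reliability as the Pearson correlation between two
# administrations of the same test to the same pupils.
# Scores below are hypothetical, for illustration only.
from statistics import correlation  # Python 3.10+

first_admin = [12, 15, 9, 20, 17, 11, 14]    # scores at time 1
second_admin = [13, 14, 10, 19, 18, 10, 15]  # same pupils, scores at time 2

r = correlation(first_admin, second_admin)   # Pearson's r
print(f"test-retest reliability estimate: r = {r:.2f}")
```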

PARALLEL FORMS METHOD
• Two similar, or parallel, forms of the same test are administered to a group of examinees just once.
• The two parallel forms must be homogeneous, or similar in all respects, but not a duplication of test items.
• The reliability coefficient may be viewed as the correlation coefficient between the scores on the two equivalent forms of the test.
• The two equivalent forms should be as similar as possible in content, difficulty level, the mental processes tested, and other aspects.
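Computationally, the parallel-forms coefficient can be obtained with the same correlation sketch shown under the test-retest method, pairing each examinee's score on one form with their score on the other form instead of the two administration scores.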

SINGLE ADMINISTRATION: SPLIT-HALF METHOD
• In this method, a single test with homogeneous items is administered to a group of examinees and the test is split, or divided, into two equal halves.
• The correlation between the two halves is an estimate of the test score reliability.
• Easy and difficult items should be equally distributed between the two halves.
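A minimal sketch with hypothetical data (the Spearman-Brown correction shown at the end is a standard companion step, not named in the slides): the test is split into odd- and even-numbered items, the half scores are correlated, and the correction projects that half-test correlation to the full test length:

```python
# Split-half reliability: correlate odd- and even-item half scores, then
# apply the Spearman-Brown correction to estimate full-test reliability.
# The 0/1 item responses below are hypothetical.
from statistics import correlation  # Python 3.10+

responses = [        # one row of item scores per examinee
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]

odd_half = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
even_half = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...

r_half = correlation(odd_half, even_half)
r_full = (2 * r_half) / (1 + r_half)  # Spearman-Brown correction
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```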

SINGLE ADMINISTRATION: KUDER RICHARDSON-20
• Used for items with dichotomous answers.
• Use KR-20 if the Item Difficulty Index differs across items; use KR-21 if the index is the same for all items.
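A minimal sketch of the KR-20 computation (the slides name the method but give no formula; the formula and data below are illustrative additions):

```python
# Kuder-Richardson 20 for dichotomous (0/1) items:
# KR-20 = k/(k-1) * (1 - sum(p*q) / variance(total scores)),
# where p is each item's difficulty index and q = 1 - p.
from statistics import pvariance

responses = [        # hypothetical 0/1 responses, one row per examinee
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
k = len(responses[0])                        # number of items
n = len(responses)                           # number of examinees

totals = [sum(row) for row in responses]     # total score per examinee
p = [sum(row[i] for row in responses) / n for i in range(k)]  # difficulty
sum_pq = sum(pi * (1 - pi) for pi in p)

kr20 = (k / (k - 1)) * (1 - sum_pq / pvariance(totals))
print(f"KR-20 = {kr20:.2f}")
```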

SINGLE ADMINISTRATION: CRONBACH'S ALPHA
• This method is an extension of KR-20 to non-dichotomous items.
• It is also suitable for essay items, whose scores may span a large range of values.
• Ultimately, it can be used for both dichotomous and polytomous items.
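A minimal sketch of Cronbach's alpha (formula and data are illustrative additions, not from the slides); note that, unlike the KR-20 example, the items here are polytomous essay marks:

```python
# Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / variance(totals)).
# Works for polytomous items such as essay marks; data are hypothetical.
from statistics import pvariance

scores = [           # marks on 4 essay items, one row per examinee
    [4, 5, 3, 4],
    [3, 4, 3, 3],
    [5, 5, 4, 5],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]
k = len(scores[0])   # number of items

totals = [sum(row) for row in scores]
item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]

alpha = (k / (k - 1)) * (1 - sum(item_vars) / pvariance(totals))
print(f"Cronbach's alpha = {alpha:.2f}")
```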

INTER-RATER RELIABILITY
• Inter-rater reliability means that if two different raters score the scale using the same scoring rules, they should attain the same result.
• Inter-rater reliability is usually measured by computing the correlation coefficient between the scores of two raters for the same set of respondents.
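Computationally this is again the correlation sketch from the test-retest method: pair each respondent's score from rater 1 with the same respondent's score from rater 2 and compute Pearson's r.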

INTRA-RATER RELIABILITY
• The degree to which the same rater gives consistent estimates of the same measurement over time.
• Means that one person should come out with the same results on every repetition of the test, within an acceptable level.
• Consistency in measurement and scoring by the evaluator when two test results from two similar situations are correlated.

ADVANTAGES & DISADVANTAGES

Method                  | Advantage                                                   | Disadvantage
Test-retest             | Easy to use in most research settings.                      | No estimate of reliability until after the second test.
Parallel forms          | Shorter waiting period between tests than the test-retest method. | Requires two tests and at least two forms.
Split-half              | Only one test and administration needed.                    | Different subsections will affect test homogeneity, thereby reducing the reliability of test scores.
Inter-rater reliability | Provides useful feedback about areas of strength and weakness in student performance. | More possibilities for raters to disagree.
RECAP
What is the test-retest method?
What is the parallel forms method?
What is the single administration method? Discuss the three types mentioned.
What is inter- & intra-rater reliability?

Index of Reliability

INDEX OF RELIABILITY
• The reliability index can be calculated using the correlation coefficient between two comparable measurements obtained with the various methods above.
• The correlation coefficient ranges between -1.00 and +1.00.
• Normally the reliability index is positive, and for most tests an index between 0.65 and 0.90 is sufficient.
• A negative reliability index indicates inverse consistency: students who score high on the first test get a low score on the second test, and vice versa.

INDEX OF RELIABILITY cont...

Value of Correlation (r) | Item description
< 0.20                   | Very poor
0.21 – 0.40              | Poor
0.41 – 0.60              | Intermediate
0.61 – 0.80              | Good
0.81 – 1.00              | Very good

Mehrens and Lehmann (1991)

HOW HIGH SHOULD RELIABILITY BE?
• In general, we often use the Cronbach's alpha coefficient, which is somewhat similar to a correlation coefficient, although with a different computation and conceptualization.
• Reliability (Cronbach's alpha) >= 0.9 is quite good. Instruments used in high-stakes decisions typically require the observed scores to have a reliability of at least 0.9.
• Reliability (Cronbach's alpha) = 0.8 is often adequate in non-high-stakes situations.
• Reliability (Cronbach's alpha) < 0.7 is getting quite low.

Factors Affecting Reliability

FACTORS AFFECTING RELIABILITY
1. Length of Test
• A longer assessment procedure → higher reliability.
• A larger number of tasks → higher reliability.
• A larger number of items tends to lessen the influence of chance factors such as guessing (see the sketch below).
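The slides state the length-reliability principle without a formula; the standard Spearman-Brown prophecy formula (an addition here, with hypothetical numbers) quantifies it:

```python
# Spearman-Brown prophecy formula: predicted reliability of a test
# lengthened by a factor n, given current reliability r.
# (Illustrative addition; the slides give the principle, not this formula.)
def spearman_brown(r: float, n: float) -> float:
    return (n * r) / (1 + (n - 1) * r)

r_original = 0.60            # hypothetical reliability of a short test
for n in (1, 2, 3):          # same test at 1x, 2x, 3x length
    print(f"{n}x length -> predicted r = {spearman_brown(r_original, n):.2f}")
```

Doubling a 0.60-reliability test raises the predicted coefficient to 0.75, and tripling it to about 0.82, consistent with the bullets above.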

FACTORS AFFECTING RELIABILITY
2. Range of Ability
• The narrower the range of a group's ability, the lower the reliability coefficient tends to be.
• Homogeneous group → lower reliability coefficient.
• Heterogeneous group → higher reliability.
• Wider spread of scores → higher reliability.

FACTORS AFFECTING RELIABILITY
3. Scorer's Objectivity
• Measures without reference to outside influences.
• More objectively scored assessment → more reliable results.
• When test items are of the objective type, the resulting scores are not influenced by the examiner's/scorer's judgment or opinion.
• e.g., multiple-choice/true-false questions

FACTORS AFFECTING RELIABILITY
4. Quality of Test Items
• Items that perform better (are more highly discriminating) produce higher reliability.
• Narrow score distributions → low reliability.
(Figure: score distribution of a very difficult test.)
RECAP
What are the factors that may influence reliability?

Measurement Error

MEASUREMENT ERROR (E)
All test scores contain some error.

For any test, the higher the reliability estimate, the lower the error.

The error term includes all random errors that affect empirical scores.
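One standard way to make "higher reliability, lower error" concrete is the standard error of measurement from classical test theory, SEM = SD × √(1 − r); the formula and numbers below are illustrative additions, not from the slides:

```python
# Standard error of measurement (classical test theory):
# SEM = SD * sqrt(1 - reliability); higher reliability -> smaller SEM.
# SD and reliability values are hypothetical.
import math

sd = 10.0                          # standard deviation of observed scores
for r in (0.70, 0.80, 0.90):       # increasing reliability estimates
    sem = sd * math.sqrt(1 - r)
    print(f"reliability {r:.2f} -> SEM = {sem:.2f}")
```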

SOURCES OF MEASUREMENT ERROR

Examinee

Examiner

Examination/Test
SOURCE OF MEASUREMENT ERROR: EXAMINEE
CONDITION OF THE PERSON TAKING THE TEST:
• Sensitivity
• Health
• Motivation
• Mood/emotional state
• Ability to understand instructions
• Luck
SOURCE OF MEASUREMENT ERROR: EXAMINEE cont...
(Students' slides: Darshinee, Priya, Theivina, & Sathiya, A171)
Examinees' scores are affected by:
• Fatigue/sickness
• Poor motivation
• Anxiety/effects of memory
• Guessing
When the group of pupils being tested is homogeneous in ability, the reliability of the test scores is likely to be lowered.
SOURCE OF MEASUREMENT ERROR: EXAMINER
OBJECTIVITY OF SCORING
• Different scorers produce the same score if they apply the same scoring key.
• More objective scoring → more accurate score.
SOURCE OF MEASUREMENT ERROR: EXAMINATION
(Students' slides: Darshinee, Priya, Theivina, & Sathiya, A171)
TEST characteristics that introduce error:
• Test length. The longer a test is, the more reliable it is.
• Test-retest interval. The shorter the time interval between two administrations of a test, the less likely that changes will occur.
• Speed. Not every student is able to complete all of the items in a speed test.
• Item difficulty. Reliability will be low if a test is so easy or so difficult that every student gets most or all of the items right or wrong.
SOURCE OF MEASUREMENT ERROR:
EXAMINATION cont...
 TEST ADMINISTRATION PROCEDURES

Changes in time limits

Changes in test instructions/directions

Qualities of test administrator

Temporal influence (see next slide)

Differences in observation
SOURCE OF MEASUREMENT ERROR: EXAMINATION cont...
TEST ADMINISTRATION PROCEDURES cont... TEMPORAL INFLUENCES
• TEMPORAL STABILITY – scores should fluctuate very little over a reasonably brief time interval.
• TEMPORAL RELIABILITY – estimated by the correlation between scores on the two trials.
SOURCE OF MEASUREMENT ERROR: EXAMINATION cont...
(Students' slides: Darshinee, Priya, Theivina, & Sathiya, A171)
TEST ADMINISTRATION conditions that introduce error:
• Light levels & temperature
• Ventilation
• Noise level
• Distraction (in exam hall/computer breakdown)
SOURCE OF MEASUREMENT ERROR: EXAMINATION cont...
TEST DEVELOPMENT: SAMPLING OF CONTENT
• A teacher cannot really construct two forms of a test that are independent of each other.
• Another teacher's test usually would differ even more.
• If the test plan is fairly detailed and followed carefully, then the content sampling for an objective test with a large number of items should be reasonably adequate.
(Figure: a sample of questions drawn from the set of all possible questions.)
RECAP
What is measurement error?
What are the sources of measurement error?

Recap
• Concept of Reliability
• Methods of Assessing Reliability
  - Test-retest method
  - Parallel form method
  - Single administration
    o Split-half method
    o Kuder Richardson-20
    o Cronbach's alpha
  - Intra- & Inter-rater reliability
• Index of Reliability Coefficient
• Factors Affecting Reliability
  - Length of test
  - Range of ability
  - Scorer's Objectivity
  - Quality of test items
• Measurement Error
  - Sources of measurement error
    o Examinee
    o Examiner
    o Examination/Test
THANK YOU

