Validity and Reliability
Chapter 6
© 2008 The McGraw-Hill Companies, Inc. All rights reserved.
Validity
Validity has been defined as referring to the appropriateness,
correctness, meaningfulness, and usefulness of the specific
inferences researchers make based on the data they collect.
It is the most important idea to consider when preparing or
selecting an instrument.
Validation is the process of collecting and analyzing evidence
to support such inferences.
Evidence of Validity
There are three types of evidence of validity:
Content-related evidence of validity
Content and format of the instrument.
Criterion-related evidence of validity
Relationship between scores obtained using the instrument
and scores obtained using one or more instruments or
measures (criterion).
Construct-related evidence of validity
Characteristics being measured by the instrument.
Content-Related Evidence
A key element is whether the instrument contains an adequate sample of the domain of content it is supposed to represent.
The other aspect of content validation is the format of
the instrument (clarity of printing, size of type, adequacy
of work space, appropriateness of language, clarity of
directions).
Attempts to obtain evidence that the items measure what they are supposed to measure typify the process of gathering content-related evidence.
To assess students' mathematics ability, suppose a math test of about 15 problems is given.
Performance on the instrument (test) will provide valid evidence of the students' mathematics ability only if the instrument contains an adequate sample of the types of problems in the domain being tested.
If the test includes only easy problems, only very difficult or lengthy ones, or only problems involving subtraction, it will be unrepresentative and hence will not provide information from which valid inferences can be made.
Criterion-Related Evidence
A criterion is a second test presumed to measure the
same variable.
There are two forms of criterion-related validity:
1) Predictive validity: time interval elapses between
administering the instrument and obtaining criterion scores.
2) Concurrent validity: instrument data and criterion data are
gathered and compared at the same time.
A correlation coefficient (r) indicates the degree of relationship between the scores obtained by the same individuals on two instruments.
A positive relationship is indicated when a high score on
one of the instruments is accompanied by a high score on
the other or when a low score on one is accompanied by a
low score on the other.
A negative relationship is indicated when a high score on
one instrument is accompanied by a low score on the
other and vice versa.
All correlation coefficients fall between +1.00 and -1.00.
An r of 0.00 indicates that no relationship exists.
When a correlation coefficient describes the relationship between a set of scores obtained by the same group of individuals on a particular instrument and their scores on some criterion measure, it is called a validity coefficient.
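To make this concrete, here is a minimal Python sketch (all scores hypothetical) of computing a validity coefficient as the Pearson correlation between instrument scores and criterion scores; the pearson_r helper is reused in the sketches below:

    import numpy as np

    def pearson_r(x, y):
        # Pearson correlation coefficient between two sets of scores.
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        return float(np.corrcoef(x, y)[0, 1])

    instrument = [12, 15, 9, 20, 14, 17, 11, 18]   # hypothetical instrument scores
    criterion  = [60, 72, 48, 90, 65, 80, 55, 85]  # hypothetical criterion scores
    print(f"validity coefficient r = {pearson_r(instrument, criterion):.2f}")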
Construct-Related Evidence
Considered the broadest of the three categories.
There is no single piece of evidence that satisfies
construct-related validity.
Researchers attempt to collect a variety of types of
evidence, including both content-related and
criterion-related evidence.
The more evidence researchers have from different
sources, the more confident they become about the
interpretation of the instrument.
Illustration of Types of Evidence of Validity (figure)
Reliability
Refers to the consistency of scores or answers
provided by an instrument.
Scores obtained from an instrument can be reliable but not valid.
Ideally, an instrument should be both reliable and valid for the context in which it is used.
Reliability and Validity (figure)
Reliability of Measurement (figure)
Errors of Measurement
Whenever people take the same test twice, they will
seldom perform exactly the same.
Because errors of measurement are always present to some degree, variation in test scores is common.
This variation may be due to:
Differences in motivation
Energy level
Anxiety
Differences in the testing situation
Reliability Coefficient
Expresses the relationship between scores obtained from the same instrument at two different times, or between scores on two parts of the same instrument.
The three best-known methods are:
Test-retest
Equivalent-forms method
Internal-consistency method
Test-Retest Method
Involves administering the same test twice to the same
group after a certain time interval has elapsed.
A reliability coefficient is calculated to indicate the
relationship between the two sets of scores.
Reliability coefficients are affected by the lapse of time
between the administrations of the test.
An appropriate time interval should be selected.
In educational research, stability of scores over a two-month period is usually considered sufficient evidence of test-retest reliability.
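As a sketch, test-retest reliability is simply the correlation between the two administrations; reusing the hypothetical pearson_r helper from the earlier example:

    first_admin  = [12, 15, 9, 20, 14, 17, 11, 18]   # hypothetical first administration
    second_admin = [13, 14, 10, 19, 15, 16, 12, 17]  # hypothetical retest two months later
    print(f"test-retest reliability = {pearson_r(first_admin, second_admin):.2f}")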
Equivalent-Forms Method
Two different but equivalent (alternate or parallel) forms of
an instrument are administered to the same group during
the same time period.
A reliability coefficient is then calculated between the two
sets of scores.
It is possible to combine the test-retest and equivalent-forms methods by administering two different forms of the test with a time interval between the two administrations.
Internal-Consistency Methods
There are several internal-consistency methods that require only
one administration of an instrument.
Split-half Procedure: involves scoring two halves (usually odd
items versus even items) of a test separately for each subject and
calculating the correlation coefficient between the two scores.
The coefficient indicates the degree to which the two halves of the
test provide the same results and hence describes the internal
consistency of the test.
The reliability coefficient for the full test is then calculated using the Spearman-Brown prophecy formula:

\[ \text{reliability of scores on total test} = \frac{2 \times \text{reliability for } \tfrac{1}{2}\text{ test}}{1 + \text{reliability for } \tfrac{1}{2}\text{ test}} \]
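A minimal Python sketch of the split-half procedure, assuming a hypothetical matrix of item scores (rows = subjects, columns = items):

    import numpy as np

    def split_half_reliability(item_scores):
        # Score the odd and even items separately for each subject,
        # correlate the two half-test totals, then apply the
        # Spearman-Brown prophecy formula to estimate full-test reliability.
        scores = np.asarray(item_scores, dtype=float)
        odd_total = scores[:, 0::2].sum(axis=1)    # items 1, 3, 5, ...
        even_total = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6, ...
        r_half = np.corrcoef(odd_total, even_total)[0, 1]
        return 2 * r_half / (1 + r_half)           # Spearman-Brown correction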
Kuder-Richardson Approaches (KR20 and KR21): require three pieces of information:
Number of items on the test (K)
The mean (M)
The standard deviation (SD)
These are considered the most frequently used methods for determining internal consistency.
Formula KR21 can be used only if it can be assumed that the
items are of equal difficulty.
Formula KR20 does not require the assumption that all items
are of equal difficulty.
The KR21 formula:

\[ \text{KR21 reliability coefficient} = \frac{K}{K-1}\left(1 - \frac{M\,(K - M)}{K \cdot SD^2}\right) \]

Alpha Coefficient: a general form of the KR20 used to calculate the reliability of items that are not scored right vs. wrong.
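A minimal Python sketch of both coefficients, using hypothetical values (K = 50 items, M = 40, SD = 6) for KR21 and a hypothetical item-score matrix for the alpha coefficient:

    import numpy as np

    def kr21(k, mean, sd):
        # KR21: valid only if all items can be assumed equally difficult.
        return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))

    def cronbach_alpha(item_scores):
        # Alpha coefficient: a general form of KR20 for items that are
        # not scored simply right vs. wrong (rows = subjects, cols = items).
        scores = np.asarray(item_scores, dtype=float)
        k = scores.shape[1]
        item_variances = scores.var(axis=0, ddof=1)
        total_variance = scores.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

    print(f"KR21 = {kr21(k=50, mean=40, sd=6):.2f}")  # 0.79 for these hypothetical values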
Standard Error of Measurement
An index that shows the extent to which a
measurement would vary under changed
circumstances.
There are many possible standard errors for a given set of scores.
Under the assumption that errors of measurement
are normally distributed, a range of scores can show
the amount of error to be expected.
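Under that normality assumption, the usual formula (standard, though not shown on the slide) is SEM = SD × √(1 − r), where r is the reliability coefficient. A minimal sketch with hypothetical values:

    import math

    def standard_error_of_measurement(sd, reliability):
        # SEM = SD * sqrt(1 - r); smaller SEM means less expected error.
        return sd * math.sqrt(1 - reliability)

    sem = standard_error_of_measurement(sd=6, reliability=0.84)  # = 2.4
    # Assuming normally distributed errors, an obtained score of 80 would be
    # expected to fall within 80 +/- 2.4 about 68% of the time.
    print(sem)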
Scoring Agreement
Scoring agreement requires a demonstration that independent
scorers can achieve satisfactory agreement in their scoring.
Instruments that use direct observations are highly vulnerable to
observer differences.
A correlation of at least .90 among scorers is generally considered an acceptable level of agreement.
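A minimal sketch of checking scoring agreement, correlating two independent scorers' hypothetical ratings and comparing the result against the .90 benchmark:

    import numpy as np

    scorer_a = [4, 3, 5, 2, 4, 5, 3, 1]  # hypothetical ratings, scorer A
    scorer_b = [4, 3, 4, 2, 5, 5, 3, 2]  # hypothetical ratings, scorer B
    r = np.corrcoef(scorer_a, scorer_b)[0, 1]
    print(f"inter-scorer r = {r:.2f}  (acceptable if >= .90)")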