
Reliability & Validity

siti nur anis binti md azmi 2012306847

RELIABILITY
The consistency or dependability of the scores obtained.
A RELIABLE TEST produces similar scores across various conditions and situations, including different evaluators and testing environments.
Example 1: A typing test
Example 2: A weight scale


cont.

The scores obtained from one instrument can be reliable but not valid. Example: A researcher gave two forms (Forms I and II) of a test to a group of second-grade students to measure their knowledge of the history of Malaysia. Consistent scores across the two forms show reliability, but they would not be valid for an unrelated purpose; after all, what relationship is there between history and PE?


DISTINCTION BETWEEN RELIABILITY AND VALIDITY
Reliability and validity always depend on the context in which an instrument is used. Depending on the context, the instrument may or may not yield reliable (consistent) scores.


cont.

The bull's-eye represents the information desired.
Each dot represents a score obtained from an instrument.
A dot in the bull's-eye indicates that the score is the information the researcher desires.


ERRORS OF MEASUREMENT

Whenever a person takes the SAME test TWICE, they seldom perform exactly the same. Why??

Motivation

Energy

Anxiety

Different testing situation

Other factors??


cont.

Because errors of measurement are always present to some degree, researchers expect some variation in test scores whenever the instrument is administered more than once to the same group. Reliability estimates tell researchers how much variation to expect.

This estimate is known as the reliability coefficient.


RELIABILITY COEFFICIENT

Def: the degree of relationship between scores obtained by the same individuals on the same instrument at two different times.

Three ways of estimating it:
Test-retest method
Equivalent-forms method
Internal-consistency method


TEST-RETEST METHOD

Def: Administering the same test twice to the same group, with some time interval between the two administrations.

The reliability coefficient is affected by the length of time that elapses between the two tests. The longer the time interval, the lower the reliability coefficient tends to be, since changes in the individuals may occur.
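A minimal sketch of how a test-retest coefficient is typically computed: correlate each person's score on the first administration with his or her score on the second. The variable names and scores below are invented for illustration.

import numpy as np

# Hypothetical scores for eight students who took the same test twice
first_administration = np.array([12, 15, 9, 20, 18, 14, 11, 16])
second_administration = np.array([13, 14, 10, 19, 17, 15, 10, 18])

# The test-retest reliability coefficient is the correlation between the two sets of scores
r = np.corrcoef(first_administration, second_administration)[0, 1]
print(f"Test-retest reliability coefficient: {r:.2f}")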


cont.

However, not all variables are equally stable, e.g.:


Personal Characteristics

Abilities

Mood


EQUIVALENT-FORMS METHOD (ALTERNATE / PARALLEL)


Obtained by administering two equivalent (alternate/parallel) forms of a test to the same group at the same time.

The two forms cover the same content, although the specific items differ; the reliability coefficient is calculated between the two sets of scores obtained.

The time between giving the two forms should be as short as possible.


cont.
Considered the most acceptable estimate of reliability, and therefore the most commonly used in research. The two forms should be similar in content, difficulty level, arrangement, type of assessment, etc.


INTERNAL-CONSISTENCY METHOD


Split-Half Procedure

Kuder-Richardson Approaches

Alpha Coefficient


i) SPLIT-HALF PROCEDURE
The test is treated as two equivalent halves: give the test once, then score the two halves separately (odd items vs. even items).

The items on the instrument are divided into comparable halves, e.g., a scale divided so that one half yields the same score as the other. Looks at internal consistency.

Weakness: it is difficult to ensure that the two halves really are equivalent.


cont.
The coefficient indicates the degree to which the two halves of the test provide the same results, and hence describes the internal consistency of the test.

The reliability is calculated using the Spearman-Brown prophecy formula.


Reliability of scores on total test = (2 × reliability for ½ test) / (1 + reliability for ½ test)

Thus, if we obtained a correlation coefficient of .56 by comparing one half of the test items with the other half, the reliability of scores on the total test would be:

Reliability of scores on total test = (2 × .56) / (1 + .56) = 1.12 / 1.56 = .72
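A minimal sketch of the split-half procedure with the Spearman-Brown correction, using a small set of invented right/wrong item scores:

import numpy as np

# Hypothetical data: rows = examinees, columns = items scored 1 (right) or 0 (wrong)
items = np.array([
    [1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1, 1, 1, 0],
])

odd_half = items[:, 0::2].sum(axis=1)    # each person's score on the odd-numbered items
even_half = items[:, 1::2].sum(axis=1)   # each person's score on the even-numbered items

r_half = np.corrcoef(odd_half, even_half)[0, 1]   # correlation between the two halves
r_total = (2 * r_half) / (1 + r_half)             # Spearman-Brown prophecy formula
print(f"Half-test r = {r_half:.2f}, full-test reliability = {r_total:.2f}")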


ii) KUDER-RICHARDSON APPROACHES

Estimates the homogeneity of a test whose items have a dichotomous response, e.g., yes/no or right/wrong items. Should be computed when a test is first checked for reliability and again for the actual sample. Based on the consistency of responses to all of the items of a single form of a test.


cont.

The most frequently employed method for determining internal consistency, particularly KR20 and KR21. KR21 requires only three pieces of information: the number of items, the mean, and the standard deviation. However, KR21 can be used only if it can be assumed that the items are all of equal difficulty.

KR21 reliability coefficient = (K / (K − 1)) × [1 − M(K − M) / (K × SD²)]

where K = number of items on the test, M = mean of the test scores, SD = standard deviation of the test scores.
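A minimal sketch of the KR21 formula above written as a function; the 50-item test with mean 40 and standard deviation 6 is an invented example, and the function name is mine.

def kr21(k, mean, sd):
    # KR21 = (K / (K - 1)) * (1 - M(K - M) / (K * SD^2))
    # Assumes all items are of equal difficulty
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * sd ** 2))

# Hypothetical test: 50 items, mean score 40, standard deviation 6
print(f"KR21 reliability coefficient: {kr21(50, 40, 6):.2f}")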


iii) ALPHA COEFFICIENT

Frequently called Cronbach's alpha. A general form of the KR20 formula, used to calculate the reliability of items that are not scored right versus wrong (e.g., essay items for which more than one answer is possible).
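A minimal sketch of the alpha coefficient for items that are not scored right versus wrong, e.g., ratings on a 1-5 scale; the data are invented.

import numpy as np

# Hypothetical data: rows = respondents, columns = item scores on a 1-5 scale
scores = np.array([
    [4, 5, 3, 4],
    [2, 3, 2, 3],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])

k = scores.shape[1]                               # number of items
item_variances = scores.var(axis=0, ddof=1)       # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)   # variance of the total scores

# Cronbach's alpha = (k / (k - 1)) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha: {alpha:.2f}")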


Method                      | Content   | Time interval | Procedure
Test-retest                 | Identical | Varies        | Give the identical instrument twice
Equivalent forms            | Different | None          | Give two forms of the instrument
Equivalent forms / retest   | Different | Varies        | Give two forms of the instrument, with a time interval between them
Internal consistency        | Different | None          | Divide the instrument into halves and score each, or use Kuder-Richardson approaches
Scoring / observer agreement| Identical | None          | Compare scores obtained by two or more observers or scorers


STANDARD ERROR OF MEASUREMENT (SEMeas)
Gives the margin of error to expect in an individual test score because of the imperfect reliability of the test.
For many IQ tests, the standard error of measurement over a one-year period and with different specific content is about 5 points. Over a 10-year period, it is about 8 points.
This means that a score fluctuates considerably more the longer the time between measurements. Thus, a person scoring 110 can expect to have a score between 100 and 120 one year later, but between 94 and 126 ten years later.


The formula for the standard error of measurement is

SEMeas = SD × √(1 − r11)

where SD = the standard deviation of the scores and r11 = the reliability coefficient appropriate to the conditions that vary.

In the first example above, the standard error (SEMeas) of 5 was obtained as follows: with SD = 16 and r11 = .90,

SEMeas = 16 × √(1 − .90) = 16 × √.10 = 16 × (.32) = 5.1
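A minimal sketch of the calculation above, plus the resulting score band; using plus or minus 2 SEMeas to form the band is an assumption made here to reproduce the 100-120 range in the IQ example.

import math

def standard_error_of_measurement(sd, r11):
    # SEMeas = SD * sqrt(1 - r11)
    return sd * math.sqrt(1 - r11)

sem_one_year = standard_error_of_measurement(sd=16, r11=0.90)
score = 110
low, high = score - 2 * sem_one_year, score + 2 * sem_one_year
print(f"SEMeas = {sem_one_year:.1f}; expected band around {score}: {low:.0f} to {high:.0f}")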


SCORING AGREEMENT

Most tests and many other instruments are administered with specific directions and are scored objectively. Although differences in the resulting scores with different administrators or scorers are still possible, it is generally considered highly unlikely that they would occur, except with essay questions.


Instruments that use direct observation are highly vulnerable to observer differences, so researchers who use such instruments are obliged to investigate and report the degree of scoring agreement.

Such agreement is enhanced by training the observers and by increasing the number of observation periods.
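The slides do not prescribe a particular agreement statistic, so the sketch below (with invented ratings) simply shows two easy checks: the percentage of exact agreements and the correlation between two observers' scores.

import numpy as np

# Hypothetical ratings of the same eight observation periods by two observers
observer_a = np.array([3, 4, 2, 5, 4, 3, 1, 4])
observer_b = np.array([3, 4, 2, 4, 4, 3, 2, 4])

percent_agreement = np.mean(observer_a == observer_b) * 100   # exact matches
correlation = np.corrcoef(observer_a, observer_b)[0, 1]       # linear agreement

print(f"Percent agreement: {percent_agreement:.0f}%")
print(f"Correlation between observers: {correlation:.2f}")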

