
OBJECTIVES

1. Explain what validity and reliability mean for a research measurement instrument.
2. Describe the types of validity and reliability of the measurement instruments used in research.

VALIDITY
(KESAHAN)

RELIABILITY
(KEBOLEHPERCAYAAN)

Validity refers to the degree to which our test or other measuring
device is truly measuring what we intended it to measure.

The extent to which an instrument measures what it is supposed to measure.

Validity means the ability of a test to measure what it is supposed
to measure (Youngman & Eggleston, 1982; Sax & Newton, 1997).
The validity of a measurement instrument refers to the extent to which
the instrument measures the data required to achieve the objectives of
the study (Mohd Majid Konting, 1990).

Construct validity (evidence based on internal structure)
Determination of the significance, meaning, purpose, and use of the
scores.

Criterion validity (evidence based on relations to other variables)
Criterion-referenced: scores are a predictor of an outcome or criterion
they are expected to predict.
o Concurrent evidence
o Predictive evidence

Content validity (evidence based on content)
Representative of all possible questions that could be asked.
Content validation is usually carried out by experts.

Content Validity
(Kesahan Kandungan)

The extent to which an instrument covers the content of a field.
The main goal is to ensure that the items and content being measured
represent that field.
Based on the scope, objectives, and content of the field under study.
The opinion of experts or external reviewers is needed to judge whether
the items suit the chosen domain.

Content validity is concerned with a test's ability to include or represent
all of the content of a particular construct. The
question 1 + 1 = ___ may be a valid basic addition
question. Would it represent all of the content that
makes up the study of mathematics? It may be
included on a scale of intelligence, but does it
represent all of intelligence? The answer to these
questions is obviously no. To develop a valid test of
intelligence, not only must there be questions on math,
but also questions on verbal reasoning, analytical
ability, and every other aspect of the construct we call
intelligence. There is no easy way to determine
content validity aside from expert opinion.

1. Do the items appear to represent the thing you are trying to
   measure?
2. Does the set of items underrepresent the construct's content
   (i.e., have you excluded any important content areas or topics)?
3. Do any of the items represent something other than what you are
   trying to measure (i.e., have you included any irrelevant items)?

Before an instrument can be said to have content validity, five
conditions must be met (four are listed here):
1. The content domain must be stated in terms of behavior whose meaning
   is generally accepted.
2. The domain must be described clearly.
3. The domain must be relevant to the purpose for which the test is
   used.
4. Qualified judges must agree that the domain has been sampled
   adequately.

Evidence Based on Internal Structure

Relevant when a test is meant to measure several components or
dimensions of a construct.
Use factor analysis to analyze the correlations among test items. It
tells you the number of factors present and whether the test is
unidimensional or multidimensional.
Unidimensional: all the items measure a single construct.
Multidimensional: different sets of items tap different constructs or
different components of a broader construct.

Internal Structure

Factor analysis tells you how many dimensions or factors your test
items represent.
You can also obtain a measure of test homogeneity (i.e., the degree to
which the different items measure the same construct or trait).
Use coefficient alpha (Cronbach's alpha) as the test of homogeneity.
If alpha is low (e.g., < .70) for the test, then some items might be
measuring different constructs or some items might be bad.
Examine the items that are contributing to your low coefficient alpha
and consider eliminating or revising them.
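
The item check described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the function names `cronbach_alpha` and `alpha_if_item_deleted` and the score data are invented for this example, and item q4 is deliberately made unrelated to the others.

```python
from statistics import variance


def cronbach_alpha(items):
    """Coefficient alpha for a list of item columns.

    `items` holds one list of scores per item, with respondents in the
    same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("coefficient alpha needs at least two items")
    # Total score per respondent = sum of that respondent's item scores.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def alpha_if_item_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [cronbach_alpha(items[:i] + items[i + 1:]) for i in range(len(items))]


# Hypothetical data: 5 respondents, 4 items; q4 is deliberately unrelated.
items = [
    [1, 2, 3, 4, 5],  # q1
    [1, 2, 3, 4, 5],  # q2
    [2, 3, 4, 5, 6],  # q3
    [5, 1, 4, 2, 3],  # q4
]

alpha = cronbach_alpha(items)
deleted = alpha_if_item_deleted(items)
print(f"alpha with all items: {alpha:.3f}")
for name, value in zip(["q1", "q2", "q3", "q4"], deleted):
    print(f"alpha without {name}: {value:.3f}")
```

With these made-up scores, alpha for the full set falls just below .70, and removing q4 is the only deletion that raises it. That marks q4 as the item to examine first.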

Criterion Validity
(Kesahan Kriteria)

Obtained by relating your test scores to a relevant criterion.
A criterion is the standard or benchmark that you want to predict
accurately on the basis of scores from your test.
It reflects how strongly the instrument relates to an independent
external criterion (whether the items measure the criterion they are
meant to measure).
It is determined by correlation analysis between two sets of scores.
The correlation coefficients calculated in a validity study are called
validity coefficients.
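
As a rough sketch of the correlation step, the snippet below computes a Pearson correlation between test scores and a criterion measure. The helper `pearson_r` and both score lists are hypothetical, and Pearson's r is assumed here as the correlation statistic; the slides do not name one.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for six participants.
test_scores = [52, 61, 70, 74, 85, 90]        # the new test
criterion = [2.1, 2.6, 2.9, 3.0, 3.6, 3.7]    # the criterion measure

validity_coefficient = pearson_r(test_scores, criterion)
print(f"validity coefficient: {validity_coefficient:.2f}")
```

A coefficient near +1 would count as strong criterion-related evidence. A value near 0 would suggest that the test does not predict the criterion.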

Concurrent validity refers to a measurement device's ability to vary
directly with a measure of the same construct or inversely with a
measure of an opposite construct. It allows you to show that your
test is valid by comparing it with an already valid test.
Administer the focal test and the criterion test at approximately the
same point in time (i.e., concurrently) and then correlate the two
sets of scores. If the two sets of scores are highly correlated, you
have concurrent evidence.

Example:
A new test of adult intelligence would have
concurrent validity if it had a high positive correlation with the
Wechsler Adult Intelligence Scale since the Wechsler is an
accepted measure of the construct we call intelligence. An
obvious concern relates to the validity of the test against
which you are comparing your test. Some assumptions must
be made because there are many who argue the Wechsler
scales, for example, are not good measures of intelligence.

Obtain predictive evidence of validity by measuring your
participants at one point in time on your test and then, at a
future time, measuring them on the criterion measure.
This takes more time and effort than gathering concurrent evidence,
but it can provide superior evidence that your test does what
you want it to do.
In order for a test to be a valid screening device for some
future behavior, it must have predictive validity. The SAT is
used by college screening committees as one way to
predict college grades. The GMAT is used to predict
success in business school. And the LSAT is used as a means
to predict law school performance. The main concern with
these and many other predictive measures is predictive
validity, because without it they would be worthless.

Reliability is synonymous with the consistency of a test, survey,
observation, or other measuring device. Imagine stepping on your
bathroom scale and weighing 140 pounds only to find that your weight on
the same scale changes to 180 pounds an hour later and 100 pounds an
hour after that. Based on the inconsistency of this scale, any research
relying on it would certainly be unreliable. Consider an important study on
a new diet program that relies on your inconsistent or unreliable bathroom
scale as the main way to collect information regarding weight change.
Would you consider their results accurate?

The extent to which an instrument consistently measures what it is
meant to measure.
Scores from measuring variables should be stable and consistent.

Test-retest Reliability
Internal Consistency Reliability
Equivalent Forms Reliability

Test-retest reliability refers to the consistency or stability of test
scores when the test is administered at different times.
Example:
A test is given to 100 individuals at one time and repeated at a
different time. The two sets of scores are then correlated. If the
individuals who score highest on test 1 also score highest on test 2,
and the individuals who score lowest on test 1 also score lowest on
test 2, the correlation is high. The test items are therefore said to
have high reliability.

Equivalent-forms reliability refers to the consistency of a group of
individuals' scores on two equivalent forms of a test designed to
measure the same characteristic.
It uses one newly constructed instrument and one standardized
instrument, administered to the same subjects at the same time or at a
different time.
Equivalent forms means that the two tests are constructed so that
they are identical in every way except for the specific items
asked on the test.
This means that they have the same number of items, the items
are at the same difficulty level, the items measure the same
construct, and the tests are administered, scored, and interpreted
in the same way.
The two sets of scores are then correlated. The reliability
coefficient should be very high and positive; that is, individuals
who do well on the first form of the test should also do well on
the second form, and individuals who perform poorly on the
first form should perform poorly on the second form.

Internal consistency refers to how consistently the items on a
test measure a single construct or concept.
The test-retest and equivalent-forms methods of assessing reliability
are general methods that can be used with just about any test.
Internal consistency measures are convenient and very popular with
researchers because they require only one group of individuals to take
the test only once.
Two indexes of internal consistency:
o Split-half reliability
o Coefficient alpha

Split-half reliability

Split a test into two equivalent halves and then assess the
consistency of the scores across the two halves of the test.
Divide the test into halves and score each half separately.
Compute the correlation between scores on the two halves, then adjust
it with the Spearman-Brown formula to estimate the reliability of the
full-length test.
A low correlation indicates that the test is unreliable; a high
correlation indicates that the test is reliable.
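
The split-half steps can be sketched as follows. The odd/even split, the function names, and the response data are all assumptions made for illustration; other ways of forming equivalent halves are equally possible. The Spearman-Brown step uses the standard two-half form, r_full = 2r / (1 + r).

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(responses):
    """Return (half-test correlation, Spearman-Brown corrected estimate).

    `responses` holds one list of item scores per respondent. The halves
    are formed from odd- and even-numbered items (an assumed split).
    """
    odd_half = [sum(person[0::2]) for person in responses]   # items 1, 3, 5, ...
    even_half = [sum(person[1::2]) for person in responses]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
    return r_half, r_full


# Hypothetical right/wrong (1/0) answers: 6 respondents, 6 items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half, r_full = split_half_reliability(responses)
print(f"correlation between halves: {r_half:.3f}")
print(f"Spearman-Brown estimate:    {r_full:.3f}")
```

The corrected value is higher than the raw half-test correlation because each half contains only half of the items, and shorter tests tend to be less reliable.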

Coefficient alpha

Lee Cronbach (1951) developed coefficient alpha, also known as
Cronbach's alpha.
Coefficient alpha tells you the degree to which the items
are interrelated.
Rule of thumb:
At a minimum, greater than or equal to .70 for research
purposes and somewhat greater than that value (e.g., .90)
for clinical testing purposes.
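
For reference, coefficient alpha is commonly written as below. The notation is assumed here rather than taken from the slides: k is the number of items, \sigma_i^2 is the variance of item i, and \sigma_X^2 is the variance of the total test scores.

```latex
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma_{i}^{2}}{\sigma_{X}^{2}}\right)
```

When the items covary strongly, the variance of the total score is large relative to the sum of the item variances, and alpha approaches 1.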

Item statements must be clear and precise.
Instructions must be clear and concise.
Items should all be of the same type.
The measurement situation and timing should be standardized, uniform,
and controlled.
Avoid disturbing the subjects.
Avoid causing subjects anxiety by guaranteeing the safety and
confidentiality of the information they give.

The pilot test is the final phase of a survey before data collection
begins.

Its goal is to find problems in the questionnaire, including weak
questions, incomplete instructions, and items that are difficult to
answer.

Do not use the actual target group.

For a new study, run the pilot test twice.

The number of respondents is not precisely fixed; at least 25 is
suggested, and between 50 and 75 is better.

Train researchers to collect observational data

Develop standard written procedures for administering an instrument

Obtain permission to collect and use public documents

Respect individuals and sites during data gathering (ethics)

Individual participants

Parents of participants who are not considered adults

Institutional or organizational (e.g., school district)

Site-specific (e.g., secondary school)

Campus approval (e.g., university or college) and Institutional Review
Board (IRB)
