Reliability and Validity

Q560: Experimental Methods in Cognitive Science

Lecture 16
Psychological Measurement

Psychometric theory, sources of measurement noise


Cognitive Construct:
The label given to our hypothetical characteristic (e.g., attention,
STM-capacity, executive function, intelligence, forgetting, etc.)
Dimensions along which Ss can be located based on behavior
Cannot directly measure construct, so we link it to behavior
believed to reflect the construct

Operational Definition:
A statement that specifies how a construct is measured
Link between the overt and latent variables
Reliability and Validity

Reliability and Validity are important concepts in both
measurement and the full experiment, to ensure the link
between our statistics and conclusions is sound.

Reliability = consistency

Validity = on-targetness

Reliability is a necessary but insufficient condition for validity


Reliability

Reliability is the extent to which the measurements of a test
remain consistent over repeated tests of the same subject
under identical conditions.

An experiment is reliable if it yields consistent results for the
same measure. It is unreliable if repeated measurements give
different results.

Reliable car, repeatability, etc.

Reliability of experiment, replication, and error rate

Reliability does not imply validity


Validity

Validity of a measure is the degree to which the variable
measures what it is intended to measure

e.g., IQ tests, GREs, Eye tests

A valid measure is reliable


A reliable measure is not necessarily valid
Unreliable and Invalid Shooting:
Sam "Scattershot" Wilson

Reliable but Invalid Shooting:
Ralph "Rightpull" Roberts

Reliable and Valid Shooting:
Kit "Bullseye" Carson


Experimental Validity
Valid design is necessary for valid scientific conclusions
Crib sheet: www.indiana.edu/~clcl/Q560/Validity.pdf

1. Statistical Conclusion Validity:


Validity with which statements about the association of two
variables can be made based on statistical tests
Threats: low measurement reliability (Rxx), violated statistical assumptions

2. Construct Validity:
Validity with which we can make generalizations about higher-order
constructs from the experimental results
Threats: vague operational definitions (Watson); experimenter/participant
expectancy effects
Experimental Validity

3. Internal Validity:
Validity with which statements can be made about the causal
relationship between variables as manipulated
Threats: Confounds (history, maturation, testing), instrumentation,
statistical regression, mortality, etc.

4. External Validity:
Validity with which we can make generalizations from sample/expt
Ecological validity
Threats: Interactions of setting/selection method and treatment
Dr. N. Lewis is interested in whether memory encoding is stronger for pictures of
objects or for words that refer to the same objects. He has participants learn a list of 30
written words that refer to objects (then a distracting task) and then recall as many
words as they can. Next, he gives the same participants a list of 30 pictures of the
same objects and (after the same distracting task) has them again recall as many
words as they can. At the end of the experiment, participants recalled a mean of 16
words in the written condition vs. a mean of 24 words in the pictures condition.

1. What type of study is this?


2. What is the independent variable?
3. What is the dependent variable?
4. Is the dependent variable discrete or continuous?
5. What is the scale of measurement for the dependent variable?
6. Name one confounding variable.
A researcher is interested in whether or not snakes can detect insults. He buys 20
exotic snakes, and separates them into two groups of 10. For one group of snakes, he
insults them for 10 minutes each. For the other group, he simply stares at them for 10
minutes each. For each group, he records the number of times the snakes bite him
(assuming that a bite indicates that the snake took offense to him). At the end of the
experiment, the group of snakes he insulted had bitten him 23 times vs. only 8 bites from
the group he did not insult.

1. What type of study is this?


2. What is the independent variable?
3. What is the dependent variable?
4. Is the dependent variable discrete or continuous?
5. What is the scale of measurement for the dependent variable?
6. Name one confounding variable.
Article Critique

Assignment #5: Article critique


Small-N Designs
Why would we want to do an experiment with a
small number of subjects?

B/c it's easier/faster than doing an experiment
with a large number of Ss?

This is untrue: small-N experiments are frequently
more difficult and time-consuming, even though
there are only 1-5 participants
Small-N Designs
In a small N design, we make repeated
measurements on a small number of participants

Although there are fewer participants, there are
many more observations per participant

These experiments can involve many hours of testing,
consisting of several thousand trials

The goal is to provide a complete and accurate
description of a single subject's behavioral
changes as a function of a repeated measure

Other subjects are replications (separate expts)

The experimenter is often a subject


Why would we want to do this?
1. Practical Reasons: A small-N design may be
necessary b/c it is difficult to get Ss from a rare
population (e.g., OCD, Alzheimer's), or the
treatments may be expensive or time-consuming
(e.g., teaching sign language to a gorilla:
Patterson & Linden, 1981, spent 10 years doing
this)

2. Theoretical Reasons: Skinner believed that the
best way to understand behavior is to study
single individuals intensely: one should "study a
single subject for a thousand hours rather than a
thousand subjects for an hour" (1966, p. 21)

If conditions are precisely controlled, then orderly
and predictable behavior will follow
Why would we want to do this?
3. Methodological Reasons: Pooling or averaging
data from many subjects can produce misleading
results as an artifact of grouping (Sidman, 1960)

Averaging can produce results that do not
characterize any subject who participated. More
importantly, it can produce a result supporting
theory X when perhaps it shouldn't (Estes)

E.g.: Manis (1971) tested children in a
discrimination-learning task: stimuli were simple
objects, and the child had to learn which feature
is the diagnostic one (shape, color, position, etc.)
[Slide figure: six example training trials, each marked with the rewarded (+) stimulus.]
Continuity Theory: Concept learning is a gradual
process of accumulating habit strength

Noncontinuity Theory: Subjects actively try out
different hypotheses over trials. While they search
for the correct hypothesis, performance is at chance,
but once they hit the correct hypothesis,
performance shoots up to 100% and stays there

[Figure: predicted learning curves; performance (from chance to perfect) plotted as a function of trials.]
Here are the averaged data. Continuity is right!!
But here are the individual data before averaging.
Discontinuity is right?!
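The grouping artifact is easy to reproduce. In the hypothetical simulation below, every simulated subject behaves exactly as noncontinuity theory predicts (chance performance until a randomly timed "insight," then perfect performance), yet the group mean rises as the smooth, gradual curve continuity theory predicts:

```python
import random

random.seed(1)
n_subjects, n_trials = 100, 40

curves = []
for _ in range(n_subjects):
    switch = random.randint(5, 35)  # trial where this subject hits the right hypothesis
    # before the switch: 50% chance guessing; from the switch on: perfect performance
    curve = [(1 if random.random() < 0.5 else 0) if t < switch else 1
             for t in range(n_trials)]
    curves.append(curve)

# group mean accuracy per trial: rises gradually, though no individual curve is gradual
mean_curve = [sum(c[t] for c in curves) / n_subjects for t in range(n_trials)]
```

Each individual curve is a step function (only 0s and 1s), but because the step occurs at a different trial for each subject, averaging smears the steps into an incremental learning curve — exactly Sidman's and Estes's warning about pooled data.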
CogSci Began with Small-N Research
Ebbinghaus, Wundt, Dressler, Thorndike

It wasn't until the 1930s that experiments with
large numbers of participants and aggregate
statistics became commonplace (largely due to
Fisher)

Psychophysics is still dominated by Small-N
designs. The idea is that we have very high similarity
between our perceptual systems; with sufficient
control, a stable effect should be observable
without needing many subjects

An effect that isn't stable enough to be studied
with a small N isn't worth studying
Elements of Small-N Designs
1. A within-Ss manipulation

2. Target behavior must be operationally defined

3. Establish a baseline of responding/behavior

4. Begin treatment manipulation, and monitor
change from baseline

How do we analyze this?

Visual inspection

Curve/Trend fitting based on theory

Change from baseline
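A minimal change-from-baseline analysis can be sketched with invented ABAB data (the numbers and phase labels below are hypothetical): compute the mean response level in each phase and compare each treatment (B) phase against its adjacent baseline (A) phase.

```python
# Hypothetical weekly response counts across an ABAB withdrawal design
phases = {
    "A1": [4, 5, 3, 4],     # baseline
    "B1": [9, 11, 10, 12],  # treatment introduced
    "A2": [5, 4, 6, 5],     # treatment withdrawn (return to baseline)
    "B2": [10, 12, 11, 13], # treatment reinstated
}

means = {p: sum(v) / len(v) for p, v in phases.items()}

# Behavior tracks the treatment if both B-phase means exceed the neighboring A-phase means
effect = ((means["B1"] - means["A1"]) + (means["B2"] - means["A2"])) / 2
```

Here behavior rises under treatment, falls back on withdrawal, and rises again on reinstatement — the within-subject pattern that rules out history and maturation as explanations.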


Withdrawal Designs
Get measurement of baseline behavior on the DV

Introduce the manipulation, but note that a change in
responding may be due to history or maturation
(AB design)

Return to Baseline: If the change is due to history
or maturation, it is unlikely that the behavior will regress
when treatment is removed (called an ABA
design). The ABAB design is more popular

1. Multiple baselines design
2. Alternating treatments design
3. Changing criterion design
4. Staircase designs
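A staircase design can be sketched in a few lines. The simulation below (simulated observer, invented parameters) runs a 2-down/1-up staircase: two consecutive detections make the stimulus harder, any miss makes it easier, so the tested level converges near the observer's ~70.7%-correct intensity.

```python
import math
import random

random.seed(2)

def observer_detects(intensity, threshold=5.0):
    """Hypothetical observer: detection probability follows a logistic curve."""
    p = 1.0 / (1.0 + math.exp(-(intensity - threshold)))
    return random.random() < p

level, step = 10.0, 1.0   # start well above threshold
streak = 0
history = []
for _ in range(200):
    history.append(level)
    if observer_detects(level):
        streak += 1
        if streak == 2:   # 2-down: two detections in a row -> lower the intensity
            level -= step
            streak = 0
    else:                 # 1-up: any miss -> raise the intensity
        level += step
        streak = 0

# average of the late trials estimates the ~70.7%-correct intensity
threshold_estimate = sum(history[-50:]) / 50
```

The staircase concentrates trials near the threshold rather than wasting them at very easy or very hard levels — one reason psychophysical small-N sessions are so data-efficient per subject.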
Withdrawal Designs:
E.g., Does talking to a plant make it grow?

[Figure: growth of ficus divinicus in inches across three 3-month phases: (A) baseline, (B) treatment, (A) baseline.]




Small-N Designs and Psychophysics
Psychophysics: the relationship between the
physical stimulus and the perceptual reaction to it

Small-N designs are popular in psychophysics b/c:

Few Ss are needed b/c of the similarity
between our sensory systems (generalizes)
On each trial, data are much less affected by
error variance than in questionnaire research
(also b/c of laboratory control)
Trials are very quick and easy: so why do a
30-min experiment when the data collection
only takes 30 seconds?
Study one S; others are replications
Criticisms Against Small-N Designs
1. External validity: To what extent do these results
generalize?

2. Criticized for relying on visual inspection of data
instead of statistical analysis (but there are also
more theory-driven analyses, useful for model fitting)

3. Small-N designs cannot adequately test for
interaction effects (interactive designs exist, but
are very cumbersome: ABBCBABBCB design, etc.)

4. Due to their operant-learning tradition
(Skinnerian), they tend to focus on response
frequency as a DV, rather than RT, accuracy,
habituation, etc.
