Table 2
Type 2 SDT Statistics in Experiments 1 and 2

Experimental condition   Easy 5 s   Diff 5 s   Easy 10 s   Diff 10 s
Experiment 1
  A′                        .89        .82         .85         .78
  B″D                      -.58       -.73        -.83        -.81
Experiment 2
 Positive pretest
  A′                        .89        .79         .86         .76
  B″D                      -.60       -.68        -.69        -.66
 Negative pretest
  A′                        .82        .81         .83         .77
  B″D                      -.75       -.69        -.76        -.80

Note. A′ = monitoring; B″D = change bias; diff = difficult; SDT =
signal-detection theory.
Figure 1. Proportion of correct responses on the main test in Experiments 1 and 2 as a function of experimental phase
(Phase 1/Phase 2), question difficulty (difficult/easy), and response deadline (5 s/10 s).
CJEP 59-1 2/10/05 4:42 PM Page 31
32 Higham and Gerrard
false alarm rate = [B + 0.5] / [B + D + 1]; Snodgrass &
Corwin, 1988). These data are shown in Table 2. A 2
(Difficult/Easy) × 2 (10 s/5 s) within-subjects ANOVA on
monitoring in Experiment 1 revealed a main effect of
deadline, F(1,23) = 7.17, MSE = .004 (10 s: .82; 5 s: .85),
and a main effect of question difficulty, F(1,23) = 21.92,
MSE = .005 (easy: .87; difficult: .80). Additional analyses
determined the effect, if any, of pretest type on moni-
toring in Experiment 2. Although the analysis of pro-
portion-correct data revealed no effect of pretest, a 2
(Difficult/Easy) × 2 (10 s/5 s) × 2 (Positive/Negative)
mixed ANOVA on monitoring revealed a main effect of
question difficulty, F(1,46) = 24.41, MSE = .009 (easy:
.85; difficult: .78), and a pretest by difficulty interaction,
F(1,46) = 5.09, MSE = .010. As can be seen in Table 2,
a negative pretest lowered monitoring relative to a pos-
itive one, but only on easy questions. Analogous
ANOVAs on change bias revealed a main effect of dead-
line in Experiment 1, F(1,23) = 13.18, MSE = .010 (10 s:
-.82; 5 s: -.65), indicating that participants changed
answers to more questions initially answered with the
5-s deadline than the 10-s deadline. No effects on
change bias were significant in Experiment 2, although
the largest F value, F(1,46) = 2.04, MSE = .055, p > .15,
was also associated with the main effect of deadline.
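The Type 2 measures reported above can be illustrated in code. The sketch below is ours, not the authors' analysis script: it assumes that A′ follows Grier's (1971) nonparametric formula and B″D follows Donaldson's (1992) bias index (both cited in the references), applies the Snodgrass and Corwin (1988) correction quoted in the text, and uses hypothetical cell counts.

```python
def corrected_rate(count, total):
    """Snodgrass and Corwin (1988) correction, as quoted in the text:
    e.g., false alarm rate = (B + 0.5) / (B + D + 1).
    Keeps rates strictly between 0 and 1."""
    return (count + 0.5) / (total + 1)

def a_prime(h, f):
    """Grier's (1971) nonparametric discrimination index A'.
    Used here as a Type 2 index: how well a participant discriminates
    his or her own errors from correct answers (assumes h >= f)."""
    return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))

def b_double_prime_d(h, f):
    """Donaldson's (1992) bias index B''_D; more negative values indicate
    a more liberal bias (a greater general willingness to change answers)."""
    return ((1 - h) * (1 - f) - h * f) / ((1 - h) * (1 - f) + h * f)

# Hypothetical cell counts for one participant (not data from the paper):
# a "hit" is changing an initially incorrect answer; a "false alarm" is
# changing an initially correct answer.
changed_wrong, n_wrong = 24, 30      # changed 24 of 30 initial errors
changed_right, n_right = 28, 70      # changed 28 of 70 initially correct

h = corrected_rate(changed_wrong, n_wrong)   # Type 2 hit rate
f = corrected_rate(changed_right, n_right)   # Type 2 false alarm rate
print(a_prime(h, f), b_double_prime_d(h, f))
```

Monitoring (A′) and change bias (B″D) are thus computed from the same two rates but answer different questions: the former indexes discrimination of one's own errors, the latter the overall willingness to change.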
Discussion
The current experiments were conducted to deter-
mine the role of metacognition in changing answers on
multiple-choice tests. Consistent with previous
research, providing participants with the opportunity to
revise their answers increased their overall test score.
However, the manipulations of question difficulty and
answer deadline showed that not all errors were equally
correctable. Revision in Phase 2 allowed participants to
completely eliminate the disadvantage caused by the
short versus the long answer deadline in Phase 1.
Conversely, questions with confusable alternatives
remained harder than questions with less confusable
alternatives even after participants were given the
opportunity to change their answers (Figure 1). This
finding may help to explain the paradox that participants
believe that changing answers is not beneficial, yet they
generally do change a few answers when given the
opportunity, and generally improve their score in the
process. Errors like
those produced by an answering deadline (e.g., typo-
graphical errors, question misinterpretation or misread-
ing) are likely to be monitored well, are likely to be
corrected, and their correction should function to
improve the test score. However, they are not likely to
receive much attention. In contrast, students are likely
to sweat over questions with confusable alternatives
and check them for accuracy when receiving feedback
on their test performance. But because changing answers
to these questions yields little improvement, students may
form the erroneous belief that answer changing never
produces benefits.
Examining only the proportion of correct answers in
Experiment 2 did not reveal any effect of pretest.
Indeed, had performance only been analyzed in terms
of whether or not participants produced correct
responses, which is typical in the testing literature, we
would have concluded that pretest had no effect.
However, proportion correct can be a confounded
measure of performance. It may be influenced not just
by how much participants know, but also by their
monitoring ability and change bias. By separating out
the monitoring component, using a Type 2 index of
discrimination, the effect of pretest was revealed; after
receiving negative pretest feedback, participants did
not monitor the errors that they made to easy questions
as well as participants given positive pretest feedback
(Table 2). The precise explanation for this result will
have to wait for follow-up research, but it is worth
speculating that the experience of negative feedback
caused participants to recruit negative prior instances of
answer changing, or instances of being told by
instructors to stick with their first hunches. Recruitment of
such negative instances may have led participants in
the negative pretest group to leave more incorrect
answers unchanged, compared to the positive feedback
group. Regardless of the exact reason for the monitor-
ing decrease, however, it is clear that traditional meth-
ods are not suitable for investigating metacognition in
answer changing, and that something like Type 2 SDT
will be necessary in the long run.
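The confound described above can be made concrete with a small hypothetical example (our numbers, not data from these experiments). For simplicity it assumes that every changed error becomes correct, that every changed correct answer becomes incorrect, and it omits the 0.5/1 correction: two participants can earn identical proportion-correct scores yet differ sharply in Type 2 monitoring (Grier's A′).

```python
def a_prime(h, f):
    """Grier's (1971) nonparametric index A' (assumes h >= f)."""
    return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))

def final_score(n_correct, n_wrong, changed_wrong, changed_correct):
    """Final number correct out of 100, assuming (for illustration only)
    that every changed error is fixed and every changed correct answer is lost."""
    return n_correct + changed_wrong - changed_correct

# Poor monitor, more knowledge: changes errors and correct answers alike.
score_a = final_score(60, 40, changed_wrong=5, changed_correct=5)
h_a, f_a = 5 / 40, 5 / 60

# Good monitor, less knowledge: selectively changes errors.
score_b = final_score(52, 48, changed_wrong=10, changed_correct=2)
h_b, f_b = 10 / 48, 2 / 52

# Identical observed proportion correct ...
assert score_a == score_b == 60
# ... but very different Type 2 discrimination of one's own errors.
print(a_prime(h_a, f_a), a_prime(h_b, f_b))
```

A proportion-correct analysis treats these two participants as equivalent; the Type 2 index does not, which is exactly the distinction that revealed the pretest effect.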
Philip A. Higham and Catherine Gerrard, School of
Psychology, University of Southampton. Portions of this
research were presented at the 13th annual meeting of the
Canadian Society for Brain, Behaviour and Cognitive
Science, June, 2003 at McMaster University, Hamilton,
Ontario, Canada. Thanks to John Vokey, Scott Allen, and
Sam Hannah for comments on an earlier draft of this paper.
PH was Lee Brooks's graduate student from 1987 to 1992.
He was awarded his PhD in 1993.
Correspondence concerning this article should be
addressed to Philip A. Higham, School of Psychology,
University of Southampton, Highfield, Southampton, UK
SO17 1BJ (E-mail: higham@soton.ac.uk).
References
Benjamin, L. T., Cavell, T. A., & Shallenberger, W. R. (1984).
Staying with initial answers on objective tests: Is it a
myth? Teaching of Psychology, 11, 133-141.
Brooks, L. R. (1968). Spatial and verbal components of the
act of recall. Canadian Journal of Psychology, 22, 349-368.
METACOGNITION AND ANSWER CHANGING 33
Brooks, L. R. (1978). Nonanalytic concept formation and
memory for instances. In E. Rosch & B. Lloyd (Eds.),
Cognition and categorization (pp.169-211). Hillsdale, NJ:
Erlbaum.
Brooks, L. R., LeBlanc, V. R., & Norman, G. R. (2000). On the
difficulty of noticing obvious features in patient appearance.
Psychological Science, 11, 112-117.
Clarke, F. R., Birdsall, T. G., & Tanner, W. P. (1959). Two
types of ROC curves and definitions of parameters.
Journal of the Acoustical Society of America, 31, 629-630.
Crawford, C. (1928). The technique of study. Boston, MA:
Houghton Mifflin.
Crocker, L., & Benson, J. (1980). Does answer changing
affect test quality? Measurement and Evaluation in
Guidance, 12, 223-239.
Donaldson, W. (1992). Measuring recognition memory.
Journal of Experimental Psychology: General, 121, 275-
277.
Foote, R., & Belinky, C. (1972). It pays to switch?
Consequences of changing answers on multiple choice
examinations. Psychological Reports, 31, 667-673.
Galvin, S. J., Podd, J. V., Drga, V., & Whitmore, J. (2003).
Type 2 tasks in the theory of signal detectability.
Psychonomic Bulletin & Review, 10, 843-876.
Geiger, M. A. (1996). On the benefits of changing multiple-
choice answers: Student perception and performance.
Education, 117, 108-119.
Grier, J. B. (1971). Nonparametric indexes for sensitivity
and bias: Computing formulas. Psychological Bulletin,
75, 424-429.
Harvil, L. M., & Davis, G. (1997). Medical students' reasons
for changing answers on multiple-choice tests. Academic
Medicine, 72, S97-S99.
Higham, P. A. (2002). Strong cues are not necessarily weak:
Thomson and Tulving (1970) and the encoding specifici-
ty principle revisited. Memory & Cognition, 30, 67-80.
Higham, P. A., & Brooks, L. R. (1997). Learning the
experimenter's design: Tacit sensitivity to the structure of
memory lists. Quarterly Journal of Experimental Psychology,
50A, 199-215.
Higham, P. A., & Tam, H. (in press). Generation failure:
Estimating metacognition in cued recall. Journal of
Memory and Language.
Maki, R. H., & McGuire, M. J. (2002). Metacognition for text:
Findings and implications for education. In T. J. Perfect
& B. L. Schwartz (Eds.), Applied metacognition (pp. 39-
68). Cambridge, UK: Cambridge University Press.
Mathews, C. O. (1929). Erroneous first impressions on
objective tests. Journal of Educational Psychology, 20,
280-286.
McMorris, R. F., DeMers, L. P., & Schwartz, S. P. (1987).
Attitudes, behaviours, and reasons for changing respons-
es following answer-changing instruction. Journal of
Educational Measurement, 24, 131-143.
Mueller, D. J., & Wasser, V. (1977). Implications of changing
answers on objective test items. Journal of Educational
Measurement, 14, 9-13.
Nelson, T. O., & Narens, L. (1980). Norms of 300 general-
knowledge questions: Accuracy of recall, latency of
recall, and feeling-of-knowing ratings. Journal of Verbal
Learning and Verbal Behavior, 19, 338-368.
Pressley, M., Ghatala, E. S., Woloshyn, V., & Pirie, J. (1990).
Sometimes adults miss the main ideas and do not realize
it: Confidence in responses to short-answer and multi-
ple-choice comprehension questions. Reading Research
Quarterly, 25, 233-249.
Snodgrass, J. G., & Corwin, J. (1988). Pragmatics of measuring
recognition memory: Applications to dementia and
amnesia. Journal of Experimental Psychology: General,
117, 34-50.
Vispoel, W. (1998). Reviewing and changing answers on
computer-adaptive and self-adaptive vocabulary tests.
Journal of Educational Measurement, 35, 329-346.
Vispoel, W. (2000). Reviewing and changing answers on
computerized fixed-item vocabulary tests. Educational
and Psychological Measurement, 60, 371-384.
Vokey, J. R., & Higham, P. A. (2005). Components of recall:
The semantic specificity effect and the monitoring of
cued recall. Manuscript submitted for publication.
Sommaire

Two experiments investigated the role of metacognition in the
changing of answers to multiple-choice general-knowledge questions.
In the first experiment, participants began by answering 100
four-alternative questions. Two independent variables were
manipulated within subjects: question difficulty and response
deadline. Question difficulty was controlled by including incorrect
alternatives that either were or were not confusable with the
correct answer. As for the response deadline, participants had
either five or ten seconds to answer each question. After answering
all of the questions in the first phase, participants had the
opportunity, in the second phase, to change their answer to any
question, with unlimited time to do so. To this end, each question
and its response alternatives were presented again, along with the
participant's initial answer. Participants either kept their first
answer (same response) or selected a new one (changed response).
The results showed that errors caused by the response deadline were
entirely correctable: the advantage enjoyed in the first phase by
those given ten rather than five seconds to respond was completely
eliminated after answers were revised in the second phase. In
contrast, errors caused by question difficulty were not correctable,
so the advantage of easy over difficult questions in the first phase
remained fully intact after revision in the second. These results
indicate that time pressure and confusability among the response
alternatives produced qualitatively different kinds of errors. In
the second experiment, participants took a pretest before completing
the test (with the same opportunity for revision) used in the first
experiment. The purpose of the pretest was to give participants an
actual positive or negative experience of answer changing. To this
end, tricky or non-tricky questions were included in the pretest,
which obliged participants to change a large proportion of their
answers. By design, the tricky questions elicited incorrect
intuitive initial answers, so that changing many answers tended to
raise the score obtained. For the non-tricky questions, changing
many answers tended to lower the score, because many correct answers
were replaced with incorrect ones. Analysis of the proportion of
correct answers on the main test suggested that the pretest
experience had no effect: the proportion correct on the main test
was practically identical to that obtained in the first experiment,
which included no pretest. However, beyond the proportion-correct
measure, a Type 2 discrimination measure based on signal detection
was used to isolate the metacognitive monitoring component at work
in answer changing. This component was a measure of participants'
tendency to change incorrect answers and to leave correct answers
untouched. Analyses of this component revealed that, unlike the
positive pretest, the negative pretest lowered monitoring on easy
questions, perhaps because participants tended to leave untouched
the incorrect answers that had come to them spontaneously by
intuition. Overall, the results indicate that future research on
answer changing should give greater consideration to the
metacognitive factors underlying the change and, to that end, draw
on Type 2 signal-detection theory to isolate this aspect of
performance.

Revue canadienne de psychologie expérimentale, 2005, 59-1, 33