Journal of Educational Psychology
2000, Vol. 92, No. 4, 800-810

Copyright 2000 by the American Psychological Association, Inc.
0022-0663/00/$5.00 DOI: 10.1037//0022-0663.92.4.800
Metamemory Cues and Monitoring Accuracy: Judging What You Know
and What You Will Know
William L. Kelemen
University of Missouri-St. Louis
Three experiments examined metamemory for categorized lists of items. Judgments of learning (JOLs)
were obtained from college students either immediately after study or following a brief (at least 30-s)
delay. In contrast to past findings (e.g., T. O. Nelson & J. Dunlosky, 1991), no advantage was found for
delayed JOLs in Experiment 1, using a standard, prediction-based metamemory cue. In Experiment 2,
knowledge-based judgments were elicited, and delayed JOL accuracy improved significantly. The
relative efficacy of 4 different metamemory cues was examined in Experiment 3. An interaction between
the timing and phrasing of JOL cues was detected: Delayed JOLs were more accurate than immediate
JOLs only when knowledge-based cues were used. These results are interpreted in A. Koriat's (1997)
cue-utilization framework for JOL accuracy, and they show that the phrasing of metamemory cues can
have a substantial impact on monitoring accuracy.
The role of memory monitoring and control in education has
been studied with increasing interest in recent years (Hacker,
Dunlosky, & Graesser, 1998). During exam preparation, for ex-
ample, students must accurately assess their current state of knowl-
edge to effectively regulate their ongoing learning. Students who
are inaccurate in their memory monitoring may direct their efforts
to inappropriate material, make inefficient use of their study time,
or continue to utilize ineffective study strategies. For example,
Pressley, Levin, and Ghatala (1984) showed that when college
students studied new vocabulary items, they initially showed no
preference for an elaboration strategy compared with rote repeti-
tion, even though the former produced much higher rates of
learning. After experience with a practice list and test, however,
these students showed improved metamemory by selecting elabo-
ration instead of repetition. From a theoretical perspective, the
change in study strategy reflects the influence of metamemory on
cognitive goals and actions (Flavell, 1981). From a practical stand-
point, Pressley et al.'s study underscores the importance of
metamemory in study efficacy and students' potential academic
success.
One way in which students can monitor their memories is to
make judgments of learning (JOLs) during study or very soon
afterward. These JOLs may be used to regulate future study
activity and to predict subsequent memory performance (Nelson &
Narens, 1994). For example, participants may study some novel
information and then be asked to predict how well they will
perform on a future test (i.e., make a JOL). Metamemory accuracy
is assessed by comparing the magnitude of JOLs for individual
items with future recall. If participants' memory monitoring is
accurate, then items receiving high JOLs should be more likely to
be recalled than items receiving lower JOLs.

Experiments 1 and 2 were conducted as part of a doctoral dissertation
completed by William L. Kelemen at Baylor University. The results of
these two experiments were presented at the 44th Annual Meeting of the
Southwestern Psychological Association, New Orleans, April 1998. I thank
Chuck Weaver for his consultation and helpful comments on this article
and Heather DeRousse for her assistance with data collection.
Correspondence concerning this article should be addressed to William
L. Kelemen, Department of Psychology, University of Missouri-St.
Louis, 8001 Natural Bridge Road, St. Louis, Missouri 63121-4499.
Electronic mail may be sent to kelemen@umsl.edu.

The basis of participants' JOLs has been examined by a number
of researchers (e.g., Begg, Duft, Lalonde, Melnick, & Sanvito,
1989; Koriat, 1997; Nelson & Narens, 1994; Schwartz, 1994).
Koriat described three classes of information that participants may
use when making JOLs: intrinsic, extrinsic, and mnemonic factors.
Intrinsic factors are related to properties of the stimuli, for exam-
ple, item concreteness, relatedness of the items, and so on. Prop-
erties of the encoding conditions (e.g., amount of study time
available, study strategy, number of learning trials, etc.) are re-
ferred to as extrinsic factors. Mnemonic factors refer to internal,
experience-based indicators of future recall, including memory of
previous recall attempts, accessibility of target information, and
cue familiarity. Koriat found that participants tended to discount
extrinsic factors in favor of intrinsic information during initial
study trials. After practice, however, participants based their JOLs
on mnemonic information, and the mean Gamma correlation (G)
between JOLs and recall increased. Thus, JOLs may be based on
a collection of information, and the relative weight participants
assign to each factor can vary across situations. JOL accuracy is
determined by the source(s) of information participants incorpo-
rate into their judgments. Mnemonic information can be especially
useful if the conditions at time of JOL are closely related to those
during recall (Koriat, 1997).
Monitoring Retrieval After Delays
JOL accuracy should be high when the class of information used
as the basis for JOLs is highly diagnostic of future memory. One
type of mnemonic cue, accessibility of information at time of JOL,
is often used by students as a basis for their metamemory judg-
ments (Benjamin & Bjork, 1996). Because JOLs typically are
provided during study, accessibility can be an imperfect basis for
prediction (not all information remembered at time of JOL will be
remembered on the test). Nelson and Dunlosky (1991), however,
found that increasing the amount of time allowed between study
and JOL dramatically improved monitoring accuracy. Participants
studied paired associates and then made JOLs either immediately
after study (hereafter identified as "immediate JOLs") or several
minutes later (hereafter identified as "delayed JOLs"). Predictive
accuracy was modest in the first case (mean G between predicted
and actual recall = .38). In contrast, mean G for delayed JOLs was
very high (.90). This so-called "delayed JOL effect" is robust
across a variety of encoding and test conditions using paired
associates (Dunlosky & Nelson, 1992, 1994, 1997; Kelemen &
Weaver, 1997; Thiede & Dunlosky, 1994; Weaver & Kelemen,
1997).
Theoretical explanations for such high delayed JOL accuracy
have focused on the retrieval of target information at time of
judgment. One specific account, known as the monitoring-dual-
memories hypothesis, was proposed by Nelson and Dunlosky
(1991). They suggested that participants monitor information in
both short-term and long-term memory at time of JOL, although
the recall test taps only long-term memory. Delayed JOLs are more
accurate than immediate JOLs because the former are not contam-
inated by target information in short-term memory. Consistent
with this hypothesis, Dunlosky and Nelson (1992) showed that
reinstating the cue-target pair into short-term memory during a
delayed JOL substantially reduced predictive accuracy (see also
Dunlosky and Nelson, 1994, 1997, for further discussion).
A related hypothesis that also emphasizes the role of target
retrieval has been advanced by Spellman and Bjork (1992, 1997).
They suggested that delayed JOLs are highly accurate because
recall of a target after a delay improves the probability of recalling
that item on a subsequent memory test. From this view, delayed
JOLs are accurate because they reinforce what they assess. Imme-
diate JOLs do not increase future memory because retrieval at-
tempts are always successful immediately after study. Thus, im-
mediate JOLs function as a massed study trial, whereas delayed
JOLs can be a spaced study trial, if the target is retrieved. There is
evidence that, in some cases, delayed JOLs do alter the probability
of future recall (Kelemen & Weaver, 1997; Spellman & Bjork,
1997). For example, Kelemen and Weaver compared immediate
JOLs with judgments made after delays ranging from a few sec-
onds to 5 min. They found that delayed JOLs were always more
accurate than immediate JOLs, and mean recall scores were reli-
ably higher in delayed JOL conditions compared with immediate
JOLs in two of three experiments. Moreover, the conditional
probability of cued recall on the test (given successful initial
retrieval) increased monotonically with longer delays.
Although theoretically distinct, Nelson and Dunlosky's (1991)
hypothesis and Spellman and Bjork's (1992) explanation are not
necessarily mutually exclusive. In fact, both views emphasize the
importance of making a target retrieval attempt at time of JOL.
Koriat (1997) suggested that delayed JOLs are more accurate than
immediate JOLs because the former are based more heavily on
mnemonic cues (i.e., retrieval success or failure at time of JOL)
rather than on other less diagnostic intrinsic or extrinsic cues.
Because retrieval of a paired associate is almost always successful
immediately after study, mnemonic cues cannot discriminate be-
tween items; therefore, immediate JOLs are necessarily based on
less effective intrinsic cues.
The role of target retrieval in JOL accuracy might be related to
recent findings by Maki (1998b). In her study, students read brief
passages of narrative texts and then made predictions about their
future test performance. JOLs occurred either immediately after
reading a passage, or they were delayed until all the texts had been
read. The time of test (immediate vs. delayed) also was manipu-
lated. Surprisingly, mean Gs for delayed JOLs were very low
(ranging from about .02 to .20). In contrast to Nelson and Dun-
losky's (1991) findings, predictive accuracy was best for immedi-
ate JOLs and tests.
Why was delayed JOL accuracy worse than immediate JOL
accuracy in text comprehension monitoring? Maki's (1998b) par-
adigm differed from Nelson and Dunlosky's (1991) procedures in
several ways. One important difference involved the retrievability
of target information at time of JOL. Using paired associates,
Nelson and Dunlosky reported that 95% of their participants
attempted to recall the target word during delayed JOLs. It seems
unlikely, however, that Maki's participants would be able to make
a similar retrieval attempt for all the propositions in a target text.
Perhaps text stimuli are not well-suited for a recall attempt at time
of JOL, and as a result, delayed JOLs were based on other, less
diagnostic cues. If so, increased delayed JOL accuracy may not be
robust across metacognitive domains in which retrieval of
target information is more difficult. Thus, one goal of the present
study was to test whether the delayed JOL effect would generalize
beyond paired associate stimuli.
The Phrasing of Metamemory Cues
Previous research has shown that the phrasing of metamemory
cues can affect memory monitoring accuracy. Widner and Smith
(1996) manipulated the type of cue used in a feeling-of-knowing
paradigm. They asked participants to answer a series of trivia
questions and then provide feeling-of-knowing judgments for
items answered incorrectly. These judgments were compared with
subsequent recognition performance. Widner and Smith examined
the relative effectiveness of three different metamemory cues: (a)
recognition-based cues (e.g., "Will you be able to identify a
currently nonrecallable answer on a recognition test?"), (b)
knowledge-based cues (e.g., "Do you feel that you really do know
a currently nonrecallable answer?"), and (c) composite cues,
which combined recognition and knowledge cues. They found
that metamemory accuracy was highest (G = .49) when recogni-
tion alone was emphasized. Monitoring accuracy following
knowledge-based and composite cues was significantly lower
(mean Gs from .12 to .15).
Similar results have been obtained in text comprehension mon-
itoring experiments. Maki and Swett (1987) found that
metamemory for text was better when the cue asked for memory
predictions, as opposed to importance ratings, even though the
correlation between the two types of ratings was substantial (Pear-
son's r = .74 and .61 for two texts). In a related finding, Maki and
Serra (1992) showed that test predictions were more accurate than
judgments of comprehension immediately after reading texts.
Thus, studies in both feeling-of-knowing and text comprehension
monitoring tasks suggest that metamemory cues emphasizing pre-
diction tend to be better than those emphasizing current knowl-
edge, importance, or comprehension.
Despite these findings, the phrasing of metamemory cues in JOL
studies has not received much attention. In most research, only the
predictive aspect of JOLs is emphasized. Dunlosky and Nelson
(1994), however, did study immediate and delayed JOLs using two
different measurement scales. In Experiment 1, they elicited judg-
ments using a 6-point percentage scale ranging from 0% (definitely
won't recall) to 100% (definitely will recall). In Experiment 2,
they asked participants to judge how well they had learned certain
items: JOLs ranged from 1 (not learned at all) to 6 (extremely well
learned). The percentage scale emphasized prediction, whereas the
latter scale was knowledge based. Even though participants were
asked to evaluate their performance in different ways between
experiments, Dunlosky and Nelson obtained a large delayed JOL
effect in both cases. Using the predictive cue, overall G = .18 for
immediate JOLs, and G = .83 for delayed JOLs; using the
knowledge-based cue, Gs = .21 and .83, respectively. The type of cue was not
of primary interest to Dunlosky and Nelson, however, and so the
procedures of Experiments 1 and 2 varied considerably. Nevertheless,
this study demonstrated that increased delayed JOL accuracy
can be obtained with paired associates using both predictive and
knowledge-based metamemory cues.
There were two primary objectives in the present study. As
noted earlier, one purpose was to examine the generalizability of
Nelson and Dunlosky's (1991) delayed JOL effect. Although the
effect has been shown to be large in magnitude and robust across
encoding conditions (Dunlosky & Nelson, 1994), Maki (1998b)
recently failed to obtain an advantage for delayed JOLs in text
comprehension monitoring. This may reflect procedural differ-
ences between experiments, or it may suggest that high delayed
JOL accuracy is limited to studies using stimuli that are readily
retrievable (e.g., paired associates).
To examine this issue, a categorized list-learning task was
devised (cf. Hertzog, Dixon, & Hultsch, 1990). Participants
learned lists of six related items presented in categories (e.g., A
type of fuel: petroleum, alcohol, butane, water, uranium, and
charcoal). After study, participants were shown the category title
(A type of fuel) and asked to make a judgment of future recall.
JOLs occurred either immediately after study, or they were de-
layed until all the items had been studied. At the end of the
experiment, participants were shown the category title and asked to
recall the exemplars they had studied. Thus, the basic procedures
resembled those used by Nelson and Dunlosky (1991), except that
the stimuli were categorized lists of items rather than paired
associates.
Participants may base their JOLs on a variety of information.
Koriat (1997) found that JOLs became more accurate when par-
ticipants relied primarily on mnemonic factors such as the outcome
of previous retrieval attempts. If participants in the present study
based their delayed JOLs on a retrieval attempt, then delayed JOLs
should be more closely related to future recall than immediate
JOLs. If participants based their delayed judgments on nonmne-
monic information, however, then no increase in delayed JOL
accuracy would be expected. Almost all participants reported
making retrieval attempts during delayed JOLs in Nelson and
Dunlosky's (1991) original study, but participants in the present
study might not use the same strategy. Because the retrieval of six
category exemplars is more difficult than remembering a single
target item, participants might choose to base both immediate and
delayed JOLs on nonmnemonic information, thus producing sim-
ilar levels of metacognitive accuracy for each type of judgment.
The second goal of this study was to determine whether the
phrasing of metamemory cues affects immediate and delayed JOL
accuracy. Previous research on feelings-of-knowing (Widner &
Smith, 1996) and text comprehension monitoring (Maki & Serra,
1992; Maki & Swett, 1987) has shown that prediction-based cues
are more effective than knowledge-based (or comprehension-
based) cues. Both of these metamemory paradigms typically in-
volve judgments about stimuli that cannot be completely recalled
at time of JOL. In contrast, target items were potentially retriev-
able at time of JOL in the present study. Thus, knowledge-based
JOLs might be more effective than predictive JOLs because the
former require participants to consider mnemonic-based informa-
tion, whereas the latter do not. The type of metamemory cue varied
between and within experiments: (a) Experiment 1 used a standard,
prediction-based metamemory cue; (b) Experiment 2 used a
knowledge-based cue; and (c) Experiment 3 compared the efficacy
of prediction-based cues, knowledge-based cues, and combined
(knowledge and prediction) cues.
Experiment 1
Previous research using paired associates has shown increased
metamemory accuracy when there is a delay of at least 30 s
between study and JOL (Nelson & Dunlosky, 1991). Conversely,
research in text comprehension monitoring has failed to replicate
this finding (Maki, 1998b). The purpose of Experiment 1 was to
determine (a) whether students could produce reliable metacogni-
tive accuracy for categories of related items and (b) whether delays
improve the accuracy of judgments in this task.
Method
Participants and materials. Thirty-six undergraduates volunteered for
Experiment 1 and received course credit. All participants were tested in
individual cubicles. Experimental stimuli were presented on IBM-
compatible PCs using Micro Experimental Laboratory software, Ver-
sion 2.0 (Schneider, 1995). Twenty-four categories of related items, each
containing 6 exemplars, were taken from Battig and Montague's (1969)
norms (e.g., A type of fuel: petroleum, alcohol, butane, water, uranium, and
charcoal). Categories were classified as either easy (frequency of association
> 100), medium (frequency of association ranged from 50 to 100), or
difficult (frequency of association < 50), based on the norms reported by
Battig and Montague. Eight categories of items were constructed for each
level of difficulty to maximize metacognitive discriminability (see Nelson,
Leonesio, Landwehr, & Narens, 1986, and Schwartz & Metcalfe, 1994, for
discussions of potential measurement difficulties caused by a restricted
range of stimuli).
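The three difficulty levels amount to a threshold rule on the normative frequency of association. The following is a minimal sketch of that rule, assuming each category is summarized by a single frequency value; the function name and the sample values are hypothetical and are not taken from Battig and Montague's norms.

```python
def classify_difficulty(frequency: float) -> str:
    """Classify a category by its frequency of association.

    Thresholds follow the text: easy (> 100), medium (50 to 100),
    difficult (< 50).
    """
    if frequency > 100:
        return "easy"
    if frequency >= 50:
        return "medium"
    return "difficult"


# Hypothetical frequency values, for illustration only.
for value in (150, 100, 50, 49):
    print(value, classify_difficulty(value))
```

Note that the boundary values 50 and 100 fall in the medium band under this reading of the thresholds.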
Design and procedure. Participants provided a single JOL for each
category. The timing of these judgments was manipulated within subjects.
Immediate JOLs occurred directly after study of all items in a category was
complete; delayed judgments occurred after all 24 categories had been
studied. These procedures ensured that at least 11 items intervened between
study and judgment in the delayed JOL condition. On the basis of the rate
of item presentation and average JOL latencies, delays in this condition
ranged from 1 to 9 min, with an average of 5 min.
Before beginning the experiment, instructions were read aloud to all
participants and also appeared on the computer screen. Participants were
instructed to study the items in all 24 categories so that on a future memory
test they would be able to recall the items when prompted by the category
heading. An additional 4 categories (and items) appeared at the beginning
of the experiment to serve as a memory buffer, and these stimuli were not
included on the final test. Participants were not informed that they would
not be tested on these items.
Participants studied items from one category at a time; the presentation
order of categories, and items within the categories, was randomized for
each participant. Categories were pseudorandomly assigned to condition,
with the restriction that 12 categories received immediate JOLs and 12
received delayed JOLs. During study, the category heading was shown at
the top of the screen in capital letters, and exemplars appeared sequentially
for 4 s each. The exemplars were printed in lowercase letters. For example,
A TYPE OF FUEL appeared on top of the screen, and petroleum, alcohol,
butane, water, uranium, and charcoal appeared one at a time below the
title. The order of the six exemplars listed in each category was randomly
determined for each participant. In the immediate JOL condition, the
category heading remained on the screen and the following cue appeared
after the last example was shown: "How confident are you that in about 10
minutes from now you will be able to recall the members of this category
when shown the category name?" Participants rated their certainty on a
6-point percentage scale ranging from 0% (definitely will not recall), to
100% certain (definitely will recall). In the delayed JOL condition, study of
the category and exemplars was followed by the instructions, "Press the
spacebar to continue." The next category then appeared. After participants
studied all 24 categories, they made delayed JOLs for the 12 categories that
had not been judged previously. The category heading appeared on the
screen, and the same cue phrase and percentage scales were shown. There
was no restriction on the amount of time allowed for participants to make
their JOLs.
After participants studied and rated all 24 categories (plus the 4 buffer
categories), they completed a distracting filler activity for 9 min. Finally,
participants received a memory test. The category heading appeared at the
top of the screen, and participants were instructed to type in all the
exemplars that they remembered studying. The order of category presen-
tation was randomly determined for each participant. To minimize the role
of misspellings, the computer scored responses as correct if the first three
letters entered by participants matched the first three letters of the category
members (the first three letters of the exemplars within each category were
unique). Participants were allowed as much time as necessary to complete
the memory test.
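The lenient scoring rule described above can be sketched as follows. This is a minimal illustration and not the original Micro Experimental Laboratory routine; the function name, the case-insensitive comparison, and the decision to count a repeated response only once are assumptions made for the sketch.

```python
def score_recall(responses, exemplars, prefix_len=3):
    """Count studied exemplars matched by their first `prefix_len` letters.

    Relies on the property stated in the text that the first three
    letters of the exemplars within a category are unique.
    Case-insensitive matching and ignoring repeated responses are
    assumptions of this sketch.
    """
    targets = {word[:prefix_len].lower() for word in exemplars}
    recalled = set()
    for response in responses:
        prefix = response.strip()[:prefix_len].lower()
        if prefix in targets:
            recalled.add(prefix)
    return len(recalled)


fuel = ["petroleum", "alcohol", "butane", "water", "uranium", "charcoal"]
# "petrolium" is misspelled but shares its first three letters with a target;
# "coal" matches nothing; the second "butane" is a repeat.
print(score_recall(["petrolium", "Butane", "coal", "butane"], fuel))  # prints 2
```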
Results and Discussion
Although the dependent measure of primary interest was
metamemory accuracy, recall on the memory test was analyzed
along with the magnitude and latency of JOLs. All tests of statis-
tical reliability were conducted at p < .05.
Recall, JOL magnitude, and JOL latency. Descriptive statistics
for participants' recall performance on the memory test and
JOLs are listed in Table 1. For ease of comparison, both recall and
the magnitude of JOLs are shown as proportions. The magnitude
of JOLs was reliably higher in the immediate JOL condition (.52)
compared with delayed JOLs (.43), t(35) = 4.19, p < .001. There
was no reliable difference in the time required to produce JOLs
(i.e., JOL latency) between conditions. Recall on the memory test
did not differ reliably between conditions (.48 vs. .46). Thus,
participants predicted that they would remember more items when
JOLs occurred immediately after study, but test performance did
not differ between conditions.
Metamemory accuracy. Several measures have been devel-
oped to assess metamemory accuracy. These indices measure
either absolute accuracy (sometimes called "calibration") or rela-
tive, item-by-item accuracy (sometimes called "resolution"). Con-
sistent with the recommendations of some metamemory research-
ers (Koriat, 1997; Maki, 1998a; Weaver & Kelemen, 1997),
measures of both absolute and relative monitoring accuracy were
calculated.
Absolute accuracy was assessed with bias scores. The bias index
reflects participants' overall overconfidence or underconfidence in
a particular condition (Yates, 1990). Bias scores were derived by
obtaining the signed difference between mean JOL magnitude and
mean recall performance in each condition for each participant. A
score greater than 0 indicates overconfidence, and a score less
than 0 indicates underconfidence. Taken alone, the magnitude of
mean bias scores across participants was not reliably different
from zero in either condition (bias = .04 for immediate JOLs, and
bias = -.04 for delayed JOLs). This indicates that, in general,
participants were not reliably underconfident or overconfident.
However, the within-subject difference in bias between immediate
versus delayed JOLs was reliable, t(35) = 3.08, p = .004. This
probably reflects the decrease in JOL magnitude after delays noted
previously. The reliability of participants' bias scores across items
was very high, Cronbach's alpha coefficient = .92. In short, the
magnitude of under- or overconfidence within each condition was
not reliably different from zero, but the degree of bias between
conditions was statistically reliable.
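The bias index reduces to a signed difference between two means. A minimal sketch for one participant in one condition, assuming JOLs and recall are both expressed as proportions; the data values are invented for illustration and are not from the experiment.

```python
from statistics import mean


def bias_score(jols, recall):
    """Signed difference between mean JOL magnitude and mean recall.

    Positive values indicate overconfidence; negative values indicate
    underconfidence. Both inputs are proportions between 0 and 1.
    """
    return mean(jols) - mean(recall)


# Invented data: one participant's JOLs and recall for four categories.
jols = [0.8, 0.6, 0.4, 0.6]
recall = [0.5, 0.5, 0.5, 0.5]
print(round(bias_score(jols, recall), 2))  # prints 0.1 (overconfident)
```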
Table 1
Mean Judgment of Learning (JOL) Magnitude, Recall, and Metamemory
Accuracy in Experiments 1 and 2

                                JOL                               Monitoring accuracy
Condition         Recall        Magnitude     Latency             Bias          Gamma
Experiment 1
  Immediate JOL   .48 (.11)     .52* (.16)    5,835 (1,810)       .04* (.19)    .44 (.36)
  Delayed JOL     .46 (.15)     .43* (.17)    5,187 (2,194)       -.04* (.20)   .48 (.40)
Experiment 2
  Immediate JOL   3.12 (1.09)   4.37* (.69)   8,713* (3,981)      1.29* (.90)   .48* (.39)
  Delayed JOL     3.08 (.98)    3.57* (.77)   10,072* (4,572)     0.55* (.66)   .80* (.21)

Note. Main entries are mean values; entries in parentheses are standard deviations. Latency is the time (in
milliseconds) required for participants to enter their JOL from the onset of the metamemory prompt. Asterisks
indicate that a statistically reliable difference (p < .05) was observed between immediate versus delayed JOLs.

A second aspect of metamemory, relative monitoring accuracy,
was assessed by computing Goodman-Kruskal Gamma correlations
between predicted and actual test performance for each
participant. G is a measure of ordinal association, and it is the
preferred index of relative metamemory accuracy (Nelson, 1984,
1996; Wright, 1996). Mean Gs by condition are listed in Table 1.
Reliable (nonzero) predictive accuracy was observed in both conditions,
t(35) = 7.42, p < .001 (immediate JOLs), and
t(35) = 7.21, p < .001 (delayed JOLs), confirming that participants
predicted which categories they would recall at a level
greater than chance. More important, though, no reliable improvement
was found for delayed JOLs (G = .48) compared with
immediate JOLs (G = .44), t(35) = 0.43, p = .670.
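For readers unfamiliar with the index, G can be computed from concordant (C) and discordant (D) pairs of items as G = (C - D) / (C + D), with tied pairs ignored. The sketch below is a minimal illustration for a single participant in a single condition; the function name and data are hypothetical, and returning None when G is undefined is a choice made here rather than a detail reported in the article.

```python
from itertools import combinations


def goodman_kruskal_gamma(jols, recall):
    """Goodman-Kruskal gamma between JOLs and recall scores.

    G = (C - D) / (C + D), where C and D count concordant and
    discordant pairs of items; pairs tied on either variable are
    ignored. Returns None when G is undefined (no concordant or
    discordant pairs, e.g., when every item received the same JOL).
    """
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recall), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Invented data: JOLs (percentage scale) and items recalled per category.
jols = [100, 80, 60, 40, 20, 0]
recalled = [5, 6, 3, 4, 1, 0]
# 13 concordant and 2 discordant pairs, so G = 11/15.
print(goodman_kruskal_gamma(jols, recalled))
# No variability in JOLs, so G is undefined.
print(goodman_kruskal_gamma([60, 60, 60], [2, 4, 5]))
```

Because G depends only on the ordering of items, it is unaffected by whether JOLs are collected on a percentage scale or as item counts.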
Summary. Experiment 1 showed that JOL magnitude was
higher immediately after study compared with after a delay, but
test performance itself did not differ between conditions. Consis-
tent with these findings, a reliable difference in bias was observed
between conditions, though bias scores considered separately were
not reliably nonzero. Participants did show nonzero, modest Gs in
both conditions, indicating reliable metamemory accuracy. Con-
trary to previous findings (Nelson & Dunlosky, 1991), however,
memory monitoring accuracy was not reliably better for delayed
JOLs compared with immediate JOLs. In other words, no delayed
JOL effect was observed. Experiment 2 examined one possible
cause of these findings.
Experiment 2
Delayed JOL accuracy was not higher than immediate JOL
accuracy in Experiment 1. One possible explanation is that partic-
ipants may not have based their JOLs on an actual retrieval attempt
of the target information. Because the JOL prompt was prediction
based, participants may have used any number of other cues (e.g.,
familiarity with the category title, concreteness of the title or
exemplars, etc.) as a basis for their ratings. In a paired associate
task, presenting the cue word alone at time of JOL may be
sufficient to elicit a target retrieval attempt, regardless of the
phrasing of the JOL prompt (e.g., Dunlosky & Nelson, 1994).
However, in the present categorized-list learning task, participants
may have been less likely to make a recall attempt of all six items
at time of JOL.
In a paired associate task, utilizing highly diagnostic mnemonic
cues for delayed JOLs is quick, easy, and relatively automatic: A
student reads the cue word and attempts to recall its associate.
Obtaining similar mnemonic cues for the categories in Experi-
ment 1, however, required that participants generate as many of the
six exemplars as they could and then tally the total number of
items recalled for each category. Clearly, this requires more cog-
nitive effort than retrieving a single target word. Rather than
attempting to retrieve all six exemplars, students may have utilized
more readily available (but less diagnostic) intrinsic cues for
delayed JOLs (e.g., judgments of ease based on the category
heading, similarity of the exemplars, etc.). Intrinsic information is
often used for immediate JOLs when mnemonic cues are less
readily available (Koriat, 1997). If participants did not make
retrieval attempts at delays, this could explain the similar levels of
metamemory accuracy for immediate and delayed JOLs in
Experiment 1.
Experiment 2 was designed to encourage participants to utilize
mnemonic cues at time of JOL. Specifically, the JOL prompt was
changed from an expression of confidence to a direct estimate of
retrieval. For both immediate and delayed JOLs, participants were
asked, "How many members of the category above can you cur-
rently recall?" This JOL cue should produce several clear results.
Compared with delayed JOLs, the magnitude of immediate JOLs
should be greater because the stimuli were presented just seconds
before. Similarly, bias scores should be greater for immediate
JOLs compared with delayed JOLs. The critical question is
whether relative metamemory accuracy (G) will improve after a
delay. If so, this would suggest that the modest delayed JOL
accuracy in Experiment 1 was produced by a reliance on less
effective, nonmnemonic cues.
Method
Participants and materials. Forty-two undergraduates volunteered for
Experiment 2. None of these individuals had participated previously, and
all volunteers received course credit. The stimuli were identical to those in
Experiment 1.
Design and procedure. The timing of JOLs (immediate vs. delayed)
was manipulated within subjects. The methodology of Experiment 2 was
identical to the first experiment except for the phrasing of the JOL cue. In
Experiment 2, a knowledge-based cue was used. Participants were asked,
"How many members of this category can you currently recall?" Participants
chose one of six options (0, 1, 2, 3, 4, or 5+) to maintain a consistent
6-point rating scale between Experiments 1 and 2. Participants were not
required to enter the recallable stimuli themselves at time of JOL; rather,
they were asked to report only the total number of exemplars they could
recall.
Results and Discussion
Recall, JOL magnitude, and JOL latency. Descriptive statis-
tics for recall, JOL magnitude, and JOL latency are listed in
Table 1. A t test for paired samples indicated that recall did not
differ reliably between conditions (mean recall = 3.12 for imme-
diate JOLs and 3.08 for delayed JOLs). As predicted, the mean
number of items participants claimed to remember was greater
immediately after study compared with after a delay (4.37 vs. 3.57,
respectively), t(41) = 9.40, p < .001. The latency to produce JOLs
also differed reliably between conditions, t(41) = -2.52, p =
.016, with participants taking longer to produce delayed JOLs than
immediate JOLs.
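
The paired comparisons reported above can be illustrated with a short Python sketch. It assumes the standard paired-samples t statistic (the mean of the within-participant differences divided by its standard error, with n - 1 degrees of freedom). The function name `paired_t` and the five-participant data are hypothetical and are not taken from the experiment.

```python
import math
from statistics import mean, stdev


def paired_t(first, second):
    """Return (t, df) for a paired-samples t test on two equal-length lists."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("t is undefined when all differences are identical")
    standard_error = sd / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Hypothetical JOL magnitudes (items claimed) for five participants.
immediate = [5, 4, 5, 4, 5]
delayed = [4, 3, 4, 4, 3]
t_value, df = paired_t(immediate, delayed)
print(f"t({df}) = {t_value:.2f}")
```

Under this convention the sign of t follows the order of subtraction. A negative value, such as the one reported for latency, would arise from subtracting delayed from immediate latencies when delayed latencies are longer; that ordering is an assumption here, not something the text states.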
Metamemory accuracy. Bias scores were obtained by calcu-
lating the mean number of items reported at JOL minus the number
of items correctly recalled on the final memory test (see Table 1).
As expected, participants claimed to remember more items at time
of JOL than they actually recalled on the memory test. Bias scores
were positive and reliably nonzero in both conditions: immediate
JOLs, t(41) = 9.89, p < .001, and delayed JOLs, t(41) = 5.95, p <
.001. In addition, bias scores were reliably higher for immediate
judgments compared with delayed JOLs, t(41) = 7.04, p < .001.
In sum, participants always claimed to remember more items at
time of JOL than they could eventually recall on the test, and this
discrepancy was larger immediately after study.
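
The bias score described above (mean number of items reported at JOL minus mean number recalled) can be sketched in Python as follows. The function name `bias_score`, the example data, and the choice to code a 5+ rating as 5 are assumptions made for illustration only.

```python
from statistics import mean


def bias_score(jols, recalled):
    """Mean JOL minus mean recall across one participant's categories.

    Positive values indicate overconfidence; negative values indicate
    underconfidence.
    """
    if len(jols) != len(recalled):
        raise ValueError("jols and recalled must have the same length")
    return mean(jols) - mean(recalled)


# Hypothetical participant: JOLs and final recall for four categories.
# A rating of 5+ is coded as 5 here (an assumption).
jols = [5, 4, 5, 3]
recalled = [3, 3, 4, 2]
print(bias_score(jols, recalled))
```

Because the mean of the differences equals the difference of the means, the group means reported above imply bias scores of about 4.37 - 3.12 = 1.25 items for immediate JOLs and 3.57 - 3.08 = 0.49 items for delayed JOLs.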
Relative metamemory accuracy was assessed by computing G
between JOL magnitude and recall for each participant. G was
undefined in one or both conditions for 8 participants because of a
lack of variability in their responses. Six participants used a single
JOL rating for all immediate JOLs, 1 used a single rating for all
delayed JOLs, and 1 participant used a single rating for all JOLs
(both immediate and delayed). Participants in both conditions
showed reliable (nonzero) Gs: t(34) = 7.36, p < .001 (immediate
JOLs), and t(39) = 23.99, p < .001 (delayed JOLs). More important,
a large delayed JOL effect emerged (see Table 1). Delayed
judgments were significantly more accurate (G = .80) than immediate
predictions (G = .48), t(33) = 3.84, p < .001.
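
To make the computation of G concrete, the following Python sketch assumes that G is the Goodman-Kruskal gamma correlation computed over all pairs of categories for a single participant. The function name and the data are hypothetical. The sketch also shows why G is undefined when a participant gives the same rating to every item: every pair is tied, so there are no concordant or discordant pairs to compare.

```python
from itertools import combinations


def goodman_kruskal_gamma(jols, recall):
    """Return gamma = (C - D) / (C + D), or None when it is undefined.

    C counts pairs ordered the same way on JOL and recall, D counts pairs
    ordered in opposite ways. Pairs tied on either variable are ignored.
    """
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recall), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return None  # no untied pairs, e.g. a single rating used throughout
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical participant with four categories.
print(goodman_kruskal_gamma([5, 3, 4, 1], [4, 3, 2, 0]))  # 5 concordant, 1 discordant
print(goodman_kruskal_gamma([5, 5, 5, 5], [4, 3, 2, 0]))  # constant JOLs: undefined
```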
Summary. As hypothesized, participants reported better mem-
ory for items at time of JOL than they ultimately showed on the
memory test about 10 min later. This probably reflects normal
forgetting and was not the focus of this experiment. The variable
of primary interest in Experiment 2 was relative (item-by-item)
metamemory accuracy, G. Unlike Experiment 1, delayed JOLs
were reliably more accurate than immediate JOLs (Gs = .80 vs.
.48, respectively). This is consistent with the original delayed JOL
effect reported by Nelson and Dunlosky (1991) for paired associ-
ates. By using knowledge-based JOL cues (rather than prediction-
based cues), delayed judgments were more accurate than immedi-
ate judgments.
Reliable metamemory for categorized lists was observed in
Experiments 1 and 2, but increased delayed JOL accuracy was
observed only when participants were specifically cued to use then-
current recall as a basis for JOLs (Experiment 2). Although it was
impossible to directly confirm that participants made a recall
attempt at time of JOL, analyses of JOL latencies from both
experiments were consistent with this claim. First, participants
took significantly longer to make delayed judgments compared
with immediate judgments in Experiment 2 (see Table 1). If
participants were making a retrieval attempt at time of JOL, it
should have taken longer to recall the information after a delay
than immediately after study. No reliable difference in JOL laten-
cies was observed in Experiment 1, however, when participants
were not asked to make a retrieval attempt. Second, there was a
substantial increase in overall JOL latencies from Experiment 1 to
Experiment 2. This finding is also consistent with the notion that
participants made quick, nonretrieval-based JOLs in Experiment 1
compared with longer, knowledge-based JOLs in Experiment 2.
In Experiment 2, participants assessed their memory for the six
exemplars on a scale labeled 0, 1, 2, 3, 4, and 5+. Options 5 and 6
were collapsed into the 5+ rating to maintain the 6-point rating
scale used in Experiment 1. Because more items were likely to be
accessible immediately after study, one possibility is that collaps-
ing the upper end of the scale may have distorted predictive
accuracy for immediate JOLs. Although unlikely, it is possible that
a reliable difference in JOL accuracy emerged because the scale
selectively reduced immediate JOL accuracy, not because predic-
tive accuracy genuinely increased after delays.
This interpretation seems unlikely for two reasons. First, the
mean G for immediate JOLs in Experiment 1 is consistent with
that observed in Experiment 2 (.44 vs. .48, respectively). This
suggests that immediate JOL accuracy was not reduced as a
consequence of using the 5+ rating. Second, because G uses
ordinal information, any loss of resolution due to the 5+ rating
would have occurred when comparing perfectly recalled categories
given JOLs of 5+ with those containing five correct items given
ratings of 5+. Few categories were perfectly recalled in either
condition, and those that were perfectly recalled were distributed
evenly (8% of immediate JOL categories and 9% of delayed JOL
categories). Thus, the argument that the rating scale selectively
reduced immediate JOL accuracy is not tenable. Nevertheless, the
5+ rating was eliminated in Experiment 3.
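
The ordinal argument above can be checked with a small Python sketch. It enumerates every pair of ratings on a hypothetical uncollapsed 0 to 6 scale and lists the pairs whose ordering disappears once ratings above 5 are capped at 5+. The function name and the uncollapsed scale are assumptions for illustration; participants in Experiment 2 saw only the collapsed scale.

```python
from itertools import combinations


def pairs_tied_by_cap(ratings, cap=5):
    """Return rating pairs that differ on the full scale but tie once capped."""
    lost = []
    for a, b in combinations(ratings, 2):
        if a != b and min(a, cap) == min(b, cap):
            lost.append((a, b))
    return lost


# Every distinct rating on a hypothetical uncollapsed scale.
print(pairs_tied_by_cap([0, 1, 2, 3, 4, 5, 6]))
```

Only the comparison between ratings of 5 and 6 is lost, which is why the loss of resolution is confined to categories that would have received the top two ratings.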
Experiment 3
Taken together, Experiments 1 and 2 provide evidence that
delayed JOL accuracy is higher when knowledge-based cues rather
than prediction-based cues are used. However, these two cues also
differed in the type of scale that was presented to participants.
Consistent with previous research, a percentage scale (0%, 20%,
40%, 60%, 80%, or 100% confident) was used to elicit prediction
JOLs in Experiment 1. In contrast, participants in Experiment 2
provided knowledge-based JOLs using the raw number of items
that they could currently recall (0, 1, 2, 3, 4, or 5+). One possi-
bility is that delayed JOLs are not sensitive to prediction- versus
knowledge-based cues; rather, delayed JOLs may simply be more
accurate when raw numbers are used, compared with percentages.
Experiment 3 examined this possibility. In Condition A, a
prediction cue based on a percentage scale (0%, 20%, 40%, 60%,
80%, or 100% confident) was used; this was the same cue included
in Experiment 1. In Condition B, a prediction cue based on the
actual number of items to be remembered (1, 2, 3, 4, 5 or 6) was
devised.¹ In Condition C, JOLs were elicited with a knowledge cue
based on the actual number of items to be remembered (1, 2, 3, 4,
5, or 6); this was the same type of cue used in Experiment 2. If
JOLs are sensitive to the scale (percentages vs. actual numbers),
then performance in Conditions A and B should differ. For exam-
ple, if participants were more accurate using actual numbers rather
than percentages, then monitoring accuracy in Condition B should
be higher than in Condition A. Alternatively, if JOLs are sensitive
to prediction- versus knowledge-based cues (independent of the
scale used), then there should be no difference between Conditions
A and B, and accuracy in Condition C should be highest.
In Condition D, knowledge and prediction cues were combined.
Participants in Condition D provided two separate JOLs. First,
they were asked how many items they could currently recall (1, 2,
3, 4, 5, or 6): the knowledge component. After making this
judgment, participants were asked to predict how many items they
would be able to recall on the memory test (1, 2, 3, 4, 5, or 6): the
prediction component. Both components were included to deter-
mine whether participants would incorporate information derived
from knowledge-based judgments into their subsequent predictive
JOLs.
In sum, there were three primary concerns in Experiment 3.
First, it was important to replicate the delayed JOL effect observed
in Experiment 2. Second, Experiment 3 was designed to determine
whether the increase in delayed JOL accuracy was due to a change
in the type of scale (from percentages to numbers) or to a change
from predictive cues to knowledge-based cues. If delayed JOL
accuracy improved in Experiment 2 solely because a number scale
was used instead of a percentage scale, then JOL accuracy should
be higher in Condition B than in Condition A. Alternatively, if
delayed JOL accuracy increased because the cue was changed
from prediction based to knowledge based, then (a) there should be
no difference between Conditions A and B because both are
prediction based, and (b) JOL accuracy in knowledge-based Condition C
should be greater than both prediction-based conditions
(A and B). The third important issue in Experiment 3 was to test
whether prediction-based cues could produce high delayed JOL
accuracy if participants were forced to first consider their current
state of knowledge. If so, predictive judgments preceded by
knowledge-based JOLs in Condition D should be more accurate
than predictive judgments alone in Condition B.²

¹ The rating scale from Experiment 2 (0, 1, 2, 3, 4, 5+) was changed. A
6-point scale was preserved, but the values in Experiment 3 ranged
from 1 to 6. This change was made to eliminate possible difficulties
associated with the 5+ rating and because the 0 rating was used less than
2% of the time in Experiment 2.
Method
Participants and materials. A total of 120 undergraduates volunteered
to participate in Experiment 3. All volunteers received course credit, and
none of the participants had completed previous versions of this experi-
ment. The same categorized lists from Experiments 1 and 2 were used in
Experiment 3.
Design and procedure. A mixed design was used in Experiment 3. The
timing of judgments (immediate or delayed) was manipulated within
subjects, and the phrasing of the JOL cue was a between-subjects manip-
ulation. Each participant received one of the following four JOL cues. In
Condition A, a prediction cue based on a percentage scale was used.
Participants were asked, "How confident are you that in about 10 minutes
from now you will be able to recall the members of this category when
shown the category name?" Participants rated their certainty on a 6-point
percentage scale ranging from 0% (definitely will not recall) to 100%
(definitely will recall). In Condition B, a prediction cue based on the actual
number of items to be recalled was devised. Participants were asked, "How
many members of this category will you recall on a test in about 10
minutes?" and they entered a number from 1 to 6. Thus, both Conditions
A and B were prediction based; the difference was whether a percentage
scale or the actual number of items was used. In Condition C, a knowledge-
based cue was included. Participants were asked, "How many members of
this category can you currently recall?" and they entered a number from 1
to 6. Thus, the difference between Conditions B and C was whether the
JOL cue referred to prediction or to current knowledge. Finally, in Con-
dition D, both knowledge and prediction cues were used. Participants made
two judgments. First, they were asked, "How many members of this
category can you currently recall?" Immediately after answering this
question, a second prompt appeared, asking, "How many members of this
category will you recall on a test in about 10 minutes?" Participants
answered both questions by entering a number from 1 to 6.
On arrival, each participant selected one of four cards for assignment to
the between-subject condition. Using these procedures, 25 participants
were assigned to Condition A, 33 to Condition B, 30 to Condition C,
and 32 to Condition D. After being assigned to a condition and receiving
instructions, participants completed the study, JOL (immediate and de-
layed), distractor, and test procedures as in Experiments 1 and 2.
Recall, JOL magnitude, and JOL latency. Mean values for
recall and JOL magnitude were converted to proportions and are
listed in Table 2. Recall scores were similar regardless of the
timing or phrasing of the JOL cue. A 2 × 4 (Timing × Phrasing
of JOL Cue) multivariate analysis of variance (MANOVA) was
conducted on recall. No reliable main effects emerged, and the
interaction was not significant (all ps > .05), suggesting that the
timing and phrasing of JOL cues had no effect on mean recall.
These results are consistent with Experiments 1 and 2.
The magnitude of JOLs was examined across conditions. Con-
sistent with Experiments 1 and 2, the magnitude of JOLs appeared
to be greater immediately after study than after a delay (see Table
2). A 2 × 4 MANOVA confirmed a reliable main effect of JOL
timing, F(1, 116) = 134, MSE = .01.³
As in Experiments 1 and 2,
the magnitude of JOLs was higher immediately after study com-
pared with after a delay. The magnitude of JOLs also differed
reliably across Conditions A through D, F(3, 116) = 3.56, MSE =
.03. The interaction between variables was not reliable. Because
the interaction term was nonsignificant, immediate and delayed
JOL ratings were collapsed for post hoc comparisons. Dunnett's C
procedure revealed that the magnitude of judgments was higher
when participants were asked how many items they could cur-
rently recall (knowledge based, Condition C) than when partici-
pants predicted future recall in Condition D.
Participants in Condition D made both knowledge and predic-
tion JOLs. Two planned comparisons were conducted to determine
whether participants adjusted the magnitude of their two judg-
ments in Condition D. For immediate JOLs, the initial, knowledge-
based judgments were reliably higher than subsequent predictive
judgments, t(31) = 5.95, p < .001. An analogous, reliable difference
was also found for delayed JOLs, t(31) = 3.07, p = .004.
Participants' first JOL (based on current knowledge) was higher
than their second JOL (predicting future performance). Thus, par-
ticipants decreased the magnitude of their judgments to reflect
estimates of future forgetting.
The latency to provide JOLs was calculated for each participant,
and these mean values also appear in Table 2. Reliable main
effects were observed for timing and phrasing of the JOL cues,
F(1, 116) = 5.72, MSE = 4,690, and F(3, 116) = 32.2,
MSE = 14,893, respectively. The interaction between variables
also was statistically reliable. Inspection of the mean values sug-
gests that latencies were longer for knowledge-based JOLs. Post
hoc tests confirmed that for both immediate and delayed JOLs,
latencies in Condition C were reliably longer than in Conditions A,
B, and the predictive component of D. This finding is consistent
with the notion that participants in the prediction conditions used
a different basis for their JOLs, whereas participants in the
knowledge-based conditions made a retrieval-based attempt that
took longer.
Metamemory accuracy. As in Experiments 1 and 2, bias
scores were calculated to examine differences between the mag-
nitude of participants' JOLs and their recall. Mean values across
conditions are reported in Table 2. Mean bias scores were positive
in all conditions, indicating that the magnitude of JOLs was greater
than recall on the memory test. A 2 × 4 MANOVA was conducted,
and a reliable main effect of delay was observed, F(1,
116) = 78.6, MSE = .01. No reliable effect emerged for the
phrasing of the JOL cue, and the interaction between variables was
not significant. Thus, bias scores were lower after delays, regardless
of whether the JOLs were elicited by knowledge- or
prediction-based cues.

² The four groups included in Experiment 3 represent combinations of
two variables: type of scale (number vs. percentage) and type of cue
(prediction based vs. knowledge based). Four conditions were devised to
test the primary concerns discussed previously, including Condition D,
which contained two types of JOLs. All possible combinations of variables
were not tested, however (e.g., prediction-based percentage scales preceded
by knowledge-based number scales were not included), so it is not
possible to evaluate other potential interactions between these variables.

³ In this MANOVA, and in subsequent MANOVAs reported for JOL
response latencies, bias scores, and G, performance was examined as a
function of JOL timing (immediate vs. delayed) and phrasing (Conditions
A through D). Participants made two JOLs in Condition D, but the primary
concern was with the latter, predictive component. The initial, knowledge
component of Condition D was essentially a replication of Condition C,
and so it was omitted from these MANOVAs and considered separately in
a subsequent analysis.

Table 2
Mean Judgment of Learning (JOL) Magnitude, Recall, and Bias Scores in Experiment 3

Condition                            Recall      JOL magnitude   JOL latency       Bias
A. Prediction based (percentage)
     Immediate JOL                   .42 (.12)   .60 (.20)        5,288 (2,436)    .18 (.22)
     Delayed JOL                     .40 (.14)   .49 (.18)        5,719 (1,400)    .08 (.25)
B. Prediction based (number)
     Immediate JOL                   .44 (.13)   .62 (.13)        5,309 (2,054)    .18 (.18)
     Delayed JOL                     .45 (.15)   .54 (.13)        5,903 (3,822)    .09 (.16)
C. Knowledge based (number)
     Immediate JOL                   .46 (.16)   .72 (.09)        8,084 (4,071)    .26 (.14)
     Delayed JOL                     .46 (.15)   .57 (.09)       11,706 (5,250)    .11 (.12)
D. Combined (number)
   Knowledge component
     Immediate JOL                   .47 (.14)   .72 (.11)        8,519 (3,459)    .25 (.14)
     Delayed JOL                     .46 (.14)   .55 (.11)       12,488 (4,438)    .09 (.11)
   Prediction component
     Immediate JOL                   .47 (.14)   .61 (.12)        4,165 (2,264)    .14 (.18)
     Delayed JOL                     .46 (.14)   .51 (.10)        2,208 (1,138)    .05 (.12)

Note. Main entries are mean values; entries in parentheses are standard deviations. Latency is the time (in
milliseconds) required for participants to enter their JOL from the onset of the metamemory prompt.
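
As a consistency check on the means in Table 2, the Python sketch below recomputes each bias score as JOL magnitude minus recall and compares it with the reported bias. Values are stored as integer hundredths to avoid floating-point rounding; the row labels and variable names are illustrative choices, and the numbers are copied from the table.

```python
# (recall, JOL magnitude, reported bias), each in hundredths, from Table 2.
table2 = {
    "A, immediate": (42, 60, 18),
    "A, delayed": (40, 49, 8),
    "B, immediate": (44, 62, 18),
    "B, delayed": (45, 54, 9),
    "C, immediate": (46, 72, 26),
    "C, delayed": (46, 57, 11),
    "D knowledge, immediate": (47, 72, 25),
    "D knowledge, delayed": (46, 55, 9),
    "D prediction, immediate": (47, 61, 14),
    "D prediction, delayed": (46, 51, 5),
}

# Rows where magnitude - recall differs from the reported bias (in hundredths).
mismatches = {
    label: (magnitude - recall) - bias
    for label, (recall, magnitude, bias) in table2.items()
    if magnitude - recall != bias
}
print(mismatches)
```

Nine of the ten rows match exactly. The remaining row (Condition A, delayed) differs by .01, which is what one would expect if the published means were rounded independently; that explanation is an inference, not something stated in the article.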
Relative metamemory accuracy was assessed by computing G
between JOL magnitude and recall. Mean values across conditions
are shown in Figure 1. G was undefined for two participants in
Condition A because they used the same percentage rating for all
delayed JOLs. Data from these participants were omitted from the
following analyses. A 2 × 4 MANOVA was conducted, and both
main effects (timing of JOL and phrasing of JOL) were reliable,
F(1, 114) = 17.51, MSE = .08, and F(3, 114) = 8.13, MSE = .09,
respectively. The interaction between variables also was reliable,
F(1, 114) = 3.12, MSE = .08. Dunnett's C post hoc test showed
no differences in immediate JOL accuracy across all four condi-
tions. For delayed JOLs, however, G was reliably higher in Con-
ditions C and D (knowledge-based JOLs) than in Conditions A and
B (prediction-based JOLs).
Another important question concerned which conditions, if any,
produced a reliable delayed JOL effect. No statistically reliable
differences between immediate and delayed JOL accuracy were
observed in Conditions A and B (see filled circle and triangle,
respectively, in Figure 1). However, paired t tests showed a reli-
able delayed JOL effect in Condition C and for both types of
judgments in Condition D (all three ps < .001). Thus, the results
of Experiments 1 and 2 were replicated.
Summary. Three main hypotheses were explored in Experi-
ment 3. First, it was predicted that a reliable delayed JOL effect
should be observed in Condition C because this condition repli-
cated the procedures of Experiment 2. Delayed JOL accuracy in
Condition C was indeed higher (G = .81) than immediate JOL
accuracy (G = .50), suggesting that the delayed JOL effect does
extend to categories of six items when knowledge-based cues are
used.
Second, the source of this improvement in delayed JOL accu-
racy was examined. In particular, the possibility that changing
from a percentage scale to one including the raw number of items
may have caused this change was tested. The predictive cue in
Condition A was elicited using a percentage scale (cf. Experiment
1); the cue in Condition B also was predictive, but it was based on
the number of items to be recalled. No reliable differences in
monitoring accuracy (bias and G) emerged between these two
conditions. Thus, the shift from percentages to raw numbers in
Experiments 1 and 2 probably cannot account for the observed
differences in delayed JOL accuracy.
The relative efficacy of predictive- versus knowledge-based
cues was examined next. If the change from prediction-based to
knowledge-based cues was critical, then delayed JOL accuracy
should have been higher in Condition C than in Conditions A and
B. This reliable pattern of results was obtained in Experiment 3.
Delayed JOL accuracy was reliably higher using knowledge-based
cues (Condition C) than prediction-based cues (Conditions A
and B).
The third important issue involved the between-groups compar-
ison of predictive monitoring accuracy in Condition B with the
predictive component of Condition D. The phrasing of JOLs in
both cases was identical, and both judgments were based on the
number of items participants predicted they would recall. The
difference was that participants in Condition D provided a
knowledge-based JOL before making their predictions, whereas
participants in Condition B did not. A comparison of the open and
filled triangles in Figure 1 shows that large differences in delayed
JOL accuracy were obtained between conditions. A reliable de-
layed JOL effect emerged when participants made a recall attempt
(i.e., knowledge-based JOL) prior to their predictions but not when
predictions were provided alone.
[Figure 1 omitted: plot of mean G (y-axis, 0.0 to 1.0) against timing of JOL (x-axis: immediate, delayed) for five judgment types: C, knowledge (number); D, knowledge (number); D, prediction (number); A, prediction (percentage); B, prediction (number).]
Figure 1. Mean Gamma correlations in Experiment 3 as a function of delay and type of judgment of learning
(JOL) cue.
An interesting pattern of results was obtained when the two
JOLs in Condition D were compared. Participants did reduce the
magnitude of their latter, predictive JOLs compared with their
initial, knowledge-based JOLs. This suggests that participants in-
corporated some theory of forgetting into their latter judgments. As
a result, the magnitude of bias was reduced for the prediction-
based JOLs. No differences in G, however, were observed between
knowledge and predictive judgments. Thus, participants reduced
the overall level of their JOLs from the first judgment to the
second, but this reduction was consistent across all items. Partic-
ipants did not selectively reduce their judgments for some items
compared with others, and as a result, item-by-item monitoring
accuracy (G) was unaffected.
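
The dissociation described above, in which bias changes while G does not, follows from the ordinal nature of G. The Python sketch below assumes G is the Goodman-Kruskal gamma correlation and uses hypothetical data for one participant in which every predictive JOL is exactly one item lower than the corresponding knowledge-based JOL. All names and values are illustrative.

```python
from itertools import combinations
from statistics import mean


def gamma(jols, recall):
    """Goodman-Kruskal gamma over all item pairs; None if every pair is tied."""
    concordant = discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recall), 2):
        product = (j1 - j2) * (r1 - r2)
        concordant += product > 0
        discordant += product < 0
    total = concordant + discordant
    return (concordant - discordant) / total if total else None


# Hypothetical data for four categories.
recall = [4, 3, 2, 0]
knowledge_jols = [6, 4, 5, 2]
prediction_jols = [j - 1 for j in knowledge_jols]  # uniform downward shift

for label, jols in (("knowledge", knowledge_jols), ("prediction", prediction_jols)):
    bias = mean(jols) - mean(recall)
    print(label, "bias =", bias, "gamma =", gamma(jols, recall))
```

Because a uniform shift preserves the rank order of the judgments, the count of concordant and discordant pairs is unchanged, so gamma is identical for the two sets of JOLs while bias drops by the size of the shift.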
General Discussion
The present study examined metamemory for categorized lists
of items. Participants showed reliable (nonzero) relative monitor-
ing accuracy (G) in all three experiments. Mean Gs for immediate
JOLs were moderate (ranging from .44 to .52), consistent with the
results of previous studies using paired associates (e.g., Dunlosky
& Nelson, 1992, 1994; Kelemen & Weaver, 1997; Nelson &
Dunlosky, 1991) and text materials (Maki & Serra, 1992; Weaver,
1990). Thus, students appeared to monitor their memory for categorized
lists at a level comparable to that obtained using other
verbal stimuli.
One important question concerned whether the delayed JOL
effect (Nelson & Dunlosky, 1991) would emerge in this novel task.
Previous research on metamemory for more complex stimuli (i.e.,
texts) has failed to find an advantage for delayed judgments
(Glenberg, Sanocki, Epstein, & Morris, 1987; Maki, 1998b). In-
deed, all previous demonstrations of the delayed JOL effect had
used paired-associate stimuli. Using a standard, prediction-based
JOL prompt in Experiment 1, delayed JOLs were not reliably
better than immediate JOLs. One reason for this may be that
participants were unlikely to attempt retrieval of all six items when
making their judgments. For delayed JOLs, the retrieval (or re-
trieval failure) at time of JOL can be highly diagnostic of future
test performance and may even alter it (Kelemen & Weaver, 1997;
Spellman & Bjork, 1992, 1997). Such a retrieval attempt is
straightforward using paired associates because the response term
is a single item. When studying categorized lists, however, a
retrieval attempt at time of JOL involved six items. Increased
cognitive load in the latter case may have led participants to rely
on more readily available (but less diagnostic) information. If
so, encouraging students to make a retrieval attempt at time of
judgment should increase delayed JOL accuracy.
Evidence supporting these ideas was obtained in Experiments 2
and 3. In Experiment 2, a mnemonic, knowledge-based cue was
used ("How many items can you currently recall?") as opposed to
the common, prediction-based cue ("How confident are you in
future recall?"). Thus, participants were asked to make a covert
retrieval attempt at time of JOL and enter the number of items they
could currently recall. Using a knowledge-based cue in Experi-
ment 2 had three clear effects. First, the magnitude of judgments
was greater immediately after study compared with after a delay.
This probably reflects normal forgetting during the delay between
study and JOL and is not of primary interest. More important,
relative metamemory accuracy (G) increased substantially after a
delay. In contrast to Experiment 1, delayed JOLs were highly
accurate in Experiment 2. Third, the time required for participants
to provide JOLs increased substantially in Experiment 2 (see Table
1). One interpretation of this finding is that participants were
making an effortful, longer retrieval attempt during JOLs in Ex-
periment 2 but not in Experiment 1. If so, then delayed JOL
latencies should be longer than immediate JOL latencies only in
Experiment 2 because it would be harder to recall the items after
a delay. This reliable pattern of results was observed and was
subsequently replicated in Experiment 3. Although the JOL la-
tency data are intriguing, the relationship between them and the
inferred cognitive processes is speculative.
Experiment 3 eliminated the type of scale used (percentage vs.
actual number of items) as a possible explanation for the contrast-
ing findings in the first two experiments. No differences in G
emerged between predictive JOLs based on a percentage scale
compared with predictive JOLs based on the actual number of
items to be recalled. When participants provided only one type of
JOL after a delay, knowledge-based judgments were always more
accurate than predictive judgments. Together, these data suggest
that incorporating a delay between study and JOL may be neces-
sary to improve monitoring accuracy, but it is not sufficient.
Delays may be effective in improving metamemory accuracy only
if retrieval-based judgments are elicited.
These results can be interpreted using Koriat's (1997) cue-
utilization framework. This view suggests that JOLs are based on
a variety of cues and that the relative weight assigned to different
types of information varies. JOLs provided immediately after
study are based primarily on nonmnemonic cues and are moder-
ately accurate. Conversely, when delayed JOLs are based on
mnemonic information (e.g., a recall attempt), then predictive
accuracy is high. In Experiment 1, delayed JOLs may not have
been based on mnemonic information, and as a result, delayed JOL
accuracy was moderate. When participants were explicitly cued to
use mnemonic information (Experiments 2 and 3), delayed JOL
accuracy improved significantly.
Examination of Condition D in Experiment 3 provides further
insight into the basis of JOLs. In this case, two judgments were
provided: a knowledge-based JOL ("How many items can you
recall?") followed by a predictive JOL ("How many items will you
recall?"). The magnitude of predictive JOLs was reliably reduced
in the latter judgments, suggesting that participants incorporated
some theory about forgetting into these JOLs. The adjustment did
not depend on the presence of previous knowledge-based JOLs
because the magnitude of JOLs in Condition B was very similar.
Thus, even when predictive JOLs were provided by themselves,
they included some estimate of forgetting. These findings are
consistent with Rawson, Dunlosky, and McDonald (2000), who
demonstrated that predictions of performance for texts include
estimates of retention.
In the present study, participants did not appear to make
retrieval-based JOLs unless specifically prompted. Participants
clearly were able to make such attempts, and they were very
accurate when they did so. In Condition D of Experiment 3, for
example, participants appeared to incorporate the results of
knowledge-based JOLs into their subsequent predictive judgments.
Delayed monitoring accuracy was high (G = .76) when
knowledge-based JOLs preceded predictive JOLs (Condition D)
but only modest when predictive JOLs occurred in isolation (G =
.48 in Condition B). In sum, the results of Experiment 3 suggest
that participants included a theory of forgetting in the magnitude of
their JOLs, but they did not make a diagnostic retrieval attempt at
time of JOL unless specifically prompted. Once retrieval was
attempted, however, this knowledge was utilized effectively in
subsequent predictive delayed JOLs.
These results add to a growing body of evidence for the impact
of metamemory cues on memory-monitoring accuracy. Differ-
ences in the phrasing of metamemory cues can alter how partici-
pants make their judgments in a number of domains. Widner and
Smith (1996) observed that feeling-of-knowing (FOK) accuracy
increased when task-relevant information was emphasized in the
FOK cue. Studying text comprehension monitoring, Maki and
Serra (1992) found that metamemory accuracy was higher for
memory predictions compared with judgments of comprehension.
These studies found that monitoring accuracy was better for
predictive judgments compared with knowledge-based or
comprehension-based judgments, respectively. In the present
study, however, predictive judgments were less accurate than
knowledge-based judgments.
One explanation for this difference may be that predictive
judgments are better when the target information cannot be re-
called at the time of judgment. Widner and Smith's (1996) FOK
task was based on the likelihood of future recognition for stimuli
that could not be recalled at time of judgment. Similarly, in text
comprehension monitoring, all of the to-be-remembered informa-
tion about a passage of text cannot be brought to mind at time of
JOL. Thus, using a knowledge-based cue when recall is not pos-
sible appears to be less effective than using predictive cues. In the
present study, participants were able to retrieve some or all of the
items at time of JOL, and knowledge-based cues were more
effective. Thus, one's current state of knowledge may be a good
basis for future memory performance when at least some of the
information can be recalled. If the target information is unavailable
at time of judgment, then predictive cues may be more effective.
Retrieval success or failure at time of JOL can be a very salient
and effective cue for subsequent metamemory judgments. The
present study demonstrates, however, that participants may not
always use this information as a basis for their judgments. In this
case, participants appeared to require specific prompting to use
retrieval as a basis for their JOLs. Moreover, different types of
judgments (e.g., immediate vs. delayed) are differentially sensitive
to the phrasing of JOL cues. Future studies of memory monitoring
should include carefully constructed metamemory cues because
the type of diagnostic information utilized by participants can have
a substantial impact on monitoring accuracy.
From a practical standpoint, college students are obliged to
acquire new information in a variety of forms. Sometimes, they
may study paired associates (e.g., when learning a foreign lan-
guage). More frequently, students read passages from a textbook
and must monitor their comprehension of complex text material. In
another common situation, students may be asked to learn lists (or
categories) of related information (e.g., famous authors, important
companies, lobes of the brain, etc.). These data show that students
can monitor their memories for lists of items immediately after
study at a level similar to that obtained using paired associates and
texts. Moreover, delayed judgments can be highly accurate if an
actual retrieval attempt is made at the time of JOL. Thus, Nelson
and Dunlosky's (1991) delayed JOL effect is robust across stimuli.
The highest levels of monitoring accuracy in this study were
obtained when students made knowledge-based judgments of re-
call after a delay. How much delay is enough? The present study
used intervals of about 1 min or more between study and JOL, but
other work by Kelemen and Weaver (1997) suggests that measur-
able increases can be obtained after only a few seconds. Thus, a
practical recommendation for improving metamemory in a variety
of situations may be to include a brief delay between study and
JOL. When assessing future memory for recallable information,
students are well advised to make delayed, retrieval-based JOLs.
References
Battig, W. F., & Montague, W. E. (1969). Category norms for verbal items
in 56 categories: A replication and extension of the Connecticut category
norms. Journal of Experimental Psychology Monographs, 80 (3, Pt. 2).
Begg, I., Duft, S., Lalonde, P., Melnick, R., & Sanvito, J. (1989). Memory
predictions are based on ease of processing. Journal of Memory and
Language, 28, 610-632.
Benjamin, A. S., & Bjork, R. A. (1996). Retrieval fluency as a metacog-
nitive index. In L. M. Reder (Ed.), Implicit memory and metacognition
(pp. 309-338). Mahwah, NJ: Erlbaum.
Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for
judgments of learning (JOL) and the delayed-JOL effect. Memory &
Cognition, 20, 374-380.
Dunlosky, J., & Nelson, T. O. (1994). Does the sensitivity of judgments of
learning (JOLs) to the effects of various study activities depend on when
the JOLs occur? Journal of Memory and Language, 33, 545-565.
Dunlosky, J., & Nelson, T. O. (1997). Similarity between the cue for
judgments of learning (JOL) and the cue for test is not the primary
determinant of JOL accuracy. Journal of Memory and Language, 36,
34-49.
Flavell, J. H. (1981). Cognitive monitoring. In W. P. Dickson (Ed.),
Children's oral communication skills (pp. 35-60). New York: Academic
Press.
Glenberg, A. M., Sanocki, T., Epstein, W., & Morris, C. (1987). Enhancing
calibration of comprehension. Journal of Experimental Psychology:
General, 116, 119-136.
Hacker, D. J., Dunlosky, J., & Graesser, A. C. (Eds.). (1998). Metacogni-
tion in educational theory and practice. Mahwah, NJ: Erlbaum.
Hertzog, C., Dixon, R. A., & Hultsch, D. F. (1990). Relationships between
metamemory, memory predictions, and memory task performance in
adults. Psychology and Aging, 5, 215-227.
Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced metamemory at
delays: Why do judgments of learning improve over time? Journal of
Experimental Psychology: Learning, Memory, and Cognition, 23, 1394-
1409.
Koriat, A. (1997). Monitoring one's own knowledge during study: A
cue-utilization approach to judgments of learning. Journal of Experi-
mental Psychology: General, 126, 349-370.
Maki, R. H. (1998a). Metacomprehension of text: Influence of absolute
confidence level on bias and accuracy. Psychology of Learning and
Motivation, 38, 223-248.
Maki, R. H. (1998b). Predicting performance on text: Delayed versus
immediate predictions and tests. Memory & Cognition, 26, 959-964.
Maki, R. H., & Serra, M. (1992). The basis of test predictions for text
material. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 18, 116-126.
Maki, R. H., & Swett, S. (1987). Metamemory for narrative text. Memory
& Cognition, 15, 72-87.
Nelson, T. O. (1984). A comparison of current measures of the accuracy of
feeling-of-knowing predictions. Psychological Bulletin, 95, 109-133.
Nelson, T. O. (1996). Gamma is a measure of the accuracy of predicting
performance on one item relative to another item, not of the absolute
performance of an individual item. Applied Cognitive Psychology, 10,
257-260.
Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of
learning (JOLs) are extremely accurate at predicting subsequent recall:
The "delayed-JOL effect." Psychological Science, 2, 267-270.
Nelson, T. O., Leonesio, R. J., Landwehr, R. S., & Narens, L. (1986). A
comparison of three predictors of an individual's memory performance:
The individual's feeling of knowing versus normative feeling of know-
ing versus base-rate item difficulty. Journal of Experimental Psychol-
ogy: Learning, Memory, and Cognition, 12, 279-287.
Nelson, T. O., & Narens, L. (1994). Why investigate metacognition? In J.
Metcalfe & A. P. Shimamura (Eds.), Metacognition: Knowing about
knowing (pp. 1-27). Cambridge, MA: MIT Press.
Pressley, M., Levin, J. R., & Ghatala, E. S. (1984). Memory strategy
monitoring in adults and children. Journal of Verbal Learning and
Verbal Behavior, 23, 270-288.
Rawson, K. A., Dunlosky, J., & McDonald, S. L. (2000). Influence of
metamemory on performance predictions for text. Manuscript submitted
for publication.
Schneider, W. (1995). Micro Experimental Laboratory (Version 2.0)
[Computer Software]. Pittsburgh, PA: Psychological Software Tools.
Schwartz, B. L. (1994). Sources of information in metamemory: Judgments
of learning and feelings of knowing. Psychonomic Bulletin & Review, 1,
357-375.
Schwartz, B. L., & Metcalfe, J. (1994). Methodological problems and
pitfalls in the study of human metacognition. In J. Metcalfe & A. P.
Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 93-
113). Cambridge, MA: MIT Press.
Spellman, B. A., & Bjork, R. A. (1992). When predictions create reality:
Judgments of learning may alter what they are intended to assess.
Psychological Science, 3, 315-316.
Spellman, B. A., & Bjork, R. A. (1997, November). When prophecy
succeeds (too well): Inaccurate judgments of learning can produce
better-than-perfect predictions. Poster session presented at the 38th
annual meeting of the Psychonomic Society, Philadelphia.
Thiede, K. W., & Dunlosky, J. (1994). Delaying students' metacognitive
monitoring improves their accuracy in predicting their recognition per-
formance. Journal of Educational Psychology, 86, 290-302.
Weaver, C. A., III. (1990). Constraining factors in calibration of compre-
hension. Journal of Experimental Psychology: Learning, Memory, and
Cognition, 16, 214-222.
Weaver, C. A., III, & Kelemen, W. L. (1997). Judgments of learning at
delays: Shifts in response patterns or increased metamemory accuracy?
Psychological Science, 8, 318-321.
Widner, R. L., Jr., & Smith, S. M. (1996). Feeling-of-knowing judgments
from the subject's perspective. American Journal of Psychology, 109,
373-387.
Wright, D. B. (1996). Measuring feeling of knowing. Applied Cognitive
Psychology, 10, 261-268.
Yates, J. F. (1990). Judgment and decision making. Englewood Cliffs, NJ:
Prentice Hall.
Received August 26, 1999
Revision received March 13, 2000
Accepted March 13, 2000
