Phrasing of metamemory cues can have a substantial impact on monitoring accuracy. Memory monitoring and control in education has been studied with increasing interest. Students who are inaccurate in their memory monitoring may direct their efforts to inappropriate material.
Copyright 2000 by the American Psychological Association, Inc. 0022-0663/00/$5.00 DOI: 10.1037//0022-0663.92.4.800

Metamemory Cues and Monitoring Accuracy: Judging What You Know and What You Will Know

William L. Kelemen
University of Missouri-St. Louis

Three experiments examined metamemory for categorized lists of items. Judgments of learning (JOLs) were obtained from college students either immediately after study or following a brief (at least 30-s) delay. In contrast to past findings (e.g., T. O. Nelson & J. Dunlosky, 1991), no advantage was found for delayed JOLs in Experiment 1, using a standard, prediction-based metamemory cue. In Experiment 2, knowledge-based judgments were elicited, and delayed JOL accuracy improved significantly. The relative efficacy of 4 different metamemory cues was examined in Experiment 3. An interaction between the timing and phrasing of JOL cues was detected: Delayed JOLs were more accurate than immediate JOLs only when knowledge-based cues were used. These results are interpreted in A. Koriat's (1997) cue-utilization framework for JOL accuracy, and they show that the phrasing of metamemory cues can have a substantial impact on monitoring accuracy.

The role of memory monitoring and control in education has been studied with increasing interest in recent years (Hacker, Dunlosky, & Graesser, 1998). During exam preparation, for example, students must accurately assess their current state of knowledge to effectively regulate their ongoing learning. Students who are inaccurate in their memory monitoring may direct their efforts to inappropriate material, make inefficient use of their study time, or continue to utilize ineffective study strategies. For example, Pressley, Levin, and Ghatala (1984) showed that when college students studied new vocabulary items, they initially showed no preference for an elaboration strategy compared with rote repetition, even though the former produced much higher rates of learning.
After experience with a practice list and test, however, these students showed improved metamemory by selecting elaboration instead of repetition. From a theoretical perspective, the change in study strategy reflects the influence of metamemory on cognitive goals and actions (Flavell, 1981). From a practical standpoint, Pressley et al.'s study underscores the importance of metamemory in study efficacy and students' potential academic success.

One way in which students can monitor their memories is to make judgments of learning (JOLs) during study or very soon afterward. These JOLs may be used to regulate future study activity and to predict subsequent memory performance (Nelson & Narens, 1994). For example, participants may study some novel information and then be asked to predict how well they will perform on a future test (i.e., make a JOL). Metamemory accuracy is assessed by comparing the magnitude of JOLs for individual items with future recall. If participants' memory monitoring is accurate, then items receiving high JOLs should be more likely to be recalled than items receiving lower JOLs.

Experiments 1 and 2 were conducted as part of a doctoral dissertation completed by William L. Kelemen at Baylor University. The results of these two experiments were presented at the 44th Annual Meeting of the Southwestern Psychological Association, New Orleans, April 1998. I thank Chuck Weaver for his consultation and helpful comments on this article and Heather DeRousse for her assistance with data collection. Correspondence concerning this article should be addressed to William L. Kelemen, Department of Psychology, University of Missouri-St. Louis, 8001 Natural Bridge Road, St. Louis, Missouri 63121-4499. Electronic mail may be sent to kelemen@umsl.edu.

The basis of participants' JOLs has been examined by a number of researchers (e.g., Begg, Duft, Lalonde, Melnick, & Sanvito, 1989; Koriat, 1997; Nelson & Narens, 1994; Schwartz, 1994).
Koriat described three classes of information that participants may use when making JOLs: intrinsic, extrinsic, and mnemonic factors. Intrinsic factors are related to properties of the stimuli, for example, item concreteness, relatedness of the items, and so on. Properties of the encoding conditions (e.g., amount of study time available, study strategy, number of learning trials, etc.) are referred to as extrinsic factors. Mnemonic factors refer to internal, experience-based indicators of future recall, including memory of previous recall attempts, accessibility of target information, and cue familiarity. Koriat found that participants tended to discount extrinsic factors in favor of intrinsic information during initial study trials. After practice, however, participants based their JOLs on mnemonic information, and the mean Gamma correlation (G) between JOLs and recall increased. Thus, JOLs may be based on a collection of information, and the relative weight participants assign to each factor can vary across situations. JOL accuracy is determined by the source(s) of information participants incorporate into their judgments. Mnemonic information can be especially useful if the conditions at time of JOL are closely related to those during recall (Koriat, 1997).

Monitoring Retrieval After Delays

JOL accuracy should be high when the class of information used as the basis for JOLs is highly diagnostic of future memory. One type of mnemonic cue, accessibility of information at time of JOL, is often used by students as a basis for their metamemory judgments (Benjamin & Bjork, 1996). Because JOLs typically are provided during study, accessibility can be an imperfect basis for prediction (not all information remembered at time of JOL will be remembered on the test). Nelson and Dunlosky (1991), however, found that increasing the amount of time allowed between study and JOL dramatically improved monitoring accuracy.
Participants studied paired associates and then made JOLs either immediately after study (hereafter identified as "immediate JOLs") or several minutes later (hereafter identified as "delayed JOLs"). Predictive accuracy was modest in the first case (mean G between predicted and actual recall = .38). In contrast, mean G for delayed JOLs was very high (.90). This so-called "delayed JOL effect" is robust across a variety of encoding and test conditions using paired associates (Dunlosky & Nelson, 1992, 1994, 1997; Kelemen & Weaver, 1997; Thiede & Dunlosky, 1994; Weaver & Kelemen, 1997).

Theoretical explanations for such high delayed JOL accuracy have focused on the retrieval of target information at time of judgment. One specific account, known as the monitoring-dual-memories hypothesis, was proposed by Nelson and Dunlosky (1991). They suggested that participants monitor information in both short-term and long-term memory at time of JOL, although the recall test taps only long-term memory. Delayed JOLs are more accurate than immediate JOLs because the former are not contaminated by target information in short-term memory. Consistent with this hypothesis, Dunlosky and Nelson (1992) showed that reinstating the cue-target pair into short-term memory during a delayed JOL substantially reduced predictive accuracy (see also Dunlosky & Nelson, 1994, 1997, for further discussion).

A related hypothesis that also emphasizes the role of target retrieval has been advanced by Spellman and Bjork (1992, 1997). They suggested that delayed JOLs are highly accurate because recall of a target after a delay improves the probability of recalling that item on a subsequent memory test. From this view, delayed JOLs are accurate because they reinforce what they assess. Immediate JOLs do not increase future memory because retrieval attempts are always successful immediately after study.
Thus, immediate JOLs function as a massed study trial, whereas delayed JOLs can be a spaced study trial, if the target is retrieved. There is evidence that, in some cases, delayed JOLs do alter the probability of future recall (Kelemen & Weaver, 1997; Spellman & Bjork, 1997). For example, Kelemen and Weaver compared immediate JOLs with judgments made after delays ranging from a few seconds to 5 min. They found that delayed JOLs were always more accurate than immediate JOLs, and mean recall scores were reliably higher in delayed JOL conditions compared with immediate JOLs in two of three experiments. Moreover, the conditional probability of cued recall on the test (given successful initial retrieval) increased monotonically with longer delays.

Although theoretically distinct, Nelson and Dunlosky's (1991) hypothesis and Spellman and Bjork's (1992) explanation are not necessarily mutually exclusive. In fact, both views emphasize the importance of making a target retrieval attempt at time of JOL. Koriat (1997) suggested that delayed JOLs are more accurate than immediate JOLs because the former are based more heavily on mnemonic cues (i.e., retrieval success or failure at time of JOL) rather than on other less diagnostic intrinsic or extrinsic cues. Because retrieval of a paired associate is almost always successful immediately after study, mnemonic cues cannot discriminate between items; therefore, immediate JOLs are necessarily based on less effective intrinsic cues.

The role of target retrieval in JOL accuracy might be related to recent findings by Maki (1998b). In her study, students read brief passages of narrative texts and then made predictions about their future test performance. JOLs occurred either immediately after reading a passage, or they were delayed until all the texts had been read. The time of test (immediate vs. delayed) also was manipulated. Surprisingly, mean Gs for delayed JOLs were very low (ranging from about .02 to .20).
In contrast to Nelson and Dunlosky's (1991) findings, predictive accuracy was best for immediate JOLs and tests.

Why was delayed JOL accuracy worse than immediate JOL accuracy in text comprehension monitoring? Maki's (1998b) paradigm differed from Nelson and Dunlosky's (1991) procedures in several ways. One important difference involved the retrievability of target information at time of JOL. Using paired associates, Nelson and Dunlosky reported that 95% of their participants attempted to recall the target word during delayed JOLs. It seems unlikely, however, that Maki's participants would be able to make a similar retrieval attempt for all the propositions in a target text. Perhaps text stimuli are not well suited for a recall attempt at time of JOL, and as a result, delayed JOLs were based on other, less diagnostic cues. If so, increased delayed JOL accuracy may not be robust across metacognitive domains in which retrieval attempts of target information are more difficult. Thus, one goal of the present study was to test whether the delayed JOL effect would generalize beyond paired associate stimuli.

The Phrasing of Metamemory Cues

Previous research has shown that the phrasing of metamemory cues can affect memory monitoring accuracy. Widner and Smith (1996) manipulated the type of cue used in a feeling-of-knowing paradigm. They asked participants to answer a series of trivia questions and then provide feeling-of-knowing judgments for items answered incorrectly. These judgments were compared with subsequent recognition performance. Widner and Smith examined the relative effectiveness of three different metamemory cues: (a) recognition-based cues (e.g., "Will you be able to identify a currently nonrecallable answer on a recognition test?"), (b) knowledge-based cues (e.g., "Do you feel that you really do know a currently nonrecallable answer?"), and (c) composite cues, which combined recognition and knowledge cues.
They found that metamemory accuracy was highest (G = .49) when recognition alone was emphasized. Monitoring accuracy following knowledge-based and composite cues was significantly lower (mean Gs from .12 to .15).

Similar results have been obtained in text comprehension monitoring experiments. Maki and Swett (1987) found that metamemory for text was better when the cue asked for memory predictions, as opposed to importance ratings, even though the correlation between the two types of ratings was substantial (Pearson's r = .74 and .61 for two texts). In a related finding, Maki and Serra (1992) showed that test predictions were more accurate than judgments of comprehension immediately after reading texts. Thus, studies in both feeling-of-knowing and text comprehension monitoring tasks suggest that metamemory cues emphasizing prediction tend to be better than those emphasizing current knowledge, importance, or comprehension.

Despite these findings, the phrasing of metamemory cues in JOL studies has not received much attention. In most research, only the predictive aspect of JOLs is emphasized. Dunlosky and Nelson (1994), however, did study immediate and delayed JOLs using two different measurement scales. In Experiment 1, they elicited judgments using a 6-point percentage scale ranging from 0% (definitely won't recall) to 100% (definitely will recall). In Experiment 2, they asked participants to judge how well they had learned certain items: JOLs ranged from 1 (not learned at all) to 6 (extremely well learned). The percentage scale emphasized prediction, whereas the latter scale was knowledge based. Even though participants were asked to evaluate their performance in different ways between experiments, Dunlosky and Nelson obtained a large delayed JOL effect in both cases. Using the predictive cue, overall G = .18 for immediate JOLs, and G = .83 for delayed JOLs; using the knowledge-based cue, Gs = .21 and .83, respectively.
The type of cue was not of primary interest to Dunlosky and Nelson, however, and so the procedures of Experiments 1 and 2 varied considerably. Nevertheless, this study demonstrated that increased delayed JOL accuracy can be obtained with paired associates using both predictive and knowledge-based metamemory cues.

There were two primary objectives in the present study. As noted earlier, one purpose was to examine the generalizability of Nelson and Dunlosky's (1991) delayed JOL effect. Although the effect has been shown to be large in magnitude and robust across encoding conditions (Dunlosky & Nelson, 1994), Maki (1998b) recently failed to obtain an advantage for delayed JOLs in text comprehension monitoring. This may reflect procedural differences between experiments, or it may suggest that high delayed JOL accuracy is limited to studies using stimuli that are readily retrievable (e.g., paired associates). To examine this issue, a categorized list-learning task was devised (cf. Hertzog, Dixon, & Hultsch, 1990). Participants learned lists of six related items presented in categories (e.g., A type of fuel: petroleum, alcohol, butane, water, uranium, and charcoal). After study, participants were shown the category title (A type of fuel) and asked to make a judgment of future recall. JOLs occurred either immediately after study, or they were delayed until all the items had been studied. At the end of the experiment, participants were shown the category title and asked to recall the exemplars they had studied. Thus, the basic procedures resembled those used by Nelson and Dunlosky (1991), except that the stimuli were categorized lists of items rather than paired associates.

Participants may base their JOLs on a variety of information. Koriat (1997) found that JOLs became more accurate when participants relied primarily on mnemonic factors such as the outcome of previous retrieval attempts.
If participants in the present study based their delayed JOLs on a retrieval attempt, then delayed JOLs should be more closely related to future recall than immediate JOLs. If participants based their delayed judgments on nonmnemonic information, however, then no increase in delayed JOL accuracy would be expected. Almost all participants reported making retrieval attempts during delayed JOLs in Nelson and Dunlosky's (1991) original study, but participants in the present study might not use the same strategy. Because the retrieval of six category exemplars is more difficult than remembering a single target item, participants might choose to base both immediate and delayed JOLs on nonmnemonic information, thus producing similar levels of metacognitive accuracy for each type of judgment.

The second goal of this study was to determine whether the phrasing of metamemory cues affects immediate and delayed JOL accuracy. Previous research on feelings-of-knowing (Widner & Smith, 1996) and text comprehension monitoring (Maki & Serra, 1992; Maki & Swett, 1987) has shown that prediction-based cues are more effective than knowledge-based (or comprehension-based) cues. Both of these metamemory paradigms typically involve judgments about stimuli that cannot be completely recalled at time of JOL. In contrast, target items were potentially retrievable at time of JOL in the present study. Thus, knowledge-based JOLs might be more effective than predictive JOLs because the former require participants to consider mnemonic-based information, whereas the latter do not. The type of metamemory cue varied between and within experiments: (a) Experiment 1 used a standard, prediction-based metamemory cue; (b) Experiment 2 used a knowledge-based cue; and (c) Experiment 3 compared the efficacy of prediction-based cues, knowledge-based cues, and combined (knowledge and prediction) cues.
Experiment 1

Previous research using paired associates has shown increased metamemory accuracy when there is a delay of at least 30 s between study and JOL (Nelson & Dunlosky, 1991). Conversely, research in text comprehension monitoring has failed to replicate this finding (Maki, 1998b). The purpose of Experiment 1 was to determine (a) whether students could produce reliable metacognitive accuracy for categories of related items and (b) whether delays improve the accuracy of judgments in this task.

Method

Participants and materials. Thirty-six undergraduates volunteered for Experiment 1 and received course credit. All participants were tested in individual cubicles. Experimental stimuli were presented on IBM-compatible PCs using Micro Experimental Laboratory software, Version 2.0 (Schneider, 1995). Twenty-four categories of related items, each containing 6 exemplars, were taken from Battig and Montague's (1969) norms (e.g., A type of fuel: petroleum, alcohol, butane, water, uranium, and charcoal). Categories were classified as either easy (frequency of association > 100), medium (frequency of association ranged from 50 to 100), or difficult (frequency of association < 50), based on the norms reported by Battig and Montague. Eight categories of items were constructed for each level of difficulty to maximize metacognitive discriminability (see Nelson, Leonesio, Landwehr, & Narens, 1986, and Schwartz & Metcalfe, 1994, for discussions of potential measurement difficulties caused by a restricted range of stimuli).

Design and procedure. Participants provided a single JOL for each category. The timing of these judgments was manipulated within subjects. Immediate JOLs occurred directly after study of all items in a category was complete; delayed judgments occurred after all 24 categories had been studied. These procedures ensured that at least 11 items intervened between study and judgment in the delayed JOL condition.
On the basis of the rate of item presentation and average JOL latencies, delays in this condition ranged from 1 to 9 min, with an average of 5 min.

Before beginning the experiment, instructions were read aloud to all participants and also appeared on the computer screen. Participants were instructed to study the items in all 24 categories so that on a future memory test they would be able to recall the items when prompted by the category heading. An additional 4 categories (and items) appeared at the beginning of the experiment to serve as a memory buffer, and these stimuli were not included on the final test. Participants were not informed that they would not be tested on these items.

Participants studied items from one category at a time; the presentation order of categories, and items within the categories, was randomized for each participant. Categories were pseudorandomly assigned to condition, with the restriction that 12 categories received immediate JOLs and 12 received delayed JOLs. During study, the category heading was shown at the top of the screen in capital letters, and exemplars appeared sequentially for 4 s each. The exemplars were printed in lowercase letters. For example, A TYPE OF FUEL appeared on top of the screen, and petroleum, alcohol, butane, water, uranium, and charcoal appeared one at a time below the title. The order of the six exemplars listed in each category was randomly determined for each participant.

In the immediate JOL condition, the category heading remained on the screen and the following cue appeared after the last example was shown: "How confident are you that in about 10 minutes from now you will be able to recall the members of this category when shown the category name?" Participants rated their certainty on a 6-point percentage scale ranging from 0% (definitely will not recall) to 100% certain (definitely will recall).
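The stimulus classification and the constrained assignment of categories to JOL conditions described in the Method can be sketched in a few lines. This is only an illustrative sketch, not the procedure as implemented in the original software: the function names, the `seed` parameter, and the treatment of a frequency of exactly 50 or 100 as "medium" are assumptions made for the example.

```python
import random


def classify_difficulty(frequency):
    """Label a category by frequency of association (thresholds from the Method).

    Assumption: boundary values of exactly 50 and 100 count as "medium".
    """
    if frequency > 100:
        return "easy"
    if frequency >= 50:
        return "medium"
    return "difficult"


def assign_conditions(categories, n_immediate=12, seed=None):
    """Randomize presentation order and split categories between conditions.

    Returns (category, condition) pairs in presentation order, with exactly
    n_immediate categories receiving immediate JOLs and the rest delayed JOLs.
    """
    rng = random.Random(seed)  # hypothetical per-participant seed
    order = list(categories)
    rng.shuffle(order)  # presentation order differs for each participant
    immediate = set(rng.sample(order, n_immediate))
    return [(c, "immediate" if c in immediate else "delayed") for c in order]


schedule = assign_conditions([f"category_{i}" for i in range(24)], seed=1)
```

Drawing the immediate set by sampling without replacement guarantees the 12/12 restriction on every run, which independent coin flips per category would not.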
In the delayed JOL condition, study of the category and exemplars was followed by the instructions, "Press the spacebar to continue." The next category then appeared. After participants studied all 24 categories, they made delayed JOLs for the 12 categories that had not been judged previously. The category heading appeared on the screen, and the same cue phrase and percentage scales were shown. There was no restriction on the amount of time allowed for participants to make their JOLs.

After participants studied and rated all 24 categories (plus the 4 buffer categories), they completed a distracting filler activity for 9 min. Finally, participants received a memory test. The category heading appeared at the top of the screen, and participants were instructed to type in all the exemplars that they remembered studying. The order of category presentation was randomly determined for each participant. To minimize the role of misspellings, the computer scored responses as correct if the first three letters entered by participants matched the first three letters of the category members (the first three letters of the exemplars within each category were unique). Participants were allowed as much time as necessary to complete the memory test.

Results and Discussion

Although the dependent measure of primary interest was metamemory accuracy, recall on the memory test was analyzed along with the magnitude and latency of JOLs. All tests of statistical reliability were conducted at p < .05.

Recall, JOL magnitude, and JOL latency. Descriptive statistics for participants' recall on the memory test and JOLs are listed in Table 1. For ease of comparison, both recall and the magnitude of JOLs are shown as proportions. The magnitude of JOLs was reliably higher in the immediate JOL condition (.52) compared with delayed JOLs (.43), t(35) = 4.19, p < .001. There was no reliable difference in the time required to produce JOLs (i.e., JOL latency) between conditions.
Recall on the memory test did not differ reliably between conditions (.48 vs. .46). Thus, participants predicted that they would remember more items when JOLs occurred immediately after study, but test performance did not differ between conditions.

Metamemory accuracy. Several measures have been developed to assess metamemory accuracy. These indices measure either absolute accuracy (sometimes called "calibration") or relative, item-by-item accuracy (sometimes called "resolution"). Consistent with the recommendations of some metamemory researchers (Koriat, 1997; Maki, 1998a; Weaver & Kelemen, 1997), measures of both absolute and relative monitoring accuracy were calculated.

Absolute accuracy was assessed with bias scores. The bias index reflects participants' overall overconfidence or underconfidence in a particular condition (Yates, 1990). Bias scores were derived by obtaining the signed difference between mean JOL magnitude and mean recall performance in each condition for each participant. A score greater than 0 indicates overconfidence, and a score less than 0 indicates underconfidence. Taken alone, the magnitude of mean bias scores across participants was not reliably different from zero in either condition (bias = .04 for immediate JOLs, and bias = -.04 for delayed JOLs). This indicates that, in general, participants were not reliably underconfident or overconfident. However, the within-subject difference in bias between immediate versus delayed JOLs was reliable, t(35) = 3.08, p = .004. This probably reflects the decrease in JOL magnitude after delays noted previously. The reliability of participants' bias scores across items was very high, Cronbach's alpha coefficient = .92. In short, the magnitude of under- or overconfidence within each condition was not reliably different from zero, but the degree of bias between conditions was statistically reliable.
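The lenient scoring rule from the Method (a response counts as correct when its first three letters match those of a studied exemplar) and the bias index (mean JOL minus mean recall) can be illustrated as follows. This is a minimal sketch under stated assumptions, not the scoring code used in the experiment: the function names and the example responses are hypothetical, and both JOLs and recall are assumed to be expressed as proportions, as in Experiment 1.

```python
def score_recall(responses, exemplars):
    """Count studied exemplars matched by the first three letters of a response."""
    typed = {r.strip().lower()[:3] for r in responses if r.strip()}
    return sum(1 for e in exemplars if e.lower()[:3] in typed)


def bias_score(jols, recalls):
    """Signed difference between mean JOL and mean recall (both proportions).

    Positive values indicate overconfidence; negative values, underconfidence.
    """
    return sum(jols) / len(jols) - sum(recalls) / len(recalls)


fuel = ["petroleum", "alcohol", "butane", "water", "uranium", "charcoal"]
# Hypothetical typed responses: one misspelling, one capitalized, one intrusion.
n_correct = score_recall(["petrolium", "Butane", "coal"], fuel)
recall_proportion = n_correct / len(fuel)
```

Here the misspelled "petrolium" is still credited because its first three letters match "petroleum", whereas "coal" is not credited because "coa" does not match "cha".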
A second aspect of metamemory, relative monitoring accuracy, was assessed by computing Goodman-Kruskal Gamma correlations between predicted and actual test performance for each participant. G is a measure of ordinal association, and it is the preferred index of relative metamemory accuracy (Nelson, 1984, 1996; Wright, 1996). Mean Gs by condition are listed in Table 1. Reliable (nonzero) predictive accuracy was observed in both conditions, t(35) = 7.42, p < .001 (immediate JOLs), and t(35) = 7.21, p < .001 (delayed JOLs), confirming that participants predicted which categories they would recall at a level greater than chance. More important, though, no reliable improvement was found for delayed JOLs (G = .48) compared with immediate JOLs (G = .44), t(35) = 0.43, p = .670.

Table 1
Mean Judgment of Learning (JOL) Magnitude, Recall, and Metamemory Accuracy in Experiments 1 and 2

                                            JOL                        Monitoring accuracy
Condition          Recall         Magnitude      Latency             Bias           Gamma
Experiment 1
  Immediate JOL    .48 (.11)      .52* (.16)     5,835 (1,810)       .04* (.19)     .44 (.36)
  Delayed JOL      .46 (.15)      .43* (.17)     5,187 (2,194)       -.04* (.20)    .48 (.40)
Experiment 2
  Immediate JOL    3.12 (1.09)    4.37* (.69)    8,713* (3,981)      1.29* (.90)    .48* (.39)
  Delayed JOL      3.08 (.98)     3.57* (.77)    10,072* (4,572)     0.55* (.66)    .80* (.21)

Note. Main entries are mean values; entries in parentheses are standard deviations. Latency is the time (in milliseconds) required for participants to enter their JOL from the onset of the metamemory prompt. Asterisks indicate that a statistically reliable difference (p < .05) was observed between immediate versus delayed JOLs.

Summary. Experiment 1 showed that JOL magnitude was higher immediately after study compared with after a delay, but test performance itself did not differ between conditions. Consistent with these findings, a reliable difference in bias was observed between conditions, though bias scores considered separately were not reliably nonzero.
Participants did show nonzero, modest Gs in both conditions, indicating reliable metamemory accuracy. Contrary to previous findings (Nelson & Dunlosky, 1991), however, memory monitoring accuracy was not reliably better for delayed JOLs compared with immediate JOLs. In other words, no delayed JOL effect was observed. Experiment 2 examined one possible cause of these findings.

Experiment 2

Delayed JOL accuracy was not higher than immediate JOL accuracy in Experiment 1. One possible explanation is that participants may not have based their JOLs on an actual retrieval attempt of the target information. Because the JOL prompt was prediction based, participants may have used any number of other cues (e.g., familiarity with the category title, concreteness of the title or exemplars, etc.) as a basis for their ratings. In a paired associate task, presenting the cue word alone at time of JOL may be sufficient to elicit a target retrieval attempt, regardless of the phrasing of the JOL prompt (e.g., Dunlosky & Nelson, 1994). However, in the present categorized-list learning task, participants may have been less likely to make a recall attempt of all six items at time of JOL.

In a paired associate task, utilizing highly diagnostic mnemonic cues for delayed JOLs is quick, easy, and relatively automatic: A student reads the cue word and attempts to recall its associate. Obtaining similar mnemonic cues for the categories in Experiment 1, however, required that participants generate as many of the six exemplars as they could and then tally the total number of items recalled for each category. Clearly, this requires more cognitive effort than retrieving a single target word. Rather than attempting to retrieve all six exemplars, students may have utilized more readily available (but less diagnostic) intrinsic cues for delayed JOLs (e.g., judgments of ease based on the category heading, similarity of the exemplars, etc.).
Intrinsic information is often used for immediate JOLs when mnemonic cues are less readily available (Koriat, 1997). If participants did not make retrieval attempts at delays, this could explain the similar levels of metamemory accuracy for immediate and delayed JOLs in Experiment 1.

Experiment 2 was designed to encourage participants to utilize mnemonic cues at time of JOL. Specifically, the JOL prompt was changed from an expression of confidence to a direct estimate of retrieval. For both immediate and delayed JOLs, participants were asked, "How many members of the category above can you currently recall?" This JOL cue should produce several clear results. Compared with delayed JOLs, the magnitude of immediate JOLs should be greater because the stimuli were presented just seconds before. Similarly, bias scores should be greater for immediate JOLs compared with delayed JOLs. The critical question is whether relative metamemory accuracy (G) will improve after a delay. If so, this would suggest that the modest delayed JOL accuracy in Experiment 1 was produced by a reliance on less effective, nonmnemonic cues.

Method

Participants and materials. Forty-two undergraduates volunteered for Experiment 2. None of these individuals had participated previously, and all volunteers received course credit. The stimuli were identical to those in Experiment 1.

Design and procedure. The timing of JOLs (immediate vs. delayed) was manipulated within subjects. The methodology of Experiment 2 was identical to the first experiment except for the phrasing of the JOL cue. In Experiment 2, a knowledge-based cue was used. Participants were asked, "How many members of this category can you currently recall?" Participants chose one of six options (0, 1, 2, 3, 4, or 5+) to maintain a consistent 6-point rating scale between Experiments 1 and 2.
Participants were not required to enter the recallable stimuli themselves at time of JOL; rather, they were asked to report only the total number of exemplars they could recall.

Results and Discussion

Recall, JOL magnitude, and JOL latency. Descriptive statistics for recall, JOL magnitude, and JOL latency are listed in Table 1. A t test for paired samples indicated that recall did not differ reliably between conditions (mean recall = 3.12 for immediate JOLs and 3.08 for delayed JOLs). As predicted, the mean number of items participants claimed to remember was greater immediately after study compared with after a delay (4.37 vs. 3.57, respectively), t(41) = 9.40, p < .001. The latency to produce JOLs also differed reliably between conditions, t(41) = -2.52, p = .016, with participants taking longer to produce delayed JOLs than immediate JOLs.

Metamemory accuracy. Bias scores were obtained by calculating the mean number of items reported at JOL minus the number of items correctly recalled on the final memory test (see Table 1). As expected, participants claimed to remember more items at time of JOL than they actually recalled on the memory test. Bias scores were positive and reliably nonzero in both conditions: immediate JOLs, t(41) = 9.89, p < .001, and delayed JOLs, t(41) = 5.95, p < .001. In addition, bias scores were reliably higher for immediate judgments compared with delayed JOLs, t(41) = 7.04, p < .001. In sum, participants always claimed to remember more items at time of JOL than they could eventually recall on the test, and this discrepancy was larger immediately after study.

Relative metamemory accuracy was assessed by computing G between JOL magnitude and recall for each participant. G was undefined in one or both conditions for 8 participants because of a lack of variability in their responses.
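The two accuracy measures described above can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code used in the study: the function names and the example ratings are hypothetical, and G is computed as the Goodman-Kruskal gamma correlation (concordant minus discordant pairs, divided by their sum, with tied pairs ignored). The sketch also shows why G is undefined for a participant who gives the same rating to every category.

```python
from itertools import combinations
from typing import Optional, Sequence


def bias_score(jols: Sequence[float], recalls: Sequence[float]) -> float:
    """Mean JOL minus mean recall; positive values indicate overconfidence."""
    return sum(jols) / len(jols) - sum(recalls) / len(recalls)


def goodman_kruskal_gamma(
    jols: Sequence[float], recalls: Sequence[float]
) -> Optional[float]:
    """Gamma over all pairs of categories for one participant.

    Returns None when every pair is tied on JOL or on recall, which is
    the "undefined" case (e.g., a single rating used for all JOLs).
    """
    concordant = 0
    discordant = 0
    for (j1, r1), (j2, r2) in combinations(zip(jols, recalls), 2):
        product = (j1 - j2) * (r1 - r2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie on at least one variable; gamma skips it
    total = concordant + discordant
    if total == 0:
        return None
    return (concordant - discordant) / total


# Hypothetical data for one participant: six categories, JOL = number of
# items claimed at time of judgment, recall = number recalled on the test.
example_jols = [5, 3, 4, 2, 5, 1]
example_recalls = [4, 2, 4, 1, 3, 0]

if __name__ == "__main__":
    print("bias:", bias_score(example_jols, example_recalls))
    print("gamma:", goodman_kruskal_gamma(example_jols, example_recalls))
    print("constant ratings:", goodman_kruskal_gamma([4] * 6, example_recalls))
```

With these made-up ratings the participant is overconfident by one item per category on average, yet orders the categories well; bias and G capture different aspects of monitoring, which is why both are reported.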
Six participants used a single JOL rating for all immediate JOLs, 1 used a single rating for all delayed JOLs, and 1 participant used a single rating for all JOLs (both immediate and delayed). Participants in both conditions showed reliable (nonzero) Gs: t(34) = 7.36, p < .001 (immediate JOLs), and t(39) = 23.99, p < .001 (delayed JOLs). More important, a large delayed JOL effect emerged (see Table 1). Delayed judgments were significantly more accurate (G = .80) than immediate predictions (G = .48), t(33) = 3.84, p < .001.

Summary. As hypothesized, participants reported better memory for items at time of JOL than they ultimately showed on the memory test about 10 min later. This probably reflects normal forgetting and was not the focus of this experiment. The variable of primary interest in Experiment 2 was relative (item-by-item) metamemory accuracy, G. Unlike Experiment 1, delayed JOLs were reliably more accurate than immediate JOLs (Gs = .80 vs. .48, respectively). This is consistent with the original delayed JOL effect reported by Nelson and Dunlosky (1991) for paired associates. By using knowledge-based JOL cues (rather than prediction-based cues), delayed judgments were more accurate than immediate judgments.

Reliable metamemory for categorized lists was observed in Experiments 1 and 2, but increased delayed JOL accuracy was observed only when participants were specifically cued to use then-current recall as a basis for JOLs (Experiment 2). Although it was impossible to directly confirm that participants made a recall attempt at time of JOL, analyses of JOL latencies from both experiments were consistent with this claim. First, participants took significantly longer to make delayed judgments compared with immediate judgments in Experiment 2 (see Table 1).
If participants were making a retrieval attempt at time of JOL, it should have taken longer to recall the information after a delay than immediately after study. No reliable difference in JOL latencies was observed in Experiment 1, however, when participants were not asked to make a retrieval attempt. Second, there was a substantial increase in overall JOL latencies from Experiment 1 to Experiment 2. This finding is also consistent with the notion that participants made quick, nonretrieval-based JOLs in Experiment 1 compared with longer, knowledge-based JOLs in Experiment 2.

In Experiment 2, participants assessed their memory for the six exemplars on a scale labeled 0, 1, 2, 3, 4, and 5+. Options 5 and 6 were collapsed into the 5+ rating to maintain the 6-point rating scale used in Experiment 1. Because more items were likely to be accessible immediately after study, one possibility is that collapsing the upper end of the scale may have distorted predictive accuracy for immediate JOLs. Although unlikely, it is possible that a reliable difference in JOL accuracy emerged because the scale selectively reduced immediate JOL accuracy, not because predictive accuracy genuinely increased after delays.

This interpretation seems unlikely for two reasons. First, the mean G for immediate JOLs in Experiment 1 is consistent with that observed in Experiment 2 (.44 vs. .48, respectively). This suggests that immediate JOL accuracy was not reduced as a consequence of using the 5+ rating. Second, because G uses ordinal information, any loss of resolution due to the 5+ rating would have occurred when comparing perfectly recalled categories given JOLs of 5+ with those containing five correct items given ratings of 5+. Few categories were perfectly recalled in either condition, and those that were perfectly recalled were distributed evenly (8% of immediate JOL categories and 9% of delayed JOL categories).
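The second reason can be made concrete with a short sketch. It assumes, hypothetically, that a participant's uncollapsed judgments on a 0 to 6 scale were available (they were not collected in Experiment 2), and it uses invented ratings. Because G depends only on the ordering within pairs of categories, capping ratings at 5 can change the outcome only for pairs in which one category would have been rated 5 and the other 6.

```python
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def gamma(x: Sequence[int], y: Sequence[int]) -> Optional[float]:
    """Goodman-Kruskal gamma; None if no concordant or discordant pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    total = concordant + discordant
    return None if total == 0 else (concordant - discordant) / total


def collapse_top(ratings: Sequence[int], cap: int = 5) -> List[int]:
    """Merge every rating above `cap` into `cap` (the 5+ option)."""
    return [min(r, cap) for r in ratings]


def pairs_affected(ratings: Sequence[int], cap: int = 5) -> List[Tuple[int, int]]:
    """Index pairs whose JOL ordering becomes a tie after collapsing."""
    collapsed = collapse_top(ratings, cap)
    return [
        (i, j)
        for i, j in combinations(range(len(ratings)), 2)
        if ratings[i] != ratings[j] and collapsed[i] == collapsed[j]
    ]


# Hypothetical uncollapsed JOLs and final recall for six categories.
jols_full = [6, 5, 3, 2, 4, 1]
recalled = [6, 5, 4, 2, 3, 0]

if __name__ == "__main__":
    print("pairs affected:", pairs_affected(jols_full))
    print("gamma, full scale:", gamma(jols_full, recalled))
    print("gamma, 5+ scale:", gamma(collapse_top(jols_full), recalled))
```

In this invented case only one of the 15 pairs is affected, so G barely moves; the effect of the 5+ option scales with how often ratings of 5 and 6 would both occur, which the low rate of perfectly recalled categories suggests was rare.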
Thus, the argument that the rating scale selectively reduced immediate JOL accuracy is not tenable. Nevertheless, the 5+ rating was eliminated in Experiment 3.

Experiment 3

Taken together, Experiments 1 and 2 provide evidence that delayed JOL accuracy is higher when knowledge-based cues rather than prediction-based cues are used. However, these two cues also differed in the type of scale that was presented to participants. Consistent with previous research, a percentage scale (0%, 20%, 40%, 60%, 80%, or 100% confident) was used to elicit prediction JOLs in Experiment 1. In contrast, participants in Experiment 2 provided knowledge-based JOLs using the raw number of items that they could currently recall (0, 1, 2, 3, 4, or 5+). One possibility is that delayed JOLs are not sensitive to prediction- versus knowledge-based cues; rather, delayed JOLs may simply be more accurate when raw numbers are used, compared with percentages. Experiment 3 examined this possibility.

In Condition A, a prediction cue based on a percentage scale (0%, 20%, 40%, 60%, 80%, or 100% confident) was used; this was the same cue included in Experiment 1. In Condition B, a prediction cue based on the actual number of items to be remembered (1, 2, 3, 4, 5, or 6) was devised.¹ In Condition C, JOLs were elicited with a knowledge cue based on the actual number of items to be remembered (1, 2, 3, 4, 5, or 6); this was the same type of cue used in Experiment 2. If JOLs are sensitive to the scale (percentages vs. actual numbers), then performance in Conditions A and B should differ. For example, if participants were more accurate using actual numbers rather than percentages, then monitoring accuracy in Condition B should be higher than in Condition A. Alternatively, if JOLs are sensitive to prediction- versus knowledge-based cues (independent of the scale used), then there should be no difference between Conditions A and B, and accuracy in Condition C should be highest.
In Condition D, knowledge and prediction cues were combined. Participants in Condition D provided two separate JOLs. First, they were asked how many items they could currently recall (1, 2, 3, 4, 5, or 6); this was the knowledge component. After making this judgment, participants were asked to predict how many items they would be able to recall on the memory test (1, 2, 3, 4, 5, or 6); this was the prediction component. Both components were included to determine whether participants would incorporate information derived from knowledge-based judgments into their subsequent predictive JOLs.

¹ The rating scale from Experiment 2 (0, 1, 2, 3, 4, 5+) was changed. A 6-point scale was preserved, but the values in Experiment 3 ranged from 1 to 6. This change was made to eliminate possible difficulties associated with the 5+ rating and because the 0 rating was used less than 2% of the time in Experiment 2.

In sum, there were three primary concerns in Experiment 3. First, it was important to replicate the delayed JOL effect observed in Experiment 2. Second, Experiment 3 was designed to determine whether the increase in delayed JOL accuracy was due to a change in the type of scale (from percentages to numbers) or to a change from predictive cues to knowledge-based cues. If delayed JOL accuracy improved in Experiment 2 solely because a number scale was used instead of a percentage scale, then JOL accuracy should be higher in Condition B than in Condition A. Alternatively, if delayed JOL accuracy increased because the cue was changed from prediction based to knowledge based, then (a) there should be no difference between Conditions A and B because both are prediction based, and (b) JOL accuracy in knowledge-based Condition C should be greater than in both prediction-based conditions (A and B).
The third important issue in Experiment 3 was to test whether prediction-based cues could produce high delayed JOL accuracy if participants were forced to first consider their current state of knowledge. If so, predictive judgments preceded by knowledge-based JOLs in Condition D should be more accurate than predictive judgments alone in Condition B.²

Method

Participants and materials. A total of 120 undergraduates volunteered to participate in Experiment 3. All volunteers received course credit, and none of the participants had completed previous versions of this experiment. The same categorized lists from Experiments 1 and 2 were used in Experiment 3.

Design and procedure. A mixed design was used in Experiment 3. The timing of judgments (immediate or delayed) was manipulated within subjects, and the phrasing of the JOL cue was a between-subjects manipulation. Each participant received one of the following four JOL cues. In Condition A, a prediction cue based on a percentage scale was used. Participants were asked, "How confident are you that in about 10 minutes from now you will be able to recall the members of this category when shown the category name?" Participants rated their certainty on a 6-point percentage scale ranging from 0% (definitely will not recall) to 100% (definitely will recall). In Condition B, a prediction cue based on the actual number of items to be recalled was devised. Participants were asked, "How many members of this category will you recall on a test in about 10 minutes?" and they entered a number from 1 to 6. Thus, both Conditions A and B were prediction based; the difference was whether a percentage scale or the actual number of items was used. In Condition C, a knowledge-based cue was included. Participants were asked, "How many members of this category can you currently recall?" and they entered a number from 1 to 6.
Thus, the difference between Conditions B and C was whether the JOL cue referred to prediction or to current knowledge. Finally, in Condition D, both knowledge and prediction cues were used. Participants made two judgments. First, they were asked, "How many members of this category can you currently recall?" Immediately after answering this question, a second prompt appeared, asking, "How many members of this category will you recall on a test in about 10 minutes?" Participants answered both questions by entering a number from 1 to 6.

On arrival, each participant selected one of four cards for assignment to the between-subjects condition. Using these procedures, 25 participants were assigned to Condition A, 33 to Condition B, 30 to Condition C, and 32 to Condition D. After being assigned to a condition and receiving instructions, participants completed the study, JOL (immediate and delayed), distractor, and test procedures as in Experiments 1 and 2.

Results and Discussion

Recall, JOL magnitude, and JOL latency. Mean values for recall and JOL magnitude were converted to proportions and are listed in Table 2. Recall scores were similar regardless of the timing or phrasing of the JOL cue. A 2 × 4 (Timing × Phrasing of JOL Cue) multivariate analysis of variance (MANOVA) was conducted on recall. No reliable main effects emerged, and the interaction was not significant (all ps > .05), suggesting that the timing and phrasing of JOL cues had no effect on mean recall. These results are consistent with Experiments 1 and 2.

The magnitude of JOLs was examined across conditions. Consistent with Experiments 1 and 2, the magnitude of JOLs appeared to be greater immediately after study than after a delay (see Table 2). A 2 × 4 MANOVA confirmed a reliable main effect of JOL timing, F(1, 116) = 134, MSE = .01.³ As in Experiments 1 and 2, the magnitude of JOLs was higher immediately after study compared with after a delay.
The magnitude of JOLs also differed reliably across Conditions A through D, F(3, 116) = 3.56, MSE = .03. The interaction between variables was not reliable. Because the interaction term was nonsignificant, immediate and delayed JOL ratings were collapsed for post hoc comparisons. Dunnett's C procedure revealed that the magnitude of judgments was higher when participants were asked how many items they could currently recall (knowledge based, Condition C) than when participants predicted future recall in Condition D.

Participants in Condition D made both knowledge and prediction JOLs. Two planned comparisons were conducted to determine whether participants adjusted the magnitude of their two judgments in Condition D. For immediate JOLs, the initial, knowledge-based judgments were reliably higher than subsequent predictive judgments, t(31) = 5.95, p < .001. An analogous, reliable difference was also found for delayed JOLs, t(31) = 3.07, p = .004. Participants' first JOL (based on current knowledge) was higher than their second JOL (predicting future performance). Thus, participants decreased the magnitude of their judgments to reflect estimates of future forgetting.

The latency to provide JOLs was calculated for each participant, and these mean values also appear in Table 2. Reliable main effects were observed for timing and phrasing of the JOL cues, F(1, 116) = 5.72, MSE = 4,690, and F(3, 116) = 32.2, MSE = 14,893, respectively. The interaction between variables also was statistically reliable. Inspection of the mean values suggests that latencies were longer for knowledge-based JOLs. Post hoc tests confirmed that for both immediate and delayed JOLs, latencies in Condition C were reliably longer than in Conditions A, B, and the predictive component of D.
This finding is consistent with the notion that participants in the prediction conditions used a different basis for their JOLs, whereas participants in the knowledge-based conditions made a retrieval-based attempt that took longer.

Table 2
Mean Judgment of Learning (JOL) Magnitude, Recall, and Bias Scores in Experiment 3

Condition                        Recall      Magnitude   JOL latency       Bias
A. Prediction based (percentage)
   Immediate JOL                 .42 (.12)   .60 (.20)    5,288 (2,436)    .18 (.22)
   Delayed JOL                   .40 (.14)   .49 (.18)    5,719 (1,400)    .08 (.25)
B. Prediction based (number)
   Immediate JOL                 .44 (.13)   .62 (.13)    5,309 (2,054)    .18 (.18)
   Delayed JOL                   .45 (.15)   .54 (.13)    5,903 (3,822)    .09 (.16)
C. Knowledge based (number)
   Immediate JOL                 .46 (.16)   .72 (.09)    8,084 (4,071)    .26 (.14)
   Delayed JOL                   .46 (.15)   .57 (.09)   11,706 (5,250)    .11 (.12)
D. Combined (number)
   Knowledge component
      Immediate JOL              .47 (.14)   .72 (.11)    8,519 (3,459)    .25 (.14)
      Delayed JOL                .46 (.14)   .55 (.11)   12,488 (4,438)    .09 (.11)
   Prediction component
      Immediate JOL              .47 (.14)   .61 (.12)    4,165 (2,264)    .14 (.18)
      Delayed JOL                .46 (.14)   .51 (.10)    2,208 (1,138)    .05 (.12)

Note. Main entries are mean values; entries in parentheses are standard deviations. Latency is the time (in milliseconds) required for participants to enter their JOL from the onset of the metamemory prompt.

² The four groups included in Experiment 3 represent combinations of two variables: type of scale (number vs. percentage) and type of cue (prediction based vs. knowledge based). Four conditions were devised to test the primary concerns discussed previously, including Condition D, which contained two types of JOLs. All possible combinations of variables were not tested, however (e.g., prediction-based percentage scales preceded by knowledge-based number scales were not included), so it is not possible to evaluate other potential interactions between these variables.

³ In this MANOVA, and in subsequent MANOVAs reported for JOL response latencies, bias scores, and G, performance was examined as a function of JOL timing (immediate vs. delayed) and phrasing (Conditions A through D). Participants made two JOLs in Condition D, but the primary concern was with the latter, predictive component. The initial, knowledge component of Condition D was essentially a replication of Condition C, and so it was omitted from these MANOVAs and considered separately in a subsequent analysis.

Metamemory accuracy. As in Experiments 1 and 2, bias scores were calculated to examine differences between the magnitude of participants' JOLs and their recall. Mean values across conditions are reported in Table 2. Mean bias scores were positive in all conditions, indicating that the magnitude of JOLs was greater than recall on the memory test. A 2 × 4 MANOVA was conducted, and a reliable main effect of delay was observed, F(1, 116) = 78.6, MSE = .01. No reliable effect emerged for the phrasing of the JOL cue, and the interaction between variables was not significant. Thus, bias scores were lower after delays, regardless of whether the JOLs were elicited by knowledge- or prediction-based cues.

Relative metamemory accuracy was assessed by computing G between JOL magnitude and recall. Mean values across conditions are shown in Figure 1. G was undefined for two participants in Condition A because they used the same percentage rating for all delayed JOLs. Data from these participants were omitted from the following analyses. A 2 × 4 MANOVA was conducted, and both main effects (timing of JOL and phrasing of JOL) were reliable, F(1, 114) = 17.51, MSE = .08, and F(3, 114) = 8.13, MSE = .09, respectively. The interaction between variables also was reliable, F(1, 114) = 3.12, MSE = .08. Dunnett's C post hoc test showed no differences in immediate JOL accuracy across all four conditions. For delayed JOLs, however, G was reliably higher in Conditions C and D (knowledge-based JOLs) than in Conditions A and B (prediction-based JOLs).

Another important question concerned which conditions, if any, produced a reliable delayed JOL effect. No statistically reliable differences between immediate and delayed JOL accuracy were observed in Conditions A and B (see filled circle and triangle, respectively, in Figure 1). However, paired t tests showed a reliable delayed JOL effect in Condition C and for both types of judgments in Condition D (all three ps < .001). Thus, the results of Experiments 1 and 2 were replicated.

Summary. Three main hypotheses were explored in Experiment 3. First, it was predicted that a reliable delayed JOL effect should be observed in Condition C because this condition replicated the procedures of Experiment 2. Delayed JOL accuracy in Condition C was indeed higher (G = .81) than immediate JOL accuracy (G = .50), suggesting that the delayed JOL effect does extend to categories of six items when knowledge-based cues are used.

Second, the source of this improvement in delayed JOL accuracy was examined. In particular, the possibility that changing from a percentage scale to one including the raw number of items may have caused this change was tested. The predictive cue in Condition A was elicited using a percentage scale (cf. Experiment 1); the cue in Condition B also was predictive, but it was based on the number of items to be recalled. No reliable differences in monitoring accuracy (bias and G) emerged between these two conditions. Thus, the shift from percentages to raw numbers in Experiments 1 and 2 probably cannot account for the observed differences in delayed JOL accuracy. The relative efficacy of predictive- versus knowledge-based cues was examined next. If the change from prediction-based to knowledge-based cues was critical, then delayed JOL accuracy should have been higher in Condition C than in Conditions A and B.
This reliable pattern of results was obtained in Experiment 3. Delayed JOL accuracy was reliably higher using knowledge-based cues (Condition C) than prediction-based cues (Conditions A and B).

The third important issue involved the between-groups comparison of predictive monitoring accuracy in Condition B with the predictive component of Condition D. The phrasing of JOLs in both cases was identical, and both judgments were based on the number of items participants predicted they would recall. The difference was that participants in Condition D provided a knowledge-based JOL before making their predictions, whereas participants in Condition B did not. A comparison of the open and filled triangles in Figure 1 shows that large differences in delayed JOL accuracy were obtained between conditions. A reliable delayed JOL effect emerged when participants made a recall attempt (i.e., knowledge-based JOL) prior to their predictions but not when predictions were provided alone.

[Figure 1. Mean gamma correlations in Experiment 3 as a function of delay and type of judgment of learning (JOL) cue. Horizontal axis: timing of JOL (immediate, delayed). Vertical axis: gamma, 0.0 to 1.0. Plotted series: Condition C, knowledge (number); Condition D, knowledge (number); Condition D, prediction (number); Condition A, prediction (percentage); Condition B, prediction (number).]

An interesting pattern of results was obtained when the two JOLs in Condition D were compared. Participants did reduce the magnitude of their latter, predictive JOLs compared with their initial, knowledge-based JOLs. This suggests that participants incorporated some theory of forgetting into their latter judgments. As a result, the magnitude of bias was reduced for the prediction-based JOLs. No differences in G, however, were observed between knowledge and predictive judgments. Thus, participants reduced the overall level of their JOLs from the first judgment to the second, but this reduction was consistent across all items.
Participants did not selectively reduce their judgments for some items compared with others, and as a result, item-by-item monitoring accuracy (G) was unaffected.

General Discussion

The present study examined metamemory for categorized lists of items. Participants showed reliable (nonzero) relative monitoring accuracy (G) in all three experiments. Mean Gs for immediate JOLs were moderate (ranging from .44 to .52), consistent with the results of previous studies using paired associates (e.g., Dunlosky & Nelson, 1992, 1994; Kelemen & Weaver, 1997; Nelson & Dunlosky, 1991) and text materials (Maki & Serra, 1992; Weaver, 1990). Thus, students appeared to monitor their memory for categorized lists at a level comparable to that obtained using other verbal stimuli.

One important question concerned whether the delayed JOL effect (Nelson & Dunlosky, 1991) would emerge in this novel task. Previous research on metamemory for more complex stimuli (i.e., texts) has failed to find an advantage for delayed judgments (Glenberg, Sanocki, Epstein, & Morris, 1987; Maki, 1998b). Indeed, all previous demonstrations of the delayed JOL effect had used paired-associate stimuli. Using a standard, prediction-based JOL prompt in Experiment 1, delayed JOLs were not reliably better than immediate JOLs. One reason for this may be that participants were unlikely to attempt retrieval of all six items when making their judgments. For delayed JOLs, the retrieval (or retrieval failure) at time of JOL can be highly diagnostic of future test performance and may even alter it (Kelemen & Weaver, 1997; Spellman & Bjork, 1992, 1997). Such a retrieval attempt is straightforward using paired associates because the response term is a single item. When studying categorized lists, however, a retrieval attempt at time of JOL involved six items. Increased cognitive load in the latter case may have led participants to rely on more readily available (but less diagnostic) information.
If so, encouraging students to make a retrieval attempt at time of judgment should increase delayed JOL accuracy.

Evidence supporting these ideas was obtained in Experiments 2 and 3. In Experiment 2, a mnemonic, knowledge-based cue was used ("How many items can you currently recall?") as opposed to the common, prediction-based cue ("How confident are you in future recall?"). Thus, participants were asked to make a covert retrieval attempt at time of JOL and enter the number of items they could currently recall. Using a knowledge-based cue in Experiment 2 had three clear effects. First, the magnitude of judgments was greater immediately after study compared with after a delay. This probably reflects normal forgetting during the delay between study and JOL and is not of primary interest. More important, relative metamemory accuracy (G) increased substantially after a delay. In contrast to Experiment 1, delayed JOLs were highly accurate in Experiment 2. Third, the time required for participants to provide JOLs increased substantially in Experiment 2 (see Table 1). One interpretation of this finding is that participants were making an effortful, longer retrieval attempt during JOLs in Experiment 2 but not in Experiment 1. If so, then delayed JOL latencies should be longer than immediate JOL latencies only in Experiment 2 because it would be harder to recall the items after a delay. This reliable pattern of results was observed and was subsequently replicated in Experiment 3. Although the JOL latency data are intriguing, the relationship between them and the inferred cognitive processes is speculative.

Experiment 3 eliminated the type of scale used (percentage vs. actual number of items) as a possible explanation for the contrasting findings in the first two experiments. No differences in G emerged between predictive JOLs based on a percentage scale compared with predictive JOLs based on the actual number of items to be recalled.
When participants provided only one type of JOL after a delay, knowledge-based judgments were always more accurate than predictive judgments. Together, these data suggest that incorporating a delay between study and JOL may be necessary to improve monitoring accuracy, but it is not sufficient. Delays may be effective in improving metamemory accuracy only if retrieval-based judgments are elicited.

These results can be interpreted using Koriat's (1997) cue-utilization framework. This view suggests that JOLs are based on a variety of cues and that the relative weight assigned to different types of information varies. JOLs provided immediately after study are based primarily on nonmnemonic cues and are moderately accurate. Conversely, when delayed JOLs are based on mnemonic information (e.g., a recall attempt), then predictive accuracy is high. In Experiment 1, delayed JOLs may not have been based on mnemonic information, and as a result, delayed JOL accuracy was moderate. When participants were explicitly cued to use mnemonic information (Experiments 2 and 3), delayed JOL accuracy improved significantly.

Examination of Condition D in Experiment 3 provides further insight into the basis of JOLs. In this case, two judgments were provided: a knowledge-based JOL ("How many items can you recall?") followed by a predictive JOL ("How many items will you recall?"). The magnitude of predictive JOLs was reliably reduced in the latter judgments, suggesting that participants incorporated some theory about forgetting into these JOLs. The adjustment did not depend on the presence of previous knowledge-based JOLs because the magnitude of JOLs in Condition B was very similar. Thus, even when predictive JOLs were provided by themselves, they included some estimate of forgetting.
These findings are consistent with Rawson, Dunlosky, and McDonald (2000), who demonstrated that predictions of performance for texts include estimates of retention.

In the present study, participants did not appear to make retrieval-based JOLs unless specifically prompted. Participants clearly were able to make such attempts, and they were very accurate when they did so. In Condition D of Experiment 3, for example, participants appeared to incorporate the results of knowledge-based JOLs into their subsequent predictive judgments. Delayed monitoring accuracy was high (G = .76) when knowledge-based JOLs preceded predictive JOLs (Condition D) but only modest when predictive JOLs occurred in isolation (G = .48 in Condition B). In sum, the results of Experiment 3 suggest that participants included a theory of forgetting in the magnitude of their JOLs, but they did not make a diagnostic retrieval attempt at time of JOL unless specifically prompted. Once retrieval was attempted, however, this knowledge was utilized effectively in subsequent predictive delayed JOLs.

These results add to a growing body of evidence for the impact of metamemory cues on memory-monitoring accuracy. Differences in the phrasing of metamemory cues can alter how participants make their judgments in a number of domains. Widner and Smith (1996) observed that feeling-of-knowing (FOK) accuracy increased when task-relevant information was emphasized in the FOK cue. Studying text comprehension monitoring, Maki and Serra (1992) found that metamemory accuracy was higher for memory predictions compared with judgments of comprehension. These studies found that monitoring accuracy was better for predictive judgments compared with knowledge-based or comprehension-based judgments, respectively. In the present study, however, predictive judgments were less accurate than knowledge-based judgments.
One explanation for this difference may be that predictive judgments are better when the target information cannot be recalled at the time of judgment. Widner and Smith's (1996) FOK task was based on the likelihood of future recognition for stimuli that could not be recalled at time of judgment. Similarly, in text comprehension monitoring, all of the to-be-remembered information about a passage of text cannot be brought to mind at time of JOL. Thus, using a knowledge-based cue when recall is not possible appears to be less effective than using predictive cues. In the present study, participants were able to retrieve some or all of the items at time of JOL, and knowledge-based cues were more effective. Thus, one's current state of knowledge may be a good basis for future memory performance when at least some of the information can be recalled. If the target information is unavailable at time of judgment, then predictive cues may be more effective.

Retrieval success or failure at time of JOL can be a very salient and effective cue for subsequent metamemory judgments. The present study demonstrates, however, that participants may not always use this information as a basis for their judgments. In this case, participants appeared to require specific prompting to use retrieval as a basis for their JOLs. Moreover, different types of judgments (e.g., immediate vs. delayed) are differentially sensitive to the phrasing of JOL cues. Future studies of memory monitoring should include carefully constructed metamemory cues because the type of diagnostic information utilized by participants can have a substantial impact on monitoring accuracy.

From a practical standpoint, college students are obliged to acquire new information in a variety of forms. Sometimes, they may study paired associates (e.g., when learning a foreign language). More frequently, students read passages from a textbook and must monitor their comprehension of complex text material.
In another common situation, students may be asked to learn lists (or categories) of related information (e.g., famous authors, important companies, lobes of the brain, etc.). These data show that students can monitor their memories for lists of items immediately after study at a level similar to that obtained using paired associates and texts. Moreover, delayed judgments can be highly accurate if an actual retrieval attempt is made at the time of JOL. Thus, Nelson and Dunlosky's (1991) delayed JOL effect is robust across stimuli.

The highest levels of monitoring accuracy in this study were obtained when students made knowledge-based judgments of recall after a delay. How much delay is enough? The present study used intervals of about 1 min or more between study and JOL, but other work by Kelemen and Weaver (1997) suggests that measurable increases can be obtained after only a few seconds. Thus, a practical recommendation for improving metamemory in a variety of situations may be to include a brief delay between study and JOL. When assessing future memory for recallable information, students are well advised to make delayed, retrieval-based JOLs.

References

Battig, W. F., & Montague, W. E. (1969). Category norms for verbal items in 56 categories: A replication and extension of the Connecticut category norms. Journal of Experimental Psychology Monographs, 80 (3, Pt. 2).
Begg, I., Duft, S., Lalonde, P., Melnick, R., & Sanvito, J. (1989). Memory predictions are based on ease of processing. Journal of Memory and Language, 28, 610-632.
Benjamin, A. S., & Bjork, R. A. (1996). Retrieval fluency as a metacognitive index. In L. M. Reder (Ed.), Implicit memory and metacognition (pp. 309-338). Mahwah, NJ: Erlbaum.
Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20, 374-380.
Dunlosky, J., & Nelson, T. O. (1994). Does the sensitivity of judgments of learning (JOLs) to the effects of various study activities depend on when the JOLs occur? Journal of Memory and Language, 33, 545-565.
Dunlosky, J., & Nelson, T. O. (1997). Similarity between the cue for judgments of learning (JOL) and the cue for test is not the primary determinant of JOL accuracy. Journal of Memory and Language, 36, 34-49.
Flavell, J. H. (1981). Cognitive monitoring. In W. P. Dickson (Ed.), Children's oral communication skills (pp. 35-60). New York: Academic Press.
Glenberg, A. M., Sanocki, T., Epstein, W., & Morris, C. (1987). Enhancing calibration of comprehension. Journal of Experimental Psychology: General, 116, 119-136.
Hacker, D. J., Dunlosky, J., & Graesser, A. C. (Eds.). (1998). Metacognition in educational theory and practice. Mahwah, NJ: Erlbaum.
Hertzog, C., Dixon, R. A., & Hultsch, D. F. (1990). Relationships between metamemory, memory predictions, and memory task performance in adults. Psychology and Aging, 5, 215-227.
Kelemen, W. L., & Weaver, C. A., III. (1997). Enhanced metamemory at delays: Why do judgments of learning improve over time? Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 1394-1409.
Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349-370.
Maki, R. H. (1998a). Metacomprehension of text: Influence of absolute confidence level on bias and accuracy. Psychology of Learning and Motivation, 38, 223-248.
Maki, R. H. (1998b). Predicting performance on text: Delayed versus immediate predictions and tests. Memory & Cognition, 26, 959-964.
Maki, R. H., & Serra, M. (1992). The basis of test predictions for text material. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18, 116-126.
Maki, R. H., & Swett, S. (1987). Metamemory for narrative text. Memory & Cognition, 15, 72-87.
Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109-133.
Nelson, T. O. (1996). Gamma is a measure of the accuracy of predicting performance on one item relative to another item, not of the absolute performance of an individual item. Applied Cognitive Psychology, 10, 257-260.
Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The "delayed-JOL effect." Psychological Science, 2, 267-270.
Nelson, T. O., Leonesio, R. J., Landwehr, R. S., & Narens, L. (1986). A comparison of three predictors of an individual's memory performance: The individual's feeling of knowing versus normative feeling of knowing versus base-rate item difficulty. Journal of Experimental Psychology: Learning, Memory, and Cognition, 12, 279-287.
Nelson, T. O., & Narens, L. (1994). Why investigate metacognition? In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 1-27). Cambridge, MA: MIT Press.
Pressley, M., Levin, J. R., & Ghatala, E. S. (1984). Memory strategy monitoring in adults and children. Journal of Verbal Learning and Verbal Behavior, 23, 270-288.
Rawson, K. A., Dunlosky, J., & McDonald, S. L. (2000). Influence of metamemory on performance predictions for text. Manuscript submitted for publication.
Schneider, W. (1995). Micro Experimental Laboratory (Version 2.0) [Computer software]. Pittsburgh, PA: Psychological Software Tools.
Schwartz, B. L. (1994). Sources of information in metamemory: Judgments of learning and feelings of knowing. Psychonomic Bulletin & Review, 1, 357-375.
Schwartz, B. L., & Metcalfe, J. (1994). Methodological problems and pitfalls in the study of human metacognition. In J. Metcalfe & A. P. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 93-113). Cambridge, MA: MIT Press.
Spellman, B. A., & Bjork, R. A. (1992). When predictions create reality: Judgments of learning may alter what they are intended to assess. Psychological Science, 3, 315-316.
Spellman, B. A., & Bjork, R. A. (1997, November). When prophecy succeeds (too well): Inaccurate judgments of learning can produce better-than-perfect predictions. Poster session presented at the 38th annual meeting of the Psychonomic Society, Philadelphia.
Thiede, K. W., & Dunlosky, J. (1994). Delaying students' metacognitive monitoring improves their accuracy in predicting their recognition performance. Journal of Educational Psychology, 86, 290-302.
Weaver, C. A., III. (1990). Constraining factors in calibration of comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 214-222.
Weaver, C. A., III, & Kelemen, W. L. (1997). Judgments of learning at delays: Shifts in response patterns or increased metamemory accuracy? Psychological Science, 8, 318-321.
Widner, R. L., Jr., & Smith, S. M. (1996). Feeling-of-knowing judgments from the subject's perspective. American Journal of Psychology, 109, 373-387.
Wright, D. B. (1996). Measuring feeling of knowing. Applied Cognitive Psychology, 10, 261-268.
Yates, J. F. (1990). Judgment and decision making. Englewood Cliffs, NJ: Prentice Hall.

Received August 26, 1999
Revision received March 13, 2000
Accepted March 13, 2000