
Journal of Educational Psychology, 2013, Vol. 105, No. 1, 78–88

© 2012 American Psychological Association 0022-0663/13/$12.00 DOI: 10.1037/a0029378

Self and External Monitoring of Reading Comprehension


Ling-po Shiu
The Chinese University of Hong Kong

Qishan Chen
South China Normal University

The present study compared the effectiveness of 2 approaches to remedying the inaccuracy of self-monitoring of reading comprehension. The first approach attempts to enhance self-monitoring by strengthening the cues utilized in monitoring. The second approach replaces self-monitoring with external regulation based on objective evaluative information. We used the delayed-keyword effects (Thiede, Anderson, & Therriault, 2003) to produce different levels of self-monitoring accuracy and, at each accuracy level, compared selection of texts for restudy by the participants themselves with selection by the computer on the basis of a comprehension test taken by the participants. We found that the computer-select groups were more able than the self-select groups to choose the least understood texts for restudy and obtained larger improvements on those texts in a final comprehension test. It appears that a comprehension test can provide more accurate information than self-monitoring for the regulation of study. The role of objective evaluative information in self-regulation of learning is discussed.

Keywords: self-monitoring, self-regulated learning, delayed-keyword effects, reading comprehension

Author Note

This article was published Online First July 30, 2012. Ling-po Shiu, Department of Educational Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong; Qishan Chen, Center for Studies of Psychological Application, Department of Psychology, South China Normal University, Guangzhou, People's Republic of China. This research was supported by General Research Fund Grant 450108 from the Research Grants Council of the Hong Kong Special Administrative Region, awarded to Ling-po Shiu. We thank Arthur Graesser and Roger Azevedo for many useful comments on earlier versions of this article. Correspondence concerning this article should be addressed to Ling-po Shiu, Department of Educational Psychology, The Chinese University of Hong Kong, Shatin, Hong Kong. E-mail: lshiu@cuhk.edu.hk

Self-regulated learning (SRL) is often portrayed as an active, dynamic process in which self-motivated learners set their learning goals, employ effective learning strategies, monitor their learning progress, evaluate their progress against some self-set standards, and use this information to regulate their study (Metcalfe, 2009; Pintrich, 2000; Thiede & Dunlosky, 1999; Winne & Hadwin, 1998; Zimmerman, 1998). However, research has shown that most people are far from being effective self-regulated learners. One of the reasons is that self-monitoring of learning is usually inaccurate (Karpicke, Butler, & Roediger, 2009; Koriat & Bjork, 2005). This is true whether the learning materials are word pairs (Nelson & Dunlosky, 1991), passages of text (Glenberg & Epstein, 1985), or school subject materials (Hacker, Bol, Horgan, & Rakow, 2000; Kostons, van Gog, & Paas, 2010) and whether the learning context is the laboratory (Dunlosky & Lipko, 2007; Maki, 1998), the classroom (Hacker et al., 2000), or a computer-based learning environment (Kostons et al., 2010). Inaccurate monitoring is detrimental to SRL because the initiation of self-regulatory processes, such as redefinition of a learning task, adjustment of task goals and standards, and changes in study strategies, is contingent on the outcomes of monitoring and evaluation of learning (Winne & Hadwin, 1998). The present study compared the effectiveness of two different approaches to remedying the inadequacy of self-monitoring. The first approach attempts to enhance self-monitoring by strengthening the cues on which monitoring judgments (e.g., judgments of learning, or JOLs) are based (Koriat, 1997). Research has shown that JOL accuracy can be improved when a JOL is made after a delay rather than immediately after study (Koriat & Bjork, 2006; Maki, 1998; Nelson & Dunlosky, 1991), after restudying the to-be-learned materials (Rawson, Dunlosky, & Thiede, 2000), after taking a practice test (Dunlosky, Rawson, & McDonald, 2002; Lovelace, 1984), over multiple trials (Koriat & Bjork, 2006), or in a condition similar to the retrieval condition at test (Koriat, 1993), among other manipulations. In addition, for judgments of text comprehension, accuracy can be enhanced if readers perform deep, conceptual-level processing (Magliano, Little, & Graesser, 1993) or generate keywords or summaries after a delay (Thiede & Anderson, 2003; Thiede et al., 2003), among others (see reviews by Dunlosky & Lipko, 2007; Lin & Zabrucky, 1998; Thiede, Griffin, Wiley, & Redford, 2009; Thomas & McDaniel, 2007).

The second approach attempts to replace self-monitoring with external regulation based on objective evaluative information. This approach is most evident in computer-assisted instruction, a research field in which adaptive, personalized instruction is favored but in which there are contrasting views as to whether the learner or the system (computer) should be in control. Learner control is associated with learner motivation and task involvement, although learning effectiveness is often not enhanced, and may even suffer, if learners lack prior knowledge (Azevedo, Moos, Greene, Winters, & Cromley, 2008; Williams, 1996). Again, the reason is that learners' self-monitoring and self-assessment of what they know are often inaccurate, particularly for novice learners (Kostons et al., 2010; Williams, 1996).
A new type of model allows the system and the learner to share control over the selection of learning tasks (e.g., Corbalan, Kester, & van Merriënboer, 2006, 2008). In this two-step process, the system first identifies the needs of an individual learner by assessing the learner's competence and cognitive load and then selects a small set of learning tasks to meet the learner's needs. Next, the learner is allowed to select one task from


this small set. Shared control results in better learning efficiency and learner involvement (Corbalan et al., 2008). On the other hand, in a kind of intelligent tutoring system (ITS) called Cognitive Tutors (e.g., Anderson, Corbett, Koedinger, & Pelletier, 1995; Koedinger & Aleven, 2007), both monitoring of learning progress and regulation of learning activities are largely performed by the computer. Cognitive Tutors interpret learners' responses in solving problems by modeling the learner's cognitive structure at any point in the problem-solving process, and they track learners' learning in terms of the knowledge and skill components defined in a cognitive model of the task domain. These processes are called model tracing and knowledge tracing, respectively. On such bases, the tutors make instructional decisions such as providing hints, explanations, and answers to problems, as well as selecting the next problems for practice. Although some Cognitive Tutors offer on-demand help (e.g., a glossary and contextual hints were available in the Geometry Cognitive Tutor in Aleven, McLaren, Roll, & Koedinger, 2004), learners rarely use these facilities effectively; they either overuse or underuse them. In short, Cognitive Tutors tend to be better monitors and regulators of learning than the learners themselves and can reduce both training time and cost. In both shared-control systems and Cognitive Tutors, the computer monitors and assesses learners' cognitive learning. Newly developed ITSs have also been used to promote learners' metacognitive learning. These ITSs employ pedagogical agents capable of holding conversations with learners in natural language to promote learners' SRL, either directly, through explicit instruction, demonstration, and coaching of effective SRL strategies, or indirectly, through hinting, questioning, and providing feedback in a just-in-time fashion in response to student actions (see the review by Graesser & McNamara, 2010).
Few recent studies have directly compared the effectiveness of enhancing self-monitoring versus external regulation based on objective evaluation. Yet, dating back to the 1960s and 1970s, in the context of the search for an optimal schedule of practice to maximize learning, comparisons were often made between scheduling decisions made by the learners themselves and those made by model-based algorithms (see Pavlik & Anderson, 2008). For example, in Atkinson (1972), a learner-controlled strategy, a response-sensitive strategy, and a random-order strategy were compared in a task of learning 84 pairs of German–English translation equivalents. Participants who learned with the learner-controlled strategy chose items for additional study by themselves. Participants who learned with the response-sensitive strategy were given items for additional study as determined by a Markov model that took into account a participant's successes and failures in previous learning trials. In other words, regulation of study was controlled externally by objective evaluative information. The random-order strategy was the baseline condition, in which items were chosen randomly. Atkinson's results ranked the three strategies, in terms of producing the best recall in a delayed test, in the following order: response-sensitive, learner-controlled, and random-order. In another study, by Nelson, Dunlosky, Graf, and Narens (1994), computer-controlled allocation of restudy based on participants' JOLs was compared with allocation based on normative performance in a task involving learning Swahili–English translation equivalents. Normative performance refers to the base-rate difficulty level obtained from a large sample of participants and thus does not take into consideration an individual's performance in the preceding learning trials. Nelson et al. found that the computer-controlled regulation strategy based on JOLs produced learning equivalent to that achieved by another group of participants who chose the allocation by themselves, and both outperformed computer-controlled allocation based on base-rate information. Although it may appear that Atkinson (1972) and Nelson et al. (1994) obtained discrepant results concerning the usefulness of objective evaluative information, their results together suggest that objective evaluative information that takes into account a learner's personal history of success and failure in learning is useful. Atkinson's (1972) results further show that external regulation of study based on such information can be more effective than regulation based on learners' subjective judgments. However, it is uncertain whether Atkinson's conclusion, reached four decades ago, still holds, because there have since been significant developments in techniques to enhance self-monitoring. The participants in Atkinson's learner-controlled condition could have been more accurate in self-monitoring had they performed some of the enhancing activities reviewed previously. It remains a question whether enhanced self-monitoring can be as accurate as external objective evaluation. Furthermore, Atkinson (1972) and other earlier studies investigated word-list learning, which may not represent complex learning activities such as reading texts. Therefore, the present study compared enhanced self-monitoring with external regulation in text comprehension. We used the delayed-keyword effects reported by Thiede et al. (2003) to enhance self-monitoring, because this experimental paradigm was reported to produce highly accurate self-monitoring and because it easily allows comparison between self and external monitoring. We first describe the delayed-keyword effect.
In Thiede et al.'s (2003) experiment, the participants were asked to read six expository texts and then rate their comprehension of each text. Before making the comprehension judgments, the participants either generated or did not generate five keywords for each text. The former group was further divided into immediate and delayed groups. Participants in the immediate-keyword group generated keywords for a text right after they had finished reading it, whereas participants in the delayed-keyword group generated keywords after they had finished reading all six texts. Next, the participants took a comprehension test. Afterward, they were asked to choose some of the texts for restudy. Finally, they took a second comprehension test on both selected and unselected texts. Thiede et al. (2003) found that the participants in the delayed-keyword group were more able than the other two groups to monitor comprehension accurately, regulate (re)study effectively, and comprehend the texts. In their analysis, monitoring accuracy was indicated by a positive correlation between the comprehension ratings and the scores on the first comprehension test. Effective regulation was indicated by a negative correlation between the comprehension ratings and whether a text was selected for restudy. Finally, comprehension of the texts was indicated by performance on the second comprehension test. The delayed-keyword effects have been replicated and corroborated by several similar studies conducted by Thiede and coworkers (e.g., Thiede & Anderson, 2003; Thiede, Dunlosky, Griffin, & Wiley, 2005). In particular, Thiede and Anderson (2003) and Anderson and Thiede (2008) found that writing summaries after a delay could also enhance comprehension monitoring accuracy and regulation effectiveness. These investigations are important for


two reasons. First, they showed that accurate comprehension monitoring can be achieved. The delayed-keyword manipulation produced a correlation between comprehension ratings and test scores as high as .70, very close to what can be found in metamemory studies of paired-associate learning (Nelson & Dunlosky, 1991). Second, they demonstrated a chain of metacognitive processes that begins with accurate monitoring, proceeds to effective regulation, and ends with superior learning. Early investigations did not find a strong association between monitoring accuracy and learning outcomes (e.g., Begg, Martin, & Needham, 1992). Thiede et al. (2003), as well as Nelson et al. (1994), attributed this failure to the absence of a causal chain linking monitoring to learning performance. That is, because the monitoring judgments were made at the end of learning, there was no way for the outcome of monitoring to influence how people learned. In their studies, by allowing participants to choose materials for restudy and take a second test, the impact of monitoring on learning could occur via regulation of study time. In Thiede et al.'s (2003) experiment, there was also a readily available piece of objective evaluative information that reflected an individual's degree of learning: their participants took a comprehension test before choosing texts for restudy. These test scores, though not disclosed to the participants, should be indicative of a participant's degree of learning of a text. Even if these scores might be less reliable than information collected over multiple learning trials (as in Atkinson, 1972), they provide a reasonable alternative source of evaluative information against which self-monitoring can be compared. A simple regulation strategy that makes use of this information is to choose for restudy those texts on which the participant scored very low.
This strategy is called discrepancy reduction, and it seems to have been used by Thiede et al.'s participants (see the authors' Table 1 and related text). A computer program can exercise external regulation of restudy using this strategy. In short, the present study investigated whether SRL, with the accuracy-enhancing activity of delayed-keyword generation introduced by Thiede et al. (2003), would be as effective as, or more effective than, external regulation based on objective evaluative information. We used Thiede et al.'s delayed-keyword generation manipulation to produce different levels of monitoring accuracy and, at each accuracy level, compared selection of texts for restudy by the participants themselves with selection based on a comprehension test. In those conditions in which self-monitoring is not very accurate (i.e., the no-keyword and immediate-keyword conditions), we expected that objective evaluative information would result in better selection of texts than self-judgments. The critical comparison lies in the condition in which monitoring accuracy is enhanced (i.e., the delayed-keyword condition). If, in this condition, objective evaluative information also results in better selection of texts than self-judgments, objective evaluative information should be accepted as the more useful basis for regulation of learning.

Method

Participants

The participants were 100 undergraduate students (59 females; mean age = 20.74 years, SD = 1.26) at the Chinese University of Hong Kong. They were native Chinese speakers who could read Chinese fluently. They reported normal or corrected-to-normal vision. They participated voluntarily and received approximately US$10 as compensation.

Design

Two between-subjects variables were manipulated: (a) keyword generation (no keyword, immediate keyword, or delayed keyword) and (b) selection of texts for restudy (either by the readers themselves or by the computer on the basis of a comprehension test). Accordingly, the participants were randomly divided into six groups (Table 1).

Apparatus and Materials

The reading materials were six expository texts written in Chinese on different topics: tea and coffee, the effects of ultrasonic cleaning, murrain, human-friendly cell phones, and so on. The texts ranged in length from 987 to 1,180 Chinese characters. Their difficulty level was comparable to that of the texts used in university entrance examinations in mainland China. Each text had 12 comprehension questions, half factual and half inferential. The factual questions assessed text-based knowledge (i.e., details available within a single paragraph of a text), and the inferential questions assessed deep understanding of a text (see the Appendix for samples). This arrangement was the same as that used in Thiede et al. (2003). One set of six questions, divided equally into factual and inferential questions, was used in Comprehension Test 1 for half of the participants in each experimental condition, and another set was used for the other half. All 12 questions were used in Comprehension Test 2. The texts and the comprehension questions were presented on an LCD monitor connected to a PC. Each text was divided into three displays, each containing roughly 400 Chinese characters. The participants wrote keywords on a piece of paper and entered ease-of-learning and comprehension ratings on a keyboard connected to the PC.

Procedure

We tried to follow the experimental procedure of Thiede et al. (2003) as closely as we could, because this was the original procedure with which the delayed-keyword effect was first reported. Although this procedure confounds different types of time lag (e.g., keyword–keyword lag, keyword–judgment lag, and so on), Thiede et al. (2005) reported that the delayed-keyword effects obtained with the original procedure were not affected by manipulations of the other types of time lag. Therefore, we followed the original procedure in the present study. There were two major differences to note. First, in Thiede et al. (2003), the participants made an ease-of-learning (EOL) judgment for each text before they had a chance to look at the texts. They were given only the title of a text and asked to judge how easily they thought they could learn from a text bearing that title, so the judgments were likely to reflect the participants' knowledge of, or familiarity with, a certain topic. In our study, by contrast, the participants made an EOL judgment after they had glanced through the text briefly. This is in line with the procedure used to elicit EOLs in metamemory studies (e.g., Leonesio & Nelson, 1990).


Second, each keyword group was subdivided into two subgroups according to how the texts were selected for restudy. The texts were selected either by the participants themselves (as in Thiede et al., 2003) or by the computer according to the participants' performance in Comprehension Test 1. In the latter condition, the computer selected those texts for which a participant had answered fewer than half of the comprehension questions correctly. This selection strategy follows the discrepancy-reduction model of SRL and also seemed to be used by Thiede et al.'s (2003) participants (see the authors' Table 1 and related text). Other than these two major differences, we tried to adhere to their experimental procedure as closely as we could. The procedure consisted of seven steps. First, all participants were given instruction about the general procedure of the experiment. They were informed that they would read six texts, rate their comprehension of each text, and answer test questions on each text. Participants in the keyword groups (both immediate and delayed) were also told that they would be required to write five keywords to capture the major ideas of a particular text. They were given one example text and five example keywords to learn the meaning of the term "keywords." Then, all participants were asked to make an EOL judgment for each text after they had glanced through it briefly. Following Thiede et al. (2003), the EOL judgment was prompted with the title of the text at the top of the screen and the query (written in Chinese), "How easily do you think you could learn the information contained in the passage with this title?" on a scale from 1 (very easily) to 7 (with great difficulty). Afterward, all participants read the six texts. The presentation order of the texts was randomized for each participant. Reading was self-paced, and the pages could be turned by pressing the ENTER key on a computer keyboard. A page, once turned, could not be turned back.

The participants in the no-keyword groups only read the texts, whereas participants in the keyword groups also generated five keywords for each text. The latter groups were shown the title of a text and instructed to write five keywords that captured the gist of that text. The immediate-keyword groups generated keywords immediately after reading each text, whereas the delayed-keyword groups generated keywords for all six texts together after finishing the last text. After reading, all participants rated their comprehension of each text. Following Thiede et al. (2003), the rating was prompted with the title of a text at the top of the screen and the query (written in Chinese), "How well do you think you understood the text with this title?" on a scale from 1 (very poorly) to 7 (very well). After rating comprehension of the last text, all participants took Test 1. The texts were rated for comprehension and tested in the same order as they were presented for reading. After answering the last test question, all participants were told the number of questions they had answered correctly, pooled over the six texts. That is, they received feedback about overall performance but not about performance on any specific text. Following feedback, all participants reread some texts selected either by themselves or by the computer. The participants who selected by themselves made their selection by looking at the text titles that appeared on a computer screen and pressing a corresponding key on a computer keyboard. For the other participants, the computer selected those texts for which the participants had answered fewer than half of the questions in Test 1 correctly. The texts selected were randomized anew before being presented for rereading. After restudy, all participants were tested on each text again (Test 2). This time, there were 12 questions for each text (six factual and six inferential). Half of these questions were repeated from Test 1, and the other half were new. The whole experiment lasted less than 90 min.
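The computer-select rule described above is simple enough to state in a few lines of code. The following sketch is our own illustration, not the authors' software; the function name and the topic labels in the example are invented placeholders:

```python
def computer_select_for_restudy(test1_correct, n_questions=6):
    """Discrepancy-reduction selection rule: choose every text on which the
    participant answered fewer than half of the Test 1 questions correctly."""
    return [text for text, n_right in test1_correct.items()
            if n_right < n_questions / 2]

# Hypothetical Test 1 results for one participant (number correct out of 6 per text).
scores = {"tea and coffee": 4, "ultrasonic cleaning": 2, "cell phones": 3}
print(computer_select_for_restudy(scores))  # only "ultrasonic cleaning" falls below 3 of 6
```

Note that a text with exactly half the questions correct is not selected, matching the "fewer than half" criterion in the procedure.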

Results
We first followed Thiede et al.'s (2003) steps of data analysis as closely as possible in order to examine whether the delayed-keyword effects were replicated. Then, we compared the self-select, delayed-keyword group and the computer-select, delayed-keyword group on the dependent measures in order to evaluate (enhanced) SRL and externally regulated learning (ERL) based on objective evaluative information. Following conventional practice, we used relative monitoring accuracy, operationalized as the Goodman–Kruskal gamma correlation, as the measure (Nelson, 1984; Nelson, Narens, & Dunlosky, 2004). Most of the analyses were 3 (keyword condition) × 2 (selection condition) between-subjects analyses of variance (ANOVAs). The significance level was set at .05. Where interactions were significant and paired comparisons were made, we applied the Bonferroni correction to maintain a family-wise Type I error rate of .05.
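For concreteness, the Goodman–Kruskal gamma used here can be computed directly from concordant and discordant pairs of observations. The sketch below is our own illustration, not the authors' analysis code, and the six ratings and scores in the example are invented:

```python
def goodman_kruskal_gamma(judgments, scores):
    """Goodman-Kruskal gamma: (C - D) / (C + D), where C and D count
    concordant and discordant pairs; pairs tied on either variable are ignored."""
    concordant = discordant = 0
    n = len(judgments)
    for i in range(n):
        for j in range(i + 1, n):
            s = (judgments[i] - judgments[j]) * (scores[i] - scores[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # all pairs tied: gamma is undefined
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical participant: comprehension ratings (1-7) and Test 1 scores for six texts.
ratings = [6, 5, 5, 3, 2, 4]
test1 = [5, 4, 3, 1, 2, 3]
print(round(goodman_kruskal_gamma(ratings, test1), 2))  # prints 0.85
```

Because ties are excluded, gamma reaches plus or minus 1 whenever every untied pair is ordered consistently, which is why the measure suits coarse 7-point rating scales.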

Comprehension Monitoring Accuracy


There were two subjective judgments of learning: the EOL judgment, made before keyword generation, and the comprehension rating, made after keyword generation. Only the latter was expected to show significant differences between keyword groups. Table 1 shows the mean EOL for each subgroup of participants and its correlation with Test 1 scores. It can be seen that the average EOLs and gammas are roughly the same across groups. ANOVA confirmed that none of the factors reached statistical significance. For EOLs: main effect of keyword, F(2, 94) = 1.17, MSE = 0.77, p = .32, partial η² = .025; main effect of selection, F(1, 94) = 0.54, p = .47, partial η² = .006; interaction, F(2, 94) = 1.54, p = .22, partial η² = .032. For EOL gammas: main effect of keyword, F(2, 94) = 0.19, MSE = 0.39, p = .83, partial η² = .004; main effect of selection, F(1, 94) = 0.09, p = .77, partial η² = .001; interaction, F(2, 94) = 0.70, p = .50, partial η² = .015.

In contrast, the comprehension ratings of the immediate-keyword groups (5.19) were higher than those of the other two groups (4.54 and 4.73). An ANOVA showed that the effect of keyword condition was significant, F(2, 94) = 4.39, MSE = 0.83, p = .015, partial η² = .086. Least significant difference (LSD) tests indicated that the immediate-keyword groups' ratings were significantly higher than those of either the no-keyword (p = .004) or the delayed-keyword groups (p = .042), but only the former comparison was significant under the Bonferroni-corrected criterion of p = .013. On the other hand, the main effect of text selection (by self or by computer) and the two-way interaction were not significant, F(1, 94) = 0.04, MSE = 0.83, p = .85, and F(2, 94) = 0.57, MSE = 0.83, p = .57, respectively.

The correlations between comprehension ratings and Test 1 scores are shown in the last column of Table 1. Consistent with Thiede et al.'s study, the mean correlation was low in general (M = 0.262, SE = 0.056) and differed significantly across keyword conditions, F(2, 94) = 7.27, MSE = 0.31, p = .001, partial η² = .135. Participants who wrote keywords after a delay had a significantly higher correlation than those who wrote them immediately (Bonferroni p = .02) and a marginally higher correlation than participants who did not write keywords at all (Bonferroni p = .07). The main effect of selection condition and the two-way interaction were not significant, F(1, 94) = 0.00, MSE = 0.31, p = .99, and F(2, 94) = 0.11, MSE = 0.31, p = .90, respectively.

Table 1
Means and Accuracy of Ease-of-Learning Judgments and Comprehension Ratings

                              Ease of learning (EOL)         Comprehension rating
Group                  n      Mean (SE)      Accuracy (γ)    Mean (SE)      Accuracy (γ)
No keyword
  Self-select          16     4.93 (0.21)    .19 (.16)       4.65 (0.23)    .17 (.14)
  Computer-select      18     4.89 (0.20)    .03 (.14)       4.44 (0.22)    .10 (.13)
  Mean                        4.91 (0.14)    .10 (.11)       4.54 (0.16)    .14 (.10)
Immediate keyword
  Self-select          17     4.74 (0.23)   -.06 (.15)       5.07 (0.23)    .06 (.14)
  Computer-select      16     5.31 (0.22)    .11 (.16)       5.31 (0.23)    .11 (.14)
  Mean                        5.04 (0.16)    .02 (.11)       5.19 (0.16)    .08 (.10)
Delayed keyword
  Self-select          16     4.73 (0.23)    .16 (.16)       4.80 (0.23)    .55 (.14)
  Computer-select      17     4.62 (0.21)    .05 (.15)       4.66 (0.22)    .57 (.13)
  Mean                        4.67 (0.16)    .10 (.11)       4.73 (0.16)    .56 (.10)

Note. Standard errors are shown in parentheses. Accuracy = gamma correlation with first comprehension test performance.
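As a quick consistency check on effect sizes of the kind reported above, partial eta squared can be approximately recovered from an F statistic and its degrees of freedom. The helper below is our own check, not part of the authors' analysis; small discrepancies arise because the published F values are rounded:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Keyword-condition effect on comprehension ratings: F(2, 94) = 4.39.
print(round(partial_eta_squared(4.39, 2, 94), 3))  # close to the reported .086
```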
To summarize, although EOLs and comprehension ratings were similar in magnitude, only the comprehension ratings and their associated correlations with Test 1 scores showed significant differences among the keyword conditions. The correlation was higher in the delayed-keyword groups, which confirms the effect of delayed-keyword generation on comprehension monitoring accuracy.

Regulation of Study
The number of texts selected for restudy appeared quite similar across groups. It was 3.31 for the no-keyword, self-select group; 3.67 for the no-keyword, computer-select group; 3.18 for the immediate-keyword, self-select group; 3.94 for the immediatekeyword, computer-select group; 3.19 for the delayed-keyword, self-select group; and 3.65 for the delayed-keyword, computerselect group. Although the computer-select groups tended to select more texts for restudy, the difference did not reach significance, F(1, 94) 3.31, MSE 2.08, p .07. The main effect of keyword and the two-way interaction were not significant, F(2, 94) 1, MSE 2.08, and F(2, 94) 1, MSE 2.08. Table 2 shows the mean comprehension ratings and the mean Test 1 scores for the selected and unselected texts. Thiede et al. (2003) used the (negative) correlation between a participants comprehension rating and whether a text was selected for restudy, across six texts, as indication of how effectively the participant was able to regulate his or her study. Although we disagree with using comprehension ratings (discussed later), we did the same analyses with our data. The results are shown in the third column of Table 2. The participants in the three self-select groups selected texts on which they rated low level of comprehension for restudy. This produced high gamma correlations. Although the immediatekeyword group had the highest gamma, the difference between groups was not significant, F(2, 46) 1.13, MSE 0.38, p .05. Our results differ from those of Thiede et al., who found higher correlation in the delayed-keyword condition. For the three computer-select groups, because the selection decision was made by the computer, the numbers showed only that the computer was able to select those texts for which the participants rated their level of comprehension as relatively low. The fourth and fifth columns of Table 2 show the Test 1 scores for the selected and unselected texts, respectively. 
The pattern of results differs quite substantially from the comprehension ratings. Note that among the self-select groups, the delayed-keyword group was more able to select those texts on which they had scored low

Table 2
Means of Comprehension Rating and First Comprehension Test Performance for the Selected and Unselected Texts and Their Respective Gamma Correlations With Selection of Texts for Restudy

                        Comprehension rating                       Test 1 scores
Group                   Selected     Not selected  Gamma           Selected    Not selected  Gamma
Self-select
  No keyword            4.31 (0.30)  4.93 (0.29)   -.40 (.18)      .55 (.03)   .46 (.03)      .41 (.10)
  Immediate keyword     4.79 (0.29)  5.41 (0.28)   -.72 (.18)      .54 (.02)   .56 (.03)     -.02 (.09)
  Delayed keyword       4.44 (0.29)  4.94 (0.28)   -.41 (.18)      .50 (.02)   .58 (.03)     -.34 (.09)
  Mean                  4.51 (0.17)  5.09 (0.16)   -.51 (.11)      .53 (.013)  .53 (.015)     .02 (.05)
Computer-select
  No keyword            4.34 (0.26)  4.76 (0.25)   -.28 (.16)      .38 (.02)   .71 (.02)     -1
  Immediate keyword     5.01 (0.31)  5.18 (0.30)   -.13 (.19)      .37 (.03)   .72 (.03)     -1
  Delayed keyword       4.23 (0.27)  5.26 (0.26)   -.52 (.17)      .41 (.02)   .73 (.03)     -1
  Mean                  4.53 (0.16)  5.07 (0.16)   -.31 (.10)      .39 (.013)  .72 (.015)    -1

Note. Standard errors are shown in parentheses.

MONITORING OF READING COMPREHENSION

83

than the immediate- and the no-keyword groups. This agrees with Thiede et al.'s results. This pattern was also reflected in the correlation between a participant's Test 1 score on a text and whether that text was selected for restudy (the sixth column of Table 2). This correlation, we argue, is a more appropriate measure of effective regulation than the correlation between comprehension ratings and text selection, for the following reason. Both the comprehension ratings and the selection decisions were made by the participant, so a correlation between them might simply mean that the person is consistent, or remembers his or her comprehension ratings when making the selection. In contrast, the score a participant obtains on a text more truly reflects his or her understanding of the text. A good monitor and regulator of learning should be able to pick out for restudy those texts on which he or she did not score well, so a negative correlation between Test 1 scores and selection for restudy is a better measure of effective regulation. As can be seen in the sixth column of Table 2, the correlation is negative in the delayed-keyword condition only, indicating that this group of participants had the most effective regulation.

However, even the best of the self-select groups compared poorly with any of the computer-select groups. On the whole, for the texts selected for restudy, the three self-select groups obtained significantly higher scores than the three computer-select groups, F(1, 94) = 62.52, MSE = 0.008, p < .001, partial η² = .407. That means the self-select groups were not able to select the least understood texts for restudy. The main effect of keyword and the interaction were not significant, F(2, 94) = 0.19, MSE = 0.008, p = .83, and F(2, 94) = 2.00, MSE = 0.008, p = .14.
To summarize, our results confirm the delayed-keyword effects on regulation of restudy and, furthermore, show that the self-select groups were not as able as the computer-select groups to choose the least understood texts for restudy.
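The gamma correlations reported above can be computed with a short routine. The sketch below illustrates Goodman–Kruskal gamma between six comprehension ratings and a binary restudy-selection vector; the ratings and selections are made-up values for illustration, not data from the study. A gamma of -1 means the lowest rated texts were exactly the ones selected.

```python
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant).
    Pairs tied on either variable are ignored, as in the standard definition."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        product = (xi - xj) * (yi - yj)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    if concordant + discordant == 0:
        return float("nan")  # all pairs tied; gamma undefined
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical participant: ratings for six texts and restudy choices (1 = selected).
ratings  = [6, 3, 5, 2, 4, 7]
selected = [0, 1, 0, 1, 1, 0]
print(goodman_kruskal_gamma(ratings, selected))  # → -1.0
```

Because selection is binary, only pairs consisting of one selected and one unselected text contribute; here every selected text is rated below every unselected one, so all nine informative pairs are discordant and gamma is -1.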

The Effects of Restudy on Test Performance


The outcome measure of effective regulation is the improvement in test scores after restudy. Table 3 shows the mean scores for both Test 1 and Test 2. Test 1 scores were not significantly different among the groups: The main effects of keyword condition and selection condition and their interaction were not significant, F(2, 94) = 0.72, MSE = 0.012, p = .49; F(1, 94) =

Table 3
Means of Test Performance for the First and Second Comprehension Tests

Group                   Test 1 (six items)   Test 2 (12 items)
Self-select
  No keyword            .50 (.02)            .48 (.02)
  Immediate keyword     .54 (.03)            .52 (.02)
  Delayed keyword       .54 (.03)            .56 (.02)
  Mean                  .52 (.015)           .52 (.012)
Computer-select
  No keyword            .51 (.03)            .53 (.02)
  Immediate keyword     .48 (.03)            .51 (.02)
  Delayed keyword       .54 (.03)            .55 (.02)
  Mean                  .51 (.015)           .53 (.012)

Note. Standard errors are shown in parentheses.

0.68, MSE = 0.012, p = .41; and F(2, 94) = 0.97, MSE = 0.012, p = .38, respectively. For Test 2, the delayed-keyword groups scored slightly higher (.55)¹ than the other two keyword groups (.51 and .51 for the no- and immediate-keyword groups, respectively), F(2, 94) = 3.17, MSE = 0.007, p = .047, partial η² = .063. (Pairwise comparisons between the delayed- and immediate-keyword conditions: Bonferroni p = .134, LSD p = .045; between the delayed- and no-keyword conditions: Bonferroni p = .086, LSD p = .029.) The main effect of selection and its interaction with keyword were not significant.

To assess the relative improvement from Test 1 to Test 2, we conducted a 3 (keyword condition) × 2 (selection condition) × 2 (Test 1 vs. Test 2) ANOVA, with repeated measures on the last factor. This revealed that the main effect of test, F(1, 94) = 0.84, MSE = 0.002, p = .36; the two-way interaction between test and keyword, F(2, 94) = 0.64, MSE = 0.002, p = .53; and the three-way interaction among test, keyword, and selection, F(2, 94) = 1.68, MSE = 0.002, p = .19, were all not significant. However, there was a significant interaction between test and selection, F(1, 94) = 4.90, MSE = 0.007, p = .029, partial η² = .05. Simple effect analysis showed that test performance was about the same across tests when the participants themselves selected texts to restudy (.52 vs. .52), F = 0.89, MSE = 0.002, p = .35, but it increased slightly and reliably for the computer-select groups (.51 vs. .53), F = 4.67, MSE = 0.003, p = .028, partial η² = .050.

We found the small improvement across tests puzzling, especially for the delayed-keyword groups. Therefore, the test scores were further analyzed according to whether a text was selected for restudy² and whether the Test 2 items were old (repeated) or new (Table 4). The first two columns of Table 4 are the same as the fourth and fifth columns of Table 2.
When only the selected texts were considered, there was remarkable improvement from Test 1 to Test 2 old items (from .46 to .64), F(1, 94) = 231.16, MSE = 0.007, p < .001, partial η² = .718. The improvement was larger for the delayed-keyword groups (from .45 to .68) than for the immediate-keyword (from .45 to .61) or no-keyword (from .46 to .63) groups, although this two-way interaction did not reach significance, F(2, 94) = 2.74, MSE = 0.007, p = .07. The improvement from Test 1 to Test 2 old items was also larger for the computer-select groups (from .39 to .61) than for the self-select groups (from .53 to .68). This two-way interaction was significant, F(1, 94) = 9.23, MSE = 0.007, p < .01, partial η² = .092. The improvement from Test 1 to Test 2 new items³ was, in contrast, more limited. It was negative if participants selected
¹ Hereafter, when we report the cell means for a main effect, the means are averaged over the levels of the other factors. For example, here we report the mean of each keyword condition by pooling over the self-select and computer-select conditions.
² We did not conduct statistical analyses on the unselected texts because the effects of restudy should appear on the selected but not the unselected texts.
³ New items refer to those items that appeared for the first time in a particular test: Test 1 new items appeared for the first time in Test 1; Test 2 new items appeared for the first time in Test 2. These items can be compared because Test 1 new items for half of the participants were Test 2 new items for the other half, and vice versa. We call the difference between the two scores "improvement" because Test 2 was taken after rereading.


84

SHIU AND CHEN

Table 4
Means of Test Performance for Selected and Unselected Texts

                                                  Test 2 (12 items)
                        Test 1 (six items)        Old items (six items)      New items (six items)
Group                   Selected    Not selected  Selected    Not selected   Selected    Not selected
Self-select
  No keyword            .55 (.03)   .46 (.03)     .66 (.04)   .46 (.04)      .43 (.04)   .46 (.04)
  Immediate keyword     .54 (.02)   .56 (.03)     .67 (.03)   .58 (.04)      .53 (.04)   .41 (.04)
  Delayed keyword       .50 (.02)   .58 (.03)     .70 (.03)   .60 (.04)      .52 (.04)   .54 (.03)
Computer-select
  No keyword            .38 (.02)   .71 (.02)     .60 (.03)   .61 (.04)      .52 (.04)   .47 (.03)
  Immediate keyword     .37 (.03)   .72 (.03)     .56 (.04)   .68 (.04)      .52 (.04)   .42 (.04)
  Delayed keyword       .41 (.02)   .73 (.03)     .66 (.03)   .65 (.04)      .54 (.04)   .51 (.03)

Note. Standard errors are shown in parentheses.

the texts themselves (a decrease from .53 to .49) but positive if participants read texts selected by the computer (an increase from .39 to .53).

Next, we consider the unselected texts. Among the three self-select groups, the delayed-keyword group obtained the highest Test 1 scores. This is consistent with the results for the selected texts and shows once again that this group made the most appropriate selections. The three self-select groups performed quite stably from Test 1 to Test 2 old items but dropped somewhat on Test 2 new items. The relatively poor scores on Test 2 new items might reflect some memory loss, because there was quite a time lag between reading the texts and taking Test 2. For the three computer-select groups, although the initial performance was very good (this was by design), it dropped quite a bit on Test 2 old items and more substantially on Test 2 new items; the scores became comparable to those of the three self-select groups. The poor Test 2 performance on the unselected texts might explain why the overall Test 2 scores were only slightly higher than the Test 1 scores (Table 3): The improvement obtained on the selected texts was offset by the decrement on the unselected texts, and the latter might be attributed to memory loss.

In summary, detailed analyses of the test scores revealed two findings. First, improvement across tests occurred mostly for the selected texts. This is reasonable because they were restudied. This improvement was significantly larger for the computer-select than for the self-select groups. Furthermore, for the self-select groups, improvement occurred only on the repeated test items, which might be attributed to a restudy strategy of looking for the information they had missed in the first reading. The improvement achieved by the computer-select groups, in contrast, was evident in both old and new items of Test 2 (.25 and .13, respectively).
This suggests that the participants who reread texts chosen by the computer might have had a more balanced goal in the second reading. Second, the effects of keyword generation were not found in the computer-select condition: As long as a text was restudied, the improvement in test scores was the same regardless of keyword condition. This implies that the delayed-keyword effects work largely via selection of texts for restudy (i.e., regulation of study) rather than via text processing.
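For concreteness, the computer's external-regulation rule can be sketched in a few lines. The article does not spell out the exact selection algorithm, so this sketch simply assumes a discrepancy-reduction rule that flags every text whose Test 1 score falls below a mastery criterion; the scores and the .50 criterion are illustrative values, not data from the study.

```python
def select_for_restudy(test1_scores, criterion=0.5):
    """Flag texts whose first-test score falls below a mastery criterion.

    A sketch of a discrepancy-reduction rule, not the authors' exact
    implementation: the least understood texts are returned for restudy.
    """
    return [i for i, score in enumerate(test1_scores) if score < criterion]

# Proportion correct on Test 1 for six texts (illustrative values only).
scores = [0.83, 0.33, 0.50, 0.17, 0.67, 0.33]
print(select_for_restudy(scores))  # → [1, 3, 5]
```

Under such a rule, selection is a deterministic function of Test 1 performance, which is consistent with the computer-select groups' uniformly low Test 1 scores on selected texts and high scores on unselected texts in Table 2.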

SRL (Enhanced) Versus ERL


Next, we compared the self-select, delayed-keyword group with the computer-select, delayed-keyword group. They did not differ significantly in EOLs, comprehension ratings, or the respective correlations of these measures with Test 1 scores. The self-select group was less able than the computer-select group to select the lowest scoring texts for restudy: comparison of the scores for selected texts, t(31) = 3.07, p = .005; for unselected texts, t(31) = 3.88, p < .001. To analyze the improvement from Test 1 to Test 2 on the selected texts, we conducted 2 × 2 ANOVAs separately on all items of Test 2 and on the old and new items of Test 2; in each analysis, the two-way interaction indicates whether the improvement differed significantly between groups. The analyses showed a significant interaction for all items of Test 2, F(1, 31) = 5.12, MSE = 0.005, p = .031, partial η² = .146, as well as for the new items of Test 2, F(1, 31) = 5.07, MSE = 0.01, p = .032, partial η² = .145, but not for the old items of Test 2. In sum, the self-select, delayed-keyword group was less able than the computer-select, delayed-keyword group to select the least understood texts for restudy and to improve their comprehension of those texts after restudy.

Discussion
The results of our experiment confirm that the delayed-keyword effects on comprehension monitoring reported by Thiede et al. (2003) can be replicated with participants of a different ethnicity and texts written in a different language. Despite the positive effects of delayed-keyword generation on self-monitoring and self-regulation, our results show that SRL thus enhanced was not as effective as external regulation based on objective evaluative information provided by a simple comprehension test, in terms of selecting the least understood texts for restudy and obtaining greater improvement on a test of those texts after restudy. Our research extends Atkinson's (1972) conclusion from word learning to text learning. Of course, the comparison between enhanced self-monitoring and external regulation should not end here. There are more ways to enhance self-monitoring than delayed generation of keywords. In addition, some researchers have succeeded in training learners


to be more accurate in self-monitoring. For example, Kostons, van Gog, and Paas (2012) found that secondary school students who had observed a human model demonstrating accurate self-assessment skills attained better SRL skills and learning gains than a control group. New intelligent tutoring systems are also being used to train students in SRL (e.g., MetaTutor; Azevedo et al., 2009). It remains to be seen how much better at self-monitoring learners can become after extended training. Nevertheless, such enhancements in SRL should be compared with learning strategies that do not rely on self-monitoring if the goal is to find an optimal learning strategy.

Given the accuracy of objective evaluative information, it is likely that incorporating it into self-monitoring of learning would be beneficial. For example, it is common practice among students to complete the questions in their course materials as a means of assessing their learning progress. This is similar to taking a comprehension test after reading a text in our experiment. Vermunt (1998) classified such behavior under external regulation and favored self-regulation behaviors such as answering self-generated questions. We think this is an unnecessarily restrictive view of SRL. More recent researchers accept answering study questions from textbooks as a legitimate means of self-testing or self-assessment (Kornell & Bjork, 2007; Rawson, O'Neil, & Dunlosky, 2011). Schunk (1991) said that monitoring is "deliberate attention to some aspect of one's behavior" (p. 267). There are internal and external aspects of behavior. Karoly (1993) stated that any goal-directed organism must attend to and perceive information that bears upon its goals. This information includes perceptions of mental and sensory states as well as of self–environment transactions. Mental and sensory states are objects of self-monitoring, whereas self–environment transactions may leave records that are subject to objective evaluation.
The advantages of transaction records are that external agents can do the monitoring and that external standards can be used for the evaluation (Winne & Hadwin, 1998). These are particularly useful when learners lack the cognitive resources or ability to do the monitoring themselves. In the example cited earlier, answering questions in course materials is a self–environment transaction. Metacognitive control of the learning process may also leave records for objective evaluation. For instance, educational software can log learners' interactions in a computer-based learning environment. Such information, in principle, can be used by either the learners themselves or a pedagogical agent to improve regulation of learning (Winne & Nesbit, 2009). Yet learners often do not make as much use of it as they should. In classroom studies, although the scores students obtained on one test predicted their scores on the next test better than their subjective predictions did, providing feedback on the test scores did not improve students' prediction accuracy, particularly for low-achieving students (Bol & Hacker, 2001; Nietfeld, Cao, & Osborne, 2005). Therefore, unless learners are led to make full use of external feedback for monitoring and control of learning, either by training (Nietfeld, Cao, & Osborne, 2006) or by prompting, external regulation is likely to produce better regulation of learning, as shown in the present study.

A problem with objective evaluative information is that it may not transfer across contexts (Greene & Azevedo, 2007). That is, the availability and the nature of external evaluation vary across contexts. But so do scaffolds and other kinds of learning support. If help-seeking has been considered an important element of SRL, and researchers have tried to train learners to perform timely

help-seeking behavior (Roll, Aleven, McLaren, & Koedinger, 2011), it is reasonable to also include consulting an external source for evaluation of learning in SRL.

Discussion of a limitation of our study is in order. There is a possible confound in the comparison of the self-select, delayed-keyword and the computer-select, delayed-keyword groups: The two groups differed both in the information on which text selection was based and in who controlled the selection. A reviewer suggested that we include a third condition in which the computer uses the participants' comprehension judgments to select texts. This condition is similar to one in Nelson et al.'s (1994) study (discussed earlier). However, even with this new condition included, the confound between information and control cannot be disentangled, because it cannot be ascertained whether the participants really used their comprehension judgments to make their selections. If they did, the correlation between comprehension rating and text selection should be close to perfect (cf. Table 2). An alternative is to provide the objective evaluative information to the participants and let them make the selection decisions. But there is no guarantee that the participants would use the information; if its use were guaranteed, it would not be self-regulation. Thus, the source of the confound lies not in external regulation but in SRL. As long as an experiment allows participants to self-regulate their restudy, it is not possible to manipulate the kind of information they use for that regulation. With external regulation, on the other hand, an experimenter can compare different kinds of objective evaluative information (e.g., base rate or an individual's performance accuracy). Regulation strategies can be compared too. External regulation in our experiment used a discrepancy-reduction strategy, which may not necessarily be the best study strategy.
Metcalfe and colleagues (Metcalfe, 2002; Metcalfe & Kornell, 2003) have shown that it is better to allocate more study time to materials that are in a learner's region of proximal learning than to the least understood materials far outside that region. If this is the case, one could investigate whether self-monitoring or some form of objective information best identifies this region. Despite the limitation, our conclusions still hold: Self-monitoring was less accurate than objective evaluative information (i.e., a comprehension test) at identifying appropriate texts for restudy, and SRL was less effective than external regulation at improving comprehension of those texts after restudy.

References
Aleven, V., McLaren, B., Roll, I., & Koedinger, K. R. (2004). Toward tutoring help seeking. In J. C. Lester, R. M. Vicari, & F. Parguacu (Eds.), Proceedings of the 7th International Conference on Intelligent Tutoring Systems (pp. 227–239). Berlin, Germany: Springer-Verlag.
Anderson, J. R., Corbett, A. T., Koedinger, K. R., & Pelletier, R. (1995). Cognitive tutors: Lessons learned. Journal of the Learning Sciences, 4, 167–207. doi:10.1207/s15327809jls0402_2
Anderson, M. C. M., & Thiede, K. W. (2008). Why do delayed summaries improve metacomprehension accuracy? Acta Psychologica, 128, 110–118. doi:10.1016/j.actpsy.2007.10.006
Atkinson, R. C. (1972). Ingredients for a theory of instruction. American Psychologist, 27, 921–931. doi:10.1037/h0033572
Azevedo, R., Moos, D. C., Greene, A., Winters, F. I., & Cromley, J. G. (2008). Why is externally facilitated regulated learning more effective than self-regulated learning with hypermedia? Educational Technology Research and Development, 56, 45–72. doi:10.1007/s11423-007-9067-0
Azevedo, R., Witherspoon, A., Graesser, A. C., McNamara, D., Chauncey, A., Siler, E., . . . Lintean, M. (2009). MetaTutor: Analyzing self-regulated learning in a tutoring system for biology. In V. Dimitrova, R. Mizoguchi, B. du Boulay, & A. Graesser (Eds.), Artificial intelligence in education (pp. 635–637). Amsterdam, the Netherlands: IOS Press.
Begg, I. M., Martin, L. A., & Needham, D. R. (1992). Memory monitoring: How useful is self-knowledge about memory? European Journal of Cognitive Psychology, 4, 195–218. doi:10.1080/09541449208406182
Bol, L., & Hacker, D. J. (2001). A comparison of the effects of practice tests and traditional review on performance and calibration. Journal of Experimental Education, 69, 133–151. doi:10.1080/00220970109600653
Corbalan, G., Kester, L., & van Merriënboer, J. J. G. (2006). Towards a personalized task selection model with shared instructional control. Instructional Science, 34, 399–422.
Corbalan, G., Kester, L., & van Merriënboer, J. J. G. (2008). Selecting learning tasks: Effects of adaptation and shared control on learning efficiency and task involvement. Contemporary Educational Psychology, 33, 733–756. doi:10.1016/j.cedpsych.2008.02.003
Dunlosky, J., & Lipko, A. (2007). Metacomprehension: A brief history and how to improve its accuracy. Current Directions in Psychological Science, 16, 228–232. doi:10.1111/j.1467-8721.2007.00509.x
Dunlosky, J., Rawson, K. A., & McDonald, S. L. (2002). Influence of practice tests on the accuracy of predicting memory performance for paired associates, sentences, and text material. In T. J. Perfect & B. L. Schwartz (Eds.), Applied metacognition (pp. 68–92). Cambridge, England: Cambridge University Press. doi:10.1017/CBO9780511489976.005
Glenberg, A. M., & Epstein, W. (1985). Calibration of comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 702–718. doi:10.1037/0278-7393.11.1-4.702
Graesser, A., & McNamara, D. (2010). Self-regulated learning in learning environments with pedagogical agents that interact in natural language. Educational Psychologist, 45, 234–244. doi:10.1080/00461520.2010.515933
Greene, J. A., & Azevedo, R. (2007). A theoretical review of Winne and Hadwin's model of self-regulated learning: New perspectives and directions. Review of Educational Research, 77, 334–372. doi:10.3102/003465430303953
Hacker, D. J., Bol, L., Horgan, D. D., & Rakow, E. A. (2000). Test prediction and performance in a classroom context. Journal of Educational Psychology, 92, 160–170. doi:10.1037/0022-0663.92.1.160
Karoly, P. (1993). Mechanisms of self-regulation: A systems view. Annual Review of Psychology, 44, 23–52. doi:10.1146/annurev.ps.44.020193.000323
Karpicke, J. D., Butler, A. C., & Roediger, H. L., III. (2009). Metacognitive strategies in student learning: Do students practice retrieval when they study on their own? Memory, 17, 471–479. doi:10.1080/09658210802647009
Koedinger, K. R., & Aleven, V. (2007). Exploring the assistance dilemma in experiments with cognitive tutors. Educational Psychology Review, 19, 239–264. doi:10.1007/s10648-007-9049-0
Koriat, A. (1993). How do we know what we know? The accessibility model of the feeling of knowing. Psychological Review, 100, 609–639. doi:10.1037/0033-295X.100.4.609
Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349–370. doi:10.1037/0096-3445.126.4.349
Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one's knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 187–194. doi:10.1037/0278-7393.31.2.187
Koriat, A., & Bjork, R. A. (2006). Illusions of competence during study can be remedied by manipulations that enhance learners' sensitivity to retrieval conditions at test. Memory & Cognition, 34, 959–972. doi:10.3758/BF03193244
Kornell, N., & Bjork, R. A. (2007). The promise and perils of self-regulated study. Psychonomic Bulletin & Review, 14, 219–224. doi:10.3758/BF03194055
Kostons, D., van Gog, T., & Paas, F. (2010). Self-assessment and task selection in learner-controlled instruction: Differences between effective and ineffective learners. Computers & Education, 54, 932–940. doi:10.1016/j.compedu.2009.09.025
Kostons, D., van Gog, T., & Paas, F. (2012). Training self-assessment and task-selection skills: A cognitive approach to improving self-regulated learning. Learning and Instruction, 22, 121–132. doi:10.1016/j.learninstruc.2011.08.004
Leonesio, R. J., & Nelson, T. O. (1990). Do different metamemory judgments tap the same underlying aspects of memory? Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 464–470. doi:10.1037/0278-7393.16.3.464
Lin, L. M., & Zabrucky, K. M. (1998). Calibration of comprehension: Research and implications for education and instruction. Contemporary Educational Psychology, 23, 345–391. doi:10.1006/ceps.1998.0972
Lovelace, E. A. (1984). Metamemory: Monitoring future recallability during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 756–766. doi:10.1037/0278-7393.10.4.756
Magliano, J. P., Little, L. D., & Graesser, A. C. (1993). The impact of comprehension instruction on the calibration of comprehension. Reading Research and Instruction, 32, 49–63. doi:10.1080/19388079309558124
Maki, R. H. (1998). Predicting performance on text: Delayed versus immediate predictions and tests. Memory & Cognition, 26, 959–964. doi:10.3758/BF03201176
Metcalfe, J. (2002). Is study time allocated selectively to a region of proximal learning? Journal of Experimental Psychology: General, 131, 349–363. doi:10.1037/0096-3445.131.3.349
Metcalfe, J. (2009). Metacognitive judgments and control of study. Current Directions in Psychological Science, 18, 159–163. doi:10.1111/j.1467-8721.2009.01628.x
Metcalfe, J., & Kornell, N. (2003). The dynamics of learning and allocation of study time to a region of proximal learning. Journal of Experimental Psychology: General, 132, 530–542. doi:10.1037/0096-3445.132.4.530
Nelson, T. O. (1984). A comparison of current measures of the accuracy of feeling-of-knowing predictions. Psychological Bulletin, 95, 109–133. doi:10.1037/0033-2909.95.1.109
Nelson, T. O., & Dunlosky, J. (1991). When people's judgments of learning (JOLs) are extremely accurate at predicting subsequent recall: The delayed-JOL effect. Psychological Science, 2, 267–270. doi:10.1111/j.1467-9280.1991.tb00147.x
Nelson, T. O., Dunlosky, J., Graf, A., & Narens, L. (1994). Utilization of metacognitive judgments in the allocation of study during multitrial learning. Psychological Science, 5, 207–213. doi:10.1111/j.1467-9280.1994.tb00502.x
Nelson, T. O., Narens, L., & Dunlosky, J. (2004). A revised methodology for research on metamemory: Pre-judgment recall and monitoring (PRAM). Psychological Methods, 9, 53–69. doi:10.1037/1082-989X.9.1.53
Nietfeld, J. L., Cao, L., & Osborne, J. W. (2005). Metacognitive monitoring accuracy and student performance in the postsecondary classroom. Journal of Experimental Education, 74, 7–28.
Nietfeld, J. L., Cao, L., & Osborne, J. W. (2006). The effect of distributed monitoring exercises and feedback on performance, monitoring accuracy, and self-efficacy. Metacognition and Learning, 1, 159–179. doi:10.1007/s11409-006-9595-6
Pavlik, P. I., & Anderson, J. R. (2008). Using a model to compute the optimal schedule of practice. Journal of Experimental Psychology: Applied, 14, 101–117. doi:10.1037/1076-898X.14.2.101
Pintrich, P. R. (2000). Multiple goals, multiple pathways: The role of goal orientation in learning and achievement. Journal of Educational Psychology, 92, 544–555. doi:10.1037/0022-0663.92.3.544
Rawson, K., Dunlosky, J., & Thiede, K. W. (2000). The rereading effect: Metacomprehension accuracy improves across reading trials. Memory & Cognition, 28, 1004–1010.
Rawson, K. A., O'Neil, R., & Dunlosky, J. (2011). Accurate monitoring leads to effective control and greater learning of patient education materials. Journal of Experimental Psychology: Applied, 17, 288–302. doi:10.1037/a0024749
Roll, I., Aleven, V., McLaren, B. M., & Koedinger, K. R. (2011). Improving students' help-seeking skills using metacognitive feedback in an intelligent tutoring system. Learning and Instruction, 21, 267–280. doi:10.1016/j.learninstruc.2010.07.004
Schunk, D. H. (1991). Learning theories: An educational perspective. New York, NY: Merrill/Macmillan.
Thiede, K. W., & Anderson, M. C. M. (2003). Summarizing can improve metacomprehension accuracy. Contemporary Educational Psychology, 28, 129–160. doi:10.1016/S0361-476X(02)00011-5
Thiede, K. W., Anderson, M. C. M., & Therriault, D. (2003). Accuracy of metacognitive monitoring affects learning of texts. Journal of Educational Psychology, 95, 66–73. doi:10.1037/0022-0663.95.1.66
Thiede, K. W., & Dunlosky, J. (1999). Toward a general model of self-regulated study: An analysis of selection of items for study and self-paced study time. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 1024–1037. doi:10.1037/0278-7393.25.4.1024
Thiede, K. W., Dunlosky, J., Griffin, T. D., & Wiley, J. (2005). Understanding the delayed-keyword effect on metacomprehension accuracy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1267–1280. doi:10.1037/0278-7393.31.6.1267
Thiede, K. W., Griffin, T. D., Wiley, J., & Redford, J. S. (2009). Metacognitive monitoring during and after reading. In D. J. Hacker, J. Dunlosky, & A. Graesser (Eds.), Handbook of metacognition in education (pp. 85–106). New York, NY: Routledge.
Thomas, A. K., & McDaniel, M. A. (2007). Metacomprehension for educationally relevant materials: Dramatic effects of encoding–retrieval interactions. Psychonomic Bulletin & Review, 14, 212–218. doi:10.3758/BF03194054
Vermunt, J. D. (1998). The regulation of constructive learning processes. British Journal of Educational Psychology, 68, 149–171. doi:10.1111/j.2044-8279.1998.tb01281.x
Williams, M. D. (1996). Learner-control and instructional technologies. In D. H. Jonassen (Ed.), Handbook of research for educational communications and technology (pp. 957–982). New York, NY: Simon & Schuster Macmillan.
Winne, P. H., & Hadwin, A. F. (1998). Studying as self-regulated learning. In D. J. Hacker, J. Dunlosky, & A. C. Graesser (Eds.), Metacognition in educational theory and practice (pp. 277–304). Hillsdale, NJ: Erlbaum.
Winne, P. H., & Nesbit, J. C. (2009). Supporting self-regulated learning with cognitive tools. In D. J. Hacker, J. Dunlosky, & A. Graesser (Eds.), Handbook of metacognition in education (pp. 259–277). New York, NY: Routledge.
Zimmerman, B. J. (1998). Academic studying and development of personal skill: A self-regulatory perspective. Educational Psychologist, 33, 73–86.

Appendix
Sample Text and Test Questions


Ultrasound consists of sound waves beyond the frequency range of human hearing (16 kHz–20 kHz). Because of its high frequency, strong directionality, and penetrative ability, it has been widely applied in various domains, including ultrasonic testing, medical sonography, power ultrasound, and high-frequency ultrasound. Ultrasonic cleaning is one of the most common applications of power ultrasound. The cleaning process works by peeling dirt off an object's surface using ultrasound cavitation in a liquid medium. Under the ultrasonic effect, the liquid generates a large number of tiny, unstable bubbles. Through the vibration of the ultrasound waves, the bubbles undergo a repetitive process of production, closing, and rapid expansion. In the closing phase, micro shock waves are generated with high pressures ranging from a few hundred up to a few thousand Pa. The bubbles finally collapse abruptly under this vigorous bombardment, producing local pressures of up to 1,000 atm. The whole phenomenon is called cavitation, and it takes place most readily at the interface between solid and liquid. Locally, the temperature may reach a few hundred degrees Celsius and the pressure up to 1,000 atm. In this way, dirt is scattered from the object's surface, including otherwise unreachable areas, while the vibration speeds up the pulsation and stirring of the solution. This in turn enhances the cleaning outcome. Therefore, ultrasonic cleaning technology has become the most effective means of cleaning, both domestically and internationally. The ultrasonic cleaning device is composed of three major components: the ultrasonic generator, the ultrasonic transducer, and the cleaning trough. The ultrasonic generator is the power supply. The ultrasonic transducer is the vibration platform, a vital part of the entire process.
It transforms the electromagnetic vibration generated by the ultrasonic generator into its own ultrasonic vibration, transfers it to the cleaning trough, and finally causes cavitation of the cleaning fluid. The cleaning trough holds both the fluid and the object. It is often made of stainless steel, with its size and shape tailor-made according to actual need.

With ultrasonic cleaning, one does not need to worry about disassembling or brushing the object, since this new technique is fast, efficient, and labor-saving. Its ability to remove dirt from structurally complex components, deep pinholes, fillisters, and hidden apertures and slits demonstrates the high cleaning quality of the technology. In situations where cleaning might be difficult or hazardous to health, such as radioactive contamination from


nuclear or medical settings, ultrasonic cleaning can reduce the harmful effects on humans. Automated cleaning can also easily be realized with this technique.

Since its introduction in Japan in 1951, the ultrasonic cleaning machine has undergone rapid development and found applications in various domains. For example, in automotive manufacturing, the major automotive parts and accessory systems (APAS) are usually delicate. Their surfaces may become rusty and oily after rolling and thermal treatment. Ultrasonic cleaning can produce an excellent final cleaning effect before the different parts are assembled. When these APAS undergo maintenance, the accumulated carbon deposits and grease on their surfaces can be extremely difficult to remove with general or traditional cleaning methods, which usually involve soaking the object in an organic solvent or scraping it with a small saw blade. These methods are not efficient and may even damage the workpiece. The easy-to-use ultrasonic cleaning obviously offers a better way out.

Another example comes from the electronics industry. Largely because of the emergence of microelectronics technology, the cleaning standard for APAS and certain materials is on the rise. In super-integrated circuits, for example, dust particles and microbes larger than 0.1 μm are considered impermissible, and ultrasonic cleaning can easily meet this standard. Apart from that, ultrasonic cleaning also excels in removing greasy dirt, solder, antiseptic, oxide compounds, and fingerprints from electron tube components, transistor units, glass plates, and printed circuits.

In the medical and hygiene domains, substances such as blood, fat, and muscle tissue may adhere to instruments after surgery. As these instruments often have toothed grooves and joints, merely washing them by hand may not clean them thoroughly. Again, ultrasonic cleaning provides a reliable means of aseptic operation.
In addition, ultrasonic cleaning is already widely applied in precision industries such as gem processing, clocks and watches, and optical machinery. Possessing all the features of environmental protection, water saving, time saving, high effectiveness, low cost, and low corrosion, ultrasonic cleaning certainly has broad and bright prospects for future development and application.

Factual question: The vital part of ultrasonic cleaning is
A. The ultrasonic generator
B.* The ultrasonic transducer
C. The ultrasonic conductor
D. The cleaning trough

Inference question: "Ultrasonic cleaning technology has become the most effective means of cleaning both domestically and internationally." Which of the statements below wrongly describes the preceding sentence?
A. Ultrasonic cleaning technology is at present the cleaning system with the highest efficiency and cleaning power.
B. Ultrasonic cleaning technology is especially suitable for cleaning workpieces with complex shapes and structures.
C.* The widespread application of ultrasonic cleaning technology can stop pollution completely and eliminate all toxic substances that are harmful to humans.
D. Ultrasonic cleaning technology does not require any washing by hand, and its effect exceeds that of other cleaning methods, including hand washing.

[Translated from Chinese. Asterisks indicate the correct answer.]

Received July 17, 2008
Revision received May 16, 2012
Accepted May 23, 2012
