Language Teaching Research
Published by:
http://www.sagepublications.com
information in language development
Beatriz Lado
Lehman College (CUNY), USA
Catherine A Stafford
University of Wisconsin-Madison, USA
Cristina Sanz
Georgetown University, USA
Abstract
The current study compared the effectiveness of computer-delivered task-essential practice
coupled with feedback consisting of (1) negative evidence with metalinguistic information
(NE+MI) or (2) negative evidence without metalinguistic information (NE-MI) in promoting
absolute beginners' (n = 58) initial learning of aspects of Latin morphosyntax. This study measured
language development on a variety of dependent measures (three comprehension-based tests and
one production test), assessing both changes in accuracy and reaction time as well as examining
effects on trained (old) vs. untrained (new) items. Although participants under both conditions
improved in accuracy and reaction time on all measures, on immediate post-tests, participants
receiving metalinguistic information outperformed those who did not. However, this advantage
had largely dissipated by the time of the delayed tests. Performance on untrained items also
suggests an advantage for metalinguistic feedback on system learning and on transfer of skills
from comprehension-based practice to production. Furthermore, we argue, based on findings
Corresponding author:
Beatriz Lado, Department of Languages and Literatures, Lehman College of the City University of New
York, Carman Hall, Room 257, 250 Bedford Park Blvd, West Bronx, New York, NY 10468-1589, USA.
Email: beatriz.lado@lehman.cuny.edu
Keywords
Explicit, feedback, metalinguistic information, negative evidence
I Introduction
The role of feedback in second language acquisition (SLA) is a topic of both theoretical
and practical importance. Many SLA studies have investigated the effects of feedback in
the context of face-to-face interaction (e.g. Carroll & Swain, 1993; Leeman, 2003;
Mackey, 1999; Mackey, Gass, & McDonough, 2000; Mackey & Philp, 1998; McDonough
& Mackey, 2006) and in computer-mediated communication (CMC) (Sachs & Suh,
2007; Sagarra, 2007). Results of recent meta-analyses indicate an overall positive effect
for feedback in both types of interaction (Keck, Iberri-Shea, Tracy-Ventura, &
Wa-Mbaleka, 2006; Mackey & Goo, 2007), and specifically for corrective feedback pro-
vided during interaction (Russell & Spada, 2006) on second language (L2) development.
Li's (2010) meta-analysis showed that implicit feedback produces larger long-term, and
therefore more reliable, effects than explicit feedback.1
Recent research has also incorporated the use of computer assisted language learning
(CALL) applications (e.g. Rosa & Leow, 2004; Sanz, 2004) to examine the effects of
feedback. As opposed to CMC, where the computer acts as a tool to facilitate interaction
that may lead learners toward negotiation and ultimately language development (Sanz &
Lado, 2008), in CALL the computer is a tutor, and feedback is usually 'immediate, provided
only when needed, individualized, and focused on the key form' (Sanz, 2004, p. 12).
In this way, as stated by Nagata and Swisher (1995), 'the learner's attention (is
drawn) to weaknesses in their mastery of grammatical points that might not become
apparent in the classroom' (p. 339).
While feedback has often been the focus of research in interaction studies, whether
classroom or CMC, fewer studies have investigated the effects of different types of feed-
back in more controlled (laboratory) settings, and especially in CALL. Of the handful of
dissertations and published studies, some have shown that learners exposed to feedback
consisting of negative evidence with metalinguistic information outperform those
exposed to feedback including negative evidence alone, on both immediate and delayed
post-tests (Bowles, 2005; Nagata, 1993; Nagata & Swisher, 1995; Rosa & Leow, 2004).
However, non-significant differences in performance on immediate tests have also been
found (Camblor, 2006; Hsieh, 2007; Moreno, 2007; Sanz, 2004; Sanz & Morgan-Short,
2004). In addition, some results on delayed tests suggest that there may in fact be longer-
term benefits for exposure to negative evidence alone (Moreno, 2007; Stafford, Sanz, &
Bowden, 2012), in line with Li's (2010) conclusions on feedback in interaction research.
Divergent results in many of these studies together with the importance attached to
feedback by language teaching practitioners suggest the need to take a closer look at the
did not control the amount of individualized feedback each participant received, and
retention of knowledge gained was not tracked.
Four additional unpublished dissertation studies (Bowles, 2005; Camblor, 2006;
Hsieh, 2007; Moreno, 2007) have also addressed the issue of type of feedback provided
to learners in a CALL context and have incorporated conditions similar to those included
in the aforementioned studies (i.e. negative evidence with or without metalinguistic
information). Taken together, their results suggest that while provision of feedback is
beneficial, +/− metalinguistic information comparisons do not always favor the
metalinguistic feedback condition (Camblor, 2006; Hsieh, 2007; Moreno, 2007). Specifically, in
Bowles's (2005) study, providing metalinguistic information gave learners an
immediate advantage over learners who did not receive any grammar explanation
as part of feedback, but in the long run this advantage disappeared. By contrast, in
Moreno's (2007) study, delayed advantages were observed for learners who received
only information about the correctness of their answers to practice tasks, i.e. negative
evidence alone. Limitations in these studies include potential problems with the scoring
procedure, which may have accepted ungrammatical targeted items as correct (e.g.
Bowles, 2005), and lack of statistical power, which may explain lack of differences
between groups (e.g. Camblor, 2006).
Stafford and colleagues (2012) compared the effects of oral and written task-essential
practice combined with negative evidence with and without metalinguistic information
provided to Spanish–English bilinguals (n = 65) learning to assign semantic functions
via noun case morphology in third-language (L3) Latin. Results from a grammaticality
judgment test indicated that, unlike their counterparts, participants who received nega-
tive evidence alone not only learned but also retained what they learned over a period of
at least three weeks. However, where transfer of skills from input practice to output
assessment was involved, only the group exposed to negative evidence with metalinguis-
tic information showed significant improvement.2
In sum, studies on the differential effects of computer-delivered feedback have led to
inconclusive results due in part to the limited number of studies that have investigated
such effects. Additionally, research designs have almost always been biased against more
implicit treatment conditions (in our case, without metalinguistic information) by
providing not only a different type of input or feedback (in the language vs. about the
language) but also less input, as in Rosa and Leow (2004). Combined with a lack of
control over time on task, this often results in shorter implicit conditions (e.g. Sanz & Morgan-
Short, 2004). Thus, research is needed that more equitably compares the effects of
different degrees of explicitness of feedback, specifically negative evidence with and without
metalinguistic information, in pedagogical conditions. This includes examining the pos-
sibility that feedback that consists of negative evidence without metalinguistic informa-
tion leads to more stable knowledge, as suggested in studies on interaction (e.g. Li, 2010)
and computerized feedback (e.g. Bowles, 2005; Moreno, 2007; Stafford et al., 2012).
Furthermore, the nature of the measurements themselves (R. Ellis, 2005) has tended
to favor planning and explicit processing (e.g. the untimed translation task in Bowles, 2005),
which may benefit learners who have undergone treatments providing feedback with
metalinguistic information. Along these lines, additional insight may be gained by exam-
ining the effects of different types of feedback on both speed of processing (reaction
time; RT) and accuracy, given that accuracy alone cannot inform us about potential pro-
cessing differences underlying what may be apparently indistinguishable gains. For
example, longer RTs in one group as compared to another (where accuracy is equivalent)
might suggest that the underlying behavior is different, with the slower group relying on
controlled processes (Newell, 1990). Thus, examining RT can be an effective way to
measure automaticity, i.e. 'the ability to perform without conscious awareness or while
utilizing minimum attentional resources' (Jiang, 2007, p. 2). As Segalowitz (2003)
claims, speed of processing is the characteristic most frequently associated with
automaticity. Nevertheless, Segalowitz (2003) also argues that, more than just a synonym for
fast processing, automaticity 'should be used for situations where the change is of
significant consequence, such as restructuring of underlying processes' (p. 387). In the
present study, we combine accuracy and reaction time as a way to examine learners'
efficiency in restructuring their underlying processes.
The present study aims to address several of the limitations in previous studies and to
cast additional light on the role of feedback in L2 development. In addition to using
computers to tightly control the amount of input, practice, and feedback provided, we
also collected data through both receptive (input or comprehension-based) and produc-
tive measures, and gathered both accuracy and RT data. The tests also included both old
and new items in order to provide information on item-learning and system-learning.
Specifically, performance on trained items may be indicative of learners' ability to
remember chunks of language, whereas performance on untrained items indexes the degree
of success at system learning. By considering all these measures, we aim to provide a
broader and deeper picture of language learning and retention.
In this study, we specifically examine how absolute beginners learn to interpret and
produce the semantic functions of noun phrases (i.e. to decide who does what to whom)
under instructional conditions that combine comprehension-based, task-essential prac-
tice with feedback that provides negative evidence but differs in the provision of meta-
linguistic information.
To find equilibrium between ecological validity, on the one hand, and control of exter-
nal input and prior knowledge, on the other, the target language chosen was Latin, a natu-
ral language that is no longer spoken in its classical form, and specifically noun case
morphology as the target form. We were guided by the following research questions:
Does the presence or absence of metalinguistic information (MI) in combination with
negative evidence (NE) in computer-delivered, task-essential practice differentially
affect absolute beginners' ability to assign semantic functions in Latin? Do these effects
differ across different tasks or trained vs. untrained items?
III Methodology
1 Participants
Participants in the present study were 58 college students, all native speakers of English,
randomly assigned to one of two treatment conditions: NE+MI (n = 33) and NE-MI (n = 25).
Participants' ages ranged from 18 to 22 years. To control for previous language
experience, we recruited participants who had no knowledge of Latin or any other
case-marking language and who were enrolled in a second-year Spanish program. We also accepted
participants with one or two semesters of study in another non-case language. Therefore, in some cases,
Latin was the fourth language (L4) rather than the L3. Participants scoring 67%3 or
higher on the pre-tests were not included in the final sample. All participants were com-
pensated for their participation with extra credit.4
2 Target structure
The linguistic target of the study was the assignment of thematic agent/patient roles to
nouns in Latin via case morphology. The theoretical framework that guided the design
of our materials was the Competition Model (CM), developed by Bates and MacWhinney
(1989). In CM terms, language learning is defined as 'a process
of acquiring coalitions of form–function mappings, and adjusting the weight of each
mapping until it provides an optimal fit to the processing environments' (MacWhinney,
2001, p. 59). It is argued that when processing language, the assignment of functional
meanings to grammatical forms in the input involves competition, which is governed by
the cue validity of the linguistic input. Cue validity refers to the availability
(frequency of appearance) and reliability (degree to which a cue leads to the correct
interpretation) of a particular cue in the input. In the present study, the targeted linguistic
forms were noun and verb morphology that indicate thematic agent and patient roles in
Latin (i.e. who does what to whom). In Latin, the strongest cue (i.e. the most available
and reliable) is case morphology, followed by subject–verb agreement, and finally,
word order. This state of affairs is reversed in the participants' first language (L1),
English, in which word order is the strongest cue. In contrast, in Spanish, the
participants' L2, verb agreement is the strongest cue, followed by word order (Bates &
MacWhinney, 1989). System learning in CM terms is understood as the application of
a new cue hierarchy to novel input; in the current study, system learning is investigated
through the inclusion of novel test items.
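To make availability and reliability concrete, the sketch below computes both for three cues over a small, invented set of Latin-like sentences. It assumes, as one common operationalization rather than anything stated in this study, that overall cue validity is the product of availability and reliability; the `Sentence` class and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sentence:
    # Each cue field holds the index of the noun (0 = first, 1 = second)
    # that the cue marks as agent, or None if the cue is absent.
    case_cue: Optional[int]
    word_order_cue: Optional[int]
    agreement_cue: Optional[int]
    true_agent: int  # index of the noun that really is the agent


def cue_stats(sentences, cue):
    """Return (availability, reliability, validity) for one cue."""
    present = [s for s in sentences if getattr(s, cue) is not None]
    availability = len(present) / len(sentences)
    hits = sum(1 for s in present if getattr(s, cue) == s.true_agent)
    reliability = hits / len(present) if present else 0.0
    # Assumption: overall validity = availability * reliability.
    return availability, reliability, availability * reliability


# Hypothetical Latin-like input: case endings are always present and correct;
# "first noun = agent" is always available but right only half the time;
# agreement is informative only when the two nouns differ in number.
SENTENCES = [
    Sentence(case_cue=0, word_order_cue=0, agreement_cue=0, true_agent=0),
    Sentence(case_cue=1, word_order_cue=0, agreement_cue=None, true_agent=1),
    Sentence(case_cue=0, word_order_cue=0, agreement_cue=0, true_agent=0),
    Sentence(case_cue=1, word_order_cue=0, agreement_cue=1, true_agent=1),
]

if __name__ == "__main__":
    for cue in ("case_cue", "agreement_cue", "word_order_cue"):
        print(cue, cue_stats(SENTENCES, cue))
```

With these toy data the cues rank case (validity 1.0) above agreement (0.75) and word order (0.5), mirroring the Latin hierarchy described above.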
Following task-essentialness principles (Loschky & Bley-Vroman, 1993) that have
been shown to lead to reliable linguistic gains in Processing Instruction research
(VanPatten, 2005), we manipulated the input so that participants would be encouraged to
rely on noun and verb morphology to interpret sentences. Practice sentences were
manipulated so that neither word order nor subject–verb agreement was a consistently reliable
cue. Given the cue hierarchy of English, which has a strict subject–verb–object (SVO)
word order, we predicted that when participants first read a sentence such as
they would by default respond assuming SVO word order (their L1 cue), which would
lead to an incorrect answer. Provision of immediate negative feedback might then lead
them to restructure their system and shift their reliance to a more reliable cue, in this
case, noun case morphology.5 Given these manipulations and the fact that practice could
not be performed successfully by relying solely on the lexical meaning of the nouns and
verbs, on word order, or on verbal morphology, participants were led to process both
noun and verb morphology as a means of successful task completion.
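The logic of task-essentialness can be illustrated with a toy comparison of two response strategies on practice items whose word order is balanced between agent-first and patient-first sentences. This is a hypothetical simplification (English glosses stand in for Latin nouns, and verb agreement is ignored), not the study's item set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    gloss: str  # English gloss standing in for a Latin noun (hypothetical)
    case: str   # "nom" marks the agent, "acc" the patient (simplified)


def first_noun_strategy(nouns):
    """L1 English-style strategy: treat the first noun as the agent."""
    return nouns[0].gloss


def case_strategy(nouns):
    """Target strategy: treat the nominative-marked noun as the agent."""
    return next(n.gloss for n in nouns if n.case == "nom")


def build_items(pairs):
    """For each (agent, patient) pair, create agent-first and patient-first orders."""
    items = []
    for agent, patient in pairs:
        a, p = Noun(agent, "nom"), Noun(patient, "acc")
        items.append(((a, p), agent))  # agent-first (SVO-like)
        items.append(((p, a), agent))  # patient-first (OVS-like)
    return items


def accuracy(strategy, items):
    """Proportion of items on which the strategy picks the true agent."""
    return sum(strategy(nouns) == agent for nouns, agent in items) / len(items)


ITEMS = build_items([("girl", "boy"), ("dog", "farmer")])
```

On these items the word-order strategy scores at chance (0.5), while the case strategy scores 1.0, which is the asymmetry that immediate negative feedback is meant to expose.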
3 Experimental design
The experiment consisted of three sessions over four weeks; all took place in an Apple labo-
ratory, where participants interacted with an application that combined ColdFusion and
Flash programming to deliver audiovisual treatments and capture participants' responses.
During the first session, participants completed a consent form and a background
questionnaire followed by a computer-delivered vocabulary lesson and quiz. Next, they
completed four pre-tests (written and aural interpretation, grammaticality judgment and
sentence production). During the second session, approximately one week later, partici-
pants completed the computer-delivered treatment and four immediate post-tests. At the
final session, two weeks later, participants completed four delayed post-tests and an
online debriefing questionnaire.
Participants responded via key press and received immediate feedback that remained
onscreen for five seconds before the program advanced automatically to the next practice
item. Both groups completed the practice session twice. Therefore, both groups were
exposed to the same number of Latin exemplars, but the NE+MI group was additionally
exposed to metalinguistic information. Time on task was balanced, however, in that
both types of feedback stayed onscreen for the same amount of time.
Following guidelines for developing structured input activities (Lee & VanPatten,
2003), both aural and written comprehension-based tasks were included in the treatment.
Task 1 presented a written Latin sentence and two English translation choices. Task 2
presented a written sentence and two picture choices. Task 3 presented a picture and two
written Latin sentence choices. Task 4 presented an aural sentence and two English trans-
lation choices. Task 5 presented an aural sentence and two picture choices. Task 6 pre-
sented a picture and an aural sentence, and participants had to decide whether or not the
picture shown matched the sentence heard. Participants responded via key presses.
Although the order in which the tasks were presented was fixed, item order was rand-
omized within each task.
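A minimal sketch of the presentation logic described above: task order is fixed, items are shuffled within each task, and the whole session is run twice. Task names and item IDs are hypothetical placeholders; the actual application was built with ColdFusion and Flash, not Python.

```python
import random


def build_session(tasks, seed=None):
    """Return (task, item) pairs with task order fixed and items shuffled within each task."""
    rng = random.Random(seed)
    session = []
    for task_name, items in tasks:  # the list order fixes the task order
        shuffled = list(items)      # copy so the caller's list is untouched
        rng.shuffle(shuffled)
        session.extend((task_name, item) for item in shuffled)
    return session


# Hypothetical item IDs; the study's actual items are not reproduced here.
TASKS = [
    ("Task 1", ["t1-a", "t1-b", "t1-c"]),
    ("Task 2", ["t2-a", "t2-b"]),
]

# Both groups completed the practice session twice; each pass reshuffles.
practice = build_session(TASKS, seed=1) + build_session(TASKS, seed=2)
```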
All participants received feedback on both correct and incorrect responses during the
practice session. Thus, the amount of feedback provided was controlled across partici-
pants.8 As outlined above, the two treatments differed in the provision (or not) of meta-
linguistic information in the feedback. Feedback in the NE+MI condition confirmed or
rejected the response and included metalinguistic information about the target form.
Feedback in the NE-MI condition confirmed or rejected responses but did not provide
any metalinguistic information.9 Examples of both types of feedback are provided in
Figures 1 and 2.10
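The difference between the two feedback conditions, together with the fixed five-second display time that kept time on task equal, can be sketched as a single function. The wording of `MI_NOTE` is invented for illustration and is not the study's feedback text.

```python
FEEDBACK_SECONDS = 5.0  # both conditions: feedback stays on screen for 5 s


def feedback(response, correct_answer, with_mi, mi_text=""):
    """Return (message, display_seconds) for one practice item.

    NE-MI: the verdict only. NE+MI: the verdict plus a metalinguistic note.
    Feedback is given after every response, correct or not.
    """
    verdict = "Correct." if response == correct_answer else "Incorrect."
    message = f"{verdict} {mi_text}" if with_mi and mi_text else verdict
    return message, FEEDBACK_SECONDS


# Hypothetical wording, not the study's actual feedback text.
MI_NOTE = "The noun endings, not word order, show who does what to whom."
```

Because the display time is the same constant in both branches, the only manipulated variable is the presence of the metalinguistic note.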
c Language tests. Four language tests were administered: (1) a written interpretation
test, (2) an aural interpretation test, (3) a written grammaticality judgment test (GJT), and
(4) a sentence production test.11 Three versions of each test, with equivalent but different
items, were created, and these were administered as pre-, post-, and delayed tests. The
order of test version presentation was counterbalanced across participants and test ses-
sions, with the exception of the sentence production task, which was always completed
last in order to minimize test effects from production on comprehension/input-based (i.e.
receptive) tests. All language tests included trained (previously seen) and untrained
(new) items, and items within each test were presented in randomized order. Participants
were asked to respond as quickly and accurately as possible on the tests. A third response
choice ('I don't know'), not included in practice, was included in the tests to minimize
artificial score inflation from guessing on a two-choice test.
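One way to implement this kind of counterbalancing is a cyclic Latin square, which ensures that each test version appears equally often at pre-, post-, and delayed testing. The sketch below assumes that scheme (the text does not specify the exact rotation) and hypothetical version labels A, B, and C; it also keeps sentence production last in every session.

```python
VERSIONS = ("A", "B", "C")            # hypothetical labels for the 3 test versions
SESSIONS = ("pre", "post", "delayed")
RECEPTIVE = ("written interpretation", "aural interpretation", "GJT")


def latin_square(k):
    """Cyclic k x k Latin square: each symbol once per row and once per column."""
    return [[(row + col) % k for col in range(k)] for row in range(k)]


def version_plan(participant_index):
    """Map each session to a test version by rotating through the square's rows."""
    square = latin_square(len(VERSIONS))
    row = square[participant_index % len(square)]
    return {session: VERSIONS[v] for session, v in zip(SESSIONS, row)}


def session_test_sequence(participant_index):
    """Rotate the receptive tests; sentence production always comes last."""
    k = participant_index % len(RECEPTIVE)
    return list(RECEPTIVE[k:] + RECEPTIVE[:k]) + ["sentence production"]
```

Participant 0 gets versions A, B, C at pre-, post-, and delayed testing; participant 1 gets B, C, A; participant 2 gets C, A, B; the cycle then repeats.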
The written and aural interpretation tests followed the same design as Tasks 2 and 5
in the practice session; participants listened to or read a Latin sentence and were instructed
to select the corresponding picture (from two choices), or the additional 'I don't know'
response. Each test consisted of 20 items (i.e. sentences): 12 critical (6 trained and 6
untrained) and 8 distractors. Whereas the pictures in critical items represented reversed
subject/object roles, as in the practice tasks, the pictures in distractors depicted entirely
different scenes (different subjects, actions and objects), so that items could be answered
using only vocabulary knowledge, without attention to form and meaning of the target
structures.
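Because each critical item offers only two picture choices, a learner who guessed on every item would score 6 of 12 on average. The binomial sketch below shows how often pure guessing would reach 8 of 12 (about 67%). Treating 8/12 as the pre-test exclusion cutoff is an assumption, and the calculation ignores the 'I don't know' option, which was added precisely to discourage such guessing.

```python
from math import comb


def p_at_least(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more correct by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


N_CRITICAL = 12                          # critical items per interpretation test
expected_by_chance = N_CRITICAL * 0.5    # 6 correct on average
p_reach_8 = p_at_least(8, N_CRITICAL)    # 8/12 is about 67%
```

Here `p_reach_8` equals 794/4096, roughly 0.19, so a two-choice format alone leaves a non-trivial chance of inflated scores.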
On the GJT, participants read a sentence and indicated whether it was grammatical or
not (or 'I don't know') via key press. Like the interpretation tests, this test included 20
items: 12 critical (4 trained and 8 untrained) and 8 distractors. Of the 12 critical items, 6
were grammatical and 6 were ungrammatical. Of the 6 ungrammatical items, 2 had
incorrect case endings, 2 had incorrect subject–verb agreement, and 2 contained both of
these errors. The distractor sentences contained one noun and a verb rather than 2 nouns
and a verb.
On the sentence production test, participants saw a picture on the screen and were
asked to form a sentence that correctly described the picture by dragging and dropping
the provided noun and verb stems as well as appropriate morphological endings (which
they had to select from the complete set of endings provided) in order to form a Latin
sentence. To avoid biasing a particular word order, noun and verb stems appeared
onscreen in random order. For each production item, two nouns (subject and object) and
a verb were required to describe the action in the picture. Of the 15 items on the produc-
tion test, 10 were critical (5 trained and 5 untrained). As with the GJT, distractor sen-
tences contained one noun and a verb rather than 2 nouns and a verb.
The scoring procedure was straightforward: one point was awarded for each correct
answer to the 12 critical items on the interpretation and GJTs, making 12 the maximum
score on each of these 3 tests. Each sentence production item was awarded up to
three points depending on the number of accurate morpheme choices: one point for
correct verb morphology (i.e. correct subject–verb agreement) and one point for each
correct noun ending (to score a point for a noun ending, both number and case had to be
accurate; half points were not awarded). Thus, the maximum score possible for the
sentence production test was 30 (10 critical items × 3 points). According to Cronbach's
alpha values (ranging from .671 to .870), test reliability was medium to high.
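The production scoring rule and the reliability estimate can be sketched as follows. The morpheme encodings (`"3sg"`, `("nom", "sg")`) are hypothetical, and the alpha function uses the standard formula α = k/(k − 1) · (1 − Σ item variances / total-score variance); the study does not say which software computed its values.

```python
from statistics import pvariance


def score_production_item(response, key):
    """Score one production item from 0 to 3 points.

    1 point for the verb ending (subject-verb agreement); 1 point for each
    noun ending, awarded only if both case and number are right (no half
    points). Hypothetical encoding: nouns are (case, number) tuples.
    """
    points = int(response["verb"] == key["verb"])
    for noun in ("subject", "object"):
        points += int(response[noun] == key[noun])  # tuple equality checks case AND number
    return points


def cronbach_alpha(scores):
    """scores: one row per participant, one column per item.

    Population variances are used throughout; alpha is unchanged if sample
    variances are used instead, as long as the choice is consistent.
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# With 10 critical production items the maximum is 10 * 3 = 30 points.
MAX_PRODUCTION = 10 * 3
```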
entered as the between-participants factor, and Time (pre-test, post-test, delayed test)
entered as the within-participants factor. In addition, when a significant Time × Condition
interaction was present, independent samples t-tests were conducted to compare groups'
performance on post- and delayed tests. When no statistically significant interaction was
found, post-hoc contrasts were analysed as a means to explore differences in groups'
immediate learning (performance from pre-test to post-test) and retention (performance
from post-test to delayed test).
Independent samples t-tests performed on the pretest scores yielded no differences
between the groups for any of the tests: written interpretation, t(55) = .478, p = .635;
aural interpretation, t(54) = .334, p = .740; GJT, t(53) = .376, p = .709; and sentence
production, t(48) = .093, p = .926. Therefore, any differences between groups across time
can be attributed to the treatment.
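For reference, the pooled-variance (Student) independent-samples t statistic used in group comparisons like these can be written in a few lines. That the study used the pooled rather than the Welch version is an assumption, although it is consistent with degrees of freedom of the form n1 + n2 − 2. In practice, `scipy.stats.ttest_ind(a, b)`, whose default `equal_var=True` gives the same statistic, also returns a p-value.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(a, b):
    """Student's t for two independent groups, assuming equal variances.

    Returns (t, df) with df = n1 + n2 - 2.
    """
    n1, n2 = len(a), len(b)
    # Pooled sample variance (statistics.variance uses the n - 1 denominator).
    pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2
```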
ANOVA results for overall accuracy are summarized in Table 2. As the table shows,
results on all four Latin tests followed an almost identical pattern: significant main
effects for Time, Time × Condition interactions, and main effects for Condition on all
tests except the GJT.
Results of independent samples t-tests showed that the NE+MI group outperformed the
NE-MI group at post-test on all four test types (written interpretation, t(55) = 5.19, p < .001;
aural interpretation, t(54) = 4.32, p < .001; GJT, t(53) = 2.92, p < .05; sentence production,
t(48) = 3.005, p < .05), but this advantage remained at the delayed test only for sentence
production (written interpretation, t(55) = 1.92, p = .059; aural interpretation, t(54) = 1.90,
p = .063; GJT, t(53) = .011, p = .913; sentence production, t(48) = 2.37, p < .05).
a Trained items. Results for trained items mirrored those for overall accuracy as reported
above; i.e. the NE+MI group outperformed the NE-MI group at the post-test, but only on
sentence production did it maintain its advantage at delayed testing. As results for perfor-
mance on untrained items revealed different patterns, we report these in more detail below.
b Untrained items. For accuracy on written interpretation, results for untrained items
yielded main effects for Time, F(2, 110) = 45.06, p < .001, partial η² = .45, and
Condition, F(1, 55) = 11.68, p < .05, partial η² = .17, and a significant Time × Condition
interaction, F(2, 110) = 11.230, p < .001, partial η² = .17. As the means in Table 3 suggest,
results of the independent samples t-tests conducted on post- and delayed test scores
showed that the NE+MI group outperformed the NE-MI group and maintained its
advantage two weeks after treatment (post-test, t(55) = 5.14, p < .001; delayed, t(55) =
2.11, p < .05).
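Partial η² can be recovered from an F ratio and its degrees of freedom as η²p = (F · df_effect) / (F · df_effect + df_error). The helper below applies this formula; for example, F(2, 110) = 45.06 yields .45, matching the value reported for the Time effect above.

```python
def partial_eta_squared(f, df_effect, df_error):
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    return (f * df_effect) / (f * df_effect + df_error)


# F ratios reported above for untrained items on written interpretation.
REPORTED = {
    "Time": (45.06, 2, 110),
    "Time x Condition": (11.230, 2, 110),
}

if __name__ == "__main__":
    for label, (f, d1, d2) in REPORTED.items():
        print(f"{label}: {partial_eta_squared(f, d1, d2):.2f}")
```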
Figure 3. Grammaticality judgment accuracy means for untrained items by treatment group.
For accuracy on untrained items in the aural interpretation test, significant main
effects for Time, F(2, 108) = 30.34, p < .001, partial η² = .36, and Condition, F(1, 54) =
5.33, p = .025, partial η² = .09, were identified; the latter reflects the NE+MI group's
overall superior performance. There was no significant Time × Condition interaction,
F(2, 108) = 1.06, p = .348, partial η² = .02, power = .23.13 Statistical contrasts yielded no
significant results for either interval: pre- to post-test, F(1, 54) = 2.17, p = .147,
partial η² = .04, power = .30; post- to delayed test, F(1, 54) = .06, p = .811, partial η² =
.001, power = .06.
For accuracy on untrained items in the GJT, the ANOVA yielded a main effect for
Time, F(2, 106) = 13.62, p < .001, partial η² = .20, but neither a main effect for
Condition, F(1, 53) = 2.15, p = .148, partial η² = .04, power = .30, nor an interaction,
F(2, 106) = 2.51, p = .086, partial η² = .04, power = .30. However, results from contrast
analyses for post- to delayed tests, F(1, 53) = 5.67, p < .05, partial η² = .10, as
illustrated in Figure 3, show that the NE-MI group gained from post- to delayed test to the
point where it performed similarly to the NE+MI group, whose accuracy actually
declined in the two-week interval between the post- and delayed test sessions. These
results are confirmed by paired-samples t-tests: whereas the NE+MI group improved
from pre- to post-test, t(30) = 4.21, p < .001, and lost significantly from post- to
delayed test, t(30) = 2.06, p < .05, the NE-MI group improved significantly from pre-
to post-test, t(23) = 3.50, p < .001, and maintained those gains between post- and
delayed test, t(23) = 1.37, p = .185.
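The paired-samples comparisons above test whether a group's mean within-participant change differs from zero. A minimal sketch follows; `scipy.stats.ttest_rel(after, before)` gives the same statistic together with a p-value. Under this test df = n − 1, so t(30) corresponds to 31 participants with complete data.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Paired-samples t for within-participant change (after - before).

    Returns (t, df) with df = n - 1; positive t means scores rose.
    """
    if len(before) != len(after):
        raise ValueError("each participant needs both scores")
    diffs = [b - a for a, b in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1
```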
Finally, for accuracy on untrained items in the sentence production test, there was no
main effect for either Time, F(2, 96) = 3.02, p = .053, partial η² = .06, power = .79, or
Condition, F(1, 48) = 3.92, p = .053, partial η² = .08, power = .50, but there was a
significant Time × Condition interaction, F(2, 96) = 4.17, p < .05, partial η² = .08. Results of
independent samples t-tests conducted on post-, t(48) = 2.85, p < .01, and delayed tests,
t(48) = 1.28, p = .205, indicated that the NE+MI group outperformed the NE-MI group
on the immediate post-test, but again this between-group difference disappeared after
two weeks. Nevertheless, as revealed by paired samples t-tests, whereas the NE+MI
group showed performance gains from pre-test to post-test, t(25) = 3.38, p < .001, and
maintained these gains on the delayed test, t(25) = 1.49, p = .150, the NE-MI group did
not have any immediate gains, t(23) = .39, p = .698, nor did it improve significantly
between pre- and delayed tests, t(23) = .51, p = .618.
assessments (written interpretation, F(1, 53) = 18.50, p < .001, partial η² = .26, and aural
interpretation, F(1, 50) = 22.71, p < .001, partial η² = .31). The pattern that emerges from
RT data on the delayed tests is more complex. First, both groups maintained their initial
gains in speed on the written interpretation test, as indicated by a non-significant contrast
from post-test to delayed test, F(1, 53) = 1.05, p = .310, partial η² = .02, power = .17.
However, they slowed down significantly from post- to delayed test on the aural
interpretation test, F(1, 50) = 8.97, p < .001, partial η² = .15.
Somewhat different results were obtained for the GJT RTs. As shown in Table 5, the
analyses again yielded a significant main effect for Time and no main effect for Condition.
Contrast analyses (between pre- and post-tests and between post- and delayed tests)
revealed that all participants responded faster at post-test, F(1, 49) = 7.40, p < .01, partial
η² = .13, and maintained these RTs at delayed test, F(1, 49) = 1.15, p = .288, partial η² =
.02, power = .18. Note, however, that unlike the results for the other tasks, there was a
significant Time × Condition interaction. Results of independent samples t-tests
conducted on post-, t(49) = 2.46, p < .05, and delayed test scores, t(49) = 2.23, p < .05,
revealed a difference in RT between the two groups. This difference was due to reliably
faster performance by the NE-MI group, as revealed in the only significant difference,
found in results of the paired-samples t-tests conducted between pre- and post-tests
(NE+MI, t(28) = .55, p = .585; NE-MI, t(23) = 3.74, p < .01) (see also Figure 4).
Time, F(2, 102) = 1.82, p = .168, partial η² = .03, power = .37. Also, there was no
significant Time × Condition interaction for RTs from untrained and trained items on the GJT
(trained, F(2, 78) = .70, p = .500, partial η² = .02, power = .16; untrained, F(2, 88) =
1.36, p = .263, partial η² = .03, power = .28).
To summarize, input-based practice enhanced both accuracy and speed of response
irrespective of type of feedback. This was true across all items as well as for trained items
on all tests. Results showed that participants who received negative evidence with
metalinguistic information (NE+MI) consistently outperformed their NE-MI counterparts on
accuracy of interpretation, sentence production, and grammaticality judgment
immediately following treatment. After two weeks, the between-group differences in accuracy
had largely dissipated, with the NE-MI group performing similarly to the NE+MI group
on all measures except the sentence production test. Analyses of RTs revealed that on the
GJT the NE-MI group's RTs decreased more over time than those of the NE+MI group.
Analyses of performance on untrained items revealed a complex view of the effects
of feedback conditions on the ability to extend newly gained knowledge to items never
seen before. First, the main effects of Time show that, irrespective of condition, the
treatment was beneficial, as reflected in all participants' improved performance on the
interpretation tests and the GJT. In addition, in the case of the interpretation tests, the NE+MI group
generally outperformed the NE-MI group in accuracy (at both post- and delayed tests for
written interpretation, and overall for aural interpretation). On the GJT, the NE-MI and
NE+MI groups' performance was not significantly different across time. For sentence
production, the NE+MI group showed an advantage over the NE-MI group at the
post-test, but this difference had dissipated by the time of the delayed test. Results show a lack
of overall speed–accuracy trade-off effects, as speed increased with accuracy on all tests except for aural
interpretation, for which neither condition seemed to alter participants' speed of response.
Table 6. Descriptive statistics: RTs for trained and untrained items in seconds.
V Discussion
This study investigated the differential effects of negative evidence with or without
metalinguistic information on initial language development as reflected in learners' ability,
in terms of accuracy and reaction time, to assign semantic functions to noun phrases.
Specifically, we looked at learners' ability to accurately and efficiently interpret, judge
and produce sentences in Latin, a language that relies on noun case (and verb agreement)
morphology to convey who does what to whom. Mindful of the limitations identified
in the relatively small number of studies similar to ours, we implemented a design that
gave a fair chance to a less explicit treatment group (negative evidence without metalin-
guistic information) by controlling time on task (amount of practice) and by avoiding
testing materials that favor planning and explicit processing. Recall that these factors
may have unduly favored more explicit treatments in previous research. Moreover, by
conducting separate analyses on accuracy and RT, and on trained and untrained items, we
were uniquely able to compare the groups' initial language development in terms of both
accuracy and efficiency, as well as their item and system learning. Several interesting
conclusions emerge from the results reported above.
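To make the design of these separate analyses concrete, the sketch below splits trial-level data by group and by item type (trained vs. untrained) and computes accuracy and mean RT for each cell. It is a minimal illustration under stated assumptions, not the authors' analysis pipeline: the record layout and the function name summarize are hypothetical, RTs are averaged over all trials in a cell (correct and incorrect alike, which is an assumption), and the inferential statistics reported in the study are not reproduced.

```python
from collections import defaultdict
from statistics import mean


# Hypothetical trial record: (group, item_type, correct, rt_seconds),
# e.g. ("NE+MI", "untrained", True, 2.4). Not the study's actual data format.
def summarize(trials):
    """Return accuracy and mean RT for every (group, item_type) cell."""
    cells = defaultdict(list)
    for group, item_type, correct, rt in trials:
        cells[(group, item_type)].append((correct, rt))

    summary = {}
    for cell, rows in cells.items():
        summary[cell] = {
            # Proportion of correct responses in this cell.
            "accuracy": mean(1.0 if correct else 0.0 for correct, _ in rows),
            # Mean RT over all trials in the cell (assumption: errors included).
            "mean_rt": mean(rt for _, rt in rows),
        }
    return summary


if __name__ == "__main__":
    demo = [
        ("NE+MI", "trained", True, 2.0),
        ("NE+MI", "trained", False, 3.0),
        ("NEMI", "untrained", True, 1.5),
        ("NEMI", "untrained", True, 2.5),
    ]
    for cell, stats in sorted(summarize(demo).items()):
        print(cell, stats)
```

Keeping trained and untrained items in separate cells is what allows item learning (performance on practised items) to be distinguished from system learning (generalization to items never seen before), while keeping accuracy and RT apart allows accuracy and efficiency to be compared separately.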
In line with DeKeyser's (2003), Li's (2010), Norris and Ortega's (2000), and Spada
and Tomita's (2010) reviews and meta-analyses, as well as studies by Bowles (2005),
Nagata (1993), Nagata and Swisher (1995), and Rosa and Leow (2004), our results sug-
gest that participants who received a combination of negative evidence and metalinguis-
tic explanations outperformed those who received only negative evidence on immediate
post-tests. However, these results diverge from those of Camblor (2006), Hsieh (2007),
Moreno (2007), Sanz (2004), and Sanz and Morgan-Short (2004), in which there was no
difference between conditions with and without metalinguistic explanation in feedback. An
explanation for this difference may be found in the processing demands made by the
target forms, in learner readiness, and in learner proficiency level.
With regard to the target forms, our participants were required to (1) rely on case
morphology over word order and verbal agreement, (2) process eight noun endings (case
morphemes), (3) attach them to the right noun (in the case of sentence production), and
(4) establish verb number agreement. This process involves far more forms than the four
required in Moreno (2007), Sanz (2004), and Sanz and Morgan-Short (2004),
where the choice of target form (Spanish object clitics) was also guided by Competition
Model principles (Bates & MacWhinney, 1989). Furthermore, the case morphemes are
not perceptually salient, as they are monosyllabic and appear at the end of the word.
Moreover, there are multiple forms (feminine and masculine, singular and plural) to
encode the same function, nominative or accusative, which contributes to form complexity.
All of this makes assigning semantic functions one of the most difficult aspects of Latin,
as any teacher or learner will attest. The provision of metalinguistic information along
with negative evidence may have helped learners create form-meaning connections
with such cognitively demanding target forms, as suggested by the results on all four
measures, at least at the time of the post-test.
It has also been suggested that learner readiness and level of proficiency may influ-
ence feedback effects (e.g. Iwashita, 2003; Mackey & Philp, 1998). Nagata (1993),
Nagata and Swisher (1995) and Rosa and Leow (2004) examined second-year language
learners; other studies (Hsieh, 2007; Moreno, 2007; Sanz & Morgan-Short, 2004)
included participants with a basic level of proficiency, while our study looked at absolute
beginners. A recent study by Morgan-Short, Sanz, Steinhauer, and Ullman (2010) simi-
larly revealed that naive learners had an initial advantage in performance under a more
explicit condition with metalinguistic information, as compared to a less explicit condi-
tion without metalinguistic information, in learning an artificial language. Thus, at least
with respect to existing laboratory-based research on the effects of more and less explicit
instruction on SLA, it appears that providing metalinguistic rules gives at least an initial
advantage to naive learners when the processing of linguistic forms is cognitively
demanding.
The separate analyses of trained and untrained items showed an interesting pattern of
maintenance and loss of gains depending on type of test and item. First,
trained items analysed alone patterned with all items analysed together. In both analyses,
on all tasks, the NE+MI group had an initial advantage over the NEMI group; however, this
advantage was lost on delayed tests in all cases except for sentence production, due in
part to the modest but stable gains made by the group receiving negative evidence only
(NEMI). Note that the sentence production test required transfer of skills from compre-
hension/input-based practice to the ability to produce the target form. Thus, similar to
Stafford et al. (2012), while comprehension-based practice with negative evidence alone
external exposure, especially in early classroom research, a suspicion that has no place
in our study given its controlled laboratory design. To what, then, can we attribute the
apparently more stable gains of the NEMI group, compared with the losses on some
measures by the NE+MI group between immediate and delayed tests? This difference in
stability of accuracy gains over time may reflect qualitatively different learning pro-
cesses at work: more explicit processes in the NE+MI group and more implicit processes
in the NEMI group (Li, 2010). The problem is that immediate post-tests cannot reflect
the full extent of implicit learning because implicit learning takes more time than explicit
learning and includes a latent phase of experience-triggered memory consolidation fol-
lowing practice (see, for example, Ari-Even Roth, Kishon-Rabin, Hildesheimer, & Karni,
2005). Such consolidation processes would occur after practice and thus would not be
captured by performance on immediate post-tests.
Faster performance by the NEMI group on the GJT also suggests that the two groups
may have been engaged in qualitatively different processing that led to quantitatively
similar accuracy outcomes, providing further evidence that different types of instruction
may have led to different types of processing in this sample. The faster processing of the
NEMI group on the GJT is reminiscent of R. Ellis's (2005) claim that a timed GJT may
be a good measure of implicit knowledge. Faster RTs are usually interpreted as a sign of
increased automaticity, whereas slower RTs are taken to index reliance on slower, con-
trolled processes, including monitoring. For example, Sanz, Lin, Lado, Bowden and
Stafford (2009) found that participants who were required to verbalize their thought
processes while interacting with a grammar lesson were slower at the post-test, which the
authors interpreted as showing that verbalizations had affected the quality of the cogni-
tive processes involved in learning for participants in the more explicit condition.
Irrespective of whether faster reaction times index increased automatic processing or
decreased monitoring, a speed-up is generally considered a sign of efficiency when
accuracy is maintained. Thus, the NEMI group appears to have been engaged in more
efficient, potentially more automatic and less monitored processing of the L3 on the GJT,
even though its accuracy was similar to that of the NE+MI group.
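The reasoning in the preceding paragraph (a speed-up counts as an efficiency gain only when accuracy holds steady, whereas speed bought at the cost of accuracy, or accuracy bought at the cost of speed, is a trade-off) can be stated as a small decision rule. The sketch below is illustrative only: the function name classify_change and the tolerance tol that defines "maintained" accuracy are hypothetical choices, and the study itself drew its conclusions from statistical comparisons between groups, not from a rule of this kind.

```python
def classify_change(accuracy_delta, rt_delta, tol=0.02):
    """Label a pre- to post-test change on one measure.

    accuracy_delta: post-test minus pre-test proportion correct.
    rt_delta: post-test minus pre-test mean RT in seconds (negative = faster).
    tol: hypothetical margin within which accuracy counts as "maintained".
    """
    faster = rt_delta < 0
    accuracy_maintained = accuracy_delta >= -tol
    if faster and accuracy_maintained:
        # Speed-up with no loss of accuracy: read as increased efficiency.
        return "efficiency gain"
    if faster or (rt_delta > 0 and accuracy_delta > tol):
        # Faster but less accurate, or more accurate but slower.
        return "speed-accuracy trade-off"
    return "no efficiency gain"


if __name__ == "__main__":
    print(classify_change(0.00, -0.40))   # similar accuracy, faster RTs
    print(classify_change(-0.15, -0.40))  # faster, but accuracy dropped
```

Under this rule, a pattern like the NEMI group's on the GJT (similar accuracy, faster responses) falls in the "efficiency gain" category, whatever its underlying cause.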
Evidence for different cognitive processes underlying similar performance was also
found by Morgan-Short et al. (2010), a study that employed an artificial language para-
digm to examine whether explicit and implicit training differentially affect neural (elec-
trophysiological) and behavioral (performance) measures of syntactic processing
(noun-article and noun-adjective gender agreement). Explicit training conditions in this
study included metalinguistic explanations and meaningful examples of the target lan-
guage, and implicit training conditions provided only meaningful examples. Results
showed that at high proficiency (i.e. when participants had completed a certain number
of practice blocks), accuracy for the explicitly and implicitly trained groups did not
differ. In contrast, electrophysiological (event-related potential, ERP) measures revealed
striking differences in the groups' neural activity: While explicit training (with
metalinguistic information) resulted in some aspects of brain processing found in native
speakers, only implicit training (without metalinguistic information) led to a fully native-
like neurocognitive pattern.
Overall, our results suggest that exposure to both task-essential practice and either
type of feedback (negative evidence with or without metalinguistic information) is
beneficial for L2 learning. This is true both in terms of accuracy and speed of processing.
Even without metalinguistic information, learning occurs, although it appears to require
more exposure to the target language (cf. N. Ellis, 1993) and more time for consolidation
(Ari-Even Roth et al., 2005). We speculate that with more practice and/or more time for
consolidation, participants exposed to negative evidence only might achieve levels of
performance at least comparable and perhaps superior to those obtained by their
counterparts, and their gains could eventually extend to productive skills.
Acknowledgements
We would like to acknowledge the two anonymous reviewers for their constructive comments. In
addition, we thank Alison Mackey for her helpful insights and Rusan Chen for his statistical exper-
tise. Any remaining errors are ours.
Funding
This study is part of The Latin Project, developed to investigate the relationship between
individual differences and pedagogical variables in language acquisition with support from
Georgetown's GSAS and Spencer Foundation grants to Sanz, as well as assistance from Bill Garr
and RuSan Chen of Georgetown's UIS/CNDLS. Lado conducted the experiment with materials developed by
Sanz, Bowden, and Stafford, analyzed the data, and wrote the manuscript with Sanz, with subse-
quent extensive review by Bowden and Stafford.
Notes
1. Although explicitness in feedback is often operationalized as a dichotomy, it is generally
considered to be a continuum that ranges from more explicit to more implicit. As stated by Sanz and
Morgan-Short (2005), "the more metalinguistic the learning condition, the more explicit it is; the
more naturalistic the learning condition, the more implicit it is considered to be" (p. 235).
2. Finally, studies employing computer-mediated communication (CMC) have also investi-
gated the effect of degree of explicitness of feedback on SLA. These studies, however,
differ from CALL studies in the nature of the feedback provided. Specifically, implicit
feedback in CMC is given through clarification requests or recasts much like in interaction
research, whereas explicit feedback consists of negative evidence, i.e. specific, overt cor-
rections that may or may not include metalinguistic information on the target form (Mackey
& Abbuhl, 2005).
3. Although this percentage may seem high, half of the test sentences were presented in SVO
word order, which may have aided educated guessing on the pre-test, thereby inflating scores.
4. Participants had extensive experience using computers but had no experience with this type of
language learning approach in their classrooms (receiving feedback through the computer).
5. The noun endings present in the study were as follows: subject masculine singular (-us) and
plural (-i); object masculine singular (-um) and plural (-os); subject feminine singular (-a)
and plural (-ae); object feminine singular (-am) and plural (-as). The verb endings were -t for
the singular and -nt for the plural. The number of sentences with each possible word order
combination (SOV, SVO, VOS, VSO, OVS and OSV) was balanced; therefore, learners were
not able to use word order (and English's strong SVO order cue in particular) as a reliable
cue to assigning thematic roles. Also, noun-verb agreement was a reliable cue only when one
of the two nouns in a sentence was singular and the other was plural (so that only one noun,
namely the subject, agreed with the verb). Again, items resulting from the different combina-
tions were balanced. Finally, animacy, a potentially informative cue to agency, was controlled
by including only animate nouns in the treatment.
6. The purpose of the vocabulary lesson was to teach absolute beginners of Latin basic nouns
and verbs that appeared later in the practice and tests.
7. Given that the target form was case assignment (who does what to whom), students were
taught more nouns than verbs in order to be able to provide sentences with a variety of sub-
jects and objects.
8. We chose not to include a no-feedback condition because of the importance given to
feedback on both theoretical and practical levels, and because such a condition would likely be
extremely frustrating for absolute beginners, given especially that participants received no
explicit instruction before interacting with the practice activities. Such a condition could clearly
be provided, however, if one wished to investigate the effects of practice without feedback.
9. We renamed our conditions after a comment made by one anonymous reviewer who noted that
our treatments differ in the availability of metalinguistic information vs. negative evidence.
Nevertheless, as mentioned by the same reviewer, our NEMI condition also includes double
exposure to positive evidence.
10. Feedback provided after an incorrect response was similar to that presented in Figures 1 and 2
but included a negative word such as "nope" or "oops" instead of a positive word such as "right".
11. Each test was preceded by a mini vocabulary quiz to ensure vocabulary knowledge specific
to that test. When participants scored less than 60% on the vocabulary quiz, they repeated the
vocabulary lesson until they reached 100%.
12. As suggested by an anonymous reviewer, given the nature of the production test (more time-
consuming than the rest and with no time limit), we do not report reaction time for this test.
13. Because observed power is mainly a concern for non-significant results (low power often
reflects a small sample size), we report power only in this case.
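The cue structure described in Note 5 above can be sketched in code. The example below is a hypothetical illustration, not the study's materials or software: it maps the eight noun endings and two verb endings listed in that note to case and number, identifies the agent from case morphology alone (so word order is ignored), reports whether noun-verb agreement would also have been informative (only when one noun is singular and the other plural), and enumerates the six word orders that were balanced. The Latin words in the usage lines (amicus, puella, videt) are example vocabulary chosen for illustration, and the sketch assumes, as holds for these endings, that the only token ending in -t is the verb.

```python
from itertools import permutations

# Noun endings from Note 5: ending -> (case role, number).
NOUN_ENDINGS = {
    "us": ("subject", "sg"), "i": ("subject", "pl"),   # masculine
    "um": ("object", "sg"), "os": ("object", "pl"),    # masculine
    "a": ("subject", "sg"), "ae": ("subject", "pl"),   # feminine
    "am": ("object", "sg"), "as": ("object", "pl"),    # feminine
}

# All six orders of subject (S), verb (V) and object (O).
WORD_ORDERS = ["".join(p) for p in permutations("SVO")]


def parse_noun(word):
    # Try two-letter endings before one-letter ones (a defensive choice;
    # none of these eight endings is a suffix of another).
    for ending in sorted(NOUN_ENDINGS, key=len, reverse=True):
        if word.endswith(ending):
            return NOUN_ENDINGS[ending]
    raise ValueError(f"unrecognized noun ending: {word}")


def parse_verb(word):
    # Verb endings from Note 5: -nt plural, -t singular.
    if word.endswith("nt"):
        return "pl"
    if word.endswith("t"):
        return "sg"
    raise ValueError(f"unrecognized verb ending: {word}")


def analyze(sentence):
    """Return (agent, agreement_informative) for a three-word sentence."""
    tokens = sentence.lower().split()
    # Assumption: the verb is the only token ending in -t.
    verbs = [t for t in tokens if t.endswith("t")]
    nouns = [t for t in tokens if not t.endswith("t")]
    if len(verbs) != 1 or len(nouns) != 2:
        raise ValueError("expected one verb and two nouns")

    parsed = {noun: parse_noun(noun) for noun in nouns}
    # Case morphology alone identifies the agent, whatever the word order.
    agent = next(n for n, (role, _) in parsed.items() if role == "subject")
    # Agreement singles out the subject only if the nouns differ in number.
    numbers = {number for _, number in parsed.values()}
    agreement_informative = len(numbers) == 2
    if parsed[agent][1] != parse_verb(verbs[0]):
        raise ValueError("agent does not agree with the verb")
    return agent, agreement_informative


if __name__ == "__main__":
    print(WORD_ORDERS)
    print(analyze("puellam amicus videt"))  # OSV order
    print(analyze("vident amici puellam"))  # VSO order
```

In this sketch, as in the materials described in Note 5, case endings identify the agent in every item, whereas agreement does so only in mixed-number items and word order never does reliably.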
References
Ari-Even Roth, D., Kishon-Rabin, L., Hildesheimer, M., & Karni, A. (2005). A latent consolidation phase in auditory identification learning: Time in the awake state is sufficient. Learning and Memory, 12, 159–164.
Bates, E., & MacWhinney, B. (1989). Functionalism and the Competition Model. In B. MacWhinney & E. Bates (Eds.), The cross-linguistic study of sentence processing (pp. 3–13). New York: Cambridge University Press.
Bowles, M. (2005). Effects of verbalization condition and the type of feedback on L2 development
in a CALL task. Unpublished dissertation, Georgetown University, Washington, DC, USA.
Camblor, M.T. (2006). Type of written feedback, awareness, and L2 development: A computer-
based study. Unpublished dissertation, Georgetown University, Washington, DC, USA.
Carroll, S., & Swain, M. (1993). Explicit and implicit negative feedback: An empirical study of the learning of linguistic generalizations. Studies in Second Language Acquisition, 15, 357–386.
DeKeyser, R. (2003). Implicit and explicit learning. In C. Doughty & M. Long (Eds.), The handbook of second language acquisition (pp. 313–347). Oxford: Blackwell.
Ellis, N. (1993). Rules and instances in foreign language learning: Interactions of explicit and implicit knowledge. European Journal of Cognitive Psychology, 5, 289–318.
Ellis, N. (2005). At the interface: Dynamic interactions of explicit and implicit language knowledge. Studies in Second Language Acquisition, 27, 305–352.
Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A psychometric study. Studies in Second Language Acquisition, 27, 141–172.
Herron, C. (1991). The garden path correction strategy in the foreign language classroom. The French Review, 64, 966–977.
Hsieh, H. (2007). Input-based practice, feedback, awareness and L2 development through a com-
puterized task. Unpublished dissertation, Georgetown University, Washington, DC, USA.
Iwashita, N. (2003). Negative feedback and positive evidence in task-based interaction: Differential effects on L2 development. Studies in Second Language Acquisition, 25, 1–36.
Jiang, N. (2007). Selective integration of linguistic knowledge in adult second language learning. Language Learning, 57, 1–33.
Keck, C.M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S. (2006). Investigating the empirical link between task-based interaction and acquisition: A meta-analysis. In J.M. Norris & L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 91–131). Amsterdam: John Benjamins.
Lee, J., & VanPatten, B. (2003). Making communicative language teaching happen. New York: McGraw-Hill.
Leeman, J. (2003). Recasts and second language development: Beyond negative evidence. Studies in Second Language Acquisition, 25, 37–63.
Sanz, C. (2004). Computer delivered implicit vs. explicit feedback in processing instruction. In B. VanPatten (Ed.), Processing instruction: Theory, research, and commentary (pp. 241–255). Mahwah, NJ: Lawrence Erlbaum.
Sanz, C., & Lado, B. (2008). Technology and the study of awareness. In J. Cenoz & N.H. Hornberger (Eds.), Encyclopedia of language and education: Volume 6: Knowledge about language (2nd ed., pp. 299–312). Philadelphia, PA: Springer Science+Business Media LLC.
Sanz, C., & Morgan-Short, K. (2004). Positive evidence versus explicit rule presentation and explicit negative feedback: A computer-assisted study. Language Learning, 54, 35–78.
Sanz, C., & Morgan-Short, K. (2005). Explicitness in pedagogical interventions: Input, practice, and feedback. In C. Sanz (Ed.), Mind and context in adult second language acquisition: Methods, theory, and practice (pp. 234–263). Washington, DC: Georgetown University Press.
Sanz, C., Lin, H.-J., Lado, B., Bowden, H.W., & Stafford, C.A. (2009). Concurrent verbalizations, pedagogical conditions, and reactivity: Two CALL studies. Language Learning, 59, 33–71.
Segalowitz, N. (2003). Automaticity and second language acquisition. In C. Doughty & M. Long (Eds.), The handbook of second language acquisition (pp. 382–408). Oxford: Blackwell.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language feature: A meta-analysis. Language Learning, 60, 263–308.
Stafford, C.A., Sanz, C., & Bowden, H. (2012). Optimizing language instruction: Matters of explicitness, practice, and cue learning. Language Learning, 62, 741–768.
VanPatten, B. (2005). Processing instruction. In C. Sanz (Ed.), Mind and context in adult second language acquisition (pp. 267–281). Washington, DC: Georgetown University Press.