A fine-grained analysis of the effects of negative evidence with and without
metalinguistic information in language development
Beatriz Lado, Harriet Wood Bowden, Catherine A Stafford and Cristina Sanz
Language Teaching Research 2014 18: 320 originally published online 21 November
2013
DOI: 10.1177/1362168813510382
The online version of this article can be found at:

http://ltr.sagepub.com/content/18/3/320
Published by:
http://www.sagepublications.com
Version of Record - Jun 18, 2014
OnlineFirst Version of Record - Nov 21, 2013

Article
Language Teaching Research
2014, Vol. 18(3) 320-344
© The Author(s) 2013
Reprints and permissions: sagepub.co.uk/journalsPermissions.nav
Beatriz Lado
Lehman College (CUNY), USA
Harriet Wood Bowden
University of Tennessee-Knoxville, USA
Catherine A Stafford
University of Wisconsin-Madison, USA
Cristina Sanz
Georgetown University, USA
Abstract
The current study compared the effectiveness of computer-delivered task-essential practice
coupled with feedback consisting of (1) negative evidence with metalinguistic information
(NE+MI) or (2) negative evidence without metalinguistic information (NE-MI) in promoting
absolute beginners' (n = 58) initial learning of aspects of Latin morphosyntax. This study measured
language development on a variety of dependent measures (three comprehension-based tests and
one production test), assessing changes in both accuracy and reaction time as well as examining
effects on trained (old) vs. untrained (new) items. Although participants under both conditions
improved in accuracy and reaction time on all measures, on immediate post-tests, participants
receiving metalinguistic information outperformed those who did not. However, this advantage
had largely dissipated by the time of the delayed tests. Performance on untrained items also
suggests an advantage for metalinguistic feedback on system learning and on transfer of skills
from comprehension-based practice to production. Furthermore, we argue, based on findings
in cognitive neuroscience, that greater maintenance of gains in accuracy as well as evidence of
some faster processing by participants not exposed to metalinguistic information may reflect
qualitatively different learning processes at work: more explicit learning in the [NE+MI] group and
more implicit learning in the [NE-MI] group (Li, 2010).
Corresponding author:
Beatriz Lado, Department of Languages and Literatures, Lehman College of the City University of New
York, Carman Hall, Room 257, 250 Bedford Park Blvd, West Bronx, New York, NY 10468-1589, USA.
Email: beatriz.lado@lehman.cuny.edu
Keywords
Explicit, feedback, metalinguistic information, negative evidence
I Introduction
The role of feedback in second language acquisition (SLA) is a topic of both theoretical
and practical importance. Many SLA studies have investigated the effects of feedback in
the context of face-to-face interaction (e.g. Carroll & Swain, 1993; Leeman, 2003;
Mackey, 1999; Mackey, Gass, & McDonough, 2000; Mackey & Philp, 1998; McDonough
& Mackey, 2006) and in computer-mediated communication (CMC) (Sachs & Suh,
2007; Sagarra, 2007). Results of recent meta-analyses indicate an overall positive effect
for feedback in both types of interaction (Keck, Iberri-Shea, Tracy-Ventura, &
Wa-Mbaleka, 2006; Mackey & Goo, 2007), and specifically for corrective feedback pro-
vided during interaction (Russell & Spada, 2006) on second language (L2) development.
Li's (2010) meta-analysis showed that implicit feedback produces larger long-term and
therefore more reliable effects than explicit feedback.1
Recent research has also incorporated the use of computer assisted language learning
(CALL) applications (e.g. Rosa & Leow, 2004; Sanz, 2004) to examine the effects of
feedback. As opposed to CMC, where the computer acts as a tool to facilitate interaction
that may lead learners toward negotiation and ultimately language development (Sanz &
Lado, 2008), in CALL the computer is a tutor, and feedback is usually immediate, pro-
vided only when needed, individualized, and focused on the key form (Sanz, 2004, p.
12). In this way, as stated by Nagata and Swisher (1995), the learners' attention (is
drawn) to weaknesses in their mastery of grammatical points that might not become
apparent in the classroom (p. 339).
While feedback has often been the focus of research in interaction studies, whether
classroom or CMC, fewer studies have investigated the effects of different types of feed-
back in more controlled (laboratory) settings, and especially in CALL. Of the handful of
dissertations and published studies, some have shown that learners exposed to feedback
consisting of negative evidence with metalinguistic information outperform those
exposed to feedback including negative evidence alone, on both immediate and delayed
post-tests (Bowles, 2005; Nagata, 1993; Nagata & Swisher, 1995; Rosa & Leow, 2004).
However, non-significant differences in performance on immediate tests have also been
found (Camblor, 2006; Hsieh, 2007; Moreno, 2007; Sanz, 2004; Sanz & Morgan-Short,
2004). In addition, some results on delayed tests suggest that there may in fact be longer-
term benefits for exposure to negative evidence alone (Moreno, 2007; Stafford, Sanz, &
Bowden, 2012), in line with Li's (2010) conclusions on feedback in interaction research.
Divergent results in many of these studies together with the importance attached to
feedback by language teaching practitioners suggest the need to take a closer look at the
differential effects, across time, of feedback with or without metalinguistic information
on the development of non-primary languages. In addition, a closer examination of the
effects of feedback as assessed by different but complementary measures (accuracy and
reaction time) will provide a more detailed picture of the effects of such feedback on
SLA. Comparing these measures of performance across a variety of comprehension- and
production-based tasks, which include both trained and untrained items (i.e. items that
are or are not part of the treatment) will also help to refine our knowledge of the role of
different types of feedback on linguistic development. Importantly, such insights will
inform language pedagogy, especially with respect to lesson planning and curriculum
design with the goal of optimizing long-term effects on language development.
II Previous research on computer-delivered feedback

A study reported in Nagata (1993) and Nagata and Swisher (1995) compared what they
called intelligent feedback (I-CALI) (with metalinguistic information) and traditional
feedback (T-CALI) (without metalinguistic information) on the learning of three types of
Japanese passive constructions. Participants (n = 32) read grammar explanations about
the target form and then practiced producing the target form (90 instances). The feedback
in the T-CALI group informed participants of a missing or expected word in their
response. The I-CALI group received metalinguistic information in addition. The results
showed that the metalinguistic group significantly outperformed the non-metalinguistic
group on both immediate and delayed (three weeks after the treatment) production measures.
The results of Rosa and Leow (2004) partially converge with those of Nagata (1993) and
Nagata and Swisher (1995).
Rosa and Leow (2004) had 5th-semester L2 Spanish college students (n = 100) com-
plete a multiple-choice jigsaw puzzle task to learn contrary-to-fact past conditional con-
structions. The study compared 6 conditions (number of items = 18 per condition) that
varied in degree of explicitness; we focus here on their (IFE) group, whose participants
were given negative evidence, and the (EFE) group, which was also provided with meta-
linguistic information. Results were mixed and task-dependent: the negative evidence/
metalinguistic (EFE) group outperformed the negative evidence alone (IFE) group on
both immediate and delayed tests for production of old items and interpretation of new
items, but no difference between groups was found over time for the recognition of old
items or the production of new items. The limited exposure to the target may not have
been enough for learners in the negative evidence (IFE) condition to show evidence of
development, since learning under such a condition may require more instances of the
target form (N. Ellis, 1993; 2005).
Sanz (2004) and Sanz and Morgan-Short (2004) addressed this limitation and investigated
whether exposure to structured-input practice items (oral and written, total of 56
items), coupled with feedback that offered negative evidence with metalinguistic information
(+EF) or negative evidence without metalinguistic information (-EF), had a differential
effect on L2 acquisition of Spanish preverbal direct object pronouns (n = 33).
Results suggested that the amount of interaction with the target form may be critical, as
no statistically significant difference between the two groups on interpretation or production
post-tests was identified. However, and importantly for the present study, the design
did not control the amount of individualized feedback each participant received, and
retention of knowledge gained was not tracked.
Four additional unpublished dissertation studies (Bowles, 2005; Camblor, 2006;
Hsieh, 2007; Moreno, 2007) have also addressed the issue of type of feedback provided
to learners in a CALL context and have incorporated conditions similar to those included
in the aforementioned studies (i.e. negative evidence with or without metalinguistic
information). Taken together, their results suggest that while provision of feedback is
beneficial, +/- metalinguistic information comparisons do not always favor the metalinguistic
feedback condition (Camblor, 2006; Hsieh, 2007; Moreno, 2007). Specifically, in
Bowles's (2005) study, providing metalinguistic information gave learners some immediate
advantage when compared to learners who did not receive any grammar explanation
as part of feedback, but in the long run this advantage disappeared. By contrast, in
Moreno's (2007) study, delayed advantages were observed for learners who received
only information about the correctness of their answers to practice tasks, i.e. negative
evidence alone. Limitations in these studies include potential problems with the scoring
procedure, which may have accepted ungrammatical targeted items as correct (e.g.
Bowles, 2005), and lack of statistical power, which may explain lack of differences
between groups (e.g. Camblor, 2006).
Stafford and colleagues (2012) compared the effects of oral and written task-essential
practice combined with negative evidence with and without metalinguistic information
provided to Spanish-English bilinguals (n = 65) learning to assign semantic functions
via noun case morphology in third-language (L3) Latin. Results from a grammaticality
judgment test indicated that, unlike their counterparts, participants who received nega-
tive evidence alone not only learned but also retained what they learned over a period of
at least three weeks. However, where transfer of skills from input practice to output
assessment was involved, only the group exposed to negative evidence with metalinguis-
tic information showed significant improvement.2
In sum, studies on the differential effects of computer-delivered feedback have led to
inconclusive results due in part to the limited number of studies that have investigated
such effects. Additionally, research designs have almost always been biased against more
implicit treatment conditions (in our case, without metalinguistic information) by pro-
viding not only a different type of input or feedback (in the language vs. about the lan-
guage) but less input, as in Rosa and Leow (2004). Combining this situation with a lack
of control of time on task often results in shorter implicit conditions (e.g. Sanz & Morgan-
Short, 2004). Thus, research is needed that more equitably compares the effects of differ-
ent degrees of explicitness of feedback, specifically negative evidence with and without
metalinguistic information, in pedagogical conditions. This includes examining the pos-
sibility that feedback that consists of negative evidence without metalinguistic informa-
tion leads to more stable knowledge, as suggested in studies on interaction (e.g. Li, 2010)
and computerized feedback (e.g. Bowles, 2005; Moreno, 2007; Stafford et al., 2012).
Furthermore, the nature of the measurements themselves (R. Ellis, 2005) has tended
to favor planning and explicit processing (e.g. untimed translation task in Bowles, 2005),
which may benefit learners who have undergone treatments providing feedback with
metalinguistic information. Along these lines, additional insight may be gained by examining
the effects of different types of feedback on both speed of processing (reaction
time; RT) and accuracy, given that accuracy alone cannot inform us about potential processing
differences underlying what may be apparently indistinguishable gains. For
example, longer RTs in one group as compared to another (where accuracy is equivalent)
might suggest that the underlying behavior is different, with the slower group relying on
controlled processes (Newell, 1990). Thus, examining RT can be an effective way to
measure automaticity, i.e. the ability to perform without conscious awareness or while
utilizing minimum attentional resources (Jiang, 2007, p. 2). As Segalowitz (2003)
claims, speed of processing is the characteristic most frequently associated with automa-
ticity. Nevertheless, Segalowitz (2003) also argues that, more than just a synonym for
fast processing, automaticity should be used for situations where the change is of significant
consequence, such as restructuring of underlying processes (p. 387). In the present
study, we combine both accuracy and reaction time as a way to look into learners'
efficiency in restructuring their underlying processes.
The present study aims to address several of the limitations in previous studies and to
cast additional light on the role of feedback in L2 development. In addition to using
computers to tightly control the amount of input, practice, and feedback provided, we
also collected data through both receptive (input or comprehension-based) and produc-
tive measures, and gathered both accuracy and RT data. The tests also included both old
and new items in order to provide information on item-learning and system-learning.
Specifically, performance on trained items may be indicative of learners' ability to
remember chunks of language, whereas performance on untrained items indexes degree
of success at system learning. By considering all these measures, we aim to provide a
broader and deeper picture of language learning and retention.
In this study, we specifically examine how absolute beginners learn to interpret and
produce the semantic functions of noun phrases (i.e. to decide who does what to whom)
under instructional conditions that combine comprehension-based, task-essential prac-
tice with feedback that provides negative evidence but differs in the provision of meta-
linguistic information.
To find equilibrium between ecological validity, on the one hand, and control of external
input and prior knowledge, on the other, we chose Latin, a natural language that is no
longer spoken in its classical form, as the target language, and noun case
morphology as the target form. We were guided by the following research questions:
Does the presence or absence of metalinguistic information (MI) in combination with
negative evidence (NE) in computer-delivered, task-essential practice differentially
affect absolute beginners' ability to assign semantic functions in Latin?
Do these effects differ across different tasks or trained vs. untrained items?
IIIMethodology
1 Participants
Participants in the present study were 58 college students, all native speakers of English,
randomly assigned to one of two treatment conditions: NE+MI (n = 33) and NE-MI (n =
25). Participants' ages ranged from 18 to 22 years. To control for previous language
experience, we recruited participants who had no knowledge of Latin or any other case-marking
language and who were in a second-year Spanish program. We accepted
participants with one or two semesters in a non-case language. Therefore, in some cases,
Latin was the fourth language (L4) rather than the L3. Participants scoring 67%3 or
higher on the pre-tests were not included in the final sample. All participants were compensated
for their participation with extra credit.4
2 Target structure
The linguistic target of the study was the assignment of thematic agent/patient roles to
nouns in Latin via case morphology. The theoretical framework that guided the design
of our materials was the Competition Model (CM), developed by Bates and MacWhinney
(1989). In CM terms, language learning is defined as a process
of acquiring coalitions of form-function mappings, and adjusting the weight of each
mapping until it provides an optimal fit to the processing environments (MacWhinney,
2001, p. 59). It is argued that when processing language, the assignment of functional
meanings to grammatical forms in the input involves competition, which is governed by
the cue validity of the linguistic input. Cue validity refers to the availability (frequency
of appearance) and reliability (degree to which a cue leads to the correct interpretation)
of a particular cue in the input. In the present study, the targeted linguistic
forms were noun and verb morphology that indicate thematic agent and patient roles in
Latin (i.e. who does what to whom). In Latin, the strongest cue (i.e. the most available
and reliable) is case morphology, followed by subject-verb agreement, and finally,
word order. This state of affairs is reversed in the participants' first language (L1),
English, in which word order is the strongest cue. In contrast, in Spanish, the participants'
L2, verb agreement is the strongest cue, followed by word order (Bates &
MacWhinney, 1989). System learning in CM terms is understood as the application of
a new cue hierarchy to novel input; in the current study, system learning is investigated
through the inclusion of novel test items.
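As a rough illustration of how availability and reliability might be quantified, the following sketch computes both for a toy set of annotated sentences. The annotation scheme, the invented data, and the rule that combines the two values by multiplication are assumptions made for illustration only; they are not the study's materials or analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CueObservation:
    """One sentence annotated for a single cue (hypothetical scheme)."""
    present: bool          # is the cue available in this sentence?
    points_to_agent: bool  # if present, does it pick out the true agent?


def availability(observations):
    """Proportion of sentences in which the cue is present."""
    return sum(o.present for o in observations) / len(observations)


def reliability(observations):
    """Proportion of cue-present sentences in which the cue is correct."""
    present = [o for o in observations if o.present]
    if not present:
        return 0.0
    return sum(o.points_to_agent for o in present) / len(present)


def cue_validity(observations):
    """Assumed combination rule: availability multiplied by reliability."""
    return availability(observations) * reliability(observations)


# Toy input of four sentences per cue (invented data, not the study's items).
case_marking = [CueObservation(True, True)] * 4
word_order = [
    CueObservation(True, True),
    CueObservation(True, False),
    CueObservation(True, False),
    CueObservation(True, True),
]

validity_case = cue_validity(case_marking)   # always present, always correct
validity_order = cue_validity(word_order)    # always present, correct half the time
```

In this toy input, word order is always available but misleads on half of the sentences, which mirrors the way the practice sentences were designed to make an L1-based word order strategy unreliable.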
Following task-essentialness principles (Loschky & Bley-Vroman, 1993) that have
been shown to lead to reliable linguistic gains in Processing Instruction research
(VanPatten, 2005), we manipulated the input so that participants would be encouraged to
rely on noun and verb morphology to interpret sentences. Practice sentences were manipulated
so that neither word order nor subject-verb agreement was a consistently reliable
cue. Given the cue hierarchy of English, which has a strict subject-verb-object (SVO)
word order, we predicted that when participants first read a sentence such as
(1) Parvul-um specta-t angel-us.
    boy-masc.sing.obj. looks at-sing. angel-masc.sing.subj.
    The angel looks at the boy.
and were asked to choose between two possible English translations
(2) a. The boy looks at the angel.
    b. The angel looks at the boy.
they would by default respond assuming SVO word order (their L1 cue), which would
lead to an incorrect answer. Provision of immediate negative feedback might then lead
them to restructure their system and shift their reliance to a more reliable cue, in this
case, noun case morphology.5 Given these manipulations and the fact that practice could
not be performed successfully by relying solely on the lexical meaning of the nouns and
verbs, on word order, or on verbal morphology, participants were led to process both
noun and verb morphology as a means of successful task completion.
3 Experimental design
The experiment consisted of three sessions over four weeks; all took place in an Apple laboratory,
where participants interacted with an application that combined ColdFusion and
Flash programming to deliver audiovisual treatments and capture participants' responses.
During the first session, participants completed a consent form and a background
questionnaire followed by a computer-delivered vocabulary lesson and quiz. Next, they
completed four pre-tests (written and aural interpretation, grammaticality judgment and
sentence production). During the second session, approximately one week later, partici-
pants completed the computer-delivered treatment and four immediate post-tests. At the
final session, two weeks later, participants completed four delayed post-tests and an
online debriefing questionnaire.
a Vocabulary lesson.6 Presentation of vocabulary was timed (12 minutes) as a means to
control exposure. Each Latin noun (n = 35) was presented onscreen as follows: two pictures
representing the noun in singular and plural appeared first, and then the singular
and plural, masculine and feminine subject (nominative) and object (accusative) case
forms (8 forms total) were presented aurally and in written form, followed by a written
English translation. There was no explanation of what the noun morphology indicated,
though written singular forms were presented under singular pictures and plural forms
under plural pictures. Each Latin verb (n = 11)7 was presented beginning with two pic-
tures representing the action, followed by the infinitive verb form (written and aural) and
the written English translation.
Immediately following the vocabulary lesson, participants were quizzed on the word
meanings via a multiple-choice quiz. In order to ensure that vocabulary knowledge was
sufficient for comprehension of word meanings in the practice session, participants with
a score of 60% or higher on this quiz reviewed the vocabulary items they had missed
until they reached 100% accuracy. Participants with a score below 60% repeated the
entire vocabulary lesson and then the quiz until they reached 100%. Right after the
vocabulary lesson and quiz, participants completed the pre-tests.
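The branching in the vocabulary quiz procedure can be sketched as a small decision function. This is a minimal sketch of the rule as described above; the function name and the step labels are hypothetical and do not come from the study's software.

```python
def next_vocabulary_step(quiz_score, review_threshold=0.60):
    """Return the step that follows one vocabulary quiz attempt.

    quiz_score is the proportion of items answered correctly (0.0 to 1.0).
    The returned labels are hypothetical names chosen for this sketch.
    """
    if not 0.0 <= quiz_score <= 1.0:
        raise ValueError("quiz_score must be a proportion between 0 and 1")
    if quiz_score >= 1.0:
        # Full accuracy reached: move on to the pre-tests.
        return "proceed_to_pre_tests"
    if quiz_score >= review_threshold:
        # 60% or higher: review only the missed items, then re-check.
        return "review_missed_items"
    # Below 60%: repeat the entire lesson and then the quiz.
    return "repeat_lesson_and_quiz"
```

A participant scoring 0.75, for example, would be routed to a review of missed items, and the function would be called again after each attempt until it returns the pre-test step.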
b Treatment: Task-essential practice and feedback. During the treatment, participants
interacted with a computer-delivered task-essential practice session involving interpretation
of written and aural Latin sentences. Practice consisted of 6 different tasks with nine
or 10 items per task. All practice items included two answer choices, and in order to
make the practice task-essential, the two answer choices had reversed roles for subject
and object (e.g. the queens help the king and the king helps the queens); thus, partici-
pants had to make a choice that hinged on interpretation of the critical form (noun case
morphology indicating subject/object) when interpreting sentences.
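The logic of a task-essential item can be sketched as follows: the two answer choices differ only in who does what to whom, so the item cannot be answered from vocabulary alone. This is an illustrative sketch, not the study's application. The data structure, the randomized answer position, and the feedback wording are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PracticeItem:
    """A two-choice interpretation item (hypothetical structure)."""
    prompt: str         # Latin sentence shown or played to the participant
    choices: tuple      # the two answer options, in presentation order
    correct_index: int  # position of the correct option within choices


def make_task_essential_item(prompt, correct, role_reversed, rng):
    """Pair the correct reading with its role-reversed counterpart."""
    if correct == role_reversed:
        raise ValueError("the two choices must differ in subject/object roles")
    choices = [correct, role_reversed]
    rng.shuffle(choices)  # assumption: answer position is randomized
    return PracticeItem(prompt, tuple(choices), choices.index(correct))


def negative_evidence(item, chosen_index):
    """Return right/wrong information only (assumed wording)."""
    return "Correct" if chosen_index == item.correct_index else "Incorrect"


rng = random.Random(0)  # fixed seed so the sketch is reproducible
item = make_task_essential_item(
    "Parvulum spectat angelus.",
    correct="The angel looks at the boy.",
    role_reversed="The boy looks at the angel.",
    rng=rng,
)
```

Because both options contain the same nouns and verb, a participant relying on word order alone would select the role-reversed reading for this sentence and receive negative evidence.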

Participants responded via key press and received immediate feedback that remained
onscreen for five seconds before the program advanced automatically to the next practice
item. Both groups completed the practice session twice. Therefore, both groups were
exposed to the same number of Latin exemplars, but the NE+MI group additionally got
exposure to metalinguistic information. Time on task was balanced, however, in that
both types of feedback stayed onscreen for the same amount of time.
Following guidelines for developing structured input activities (Lee & VanPatten,
2003), both aural and written comprehension-based tasks were included in the treatment.
Task 1 presented a written Latin sentence and two English translation choices. Task 2
presented a written sentence and two picture choices. Task 3 presented a picture and two
written Latin sentence choices. Task 4 presented an aural sentence and two English trans-
lation choices. Task 5 presented an aural sentence and two picture choices. Task 6 pre-
sented a picture and an aural sentence, and participants had to decide whether or not the
picture shown matched the sentence heard. Participants responded via key presses.
Although the order in which the tasks were presented was fixed, item order was rand-
omized within each task.
All participants received feedback on both correct and incorrect responses during the
practice session. Thus, the amount of feedback provided was controlled across partici-
pants.8 As outlined above, the two treatments differed in the provision (or not) of meta-
linguistic information in the feedback. Feedback in the NE+MI condition confirmed or
rejected the response and included metalinguistic information about the target form.
Feedback in the NE-MI condition confirmed or rejected responses, but did not provide
any metalinguistic information.9 Examples of both types of feedback are provided in
Figures 1 and 2.10
Figure 1. Example of Negative Evidence + Metalinguistic Information (NE+MI) feedback for a
correct response.
Figure 2. Example of Negative Evidence - Metalinguistic Information (NE-MI) feedback for a
correct response.
c Language tests. Four language tests were administered: (1) a written interpretation
test, (2) an aural interpretation test, (3) a written grammaticality judgment test (GJT), and
(4) a sentence production test.11 Three versions of each test, with equivalent but different
items, were created, and these were administered as pre-, post-, and delayed tests. The
order of test version presentation was counterbalanced across participants and test sessions,
with the exception of the sentence production task, which was always completed
last in order to minimize test effects from production on comprehension/input-based (i.e.
receptive) tests. All language tests included trained (previously seen) and untrained
(new) items, and items within each test were presented in randomized order. Participants
were asked to respond as quickly and accurately as possible on the tests. A third response
choice (I don't know), not included in practice, was included in the tests to minimize
artificial inflation of scores from guessing on a 2-choice test.
The written and aural interpretation tests followed the same design as Tasks 2 and 5
in the practice session; participants listened to or read a Latin sentence and were instructed
to select the corresponding picture (from two choices), or the additional I don't know
response. Each test consisted of 20 items (i.e. sentences): 12 critical (6 trained and 6
untrained) and 8 distractors. Whereas the pictures in critical items represented reversed
subject/object roles, as in the practice tasks, the pictures in distractors depicted entirely
different scenes (different subjects, actions and objects), so that items could be answered
using only vocabulary knowledge, without attention to form and meaning of the target
structures.
On the GJT, participants read a sentence and indicated whether it was grammatical or
not (or I don't know) via key press. Like the interpretation tests, this test included 20
items, 12 critical (4 trained and 8 untrained) and 8 distractors. Of the 12 critical items, 6
were grammatical and 6 were ungrammatical. Of the 6 ungrammatical items, 2 had
incorrect case endings, 2 had incorrect subject-verb agreement, and 2 contained both of
these errors. The distractor sentences contained one noun and a verb rather than 2 nouns
and a verb.
On the sentence production test, participants saw a picture on the screen and were
asked to form a sentence that correctly described the picture by dragging and dropping
the provided noun and verb stems as well as appropriate morphological endings (which
they had to select from the complete set of endings provided) in order to form a Latin
sentence. To avoid biasing a particular word order, noun and verb stems appeared
onscreen in random order. For each production item, two nouns (subject and object) and
a verb were required to describe the action in the picture. Of the 15 items on the production
test, 10 were critical (5 trained and 5 untrained). As with the GJT, distractor sentences
contained one noun and a verb rather than 2 nouns and a verb.
The scoring procedure was straightforward: one point was awarded for each correct
answer to the 12 critical items on the interpretation and GJTs, making 12 the maximum
score on each of these 3 tests. Each sentence production item was awarded one, two,
or three points depending on the number of accurate morpheme choices: one point for
correct verb morphology (i.e. correct subject-verb agreement) and one point for each
correct noun ending (to score a point for a noun ending, both number and case had to be
accurate; half points were not awarded). Thus, the maximum score possible for the sentence
production test was 30. According to Cronbach's alpha values (ranging from .671 to
.870), test reliability was medium to high.
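The scoring rule for the sentence production test can be sketched as below. The record structure and field names are hypothetical. The sketch also assumes that an item with no accurate morpheme choices receives zero points, which the description above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductionResponse:
    """Hypothetical record of one critical sentence production response."""
    verb_agreement_correct: bool
    subject_number_correct: bool
    subject_case_correct: bool
    object_number_correct: bool
    object_case_correct: bool


def score_production_item(response):
    """Score one critical item: at most 3 points, no half points."""
    points = 0
    if response.verb_agreement_correct:
        points += 1
    # A noun ending earns its point only if number AND case are both accurate.
    if response.subject_number_correct and response.subject_case_correct:
        points += 1
    if response.object_number_correct and response.object_case_correct:
        points += 1
    return points


def score_production_test(responses):
    """Sum item scores over the critical items of one test."""
    return sum(score_production_item(r) for r in responses)


N_CRITICAL_ITEMS = 10                       # critical items on the production test
MAX_PRODUCTION_SCORE = 3 * N_CRITICAL_ITEMS  # 30, as stated in the text

perfect = ProductionResponse(True, True, True, True, True)
wrong_subject_case = ProductionResponse(True, True, False, True, True)
```

Under this rule, a response with the right number but the wrong case on the subject noun loses the whole point for that noun, so it scores 2 rather than 2.5.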
IV Analysis and results

All analyses were run with alpha set at .05. Independent analyses were conducted for
each of the four language tests, for both accuracy and RT.12 In addition, we conducted
separate analyses on trained and untrained items. Results for accuracy are presented first,
followed by analyses of RT data.
1 Accuracy across all items (trained + untrained)

Table 1 summarizes descriptive statistics for accuracy on each test by treatment group
(NE+MI, NE-MI). Visual inspection suggests that, overall, both groups improved from
pre- to post-test, and that, though the NE-MI group's immediate gains were more modest,
they maintained those gains better than the NE+MI group from the immediate to the
delayed post-test.
A series of 3 2 (Time Condition) mixed repeated-measures ANOVAs were per-
formed on accuracy scores for each language test, with Condition (NE+MI, NEMI)
Table 1. Descriptive statistics: Overall accuracy.
Test Group n Pre-test Post-test Delayed test
Mean SD Mean SD Mean SD

WI NE+MI 33 5.03 2.02 10.03 1.91 8.76 2.85
NEMI 24 5.25 1.15 7.08 2.37 7.46 1.95
AI NE+MI 31 5.00 1.95 9.35 2.30 8.06 3.02
NEMI 25 5.16 1.54 6.76 2.15 6.64 2.48
GJ NE+MI 31 4.16 1.95 6.93 2.99 5.71 3.05
NEMI 24 3.96 2.03 4.87 1.96 5.79 2.26
SP NE+MI 26 13.81 3.63 18.81 6.18 17.35 5.04
NEMI 24 13.92 4.64 14.04 4.90 14.25 4.09
Notes. WI = written interpretation, AI = aural interpretation, GJ = grammaticality judgment,

SP = sentence production. The number of participants was different for each test due to technical prob-
lems (some computers froze during post or delayed tests).

Table 2. Summary of results for ANOVA on overall accuracy scores.
Test Time/Condition/ Group df Sum of Mean F p Partial Power

squares squares
WI Time 2 383.18 191.59 57.79 .00* .51 1.00
Condition 1 75.09 75.09 10.55 .002* .16 .89
Time group 2 69.71 34.85 10.51 .00* .16 .99
AI Time 2 268.03 134.01 48.77 .00* .48 1.00
Condition 1 68.71 68.71 6.56 .01* .11 7.11
Time group 2 52.91 26.46 9.63 .00* .15 .98
GJ Time 2 113.42 56.71 14.25 .00* .21 .99
Condition 1 21.46 21.46 2.11 .15 .04 .30
Time group 2 36.62 18.31 4.60 .01* .08 .77
SP Time 2 178.16 89.08 6.25 .003* .12 .89
Condition 1 250.07 250.07 6.05 .02* .11 .67
Time group 2 153.20 76.60 5.37 .006* .10 .83
Notes. WI = written interpretation, AI = aural interpretation, GJ = grammaticality judgment, SP = sentence

production. * Indicates statistical significance at p < .05.
entered as the between-participants factor, and Time (pre-test, post-test, delayed test)
entered as the within-participants factor. In addition, when a significant Time × Condition
interaction was present, independent samples t-tests were conducted to compare groups'
performance on post- and delayed tests. When no statistically significant interaction was
found, post-hoc contrasts were analysed as a means to explore differences in groups'
immediate learning (performance from pre-test to post-test) and retention (performance
from post-test to delayed test).
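The two follow-up quantities defined above (immediate learning and retention) can be illustrated with a minimal sketch. It uses only the written interpretation group means reported in Table 1; the names `WI_MEANS` and `gains` are hypothetical, and the actual contrasts were computed on participant-level scores rather than on these rounded means.

```python
# Illustrative sketch only: immediate learning and retention as differences
# between group means. Values are the written interpretation (WI) means from
# Table 1; the names WI_MEANS and gains are hypothetical.

WI_MEANS = {
    "NE+MI": {"pre": 5.03, "post": 10.03, "delayed": 8.76},
    "NEMI": {"pre": 5.25, "post": 7.08, "delayed": 7.46},
}


def gains(means):
    """Return (immediate learning, retention) for one group's mean scores."""
    immediate = means["post"] - means["pre"]      # pre-test to post-test
    retention = means["delayed"] - means["post"]  # post-test to delayed test
    return round(immediate, 2), round(retention, 2)


for group, means in WI_MEANS.items():
    print(group, gains(means))
```

A positive retention value indicates continued improvement after the post-test, and a negative value indicates loss.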
Independent samples t-tests performed on the pretest scores yielded no differences
between the groups for any of the tests: written interpretation, t(55) = .478, p = .635;
aural interpretation, t(54) = .334, p = .740; GJT, t(53) = .376, p = .709; and sentence
production, t(48) = .093, p = .926. Therefore, any differences between groups across time
can be attributed to the treatment.
ANOVA results for overall accuracy are summarized in Table 2. As the table shows,
results on all four Latin tests followed an almost identical pattern: significant main
effects for Time, Time × Condition interactions, and main effects for Condition on all
tests except the GJT.
Results of independent samples t-tests showed that the NE+MI group outperformed the
NEMI group at post-test on all four test types (written interpretation, t(55) = 5.19, p < .001;
aural interpretation, t(54) = 4.32, p < .001; GJT, t(53) = 2.92, p < .05; sentence production,
t(48) = 3.005, p < .05), but this advantage remained at the delayed test only for sentence
production (written interpretation, t(55) = 1.92, p = .059; aural interpretation, t(54) = 1.90,
p = .063; GJT, t(53) = .011, p = .913; sentence production, t(48) = 2.37, p < .05).
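As a rough check on how such between-group comparisons relate to the descriptive statistics, the sketch below recomputes an independent samples t statistic from the written interpretation post-test means, SDs and group sizes in Table 1. It assumes a pooled-variance (Student's) t-test, which the text does not state explicitly, and the function name `pooled_t` is hypothetical. Because the inputs are rounded, the result can only approximate the reported value.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, from summary statistics.

    Assumes equal variances (pooled estimate); returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# Written interpretation, post-test (Table 1): NE+MI vs. NEMI
t_value, df = pooled_t(10.03, 1.91, 33, 7.08, 2.37, 24)
print(f"t({df}) = {t_value:.2f}")
```

Under these assumptions the sketch gives a value of about 5.2 on 55 degrees of freedom, close to the reported t(55) = 5.19.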
2 Accuracy: Trained vs. untrained items

Next, we conducted separate analyses for trained and untrained items. Descriptive statis-
tics for accuracy on trained and untrained items are presented in Table 3. Again,

Table 3. Descriptive statistics: accuracy for trained and untrained items.

Test  Items  Treatment  n     Pre-test       Post-test      Delayed test
                              Mean   SD      Mean   SD      Mean   SD
WI    T      NE+MI      33    2.94   1.41    5.15   1.00    4.61   1.48
             NEMI       24    3.00   .83     4.00   1.44    4.17   1.00
      U      NE+MI      33    2.09   .98     4.87   1.22    4.15   1.66
             NEMI       24    2.25   .79     3.08   1.41    3.29   1.30
AI    T      NE+MI      31    2.48   .99     4.93   1.20    4.06   1.75
             NEMI       25    2.96   1.13    3.28   1.27    3.36   1.32
      U      NE+MI      31    2.51   1.26    4.42   1.65    4.00   1.61
             NEMI       25    2.20   .96     3.48   1.19    3.28   1.57
GJ    T      NE+MI      31    1.64   1.05    2.61   1.11    2.10   1.19
             NEMI       24    1.75   1.03    1.83   1.00    2.29   1.27
      U      NE+MI      31    2.52   1.61    4.32   2.27    3.61   2.23
             NEMI       24    2.20   1.67    3.04   1.52    3.50   1.41
SP    T      NE+MI      26    6.42   2.16    9.15   3.74    8.50   3.30
             NEMI       24    6.29   2.58    6.71   2.42    6.29   2.01
      U      NE+MI      26    7.38   1.88    9.65   2.81    8.85   2.50
             NEMI       24    7.62   2.55    7.33   2.94    7.96   2.37

Notes. WI = written interpretation, AI = aural interpretation, GJ = grammaticality judgment, SP = sentence
production, T = trained, U = untrained.
independent samples t-tests performed on the pretest scores yielded no differences

between groups for any test with either trained (written interpretation, t(55) = .187, p =
.852, aural interpretation, t(54) = 1.67, p = .101, GJT, t(53) = .37, p = .713, and sentence
production, t(48) = .20, p = .845), or untrained items (written interpretation, t(55) = .65,
p = .516, aural interpretation test, t(54) = 1.03, p = .305, grammaticality judgment, t(53)
= .69, p = .492, and sentence production, t(48) = .38, p = .704). Therefore, any differ-
ences between groups across time can be attributed to the treatment.
a Trained items. Results for trained items mirrored those for overall accuracy as reported
above; i.e. the NE+MI group outperformed the NEMI group at the post-test, but only on
sentence production did it maintain its advantage at delayed testing. As results for perfor-
mance on untrained items revealed different patterns, we report these in more detail below.
b Untrained items. For accuracy on written interpretation, results for untrained items
yielded main effects for Time, F(2, 110) = 45.06, p < .001, partial η² = .45, and Condition,
F(1, 55) = 11.68, p < .05, partial η² = .17, and a significant Time × Condition interaction,
F(2, 110) = 11.23, p < .001, partial η² = .17. As the means in Table 3 suggest,
results of the independent samples t-tests conducted on post- and delayed test scores
showed that the NE+MI group outperformed the NEMI group and maintained its
advantage two weeks after treatment (post-test, t(55) = 5.14, p < .001; delayed, t(55) =
2.11, p < .05).

Figure 3. Grammaticality judgment accuracy means for untrained items by treatment group.
For accuracy on untrained items in the aural interpretation test, significant main
effects for Time, F(2, 108) = 30.34, p < .001, partial η² = .36, and Condition, F(1, 54) =
5.33, p = .025, partial η² = .09, were identified, and are attributed to the NE+MI group's
overall superior performance. There was no significant Time × Condition interaction,
F(2, 108) = 1.06, p = .348, partial η² = .02, power = .23 (see Note 13). Statistical contrasts yielded no
significant results for any of the test sessions: pre- to post-test, F(1, 54) = 2.17, p = .147,
partial η² = .04, power = .30; post- to delayed test, F(1, 54) = .06, p = .811, partial η² =
.001, power = .06.
For accuracy on untrained items in the GJT, the ANOVA yielded a main effect for
Time, F(2, 106) = 13.62, p < .001, partial η² = .20, but neither a main effect for
Condition, F(1, 53) = 2.15, p = .148, partial η² = .04, power = .30, nor an interaction,
F(2, 106) = 2.51, p = .086, partial η² = .04, power = .30. However, results from contrast
analyses for post- to delayed tests, F(1, 53) = 5.67, p < .05, partial η² = .10, as illustrated
in Figure 3, show that the NEMI group gained from post- to delayed test to the
point where it performed similarly to the NE+MI group, whose accuracy actually
declined in the two-week interval between the post- and delayed test sessions. These
results are confirmed with paired-samples t-tests: Whereas the NE+MI group improved
from pre- to post-test, t(30) = 4.21, p < .001, and lost significantly from post- to
delayed test, t(30) = 2.06, p < .05, the NEMI group improved significantly from pre-
to post-test, t(23) = 3.50, p < .001, and maintained those gains between post- and
delayed test, t(23) = 1.37, p = .185.

Table 4. Descriptive statistics: Overall RT in seconds.

Test  Group   n     Pre-test       Post-test      Delayed test
                    Mean   SD      Mean   SD      Mean   SD
WI    NE+MI   31    9.77   5.19    7.46   3.62    7.29   2.70
      NEMI    24    9.13   2.80    6.87   2.46    7.96   3.12
AI    NE+MI   30    8.65   4.14    6.31   2.60    7.82   3.87
      NEMI    22    8.41   2.09    6.07   2.90    7.45   2.00
GJ    NE+MI   29    5.99   3.69    5.61   2.15    5.35   2.90
      NEMI    22    6.32   2.73    4.15   2.02    3.89   1.46

Notes. WI = written interpretation, AI = aural interpretation, GJ = grammaticality judgment.
Finally, for accuracy on untrained items in the sentence production test, there was no
main effect for either Time, F(2, 96) = 3.02, p = .053, partial η² = .06, power = .79, or
Condition, F(1, 48) = 3.92, p = .053, partial η² = .08, power = .50, but there was a significant
Time × Condition interaction, F(2, 96) = 4.17, p < .05, partial η² = .08. Results of
independent samples t-tests conducted on post-, t(48) = 2.85, p < .01, and delayed tests,
t(48) = 1.28, p = .205, indicated that the NE+MI group outperformed the NEMI group
on the immediate post-test, but again this between-group difference disappeared after
two weeks. Nevertheless, as revealed by paired samples t-tests, whereas the NE+MI
group showed performance gains from pre-test to post-test, t(25) = 3.38, p < .001, and
maintained these gains on the delayed test, t(25) = 1.49, p = .150, the NEMI group did
not have any immediate gains, t(23) = .39, p = .698, nor did they improve significantly
between pre- and delayed tests, t(23) = .51, p = .618.
3 Speed/RT across all items (trained + untrained)

RT, measured in milliseconds, was calculated as the time interval between the presentation
of a test item and a participant's response via key press. Only RTs for correct
responses were entered into the analyses. Table 4 shows the descriptive statistics by treat-
ment group. Overall, the means seem to indicate that both groups responded faster as a
result of the treatment. Moreover, both groups appear to have performed similarly on all
tests except the GJT, in which the NEMI group appears to have responded faster on
both post- and delayed tests.
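The RT scoring rule described above (only correct responses enter the analysis) can be sketched as follows. The trial records, the field names `rt_ms` and `correct`, and the function name are all hypothetical; the conversion to seconds simply mirrors the units used in Table 4.

```python
from statistics import mean


def mean_correct_rt_seconds(trials):
    """Mean RT in seconds over correct trials only.

    `trials` is a list of dicts with hypothetical keys "rt_ms" (time from
    item presentation to key press, in milliseconds) and "correct" (bool).
    Returns None when there are no correct trials.
    """
    correct_rts = [trial["rt_ms"] for trial in trials if trial["correct"]]
    if not correct_rts:
        return None
    return mean(correct_rts) / 1000.0


# Made-up example data, not taken from the study
example_trials = [
    {"rt_ms": 6200, "correct": True},
    {"rt_ms": 9100, "correct": False},  # incorrect response: excluded
    {"rt_ms": 5800, "correct": True},
]
print(mean_correct_rt_seconds(example_trials))
```

One consequence of this rule is that a participant with few correct responses contributes an RT mean based on very few trials.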
As with accuracy data, we performed a series of 3 × 2 (Time × Condition) mixed
repeated-measures ANOVAs for RTs on the comprehension-based tests, with separate
analyses for all items, trained items only and untrained items only. The results for all
items are summarized in Table 5.
As Table 5 shows, two of the ANOVAs across all items for written and aural inter-
pretation RTs revealed main effects for Time, but no main effect for Condition, and no
significant interaction. The main effects for Time indicate that participants responded
differently across time. Follow-up contrast analyses (together with the mean values)
revealed faster RTs on the immediate post-tests as compared to the pre-tests for these two

Table 5. Summary of results for ANOVA on overall RT.

Test  Effect         df   Sum of squares   Mean squares   F       p       Partial η²   Power
WI    Time           2    157,943.46       78,971.73      10.26   .00*    .16          .98
      Condition      1    1,426.05         1,426.05       .07     .80     .001         .06
      Time × group   2    14,897.65        7,448.83       .97     .38     .02          .21
AI    Time           2    141,666.66       70,833.33      14.21   .00*    .22          .99
      Condition      1    1,024.24         1,024.24       .16     .69     .06          .79
      Time × group   2    136.06           68.03          .01     .99     .00          .05
GJ    Time           2    70,506.57        35,253.28      8.11    .001*   .14          .95
      Condition      1    9,951.88         9,951.81       2.41    .13     .05          .33
      Time × group   2    28,344.58        14,172.29      3.27    .04*    .06          .61

Notes. WI = written interpretation, AI = aural interpretation, GJ = grammaticality judgment. * Indicates
statistical significance at p < .05.
assessments (written interpretation, F(1, 53) = 18.50, p < .001, partial η² = .26, and aural
interpretation, F(1, 50) = 22.71, p < .001, partial η² = .31). The pattern that emerges from
RT data on the delayed tests is more complex. First, both groups maintained their initial
gains in speed on the written interpretation test, as indicated by a non-significant contrast
from post-test to delayed test, F(1, 53) = 1.05, p = .310, partial η² = .02, power = .17.
However, they slowed down significantly from post- to delayed test on the aural interpretation
test, F(1, 50) = 8.97, p < .001, partial η² = .15.
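The effect sizes reported for these contrasts can be recovered from the F ratios and their degrees of freedom through the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to the four RT contrasts above; the function name and the `contrasts` dictionary are hypothetical, and small discrepancies are possible elsewhere because the published F values are rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error) for the RT contrasts reported in the text above
contrasts = {
    "WI pre to post": (18.50, 1, 53),
    "AI pre to post": (22.71, 1, 50),
    "WI post to delayed": (1.05, 1, 53),
    "AI post to delayed": (8.97, 1, 50),
}

for label, (f_value, df_effect, df_error) in contrasts.items():
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 2))
```

For these four contrasts the recomputed values round to the reported .26, .31, .02 and .15.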
Somewhat different results were obtained for the GJT RTs. As shown in Table 5, the
analyses again yielded a significant main effect for Time and no main effect for Condition.
Contrast analyses (between pre- and post-tests and between post- and delayed tests)
revealed that all participants responded faster at post-test, F(1, 49) = 7.40, p < .01, partial
η² = .13, and maintained these RTs at delayed test, F(1, 49) = 1.15, p = .288, partial η² =
.02, power = .18. Note however that, unlike the results for the other tasks, there was a
significant Time × Condition interaction. Results of independent samples t-tests conducted
on post-, t(49) = 2.46, p < .05, and delayed test scores, t(49) = 2.23, p < .05,
revealed a difference in RT between the two groups. This difference was due to reliably
faster performance by the NEMI group, as revealed in the only significant difference,
found in results of the paired-samples t-test conducted between pre- and post-tests
(NE+MI, t(28) = .55, p = .585; NEMI, t(23) = 3.74, p < .01) (see also Figure 4).
4 Speed/RT: Trained vs. untrained items

Descriptive statistics for RTs on trained and untrained items are presented in Table 6.
ANOVA results generally mirrored those for overall accuracy reported above, i.e. significant
main effects for Time, Time × Condition interactions, and main effects for Condition
on all tests except the GJT; with independent samples t-tests revealing that NE+MI per-
formed faster than NEMI on post-tests. The only exception to this pattern in RTs was
for untrained items on the aural interpretation test, with a non-significant main effect for

Figure 4. Grammaticality judgment response time means by treatment group.

Note. Smaller values indicate faster processing.
Time, F(2, 102) = 1.82, p = .168, partial η² = .03, power = .37. Also, there was no significant
Time × Condition interaction for RTs from untrained and trained items on the GJT
(trained, F(2, 78) = .70, p = .500, partial η² = .02, power = .16; untrained, F(2, 88) =
1.36, p = .263, partial η² = .03, power = .28).
To summarize, input-based practice enhanced both accuracy and speed of response
irrespective of type of feedback. This was true across all items as well as for trained items
on all tests. Results showed that participants who received negative evidence with meta-
linguistic information (NE+MI) consistently outperformed their NEMI counterparts on
accuracy of interpretation, sentence production, and grammaticality judgment immedi-
ately following treatment. After two weeks, the between-group differences in accuracy
had largely dissipated, with the NEMI group performing similarly to the NE+MI group
on all measures except the sentence production test. Analyses of RTs revealed that on the
GJT the NEMI group quickened their RTs over time more than the NE+MI group.
Analyses of performance on untrained items revealed a complex view of the effects
of feedback conditions on the ability to extend newly gained knowledge to items never
seen before. First, the main effects of Time show that irrespective of condition, the treatment
was beneficial, as reflected in all participants' improved performance on the interpretation
and GJTs. In addition, in the case of the interpretation tests, the NE+MI group
generally outperformed the NEMI group in accuracy (at both post- and delayed tests for
written interpretation, and overall for aural interpretation). On the GJT, the NEMI and

Table 6. Descriptive statistics: RTs for trained and untrained items in seconds.

Test  Items  Treatment  n     Pre-test        Post-test      Delayed test
                              Mean    SD      Mean   SD      Mean   SD
WI    T      NE+MI      30    9.00    4.01    7.13   4.03    6.78   1.97
             NEMI       24    9.46    3.00    6.96   2.70    7.83   3.30
      U      NE+MI      31    10.21   7.28    7.73   3.60    7.69   3.73
             NEMI       24    8.71    3.37    7.00   2.58    8.19   3.64
AI    T      NE+MI      30    8.86    4.91    6.96   1.76    8.12   4.60
             NEMI       24    8.65    2.25    6.77   2.56    7.18   2.70
      U      NE+MI      29    8.11    3.26    7.38   2.49    7.57   3.35
             NEMI       24    7.94    3.08    7.20   2.18    7.57   2.11
GJ    T      NE+MI      23    6.05    5.29    5.50   2.13    4.91   2.42
             NEMI       18    6.16    3.33    4.09   1.87    4.06   1.91
      U      NE+MI      26    6.04    3.78    5.45   2.41    5.20   3.22
             NEMI       20    5.84    2.10    4.05   2.40    3.58   1.59

Notes. WI = written interpretation, AI = aural interpretation, GJ = grammaticality judgment, T = trained,
U = untrained.
NE+MI groups' performance was not significantly different across time. For sentence
production, the NE+MI group showed an advantage over the NEMI group at the post-test,
but this difference had dissipated by the time of the delayed test. Results show a lack
of overall trade-off effects as speed increased with accuracy on all tests except for aural
interpretation, for which neither condition seemed to alter participants' speed of response.
V Discussion
This study investigated the differential effects of negative evidence with or without metalinguistic
information on initial language development as reflected in learners' ability,
in terms of accuracy and reaction time, to assign semantic functions to noun phrases.
Specifically, we looked at learners' ability to accurately and efficiently interpret, judge
and produce sentences in Latin, a language that relies on noun case (and verb agreement)
morphology to convey "who does what to whom". Mindful of the limitations identified
in the relatively small number of studies similar to ours, we implemented a design that
gave a fair chance to a less explicit treatment group (negative evidence without metalin-
guistic information) by controlling time on task (amount of practice) and by avoiding
testing materials that favor planning and explicit processing. Recall that these factors
may have unduly favored more explicit treatments in previous research. Moreover, by
conducting separate analyses on accuracy and RT, and on trained and untrained items, we
were uniquely able to compare groups' initial language development in terms of both
accuracy and efficiency, as well as their item and system learning. Several interesting
conclusions emerge from the results reported above.
In line with DeKeyser's (2003), Li's (2010), Norris and Ortega's (2000), and Spada
and Tomita's (2010) reviews and meta-analyses, as well as studies by Bowles (2005),
Nagata (1993), Nagata and Swisher (1995), and Rosa and Leow (2004), our results sug-
gest that participants who received a combination of negative evidence and metalinguis-
tic explanations outperformed those who received only negative evidence on immediate
post-tests. However, these results diverge from those of Camblor (2006), Hsieh (2007),
Moreno (2007), Sanz (2004), Sanz and Morgan-Short (2004), where there was no differ-
ence between conditions with and without metalinguistic explanation in feedback. An
explanation for this difference may be found in the processing demands made by the
target forms, in learner readiness, and in learner proficiency level.
With regard to the target forms, our participants were required to (1) rely on case
morphology over word order and verbal agreement, (2) process eight noun endings (case
morphemes), (3) attach them to the right noun (in the case of sentence production), and
(4) establish verbal number agreement. This process involves far more than the four
forms required in Moreno (2007), Sanz (2004), and Sanz and Morgan-Short (2004),
where the choice of target form (Spanish object clitics) was also guided by Competition
Model principles (Bates & MacWhinney, 1989). Furthermore, the case morphemes are
not perceptually salient, as they are monosyllabic and appear at the end of the word.
Moreover, there are multiple forms (feminine, masculine, singular and plural) to
encode the same function, accusative or nominative, which contributes to form complex-
ity. This all makes assigning semantic functions one of the most difficult aspects of Latin,
as any teacher or learner will attest. The provision of metalinguistic information along
with negative evidence may have benefited learners in creating form/meaning connec-
tions with such cognitively demanding target forms, as suggested by results in all four
measures, at least at the time of the post-test.
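To make the processing demands described above concrete, the following sketch assigns subject and object roles from case morphology alone, ignoring word order. The eight noun endings and the two verb endings are those listed in Note 5. Everything else is an assumption for illustration: the function name, the simple suffix-matching strategy, and the example words (puella, amicus, videt), which are not taken from the study materials.

```python
# Illustrative sketch: role assignment from case endings, ignoring word order.
# Endings follow Note 5; the example words and all names are hypothetical.

NOUN_ENDINGS = {
    "us": ("subject", "masculine", "singular"),
    "i": ("subject", "masculine", "plural"),
    "um": ("object", "masculine", "singular"),
    "os": ("object", "masculine", "plural"),
    "a": ("subject", "feminine", "singular"),
    "ae": ("subject", "feminine", "plural"),
    "am": ("object", "feminine", "singular"),
    "as": ("object", "feminine", "plural"),
}


def assign_roles(words):
    """Map each word of a three-word sentence to subject, object or verb."""
    roles = {}
    for word in words:
        # Longer endings are tried first as a general precaution.
        for ending in sorted(NOUN_ENDINGS, key=len, reverse=True):
            if word.endswith(ending):
                role = NOUN_ENDINGS[ending][0]
                roles[role] = word
                break
        else:
            # No noun ending matched: treat "-t" / "-nt" forms as the verb.
            if word.endswith("t"):
                roles["verb"] = word
    return roles


# Object-subject-verb order: an English-like word order strategy would
# wrongly take the first noun as the subject.
print(assign_roles(["amicum", "puella", "videt"]))
```

The point of the sketch is that the only reliable cue sits in a short word-final ending, which is what makes the form/meaning mapping demanding for learners whose L1 relies on word order.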
It has also been suggested that learner readiness and level of proficiency may influ-
ence feedback effects (e.g. Iwashita, 2003; Mackey & Philp, 1998). Nagata (1993),
Nagata and Swisher (1995) and Rosa and Leow (2004) examined second-year language
learners; other studies (Hsieh, 2007; Moreno, 2007; Sanz & Morgan-Short, 2004)
included participants with a basic level of proficiency, while our study looked at absolute
beginners. A recent study by Morgan-Short, Sanz, Steinhauer, and Ullman (2010) simi-
larly revealed that naive learners had an initial advantage in performance under a more
explicit condition with metalinguistic information, as compared to a less explicit condi-
tion without metalinguistic information, in learning an artificial language. Thus, at least
with respect to existing laboratory-based research on the effects of more and less explicit
instruction on SLA, it appears that providing metalinguistic rules gives at least an initial
advantage to naive learners when the processing of linguistic forms is cognitively
demanding.
The separate analyses we conducted of untrained and trained items showed an inter-
esting pattern of maintenance and loss of gains depending on type of test and item. First,
trained items analysed alone patterned with all items analysed together. In both analyses,
on all tasks, the NE+MI group had an initial advantage over the NEMI; however, this
advantage was lost on delayed tests in all cases except for sentence production, due in
part to the modest but stable gains made by the group receiving negative evidence only
(NEMI). Note that the sentence production test required transfer of skills from compre-
hension/input-based practice to the ability to produce the target form. Thus, similar to
Stafford et al. (2012), while comprehension-based practice with negative evidence alone
was effective, exposure to metalinguistic information in feedback provided an advantage
in retention when transfer from input- to output-based skills was involved. Note that, as
suggested by an anonymous reviewer, the production test may have favored the use of
metalinguistic knowledge, which may explain the maintenance of gains on this test for
the group exposed to metalinguistic information.
In order to consider the differential effects of feedback on item learning vs. system
learning, we must examine the pattern of results for untrained items only, which would
indicate degree of success at system learning. An advantage for metalinguistic information
was observed in increased accuracy for untrained items on the interpretation tests (in post
and delayed tests in written interpretation, and overall in aural interpretation), which were
precisely the tests that most closely resembled the way practice was delivered in the study,
i.e. matching one of two pictures with a Latin sentence where the key information is "who
does what to whom". We interpret these results as evidence that metalinguistic information
has a significant, positive effect on the realignment of processing cues and on retention (as
evidenced for written interpretation) of the new hierarchy, or learning in Competition
Model terms. In contrast, both types of feedback positively affected the learners' ability to
assign semantic functions to nouns in untrained items on the GJT. However, in this case,
metalinguistic information did not contribute over and above what negative evidence alone
contributed to retention of knowledge gained. Finally, metalinguistic information con-
ferred an initial advantage for performance on untrained items in sentence production, but
this advantage was not retained over the two-week interval between the immediate and
delayed post-tests. We suspect that learners' cue hierarchy restructuring towards reliance
on verbal agreement and/or on noun case morphology rather than on word order was not
stable enough to allow for transfer to new skills or modalities.
Going back to the main research question guiding the study (i.e. whether negative
evidence plus metalinguistic information vs. negative evidence alone included in computer-delivered
task-essential practice differentially affects absolute beginners' ability to
assign semantic functions in Latin), the answer is affirmative. Practice with feedback
that includes grammatical explanation provides an advantage for system learning defined
as cue realignment used to process new items. This advantage was longer-lasting for tests
similar to the learning task than for tests that require transfer of skills (i.e. from compre-
hension/input-based practice to sentence production).
From the more accurate and faster performance observed overall in both groups on
interpretation, judgment and sentence production measures, we conclude that task-essen-
tial practice and either type of feedback (i.e. negative evidence with or without metalin-
guistic information) positively affected learners' short-term ability to assign semantic
functions to nouns. These results obtained irrespective of measurement and generally run
parallel to those identified in previous literature on pedagogical conditions in SLA
(Norris & Ortega, 2000; Spada & Tomita, 2010).
Until recently, the loss of effects in more explicit treatments and maintenance of
(more modest) effects in less explicit treatments (e.g. Li, 2010; Norris & Ortega, 2000,
but see Spada & Tomita, 2010) has not received much attention, for a number of reasons.
Early research was limited to immediate effects of explicit instruction, with implicit or
less explicit groups sometimes acting as controls (e.g. Herron, 1991). Also, the surpris-
ing progress in less explicit groups has sometimes been attributed to a lack of control of

external exposure, especially in early classroom research, a suspicion that has no place
in our study given its controlled laboratory design. To what, then, can we attribute the
apparently more stable gains of the NEMI group, compared with the losses on some
measures by the NE+MI group between immediate and delayed tests? This difference in
stability of accuracy gains over time may reflect qualitatively different learning pro-
cesses at work: more explicit processes in the NE+MI group and more implicit processes
in the NEMI group (Li, 2010). The problem is that immediate post-tests cannot reflect
the full extent of implicit learning because implicit learning takes more time than explicit
learning and includes a latent phase of experience-triggered memory consolidation fol-
lowing practice (see, for example, Ari-Even Roth, Kishon-Rabin, Hildesheimer, & Karni,
2005). Such consolidation processes would occur subsequent to and thus not be captured
by performance on immediate posttests.
Faster performance by the NEMI group on the GJT also suggests that the two groups
may have been engaged in qualitatively different processing that led to quantitatively
similar accuracy outcomes, providing further evidence that different types of instruction
may have led to different types of processing in this sample. The faster processing of the
NEMI group on the GJT is reminiscent of R. Ellis's (2005) claim that a timed GJT may
be a good measure of implicit knowledge. Faster RTs are usually interpreted as a sign of
increased automaticity, whereas slower RTs are taken to index reliance on slower, con-
trolled processes, including monitoring. For example, Sanz, Lin, Lado, Bowden and
Stafford (2009) found that participants who were required to verbalize their thought
processes while interacting with a grammar lesson were slower at the post-test, which the
authors interpreted as showing that verbalizations had affected the quality of the cogni-
tive processes involved in learning for participants in the more explicit condition.
Irrespective of whether faster reaction times index increased automatic processing
or decreased monitoring, they are considered a sign of efficiency as long as accuracy
is maintained. Thus, the NEMI group appears to have
been engaged in more efficient, potentially more automatic and less monitored process-
ing of the L3 on the GJT even when accuracy was similar.
Evidence for different cognitive processes underlying similar performance was also
found by Morgan-Short et al. (2010), a study that employed an artificial language para-
digm to examine whether explicit and implicit training differentially affect neural (elec-
trophysiological) and behavioral (performance) measures of syntactic processing
(noun–article and noun–adjective gender agreement). Explicit training conditions in this
study included metalinguistic explanations and meaningful examples of the target lan-
guage, and implicit training conditions provided only meaningful examples. Results
showed that at high proficiency (i.e. when participants had completed a certain number
of practice blocks), accuracy for the explicitly and implicitly trained groups did not dif-
fer. In contrast, electrophysiological (event-related potential, ERPs) measures revealed
striking differences between the groups' neural activity: While explicit training (with
metalinguistic information) resulted in some aspects of brain processing found in native
speakers, only implicit training (without metalinguistic information) led to a fully native-
like neurocognitive pattern.
Overall, our results suggest that exposure to both task-essential practice and either
type of feedback (negative evidence with or without metalinguistic information) is
beneficial for L2 learning. This is true both in terms of accuracy and speed of processing.
Even without metalinguistic information, learning occurs, although it appears to require
more exposure to the target language (cf. N. Ellis, 1993) and more time for consolidation
(Ari-Even Roth et al., 2005). We speculate that with more practice and/or more time for
consolidation, participants exposed to negative evidence only might achieve levels of
performance at least comparable and perhaps superior to those obtained by their counter-
parts, and their gains in achievement would eventually include productive skills.
VI Conclusions and future research

In line with previous studies investigating the effectiveness of negative evidence with
or without metalinguistic information, we conclude that both types of feedback appear
to lead to more accurate and faster ability to interpret, judge, and produce target sen-
tences for naive learners of Latin learning how to interpret and assign semantic func-
tions to nouns. Providing metalinguistic information gives an initial advantage to naive
learners on accuracy when processing the target form is cognitively demanding or when
transfer from input- to output-based skills is involved. Two weeks are enough, however,
to see most of that advantage disappear, as participants apparently lose much of what
they had learned from exposure to metalinguistic information. This is especially evident
when processing items that were part of the treatment and could therefore be remem-
bered (and, consequently, be susceptible to forgetting). Importantly, evidence of more
stable (though more modest in the short run) gains by learners receiving negative evi-
dence alone suggests that simple right/wrong feedback combined with task-essential,
written and aural comprehension-based practice that focuses on connecting form with
meaning leads to sustained gains. It may be that receiving negative evidence alone
allows learners to engage in more implicit processing that is evidenced in quicker reac-
tion times and that may, in the long run, foster more stable learning than explicit
processing.
Limitations to take into account when considering the interpretation of results pre-
sented above include the number of items included in each test (12 in comprehension-
based and 10 in production) and the size of the partial η² found for some of the
interactions (see Tables 2 and 5). Future research should address these limitations by
implementing tests with more items and, whenever possible, by including more than
three options per item.
In addition, future research should investigate the differential effects of feedback
given other linguistic structures, languages, tasks, and instructional contexts, but we
would like to underscore the importance of looking at treatment length/amount of expo-
sure and time elapsed between tests to see if, after a longer period of time and/or addi-
tional exposure, learners who are exposed to less explicit feedback continue to improve
to eventually outperform those exposed to feedback with explicit, metalinguistic
information.
Acknowledgements
We would like to acknowledge the two anonymous reviewers for their constructive comments. In
addition, we thank Alison Mackey for her helpful insights and Rusan Chen for his statistical exper-
tise. Any remaining errors are ours.

Funding
This study is part of The Latin Project, developed to investigate the relationship between individ-
ual differences and pedagogical variables in language acquisition with support from Georgetown's
GSAS and Spencer Foundation grants to Sanz, as well as assistance from Bill Garr and RuSan
Chen of Georgetown's UIS/CNDLS. Lado conducted the experiment with materials developed by
Sanz, Bowden, and Stafford, analyzed the data, and wrote the manuscript with Sanz, with subse-
quent extensive review by Bowden and Stafford.
Notes
1. Although explicitness in feedback is often operationalized as a dichotomy, it is generally considered
to be a continuum that ranges from more explicit to more implicit. As stated by Sanz and
Morgan-Short (2005), "the more metalinguistic the learning condition, the more explicit it is; the
more naturalistic the learning condition, the more implicit it is considered to be" (p. 235).
2. Finally, studies employing computer-mediated communication (CMC) have also investi-
gated the effect of degree of explicitness of feedback on SLA. These studies, however,
differ from CALL studies in the nature of the feedback provided. Specifically, implicit
feedback in CMC is given through clarification requests or recasts much like in interaction
research, whereas explicit feedback consists of negative evidence, i.e. specific, overt cor-
rections that may or may not include metalinguistic information on the target form (Mackey
& Abbuhl, 2005).
3. Although this percentage may seem high, half of the test sentences were presented in SVO
word order, which may have aided educated guessing on the pre-test, thereby inflating scores.
4. Participants had extensive experience using computers but had no experience with this type of
language learning approach in their classrooms (receiving feedback through the computer).
5. The noun endings present in the study were as follows: subject masculine singular (-us) and
plural (-i); object masculine singular (-um) and plural (-os); subject feminine singular (-a)
and plural (-ae); object feminine singular (-am) and plural (-as). The verb endings were -t for
the singular and -nt for the plural. The number of sentences with each possible word order
combination (SOV, SVO, VOS, VSO, OVS and OSV) was balanced; therefore, learners were
not able to use word order (and English's strong SVO order cue in particular) as a reliable
cue to assigning thematic roles. Also, noun–verb agreement was a reliable cue only when one
of the two nouns in a sentence was singular and the other was plural (so that only one noun,
namely the subject, agreed with the verb). Again, items resulting from the different combina-
tions were balanced. Finally, animacy, a potentially informative cue to agency, was controlled
by including only animate nouns in the treatment.
6. The purpose of the vocabulary lesson was to teach absolute beginners of Latin basic nouns
and verbs that appeared later in the practice and tests.
7. Given that the target form was case assignment (who does what to whom), students were
taught more nouns than verbs in order to be able to provide sentences with a variety of sub-
jects and objects.
8. We chose not to include a 'no feedback' condition because of the importance given to feed-
back on both theoretical and practical levels, and because such a condition would likely be
extremely frustrating for absolute beginners, given especially that participants received no
explicit instruction before interacting with the practice activities. Such a condition could clearly
be provided, however, if one wished to investigate the effects of practice without feedback.
9. We renamed our conditions after a comment made by one anonymous reviewer who noted that
our treatments differ in the availability of metalinguistic information vs. negative evidence.
Nevertheless, as mentioned by the same reviewer, our NE-MI condition also includes double
exposure to positive evidence.

10. Feedback provided after an incorrect response was similar to that presented in Figures 1 and 2
but included a negative word such as 'nope' or 'oops' instead of a positive word such as 'right'.
11. Each test was preceded by a mini vocabulary quiz to ensure vocabulary knowledge specific
to that test. When participants scored less than 60% on the vocabulary quiz, they repeated the
vocabulary lesson until they reached 100%.
12. As suggested by an anonymous reviewer, given the nature of the production test (more time-
consuming than the rest and with no time limit), we do not report reaction time for this test.
13. Since observed power is a consideration mainly for non-significant results, as low power is
often an indication of small sample size, we only report power in this case.
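The item design described in Note 5 can be sketched in code. The noun and verb endings and the six word orders below are taken from that note; the stems (puell-, domin-, voca-) and all function names are hypothetical illustrations and are not drawn from the study materials.

```python
from itertools import permutations

# Noun endings as listed in Note 5, keyed by (role, gender, number).
NOUN_ENDINGS = {
    ("subject", "masculine", "singular"): "us",
    ("subject", "masculine", "plural"): "i",
    ("object", "masculine", "singular"): "um",
    ("object", "masculine", "plural"): "os",
    ("subject", "feminine", "singular"): "a",
    ("subject", "feminine", "plural"): "ae",
    ("object", "feminine", "singular"): "am",
    ("object", "feminine", "plural"): "as",
}

# Verb endings as listed in Note 5; the verb agrees with the subject.
VERB_ENDINGS = {"singular": "t", "plural": "nt"}

# The six word orders balanced across items: SOV, SVO, VOS, VSO, OVS, OSV.
WORD_ORDERS = sorted("".join(p) for p in permutations("SOV"))


def build_sentence(subj, obj, verb_stem, order):
    """Inflect and order a sentence.

    subj and obj are (stem, gender, number) tuples; order is e.g. "OVS".
    """
    s_stem, s_gender, s_number = subj
    o_stem, o_gender, o_number = obj
    words = {
        "S": s_stem + NOUN_ENDINGS[("subject", s_gender, s_number)],
        "O": o_stem + NOUN_ENDINGS[("object", o_gender, o_number)],
        "V": verb_stem + VERB_ENDINGS[s_number],
    }
    return " ".join(words[slot] for slot in order)


def agreement_identifies_subject(subj_number, obj_number):
    """Agreement is a reliable cue only when the two nouns differ in number."""
    return subj_number != obj_number


if __name__ == "__main__":
    subj = ("puell", "feminine", "singular")  # hypothetical stem
    obj = ("domin", "masculine", "plural")    # hypothetical stem
    for order in WORD_ORDERS:
        print(order, "->", build_sentence(subj, obj, "voca", order))
```

Because case is marked on the noun endings, the thematic roles stay recoverable in all six orders, whereas word order alone does not identify the subject.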
References
Ari-Even Roth, D., Kishon-Rabin, L., Hildesheimer, M., & Karni, A. (2005). A latent consolida-
tion phase in auditory identification learning: Time in the awake state is sufficient. Learning
and Memory, 12, 159–164.
Bates, E., & MacWhinney, B. (1989). Functionalism and the Competition Model. In B.
MacWhinney & E. Bates (Eds.), The cross-linguistic study of sentence processing (pp. 3–13).
New York: Cambridge University Press.
Bowles, M. (2005). Effects of verbalization condition and the type of feedback on L2 development
in a CALL task. Unpublished dissertation, Georgetown University, Washington, DC, USA.
Camblor, M.T. (2006). Type of written feedback, awareness, and L2 development: A computer-
based study. Unpublished dissertation, Georgetown University, Washington, DC, USA.
Carroll, S., & Swain, M. (1993). Explicit and implicit negative feedback: An empirical study
of the learning of linguistic generalizations. Studies in Second Language Acquisition, 15,
357–386.
DeKeyser, R. (2003). Implicit and explicit learning. In C. Doughty & M. Long (Eds.), The hand-
book of second language acquisition (pp. 313–347). Oxford: Blackwell.
Ellis, N. (1993). Rule and instances in foreign language learning: Interactions of explicit and
implicit knowledge. European Journal of Cognitive Psychology, 5, 289–318.
Ellis, N. (2005). At the interface: Dynamic interactions of explicit and implicit language knowl-
edge. Studies in Second Language Acquisition, 27, 305–352.
Ellis, R. (2005). Measuring implicit and explicit knowledge of a second language: A psychometric
study. Studies in Second Language Acquisition, 27, 141–172.
Herron, C. (1991). The garden path correction strategy in the foreign language classroom. The
French Review, 64, 966–977.
Hsieh, H. (2007). Input-based practice, feedback, awareness and L2 development through a com-
puterized task. Unpublished dissertation, Georgetown University, Washington, DC, USA.
Iwashita, N. (2003). Negative feedback and positive evidence in task-based interaction: Differential
effects on L2 development. Studies in Second Language Acquisition, 25, 1–36.
Jiang, N. (2007). Selective integration of linguistic knowledge in adult second language learning.
Language Learning, 57, 1–33.
Keck, C.M., Iberri-Shea, G., Tracy-Ventura, N., & Wa-Mbaleka, S. (2006). Investigating the
empirical link between task-based interaction and acquisition: A meta-analysis. In J.M. Norris
& L. Ortega (Eds.), Synthesizing research on language learning and teaching (pp. 91–131).
Amsterdam: John Benjamins.
Lee, J., & VanPatten, B. (2003). Making communicative language happen. New York: McGraw
Hill.
Leeman, J. (2003). Recasts and second language development: Beyond negative evidence. Studies
in Second Language Acquisition, 25, 37–63.
Li, S. (2010). The effectiveness of corrective feedback in SLA: A meta-analysis. Language
Learning, 60, 309–365.
Loschky, L., & Bley-Vroman, R. (1993). Grammar and task-based learning. In G. Crookes &
S. Gass (Eds.), Tasks and language learning: Integrating theory and practice (pp. 123–167).
Clevedon: Multilingual Matters.
Mackey, A. (1999). Input, interaction, and second language development: An empirical study of
question formation in ESL. Studies in Second Language Acquisition, 21, 557–587.
Mackey, A., & Abbuhl, R. (2005). Input and interaction. In C. Sanz (Ed.), Mind and context in
adult second language acquisition (pp. 207–233). Washington, DC: Georgetown University
Press.
Mackey, A., & Goo, J. (2007). Interaction research in SLA: A meta-analysis and research synthe-
sis. In A. Mackey (Ed.), Conversational interaction in second language acquisition: A collec-
tion of empirical studies (pp. 407–452). Oxford: Oxford University Press.
Mackey, A., & Philp, J. (1998). Conversational interaction and second language development:
Recasts, responses, and red herrings? The Modern Language Journal, 82, 338–356.
Mackey, A., Gass, S., & McDonough, K. (2000). How do learners perceive interactional feed-
back? Studies in Second Language Acquisition, 22, 471–497.
MacWhinney, B. (2001). The competition model: The input, the context, and the brain. In P.
Robinson (Ed.), Cognition and second language instruction (pp. 69–90). New York:
Cambridge University Press.
McDonough, K., & Mackey, A. (2006). Responses to recasts: Repetitions, primed production, and
linguistic development. Language Learning, 56, 693–720.
Moreno, N. (2007). The effects of type of task and type of feedback on L2 development in CALL.
Unpublished dissertation, Georgetown University, Washington, DC, USA.
Morgan-Short, K., Sanz, C., Steinhauer, K., & Ullman, M.T. (2010). Second language acquisition
of gender agreement in explicit and implicit training conditions: An event-related potential
study. Language Learning, 60, 154–193.
Nagata, N. (1993). Intelligent computer feedback for second language instruction. Modern
Language Journal, 77, 330–339.
Nagata, N., & Swisher, M.V. (1995). A study of consciousness-raising by computer: The effect
of metalinguistic feedback on second language learning. Foreign Language Annals, 28,
337–347.
Newell, A. (1990). Unified theories of cognition. Cambridge, MA: Harvard University Press.
Norris, J.M., & Ortega, L. (2000). Effectiveness of L2 instruction: A research synthesis and quali-
tative meta-analysis. Language Learning, 50, 417–528.
Rosa, E., & Leow, R.P. (2004). Computerized task-based exposure, explicitness, type of feedback,
and Spanish L2 development. The Modern Language Journal, 88, 192–217.
Russell, J., & Spada, N. (2006). The effectiveness of corrective feedback for the acquisition of L2
grammar: A meta-analysis of the research. In J.M. Norris & L. Ortega (Eds.), Synthesizing
research on language learning and teaching (pp. 133–157). Philadelphia, PA: John
Benjamins.
Sachs, R., & Suh, B. (2007). Textually enhanced recasts, learner awareness, and L2 outcomes in
synchronous computer-mediated interaction. In A. Mackey (Ed.), Conversational interaction
in second language acquisition: A collection of empirical studies (pp. 199–227). Oxford:
Oxford University Press.
Sagarra, N. (2007). From CALL to face-to-face interaction: the effect of computer-delivered
recasts and working memory on L2 development. In A. Mackey (Ed.), Conversational
interaction in second language acquisition: A collection of empirical studies (pp. 229–248).
Oxford: Oxford University Press.
Sanz, C. (2004). Computer delivered implicit vs. explicit feedback in processing instruction.
In B. VanPatten (Ed.), Processing instruction: Theory, research, and commentary (pp. 241–255).
Mahwah, NJ: Lawrence Erlbaum.
Sanz, C., & Lado, B. (2008). Technology and the study of awareness. In J. Cenoz & N.H.
Hornberger (Eds.), Encyclopedia of language and education: Volume 6: Knowledge about
language (2nd ed.) (pp. 299–312). Philadelphia, PA: Springer Science+Business Media
LLC.
Sanz, C., & Morgan-Short, K. (2004). Positive evidence versus explicit rule presentation and
explicit negative feedback: A computer assisted study. Language Learning, 54, 35–78.
Sanz, C., & Morgan-Short, K. (2005). Explicitness in pedagogical interventions: Input, practice,
and feedback. In C. Sanz (Ed.), Mind and context in adult second language acquisition:
Methods, theory, and practice (pp. 234–263). Washington, DC: Georgetown University Press.
Sanz, C., Lin, H.-J., Lado, B., Bowden, H.W., & Stafford, C.A. (2009). Concurrent verbalizations,
pedagogical conditions, and reactivity: Two CALL studies. Language Learning, 59, 33–71.
Segalowitz, N. (2003). Automaticity and second language acquisition. In C. Doughty & M. Long
(Eds.), The handbook of second language acquisition (pp. 382–408). Oxford: Blackwell
Publishers.
Spada, N., & Tomita, Y. (2010). Interactions between type of instruction and type of language
feature: A meta-analysis. Language Learning, 60, 263–308.
Stafford, C.A., Sanz, C., & Bowden, H. (2012). Optimizing language instruction: Matters of
explicitness, practice, and cue learning. Language Learning, 62, 741–768.
VanPatten, B. (2005). Processing instruction. In C. Sanz (Ed.), Mind and context in adult second
language acquisition (pp. 267–281). Washington, DC: Georgetown University Press.
