
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/234472342

Language corpora and the language classroom

Chapter · January 2010


2 authors:

Pascual Pérez-Paredes, University of Cambridge
María Belén Díez-Bedmar, Universidad de Jaén


All content following this page was uploaded by Pascual Pérez-Paredes on 14 July 2014.



Pascual Pérez-Paredes & Belén Díez Bedmar (2009). Language corpora and the language classroom.
Materiales de formación del profesorado de lengua extranjera (Inglés) Murcia: Consejería de Educación y Cultura.

Language corpora and the language classroom

1. Introduction

These days, language teachers, researchers and students use language corpora more and more often. Computers have become widely available in homes and schools, many corpora can be searched free of charge on the Internet, and corpus resources have improved both the quality of, and access to, the methods of corpus linguistics in applied fields such as foreign language teaching. Compiling your own ad hoc corpus, or a corpus of your own students' language, is easier today than ever before, and free resources abound.

The most important application of corpora in language classrooms is known as Data-driven learning. Corpus Linguistics (CL) and Data-driven learning (DDL) are two terms that have caught the attention of teachers in foreign language teaching (FLT) and researchers alike for a decade now. This is because the assumptions behind CL and DDL are of enormous importance to language researchers and FL teachers. In a very recent publication, O'Keeffe, McCarthy and Carter (2007:21) state the following about the application of language corpora in FLT:

As well as providing an empirical basis for checking our intuitions about language,
corpora have also brought to light features about language which had eluded our
intuition […] In terms of what we actually teach, numerous studies have shown us that
the language presented in textbooks is frequently still based on intuitions about how
we use language, rather than actual evidence of use.

It seems that language corpora can help us question what appears undisputed in prescriptive or intuition-led textbooks and other reference materials.

In the following paragraphs, we will offer a brief account of the implications of CL and DDL
for mainstream FLT. In particular, we aim to present useful insights into how using language
corpora can help our teaching.

Most of the resources presented in this chapter are freely available on the Internet.


2. Corpus linguistics and Data-driven learning in a nutshell

2.1. Data in FLT: preliminary issues

Data-driven learning is a language learning approach that is “basically developed through self-conscious activities instead of being imparted through conceptual knowledge” (Pérez Basanta, C. and Rodríguez Martín: 146-7). In DDL, learners become active researchers: they see language from a different perspective and discover facts about language and communication that might otherwise remain unseen.

In DDL, reading concordance lines is common practice. Take the word important, a basic adjective that learners use every day at school. The following screenshot from the Collins WordbanksOnline English corpus1 shows fifty random uses of the word in a 10-million-word corpus of spoken British English:

1
http://www.collins.co.uk/Corpus/CorpusSearch.aspx


Figure 1. Sample concordances of important in the Collins WordbanksOnline English corpus.

In a way, DDL promotes vertical reading rather than horizontal reading as learners are invited
to look at the accumulated frequency and co-occurrence of lexical items. In Figure 1, learners
could note the following:

The words to the left of important: more, most, quite, awfully, very, etc.
The words to the right of important: to + infinitive, factor, thing, point, etc.
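The vertical reading described above can be reproduced on any plain text. The following Python sketch is a minimal keyword-in-context (KWIC) concordancer. It is not the software behind Collins WordbanksOnline. It tokenises the text with a simple regular expression, which is an assumption: real concordancers use more careful tokenisers. It then prints each occurrence of the node centred between its left and right context and counts the words immediately to the node's left and right. The sample sentences are invented for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; a simple regex that keeps apostrophes inside words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def kwic(tokens, node, width=4):
    """Return concordance lines with the node centred between `width` words of context."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left context so the nodes line up vertically.
            lines.append(f"{left:>30} [{tok}] {right}")
    return lines


def neighbours(tokens, node):
    """Count the words immediately to the left and to the right of the node."""
    left, right = Counter(), Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1
    return left, right


# Invented sample text, for illustration only.
sample = ("It is very important to listen. The most important thing is practice. "
          "Grammar is quite important, but vocabulary is more important to learners.")
tokens = tokenize(sample)
for line in kwic(tokens, "important"):
    print(line)
left_words, right_words = neighbours(tokens, "important")
print(left_words.most_common())
print(right_words.most_common())
```

Right-aligning the left context is what makes the nodes form a vertical column, so learners can scan the words on either side of important at a glance.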

Concordance lines are also useful for noticing language behaviour that goes beyond the boundaries of two adjacent words. Take the word sure as an example. The Cambridge Advanced Learner's Dictionary2 offers 8 entries for the word. You can find the entries and examples below:

1 certain; without any doubt:
"What's wrong with him?" "I'm not really sure."

2
http://dictionary.cambridge.org/


I'm sure (that) I left my keys on the table.
I feel absolutely sure (that) you've made the right decision.
It now seems sure (that) the election will result in another victory for the government.
Simon isn't sure whether/if he'll be able to come to the party or not.
Is there anything you're not sure of/about?
There is only one sure way (= one way that can be trusted) of finding out the truth.
See also cocksure.

2 be sure of/about sb to have confidence in and trust someone:
Henry has only been working for us for a short while, and we're not really sure about him yet.
You can always be sure of Kay.

3 be sure of yourself to be very or too confident:
She's become much more sure of herself since she got a job.

4 be sure of sth be confident that something is true:
He said that he wasn't completely sure of his facts.

5 be sure of getting/winning sth to be certain to get or win something:
We arrived early, to be sure of getting a good seat.
A majority of Congress members wanted to put off an election until they could be sure of winning it.

6 be sure to to be certain to:
She's sure to win.
I want to go somewhere where we're sure to have good weather.

7 make sure (that) to look and/or take action to be certain that something happens, is true, etc:
Make sure you lock the door behind you when you go out.

8 If you have a sure knowledge or understanding of something, you know or understand it very well:
I don't think he has a very sure understanding of the situation.

Isolated from any context, sure is usually taught as a highly assertive word, that is, one that expresses certainty, as in I'm sure I was there. Of course, there is nothing wrong with this: as you have read above, this is the usual mainstream use of the word. However, if we search for sure in a corpus, in this case the SACODEYL English corpus of European young people, we find a new pattern that emerges clearly: I'm not sure + what/if/whether. See Figure 2:


Figure 2. sure in the SACODEYL English corpus.

It appears that I'm not sure is a powerful pattern for expressing hedging or tentative opinion, as in I'm not sure if I'd like to live there. It can also be followed by a canonical Subject + Verb + Complement clause to indicate contrast or opinion, as in I'm not sure. I've always wanted to be... or I'm not sure. I find art relaxing because…
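A pattern such as I'm not sure + if/whether/what can also be counted with a regular expression. The sketch below is a hypothetical illustration, not the SACODEYL search interface. It finds each I'm not sure, with either a straight or a curly apostrophe, and checks whether the phrase closes the sentence; if it does not, the sketch records the word that follows. The sample text is invented.

```python
import re
from collections import Counter

# "I'm not sure", then optional sentence punctuation, then the next word (if any).
PATTERN = re.compile(r"\bI['’]m not sure\b([.,]?)\s*(\w*)", re.IGNORECASE)


def classify_not_sure(text):
    """Count what follows each "I'm not sure": if/whether/what, a full stop, or something else."""
    counts = Counter()
    for match in PATTERN.finditer(text):
        punct, following = match.group(1), match.group(2).lower()
        if punct == "." or not following:
            counts["stand-alone"] += 1  # e.g. "I'm not sure. I've always wanted..."
        elif following in {"if", "whether", "what"}:
            counts[following] += 1
        else:
            counts["other"] += 1
    return counts


# Invented sample utterances, for illustration only.
sample = ("I'm not sure if I'd like to live there. I'm not sure. I've always wanted to be a vet. "
          "I’m not sure what to study. I'm not sure whether he'll come.")
print(classify_not_sure(sample))
```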

As you can see, when we examine the different contexts in which a node (that is, the word you are looking up) is found, we can clearly identify patterns of use that are not always found in textbooks or dictionaries.

Corpus linguists often discuss this phenomenon and try to account for it by looking at language as a field of lexico-grammatical interplay rather than one where meaning is created by words used in isolation (e.g. sure).

Bernardini (2004:16) highlights the fact that in DDL there is a “shift of emphasis from
deductive to inductive learning routines” which has a great impact on the agents of FLT. This
is summarised in Table 1:

FLT agents           Shift
Teachers             Become coordinators of research and facilitators
Learners             Learn how to learn through exercises that involve the observation and interpretation of patterns of use
Pedagogic grammars   Are now informed by enough evidence and stimuli for the learner to arrive at developmentally-appropriate generalisations

Table 1. Shift of emphasis in DDL-FLT (Bernardini 2004: 16-7).

DDL, then, is about using data to promote richer language learning experiences. The definition needs clarification, though. The D in DDL stands for data, in other words, language data.

However, in the CL literature these data have a markedly computational reading. In the following paragraphs, we will explore the implications for language teachers and try to dispel the obscurity that the term may carry.

2.1.1. Our English teaching is mediated by language data

We may not have reflected on the issue before, but when we choose a textbook we are opting for a particular set of language data to be used in our classroom.

In all probability, you face a situation where the Education Authorities have set an official
curriculum that you are bound to abide by. In a similar way, as a member of a large
institution, you are required to follow certain general methodological guidelines. Leaving
organizational aspects aside, however, teachers have the chance to reflect on their teaching
and choose the materials that best suit their learners. What choices can you make in terms of the contents of your teaching? What are the main ingredients of your teaching? Do you stick to a textbook? If so, to what extent do you or your Department consider the language it contains? Have you examined the language used in your textbook?

This is a fundamental issue that deserves our attention. EFL teachers, like most professionals in other teaching areas, rely on solvent, reliable publishing houses that make an effort to mediate between learners and their teachers. In this process, the teacher, or group of teachers in a school, has the opportunity first to review and then to select the textbooks that will later be used. If we use language corpora as a complement to our teaching, we will broaden the range of language that we present to our students and, certainly, enrich their learning environment (Aston 1997).

But, before we move on to dealing with the ways in which we can use language corpora, let
us consider briefly the very basics of corpus linguistics.


2.2. Introducing Corpus Linguistics

Corpus linguistics (CL) makes use of data to gain insight into how language works. A well-known definition of corpus is the following:

Any collection of more than one text can be called a corpus, (corpus being Latin for
"body", hence a corpus is any body of text). But the term "corpus" when used in the
context of modern linguistics tends most frequently to have more specific connotations
than this simple definition3.

This definition is well rooted in the linguistic tradition, and thus the connotations that McEnery and Wilson bring up are concerned with the role of a corpus in a research-oriented paradigm. These connotations are:

• representativeness,
• size,
• machine-readable form and
• standard reference.

If linguists claim that using a corpus is a convenient way to research language use and behaviour, they have to make sure that their tool, that is, their language corpus, and their methodology are geared towards maximising the representativeness of the language samples included in the corpus. McEnery and Wilson put it this way:

We are therefore interested in creating a corpus which is maximally representative of the variety under examination, that is, which provides us with an as accurate a picture as possible of the tendencies of that variety, as well as their proportions. What we are looking for is a broad range of authors and genres which, when taken together, may be considered to "average out" and provide a reasonably accurate picture of the entire language population in which we are interested4.

An example of all this is the British National Corpus (BNC). The BNC claims to be representative of the English language used in the UK in the late 1980s; its size (100 million words) is big enough to include most communicative genres and text types; it is, of course, electronic; and, as a consequence of all this, it has become a standard reference for British English. The BNC is introduced on its website as follows:

The British National Corpus (BNC) is a 100 million word collection of samples of
written and spoken language from a wide range of sources, designed to represent a
wide cross-section of British English from the later part of the 20th century, both
spoken and written. The latest edition is the BNC XML Edition, released in 2007.

3
McEnery and Wilson. Corpus Linguistics. Available at
http://bowlandfiles.lancs.ac.uk/monkey/ihe/linguistics/corpus2/2fra1.htm
4
Idem.


The written part of the BNC (90%) includes, for example, extracts from regional and
national newspapers, specialist periodicals and journals for all ages and interests,
academic books and popular fiction, published and unpublished letters and
memoranda, school and university essays, among many other kinds of text. The
spoken part (10%) consists of orthographic transcriptions of unscripted informal
conversations (recorded by volunteers selected from different age, region and social
classes in a demographically balanced way) and spoken language collected in different
contexts, ranging from formal business or government meetings to radio shows and
phone-ins5.

The BNC can be searched free of charge at http://www.natcorp.ox.ac.uk/. The results are limited to 50 hits, but this is enough to give a clear idea of what we are looking into:

Figure 3. The BNC website.

However, using corpora is not the ultimate, one and only solution to linguistic inquiry and research. This is not the place to revisit the old controversy between Noam Chomsky and Charles Fillmore, two influential linguists of the second half of the 20th century. The former has overtly criticised the use of language corpora, which he does not see as a reliable way to render the complexity and vastness of language. Chomsky believed that the rules governing a language could actually be scrutinised through introspection; actual performance was considered, by contrast, something that could not be apprehended. Fillmore criticised armchair linguists who do not use real, that is, attested, language data and instead rely on their own intuition and idiolect to develop complex theories of language.

5
From http://www.natcorp.ox.ac.uk/corpus/index.xml


Incidentally, Fillmore similarly criticises corpus linguists who waste their time on design issues, but that's a different story. The point here is that there has traditionally been a controversy between introspection and data examination as valid tools for linguistic analysis. Corpus Linguistics has now gained the interest of many researchers who believe that data need to be collected before we can jump to conclusions about language use. In this sense, CL methodology is empirical and data-driven.

Corpus-based research can be then characterised by two main features (Conrad 1999:3-4):

1. The use of a principled collection of naturally-occurring texts, that is, a corpus. The BNC discussed above is an example.

2. The use of computers for language analyses. Depending on the items being analysed,
these can be automatic or may need human interaction.

Corpus-based studies include both quantitative analyses and functional interpretations of language use. The following table summarises the basics of CL:


Term                Explanation
Chunks              Groups of words that cluster together in strings of n words, e.g. 2, 3, 4, 5, etc. These are not necessarily phrases (e.g. noun phrases) or clauses, but rather words that combine in a statistically significant way. I don't know, what I really mean or a couple of are good examples of chunks.
Collocates          Words that occur frequently in contiguity or near-contiguity. To determine whether a collocate is significant, the software package performs statistical analyses.
Concordance lines   Lines of text that show a node in the middle. The node is the word or string of words being searched for in a corpus.
Concordancer        The software that generates concordance lines.
Corpus              A principled collection of texts. This collection should follow strict design guidelines if the corpus is to represent a language or a register.
Wordlist            The list of words found in a corpus or in a particular text. This list usually shows the frequency of occurrence and, possibly, other statistical indexes.

Table 2. The basics of CL.
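To make the Wordlist entry in Table 2 concrete, here is a minimal Python sketch that builds a frequency list from a text. It assumes a simple lower-case regex tokeniser; dedicated tools such as WordSmith offer many more options. Each row shows a word, its raw frequency and its share of all tokens.

```python
import re
from collections import Counter


def wordlist(text):
    """Return (rows, total_tokens); each row is (word, frequency, percentage of tokens)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    total = len(tokens)
    if total == 0:
        return [], 0
    freq = Counter(tokens)
    # most_common() sorts by frequency; ties keep the order of first appearance.
    rows = [(word, n, round(100 * n / total, 2)) for word, n in freq.most_common()]
    return rows, total


rows, total = wordlist("The cat sat on the mat. The dog sat too.")
for word, n, pct in rows:
    print(f"{word:<6}{n:>3}{pct:>7}%")
print("tokens:", total, "types:", len(rows))
```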

All these terms are commonly found in descriptive accounts of English and have very interesting potential for language learning. For example, chunks are strings of n words that cluster together in a systematic way. Linguists such as Lewis (1993) or Nattinger and DeCarrico (1992) have stressed that lexis takes priority over grammar in discourse:

Lexis is central in creating meaning, grammar plays a subservient managerial role. If you accept this principle then the logical implication is that we should spend more time helping learners develop their stock of phrases, and less time on grammatical structures6.

Corpora are useful in revealing that the language speakers use relies heavily on chunking, that is, the repetition of strings of words. O'Keeffe, McCarthy and Carter (2007:60) highlight that “language is available for use in ready-made chunks to a far greater extent than could ever be accommodated by a theory of language which rested upon the primacy of syntax”. Here are some real instances of chunking in English. These authors used the CANCODE corpus7, a 5-million-word corpus of spoken British English, to generate the most frequent chunks of n words. These are the top two chunks of each length:

                 Top 1 chunk               Top 2 chunk
3-word chunks    I don't know              a lot of
4-word chunks    you know what I           I know what I mean
5-word chunks    you know what I mean      at the end of the
6-word chunks    do you know what I mean   at the end of the day

and these are the chunks ranked 15th and 19th (chosen at random):

                 Top 15 chunk              Top 19 chunk
3-word chunks    I think it's              you know the
4-word chunks    or something like that    that sort of thing
5-word chunks    I don't know what it      an hour and a half
6-word chunks    and at the end of the     if you see what I mean (top 16)

6
Islam and Timmis: http://www.teachingenglish.org.uk/think/methodology/lexical_approach1.shtml
7
http://www.cambridge.org/elt/corpus/cancode.htm

O'Keeffe, McCarthy and Carter (2007:71) state that, despite being syntactic fragments, these chunks perform a very important pragmatic function beyond the word level and, significantly, many have a discourse-marking function (I mean, you know, you know what I mean, at the end of the day, if you see what I mean, etc.).
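Lists like the ones above can be approximated for any transcript by counting every run of n consecutive words (n-grams). The Python sketch below does only that, under simple assumptions: lower-casing, a regex tokeniser, and n-grams that may cross sentence boundaries. It does not reproduce the CANCODE methodology. The sample text is invented.

```python
import re
from collections import Counter


def ngram_counts(text, n):
    """Count every sequence of n consecutive tokens (n-grams) in the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    # zip over n staggered copies of the token list yields each window of n words.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


# Invented sample text, for illustration only.
sample = ("I don't know. You know what I mean? "
          "I don't know what I mean, at the end of the day.")
for chunk, freq in ngram_counts(sample, 3).most_common(4):
    print(freq, chunk)
```

Applied to a large transcribed corpus instead of two sentences, the top of each list tends to be dominated by discourse-marking chunks like those in the tables above.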

In the same way, a corpus can be used to generate collocates, frequency lists and, as we have seen, concordance lines. Several software packages can handle these tasks. WordSmith 5.0 (footnote 8) is probably one of the most complete suites available. Interesting non-commercial applications include:

Generate concordance lines for every word in a text:
Text-based concordances: http://www.lextutor.ca/concordancers/text_concord/

Generate chunks for a text:
N-Gram phrase extractor: http://www.lextutor.ca/tuples/eng/

Search principled corpora:
Online concordancer: http://www.lextutor.ca/concordancers/concord_e.html

Generate concordance lines, frequency lists, etc.:
Turbo Lingo: http://www.staff.amu.edu.pl/~sipkadan/lingo.htm

8
http://www.lexically.net/wordsmith/


Figure 4. Online concordancer.
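Table 2 notes that concordancers decide whether a collocate is significant by running statistical analyses. One widely used measure is mutual information (MI). MI compares how often a word actually occurs near the node with how often we would expect it to occur there by chance. The Python sketch below assumes a symmetric window and one common formulation of the expected frequency, E = f(node) × f(collocate) × window span / N, with MI = log2(O / E). Individual tools differ in the details, so treat this as an illustration rather than the formula any particular program uses.

```python
import math
import re
from collections import Counter


def collocates(text, node, window=2):
    """Map each word near `node` to (observed co-occurrences, MI score)."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    n_tokens = len(tokens)
    freq = Counter(tokens)
    observed = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            # Look `window` words to the left and right of the node, excluding the node itself.
            for j in range(max(0, i - window), min(n_tokens, i + window + 1)):
                if j != i:
                    observed[tokens[j]] += 1
    scores = {}
    for word, o in observed.items():
        # Expected co-occurrences if the word were spread randomly through the text.
        expected = freq[node] * freq[word] * (2 * window) / n_tokens
        scores[word] = (o, math.log2(o / expected))
    return scores


# Invented sample text, for illustration only.
sample = "strong tea is good. strong tea again. weak tea is fine."
ranked = sorted(collocates(sample, "tea", window=1).items(),
                key=lambda kv: kv[1][1], reverse=True)
for word, (o, mi) in ranked:
    print(f"{word:<8}{o:>3}{mi:>7.2f}")
```

With a text this small every score is similar. MI is also known to favour rare words, which is why tools commonly let users set a minimum frequency or report other measures alongside it.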


2.3. How can we make use of Corpus Linguistics? Indirect approaches

Following Geoffrey Leech, Römer (2008) distinguishes between indirect and direct applications of CL in the field of language teaching. Indirect approaches to corpora provide access to corpus-informed insights into the nature of language. Those who consume this information are typically, although not exclusively, researchers and the writers and designers of language materials. The typical users of direct approaches, by contrast, are teachers and learners themselves. The following figure summarises this dichotomy:

Figure 5. Indirect and direct applications of CL in the language classroom (Römer 2008).

Direct approaches focus on hands-on learning activities and the generation of classroom material. These hands-on experiences can be either guided or unguided by the teacher, so most teachers are likely to find tasks that suit their students' needs and contexts.

Indirect approaches to using corpora in the language classroom have occupied the agenda of applied linguists for over a quarter of a century now. These approaches have benefited from linguistic research into the nature of language and offer a fresh, non-normative view of naturally occurring language. One of the main contributions of these studies is that corpus data very often challenge our perceptions of how language works. Good examples of this are Biber (1988) and, particularly useful in the context of FLT, Biber et al. (1999):


Figure 6. Longman Grammar of Spoken and Written English (LGSWE).

The authors of the LGSWE claim that this work “describes the actual use of grammatical features in different varieties of English: mainly conversation, fiction, newspaper language, and academic prose […] The LGSWE adopts a corpus-based approach, which means that the grammatical descriptions are based on the patterns of structure and use found in a large collection of spoken and written texts, stored electronically, and searchable by computer” (Biber et al. 1999: 4). The idea here is that a well-designed corpus can help us learn more about how language works. This is useful for both native and non-native speakers, as even native speakers cannot rely on intuition alone to determine how language works across every register and communicative domain.

Let us look at one syntactic construction to illustrate the usefulness of corpora in the language classroom. Existential clauses contain, in most cases, be as the verb and there as the subject: There is no coffee is a typical example. Existential there, however, can also introduce other verbs: seem, appear, suppose and used to are good examples. How do we decide which one to use when their meanings are so close? In the LGSWE we find corpus-based information telling us that the frequency of these verbs after existential there depends on the textual and domain features of the communicative event.

Thus, there exist/exists is very frequent in academic texts but rare in conversation, fiction and news language. There come/comes, on the contrary, is infrequent in academic language, conversation and news, but very often found in fiction and creative language use. Figure 7 illustrates this point:


Figure 7. Verbs other than be in existential constructions. Biber et al. (1999).

When these and similar verbs are followed by to be, we discover interesting facts. There seem/seems to be occurs across all four domains and text types, while there used to be is atypical and rare in fiction, news and academic language:

Figure 8. To be after some verbs in some existential constructions. Biber et al. (1999).

In these examples we can note the interplay between grammatical categories and register.
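Register comparisons like these are only meaningful if frequencies are normalised, because sub-corpora differ in size. Corpus grammars such as the LGSWE typically express frequencies per million words. The Python sketch below shows that normalisation (count ÷ words × 1,000,000) on two tiny invented "registers". The texts and the resulting rates are purely illustrative and say nothing about real conversation or academic prose.

```python
import re


def per_million(count, corpus_size):
    """Normalise a raw count to occurrences per million words."""
    return count / corpus_size * 1_000_000


def rate_by_register(texts, pattern):
    """Per-million-word rate of a regex pattern in each register's text."""
    regex = re.compile(pattern, re.IGNORECASE)
    rates = {}
    for register, text in texts.items():
        n_words = len(re.findall(r"\w+(?:'\w+)?", text))
        hits = len(regex.findall(text))
        rates[register] = per_million(hits, n_words)
    return rates


# Two tiny invented "registers", for illustration only.
texts = {
    "conversation": "There used to be a shop here. There seems to be a problem with it.",
    "academic": "There seems to be little evidence. There exist several models of this process.",
}
print(rate_by_register(texts, r"\bthere (?:seem|seems|seemed) to be\b"))
print(rate_by_register(texts, r"\bthere used to be\b"))
```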


3. Direct approaches

As stated, direct approaches lend themselves more readily to immediate, straightforward classroom applications. In some schools, it might be convenient to use a computer room, while in others teachers will prefer to develop materials that can be printed and distributed later. The nature of the lesson will determine what kind of interaction we expect from our students.

3.1. Some tips

If you want your learners to plunge into using a corpus, our suggestion is to follow a
carefully-planned route:

1. Select a small group of learners. Using technology is cumbersome at times, and computers tend to crash on multimedia LANs that are shared by many users. If your LAN restricts IPs or domains, make sure beforehand that the sites you plan to use are available.

2. Avoid metalanguage such as linguistics, node or principled corpus. Your learners will be more interested in language, real language.

3. Before getting your students to use a concordancer or a similar tool, distribute activities where they can get used to reading vertically rather than horizontally. Make sure they get used to interpreting the context and making hypotheses about contexts of use and prosodies, that is, whether the word in a given line is used in a derogatory or a positive way.

4. Decide well beforehand what you want your students to look up. Examples or activities that are too demanding easily discourage students.

5. Try to put interesting questions to your students. Motivate them and make them become
interested in turning themselves into researchers or, better, detectives.

6. Select carefully the corpus you want to use. You may consider building your own corpus.

3.2. Activities: using SACODEYL

A corpus is an excellent tool for discovering language behaviour and learning more about collocations and patterning. In teaching contexts, principled corpora may not suit your students' level, especially if they are very young. We recommend building your own collection of texts suited to your students' needs. However, SACODEYL is a more straightforward option if you want to use a multimedia corpus of teen talk.


By using a corpus as a tool to find out about language, learners are given the chance to strengthen their inductive skills, which is highly instrumental for further learning. Sinclair (2004:288) is definitely optimistic about the unmediated use of reference corpora in the language classroom:

...both teacher and student can make use of a corpus right away, with only a modest
few hours orientation; there is no need to wait for the new textbooks and reference
books. Only fairly simple queries can be handled at this stage, but the results can be
illuminating and very helpful. For this, you will need a computer of normal
performance, a corpus and some query software. Will the corpus be 100% reliable,
comprehensive and representative? Of course not, but do your present books match
these targets? Or your reference grammars and dictionaries? Or any native speaker
models? Or any combination of these? Of course not.

Despite Sinclair's statement, the teaching context in secondary education is still far from meeting many of the requirements above. Good reference corpora are commercial and search tools are difficult to handle9. Mauranen (2004:1999) has voiced her concern about the actual use of innovation in classrooms:

No teaching method can become an important innovation, whatever its potential, if it does not make its way to the normal classroom where teachers and students can use it as part of their everyday routines, with not too much extra hassle.

Fortunately, there are now a few instances of pedagogical corpora whose focus is more on
learning than on linguistic research and which happen to be free to use. SACODEYL is one of
these pedagogically-motivated corpora. ELISA, its predecessor and inspiration, is another
interesting effort:

ELISA is a collection of video-based interviews with native speakers of different varieties of English (e.g. US, England, Scotland, Ireland, Australia) and from different walks of life. They talk about their professional career. All interviews follow a general pattern, covering a similar range of topics, e.g. what the speakers do, their educational background, how they started their career or business, the type of projects they are involved in, their daily routines and future plans. While some of the speakers engage in unusual professions (e.g. a tour guide at Ayers Rock, a guitar teacher, a travel journalist and an arts therapist) and thus make for the attraction of the materials, they all describe issues of general interest in professional contexts. The corpus currently contains 25 interviews of 5 to 15 minutes. The transcripts amount to about 60,000 words in total10.

9
Guy Aston and Lou Burnard (1998), The BNC Handbook: Exploring the British National Corpus with SARA (Edinburgh Textbooks in Empirical Linguistics), is an excellent reference for making full use of SARA.
10
http://www.uni-tuebingen.de/elisa/html/elisa_index.html


SACODEYL offers young learners the language and the voices of their peers. As in ELISA,
SACODEYL kids talk about their daily routines, about themselves, their schools, their
hometowns, their leisure time activities and hobbies, films, books, sports and many other
topics.

The SACODEYL corpus has been annotated with a view to pedagogical applications. This
makes SACODEYL a very interesting complementary resource in mainstream teaching, where
teachers and students can find a familiar range of language and communication contexts. The
following figure illustrates this:

Figure 9. SACODEYL search categories.

These categories resemble both the language and the communication-oriented methodology of
mainstream language teaching. Learners and teachers using SACODEYL may want to
navigate the English corpus in exactly the same way as they navigate the contents of their
textbook. In SACODEYL, every interview has been split into sections, that is, convenient
teaching and learning stretches of language which have a pedagogical value. Each section has
been annotated by experienced teachers, who have assigned it a full array of categories and
subcategories. Because the corpus has been annotated in this way, it can be searched by these categories:

Figure 10. SACODEYL search categories in detail.

Users can also browse interviews:

Figure 11. Browse area for SACODEYL English corpus.

They can also browse the sections within interviews, or search for sections that meet the criteria they set:

Figure 12. Browse area for SACODEYL section search.

Let us consider some activities for the language classroom. We assume that your learners are
Secondary School students of English, so we will use the SACODEYL English corpus, a small
corpus of teenage talk contributed by some 25 interviewees from the Reading area in the UK.
The following activities illustrate the type of work that can be done with it.

3.2.1. Activities focused on communication and attention to form

Tell your students to search for [Reading]. You may want to introduce them to the area and
its neighbouring cities, all of them widely known. Ask them to read the concordance lines and
get them to classify (A) the words on the left, (B) the words on the right and (C) the contexts of use:

Figure 13. Simple SACODEYL word search.

The following screen shows the number of hits and displays the concordance lines:

Figure 14. SACODEYL Search tool.
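SACODEYL runs this kind of search online. For teachers who want to prepare similar concordance lines from their own transcripts, a minimal Key Word In Context (KWIC) concordancer can be sketched in Python. This is an illustration under stated assumptions, not SACODEYL's search engine: it assumes plain-text transcripts with straight apostrophes, and the function names (`tokenize`, `kwic`, `show`) and the default span of four tokens are our own choices.

```python
import re

def tokenize(text):
    # Lower-cased words (straight apostrophes kept, e.g. "it'll") plus . ? ! ,
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.?!,]", text.lower())

def kwic(text, node, span=4):
    """Return (left, node, right) triples for every hit of `node`.

    `node` may be one word ("Reading") or several ("like about");
    `span` is the number of tokens shown on each side.
    """
    tokens = tokenize(text)
    target = tokenize(node)
    n = len(target)
    hits = []
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + n:i + n + span])
            hits.append((left, " ".join(target), right))
    return hits

def show(hits, width=30):
    # Right-align the left context so that all nodes line up in one column
    for left, node, right in hits:
        print(f"{left:>{width}}  [{node}]  {right}")

if __name__ == "__main__":
    sample = "I live in Reading. What I like about Reading is the festival."
    show(kwic(sample, "Reading"))
```

Running the file prints the two hits for Reading with the node aligned in the middle; `kwic(transcript, "like about")` retrieves a two-word node in the same way.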

You may want to guide your students in their search. Providing tables to fill in is usually very
productive as this keeps students focused on the task, which becomes more convergent:

A. Write here the most frequent words or punctuation to the left of Reading:
(like, feel, tell) about | (live, be) (here) in | the (centre, outskirts) of

B. Write here the most frequent words or punctuation to the right of Reading:
as a place | . / ? | festival

C. Guess: what is being talked about?
Context 1 | Context 2 | Context 3
Reactions to / opinions on your hometown / where you live | Staying in Reading or leaving Reading / Travelling | Reading Festival

Table 3. Fill-in table.

In A and B, students are invited to observe the surrounding context of a word and note the
accumulation of certain items to the left or to the right of the node. In C, students are
invited to make hypotheses about what is being talked about. If desired, you can go on to explore
the uses of [like about], [feel about] and [tell about], or of [Murcia / Cartagena as a place], or, from
a more communicative perspective, how to express opinions about your city or the place where you
live. If you tell your students to search for [like about], they will be given instances where
kids use it in a real context, embedded in the flow of speech. More importantly, your
students will be given an opportunity to disambiguate other uses of [like about]:

Figure 15. SACODEYL Search tool.

In the case highlighted above, [like about] is used as a hedge, a very common feature of
spoken English. This is a convenient way to combine communication-oriented teaching and
form-focused instruction. This range of activities focuses on analysing the context of use
of a given word, [Reading], both linguistically and communicatively.
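To prepare an answer key for columns A and B of Table 3, a teacher could tally the words and punctuation marks that occur immediately to the left and right of the node across all the concordance lines. The sketch below is a simplification under stated assumptions: it counts only the first token on each side (often labelled L1 and R1), expects a single-word node and plain-text transcripts, and the function name `neighbours` is hypothetical. Longer patterns such as (like, feel, tell) about would need a wider window.

```python
import re
from collections import Counter

def tokenize(text):
    # Lower-cased words (straight apostrophes kept) plus . ? ! ,
    return re.findall(r"[a-z]+(?:'[a-z]+)?|[.?!,]", text.lower())

def neighbours(text, node):
    """Count the first token to the left (L1) and right (R1) of each hit."""
    tokens = tokenize(text)
    node = node.lower()
    left, right = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token == node:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 1 < len(tokens):
                right[tokens[i + 1]] += 1
    return left, right

if __name__ == "__main__":
    sample = ("I live in Reading. What I like about Reading? "
              "The Reading festival. She lives in Reading.")
    left, right = neighbours(sample, "Reading")
    print("A (left):", left.most_common())
    print("B (right):", right.most_common())
```

Sorting each counter with `most_common()` puts the most frequent neighbours first, which is the order in which students would list them in the table.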

In a unit where music and concerts are presented, you may want to ask your students to find
out about [Reading Festival]. This is what they may find11:

Figure 16. SACODEYL Search tool.

From here, students can go to the interview section where the speaker talks about it:

11
At the time of writing, the corpus search facility was under construction, so search results may vary.

Figure 17. SACODEYL Search tool: section level.

and read and listen to what the speaker says about it:

Figure 18. SACODEYL English corpus: section level.

It is interesting to see how the online nature of spoken discourse affects the way we put things
while speaking. In this very short extract, your students can find the following features, among others:

- Native correction: [gonna to]
- Unfinished sentences: [been so, but]
- Contractions not frequently used by Spanish speakers: [it’ll be]

As Bernardini (2004: 17) puts it, “concordancing in particular may prove unique in
the acquisition and restructuring of competence [...] Language learning may be viewed as an
inductive process in which meaning and form come to be associated”.

3.2.2. Activities focused on attention to form and communication

Römer (2008: 19) has pointed out that concordance lines can be used by teachers to “create
DDL exercises tailored to their learners’ proficiency level and their particular learning needs”.
A case in point is the use of articles, which will be dealt with from a different angle in section 4.

Let us search for sections in the SACODEYL English corpus that have been annotated as
representative of this particular linguistic feature:

Figure 19. SACODEYL English corpus: category search on section level.

From these results you may want to select stretches of language that can be given to students for
evaluation and analysis, or simply used as material to improve their mastery of
the form. The following extracts are interesting for different reasons. Extract (A) is very
convenient for observing the use of the indefinite article:

(A)

Interviewer: So, what kind of house do you live in? Can you describe what kind of
house you live in?
Rachel: It’s a semi-detached and it’s got a garage and a big garden and it’s quite big. It’s got
quite a lot of rooms but I have to share my room with my sister.

You could present this in a cloze format:

Interviewer: So, what kind of ... house do you live in? Can you describe what kind of
... house you live in?
Rachel: It’s ... semi-detached and it’s got ... garage and ... big garden and it’s quite big. It’s
got quite ... lot of rooms but I have to share my room with my sister.
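Producing a cloze like the one above can be partly automated. The following sketch, which assumes the extract is available as plain text and uses a hypothetical function name, replaces every overt article (a, an, the) with a gap and keeps an answer key. Zero-article slots, such as the gap before house in the first line, have no written form, so a program cannot locate them; the teacher still needs to add those gaps by hand.

```python
import re

# Whole-word a, an or the, in any capitalisation
ARTICLE = re.compile(r"\b(a|an|the)\b", re.IGNORECASE)

def article_cloze(text, gap="..."):
    """Return the text with overt articles gapped, plus the answer key.

    Zero-article positions cannot be detected this way and must be
    gapped manually.
    """
    answers = ARTICLE.findall(text)
    gapped = ARTICLE.sub(gap, text)
    return gapped, answers

if __name__ == "__main__":
    line = "It's a semi-detached and it's got a garage and a big garden."
    gapped, key = article_cloze(line)
    print(gapped)  # It's ... semi-detached and it's got ... garage and ... big garden.
    print(key)     # ['a', 'a', 'a']
```

The word-boundary markers (`\b`) stop the pattern from matching the letters a or an inside words such as and or garage.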

In B, we can notice the presence of the zero article:

(B)

Interviewer V: You say you’ve got a lot of work this year why is that?
Sam: It’s our first year of GCSEs so you’ve got course work and it’s like
writing essays for different subjects. And recently we’ve been doing English we
did a we did a we did course work on a book Hard Times by Charles Dickens. Which
was a bit boring but, but we’ve finished that now so it’s alright.

You could present this in a cloze format:

Interviewer V: You say you’ve got a lot of work this year why is that?
Sam: It’s our first year of GCSEs so you’ve got ... course work and it’s like
writing ... essays for ... different subjects. And recently we’ve been doing ... English we
did a we did a we did ... course work on ... book Hard Times by Charles Dickens. Which
was a bit boring but, but we’ve finished that now so it’s alright.

In actual fact, (B) can easily be expanded into an interesting source of pragmatic information,
including sentence restructuring [did a we did a we did], sentence relatives to express evaluation
[Which was a bit boring] and conclusion [so it’s alright].

Barlow (1996) sees in activities like these a potential for teachers to enrich the learning
environment and students’ knowledge of language.

For a thorough account of concordance-based DDL, we suggest reading a practical book on
the issue (Tribble and Jones 1990):

Figure 20. Concordances in the classroom, by Chris Tribble and Glyn Jones. Longman 1990.

4. Indirect approaches: Learner corpora in the EFL classroom

4.1. Definition

Among the many types of corpora which can be compiled, analysed and used (see McEnery,
Xiao and Tono, 2006, for an overview), Computer Learner Corpora (CLC) stand out as one of
the most powerful pedagogic tools for the EFL or ESL classroom. As recently defined, they
are

‘[…] electronic collections of foreign or second language learner texts collected on the basis
of strict design criteria.’ (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007: 254)

In other words, a learner corpus is compiled when the oral or written texts produced by your
students of English are collected according to strict design criteria, put in electronic format, and then
stored on your hard drive, memory stick, etc., so that you can conduct analyses with
programmes like WordSmith Tools, already mentioned:

Figure 21. From oral or written texts to a computer learner corpus.
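Once the texts are typed up, storing each one as a separate plain-text file makes it easy to load and measure the corpus with any tool. The sketch below assumes one UTF-8 `.txt` file per student text in a single folder; the folder layout and function names are illustrative assumptions, and its simple word count will not necessarily match the figures given by WordSmith Tools, which applies its own tokenisation settings.

```python
import re
from pathlib import Path

# A "word" here is a run of letters, optionally with an internal apostrophe
WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)?")

def load_corpus(folder):
    """Read every .txt file in `folder` into a {filename: text} dict."""
    return {path.name: path.read_text(encoding="utf-8")
            for path in sorted(Path(folder).glob("*.txt"))}

def word_count(text):
    """Number of words in one text."""
    return len(WORD.findall(text))

def corpus_size(corpus):
    """Total number of words across all texts in the corpus."""
    return sum(word_count(text) for text in corpus.values())
```

For example, `corpus = load_corpus("learner_corpus")` followed by `corpus_size(corpus)` gives the total number of words, where `learner_corpus` is a hypothetical folder name.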

Thanks to the availability of computers and freely available software to carry out analyses,
Learner Corpus Research (LCR) has been a fruitful field since the second half of the 1990s.

From that moment onwards, the growing number of publications, either in edited volumes (cf.
Granger, 1998; Granger, Hung and Petch-Tyson, 2002; Gilquin, Papp and Díez-Bedmar, in
press, etc.) or in international journals (cf. Corpora, Applied Linguistics, English Corpus
Studies, Journal of English for Academic Purposes, ReCALL, etc.), shows the potential of this
type of research and represents the first steps towards an awareness of the possibilities that CLC
can offer for Second Language Acquisition and for the TEFL or TESOL classroom.

4.2. Types of CLC

Due to the importance of CLC-based results, the number of CLC has mushroomed since the
second half of the 1990s. The research questions pursued by various researchers or research
teams have fostered different types of CLC, which are frequently classified according to four
related variables, namely the mode of the language in the learner corpus, its size, the type of
intervention (i.e. when the CLC-based results will be applied in the design of materials, the
sequencing of the curriculum, etc.), and the type of annotation in the corpus.

Variable | Types
Mode | Written; Spoken; Multimedia
Size | Big (commercial or some research teams); Small (research)
Type of intervention12 | Delayed human intervention; Early human intervention
Type of annotation13 | Raw; POS-tagged; Semantically tagged; Error-tagged

Table 4. Main variables considered for the classification of learner corpora.

4.3. Methodologies used with CLC

Compiling students’ production is not a new practice for teachers of English as a
second or foreign language, as it has always been used to create remedial exercises, test
students’ command of the foreign language, etc. However, the methodology used to analyse
the students’ production has changed over time, as researchers and teachers have
focused their attention on different aspects (the students’ L1, the target language, etc.) and
technology has made it possible to compile CLC, i.e. learners’ real data in electronic format.

Table 5 shows the three main methodologies used before the arrival of CLC. The first one,
Contrastive Analysis, in its strong form, did not consider the students’ production, but the
similarities and differences between the students’ L1 and their target language (i.e. Spanish
and English), in order to predict the difficulties that students would have. The weaknesses
found in this methodology led researchers to shift their attention to Error Analysis, whose
theoretical principles and methodological issues were set out in a series of articles in the
1960s and 1970s (reprinted in Corder, 1981). Especially outstanding was the paper ‘The
significance of learners’ errors’ (included in Corder, 1981), which showed that errors were
crucial to researchers, teachers and students, since they could all learn from them and apply
that knowledge to their research, teaching practice or learning process. Thus, the steps for
conducting an EA were followed by many teachers and researchers, and the results were
sometimes published as dictionaries and lists of common errors.

12
This distinction was made by Sinclair (2001, vii).
13
For the types of annotation, refer to McEnery and Wilson

However, Error Analysis only considered errors and dismissed the learners’ correct use of the
foreign or second language. This led Selinker to his Interlanguage Analysis (IA) (Selinker,
1972), which examined the students‟ entire production, i.e. errors and non-errors alike. In this
way, it was possible to obtain a better description of the students‟ use of the foreign language
when performing a task at a specific point in time in their language learning process: their
interlanguage.

Pre-CLC methodology | Focus of interest | Publications
Contrastive Analysis (CA) | Comparison of the students’ L1 and their TL | Lado (1957)
Error Analysis (EA) | Students’ real errors | Corder (1981)
Interlanguage Analysis (IA) | The students’ whole production, errors and non-errors | Selinker (1972)

Table 5. Methodologies used to describe the students’ language before CLC.

Although not in a systematic way, teachers of English as a foreign or second language
frequently analyse their students’ production using one of these methodologies or a
combination of them.

For instance, an Error Analysis is conducted when a teacher corrects a batch of essays and
uses a code system, i.e. an error taxonomy,14 to make the students aware of the type of error
made. Thus, ‘sp’ may stand for a spelling error, ‘wo’ for word order, ‘prep’ for a problem
with a preposition, etc. After marking all the essays and skimming his or her annotation, the
teacher may realise that the most frequent error in the batch has to do with a
certain aspect of the foreign language (be it prepositions, articles, verb tenses, etc.). If the
correct instances of that aspect are considered together with the incorrect ones, an
Interlanguage Analysis is conducted. Finally, if the students’ L1 is compared to their TL
either before or after analysing their production, in an attempt to predict or explain the causes
of the students’ errors, a CA in its strong or weak version, respectively, is completed.

14
For an overview of various error taxonomies, refer to Dulay, Burt and Krashen (1982: 146-197) or James
(1998: 102-117).

The manual analysis of the students’ errors, following a CA, EA or IA methodology, is a
time- and effort-consuming task which a teacher can only carry out with a limited number of
essays, as it is necessary to go through the essays, look for the errors, highlight, classify and count
them, make sure all the errors are being considered, look for the correct uses of the aspect of
the language being analysed, compare the use of that aspect in the L1 and the
FL, etc. Fortunately, these processes have been sped up thanks to improvements in
technology and, consequently, the advent of CLC, whose electronic format is among their
main advantages (Nesselhauf, 2004: 139-40), since it makes both their compilation and their
analysis easier.

To avoid falling prey to the temptation to collect huge, disorganised amounts of data, as is the
case with corpora in general (see section 2.2. above), strict design criteria must be observed
when compiling a learner corpus. Special attention needs to be paid to the principles of
authenticity and representativeness, and every effort must be made to control variability, so
that the aspects compared come from a homogeneous learner corpus. Thus, if the
teacher aims to represent students’ in-class argumentative writing at intermediate level,
pieces of writing which belong to other genres, which are written by students at other
proficiency levels, or which are written at home (and presumably with access to reference
materials) should not be included in that corpus, since the results would be biased. Just consider,
from your own experience, how the type and number of errors differ between an argumentative
essay written in class (without dictionaries, online resources, etc.) and one written at
home or, likewise, between the errors you expect in descriptive writing and those you expect in
narrative writing.
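Keeping a few metadata fields for every text makes it straightforward to apply such criteria before any analysis. The following sketch assumes that each text has been recorded with its genre, proficiency level and writing conditions; the class, field names and values are hypothetical, and real projects will record more variables (task, time limit, L1, etc.).

```python
from dataclasses import dataclass

@dataclass
class LearnerText:
    # Hypothetical metadata recorded for each text in the corpus
    student_id: str
    genre: str      # e.g. "argumentative", "descriptive", "narrative"
    level: str      # e.g. "intermediate"
    setting: str    # "in class" (no reference materials) or "at home"
    text: str

def select(texts, genre, level, setting="in class"):
    """Keep only the texts that meet every design criterion."""
    return [t for t in texts
            if t.genre == genre and t.level == level and t.setting == setting]

if __name__ == "__main__":
    texts = [
        LearnerText("s01", "argumentative", "intermediate", "in class", "..."),
        LearnerText("s02", "argumentative", "intermediate", "at home", "..."),
        LearnerText("s03", "narrative", "intermediate", "in class", "..."),
    ]
    print([t.student_id for t in select(texts, "argumentative", "intermediate")])
    # ['s01']
```

Filtering on explicit fields, rather than on memory, also documents which texts were left out and why.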

Drawing on the methodologies of the pre-CLC era, the analysis of students’ use of
language, as represented in a learner corpus, is nowadays carried out in a systematic and
scientific way following Computer-aided Error Analysis (CEA), Contrastive Interlanguage
Analysis (CIA) or the Integrated Contrastive Model (ICM):

CLC-based methodology | Focus of interest | Publications
Computer-aided Error Analysis (CEA) | Students’ real errors, as attested in a CLC | (Dagneaux, Dennes and Granger, 1998)
Contrastive Interlanguage Analysis (CIA) | Comparison of NS vs. NNS production; comparison of NNS vs. NNS production | (Granger, 1996)
Integrated Contrastive Model (ICM) | CA + CIA | (Granger, 1996; Gilquin, 2000/2001)

Table 6. Methodologies used in the description of the learners’ production of the foreign
language.

The first one, CEA, is a ‘new type of EA’ (Dagneaux, Dennes and Granger, 1998: 165). In
other words, it is a computerized version of EA, which allows quicker error annotation and
easy retrieval of the erroneous instances of students’ use of the foreign language. There are
two ways to conduct such an analysis, depending on whether the learner corpus is error-tagged
or not, i.e. whether a code system has been used to highlight the errors.

If it is not, an intuitive search for an error-prone aspect is undertaken. This is the case when
the teacher feels that the central articles the and a(n) pose problems for his or her students. By
means of a learner corpus and retrieval tools, s/he can read the concordance lines retrieved for
those articles and decide which uses are incorrect, thus conducting an EA.

However, a raw learner corpus, i.e. one without error annotation, will not allow the researcher
to retrieve the instances of (mis-)use of the zero article, since a zero article leaves no word form
in the text for a search tool to find. To do so, the learner corpus needs to be error-tagged.

There are two types of error-tagged learner corpora:

- Fully error-tagged and
- Partially error-tagged

In the former, a comprehensive error taxonomy has been used to highlight all the possible
errors in a learner corpus. Although few learner corpora are fully error-tagged, for practical
reasons of time and money, the results which such EAs yield provide a bird’s-eye view
of the students’ problems when using the foreign language at a specific moment in their
language acquisition process. As an example, Figure 22 shows the percentage of errors in
forty-three aspects of the foreign language (as represented by the error tags on the horizontal
axis) in the written production of first-year university students at the beginning of
the academic year (Díez-Bedmar, 2005):

Figure 22. EA of first-year University students when beginning the academic year (Díez-
Bedmar, 2005).
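Percentages like these can be computed directly from a fully error-tagged corpus. The sketch below is only an illustration: it assumes that each error is marked by a tag of two to five capital letters in round brackets placed before the erroneous form, with the correction between dollar signs, for example (GA) the $Ø$; this exact format is an assumption, as is the function name. Each tag is expressed as a percentage of all tagged errors.

```python
import re
from collections import Counter

# Assumed tag shape: two to five capital letters in round brackets, e.g. (GA)
TAG = re.compile(r"\(([A-Z]{2,5})\)")

def error_profile(corpus_text):
    """Map each error tag to its percentage of all tagged errors."""
    counts = Counter(TAG.findall(corpus_text))
    total = sum(counts.values())
    # most_common() lists the most frequent error types first
    return {tag: round(100 * n / total, 1) for tag, n in counts.most_common()}

if __name__ == "__main__":
    sample = ("I live in (GA) the $Ø$ Spain. (FS) Becaus $Because$ it is nice. "
              "(GA) A $Ø$ people are kind. (GVT) I go $went$ there yesterday. "
              "(GA) Ø $The$ weather is good.")
    print(error_profile(sample))  # {'GA': 60.0, 'FS': 20.0, 'GVT': 20.0}
```

A corpus with no tags returns an empty dictionary, so the division is never attempted on a zero total.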

A partially error-tagged learner corpus only highlights a specific type of error, one which is of
interest to the teacher or the researcher. Returning to the case of the central articles, a partially
error-tagged learner corpus makes it possible to easily retrieve, quantify and analyse not only the
errors made with the articles the and a(n) (as was the case with a raw learner corpus), but
also those involving the zero article (Ø). Notice in the following concordance lines, error-tagged
(GA), the incorrect uses of a(n), followed by erroneous uses of the zero article and then
erroneous uses of the.

Figure 23. Article errors as retrieved from a partially error-tagged learner corpus using
WordSmith Tools.
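Outside WordSmith Tools, the same retrieval can be sketched with a regular expression. The format is again an assumption made for illustration: the tag (GA) is followed by what the learner wrote (Ø when no article was used) and then by the correction between dollar signs, as in (GA) Ø $the$ government. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed format: (GA) learner_form $correction$, where Ø marks a zero article
GA_ERROR = re.compile(r"\(GA\)\s+(\S+)\s+\$([^$]*)\$")

def article_errors(corpus_text):
    """Return (learner_form, correction) pairs for every (GA) error."""
    return GA_ERROR.findall(corpus_text)

def by_form(errors):
    """Group the errors by what the learner wrote: a(n), Ø or the."""
    groups = Counter()
    for form, _correction in errors:
        f = form.lower()
        if f in ("a", "an"):
            key = "a(n)"
        elif f == "ø":  # "Ø".lower() is "ø"
            key = "Ø"
        else:
            key = f
        groups[key] += 1
    return groups

if __name__ == "__main__":
    sample = ("In (GA) the $Ø$ society of today, (GA) Ø $the$ government "
              "must help (GA) a $Ø$ people with (GA) an $Ø$ information.")
    errors = article_errors(sample)
    print(errors)
    print(by_form(errors))
```

Grouping by the learner's form mirrors the order of Figure 23: misused a(n), then zero-article errors, then misused the.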

The second methodology used with CLC, Contrastive Interlanguage Analysis, allows the
researcher to compare the students’ production with:

1. the production of native speakers of English
2. the production of other groups of learners of English with a different L1

On the one hand, if your students’ production is compared to that of native students of
English (at the same level and under the same external variables), it is possible to see
how (dis-)similar both productions are when an aspect of the foreign language is studied. As a
result, not only instances of misuse but also of underuse or overuse are revealed, and conclusions
can be drawn such as the overuse of the prepositions between, inside and according to by Spanish
university students compared with native students of English (Martínez Osés
and Neff, 2001: 144). On the other hand, you may be interested in comparing how various
groups of students of English (at the same proficiency level and under the same external
variables) struggle with the same aspect of the foreign language, as Kaszubski (2001) did
when comparing the use of the lemma be by Spanish, Polish and Belgian-French students.
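The text does not say which statistics lie behind statements of overuse and underuse. One common procedure in corpus linguistics, shown here as a sketch rather than as the method used in the studies cited, is to compare normalised frequencies (per 10,000 words) and test the difference with the log-likelihood statistic (G2). The sketch assumes you already have the raw frequency of the item and the size of each corpus in words; the function names are hypothetical and the figures in the example are invented.

```python
import math

def per_10k(freq, corpus_size):
    """Normalised frequency: occurrences per 10,000 words."""
    return freq * 10_000 / corpus_size

def log_likelihood(f1, n1, f2, n2):
    """Two-corpus log-likelihood (G2) for one item.

    f1, f2: frequencies of the item; n1, n2: corpus sizes in words.
    """
    e1 = n1 * (f1 + f2) / (n1 + n2)  # expected frequency in corpus 1
    e2 = n2 * (f1 + f2) / (n1 + n2)  # expected frequency in corpus 2
    g2 = 0.0
    for observed, expected in ((f1, e1), (f2, e2)):
        if observed > 0:             # a zero count contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2

def compare(f_learner, n_learner, f_native, n_native):
    """Label learner use relative to native use and report G2."""
    a = per_10k(f_learner, n_learner)
    b = per_10k(f_native, n_native)
    label = "overuse" if a > b else "underuse" if a < b else "similar use"
    return label, round(log_likelihood(f_learner, n_learner, f_native, n_native), 2)

if __name__ == "__main__":
    # Invented figures for illustration only
    print(compare(30, 10_000, 10, 10_000))  # ('overuse', 10.46)
```

With one degree of freedom, a G2 value above 3.84 is usually taken as significant at p < 0.05, so in the invented example the difference would be reported as a significant overuse.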

Finally, the Integrated Contrastive Model combines a CIA with a corpus-based CA. Therefore,
three different corpora are used, namely the learner corpus, the control corpus and a corpus
which contains texts produced by native speakers of the learners’ L1. As with CA in the
pre-CLC era, there are two ways of conducting an ICM. First, the corpus-based CA may be
conducted in order to see the main differences between the two native languages considered
and, then, the problems posed by such differences are checked in the learner corpus.
Alternatively, the problems in a learner corpus, as revealed by a CIA, may lead to a corpus-based
analysis of the two native languages in an attempt to find the causes of such errors.

4.4. The application of CLC in the TEFL classroom

The potential of CLC in the indirect and direct approaches will be explored in this section.
The first subsection deals with the indirect approach, that is, using the results from the analysis of
CLC (following the methodologies described in 4.3) to improve teaching materials, the
curricula, etc., whereas the second one focuses on the direct approach, which provides
hands-on experience in working with CLC.

4.4.1. The indirect approach

Although CLC-based descriptions of the students’ interlanguage are still limited and only
provide ‘[…] patchy knowledge of the different stages of interlanguage development’
(Gilquin et al., 2007: 322), the results obtained are progressively being introduced in teaching
materials.

Among the materials which have benefited most from CLC results are the dictionaries of
common errors, such as The Longman Dictionary of Common Errors (Turton and Heaton,
1987) and the Cambridge series Common Mistakes at… (Tayfoor, 2004; Driscoll, 2005; etc.),
in which frequent errors in learner corpora are highlighted and explained.

Figure 24. CLC-informed materials focused on common errors.

Monolingual learners’ dictionaries have also been CLC-informed. The first one was the Longman
Essential Activator (LEA), which made use of the information in the Longman Learners’
Corpus (LLC), and it was followed by others such as the Cambridge International
Dictionary of English, based on the error-tagged Cambridge Learner Corpus (Nicholls,
2003), or the second edition of the Macmillan English Dictionary for Advanced Learners,
based on a CIA of the International Corpus of Learner English (ICLE) and a corpus
of native speakers’ academic writing.

Figure 25. CLC-informed monolingual dictionaries of English.

The CLC-based information in these dictionaries is typically provided in ‘help boxes’, which
are quite familiar to any learner of English as a foreign or second language. However, new
ways of offering information from CLC are being devised, as is the case with the graphs in the
Macmillan English Dictionary for Advanced Learners, which show the results of the CIAs
conducted on problems of frequency, register confusion, etc. Similarly, alternatives to the
students’ typical errors are suggested (as exemplified from the control corpus), and
extended writing sections on twelve rhetorical or organizational functions which
are particularly prominent in academic writing are included (cf. Gilquin, Granger and Paquot,
2007, pp. IW1-IW29).

Figure 26. CLC-based results as provided in the Macmillan English Dictionary for Advanced
Learners (MED2).

Recent grammars also include information from learner corpora, as is the case with Carter and
McCarthy’s (2006) Cambridge Grammar of English, or the online Chemnitz Internet
Grammar of English.

Figure 27. CLC-informed grammars of English.

Finally, CLC may inform CALL programmes, such as WordPilot (Milton, 1998), or be
integrated into CALL programmes, so that teachers and students, if deemed convenient, have
direct access to the real data, as in the EXample eXtractor Engine for LAnguage Teaching
(eXXelant) (Granger, Kraif, Ponton, Antoniadis and Zampa, 2007).

Although syllabus design, textbooks and writing courses are now beginning to consider native
data in their recent editions (cf. the Touchstone Student’s Book series), there is no doubt that
the information provided by CLC can complement and improve such materials to meet the
students’ real needs.

4.4.2. Designing remedial exercises from a learner corpus

Analysing a learner corpus and designing CLC-based remedial exercises to meet your
students’ real needs is not a difficult task. To help you analyse the data in a learner corpus,
this section explores two ways to approach a small raw learner corpus. The first one deals
with the students’ use of vocabulary, and the second one with the lexico-grammatical patterns
of the verbs ‘say’ and ‘tell’.

The learner corpus used consists of the handwritten production of 16 first-year
university students (amounting to 17,765 words), who wrote descriptive texts in class
without any access to reference materials and with a time limit of 60 minutes. The
software used for this purpose is WordSmith Tools version 4.0.

4.4.2.1. Exploring vocabulary usage: wordlists and concord

This software allows the teacher or researcher to create word lists, run
concordances and explore keywords, as can be seen in Figure 28. However, we
will focus on the use of word lists and concordances for an exploratory analysis of the
adjectives used by a group of learners.

Figure 28. WordSmith Tools 4.0.

As this self-explanatory term indicates, a word list is a list of the words in your learner
corpus. This term was reviewed in Table 2 above. Such a list may be ordered by frequency,
from the word with the highest number of occurrences to those which only
appear once, or the other way round.
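The logic of a frequency-ordered word list is simple enough to sketch in a few lines of Python. This is not how WordSmith Tools is implemented: the sketch assumes a plain-text corpus, treats letters plus internal apostrophes as words, breaks ties alphabetically (an arbitrary choice), and the function name is hypothetical.

```python
import re
from collections import Counter

def word_list(text, descending=True):
    """Return (word, frequency) pairs ordered by frequency.

    Ties are broken alphabetically so that the order is stable.
    """
    counts = Counter(re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()))
    sign = -1 if descending else 1
    return sorted(counts.items(), key=lambda item: (sign * item[1], item[0]))

if __name__ == "__main__":
    sample = "It is good. It is very good. Good food is important."
    print(word_list(sample)[:3])                   # [('good', 3), ('is', 3), ('it', 2)]
    print(word_list(sample, descending=False)[0])  # ('food', 1)
```

Setting `descending=False` gives the reverse order described above, starting with the words that appear only once.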

Figure 29 below shows a word list of the adjectives the students used in the learner corpus,
obtained by removing from the full list every word that did not belong to this open word
class. It shows that the adjectives these students used most were 'good', 'important' and
'different'. This finding may not surprise an experienced teacher, but the co-text in which
these adjectives appear may reveal interesting and unexpected gaps in the learners'
vocabulary.
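The filtering step described above, keeping only the adjectives, can be sketched as follows. The adjective set is a hypothetical teacher-supplied list (a part-of-speech tagger could be used instead), and the counts in the example are invented; they are not the figures behind Figure 29.

```python
def keep_adjectives(freq_list, adjectives):
    """Filter a (word, frequency) list down to the words in `adjectives`.

    `adjectives` is a hypothetical, teacher-supplied set: the chapter removes
    non-adjectives from the list by hand. Frequency order is preserved.
    """
    wanted = {a.lower() for a in adjectives}
    return [(word, freq) for word, freq in freq_list if word.lower() in wanted]


if __name__ == "__main__":
    # Invented counts for illustration only, not the chapter's data.
    toy_list = [("the", 9), ("good", 5), ("is", 5), ("important", 4),
                ("town", 3), ("different", 2)]
    print(keep_adjectives(toy_list, {"good", "important", "different", "main"}))
```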

To explore these co-texts, the next step is to run a concordance of any of these words. For
this example, 'important' was selected. As Figure 30 shows, a concordance displays lines of
text with the searched word in the middle, highlighted in blue. This display is known as 'Key
Word In Context' (KWIC), and the searched word is called the node or, more loosely, the
KWIC. Concordance lines are not read in the traditional way, that is, from left to right as
seen above; instead, we focus on the first word to the left or to the right of the KWIC. This
lets us see the type of pre-modification the students use with the adjective under
consideration (the first word to the left of the KWIC) and which elements are described as
'important'. As already reported (cf. Granger and Tribble, 1998, or Osborne, 2004, among
others), students rely heavily on this adjective at the expense of others, such as 'crucial',
'outstanding', 'main' or 'valuable', that would suit particular contexts better. Therefore, a
very simple exercise to create from the students' own compositions is to remove the KWIC
and leave a blank, so that they have to think of a better alternative that fits the linguistic
contexts they have created.
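A KWIC concordance and its gapped version can be approximated with a short Python sketch. It is not WordSmith's Concord: the tokeniser, the fixed `width` of context tokens and the names `concordance` and `format_kwic` are assumptions made for illustration, and the example sentences are invented.

```python
import re

# Words (with internal apostrophes) or single punctuation marks.
TOKEN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def concordance(text, node, width=4):
    """Return KWIC lines as (left, node, right) tuples of token lists.

    `width` (tokens of context on each side) is an illustrative choice;
    WordSmith uses its own display settings.
    """
    tokens = TOKEN.findall(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == node.lower():
            lines.append((tokens[max(0, i - width):i], tok,
                          tokens[i + 1:i + 1 + width]))
    return lines


def format_kwic(lines, hide_node=False, left_width=30, blank="________"):
    """Align the node in a central column; optionally replace it with a blank."""
    rows = []
    for left, node, right in lines:
        centre = blank if hide_node else node
        rows.append(f"{' '.join(left):>{left_width}}  {centre}  {' '.join(right)}")
    return rows


if __name__ == "__main__":
    # Invented sentences, not taken from the learner corpus.
    text = ("Family is very important. Money is important for me. "
            "It is an important day.")
    for row in format_kwic(concordance(text, "important", width=2), hide_node=True):
        print(row)
```

The last token of each left context (`left[-1]`) is the first word to the left of the node, i.e. the pre-modifier discussed above. With wider contexts the node may reappear in the context of a neighbouring occurrence, so the gapped lines are worth checking before they are handed out.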

Figures 29 and 30. WordSmith Tools: Running a concordance and hiding the KWIC.

Figure 32 presents a screenshot of such a worksheet, which you can paste into a Word
document and use in class. The main strength of this exercise is that it is based on your
students' own errors and therefore caters for their specific needs. Furthermore, students are
more likely to feel motivated to do this exercise, since they may recognise their own
sentences and want to learn how to improve them.
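Assembling the gapped lines into a worksheet can also be scripted. This sketch produces numbered plain text rather than a .doc file, since writing Word files from Python would require a third-party package; the function name, title and instruction wording are hypothetical.

```python
def build_worksheet(rows, instruction, title="Worksheet"):
    """Number gapped concordance rows under a title and an instruction line.

    Returns plain text that any word processor can open or paste in.
    """
    lines = [title, "", instruction, ""]
    for n, row in enumerate(rows, start=1):
        lines.append(f"{n:>2}. {row}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Invented gapped lines for illustration.
    gapped = ["Family is very ________ .", "Money is ________ for me ."]
    print(build_worksheet(
        gapped,
        "Fill each blank with a more precise adjective than 'important'."))
```

The text can then be saved with `pathlib.Path("worksheet.txt").write_text(sheet, encoding="utf-8")` (the file name is illustrative) and opened in a word processor for final layout.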

Figure 31. Concord utility.

Figure 32. Worksheet in a .doc document.

4.3.2.2. Exploring lexico-grammatical patterns: 'say' and 'tell'

The verbs 'say' and 'tell' are reported to pose difficulties for students at various levels
because of their different lexico-grammatical patterns. Nevertheless, it is worth checking
whether your own students actually make these mistakes and, if so, which uses are the most
problematic.

To do so, the first step is to run a concordance of the verb 'say' and sort the concordance
lines by the first word to the right of the KWIC, as shown in Figures 33 to 35.
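Concording all forms of 'say' and sorting the lines on the first word to the right of the node (often labelled R1) might look like this in Python. The list of forms is written out by hand, which is an assumption since the chapter does not say how the search was specified in WordSmith, and the sentences are invented examples.

```python
import re

TOKEN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")

# Forms of the lemma SAY, listed by hand (illustrative assumption).
SAY_FORMS = {"say", "says", "said", "saying"}


def concordance_forms(text, forms, width=4):
    """KWIC lines (left, node, right) for every token whose lower-cased form is in `forms`."""
    tokens = TOKEN.findall(text)
    return [(tokens[max(0, i - width):i], tok, tokens[i + 1:i + 1 + width])
            for i, tok in enumerate(tokens) if tok.lower() in forms]


def sort_by_right(lines, depth=1):
    """Sort KWIC lines alphabetically by R1 (then R2, ... up to `depth`)."""
    return sorted(lines, key=lambda line: [w.lower() for w in line[2][:depth]])


if __name__ == "__main__":
    # Invented sentences for illustration only.
    text = ("They say to him the truth. She says that the town is nice. "
            "My father said me that it was late.")
    for left, node, right in sort_by_right(concordance_forms(text, SAY_FORMS)):
        print(f"{' '.join(left):>25}  {node}  {' '.join(right)}")
```

Because Python's sort is stable, lines sharing the same R1 word keep their original order; raising `depth` to 2 sorts ties by the second word to the right.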

Figures 33 to 35. Running a concordance and sorting the lines by the first element to the
right of the KWIC.

This makes it possible to see how the students complement the verb 'say' in the different
contexts and co-texts they have created themselves. Checking these uses also reveals cases
where 'tell' would have been preferred, or where another wording would have been more
native-like.
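Once the lines are sorted, a quick tally of the R1 words summarises how 'say' is complemented, and a simple heuristic can pick out lines worth a closer look. The pronoun list is a hypothetical rule of thumb, not an error detector: 'say them aloud', for instance, is perfectly acceptable, so every flagged line still needs the teacher's judgement. The lines in the example are invented.

```python
from collections import Counter

# Hypothetical heuristic: a form of SAY directly followed by one of these
# object pronouns (e.g. "said me") often signals that TELL was needed.
# "her" and "you" are left out because "say her name" or
# "I'd say you're right" are normal English.
OBJECT_PRONOUNS = {"me", "him", "us", "them"}


def r1_profile(lines):
    """Count the first word to the right of the node across (left, node, right) lines."""
    return Counter(right[0].lower() for _, _, right in lines if right)


def flag_possible_tell(lines):
    """Return lines whose R1 word is an object pronoun, for the teacher to review."""
    return [line for line in lines
            if line[2] and line[2][0].lower() in OBJECT_PRONOUNS]


if __name__ == "__main__":
    # Invented KWIC lines in (left, node, right) form.
    lines = [
        (["father"], "said", ["me", "that"]),
        (["She"], "says", ["that", "the"]),
        (["I"], "said", ["that", "it"]),
        (["we"], "say", ["them", "the"]),
    ]
    print(r1_profile(lines).most_common())
    for left, node, right in flag_possible_tell(lines):
        print("check:", " ".join(left), node, " ".join(right))
```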

To show students authentic native-speaker examples of these problematic verbs, 'say' and
'tell', we can use the freely available version of the British National Corpus (BNC) or the
Collins Wordbanks Online English Corpus as control corpora, and present some examples in
KWIC format to encourage students to analyse the lexico-grammatical patterns involved
(with the teacher's help if necessary). To do so, we only have to query those corpora
(Figures 36 and 37), select the examples that show the different ways of complementing the
verbs and, finally, create a Word document for the students to work with.
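If the reference-corpus concordance lines are available as text, for instance copied from the query interface and split into left context, node and right context (an assumption about the workflow, not something the chapter describes), a first selection covering different complementation patterns could be made automatically. The sketch keeps one line per combination of verb form and R1 word; grouping by R1 is only a rough proxy for a complementation pattern, and the example lines are invented, not real BNC or Wordbanks output.

```python
def one_example_per_pattern(lines, max_examples=10):
    """Keep the first KWIC line for each (node form, R1 word) pair.

    `lines` are (left, node, right) tuples of token lists. The teacher still
    makes the final selection from the lines returned.
    """
    seen = set()
    picked = []
    for left, node, right in lines:
        key = (node.lower(), right[0].lower() if right else "")
        if key in seen:
            continue
        seen.add(key)
        picked.append((left, node, right))
        if len(picked) == max_examples:
            break
    return picked


if __name__ == "__main__":
    # Invented native-style lines, not real corpus output.
    reference_lines = [
        (["He"], "told", ["me", "that"]),
        (["She"], "said", ["that", "she"]),
        (["They"], "told", ["me", "about"]),
        (["I"], "said", ["to", "him"]),
    ]
    for left, node, right in one_example_per_pattern(reference_lines):
        print(" ".join(left), node.upper(), " ".join(right))
```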

Once students have been given this authentic input and have reflected on the different
lexico-grammatical patterns, an exercise based on their own written production, that is, on
the learner corpus compiled, can be created. As with 'important' above, we can simply
remove the KWIC (here, the verbs 'say' or 'tell') from the concordance lines to create a
remedial exercise.
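The learner lines for 'say' and 'tell' can be turned into a gap-fill exercise in the same spirit. Unlike the 'important' worksheet, the learner's original verb cannot serve as an answer key, because it may itself be the error; this sketch therefore keeps it only as a note for the teacher. The shuffling with a fixed seed, the '(say / tell)' prompt and the function name are illustrative choices, and the example lines are invented.

```python
import random


def say_tell_items(lines, seed=0, blank="________ (say / tell)"):
    """Turn learner KWIC lines into numbered, shuffled gap-fill items.

    Returns (items, teacher_notes). The verb form the learner actually wrote
    is kept only in the teacher notes, because it may itself be the error
    (e.g. "said me"); deciding the correct answer is left to the teacher.
    """
    rows = [(" ".join(left), node, " ".join(right)) for left, node, right in lines]
    # Mix the 'say' and 'tell' items; a fixed seed keeps the sheet reproducible.
    random.Random(seed).shuffle(rows)
    items = [f"{n}. {left} {blank} {right}"
             for n, (left, _, right) in enumerate(rows, start=1)]
    notes = [f"{n}. learner wrote: {node}"
             for n, (_, node, _) in enumerate(rows, start=1)]
    return items, notes


if __name__ == "__main__":
    # Invented learner-style lines for illustration only.
    learner_lines = [
        (["My", "father"], "said", ["me", "that"]),
        (["She"], "told", ["that", "the"]),
        (["I"], "say", ["to", "him"]),
    ]
    items, notes = say_tell_items(learner_lines)
    print("\n".join(items))
    print("\n".join(notes))
```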

Figures 36 and 37. Concordances of the verbs 'say' and 'tell' in two native corpora.

As can be seen, creating materials that meet our students' real needs is neither a difficult
nor a time-consuming task. EFL teachers' experience is highly valuable here: their intuitions
about their students' problems are worth checking and exploring in the learner corpus they
have compiled. Once the remedial exercises have been created, the worksheets can be stored
on paper or distributed through a virtual learning platform, so that students with the same
problems, in our school or in another, can benefit from this work and improve their use of
the foreign language.

References

Barlow, M. (1996). Corpora for Theory and Practice. International Journal of Corpus
Linguistics 1(1): 1-37.

Bernardini, S. (2004). In the classroom: Corpora in the classroom: An overview and some
reflections on future developments. In J. Sinclair (ed.), How to Use Corpora in Language
Teaching, 15-36. Amsterdam: John Benjamins.

Carter, R. and McCarthy, M. (2006). Cambridge Grammar of English. Cambridge:
Cambridge University Press.

Corder, S. P. (1981). Error analysis and interlanguage. Oxford: Oxford University Press.

Dagneaux, E., Dennes, S., and Granger, S. (1998). Computer-aided error analysis. System 26:
163-174.

Díez-Bedmar, M.B. (2005). Struggling with English at university level: error-patterns and
problematic areas of first-year students' interlanguage. In P. Danielsson and M. Wagenmakers
(eds), The corpus linguistics conference series. Retrieved 16 September 2007, from
<http://www.corpus.bham.ac.uk/PCLC/>

Driscoll, L. (2005). Common Mistakes at PET… and How to Avoid Them. Cambridge:
Cambridge University Press.

Dulay, H., Burt, M. and Krashen, S. (1982). Language Two. Oxford: Oxford University
Press.

Gilquin, G. (2000/2001). The integrated contrastive model. Spicing up your data. Languages
in Contrast 3(1): 95-123.

Gilquin, G., Papp, Sz. and Diez-Bedmar, M. B. (eds.) (in press) Linking up Contrastive and
Learner Corpus Research. Amsterdam and Atlanta: Rodopi.

Gilquin, G., Granger, S. and Paquot, M. (2007). Learner corpora: The missing link in EAP
pedagogy. Journal of English for Academic Purposes 6: 319-335.

Granger, S. (1996). From CA to CIA and back: an integrated approach to computerized
bilingual and learner corpora. In K. Aijmer, B. Altenberg and M. Johansson (eds.), Languages
in Contrast. Text-Based Cross-Linguistic Studies, 37-51. Lund: Lund University Press.

Granger, S. (ed.) (1998). Learner English on Computer. London and New York: Addison
Wesley Longman.

Granger, S. and Tribble, C. (1998). Learner corpus data in the foreign language classroom:
form-focused instruction and data-driven learning. In S. Granger (ed.), Learner English on
Computer, 199-209. London and New York: Addison Wesley Longman.

Granger, S., Hung, J. and Petch-Tyson, S. (eds.) (2002). Computer Learner Corpora, Second
Language Acquisition and Foreign Language Teaching. Amsterdam and Philadelphia: John
Benjamins.

Granger, S., Kraif, O., Ponton, C., Antoniadis, G. and Zampa, V. (2007). Integrating learner
corpora and natural language processing: A crucial step towards reconciling technological
sophistication and pedagogical effectiveness. ReCALL 19(3): 252-268.

James, C. (1998). Errors in Language Learning and Use. Exploring Error Analysis. London
and New York: Longman.

Kaszubski, P. (2001). Tracing idiomaticity in learner language – the case of BE. In P. Rayson,
A. Wilson, T. McEnery, A. Hardie and S. Khoja (eds.), Proceedings of the Corpus Linguistics
2001 Conference (29 March-2 April), 312-322. Lancaster: University Centre for Computer
Corpus Research on Language.

Lado, R. (1957). Linguistics Across Cultures. Ann Arbor, Michigan: University of Michigan
Press.

Lewis, M. (1993). The Lexical Approach. Language Teaching Publications.

McEnery, T., Xiao, R. and Tono, Y. (2006). Corpus-based language studies. An advanced
resource book. London: Routledge.

Milton, J. (1998). Exploiting L1 and Interlanguage Corpora in the Design of an Electronic
Language Learning and Production Environment. In S. Granger (ed.), Learner English on
Computer, 186-198. London and New York: Addison Wesley Longman.

Martínez Osés, F. and Neff Van Aertselaer, J. (2001). Corpus analysis of prepositional
patterns in native and non-native university writing. In C. Muñoz, M. L. Celaya, M.
Fernández-Villanueva, T. Navés, O. Strunk and E. Tragant (eds.), Trabajos en Lingüística
Aplicada, 139-147. Barcelona: Univerbook.

Mauranen, A. (2004). Spoken corpus for an ordinary learner. In J. Sinclair (ed.), How to Use
Corpora in Language Teaching, 89-105. Amsterdam: John Benjamins.

Nattinger, J. R. and DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:
Oxford University Press.

Nesselhauf, N. (2004). How learner corpus analysis can contribute to language teaching: A
study of support verb constructions. In G. Aston, S. Bernardini and D. Stewart (eds.),
Corpora and Language Learners, 109-124. Amsterdam and Philadelphia: John Benjamins.

O'Keeffe, A., McCarthy, M. and Carter, R. (2007). From corpus to classroom. Cambridge:
Cambridge University Press.

Osborne, J. (2004). Top-down and Bottom-up Approaches to Corpora in Language Teaching.
In U. Connor and T. A. Upton (eds.), Applied Corpus Linguistics. A Multidimensional
Perspective, 251-265. Amsterdam and New York: Rodopi.

Römer, U. (2008). Corpora and language teaching.

Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10: 209-231.

Sinclair, J. (2001). Preface. In M. Ghadessy, A. Henry and R. L. Roseberry (eds.), Small
Corpus Studies and ELT. Theory and Practice, vii-xv. Amsterdam and Philadelphia: John
Benjamins.

Sinclair, J. (2004). New evidence, new priorities, new attitudes. In J. Sinclair (ed.), How to
Use Corpora in Language Teaching, 271-299. Amsterdam: John Benjamins.

Tayfoor, S. (2004). Common Mistakes at First Certificate… and How to Avoid Them.
Cambridge: Cambridge University Press.

Tribble, C. and Jones, G. (1990). Concordances in the classroom. London: Longman.

Turton, N. D. and Heaton, J. B. (1987). Longman Dictionary of Common Errors. Harlow:
Longman.
