Corpus
texts collected and processed in a unified,
systematic manner
British National Corpus: http://www.natcorp.ox.ac.uk/
BTANT 129 w5
“A corpus can be defined as a collection of texts assumed
to be representative of a given language put together so
that it can be used for linguistic analysis. Usually the
assumption is that the language stored in a corpus is
naturally-occurring, that it is gathered according to explicit
design criteria, with a specific purpose in mind, and with a
claim to represent larger chunks of language selected
according to a specific typology.” (Tognini-Bonelli 2001,
p. 2)
Corpus Linguistics: Theory or Method?
Theory
WHY?
A set of ideas to explain the apparent facts
Methodology
HOW?
An approach to something; set of methods
What is corpus linguistics?
Focuses of research
Paradigmatic claims
Corpus-based (C.B.) approaches vs. corpus-driven (C.D.) approaches
CORPUS-BASED APPROACHES
Sampling techniques:
Simple random sampling: all sampling units within
the sampling frame are numbered and the sample is
chosen using a table of random numbers; rare
features may not be captured.
Stratified random sampling: the population is
divided into relatively homogeneous groups (the
strata), and each stratum is then sampled at random;
never less representative than simple random sampling.
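The two sampling techniques above can be sketched in a few lines of Python. This is only an illustration: the sampling frame (90 news texts and 10 fiction texts), the 10% sampling fraction, and the function names are assumptions made for the example, not part of any real corpus design.

```python
import random
from collections import defaultdict

def simple_random_sample(units, k, seed=0):
    """Draw k units from the whole sampling frame, ignoring any grouping."""
    rng = random.Random(seed)
    return rng.sample(units, k)

def stratified_random_sample(units, stratum_of, fraction, seed=0):
    """Group units into strata, then sample each stratum at random."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        # Take the same fraction of every stratum, but at least one unit.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: (genre, text id) pairs.
frame = [("news", i) for i in range(90)] + [("fiction", i) for i in range(10)]

srs = simple_random_sample(frame, 10)
strat = stratified_random_sample(frame, lambda unit: unit[0], 0.1)

print("simple random:", sorted(genre for genre, _ in srs))
print("stratified:   ", sorted(genre for genre, _ in strat))
```

With stratification the small fiction group is guaranteed a place in the sample, whereas a simple random draw of 10 texts may contain no fiction at all. This is the sense in which rare features may be missed.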
Historical background of Corpus
Linguistics
1960s-1980s
Brown Corpus (American English) 1 million words
Lancaster-Oslo-Bergen (LOB) corpus (British English) 1 million words
These corpora inspired the International Corpus of English (ICE) project, which is
still ongoing: see http://ice-corpora.net/ice/
1980s-2000
British National Corpus (100 million words)
COBUILD corpus > Bank of English: http://www.mycobuild.com/about-collins-corpus.aspx
2000-now
BYU corpora (see http://corpus.byu.edu): COCA, COHA, TIME, Corpus of American
Soap Operas, etc.
SCOTS; Corpus of Modern Scottish Writing (1700-1945) (see
http://www.scottishcorpus.ac.uk)
BYU corpus suite: http://corpus.byu.edu
The scope of corpus linguistics
Collocates of powerful vs. strong: wind, feeling, accent, flavour
• Authenticity
• Objectivity
• Verifiability
• Exposure to large amounts of data
• New insights into language
• Enhancement of learner motivation
Authenticity
“A cover term for the cohesion that results from the co-
occurrence of lexical items that are in some way or
other typically associated with one another, because
they tend to occur in similar environments.” (Halliday &
Hasan 1976:287)
candle – flame – flicker
hair – comb – curl – wave
Parole (Utterance)
syntagmatic
…to see the selected collocate
Enhancement of learner motivation
• Getting permissions
• Discussion and research points.
• Research the copyright laws of Lithuania and find
out what restrictions govern the production of an
electronic copy of copyrighted material for
research purposes. Contact one or more publishers
to find out about their policy and practice in
assisting researchers to build corpora.
• Further reading
• McEnery et al. 2006: 77-79
Corpus creation
• Phrasal Verbs
• Using the data from the BNC, choose a group of phrasal
verbs:
back away/down/off/out/up
break away/down/in/into/off/out/through/up/with
put about/across/around/away/down/forward/off/on/out/through/together/up
set about/apart/aside/back/down/forth/in/off/on/to/up
step aside/back/down/in/on/up
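A first pass at counting such verb + particle combinations in plain text might look like the sketch below. It is a deliberately crude assumption-laden example: the function name and the toy sentences are invented, the inflected forms must be listed by hand, and only adjacent verb + particle sequences are found, so split uses such as "put it off" are missed.

```python
import re
from collections import Counter

def count_verb_particles(text, verb_forms, particles):
    """Count particles that directly follow any of the given verb forms."""
    forms = "|".join(re.escape(f) for f in verb_forms)
    # Longer particles first, so that "into" is tried before "in".
    parts = "|".join(
        re.escape(p) for p in sorted(particles, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b(?:{forms})\s+({parts})\b", re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))

# Toy text standing in for corpus data.
toy = (
    "The talks broke down. Thieves broke into the house. "
    "The car breaks down often."
)
break_forms = ["break", "breaks", "broke", "broken", "breaking"]
break_particles = ["away", "down", "in", "into", "off", "out",
                   "through", "up", "with"]

counts = count_verb_particles(toy, break_forms, break_particles)
print(counts.most_common())
```

A match is not necessarily a phrasal verb ("broke with a hammer" would also be counted), so the hits still need to be read in context, which is what a concordance is for.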
Corpora in teaching and learning
• Prepositions
• Study the concordances of above and over
and work out the similarities and differences
between them.
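For readers without access to a concordancer, a keyword-in-context (KWIC) display can be approximated as below. This is a minimal sketch: the function name, the context width, and the three sample sentences are assumptions made for illustration, and real concordance lines would come from a corpus such as the BNC.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per hit, with `width` characters of context each side."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

# Invented sample text, not corpus data.
sample = (
    "The lamp hung above the table. She put a blanket over the child. "
    "The plane flew over the city, high above the clouds."
)

for word in ("above", "over"):
    print(word.upper())
    for line in kwic(sample, word):
        print(line)
```

Aligning the keyword in a central column makes it easier to scan what typically stands to its left and right, which is the basis for comparing the two prepositions.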
Collocations
• Research points:
• Use BNCWeb to analyse the collocations of
the words of your choice.
• Further reading:
• McEnery et al. 2006: 80-85, 52-58, 208-226.
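The idea behind a collocation query can be illustrated with a simple window count: for each occurrence of a node word, record the words found within a fixed span to its left and right. The sketch below assumes a toy text and a hypothetical function name; it returns raw co-occurrence counts only, whereas corpus tools normally rank collocates with an association measure such as mutual information or log-likelihood.

```python
import re
from collections import Counter

def collocates(text, node, span=4):
    """Count words occurring within `span` tokens either side of `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts

# Invented toy text, not corpus data.
toy = (
    "A strong wind blew all night. He spoke with a strong accent. "
    "The engine is powerful. A strong wind again."
)

strong = collocates(toy, "strong", span=1)
print(strong.most_common())
```

Even in this tiny example the most frequent neighbour is the article "a", which shows why raw frequency alone is a poor guide to collocation and why function words are usually filtered out or down-weighted statistically.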
Idiomaticity
• Research points:
• Use the BNC and the Corpus of
Contemporary Lithuanian to analyse idioms
contrastively.
Lexical difficulties
Use the BNC to study the differences between the following pairs of words:
• Adverse, averse
• Acute, chronic
• Among, amid
• Amoral, immoral
• Between, among
• Biannual, biennial
• Bimonthly, biweekly
• Broach, brooch
• Cement, concrete
• Cession, session
• Compare to, compare with
• Complement, compliment
• Continual, continuous
• Convince, persuade
• Creole, pidgin
• Definite, definitive
• Different from, to, than
• Disinterested, uninterested
• Disposal, disposition
• Distinct, distinctive
• Each other, one another
• Economic, economical
• Elicit, illicit
• Fewer, less
• Flammable, inflammable
• Ingenious, ingenuous
• Lay, lie
False friends