Corpus-based collocation games

Shaoqun Wu, Margaret Franken and Ian H. Witten


University of Waikato, New Zealand

Abstract
This chapter describes a system that automatically generates collocation language
games. Underpinning the system is a corpus of contemporary collocations from the
Web, the Google n-gram collection, which has to date had little or no attention from
linguists, or developers of language teaching tools and resources. The text constituting
this corpus was collected in January 2006 by Google from publicly accessible Web
pages and was generated from approximately one trillion word tokens of text. The
corpus contains short sequences of consecutive words, called n-grams, along with
their frequencies. For our purposes, it was ‘cleaned’ and refined (see Wu, Franken and
Witten, 2009). The corpus and the accompanying system, which we have named FLAX,
provide for the automatic generation of games at the instigation of the teacher, who
may choose to set particular parameters such as the type of collocation. In this way, the
games can be controlled by the teacher both in terms of form and in terms of level of difficulty
or complexity. Using such an extensive corpus to generate games means that learners
can be exposed to and can manipulate language items that are authentic and
contemporary, an important consideration in language learning (see for instance
Gardner, 2008; McAlpine & Myles, 2003). The language items are also continually
being refreshed. In this chapter, we describe five such games: Collocation Dominoes,
Collocation Matching, Common Alternatives, Related Words and Collocation
Guessing. We compare our system with other potential sources of collocations that a
teacher may draw on to design games, and point out the particular benefits that the
system brings to language learners.

Key words: collocations, corpus, on-line games

Introduction

The notion of language games gained prominence with communicative language teaching as
course developers and teachers thought of ways to structure opportunities for meaning
negotiation. They used split information activities to provide the impetus for learners to
interact with each other. Task-based language teaching has more recently explored the
parameters of interactive tasks to improve the nature of that interaction and to ensure better
and more diverse learning outcomes (see for example, Skehan, 2003). The use of computers
in the design and implementation of language games has added significant value to what
teachers can now offer students in terms of challenging and productive interactive language
games. Wright, Betteridge and Buckby’s (2006, p.1) definition of a game as “an activity
which is entertaining and engaging, often challenging and an activity in which the learners
play, and usually interact with others,” focuses on interaction as a key feature of language
games.

The games we describe in this chapter can be played with others, but we believe that
there are more significant features that make them entertaining, engaging and challenging,
and which support effective learning of language. This chapter presents an explanation of a
system (FLAX) that supports collocation language games. Underpinning the system is a
corpus of contemporary collocations from the Web, the Google n-gram collection. The text
constituting this corpus was collected in January 2006 by Google from publicly accessible
Web pages and was generated from approximately one trillion word tokens of text. The
corpus contains short sequences of consecutive words, called n-grams, along with their
frequencies. The system making use of this corpus provides for the automatic generation of
games at the instigation of the teacher who may choose to set particular parameters such as
type of collocation. In this way, the games can be controlled by the teacher both in terms
of form, and level of difficulty or complexity. Using such an extensive, and relatively
contemporary corpus to generate games means that learners can be exposed to language items
that are continually being refreshed, but are nonetheless frequent.

Collocation learning and games

The role of lexical sequences in the process of language learning and their importance in
supporting fluent language production as “islands of reliability” (Dechert, 1980; Pawley &
Syder, 1983) has recently received renewed attention (Ellis, 1996; Schmitt, 2004). In this
regard, collocations in particular are seen as much more significant and useful than other
lexical sequences such as idioms, which are relatively infrequent but somewhat easy to learn
because of their saliency. Collocations represent a type of lexical sequence (Wray, 2000) in
which the lexemes “co-occur in natural text with greater than random frequency” (Lewis,
1997, p.8), and which occur in particular grammatical patterns.

Collocations are of great importance for second language learners. Knowledge of them plays
a key role in producing language fluently (Pawley & Syder, 1983; Wray, 2002). But such
knowledge is difficult to acquire for a number of reasons. Firstly there is simply so much of
this knowledge to acquire, and it is of a complex nature. As a number of researchers have
pointed out (Benson, Benson, & Ilson, 1986; Lewis, 1997; Nattinger & DeCarrico, 1992;
Nesselhauf, 2003), collocations vary in a number of ways that present challenges for learners:
in terms of the degree to which items are fixed, the relative frequency of the words they are
made up of, whether they occur in an unbroken sequence or whether they can tolerate lexical
insertion, and so on. In addition, the boundaries of collocations are often difficult to identify
(Bishop, 2004). These factors also present challenges for teachers in their selection of
collocations for students’ learning. Hill (2000) recommends choosing collocations that follow
particular grammatical patterns, particularly those that incorporate nouns, such as adjective +
noun, noun + noun, verb + adjective + noun, verb + adverb, adverb + adjective and verb +
preposition + noun. He also suggests that teachers think of collocations on a spectrum, with
weak and strong collocations at each end and medium-strength ones in the middle. It is those
of medium-strength that are particularly important for learners, because they make up a large
part of what we say and write every day. Nation (2001) likewise regards frequency – together
with range – as an essential criterion for selection.

Teachers must have access to tools, particularly on-line tools, to enable the selection of
frequent and useful collocations. While such tools are available for linguists, few other than
Tom Cobb’s Compleat Lexical Tutor (http://www.lextutor.ca/) from Université du Québec à
Montréal, and the search and browse functions of FLAX (flax2.nzdl.org, described in
Wu, Witten & Franken, 2010), are readily accessible to teachers. An activity or game
generating system based on selected collocations, as described in this chapter, is to our
knowledge not available elsewhere. There are a small number of collocation exercises on the
Web, as opposed to millions of vocabulary and grammar based ones. For example, a4esl.org,
one of the most popular English learning websites, hosts hundreds of language exercises
contributed by teachers around the world, of which only two are collocation exercises, each
containing ten questions.

Collocation exercises, normally presented as complementary material for vocabulary study,
often take the form of quizzes, puzzles, fill-in-the-blank activities, matching, permutation, or games. The
two collocation exercises on a4esl.org are matching ones. www.better-english.com provides
15 business collocation exercises in multiple-choice format. Each one contains 20 questions
focusing on a particular group of nouns or adjectives. The noun is removed from the question
text, and the student chooses the noun that fits the context from a dropdown list that is the
same for all questions. The words in the dropdown list appear to be randomly rather than
pedagogically selected. The eleven collocation exercises offered by another website,
angelfire.com, take the form of drag-and-drop, matching, and gap filling.

Playing the games

We have designed and implemented five collocation games underpinned by a system that
identifies collocations by type, by pattern and by frequency. The games are designed to be
interesting in that they move beyond the conventional arrangements discussed above. They
can be played individually or in groups, either cooperatively or competitively. For example,
students can work on networked computers logging into the games simultaneously, or the
games can be played in a class or group setting, by having students nominate suggestions to
the group or class from their own computers.

These games are created automatically under the guidance of a designer, usually the teacher,
through an interface described in a later section, Designing the games. The main technical
innovation is the use of the Google n-gram collection, a vast collection of collocations (30
million) and phrases (500 million) gathered from the World-Wide Web. These collocations
are selected and identified by syntactic type and are listed by frequency. Another resource
used when creating games is a set of three commonly used word lists: the most frequent 1000
and 3000 words in English (West, 1953), and a list containing these plus the 570 most
frequent academic words (Coxhead, 1998), together with their word families. This process is
described in the section entitled Under the hood.

The five games described below are: Collocation Dominoes, Collocation Matching, Common
Alternatives, Related Words, and Collocation Guessing.

Collocation Dominoes
The first example of a game mimics the traditional game of dominoes where the last word of
the previous collocation becomes the first word of the next collocation. Here is an example:

bank cheque — cheque book — book club — club sandwich — sandwich board — board
room …

Beginning with an initial seed word (in this case bank), players come up with the remaining
words in succession, each word pair forming a collocation. The game can be open or closed:
in the closed version words are chosen from a fixed list; in the open version, which is much
more difficult to play, they are freely chosen.

Figure 1 illustrates a game created using our system from the starting word turn, using
dominoes of the form noun + of + noun. This screen is what players see when the game
begins. The list of available words is given at the top of the panel. Players move a word into
the appropriate box, whereupon the adjacent box is also updated to contain the same word.
The last word (person in this case) as well as the first (turn) is given: players can work
forwards from the beginning or backwards from the end.

Figure 1. The Collocation Dominoes game

Moves can be undone by clicking the word in the domino, whereupon it is erased and
reappears in the list of words at the top. When the “Check Answer” button is clicked,
incorrectly formed collocations revert to empty boxes and their contents reappear in the list of
words.

English word classes are highly flexible as many verbs can be used as nouns, and nouns can
be used as adjectives. Many learners, even advanced ones, can feel challenged when
attempting to use noun + noun combinations accurately given this conflation of form and
function. Collocation Dominoes can help them gain fluency in using these collocations.
However, different syntactic patterns can be specified for the game. The designer (usually the
teacher) selects a syntactic pattern for the collocation type, chooses a starting word and the
number of dominoes, and determines whether the game is to be open or closed.

This type of game relies on a system with a large repertoire of collocations, both to determine
a suitable list of words in the closed version and to check the player’s answer in either
version. A system such as ours, which is underpinned by a huge corpus, has this capacity.
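To make the checking step concrete, here is a minimal Python sketch (not the FLAX implementation) of how a domino chain could be validated against a collocation store. The `collocation_freq` dictionary, its frequency values and the function name are invented stand-ins for the database.

```python
# Minimal sketch of domino-chain checking. `collocation_freq` stands in for the
# FLAX collocation database; here it is just a dict mapping a two-word
# collocation to its corpus frequency (invented sample values).
collocation_freq = {
    "bank cheque": 12000, "cheque book": 85000, "book club": 430000,
    "club sandwich": 27000, "sandwich board": 9000, "board room": 150000,
}

def check_domino_chain(words, min_freq=1):
    """Return the adjacent word pairs that are NOT attested collocations."""
    bad_pairs = []
    for left, right in zip(words, words[1:]):
        pair = f"{left} {right}"
        if collocation_freq.get(pair, 0) < min_freq:
            bad_pairs.append(pair)
    return bad_pairs

# A correct chain produces no bad pairs; a wrong move is reported.
print(check_domino_chain(["bank", "cheque", "book", "club", "sandwich"]))  # []
print(check_domino_chain(["bank", "cheque", "sandwich"]))  # ['cheque sandwich']
```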

Collocation Matching
In the Collocation Matching game the system selects a set of collocations with the same
syntactic pattern, splits each into its left and right component, and shuffles the two sets of
components. For example, the secretary of state, course of action, and hundreds of dollars
might be presented as:

the secretary of     action
hundreds of          state
course of            dollars

Learners must rematch them.
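The set-up step can be pictured with a short, hypothetical Python sketch: each noun + of + noun collocation is split at of, the right-hand parts are shuffled, and the original pairing is retained as the answer key. The function and data names are ours, not FLAX's.

```python
import random

# Hypothetical sketch of the Collocation Matching set-up for noun + of + noun items.
collocations = ["the secretary of state", "course of action", "hundreds of dollars"]

def build_matching_game(collocations):
    """Split each collocation at ' of ', shuffle the right-hand column,
    and keep the correct pairing as the answer key."""
    pairs = [tuple(c.split(" of ", 1)) for c in collocations]
    lefts = [left + " of" for left, _ in pairs]
    rights = [right for _, right in pairs]
    random.shuffle(rights)
    answer_key = dict(pairs)          # left part -> correct right part
    return lefts, rights, answer_key

lefts, rights, key = build_matching_game(collocations)
print(lefts)   # ['the secretary of', 'course of', 'hundreds of']
print(rights)  # the shuffled right-hand components
```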

Figure 2 shows a game created using six quantification words: grain, drop, slice, sheet, chunk,
and bar. The words and their associated nouns are separated, shuffled and placed in the left
and right columns respectively. Players match quantification words with the appropriate noun
by dragging and dropping the words in the right-hand column, so that together they form a
strong partnership. At any point they can restart the game, check their answer, or begin a new
game that uses the same quantification words but with different nouns.

Figure 2. The Collocation Matching game

Picking collocations thematically (in this case, quantification) can help learners practise
particular groups of collocations, which adds extra value to this activity.

Common Alternatives
For the Common Alternatives game, learners enter as many collocations as possible and their
choices are scored. Figure 3 shows a game that focuses on nouns commonly associated with
the verb reduce. To get learners started, they are given some sample collocations—in this
case reduce stress, reduce heat loss, reduce fighting and reduce the risk of.

Learners type a word or phrase into the text box and press the “Enter” key, at which point the
system checks it. For example, reduce more would be rejected because this exercise requires
a noun, or a phrase that contains a noun. Then the input text, preceded by the word reduce, is
sought amongst n-grams of the same length in the database. If it is found, the associated
usage frequency is retrieved and displayed as a score.
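A simplified version of this lookup is sketched below in Python. The `ngram_freq` table and the `known_nouns` set are invented stand-ins for the FLAX n-gram database and its part-of-speech information.

```python
# Sketch of the Common Alternatives scoring step (not the FLAX implementation).
# `ngram_freq` stands in for the n-gram database: phrase -> corpus frequency.
ngram_freq = {
    "reduce costs": 5200, "reduce poverty": 1800, "reduce the possibility of": 3181,
}
# Stand-in for the system's part-of-speech check: the entry must contain a noun.
known_nouns = {"costs", "poverty", "possibility", "stress", "risk"}

def score_entry(target_verb, entry):
    """Reject entries with no noun; otherwise return the frequency of
    target_verb + entry in the n-gram table (0 if unattested)."""
    if not any(word in known_nouns for word in entry.split()):
        return None                       # rejected, e.g. "more"
    return ngram_freq.get(f"{target_verb} {entry}", 0)

print(score_entry("reduce", "more"))                 # None (no noun)
print(score_entry("reduce", "the possibility of"))   # 3181
```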

Figure 3. The Common Alternatives game

In Figure 3 the user has previously entered reduce costs and reduce poverty, and has just
entered reduce the possibility of, bringing his or her total score to 10,181. The competitive
factor makes this activity compelling. Players can be connected to work on the same game
and see each other’s scores. This challenges them to increase their score by discovering more
collocations.

Related Words

The Related Words game picks several related words and a number of their associated
collocations, removes the related words, and shuffles the remaining text. For example:

pay make

_____ the bill, _____ efforts, _____ the debt, _____ a difference

Learners are asked to choose the correct word to complete each collocation (here pay the bill,
make efforts, pay the debt and make a difference).
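The preparation of such an exercise can be sketched as follows (hypothetical Python with invented example data): the shared headwords are blanked out of their collocations, the items are shuffled, and an answer key records which headword each blank belongs to.

```python
import random

# Hypothetical sketch of Related Words game construction.
# Each related word maps to a few of its collocations (invented examples).
related = {
    "pay":  ["pay the bill", "pay the debt"],
    "make": ["make efforts", "make a difference"],
}

def build_related_words_game(related):
    """Blank out the headword in each collocation, shuffle the items,
    and keep an answer key of blanked item -> headword."""
    items, answer_key = [], {}
    for headword, collocations in related.items():
        for c in collocations:
            blanked = c.replace(headword, "_____", 1)
            items.append(blanked)
            answer_key[blanked] = headword
    random.shuffle(items)
    return items, answer_key

items, key = build_related_words_game(related)
print(items)                  # e.g. ['_____ efforts', '_____ the bill', ...]
print(key["_____ efforts"])   # 'make'
```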

Figure 4 depicts a game for the pair speak and tell, which are shown at the top. The main
panel shows a list of collocations with related words replaced by blanks. Players drag a word
and drop it onto a line below to complete the collocation. The numbers following the words
indicate how many occurrences there are, and decrease whenever the word is used. Moves
can be undone by clicking the collocation text. When the “Check Answer” button is clicked,
incorrect collocations revert to their original state.

Figure 4. The Related Words game

Learners are often confused by frequent words which have a wide range and often overlap in
meaning, and find it difficult to understand their differences just by looking them up in
dictionaries. This activity works well with these types of words. Some further examples are
make and do, wound and injury, and see and look.

Collocation Guessing
For the Collocation Guessing game, the teacher chooses a target word and a number of
associated collocations. The target word is removed and the associated collocations are
revealed one by one; players must guess the target word as quickly as possible. For example,
given this list: plain, dark, white, bitter, milk, bar of, learners guess the word that collocates
with all of them. (The answer is obvious to chocolate lovers!)
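The underlying reveal-and-guess loop can be approximated in a few lines of Python. This console sketch ignores the graphical Tetris-style presentation, and the bonus scheme shown is an invented simplification.

```python
# Console sketch of the Collocation Guessing loop (the graphical Tetris-style
# interface and scoring details of FLAX are not reproduced here).
def play_guessing(target, collocation_hints):
    """Reveal hints one by one; end when the target is guessed or hints run out.
    Bonus points (invented scheme) reward guessing with fewer hints shown."""
    for shown, hint in enumerate(collocation_hints, start=1):
        print(f"Hint {shown}: {hint} ___")
        guess = input("Your guess: ").strip().lower()
        if guess == target:
            bonus = len(collocation_hints) - shown
            print(f"Correct after {shown} hint(s), bonus {bonus}")
            return bonus
    print(f"Out of hints. The word was '{target}'.")
    return 0

# play_guessing("chocolate", ["plain", "dark", "white", "bitter", "milk", "bar of"])
```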

The interface, shown in Figure 5, mimics the well-known puzzle game of Tetris, which has
been called the “greatest game of all time” (100 million copies have been sold for cell phones
alone). ‘Bricks’ that show a collocation where the target word is missing appear in the main
panel on the left. They drop down one by one from the top of the panel, and as soon as one
reaches the bottom the next one launches. Players type guess after guess, and the game ends
when the correct word is given or the collocations run out. Bonus points are awarded
according to the number of collocations that have been shown. Players can restart the current
game or move on to another one at any time. The slider adjusts the speed at which the bricks
drop.

Figure 5. The Collocation Guessing game

To create a set of games, the teacher provides some target words. This allows for topic-
related exercises. Alternatively, exercises can focus on a particular collocation type or a range
of types. Taking the word make as an example: if verb + noun were chosen, the collocations
might be make money, make use of, make every effort. If all collocation types were used, they
might be make sure, make up, actually make, make money. Both are good ways to enrich
collocation knowledge.

Designing the games

A number of parameters can be set and manipulated by teachers when designing the games.
While these have been mentioned in relation to the descriptions above, it is useful to reiterate
them and discuss the affordances associated with them. The parameters include: target words,
collocation type, wordlist, the number of collocations, and selection method. Having selected
appropriate values for these parameters, the teacher has an opportunity to review and modify
the material that has been selected before making the game available to students.

Target words are used to retrieve the collocations that will appear. The opportunity to specify
these gives the teacher control over the game’s focus or purpose. If target words are not
specified, the system picks words randomly from one of the three word lists mentioned above
(most common 1000 and 3000 words, and a list that includes academic words as well).
However, randomly generated words are unsuitable for some games. For example, those used
for Related Words should be somehow related, as the name suggests.

Collocation type is determined by a grammatical pattern. Some types are particularly suitable
for certain games, such as noun + noun and noun + of + noun for Collocation Dominoes (as
illustrated earlier). Different groups of students may experience difficulty in learning specific
collocation types, and so a careful selection can target areas for learning.

The wordlist parameter is used to generate random words when the target words
parameter is not specified, and can be set to one of the three lists mentioned earlier. For
example, when set to 1000, target words are picked from the list of the 1000 most common
English words (West, 1953). If the collocation type parameter mentioned above is also set,
words are picked at random from the words used in these collocations. For example, if the
collocation type is noun + noun and the wordlist parameter is set to 1000, words are
randomly picked from the most frequent 1000 nouns extracted from all noun + noun
collocations.

The number of collocations determines the size of a game. In Collocation Guessing, the more
collocations there are, the easier the game, because learners are able to see more hints. For
the other games, balance is necessary to prevent learners from being overwhelmed by the
language items presented and unable to pay adequate attention to and process them.

The final parameters control the actual selection of collocations. How can games be created
that use the best group of collocations and also allow learners to practice a variety of
collocations associated with a particular word? Most words, particularly common ones such
as take, make, cause, have many collocations, and they can be grouped together by frequency.
The top one or two collocations are normally at least twice as frequent as the others; a second
group with various numbers of collocations follows; and so on. It is crucial that learners study
collocations in the first group, and they should also study the second or third groups in order
to expand their collocation knowledge. Our system selects the n best collocations for a word
and randomly picks one for each game, so that learners can practice different groups of
collocations by clicking the “New Game” button. The value of n (default 5) is specified by
the teacher and should be adjusted according to the frequency of usage of a particular word
and the language ability of students. A general rule is to use a high value for common words
or more advanced students.
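One plausible reading of this selection strategy is sketched below in Python (the frequency figures and names are invented): a word's collocations are ranked by frequency, the top n are kept, and each new game draws a random subset from them.

```python
import random

# Hypothetical sketch of the collocation-selection parameter: keep the n most
# frequent collocations of a word, then sample from them for each new game,
# so successive games expose the learner to different items (invented data).
collocations_of_take = {
    "take place": 9_000_000, "take advantage of": 4_500_000,
    "take care of": 4_200_000, "take part in": 2_100_000,
    "take a look at": 1_800_000, "take action": 1_500_000,
    "take responsibility": 900_000,
}

def select_for_game(collocation_freqs, n=5, per_game=3):
    """Rank by frequency, keep the top n, and draw per_game items at random."""
    top_n = sorted(collocation_freqs, key=collocation_freqs.get, reverse=True)[:n]
    return random.sample(top_n, k=min(per_game, len(top_n)))

print(select_for_game(collocations_of_take, n=5, per_game=3))
```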

In the Collocation Matching and Related Words games, learners match or differentiate
collocations of two or more words. Of course, different words may share the same group of
collocates e.g., speak the truth and tell the truth. In this case our system chooses the strongest
collocate, in this case, tell the truth, which is more frequent than speak the truth. Since
collocations are picked randomly, learners still have the chance to practise speak the truth
when another collocation is chosen for tell.

Games are designed using a standard interface that varies slightly from one game to another
depending on the parameters that can be manipulated. We use Collocation Guessing, illustrated
in Figure 6 below, as an example of the interface. Here a teacher is creating a game that asks students
to differentiate between the words make and take. She specifies two or more target words by
entering them in the input box near the top (make and take in this case), or uses randomly
generated words by choosing a wordlist and the number of words to generate. Then she
specifies the desired collocation type (here, verb + noun), which collocation constituent to
practise (here the first word, i.e. the verb), the number of collocations to use in this game (10),
and how to select collocations (randomly from the top 15).

Figure 6. Designing a Collocation Guessing game

When the “Preview” button is clicked, the system retrieves collocations that match the
criteria specified. Particular collocations can be discarded by unchecking the check box
following them. For example, here either make a good decision or make any decisions might
be removed because they are similar.

Under the hood

Some researchers have made use of Google for language activities (see for instance, Guo &
Zhang, 2007; Shei, 2008). That this is an emerging area of activity is evidenced by the
recently coined neologism GALL, for Google-Assisted Language Learning (Chinnery,
2008; Shei, 2008). However, as discussed in Wu, Franken and Witten (2009), there are
serious limitations to using Google and the live Web. For one, “search engine companies do
not support the use of their services through secondary interfaces” (Wu, Franken, & Witten,
2009, p. 253). One logical response to these constraints is to use the Google n-gram collection. It is
an extensive resource with the potential for teachers to build a seemingly inexhaustible
supply of games for their learners. However, exploitation of this resource to date has been limited,
perhaps largely because people are unaware of its existence, and perhaps also because of the
complexity of building a system (FLAX) to mediate it – something we were able to do as
part of a doctoral project (Wu, 2010), aided by our access to the Greenstone digital
library software (http://www.greenstone.org/).

Essentially the process involved assigning part-of-speech tags to the five-grams, matching
tagged five-grams against syntactic patterns, discarding “dirty” collocations, and ranking
collocations by frequency. The OpenNLP tagger (http://opennlp.sourceforge.net/) was used to
assign part-of-speech tags to the five-grams, and the tagged five-grams were compared against a chosen set of ten syntactic
patterns or collocation types. The first six patterns were adopted from the work of Benson,
Benson and Ilson (1986). The other four categories, noun + noun, adverb + verb, verb + to +
verb, and verb + adjective were used because of their presence in the Oxford Collocation
Dictionary for Students of English. Those collocations that matched the patterns were
extracted as candidate collocations. The next step was to ‘clean’ the collocations. Like the
Web itself, the five-grams are messy. They include many non-word character strings, website
names and grammatical errors. Unfortunately, it is virtually impossible to eliminate all errors.
However, we used the British National Corpus wordlist to remove non-words and website
names, and we located and discarded unconventional single-character words (other than the
article a or the pronoun I), such as time t, p values and m sections, as well as repeated words. Finally, the
surviving collocations were ranked by frequency. Further details of the way in which we
‘cleaned’ and refined the corpus for use are detailed in Wu, Franken and Witten (2009).
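The pipeline can be approximated with the short Python sketch below. NLTK's tagger is used purely as a stand-in for the OpenNLP tagger employed in the actual system, only two of the ten patterns are shown, and the five-gram sample, pattern table and wordlist are invented.

```python
# Illustrative approximation of the extraction pipeline described above.
# NLTK's tagger stands in for the OpenNLP tagger used in the actual system;
# only two of the ten syntactic patterns are shown, and the five-gram data
# and wordlist are invented samples.
from collections import defaultdict
import nltk  # requires NLTK and its POS-tagger resource to be installed

PATTERNS = {
    ("NN", "NN"): "noun + noun",
    ("JJ", "NN"): "adjective + noun",
}
wordlist = {"clock", "radio", "solar", "energy", "system", "little", "girl"}

def extract_collocations(five_grams):
    """Tag each five-gram, keep adjacent word pairs whose tags match a pattern
    and whose words are in the wordlist, and accumulate frequencies."""
    freq = defaultdict(int)
    for gram, count in five_grams:
        tagged = nltk.pos_tag(gram.split())
        for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
            if (t1[:2], t2[:2]) in PATTERNS and {w1, w2} <= wordlist:
                freq[f"{w1} {w2}"] += count
    return sorted(freq.items(), key=lambda kv: kv[1], reverse=True)

sample = [("a clock radio on the", 1200), ("a little girl in the", 5400)]
print(extract_collocations(sample))  # ranked candidate collocations
```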

The total number of collocations and the number of words within each type of structure that
remained after the process described above was carried out are shown in Table 1.


Table 1. The ten collocation types with examples and frequencies

collocation type | examples | collocations | words
verb + noun(s) (includes verb + noun + noun, verb + adjective + noun(s), verb + preposition + noun(s)) | make appointments, cause liver damage, take annual leave, result in dismissal | 8,700,000 | 54,000
verb + adverb | apologize publicly | 200,000 | 11,000
noun + noun | a clock radio | 4,200,000 | 53,000
noun + verb (includes noun + verb with present tense, noun + be + present participle, noun + be + past participle) | the time comes, the time is running out, the time is spent on | 1,200,000 | 34,000
noun + of + noun | a bar of chocolate | 7,800,000 | 40,000
adjective(s) + noun(s) (includes adjective + noun + noun, adjective + adjective + noun(s)) | a little girl, a solar energy system, a beautiful sunny day | 6,300,000 | 56,000
verb + adjective (includes verb (incl. phrasal) + adjective, verb + noun + adjective) | make available, take up more, take it easy | 91,000 | 9,800
verb + to + verb | cease to amaze | 440,000 | 11,000
adverb + verb | beautifully written | 500,000 | 13,000
adverb + adjective | seriously addicted | 200,000 | 10,000

Evaluating collocations

One of the major arguments in this chapter is that a corpus such as the Google n-gram
collection can much more successfully provide a tool for learning frequent, relevant and
useful collocations than more traditional sources. In order to substantiate this claim, we
present evaluations of collocations extracted from Google five-grams with respect to those in
the BNC and the Oxford Collocation Dictionary for Students of English.

FLAX collocations vs Oxford Collocation Dictionary for Students of English


After investigation, we decided to build the baseline data from The Oxford Collocation
Dictionary for Students of English (OCDSE) for comparison as it is based on a relatively
large corpus (the BNC), and contains about 150,000 collocations for 9,000 headwords,
organized into eleven collocation types shown in Table 2. For each type it gives the number
of headwords, the number of collocations, and some examples. Adjective + noun collocations
constitute the largest group (37.5%), followed by verb + noun (19.2%), adverb + adjective
(7.0%), and so on. It is unclear how this dictionary was compiled: automatically, manually, or both.

Table 2. Number of collocations extracted from the Oxford Collocation Dictionary for
Students of English

collocation type | headwords | collocations | example
adjective + noun | 4997 | 69362 (37.5%) | vague recollection
verb + noun | 4529 | 35516 (19.2%) | keep the promises
noun + preposition or preposition + noun | 3584 | 12475 (6.7%) | in press, position on
noun + verb | 1846 | 8091 (4.4%) | plot unfolds
noun + noun | 2100 | 12283 (6.6%) | plot development
adverb + verb or verb + adverb | 1436 | 10144 (5.5%) | directly recruit, recruited specially
verb + to + verb | 749 | 3539 (1.9%) | try to recruit
verb + preposition | 1076 | 3027 (1.6%) | recruit as
adverb + adjective | 1450 | 13006 (7.0%) | awfully careful
verb + adjective | 1464 | 7605 (4.1%) | be + careful
adjective + preposition | 689 | 1121 (0.61%) | careful about
phrases | 2791 | 8850 (4.8%) | a plot of land

The dictionary contains about 185,000 collocations in all, considerably more than the
150,000 that it claims. Upon further investigation, it was found to include some arguable
collocations such as 19th century, $20 reward, children’s book and men’s loo.

Only adjective + noun, noun + noun, and adverb + adjective collocations, comprising 52%
of the total, were used as baseline data, because the other types contain non-consecutive
words, include an inconsistent number of constituent words, and vary in length (two to four
words) and form.

For each of the three collocation types, test data was extracted and organized by headword.
Table 3 gives the size of the two data sets, together with the number of headwords. The largest group,
adjective + noun, covers 4,234,318 FLAX collocations with 870 per headword, about 67 times
more than the 62,919 OCDSE collocations with 13 per headword.

Table 3. Number of collocations in the baseline and test data

collocation type | headwords | OCDSE collocations | average | FLAX collocations | average
adjective + noun | 4863 | 62,919 | 13 | 4,234,318 | 870
noun + noun | 2048 | 11,836 | 5.8 | 1,459,283 | 712
adverb + adjective | 1420 | 11,385 | 8 | 249,147 | 175

The data shows that the FLAX collocation database contains many more potential collocations
for games than the Oxford Collocation Dictionary for Students of English.

FLAX collocations vs BNC Collocations


For this evaluation, all ten types were extracted from the BNC. Table 4 shows the total
number of collocations, the number of headwords, and the average number of collocations for
each headword of each collocation type. For each collocation type, the headword (in bold) is
somewhat arbitrarily selected to give some idea of how many collocations there are for a
particular word.

Table 4. Collocation types with statistical data from two corpora

collocation type | FLAX collocations | FLAX headwords | FLAX collocations/headword | BNC collocations | BNC headwords | BNC collocations/headword
verb + noun(s) | 20,000,000 | 72,000 | 277 | 1,700,000 | 64,000 | 27
noun + verb | 6,600,000 | 92,000 | 71 | 800,000 | 27,000 | 30
adjective(s) + noun(s) | 19,000,000 | 80,000 | 237 | 2,800,000 | 84,000 | 33
noun + noun | 8,500,000 | 70,000 | 121 | 1,000,000 | 39,000 | 26
adverb + adjective | 510,000 | 20,000 | 25 | 75,000 | 13,000 | 6
adverb + verb | 1,300,000 | 20,000 | 65 | 180,000 | 12,000 | 15
noun + of + noun | 14,000,000 | 50,000 | 280 | 1,200,000 | 41,000 | 29
verb + adverb | 870,000 | 19,000 | 45 | 190,000 | 9,000 | 21
verb + adjective | 230,000 | 16,000 | 14 | 37,000 | 6,600 | 6
verb + to + verb | 170,000 | 9,500 | 17 | 90,000 | 6,200 | 15

As the table shows, between 2 and 9 times more collocations were extracted for FLAX
than from the BNC, and the number of collocations available for a particular headword
increases accordingly. The top three types have more than ten million examples, containing
50,000 to 80,000 headwords. Even the smallest, verb + to + verb, contains 170,000
collocations. The most frequent Web collocation is constitutes acceptance of (95,000,000
times), while the most frequent one in the BNC is last year. The 767 FLAX collocations
demonstrate great diversity in the language patterns they represent. For example, there are
285 variants of cause problems, including cause serious problems, cause major problems and
cause unpredictable problems. The BNC contains only 56, half of which occur only once.
Table 5 gives five more examples.

Table 5. Web and British National Corpus entries for cause + noun

collocation | FLAX | BNC | examples
cause + problems | 285 | 56 | cause serious problems, cause major problems
cause + damage | 257 | 54 | cause permanent damage, cause significant damage
cause + harm | 147 | 24 | cause irreparable harm, cause no harm
cause + injury | 90 | 14 | cause physical injury, cause substantial injury
cause + death | 68 | 14 | cause sudden death, cause premature death

As a final example, we include results from the Compleat Concordancer (http://www.lextutor.ca/). Table 6 shows the
top ten cause + noun(s) collocations from three resources: the collocation database, the BNC
and the Compleat Concordancer.

Table 6. Top ten cause + noun collocations in three concordances

FLAX collocations (36,000 collocations) | British National Corpus (2360 collocations) | Compleat Concordancer (54 collocations)
sample  frequency | sample  frequency | sample  frequency
cause problems 2,100,000 | cause problems 160 | cause problems 5
cause actual results 1,900,000 | cause trouble 71 | cause suffering 4
cause damage 1,300,000 | cause damage 48 | cause damage 2
cause harm 850,000 | cause difficulties 40 | cause offence 2
cause injury 580,000 | cause cancer 34 | cause death 2
cause cancer 580,000 | cause injury 32 | cause distress 2
cause confusion 400,000 | cause death 28 | cause a great increase 2
cause death 410,000 | cause confusion 27 | cause another war 1
cause trouble 280,000 | cause harm 23 | cause deactivation 1
cause pain 250,000 | cause offence 22 | cause a deviation 1

The first contains 36,000 collocations; the second 2360, of which 84% occur once and 8%
twice, and the third 54, most of which appear just once. Interestingly, cause problems is the
most frequent entry in all three cases. Upon further examination, it seems that cause is used
mostly in a negative sense and associated with problems, damage, death, and so on.

The results of the evaluations comparing the number and type of collocations from different
sources underscore the massive and diverse nature of the FLAX collocations. While the sheer
volume of examples could present a challenge for less proficient learners, we believe it is
valuable for advanced learners who wish to expand their range of collocation phrases for
expressing propositions in precise and authentic ways.


Conclusion

Wright, Betteridge and Buckby’s (2006) third edition of Games for Language Learning is testimony to the
place of ‘games’ in a language teacher’s repertoire. They advise, “If you can only take one
book with you… take this one!” (2006, p. xii). However we argue that teachers need to look
at technologies that not only give them a pedagogical framework for designing games (as
Wright et al.’s book does), but which also draw on the immense capacity of language corpora
to provide a seemingly endless source of authentic language items.

We have described a particular system, FLAX, which automatically generates collocation
learning games, but which has parameters for teacher manipulation. The corpus we chose was
the Google n-gram collection, available in 2006. As such it provides a relatively
contemporary source of collocations. We note that as we write this chapter, Google has just
released news of its new searchable database of 500 billion words contained in books
published between 1500 and 2008. The database was created from Google's index of some
5.2 million books digitized as part of its Google Books project, and includes texts in English,
French, Spanish, German, Chinese and Russian. We look forward to considering this as a
potential source for language learning. We are also continuing current development work, exploring
how best to provide learners with opportunities to interact with their peers or teachers through
computer-mediated communication tools such as text-based chat.

References

Benson, M., Benson, E., and Ilson, R. (1986). The BBI combinatory dictionary of English: A
guide to word combinations. Amsterdam/Philadelphia: John Benjamins.
Bishop, H. (2004). “The effect of typographic salience on the look up and comprehension of
unknown formulaic sequences.” In N. Schmitt (Ed.), Formulaic sequences:
Acquisition, processing, and use, 227–244. Philadelphia, PA, USA: John Benjamins
Publishing Company.
Chinnery, G. M. (2008). You’ve got some GALL: Google-assisted language learning.
Language Learning and Technology, 12(1), 3–11.

Coxhead, A. (1998). An academic word list. Occasional Publication Number 18, LALS,
Victoria University of Wellington, New Zealand.
Dechert, H. W. (1980). Pauses and intonation as indicators of verbal planning in second-
language speech productions: Two examples from a case study. In H. W. Dechert & M.
Raupach (Eds), Temporal variables in speech (pp. 271-285). The Hague: Mouton.
Ellis, N. C. (1996). Sequencing in SLA: Phonological memory, chunking, and points of
order. Studies in Second Language Acquisition, 18, 91-126.
Gardner, D. (2008). Vocabulary recycling in children’s authentic reading materials: A
corpus-based investigation of narrow reading. Reading in a Foreign Language 20.
Retrieved from http://nflrc.hawaii.edu/rfl/April2008/gardner/gardner.html
Guo, S. & Zhang, G. (2007). Building a customised Google-based collocation collection to
enhance language learning. British Journal of Educational Technology, 38(4), 747-750.

Hill, J. (2000). Revising priorities: From grammatical failure to collocational success. In M.
Lewis (Ed.), Teaching collocations (pp. 70–87). Hove, U.K.: Language Teaching
Publications.
Lewis, M. (1997). Implementing the lexical approach: Putting theory into practice. Hove:
Language Teaching Publications.
Lewis, M. (Ed.) (2000). Teaching collocations. Hove, U.K.: Language Teaching
Publications.
McAlpine, J. & Myles, J. (2003). Capturing phraseology in an online dictionary for
advanced users of English as a second language: A response to user needs. System, 31,
71-84.
Nakata, H. (2006). English collocation learning through meaning-focused and form-focused
activities: Interactions of activity types and L1-L2 congruence. Proceedings of the 11th
Conference of Pan Pacific Association of Applied Linguistics. Retrieved from
www.paaljapan.org/resources/proceedings/PAAL11/pdfs/13.pdf
Nation, P. (2001). Learning vocabulary in another language. Cambridge: Cambridge University Press.

Nattinger, J. R., & DeCarrico, J. S. (1992). Lexical phrases and language teaching. Oxford:
Oxford University Press.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some
implications for teaching. Applied Linguistics, 24(2), 223-242.

Pawley, A., & Syder, F. H. (1983). Two puzzles for linguistic theory: Nativelike selection
and nativelike fluency. In J. C. Richards & R. W. Schmidt (Eds.), Language and
communication (pp. 191-226). New York: Longman.
Schmitt, N. (Ed.) (2004). Formulaic sequences: Acquisition, processing, and use.
Amsterdam: John Benjamins.
Shei, C.C. (2008). Discovering the hidden treasure on the Internet: using Google to uncover
the veil of phraseology. Computer Assisted Language Learning, 21(1), 67–85.
Skehan, P. (2003). Task-based instruction. Language Teaching, 36(1), 1-14.
West, M. (1953). A general service list of English words. Longman, Green and Co., London.
Wray, A. (2002). Formulaic Language and the lexicon. New York: Oxford University Press.
Wright, A., Betteridge, D., and Buckby, M. (2006). Games for language learning. Cambridge
Handbooks for Language Teachers (3rd edition). Cambridge: Cambridge University
Press.
Wu, S. (2010). Supporting collocation learning (Unpublished doctoral dissertation).
University of Waikato, Hamilton, New Zealand.
Wu, S., Franken, M., and Witten I. H. (2009). Refining the use of the web (and web search)
as a language teaching and learning resource. Computer Assisted Language Learning,
22(3), 249-268.
Wu, S., Witten, I. H., & Franken, M. (2010). Utilizing lexical data from a Web-derived
corpus to expand productive collocation knowledge. ReCALL 22(1), 83–102.
doi:10.1017/S0958344009990218
