doi:10.1093/ijl/eci051

COLLOCATION, COLLIGATION AND
ENCODING DICTIONARIES. PART II:
LEXICOGRAPHICAL ASPECTS
Dirk Siepmann: Universität-GH Siegen, Fachbereich 3, Adolf-Reichwein-Straße,
D-57068 Siegen, Germany (dsiepmann@t-online.de)

Abstract
The present article starts from a broad definition of collocations as holistic
lexico-grammatical or semantic units (see Part I for full details), asking how such
units can be adequately represented in bilingual and monolingual encoding dictionaries.
It is found that an onomasiological approach to dictionary making is better suited to
this task than a semasiological, framework-based methodology whereby individual
lexicographers work on small, alphabetically classified sections of the dictionary.
Typically, semasiological dictionaries and corresponding methodologies have difficulty
in arranging items in a clear and memorable way, give patchy or inadequate coverage to
semantic-pragmatic collocations, cannot provide adequate cross-referencing between
synonymous items and are prone to translation errors. It is shown how onomasiological
dictionaries and methodologies can remedy such deficiencies. The Bilexicon project,
which aims to create thematic learners' dictionaries, is the main source drawn upon
to illustrate the suggestions made.

1. Introduction
There is growing recognition that both structurally simple (i.e. (bound)
morphemes, lexemes) and structurally complex units (i.e. collocations or
colligational patterns) are linguistic signs (Feilke 2003). If the dictionary is
meant to be a record of such signs, the task of the lexicographer is to gather
together evidence of both types of sign. So far it has been lexemes, non-compositional
idioms and morphemes that have received the bulk of
lexicographic attention, but the future clearly belongs to collocation and
colligation in the widest possible sense. However, most linguistic models of
collocation are too limited (e.g. Hausmann 1999), too formalist (e.g. Mel'čuk
1998) or too broad (e.g. Kjellmer 1994) to be readily adaptable to lexicographic
practice (see the first part of this article, IJL 18/4).
International Journal of Lexicography, Vol. 19 No. 1. Advance access publication 29 November 2005
© 2005 Oxford University Press. All rights reserved. For permissions,
please email: journals.permissions@oxfordjournals.org

A viable lexicographic definition of collocation can be based on the notions
of Gebrauchsnorm, or usage norm (Steyer 2000: 108), reflected in concepts
such as minimal recurrence (Kocourek 1991, Siepmann 2003) or statistical
significance (Sinclair 1991), on the one hand, and the notion of inhaltliche
Geschlossenheit or holisticity, on the other hand (Siepmann 2003). Holisticity
here refers to the facts that native speakers can ascribe meaning to general-language
collocations even if these are divorced from context and that such
units are intuitively considered as self-contained wholes. We thus arrive at the
following definition of collocation:
a collocation is any holistic lexical, lexico-grammatical or semantic
unit which exhibits minimal recurrence within a particular discourse
community.
It should also be taken to include colligation with a particular grammatical
category, such as a noun phrase. Thus, the collocations the future belongs to
(die Zukunft gehört, l'avenir appartient à) or l'autoroute file would be felt to
be incomplete by most speakers, requiring as they do a prepositional object.
This variable complement is conceived of as part of the collocation.
With this definition in mind, it becomes possible to suggest a four-way
typology of collocation along the following lines (see Part I):
(a) Colligation (you can stick your NP, far be it from me to INF, ignorer
tout de N, il n'y a qu'à INF, ce/cette N [tradition, etc.] est resté(e),
NP dans l'âme, typisch N, etc.); note that this definition of colligation is
different from Firth's (1957) or Hoey's (1998)¹, since it concerns not only
the grammatical preferences of individual words, but also those of longer
syntagms. Thus, the syntagm tu n'avais qu'à can be said to be in colligation
with an infinitive clause.
(b) Collocation between lexemes or phrasemes (just as clause ... so / in the
same manner clause, levy charges, briser ses chaussures, c'est-à-dire en
l'occurrence, regarde où tu vas, bon ben, à la fin, etc.).
(c) Collocation between lexemes and semantic-pragmatic (contextual)
features (beautifully [result of creative activity], [uncertainty] not so,
[question] eh bien, [expectation] duly, [negative contextual aspect]
(not) detract from s.o.'s enjoyment, help! [on such one-word collocations,
cf. González-Rey 2002: 95, 101]).
(d) Collocation between semantic-pragmatic features (e.g. long-distance
collocations, Siepmann 2005).
This typology and the notational conventions that go with it present two
major advantages with a view to lexicographic applications: they allow us to
capture the full range of collocational phenomena, and they dispense almost
entirely with complicated metalanguage such as that used in Mel'čuk's
lexicologie combinatoire et explicative (Mel'čuk et al. 1995).
In what follows, I shall discuss some of the demands the full-scale integration
of lexico-grammatical units of the type just discussed places upon commercial
monolingual and bilingual encoding dictionaries. My main concern therefore
is with the reference needs of active users, such as the native French speaker
trying to write, speak or translate into English. My thesis is that the bilingual
onomasiological rather than the semasiological dictionary constitutes the ideal
repository for the collocational and colligational units required by active
users. After a brief description of the Bilexicon project aimed at producing
near-comprehensive thematic learners' dictionaries, I shall go on to marshal
various sorts of evidence on the weaknesses of the semasiological and the
strengths of the onomasiological approach. This will lead to the conclusion
that the traditional dictionary-making process should be turned on its head:
rather than starting from an alphabetical framework it should proceed from
a bilingual or multilingual onomasiological research base.
I shall then proceed to discuss coverage of collocations in current bilingual
and monolingual dictionaries, together with suggestions for improvement.
The last two sections will be devoted to types of lemmas and limits on the
translatability of collocations.

2. A brief outline of the Bilexicon project


The Bilexicon project pursues a theoretical as well as a practical aim. On the
theoretical side, the aim is to provide a sound basis for the production of
unabridged onomasiological bilingual learners' dictionaries which focus on
collocation. On the practical side, such dictionaries are to be developed for the
language pairs English/French, English/German and French/German, both in
print and electronic form.
The project can be sketched in rough outline only. What is said here should
not be taken to suggest that the problem of describing the native-speaker
lexicon or specific sections thereof is easily solved (for a fuller account,
see Siepmann, in preparation; for a sample chapter, see the author's website
www.dirk-siepmann.de).

2.1 Rationale
The rationale behind the Bilexicon project proceeds from a paradox about
foreign language learning in higher education: language teaching specialists
have long demanded that university graduates in modern languages should
have a native-like lexical competence in their L2 (e.g. Meißner et al. 2001);
in practice, however, such a competence is seldom attained, and few serious
efforts have been made to improve attainment levels. De Florio-Hansen
(2004: 83f.) sums up the situation at German universities by stating that
students' linguistic competence does not increase significantly between the
beginning of their course of study and its successful completion.
However, to sustain a prolonged learning effort, students must be told how
many and which lexical items they have to learn before they can confidently
claim to be competent users of the foreign languages of their choice (cf. Council
of Europe 2001: 6.4.7.2). Only once this material basis for vocabulary learning
has been laid do methodological factors come into play and can realistic
assimilation targets be set.

2.2 The compilation of a native-like vocabulary


So far little research effort has been expended upon describing the extent
of native-like lexical competence in the L2. There is only one study for the
language pair German-French (Hausmann, forthcoming), whose aim it is to list
a large section of the receptive vocabulary of French which is intransparent
from a German perspective.
What Hausmann has achieved for the receptive side the Bilexicon project
aims to do for the productive side: to draw up a near-comprehensive list of
those collocations (including colligations) which may be considered to make
up a native-like vocabulary. The compilation of the native-like vocabulary
proceeds from two premises:
(a) Any attempt to determine basic and advanced vocabularies must start
from a list of all native-speaker signs (perhaps even including manual and
facial gestures), i.e. the entire lexicon of the language. The approach is thus
essentially top-down.
(b) It is from such a list that a near-native vocabulary can then be constructed.
Thus, rather than asking, as the traditional frequency approach did, 'which
are the most frequent words in the language, and which words do we need
to add to these to obtain a good working vocabulary?', this approach poses
the question 'what are the meaning units that native speakers use, and which
of these have to be mastered to be able to perform at a near-native (or lower)
proficiency level?'. It is based on the simple observation that some adult
learners can pass as native speakers of the L2 because they have perfect
pronunciation and a command of lexico-grammar which is sufficient to express
any communicative need in a correct and natural manner. Nevertheless these
learners have not normally attained the same level of lexical competence as
a native; even for them, the framing of ideas in the foreign language is
conditioned by linguistic proficiency. It is the level of vocabulary knowledge
achieved by such learners that can be described as near-native.

In theory, therefore, it should be fairly easy to establish a procedure that
might be used in compiling a near-native vocabulary. In practice, however,
such a procedure still comes up against considerable, if not insuperable
difficulties. The procedure might look something like this. In a first step a
full-size lexico-grammar of at least one language would have to be compiled.
The main problem at this stage is to give a definition of multi-word units that is
sophisticated enough to distinguish these from lexical bundles (Biber et al.
1999) or n-grams, i.e. mere strings of word forms which occur more than once
in a corpus. Such a definition has been attempted in Part I of this article. Thus,
for example, 'at the end of the' is an n-gram retrievable from any medium-sized
corpus, but underlying it is the colligation 'at the end of the NP'.
The frequency approach is an adaptation, to linguistic units beyond the
word level, of the traditional procedure for determining core vocabularies.
At its simplest, it uses a very large corpus to determine the frequency of
each meaning unit; units whose frequency is below a minimum threshold are
discarded. It is not difficult to see why this approach, if used exclusively,
is more or less unworkable. The main reason is that there is no such thing as
a representative corpus, and there are no very large corpora available which
can provide accurate guidance on spoken usage. Even the Internet, or sections
of it such as google.co.uk with the option 'pages from the UK', is neither
representative nor reliable as a corpus. Apart from being skewed towards the
written language, it contains large amounts of outdated and non-native speaker
material²; it is also uninformative on range and distribution, i.e. the extent to
which an item appears in several different text types.
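
The thresholding step at the heart of the frequency approach can be illustrated with a minimal sketch. It counts contiguous strings of word forms (n-grams) in a toy corpus and discards those below a minimum frequency. The function names, the three invented sentences and the threshold value are assumptions made purely for illustration; they are not part of the Bilexicon workflow.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequent_ngrams(sentences, n, min_freq):
    """Count n-grams across sentences and keep those at or above min_freq."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.lower().split(), n))
    return {gram: freq for gram, freq in counts.items() if freq >= min_freq}


# Toy corpus, invented for illustration only.
corpus = [
    "we met at the end of the road",
    "at the end of the day he left",
    "she waited at the end of the corridor",
]

# Keep only five-word strings that occur at least twice.
kept = frequent_ngrams(corpus, n=5, min_freq=2)
```

On this toy input the only surviving string is at the end of the. The count alone says nothing about the open NP slot that makes the string a colligation, which is one reason why frequency used exclusively cannot identify meaning units.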
In an alternative approach, each collocational or colligational unit could
be subjected to a test for economy effects. As explained above, foreign-born
speakers who pass as natives have not normally developed the same lexical
competence as native speakers; they succeed in giving a native-like impression
by recycling or creatively recombining items from what is admittedly
a vast repertoire. This repertoire, however, need not contain the hundreds of
thousands of rough formulaic synonyms that native speakers have at their
disposal. In other words, the native-like speaker can achieve considerable
economies in learning effort by acquiring just one expression for each communicative need. Siepmann (in preparation) suggests that such economies
manifest themselves in at least eight different economy effects resulting in the
elimination of a collocation or lexeme from the near-native vocabulary.
To take but one example, a native English speaker wishing to describe the
state of being stationary in traffic can choose from among a number of
synonymic expressions, such as be / get caught in a traffic jam, be / get caught
up in a traffic jam, be / get stuck in a traffic jam, sit in traffic, sit in a traffic jam,
be stationary, etc. For the non-native, knowledge of just one of these expressions will do; when it comes to choosing which, the criteria of frequency,
availability and learnability may be invoked.
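
The idea of retaining just one expression per communicative need can be sketched as a simple selection routine. The sketch below uses frequency as the sole criterion, although the text also names availability and learnability; the class, the label for the communicative need and all frequency figures are invented assumptions, not corpus findings.

```python
from dataclasses import dataclass


@dataclass
class Expression:
    text: str
    need: str        # hypothetical label for the communicative need
    frequency: int   # invented figure standing in for a corpus count


def select_per_need(expressions):
    """Keep the most frequent expression for each communicative need."""
    best = {}
    for expr in expressions:
        current = best.get(expr.need)
        if current is None or expr.frequency > current.frequency:
            best[expr.need] = expr
    return {need: expr.text for need, expr in best.items()}


# Synonymic expressions from the text, with made-up frequencies.
candidates = [
    Expression("be / get caught in a traffic jam", "stationary in traffic", 120),
    Expression("be / get stuck in a traffic jam", "stationary in traffic", 310),
    Expression("sit in traffic", "stationary in traffic", 95),
]

chosen = select_per_need(candidates)
```

With these invented figures the routine retains be / get stuck in a traffic jam; a fuller version would have to weigh the other two criteria as well.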

It should have become clear that, despite its deficiencies, the second
alternative is more promising than an approach based on frequency alone,
especially if the point of departure is a clearly delimited area of the vocabulary,
such as the language of motoring or the vocabulary relating to feelings. First,
a very large corpus of subject-specific material is assembled from Internet and
other sources, such as corpora and published dictionaries. In constructing
such a corpus, it is important to include Internet genres that are lexically close
to real-life speech, such as news forums, e-mail, fan fiction, film and soap
opera scenarios. A further means of reducing the inevitable bias towards
writing in corpus construction is to elicit judgements from native speakers on
the currency of particular words and collocations in speech. It is to be expected,
however, that such tests will produce tangible results in only a few vocabulary
areas, such as proverbs (Arnaud 1992) or idioms. In others, such as motoring,
the sheer size of the lexical material precludes any detailed investigation of
native-speaker judgements.
The third alternative is some sort of combination of the frequency-based
approach and the approach drawing on economy effects, which could,
for example, be applied in succession. Economy effects may also be taken
into consideration in determining proficiency levels below the near-native level.
The subsequent procedure involves three major steps:
(1) Corpora and dictionary sources are tapped to identify all the individual
word-forms and words belonging to the vocabulary area in question. This
involves the making of a corpus-based word list, using for example
the WordSmith tool of the same name, and the use of dictionaries which
allow full-text searches or searches by subject area, such as TLF, DO, PR
or CIDE.
(2) In the next step, programs such as WordSmith and Collocate are used
to determine the collocations and patterns entered by the items on the
word list.
(3) The third step is to eliminate redundant collocations on the basis of the
aforementioned economy effects.
In a fourth, optional step various proficiency levels might be distinguished
on the basis of the frequency of collocations and single words or on the
basis of the transparency of items for particular user groups (cf. Hausmann,
forthcoming).
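
Steps (1) and (2) can be pictured with a minimal sketch: a word list is built from a tokenised text, and the word forms occurring within a fixed span of each node word are then counted. This is a generic span-based count written for illustration; it does not reproduce the algorithms of WordSmith or Collocate, and the function name, span size and sample sentence are assumptions.

```python
from collections import Counter, defaultdict


def collocates(tokens, nodes, window=2):
    """Count word forms occurring within `window` tokens of each node word."""
    table = defaultdict(Counter)
    for i, token in enumerate(tokens):
        if token not in nodes:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node itself
                table[token][tokens[j]] += 1
    return table


# Invented sample text from the motoring domain.
text = "he got stuck in a traffic jam and she sat in a traffic jam too"
tokens = text.split()

# Step (1): a simple frequency-ranked word list.
word_list = Counter(tokens)

# Step (2): collocates of one item on the word list.
table = collocates(tokens, nodes={"jam"}, window=2)
```

Raw span counts of this kind still have to be filtered by hand or by an association measure before the elimination of redundant collocations in step (3) can begin.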

2.3 Macrostructure
The project stands in the long tradition of what, borrowing from McArthur
(1986), we might call 'thematic learner lexicography', a tradition that goes
back almost to the dawn of civilisation. Recent examples of this tradition
include LLCE, VAEA and CW, to name but a few.
As McArthur (1998: 153) believes, it is impossible to find an ultimate true
schema for ordering things and words in the world, and the Bilexicon Project
lays no major claim to innovation in this respect. Its point of departure is
a fairly traditional division of the lexicon into topic areas such as motoring
and sub-areas such as parking. Where it does innovate is in the distinction
between topic areas and situation types and in cross-referencing between
syntactically and semantically similar patterns, which will be available only
in the electronic version.
The distinction between topic areas and situation types is not perfectly
clear-cut and merits a brief explanation. In a sense, every communicative
situation is of course unique, but it seems permissible to generalise across
specific situations to arrive at similar situation-types (Lyne 1985) or text-types
embedded in more general topic areas (McArthur 1981). An exclusive
focus on either of these, as found in the works just cited, seems severely
limiting, as topic areas and situation-types are interdependent. One situation-type,
such as a court hearing, can involve widely varying topics. It may also
be subdivided into any number of sub-types, down to as narrow a discoursal
span as the conversational turn in the case of a simple exchange of greetings
(speaker A: hello, speaker B: hello); conversely, the same topic, such as an
account of an accident, can occur in several different situation-types or text-types,
such as general conversation, court hearings, newspaper reports or
insurance claims letters. Let us consider a few examples to illustrate the
possible categorisation of various types of collocation (see Table 1).

Table 1: Semantic categorization in a conceptually organised dictionary

Collocation                                      Topic Area 1: Situation Type 1               Topic Area 2: Situation Type 2
money/funds/a sum/etc. leave account/bank/etc.   Banking
Tu craches ta valda ?                            Road traffic: Traffic lights (obsolescent)   Emotions: Impatience
regarde où tu vas!                               Movement: Moving with care                   Emotions: Care (or Caution)
make s.o. feel small                             Emotions: Humiliation
I would give anything to INF                     Emotions: Cravings

What distinguishes the Bilexicon from other bilingual thesauri is that
allocation of entries to topic areas is essentially bottom-up, that is, it is
the collocations found in the subject-specific corpora which determine the
setting up and internal structuring of sub-areas and situation types. This stands
in contrast with traditional approaches to thesaurus building, where terms were
inserted into a fully pre-determined ontological structure. There are, of course,
obvious limitations to such an approach in that some words and collocations
have both general and topic-specific uses. A case in point is the vocabulary
relating to damage, which is important in such situation types as car accidents
but may also apply to a wide range of other situations (any kind of accident,
intention to harm, legal terminology, etc.).
Underlying this thematic organization in the electronic version will be a layer
of semantic links inspired by such work as Francis, Hunston and Manning
(1996, 1998), who have shown that words entering similar patterns usually
share an aspect of meaning. This will enable users to extend their vocabulary
along a non-thematic route and will raise their awareness of the close link
between sense and syntax.

3. Semasiological vs. onomasiological dictionaries


As noted in the previous section, the Bilexicon project aims at producing
bilingual onomasiological dictionaries whose main entry type will be of a
collocational nature. This represents a break with the word-based lexicography
still current in both semasiological and onomasiological approaches. Semasiological dictionaries tend to consist of an alphabetical word list leading the user
from the word to its meaning, while onomasiological dictionaries allow the
user to proceed from a particular concept and find the most appropriate
word for it. Both types of dictionary are therefore mainly based on individual
words although, perforce, including phraseology in sub-entries and examples.
This section begins with a brief critique of the notion of word meaning before
discussing the effectiveness of the two types of dictionary in representing
collocation.

3.1 Meaning units beyond the word


The vast majority of today's dictionaries are based on the Saussurean
paradigm that the basic unit of meaning in a language is the word; accordingly,
dictionaries are regarded as word books (cf. German Wörterbücher) which
provide records of the various senses of individual words. So influential
has been this view of the dictionary that the bestsellers among present-day
monolingual and bilingual encoding dictionaries are small to medium-sized,
alphabetically organised pocket or desk dictionaries which list one-to-one
equivalents between words and provide only limited guidance on the
syntagmatics of language. Modern dictionaries thus perpetuate the time-honoured
tradition of recording single words which has existed at least since Babylonian
antiquity.
There is, of course, no denying the fact that speakers can isolate words
from context and thus arrive at a definition of word meanings. However, since
the definition of word meaning requires the speaker to engage in a process of
abstraction, it is at least debatable whether it is word meanings that underlie
the speaker's competence. Even the elicitability of paradigmatic relations
between the meanings of individual words does not allow us to conclude
that word meanings are stored in paradigmatic networks in what is often
called the mental lexicon (cf. Aitchison 1994). It is equally conceivable that
subjects in psychological experiments respond with particular paradigmatic
associations because they have repeatedly met the associated items in
syntagmatic strings (cf. Rapp and Wettler 1992, Rapp 1995); as Jones (2002)
has shown, antonyms, for example, tend to co-occur syntagmatically (good or
bad, rich and poor).
The crucial factor in the acquisition of meanings thus seems to be the
primary association between lexical units of varying length³ and their extralinguistic
and/or intralingual context of occurrence rather than the secondary
paradigmatic connections between two or more words that speakers can
establish when prompted or the word meaning which they can abstract out of
context when asked. Put another way, when unprompted, speakers produce
meanings by syntagmatically associating and/or modifying lexical chunks
which they have encountered before in contexts similar to the current one.
Our own practices of dictionary making have blinded us to the fact that we do
not communicate by stringing together individual words, but rather by means
of semi-prefabricated lexico-grammatical units.
This view, first proposed in outline by Bally (1909), has recently come to the
fore again in the Firthian tradition. Meaning is seen as residing in typical
combinations of lexical choices or collocability on the one hand, and typical
combinations of grammatical choices or colligation on the other (Hunston
2001). A crucial aspect of an item's meaning is its semantic prosody, a term
which reflects the realisation that lexical items become infused with particular
connotations due to their typical linguistic environment (Sinclair 1991, Louw
1993, Stubbs 1995).
The implications of the above for lexicography, especially learner lexicography,
are clear: if a) meaning is considered to be inherent in collocation (under
which term I here subsume colligation) and b) the dictionary is intended to
provide a record of the units of meaning in a language, then future dictionaries
will have to provide a full account of collocational meaning units and their
typical contexts of occurrence.⁴ One of the most obvious desiderata, then, is for
collocations, as defined in the introduction, to be given entry status. Rather
than appear in the exemplificatory material, collocations of this type should
themselves be illustrated with examples as necessary.

3.2 Difficulties of the semasiological dictionary in recording and representing collocation


The foregoing considerations raise questions about the macrostructure, microstructure
and mediostructure (Hartmann 2001: 64–66) of a dictionary which
could adequately represent collocation. There are a variety of systematic
reasons why traditional semasiological print dictionaries, whether monolingual or bilingual, will tend to fall short of this goal. Tersely stated, the main
reasons are:
(1) the difficulty of arranging items in a clear and memorable way;
(2) the inadequate coverage and representation of collocation between lexemes
and semantic-pragmatic features;
(3) insufficient discrimination between collocations and examples.
Let us deal with these in sequence.
3.2.1 Place of entry. Firstly, semasiological dictionaries arrange entries by the
alphabet. If collocations are to be given entry or sub-entry status in such
dictionaries, this will pose the age-old question about the word or word-form
under which the multi-word entry should appear. There is a wide range of
possibilities for resolving this question. The policy of many dictionaries is to
indicate some of the collocates of headwords in square brackets or in the
exemplificatory material and to enter (comparatively) fixed expressions such as
idioms at the first notional word. Thus, the idioms all hell breaks loose and
out of a clear blue sky would be found respectively at hell and clear. There are
a number of possible alternatives to this organizing schema (cf. Gates 1988).
For example:
(1) Collocations may be arranged alphabetically by their first components.
(2) Collocations may be entered at the semantically most important
component.
(3) Collocations may be entered at the grammatically most important
component.
(4) Collocations may be entered at the least frequent component if there is a
wide difference in frequency between the constituents (cf. Bogaards 1990).
The second of these possibilities would partially solve the difficulties users
have in locating collocations because of their directionality; two-item
collocations are still normally recorded at the entry for the collocate rather
than for the base (i.e. the semantically most important word). Thus, users will
find meet a criterion under meet rather than criterion, although their
formulation process starts with the noun. One wonders, however, whether
the second and third of these schemas will always lead to an unequivocal
solution, as lexicographers' and users' views on what is semantically and
grammatically most important may differ. The fourth solution reflects user
preferences identified in an empirical study, but seems only to apply to native
(French) dictionary users rather than language learners (Bogaards 1990).
For the sake of user convenience, it is desirable therefore to enter a collocation under each of its meaning components and to cross-refer the user to
the place where the entry is found. Drawing on this insight, Petermann (1983)
has devised a consistent location policy for traditionally conceived phrasemes
(i.e. fixed expressions) which could also be applied to collocations. He suggests
that each phraseme should appear under each of its notional components
while being assigned only to one main entry. The choice of this entry is to be
determined by the following criteria: if the phraseme contains a noun, this
becomes the main entry; if there are several nouns, main entry is given to the
first. If there is no noun, main entry is given to the first adjective, etc., in the
following order: verb, adverb, pronoun, numeral, interjection. Consistent as
this policy may be in theory, the question is whether the average dictionary user
can be expected to comprehend it. Interestingly, however, it is in keeping with
the results of an empirical study (Bogaards 1990), which found that Dutch
language learners begin their searches with nouns, followed by adjectives
and verbs.
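
Petermann's location policy, as summarised above, amounts to a priority rule over word classes and can be sketched as follows. The sketch assumes that the notional components of a phraseme have already been identified and tagged by hand; the tag labels, the function name and the tagging of the two examples are illustrative assumptions rather than Petermann's own notation.

```python
# Priority order of word classes as given in the text.
POS_PRIORITY = ["noun", "adjective", "verb", "adverb",
                "pronoun", "numeral", "interjection"]


def place_phraseme(tagged_components):
    """Return (main_entry, cross_references) for a list of (word, pos) pairs.

    Only notional components are expected; the first component of the
    highest-ranking word class becomes the main entry, and the remaining
    components receive cross-references to it.
    """
    for pos in POS_PRIORITY:
        for word, tag in tagged_components:
            if tag == pos:
                others = [w for w, _ in tagged_components if w != word]
                return word, others
    raise ValueError("no notional component found")


# Hand-tagged notional components (hypothetical tagging).
idiom = [("clear", "adjective"), ("blue", "adjective"), ("sky", "noun")]
collocation = [("meet", "verb"), ("criterion", "noun")]

idiom_entry = place_phraseme(idiom)
collocation_entry = place_phraseme(collocation)
```

Under this rule out of a clear blue sky would receive its main entry at sky, with cross-references at clear and blue, and meet a criterion would be entered at criterion, the word with which the user's formulation process starts.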
Another common suggestion consists in recording different types of
phrasemes in different ways (Burger 1989: 595). Fully idiomatic phrasemes
are to be listed under one of their components only, with cross-references at
the entries for other components; the choice of the entry term should not be
governed by semantic considerations, as these require the largest amount of
previous knowledge on the part of the user. Partially idiomatic phrasemes
which are linked to a specific meaning of a headword are to be treated under
the relevant sense division. Non-idiomatic phrasemes have to be discussed
at each of their components, under the relevant senses. Although presenting
the clear advantage of highlighting connections of meaning, this arrangement
is theoretically unsound in that, rather than recognizing the holisticity of
collocations, it presupposes their semantic divisibility and may entail an
etymological re-motivation of what is only a partially motivated or
unmotivated fixed expression (see also Burger 1989: 595).
To compound matters, the nesting of collocations may make retrieval
difficult. A large number of syntactically well-formed collocations (cf. for
example regarde où tu vas or I've got [liquid, crumbs, etc.] all over/on [piece of
clothing, exercise book, etc.]) are made up of highly frequent individual lexemes
such as regarder, aller, have, haben, etc., a factor which contributes to heavily
inflating entries for such words. Current unabridged dictionaries bear ample
testimony to this, although they are still a long way from including the
totality of collocations. Thus, the entry for aller in PR, for example, runs to
three and a half columns.
One way of solving this problem would be to draw items together in blocks
at the end of the entry. Each block would present items exhibiting a particular
type of syntactic relationship, after the manner of OC, for example. But then
again such clustering may be difficult to justify with clearly motivated multi-word
units like there is good reason to INF; there is a strong case here for
treatment under the relevant sense division of reason.
There are, of course, equally good reasons for giving main entry to
collocations as there are for recording them under a sub-entry, whether this be
a separate entry or a sense division of a particular headword (cf. Burger 1998:
172 on multi-word units). However, if we decide to give collocations main
entry status, this will entail an even more complex macrostructure. To take but
one example, multi-word collocations serving a pragmatic or text-structuring
function and beginning with the pronoun it (it behoves us to INF, it is worth
bearing in mind that/wh-clause, etc.) or the preposition to (to give an example,
to this end, to return to NP) would fill dozens of pages, and so would two-item
collocations beginning either with common nodes or common collocates
(such as increase or give).
From all this it seems reasonable to conclude, as most theorists do (cf. for
example Burger 1989: 595 on phrasemes), that there is no ready-made solution
for the positioning of collocational units in semasiological dictionaries.
Each case needs to be considered on its own merits, and the preferences of
particular user groups have to be taken into account (Bogaards 1990, 1991);
there should be neither consistent conflation into end-of-article nests nor
arbitrary allocation to a particular sense division. Rather, as with derivatives
and compounds (which have traditionally been conceived of as distinct from
collocations), a middle course has to be steered between considerations
of semantic relatedness, user convenience and economy of treatment (cf. Cowie
1999: 150 on derivatives and compounds). In any case, collocations should
be highlighted typographically, and, if necessary, attention should be drawn
to their special pragmatic and/or text-structuring functions. However, given
the sheer size of the class of collocations, alphabetical access seems an
unmanageable solution in the long run.
3.2.2 Representation of semantic-pragmatic collocations. If we now ascertain the
relationship between types of collocations and the problems associated with
recording them, it turns out that the semasiological dictionary experiences
the greatest difficulty in adequately representing purely semantic-pragmatic
collocations occurring in specific situation-types or topic areas. A pertinent
example is afforded by semantic-pragmatic collocations based around mordre
sur (overlap into, go over into, cut into, veer off course into/onto), which
occur in three main topic areas, viz. a) geography (e.g. une région mord sur une
autre), b) medicine (une partie du corps mord sur une autre) and c) motoring
(une voiture mord sur une partie de la route).
The bilingual semasiological encoding dictionary has two options for
representing such information: adapting PGF style, i.e. une voiture mord sur qc
(accotement, ligne médiane, etc.), or adapting CR style, i.e. [voiture] mordre sur
[accotement]. Of these, the first would seem to be immediately comprehensible
to the user, since it is very close to a natural language sentence. The monolingual
encoding dictionary could solve the problem by using Cobuild's folk
definition style, which allows the lexicographer to place typical collocates in
the first part of the defining sentence:
lorsqu'une voiture mord sur une partie de la chaussée ou sur le bas-côté,
elle va au-delà de la voie de circulation qui lui est normalement attribuée
Unfortunately, apart from Cobuild, DAFA and, to a lesser extent, CIDE, none
of the available monolingual dictionaries have so far made any use of the above
procedures for representing collocational meaning.
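The difference between the two bilingual notations can be made concrete with a small sketch. It assumes that a semantic-pragmatic collocation is stored once as a structured record and rendered on demand; the class name, field names and the two rendering functions are hypothetical and are not taken from PGF, CR or any other dictionary.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CollocationPattern:
    """Hypothetical record for one semantic-pragmatic collocation."""
    subject: str          # typical subject with article, e.g. "une voiture"
    subject_label: str    # bare noun used in bracket notation, e.g. "voiture"
    infinitive: str       # citation form of the verb phrase
    finite: str           # third-person form used in sentence-like notation
    objects: List[str]    # typical complements, most common first


def render_sentence_style(p: CollocationPattern) -> str:
    """Sentence-like notation in the manner of the PGF-style example above."""
    return f"{p.subject} {p.finite} qc ({', '.join(p.objects)}, etc.)"


def render_bracket_style(p: CollocationPattern) -> str:
    """Bracket notation in the manner of the CR-style example above."""
    return f"[{p.subject_label}] {p.infinitive} [{p.objects[0]}]"


mordre = CollocationPattern(
    subject="une voiture",
    subject_label="voiture",
    infinitive="mordre sur",
    finite="mord sur",
    objects=["accotement", "ligne médiane"],
)

print(render_sentence_style(mordre))
print(render_bracket_style(mordre))
```

Keeping the subject and the complements as separate fields means that either notation, or a Cobuild-style defining sentence, could in principle be generated from the same record.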
One deficiency of the semasiological encoding dictionary which even Cobuild
has been unable to remedy is the impossibility of representing synonymy
between collocations in a space-saving and user-friendly manner. Let us
consider the following example of a collocation of type 3 and its possible
representation in a semasiological dictionary:
money / funds / a sum + leave + account / bank / fund / country
If we were to record this semantic-pragmatic collocation ([money] +
leave + [place where money is stored]) with a view to enabling the user to
comprehend and encode it in its entirety, we would have to make a minimum of
three entries (at money, funds and sum) and a maximum of eight entries (money,
funds, sum, account, bank, fund, country, leave), not to speak of the amount of
cross-referencing that would be required. Moreover, collocational attraction
between any two of the constituents in this semantic-pragmatic collocation
(e.g. funds leave country) may be too weak to show up in a concordance
based on mutual information (Church and Hanks 1990) or log likelihood
(Dunning 1993), thereby not warranting the inclusion of any specific
collocation. Yet the semantic-pragmatic collocation as a whole is clearly
frequent enough and of interest to language learners, especially since other
languages such as German may have slightly different ways of expressing
the same idea (e.g. money leaves an account: Geld geht von einem Konto ab /
[less commonly:] Geld verläßt ein Konto).
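To illustrate why a pairwise association score can miss such a collocation, the following minimal sketch computes pointwise mutual information (in the spirit of Church and Hanks 1990) and a log-likelihood ratio (in the spirit of Dunning 1993) from raw frequencies. The function names and all counts are invented for illustration; they do not come from any corpus discussed in this article.

```python
import math


def pmi(pair_count, x_count, y_count, total):
    """Pointwise mutual information (log base 2) for a word pair."""
    return math.log2((pair_count * total) / (x_count * y_count))


def log_likelihood(pair_count, x_count, y_count, total):
    """G2 statistic computed from the 2x2 contingency table of the pair."""
    observed = [
        pair_count,                              # x together with y
        x_count - pair_count,                    # x without y
        y_count - pair_count,                    # y without x
        total - x_count - y_count + pair_count,  # neither x nor y
    ]
    row = [x_count, total - x_count]
    col = [y_count, total - y_count]
    expected = [
        row[0] * col[0] / total,
        row[0] * col[1] / total,
        row[1] * col[0] / total,
        row[1] * col[1] / total,
    ]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


# Hypothetical counts in a corpus of 10,000 co-occurrence windows.
strong = (10, 100, 100, 10000)   # pair occurs ten times more often than chance
weak = (1, 100, 100, 10000)      # pair occurs exactly as often as chance predicts

for label, counts in (("strong", strong), ("weak", weak)):
    print(label, round(pmi(*counts), 3), round(log_likelihood(*counts), 3))
```

A pair such as funds and country may behave like the second case: each individual pairing scores close to zero, although the pattern as a whole, summed over all its lexical variants, is frequent.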
3.2.3 Examples vs. collocations. Another problem with existing semasiological
dictionaries is that they fail to distinguish between examples and collocations,
i.e. they frequently record holistic units within the exemplificatory material
rather than assigning them entry status and exemplifying them in their turn.
This is not usually a serious problem with traditional two-word collocations
in which the collocate assumes a specific meaning (if we disregard for
the moment the fact that such collocations may still be difficult for users
to locate), but it becomes one in the case of collocations which appear to have
been freely put together by the application of general semantic and syntactic
rules. This can be illustrated with two examples, one from an unabridged
monolingual dictionary (GR) and one from a monolingual learner's dictionary
(CCED).
GR, which offers a sprinkling of extended collocations, will serve to
illustrate the haphazard nature of current practice (for further detail, see
Siepmann 2005). Thus, the exemplificatory infinitive clause pour n'en citer
qu'un exemple (a collocation of type 2 common in academic writing) is found
as the second example under sub-entry II.2:
(XIVe). Cas, événement particulier, chose précise qui entre dans
(une catégorie, un genre . . .) et qui sert à confirmer, illustrer, préciser
(un concept). Voici un exemple de sa bêtise. Pour ne (n'en) citer qu'un
(seul) exemple. Aperçu, échantillon, spécimen. Ce cas offre un exemple
typique de telle maladie. ⇒ Type. C'est un bel exemple de présence
d'esprit! Alléguer, apporter des exemples à l'appui d'une assertion, d'une
affirmation. ⇒ Preuve. Exemple concret illustrant une idée abstraite.
Appuyer (cit. 5) d'un exemple. Exemples donnés dans un manuel de physique,
de chimie. Exemple bien, mal choisi. Donnez-moi un exemple de volcan
éteint, de plissement tertiaire. Exemples à l'appui d'un raisonnement,
d'une démonstration. Exemple qui prouve que . . . Il m'a cité l'exemple de
ce chanteur (→ 1. Basse, cit. 7). Puiser ses exemples dans l'histoire
(→ Égoïsme, cit. 1). (GR, s.v. exemple)
The multi-word collocation in question has been entered as an example
sentence followed by a full stop. This implies that the phrase can stand on its
own, thus obscuring its textual function of introducing an example, and
potentially leading at least the foreign-born user astray.
With a collocation such as we (now) turn (now) to, the situation is even less
clear. In CCED it appears in the exemplificatory material at sub-entry 12 for
turn and is not explicitly marked as a collocational unit:
We turn now to the British news.
This example sentence may, however, not be very useful to learners, since it
neglects to highlight that we are dealing with a transitional device that can be
employed in both spoken and written English rather than an ad-hoc formation.
The drawbacks of such practice should by now be obvious. For one thing,
neither the native nor the non-native user will be sensitised to the holistic
nature of multi-word units. For another, the non-native user in particular
will find it difficult to find variants of a particular collocation, such as pour ne
donner qu'un exemple or pour prendre un seul exemple in the case of the example
from GR; this is due to the lack of synonymic links in the mediostructure
already touched upon. One reason for the lack of cross-referencing with
regard to synonyms is what may be termed the 'alphabetical framework'
approach to dictionary making. In the compilation of large-scale dictionaries
one commonly starts by drawing up an alphabetical list, or 'framework',
of the major sense divisions before assigning one small section of the
alphabetical list to the individual lexicographer, who will identify and enter
collocations of individual lexemes without much regard to the findings of his
or her colleagues.
As can also be inferred from the above examples, another serious
disadvantage of current practice is that common collocations tend to be
submerged amid a welter of detail. Thus, in GR, it takes a considerable amount
of searching to locate the concessive discourse marker il faut bien reconnaître
que within one of the sub-entries for reconnaître. The specific pragmatic
function of the marker is not made explicit; rather, it must be inferred from
the general definition given under sense division 4 of reconnaître or from its
synonymy with the evidence marker il faut se rendre à l'évidence, to which the
reader is cross-referred.
4. (XIVe). Admettre pour vrai après avoir nié, ou après avoir douté,
accepter malgré des réticences. ⇒ Admettre, avérer, déclarer . . . On a fini
par reconnaître son innocence. ⇒ Croire (à); → aussi Rendre hommage*
à . . . On est forcé de reconnaître des divergences (cit. 1) entre certains
textes . . . Maintes fois, il le reconnaît lui-même, il manquait de bon sens
(→ Grain, cit. 26). Reconnaître la supériorité de qqn. ⇒ Céder (3.: le
céder à); proclamer . . . Amener qqn à reconnaître. ⇒ Convaincre.
Reconnaître que. ⇒ Admettre, avouer, convenir (de); → Boiteux, cit. 7;
démarche, cit. 4; Dieu, cit. 47; malheur, cit. 39; oracle, cit. 4. Ils ont tous
reconnu qu'il a fait ce qu'il a pu. ⇒ Tomber (d'accord). Vous n'hésiterez
(cit. 14) pas à reconnaître que . . . Je reconnais que . . . ⇒ Accorder; entendre
(j'entends bien). - Quoi qu'on dise, on doit reconnaître que . . . (→ Canaille,
cit. 12). Force (cit. 58) lui était de reconnaître que . . . (→ Exciter, cit. 32).
Il faut bien, on doit reconnaître que . . . ⇒ Évidence (se rendre à l'évidence);
→ Mélodique, cit. 1.
Turning now to colligational patterns, we find that quite a number of these
have found their way into the dictionaries, but that they are usually treated by
way of lexical exemplification. Here are a few examples from PR:
un mécanicien en herbe (PR; underlying colligation: NP [vocation]
en herbe)
de la graine de voyou (PR; underlying colligation: de la graine de NP)
être musicien dans l'âme (PR; underlying colligation: NP dans l'âme)
Note that such treatment is doubly limiting. For one thing, it conceals the
generativity of the patterns as well as the limits of such generativity; for
another, it omits to signal typical textual embeddings. Thus, a colligational
pattern such as NP/ADJ à ses heures tends to occur as an appositive (often
clause-initial), and this information must be made available to the dictionary
user. Cf. for example:
Poète à ses heures, Guillaume improvisait des vers.
Nicolas, jardinier à ses heures, dispose d'une plantation qui lui fournit la
matière première de ses pétards.

3.2.4 Other deficiencies resulting from a semasiological methodology. Another point to
note (and one I shall expand upon in the section on translation equivalence
below) is that definitions and sense divisions in monolingual dictionaries
as well as translations in bilingual semasiological encoding dictionaries often
leave something to be desired. Again, this is primarily because bilingual
lexicographers who work on single letters or words often lack contextual,
or more accurately, subject-specific information; even if they have such
information in one language, they may still find it difficult to provide natural
textual equivalents because they fail to avail themselves of the time-honoured
strategy used by professional translators of comparing parallel texts, i.e. texts
which deal with the same or similar subject matter in different languages.
To compound matters, bilingual dictionaries tend to exhibit an empirical
dependency (Kromann 1991: 2714, Hausmann 2002: 1619) on monolingual
dictionaries in the sense that the aforementioned alphabetical framework
is generally grounded on monolingual dictionaries. As a consequence,
interlingual divergences which could emerge from a contrastive analysis are
not normally taken account of.
There is ample evidence from a number of studies of such dependencies.
Hausmann (2002: 1619) shows that OH was the first dictionary to introduce
the notion of tact into its French renderings of the English adjective
insensitive, for the simple reason that its compilers had at their disposal two
new monolingual dictionaries which used tact in their definitions and
provided several examples of its use including several typical collocations.
In a similar vein, Cummins and Desjardins (2002) demonstrate that there is
insufficient discrimination in a number of bilingual dictionaries between the
various senses of two English-French pairs (population/population and plus ou
moins/more or less) to enable correct encoding. For example, French population
has an affective use not paralleled by its direct English equivalent which is
better rendered by nouns or collocations such as people or the (general) public.
Again, it is reliance on monolingual dictionaries which appears to be the root
cause of such oversights.

Another example can be seen in GW (German-English), which renders
the German compound noun Bildungsangebot by the clumsily literal word
combination educational offer. As a study of parallel texts will reveal, however,
the intended meaning is idiomatically expressed in British English as
educational provision (see also Laffling 1991) or training provision, as the case
may be.
While such shortcomings could be remedied fairly easily by consulting
parallel texts available from corpora or the Internet or by developing
algorithms for the automatic extraction of traditionally-conceived bipartite
verb-noun or noun-adjective collocations (cf. Laffling 1991; Smadja,
McKeown and Hatzivassiloglou 1996; Fontenelle 2003), the situation is less
straightforward with extended collocations of the type far be it from me
to INF, vieles spricht dafür, dass (see Siepmann 2005), regarde où tu vas or
tout se passe comme si (see Siepmann 2004). These collocations are either
absent from dictionaries or wrongly translated because there are usually no
node words on which either the human lexicographer or extraction software
could base their search for an equivalent (cf. regarde où tu vas: pass auf, wo du
hintrittst).5
Take, for example, the discourse marker far be it from me to . . . , which is
common in academic and journalistic prose. In CG this has been rendered by
es sei mir ferne, zu . . . The German expression is untypical of modern academic
or newspaper style and has a distinctly archaic ring to it. For lack of resources
in which to locate a workable equivalent, the lexicographer must have selected
one from the entry for fern(e) in an outdated monolingual German dictionary.
Greater familiarity with academic and newspaper German or reliance on
parallel texts would have thrown up solutions such as es liegt mir
fern zu INF or nichts liegt mir ferner, als zu INF.
4. Potential benefits of the onomasiological approach
My contention in this section is that the adoption of an onomasiological,
collocation-based approach is likely to make the dictionary compilation process
more reliable and more efficient, thereby ultimately leading to more reliable
final products. So far commercially available onomasiological dictionaries,
like their semasiological counterparts, have focussed on single words or
traditionally-conceived fixed expressions (e.g. RO, DO, WE) but they will
really come into their own when collocation is taken into account.
The principal reason why the onomasiological approach is superior to the
semasiological is not far to seek: as communicators, we do not start from
lists of individual words which we then go on to combine in a suitable fashion.
It is not atomised single units, but concepts and processes (Götze 1999: 11)
that are represented in our brain. The concepts we wish to convey and the
communicative choices we make are normally expressed either by collocations or,
less commonly, by individual words.6 As pointed out above, collocations are
inextricably linked with, and usually restricted to, some particular topic area
and/or situation-type through what may be described as neuronal assemblies,
i.e. the repeated association of lexical units or semantic-pragmatic features with
a situational or syntagmatic context. In the same way, the lexicographer gains
considerable advantage from focussing on collocational choices within a
particular subject area.
Let us now consider the ways in which the onomasiological approach can
resolve the problems noted above for the semasiological approach.

4.1 General lexicographic principles and the onomasiological approach


We may start by looking at a number of lexicographical stringency criteria
proposed by Mel'čuk et al. (1995: 33 ff.). They point out, among other things,
that traditional dictionaries fail to describe semantically related lexemes in
a sufficiently uniform manner (Mel'čuk et al. 1995: 40). As an example they
cite nouns designating nationality. Whereas un Français is defined as une
personne de nationalité française in one dictionary, un Chinois has no
definition, etc. Mel'čuk et al. (1995: 40) therefore posit the principle of
uniformity, which states that the articles representing phrasemes belonging to
one semantic field must be as closely similar as possible. It follows that,
although their idealized dictionary is alphabetical for reasons of ease of use,
it is ultimately onomasiological since the central concept underpinning it is
the semantic field. Only an onomasiological methodology can guarantee
uniformity of treatment.
Another clear advantage of the onomasiological approach lies in its being
explicit in the sense that nothing is left to the user's intuition. As Mel'čuk
et al. (1995: 35-36) point out, a collocation such as magazine féminin cannot be
entered as a mere example because it could theoretically mean either 'magazine
about women' or 'magazine for women'. One wonders, however, whether full
explicitness can ever be achieved when using a monolingual methodology;
as mentioned in Section 2.1 above, many of the nicer sense distinctions in
one language (such as the various meanings of French population) only come to
light against the background of another language. Thus, while monolingual
collocational dictionaries such as OC may well record stream of traffic or flow
of traffic, they do not differentiate between the two senses of the collocation
which become apparent when comparison is made with equivalent German
expressions (in German a distinction is made between fließender Verkehr
into which the road user merges and Verkehrsströme or Verkehrsfluten
visualised as continuous lines of dense traffic).7 Nor do they take note of
triple collocations such as endless stream of traffic, which may, however,
become apparent from a contrastive search for a viable equivalent of the
German compound noun Blechlawine. See the entry from the projected
English-German bilingual thesaurus in Table 2.

Table 2: stream of traffic and its German equivalents

| English | German |
|---|---|
| stream of traffic / flow of traffic / traffic flow | der Verkehrsstrom / die Verkehrsflut |
| the steady stream of traffic heading to St Sampsons | die kontinuierliche Verkehrsflut in Richtung St. Sampsons (die sich nach St Sampsons ergießende Blechlawine) |
| look behind early and move into the stream of traffic when safe | schauen Sie sich frühzeitig um und ordnen Sie sich bei einer günstigen Gelegenheit in den fließenden Verkehr ein |
| endless stream of traffic / solid line of cars / heavy traffic | die Blechlawine* |
| there is an endless stream of traffic from the Straße des 17. Juni going past the Brandenburg Gate | von der Straße des 17. Juni rollt eine Blechlawine am Brandenburger Tor vorbei |
| we go around a bend and there ahead of us is a solid line of cars as far as you can see | wir fahren um eine Kurve und vor uns ergießt sich eine Blechlawine soweit das Auge reicht |

To take but one more example, neither the big four monolingual learners'
dictionaries8 nor CR recognize the specific sense that wait assumes in the area
of traffic; a bilingual methodology would reveal this sense since it requires
non-literal renditions such as rester en stationnement in French and stehen or halten
in German (see Table 3). This shows that, in a bilingual thesaurus, explicitness
can be achieved quasi automatically by recording all possible variants of
a collocation along with its topic-specific or situation-specific translations,
e.g. magazine féminin / magazine pour femmes: women's magazine.

Table 3: wait and its French and German equivalents

| English | French | German |
|---|---|---|
| I couldn't wait very long | je ne pouvais pas rester en stationnement très longtemps | ich konnte nicht lange halten / ich konnte nicht lange anhalten |
Likewise, the principle of internal coherence (Mel'čuk et al. 1995: 36 ff.) can
be readily adhered to in a bilingual thesaurus based on collocations rather than
lexemes (or lexemes and collocations). This principle states that there should
be perfect correspondence between the definition (i.e., in the case of a bilingual
thesaurus, the translation), the syntactic patterns and the lexical patterns
entered by a lexeme or phraseme; the only problem here is the directionality
of translation, which may lead to a larger number of entries in a bilingual
dictionary, as illustrated by the aforementioned collocation stream of traffic.
When used on its own, this collocation can be translated almost literally into
German in the form of the compound nouns Verkehrsstrom or Verkehrsflut.
When modified by the adjective endless, however, it can be rendered more
elegantly by the colloquial compound Blechlawine.
The problems with the definition of lexemes which arise from the inclusion
of such collocations as célibataire endurci do not occur in bilingual dictionaries
and are in fact purely theoretical, since collocations should be considered as
holistic meaning units. As Mel'čuk et al. (1995: 37) rightly conclude, the lexeme
célibataire on its own can never have the meaning homme en âge d'être marié
qui n'a jamais été marié et qui veut rester tel, although the above collocation
would seem to suggest just that.
Two additional principles proposed by Mel'čuk et al. (1995) are the principle
of exhaustiveness and that of compulsory consultation of databases.
As outlined in Section 2, the fulfilment of these principles can be greatly aided
through using a bilingual or multilingual approach which should proceed in an
iterative cycle:
compilation of subject-specific corpora in at least two languages →
compilation of subject-specific word and collocation lists → analysis of the
contextual embedding of collocations with the help of the Internet →
additions to corpora from Internet sources used in context analysis (etc.)
In summary, it could be said that future lexicography should pursue
a methodology which is diametrically opposed to the framework approach
outlined above. Rather than proceeding from alphabetical lists of individual
lexical units based on monolingual dictionaries, it would be grounded in topic-specific
lists of collocations. The methodology of monolingual dictionary
making would thus also be turned on its head, since monolingual dictionaries
would benefit from the more detailed sense divisions established by bilingual
onomasiological lexicography.

4.2 Other potential benefits


An onomasiological methodology allows us to solve the problem of separating
different meaning units which would normally be allocated to the same article
in a semasiological dictionary. An example of this is the French collocation
donner + exemple, which can be used in three different types of situation with
two different meanings (see Siepmann 2003):
(1) a situation where the speaker/writer wishes to cite another author: Miller
(1995) donne un exemple de . . .
(2) a situation where the speaker/writer introduces an example of his or her
own: pour donner un exemple, je vais vous donner un exemple
(3) a situation where the speaker/writer gives an actual example: l'Arabie
Saoudite donne un exemple d'État islamique moderne (= is an example)
The collocation would thus be given at least three entries in different subsections
of an onomasiological dictionary. Similar considerations hold true for
English collocations such as avoid an accident (cf. French empêcher un accident
vs. éviter un accident) or leave the road (cf. German von der Straße abfahren
[intentional] vs. von der Straße abkommen [accidental]). It is the contrastive
background of a foreign language that allows the lexicographer to uncover the
polysemy of such items.9
Another problem noted above was the placement of collocations within
the dictionary; this can be resolved quite elegantly in an onomasiological
dictionary (or hybrid electronic dictionaries) such as the projected English-French
Bilingual Thesaurus (Bilexicon), where topic area and situation type
are the decisive factor in determining place of entry.
Likewise, in an onomasiological dictionary semantically related or synonymic expressions do not need to be cross-referenced, as they will appear at
the same place in the dictionary. Examples are given in Table 4.
Table 4: Synonymic collocations in an onomasiological dictionary

| Synonymic or semantically related collocations | Topic Area: Situation Type |
|---|---|
| encore nommé / autrement appelé / qu'on appelle aussi | Discourse Markers: Reformulation |
| don't say a word / don't make a sound / be quiet / hush / quiet, please / shut up / wrap up / belt up / put a sock in it | Noise: Telling people to be quiet |
| Freizeit-N, Gelegenheits-N, Hobby-N | Hobbies: Describing amateurs |
| when the right moment has come, in due course, at the appropriate juncture, at the appropriate moment, when the time has come | Timing: Right moment |
| fahren auf / befahren / benutzen / fahren (trans.) (+ Straße) | Driving: Road use |
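
The organising principle behind Table 4 can be sketched as a simple data structure. This is only an illustration under the assumption that entries are keyed by topic area and situation type; the class and method names are hypothetical and do not describe the actual Bilexicon implementation.

```python
from collections import defaultdict


class TopicThesaurus:
    """Hypothetical store in which place of entry is (topic area, situation type)."""

    def __init__(self):
        self._entries = defaultdict(list)  # (topic, situation) -> collocations
        self._index = defaultdict(set)     # collocation -> keys it is filed under

    def add(self, topic, situation, collocations):
        key = (topic, situation)
        for collocation in collocations:
            if collocation not in self._entries[key]:
                self._entries[key].append(collocation)
            self._index[collocation].add(key)

    def synonyms(self, collocation):
        """Return every other collocation filed at the same place of entry."""
        found = []
        for key in sorted(self._index.get(collocation, ())):
            found.extend(c for c in self._entries[key] if c != collocation)
        return found


thesaurus = TopicThesaurus()
thesaurus.add("Timing", "Right moment", [
    "when the right moment has come",
    "in due course",
    "at the appropriate juncture",
    "at the appropriate moment",
    "when the time has come",
])

print(thesaurus.synonyms("in due course"))
```

Because synonymic items share a key, no explicit cross-reference has to be stored; a collocation with several meanings would simply be filed under several keys.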

The division of labour among various lexicographers can thus be by topic
area rather than the alphabet. For one thing, this solves the problem of missing
cross-references or missing translations for synonymic items; for another,
it allows an allocation of tasks to lexicographers by areas of real-world
expertise rather than the alphabet. Errors or infelicities such as those discussed
in Section 3 can thus be avoided.
Turning now to the problems involved in adequately representing collocations (especially of the semantic-pragmatic type), we note that the onomasiological approach allows us to adapt and further develop PGF style, as already
sketched above. PGF style indicates possible collocates in both subject and
object position; sometimes generalised labels such as s.o. or s.th. are replaced
by more specific labels such as un animal. A few examples from PGF follow:
qn fait un appel du pied à qn: jd gibt jdm einen Wink mit dem Zaunpfahl
qn conduit qn/un animal/qc quelque part: jd bringt jdn/ein Tier/etw
irgendwohin; (à pied) jd führt jdn/ein Tier/etw irgendwohin; (en voiture)
jd fährt jdn/ein Tier/etw irgendwohin
jd schlachtet: qn tue [o abat] un animal/des animaux
un animal butine: ein Tier sammelt Nektar [o Blütenstaub]
This practice can be further refined in onomasiological dictionaries. The
example of Table 5 illustrates the collocations entered by the French verb
butiner; this is a typical case where an individual word in French corresponds to
a collocation in English (for further evidence of interlingual correspondences
across morpho-syntactic levels, see Part I of this article).
Table 5: An entry for butiner

| French | English |
|---|---|
| butiner {une abeille, un papillon, une guêpe, . . . butine} | to collect nectar / pollen {a bee, a butterfly, a wasp, . . . collects nectar} |
| une abeille butine (quelque part: sur les fleurs des artichauts / dans les pissenlits) | a bee gathers / collects / sucks (up) nectar / pollen (from artichoke blossoms / from dandelions); a bee gathers / collects honey11 |
| une abeille butine une plante (pour qqc: pour le nectar) | a bee visits a plant (to collect nectar); collects nectar from a plant; sucks (up) nectar from a plant |
| une abeille butine le pollen / le nectar / le miel (quelque part) | a bee sucks up nectar / a bee collects pollen (somewhere) |

For reasons of space and user convenience, typical subjects of butiner are
shown in the first line of the entry, so that they do not clutter up the following
lines, where the emphasis is on object complementation. In these lines the most
common specific subject abeille is used consistently, where PGF uses a
superordinate term such as animal. In the case of butiner subject and object
complementation could probably be dealt with in the same way for any number
of language pairs. With some verbs, however, the presentation of subject + verb
collocations and object + verb collocations may be determined by the target
language. Consider, for example, the French verb craquer and its German
equivalents in Table 6.
This second example shows that complex colligations of the type qqc craque
de qqc must be illustrated with examples to be comprehensible to the dictionary
user. PGF style can also be adapted to variable idioms. In the example of
Table 7, the core meaning is given as a noun entry, while the sentence entries
illustrate different collocations.

Table 6: An entry for craquer

| French | German |
|---|---|
| craquer | knacken / knistern / knarren / krachen / knirschen |
| une branche / une articulation craque | ein Ast / ein Gelenk knackt |
| la chaussure / le toit / le fauteuil / le parquet craque | der Schuh / das Dach / der Sessel / das Parkett knarrt |
| la neige craque | der Schnee knirscht |
| qqc / qqn craque de qqc {bruits, matériaux de construction, . . .; jointures} | (etwa:) bei j-m knackt es irgendwo / an einem Ort knarrt etw. |
| il craquait de toutes ses jointures | alle seine Gelenke knackten / bei ihm knackte es in allen Gelenken |
| la maison craque de bruits de radiateurs et de boiseries | im Haus knackt und knarrt es aus der Heizung und der Holztäfelung |

Table 7: An entry for un pavé dans la mare

| French | German |
|---|---|
| un pavé dans la mare | eine Bombe (die irgendwo einschlägt) (überraschende und beunruhigende Nachricht) |
| c'est un pavé dans la mare | das schlägt ein wie eine Bombe |
| qqn jette un pavé dans la mare / qqn envoie un pavé dans la mare / qqn lance un pavé dans la mare | j-d sorgt für Aufregung / j-d erregt die Gemüter / j-d wirbelt einigen Staub auf / j-d sorgt für Wirbel / j-d läßt die Wellen der Aufregung hoch schlagen |

24

Dirk Siepmann

In onomasiological dictionaries, additional economy of treatment may be


achieved by presenting collocations common to a particular semantic field at the
entry for the generic lexeme of the field, a suggestion that has already been
implemented by Melcuk and Wanner (1996: 233ff.) for the field of German
nouns denoting emotion. However, Melcuk and Wanner also draw attention to
the limitations of such an approach, given that even closely related nouns do not
share all their collocates (cf. Part I on the arbitrariness of collocation). For ease
of use and memorisation, it may in any case be preferable to give the entire set of
collocations for each concept or lexeme at the entry for that concept or lexeme.

5. Coverage
This section is meant to illustrate by example how the onomasiological
approach can close some of the gaps found in current encoding dictionaries.
It will be seen that even the best collocational dictionaries are far from covering
anything like the entire range of collocation described in Part I of this article.
The section is divided into three parts. The first deals with breadth of coverage,
the second with depth, while the third offers suggestions for improvement.

5.1 Breadth of coverage


Within the Bilexicon project, a detailed trilingual investigation was conducted
into general-language items peculiar to one area of the vocabulary familiar to
most native speakers, namely road traffic. It was found that, while offering
a fair number of collocations in this area, OC misses out some very common
ones, such as
an empty parking space, a tight parking spot, a traffic jam clears, double
bend, avoid a traffic jam, the motorway (road) links (Paris) with
(Bordeaux), close a motorway, come off the motorway, open a (new)
motorway, motorway journeys, a clear motorway, a valid driving licence,
take one's driving test, nothing coming (etc.)
Table 8 compares the results for the English noun motorway with the
list of motorway collocations given in OC. The comparison shows that a
large number of collocations which an active user (i.e. a translator or language
learner) might need have been missed out. Numerically best represented in
this example as well as in traditional dictionaries generally are noun + noun,
adjective + noun and noun + verb collocations. Equally well covered in
traditional dictionaries are fully fixed expressions such as proverbs or idioms.
Among the collocations of type 2, three-item collocations or triples
(Hausmann 2003) are patchily covered, probably because both monolingual
and collocational dictionaries such as OC exclude many common compound
nouns from their alphabetical framework. Thus, OC records parking as
a participial noun, but does not accord entry status to parking space, and so
misses out common triples such as empty parking space or look for a parking
space. It might be argued that empty parking space is not a collocation at all
but a free combination; this line of reasoning is contradicted by the fact that
the equivalent German collocation is freier Parkplatz (as opposed to leerer
Parkplatz, which corresponds to a deserted / empty car park; see Part I of this
article). This underscores again the importance of an onomasiological
approach, which does not pre-empt decisions on what to include on the basis
of a restricted starting list. To take another example, while all unabridged
French dictionaries enter the expressions c'est-à-dire and en l'occurrence, none
of them mentions the frequent co-occurrence of the two.

Table 8: Coverage of motorway in OC and in an ideal dictionary

| Published dictionaries | Additional collocations from trilingual analysis |
|---|---|
| N + ADJ: busy, four-lane (etc.), orbital, urban | N + ADJ: big, large, major (→ Fr. grande autoroute); clear (→ G. frei); clogged; congested; controlled; deserted; elevated; empty; toll-free (→ G. gebührenfrei, mautfrei) |
| N + V: join, leave, turn off, build | N + V: block, come off, cruise, get onto, go onto, go on, turn off, get off, pull off, open, reopen |
| N + N: driving, traffic, network, system, bridge, junction, service area, service station, crash, pile-up | N + motorway: toll (→ Fr. à péage, G. gebührenpflichtig, mautpflichtig); motorway + N: access, bridge, company (→ Fr. société d'autoroute), intersection, journey (→ G. Autobahnfahrt), lay-by, madness, maintenance, miles, project (→ Fr. projet d'autoroute), trip |
| N + Prep.: along the motorway, down the motorway, off the motorway, onto the motorway, on the motorway, up the motorway, motorway from, motorway to | N + Prep.: (be) beside the motorway (→ F. border l'autoroute) |
| | triples: electronic motorway tolls (elektronische Mauterhebung), on a clear motorway, on clear motorway (→ G. auf freier (Auto-)Bahn, auf einer freien Autobahn), excellent motorway access, turn a trunk road into a motorway (enlarge a trunk road into a motorway) (→ G. eine Bundesstraße zur Autobahn ausbauen), widen a motorway to four lanes (→ G. vierspurig ausbauen), to do a lot of motorway driving, the motorway links A with B (→ F. relie A à B) |
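
The comparison summarised in Table 8 amounts, in computational terms, to a set difference per collocation type. The following is a minimal sketch in Python using a handful of the motorway collocates quoted in the table; the data layout, the function name coverage_gaps and the assumption that the corpus also attests everything OC lists are illustrative choices, not a description of the Bilexicon workflow.

```python
# Illustrative only: a few collocates of "motorway" taken from Table 8.
# The dictionary layout and the function name are assumptions of this sketch.
oc_entries = {
    "N+V": {"join", "leave", "turn off", "build"},
    "N+ADJ": {"busy", "four-lane", "orbital", "urban"},
}
corpus_findings = {
    "N+V": {"join", "leave", "turn off", "build", "come off", "get onto", "reopen"},
    "N+ADJ": {"busy", "urban", "clear", "congested", "toll-free"},
}

def coverage_gaps(dictionary, corpus):
    """Return, per collocation type, the corpus-attested collocates the dictionary lacks."""
    return {
        pattern: sorted(corpus.get(pattern, set()) - dictionary.get(pattern, set()))
        for pattern in corpus
    }

gaps = coverage_gaps(oc_entries, corpus_findings)
for pattern, missing in gaps.items():
    print(pattern, missing)
```

Such a listing only flags candidates; whether a missing item deserves an entry remains a lexicographic decision.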
This brings us to one of the most severely neglected subsets of collocations,
which have been termed second-level discourse markers (Siepmann 2005).
Second-level discourse markers are fixed expressions, restricted collocations
or colligational patterns usually composed of two or more printed words;
typical examples are it is argued that, the same goes for, strictly speaking, force
est de INF, d'après ce qui précède or with this in mind. Although ubiquitous
in both academic and journalistic language, they have so far been paid scant
attention in lexicography. In PR, for example, there is no mention at all
of the various collocations based on the colligation force est de INF
(force est de constater / reconnaître / ajouter / ...). As in the case of c'est-à-dire
en l'occurrence, these collocations in turn form their own collocations, which,
unsurprisingly, also go unrecorded in current semasiological dictionaries.
Some examples:
with this in mind let us turn to NP
turning to NP we find/note that-clause
not clause any more than clause
Patchy coverage is also given to conversational formulae of the type don't
make a sound, do you hear me, I couldn't agree more, look at the time. While
these four examples can all be located in CG or CR, those given in Table 9
are absent from at least one of the two.

Table 9: Conversational formulae

| English | French | German |
|---|---|---|
| there's no discussion | il n'y a rien à discuter | da gibt es nichts zu diskutieren |
| I wouldn't wish it on anyone | c'est quelque chose que je ne souhaiterais pas à mon pire ennemi | das würde ich niemandem wünschen (wollen) / das würde ich nicht einmal meinem ärgsten Feind wünschen |
| just being friendly | j'ai seulement voulu être (me montrer) aimable avec (pour) toi/vous | ich meine es ja nur gut |
| this isn't really about NP | (pour toi) il ne s'agit pas de INF / NP | Dir geht es ja gar nicht um NP |
| and Bob's your uncle | et le tour est joué / et voilà le travail | und fertig ist die Laube |
| I wouldn't kick him/her out of the bed. | Je ne coucherais pas dans le porte-savon. | Ich würde ihn/sie nicht von der Bettkante stoßen. |

5.2 Depth of coverage

Turning to depth of coverage, we find that three areas in particular are in need
of improvement, viz. a) triples, b) collocational synonymy and c) complementation
patterns or semantic-pragmatic collocations. The deficiencies found in each
of these areas will now simply be illustrated with a few examples from the
investigation into motoring vocabulary. The investigation revealed that triples
have been severely underestimated by theoreticians of collocations. Again,
the sheer size of the class, not all of whose members have been reproduced
here, indicates the superiority of an onomasiological, multilingual approach.
Where triples can be used alongside two-item collocations the triples have been
underlined (see Table 10).
Similar observations can be made for colligational patterns. The items in
Table 11 are just a small sample of those which have not been given their fair
share of attention in current dictionaries. Detailed cross-linguistic investigation
also threw up evidence of a general difference in patterning between English
and French which could never have been detected in a monolingual
investigation: in English two prepositions are often used in sequence to
describe movement, whereas French must resort to two clauses and two
different verbs to express the same idea (see Table 12). Finally, it may not be
amiss to illustrate (see Table 13) how the onomasiological approach can reveal
that synonymy, whether perfect or approximate, is not at all rare in natural
languages at the level of complex signs (i.e. collocations).


Table 10: Examples of common triples not found in other dictionaries (English-German)

| English | German |
|---|---|
| a busy road / a busy street; a much used road | eine stark befahrene Straße / eine viel befahrene Straße / eine verkehrsreiche Straße |
| on the open road; on clear roads / on clear motorways (etc.) | auf freier Strecke; auf offener Straße |
| outside lane hogging / blocking the fast lane / sitting in the outside lane | das Blockieren der Überholspur |
| winter road clearance | der Winterdienst |
| s.o. changes into first gear / goes into first gear / engages first gear / puts the car into first gear / gets the car into first gear | j-m legt den ersten Gang ein |
| a good driving road | eine Straße, auf der es sich gut fährt |
| s.o. goes along a path / a road | j-m fährt (auf) einem Weg / einer Straße |
| the cab went along the coast road | das Taxi fuhr über die Küstenstraße (fuhr die Küstenstraße entlang) |
| s.o. uses a road as a rat-run | j-m nutzt eine Straße als einen Schleichweg |
| s.o. gets into the correct lane / s.o. selects the correct lane / s.o. moves into the correct lane | j-m ordnet sich ein |

5.3 Improving coverage


How can coverage be improved in future? Since OC was based on a large
general corpus (the BNC), this question is intimately linked to another, namely
whether any corpus can approach the collective linguistic experience of
a language community (Howarth 1996: 72). Clearly, the answer still has to be
in the negative at the moment of writing, especially since most of today's major
corpora are narrowly synchronic, comprising only the last fifteen years or so.
Yet in future very large corpora may well be built which will reflect the
knowledge and experience of language accumulated over several generations.
Everything stands or falls by the size and diversity of the corpora consulted,
so that it would obviously be wrong at the present time to infer the
non-existence of a collocation from its absence from a corpus.
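
The caution against arguing from absence can be made concrete with a rough calculation. The sketch below assumes, purely for illustration, that occurrences of a collocation are rare and independent, so that the number of hits in a corpus follows a Poisson distribution; the rate and the corpus sizes are invented figures, not data from the investigation, and the function name prob_absent is hypothetical.

```python
import math

def prob_absent(rate_per_million: float, corpus_size_words: int) -> float:
    """Probability of zero hits under a Poisson model (an assumption of this sketch)."""
    expected_hits = rate_per_million * corpus_size_words / 1_000_000
    return math.exp(-expected_hits)

# Hypothetical collocation occurring on average once per 100 million words.
for size in (100_000_000, 600_000_000):
    print(size, round(prob_absent(0.01, size), 3))
```

Under these assumptions a perfectly genuine collocation would be missing from a 100-million-word corpus in more than a third of cases, and only a considerably larger corpus makes its absence informative.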
Table 11: Examples of common colligational patterns not found in other dictionaries (English-German)

| English | German |
|---|---|
| a car comes (+ verb of motion + -ing) | ein Auto kommt (+ Bewegungsverb + Partizip Perfekt) |
| another car came careering around the corner | noch ein Wagen kam um die Ecke gerast |
| a road has a ... mph speed limit | auf einer Straße ist die Geschwindigkeit auf ... km/h begrenzt / auf einer Straße gilt eine Geschwindigkeitsbegrenzung von ... km/h |
| there is a car somewhere | ein Auto fährt irgendwo |
| there was hardly a car on the streets | es fuhr kaum ein Auto |
| shall we go the [place name] way? | sollen wir über [Ortsname] fahren? |
| a road takes s.o. somewhere / a road takes s.o. [distance] somewhere (through / past / to / into / across s.th.) | eine Straße führt (j-mden) irgendwo hin / eine Straße geht irgendwo hin / über eine Straße erreicht man [(nach) Distanz] [Ort] |
| a gust of wind / a bend (etc.) forces a car / s.o. (somewhere: off the road, into the crash barrier, into the path of another vehicle, etc.); ... forces a car to swerve (somewhere); causes a car to swerve; {wind, force of the impact} pushes a car somewhere | eine Windböe (usw.) drängt j-mden / ein Fahrzeug (irgendwohin) ab; der Wind drückt ein Fahrzeug aus der Fahrtrichtung; der Wind drückt ein Fahrzeug zur Seite; in einer Kurve wird ein Fahrzeug abgedrängt |

Table 12: Cross-linguistic difference in verb patterning

| English | French |
|---|---|
| the car swerved (1) across the road and (2) into the ditch | la voiture (1) a traversé la route et (2) a fini dans le fossé |
| the car veered (1) off the side of the road and (2) several yards down an embankment | la voiture (1) s'est déportée sur le côté de la route et (2) a dévalé à plusieurs mètres en contrebas |

Table 13: Collocational synonymy in an onomasiological dictionary

| English | German |
|---|---|
| driving standards / driving practice / driving behaviour / road manners | das Fahrverhalten / das Verhalten im Straßenverkehr |
| s.o. sticks to the speed limit / s.o. keeps to the speed limit / s.o. observes the speed limit | j-m hält sich an die Geschwindigkeitsbegrenzung / j-m beachtet die Geschwindigkeitsbegrenzung |
| a car turns over three times / rolls three times / somersaults three times / overturns three times | j-m / ein Fahrzeug überschlägt sich dreimal |
| s.o. / a car is stopped by the police (*s.o. is pulled by the cops) | j-m / ein Wagen wird von der Polizei angehalten (*wird von den Bullen gestoppt) |
| a car / a trailer swerves / goes out of control / wipes out / veers off its path | j-m / ein Wagen bricht aus; j-m gerät aus der Spur; j-m kommt von der Fahrtrichtung ab; j-m gerät ins Trudeln |
| a car gets trapped under another / a car is jammed under another / a car is left wedged under another / a car is left embedded under another | ein Fahrzeug verkeilt sich in einem anderen / ein Fahrzeug ist eingekeilt unter einem anderen |

As already pointed out, one way to overcome the limitations of exclusive
reliance on a large general corpus is by using sizeable subject-specific
comparable corpora (this is the old principle of overall frequency vs. range first
applied by Thorndike 1921); in addition, all such corpora should be compiled
for several languages. This is exactly the procedure followed in the
aforementioned investigation of road traffic vocabulary, which used a specialist
trilingual corpus of around 200 million words and three large general corpora
of around 600 million words. Such breadth in corpus selection will usually
enable the lexicographer to fill gaps in the corpora of one language by
translating an item from another language (of course, the translation should
itself be checked against a very large corpus such as the Internet). To give
a simple example, the French collocation heurter de plein fouet is highly
common in newspaper reports on car accidents, but corresponding English
collocations such as hit with full force / at speed are extremely rare in
comparable English corpora.
Such a procedure is also of great interest to contrastivists, since it enables
them to discover lexical gaps and divergences in colligational or clause patterns
(see above). Thus, the aforementioned study of motoring vocabulary showed
that there is no standard English equivalent for German aus der Kurve getragen
werden or French être déporté dans un virage; however, expressions such
as wipe out on the bend or veer off the road on the bend may fill the bill.
Similarly, monolingual German lexicography might well overlook such
colligational patterns as Geschwindigkeit auf der Autobahn or Straße, auf der
sich gut fahren lässt, whereas combinations such as the compound noun
motorway speed or the adjective-noun collocation a good driving road will be
readily detectable in an English corpus. Of course, such considerations are also
true for the other translation direction (cf. sick note on demand –
Gefälligkeitsattest – certificat de complaisance; accident involving ... – accident
mettant en cause ... – Unfall, an dem ... beteiligt sind).
Finally, it should be noted that, if the aim is to cover collocation as well as
colligation, then it will be impossible to fully automate the dictionary-making
process in the foreseeable future. The reason for this is that such colligational
patterns as NP/ADJ dans l'âme / en herbe (etc.) cannot be located in even the
most sophisticated tagged corpora, since the retrieval software will also come
up with such sequences as NP/ADJ dans la maison / dans la grotte / dans
l'hôtel (etc.). Human intervention will thus remain indispensable.
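
The retrieval problem can be illustrated with a toy query. The sketch below assumes a hand-tagged list of (token, tag) pairs and a hypothetical helper, find_pattern, which matches the sequence noun + dans + article + noun; the sentences and tag labels are invented for illustration and are not drawn from any real tagged corpus or concordancer.

```python
# Toy "tagged corpus": each sentence is a list of (token, part-of-speech) pairs.
# Sentences and tag names are invented for this sketch.
SENTENCES = [
    [("un", "DET"), ("poète", "NOUN"), ("dans", "PREP"), ("l'", "DET"), ("âme", "NOUN")],
    [("un", "DET"), ("chat", "NOUN"), ("dans", "PREP"), ("la", "DET"), ("maison", "NOUN")],
    [("un", "DET"), ("ours", "NOUN"), ("dans", "PREP"), ("la", "DET"), ("grotte", "NOUN")],
]

def find_pattern(sentence):
    """Yield every 'NOUN dans DET NOUN' sequence; a purely structural, hypothetical query."""
    for i in range(len(sentence) - 3):
        window = sentence[i:i + 4]
        tags = [tag for _, tag in window]
        if tags == ["NOUN", "PREP", "DET", "NOUN"] and window[1][0] == "dans":
            yield " ".join(token for token, _ in window)

hits = [hit for sentence in SENTENCES for hit in find_pattern(sentence)]
print(hits)
```

All three sentences satisfy the query, although only the first instantiates NP dans l'âme; separating it from the literal locatives requires a judgement about meaning that the tags do not encode.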

6. Collocation types, lemma types and citation forms


As seen above, a useful distinction can be established between four major types
of collocational relationship. However, the distinction cannot be transferred
as such to the dictionary for a number of reasons:
(1) Firstly, there is no one-to-one correspondence between collocation types
and the three traditional lemma types (one-item lemma, multi-item lemma,
morphematic lemma); long-distance collocations do not fall into any of
these three categories; they also cut across the boundary of categories 2
and 3, as do some two-item collocations.
(2) Any dictionary maker who aims at commercial viability and user
friendliness should at least be wary of representing collocations of type 3
by means of general semantic labels such as [uncertainty] not so. In such
cases it may be wiser to exemplify rather than abstract away from actual
instances. For maximum user friendliness, the example should exhibit
prototypical features of the collocation to be recorded (cf. Harras 1989: 611
on entry words; on prototype theory, see Aitchison 1994). In learners'
dictionaries, the definition may help to introduce an element of generality
or abstraction that would be missing in other dictionaries, as witness the
example in Cobuild style (see Figure 1; Siepmann 2005: 318).

so /sou/
(...)
12 You can use not so to say that what you have just stated is untrue although
it may have seemed probable at first sight. This use is particularly common in
written English. Some might think Volkswagen, which now owns 70 per cent of the
Czech company, would have thought the Skoda's identity problematic. Not so. VW
sees Skoda as one of the most recognised brand names in advertising.
(extra column: PHR as sentence; PRAGMATICS)

Figure 1: A sample entry for not so in Cobuild style

Note the pioneering use of broken underlines to illustrate the presence of
long-distance collocational attraction based on semantic features. The same
typographical presentation could be used in any bilingual dictionary. Since
bilingual dictionaries do not normally contain definitions, at least two examples
of each collocation should be given for the user to form a correct
understanding of its use and to be able to use it productively in a new context.
Accordingly, unabridged dictionaries of the future should contain at least
the three major types of lemmas (one-item lemmas, multi-item lemmas
and morphematic lemmas; see note 10); to this we might add separable lemmas as
representations of long-distance collocations and some collocations of type 3
(see Table 14). As seen in Tables 5 and 6, complementation patterns can be
shown using placeholders such as so or sth or typical representatives of the
semantic class which can be inserted into a particular slot, such as abeille
in Table 5.

Table 14: Lemma types

| Linguistic category | Lemma type | Example |
|---|---|---|
| morpheme | morphematic lemma | un micro-N, ein Hobby-N |
| lexeme | one-item lemma | une pomme |
| collocations of type 1, 2 and 3 | multi-item lemma: a) colligational b) collocational | a) N à ses heures; b) une pomme de terre, tomber dans les pommes, reconnaître ses torts |
| long-distance collocations of type 3 | separable lemmas | de même que ... de même; turning to ... we find / note; it was hoped that ... not so |

7. The limits of translatability
Opponents of bilingual dictionaries or vocabulary lists for encoding purposes
have often argued that such learning materials encourage the erroneous
assumption of one-to-one equivalences between items. The argument is
clearly valid if we equate one-word items such as house and maison or
English population and French population, but it falls apart in the case of
monoreferential collocations. As the aforementioned investigation into road
traffic vocabulary in English, French and German has shown, the
overwhelming majority of collocations in this area are not culture-specific and have
direct equivalents in the other languages. Even colloquial idioms, which
might be intuited to be culture-specific, usually have perfect equivalents
(see Table 15). Translational equivalences may exist between any type of
construction, as witness the examples given in Table 16.
Table 15: Translational equivalences

| English | French | German |
|---|---|---|
| he must have found his licence in a lucky bag / (AE) he must have got his licence from a lucky dip | il a dû avoir son permis dans une pochette surprise | er hat wohl den Führerschein im Lotto gewonnen / er hat wohl seinen Führerschein bei Neckermann gekauft |
| they've got nothing against me | mon dossier est vide | man hat nichts gegen mich in der Hand |

Table 16: Translational equivalences between different types of item

| English | French | German |
|---|---|---|
| a budding N | un N en herbe | ein angehender N / eine angehende N / ein angehendes N |
| an amateur N | un N à ses heures | ein Freizeit- (N) / ein Gelegenheits- (N) |
| similarly with NP | il en va semblablement pour NP | Ähnliches gilt für NP |
| an NP that exceeds expectations | un NP supérieur aux attentes | ein NP, der die Erwartungen übertrifft |

There are, however, a few exceptions, which may arise from two types
of causes: 1) real-world constraints 2) language-internal developments
(cf. Siepmann 2003). Examples of type 1 are Trauspruch, which has no
equivalent in French wedding ceremonies, and Reißverschlussverfahren, which
refers to the procedure whereby cars alternately move into another lane when
a lane closure is ahead. It follows that any collocation based on these
nouns, such as Trauspruch beten or nach dem Reißverschlussverfahren,
has to be rendered by means of a paraphrase (e.g. merge in turn). In such
cases, the lexicographer has no alternative but to record two example sentences
rather than citation forms. The same goes for collocations where one language
uses an implicit form of words which the other tends to make explicit. Thus,
imagine a car parked alongside a fence, so that little space is left between the
passenger door and the fence. The typical question German drivers put to their
passengers in such a situation will go something like this: Soll ich ein Stück
vorsetzen? An English driver might prefer a more explicit wording along
the lines of: Do you want me to move the car / it forward a bit? (alongside Shall I
go forward a bit?)
Exceptions of type 2 occur when the languages under survey do not offer
the same number of collocations for some particular idea. Such difference
has frequently been noted in the area of single-word lexemes: it has long
been known, for example, that English has more verbs of movement than
either French or German. Similar observations can now be made for
collocations. Thus, English resemblance collocates with a wider variety of
adjectives denoting strangeness than its French and German counterparts
(cf. Siepmann 2003).
Both types of exceptions require special attention on the part of the
lexicographer. It is particularly dangerous to resort to intuitive translations, as
a number of defective translations from published dictionaries (e.g. weiträumige
Umleitung → *diversion covering a wide area [PGE]) readily attest.
Sometimes such translation errors occur because there are genuine
collocational gaps, but nevertheless the translator wishes to provide a
collocation at all costs. The best strategy to follow in such cases is to study
parallel texts and to offer a suitable paraphrase which should be marked
as such (e.g. by using the tilde).

8. Conclusion
The broadly-based definition of collocation on which this article is based
opens up new perspectives for both monolingual and bilingual lexicography.
Future dictionaries will need to record any type of structurally complex unit,
paying increased attention to collocational frameworks (my NP exactly) and
fixed expressions of regular syntactic composition (I've got eyes in my head,
there are good reasons for believing that, I couldn't agree more, etc.). It has been
shown that bilingual or multilingual onomasiological lexicography is set to lead
the way in this endeavour, since it has obvious advantages over monolingual
and semasiological approaches; bilingual dictionaries should no longer be
based on monolingual dictionaries, but rather the other way round. It has
also emerged that the onomasiological dictionary of the future will constitute
a new kind of dictionary of synonyms to the extent that it will contain
collocational rather than one-word synonyms, along the lines of Schemann's
(1991) dictionary of German idioms (SR).


Notes
1
Hoey (1998) defines colligation thus: (a) the grammatical company a word keeps
(or avoids keeping) either within its own group or at a higher rank (b) the grammatical
functions that the word's group prefers (c) the place in a sequence that a word prefers
(or avoids).
2
Note, however, that there is much less non-native material to be found on the
Internet for languages such as French, German or Italian, so that a more reliable picture
of native language use can be built up.
3
Of course, meaning arises through the interaction of mother and child long before
it can be represented linguistically (cf. Nelson 1998, Stern 1998). It is commonly
assumed that babies who are not yet able to speak assign meaning to the different
phases of a proto-narrative sequence. The first meanings acquired in early language
acquisition are therefore of a holistic nature; the words bath or bathroom, for
example, will be associated with the relevant proto-narrative sequence (entering the
room, opening the tap, feeling the warmth of the water, the stinging sensation of soap
in the baby's eyes, etc.) rather than a room containing a toilet, a shower, a bathtub and
a washbasin. It thus appears that meaning is created by the repeated connection between
feelings and/or lexical units on the one hand and contexts on the other hand.
4
It will be noted that the underlying assumption here is that more is better. Active
users such as advanced foreign language learners and translators working into a foreign
language require the most detailed and comprehensive information possible. It might
be argued that such users should turn directly to corpora instead, but the advantage of
a good dictionary is that it provides a ready-made account of the significant features
of a lexical item in a clear and memorable way.
5
Another problem attendant upon automatic extraction is the lack of an adequate
corpus base for collocations typical of spoken language.
6
It may be noted in passing that most complete utterances which consist of an
individual word are, in fact, collocational in nature, cf. help!; blood!; bed!; they are
holistic, situation-specific units (cf. Gonzalez-Rey 2002: 95, 101).
7
I do not wish to suggest that contrastive lexicology and bilingual or multilingual
lexicography can take account of all possible distinctions arising from cross-linguistic
comparison. As Hausmann (1995: 23) notes, such comparison could only be exhaustive
if it is restricted to lexical units with a relative degree of semantic autonomy; Hausmann
argues, for example, that lexical units exhibiting a high degree of context-dependence,
such as the French adjective sauvage would give rise to an endless multiplication
of potential equivalences. Arbitrary limits must therefore be set on the number of
languages to be compared as well as on equivalences and sense distinctions. The number
of languages will usually be restricted to two, i.e. the language pair treated in the
dictionary, since sense distinctions that are useful to, say, Italians using English are
not relevant to a French-English dictionary. It should also be noted, however, that
Hausmann overstates his case by focussing too much on the language of literature,
where creativity is at a premium. We will soon be able to cover exhaustively the ordinary
patterns, collocations and sense distinctions found in conversation and pragmatic
text types.
8
OALD is the only monolingual dictionary to record a similar sense (stop a vehicle
at the side of the road), which is too specific (cf. waiting at the traffic lights).

9
This does not mean that the question of an item's polysemy is decided by applying
interlingual criteria; rather, cross-linguistic comparison should be viewed as a useful
heuristic to discovering language-internal polysemy which could theoretically also be
detected through monolingual investigation. It is also worth bearing in mind that
polysemy is an extremely relative notion, and that the spectrum of meanings covered by
a large number of words can give rise to an almost infinite number of context-dependent
sense divisions (cf. footnote 4 above).
10
It may be misleading to speak of multi-word lemmas, as Steyer (2000) does,
since colligational patterns contain slots filled by particular categories rather than
a specific word.
11
Technically, of course, bees do not collect honey, but the collocation is often
used in everyday language.

References
1. Dictionaries
Atkins, B. T. et al. 1993. Collins Robert French-English English-French Dictionary.
Unabridged. (3rd ed.). Glasgow: HarperCollins. (CR)
Atkins, B. T. et al. 1994. Le Robert & Collins. Vocabulaire anglais et américain. Paris:
Le Robert. (VAEA)
Binon, J. et al. 2000. Dictionnaire d'apprentissage du français des affaires. Paris: Didier.
(DAFA)
Dendien, J. 2004. Trésor de la Langue Française Informatisé. Paris: CNRS. (TLF)
Cop, M. et al. 2001. PONS Großwörterbuch Englisch. Stuttgart: Klett. (PGE)
Corréard, M. (ed.) 1994. Oxford/Hachette French Dictionary. French-English/
English-French. Oxford: Oxford University Press. (OH)
Crowther, J. et al. 2002. Oxford Collocations Dictionary for Students of English. Oxford:
Oxford University Press. (OC)
Chapman, R. L. (ed.) 1996. Roget's International Thesaurus. Glasgow: HarperCollins.
(RO)
Collins Cobuild English Dictionary for Advanced Learners (3rd ed. 2001). Glasgow:
HarperCollins. (CCED)
Dornseiff, F. and Quasthoff, U. 2004. Der deutsche Wortschatz nach Sachgruppen. Berlin:
De Gruyter. (DO)
Hamblock, D. and Wessels, D. 1999. Großwörterbuch Wirtschaftsenglisch
Deutsch-Englisch/Englisch-Deutsch (5th ed.). Berlin: Cornelsen. (GW)
Knight, L. S. et al. 1999. Collins German-English English-German Dictionary. Unabridged
(4th ed.). Glasgow: HarperCollins. (CG)
McArthur, T. 1981. Longman Lexicon of Contemporary English. London: Longman.
(LLCE)
Quasthoff, U. (ed.) 2003. Franz Dornseiff: Der deutsche Wortschatz nach Sachgruppen
(CD-ROM). (DO)
Procter, P. (ed.) 2001. Cambridge International Dictionary of English on CD-ROM.
Cambridge: Cambridge University Press. (CIDE)
Rey, A. (ed.) 1993. Le nouveau Petit Robert. Paris: Le Robert. (PR)
Rey, A. (ed.) 1985. Le Grand Robert de la langue française sur CD-ROM. Paris:
Le Robert. (GR)
Schnorr, V. et al. 1996. PONS Großwörterbuch Französisch. Stuttgart: Klett. (PGF)
Schemann, H. 1991. Synonymwörterbuch der deutschen Redensarten. Stuttgart:
Klett. (SR)
Walter, E. (ed.) 1994. Cambridge Word Routes. Anglais-Français. Cambridge:
Cambridge University Press. (CW)
Wehrle, H. and Eggers, H. 2001. Deutscher Wortschatz. Stuttgart: Klett. (WE)

2. Other literature
Aitchison, J. 1994. Words in the mind. An Introduction to the Mental Lexicon. Oxford:
Blackwell.
Arnaud, P. J. L. 1992. La connaissance des proverbes francais par les locuteurs natifs
et leur selection didactique. Cahiers de Lexicologie 1: 195238.
Baker, M., Francis, G. and Tognini-Bonelli, E. 1993. Text and Technology: In Honour
of John Sinclair. Amsterdam/Philadelphia: Benjamins.
Bally, C. 1909/1951. Traite de Stylistique Francaise (Vol. 1). Geneva: Librairie
Georg & Cie.
Biber, Douglas et al. 1999. Longman Grammar of Spoken and Written English. London:
Longman.
Bogaards, P. 1990. Ou` cherche-t-on dans le dictionnaire? International Journal of
Lexicography 3: 79102.
Bogaards, P. 1991. Word frequency in the Search Strategies of French Dictionary
Users. Lexicographica 7: 202212.
Burger, H. 1989. Phraseologismen im allgemeinen einsprachigen Worterbuch
in F. J. Hausmann, Franz Josef, et al. (eds.), Worterbucher: Ein internationales
Handbuch zur Lexikographie. Vol. 1 (Handbu cher zur Sprach- und
Kommunikationswissenschaft; Vol. 5). Berlin/New York: De Gruyter, 593599.
Burger, H. 1998. Phraseologie: Eine Einfuhrung am Beispiel des Deutschen. Berlin:
Schmidt.
Church K. W. and Hanks P. 1990. Word Association Norms, Mutual Information
and Lexicography. Computational Linguistics 1: 2229.
Council of Europe 2001. Common European Framework of Reference for Languages:
Learning, Teaching, Assessment. Cambridge: Cambridge University Press.
Cowie, A. 1999. English Dictionaries for Foreign Learners. Oxford: Oxford
University Press.
Cummins, S. and Desjardins, I. 2002. A Case Study in Lexical Research for
Translation. International Journal of Lexicography 2: 139156.
de Florio-Hansen, I. (2004), Wortschatzerwerb und Wortschatzlernen von
Fremdsprachenstudierenden. Erste Ergebnisse einer empirischen Untersuchung.
Fremdsprachen Lehren und Lernen 33: 83113.
Dunning, T.E. 1993. Accurate Methods for the Statistics of Surprise and Coincidence.
Computational Linguistics 1: 6174.
Feilke, H. 1996. Sprache als soziale Gestalt. Frankfurt: Suhrkamp.
Feilke, H. 2003. Kontext – Zeichen – Kompetenz. Wortverbindungen unter
sprachtheoretischem Aspekt. In K. Steyer (ed.), 41–64.
Firth, J. R. 1957. Papers in Linguistics. London: Oxford University Press.
Fontenelle, T. 2003. Collocations et traitement automatique du langage naturel.
In F. Grossmann and A. Tutin (eds.), 75–88.
Francis, G., Hunston, S. and Manning, E. 1998. Collins Cobuild Grammar Patterns 2:
Nouns and Adjectives. London: HarperCollins.
Gates, E. 1988. The treatment of multi-word lexemes in some current dictionaries of
English. In M. Snell-Hornby (ed.), ZüriLEX '86 Proceedings. Papers
read at the Euralex International Congress. Tübingen: Francke, 99–106.


Götze, L. 1999. Der Zweitspracherwerb aus der Sicht der Hirnforschung. Deutsch
als Fremdsprache 1: 10–16.
González-Rey, I. 2002. La phraséologie du français. Toulouse: Presses Universitaires
du Mirail.
Grossmann, F. and Tutin, A. (eds.) 2003. Les collocations: analyse et traitement. Travaux
et recherches en linguistique appliquée, Série E. Amsterdam: De Werelt.
Harras, G. 1989. Zu einer Theorie des lexikographischen Beispiels. In F. J. Hausmann et al.
(eds.), 607–614.
Hartmann, R. R. K. 2001. Teaching and Researching Lexicography. London: Longman.
Hausmann, F. J. et al. (eds.) 1989–1991. Dictionaries: An International Encyclopedia
of Lexicography (3 Vols.). Berlin: Walter de Gruyter.
Hausmann, F. J. 1995. Von der Unmöglichkeit der kontrastiven Lexikologie. In
H.-P. Kromann and A. L. Kjaer (eds.), Von der Allgegenwart der Lexikologie.
Kontrastive Lexikologie als Vorstufe zur zweisprachigen Lexikographie. Tübingen:
Niemeyer, 19–23.
Hausmann, F. J. 1999. Le dictionnaire de collocations – Critères de son organisation.
In N. Greiner et al. (eds.), Texte und Kontexte in Sprachen und Kulturen. Festschrift für
Jörn Albrecht. Trier: Wissenschaftlicher Verlag Trier, 121–140.
Hausmann, F. J. 2002. La lexicographie bilingue en Europe: peut-on l'améliorer? In
La Lessicografia Bilingue tra presente e avvenire, Atti del Convegno Vercelli, 4–5
maggio 2000, a cura di Elena Ferrario e Virginia Pulcini. Vercelli: Mercurio, 11–32.
Hausmann, F. J. 2003. Was sind eigentlich Kollokationen? In K. Steyer (ed.), 309–334.
Hausmann, F. J. forthcoming. Der undurchsichtige Wortschatz des Französischen.
Lernwortlisten für Schule und Studium.
Hoey, M. 1998. Introducing Applied Linguistics: 25 Years On. Plenary Paper in the
31st BAAL Annual Meeting: Language and Literacies, University of Manchester,
September 1998.
Howarth, P. 1996. Phraseology in English Academic Writing. Some Implications for
Language Learning and Dictionary Making. Tübingen: Niemeyer.
Hunston, S. 2001. Colligation, Lexis, Pattern and Text. In M. Scott and G. Thompson
(eds.), Patterns of Text. In honour of Michael Hoey. Amsterdam: Benjamins, 13–34.
Jones, S. 2002. Antonymy: A Corpus-based Perspective. London: Routledge.
Kjellmer, G. 1994. A Dictionary of English Collocations. Oxford: Clarendon Press.
Kocourek, R. 1991. La langue française de la technique et de la science. Wiesbaden:
Brandstetter.
Kromann, H.-P. 1991. Principles of Bilingual Lexicography. In F. J. Hausmann et al.
(eds.), 2711–2728.
Laffling, J. 1991. Towards High-Precision Machine Translation. Based on Contrastive
Textology. Berlin: Foris Publications.
Louw, B. 1993. Irony in the text or insincerity in the writer? The diagnostic potential
of semantic prosodies. In M. Baker, G. Francis and E. Tognini-Bonelli (eds.),
157–176.
Lyne, A. A. 1985. The vocabulary of French business correspondence. Word frequencies,
collocations and problems of lexicographic method. Genève/Paris: Slatkine-Champion.
McArthur, T. 1981. Longman Lexicon of Contemporary English. London: Longman.
McArthur, T. 1986. Thematic Lexicography. In R. R. K. Hartmann (ed.), The History
of Lexicography. Papers from the Dictionary Research Centre Seminar at Exeter,
March 1986. Amsterdam: Benjamins, 157–166.
McArthur, T. 1998. Living Words: Language, Lexicography and the Knowledge
Revolution. Exeter: University of Exeter Press.


Meißner, F. J. et al. 2001. Zur Ausbildung von Lehrenden moderner Fremdsprachen.
Ergebnisse einer Reflexionstagung zur Lehrerbildung (23./24. März 2000, Schloss
Rauischholzhausen). Französisch heute 32: 212–227.
Mel'čuk, I. 1998. Collocations and Lexical Functions. In A. Cowie (ed.), Phraseology.
Theory, Analysis and Applications. Oxford: Clarendon Press, 23–53.
Mel'čuk, I., Clas, A. and Polguère, A. 1995. Introduction à la lexicologie explicative
et combinatoire. Louvain-la-Neuve: Duculot.
Mel'čuk, I. and Wanner, L. 1996. Lexical Functions and Lexical Inheritance for
Emotion Lexemes in German. In L. Wanner (ed.), Lexical Functions in Lexicography
and Natural Language Processing. Amsterdam: Benjamins, 207–277.
Nelson, K. 1998. Language in Cognitive Development. The Emergence of the Mediated
Mind. Cambridge: Cambridge University Press.
Petermann, J. 1983. Zur Erstellung ein- und zweisprachiger phraseologischer
Wörterbücher: Prinzipien der formalen Gestaltung und der Einordnung von
Phrasemen. In J. Matešić (ed.), Phraseologie und ihre Aufgaben. Beiträge zum 1.
Internationalen Phraseologie-Symposium vom 12. bis 14. Oktober 1981 in Mannheim.
Heidelberg: Groos, 172–191.
Rapp, R. and Wettler, M. 1992. Wie mit Hilfe des Assoziationsgesetzes freie
Wortverbindungen vorhergesagt werden können. Tagungsband der 34. Tagung
experimentell arbeitender Psychologen, Osnabrück, 401.
Rapp, R. 1995. Die Berechnung von Assoziationen. Hildesheim: Olms.
Sinclair, J. M. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University
Press.
Siepmann, D. 2003. Eigenschaften und Formen lexikalischer Kollokationen: Wider ein
zu enges Verständnis. Zeitschrift für französische Sprache und Literatur 1: 260–283.
Siepmann, D. 2004. Linguistische und didaktische Aspekte der Übersetzung
von Mehrwortgliederungssignalen am Beispiel der Suggestoren. In B. Kovtyk and
G. Wendt (eds.), Ausbildung von Übersetzern im neuen geeinten Europa 2004 – linguistische,
didaktische und psychologische Aspekte. Berlin: Logos, 123–142.
Siepmann, D. 2005. Discourse Markers across Languages. A contrastive study of
second-level discourse markers in native and non-native text. New York: Routledge.
Siepmann, D. (in preparation). Thematic Learner Lexicography. Linguistic and
User-Related Aspects.
Smadja, F., McKeown, K. R. and Hatzivassiloglou, V. 1996. Translating collocations
for bilingual lexicons: A statistical approach. Computational Linguistics 1: 1–38.
Stern, D. 1998. Die Mutterschaftskonstellation. Eine vergleichende Darstellung
verschiedener Formen der Mutter-Kind-Psychotherapie. Stuttgart: Klett-Cotta.
Steyer, K. 2000. Usuelle Wortverbindungen des Deutschen. Linguistisches Konzept
und lexikografische Möglichkeiten. Deutsche Sprache 2: 101–125.
Steyer, K. (ed.) 2003. Wortverbindungen – mehr oder weniger fest. (Jahrbuch des Instituts
für Deutsche Sprache.) Berlin: De Gruyter.
Stubbs, M. 1995. Corpus evidence for norms of lexical collocation. In G. Cook and
B. Seidlhofer (eds.), Principle and practice in Applied Linguistics: Studies in
Honour of H. G. Widdowson. Oxford: Oxford University Press, 245–256.
Thorndike, E. L. 1921. The Teacher's Word Book. New York: Columbia University.
