
Concordances, collocations and connotation
Barnbrook G (1996) Language and Computers.
Edinburgh: EUP. Chapters 3,4,5
Partington A (1998) Patterns and Meanings.
Amsterdam: John Benjamins. Chapters 1,2,4

Lexical information in corpora


Start looking at the kind of information (about
individual words) that can be got from corpora

Simple frequency information


Distribution information
Collocation (co-occurrence information)
Connotation (semantic prosody)

Introduce basic ideas


Future topics
Statistics
Case studies

Frequency information
Most banal information: counting how
many times a word (type) appears in a
text
Most frequent words will be function
words, so often f counts exclude words
listed in a stop list
Should you count words or lemmas?
Should you distinguish alternate meanings
of ambiguous word forms (if you can)?
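The counting described above can be sketched in a few lines. This is only an illustration: the tokenizer is deliberately crude, and the stop list `STOP_WORDS` is a hypothetical toy list, not one taken from any particular tool. It counts word forms, not lemmas, and makes no attempt to separate meanings.

```python
from collections import Counter
import re

# Hypothetical stop list for illustration; real stop lists are much longer.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=frozenset()):
    """Count how often each word form (type) occurs, skipping stop words."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)

sample = "The cat sat on the mat and the dog sat on the cat"
raw = word_frequencies(sample)
filtered = word_frequencies(sample, STOP_WORDS)
print(raw.most_common(3))       # function words dominate the raw counts
print(filtered.most_common(3))  # content words remain once stop words go
```

Counting lemmas instead of word forms would need a lemmatizer in place of `tokenize`, which is beyond this sketch.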

Frequency information
Frequency information on its own is not
particularly interesting
Quite useful to compare f of related words
eg alternative readings of a given word form (already
seen in probability calculations in tagging)
or comparing near synonyms, especially if we can
take context into account (see later)

f of a given word in a given context can be
indicative, eg pronouns more frequent as subject
or 1st word of sentence

Types and tokens


Remember distinction between tokens (words) and
types (different words)
Type count gives a measure of how many DIFFERENT
words are used
Type-token ratio gives a measure of vocabulary
richness
If vocabulary is very varied, TTR will be higher

TTR is very sensitive to overall text length, so it is not
meaningful to compare TTRs for texts of different lengths
Standardized TTR is the average of the TTR for each
sequence of n words (typical default n=1000) in a text or
corpus
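A minimal sketch of the two measures, assuming the text is already tokenized. The function names are illustrative, and dropping a final chunk shorter than n is an assumption made here; tools may handle the remainder differently.

```python
def type_token_ratio(tokens):
    """Number of distinct types divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def standardized_ttr(tokens, n=1000):
    """Mean TTR over consecutive, non-overlapping chunks of n tokens.

    Assumption: a final chunk shorter than n is dropped.
    """
    chunks = [tokens[i:i + n] for i in range(0, len(tokens) - n + 1, n)]
    if not chunks:
        # Text shorter than n: fall back to the plain TTR.
        return type_token_ratio(tokens)
    return sum(type_token_ratio(c) for c in chunks) / len(chunks)

tokens = "a b a c a b d a e a b f".split()
print(type_token_ratio(tokens))       # 6 types over 12 tokens
print(standardized_ttr(tokens, n=4))  # mean of three 4-token chunks
```

Because every chunk has the same length, standardized TTR values can be compared across texts of different sizes, which plain TTR does not allow.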

Vocabulary growth curve

Plotting types against tokens for a given text shows us how the vocabulary
(number of types) grows as the text gets longer
Typically, the curve starts steeply and then flattens sooner or later,
reflecting homogeneity (or otherwise) of the text
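The points of such a curve can be computed with a running set of the types seen so far. This is a sketch on a toy token list; the function name is illustrative and plotting is left out.

```python
def vocabulary_growth(tokens):
    """Return a list of (tokens_seen, types_seen) points, one per token."""
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((i, len(seen)))
    return curve

tokens = "the cat sat on the mat the cat sat".split()
curve = vocabulary_growth(tokens)
for n_tokens, n_types in curve:
    print(n_tokens, n_types)
```

The type count rises with every new word at first and stops rising once the text starts repeating itself, which is the flattening described above.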

VGC for Macbeth in Basic English


source: http://web.missouri.edu/~youmansc/vmp/help/Youmans-TypeToken.pdf

Vocabulary growth curve


Comparative VGC for four texts
Simple measure used in some literary studies
(a) Longfellow
(b) Hemingway
(c) Basic English (Macbeth)
(d) Bible (Genesis 2)

Vocabulary in context
Concordance, also known as KWIC list
(key word in context)
Allows us to see the (immediate)
environment in which a word appears
Listings can be customised to show what
you want more clearly, eg
sorted according to next or previous word
showing more or less context
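A concordancer of this kind can be sketched as follows, assuming a pre-tokenized text. The function names and the `width` and `sort_by` parameters are illustrative, not those of any particular concordance program.

```python
def kwic(tokens, keyword, width=3):
    """Return (left, node, right) context triples for each hit of keyword."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            hits.append((left, tok, right))
    return hits

def format_kwic(hits, sort_by="right"):
    """Sort hits by the next (or previous) word and align the keyword column."""
    if sort_by == "right":
        key = lambda h: [w.lower() for w in h[2]]
    else:
        # Sort on the previous word first, then the one before it, and so on.
        key = lambda h: [w.lower() for w in reversed(h[0])]
    ordered = sorted(hits, key=key)
    pad = max((len(" ".join(h[0])) for h in ordered), default=0)
    return [" ".join(l).rjust(pad) + "  " + n + "  " + " ".join(r)
            for l, n, r in ordered]

tokens = "time to save time and spend time wisely do not waste time".split()
hits = kwic(tokens, "time", width=2)
for line in format_kwic(hits, sort_by="right"):
    print(line)
```

Changing `width` shows more or less context, and `sort_by="left"` groups the lines by the word before the keyword.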

source: Partington A (1998) Patterns and Meanings. Amsterdam: John Benjamins

CIWK search
inverted KWIC
specify the context and look to see what
words occur in it
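One simple way to picture this is a one-word slot between a fixed left and right word. The sketch below assumes that form of context; real tools allow much richer context patterns, and the function name is made up for illustration.

```python
from collections import Counter

def ciwk(tokens, left, right):
    """Count the words found in the slot between `left` and `right`."""
    fillers = Counter()
    for prev, word, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if prev == left and nxt == right:
            fillers[word] += 1
    return fillers

tokens = "a lot of time a bit of luck a lot of work a waste of time".split()
slot = ciwk(tokens, "a", "of")
print(slot.most_common())
```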


Collocation
Term coined by J R Firth (1957) to characterise
(part of) his theory of meaning
"You shall know a word by the company it
keeps"
The occurrence of two or more words within a
short space of each other in a text (Sinclair
1991)
The relationship a lexical item has with items
that appear with greater than random probability
in its (textual) context (Hoey 1991; emphasis
added)

Collocation, text type and style


Distinguish between general and more
usual collocations vs technical and more
personal ones
eg in a general corpus time collocates with
save, spend, waste, fritter away,
but in a corpus of sports reports time
collocates with half, full, extra, injury, first,
second, third,

Collocation and idiom


Listing collocations will often reveal idioms
and cliches
Important to think of collocation as
extending beyond neighbouring words
(which can be captured by simple
concordances)


Collecting collocations
If we are to look beyond neighbouring
words, what constraints might we impose?
Collocation means co-occurrence within
some defined context
possibly a window of n words to left and/or
right
if corpus is tagged/parsed, we can look at
collocations within structures
or we can define the window in terms of
constituents rather than words
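The word-window option can be sketched as below, assuming a tokenized text. The `left` and `right` spans are free parameters chosen for illustration; structure-based or constituent-based windows would need a tagged or parsed corpus and are not shown.

```python
from collections import Counter

def window_collocates(tokens, node, left=4, right=4):
    """Count words occurring within `left`/`right` tokens of each `node` hit."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

tokens = "we spend time and waste time but never save time".split()
print(window_collocates(tokens, "time", left=1, right=0))  # previous word only
print(window_collocates(tokens, "time", left=2, right=2))  # wider window
```

With `left=1, right=0` this reduces to what a sorted concordance already shows; widening the window picks up collocates that are not adjacent to the node.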

Measuring significance
The significance of any co-occurrences
needs to be established
Raw co-occurrence frequency counts mean
nothing
Need to be compared to something else

Need to compare a given co-occurrence
with random chance, or with some other
co-occurrence
More detail next time
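As a rough illustration of comparing with random chance, one can set the observed count against the count expected if the two words were independent. This is only a sketch under a simple independence assumption, not a particular published measure, and the figures in the example are made up.

```python
def expected_cooccurrences(f_node, f_collocate, corpus_size, span):
    """Co-occurrences expected by chance if the two words were independent.

    Assumption: each of the `span` window slots around every node occurrence
    holds the collocate with probability f_collocate / corpus_size.
    """
    return f_node * span * f_collocate / corpus_size

def observed_expected_ratio(observed, f_node, f_collocate, corpus_size, span):
    """A ratio above 1 means the pair co-occurs more often than chance predicts."""
    expected = expected_cooccurrences(f_node, f_collocate, corpus_size, span)
    return observed / expected

# Made-up counts for illustration only, not taken from any real corpus.
ratio = observed_expected_ratio(observed=30, f_node=100, f_collocate=500,
                                corpus_size=1_000_000, span=8)
print(ratio)
```

A raw count of 30 says little on its own; set against an expectation of less than one co-occurrence, it looks far from accidental.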

Collocation and synonymy


Collocation is good evidence in discussing
(near) synonymy
Lots of studies take near synonyms and
look to see if the nature of their
relationship can be characterised by their
distribution
In other words: what words does each of
the synonym set collocate with?
Especially useful for language learners

Example of sheer and synonyms


(from Partington book)
three senses (LDOCE)
pure, nothing but, eg sheer luck
steep, sheer drop
thin, sheer stockings

(Cobuild) use sheer to emphasize
completeness of state
92 occurrences of sheer (in meaning 1) in
his corpus

collocations of sheer
expression of magnitude of weight or volume to right
(20%)
volume, weight, numbers, mass, scale, quantity, size
almost always with article the

expression of force, strength or energy (22%)
energy, exertion, force, muscle, strength, power, pressure, fury,
pace, intensity
usually with the, or a preposition but no article

expression of persistence (14%)
persistence, irreversibility, obstinacy, indomitability, insistence,
reliability, integrity, hard work
left context: through, because of, out of, expressing causation,
but not the

collocations of sheer
nouns expressing strong emotion (11%)
fun, joy, panic, inspiration, enjoyment, terror

nouns expressing extreme personal qualities (11%)
beauty, glamour, brutality, thuggery, madness, folly

nouns expressing extreme ability or lack of same (8%)
expertise, competence, virtuosity, gamesmanship


Synonyms of sheer - pure


LDOCE definitions, 5 meanings of which two
overlap:
not mixed with anything
complete, thorough

Corpus has 135 examples


Larger variety of syntactic environments (sheer
was always modifying a noun) including
predicative, which sheer does not occur in
*? The drop was sheer
* His fury was sheer

Synonyms of sheer - pure


Religious-moral context; sense of unmixed
doctrine, faith, goodness; chemicals, gold

But many examples where it has an emphasizing function,
like sheer
accident, chance, comedy, guesswork, honesty, idiocy, malice,
nostalgia, pleasure, selfishness, talent, theatre, vulnerability, whim,
wickedness
often with proper nouns (unlike sheer)

No examples of pure collocating with items expressing
magnitude, force or persistence
Some overlap with sheer
personal qualities, emotion (though generally less extreme ones)

Only a few examples of pure in prepositional phrase
expressing causation; causes can be sheer, but states are
pure

Other synonyms of sheer


Partington does similar analysis of
complete and absolute
Shows that each of the synonyms has
more typical uses and patterns, though
there is some overlap
But there is also clear evidence of
complementary usage


Connotation and semantic prosody


Collocation can also be used to illustrate
connotation
secondary implications of a word (Lyons 1977)

Three distinct uses of the term


marker of a particular speech variety (eg lovely)
cultural implications (words used to describe women
show what society thinks of them)
marker of speaker's evaluation (firm ~ stubborn)

Semantic prosody (Sinclair 1987)


use of a certain word spreads its connotation over the
whole utterance

Some examples
object of commit is often something bad (foul,
deception, offence)
if something is described as rife, it is not good
(crime, disease, mistakes), and describing it as
rife expresses a negative connotation
(speculation is rife)
both the above exemplify unfavourable
prosody, but other prosodies are possible
a good example: claim vs admit responsibility for
an atrocity

More power to your elbow


Examples given in last few slides were
largely subjective
More interesting if we can back up
observations with calculations of statistical
significance
Next time we will look at some simple
statistical measures

