and connotation
Barnbrook G (1996) Language and Computers.
Edinburgh: EUP. Chapters 3,4,5
Partington A (1998) Patterns and Meanings.
Amsterdam: John Benjamins. Chapters 1,2,4
Frequency information
Most banal information: counting how
many times a word (type) appears in a
text
Most frequent words will be function
words, so frequency (f) counts often
exclude words listed in a stop list
Should you count words or lemmas?
Should you distinguish alternate meanings
of ambiguous word forms (if you can)?
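A minimal sketch of such a frequency count, with and without a stop list. The tokeniser, the tiny stop list and the sample sentence are all assumptions made for illustration. The sketch counts word forms, not lemmas, so "sat" and "sit" would be counted separately.

```python
import re
from collections import Counter

# Hypothetical stop list, far shorter than any real one.
STOP_WORDS = {"the", "a", "of", "and", "to", "in", "on", "is"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(text, stop_words=frozenset()):
    """Count how often each word type occurs, skipping stop-listed words."""
    return Counter(tok for tok in tokenize(text) if tok not in stop_words)

# Invented sample text, for illustration only.
sample = "The cat sat on the mat and the dog sat on the cat"

raw = word_frequencies(sample)
filtered = word_frequencies(sample, STOP_WORDS)

print(raw.most_common(3))       # dominated by function words
print(filtered.most_common(3))  # content words only
```

Counting lemmas instead would need a lemmatiser in place of the plain `tokenize` step, and separating the readings of an ambiguous form would need tagged input.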
Frequency information
Frequency information on its own is not
particularly interesting
Quite useful to compare f of related words
eg alternative readings of a given word form (already
seen in probability calculations in tagging)
or comparing near synonyms, especially if we can
take context into account (see later)
Plotting types against tokens for a given text shows us how the vocabulary
grows, and so how the type-token ratio (TTR) changes, as the text gets longer
Typically, the curve starts steeply and then flattens sooner or later,
reflecting the homogeneity (or otherwise) of the text
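The points of such a curve can be computed in a few lines. This is a sketch under simple assumptions: the tokeniser and the sample text are invented, and a real text would be needed to see the curve flatten properly.

```python
import re

def type_token_curve(text):
    """Return (tokens_seen, types_seen, ttr) after each successive token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)  # a repeated word does not add a new type
        curve.append((n, len(seen), len(seen) / n))
    return curve

# Invented sample text, for illustration only.
sample = "the cat sat on the mat and the dog sat on the cat"

curve = type_token_curve(sample)
for n, types, ttr in curve:
    print(f"{n:2d} tokens  {types:2d} types  TTR = {ttr:.2f}")
```

Plotting the second column against the first gives the types-against-tokens curve. The third column shows the ratio itself at each point.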
Vocabulary in context
Concordance, also known as KWIC list
(key word in context)
Allows us to see the (immediate)
environment in which a word appears
Listings can be customised to show what
you want more clearly, eg
sorted according to next or previous word
showing more or less context
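A concordancer of this kind might be sketched as follows. The function names, the `width` and `sort_by` parameters and the sample text are assumptions for illustration, not the interface of any particular concordance program.

```python
import re

def kwic(text, keyword, width=3, sort_by=None):
    """Return concordance lines as (left context, keyword, right context).

    width   -- number of context words kept on each side
    sort_by -- None (text order), "next" or "previous"
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((left, tok, right))
    if sort_by == "next":
        lines.sort(key=lambda line: line[2][:1])   # first word to the right
    elif sort_by == "previous":
        lines.sort(key=lambda line: line[0][-1:])  # last word to the left
    return lines

def show(lines):
    """Print the lines with the keyword aligned in a central column."""
    for left, kw, right in lines:
        print(f"{' '.join(left):>30}  {kw}  {' '.join(right)}")

# Invented sample text, for illustration only.
sample = ("the sheer volume of mail was a surprise but the sheer joy of it "
          "and the sheer weight of numbers won")

show(kwic(sample, "sheer", width=2, sort_by="next"))
```

Changing `width` shows more or less context, and changing `sort_by` groups lines that share the same neighbouring word.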
source: A Partington (1998) Patterns and Meanings. Amsterdam: John Benjamins
CIWK search
inverted KWIC
specify the context and look to see what
words occur in it
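One way to sketch such a search is to fix the words on either side of an empty slot and count whatever fills it. The function name `ciwk`, its parameters and the sample text are assumptions for illustration.

```python
import re
from collections import Counter

def ciwk(text, left, right):
    """Count the words found between the word sequences left and right."""
    tokens = re.findall(r"[a-z']+", text.lower())
    fillers = Counter()
    n_left, n_right = len(left), len(right)
    # i is the position of the slot; both contexts must fit around it
    for i in range(n_left, len(tokens) - n_right):
        before = tokens[i - n_left:i]
        after = tokens[i + 1:i + 1 + n_right]
        if before == left and after == right:
            fillers[tokens[i]] += 1
    return fillers

# Invented sample text, for illustration only.
sample = ("the sheer volume of mail was a surprise but the sheer joy of it "
          "and the sheer weight of numbers won")

# Which words occur in the context "the sheer ___ of"?
print(ciwk(sample, ["the", "sheer"], ["of"]).most_common())
```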
Collocation
Term coined by J R Firth (1957) to characterise
(part of) his theory of meaning
"You shall know a word by the company it
keeps"
The occurrence of two or more words within a
short space of each other in a text (Sinclair
1991)
The relationship a lexical item has with items
that appear with greater than random probability
in its (textual) context (Hoey 1991; emphasis
added)
Collecting collocations
If we are to look beyond neighbouring
words, what constraints might we impose?
Collocation means co-occurrence within
some defined context
possibly a window of n words to left and/or
right
if corpus is tagged/parsed, we can look at
collocations within structures
or we can define the window in terms of
constituents rather than words
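The simplest of these constraints, a window of n words to the left and/or right, can be sketched as below. It assumes plain untagged text; the function name, parameters and sample text are invented for illustration. Collocations within structures or constituents would need tagged or parsed input instead.

```python
import re
from collections import Counter

def collocates(text, node, left=2, right=2):
    """Count the words found within a window around each occurrence of node.

    left, right -- how many words to look at before and after the node
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
            counts.update(window)
    return counts

# Invented sample text, for illustration only.
sample = ("the sheer volume of mail was a surprise but the sheer joy of it "
          "and the sheer weight of numbers won")

print(collocates(sample, "sheer", left=0, right=1))  # immediate right neighbour
print(collocates(sample, "sheer", left=1, right=2))  # a wider window
```

Note that a wide window over raw text quickly fills up with function words, so the counts still need to be filtered or tested for significance.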
Measuring significance
The significance of any co-occurrences
needs to be established
Raw co-occurrence frequency counts mean
nothing
Need to be compared to something else
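One common "something else" is the number of co-occurrences we would expect by chance if the two words were distributed independently. The sketch below compares observed with expected counts and takes the base-2 logarithm of the ratio, a mutual information style score. This measure is an assumption chosen for illustration, not one named in the text above, and the counts in the example are made up.

```python
import math

def expected_cooccurrence(f_node, f_collocate, corpus_size, span):
    """Co-occurrences expected by chance if the two words were independent.

    span -- number of word positions inspected around each node occurrence
    """
    return f_node * f_collocate * span / corpus_size

def mi_score(observed, f_node, f_collocate, corpus_size, span):
    """log2 of observed over expected co-occurrence frequency."""
    if observed == 0:
        return float("-inf")  # never seen together
    expected = expected_cooccurrence(f_node, f_collocate, corpus_size, span)
    return math.log2(observed / expected)

# Made-up counts, for illustration only: the node occurs 125 times and the
# collocate 250 times in a 1,000,000-word corpus, with a window of 4 words.
expected = expected_cooccurrence(125, 250, 1_000_000, 4)
score = mi_score(16, 125, 250, 1_000_000, 4)
print(expected, score)
```

A score well above zero means the pair occurs together more often than chance predicts. Scores of this kind are known to favour rare words, so in practice a minimum frequency is usually imposed as well.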
collocations of sheer
expression of magnitude of weight or volume to right
(20%)
volume, weight, numbers, mass, scale, quantity, size
almost always with article the
collocations of sheer
nouns expressing strong emotion (11%)
fun, joy, panic, inspiration, enjoyment, terror
Some examples
object of commit is often something bad (foul,
deception, offence)
if something is described as rife, it is not good
(crime, disease, mistakes), and describing it as
rife expresses a negative connotation
(speculation is rife)
both the above exemplify unfavourable
prosody, but other prosodies are possible
a good example: claim vs admit responsibility for
an atrocity