1. Introduction
(2) "It is no great matter to me," Hotchkiss concluded, "for I had only
the wages of my Portland engagement, and that was no great sum, I
assure you."
(APHB Corpus, B002:21)
In examples (1) and (2) above, the person or entity being referred to by the
pronoun, the antecedent, is easily recoverable from the preceding context;
these are therefore examples of what can be called direct anaphora, where the
anaphor and antecedent are coreferential. Here, a reader or hearer would have
little trouble identifying the antecedent, as the nature of the link between the
anaphor and antecedent is fairly straightforward. But let us see what happens
when we are faced with examples like (3) and (4):
(3) In 1973 the government met the premiers of the western provinces. Just
the other day we received copies of an update from the Prime Minister
addressed to Premier Barrett on the event of the recent conference of
western premiers. Some of that process is worthy of commendation,
which I sincerely extend to the Prime Minister.
(Hansard Corpus, H0205274-76)
In this section, the three types of IA considered in this paper are discussed
as a preliminary to the empirical study.
Indirect anaphora 75
2.1 Labelling
(5) Those who have lost patience and manifested that loss in this
demonstration were few in number on Monday, but that number could
grow rapidly if it appears the government has lost interest or is too
preoccupied with other needs even to include an expression of concern
for their interests in the Speech from the Throne. I wish the new minister
success in his difficult task. He will find an early opportunity, I trust in
this debate, to make up for this omission.
(Hansard corpus, H0205332-34)
a. illocutionary nouns
These include such nominalisations of verbal processes as accusation, allegation,
answer, observation and statement, which can be used strategically by a writer
or speaker, as we will see below in Section 3.6. Here is an example from the
Hansard corpus to illustrate:
(7) However, according to the Evening Telegram of June 24, the Minister
of Regional Economic Expansion is reported to have said privately that
Canada would act unilaterally to take control of the continental shelf off
its coast should the current Law of the Sea Conference negotiations not
go mostly our way. In view of that statement, Mr. Speaker, I wonder what
is Canada's position on the continental shelf control issue.
(Hansard corpus, H030673-74)
(9) After all, I reflected, I was like my neighbours; and then I smiled,
comparing myself with other men, comparing my active goodwill
with the lazy cruelty of their neglect. And at the very moment of that
vainglorious thought, a qualm came over me, a horrid nausea and the
most deadly shuddering.
(APHB corpus, B0100683-84)
d. text nouns
This class of nouns refers to the formal structure of the discourse. Their use
signals no interpretation; they merely function to label the discourse. Such nouns
include phrase, sentence, word, page, excerpt and section. An example follows:
(10) I appreciate the fact that the minister is not able to be here tonight to
answer my question, but he is well represented by a new member in the
person of the hon. member for South Western Nova (Miss Campbell),
who comes into this House as the Parliamentary Secretary to the
Minister of National Health and Welfare and will shortly be making her
maiden speech. For once that phrase seems to apply, does it not?
(Hansard Corpus, H020612-13)
Fraurud (1992a) explores the phenomenon of situation reference, and gives the
following example, reproduced from (4) above:
78 Simon Philip Botley
Now that we have considered the distinction between events and things, we
will consider the distinction Fraurud makes between events on the one hand,
and facts and propositions (or factualities in her terms) on the other. Many
situation anaphors such as 'it' (and the English demonstratives as we will see
below) refer to events but also to propositions, facts and assertions (and in
more traditional linguistics 'sentences').
Lyons (1977) approached this by means of his distinction between first-order
entities (equating approximately to objects), second-order entities (events),
and third-order entities (facts/propositions). Fraurud uses her own terms for
Lyons' second- and third-order entities, referring to second-order entities as
'eventualities' and third-order entities as 'factualities'. Fraurud also points out
that 'eventualities' can be divided into such things as activities, accomplishments,
achievements and states (Vendler 1967) or events, processes and states
(Mourelatos 1981).
Lyons (1977) proposed the term 'textual deixis', which he saw as a midpoint
between the deictic and the anaphoric function of pronouns. Lyons' definition
of deixis, which has remained more or less unquestioned, assumes that
deixis links referring expressions to the 'spatio-temporal co-ordinates of the act
of utterance' (Lyons 1977:637).
Lyons makes a distinction between 'pure textual deixis' and 'impure textual
deixis'. 'Pure textual deixis' describes cases where an anaphor refers to a
linguistic entity, as in example (12):
(12) A: I've been to Mount Kinabalu.
B: How do you spell that?
Here, the demonstrative anaphor refers to the linguistic form 'Kinabalu' rather
than the referent of the noun phrase. On the other hand, 'impure textual
deixis' is for Lyons closer to some classes of indirect anaphora than 'pure textual
deixis', and can be seen in example (13):
(13) A: I don't know who you're talking about, Inspector!
B: That's a lie!
Here, the demonstrative refers, not to the linguistic form of A's utterance, but to
the proposition expressed by the sentence uttered by A. Lyons in fact includes
under 'impure textual deixis' a wide range of third-order entities such as
facts and propositions, which Fraurud terms 'factualities'.
3.1 Introduction
3.2 Methodology
However, Biber argues that one method of reaching some measure of
representativeness in a corpus sample is to include a wide variety of different texts.
In the three corpus samples used in this chapter, it has not in every case been
possible to do this, because of the nature of the corpus from which the samples
were taken, especially with the Hansard sample, which is a continuous record of
a parliamentary session and is therefore not easily divided into separate texts.
Nevertheless, the AP sample does contain a wide variety of news stories on
different topics, and the APHB corpus sample contains a range of narrative
texts which deal with different subject matter and include novels, short stories
and biographies. This textual variety does, I argue, follow Biber's method of
assuring the representativeness of samples taken from the original corpora.
All demonstrative pronouns in the three corpus samples were annotated
using the demonstrative feature scheme outlined in Botley and McEnery
(2001:8-10). This scheme is briefly described in Table 1; see Appendix 1 for
some textual examples.
Using the above annotation scheme, this study considered only those cases
which had been identified as Indirectly Recoverable, that is, those cases
that were tagged with the value I in Table 1 (see also Appendix 1 for examples).
Once all cases had been identified in the corpus samples, frequency statistics
were obtained with the aid of concordances generated using the WordSmith
concordance tool (Scott 1996). The frequency data from the different corpora
were then subjected to a test of statistical significance, namely the
log-likelihood (LL) test.
The following tables give the frequencies for all demonstratives identified as
indirectly recoverable across all three genres. Indirectly recoverable demonstratives
were so identified using the demonstrative annotation scheme outlined in
Botley and McEnery (2001:8-10) as well as Appendix 1 below.
The first table gives the overall distribution of demonstratives in the three
corpora, along with percentage figures. The figures in columns 3, 5 and 7 in
Table 2 are percentages of the total number of cases in each corpus, whereas
the figures in column 9 are percentages of the total number of cases in all three
corpora. In Table 3, distribution frequencies are given by demonstrative feature
values, e.g. DA = Direct, Anaphoric, SH = Syntactic, Head function, etc.
(Botley & McEnery 2001:8-10).
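The two kinds of percentage used in Table 2 can be illustrated with a short Python sketch. Because Table 2's own counts are not reproduced in this section, the example borrows the TOTALS row (4, 22 and 42) and the these_illocutionary row (0, 1 and 4) of Table 8 purely as illustrative input; the function names `column_percentages` and `pooled_percentage` are hypothetical.

```python
def column_percentages(row, column_totals):
    """Each corpus count as a percentage of that corpus's own total
    (the layout of columns 3, 5 and 7 in Table 2)."""
    return [100 * count / total for count, total in zip(row, column_totals)]


def pooled_percentage(row, column_totals):
    """The row's combined count as a percentage of the grand total
    across all corpora (the layout of column 9 in Table 2)."""
    return 100 * sum(row) / sum(column_totals)


# Illustrative input borrowed from Table 8 (not the Table 2 data).
totals = [4, 22, 42]  # TOTALS row
row = [0, 1, 4]       # these_illocutionary row

print([round(p, 1) for p in column_percentages(row, totals)])  # [0.0, 4.5, 9.5]
print(round(pooled_percentage(row, totals), 1))                # 7.4
```

The per-corpus figures show how prominent a feature is within each genre, while the pooled figure shows its share of all cases; the two can rank the same feature quite differently when corpus totals differ.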
In this paper, descriptive statistics are enhanced by a significance test, the
log-likelihood measure (LL), which shows the statistical significance of
differences in the distribution of IA cases between the three genres being
considered. Furthermore, all features whose distribution profiles are statistically
significant are given in bold type, and the cut-off point is marked as an empty
grey row, below which no figure is significant at the confidence level
given.
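The log-likelihood statistic itself is not spelled out in the text. As a minimal sketch, assuming the common formulation in which each corpus's expected frequency is proportional to its size and LL = 2 Σ O ln(O/E), with zero counts contributing nothing, and assuming the three samples are treated as equal in size, the Python function below reproduces the scores listed in the rows of Table 8 below (for example, counts of 0, 1 and 4 give 5.98). The name `log_likelihood` and the equal-size default are illustrative assumptions, not details taken from this study.

```python
import math


def log_likelihood(observed, sizes=None):
    """Log-likelihood (G2) score for one feature's frequencies across corpora.

    observed: raw frequency of the feature in each corpus.
    sizes: corpus sizes in the same order. If omitted, the corpora are
           treated as equal in size (an assumption made for this sketch).
    """
    if sizes is None:
        sizes = [1] * len(observed)
    total_observed = sum(observed)
    total_size = sum(sizes)
    ll = 0.0
    for o, n in zip(observed, sizes):
        # Expected frequency: the pooled count shared out by corpus size.
        expected = n * total_observed / total_size
        if o > 0:  # a zero count contributes 0 (limit of o * ln o)
            ll += o * math.log(o / expected)
    return 2 * ll


# Rows of Table 8, using the equal-size assumption.
print(round(log_likelihood([0, 1, 4]), 2))  # 5.98
print(round(log_likelihood([0, 3, 1]), 2))  # 4.29
print(round(log_likelihood([1, 1, 0]), 2))  # 1.62
```

Passing real corpus sizes through `sizes` gives the general form; with unequal sizes the expected frequencies, and hence the scores, would differ from the equal-size case.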
butions which are statistically significant at this level, some of them highly so,
suggesting some genre-based differences.
For instance, in Table 3, the distribution figures for proximal this differ
highly significantly across the genres (log-likelihood above 100) where the
antecedent type is zero (hardly surprising given the indirectly recoverable
antecedents), where the direction of reference is anaphoric, where the phoric type
is zero (again following from the indirectly recoverable antecedent, where phoric
type is not an issue), and finally where the syntactic function of the anaphor is
modifier (meaning that the demonstrative pre-modifies a noun, as in 'this book').
These observations suggest that there are some differences between the
three genres with respect to antecedent type, syntactic function, phoric type
and direction of reference. As well as this, we can make some further general
observations, as follows:
3.4 Labelling
The starting point was to investigate the labelling, or encapsulative function
of demonstrative anaphors with indirectly recoverable antecedents, across the
three genres analysed. This analysis follows Francis' (1989, 1994) distinction
between advance labels (ALs) and retrospective labels (RLs).
The frequencies of RLs and ALs were obtained from each of the three corpus
samples. The following tables show these frequencies in each corpus,
arranged according to demonstrative features.
Tables 4-6 reveal the following general patterns:
1. In all three corpora, RLs vastly outnumber ALs, as would be predicted by
the overall preponderance of anaphoric cases in the data.
2. Across the three samples, modifier demonstratives tend to function as RLs
more frequently than head demonstratives. This aspect will be discussed in
more detail below.
3. In the Hansard and APHB data, modifier and head demonstratives together
comprise the largest set of RL cases.
4. Generally, proximal demonstratives ('this', 'these') tend to function as RLs
and ALs more frequently than is the case with distal demonstratives ('that',
'those'). This reflects the observation that most of Francis' examples involve
However, as we see from Table 7 below, the Hansard sample displays a
significantly higher number of proximal RLs than the other corpus samples. Table 7
gives the distributions and significance scores for RL cases across the three
genres. Because of their low frequency in the data, ALs are not considered. In
the rest of this paper, the emphasis will be on RLs and the subtypes identified
by Francis.
From Table 7, we can see that the highest log-likelihood scores (above 118) are
associated with singular proximal demonstratives with the features anaphoric,
zero-antecedent, modifier and zero phoric type, suggesting some genre-based
differences with respect to these demonstratives and features. This result mainly
reflects the general pattern in Table 3 above.
As can be seen from the above tables, I have included head demonstratives
in the figures for labels. This inclusion of head demonstratives in the label
class reflects the observation by Francis (1994:97) that the label sometimes
functions as the predicative complement of the demonstrative anaphor. Also,
from the literature outlined above, it would appear that many cases of
antecedentless head demonstratives function in terms of situation reference or
text deixis.
However, I argue that there is some overlap between the
situation-referential function and the labelling function with head demonstratives
in certain syntactic environments. For instance, the data revealed a number
of cases where a demonstrative head is in a copular relationship with a lexical
noun or noun phrase which itself labels or encapsulates a preceding stretch of
discourse, as in examples (16) and (17), with the relevant information given in
bold type:
(19) "O, I know it's not evidence, Mr. Utterson; I'm book-learned enough for
that; but a man has his feelings; and I give you my bible-word it was Mr.
Hyde!" "Ay, ay," said the lawyer. "My fears incline to the same point. Evil,
I fear, founded — evil was sure to come — of that connection. Ay, truly,
I believe you; I believe poor Harry is killed; and I believe his murderer
(for what purpose, God alone can tell) is still lurking in his victim's
room. Well, let our name be vengeance. Call Bradshaw." The footman
came at the summons, very white and nervous. "Pull yourself together,
Bradshaw," said the lawyer. "This suspense, I know, is telling upon all of
you; but it is now our intention to make an end of it."
(APHB corpus, B0100793-B0100802)
In this extract from 'Dr Jekyll and Mr Hyde', the protagonists are engaged in a
hunt for the eponymous Doctor's dark nemesis. The lawyer refers to the sus-
pense felt by all in the situation, without that suspense overtly being mentioned
in the text. The antecedent is instead inferable from the actions and feelings of
the characters. One might ask whether this kind of case should be analysed as
'situation reference', but in the absence of a clear definition, this case remains
somewhat problematic.
Labels can be broadly subdivided into general labels (although this is not a term
Francis uses explicitly), and metalinguistic labels. General labels function to
label or package a stretch of discourse, characterising it or evaluating it in some
manner. Metalinguistic labels function similarly to Lyons' 'pure textual deixis'
in that they label a stretch of discourse as a linguistic entity.
In what follows, I explore the metalinguistic function performed by some
labels in the three genres under consideration, with examples and statistics.
To reiterate, Francis (1989, 1994) identified four subtypes of metalinguistic
label:
- illocutionary nouns
- language activity nouns
- mental process nouns
- text nouns
Although the number of cases of metalinguistic label is small, we can see from
Table 8 below that for all three text genres studied, it is largely proximal
demonstratives labelling illocutionary and language activity entities which are the
most frequent, especially in the Hansard data and the APHB data. These have
the highest log-likelihood scores (16.64 and 12.78 respectively). However,
despite this pattern, the frequency of that labelling an illocutionary entity is also
relatively significant (log-likelihood = 10.99).
these_illocutionary      0    1    4    5.98
that_language act        0    0    2    4.39
that_mental process      0    3    1    4.29
these_text noun          0    1    3    4.29
this_text noun           0    1    0    2.20
those_language act       1    0    0    2.20
these_mental process     1    1    0    1.62
TOTALS                   4   22   42
We can see that metalinguistic labels have both low absolute frequencies and
relatively low log-likelihood scores (compared to general labels). Nevertheless,
we may hypothesise that the general preponderance of metalinguistic
cases in the APHB and Hansard samples stems from the possibility that Hansard
and APHB both represent narrative genres, with a highly structured
discourse and (in the case of Hansard especially) with a strong rhetorical aspect.
Therefore, it is not surprising that RLs in general, and metalinguistic labels in
particular, predominate in these genres.
Let us compare the figures for metalinguistic labels in Table 8 with those
for general labels in Table 9.
We see from Table 9 that general labels greatly outnumber metalinguistic
labels in the three text genres considered in this study (228 against 68). This can
be contrasted with the findings made by Petch-Tyson (2000), who found that
native speaker writers of narrative essays tended to make more use of
metalinguistic labels than general labels, when compared to non-native writers with
different L1 (first language) backgrounds. Explaining this contrast would require
a great deal of further work, but it may provide further evidence
of genre-specific differences in the use of retrospective labels and
metalinguistic labels.
Furthermore, it can be suggested that the high preponderance of metalinguistic
labels in the Hansard sample reflects the fact that the Hansard sample
is composed of parliamentary debates with a heavy reliance upon rhetorical
devices. This may militate in favour of the use of labels which mark a speaker's
attitude to an utterance or a question from another speaker.
The above observations suggest that there is some genre-based difference
with respect to different types of label, although there is a small amount of
fuzziness in the categorisation of labels in the data analysed. Let us now look at
the other types of IA in this study — situation reference and textual deixis.
3.7.1 Eventualities
These make up the greatest proportion of situation-referring cases in the three
corpora, particularly in the Hansard and APHB samples. Also, as can be seen
from Table 10, the distribution figures for eventualities differ most significantly
across the three genres, although the significance scores themselves are not high,
owing to the low absolute frequencies involved.
Here is an example from the Hansard corpus:
(20) No wonder Mr. C. W. Lewis, chairman of the bargaining committee of
Local 60, asked Mr. Trudeau on August 22 that the basic wage of grain
inspectors be "at least parity with all Vancouver grain workers." This
justifies the stand that the Perry report is just a springboard for increased
inflation across the country.
(21) These fleets came back empty-handed, because wherever they drew near
the fertile shore with its fine white buildings, regiments of the Shah's
army rode down to meet them. At this Stenka Razin and Filka put their
heads together to think what to do.
(APHB corpus, B0201382-83)
(22) When he faced them all Stenka Razin explained that henceforth they
must till the farmlands, and carry on trade up and down the river. They
must repair the damage to Astrakhan, and guard themselves from attack.
No longer would the Muscovites give them orders. In this way they
would rule themselves, Cossack fashion.
(APHB corpus, B0201879-82)
3.7.2 Factualities
References to factualities occur to a lesser extent than references to
'eventualities', primarily in the Hansard sample, and chiefly with proximal
demonstratives, as in example (23):
(23) I am interested in having good food on the table, that is, in good train
service for transporting people and enough cars to carry our goods, so
that we can export manufactured products and create good service to
our ports. This holds true particularly for Vancouver and some parts of
eastern Canada.
(Hansard corpus, H0206458-59)
Here, the demonstrative refers to the argument expressed indirectly in the
previous paragraph. Another example follows:
(24) We Social Crediters have a solution that several still find funny, and
the majority of Canadians still do not accept it. We saw that at the last
election.
(Hansard corpus, H0206013-14)
In example (24), the proposition 'that the majority of Canadians do not accept
it' is the antecedent. We see here that there is a certain amount of potential
overlap between the directly recoverable cases with propositional antecedents
and the cases of situation reference with 'factuality' antecedents. I argue that
the propositional antecedent cases are those with clear surface markers such as
'that-clauses', whereas the cases I point to above tend to be non-surface
propositions or arguments encoded in a sentence and referred to as facts.
(25) Mr. Speaker, I rise on a point of order. With respect, yesterday I wished
to ask a supplementary question but was overlooked, which does not
seem possible. With all due respect, and I will sit down after I have said
this, members on the front benches have a responsibility.
(Hansard corpus, H0205068-70)
Here, the speaker is referring to his own utterance metalinguistically, yet there
is no labelling function, as the demonstrative functions as a head rather than
modifying a labelling noun.
In example (26), the reference is to the previous utterance, but there is an
element of event reference, providing evidence of overlap between the
text-deictic and event-referential functions:
(26) Holding his hat in his tense hands as he had seen the others do, the
young Stenka explained. "Because, Father, their standard poles no longer
showed against the sky — because the sound of their wagons went away
toward the sea." At this the Ataman lifted his gray head with pride.
(APHB corpus, B0200940-42)
The observations made above suggest some genre-based differences in the
distribution of IA cases referring to situations and functioning as textual
deictics, despite the relatively limited amount of data studied. These observations
also start to reveal more strikingly how problematic and fuzzy the categories of
situation reference and textual deixis can be. Let us complete this analysis by
considering a few examples taken from the three corpora which provide some
serious challenges.
So far in this paper, I have considered only 71% (462 cases) of the 648 indirectly
recoverable demonstratives in the three corpora studied. Although we can place
these cases into fairly clearly defined categories, at this point we are faced with
the remaining 29% (186 cases in all), which have proved difficult to classify
as labels, situation reference or text deictics.
A subset of these cases (some 100 cases in all, 15.43% of all indirectly
recoverable cases) does fall into small classes based on a number of semantic
or syntactic criteria, but the numbers involved are too small to propose any
meaningful cross-genre patterns at this stage. The classes are as follows:
- Temporal reference
- Idiomatic cases/discourse markers/genre-specific examples
- Quantifiers
- 'Class membership' references
There are some cases where demonstratives are used in a way which appears
to be neither situation-referential nor encapsulative. This is a loose class,
which, like the temporal examples above, requires further research on a larger
corpus in order to produce firmer conclusions. Some examples appear to be
genre-specific, as we see from the following example (28) from the Hansard
corpus sample:
(28) I solemnly believe that if we do not or cannot make that right a reality
during the life of this parliament with its commitment to bilingualism,
with the very emphatic words of the Leader of the Opposition in his
speech an hour ago, with the strong French-speaking representation in
government, then it will never be done and separatism will have proved
its point and Canadian unity will cease to have meaning for the majority
of Quebeckers. In that respect, we should be happy about the progress
made during these last five years.
(Hansard corpus, H0205573-74)
Idioms like 'in that respect' (as well as 'in that regard') would, according to
Francis, be treated as neutral retrospective labels. However, I would argue that it
is difficult to see their encapsulative function, and instead they appear to be
'inherently unspecific' (Winter 1982) idiomatic expressions, which do not have
a specific antecedent, yet in a sense are linked to the preceding context.
Another noteworthy example comes from the Hansard corpus:
(29) He said he was having a legal opinion researched on the proper course
for censure in anticipation that a formal move would be made in that
direction.
(Hansard corpus, H0205276)
(30) Meanwhile the Volga men put lighted fuses in the powder kegs and
hurled them up over the rail among the Persian soldiers, who were
shattered by the explosions. Then, throwing the torches before them, the
Cossacks swarmed up the sides. Their shout echoed over the still water
— "Sarin na kitchkou!" In this way, by attacking a few vessels at a time,
the Cossacks became masters of the enemy fleet except for some smaller
craft that rowed away in panic.
(APHB Corpus)
This example seems to function to link one discourse unit to the next, and does
not refer to any particular segment of the previous discourse. Its discourse-
linking function may be more important than any discourse-referring function
it may perform.
(31) In winning the honor for the fourth time this season, Sampson became
only the third player to earn rookie honors that many times in a single
season since the award began in 1970.
(AP corpus, A037:33)
In example (31), there is clearly some kind of anaphoric relationship, but it is
not clear what is being quantified. The antecedent has to be inferred from 'the
fourth time'.
Finally, I will discuss a group of some 86 noteworthy cases which were highly
challenging for the current analysis. This group seems to elude classification
for the time being, and may provide support for the claim that indirect
anaphora marks the limit of the corpus approach to anaphora, in that the
annotation scheme used in this paper does not currently allow such cases to be
clearly delineated.
These cases, as far as it has been possible to discern, do not fit into clear
categories described in this section so far, although many of the cases raise
important research questions. I will select some notable examples and discuss
the issues that they raise.
Table 11 shows the overall distribution of these cases in the three corpora,
along with significance scores.
As can be seen from Table 11, the majority of unclassified cases occur in the
Hansard and APHB corpora, suggesting some division along genre lines.
However, we see that proximal demonstratives are the only significantly frequent
unclassified cases, albeit with relatively weak significance scores.
We can, however, discuss several issues in relation to these unclassified
cases. For instance, in a number of cases in the data, reference is made
to the topic of a segment of discourse. These cases are relatively few,
but here are some of them:
(34) Mr. Speaker, it must first be made clear that the Department of
Agriculture does not run CEMA. CEMA runs the department. CEMA
runs its own business. I will be saying more about this later today.
(Hansard corpus, H0206163-66)
Example (34) is typical of several cases in the Hansard sample where a speaker
appears to be referring to the matter under discussion, or the topic that has just
been introduced. These are akin, I would speculate, to cases of 'this matter' or
'this topic' which would be included as neutral labels. Here is another example:
(35) We are talking about STOL. You are stalling. You don't know. I do
know, but I am not going to mention it, for obvious reasons. I am being
perfectly honest. I am telling members the plain truth, but I am not
going to reveal all the details. I repeat, the figure of $25 million which I
mentioned previously is the right one. If Mr. Sinclair contends the STOL
program has cost $135 million, let him prove what he says. I think he is
wrong. He should not consider himself as speaking for the government
of Canada. He is speaking as a citizen, and that is all. My interest in this
is no smaller than his.
(Hansard corpus, H0206416-27)
(36) Marjory seemed glad to see him, and gave him her hand without
affectation or delay. "I have been thinking about this marriage," he
began.
(APHB corpus, B0101525-26)
6. Conclusions
This paper has provided some detailed empirical insights into three types of
indirect anaphora in three written English genres, despite a relatively small
data sample (300,000 words) which makes strong generalisations somewhat
premature.
Also, it must be stressed that this paper has taken a deliberately descriptive
and comparative approach to the three types of indirect anaphora under
analysis, and there has not been any attempt to speculate on how the cases of
IA might be resolved by humans. In particular, any serious attempt to explore
the role of indirect anaphora in, for instance, automatic anaphor resolution,
has been beyond the scope of this paper, even though the data discussed would
certainly serve as the basis for the development of automatic anaphor
resolution systems.
We have instead made several observations concerning the distribution
of indirect anaphora involving demonstratives in parliamentary exchanges,
newswire stories and literary narrative. We offer support, for instance, for the
hypothesis that argumentative genres such as parliamentary proceedings make
much use of retrospective labels, especially those which characterise and evaluate
a stretch of text. It would be worth analysing a larger stretch of similar data
or court proceedings to further explore this notion.
Another genre-based observation that deserves further investigation is a
seemingly higher proportion of proximal demonstratives functioning as RLs
in the Hansard data compared to the other corpus samples. This may have
a number of explanations, one of which might be that parliamentary speakers
often label utterances which have only just been said, either by themselves
or by other speakers; and another might be that parliamentarians make more
metalinguistic references than speakers in other genres.
However, despite the appearance of clear genre-based patterns, this paper
has explored a number of challenging issues. For instance, an underlying
assumption of modern corpus-based linguistics (CBL) is that empirical,
quantitative descriptions of language should be at least on an equal footing with
rationalistic ones based on intuition (McEnery & Wilson 1996:16). One of
the corollaries of this is that a corpus-based study of language should be able
to provide observations which either confirm or refute rationalistic intuitions
about language.
This can only be achieved realistically if categories such as 'indirect anaphora'
or 'retrospective label' are easy to identify (using corpus annotation) and to
count. If this is not the case, then what can result is fuzziness and ambiguity.
This paper has shown that indirect anaphora definitely poses difficulties for
corpus-based linguistics, in that almost 30% of IA cases analysed were hard to
classify straightforwardly, whether or not an annotation scheme was used.
These difficulties arise where antecedents lack clear surface linguistic boundaries
(as in situation reference and some of the unclassified cases), where the inference
process for retrieving an antecedent is complex or unclear (as in situation
reference and text deixis), or where overlapping definitions make it difficult to
make hard and fast analysis decisions (as is the case with most types of indirect
anaphora). Therefore, indirect anaphora reveals some limitations of descriptive
corpus-based linguistics.
Furthermore, although it has been possible to identify some patterns in the
corpora, especially in the area of retrospective labels and situation reference,
this has sometimes been challenging, because some cases could not be
straightforwardly described and therefore assigned an unambiguous annotation
symbol.
Although Botley and McEnery's (2001) demonstrative feature annotation
scheme was applied largely successfully, the information contained in the
feature tags is limited: we can at best retrieve information about the
surface syntactic function of the anaphors and their antecedents. Full information
concerning the referential and discourse functions of demonstratives, as well
as the inferential complexity associated with some indirect anaphors, is more
difficult to obtain using the scheme as it currently stands. Therefore, further
development is required before a corpus annotation scheme can provide richer
information about indirect anaphora and its various complexities.
Also, an issue that needs to be addressed is the reliability of the annotation
scheme used, and the extent of agreement between annotators as to the
application of tags to particular cases of IA. In this study, all of the annotation
was carried out by one analyst, which to an extent side-steps the issue of
agreement between annotators.
However, the reliability of the annotation process remains an issue to be
considered in future work, given that there is plenty of evidence in the
literature that users of annotation schemes do not always agree on how to apply
them (see Baker 1997 and Poesio & Vieira 1998).
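Agreement of this kind is often quantified with a chance-corrected statistic. Cohen's kappa is one widely used option, although it was not applied in the present study. The sketch below assumes two annotators who have each assigned exactly one tag to every case; the function name cohens_kappa and the tag names LABEL, SITREF and TEXTDEIX are hypothetical placeholders, not tags from Botley and McEnery's (2001) scheme.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators who tagged the same cases."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty tag sequences of equal length")
    n = len(labels_a)

    # Observed agreement: share of cases given the same tag by both annotators.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement: probability that both annotators pick the same tag by
    # chance, estimated from each annotator's own tag frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    p_expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)

    if p_expected == 1:
        # Both annotators used one identical tag throughout; kappa is undefined,
        # so by convention this sketch reports perfect agreement.
        return 1.0
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical tags for six indirect anaphors (not data from this study).
annotator_a = ["LABEL", "LABEL", "SITREF", "TEXTDEIX", "SITREF", "LABEL"]
annotator_b = ["LABEL", "SITREF", "SITREF", "TEXTDEIX", "LABEL", "LABEL"]

print(round(cohens_kappa(annotator_a, annotator_b), 3))  # prints 0.455
```

In these six hypothetical cases the annotators agree on four, giving raw agreement of about 0.67. Once the agreement expected by chance (about 0.39) is discounted, kappa falls to about 0.45. Raw percentages can therefore overstate how reliably a scheme is being applied.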
Despite the challenges inherent in this work, the methodological issues
raised have a positive outcome, because they force us to re-evaluate existing
analytical categories, such as label or situation reference, reinforcing the value
of naturally-occurring language data in helping us to provide a rigorous and
complete description of English discourse.
Notes
* The author was formerly in the Department of Linguistics and Modern English Language
at Lancaster University, UK. He is now an Associate Professor in the Language Department,
MARA University of Technology, Kota Samarahan, Malaysia.
1. American Printing House for the Blind Treebank: contains mostly literary extracts
and motivational stories. Total size: 200,000 words. Available from
http://www.comp.lancs.ac.uk/computing/research/ucrel/corpora.html. A 100,000-word
sample of the corpus was used for this paper.
2. All corpus examples are given codes like this, which are references to the computer file
name plus the line numbers in which the examples are found.
104 Simon Philip Botley
3. The Hansard Corpus contains proceedings from the Canadian House of Commons
throughout the 1970s. Its total size is 750,000 words, and it is available from
http://www.comp.lancs.ac.uk/computing/research/ucrel/corpora.html. A 100,000-word
sample was used for this paper.
4. It is worth mentioning here the phenomenon of 'bridging reference' (Clark 1977), also
known as associative anaphora (Hawkins 1978), where the link between a definite
description and its 'anchor' has to be inferred. One example might be: "I have just decorated my
house. The door is now red". In this paper, I have left bridging reference out of the scope of
indirect anaphora primarily because bridging references tend to involve definite NPs rather
than demonstratives, which are the focus of this paper.
5. Though see Halliday and Hasan (1976), Krenn (1985), d'Addio (1988, 1990) and Conte
(1980, 1981, 1996) for other perspectives on the encapsulative function of NPs.
6. Francis admits that there is some overlap between the illocutionary type and the
language activity type. It is possible to identify a broad continuum from purely mental
processes to purely verbal ones (ibid.: 92-93).
7. See Fraurud (1992a) for a detailed treatment of these aspects of situation reference in the
literature.
8. This definition is generously inclusive, covering as it does a wide variety of different
referring phenomena. Many of the cases of indirectly recoverable anaphora examined below
fall into these categories, and some aspects of the directly recoverable cases examined
in Botley and McEnery (2001) are also covered by this term, a potential source of confusion
that is worth addressing. Botley and McEnery (2001) refer in particular to the propositional
antecedent type cases, which would presumably be included as 'factualities' by Fraurud.
However, these cases are treated here as examples of anaphora with directly recoverable
surface antecedents in the form of 'that-clauses' rather than as cases of indirect anaphora.
11. Lyons argued that deixis was more basic than anaphora.
12. Though see Gundel et al. (1988), Ehlich (1982) and Bosch (1983) for different views of
the definition of deixis and anaphora.
13. C. Lyons (1999) makes a distinction between deixis meaning 'closeness to or association
with some centre (typically the speaker and the moment of utterance)' and deixis in the
sense of directing a hearer's attention toward a referent (Lyons 1999:160). C. Lyons points
out that J. Lyons (1977) uses the term 'deixis' in both senses, and therefore reserves the term
'ostension' for the second of the above two senses.
15. Fraurud notes that Lyons seems to omit second-order entities (her eventualities)
from the class of referents involved in impure textual deixis. The reasons for this
omission are not clear.
16. As Fraurud notes, Levinson refers to cases like this as 'mention' or 'quotation'.
18. For lower-frequency features, however, such as subject that-clauses, such small samples
may not provide enough examples.
19. See Biber (1993), McEnery and Wilson (1996:63-65), and Aston and Burnard (1998:21-40)
for discussions of issues of corpus design, sampling and representativeness.
20. With thanks to Dr. Paul Rayson at Lancaster University for providing guidance in im-
plementing the significance test.
21. Two degrees of freedom are considered standard for frequency profiles with three
columns, as is the case with all tables in this paper.
22. The data used in Francis' study was primarily newspaper material.
23. However, if we paraphrase these examples so that the demonstrative is rescued from
the idiom, these cases become much more specific in reference, as in 'We have not met our
commitment with respect to that'.
24. See Grosz, Joshi and Weinstein (1995) and Walker et al. (1998).
25. But see the work of Poesio, Vieira and Teufel (1997), Poesio and Vieira (1998) and Byron
(2002) in this regard.
References
Aston, G. & Burnard, L. (1998). The BNC Handbook: Exploring the British National Corpus
with SARA. Edinburgh: Edinburgh University Press.
Bäuerle, R. (1988a). Ereignisse und Repräsentationen. LILOG-Report 43. Frankfurt: IBM
Germany.
Bäuerle, R. (1988b). Aspects of Anaphoric Reference to Events and Propositions in German.
Unpublished Ms.
Baker, J. P. (1997). Consistency and accuracy in correcting automatically tagged data. In R.
Garside, G. Leech & A. McEnery (Eds.), Corpus Annotation (pp. 243-250). London and
New York: Longman.
Biber, D. (1993). Representativeness in Corpus Design. Literary and Linguistic Computing
8 (4), 243-257.
Biber, D. (1990). Methodological Issues Regarding Corpus-based Analyses of Linguistic
Variation. Literary and Linguistic Computing 5, 257-269.
Biber, D., Conrad, S. & Reppen, R. (1998). Corpus Linguistics: Investigating Language Struc-
ture and Use. Cambridge: Cambridge University Press.
Krenn, M. (1985). Probleme der Diskursanalyse im Englischen. Verweise mit this, that, it und
Verwandtes. Tübingen: Narr.
Levinson, S. C. (1983). Pragmatics. Cambridge: Cambridge University Press.
Lyons, C. (1999). Definiteness. Cambridge: Cambridge University Press.
Lyons, J. (1977). Semantics. Volumes 1 and 2. Cambridge: Cambridge University Press.
McEnery, A. M. & Wilson, A. (1996). Corpus Linguistics. Edinburgh: Edinburgh University
Press.
Mourelatos, A. P. D. (1981). Events, processes and states. In P. Tedeschi & A. Zaenen (Eds.),
Syntax and Semantics Volume 14: Tense and Aspect (pp. 191-212). London: Academic
Press.
Petch-Tyson, S. (2000). Demonstrative expressions in argumentative discourse: a computer
corpus-based comparison of non-native and native English. In S. P. Botley & A. M.
McEnery (Eds.), Corpus-Based and Computational Approaches to Discourse Anaphora
(pp. 46-66). Amsterdam: John Benjamins.
Poesio, M. & Vieira, M. (1998). A Corpus-based Investigation of Definite Description Use.
Computational Linguistics 24 (2), 183-216.
Poesio, M., Vieira, M. & Teufel, S. (1997). Resolving bridging references in unrestricted text.
In Proceedings of the ACL Workshop on Operational Factors in Robust Anaphor
Resolution (pp. 1-6). Madrid, Spain, July 1997.
Reichenbach, H. (1947). Elements of Symbolic Logic. Toronto: Macmillan.
Schuster, E. (1986). Towards a Computational Model of Anaphora in Discourse: Reference to
Events and Actions. MS-CIS-86-34, LINC LAB 17, University of Pennsylvania.
Scott, M. R. (1996). WordSmith Tools. Oxford: Oxford University Press.
Vendler, Z. (1967). Linguistics and Philosophy. Ithaca: Cornell University Press.
Walker, M. A., Joshi, A. K. & Prince, E. F. (Eds.). (1998). Centering Theory in Discourse.
Oxford: Clarendon Press.
Webber, B. L. (1991). Structure and Ostension in the Interpretation of Discourse Deixis.
Language and Cognitive Processes 6 (2), 107-135.
Webber, B. L. (1987). Two Steps Closer to Event Reference. MS-CIS-86-74, LINC LAB 159,
University of Pennsylvania.
Webber, B. L. (1979). A Formal Approach to Discourse Anaphora. PhD Thesis, Harvard
University. New York: Garland.
Winter, E. O. (1982). Towards a Contextual Grammar of English. London: Allen and Un-
win.
Author's address
Simon Botley
Language Department
MARA University of Technology (UiTM)
Jalan Meranek, Kota Samarahan
Sarawak, East Malaysia
Phone: (6) 82 677593
Email: spbotley@sarawak.uitm.edu.my