
Huang (2013)

Analysis of Lexical Bundles across Four Varieties

The use of lexical bundles across the four varieties is observed and analyzed both quantitatively and
qualitatively. Grammatical patterns are categorized according to Biber et al.’s (1999) framework.
A log-likelihood score is used to identify the distinctive bundles in each variety of English; in other
words, if the frequency of a bundle differs significantly among the four varieties according to
its log-likelihood score, it is deemed a distinctive bundle in the corresponding variety.

In terms of types, the core bundles exist in all four varieties. In terms of frequency, however,
differences are observed, for the frequencies of some core bundles vary greatly across English
varieties. To determine whether the difference in a bundle's frequency between corpora is significant,
the log-likelihood test is applied in the present study because it is a useful test for
comparing the relative frequency of words or phrases across corpora and determining
whether the frequency of an item is statistically higher in one corpus than another (Rayson
& Garside, 2000; Simpson-Vlach & Ellis, 2010). Another merit of log-likelihood is that it
compares frequencies not only between two corpora but also among more than two corpora;
thus, it is, to date, one of the best and most robust statistical tests for the current research.

According to Paul Rayson (http://ucrel.lancs.ac.uk/llwizard.html), the log-likelihood value is
calculated by constructing a contingency table as follows:
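The calculation can be sketched in Python. This is a minimal illustration, not a reproduction of Rayson's calculator: it assumes the commonly used form in which the table holds, for each corpus, the observed frequency of the item and the corpus size, with expected frequencies derived from the pooled totals. The function name `log_likelihood` and the example figures are hypothetical.

```python
import math

def log_likelihood(freqs, sizes):
    """Log-likelihood for one item's frequency across two or more corpora.

    freqs -- observed frequency of the item in each corpus
    sizes -- total number of words in each corpus
    (Assumed form: LL = 2 * sum(O_i * ln(O_i / E_i)),
     where E_i = N_i * sum(O) / sum(N).)
    """
    if len(freqs) != len(sizes) or len(freqs) < 2:
        raise ValueError("need matching frequency and size lists for 2+ corpora")
    total_freq = sum(freqs)
    total_size = sum(sizes)
    ll = 0.0
    for observed, size in zip(freqs, sizes):
        # Expected frequency if the item were spread evenly by corpus size.
        expected = size * total_freq / total_size
        if observed > 0:  # a zero count adds nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Hypothetical counts: a bundle occurring 200 and 100 times in two
# corpora of one million words each.
print(round(log_likelihood([200, 100], [1_000_000, 1_000_000]), 2))
```

Because the function takes lists, the same sketch covers a comparison among four varieties by passing four frequencies and four corpus sizes.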

Tetyana Bychkovska & Joseph J. Lee (2017)

Statistical analysis was performed using log-likelihood tests. Using Rayson's (n.d.)
log-likelihood calculator,5 token frequencies for each structural and functional category and
subcategory in the two corpora were compared to determine whether the differences in
occurrences were statistically significant. The higher the log-likelihood (LL) value, the more
significant the difference between the two frequency scores: an LL of 3.84 or higher is
significant at p < 0.05; an LL of 6.63 or higher is significant at p < 0.01; an LL of 10.83 or
higher is significant at p < 0.001; and an LL of 15.13 or higher is significant at p < 0.0001.
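The thresholds above can be expressed as a small lookup. The sketch below is illustrative only: the function names are hypothetical, and the `p_value` helper assumes that the LL statistic for a two-corpus comparison is referred to a chi-square distribution with one degree of freedom, which is the distribution these critical values come from.

```python
import math

# Critical values quoted in the text (chi-square, 1 degree of freedom),
# ordered from strictest to most lenient.
THRESHOLDS = [(15.13, 0.0001), (10.83, 0.001), (6.63, 0.01), (3.84, 0.05)]

def significance_level(ll):
    """Return the strictest quoted p threshold that an LL value meets, or None."""
    for critical, p in THRESHOLDS:
        if ll >= critical:
            return p
    return None

def p_value(ll):
    """Approximate p-value for an LL value, assuming chi-square with df = 1."""
    return math.erfc(math.sqrt(ll / 2))

print(significance_level(7.2))  # meets the 6.63 threshold but not 10.83
print(significance_level(2.5))  # below 3.84, so no threshold is met
```

The lookup mirrors how the thresholds are reported in the text; `p_value` gives a continuous figure for cases where an exact value is preferred over a band.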

5 http://ucrel.lancs.ac.uk/llwizard.html.

Annelie Ädel & Britt Erman (2012)

The frequency differences across subcorpora were tested for statistical significance, using the
log-likelihood statistic.3 Applying statistical tests goes against the tradition in the literature on
lexical bundles, which is characterized primarily by simple descriptive statistics. Some,
however, such as Simpson-Vlach and Ellis (2010, p. 492), have argued that statistics such as
log-likelihood are “useful for comparing the relative frequency of words or phrases” across corpora.
The bundles in Table S1 are marked with if they occur in the list for only one subcorpus and if
the difference in frequency between the two subcorpora (not shown in the table) does not reach
statistical significance. This symbol is also used below when a bundle is discussed for which
statistical significance was not found. The dispersion cut-offs have been taken into account in
that only when a given bundle meets the dispersion criterion has it been tested for statistical
significance. The tests show that as many as 70% of the lexical bundles occurring in only one list
(43 types in the non-native data and 89 in the native data) do so with a significance level of p <
0.01. While 70% is a large proportion, it is still the case that 30% of the bundle types do not
reach statistical significance, despite the initial frequency and dispersion cut-offs.4 While our
study does not depart from the established procedure for selecting bundles based on simple
descriptive statistics (we merely mark those that are not significant), these results suggest that
future research should consider augmenting the procedures used for bundle selection with more
sophisticated inferential statistics.

3 The frequencies of all of the bundle types in either list were checked against the frequencies in
the other subcorpus, using Paul Rayson’s online calculator (http://ucrel.lancs.ac.uk/llwizard.html).

4 We also tested the frequency differences of the shared bundles for statistical significance
(suggested by Stefan Gries, p.c.). This analysis showed that most of the shared bundles (87%)
were not used differently by the two groups, but that 13% of the shared bundles were
significantly overused by either group at the level of p < 0.01.

of introducing the topic, considering that the native speakers’ essays mostly involve expository
discussion. This suggests that the non-native speakers’ overuse of this type of metadiscourse
(such as the aim of this) is less strong, but also that they use different wordings from the native
speakers. Finally, it is unclear why the bundle can be used to is used more extensively by the
native speakers. Testing the statistical significance of these shared bundles, we find that, with the
exception of the results from the and can be used to, they are used more often by either group with
a statistical significance of p < 0.01.

Choongil Yoon (Ewha Womans University) & Ji-Myoung Choi (2015)

Finally, log likelihood (LL) was calculated for each bundle across the two corpora to identify
bundles that occur unusually frequently (i.e., overused) or infrequently (i.e., underused) in
NICKLE relative to LOCNESS. In previous studies (e.g., Y. E. Kwon & E. J. Lee, 2014), the
Keywords function in WordSmith, which is based on the LL statistic, was used to retrieve
overused and underused bundles. Since AntConc was used for this study, we calculated LL
values using Excel.
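A per-bundle overuse/underuse check of this kind can be sketched as follows. This is not the authors' spreadsheet: the function names, the default critical value of 3.84 (p < 0.05), and the example counts are all assumptions made for illustration, and the LL formula is the common two-corpus form based on pooled expected frequencies.

```python
import math

def log_likelihood_2(freq_a, size_a, freq_b, size_b):
    """Two-corpus log-likelihood from frequencies and corpus sizes."""
    total = freq_a + freq_b
    ll = 0.0
    for observed, size in ((freq_a, size_a), (freq_b, size_b)):
        expected = size * total / (size_a + size_b)
        if observed > 0:  # a zero count adds nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

def classify_bundle(freq_learner, size_learner, freq_ref, size_ref, critical=3.84):
    """Label a bundle as overused, underused, or not significant.

    The direction comes from comparing relative frequencies; the label
    applies only when LL reaches the (assumed) critical value.
    """
    ll = log_likelihood_2(freq_learner, size_learner, freq_ref, size_ref)
    if ll < critical:
        return "not significant", ll
    rate_learner = freq_learner / size_learner
    rate_ref = freq_ref / size_ref
    return ("overused" if rate_learner > rate_ref else "underused"), ll

# Hypothetical counts for one bundle in a learner and a reference corpus.
label, ll = classify_bundle(200, 1_000_000, 100, 1_000_000)
print(label, round(ll, 2))
```

LL itself is always non-negative, so the direction of the difference has to be read from the relative frequencies rather than from the statistic.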

Shin (2019)

Considering that the two corpora each contain approximately 490,000 words, raw frequencies
were used without converting them to a normalized rate. The frequencies of all the bundle types
in the two corpora were tested for statistical significance using log-likelihood tests.2

2 I used Paul Rayson's log-likelihood calculator from http://ucrel.lancs.ac.uk/llwizard.html.

Xiaofei Lu & Jinlei Deng (2019)

In comparing the distribution of bundles of different structures or functions between the corpora,
we computed the log-likelihood (LL) ratio of raw frequencies of the bundles using Paul
Rayson's spreadsheet calculator at http://ucrel.lancs.ac.uk/llwizard.html. An LL of 3.84
corresponds to p < .05, an LL of 6.63 to p < .01, an LL of 10.83 to p < .001, and an LL of 15.13
to p < .0001. As 32 LL tests were run, the alpha value was adjusted to p < .0016 with Bonferroni
correction. Effect sizes, computed with the same calculator, are reported using Log Ratio
(Hardie, 2014). In Section 4, the term “overuse” or “underuse” is used when the majority of
bundles in a group were used significantly more or less frequently by Chinese writers (Gilquin,
Granger & Paquot, 2007).
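The two adjustments mentioned here are simple to illustrate. The sketch below assumes a starting alpha of .05 (not stated in the excerpt, but consistent with the reported .0016 for 32 tests) and takes Log Ratio to be the binary logarithm of the ratio of relative frequencies. The function names and example counts are hypothetical, and zero frequencies are rejected rather than smoothed, since the excerpt does not say how the calculator treats them.

```python
import math

def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha after Bonferroni correction for n_tests comparisons."""
    return alpha / n_tests

def log_ratio(freq_a, size_a, freq_b, size_b):
    """Binary log of the ratio of relative frequencies (corpus A over corpus B)."""
    if freq_a == 0 or freq_b == 0:
        # Handling zeros needs a smoothing choice that is not made here.
        raise ValueError("zero frequency: Log Ratio is undefined without smoothing")
    return math.log2((freq_a / size_a) / (freq_b / size_b))

# 32 tests at an assumed starting alpha of .05
print(round(bonferroni_alpha(0.05, 32), 4))

# Hypothetical counts: twice the relative frequency gives a Log Ratio of 1.
print(log_ratio(200, 1_000_000, 100, 1_000_000))
```

On this scale each unit corresponds to a doubling of relative frequency, which makes the effect size easier to read than the LL value, whose magnitude also grows with corpus size.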
