Code Switch

Structured Variation in Codeswitching: Towards an Empirically Based Typology of Bilingual Speech Patterns
Margaret Deuchar, ESRC Centre for Research on Bilingualism in Theory and Practice, University of Wales, Bangor, UK
Pieter Muysken, Radboud University, Nijmegen, The Netherlands
Sung-Lan Wang, School of Linguistics and English Language, University of Wales, Bangor, UK
This paper aims to accomplish two things: first, to develop precise criteria to establish profiles for bilingual speech, following the typology of insertion, alternation and congruent lexicalisation developed in Muysken (2000); and second, to test these criteria on specific data sets. A first set involves Welsh–English bilingual data analysed by Deuchar, a second set comprises Tsou–Mandarin Chinese data collected and analysed by Sung-Lan Wang, and a third set involves Taiwanese–Mandarin Chinese data, also collected and analysed by Sung-Lan Wang. We conclude that it is indeed possible to establish more precise quantitative profiles which capture the intuition that different data sets show different codeswitching properties, but that there are a number of conceptual and methodological issues that require further investigation.
doi: 10.2167/beb445.0
Keywords: bilingual speech, codeswitching, Chinese, Mandarin, Taiwanese, Tsou, Welsh
1367-0050/07/03 298-43 $20.00/0 The International Journal of Bilingual Education and Bilingualism 2007 M. Deuchar et al. Vol. 10, No. 3, 2007
Introduction
In many bilingual communities there is bilingual speech involving extensive codeswitching or codemixing (we will adopt the first term here), as has been shown in recent studies such as Clyne (2003), Muysken (2000), Myers-Scotton (1993, 2003) and Poplack (1980); see also Sankoff et al. (1990). Many studies of codeswitching are based on the collection of corpora of bilingual speech. Those corpora typically manifest very diverse language interaction phenomena, ranging from lexical borrowings to phonetic interference, and from semantic calques to mixed sentences and wholesale language switching. Some authors treat the corpora they collect primarily as a potential source of individual codeswitching examples, meant to confirm or argue against a particular model of, or constraint on, codeswitching in general. In this case the overall features of the corpus play much less of a role, if any role at all. This is not the approach we want to take here. On the contrary, we want to link the study of codeswitching to the insight from the sociolinguistic research of the last 40 years or so that bilingual speech, like monolingual speech, shows variation, but that this variation is patterned or structured, not random. However, no model for systematically exploring the patterns in bilingual speech has so far gained consensus. While there are many similarities between the data sets of recorded bilingual speech encountered, there are also many differences in the types of switches and mixes, and in their frequency. These differences may have been caused by various factors, including the contrasting typological features of the languages involved, the language competence of the bilingual speakers, and social factors, such as the type of community, power relations between the speakers of the languages, and language attitudes (Muysken, 2000). Finally, the way that the data have been recorded may also play an important role. To study the role of these factors adequately, and their weight in contributing to the differences in patterns of bilingual language use, we need to be able to draw up adequate codeswitching profiles for the recorded data sets. These profiles involve the individual switches as well as the overall switching pattern in a corpus. In Muysken (2000) a set of criteria was outlined to help classify individual codeswitches in terms of a specific typology. Muysken (2000) characterised the phenomenon of codeswitching in terms of three competing bilingual speech strategies:
• the insertion of material (most often a word or a constituent) from one language into an utterance in another language;
• the alternation between stretches of words in different languages; and
• the congruent lexicalisation of a shared language structure with words from different languages.
These strategies were structurally defined, a number of their diagnostic traits were listed, and an attempt was made to link their occurrence to various crosslinguistic and extralinguistic factors, both psycholinguistic and sociolinguistic. Here we aim to complement the theoretical approach taken by Muysken and further test the hypotheses he put forward. For this purpose, it turned out that the proposed diagnostic criteria needed to be more sharply defined to allow testing. In this paper we therefore aim to accomplish two things: first, to develop precise criteria to establish profiles for bilingual speech, and second, to test these criteria on specific data sets. A first set involves Welsh–English bilingual data analysed by Deuchar (see also Deuchar, 2006), a second set comprises Tsou–Mandarin Chinese data collected and analysed by Sung-Lan Wang (see also Wang, 2007), and a third set involves Taiwanese–Mandarin Chinese data, also collected and analysed by Sung-Lan Wang. For reasons to be made clear in the section 'The Approach Taken by Muysken (2000)', individual switches cannot always be unambiguously classified as belonging to a specific type. For this reason, we prefer to profile bilingual data sets in their totality.
This paper has the following structure. In the next section we briefly survey earlier attempts to characterise bilingual speech corpora, and in the third section we outline the approach taken by Muysken (2000). In the fourth section we propose a quantitative implementation of Muysken (2000), describing how we define the criteria used to establish the profile, and in the fifth section we illustrate how this works with reference to Welsh–English (the first subsection), Tsou–Mandarin (the second subsection) and Taiwanese–Mandarin (the third subsection) data. In the sixth section we consider the problems raised in applying the proposed implementation to data, and consider an alternative approach using a rigorous decision tree model. We illustrate this with reference to some of the examples analysed in the fifth section. In the final section we conclude and make suggestions for further research.
Earlier Attempts to Characterise Bilingual Speech Corpora

It is important to situate our own research in the codeswitching research tradition, in which a number of attempts have been made to characterise bilingual speech corpora. It should be kept in mind that not all researchers explicitly acknowledge this to be an important task; if the corpus is simply viewed as the source of interesting individual examples of codeswitching, the need to characterise the overall corpus is of secondary importance. The attempts to characterise a corpus can be classified under the following headings.
The absolute use of different languages. The most basic information, of course, concerns the use of both languages: e.g. roughly how much French is spoken, and how much English, etc. Strikingly few studies actually provide this type of information in detail, partly because it is often the case that only the portions of the corpus showing extensive codeswitching were transcribed. The same holds for analyses counting turns in the different languages used.
Number of loans and/or single-word switches. A second measure concerns the absolute number of loans or single-word switches in the corpus. This information is present in many descriptions of corpora, at least for the parts showing extensive language interaction phenomena and hence transcribed.
Categorisation of different phenomena and number of different types of multiword switches. Many studies contain quantitative information about the nature of the constituents switched (noun phrases, verb phrases, etc.). There is a moderate amount of variation in the categories employed.
Directionality of switching. Most studies address this issue explicitly, and from almost all studies basic information about directionality can be gathered, although not all authors think directionality is crucial (it does not play a role in the findings of Poplack's 1980 New York study).
Typology of language interaction phenomena. Yet another dimension concerns a qualitative typology of language interaction phenomena, used as a cover term here for codeswitching, borrowing, interference, etc. What types of distinctions are made? Most studies distinguish inter- from intrasentential switching (giving some kind of criterion for distinguishing the two), and possibly also a category like exclamations, tag-switching, etc. However, the corpora tend to be underdescribed as far as phenomena beyond lexical switching are concerned, such as pronunciation and intonation, interference and calques, semantic borrowings, etc. (but see Clyne, 2003).
Likelihood of switches at different sites. This is a little-studied topic, but Sankoff and Poplack (1981) tried to quantify the likelihood of switches at particular syntactic boundaries, as compared to those same boundaries in monolingual contexts; this involved extensive analysis of monolingual syntactic contexts, and hence of potential switch sites. Others have quantified switches in terms of where they occur and/or type of switch, but have not, to our knowledge, contrasted the occurrence versus non-occurrence of switches quantitatively.
In Table 1 a few of the studies are summarised in terms of these headings, which help profile each bilingual corpus. It is clear that the sources diverge widely in terms of the quantitative profiling provided.
Table 1 Characterisation of a number of corpora in terms of the quantitative treatment of language interaction phenomena
Absolute use/ overall language choice Single-word switches in both languages Distinction between tag- and intrasentential switches; lists of switched constituents Explicit listing in Inter-, extra-, both intrasentential, coord. S, directions adv, inside VP, main/ subord. S, apposition/ dislocated element., PP, inside NP, P/NP; n, adv, adj, conj, v, preposition, pron, num Clearly Swahili Copula VP-complements; PP; time adverbials, N, English copula VP-complements; set expressions; main clauses; complement clauses; miscellaneous n, v, adj, adv, conj, greetings/interjection; independent clause, dependent clause, intraclause, disjointed element Not indicated Only cursory information Some brief notes Single-word switches in both languages Explicit listing in both directions Yes Number of loans/ single-word switches Categorisation of phenomena Directionality of switching Typology of language interaction phenomena Likelihood of switches at different sites
Author and date
Languages and location
[Poplack, 1980; Sankoff & Poplack, 1981]
Spanish–English. New York
Not given
[Nortier, 1989]
Moroccan Arabic–Dutch. Netherlands
Only part of the material transcribed (because of switches)
Not available
[Myers-Scotton, 1993]
Swahili–English. Kenya
General indication of Swahili as overall language Single-word switches in both languages
Table with single English forms per category
Discussion about borrowings
Not available
[Gardner-Chloros, 1991]
French–Alsatian. Strasbourg
Survey data: overall use of Fr, Als, and switching
Multiple word switches
Not available
Table 1 (Continued) Absolute use/ overall language choice Single-word switches in both languages and type/token ratio Information given % of Cree and French V, Adv, adpositions, synthetic verb forms Different TTR for assimilated borrowings and single-word and phrasal switches Excluded: discourse markers, coordinate conjunctions, metalinguistic talk, proper nouns, reported speech, assimilated borrowings; N and NP, A nom. modifiers, V, Adv and AdvP, backtracking, clausal Detailed discussion Considerable discussion about individual cases Not available Directionality clearly discussed in both studies n, interjection, adv, adj, v, conj, prep, pron, det, mod/ aux/cop; NP, PP, AP, S, non-tensed clause, relative clause, embedded question Both directions Full constituent; multiple constituent; non-constituent Number of loans/ single-word switches Categorisation of phenomena Directionality of switching Typology of language interaction phenomena Likelihood of switches at different sites Not available
Author and date
Languages and location
[Treffers-Daller, 1994]
French–Dutch. Brussels
Total no. of words in both languages
[Backus, 1996]
Turkish–Dutch. Netherlands
Not given explicitly
Fairly detailed discussion of switch types Not much information about constituency; abundant further descriptive information
Not available Not available
[Bakker, 1997]
Cree–French. Canada
Not available
[Halmari, 1997]
Finnish–English. USA
Not given
Indication of directionality
Fair amount of discussion
Not available
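The likelihood measure attributed above to Sankoff and Poplack (1981) compares how often a given type of syntactic boundary hosts a switch with how often that boundary occurs at all. As a minimal Python sketch, assuming the corpus has already been segmented into potential switch sites, each tagged with a boundary type and a flag for whether a switch occurs there, the measure reduces to a proportion per site type. The function name `switch_likelihood`, the boundary labels and the toy data are hypothetical, and this simple ratio is not necessarily the exact statistic used in that study.

```python
from collections import Counter


def switch_likelihood(sites):
    """Proportion of potential switch sites of each boundary type that host a switch.

    `sites` is an iterable of (boundary_type, switched) pairs, one per potential
    switch site in the corpus (a hypothetical input format).
    """
    totals = Counter()    # occurrences of each boundary type
    switches = Counter()  # occurrences of that boundary type containing a switch
    for boundary_type, switched in sites:
        totals[boundary_type] += 1
        if switched:
            switches[boundary_type] += 1
    return {b: switches[b] / totals[b] for b in totals}


# Toy data, invented purely for illustration.
sites = [
    ("det-noun", False), ("det-noun", True), ("det-noun", False), ("det-noun", False),
    ("clause", True), ("clause", True), ("clause", False),
]
rates = switch_likelihood(sites)
print(rates)  # {'det-noun': 0.25, 'clause': 0.6666666666666666}
```

Counting the unswitched occurrences of each boundary alongside the switched ones is what distinguishes this measure from simply listing where switches occur.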
The Approach Taken by Muysken (2000)

Muysken (2000) suggests that there are three main codeswitching patterns that may be found in bilingual speech communities: insertion, alternation and congruent lexicalisation. One pattern will usually dominate, though not necessarily to the exclusion of other patterns. In the insertion pattern, one language determines the overall structure into which constituents from the other language are inserted: this is illustrated in Figure 1, based on Muysken (2000: 7). This pattern is assumed by the Matrix Language Frame (MLF) theory proposed by Myers-Scotton (e.g. 1993). It can be illustrated in (1) by a Swahili–English example¹ (English in bold) from Myers-Scotton (1993: 86):
(1) a-na-ku-l-a plate m-bili z-a murram
 3SG-PRS-NFIN-eat-IND plate CLM10-two CLM10-of maize
 'He eats two plates of maize.'
In this example the word order is as in Swahili, including the phrase plate m-bili 'two plates', and all the inflectional morphology is from Swahili. The asymmetry between the two languages involved in the insertional pattern is captured in the MLF by labelling the main language the matrix language and the other the embedded language.
Figure 1 The insertional codeswitching pattern
In the alternation pattern, both languages occur alternately, each with their own structure, as illustrated in Figure 2, based on Muysken (2000: 7). This type of codeswitching is assumed in Poplack's work (e.g. 1980) and is well represented in her Spanish–English data (English shown in bold) in examples like (2) (Poplack, 1980: 594):
(2) Si tú eres puertorriqueño, your father's a Puerto Rican,
 if PRO.2SG be.PRS.2SG Puerto-Rican
 you should at least de vez en cuando, you know, hablar español.
 from time to time speak.NFIN Spanish
 'If you're Puerto Rican, your father's a Puerto Rican, you should at least sometimes speak Spanish.'
In this example the pattern of alternation is particularly clearly represented in the switch between the clauses si tú eres puertorriqueño and your father's a Puerto Rican, as well as in the way the language material switches back and forth from Spanish to English to Spanish to English and finally back to Spanish.
Figure 2 The alternation pattern
In the third type of codeswitching, congruent lexicalisation, the grammatical structure is shared by languages A and B, and words from both languages a and b are inserted more or less randomly (Muysken, 2000: 8). This is illustrated in Figure 3, based on Muysken (2000: 8). Muysken proposes this type with reference mainly to standard variety/dialect mixing. This type can be illustrated by Example (3) from the Ottersum dialect (in bold) and standard Dutch:
(3) ja maar bij ouwe mensen komt dat gauw-er tot stilstand als bij jonge mense wa eh
 yes but with older people comes that quick-er to-a halt than with younger people
 'Yes but with older people that comes to a halt more quickly than with younger people.' (Muysken, 2000: 130, citing Giesbers, 1989: 147)
In this example the fragments from each variety apparently do not form grammatically coherent chunks. We have so far illustrated the three types on an impressionistic basis, using examples containing several switches and assuming that all these switches fall into the same category. However, this assumption could easily be challenged with Example (4) (from Poplack, 1980: 589), where the first English–Spanish switch (sentarse atrás pa que) looks more representative of congruent lexicalisation, as its first two elements, sentarse atrás, belong to the previous clause, and its second part, pa que, to the second clause, while the second switch (pa que se salga) is a case of alternation, as it involves a coherent purposive complement clause at the end of the utterance.
(4) Why make Carol sentar-se atrás pa que everybody has to move
 sit.NFIN-REFL at_back for that
 pa que se salga?
 for that REFL get_out.SUBJ
 'Why make Carol sit in the back so that everybody has to move for her to get out?'
Figure 3 The congruent lexicalisation pattern
We shall argue below that we need to focus on individual switches, once we have defined what these are, in order to arrive at a more rigorous analysis. Muysken (2000) suggests that the type of codeswitching that is prevalent in a corpus can be identified by the use of a set of diagnostic features. Each type can be associated with a set of specific values for those features, and the set of values for a specific corpus can be expected to match one of the three patterns more than the others. This works because each proposed feature has a value associated with each type of codeswitching. The features, together with the values associated with each codeswitching type, are listed in Table 2, based on Muysken (2000: 230, Table 8.1).
Table 2 Diagnostic features of the three patterns of codemixing
Columns: Insertion | Alternation | Congruent lexicalisation
Constituency: single constituent; several constituents; non-constituent; nested a b a; not nested a b a
Element switched: diverse switches; long constituent; complex constituent; content word; function word; adverb, conjunction; selected element; emblematic or tag
Switch site: major clause boundary; peripheral
Embedding in discourse: flagging; dummy word insertion; bidirectional switching
Properties: linear equivalence; telegraphic mixing; morphological integration; doubling; homophonous diamorphs; triggering; mixed collocations; self-corrections
+, indicative of a specific pattern; -, counterindicative of a specific pattern
As shown in Table 2, the features are grouped under four headings: constituency, element switched, switch site and properties of the switch. The features will be considered in more detail in the next section, but for the moment we can focus on how different feature values are expected for different codeswitching patterns. For example, we can see in Table 2 that the feature content word has a + in the column relating to the insertion pattern, but a - in the columns relating to the alternation and congruent lexicalisation patterns. This means that if a particular switch is a content word and thus has a positive value for this feature, it is more likely to reflect the insertion pattern than the other two. If a switch is not a content word, however, and has a negative value for this feature, this would be indicative of either the alternation or the congruent lexicalisation pattern. Of course the features cannot be considered in isolation: the entire set must be applied to each switch insofar as possible. Muysken (2000: 231) provides a list of those features that have a positive value for each codeswitching type. For example, content word and single constituent have positive values for insertion, several constituents and long constituents for alternation, and non-constituent and function word for congruent lexicalisation. Muysken goes on to illustrate how his features might be used to test specific predictions about the predominant codeswitching type to be found in various corpora of existing data.
For example, he suggests that Pfaff's (1979) Spanish–English data may be more insertion-like than Poplack's (1980) Spanish–English data because of a lesser diversity in the grammatical category of switches and a higher proportion of noun phrase switches. In relation to Nortier's (1989) data he suggests that the insertional pattern is dominant but that all three patterns occur. He reaches this conclusion on the grounds that there are frequent occurrences of switches with specific features associated with the insertion pattern: single constituent, nested a b a (see next section for discussion of this feature), selected element, major clause boundary and morphological integration. On the other hand, he also finds switches having features associated with the other two patterns, such as several constituents and peripheral (associated with alternation) and linear equivalence and non-constituent (associated with congruent lexicalisation). Muysken (2000: 238) lists the relative frequency of switches with these features using terms such as frequent, often, some cases and few cases.
In the next section we will outline our first attempt to develop a rigorous quantitative method to apply Muysken's features to both individual switches and entire corpora, in order to determine what the predominant codeswitching pattern is and whether there are any minor patterns. Before we do this we need to outline the extralinguistic dimension of Muysken's proposal. Whereas previous approaches to codeswitching predicted that a specific model would account for all patterns to be found in any speech community, Muysken (2000: 8–9) proposes that the pattern will vary according to both linguistic and extralinguistic factors. The way in which he suggests this might work is summarised in Table 3. In this table we see that, in terms of linguistic factors, insertion and alternation are favoured by typological distance between the languages involved, whereas congruent lexicalisation is seen to be more likely when the two languages are typologically similar. Although both insertion and alternation are predicted by typological distance, the two patterns are associated with different extralinguistic factors. For example, insertion is predicted to be more likely in colonial settings² and where there is asymmetry in the speakers' proficiency in the two languages, whereas alternation is predicted in stable bilingual communities where there is a tradition of language separation, in the sense that people believe that the two languages should generally be kept separate. Where there is no such tradition, however, and the languages are typologically similar, congruent lexicalisation is more likely. As indicated above, however, the three main patterns of codeswitching are not expected to be either watertight or static. More than one pattern of codeswitching may be found in a corpus of data, though it is likely that one will be predominant.
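Muysken's strategy of reading a predominant pattern, plus any minor ones, off the features that switches display can be sketched in Python. This is a deliberately simplified, hypothetical reading: it uses only the six positively valued features cited above from Muysken (2000: 231), counts how many of them each switch shows, and ignores negative and neutral values. The feature labels, the function names `best_patterns` and `corpus_profile`, and the toy corpus are illustrative assumptions, not Muysken's own procedure.

```python
from collections import Counter

# Positively valued features per pattern, limited to the examples cited from
# Muysken (2000: 231); the full inventory in Table 2 is much larger.
POSITIVE_FEATURES = {
    "insertion": {"content word", "single constituent"},
    "alternation": {"several constituents", "long constituent"},
    "congruent lexicalisation": {"non-constituent", "function word"},
}


def best_patterns(features):
    """Return the pattern(s) whose positive features this switch shows most often.

    Ties are all returned; a switch showing none of the features returns [].
    """
    counts = {p: len(feats & features) for p, feats in POSITIVE_FEATURES.items()}
    top = max(counts.values())
    if top == 0:
        return []
    return [p for p, c in counts.items() if c == top]


def corpus_profile(switches):
    """Rank patterns by the number of switches they fit best (predominant first)."""
    tally = Counter()
    for features in switches:
        tally.update(best_patterns(features))
    return tally.most_common()


# Hypothetical corpus of four switches, each described by the features it shows.
corpus = [
    {"content word", "single constituent"},
    {"single constituent"},
    {"several constituents", "long constituent"},
    {"function word"},
]
print(corpus_profile(corpus))
# [('insertion', 2), ('alternation', 1), ('congruent lexicalisation', 1)]
```

In this toy corpus insertion emerges as predominant, with the other two patterns present as minor ones; a fuller treatment would also penalise counterindicative feature values rather than only rewarding positive ones.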
It is also possible that the predominant pattern of codeswitching may change over time. Muysken (2000: 249) suggests that prolonged language contact may lead to a change of pattern, and that in particular, an insertional pattern may change to one of either alternation or congruent lexicalisation, again depending on both linguistic and extralinguistic factors. He suggests that alternation will be favoured by strong norms, language competition and typological distance, whereas congruent lexicalisation is more likely in the case of non-rigid norms of correctness in speech, balance between languages and structural parallels.
Table 3 Muysken's view of the relation between codemixing patterns and extralinguistic factors
Codemixing pattern | Linguistic factors favouring this pattern | Extralinguistic factors favouring this pattern
Insertion | Typological distance | Colonial settings; recent migrant communities; asymmetry in speakers' proficiency in two languages
Alternation | Typological distance | Stable bilingual communities; tradition of language separation
Congruent lexicalisation | Typologically similar languages | Two languages have roughly equal prestige; no tradition of overt language separation
Muysken's (2000) Framework: A Quantitative Implementation

In the previous section we outlined Muysken's (2000) proposal for a typology of codeswitching, including predictions for the identification of codeswitching type in specific data sets as well as in communities that could be characterised by particular combinations of structural linguistic and extralinguistic features (see Table 3). As in the examples described above, the predictions presented for specific data sets made use of some, but not all, of Muysken's proposed linguistic features as set out in Table 2. In this section we shall describe an initial attempt to make use simultaneously of all the proposed features by means of a quantitative implementation that involves calculating three scores for each switch in a corpus: an insertion score, an alternation score and a congruent lexicalisation score. The higher the score, the better the match to the pattern in question. So if a switch scores 4 on insertion, 10 on alternation and 2 on congruent lexicalisation, for example, we may conclude that it matches the alternation pattern best. Adding up the three scores for all switches in a corpus will allow us to determine, on the basis of which score is highest, the pattern that is matched best in the corpus as a whole. Comparing the totals also allows for the possibility that there may be a secondary pattern.
What is a switch? Previous work on codeswitching has not always made it very clear what is to be identified as a switch. As Treffers-Daller (1994: 203) points out, 'Many researchers in the field concentrate on switch points, and define as precisely as possible between which elements switching is (im)possible.' However, Muysken's approach pays attention not only to the switch site but also to the characteristics of the element switched. We shall continue to pay attention to both here, so that a switch should be understood as indicating switched material. It is the definition of switched material that can be tricky. In a bilingual sequence consisting of ABABAB, for example, where A and B represent two different languages, the switch site is relatively easily identified as the point of transition between A and B. It is more difficult, however, to decide how to identify the element switched. Should we take a sequential approach, saying that the first occurrence of B in the string AB ABAB is a switch because it involves a change of language from A to B? If so, does the second occurrence of A in ABA BAB count as a switch too, because it involves a change of language from B to A? Or do instances of A not count as switches because the utterance started with A?
In this case only the instances of B in AB AB AB would count as switches. But given that language is hierarchical, involving constituency, as well as sequential, a purely sequential approach does not seem satisfactory. This is because it leads either to treating virtually all stretches of bilingual discourse as switches or to deciding on a fairly arbitrary (purely sequential) basis which language in an utterance counts as switched and which does not.
In the analysis to be illustrated in the next section, we work on a clause-by-clause basis. Each clause is first identified as bilingual or monolingual, depending on whether it contains material taken from one or two languages. If it is bilingual we identify the matrix language of the clause on the basis of word order and subject–verb agreement, insofar as this is possible. If the matrix language can be identified in this way then any continuous material in the clause which is in another language counts as switched material and is analysed as such. This can be illustrated by the clause from the Welsh–English data (English in bold) shown in (5):
(5) oedden nhw mor desbrad though [MEP130]
 be.3PL.PST PRO.3PL so desperate
 'They were so desperate though.'
In this example the word order is VSO as in Welsh, rather than English, and subject–verb agreement is in Welsh. We thus conclude that the matrix language is Welsh, and that the material in the non-matrix language, English, though, counts as a switch. We would of course have identified the same material as a switch if we had adopted a sequential approach, but we shall see from some examples reported below that this would not always be the case. Sometimes it is not possible to determine the matrix language of the clause, as in (6), where there is no verb and the word order Negative NP could equally well be English or Welsh:
(6) dim AIDS [MEG177]
 NEG
 'Not AIDS'
In this type of case we resort to a sequential approach, identifying the matrix or basic language as the first occurring one, Welsh in this case. AIDS is thus treated as the switched material. A sequential approach is also taken for interclausal switching, which would otherwise have to be left out of our analysis. So in a sequence of clauses where one clause is in language A and a following one (following either in the same turn or across turns) in language B, the clause (or sequence of clauses) in language B is treated as an interclausal switch. The same treatment is applied to a clause which has B as its matrix language but which includes material from A as well. This is illustrated in Example (7), which includes utterances from two speakers (normal type indicates Welsh, bold English):
(7) Speaker 1: oedden nhw-n meddwl bod gynni lais bendigedig
 be.3PL.PST PRO.3PL-PRT think.NFIN be.NFIN with.3SG.F voice wonderful
 'And they thought that she had a wonderful voice.'
 Speaker 2: beauty is in the eye of the beholder ngwas i
 lad POSS.1SG
 'Beauty is in the eye of the beholder, my dear.'
The clause spoken by speaker 2, beauty is in the eye of the beholder ngwas i, is treated as switched material following an interclausal (and in fact, interturn) switch. This switch involves a change of matrix language from Welsh to English. It so happens that the switched clause itself contains an intraclausal switch, ngwas i 'my dear': this is analysed separately. In discussing the question of what counts as a switch, we also need to distinguish switches from loans. Loans are words that have been borrowed from one language into another, like the English word restaurant, originally from French, and which would be found in a dictionary of the words of the recipient language. Where dictionaries are available for the languages concerned (e.g. Welsh, Mandarin) we have excluded from our identification of switches any other-language words found in the dictionary of those languages. This dictionary criterion is trickier to apply to a minority language lacking a dictionary, like Tsou, and the criteria for distinguishing loans from switches in this case will be discussed in the next section. Of course, even where dictionaries exist they may be conservative and not record all loans, which means that we may have mistakenly identified some loans as single-word switches. However, in the next section we shall report on analyses that minimise this possibility by focusing on multiword switches and excluding all single-word switches. Within the clause, the significance of using a clause-based rather than a sequential approach can be demonstrated by considering examples which are similar to (7) in that there is both an interclausal switch and an intraclausal switch, but the intraclausal switch comes at the beginning of the clause. This would be the case if we had recorded the version of (7) shown below in (8):
(8) ngwas i beauty is in the eye of the beholder
 lad POSS.1SG
 'My dear, beauty is in the eye of the beholder.'
In this imagined example, we are assuming that the material before (8) is the first speakers turn as in (7) and is in Welsh. A purely sequential approach would presumably identify beauty is in the eye of the beholder as an intraclausal switch while no interclausal switch would be identified. However, this seems counterintuitive as almost all the clause is in English. An additional, actually occurring example is provided by the utterance in (9), by a speaker who is bilingual in Taiwanese and Mandarin (square brackets show clause boundaries, normal type Mandarin and bold Taiwanese) (9) [zhe-li you xi-gua ne] [xi-gua in-ma zo gui here have watermelon EXCLAM watermelon now very expensive ne.] EXCLAM Here are some watermelons! Watermelons are very expensive now! We would analyse the first clause as having Mandarin as its matrix language but as involving an intraclausal switch to Taiwanese (ne ). Under a sequential approach xi-gua in the second clause (xi-gua jin-ma zo gui nei )
would presumably count as an interclausal switch to Mandarin, but in our clause-based approach we consider the whole clause and count it as an interclausal switch to Taiwanese, which is the language providing the majority of the material in the second clause. Within this second clause, however, there is an intraclausal switch to Mandarin in the form of the inserted xi-gua.

Having outlined our approach to identifying switches or switched material for the purposes of analysis, we now explain how each switch in a corpus can receive three scores indicating which codeswitching pattern it approximates most closely: insertion, alternation or congruent lexicalisation. In Table 4 we provide the same list of features as in Table 2, but we also indicate, with an example, how a sample switch can receive three scores on each feature according to whether or not the value of that feature matches the expected value for each of the three patterns. The general principle, which can be coded³ on a spreadsheet, is that a score of 1 is given where the feature value is as expected for that pattern, −1 where it is the opposite of that expected, and 0 where the feature is neutral for that pattern. In Table 4 the illustrative material is camouflaged, taken from the Welsh–English example in (10):

(10) mae o-n reit camouflaged yn dydi [MEW50]
     be.3SG.PRS PRO.3SG-PRT quite PRT NEG.be.3SG.PRS
     'He's quite camouflaged, isn't he?'

In Table 4 the first four columns are as in Table 2, except that we have underlined the features that do not always apply: these apply only when certain conditions are met, to be outlined below. Features without underlining always apply, although we should note that three features, DIVERSE SWITCHES,⁴ BIDIRECTIONAL SWITCHING and HOMOPHONOUS DIAMORPHS, apply only to a dataset or corpus as a whole. These features are given in Table 4 in upper case to remind us of that fact.
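To make the clause-based procedure applied to Example (9) concrete, here is a minimal Python sketch. It assumes every word already carries a language tag, and it picks the matrix language of each clause by a simple majority of words, the criterion invoked for the second clause of (9); the full procedure also draws on word order and agreement, which the sketch ignores. The function name `analyse_clauses`, the tags "M" and "T", and the tie-breaking rule are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from itertools import groupby


def analyse_clauses(clauses):
    """Identify matrix languages and switches clause by clause.

    Each clause is a list of (word, language) pairs. As a simplification,
    the matrix language is the language supplying most of the clause's
    words (ties go to the language that appears first in the clause).
    """
    results = []
    previous_matrix = None
    for clause in clauses:
        counts = Counter(lang for _, lang in clause)
        matrix = counts.most_common(1)[0][0]
        # Interclausal switch: the matrix language differs from the previous clause's.
        interclausal = previous_matrix is not None and matrix != previous_matrix
        # Intraclausal switches: maximal runs of words not in the matrix language.
        intraclausal = []
        for lang, run in groupby(clause, key=lambda pair: pair[1]):
            if lang != matrix:
                intraclausal.append([word for word, _ in run])
        results.append(
            {"matrix": matrix, "interclausal": interclausal, "intraclausal": intraclausal}
        )
        previous_matrix = matrix
    return results


# Example (9), tagged with hypothetical labels "M" (Mandarin) and "T" (Taiwanese).
example_9 = [
    [("zhe-li", "M"), ("you", "M"), ("xi-gua", "M"), ("ne", "T")],
    [("xi-gua", "M"), ("jin-ma", "T"), ("zo", "T"), ("gui", "T"), ("ne", "T")],
]
for clause_result in analyse_clauses(example_9):
    print(clause_result)
```

Run on (9), the sketch assigns Mandarin to the first clause with ne as an intraclausal switch, and Taiwanese to the second clause (an interclausal switch) with xi-gua as an intraclausal switch, which is the analysis given in the text. A purely sequential scan over the same words would instead treat xi-gua as the point where the speaker switches back to Mandarin.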
In the fifth column of Table 4 we show the feature values for the switched material analysed, camouflaged in this case, and in the last three columns we show the three scores to which this switch gives rise. Column 5 in Table 4 shows that camouflaged has been assigned the value + for the feature single constituent.⁵ A single constituent is a string of one or more words that can plausibly be parsed as forming a single exclusive word group, like a noun phrase or a prepositional phrase. This matches the expected value of + for the insertion pattern, so the switch receives a score of 1 on the insertion pattern, as shown in column 6. Columns 3 and 4 show that the feature single constituent is neutral or irrelevant for the alternation and congruent lexicalisation patterns, so a score of 0 for this feature on these two patterns is noted in columns 7 and 8.

Table 4 Analysis of data using quantification of Muysken's features
Columns: (1) Muysken's features; (2) Insertion; (3) Alternation; (4) Congruent lexicalisation; (5) Example: camouflaged; (6) Score on INS; (7) Score on ALT; (8) Score on CON
Rows: Single constituent; Several constituents; Non-constituent; Nested aba; Non-nested aba; DIVERSE SWITCHES (column 5: type of switch, past participle); Long constituent; Complex constituent; Content word; Function word; Adverb, conjunction; Selected element; Emblematic or tag; Major clause boundary; Peripheral; Embedding in discourse; Flagging; Dummy word insertion; BIDIRECTIONAL SWITCHING (column 5: switch into English); Linear equivalence; Telegraphic mixing; Morphological integration; Doubling; HOMOPHONOUS DIAMORPHS; Triggering; Mixed collocations; Self-corrections
Total score: INS 11, ALT −7, CON 4

In relation to the next feature, several constituents, camouflaged receives a minus value. This matches the expected value for the insertion pattern, so a score of 1 on this pattern is again noted in column 6. This value is opposite to the positive value expected for the alternation pattern (shown in column 3), so a score of −1 is recorded for the alternation pattern in column 7. The feature is neutral for congruent lexicalisation, so a score of 0 is recorded for this pattern in column 8. Although one might consider that having two contrasting features, single constituent and several constituents, could introduce redundancy, we see here that both features are necessary, given that single constituent is neutral for the alternation pattern but several constituents is significant (see column 3). The third feature, non-constituent, contrasts with both and has different expected values for all three patterns (see columns 2, 3 and 4). Camouflaged is coded as − on this feature (see column 5), with the consequences for the three scores shown in columns 6, 7 and 8.

The next two features, nested aba and non-nested aba, apply only to switches that have matrix language material (that is, material in the other language) both before and after them. They do not apply to interclausal switches. Camouflaged has Welsh material before and after it; if it had not, then 0 would have been noted in column 5, indicating that the feature does not apply. Nested aba and non-nested aba differ in that the material before and after the switch is grammatically related in the nested case: that is, the material before and after the switch is clearly part of the same clause. This is the case⁶ with the material before and after camouflaged, so it receives the value + on nested aba and − on non-nested aba.

The next group of features relates to the nature of the element switched. As the feature diverse switches applies to the whole corpus rather than to individual instances, we merely note in column 5 the grammatical category of camouflaged, which is a past participle. This can then be used afterwards to determine the overall diversity of switches. A high diversity of switches would favour the congruent lexicalisation pattern, whereas low diversity would be indicative of the insertion pattern, as shown in columns 2 and 4. This feature is neutral with regard to the alternation pattern (column 3). Camouflaged receives a negative value for long constituent, which, as Table 4 shows, gives it a score of 1 on insertion and congruent lexicalisation, and −1 on alternation.
Exactly the same value and scores are noted for complex constituent, which refers to constituents with a hierarchical internal structure involving various lexical heads. We assume the feature content word to apply only to monomorphemic words (apple) or to multimorphemic words and phrases (apple tree) of which all the constituents could be described as content words. There is room for discussion about whether this feature applies to camouflaged, which could be analysed as being made up of a (+content) verbal stem and a (−content) inflectional affix. However, we have analysed it as a (+content) adjectival participle. This value matches the insertion pattern but not the others, resulting in a score of 1 for insertion and −1 for the other two patterns. On similar grounds to those outlined for content word, the feature function word will not always apply, but if the feature content word applies, then the feature function word should apply too. In the case of camouflaged we have given it a negative value, leading to scores of 1 for insertion and alternation and −1 for congruent lexicalisation. Another feature whose applicability should be in line with that of content word and function word is adverb, conjunction. Although many analysts might consider an adverb to be a content word and a conjunction to be a function word,⁷ this feature is needed in addition to content word and function word because a positive value for it is indicative of the alternation pattern, as shown in Table 4. Camouflaged receives a minus value on adverb, conjunction, resulting in the scores shown in columns 6–8.
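The scoring rule (1 for a match, −1 for the opposite value, 0 where the pattern is neutral or the feature does not apply) can be written as a small function. The Python sketch below covers only five features; their expected values are read off the scores reported above for camouflaged (for instance, a negative value for long constituent scoring 1 on insertion and congruent lexicalisation and −1 on alternation). The "+"/"-"/"0" encoding and the names `EXPECTED`, `score_feature` and `score_switch` are assumptions for illustration; a full version would need every row of Table 2.

```python
# Expected value of each feature for each pattern: "+", "-" or "0" (neutral).
# Only five features are encoded here; the values are inferred from the
# scores the text reports for camouflaged.
PATTERNS = ("INS", "ALT", "CON")

EXPECTED = {
    "single constituent": {"INS": "+", "ALT": "0", "CON": "0"},
    "several constituents": {"INS": "-", "ALT": "+", "CON": "0"},
    "long constituent": {"INS": "-", "ALT": "+", "CON": "-"},
    "content word": {"INS": "+", "ALT": "-", "CON": "-"},
    "function word": {"INS": "-", "ALT": "-", "CON": "+"},
}


def score_feature(expected, observed):
    """Return 1 if the observed value matches the expected one, -1 if it is
    the opposite, and 0 if the pattern is neutral or the feature does not
    apply (observed is None)."""
    if observed is None or expected == "0":
        return 0
    return 1 if observed == expected else -1


def score_switch(observed_values, expected=EXPECTED):
    """Sum the feature scores of one switch for each of the three patterns."""
    totals = {pattern: 0 for pattern in PATTERNS}
    for feature, observed in observed_values.items():
        for pattern in PATTERNS:
            totals[pattern] += score_feature(expected[feature][pattern], observed)
    return totals


# Values assigned to camouflaged for these five features, as described in the text.
camouflaged = {
    "single constituent": "+",
    "several constituents": "-",
    "long constituent": "-",
    "content word": "+",
    "function word": "-",
}
print(score_switch(camouflaged))  # {'INS': 5, 'ALT': -2, 'CON': -1}
```

These five features alone give camouflaged 5 on insertion, −2 on alternation and −1 on congruent lexicalisation; the totals reported for the switch come from summing over all the features in Table 4.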
Selected element always applies: it receives a positive value if the switch is an object or complement, and a negative value otherwise. It is positive in the case of camouflaged. The feature emblematic or tag involves the mixing in of tags or interjections (Muysken, 2000: 99). We have assumed that it does not apply to a switch that has a positive value for any of the features long constituent, selected element or morphological integration (see below). As camouflaged has the value + for selected element, this feature does not apply and is marked 0, giving a score of 0 on all three patterns in columns 6–8.

The third group of features relates to the switch site involved. Interclausal switches are the main type that will receive a positive value for the feature major clause boundary; camouflaged, however, receives a negative value as it occurs mid-clause. This affects only the score for alternation, which is shown in column 7 as −1. The feature peripheral will receive a positive value if the switch is peripheral to the clause; but an item like camouflaged, which is a selected element, will receive a minus value, again affecting only the score for alternation, which will be −1. Embedding in discourse applies only to switches that occur at the end of a turn: it receives the value + if the next turn begins in the same language as this switch, and − if not. This feature does not apply to camouflaged, as it does not occur at the end of the turn. Flagging refers to mixing that is highlighted by the insertion of a discourse marker, a pause or a repair (see Muysken, 2000: 101), and is indicative of the alternation pattern. It receives a negative value for camouflaged, however, resulting in a score of 1 for the insertion and congruent lexicalisation patterns and −1 for alternation. Dummy word insertion refers to the insertion of semantically empty elements such as like, thing or do (cf.
Muysken, 2000: 105): this receives a negative value for camouflaged. As bidirectional switching applies only to the whole corpus, we note in relation to this feature the direction of the switch, which in the case of camouflaged is into English. At a later stage the percentage of switches occurring in each direction is calculated. A roughly equal proportion of switches in each direction will be indicative of the alternation or congruent lexicalisation pattern, whereas a predominance of switches in one direction will favour the insertion pattern. The next feature, linear equivalence,⁸ is taken to refer to whether the switched material occupies the same sequential position in the clause as it would have occupied in the matrix language. It receives a positive value in relation to camouflaged, as predicative adjectives in Welsh appear after the verb, as do predicative adjectives and adjectival participles in English. This leads to a score in favour of the alternation and congruent lexicalisation patterns. In our analysis we have assumed that the feature linear equivalence does not apply to interclausal switching. Telegraphic mixing occurs where elements have been omitted that should have been present in one or both of the languages involved. This is not the case with camouflaged, which thus receives a minus value, again favouring the alternation and congruent lexicalisation patterns. Morphological integration, which is also indicative of congruent lexicalisation, occurs where one of the languages determines the overall grammatical framework, and where items switched from the other language
are morphologically integrated into the main or matrix language. We have assumed that this feature does not apply where the possibility of morphological integration does not arise, e.g. where there is no bound morpheme that could be replaced by a bound morpheme from the matrix language. In the case of camouflaged, morphological integration does apply, because mutation⁹ of the initial consonant from [k] to [g] would be expected but does not occur. Doubling applies where the semantic value of the switch is the same as that of another morpheme in the original language also found in the utterance, as when plurality is marked twice in the Welsh–English plural llwynogods¹⁰ 'foxes' (Welsh llwynog 'fox', -od Welsh plural, -s English plural). There is no doubling in the case of camouflaged, which thus receives a minus value for this feature. The feature homophonous diamorphs applies to pairs of words that are phonetically similar in both varieties: in transcriptions coding the language of each word,¹¹ these words are shown as language-neutral. For this reason, however, they do not count as switches and so cannot be scored like other switched material. We have therefore assumed in the analyses reported in the next section that this feature does not apply to the switches we have identified. However, Muysken suggests that a high frequency of homophonous diamorphs would indicate the congruent lexicalisation pattern. In the next section we report the frequency of homophonous diamorphs in the Welsh–English data as a separate analysis. The next feature, triggering, arises from a phenomenon identified by Clyne (e.g. 1967) and applies only to multiword switches. It is an interpretation of a multiword switch in which the choice of one of the words in the switch may have led to other words being switched as part of a longer string. It is indicative of the congruent lexicalisation pattern.
This feature appears to involve more subjective interpretation than the others, but we have applied it here as carefully as possible. In Table 4 we can see that triggering does not apply to camouflaged, because it is not a multiword switch. Mixed collocations occur where the two elements of an idiomatic collocation from one of the languages come from different languages, e.g. mynd on 'go on' in Welsh–English conversations. Camouflaged is not part of a mixed collocation and receives a negative value for this feature. Finally, self-corrections are switches that involve repetition of similar material in the other language, often after a hesitation. A good example can be found in Poplack (1998: 53), where a speaker is relating in French a conversation he had in English (the self-correction is j'ai dit in the final sentence):

J'ai dit you don't swim. Il dit, sure il dit, I can swim. Il dit sure. Well, I says j'ai dit show it to me.
(I said, you don't swim. He says, sure he says, I can swim. He says sure. Well, I says I said show it to me.)

This example involves interclausal switching from French to English followed by the insertion of material in French as a self-correction. Our example, camouflaged, however, is not a self-correction and so receives a minus on this feature. After this procedure has been completed, scores can be assigned for each of the three patterns. In the case of our example camouflaged, we can see from the last row of Table 4 that it receives a score of 11 on insertion, −7 on alternation and 4 on congruent lexicalisation, which we interpret as indicating that
insertion is the dominant pattern for this switch. The scores also show that congruent lexicalisation is a secondary pattern. As we shall show in the next section, the scores for a set of switches can be added together so that we achieve an overall score on each pattern for the whole set of data.
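Summing and ranking scores is simple enough to sketch directly. The Python below assumes each switch has already been reduced to an (insertion, alternation, congruent lexicalisation) score triple; the function names are illustrative, and the second switch in the summing example is hypothetical. It ranks the scores of camouflaged (insertion first, congruent lexicalisation second) and reproduces the means in Table 7 from its totals by dividing by the 100 switches in that sample.

```python
PATTERNS = ("Insertion", "Alternation", "Congruent lexicalisation")


def aggregate(switch_scores):
    """Total and mean (INS, ALT, CON) scores over a list of switches."""
    n = len(switch_scores)
    totals = [sum(scores[i] for scores in switch_scores) for i in range(3)]
    means = [total / n for total in totals]
    return totals, means


def rank_patterns(scores):
    """Patterns ordered from highest to lowest score: the first is the
    dominant pattern, the second the secondary one (ties keep PATTERNS order)."""
    order = sorted(range(3), key=lambda i: scores[i], reverse=True)
    return [PATTERNS[i] for i in order]


print(rank_patterns((11, -7, 4)))  # camouflaged
# ['Insertion', 'Congruent lexicalisation', 'Alternation']

# Summing step, with camouflaged plus one hypothetical switch.
totals, means = aggregate([(11, -7, 4), (-1, 2, 1)])
print(totals, means)  # [10, -5, 5] [5.0, -2.5, 2.5]

# Means from the Table 7 totals for 100 Welsh-English switches.
table7_totals = (710, -385, 434)
print([round(t / 100, 2) for t in table7_totals])  # [7.1, -3.85, 4.34]
```

Keeping the per-switch triples, rather than only the corpus totals, is what allows the same totals to be recomputed on a filtered subset, such as the multiword-only recalculation reported for Table 8.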
Case Studies: Welsh–English, Tsou–Mandarin and Taiwanese–Mandarin

In this section we illustrate how the quantitative method outlined in the previous section can be applied to actual data. We apply it to a sample of switches from three data sets: Welsh–English, Tsou–Mandarin and Taiwanese–Mandarin. These data are based on recordings of spontaneous conversations.¹²

Welsh–English data

Several hours of conversational data were collected from Welsh–English bilinguals in North-west Wales, and can be found on the TalkBank¹³ website. One transcript was selected for analysis. This conversation was recorded by a native speaker of Welsh at a social gathering to which she invited some colleagues. All seven participants in the conversation are female native Welsh speakers who are bilingual in Welsh and English. (Aside from infants, monolingual Welsh speakers are virtually nonexistent, given the extent of exposure to English at home (at least through the media), at school and in the community, where monolingual English speakers are found in all areas, to a greater or lesser extent.) Six of the women are in their 30s and one is in her 50s. The first hundred switches were coded as in the case of the example camouflaged, illustrated in the previous section. Examples (11) and (12) will be used in addition to (7) and (10), shown previously, to further illustrate the method of analysis.

(11) Speaker 1: ... mae-n bwysig.
                be.3SG.PRS.PRT important
                'It's important.'
     Speaker 2: ply them with alcohol gynta.
                                         first
                'Ply them with alcohol first.'

(12) ti-n gwybod gwestai crand, pam bod na ddim Handy Andies efo dy toothpicks di?
     PRO.2SG.PRT know.NFIN hotels grand why be.NFIN there NEG Handy Andies with POSS.2SG POSS.2SG
     'You know the grand restaurants, why aren't there any Handy Andies with your toothpicks?'

In Example (12) we have analysed toothpicks as a single-word switch in an otherwise Welsh clause (pam bod na ddim Handy Andies efo dy toothpicks di). In Example (11), however, we consider Ply them with alcohol gynta to be an interclausal switch to a clause with English as its matrix language instead of Welsh, the matrix language of the previous clause. This clause then contains an intraclausal switch, gynta. Thus we treat Ply them with alcohol gynta and gynta as two switches, one interclausal and the other intraclausal. The results for the coding of the switched material in these examples are shown in Table 5. If these feature values are then used to calculate scores on the three codeswitching patterns, as outlined in the previous section (see Table 4), the scores are as shown in Table 6. The final column of Table 6 shows the dominant pattern for each switch. Table 6 shows the results for just six sample switches, but to arrive at an overall result for the sample of 100 switches we need to add up the individual scores. If this is done for the first 100 switches in the sample, the results are as shown in Table 7.¹⁴

Table 5 Analysis of six Welsh–English examples
Switches: (1) camouflaged; (2) beauty is in the eye of the beholder, ngwas i; (3) ngwas i 'my dear'; (4) ply them with alcohol gynta; (5) gynta 'first'; (6) toothpicks
Rows (feature values coded +, − or 0) include: Single constituent; Non-constituent; Nested aba; Non-nested aba; Type of switch; Long constituent; Complex constituent; Content word; Function word; Adverb, conjunction; Selected element; Emblematic or tag; Peripheral; Embedding in discourse; Flagging; Direction of switch; Linear equivalence; Telegraphic mixing; Doubling; Homophonous diamorphs; Triggering; Mixed collocations; Self-corrections
Type of switch: (1) past participle; (2) clause; (3) address term; (4) clause; (5) adverbial; (6) noun phrase
Direction of switch: (1) English; (2) English; (3) Welsh; (4) English; (5) Welsh; (6) English
Homophonous diamorphs and several other cells: see endnote (4)

Table 6 Scores on codeswitching patterns for Welsh–English examples
Switch | Insertion | Alternation | Congruent lexicalisation | Dominant pattern
(1) camouflaged | 11 | −7 | 4 | Insertion
(2) beauty is in the eye of the beholder, ngwas i | −1 | 2 | 1 | Alternation
(3) ngwas i (my dear) | 0 | 3 | 4 | Congruent lexicalisation
(4) ply them with alcohol gynta | 1 | 3 | 2 | Alternation
(5) gynta (first) | 5 | 3 | 1 | Insertion
(6) toothpicks | 11 | −7 | 4 | Insertion

Table 7 Results of classifying 100 switches
Scores on switching patterns | Insertion | Alternation | Congruent lexicalisation | Dominant pattern
TOTAL | 710 | −385 | 434 | Insertion
MEAN | 7.10 | −3.85 | 4.34 |

Table 7 shows that the dominant pattern for this set of data is insertion, with congruent lexicalisation as a secondary pattern. There are three features not considered until now that apply to the whole corpus: diverse switches, direction of switch (bidirectional switching) and homophonous diamorphs (see previous section). If we examine diverse switches and bidirectional switching, we find that the overall results are compatible with the insertion pattern. Sixty-four percent of the switches are either single nouns or noun phrases, which makes for an overall low diversity of switch types, and matches the negative value of this feature for the insertion pattern shown in Table 2. As for directionality of switching, 96% of the switches are from Welsh to English rather than from English to Welsh, which indicates a lack of bidirectionality, also matching the negative value of this feature for insertion shown in Table 2. The feature homophonous diamorphs is not relevant for the insertion pattern, but a relatively high proportion would be compatible with the secondary codeswitching pattern indicated in Table 7, congruent lexicalisation. In the sample we have examined there is a total of 746 homophonous diamorphs out of a total of 6515 words (excluding names), or 11%. However, it remains to compare this with other corpora to see whether or not this is a relatively high proportion.

In the previous section we discussed the difficulty of distinguishing loans from switches and indicated that we would conduct analyses that help to minimise the effect of mistakenly identifying loans as single-word switches. It could be argued that including an inflated proportion of single-word switches in the data would unduly favour the insertion pattern. In the case of the Welsh–English data we therefore recalculated the total score for the sample excluding all single-word switches. The results are reported in Table 8.

Table 8 Results of classifying 50 multiword switches
Scores on switching patterns | Insertion | Alternation | Congruent lexicalisation | Dominant pattern
TOTAL | 291 | −179 | 213 | Insertion
MEAN | 2.91 | −1.79 | 2.13 |

As shown in Table 8, we still find that insertion is the dominant pattern, that congruent lexicalisation is a secondary pattern and that alternation is counterindicated.

Table 3 summarised Muysken's (2000) view of the relation between linguistic factors, extralinguistic factors and codeswitching patterns.
From a linguistic point of view, we might have expected the typological distance between English (an SVO language) and Welsh (a VSO language) to predict either the insertion or the alternation pattern. From an extralinguistic point of view, a colonial setting might have favoured insertion, whereas a tradition of language separation might favour alternation. It can be argued that Wales has been a colony of England since it was conquered in 1284, the Welsh language being excluded from legal use after the Acts of Union in 1536 and 1542. (This situation did not change substantially until 1942: see Thomas, 1982: 87.) At least in informal situations, however, language separation does not seem to have applied for Welsh–English bilinguals, and this has led to language
contact over a long period of time. This prolonged contact has doubtless led to some structural approximation between the two languages, e.g. in the form of calquing. Investigating this is beyond the scope of this paper, but such approximation could explain why congruent lexicalisation, a pattern more typical of closely related languages, appears to be a secondary pattern in our data.
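The corpus-level checks applied to the Welsh–English sample reduce to proportions and a filtered re-total. The Python sketch below reproduces two reported figures, the 96% share of Welsh-to-English switches (taking the 100-switch sample as 96 versus 4 switches) and the 11% rate of homophonous diamorphs (746 of 6515 words), and shows the single-word exclusion behind Table 8 on a small list of scored switches whose score triples are illustrative only. The record layout and the function names are assumptions, not part of the authors' coding scheme.

```python
from collections import Counter


def proportions(labels):
    """Share of each label (e.g. switch direction or category) in a corpus."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def rescore_multiword(switches):
    """Re-total (INS, ALT, CON) scores after dropping single-word switches,
    which might be unrecognised loans."""
    kept = [s for s in switches if len(s["words"]) > 1]
    totals = tuple(sum(s["scores"][i] for s in kept) for i in range(3))
    return len(kept), totals


# Direction of switch in the 100-switch sample: 96% Welsh to English.
directions = ["Welsh->English"] * 96 + ["English->Welsh"] * 4
print(proportions(directions))  # {'Welsh->English': 0.96, 'English->Welsh': 0.04}

# Homophonous diamorphs: 746 out of 6515 words (names excluded).
print(f"{746 / 6515:.0%}")  # 11%

# Hypothetical scored switches (score triples are illustrative only).
sample = [
    {"words": ["toothpicks"], "scores": (11, -7, 4)},
    {"words": ["ply", "them", "with", "alcohol", "gynta"], "scores": (1, 3, 2)},
    {"words": ["gynta"], "scores": (5, 3, 1)},
]
print(rescore_multiword(sample))  # (1, (1, 3, 2))
```

The same `proportions` helper would serve for the diversity check (for example, the share of switches that are single nouns or noun phrases), given a category label for each switch.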
Tsou–Mandarin data

About nine hours of conversational data were recorded. Ten older people (aged 55–70), seven males and three females, were selected as the subjects. All participants were fluent bilingual speakers of Mandarin and Tsou. (A Tsou village is one primarily inhabited by members of the Tsou ethnic minority, who speak Tsou, a VOS Austronesian language that is structurally very different from Chinese.¹⁵) The recordings were conducted by one of the subjects in a small restaurant in Tapangu, a Tsou village in Taiwan. The data were transcribed using the Chinese phonemic pin-yin¹⁶ system for Mandarin and the IPA for Tsou. The first one hundred switches transcribed were used for the analysis reported here. Below we give examples of switches in (13)–(16), which we then use to illustrate how the quantitative method outlined in the previous section was applied. (Tsou material is shown in normal type, while Mandarin material is shown in bold.)

(13) moso gao-zhuang do lao-ban ʔo Basuya.
     PST.AG.[ VIS] tell-tales OBL boss NOM.[ VIS] name
     'Basuya told tales on someone to his boss.'
In this example we consider Tsou to be the matrix language on the basis of its VOS word order, which is characteristic of Tsou, and on the basis of the Tsou subject–verb agreement and case-marking. We assume that there are two Mandarin switches, gao-zhuang and lao-ban, which are switches (1) and (2) in Table 9.

(14) [o-suko cʔo gan-jue [ta zai na-bian]]
     PST.[ VIS].2SG only feel he in there
     'You felt he (is) there.'
In Example (14), a complex structure with a main clause containing a subordinate clause (shown by square brackets inside other brackets), we consider the main clause to have Tsou as its matrix language because of the Tsou subject–verb agreement. However, there is an intraclausal switch to Mandarin, gan-jue ta zai na-bian (switch (3) in Table 9), which also contains an interclausal switch to Mandarin as the matrix language of the subordinate clause ta zai na-bian (switch (4) in Table 9).

(15) magci moso no shang-diao ne?
     why PST.AG OBL hang Q
     'Why did (he) hang himself?'
In (15) the Mandarin material shang-diao ne is a switch (switch (5) in Table 9) in a clause which has Tsou as its matrix language (because of the Tsou case-marking).

(16) Basuya, Avai, hai-you Faʔe de ba-ba cai ai-yu-zi yone shi-san-xi
     name name and name POSS father pick plant.name in place.name
     'Basuya, Avai, and Faʔe's father picked ai-yu-zi in Shi-san-xi.'

Finally, Example (16) has Mandarin as its matrix language (as it follows Mandarin word order), with an inserted preposition in Tsou, yone. The results of the analysis of the six switches illustrated above are given in Table 9, and Table 10 shows the total scores for each of these switches. Unlike Mandarin and Taiwanese, Tsou has an abundance of inflectional morphology, and its word order (VOS) is different from the Chinese SVO order. This is illustrated by the monolingual Examples (17) and (18).

(17) ba-ba chi xiang-jiao. (Mandarin, SVO)
     father eat banana
     'Father ate a banana.'

(18) mo bonu to tacumu ʔo amo. (Tsou, VOS)
     PST-AG eat OBL banana NOM father
     'Father ate a banana.' (Zeitoun, 1992: 15)
Because the matrix language of a Tsou–Mandarin clause can therefore be identified from its word order and inflectional morphology, we adopted the quantitative method outlined in the previous section for the Tsou–Mandarin data without any adjustment. To illustrate how a switch is defined in a Tsou–Mandarin bilingual clause, consider Example (13), repeated as (19) below. As the word order is VOS, as in Tsou, and the inflectional morphology is also provided by Tsou, we identify Tsou as the matrix language and the elements in Mandarin as the switches.

(19) moso gao-zhuang do lao-ban ʔo Basuya.
     PST-AG tell-tales OBL boss NOM name
     'Basuya told tales on someone to his boss.'
Table 11 shows that our sample of 100 Tsou–Mandarin switches appears to have two dominant patterns, insertion and congruent lexicalisation, with insertion receiving a slightly higher score. Referring to Table 3, this result can be explained in terms of both linguistic and extralinguistic factors. The linguistic factor is typological distance, as the word order of Mandarin is SVO while that of Tsou is VOS. As for extralinguistic factors, the indigenous minority Tsou group can certainly be described as existing in a colonial setting. The secondary pattern of congruent lexicalisation cannot at first sight be accounted for by the linguistic factor in Table 3 of typological similarity. However, it is possible that some structural convergence of Tsou towards Mandarin has occurred during a period of intense language contact since 1945. Since Tsou speakers are a numerical, political and socioeconomic minority, they have to speak Mandarin in their workplaces. In addition, all Tsou people have to receive their education through the medium of Mandarin. Mandarin is currently replacing Tsou as the main language of daily communication in Tsou villages, and Tsou is gradually losing its speakers.

There are three features that we have not yet applied to the entire Tsou–Mandarin corpus, namely diverse switches, direction of switch and homophonous diamorphs. If we examine diverse switches, we find that 30% of the switches are nouns or noun phrases, 14% are verbs or verb phrases, 22% are adjectives (or adjective phrases), prepositions, etc., and the remaining 34% are non-constituents. This result suggests a high diversity of switch types, which contradicts the negative value of this feature for the insertion pattern shown in Table 2, but matches the value for congruent lexicalisation, which we have also seen to be an important pattern. When we look at direction of switch, we find that the majority of the switches (67%) are from Tsou to Mandarin. This relative lack of bidirectional switching matches the negative value of this feature for the insertion pattern shown in Table 2. Finally, no homophonous diamorphs are found in our Tsou–Mandarin corpus, but the value of this feature is not relevant for the insertion pattern (see Table 4).

In the previous section, we raised the question of how to distinguish switches from loans, and suggested that words found in dictionaries can be considered to be loans. However, there is no reliable Tsou dictionary, so we had to use our own judgement in distinguishing Tsou loans from switches. To guard against the possibility that we may have mistakenly identified some loans as single-word switches, we followed the same procedure as for our Welsh–English data described in the previous subsection; that is, we recalculated the total score for the Tsou–Mandarin data excluding single-word switches.

Table 9 Analysis of six Tsou–Mandarin examples
Switches: (1) gao-zhuang; (2) lao-ban; (3) gan-jue ta zai na-bian; (4) ta zai na-bian; (5) shang-diao ne; (6) yone
Rows (feature values coded +, − or 0) include: Single constituent; Non-constituent; Nested aba; Non-nested aba; Type of switch; Long constituent; Complex constituent; Content word; Function word; Adverb, conjunction; Selected element; Emblematic or tag; Peripheral; Flagging; Direction of switch; Linear equivalence; Telegraphic mixing; Doubling; Homophonous diamorphs; Triggering; Mixed collocations; Self-corrections
Type of switch: (1) verb; (2) noun; (3) VP; (4) clause; (5) verb particle; (6) preposition
Direction of switch: (1)–(5) T to M (Tsou to Mandarin); (6) M to T (Mandarin to Tsou)

Table 10 Scores on codeswitching patterns for Tsou–Mandarin examples
Switch | Insertion | Alternation | Congruent lexicalisation | Dominant pattern
(1) gao-zhuang | | | | Insertion
(2) lao-ban | | | | Insertion
(3) gan-jue ta zai na-bian | | | | Congruent lexicalisation
(4) ta zai na-bian | | | | Alternation
(5) shang-diao ne | 3 | −10 | 8 | Congruent lexicalisation
(6) yone | 7 | −5 | 7 | Insertion & congruent lexicalisation
The results are shown in Table 12. Table 12 shows that congruent lexicalisation is the dominant pattern for this subset of data, scoring higher than insertion in this analysis. As single-word switches would tend to favour the insertion pattern and are excluded from this analysis, the result would suggest that the truly dominant pattern in these data is congruent lexicalisation. According to Muysken's (2000) prediction as shown in Table 3, this is a pattern likely to occur when the two languages are typologically similar, unlike Tsou and Mandarin. However, the dominance of congruent lexicalisation may be explained in terms of the possible structural convergence also suggested as an interpretation of the results in Table 11.

Table 11 Results of classifying 100 Mandarin–Tsou switches
Scores on switching patterns | Insertion | Alternation | Congruent lexicalisation | Dominant pattern
TOTAL | 543 | −505 | 517 | Insertion
MEAN | 5.43 | −5.05 | 5.17 |

Taiwanese–Mandarin data

About 4.5 hours of conversational data were recorded in the Department of Environmental Protection of Tainan County, Taiwan. There were 10 speakers (aged 50–60): three males and seven females. They all worked in this department and were fluent speakers of both Mandarin and Taiwanese. The recordings were conducted by one of the participants. The data were transcribed using the Chinese phonemic pin-yin system for Mandarin and the IPA for Taiwanese.¹⁷ The first 100 switches in the transcription were analysed using the quantitative method outlined in the previous section.
Defining switches
As outlined in the Taiwanese Mandarin data in the previous section, we worked on a clause-by-clause basis for the analysis, and tried to identify the matrix language of a clause by checking the word order and subject verb agreement. However, these criteria are problematic for Taiwanese Mandarin data. Taiwanese (also known as Hokkien and South Min) is one of the dialects spoken in southern China and Taiwan, and it shares most of the syntactic structures with Mandarin, or standard Chinese. The major differences between these two languages are their phonology and lexicons, and the word order remains the same. This is illustrated by the two monolingual sentences in (20). (20) a. ta jing-zhui zhang yi ke zhong-liu. (Mandarin) his neck grow one CLF tumour There is a tumour on his neck. i am-gun-na son i liab pai-mi. (Taiwanese) his neck grow one CLF tumour There is a tumour on his neck.
b.
Moreover, Chinese is well known as an isolating language in which there is little inflectional morphology, as shown in (20a) and (20b). Therefore, it is not possible to clearly identify the matrix language of a Taiwanese Mandarin
Table 12 Total scores excluding Mandarin Tsou single-word switches Scores on switching patterns Insertion Alternation Congruent lexicalisation Dominant pattern Total based on 64 switches 219 (mean 3.42) -284 (mean -4.44) 343 (mean 5.36) Congruent lexicalisation
330
bilingual clause by using the two criteria of word-order and agreement. We therefore used additional criteria to identify the matrix language in these data, including the assumption that the language of any classifier was the matrix language for that clause. This is illustrated in Example (21), where the classifier liab is in Taiwanese. We therefore assume that the matrix language of (21) is Taiwanese, and that the items jing-drue neck and zhong-lio tumour are switches into Mandarin. (21) i jing-zhui son i liab zhong-liu. (Taiwanese Mandarin) his neck grow one CLF tumour There is a tumour on his neck.
Another criterion used to identify the matrix language of a clause, in the absence of other relevant criteria, was the language of the majority of items in the clause. For example, in (22) there is only one item in Mandarin (gu-li), while the rest of the material is in Taiwanese. We therefore assume that Taiwanese is the matrix language of this clause and that gu-li is a switch.
(22) an-na gu-li in e. (Taiwanese–Mandarin)
   how encourage them EVAL
   'How to encourage them?'
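The two additional criteria can be read as an ordered decision procedure: use the language of a classifier if the clause has one, and otherwise fall back on the language of the majority of items. The Python sketch below assumes each clause has already been tokenised and every token tagged with a language ("M" or "T") and a part-of-speech label; the function name and the tag labels (e.g. "CLF", "PRO") are hypothetical, and it assumes word order and agreement have already been checked and found uninformative.

```python
from collections import Counter


def matrix_language(tokens):
    """Guess the matrix language of one clause.

    tokens: list of (form, language, pos) triples, with language "M"
    (Mandarin) or "T" (Taiwanese) and pos "CLF" marking classifiers.
    Returns "M", "T", or None when neither criterion decides.
    """
    # Criterion 1: a classifier fixes the matrix language of its clause.
    classifier_langs = {lang for _, lang, pos in tokens if pos == "CLF"}
    if len(classifier_langs) == 1:
        return next(iter(classifier_langs))
    # Criterion 2 (fallback, also used if classifiers conflict):
    # the language of the majority of items in the clause.
    counts = Counter(lang for _, lang, _ in tokens).most_common()
    if not counts:
        return None
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]
    return None  # tie: left to the analyst's judgement


# Example (21): the classifier liab is Taiwanese (POS tags are illustrative).
ex21 = [("i", "T", "PRO"), ("jing-zhui", "M", "N"), ("son", "T", "V"),
        ("i", "T", "NUM"), ("liab", "T", "CLF"), ("zhong-liu", "M", "N")]
# Example (22): no classifier; three of the four items are Taiwanese.
ex22 = [("an-na", "T", "ADV"), ("gu-li", "M", "V"), ("in", "T", "PRO"), ("e", "T", "PRT")]

print(matrix_language(ex21), matrix_language(ex22))  # -> T T
```

Returning None on a tie, rather than picking a language, reflects the fact that in such cases the text leaves the decision to the analyst.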
Further examples showing how we identified switches are given in (23)–(27).
(23) da-go wu da-go e qu-yu a!
   everyone have everyone POSS area AF
   'Everyone has his/her own area of responsibility!'
Here Taiwanese is the matrix language and the Mandarin material qu-yu a is a switch.
(24) [li gin-la-li na gon [mei-you chu-fa] [mei-you fen qu-yu] [bo wo gon.]]
   you today if say NEG punishment NEG divide area NEG word say
   'If, today, there is no punishment and no division into areas of responsibility, then I have nothing to say.'
Taiwanese is again the matrix language of this complex clause, which includes an interclausal switch to Mandarin: mei-you chu-fa mei-you fen qu-yu.
(25) [er-qie zhe ge dong-xi hon] you jiang jiu you fa.
   also this CLF thing REP have award then have punishment
   'Also, (when dealing with this kind of thing), you have to punish them, and sometimes give them an award.'
Here the matrix language of the bracketed subordinate clause is Mandarin, since the classifier ge is Mandarin, and the Taiwanese discourse particle hon is a switch.
(26) wa ma lon bo iE-dio yu-jin ne.
   I also all NEG use bath-towel INFOR
   'I don't use a bath towel at all, either.'
In (26), Taiwanese is the matrix language and the Mandarin material yu-jin is a switch.
(27) ying-gai ma e-daE a-no- i la!
   probably also can like.this AF
   'Probably, (it) can also be like this.'
Example (27) is similar to (26) in that Taiwanese is the matrix language and the Mandarin material ying-gai is a switch. Table 13 shows how these six switches were analysed, and Table 14 lists the scores they receive for each pattern together with their dominant patterns. Table 15 shows the overall result of analysing the 100 Taiwanese–Mandarin switches. Congruent lexicalisation receives a score of 638, the highest of the three patterns; insertion receives the second highest score of 494, while alternation receives the lowest score of -449.
We did a separate analysis of the three features (i.e. diverse switches, direction of switch and homophonous diamorphs) that are not included in the analysis in Table 15. As for diverse switches, we find that 34% of the switches in our Taiwanese–Mandarin corpus are nouns and noun phrases, 37% are non-constituents and the remaining 29% are of other types (e.g. verbs 7%, CPs 9%). This shows a high diversity of switches, and matches the positive value of this feature for the congruent lexicalisation pattern shown in Table 2. As for direction of switch, we find that 77% of the switches are from Taiwanese to Mandarin rather than from Mandarin to Taiwanese. This relative lack of bidirectional switching contradicts the positive value of the feature for congruent lexicalisation shown in Table 2. Finally, we found a total of 22 homophonous diamorphs (all sentence-final particles) out of a total of 634 words (note 18), that is, about 3.5% of the words in our Taiwanese–Mandarin corpus. Referring to Table 2, this relatively low proportion would seem to be at odds with the congruent lexicalisation pattern.
In addition, we repeated the analysis excluding all the single-word switches. The result is shown in Table 16. Congruent lexicalisation is still the dominant pattern, again receiving the highest score. Insertion still comes second but receives a lower mean score than in Table 15, confirming that congruent lexicalisation is the dominant pattern once the single-word switches have been removed from the analysis.
One possible interpretation of this result would be linguistic. As illustrated by the examples in (20), the grammatical structures of Mandarin and Taiwanese are almost the same; in other words, they are typologically similar languages. This means that switches can occur almost anywhere, as illustrated in the congruent lexicalisation pattern shown in Figure 3. Another explanation for the predominance of the congruent lexicalisation pattern would be extralinguistic. According to Muysken (2000: 9), congruent lexicalization 'may be particularly associated with second generation migrant groups, dialect/standard and post-creole continua . . .'. The relation of Taiwanese to Mandarin can certainly be considered that of a dialect to a standard.
As Table 15 shows, insertion receives a score of 494, the second highest. According to the predictions summarised in Table 2, insertion will be favoured by the linguistic factor of typological distance and by various extralinguistic factors, including colonial settings. Linguistic factors clearly do not account for the relatively high insertion score, as Mandarin and Taiwanese are typologically quite close. However, an extralinguistic explanation may be provided by the quasi-colonial relationship between Mandarin and Taiwanese. During the 55 years of rule by the Nationalist Party (KMT), the government of Taiwan implemented strict language policies (1945–1987), and Mandarin had very high prestige compared to other languages spoken in Taiwan (Liao, 2000). The use of languages other than standard Mandarin (e.g. Taiwanese or Tsou) was not allowed in schools, the mass media or any other public places (Chen, 1998). Such a situation resembles colonial settings, in which one language has very high prestige while other languages are in an inferior position. As Table 2 suggests, insertion is often associated with colonial settings, so the high score for the insertion pattern may reflect the quasi-colonial situation in Taiwan.
Table 13 Analysis of six Mandarin–Taiwanese switches against Muysken's features (single constituent, non-constituent, nested aba, non-nested aba, type of switch, long constituent, complex constituent, content word, function word, adverb/conjunction, selected element, emblematic or tag, peripheral, flagging, direction of switch, linear equivalence, telegraphic mixing, doubling, homophonous diamorphs, triggering, mixed collocations, self-corrections). Type and direction of each switch (T = Taiwanese, M = Mandarin):
(1) qu-yu a | Noun | T to M
(2) mei-you chu-fa mei-you fen qu-yu | Clause | T to M
(3) bo wo gon | Clause | M to T
(4) hon | Discourse particle | M to T
(5) yu-jin | Noun | T to M
(6) ying-gai | Adverb | T to M
Table 14 Scores on codeswitching patterns (insertion, alternation, congruent lexicalisation) for six Mandarin–Taiwanese examples. Dominant pattern: congruent lexicalisation for (1)–(4); insertion for (5) and (6). Scores: (1) qu-yu a: 0, -5, 8; (5) yu-jin: 12, -8, 5; (6) ying-gai: 7, 1, 2.
Table 15 Results of classifying 100 Mandarin–Taiwanese switches
Scores on switching patterns | Insertion | Alternation | Congruent lexicalisation | Dominant pattern
Total | 494 | -449 | 638 | Congruent lexicalisation
Mean | 4.94 | -4.49 | 6.38
Table 16 Total scores excluding Mandarin–Taiwanese single-word switches
Scores on mixing patterns | Insertion | Alternation | Congruent lexicalisation | Dominant pattern
Total based on 65 switches | 153 (mean 2.35) | -237 (mean -3.65) | 465 (mean 7.15) | Congruent lexicalisation
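The means in Tables 11, 12, 15 and 16 are the totals divided by the number of switches scored, and the dominant pattern is the one with the highest total. The short Python check below recomputes both from the reported totals; the dictionary layout and the function name are our own, and only the numbers come from the tables.

```python
PATTERNS = ("insertion", "alternation", "congruent lexicalisation")

# (number of switches scored, totals per pattern) as reported in the tables.
TABLES = {
    "Table 11 (Tsou-Mandarin, all)": (100, (543, -505, 517)),
    "Table 12 (Tsou-Mandarin, no single-word)": (64, (219, -284, 343)),
    "Table 15 (Taiwanese-Mandarin, all)": (100, (494, -449, 638)),
    "Table 16 (Taiwanese-Mandarin, no single-word)": (65, (153, -237, 465)),
}


def means_and_dominant(n, totals):
    """Round each mean to two decimals and pick the pattern with the highest total."""
    means = tuple(round(t / n, 2) for t in totals)
    dominant = PATTERNS[totals.index(max(totals))]
    return means, dominant


for name, (n, totals) in TABLES.items():
    means, dominant = means_and_dominant(n, totals)
    print(f"{name}: means {means}, dominant: {dominant}")
```

The recomputed means and dominant patterns agree with those printed in the four tables.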
Conclusions and Implications

In conclusion, the results of analysing our Welsh–English, Tsou–Mandarin and Taiwanese–Mandarin data suggest that no data set can be categorised exclusively by one of Muysken's (2000) three codeswitching patterns. There is always a more or less strong secondary pattern. However, there are clear preferences, which may be related to grammatical or extralinguistic factors. Thus the insertion pattern in the Welsh–English and Tsou–Mandarin data may be related to divergent typological features as well as to the extralinguistic factor of a colonial setting. The congruent lexicalisation pattern in the Taiwanese–Mandarin data, on the other hand, may be related to the typological similarities between these languages. For this pair, however, we have suggested that the relatively high score for the insertion pattern may be explained in terms of extralinguistic factors, in particular the quasi-colonial relationship between the standard variety and the dialect.
Nevertheless, some methodological and conceptual issues remain. The first methodological issue is that in the analyses reported above we have treated all of the features as if they had equal weight, whereas some features may be more important than others. Second, there is some redundancy in the system that we have not taken into account, where the value on one feature determines the value on another. For example, the value '-' on the feature long constituent predicts '-' on the feature complex constituent. It might be possible to refine the analysis further by adopting a model with dependent binary choices, but the development of such a model will have to await further research. Third, we have seen that not all features apply to all the switches. For instance, the feature morphological integration was not applicable where morphological integration of the switch was not possible, as in Mandarin and Taiwanese. Because of these problems, the scores for individual switches are not directly comparable, although we have treated them as though they were.
From a conceptual point of view, one issue is that although our framework seems to work well in identifying the insertion and congruent lexicalisation patterns, this does not seem to be true of alternation. The results show that alternation receives considerably lower scores than the other two patterns across all three data sets, to the extent that all our alternation scores are negative. This may be because we have not chosen data sets that reflect alternation, or perhaps because the framework identifies alternation as dominant only if we restrict our analysis to interclausal switching. A further possible explanation is that typological distance plays a more important role than we have acknowledged in determining which of the three patterns appears to be dominant. Table 2 represents typological distance as favouring insertion and alternation, and typological similarity as favouring congruent lexicalisation. Alternatively, typological distance might favour one of two patterns, either insertion or alternation, while typological similarity gives rise to just one pattern, congruent lexicalisation. This would be compatible with our findings that insertion was dominant for Welsh–English and Tsou–Mandarin, where typological distance was involved, but that congruent lexicalisation was dominant for Taiwanese–Mandarin, where typological similarity prevailed. On this view, the contrast between insertion and alternation is neutralised in the form of congruent lexicalisation where the languages are typologically similar.
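The first two methodological points (equal weighting and redundancy) can be made concrete with a Python sketch of the scoring rule given in note 3: a feature contributes 0 if either the switch's value or the pattern's expected value is 0, +1 if the two match and -1 if they differ, and a switch's score on a pattern is the sum over features. The `weights` parameter is a hypothetical extension, not part of the procedure used above; with no weights given it reduces to the equal-weight scoring. The feature values in the example are illustrative and are not those of Table 2.

```python
def feature_score(observed, expected):
    """Score one feature for one pattern, following the rule in note 3.

    Values are "+", "-" or 0 (0 = not applicable / not coded).
    Returns 0 if either value is 0, +1 if they match and -1 otherwise.
    """
    if observed == 0 or expected == 0:
        return 0
    return 1 if observed == expected else -1


def pattern_score(switch_values, pattern_profile, weights=None):
    """Sum the feature scores of one switch against one pattern's expected values.

    weights is a hypothetical per-feature multiplier defaulting to 1,
    so weights=None reproduces equal weighting.
    """
    weights = weights or {}
    return sum(
        weights.get(feature, 1) * feature_score(value, pattern_profile.get(feature, 0))
        for feature, value in switch_values.items()
    )


# Illustrative values only; not the expected values given in Table 2.
example_profile = {"single constituent": "+", "long constituent": "-",
                   "complex constituent": "-", "morphological integration": "+"}
example_switch = {"single constituent": "+", "long constituent": "-",
                  "complex constituent": "-", "morphological integration": 0}

print(pattern_score(example_switch, example_profile))  # -> 3 (equal weights)
# Giving zero weight to a feature whose value is predicted by another
# (complex constituent, predicted by long constituent) removes the double count.
print(pattern_score(example_switch, example_profile, {"complex constituent": 0}))  # -> 2
```

The second call shows how one redundant feature can inflate a score by a full point under equal weighting, which is one reason the per-switch scores are not strictly comparable.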
A final conceptual issue to be mentioned here is how to define a switch. The clause-based approach that we outlined in the subsection 'What is a switch?' may have been a factor in the apparent non-occurrence of the alternation pattern in our data, and developing a different analysis would require a whole new set of diagnostic criteria. The clause-based approach in effect assumes an asymmetry between the languages which the alternation pattern does not assume, and thus may bias the analysis towards the insertion and congruent lexicalisation patterns. We suggest that future research could attempt to dispense with the notion of a switch altogether, replacing it with the notion of a language-specific chunk. The diagnostic features in Table 2 would then be applied not just to switches but to the entire data set, divided into chunks or stretches of same-language material. The overall results for each language (i.e. the set of language-specific chunks from a particular language) could then be compared in terms of constituency, category (instead of element switched, cf. Table 2), switch site (focusing on the site where the language-specific material begins) and properties of language-specific chunks (instead of properties of switches). On this basis a new typology of codeswitching or language alternation patterns might be developed.
Acknowledgements
Thanks are due to the following for their comments on an earlier version of this paper: Marika Fusser, Helena Halmari, Mark Sebba, and audiences at oral presentations of this work given in Manchester, Nijmegen and York. We should also like to thank all the bilingual speakers in Taiwan and Wales who kindly gave permission for their conversations to be recorded.
Correspondence
Any correspondence should be directed to Professor Margaret Deuchar, School of Linguistics and English Language, University of Wales, Bangor, Gwynedd LL57 2DG, UK (m.deuchar@bangor.ac.uk).
Notes
1. Note that we have used the Leipzig Glossing Rules (http://www.eva.mpg.de/lingua/files/morpheme.html; see also Croft, 2003: xix–xxv; Lehmann, 1983) in the glosses for all examples. This has sometimes involved changing the glossing conventions of cited published examples or inserting our own glosses.
2. We have interpreted colonial settings as referring to situations where the language of the majority can be considered to be a colonial language which coexists with a minority language that is associated primarily with the colonised.
3. For example, the formula IF(OR($E$3=0,E35=0),0,IF(E35=$E$3,1,-1)) is based on the Microsoft Excel function IF(logical_test,value_if_true,value_if_false). It indicates that if the value for a particular feature in relation to a particular switch (cell E35) is 0, or the expected value for the pattern (cell E3) is 0, then the score will be 0 on the pattern being scored in this cell. If the value matches that found in E3 (where the expected value for that pattern is coded), the score will be 1; if not, it will be -1. So if the value + is found in both E3 and E35, the score will be 1, but if + is found in E3 and - in E35, the score will be -1.
4. As explained later in the text, homophonous diamorphs do not count as switches because they are language-neutral, and therefore they cannot be scored like other switched material. However, the frequency of homophonous diamorphs can be calculated in order to determine which codeswitching pattern this criterion indicates.
5. Muysken (2000: 129–130) provides examples of the switching of several constituents and of a non-constituent in his examples (12) and (13) respectively.
6. We have assumed that the tag yn dydy is part of the same clause, a decision that could possibly be challenged. If the material following camouflaged were clearly part of a new clause, then the switch would receive the value - on nested aba and on non-nested aba.
7. Although there might appear to be some redundancy in the existence of these two features, which are the converse of one another, we can see in Table 3 that both features are necessary, as their values on the three possible codeswitching patterns are not simply a mirror image of one another.
8. See also chapter 6 of Muysken (2000).
9. See Muysken (2000) for considerable discussion of the relation between linear and categorical equivalence and the relative importance of the two notions.
10. Mutation is a morphophonological process that applies to certain initial consonants in Welsh under certain conditions: for more information see e.g. Thomas (1992: 34).
11. We are grateful to Winifred Davies for this example.
12. See e.g. Bangor data at http://talkbank.org/data/LIDES.
13. A rough idea of the frequency of switching in these data can be obtained by calculating the proportion of words in each language in the data set. In the Welsh–English data analysed, 80% of the words were unambiguously Welsh, 8% were unambiguously English and over 11% could belong to either language. In the Tsou–Mandarin data analysed, approximately 62% of the words were unambiguously Mandarin, 37% were Tsou and 1% were either Taiwanese or Japanese. In the Taiwanese–Mandarin data analysed, 32% of the words were unambiguously Mandarin, 67% were Taiwanese and approximately 1% could belong to either language.
14. See Bangor data at http://talkbank.org/data/LIDES.
15. We are grateful to Marika Fusser for assisting with this analysis.
16. The population of the Tsou ethnic group is 6149 (Taiwan National Institute of Educational Resources & Research: http://3d.nioerar.edu.tw/2d/native/course/course_0101. Accessed 31.03.07). Tsou is spoken by an unknown proportion of the Tsou ethnic group; the number of speakers of Tsou is estimated at approximately 3000–3500.
17. Hyphens in the representation of both Mandarin and Taiwanese indicate adjacent Chinese characters.
18. Unlike the Welsh–English data, we transcribed only the utterances that contain switches in our Taiwanese–Mandarin corpus.
19. Cf. Muysken (2000: 7): 'In this situation, a single constituent B (with words b from the same language) is inserted into a structure defined by language A, with words a from that language.'
References
Backus, A. (1996) Two in One. Bilingual Speech of Turkish Immigrants in the Netherlands. Doctoral dissertation, Katholieke Universiteit Brabant, Tilburg, Netherlands: Tilburg University Press.
Bakker, P. (1997) A Language of Our Own. The Genesis of Michif, the Mixed Cree–French Language of the Canadian Métis. Oxford: Oxford University Press.
Chen, M.R. (1998) The Review and Future Development of The Language Policy in Taiwan. Kaohsiung: Fu Wen Publisher Ltd.
Clyne, M. (1967) Transference and Triggering. The Hague: Nijhoff.
Clyne, M. (2003) Dynamics of Language Contact. English and Immigrant Languages. Cambridge: Cambridge University Press.
Croft, W. (2003) Typology and Universals (2nd edn). Cambridge: Cambridge University Press.
Deuchar, M. (2006) Welsh–English code-switching and the Matrix Language Frame model. Lingua 116 (11), 1745–2022.
Gardner-Chloros, P. (1991) Language Selection and Switching in Strasbourg. Oxford: Clarendon Press.
Giesbers, H. (1989) Code switching tussen dialect en standaardtaal. Doctoral dissertation, Katholieke Universiteit, Nijmegen.
Halmari, H. (1997) Government and Code-switching. Explaining American Finnish. Amsterdam/Philadelphia: Benjamins.
Lehmann, C. (1983) Directions for interlinear morphemic translations. Folia Linguistica 16, 193–224.
Liao, Ch.Ch. (2000) Changing dominant language use and ethnic equality in Taiwan since 1987. International Journal of the Sociology of Language 143, 165–182.
Muysken, P. (2000) Bilingual Speech: A Typology of Code-mixing. Cambridge: Cambridge University Press.
Myers-Scotton, C. (1993) Duelling Languages. Grammatical Structure in Codeswitching. Oxford: Clarendon Press.
Myers-Scotton, C. (2003) Contact Linguistics. Bilingual Encounters and Grammatical Outcomes. Oxford: Oxford University Press.
Nortier, J. (1989) Moroccan Arabic and Dutch in contact. Code-switching among Moroccans in the Netherlands. Doctoral dissertation, University of Amsterdam [published in 1990 by Foris, Dordrecht].
Pfaff, C. (1979) Constraints on language mixing. Language 55, 291–319.
Poplack, S. (1980) Sometimes I'll start a sentence in Spanish y termino en español. Linguistics 18, 581–618.
Poplack, S. (1998) Contrasting patterns of code-switching in two communities. In P. Trudgill and J. Cheshire (eds) Sociolinguistics Reader (Vol. 1) (pp. 44–65). Oxford: Oxford University Press.
Sankoff, D. and Poplack, S. (1981) A formal grammar for code-switching. Papers in Linguistics: International Journal of Human Communication 14 (1), 3–45.
Sankoff, D., Poplack, S. and Vanniarajan, S. (1990) The case of the nonce loan in Tamil. Language Variation and Change 2, 71–101.
Thomas, C. (1982) Registers in Welsh. International Journal of the Sociology of Language 35, 87–115.
Treffers-Daller, J. (1994) Mixing Two Languages. French–Dutch Contact in Comparative Perspective. Berlin/New York: Mouton de Gruyter.
Wang, S.-L. (2007) Evaluating competing models of code-switching with reference to Mandarin/Tsou and Mandarin/Southern Min data. Unpublished PhD dissertation, University of Wales, Bangor.
Zeitoun, E. (1992) A syntactic and semantic study of Tsou Focus System. MA dissertation, National Tsing Hua University, Hsinchu.
Appendix
List of glosses
3P etc. third person plural etc.
3S etc. third person singular etc.
AF affirmative particle
AG agentive
CLF classifier
CLM class marker
EVAL evaluative particle
EXCL exclamative particle
F feminine gender
IND indicative
INF infinitive
INFOR informative particle
NEG negation
NFIN non-finite
NOM nominative
OBL oblique
PO possessor
PR present
PRO pronoun
PRT sentence particle
PST past
Q question particle
REF reflexive
REP particle of reporting
SUBJ subjunctive
VIS visible

The International Journal of Bilingual Education and Bilingualism
