Documenti di Didattica
Documenti di Professioni
Documenti di Cultura
motivation to switch from English-language sites will be limited to those for whom
issues of
identity outweigh issues of information (p.233). The strong interdependence
between
technology and language sets a standard for academics, research and development.
Within the
fields of science and technology the language of its technical materials and manuals
builds
momentum to create more. Thus, identity and its impact on the future direction of
science and
technology development relates to the personal and cultural beginnings of
language. The
presentation of cultural identity and its connection to language will be expanded in
Chapter Two.
The upsurge of English in business, science and technology created an increase in
the
popularity of English language learning and teaching abroad. Motivated young
students and
working adults searching for a better quality of life sought out English schools and
products to
help improve their English as a Second or Foreign Language skills in hopes of
obtaining better
working conditions and/or an increase in salary. Pennycook stated, English is all too
often
assumed to be a language that holds out promise of social and economic
development to all who
learn it, a language of equal opportunity, a language that the world needs in order
to be able to
communicate (2010, p.116). As controversial as the expanding reach of English is,
the utility of
it has further raised its status in many parts of the world and likewise fostered high
rates of
English development and interest.
Global English is a term often used to describe Englishs ability to assert itself
around the
U.K. and Canada (p. 145). Although this present research study does not focus on
language
shift and instead concentrates on code-switching and issues of language-mixing, the
complexity
of language shift is a topic of interest and should not be ignored.
The consequences of language shift toward English and other trends that favor
English
over native languages like those mentioned earlier are important issues affecting
language choice
around the world. Heller, whose work on globalization and the corresponding
commodification
of language, stated, What we are seeing then is a shift from understanding
language as being
primarily a marker of ethnonational identity, to understanding language as being a
marketable
commodity on its own, distinct from identity (2003). In this case, English becomes
a medium of
communication around the world in regions where the global market and economic
potential
creates a niche for an international language. More information on Hellers theories
involving the
globalization of English and how it affects identity acquisition will be presented in
Chapter Two.
Identity, culture and language are intertwined on many levels and their
interrelationship
reinforces deep connections on personal, social and global scales. With so many
interconnections
between language and a way of life, how does the addition of learning language(s),
aside from a
persons native language, alter ones identity in the broader sense of culture and
society? And
how does code-switching relate to this concept and/or expand it? Multilingual
speakers must
move between multiple metacognitive processes, languages and contexts every
day to maintain
their skills and cultural understanding of what they are speaking about and how
they are using
language.
The negotiation of moving between languages becomes a common juncture for
codeswitching
even though much of it occurs at an unconscious level. But is code-switching just a
syntactical byproduct of speaking more than one language within the same
sentence or within the
same stream of thoughts? Or is code-switching a more significant indicator of a
creative
linguistic process where multiple languages and vocabulary within those languages
are forged
together to create a new form of communication? Yamuna Kachru (1992), a linguist
notes, the
nature of the relationship between language acquisition and/or linguistic
competence and
socialization on the one hand, and grammar of language and grammar of culture
on the other
has been of great interest to sociolinguists (p. 341). Research and theories
presented in this
outline and expanded upon in Chapter Two draw attention to the connections
between language
and cultural adaptation and the subsequent expansion of ones identities within a
shifting world
of borders and newly established communication opportunities.
In the latter half of the 20th Century, ethnographic researchers such as Labov
began
linking the disciplines of anthropology, sociology and linguistics and theorizing
about their
connections. Labov (1978) outlines the logical progression from anthropologists who
studied
foreign people which necessitated the learning of their languages. They reported
their
understanding of these cultures through a description of the observed grammar and
communication. Field work was conducted and they reported on previously
unstudied groups of
people from within the observed groups social structure. These sociolinguists
evolved their
fields of study combining sociology and linguistic analysis simultaneously with
empirical roots
learning multiple languages while explaining how the mother tongue can be
maintained,
especially when, at any given time, at least half of the post-elementary school nonEnglish
mother tongue world seems to be learning English, (p. 355). Indeed, Fishman and
other
researchers note the popularity of English, but at the same time Fishman discusses
the reasoning
behind adding languages for different purposes instead of consolidating all of ones
languages
down to a single one and in this case, English.
establishing an understanding of English as a background to developing codeswitching.
Learning a second language creates the opportunity for language mixing or
codeswitching
(from here this will be written as CS) to occur. Gumperz (1977) defines
conversational
CS as juxtaposing two language systems within the same conversational exchange.
Bilingual and
multilingual users commonly switch between two or more languages, especially
when they use
these languages in daily activities. Language mixing and specifically CS has become
a popular
research area in second language learning circles because of the implications of
bilingual/multilingual education and communication needs across community
boundaries
(McConvell & Meakins, 2005). Some critics have seen CS as an intrusion to language
learning
as the mixing can present itself as unexpected and messy. One assumption made
about CS is that
while you are speaking one language, you should use only that language (Heller,
1988).
CHAPTER TWO: LITERATURE REVIEW
In this chapter the literature reviews and expands on theories and research outlining
the
globalization of English and how it creates the framework for identity expansion
through
language. This chapter also discusses linguistic anthropology and sociolinguistic
issues relating
to multilingual speakers and identity construction. Moving on from the theoretical
basis of
multilingualism and identity development, language mixing and in particular, codeswitching
with an emphasis on more formal examples in the mass media and society will be
discussed.
Why has English become the choice language of global business and development?
It is
curious that English has grown in popularity around the world as a much often
discussed lingua
franca, especially in Asia where China and India are fast growing economies and
have
populations of language speakers who far outnumber the languages spoken in other
regions of
the world. Besides all of the possibilities that an Asian language would emerge as a
dominant
global language, the request for English teaching and training is as popular as ever.
Prendergast
(2008) states, at the millennial moment, defined by global capitalism and the rise
of the
knowledge economy, people around the world are buying into English, investing
their money
and time in it, hoping for a favorable outcome (p.1). English has grown to be the
illogical
choice as an international language of commerce and technology and foreign
language learning.
The free movement of people, governmental power, goods and services also has to
do
with information and systems of moving that information. One cannot discuss global
English
development in business, politics, language learning or any other purpose without
mentioning the
Council report that in Egypt, the emergence of Facebook activism grew directly out
of a strong
digital-activist community. This community was initially led by English-language
bloggers
(2012). Internal changes emanating from current events and local issues can impact
the world on
a scale that affects billions of people. Multiculturalism and multilingualism can both
lead to
beneficial gains, but it is the role of governments and educators to create a place
where both
native and foreign languages can thrive.
Examining language and economic connections from the global perspective of
information-sharing raises the question of inequality. In a world that favors global
English nonnative
speakers of English from developing countries become the global minorities. Nonnative
English speakers adopt the role of outsiders in a market that prefers the linguistic
native speaker.
The concepts surrounding global English have both positive and negative
components.
English is an arbitrary choice as any other language option would also have its
positive and
negative issues as well. However, perhaps because of the Internets origins or
international
business favoring English or its connections to democracy, English is currently the
most popular
choice. With language schools and private tutors spread across the world, English is
being taught
as a language that will enhance your chances of entering a prestigious school and
improve your
chances for financial success in the global marketplace (Kim, 2006). As more
international-based
work is conducted as companies and their suppliers spread around the world, travel
and
communication continue to be big business. People are constantly moving greater
distances for
When investigating the interaction of languages among each other and across
borders,
Gumperz comments on the work of Salisbury and the specific shifting of two New
Guinea tribal
languages across a local border. The boundaries separating these languages had
shifted over
several generations and the two groups were bilingual. Gumperz states, the shift
does not
necessarily involve the replacement of one language by another or the migration of
population,
but rather a shift in attitude (1965, p. 104). It is this attitude shift and how it
relates to language
and languages within a cultural context that is important and significant. Like the
natural
evolution of a language changing over generations incorporating different usages
and vocabulary
to relate to more contemporary times, multiple languages spoken in the same
region are likewise
evolving and interacting with each other and their environment.
Similar to Gumperz and also in contrast to Chomsky, Hymes, a sociolinguist and
anthropologist, discusses two views of linguistics. Chomskys view is labeled by
Hymes (1974)
as, Cartesian linguistics, not as a historically exact label, but in recognition of a
direction given
to the theory of language in the period following Descartes by an emphasis on the
nature of mind
as prior to experience and an analytic, universalizing, reconstituting methodology
(p. 120). The
other theory opposing Chomskys Cartesian linguistics is named Herderian
linguistics which
again is not a historically exact label, but which names Herder (1744-1801) who
placed an
emphasis on language as constituting cultural identity (p. 120). Like Gumperz,
Hymes
differentiates between the universal origins of language theory proposed by
Chomsky and the
not only use the language they are communicating in, but they will have to identify
with it to be
productive as language carries cultural components as well as syntactical ones.
Hymes writes more on the discipline of linguistic anthropology with regard to
multilingual speakers, ethnographically, one must begin, not with the function of
language in
culture, but with the functions of languages in cultures (2010, p.577). The
distinction the author
makes is that information can be gleaned across multiple settings. Languages are
active within
and across cultures and researchers should not stop at one languages
investigation, but look at
the interplay among the meanings and negotiations among more than one context.
Hymes vision
is as broad seeking as there are cultures because the intersection of those
interactions of language
and their analysis may contribute much to language study.
Hymes is referenced in their work as being one of the first people to
look at the relationship among multiple languages and cultures instead of
attributing one
language to one culture. Gal and Irvine state that their work investigates the ways
in which
boundaries between languages and dialects are socially constructed, and the
cultural processes by
which linguistic units come to be linked to social units (1995, p.970). They are
arguing that
language and culture are interconnected and created autonomously through social
groups.
Although their work is aimed at the researcher and directs attention to how
language and culture
should be studied, the larger implication is that communities of language speakers
should be seen
as having multiple boundaries of social and cultural significance affecting linguistic
ideologies.
Gumperz and Cook-Gumperz (2008) again note the previous connections to Hymes
and
Labov and the continuing research in the field of sociolinguistics and linguistic
anthropology.
Gumperz and Cook-Gumperz state that, the ethnography of communication
provided the insight
that culture is essentially a communicative phenomenon (2008, p.536). Many
researchers are
coming to the same conclusion. There are inseparable connections among language
(linguistics),
culture (anthropology) and society (sociology). The researchers in these fields all
point to
examining where the individual disciplines can stand alone, but then conclude that
the
connections among these disciplines cannot exist in isolation. If one is studying
language or
people, then one needs to look at the other as well. In their conclusion, Gumperz
and CookGumperz
encourage the relationship between sociolinguistics and linguistic anthropology, in
reconsidering the new issues of language politics and postcolonial language, now
seen as part of
a changing urban sense of personal identity and belonging (2008, p.542).
Shifting from the connections seen in the larger realm of sociolinguistics and
linguistic
anthropology, Eckert (2003) as described in Chapter One, discusses how individuals
create
identity. With a review of the literature, Eckert sees how, these studies emphasize
the
performance nature of social identity and move the focus from the reflection of
identity to the
construction of identity (2003, p.112). Sociolinguistics and linguistic anthropology
may
research the interrelationships between the fields and among culture, society and
language.
However, it is the individual speakers of a language who perform the identity
associated with
that culture and in a sense, create it. Because language and culture are fluid and
constantly
changing, they are also adapting to new inclusions and at the same time
transitioning away from
outdated and unpopular choices. Technology, inventions, entertainment and social
fads all
contribute to altering what is desirable within a community of people.
This bilingual o rmultilingual speaker can now interchange words and phrases within
multiple languages and may
communicate within those multiple languages within the same sentence or
exchange. Codeswitching
between languages reinforces that the thought processes and usage of more than
one
language are at play. The code-switching acts as a beacon for the identity
development of the
speaker. The speaker of more than one language must identify their connection to
the language
and culture to all of the languages they speak. This connection aids in expanding
the cultural
identity of the multilingual speaker in a world that has grown toward expanding
boundaries and
understanding.
Historical Basis of English in the Philippines
The Philippines deteriorating economy, extended inflation and escalating
unemployment
rates have purported English into an economic commodity. Students able to master
the language
and partner it with an exportable skilled field such as engineering, mechanics,
education or
nursing have been eligible to migrate to foreign countries at high rates through job
placement
agencies. More than 8.2 million Filipinos are said to be working abroad (Opiniano,
2006) with
the majority in the Middle East, the UK, Canada, Hong Kong and the United States
and across
the seas on cruise ships and commercial vessels. The departure of well-trained
Filipinos
searching for jobs and higher income has caused much public commentary. The
brain drain of
Filipino professionals vacating their homeland and leaving the country without skill
and talent
contribute to the Philippines economic troubles and a lack of political leadership.
However, in terms of linguistic growth and development, Filipinos migrations and
associations abroad have created extended travel networks, increased opportunities
for
international jobs and connected families spread across multiple borders. Because
of these
reasons, English has intermixed in the Philippines long enough to have born an
informal Tagalog
dialect, Taglish, which mixes the Tagalog and English languages. Thompson (2003),
a Taglish
researcher and author, reports on Filipinos language mixing by stating, their mixing
of English
and Tagalog, first called halo-halo mix-mix, Engalog, and then Taglish, spread
rapidly from the
classroom to the general populace through radio and television in much the same
way that
Tagalog had spread earlier (p. 41). The influence and contribution to language
mixing through
mass media and young people cannot be denied. Thompson continues about
Taglishs popularity
by stating, today nearly all educated Filipinos, including those in high places, use
Taglish
except in formal situations when only pure English or pure Tagalog may be
used (p. 41).
Currently in diverse locations across the Philippines, numerous examples of codeswitching are
heard in schools, the workplace and politics as well as in informal conversations and
formal
contexts such as newspapers. Language mixing such as Taglish and CS is seen as a
vital
component of communicating across language and identity boundaries in the
Philippines.
research from studies on language socialization may present clues to how and why
multilingual
people choose to utilize languages.
Continuing on from what was mentioned in Chapter One, code-switching is the use
of
more than one language while communicating. This multi-language usage can be
within the
same sentence or within the same conversation or paragraph. In the introduction of
an edited text
on code-switching, Heller states that,code-switching is seen as a boundary-leveling
or
boundary-maintaining strategy, which contributes, as a result, to the definition of
roles and role
relationships at a number of levels, to the extent that interlocutors bear multiple
role relationships
to each other (1998, p.1). The use of the word boundary with regard to language
and roles is
significant. Code-switching is a communicative technique that enables one to
access and
transcend an invisible line that relates to multiple cultures and identities and the
languages at use
within those cultures and identities. Heller is saying that the boundary crossing of
multilingual
speakers creates a relationship between and amongst the languages and cultures
and roles of the
individuals. These roles are crossed just as boundaries are crossed by speaking
multiple
languages within the same context.
Heller notes that the purpose of her volume of research, is to illustrate ways in
which the
study of code-switching addresses fundamental anthropological and sociolinguistic
issues
concerning the relationship between linguistic and social processes in the
interpretation of
experience and the construction of social reality (1998, p.2). Connecting language
and the
Scotton clarifies the implications behind utilizing more than one language in CS and
how
that affects their identity construction, by using two codes in two different turns,
however, the
speaker also has been able to encode two different identities and the breadth of
experience
associated with them (1998, p.177). Scotton reinforces the magnitude of what it
means when
multilingual speakers code-switch. Beyond analyzing the appearance of CS and
what it means
syntactically, Scottons research provides a theoretical background to the actual
practice of CS
and how it stretches ones identities through their language usage. Scotton provides
the
theoretical background that reinforces a groups motivation for being multilingual as
well as
discussing the reasons of how CS lends itself to being a creative linguistic output of
communication.
Uriciuoli, a linguistic and cultural anthropologist, expands upon the limits of
language
rules and borders. Beyond language mixing and lexical analyses exists a broader
sense of
identity, ethnicity, race and class. Languages allow speakers to identify with a
concept of
nationhood that unites together all notions of politics, economics and social
relationships.
Uriciuoli states that, the rhetorical purposes that emerge in code-switched
discourse are very
much tied into the long-term political economy of language that shapes not only the
language
situation itself but social actors relations (1995, p.529). A particular language
speaker will
most often identify themselves with the language of their particular country of
origin. Reasoning
must follow that speakers of multiple languages build on their native identity/ies
with every
This is especially noticeable with larger populations settling in one area. The
confluence of
cultures limits opportunity to maintain isolated social and linguistic communities
abroad in a
potentially global environment of constantly mixing identity.
Garrett and Baquedano-Lopez (2002), discuss the concepts associated with
language
socialization in terms of its research and types of study. Drawing on a base of
anthropological
works, sociolinguistics, psychological and sociological aspects of human
development while also
using the medium of longitudinal and ethnographic studies, the authors attempt to
explain the
process of holistic reasoning behind what constitutes creation of functional
language within a
community. The authors examine the social theory of learning, a practice-based
approach that
aids in defining the conceptual tensions between such pairings as, collectivity and
subjectivity,
power and meaning, practice (as socially constituted ways of engaging with the
world) and
identity (as a function of the mutual constitution of group and self) (p. 347).
Focusing on the
everyday, mundane details of language, their research focuses on communication
socialization
and how language is taught and interpreted within a community.
When looking at language education within a community, parents, children,
teachers,
friends and relatives all contribute to the expansion of language and the
understanding of its
meaning through socialization. The constant change of language usage and
meaning and the
complexity of mixing cultures add to both perpetual exchange and conflict.
Language is no
exception as it contributes to socialization and as Garrett and Baquedano-Lopez
report,
how language use corresponds to social categories and thus can be seen as an
identity marker
(p. 100). Associating with a particular language also aligns an individual and/or
community of
speakers with all that language and culture envelops. Food, dress and cultural
experiences such
as spiritual expression, family gatherings and celebrations contribute to marking
ones identity
and defining their communicative practices. Beyond the concept of just combining
grammar and
words of two separate languages, Sayer also states that mixing languages can be
politically and
socially significant.
Combining languages blends different cultures and can introduce those cultures to
other
communities. For instance, by including Spanish within ones English speaking or
writing can
further the linguistic/social agenda of the other culture in schools, art and politics.
Many a
politician reaches out to Latino/a voters by flourishing greetings and speeches with
Spanish. This
politically charged type of CS attempts to establish a connection between Spanish
speakers and
the politicians requesting their votes. In Florida, for instance, the Miami Herald
reported that a
politician speaking Spanish is almost a requirement for being elected to a state
office. Mitt
Romney recently released a Spanish speaking Ad called Nosotros in Florida where,
72 % of
registered Republicans are Latino (2012, HuffPost). Romney and other politicians
are using
Spanish to show the voters that they understand them and are sensitive to their
issues.
Blending two or more languages across cultures and into different contexts allows
for an
skill (p. 323). It is clear that Ribdy views CS as a method to build language
education in a
context of multilingual and multiethnic students.
In another Asian country whose population derives from a multiethnic and
multilinguistic
base, David (2003) examines the presence of CS in the formal court system. In
Malaysia, Malay, Chinese and Indians compose the bulk of the population and
English is taught
as a compulsory second language beginning in grade one. Maya Khemlani David, a
linguist with
research based on cross-cultural communication, writes on the contextual
background of CS
within the Malaysian court system and the perceptions of how this language mixing
is viewed.
Davids article (2003) states that, with differing levels of proficiency and zones of
comfort in
English and Malay it is inevitable that code-switching will be used in such a
multilingual setting
(p. 7).
In transcripts collected from Malaysian courtrooms, in which Bahasa Malaysia (BM)
is
the official language of the court, English is often used by counsel and witnesses
when they are
not able to speak BM fluently. The findings of the transcripts showed that CS was
extensive
throughout the hearings and occurred in many different situations. Some people
code-switched
habitually and other people switched languages when they were spoken to. Codeswitching
occurred to substitute technical terms and some code-switched due to limited
proficiency as
others code-switched for emphasis. Other reasons behind CS were sarcasm,
coercing witnesses,
and to quote others.
The findings of this study are evidence that CS is found in formal settings and
contribute
http://dspace.library.colostate.edu/webclient/DeliveryManager/digitool_items/csu01_
storage/2012/06/26/file_21/164068
locations. It is clear that even though English still is the principal language for web
communication, there is a growing need to develop technologies for other languages.
However, an essential prerequisite for any kind of automatic text processing is to first
identify the language in which a specific text segment is written. The work presented
here will in particular look at the problem of word-level identification of the different
languages used in social media texts. Available language detectors fail for social
media text due to the style of writing, despite a common belief that language identification is an almost solved problem (McNamee, 2005).
In social media, non-English speakers do not always use Unicode to write in their
own language, they use phonetic typing, frequently insert English elements (through
code-mixing and Anglicisms), and often mix multiple languages to express their
thoughts, making automatic language detection in social media texts a very challenging
task. All these language mixing phenomena have been discussed and defined by several linguists, with some making clear distinctions between phenomena
based on certain criteria, while others use code-mixing or code-switching as umbrella
terms to include any type of language mixing (Auer, 1999; Muysken, 2000; Gafaranga
and Torras, 2002; Bullock et al., 2014), as it is not always clear where borrowings/Anglicisms
stop and code-mixing begins (Alex, 2008). In the present paper,
code-mixing will be the term mainly used (even though code-switching thus is
equally common). Specifically, we will take code-mixing as referring to the cases
where the language changes occur inside a sentence (which also sometimes is called
intra-sentential code-switching), while we will refer to code-switching as the more
general term and in particular use it for inter-sentential phenomena.
Code-mixing is much more prominent in social media than in more formal texts, as
in the following examples of mixing between English and Bengali (the language spoken
in Eastern India and Bangladesh), where the Bengali segments (bold) are written
using phonetic typing and not Unicode. Each example fragment (in italics) is followed
by the corresponding English gloss on the line after it.
Background and Related Work
In the 1940s and 1950s, code-switching was often considered a sub-standard use of
language. However, since the 1980s it has generally been recognized as a natural part
of bilingual and multilingual language use. Linguistic efforts in the field have mainly
concentrated on the sociological and conversational necessity behind code-switching
and its linguistic nature (Muysken, 1995; Auer, 1984), dividing it into various subcategories
such as inter- vs intra-sentential switching (depending on whether it occurs
outside or inside sentence or clause boundaries); intra-word vs tag switching (if the
switching occurs within a word, for example at a morpheme boundary, or by inserting
a tag phrase or word from one language into another), and on whether the switching
is an act of identity in a group or if it is competence-related (that is, a consequence of
a lack of competence in one of the languages).
Characteristics of Code-Mixing
The first work on processing code-switched text was carried out over thirty years
ago by Joshi (1982), while efforts in developing tools for automatic language identification started even earlier (Gold, 1967). Still, the problem of applying those language
identification programs to multilingual code-mixed texts has only started to be addressed
in very recent time. However, before turning to that topic, we will first briefly
discuss previous studies on the general characteristics of code-mixing in social media
text, and in particular those on the reasons for users to mix codes, on the types and the
frequencies of code-mixing, and on gender differences.
Clearly, there are (almost) as many reasons for why people code-switch as there
are people code-switching. However, several studies of code-switching in different
type of social media texts indicate that social reasons might be the most important,
with the switching primarily being triggered by a need in the author to mark some
in-group membership. So did Sotillo (2012) investigate the types of code-mixing
occurring in short text messages, analysing an 880 SMS corpus, indicating that the
mixing often takes place at the beginning of the messages or through simple insertions,
and mainly to mark in-group membership which also Bock (2013) points to as
the main reason for code-mixing in a study on chat messages in English, Afrikaans
and isiXhosa. Similar results were obtained by Xochitiotzi Zarate (2010) in a study
They compared dictionary-based methods to language models, and with adding logistic
regression and linear-chain Conditional Random Fields (CRF). The best system
created by Nguyen and Dogruz (2013) reached a high word-level accuracy ( 97.6%),
but with a substantially lower accuracy on post-level (89.5%), even though 83% of the
posts actually were monolingual.
Similarly, Lignos and Marcus (2013) also only addressed the bi-lingual case, looking
at Spanish-English Twitter messages (tweets). The strategy chosen by Lignos and
Marcus is interesting in its simplicity: they only use the ratio of the word probability
as information source and still obtain good results, the best being 96.9% accuracy at
the word-level. However, their corpora are almost monolingual, so that result was
obtained with a baseline as high as 92.3%.
Voss et al. (2014) on the other hand worked on quite code-mixed tweets (20.2% of
their test and development sets consisted of tweets in more than one language). They
aimed to separate Romanized Moroccan Arabic (Darija), English and French tweets
using a Maximum Entropy classifier, achieving F-scores of .928 and .892 for English
and French, but only .846 for Darija due to low precision.
Carter collected tweets in five different languages (Dutch, English, French, German,
and Spanish), and manually inspected the multilingual micro-blogs for determining
which language was the dominant one in a specific tweet. He then performed
language identification at post-level only, and experimented with a range of different
models and a character n-gram distance metric, reporting a best overall classification
accuracy of 92.4% (Carter, 2012; Carter et al., 2013). Evaluation at post-level is reasonable
for tweets, as Lui and Baldwin (2014) note that users who mix languages in
their writing still tend to avoid code-switching within a tweet. However, this is not the
case for the chat messages that we address in the present paper.
Code-switching in tweets was also the topic of the shared task at the recent First
Workshop on Computational Approaches to Code Switching for which four different
code-switched corpora were collected from Twitter (Solorio et al., 2014). Three
Code-Mixing in Social Media Text
of these corpora contain English-mixed data from Nepalese, Spanish and Mandarin
Chinese, while the fourth corpus consists of tweets code-switched between Modern
Standard Arabic and Egyptian Arabic. Of those, the Mandarin Chinese and (in particular)
the Nepalese corpora exhibit very high mixing frequencies. This could be a
result of the way the corpora were collected: the data collection was specifically targeted
at finding code-switched tweets (rather than finding a representative sample of
tweets). This approach to the data collection clearly makes sense in the context of a
shared task challenge, although it might not reflect the actual level of difficulty facing
a system trying to separate live data for the same language pair.
3. The Nature of Code-Switching in Social Media Text
According to the Twitter language map, Europe and South-East Asia are the most
language-diverse areas of the ones currently exhibiting high Twitter usage. It is likely
that code-mixing is frequent in those regions, where languages change over a very
short geospatial distance and people generally have basic knowledge of the neighbouring
languages. Here we will concentrate on India, a nation with close to 500 spoken
languages (or over 1600, depending on what is counted as a language and what is
treated as a dialect) and with some 30 languages having more than 1 million speakers.
India has no national language, but 22 languages carry official status in at least
parts of the country, while English and Hindi are used for nation-wide communication.
Language diversity and dialect changes instigate frequent code-mixing in India,
and already in 1956 the countrys Central Advisory Board on Education adopted what
is called the three-language formula, stating that three languages shall be taught in
all parts of India from the middle school and upwards (Meganathan, 2011). Hence,
Indians are multi-lingual by adaptation and necessity, and frequently change and mix
languages in social media contexts. Most frequently, this entails mixing between English
and India 3.1. Data Acquisition
Various campus Facebook groups were used for the data acquisition, as detailed
in Table 1. The data was annotated by five annotators, using GATE (Bontcheva et al.,
2013), as annotation tool. The two corpora (English-Bengali and English-Hindi) were
then each split up into training (60%), development (20%), and test (20%) sets. Table 2
presents corpus statistics for both language pairs.
None of the annotators was a linguist. Out of the five, three were native Bengali
speakers who knew Hindi as well. The other two annotators were native Hindi speakers
not knowing Bengali. Hence all the English-Hindi data was annotated by all the
five annotators, while the English-Bengali data was annotated only by the three native
speakers. Among the annotators, four (both the Hindi speakers and two of the Bengali
speakers) were college students and the fifth a Bengali speaking software professional.
The annotators were instructed to tag language at the word-level with the tagset
displayed in Table 3. Each tag was accompanied by some examples. The univ
tag stands for emoticons (:), :(, etc.) and characters such as ", , >, !, and @,
while undef is for the rest of the tokens and for hard to categorize or bizarre things.
The overall annotation process was not very ambiguous and annotation instruction
was also straight-forward. The inter-annotation agreement was above 98% and 96%
(average for all the tags) for English-Bengali and English-Hindi, respectively, with
kappa measures of 0.86 and 0.82 for the two language pairs.n languages, while mixing Indian languages
is not as common, except
for that Hindi as the primary nation-wide language has high presence and influence on
the other languages of the country.
https://www.atala.org/IMG/pdf/2.Das-TAL54-3.pdf
Introduction
After centuries of British colonization, English became a common denominator between nations
across the globe. Eventually, English became the lingua franca for global communication. Due to
the colonial history and the events that led to indigenization of English in the Philippines,
English dominates in school, work and media. Studies conducted by Filipino linguist Maria
Lourdes Bautista of De La Salle University, show that Fil-English code-switching is a feature of
the linguistic repertoire of Filipinos (2004).
There are about 110 indigenous languages in the Philippines (McFarland, 1994); most of these
languages belong to the Malayo-Polynesian category of the Austronesia language family. These
languages are categorized by their mutual intelligibility. Because of this heterogeneity of
languages, bilingualism in the Philippines has existed from the very beginning of the Philippine
education system. At home, Filipino children are exposed to English words and concepts at a
very early age. To further understand bilingualism in the Philippines, linguist Stephen Krashen
distinguishes between language acquisition and language learning. He refers to language
acquisition as the subconscious assimilation without any awareness of knowing the rules. Thus,
Filipino children acquire Filipino simultaneously with English (Bautista, 2004). On the other
hand, language learning is a conscious process, achieved particularly through formal study, thus
resulting in an explicit knowledge of rules (Krashen, 1987). Therefore, English is both acquired
and learned amongst native-born Filipinos. In school, learners vernacular is used as a medium of
transitional bilingualism (Gonzalez, 1996). Moreover, English is not only taught as a curricular
subject but is also used as the dominant medium of instruction in History, Science and
Mathematics. Thus, code-switching and borrowing is a natural occurrence in the Philippine
context. Thus, code-switching between Filipino and English as well as the borrowing of English
words are born out of necessity. It is an unavoidable alternative used to teach new concepts, to
introduce new ideas in curricular subjects where the supposed medium of instruction is English.
As a result, a bilingual education scheme i.e. Filipino and English as media of instruction was
implemented as a response to the rising demand and rising linguistic trend. It was adapted in
favour of a less dominant language or as a political compromise for language rights (Gonzalez,
1988). At least four of the seven compulsory subjects in grade schools are taught in Filipino.
The colloquial term for code-switching between Filipino and English is popularly known as Taglish,
a portmanteau of Tagalog and English. Let us be reminded, that the initial name of the
indigenous official language was Tagalog which was later renamed to Filipino. However,
common people as well as linguists argue that Filipino is just another terminology for Tagalog.
Thus, Tagalog and Filipino are treated as the same, thus the coining of Taglish as the codeswitching
variety of Filipino and English. The term is also occasionally used as a generic name
for the switching between any Philippine language variety (not necessarily Tagalog) and English
(Bautista, 2004). Since code-switching is the lingua franca of urban areas throughout the
Philippine Archipelago (Bautista, 2004), Taglish is an inappropriate term to use when referring to
code-switching between Filipino language varieties and English because Tagalog is only one of
the 110 languages in the Philippines.
In order to make this study more linguistically inclusive, Fil-English is the generic name used
throughout this paper as opposed to Taglish. Thus, Fil-English denotes code-switching between
any Filipino language variety and English.
Overview of the Philippine Education System
The Philippine education system is moulded after the US educational system, but with a slight
alteration. It is divided into three levels. Primary education is from grade one to grade six (age 7
to 12) and is obligatory. Secondary education is a non-compulsory four-year education and also a
pre-requisite for college or university. This level also incorporates theoretical and vocational
knowledge as part of the national curriculum. At age 16-17, students start tertiary education
through college or university and will have finished a baccalaureate degree by the age of
approximately 20.
There are two types of schools: public and private schools. It is mostly the socio-economic
background of the students that determines the type of school. Public schools are government
funded, therefore accessible to all. However, families of the students will have to shoulder other
necessary expenses such as school uniform, stationeries and transportation. Textbooks in public
schools are lent out to students on a ratio of one book to three students. The ratio could be worse
in some areas, particularly in rural areas (Chua, 2008). These schools usually suffer from neglect
and insufficient financial support which may have negative effects on students and teachers
performance. Situations can be as bad as one teacher to 50-60 learners in one classroom. In fact,
some students have to sit on their classmates laps in order to accommodate all students in one
classroom (personal communication with Ipil national high schools students and teachers 2008 in
Ormoc City). The social-economic background of public high school students is usually not as
comfortable as those of private high school students. To make ends meet, some students have to
work with odd jobs after school. In addition, most parents of public high school students have not
gone to school for long in comparison to parents of private high school students (Jimnez &
Lockheed, 1995).
Private schools are independently run by private entrepreneurs. These schools finance themselves
through tuition fees. Unlike public schools where tuition is for free, students are required to have
the prescribed stationeries, books and school uniforms, including shoes. Textbooks are mandated
to be a one to one ratio may increase the probability of a better education. Private school
students families generally have relatively good purchasing power. It is a pre-requisite to be able
to go to a private school. Moreover, parents of such students have generally gone to school
longer than most of the parents of public high school students (Jimnez & Lockheed, 1995).
Previous Research
There has been a great deal of research conducted with regard to New Englishes. These are
mostly studies regarding language trends which have sprouted from former English-speaking
colonies. One of these is English in the Philippine context which is presented in the book Filipino
English and Taglish-Language Switching from Multiple Perspectives by Roger Thompson. He
suggests that Filipinos code-switch between English and Filipino because:
English was indigenized in the Philippines from 1898 to 1946.
When Philippines became an independent commonwealth in 1936, the rise of the Filipino
language created a linguistic tension between English and Filipino.
Bilingual Education Scheme was implemented in 1987 which gave way to the
officialization of Fil-English (Thompson, 2003).
In addition, Thompson claims that English is mostly associated by Filipinos with better
predominant in most bilingual societies such as the Philippines due to the close relationship
between languages. Fil-English goes beyond the borrowing of words or ready-made phrases; it
involves switching between languages. . . [it] is standard English placed side by side with
Filipino. It is the alternation of Filipino and English in the same discourse or conversation
(Gumperz, 1982). Further, Fil-English is the use of Filipino words, phrases, clauses and
sentences in English discourse or vice versa. Some linguists claim, like Bautista, that codeswitching
is a mode of discourse and the language of informality among middle-upper class,
college-educated, urbanized Filipinos (Bautista, 2004).
An example of Fil-English:
Fil- English - Gusto na ko mo-eat mommy. Im gutom now.
(I want to eat, mommy. Im hungry now.)
Code-switching comes in different forms and may or may not occur as a necessity, the example
above is just one form. Normally, sentences like this can be expressed purely in Filipino.
For Example:
Filipino - Gusto na ko mokaon, Nanay. Gutom na ko.
However, according to Bautista (2004), there
are two contrasting types of code-switching in
the Philippines namely proficiency-driven codeswitching
and deficiency-driven code-switching.
Proficiency driven code-switching is when the
speaker is competent in both Filipino and
English and can easily switch from one to the
other, for maximum effect. Proficiency-driven
code-switchers switch codes for precision, for
transition, for comic effect, for atmosphere, for
bridging or creating social distance, for snob
appeal and for secrecy (Goulet, 1971).
Deficiency-driven code-switching is when the
speaker is not fully competent in the use of one
language and therefore has to utilize both
languages.
This warning sign presents a rough idea of the
extent of bilingualism in the Philippines.
Bilingualism in the Philippines
Students are in contact with English on a daily basis. Although English is mostly associated with
education, there is much more English outside school premises. Means of communication such as
street signs, election posters and hazard warnings are written in English. From the day of birth,
Filipinos medical documents are printed and expressed in English. Moreover, government and
legal documents i.e. birth and baptismal certificates are archived in English, not to mention, job
interviews and hiring which are mostly carried out in English. Numbers, most importantly, are
normally expressed in English, for instance calendars, prices, times and dates. In addition, most
highly regarded and well respected daily papers such as The Philippine Daily Inquirer and the
Manila Times are printed in English. The photographs below help us further understand the
extensiveness of English and Fil-English in the Philippines.
This is a school motto painted on one of the classroom buildings in one of the participating
high schools. It shows the amount of Fil-English in the Philippines.
Some linguists argue that Fil-English code-switching is evidence of additive bilingualism which
refers to acquiring the second language without it interfering in the acquiring of a first language.
This is to explicitly say that both languages are developed simultaneously. According to Bautista
(2004), Fil-English is a linguistic resource in the bilinguals repertoire. Others believe that FilEnglish
is evidence of subtractive bilingualism which refers to the acquiring of a second
language that interferes with the acquiring of a first language. Subtractive bilingualism is also
believed to be evidence of transitional bilingualism where Filipino is still incompletely acquired
amongst learners and is inevitably replaced or interfered with English, the societally dominant
language (Lambert, 1978) in the Philippines. In addition, Fil-English is perceived by most
Filipino linguists as subtractive bilingualism. It is mocked and said to be a sign of deterioration
of English in the Philippines (Gonzalez & Sibayan, 1988).
Markedness Model
According to C. Myers-Scottons (1993), code-witching has a social function which attempts to
define and redefine the relationship between speakers in a bilingual speech community. MyersScottons
white collar jobs, and education and the likes while local languages are associated with local
based activities and relationships (Mesthrie et al 2000) i.e. home, family and friends.
English is mostly associated with the ethnic group who brought the language to the Philippines,
the Americans. The association of English to Americans may trigger an underlying colonial
mentality which encompasses Filipinos subservient attitudes towards the colonial ruler as well
as our predisposition towards aping Western ways (Constantino, 1978). Colonial mentality
corresponds to what Fanon (1967) referred to as the internalization or epidermalization of
inferiority among peoples subjected to colonization. Diglossia may be viewed as an offspring of
colonial mentality amongst Filipinos. As English is the language of the colonizers and Filipino is
the language of the colonized, diglossia has been bred through the colonial history of the
Philippines. Consequently, the history of English in the Philippines has inevitably placed the
English language superior in comparison to local vernaculars.
As claimed earlier, schools have downplayed the significance of local languages. This is
exemplified through the photographs below, taken from a principals office of one of the
participating schools.
The sign suggests that everyone who comes close to the principals office premises is advised to
speak English, regardless of the errands that brought them to the office. With all these reminders
and unofficial mandates, a hierarchy of languages is inevitably created. Diglossia sets in. English
gains more respect, while Filipino and other indigenous languages fall into subordination
(Sibayan, 1989).
As a summary to this segment, this study explores Filipino students attitudes towards English
and Fil-English. The languages involved are in a diglossic situation. This language imbalance, is
directly connected to colonial mentality theory, evident in the Philippines. English and FilEnglish
have social and professional implications. Moreover, the study examines evidence of
code-switching i.e. Fil-English as a social marker through Myers-Scottons Markedness Model.
In addition, this research explores perceptions of Fil-English as additive and subtractive
bilingualism and as proficiency-driven or deficiency-driven code-switching. These established
linguistic theories are significant concepts in order to understand the findings. These concepts
will reappear and be further elaborated in the discussion segment.
http://dspace.mah.se/bitstream/handle/2043/9650/Filipino-English.pdf