Modeling lexical borrowability

Roeland van Hout
Tilburg University

Pieter Muysken
University of Amsterdam

In this article, we develop analytical techniques to determine borrowability, that is, the ease with which a lexical item or a category of lexical items can be borrowed. The analysis is based on two assumptions: (1) the distribution of elements in both the host and donor language should be taken into account to explain why certain categories are borrowed and others are not; (2) the borrowability of elements of certain categories may result from a set of underlying operative factors or constraints. Our analysis is applied to Spanish borrowings in Bolivian Quechua on the basis of a set of bilingual texts.

Borrowing of lexical items is a subject that has given rise to many offhand remarks but little systematic study, for two reasons. First, lexical borrowing has not been perceived as particularly interesting from a structural perspective, but rather as a cultural phenomenon. Second, the study of lexical borrowing has suffered from the fact that the lexicon is difficult to study using the standard tools of structural linguistic analysis. Here we focus on one particular aspect of the study of borrowing: borrowability. To what extent can a given item or class of items be borrowed into another language? Are there differences between different lexical items as to the ease with which they are borrowed? If so, what causes these differences? In what follows we combine two different traditions in dealing with the issue of borrowability: (1) a tradition relating borrowability to grammatical patterns; (2) corpus-based sociolinguistic research treating borrowability from a quantitative and statistical perspective.

The present study looks at the borrowing of Spanish elements in Bolivian Quechua on the basis of a bilingual corpus. We are not looking at the history or fate of individual words (e.g.,
the very early borrowing of Spanish parlar for 'to speak' and its subsequent disappearance from the donor language, where now hablar is the common form), but at general factors or constraints. Our approach fits into the general constraints framework developed by Sankoff and Labov (1979). Before we approach the problem of borrowability in more detail, let us briefly survey a number of questions involved in the general study of lexical borrowing. These questions include problems of delimitation, which we cannot begin to deal with in any satisfactory way.

The first problem involves multiple-word borrowing. Though generally lexical items are single words, sometimes they correspond to phrases, as in the case of idioms and fixed expressions. These may again be borrowed as wholes, thus resembling multiword switches. In this article, we do not deal with this problem. We took the existence of Spanish borrowed fixed expressions (such as así que 'so that', a ver 'let's see', and ya está 'it's there') into account in the analysis of the data, but we did not analyze the component parts of the expressions. This means that we did not take fixed expressions into further consideration.

The second problem involves the degree of adaptation. Words may be borrowed and then undergo various degrees of adaptation. We do not yet know what determines this process of integration, but it may be relevant to borrowability because the latter may change over time. There may be long-term integration of items, and there may be the development of channels for integration, as suggested by Heath (1989). We should state right away that we do not deal with phonological aspects of the adaptation of borrowings, however important they may be from all points of view. In fact, we collated different forms of a borrowed word, including diminutives, in our database (e.g., burro, burr, buriquite, Borgo, burrito 'donkey').
Third, often lexical borrowing will go together with phenomena such as syntactic convergence or influence, and it is hard to separate the effects of the two in individual cases. A case in point is the Quechua numeral uj 'one'. This is a numeral, but it can also be used as the indefinite article, possibly under the influence of Spanish un 'a', which is related to uno 'one'.

The fourth issue concerns the types of borrowing. The most complex typology of borrowings is due to Haugen (1950), who introduced a number of concepts such as loan-blend, loan-shift, and so forth. From an anthropological perspective, a different basic distinction in lexical borrowing was made by Albó (1968), who distinguished between substitution and addition of vocabulary. There is substitution if the borrowed item is used for a concept that already exists in the culture (and, of course, is expressed by a lexical item in the host language), and addition if it is a new concept. This relates directly to the embedding problem of Weinreich, Herzog, and Labov (1968): the structural and social embedding of the borrowed element. What is the relation of a borrowing with existing semantic fields or structured meaning domains?

Finally, a fundamental problem for any typology of lexical borrowings is how to distinguish between words that are taken from another language in discourse only accidentally, in which case we speak of lexical interference or nonce borrowings (Sankoff, Poplack, & Vanniarajan, 1990), and words that become fully integrated into the receptor language. We look at nonce borrowing later, in that we consider the issue of the word categories for which borrowing is a productive process. In other respects, we see no reason to make a fundamental distinction between nonce borrowings and ordinary borrowings.

The words of a language are loose elements, but at the same time they are part of a system: the lexicon itself is partly structured, and the context in which words occur in the sentence may also impose structural constraints on their occurrence.
These constraints may manifest themselves in the fact that some categories appear to be borrowed more easily than others, or at least are borrowed more frequently than others. This fact was observed by the Sanskritist William Dwight Whitney (1881), who arrived at the following hierarchy:

(1) nouns - other parts of speech - suffixes - inflection - sounds

This hierarchy was elaborated on by Haugen (1950), using data from Norwegian immigrants in the United States, to include:

(2) nouns - verbs - adjectives - adverbs - prepositions - interjections

Nouns are borrowed more easily than verbs according to this perspective, verbs more easily than adjectives, and so on. Independently from Haugen, Singh (1981) came to a comparable hierarchy on the basis of English borrowings in Hindi:

(3) nouns - adjectives - verbs - prepositions

On the basis of data from Spanish borrowings in Quechua, Muysken (1981) tentatively concluded that there may be something like the following hierarchy:

(4) nouns - adjectives - verbs - prepositions - coordinating conjunctions - quantifiers - determiners - free pronouns - clitic pronouns - subordinating conjunctions

The data Muysken used included, among other things, absolute numbers of Spanish words in a given corpus of recorded speech (types, not tokens). These numbers are given in Table 1.

Several main issues come to the fore immediately in relation to the meaning of the concept of hierarchy. First, what is precisely claimed in setting up such hierarchies as in (1)-(4)? Can they be taken as the kind of implicational universals of borrowing developed by Moravcsik (1978) (when a language has borrowed verbs, it also has borrowed nouns)? Is there a quantitative claim (in the set of borrowed elements, there are more nouns than verbs)? Is there a temporal claim (a language first borrows nouns, and only then verbs)? These interpretations are compatible, but clearly separate.
TABLE 1. [table body not recoverable]

We limit ourselves to quantitative analysis, given the fact that we are doing neither a typological nor a historical study.

Next, how are hierarchies of the type in (1)-(4), insofar as they hold true, to be explained? What factors or constraints can be appealed to in order to explain the hierarchies found? In this article, we explore a number of such factors.

A very important factor involves one of the primary motivations for lexical borrowing, that is, to extend the referential potential of a language. Since reference is established primarily through nouns, these are the elements borrowed most easily. More generally, the class of words most closely involved with the culture of a language are the content words, such as adjectives, nouns, and verbs. They may be borrowed more easily than function words (e.g., articles, pronouns, conjunctions) because the former have a clear link to cultural content and the latter do not. In some cases, borrowing extends beyond cultural content words, however, and there may well be other constraints on borrowing (e.g., distinguishing among different kinds of content words).

Another explanatory factor to consider is the frequency of lexical items, perhaps both in the donor language and in the recipient language (cf. Poplack, Sankoff, & Miller, 1988).

The next factor, in fact a cluster of factors, is structural in nature. To what extent do syntagmatic and paradigmatic constraints on lexical items, again both in the donor language and in the recipient language, influence their borrowability? Here we focus on the role of inflection, the extent to which forms are part of a paradigm, the role words play in the clause, and so on.

Finally, there is the factor of equivalence to consider.
Weinreich (1953:61) noted that resistance to borrowing is always a function, not so much of properties of recipient and source language by themselves, but of the difference in structures of the recipient and source languages.

It is not possible to establish hierarchies of borrowability simply by counting elements in a corpus or dictionary. There are four important questions relevant to studying differences in borrowability between different types of words empirically:

1. Which categorial distinctions do we make?
2. Do we count types or tokens?
3. To what do we compare a given number of borrowed items?
4. What type of corpus do we use?

In the present study, we look at the borrowing of Spanish elements belonging to different word classes in Bolivian Quechua, basing our work on a bilingual corpus consisting of about 40 fairly short folkloric texts recorded in the Department of Potosí by the Jesuit linguist Federico Aguiló (1980). They are literally transcribed, and one of the advantages of the corpus is that Aguiló had the stories translated into Spanish by bilingual peasants of the same region. Thus, we have a source corpus of Spanish available that corresponds in content to the Quechua corpus and also to the type of Spanish that will be the source for at least the more recent borrowings into Quechua (which, according to Albó, 1968, will be a substantial majority, given that massive bilingualism dates from the post-1952 era). To give but one example of the type of matching possible, consider the sentences taken from the Quechua corpus (5a) and the Spanish corpus (5b).

(5) a. paronay uf ola duron witsgrkts
       [gloss not recoverable]
    b. Seba comprometgo paar complete ds
       'He had promised to pay him with hard silver.'

Even though the translation is not a literal one, it is clear that a subset of the items in the Spanish translation also occur, as borrowings, in the Quechua.

Almost all studies of borrowing so far have focused on the main syntactic categories (e.g., nouns, verbs, etc.). There has been some work that has taken semantic fields into account.
In our research, we limit ourselves to the morphosyntactic categories. In Table 2 we illustrate the categories used in our analysis. As becomes clear, however, the morphosyntactic categories play a role at the level of description, not explanation. Indeed, the thrust of our argument is to study why certain word classes behave the way they do in the borrowing process.

TABLE 2. The categories employed to classify the elements in the corpus [table body not recoverable]

The main issue in dealing with the hierarchies discussed in the previous section is how to analyze quantitative borrowing data of the type in Table 1, which concerns Spanish elements borrowed into Ecuadorian Quechua. Obviously these figures cannot be directly used to establish a hierarchy of the type in (4), since there may be differing amounts of elements of each category available for borrowing. Spanish has many more nouns than verbs, and the fact that three times as many nouns as verbs were borrowed could be interpreted, if we take the percentage of elements of a category borrowed, as meaning that verbs are easier to borrow than nouns.

The data in Table 1 give a distorted picture for yet another reason: types are counted, not tokens. This makes a big difference, because one word may be used many times. In the Sango corpus studied by Taber (1979), 508 French loans accounted for 51% of the types, but for only 7% of the tokens: they are used relatively infrequently. A token analysis of the elements in Table 1 shows that Spanish nouns are much more frequent than this table suggests, and elements such as prepositions, adverbs, and quantifiers are much less frequent. Should one look at the number of different elements borrowed (the types), or should one take into account the number of tokens (occurrences)? In terms of the vocabulary of the recipient language and the speakers using the vocabulary, it seems more relevant to determine how many lexical items are borrowed. In speaking the recipient language and actually producing borrowed elements, the frequency of occurrence of borrowed lexical items (number of tokens) can be a critical factor. It implies that it is relevant to compare borrowed items both on the number of types and on the number of tokens. It also implies that the word-class distribution can be measured both on the level of number of items (types) and number of tokens.

There are several possible approaches to the issue of statistically comparing borrowings to other categories in the lexicon. If we have a language sample containing borrowed elements, or more particularly borrowed lexical items, the next step is to compare this set of elements to something else. A purely set-internal word-class distribution (i.e., where only a set of borrowed items is taken into account), as adopted in Table 1, is not revealing. A set-external comparison is called for, involving either the recipient or the donor language.

A first approach, then, is set-external comparison with word-class distributions of the recipient language (both word-class distributions involved belong to the same language, the recipient language). Taking such an approach, there are two options.

As is done in Poplack et al. (1988), who investigated English-origin vocabulary in Ottawa French, the borrowed elements can be calculated in terms of percentages of the recipient language vocabulary (tokens; Fr = French origin, En = English origin).

(6) Nouns(Fr) + Verbs(Fr) + Adjectives(Fr) + ... = 100%
    Nouns(En) + Verbs(En) + Adjectives(En) + ... = 100%

In this way, the percentage of nouns in the French-origin vocabulary in Ottawa French is contrasted with the percentage of nouns in the English-origin vocabulary, and so on. The structural possibilities and the frequency pattern of the recipient language are taken as the point of comparison. For this procedure to be most revealing, there needs to be a basic correspondence between the recipient language categories and the categories of borrowed elements.

A similar approach is to look within each category at the proportion of borrowed elements (in the corpus studied).
(7) Nouns(Fr) + Nouns(En) = 100%
    Verbs(Fr) + Verbs(En) = 100%
    Adjectives(Fr) + Adjectives(En) = 100%

To take once again the example of English and French, here we would simply take the category of nouns in the recipient language corpus and see what percentage is of French origin and what percentage is of English origin. This approach has the advantage that the lack of correspondence of recipient and source language categories is no problem from a computational point of view. If either language lacks a category, we simply have a 100% distribution.

TABLE 3. The distribution of different clause types in 341 Yagua clauses. Source: Payne (1986) [table body not recoverable]

However, when alien categories are borrowed (e.g., Spanish conjunctions and prepositions in many Amerindian languages), comparison with the distribution in terms of the host language is meaningless. This type of borrowing is a manifestation of the fact that, in some cases, lexical borrowing is linked to or correlated with structural borrowing. And then a comparison with the distributional properties of the source language is more to the point.

A more fruitful approach, and in fact the one we opt for, is therefore set-external comparison with a source language corpus (the word-class distributions involved belong to different languages: the borrowed items in the recipient language and the items of the source language). Here the distribution in the source language is taken as the point of comparison. Then the question is: What is the chance that an individual source element will end up in the recipient language?

There are no particular properties a bilingual corpus must have, apart from the ones usually suggested. However, it should be clear that different languages will show different distributional patterns. In many Amerindian languages, there is extensive cross-referencing morphology on the verb and hence little need to express the arguments of the clause with lexical means or pronouns.
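The three comparisons discussed here, schemes (6) and (7) and the source-language comparison, differ only in what is used as the denominator. The following is a minimal sketch of that difference, not the procedure of any of the cited studies; all counts, word classes, and function names are invented for illustration.

```python
# Hypothetical token counts in a recipient-language corpus, split by origin.
# None of these numbers come from the corpora discussed in the text.
native = {"noun": 400, "verb": 300, "adjective": 100}
borrowed = {"noun": 90, "verb": 10, "adjective": 20}

# Hypothetical number of word types per class in a source-language corpus,
# and how many of those types also turn up as borrowings.
source_types = {"noun": 500, "verb": 250, "adjective": 125}
borrowed_types = {"noun": 200, "verb": 25, "adjective": 25}


def within_origin_shares(counts):
    """Scheme (6): each class as a percentage of one origin's vocabulary."""
    total = sum(counts.values())
    return {cls: 100 * n / total for cls, n in counts.items()}


def within_class_shares(native_counts, borrowed_counts):
    """Scheme (7): percentage of each class that is of borrowed origin."""
    shares = {}
    for cls in native_counts:
        total = native_counts[cls] + borrowed_counts.get(cls, 0)
        shares[cls] = 100 * borrowed_counts.get(cls, 0) / total
    return shares


def source_borrowing_rates(source_counts, borrowed_counts):
    """Source-language comparison: share of source types that are borrowed."""
    return {cls: 100 * borrowed_counts.get(cls, 0) / n
            for cls, n in source_counts.items()}


scheme6_native = within_origin_shares(native)
scheme6_borrowed = within_origin_shares(borrowed)
scheme7 = within_class_shares(native, borrowed)
source_rates = source_borrowing_rates(source_types, borrowed_types)
```

With these invented counts, nouns make up 50% of the native tokens but 75% of the borrowed tokens under (6), while the source-language comparison asks a different question: what fraction of the source nouns (here 40%) was borrowed at all.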
A typical example is Yagua, a Peruvian Amazonian language which Payne (1986) described as follows: "It is reasonable to suggest that the basic clause type of Yagua is an inflected verb and nothing more, that the use of full noun phrases or free pronouns is somehow a marked usage" (34). A frequency count of a number of Yagua texts gives the distribution in Table 3, which amounts to 341 verbs and 128 nouns or pronouns for the text sample analyzed. In addition, the number of pronouns will be modest when compared to a language like English. Similar observations can be made with respect to auxiliaries and modals in languages such as Yagua. Whereas in English these categories are frequent in most types of text, in Amerindian languages (such as Yagua) the notions of tense, mood, and aspect are marked on the verb with affixes. Although there are still some elements that may be classified as auxiliaries, their frequencies in texts will be much lower than in English.

Most studies on borrowing give some indication of the word-class distribution of the lexical elements borrowed. What is the use of comparing the elements being borrowed in terms of word classes, and how are the figures evaluated? What figures are in fact necessary to say anything meaningful about the role of word-class categories? It will be clear that using a bilingual corpus has great advantages for the particular approach taken. For one thing, it makes it possible to obtain significant results even though the size of the corpus is relatively modest.

The analysis of the borrowings took place in several steps. We first had to establish the relevant types and tokens in the database before we could analyze the data quantitatively.

Data bases

The lexical databases used in the analyses are standardized, in that endings (including inflectional and derivational affixes) have been taken off and spelling variants have been regularized. Because Quechua morphology is very rich, many different forms of a word were counted as one lexical type.
This also holds for the Spanish borrowings in that database, since they were often marked with all relevant Quechua suffixes as well. We started with two separate databases:

1. Quechua: containing both Quechua word types and Spanish word types and the respective number of tokens; the data are derived from the corpus of Quechua texts.
2. Spanish: containing Spanish word types and number of tokens; the data are derived from the corpus of Spanish texts.

These two databases were combined into one general database:

3. Combined: containing all word types of the Quechua and the Spanish database and the respective number of tokens; word types not occurring in one of the original databases simply have an occurrence of 0 for the language corpus in question.

As can be seen in Table 4, the Quechua database contains 908 word types and 6,870 tokens; 28 types could not be classified as either Quechua or Spanish. Of the remaining 880 types, 40.8% are Spanish-based. The relative share in tokens is much lower: 23% are Spanish-based. All in all, the amount of Spanish elements is fairly high, particularly when we compare it to other corpora such as the Ottawa corpus (Poplack et al., 1988) or the Brussels corpus (Treffers, 1991).

The number of hapaxes (word types that occur only once) is also given in Table 4. Their frequency divided by the size of the sample (the number of tokens) gives an impression of the productivity of a specific word category (cf. Baayen, 1989). One problem is that our sample is not particularly large, especially not if the number of Spanish words is considered. Nevertheless, one can conclude that the number of Spanish hapaxes in relation to the number of Spanish word tokens is fairly high.

TABLE 4. Number of tokens, types, and hapaxes in the Quechua database, related to language of origin [table body not recoverable]

TABLE 5. Number of tokens, types, and hapaxes in the Spanish database, related to language of origin [table body not recoverable]
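The type, token, and hapax counts described above, and the hapax-per-token ratio used as a rough productivity indicator, can be sketched as follows. This is an illustrative sketch only: the tiny token list, its origin tags, and the function name `hapax_profile` are assumptions, not the authors' database or procedure.

```python
from collections import Counter

# Toy list of standardized tokens, each tagged with an assumed language of
# origin ("Q" = Quechua, "S" = Spanish). Invented data for illustration.
tokens = [("wasi", "Q"), ("wasi", "Q"), ("runa", "Q"), ("burro", "S"),
          ("plata", "S"), ("plata", "S"), ("duro", "S"), ("runa", "Q"),
          ("wasi", "Q"), ("misa", "S")]


def hapax_profile(tagged_tokens, origin):
    """Count types, tokens, and hapaxes for the words of one origin."""
    counts = Counter(word for word, lang in tagged_tokens if lang == origin)
    n_tokens = sum(counts.values())
    n_hapaxes = sum(1 for c in counts.values() if c == 1)
    return {
        "types": len(counts),
        "tokens": n_tokens,
        "hapaxes": n_hapaxes,
        # Hapaxes divided by sample size (tokens), as described in the text.
        "hapax_per_token": n_hapaxes / n_tokens,
    }


quechua = hapax_profile(tokens, "Q")
spanish = hapax_profile(tokens, "S")
```

In this toy sample the Spanish-origin words have three hapaxes in five tokens, whereas the Quechua-origin words have none; a high ratio of this kind is what the text reads as a sign that the category is far from exhausted by the sample.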
This result indicates that the number of possible Spanish words (i.e., types) in Quechua is not exhausted at all by our sample. If a larger sample were drawn (in fact, it is more realistic to say that if a larger corpus had been available), new Spanish word types would be found. The outcomes in our sample suggest that Spanish-based words in Quechua are a rather extended category. The contribution of Spanish words to Quechua is not restricted to a small and specific set of borrowed Spanish word types.

As Table 5 shows in comparison with Table 4, the Spanish database contains more word tokens than the Quechua database, due to the fact that in Spanish many concepts are expressed by separate words, including determiners and clitic pronouns, that are affixal in Quechua.

Table 5 shows too that several Quechua word types are present in the Spanish database. That is not a surprise given the fact that the Spanish texts were translations of the original Quechua texts by bilingual speakers from the same region in Bolivia. The word-class distribution of the Quechua word types is as follows: 7 exclamations, 25 nouns, 5 names, 1 verb. The most frequent Quechua word in the Spanish database is the noun condor (number of tokens = 73), a word which was already borrowed from Quechua (kuntur) long ago. We pay no further attention to this Quechua subset in the Spanish database because this subset is relatively small and not very interesting from a more structural point of view.

With respect to lexical richness, normally one may expect that the Spanish and Quechua databases are about equally rich in the lexical means used, despite the difference in the number of different word types. Measures of lexical richness relate in some way the number of types to the number of tokens. One of the better measures for lexical richness (cf.
Broeder, Extra, & van Hout, 1993; van Hout & Vermeer, 1989) is the number of types divided by the square root of the number of tokens (especially when the size of the sample is relatively small); a particularly bad measure would be the type/token ratio. Applying the square root measure, the Quechua corpus gives an outcome that is higher than the one obtained for the Spanish database. The Quechua corpus gives an outcome of 10.95 (908/√6870); the Spanish corpus gives an outcome of 9.91. However, this difference can most probably be ascribed to the high number of Spanish borrowings in the Quechua database, since the extensive borrowing has led to the availability of both Quechua and Spanish words for expressing similar concepts. The relatively high correspondence in lexical richness of the two databases illustrates, as could have been expected, that the two language corpora involved are quite equivalent.

A random distribution?

When we consider the set of borrowed items in a given recipient language sample, there is the obvious possibility that the distribution may be random with respect to word class. The words borrowed into a recipient language might be the outcome of a fully random process by which every element has an equal chance of being borrowed. In terms of the recipient language, one might say that each element has an equal chance of being Spanish. Despite the fact that this hypothesis seems to be trivial and uninteresting from a linguistic point of view, the rejection of this hypothesis must be approached from different angles. First, this hypothesis can be related to the number of types or the number of tokens occurring in Quechua. Second, the randomness of borrowing can be related to frequency distributions of the number of types and tokens typical of Spanish.

In order to test the random model for the recipient language, the distribution figures of Quechua and Spanish word types and tokens in the Quechua database are given in Table 6 for all word classes distinguished.
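The square-root richness measure mentioned above is simple to compute. The sketch below applies it to the Quechua figures reported in the text (908 types, 6,870 tokens); the function names are illustrative assumptions, and the plain type/token ratio is included only for contrast.

```python
import math


def root_type_token_ratio(n_types, n_tokens):
    """Number of types divided by the square root of the number of tokens."""
    return n_types / math.sqrt(n_tokens)


def type_token_ratio(n_types, n_tokens):
    """Plain type/token ratio, which the text regards as a poor measure."""
    return n_types / n_tokens


# Quechua database figures as reported in the running text.
quechua_richness = root_type_token_ratio(908, 6870)
quechua_ttr = type_token_ratio(908, 6870)

print(round(quechua_richness, 2))  # 10.95
```

The rounded result reproduces the 10.95 given in the text for the Quechua corpus.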
Table 6 also includes the figures for the Spanish lexical items in the Spanish database. If borrowability is fully random with respect to the linguistic properties of the recipient language, the distribution of Spanish words in the Quechua database should be independent of the word class a word belongs to. Given the figures in Table 6, it is not difficult to reject the randomness hypothesis. We can restrict ourselves to the data on the categories of nouns and verbs. The data both for types and tokens in the Quechua database are summarized in Table 7. Table 7 shows that Spanish verbs are strongly underrepresented in relation to nouns. Their distribution clearly contradicts the randomness hypothesis. This conclusion holds both for the number of types and the number of tokens.

TABLE 6. Types and tokens in the Quechua and Spanish databases by word class [table body not recoverable]

TABLE 7. Number of types and tokens in the Quechua database by language of origin and word class [table body not recoverable]

TABLE 8. Number of types and tokens of Spanish words in the Quechua and Spanish databases by word class [table body not recoverable]

The number of types and tokens of Spanish words in the Quechua and Spanish databases is given in Table 8. Again, the categories of nouns and verbs can be compared. Now the distribution over word classes of the Spanish words in the Quechua database has to be compared with the distribution in the Spanish database. The results are obvious.
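The randomness hypothesis amounts to statistical independence between word class and language of origin. One common way to check such a claim is a Pearson chi-square test on a 2x2 table; the text does not say which test, if any, the authors applied, so the sketch below is an assumption, and the counts are invented rather than taken from Tables 6-8.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected cell counts if rows and columns were independent.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Invented type counts: rows = nouns, verbs; columns = native, borrowed.
nouns_native, nouns_borrowed = 200, 200
verbs_native, verbs_borrowed = 200, 50

statistic = chi_square_2x2(nouns_native, nouns_borrowed,
                           verbs_native, verbs_borrowed)

# With one degree of freedom, 3.84 is the conventional 5% critical value.
reject_randomness = statistic > 3.84
```

For these invented counts, where half of the nouns but only a fifth of the verbs are borrowed, the statistic is 58.5, far above the critical value, so independence would be rejected.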
Both the number of verb types and the number of verb tokens are too low in the Quechua database. The result would become even more evident if the Spanish words in the categories of aux and cop were incorporated into the category of verbs, since aux occurs in the Spanish database only (with a high token frequency: n = 80) and cop has a high token frequency only in the Spanish database (n = 26; the token frequency of Spanish-based copulas in the Quechua database is far lower).

We have to conclude that the random model has insufficient explanatory power. It does not explain differences in the amount of borrowing between word classes in a satisfactory way. Given this result, we now discuss the way we studied the four possible sets of factors influencing borrowing: lexical content, frequency, structural coherence factors, and equivalence.

Lexical content

We tested the hypothesis that lexical items that potentially carry cultural meaning are primarily borrowed by distinguishing between clear content word types (verbs, nouns, names, exclamatives, adjectives, and adverbs, many of which are to some extent content-bound) and the remaining word types (including clear function words like pronouns, determiners, and quantifiers). We can test this hypothesis on the Spanish word types in the database. For each Spanish word type we know whether it can be classified as a borrowing in our database. The distinction between +content and -content word types can be made by using the morphosyntactic word-class information available in the database. Crosstabulating ±borrowing and ±content word gives the outcomes presented in Table 9.

We notice a strong effect in Table 9. The percentage of borrowings in the category of +content word types is 36.3 (338/930); the percentage in the category of -content word types is 19.2 (25/130). The distinction in content words is an interesting candidate to investigate further, especially in relation to other candidate factors. The effect of the content factor is strong, but
Cusstabulation of sborowing andthe Scontnt ord ction of the Spanish word types Foal om 381 Isalso a mater of degree. Even within the atgory of ~content words many borrowings were found, Frequency {A frequency mode claims thatthe chance or probability of a specifi word being borrowed is determined by the frequency ofthe word in question inthe donor language. An even stronger interpretation ofthe fequency mode can be formulated by adding the stipulation that not only is borowing deer- rine bythe Frequency Tacor, But sos the numb of occurreces af «Bot- owed word in the epient language (the numberof tokens). "The most natural interpretation of the Fequency factor soem to Bo take lnmpact ofthe frequency Factor we have to nvertigate whether the borrowed Spnish words havea relatively high frequency of occurrence Inthe Spanish database. The more obvious interpretation ofthe frequency factor may not, prevent us from facing recipient language version ofthe frequency Tato. In theory, two opposite paters are possible: ‘Word fequncy in he repent language operates aa pling ator Frequent trrds readily acep Borrowed competitors quent mors Yesst source language competes. “The data ase available is not set up in terms ofa replacement model @ replacement model would equtespeling out whic borrowed Spanish words ‘compete with and replace which Quechua words. We have took for indret ‘evidence if we want to investigate the recipient language interpretation ofthe frequency factor. Allin all, it can be concluded already that an extended imerpretation ofthe frequency model allows a ange of various posible pro- cesses going on in borrowing. And if Fequency plays a role on bot sides of the borrowing process, we may be faced with the following “paradox” Frequency operates as a promoting factor: requent Word in the source lan ung are beter candidate for bang borowed. Frequency operates as a ining or locking facto: reuent words inthe repens langune eit barowing competitors rom he sour language. 
The opposite operation of these two forces might imply that especially lexical items with an average frequency in both the recipient and source language are involved in processes of borrowing.

The type and token counts given earlier indicate a frequency pattern in the Quechua database with respect to the token frequency of Quechua versus Spanish word types. The Spanish words have a much lower token-type ratio. For instance, the Quechua verbs have a token-type ratio (= mean frequency) of 9.01 (1929/214), whereas the Spanish verbs have a token-type ratio of 2.9. Is this pattern consistent, in the sense that the higher the frequency of a word, the lower the probability that the word is Spanish?

In order to get a good impression of the frequency pattern, the word types in the Quechua database were split up into seven frequency classes. Next, the number of Quechua word types and the number of Spanish word types were counted. The outcomes are presented in Table 10, which shows a stepwise decrease of the share of Spanish word types as token frequency increases. The more frequent a word in the Quechua database, the smaller the chance that it is Spanish. This suggests indirectly that frequency in the recipient language may operate as an inhibiting factor. Direct evidence would involve an analysis based on a replacement approach, implying a detailed analysis of the Quechua and Spanish lexicon. We do not pursue this here.

TABLE 10. Token frequency of word types in the Quechua database in relation to the relative share of Spanish word types [table body not recoverable]

What is the role of word frequencies in relation to the source language? For any Spanish word type, we know its token frequency in the Spanish database. The relationship between frequency and borrowing can be investigated by computing the correlation between the token frequency in the Spanish database and whether a Spanish word type is being borrowed.
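A correlation between a frequency score and a yes/no borrowing indicator can be computed as an ordinary Pearson correlation with the indicator coded 0/1. The sketch below does this for raw token frequency and, because frequency counts are typically skewed, for a log-transformed score as well. The data are invented, and the offset added before taking the log is an assumed constant chosen only so that a frequency of zero can be transformed.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented data: token frequency of eight source-language word types and
# whether each type also occurs as a borrowing (1) or not (0).
frequency = [0, 1, 2, 3, 5, 8, 40, 300]
borrowed = [1, 0, 1, 1, 1, 0, 0, 0]

OFFSET = 0.5  # assumed constant; avoids taking the log of zero
log_frequency = [math.log(f + OFFSET) for f in frequency]

r_raw = pearson(frequency, borrowed)
r_log = pearson(log_frequency, borrowed)
```

With a 0/1 variable on one side this is the point-biserial correlation. In the invented data the most frequent types are not borrowed, so both coefficients come out negative; real data may of course behave differently on the raw and the log scale.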
Because word frequencies have a rather skewed distribution (with a small number of word types having a very high frequency), it is better to use a transformed frequency score as well, for instance, the log of the frequency. Because the log of zero does not exist, a value of .5 is added to all frequencies: log frequency = log(frequency + .5). The correlations between ±borrowing and the frequency and the log frequency can be found in Table 11.

TABLE 11. Correlations between word frequencies of Spanish words in the Spanish database and ±borrowing [table body not recoverable]

Table 11 gives rather low correlations for all Spanish word types together. The correlation coefficient is significant because of the high number of observations, but the relationship is weak. The log frequency even contradicts the correlation found for the raw frequency; the correlation coefficient is negative. This indicates that the frequency factor for the whole data set can be said to be rather minimal, if not absent altogether. This result is not surprising if one takes into account that highly frequent words are often function words, which are generally resistant to borrowing. This effect can be investigated by distinguishing between clear +content words (verbs, nouns, adverbs, and so on) and -content words, a distinction we used before. The correlations are given in Table 11.

This table shows that the distinction between +content and -content words makes a clear difference. The frequency factor operates as a pushing or favoring factor within the class of +content word types. There is no notable effect within the class of -content word types. We must draw the conclusion that frequency effects possibly play a role in the chance of a word being borrowed, but also that this frequency effect should be interpreted with care because its impact may depend on other factors.

Structural coherence factors

We explore the importance of structural factors from a number of perspectives: peripherality in the clause, case assignment, paradigmatic coherence, and inflection.
It is clear from a number of recent studies on language contact that words that play a peripheral role in sentence grammar, particularly the grammar of the recipient language (e.g., interjections, some types of adverbs, discourse markers, and even sentence coordination markers), are borrowed relatively easily. Note that this is the same class of elements that participates in emblematic switching, the type of phenomenon halfway between intersentential and intrasentential code switching (cf. Poplack, 1980). What this suggests is that switching and borrowing may to some extent be subject to the same type of constraints; both are difficult when the coherence of the clause is disturbed.

A related way to approach the same question is to see to what extent categories are directly implied in the organization of the sentence: for example, a verb is more crucial to that organization than a noun, and perhaps therefore it may be harder to borrow verbs than nouns. Thus, for a noun/verb asymmetry in borrowability, the principal explanation could lie in the different role that these categories play in the organization of the sentence. Nouns denote elements referred to, and verbs link the elements referred to to each other. In other words, nouns are inert as far as the syntactic makeup of the clause is concerned (-structure building). A limited subclass, action nominalizations, shares the property of allowing a complement with verbs. Nonetheless, this complement is never obligatory. Verbs are active in the syntax and form the nucleus of the clause.
More generally, complementizers, auxiliaries, copulas, and verbs play a role in structuring the clause, and pre/postpositions, determiners, and demonstratives help structure the argument constituents in the clause (+structure building). We take this feature to be comparable for Quechua and Spanish.

The central role of the verb is also reflected in its assigning different cases, which may be specific to that verb and idiosyncratic, to different elements in the sentence. This also inhibits their being taken from one system to another. Prepositions share this property with verbs, which may inhibit their being borrowed. On the other hand, prepositions are rarely inflected themselves. Nonetheless, they may not be frequently borrowed. Additional factors hindering their borrowability include the fact that often their meaning is grammaticalized (and hence language specific), that they are sometimes paradigmatically organized (systematically subdividing a semantic field such as space in a language-specific way), or that they themselves are part of the subcategorization of a verb or adjective (angry with, afraid of, wait for, attend on). This line of thinking would predict that elements such as transitive verbs and prepositions would be harder to borrow than, for example, [...]

In addition to these factors deriving from syntagmatic coherence (i.e., peripherality, structure building, case marking), there is also paradigmatic coherence. Paradigmatic coherence is the tightness of organization of a given subcategory. The pronoun system is tightly organized, and it is difficult to imagine English borrowing a new pronoun to create a second person dual in addition to second person singular and plural. For this reason, determiners, pronouns, demonstratives, and other paradigmatically organized words may be harder to borrow. Although, in principle, the paradigmatic cohesion of elements of the recipient language is the crucial one, a number of categories relevant here are simply absent in Quechua, so we took cohesion in either or both languages as the criterion.
Notice also that paradigmatic organization in the donor language may stand in the way of borrowing, because paradigmatically organized elements often have rather abstract, grammatical meanings which are not accessible independently of the subsystem to which they belong. Thus, Spanish este 'this' and ese 'that' may be hard to borrow due to two factors. First, their Quechua equivalents kay 'this' and chay 'that' also form a tight subsystem (to which it would be difficult to add new members). Second, the meaning of este is defined in opposition to that of ese (and of aquel 'yonder'), and thus the element is not quite independently transportable into another system. However, we were not able to separate these two factors.

TABLE 12. Coding of the structural coherence factors

Often the different elements in the clause are marked on the verb, which may be morphologically quite complex for this reason. Borrowing will imply morphological integration as well, and this is often a hindering factor. A separate dimension, then, will be inflection: agreement (subject/object/verb and adjective/noun agreement) and case affixes (cf. Moravcsik, 1978; a critical survey of the literature on borrowability of inflection is given in Thomason & Kaufman, 1988). We predicted that uninflected elements would be easier to borrow than inflected ones. In addition to the inflection of the donor language, we assumed that the inflection of the host language plays a role. It may well be easier to incorporate elements into the lexicon that do not have to become integrated morphologically as well.

The structural factors to be used in the analysis are summarized in Table 12. The factors are: peripherality, structure building, transitivity (trans), paradigmaticity (para), inflection donor, and inflection host. Moreover, Table 12 shows how the factors are coded for the different morphosyntactic classes. To test the trans factor, all verbs
were coded for the property of being transitive or not. We decided to code nouns as not inflected for the language pair involved and to code adjectives as inflected only in the donor language. In Quechua, nouns and adjectives can be affixed with case markers or postpositions, and, in addition, they can carry person markers. There is also a plural marker, particularly used when nouns are high on the animacy hierarchy. There is no adjective-noun agreement. Indeed, many researchers take nouns and adjectives to form one single category in Quechua (cf. Lefebvre & Muysken, 1988). In Spanish, nouns can be marked for number, and they are either feminine or masculine, something that is visible in their -a or -o ending. The reason we have taken nouns to be uninflected is that they need no inflection (contrary to verbs) and are interpreted in the language contact situation involved as uninflected (again, contrary to verbs). Spanish nouns are either borrowed as singulars (plata 'money') or as invariant plurals (aritis 'earrings', xabas 'lima beans') (cf. Albó, 1968). The end vowels marking gender are treated as part of the stem. Marking nouns as inflected would have obfuscated the large difference with verbs, which are always inflected.

TABLE 13. Cross-tabulation of ±borrowing and the structural factors

The results for the separate analyses on these factors are given in Table 13. The six factors were cross-tabulated with ±borrowing, and a chi-square was calculated to determine the strength of the dependency. The figures, the chi-square values, and the contingency coefficients in Table 13 make clear that particularly paradigmaticity and inflection in the donor language have a fairly strong effect taken by themselves. Structure building and inflection in the recipient language also have an effect, but apparently a less strong one, given the resulting contingency coefficients.
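The cross-tabulation step can be sketched for a single two-valued factor as follows. The sketch assumes a plain Pearson chi-square without continuity correction and the usual contingency coefficient C = sqrt(chi2 / (chi2 + N)); whether the original analysis applied a correction is not stated. The cell counts are hypothetical and are not taken from Table 13.

```python
import math


def chi_square_2x2(table) -> float:
    """Pearson chi-square (df = 1, no continuity correction) for a 2x2 table.

    table[i][j]: count for factor value i (-/+) and borrowing value j (-/+).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def contingency_coefficient(chi2: float, n: int) -> float:
    """Strength of association derived from chi-square and sample size."""
    return math.sqrt(chi2 / (chi2 + n))


# Hypothetical counts: rows = -factor / +factor, columns = -borrowing / +borrowing.
counts = [[10, 20], [30, 40]]
chi2 = chi_square_2x2(counts)
c = contingency_coefficient(chi2, sum(map(sum, counts)))
print(round(chi2, 3), round(c, 3))
```

Because C grows with chi-square for a fixed sample size, comparing contingency coefficients across factors, as the text does, gives a rough ranking of how strongly each factor is associated with borrowing.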
All significant effects have the expected direction: +structure building, +paradigmatic, +inflection donor, and +inflection host turn out to be inhibiting factors for borrowability.

TABLE 14. Cross-tabulation of ±borrowing and the ±equivalence distinction

Equivalence

The final factor we want to consider is equivalence between word classes. There is little precise information available about the universality of word classes (a classic study is Steele et al., 1981). Two points on which there is a moderate amount of consensus are: (1) there are verbs and nouns in all languages, and (2) there is no exact match between the minor categories in different languages. Well-documented problem areas in categorial equivalence include: full pronouns (West Germanic) versus clitics (Romance) (Kayne, 1975), auxiliaries (Modern English) versus main verbs (Old English, Dutch) (Lightfoot, 1979), predicate adjectives (Indo-European) versus stative verbs (e.g., Caribbean Creoles) (Winford, 1993), clauses (Indo-European) versus nominalizations (e.g., Turkish/Quechua) (Lefebvre & Muysken, 1988), and cases (e.g., Turkish, Finnish) versus adpositions (e.g., English) (van Riemsdijk, 198). All oppositions in this list correspond to notional equivalents with different categorial realizations. The list can be extended once we get into determiner systems. We ignore for the moment the issue of equivalence of morphosyntactic and morpholexical categories, such as case, conjugation class, and gender, although these are clearly relevant as well.

It is clear that a number of Spanish categories are either part of a larger class (auxiliaries) or expressed by affixes (complementizers and conjunctions, determiners, prepositions, etc., and possessive pronouns) in Quechua. That is, there is no equivalence for these categories.
The results for ±borrowing are given in Table 14.

Equivalence may seem to have some minor effect on borrowing; the chi-square value (.87, df = 1) and the contingency coefficient (.041), however, are not significant. This may perhaps be interpreted as implying that the lexical classes of a language are somewhat separate from the grammar of that same language.

The constraints model

Now that we have looked at the individual, separate contribution of factors of lexical content, frequency, structural dimensions of cohesion, and equivalence, we may wonder how these factors combine to determine the borrowability of a given source-language item. We have studied this through the statistical technique of logistic regression, a technique that matches the approach in variable rule analysis (cf. Hosmer & Lemeshow, 1989; Rietveld & van Hout, 1993). In this analysis, the dependent variable is the property of being borrowed or not, which is a binomial variable (0 indicating -borrowing and 1 indicating +borrowing). The question to be answered is: what is the chance of a lexical item in the source language being borrowed? The independent or criterion variables used are: ±content, log frequency (using the natural logarithm), ±transitivity, ±peripherality, ±structure building, ±paradigmaticity, ±inflection donor, ±inflection host, and ±equivalence. The two-valued variables are coded as 0 (-) and 1 (+).

TABLE 15. Results of logistic regression on the constraints model

In the regression analysis, the criterion variables (our factors) were evaluated step by step as to their possible contribution to the explanation of the ±borrowing variable. Beginning with the variable with the strongest impact, the analysis then successively evaluates the remaining factors. The final outcomes of the logistic regression analysis are given in Table 15.

The outcomes in the top half of Table 15 give information on a general level about the successfulness of the analysis.
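The kind of model described here can be sketched with a minimal logistic regression. This is an illustration only. It fits the coefficients by plain batch gradient ascent rather than with a statistical package, it does not implement the stepwise entry of variables, and the data rows (paradigmatic, inflection donor, log frequency, followed by the borrowing flag) are invented. The model chi-square is computed as the drop in -2 log likelihood relative to an intercept-only model.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, x) -> float:
    """Predicted probability of +borrowing; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))


def fit_logistic(rows, labels, lr=0.1, epochs=5000):
    """Maximize the log likelihood by batch gradient ascent."""
    beta = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for x, y in zip(rows, labels):
            err = y - predict(beta, x)
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def minus_two_log_likelihood(probs, labels) -> float:
    eps = 1e-12  # guards against log(0)
    return -2.0 * sum(
        y * math.log(max(p, eps)) + (1 - y) * math.log(max(1.0 - p, eps))
        for p, y in zip(probs, labels)
    )


# Hypothetical word types: ([paradigmatic, inflection donor, log frequency], borrowed)
DATA = [
    ([0, 0, 2.0], 1), ([0, 0, 1.0], 1), ([0, 0, 0.5], 0), ([0, 1, 2.0], 0),
    ([0, 1, 1.5], 1), ([1, 0, 2.5], 0), ([1, 0, 1.0], 0), ([1, 1, 3.0], 0),
    ([0, 0, 3.0], 1), ([0, 1, 0.5], 0), ([1, 0, 3.0], 1), ([0, 0, 1.5], 1),
]
rows = [x for x, _ in DATA]
labels = [y for _, y in DATA]

beta = fit_logistic(rows, labels)
base_rate = sum(labels) / len(labels)
null_dev = minus_two_log_likelihood([base_rate] * len(labels), labels)
model_dev = minus_two_log_likelihood([predict(beta, x) for x in rows], labels)
model_chi_square = null_dev - model_dev
print(round(model_chi_square, 3))
```

The fitted values in `beta` play the role of the B column of a regression table: a negative coefficient marks an inhibiting factor and a positive one a favoring factor.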
The significance of the model chi-square means that, given the variables entered in the analysis, the degree of explanatory force is acceptable, although the degree of explanation could have been much better, given the residual amount of chi-square under the heading of -2 log likelihood. The goodness of fit is acceptable as well, given its nonsignificance, which means that there is a reasonable match between the predicted probability values of being borrowed and the borrowings actually found.

The second part of Table 15 gives detailed information on the variables entered in the analysis. First, we can observe that only four variables were entered, which implies that the remaining variables do not improve the outcomes in a significant way. The values under B inform us about the strength of the factor. The significance of the individual variables is also evaluated. It turns out that the frequency factor (log frequency) has a probability value of just above .05. Should this variable be removed from the model? Given the fact that the frequency factor is measured very roughly because of the small size of the corpus, we can be satisfied with the result obtained.

TABLE 16. Ordering of the relevant constraints and the morphosyntactic word classes from easy to difficult to borrow

The B values show that paradigmaticity is the strongest structural factor in our model. The second strongest structural factor is inflection in the donor language. Frequency also has a (somewhat weaker) effect, and peripherality is the fourth variable entered. The factors of content, structure building, transitivity, inflection host, and equivalence were not entered.

The role of the factors can be made more understandable by ordering the morphosyntactic classes from more easily borrowed to more difficult. The order is based on the variables of paradigmaticity, inflection donor language, and peripherality, as well as the B values in the regression analysis. Table 16 contains the results.
As this table shows, the results are fairly plausible. Most easily borrowed are nominal heads without structure-building properties. Functional heads occur lower in this hierarchy. The most difficult categories to borrow consist of functional elements that are nominal in nature and form tightly organized subsystems. In addition, the frequency factor is operative in all categories distinguished; frequency favors borrowing quite generally. The one puzzling result of our analysis is that peripherality has the opposite effect from the one predicted for the differentiation of the class [n, name] and the class [adv, comp, con, el, ne, p]. Apparently nominal nuclei of constituents can be quite easily borrowed in spite of not being at all peripheral.

We can conclude that the constraints model, operating on the basis of a comparison between a donor language and a recipient language corpus, seems to be a promising way of studying the process of lexical borrowing. The results may be interpreted in such a way as to set up a new hierarchy of borrowability, which would simply result from classifying the individual categories in terms of their value for the factors inhibiting or furthering the borrowing process.

It should be stressed that the results we obtained for Spanish borrowings in Bolivian Quechua are not meant to be independent of this particular set of languages. In other language pairs, quite different factors may turn out to be operant, depending on sociolinguistic factors and different contrasting typological properties. The same holds for the particular factors chosen and the way they are applied to classify the borrowings. The ones we chose relate, in part, to particular properties of Spanish and Quechua and to our first impressions of which factors may have been steering the borrowing process.

1, att psa ed or cdf oma 1 Selagnage corpse by Pople al (fortunes mh ager.
We nat ramen teeta ele re ce tac ce ees Fs ot aRreateme cap tee thre Ee tcgrt mae Sos ‘Selena tae cate a rae st Maen iboeeiiees tees hme tee ere amc op cf Sune ta eos trae wtb aribe Wipe itera et open ena naw ‘Sesytlgeemr teers onemwnen i ee SUSAN ae et pe a= 5 wah we Sityiiat Moora tas owen “aac enn ating Tee mcr eens ol oaewen en ek SVitrietigme pan wo sony ene gas iia, agar tte ra i acta ret esta 2 FS ish vrs ha cecal wit tetra omay be sms non a ‘Tard tepe ce Amie nba ec appt ree Bene me A. Sm, hacen Tain enn: a ken Sen comin on cantante ce Dae sean Ca 2 ROELAND VAN HOUT AND PIETER MUYSKEN Monson hope Dosa cts Urea et Aenean ‘rower, Ex. Gan Hout, (1983), Richa a0 vary ne developing lexi Tern (el) Au meuniere Ye Fe ee theses Craig Lene 3631-28, ESE ido au Pov ee ond in Maa ote D8 Lento, 5. (98), Apple log repaon. NewYork: Wie. Kap 09) Pech ae Chae, Ml Pee, tere hee, (0 Maney Nomis in Qu Date ites, B90, Principe of cone ma Cambri: Cntr Unvy Press. uma (rb arate fa Ge ed), Uneta f aa arn Fg) ec Spa Aoi Ee ‘sit In: Dye ©, Kham a edbot of amen age a pe. Ben aonb ra oa op 5h) Somes ss seein Engi EN A, Lig plc ant. & Mi, C98. The slr angi proces of "a reign. Ln 2 108 mi Eve fv 0) Se to ea of bgt on en sel Bo taby 9) Ont ef wae ale, Langage nei 19-2, Stet D: Poison 0) Te ca of hts Ta La te Soin ond Changer Sele oe soccate wean sense in InP MMe Testament a mt fina ruse Coe “ibe 99 Fh apc iS: Te maton fea own. a, a ck ot Rens nc Sore See ‘Thomo ame (on Lae on ron rt as ‘ees (Oy Fe Duch ge tien Des Dat eon, Ui vane Ru ener, A. 98), Sontag alae Me ce a sia in vailona it) Scene mr ec ra ‘Went 199, anaes acon proses Te Hague: Mouse, ‘Weck U Meroe to, WC) npr onda tay age ‘hw Cami sh et), ret or hares nu a wali BO une inane, Tao te Arn pi! ‘waft D U9 Pram Cree gh cra, Amster esas
