Arnaud FOURNET
The primary goal of comparative linguistics is to classify the languages of the world, to sort
them out and to assign them to genetic families according to the existence, either attested or
hypothesized, of a more ancient idiom, of which they are the direct continuation. The
uninterrupted use of the languages throughout the generations of speakers may be attested or
supposed, according to whether it is based on historical data or on a credibly substantiated
hypothesis.
Another task of the field is to determine what these more ancient idioms were like.
These proto-languages can be pieced together according to the techniques and principles
developed by comparative and historical linguistics, as will be shown in the following
paragraphs.
As a general rule, the more ancient the proto-language from which a family derives, the more
languages that family comprises. Language families thus nest within one another, each being
included in a larger and older one. It is noteworthy that the same word, family, applies to all
levels of ancientness and inclusion without any hierarchy. This situation is very different from
the taxonomy of zoology, where each level has a specific term with hierarchical significance:
species, genus, order, phylum, etc.
Language families are generally shown as trees, each branch being the divergent
continuation of a given state of language. That state becomes increasingly recent as one goes
down the branches to an individual language and, conversely, increasingly ancient and
hypothetical as one goes up the branches to the starting point. In the following
paragraphs, we shall assess the meaning and accuracy of the branching representation.
In a way, typology and comparative linguistics have nearly the same goal. Both undertake a comparison and a
classification of languages. But they do so according to different criteria.
Typology compares languages, either partially or fully, each being stable and homogeneous
enough to be synchronically describable. As a general rule, typology does not take into
account known or supposed genetic relationships. When it is based on the synchronic
features of languages, observed at some point of their history, the classification is
typological. But it is genetic when it deals with the inherited features of languages, either
preserved or altered, and with the innovations carried out by each language. In this respect,
common innovative features are a very efficient key to establishing credible language
relationships. Preserved features follow a more random pattern, but only closely related
languages share the same innovations.
More than one explanation may account for shared features:
- mere coincidence, a reason that can never be excluded in the first place,
- universal features, shared by nearly all languages,
- onomatopoeia, when words are anchored to reality,
- borrowings from a common outside source,
- shared innovations, made by closely related languages, after they started drifting
apart,
- last but not least, inherited features from an originally common state of language.
Before speaking of inherited features, one must rule out the possibility that a weaker
explanation accounts for the data investigated. Moreover, it should be noted that inherited
features and common innovations are most often very hard to tell apart.
3. Linguistic Basics
Languages are said to be related, or to belong to the same family, when they are demonstrably
derivable from one and the same ancient language, either historically attested or hypothesized.
Cognates are the lexical units that exist in the languages of a genetic family and
present phonetic, semantic and grammatical features very likely to have been inherited from an
ancient idiom through continuous transmission. It should be noted, however, that a cognate in a
given family may ultimately prove to be a loanword at an earlier stage of that family.
The proto-language of a family is:
- in a strong sense, the state of language, supposedly synchronic, hypothesized at the
starting point of this family and acceptably matching the linguistic theories of typology,
phonology, semantics, etc.
- in a weak sense, primarily comparative, the collection of supposedly inherited
features.
These two meanings of proto-language do not describe the same linguistic reality, because
common features may be traced back to states of language of varying ancientness, or may
be shared innovations, independently developed in each language.
Similarly, proto-phoneme and proto-system refer to the phonemes and phonological system
hypothesized for the proto-language.
The languages originating in one proto-language, or mother-language, may be called
daughter-languages. The modern Romance languages and Latin are the archetype of the
relationship between daughter-languages and a mother-language.
An isogloss is the geographic limit between two (or more) forms embodying one specific
linguistic phenomenon, be it lexical, phonetic or grammatical.
When one starts studying a corpus of linguistic data, the first step is to compare, sort out and
organize the discovery of relevant facts. Once a significant number of potential cognates
have been spotted and after a number of putative correspondences have been hypothesized,
the analysis reaches a new stage called reconstruction. An etymon is posited for each set of
potential cognates.
The linguistic status of these etyma is rather difficult to determine:
- in a prudent and conservative approach, they may be held to be a kind of convenient
algebraic formulae that account for the fact that the cognates display recurrent
similarities of structure. This fictionalist approach is consistent with the weak sense of
proto-language.
- yet it is hard to resist the idea that these etyma should be held to be actual lexemes of
the proto-language. This point of view is consistent with the strong sense of
proto-language.
In fact, the term construction would be better than reconstruction, although the latter is
well established in historical linguistics. Reconstruction is a somewhat naively positivist word
that seems to imply that some obviously right solution will emerge from processing the
data. In fact, there is often more than one possible solution, among which a choice has
to be made. Most corpora of data do not rigidly predetermine one single solution for the
proto-language. For example, most work on Old Chinese has been done in the paradigm of
monosyllabism: Old Chinese is supposed to have had only C1vC2 syllables
with a large number of vowels. We consider this paradigm to be inadequate because the number of
vowels necessary to account for all dialects is unrealistic (it may exceed one hundred). We
think that this monosyllabic paradigm should be replaced by a polysyllabic paradigm with a
limited set of vowels. Another example is the laryngeal theory of Indo-European. It took about
a century before German comparatists finally accepted that it was the best hypothesis to
account for the Indo-European data.
To work with phonetic correspondences is the most orthodox way of practicing comparative
linguistics. But this orthodoxy does not exclude open-mindedness. It should always be borne
in mind that more than one paradigm or hypothesis may be an acceptable framework for
research.
If the linguistic status of etyma is to be changed from convenient algebraic formulae into
proto-lexemes, some issues about the proto-language must be solved:
- what phonological features are to be assigned to the proto-phoneme(s) embodied in
the correspondences?
- what should the whole proto-system look like?
- what was the syllabic structure of the proto-language?
- did the proto-language have the same morphology as the daughter languages?
As a rule, one should expect the proto-language to differ more or less notably from the
systems attested in the daughter-languages. These differences should be all the greater
the more ancient the proto-language is. In such a case, the manifold reshuffles of past
systems may have had far-reaching consequences.
The determination of a credible proto-system harnesses all the theoretical principles of
phonology and typology. A suggested proto-system that is not attested in at
least one present-day language, or that runs against some well-established principle,
would immediately be held to be dubious or outright unacceptable.
In general, as a starting point, the proto-phonemes are assigned some or all of the features
appearing in the phonemes of the cognates. This is the simplest solution, but the
proto-system made up of these proto-phonemes may prove to be very strange or inadequate. A
complete overhaul of the phonological features of the proto-system may then be necessary.
Such an analysis and assessment is included in the task of reconstructing the proto-
language.
Several hypotheses for the proto-system may account for the same initial data and this just
underlines the central position of the linguist in the process of reconstruction. The possibility
of comparing several rival hypotheses should not be held to be a kind of weakness. On
the contrary, it is one of the strongest indications that the process of reconstruction has
reached a high level of maturity and reliability. Indo-European is about the only family that
has reached this level so far.
Two types of reconstruction are to be distinguished:
- bottom-up, starting with data and working toward the reconstruction of the
proto-language,
- top-down, having a proto-language and making a new language fit into this model.
In the following paragraphs, we will show how these bottom-up and top-down
reconstructions differ.
The method that we are describing in this paper relies heavily on lexical data and much
less on grammatical or morphological structures. We believe that on a wide
macro-comparative scale lexical data are much more stable than usually thought and that
phonology and lexicology are more immune to theoretical fashions and crazes than grammar is,
so that lexical data are reliable enough for the purpose of looking for genetic relationships.
Several kinds of cognates can be distinguished:
- local cognates, typical of a given family, such as Germanic.
These cognates definitely authenticate the family but in a wider perspective they may be
of little use. English good and god are typical cognates of Germanic, but for the sake
of Indo-European comparison, they are much less useful.
- general cognates, typical of a large-scale family, such as Indo-European.
The major problem with general cognates is their rarity, because the search for
widespread and uncontroversial lexemes leaves very few candidates.
- false isolates, which have no local comparative basis but have distant cognates in a
super-family. False isolates have a very high value because they must be lexical
fossils and they are more trustworthy relics than general cognates.
Macro-comparison of local genetic families is pursued in a way that leads to a certain
number of contradictions. Usually the first step is to sort out local cognates and afterwards to
sort out general cognates among the local cognates of the different families. This is the way
many linguists think things should be done. The major drawback of this method is that the
number of surviving cognates drops very sharply when the process is carried out
three times or more. The feeling induced by this method is that potential cognates for
large-scale families must be very few. We believe this conclusion is wrong and this method is
inadequate.
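The compounding effect of successive selection can be made concrete with a small computation. This is only a sketch under assumed figures: the function name, the starting pool of 1000 candidates and the retention rate of 30% per round are hypothetical values chosen for illustration, not measurements from any family.

```python
def surviving(candidates, retention, rounds):
    """Candidates left after `rounds` successive filters.

    Assumption: every round keeps the same fraction `retention`
    of what the previous round left over.
    """
    return candidates * retention ** rounds


# Hypothetical figures: 1000 candidates, 30% kept at each level of comparison.
for rounds in range(4):
    print(rounds, "round(s):", round(surviving(1000, 0.3, rounds)), "candidates left")
```

Under these assumed figures, three rounds of selection leave fewer than 3% of the initial candidates, which illustrates why the stepwise method gives the impression that large-scale cognates are very few.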
We consider it absolutely necessary to compare individual languages, for several
reasons:
- languages are the primary facts to be classified, not the proto-languages,
- proto-languages are not data but hypotheses integrating a high level of work and
theoretical background,
- experience shows that two languages taken at random always share many more
potential cognates than their supposed proto-languages do. The process of selection
of general cognates is too harsh, as we have already noted,
- some false isolates in one family can be compared with false isolates or with local
cognates in another family.
For example, English brine (salty water), from Old English bryne, is a word with no etymology
inside Indo-European, but it is quite obvious that it must have a connection with Arabic baHr
'sea'. Brine is an adjective from IE *bh°Hr-nyos > Germanic *brûnyaz > OE bryne. Another
example in Uralic is Vogul nom-t 'a thought'. This word has not been included among Uralic
potential cognates, but it is another case of a false isolate. This radical nom also exists in
Chinese 念 niàn and means 'idea, thought, to think'.
A major revolution in the study of language happened when phonetics was put on an equal
footing with semantics at the beginning of the 19th century. Up to then, philosophy and most
related fields had only dealt with semantics. In the new approach, a linguistic unit is defined as
the assembly of three components:
- a phonological signifier, or sound in a more naive jargon,
- a semantic signified, or meaning,
- a syntactical class.
The analysis of signifiers relies on correspondences in order to guarantee an inherited
isomorphism. The analysis of signifieds is much less reliable. So far very little work has been
done on criteria to determine what maximum discrepancies in meaning are
acceptable, and semantic leeway is obviously far too lax. In this regard, it must be made
clear that with lax semantic criteria the probability of finding similar words may go as high
as 200 out of 1000 without exceeding what chance alone would produce. Conversely, it is
utopian to look for cognates with exactly identical meanings.
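The way lax semantic criteria inflate chance resemblances can be illustrated with a back-of-the-envelope computation. The sketch below rests on assumptions that are ours and not established figures: each comparison is treated as independent, the chance probability of a phonetic match per comparison (0.02) is hypothetical, and so is the number of meanings accepted as "close enough" (11), chosen only to land near the order of magnitude mentioned above.

```python
def chance_hit_rate(p_match, accepted_meanings):
    """Probability that a word finds at least one chance look-alike.

    Assumption: the word is compared with `accepted_meanings` candidate
    words, and each comparison independently matches with probability
    `p_match` by pure coincidence.
    """
    return 1.0 - (1.0 - p_match) ** accepted_meanings


# Hypothetical figures, for illustration only.
strict = chance_hit_rate(0.02, 1)   # only the identical meaning is accepted
lax = chance_hit_rate(0.02, 11)     # eleven "related" meanings are accepted

print(f"strict criteria: {strict * 1000:.0f} chance matches per 1000 words")
print(f"lax criteria:    {lax * 1000:.0f} chance matches per 1000 words")
```

With these assumed values, strict criteria yield about 20 chance matches per 1000 words, while lax criteria yield about 200. The point is not the figures themselves but the fact that each additional accepted meaning raises the chance level against which real cognates must be judged.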
Several kinds of acceptable changes of meaning can be identified:
- adding or removing semantemes,
For example 'young unmarried man' > 'young man' > 'man' > 'husband'. In such a
case, which is attested in Uralic, the shift may yield units with partly opposite meanings.
But the units still have a core meaning: an adult human male.
- extension to a new semantic-referential field,
This especially applies to concrete or abstract meanings.
- change in diathesis or verbal aspect,
For example, 'to look for' and 'to find', 'to hit' and 'to break', 'to give' and 'to accept', 'to
fly away' and 'to rob'. The change may occur without any morphological mark. These
changes are often related to some kind of slang.
- a salient feature,
For example, 'motley' > 'salmon', 'udder' > 'cattle'. This case is frequent with names of
plants and animals.
- functional and organic solidarity between process and agent,
For example, 'to rain' > 'water', 'to speak' > 'mouth', 'fire' > 'to cook', 'to give milk' >
'woman'. In this case, derivational morphology is involved.
- transfer of syntactical class,
For example, 'anaphoric pronoun' > 'to be', 'hand' > 'number five or ten'. These
transfers often have very unexpected meaning shifts but they can be studied in a
large-scale typological frame.
Furthermore, many words have very precise meanings that allow little change. For example,
it is doubtful that 'to squint' may arise as a variant of 'to look'. These two meanings do not
merge through a natural process of semanteme addition or removal. Another major risk is to
compare meanings through a very abstract semantic core. For example, it is doubtful that a
word meaning 'woman' may ever be used for 'man', because this would require the semantic
core 'human being', which is a late creation of modern Western culture.
The representation of families as trees has been criticized. It is necessary to assess what it
can bring and to avoid deluding oneself about its actual value. In particular, the split of
diverging speech communities and the areal diffusion of features are sometimes described
as though they were in sharp opposition. This is artificial and does not take into account real
linguistic changes. In fact there is no theoretical reason to reject the tree representation.
In most cases, the split of dialects begins with an incomplete diffusion of a particular
innovation in a given linguistic area. For example, Greek definitely is one of the closest
Indo-European relatives of Balto-Slavic and Indo-Iranian. They share the same verbal
organization, different from that shown in Italic and Celtic. Nevertheless Greek has kept the
old centum way of pronouncing the velars while Balto-Slavic and Indo-Iranian have modified it
into the satem way. It seems that the process of satemisation was first introduced by
Indo-Iranian, which is the most consistent in applying it. Indo-Iranian was partially followed by
Balto-Slavic, but Greek and, to a lesser extent, Armenian were never involved in this process.
Quite obviously the so-called wave theory and the tree representation do not conflict at all.
A useful tree representation requires determining the right set of typical innovations, shared
or not by the different branches, to be used for language subgrouping. The verbal organization
can be used to separate Italic and Celtic from Balto-Slavic, Indo-Iranian and Greek-Armenian.
Another branching will separate Balto-Slavic and Indo-Iranian from Greek-Armenian on
account of the satem process.
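The subgrouping just described can be sketched as a small procedure that splits the branches one feature at a time. This is only an illustration: the feature labels, the function name and the nested-list output format are our own conventions, and coding the verbal organization as a feature carried by the Greek-Armenian, Balto-Slavic and Indo-Iranian side is a simplifying assumption, since the text only says that this organization differs from that of Italic and Celtic.

```python
# Features per branch, coded from the example in the text (simplified;
# which side actually innovated is an assumption made for illustration).
FEATURES = {
    "Italic": set(),
    "Celtic": set(),
    "Greek-Armenian": {"verbal_organization"},
    "Balto-Slavic": {"verbal_organization", "satem"},
    "Indo-Iranian": {"verbal_organization", "satem"},
}


def subgroup(branches, ordered_features):
    """Split `branches` recursively, one feature at a time.

    Returns a nested list: branches lacking the feature first,
    branches sharing it second. A feature that does not divide
    the current group is skipped.
    """
    if not ordered_features or len(branches) <= 1:
        return sorted(branches)
    first, rest = ordered_features[0], ordered_features[1:]
    with_it = [b for b in branches if first in FEATURES[b]]
    without = [b for b in branches if first not in FEATURES[b]]
    if not with_it or not without:
        return subgroup(branches, rest)
    return [subgroup(without, rest), subgroup(with_it, rest)]


tree = subgroup(list(FEATURES), ["verbal_organization", "satem"])
print(tree)
```

The order in which the features are applied determines the shape of the tree, which is precisely why choosing "the right set of typical innovations" is the decisive step.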
The reader should be warned against the prejudice that the full diffusion of an innovation
presupposes a small speech community. Such a full diffusion requires longstanding contacts
between speakers, who may adopt the innovation one at a time and then pass it on to other
speakers. The social and psychological reasons that lead to the rejection or adoption of
innovations are issues that most often lie beyond the reach of linguistics.
One of the most difficult points is to distinguish innovations made by already separated
speech communities, at different times of their history, from innovations made within one and
the same speech community. The result may appear to be the same but the historical
processes are different and this bears upon the internal classification of families.
One frequently asked question is why and how a language changes. This question
remains hard to answer, especially if the process of language change is also held to be
gradual and unconscious. The truth is that this question is not worded in a workable way. To
think of changes in a language is like thinking about motion while standing still. A language,
and worse still a state of language, is a concept, an ideality (nearly Platonic). It makes the
description of languages easier. But changes are the very opposite of a state of language.
Descriptive linguistics has discarded all normative ideas, and the observation of native
speakers has shown that real language is highly heterogeneous both among and within
speakers. This built-in diversity in speakers is the basic reason why changes begin and
spread within a speech community observed at a larger scale. This is what sociolinguistics
has uncovered since the 1960s. In fact, languages do not change; changes make new
languages. Not all changes do so, but some lead to major splits in a previously uniform
community or to major reshuffles.
A synthesis of the most basic ideas of comparative linguistics could read as follows:
Some languages are related to one another. They form language families. They originate
through gradual changes in ancient languages, most often unattested. Their vocabularies
and grammars show remarkable similarities that exclude random coincidences.
Indo-European languages are the archetype of such a linguistic family.
Quite interestingly, such problems were never investigated in Antiquity. One has to wait until
the last millennium of human history to see the emergence of these ideas. Many European
people of the late Middle Ages intuitively recognized that languages scattered all over the
world had special relationships.
Dante Alighieri (1265-1321), the famous author of the Divina Commedia, is the first
European to assert that the Romance languages must be related and are the contemporary forms
of Latin. He classified the Romance languages into three branches according to the word for
yes, lingue di sì, lingue di oc, lingue di oïl, in De vulgari eloquentia (1305), a book written in Latin.
Roger Bacon (1214-1294) noticed that Modern Greek was the new form taken by the
dialects of Ancient Greek.
In the Middle Ages, the Jews of North Africa also were struck by the structural similarities of
Hebrew and Arabic, asserting that this likeness should be explained by the common origin of
these languages. The Jewish doctor Yehuda ibn Quraysh is known as the first to have
asserted this around the year 1000.
At the beginning of the 13th century, Gerald of Wales (Giraud de Cambrie) assumed that Breton,
Welsh and Cornish were the continuation of an older Celtic language spoken in Great Britain.
All these common sense remarks were made by native speakers who at their time had no
theory to account for the facts they had observed.
A word has to be said about Sir William Jones, who is often promoted, in English-speaking
countries, as the epoch-making creator of Indo-European comparative linguistics. In
1786, this man, who was then an English judge at the Supreme Court in Calcutta, made the
following statement in his address to the Royal Asiatic Society of Bengal:
"The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more
perfect than the Greek, more copious than the Latin, and more exquisitely refined than
either; yet bearing to both of them a stronger affinity both in the roots of verbs and in
the forms of grammar, than could possibly have been produced by accident; so strong
indeed that no philologer could examine them all three, without believing them to have
sprung from some common source, which perhaps, no longer exists. There is a similar
reason though not quite so forcible, for supposing that both the Gothick and the Celtick,
though blended with a different idiom, had the same origin with the Sanskrit; and the
old Persian might be added to the family."
It is very unclear from this text whether William Jones is referring to the old Scythian
theory or whether he is suggesting that the mother language is of prehistoric antiquity.
Moreover, he explicitly says that the Celtic and Gothic languages are blended, implying an
obsolete framework of medieval origin. What is worse is that he held Pahlavi, an Iranian
language, to be Semitic and he rejected the genetic relationship between Hindi and Sanskrit
because their grammars were too different. In fact many of his suggested comparisons are
terribly shaky at best. As far as we can see, the real significance of Jones in modern linguistics is
very low.
16. The invention of structuralism and modern comparative linguistics
Modern-day comparative linguistics maintains that the proof of genetic relationships lies in
the structural isomorphism of words and grammatical morphemes among related languages.
This method took some time before it became established. In the Middle Ages kabbalistic
speculation accepted a whole array of letter operations that "derived" words of one language
from other words, preferably from Hebrew. Afterwards, Turgot, a Frenchman who was
among the writers of the French Encyclopédie (1756), held that the etymology of words had to
rely on the morphemic structure of words. For example Britannic, to be analysed as Britann +
ic, could not be compared with Hebrew Baratanac, "land of tin", because the structure of
these two words was totally different. The German J.C. Adelung (1732-1806) also made
clear that a word such as German pack-en could never be compared with Greek ap-ago of
similar meaning. In modern terms, this translates as "words with historical significance
must have synchronically similar morphemic structures".
The realization that phonetic correspondences were crucial in determining cognates also
took some time. Some early precursors have only been recognized in recent years. Among
them, a Spanish churchman in South America, Felipe Salvador Gilij, in the year 1782 called
coherencia mayor the fact that Arawak languages display the forms shapa, dapa, yapa for
mountain and shema, dema, yema for tobacco. Rasmus Rask (1787-1832), a Dane, is the
first to have explicitly asserted that Icelandic, Latin and Greek words displayed such recurrent
phonetic properties that they should be held to be cognates for that particular reason.
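The notion of a recurrent correspondence can be illustrated with the Arawak forms quoted above. The following is a minimal sketch, not a description of how Gilij or Rask worked: the language labels A, B and C are placeholders (the text does not name the three languages), and the function names and the treatment of "sh" as a single initial sound are our own assumptions.

```python
from collections import Counter

# Forms quoted in the text; "A", "B", "C" are placeholder labels,
# since the three Arawak languages are not named.
COGNATE_SETS = {
    "mountain": {"A": "shapa", "B": "dapa", "C": "yapa"},
    "tobacco": {"A": "shema", "B": "dema", "C": "yema"},
}


def split_initial(form, digraphs=("sh",)):
    """Split a form into its initial sound and the remainder.

    Assumption: the digraph 'sh' is counted as one sound.
    """
    for digraph in digraphs:
        if form.startswith(digraph):
            return digraph, form[len(digraph):]
    return form[0], form[1:]


def initial_correspondences(cognate_sets, languages=("A", "B", "C")):
    """Count how often each tuple of initial sounds recurs across the sets."""
    counts = Counter()
    for forms in cognate_sets.values():
        initials = tuple(split_initial(forms[lang])[0] for lang in languages)
        counts[initials] += 1
    return counts


print(initial_correspondences(COGNATE_SETS))
```

Both words yield the same tuple of initials, sh : d : y, with identical remainders in the three languages. It is this recurrence across several words, rather than any single resemblance, that supports treating the forms as cognates.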
The principle of phonetic correspondences was firmly reaffirmed around 1870 by the
generation of German comparatists nicknamed the Young Grammarians (Neogrammarians).
This method has been challenged in recent years by Joseph Greenberg and his followers,
but it is quite clear that the alternative method advocated by these linguists has little chance
of convincing the majority of scientists.