Synopsis of Comparative Linguistics

Arnaud FOURNET
1. The goal of comparative linguistics
The primary goal of comparative linguistics is to classify the languages of the world, to sort
them out and to assign them to genetic families according to the existence, either attested or
hypothesized, of a more ancient idiom, of which they are the direct continuation. The
uninterrupted use of the languages throughout the generations of speakers may be attested or
supposed, according to whether it is based on historical data or on a credibly substantiated
hypothesis.
Another task of the field is to determine what these more ancient idioms were like.
These proto-languages can be pieced together according to the techniques and principles
developed by comparative and historical linguistics, as will be shown in the following
paragraphs.
As a general rule, language families comprise ever more languages as they derive
from an ever more ancient proto-language. Thus language families follow a rule of increasing
overlapping and inclusion. It is noteworthy that the same word, family, applies to all levels of
antiquity and overlapping without any hierarchy. This situation is very different from the
taxonomy of zoology, where each level has a specific term with hierarchical significance:
species, genus, order, phylum, etc.
Language families are generally shown as trees, each branch being the divergent
continuation of a given state of language, which is increasingly recent when one goes down
to an individual language along the branching and contrariwise increasingly ancient and
hypothetical when one goes up the branching to the starting point. In the following
paragraphs, we shall assess the meaning and accuracy of the branching representation.
2. Comparative linguistics and typology
In a way, these two fields nearly have the same goal. Both undertake a comparison and a
classification of languages. But they do so according to different criteria.
Typology compares languages, either partially or fully, each being stable and homogeneous
enough to be synchronically describable. As a general rule, typology does not take into
account known or supposed genetic relationships. When it is based on the synchronic
features of languages, observed at some point of their history, the classification is
typological. But it is genetic when it deals with the inherited features of languages, either
preserved or altered, and with the innovations carried out by each language. In this respect,
common innovative features are a very efficient key to establishing credible language
relationships. Preserved features follow a more random pattern, but only closely related
languages share the same innovations.
More than one explanation may account for shared features:
- mere coincidence, a reason that can never be excluded in the first place,
- universal features, shared by nearly all languages,
- onomatopoeia, when words are anchored to reality,
- borrowings made from a common outside source,
- shared innovations, made by closely related languages after they started drifting
apart,
- last but not least, features inherited from an originally common state of language.
Before speaking of inherited features, one must rule out the possibility that a weaker
explanation accounts for the data investigated. Moreover, it should be noted that inherited
features and common innovations are most often very hard to tell apart.
3. Linguistic Basics
Languages are said to be related, or to belong to the same family, when they are demonstrably
derivable from one and the same ancient language, either historically attested or hypothesized.
Cognates are the lexical units that exist in the languages of a genetic family and
present phonetic, semantic and grammatical features very likely to be inherited from an
ancient idiom through continuous transmission. It should be noted, however, that a cognate in
a given family may ultimately prove to be a loanword at an earlier stage of that family.
The proto-language of a family is :
- in a strong sense, the state of language, supposedly synchronic, hypothesized at the
starting point of this family and acceptably matching the linguistic theories of typology,
phonology, semantics, etc.
- in a weak sense, primarily comparative, the collection of supposedly inherited
features.
These two meanings of proto-language do not describe the same linguistic reality, because
common features may be traced back to states of language of varying ancientness, or may
be shared innovations, independently developed in each language.
Similarly, proto-phoneme and proto-system refer to the phonemes and phonological system
hypothesized for the proto-language.
The languages originating in one proto-language, or mother-language, may be called
daughter-languages. The modern Romance languages and Latin are the archetype of the
relationship between daughter-languages and mother-languages.
An isogloss is the geographic limit between two (or more) forms embodying one specific
linguistic phenomenon, be it lexical, phonetic or grammatical.
4. The meaning of phonetic correspondences
Comparative and historical linguistics are about linguistic change.

From an epistemological viewpoint, all scientific fields dealing with change have determined
a few principles of conservation of some features that enable the traceability of changes. For
example, the theory of chemical reactions is based on the conservation of mass and nature
of the chemical components involved in the reactions. By contrast, pre-scientific
alchemy accepted (and even sought) the transmutation of elements and ignored any
notion of mass. The concept of change is commonplace. What really matters in a theory of
changes is to identify parameters and features that are traceable.
Comparative and genetic linguistics also has its own explicitly traceable criteria. The most
important of these criteria is the structural isomorphism, both phonological and syllabic, of
the cognate words present in the related languages. The isomorphic criterion identifies the
cognates according to the possibility of comparing the structure of their pronunciation or, in
more technical jargon, of their phonological signifiers. It means that a given phoneme
or phonological feature in one language appears as a given phoneme or feature in the other
languages, and that this structural identity holds in a recurrent and predictable way in a large
number of cognate words, if not all words.
Phonemes and phonological features are said to form a phonetic correspondence when they
are involved in such a structural identity in the different components and segments of the
pronunciations of supposed cognates.
The fact that phonetic correspondences are recurrent and predictable is a major reason to
discard chance and coincidence as the most plausible explanation. It gives a substantiated
footing to a proto-system and hence to a proto-language, in the strong sense of the
word.
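The recurrence criterion can be made concrete with a minimal Python sketch. It is only an
illustration under stated assumptions: the six Latin/English word pairs are well-known textbook
examples that do not come from this paper, spelling stands in for a proper phonological
transcription, only word-initial segments are compared, and the helper names are hypothetical.

```python
from collections import Counter

# Illustrative data only (not from this paper): textbook Latin/English pairs.
# Spelling is used as a crude stand-in for phonological transcription.
PAIRS = [
    ("pater", "father"),
    ("piscis", "fish"),
    ("pes", "foot"),
    ("tres", "three"),
    ("tu", "thou"),
    ("tenuis", "thin"),
]


def initial(word):
    """Return a crude initial segment; the digraph 'th' counts as one unit."""
    return word[:2] if word.startswith("th") else word[:1]


def count_initial_correspondences(pairs):
    """Count how often each pair of initial segments co-occurs."""
    return Counter((initial(a), initial(b)) for a, b in pairs)


counts = count_initial_correspondences(PAIRS)

# Keep only the pairings that recur, i.e. that are attested at least twice.
recurrent = {pair: n for pair, n in counts.items() if n >= 2}
print(recurrent)
```

With these six pairs the sketch finds two recurrent initial pairings, p ~ f and t ~ th, each
attested three times. A real analysis would work on aligned phonological transcriptions and on
far more than six items.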
In such an approach, which we abide by, the principle of exceptionless changes, called
Ausnahmslosigkeit in German, simply makes changes traceable. During the XIXth century, this
principle was upheld as a kind of law, much in the way physics has laws and gravity is a law.
This is not necessary in linguistics and entails too much epistemological load,
especially some pointless organicist and biological analogies between languages and living
beings. Dialectology and sociology have also shown that the principle of exceptionless
changes is contradicted by some facts, but these are rather marginal. From a methodological
point of view, when nothing is known about the prehistory of languages, there is no
alternative but to accept the principle of phonetic isomorphism and exceptionless regularity of
sound changes.
5. Correspondences and proto-phonemes
A phonetic correspondence is neither a phoneme nor a proto-phoneme. It is the synchronic
trace and reminiscence of the previous existence of one or more proto-phonemes to be
determined. As a rule, the number of correspondences within a given family must be
expected to be higher than the actual number of phonemes in the proto-language. A certain
number of (false) correspondences are induced by widespread loanwords, which
must be spotted and removed because they unduly complicate and distort the proto-system.
In the general case, when one has little or no previous knowledge about the past of the
languages studied, it is hard to tell the right correspondences from the false ones, but this
sorting operation is crucial.
Positing proto-phonemes that leave no observable traces in the data is to be avoided.
In the first step of reconstruction, the best proto-system has the smallest number of
proto-phonemes necessary to account for the data. The proto-system may offer the potential for
some combinations of phonological features for which the data reveal no traces. In such a
case, it is impossible to determine whether these are empty slots in the system or
proto-phonemes that have disappeared altogether, leaving no trace of their past existence. Some
correspondences may not be phonetically obvious and may be overlooked in the analysis. In
that case, the proto-system will be incomplete.
A major point that must be emphasized is the methodological interdependence between the
cognates, the correspondences and the proto-system. None of these three notions makes
sense without the others. None preexists the others. They all acquire authentication and value
at the same time. For example, if we compare Basque, Kabyle, English and Hebrew, we find for the
word 'no': ez, ur, no, lo, and for the word 'on': az, ar, on, al. Is this pattern z~r~n~l due to
coincidence? Is it to be held to be a correspondence? If so, these words are cognates and
these languages should also be held to be related. These questions cannot be answered
one at a time.
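The pattern in this example can be tabulated with a minimal Python sketch. The forms are those
quoted above; the helper name and the crude definition of a vowel (the five letters a, e, i, o, u)
are assumptions made for the illustration only.

```python
VOWELS = set("aeiou")

# Forms quoted in the text, in the order Basque, Kabyle, English, Hebrew.
FORMS = {
    "no": ["ez", "ur", "no", "lo"],
    "on": ["az", "ar", "on", "al"],
}


def consonant_pattern(forms):
    """Return the consonants of each form, one string per language."""
    return tuple(
        "".join(ch for ch in form if ch not in VOWELS) for form in forms
    )


patterns = {gloss: consonant_pattern(forms) for gloss, forms in FORMS.items()}

# True when every gloss shows the same consonant pattern across the languages.
same_pattern = len(set(patterns.values())) == 1
print(patterns, same_pattern)
```

Both glosses yield the same pattern (z, r, n, l), which is exactly the recurrence the example
asks about. With only two items, the sketch cannot decide between coincidence and
correspondence: that decision depends on the cognates and on the proto-system at the same time.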
One should be warned against the methodological delusion that it might be
possible to proceed in a linear fashion from the cognates, facts allegedly heuristically
obvious, to the correspondences, facts presumed to be easily uncovered, and further up to
the proto-system, a final step that may entail some specific hardships. The epistemological
acceptability of these notions is neither sequential nor hierarchical. They are simultaneously
guaranteed or unacceptable. The linear description of the method is linked to the history of
linguistics in the XIXth century and to the Indo-European languages, for which Sanskrit offered a
straightforward framework, so that in this particular case comparatists had an immediate
guideline for sorting out right and wrong. In most other cases, there is no immediate
guideline and there is no such thing as an obvious cognate.
6. Comparison and reconstruction
When one starts studying a corpus of linguistic data, the first step is to compare, sort out and
organize the discovery of relevant facts. Once a significant number of potential cognates
have been spotted and after a number of putative correspondences have been hypothesized,
the analysis reaches a new stage called reconstruction. An etymon is posited for each set of
potential cognates.
The linguistic status of these etyma is rather difficult to determine:
- in a prudent and conservative approach, they may be held to be a kind of convenient
algebraic formulae that account for the fact that the cognates display recurrent
similarities of structure. This fictionalist approach is consistent with the weak sense of
the proto-language.
- it is hard to resist the idea that these etyma should be held to be actual lexemes of
the proto-language. This point of view is consistent with the strong sense of the
proto-language.
In fact, the term construction would be better than reconstruction, although the latter is
well established in historical linguistics. Reconstruction is a somewhat naively positivist word
that seems to imply that some obviously right solution will appear through processing the
data. In fact, there is often more than one possible solution, among which a choice has
to be made. Most corpora of data do not rigidly predetermine a single solution for the
proto-language. For example, most work on Old Chinese has been done within the paradigm of
monosyllabism: Old Chinese is supposed to have had only C1vC2 syllables
with a large number of vowels. We consider this paradigm to be inadequate because the number of
vowels necessary to account for all dialects is unrealistic (it may exceed one hundred). We
think that this monosyllabic paradigm should be replaced by a polysyllabic paradigm with a
limited set of vowels. Another example is the laryngeal theory of Indo-European. It took about
a century before German comparatists finally accepted that it was the best hypothesis to
account for the Indo-European data.
To work with phonetic correspondences is the most orthodox way of practicing comparative
linguistics. But this orthodoxy does not exclude open-mindedness. It should always be borne
in mind that more than one paradigm or hypothesis may be an acceptable framework for
research.
7. The quest for a phonological proto-system
If the linguistic status of etyma is to be changed from convenient algebraic formulae into
proto-lexemes, some issues about the proto-language must be solved:
- what phonological features are to be assigned to the proto-phoneme(s) embodied in
the correspondences ?
- what should the whole proto-system look like ?
- what was the syllabic structure of the proto-language ?
- did the proto-language have the same morphology as the daughter languages ?
As a rule, one should expect the proto-language to differ more or less notably from the
systems attested in the daughter-languages. These differences should be all the greater
as the proto-language is more ancient. In such a case, the manifold reshuffles of past
systems may have had tremendous consequences.
The determination of a credible proto-system harnesses all the theoretical principles of
phonology and typology. A suggested proto-system that is not attested in at
least one present-day language, or that runs against some well-established principles,
would immediately be held to be dubious or definitely unacceptable.
In general, as a starting point, the proto-phonemes are assigned some or all of the features
appearing in the phonemes of the cognates. This is the simplest solution, but the
proto-system made up of these proto-phonemes may prove to be very strange or inadequate. A
complete overhaul of the phonological features of the proto-system may then be necessary.
Such analysis and assessment are part of the task of reconstructing the
proto-language.
Several hypotheses for the proto-system may account for the same initial data, and this
underlines the central position of the linguist in the process of reconstruction. The possibility
of comparing several rival hypotheses should not be held to be a kind of weakness. On
the contrary, it is one of the strongest indications that the process of reconstruction has
reached a high level of maturity and reliability. Indo-European is about the only family that
has reached this level so far.
Two types of reconstruction are to be distinguished :
- bottom-up, starting with data and working toward the reconstruction of the proto-
language,
- top-down, having a proto-language and making a new language fit into this model.
In the following paragraphs, we will show how bottom-up and top-down
reconstructions differ.
8. The process of bottom-up reconstruction
The bottom-up reconstruction includes these stages :

Data collection
- Determine a set of languages felt to be potentially related
- Collect data, as well described as possible
Hypothesis construction
- Compare and sort out the data
- Determine a set of potential cognates
- Determine a set of potential correspondences
- Look for a possible phonological proto-system
Cross examination
- Determine a plausible morphological and syllabic structure
- Check and reanalyse the whole set of data
- Keep acceptable cognates and remove the dubious cases
- Confirm the correspondences and sound changes
- Check apparently empty spots in the proto-system
Assessment
- Confirm phonological units and features of the proto-system
- Assess the phonological and typological plausibility of the proto-system
Correction
- Reformulate the syllabic structure and proto-system, if necessary
- Rewrite the sound changes, if necessary
So far, very few linguistic families have progressed beyond the first two stages.
Indo-European, in spite of its flaws, is the only family to have reached the stage of assessment
and to work on the issue of correction. Uralic is still in the middle of the stage of building a
proto-language hypothesis.
9. The difficulties of bottom-up reconstruction
A process of bottom-up reconstruction may stumble over the following difficulties:
- X-ocentrism, i.e. giving too much weight to a particular language.
In this respect, the notion of conservative language or archaism is often circular.
Objectivity in this field is very difficult to maintain. The risk of producing reasoning
in which the conclusion is already hidden in the premises is very high.
- Entelechy, i.e. attributing to the proto-language many features of the daughter-languages
and creating a kind of linguistic model with an unfathomable lifetime.
This may originate in some (unconscious) difficulty in accepting change. The
proto-language is then created in a reassuring but implausible conformity with the
daughter-languages. Paradoxically, a proto-system is all the more plausible as it is slightly at
odds with the modern languages, because these partial dissimilarities have been
introduced by waves of innovations in a certain number of features in the
daughter-languages.
- Extralinguistic prejudices about what (and where) the proto-language should be.
- Isolationism, that is, the temptation to explain everything within a family.
This leads one to compare data with minor incoherences, which time and chance have made
look alike, although outside comparison definitely shows that these data had no real
connection in the first place.
In practice, it is necessary to determine among the presumably related languages those
which seem to display the best phonetic stability and the best conservation of syllabic frames
of the words. Such languages have a high objective value. In a given family, some languages
better maintain initials or finals, consonants or vowels, stress or pitch. It is necessary to
assess what each language can bring to the process of reconstruction. Some languages may
shed light on some points, whereas others give no information at all.
10. Top-down reconstruction
A typical instance of top-down reconstruction is the integration of a new language into a
well-known family. Because a proto-system and a set of cognates are already at hand, the issue
of relationship hinges on the reflexes in the new language of the proto-phonemes
supposedly existing at an ancient stage.
The new language (or the new family) may integrate without trouble, in which case it will not
force any change. Otherwise, it may lead to some modifications, for example by establishing
new or overlooked correspondences. It may also lend more weight to one hypothesis
against another.
The hypothesis that all languages in the world may well be related has gained new ground
since genetics started yielding unexpected results about the human genome. The hypothesis of
monogenesis has gained momentum, and for linguistics the issue is the coherence of the
different proto-languages available so far. As they have been reconstructed independently, it
is no wonder that in most cases they display rather little coherence. This is the drawback of
the bottom-up approach: proto-languages may be reconstructed in a way that makes global
comparison more difficult than it should be. Sometimes proto-languages look more different
from one another than the languages themselves. This means that some proto-languages
reconstructed in the bottom-up approach have to be seriously amended to fit better into the
top-down macro-comparison.
11. The cognates
The method that we are describing in this paper relies heavily on lexical data and much
less on grammatical or morphological structures. We believe that on a wide
macro-comparative scale lexical data are much more stable than usually thought and that
phonology and lexicology are more immune to theoretical fashions and crazes than grammar is,
so that lexical data are reliable enough for the purpose of looking for genetic relationships.
Several kinds of cognates can be distinguished:
- local cognates, typical of a given family, such as Germanic.
These cognates definitely authenticate the family, but in a wider outlook they may be
of little use. English good and god are typical cognates of Germanic, but for the sake
of Indo-European comparison they are much less useful.
- general cognates, typical of a large-scale family, such as Indo-European.
The major problem with general cognates is their rarity, because the search for
widespread and uncontroversial lexemes leaves very few candidates.
- false isolates, which have no local comparative basis but have distant cognates in a
super-family. False isolates have a very high value because they must be lexical
fossils, and they are more trustworthy relics than general cognates.
Macro-comparison of local genetic families is pursued in a way that leads to a certain
number of contradictions. Usually the first step is to sort out local cognates and afterwards to
sort out general cognates among the local cognates of the different families. This is the way
many linguists think things should be done. The major drawback of this method is that the
number of surviving cognates goes down at a very sharp rate when the process is carried out at
least three times. The feeling induced by this method is that potential cognates for
large-scale families must be very few. We believe this conclusion is wrong and this method is
inadequate.
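The attrition produced by repeated filtering can be sketched with simple arithmetic. The
figures are purely hypothetical, since no retention rates are given here: the sketch assumes
that each round of selection keeps a fixed fraction of the candidates that entered it. The
function name and its parameters are illustrative only.

```python
def surviving_candidates(initial, retention, rounds):
    """Number of candidates left after repeated filtering.

    initial   -- hypothetical number of potential cognates at the start
    retention -- assumed fraction kept at each round (0 < retention <= 1)
    rounds    -- number of successive filtering rounds
    """
    if not 0 < retention <= 1:
        raise ValueError("retention must be in (0, 1]")
    return initial * retention ** rounds


# Hypothetical figures: 1000 candidates, one in five kept at each round.
for rounds in range(4):
    print(rounds, round(surviving_candidates(1000, 0.2, rounds), 1))
```

Under these assumed figures, three rounds leave about 8 of the original 1000 candidates. The
decline is geometric, which is why the count collapses so quickly when the selection is
repeated level after level.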
We consider it absolutely necessary to compare individual languages, for several
reasons:
- languages are the primary facts to be classified, not the proto-languages,
- proto-languages are not data but hypotheses integrating a high level of work and
theoretical background,
- experience shows that two languages taken at random always share many more
potential cognates than their supposed proto-languages do. The process of selection
of general cognates is too harsh, as we have already noted,
- some false isolates in one family can be compared with false isolates or with local
cognates in another family.
For example, English brine (salty water), from Old English bryne, is a word with no etymology
inside Indo-European, but it is quite obvious that it must have a connection with Arabic baHr
'sea'. Brine is an adjective from IE *bh°Hr-nyos > Germanic *brûnyaz > OE bryne. Another
example, in Uralic, is Vogul nom-t 'a thought'. This word has not been included among the Uralic
potential cognates, but it is another case of a false isolate. This radical nom also exists in
Chinese 念 niàn and means 'idea, thought, to think'.
12. Semantic traceability
A major revolution in the study of language happened when phonetics was given an equal
footing with semantics at the beginning of the XIXth century. Up to then, philosophy and most
related fields had dealt only with semantics. In the new approach, a linguistic unit is defined as
the assembly of three components:
- a phonological signifier, or sound in a more naive jargon,
- a semantic signified, or meaning,
- a syntactical class.
The analysis of signifiers relies on correspondences in order to guarantee an inherited
isomorphism. The analysis of signifieds is much less reliable. So far very little work has been
done on criteria to determine what maximum discrepancies in meaning are
acceptable, and semantic leeway is obviously far too lax. In this regard, it must be made
clear that with lax semantic criteria the number of similar words found may go as high as 200
out of 1000 without exceeding what chance alone would yield. On the other hand, it is utopian
to look for cognates with exactly identical meanings.
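The effect of semantic leeway on chance resemblances can be sketched as follows. This is a
back-of-the-envelope model, not the calculation behind the figure quoted above: it assumes that
each item on the list is compared with a fixed number of semantically acceptable candidates,
that each comparison has the same small probability of producing a chance look-alike, and that
the comparisons are independent. All numbers and names are hypothetical.

```python
def expected_chance_matches(n_items, p_match, n_candidates):
    """Expected number of items with at least one chance look-alike.

    n_items      -- size of the word list being compared
    p_match      -- assumed probability that two unrelated words look alike
    n_candidates -- number of meanings accepted as "close enough" per item
    """
    # Probability that at least one of the candidates resembles the item.
    p_at_least_one = 1 - (1 - p_match) ** n_candidates
    return n_items * p_at_least_one


# Hypothetical figures: 1000 items, 2% chance resemblance per comparison.
strict = expected_chance_matches(1000, 0.02, 1)   # exact meaning only
lax = expected_chance_matches(1000, 0.02, 11)     # eleven acceptable meanings
print(round(strict), round(lax))
```

With these assumed figures, strict one-to-one matching gives about 20 chance look-alikes per
1000 items, whereas accepting eleven candidate meanings per item gives about 199, the order of
magnitude mentioned in the text. The point of the sketch is only that the chance level rises
steeply with every additional meaning that is accepted as a match.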
Several kinds of acceptable changes of meaning can be identified:
- adding or removing semantemes.
For example 'young unmarried man' > 'young man' > 'man' > 'husband'. In such a
case, which exists in Uralic, the shift may yield units with somewhat opposite meanings.
But the units still have a core meaning: an adult human male.
- extension to a new semantic-referential field.
This especially applies to concrete or abstract meanings.
- change in diathesis or verbal aspect.
For example, 'to look for' and 'to find', 'to hit' and 'to break', 'to give' and 'to accept', 'to
fly away' and 'to rob'. The change may occur without any morphological mark. These
changes are often related to some kind of slang.
- a salient feature.
For example, 'motley' > 'salmon', 'udder' > 'cattle'. This case is frequent with names of
plants and animals.
- functional and organic solidarity between process and agent.
For example, 'to rain' > 'water', 'to speak' > 'mouth', 'fire' > 'to cook', 'to give milk' >
'woman'. In this case, derivational morphology is involved.
- transfer of syntactical class.
For example, 'anaphoric pronoun' > 'to be', 'hand' > 'number five or ten'. These
transfers often involve very unexpected meaning shifts, but they can be studied in a
large-scale typological frame.
Furthermore, many words have very precise meanings that allow little change. For example,
it is dubious that 'to squint' may appear as a variation of 'to look'. These two meanings do not
fuse through a natural process of semanteme addition or removal. Another major risk is to
compare meanings through a very abstract semantic core. For example, it is dubious that a
word meaning 'woman' may ever be used for 'man', because it would require the semantic
core 'human being', which is a late creation of modern western culture.
13. The tree or Stammbaum representation
The representation of families as trees has been criticized. It is necessary to assess what it
can bring and to avoid deluding oneself about its actual value. In particular, the split of
diverging speech communities and the areal diffusion of features are sometimes described
as though they were in sharp opposition. This is artificial and does not take into account real
linguistic changes. In fact there is no theoretical reason to reject the tree representation.
In most cases, the split of dialects begins with the incomplete diffusion of a particular
innovation in a given linguistic area. For example, Greek definitely is one of the closest
Indo-European relatives of Balto-Slavic and Indo-Iranian. They share the same verbal
organization, different from that shown in Italic and Celtic. Nevertheless Greek has kept the
old centum way of pronouncing the velars, while Balto-Slavic and Indo-Iranian have modified it
into the satem way. It seems that the process of satemisation was first introduced by
Indo-Iranian, which is the most consistent in applying it. Indo-Iranian was partially followed by
Balto-Slavic, but Greek and, to a lesser extent, Armenian were never involved in this process.
Quite obviously the so-called wave theory and the tree representation do not conflict at all.
A useful tree representation requires determining the right set of typical innovations, shared
or not by the different branches, to be used for language subgrouping. The verb organization
can be used to separate Italic and Celtic from Balto-Slavic, Indo-Iranian and Greek-Armenian.
Another branching will separate Balto-Slavic and Indo-Iranian from Greek-Armenian on
account of the satem process.
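The subgrouping just described can be written down as a minimal Python sketch. The feature
values follow the text; the feature names, the table layout and the recursive helper are
assumptions made for the illustration, and the order in which the features are applied is a
choice of the analyst, not something the procedure discovers.

```python
# Feature table following the text. The key names are hypothetical labels.
# "shared_verb_organization": the verbal organization common to Greek,
# Balto-Slavic and Indo-Iranian; "satem": involvement in satemisation.
BRANCHES = {
    "Italic": {"shared_verb_organization": False, "satem": False},
    "Celtic": {"shared_verb_organization": False, "satem": False},
    "Greek-Armenian": {"shared_verb_organization": True, "satem": False},
    "Balto-Slavic": {"shared_verb_organization": True, "satem": True},
    "Indo-Iranian": {"shared_verb_organization": True, "satem": True},
}


def subgroup(names, features, table):
    """Split names by each feature in turn; return a nested list (a tree)."""
    if not features or len(names) <= 1:
        return sorted(names)
    first, rest = features[0], features[1:]
    having = [n for n in names if table[n][first]]
    lacking = [n for n in names if not table[n][first]]
    if not having or not lacking:
        # The feature does not discriminate within this group: skip it.
        return subgroup(names, rest, table)
    return [subgroup(lacking, rest, table), subgroup(having, rest, table)]


tree = subgroup(list(BRANCHES), ["shared_verb_organization", "satem"], BRANCHES)
print(tree)
```

The result groups Italic with Celtic on one side and, on the other, sets Greek-Armenian apart
from the pair Balto-Slavic and Indo-Iranian. Reversing the order of the two features would
produce a different nesting, which is one way of seeing why the choice of diagnostic
innovations matters.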
14. The issue of language change
The reader should be warned against the prejudice that the full diffusion of an innovation
presupposes a small speech community. Such a full diffusion requires longstanding contacts
between speakers, who may adopt the innovation one at a time and then pass it on to other
speakers. The social and psychological reasons that lead to the rejection or adoption of
innovations are issues that most often lie beyond the reach of linguistics.
One of the most difficult points is to distinguish innovations made by already separated
speech communities, at different times of their history, from innovations made within one and
the same speech community. The result may appear to be the same but the historical
processes are different and this bears upon the inner classifications of families.
One often-asked question is why and how a language changes. This question
remains hard to answer, especially if the process of language change is also held to be
gradual and unconscious. The truth is that this question is not worded in a workable way. To
think of changes in a language is like thinking about motion when one is at a standstill. A
language, and worse still a state of language, is a concept, an ideality (nearly Platonic). It
makes the description of languages easier. But changes are the very opposite of a state of
language. Descriptive linguistics has discarded all normative ideas, and the observation of native
speakers has shown that real language is highly heterogeneous both among and within
speakers. This built-in diversity in speakers is the basic reason why changes begin and
spread within a speech community observed at a larger scale. This is what sociolinguistics
has uncovered since the 1960s. In fact, languages do not change; changes make new
languages. Not all changes do so, but some lead to major splits in a previously uniform
community or to major reshuffles.
15. A short history of comparative linguistics
A synthesis of the most basic ideas of comparative linguistics could read as follows:
Some languages are related to one another. They form language families. They originate
through gradual changes in ancient languages, most often unattested. Their vocabularies
and grammars show remarkable similarities that exclude random coincidences. The
Indo-European languages are the archetype of such a linguistic family.
Quite interestingly, such problems were never investigated in Antiquity. One has to wait until
the last millennium of human history to see the emergence of these ideas. Many European
people of the late Middle Ages intuitively recognized that languages scattered all over the
world had special relationships.
Dante Alighieri (1265-1321), the famous author of the Divina Commedia, is the first
European to assert that the Romance languages must be related and are the contemporary forms
of Latin. In his book De vulgari eloquentia (1305), written in Latin, he classified the Romance
languages into three branches according to the word for yes: lingue di si, lingue di oc,
lingue di oil.
Roger Bacon (1214-1294) noticed that Modern Greek was the new form taken by the
dialects of Ancient Greek.
In the Middle Ages, the Jews of North Africa also were struck by the structural similarities of
Hebrew and Arabic, asserting that this likeness should be explained by the common origin of
these languages. The Jewish doctor Yehuda ibn Quraysh is known as the first to have
asserted this around the year 1000.
At the beginning of the XIIIth century, Gerald of Wales (Giraldus Cambrensis) assumed that
Breton, Welsh and Cornish were the continuation of an older Celtic language spoken in Great Britain.
All these common sense remarks were made by native speakers who at their time had no
theory to account for the facts they had observed.
Three different approaches were pursued to explain the origin of languages:
- one held that languages were blendings of older languages,
- one held that all languages originated in Hebrew,
- another one held that the mother tongue was a particular language, e.g. Dutch.
It took some time before a prehistoric mother tongue was suggested. Probably because
Christian doctrine excluded evolution and considered everything to have been created once
and for all, Renaissance thinkers generally explained language change through a
process of blending. Italian was supposed to be Latin mingled with Lombardic (a
Germanic language) and French was held to be Gaulish mixed with Latin.
In the same period, other thinkers started deriving the words of one language from those of
another through intricate processes of letter permutation and substitution. In 1606 Estienne
Guichard wrote an archetypal book, L'Harmonie étymologique des langues, in which all the
words of the known languages are supposed to derive from three-letter roots taken from
Hebrew. Languages such as South American Arawak were "explained" with such letter games.
In Christian Europe, Hebrew was quite logically held to be the "mother" of all other
languages. A typical example is Guillaume Postel (1510-1581), one of the most learned
Frenchmen of his time, who wrote that Arabic, Sanskrit and Greek had their source in
Hebrew, presumably the language that Noah had bequeathed unto mankind.
Another approach, with chauvinistic rather than religious purposes, put forward a
contemporary language in place of Hebrew. This reached a high level of absurdity when
some Dutchmen tried to derive every other language from the Antwerp dialect of Dutch.
This is known as Goropianism, after Jan van Gorp (Johannes Goropius Becanus).
All this research was based on the written form of languages instead of the actual
phonetics of spoken languages, and it never assumed that a given language could
originate in an unknown language of prehistoric age. Nevertheless, all this
intellectual agitation opened the way that ultimately led to Indo-European comparative work.
The major event, with a decisive influence on the development of comparative linguistics,
was the encounter of Europeans with Sanskrit in India in the XVIth century. The
striking resemblance between Sanskrit, Latin and Greek was first noticed as early as 1583
by an English Jesuit, Thomas Stephens, who lived in India from 1579 to 1619. Even people
with more worldly interests, like the Italian merchant Filippo Sassetti in 1585, were struck by
the apparent familiarity of Sanskrit. A lot of work was carried out, especially in the
Netherlands by Marcus Boxhorn (1640) and in France by Claude de Saumaise (1643), on the
comparison of the Indo-European languages, which had not yet received this name, foremost
Sanskrit, Greek, Latin, Persian and the Germanic languages. The obvious similarities of these
languages were explained in the framework of a "Scythian" origin, sometimes also labelled
"Japhetic". The well-known Scythians, a historical people speaking a language of the Iranian
branch of Indo-European, were then considered to have spread all over Eurasia, and their
language to have ramified into the many modern languages. It was not until the middle of the
XIXth century and the triumph of evolutionism that it became established that none of the
Indo-European languages should be held to be the mother of all the others. The "Scythian"
theory was discarded and the original proto-Indo-European language was considered
prehistoric and unattested, as we still believe today.
So the history of comparative linguistics can be roughly depicted in this way :
Before the Renaissance, very little work was done, although some thinkers had penetrating
intuitions about potential linguistic relationships. With the cultural encounter with Sanskrit,
Europe, at that time especially France and the Netherlands, was struck by the remarkable
similarity of this language to Latin and Greek. This brought forth the theory of the "Scythian"
origin of all these languages, which lasted from about 1650 to about 1850. Afterwards, the
Germans, Franz Bopp, Karl Brugmann and others, gave Indo-European studies the impulse and
theoretical bases that we still know today, i.e. many languages spoken in Eurasia originate in
a lost prehistoric language called proto-Indo-European.
A word has to be said about Sir William Jones, who is often promoted in English-speaking
countries as the epoch-making creator of Indo-European comparative linguistics. In
1786, this man, who was then a judge of the Supreme Court in Calcutta, declared in his
address to the Asiatic Society of Bengal :
"The Sanskrit language, whatever be its antiquity, is of a wonderful structure; more
perfect than the Greek, more copious than the Latin, and more exquisitely refined than
either; yet bearing to both of them a stronger affinity both in the roots of verbs and in
the forms of grammar, than could possibly have been produced by accident; so strong
indeed that no philologer could examine them all three, without believing them to have
sprung from some common source, which perhaps, no longer exists. There is a similar
reason though not quite so forcible, for supposing that both the Gothick and the Celtick,
though blended with a different idiom, had the same origin with the Sanskrit; and the
old Persian might be added to the family."
It is unclear from this text whether William Jones is referring to the old Scythian
theory or suggesting that the mother language is of prehistoric age.
Moreover, he explicitly says that the Celtic and Gothic languages are blended, which implies
an obsolete framework of medieval origin. What is worse, he held Pahlavi, an Iranian
language, to be Semitic, and he rejected the genetic relationship between Hindi and Sanskrit
because their grammars were too different. In fact, many of his suggested comparisons are
shaky at best. As far as we can see, the real significance of Jones for modern linguistics is
very low.
16. The invention of structuralism and modern comparative linguistics
Modern-day comparative linguistics maintains that the proof of genetic relationships lies in
the structural isomorphism of words and grammatical morphemes among related languages.
It took some time before this method became established. In the Middle Ages, kabbalistic
speculation accepted a whole array of letter operations that "derived" words of one language
from other words, preferably from Hebrew. Later, Turgot, a Frenchman who was
among the contributors to the French Encyclopédie (1756), held that the etymology of words
had to rely on their morphemic structure. For example, Britannic, to be analysed as Britann +
ic, could not be compared with Hebraic Baratanac, "land of tin", because the structure of
these two words was totally different. The German J.C. Adelung (1732-1806) also made
clear that a word such as German pack-en could never be compared with Greek ap-ago of
similar meaning. In modern jargon, this translates as "words with historical significance
must have synchronically similar morphemic structures".
The realization that phonetic correspondences were crucial in determining cognates also
took some time. Some early precursors have only been recognized in recent years. Among
them, an Italian Jesuit missionary in South America, Filippo Salvatore Gilij (Felipe Salvador
Gilij in Spanish), in the year 1782 called coherencia mayor the fact that Arawak languages
display the forms shapa, dapa, yapa for mountain and shema, dema, yema for tobacco.
Rasmus Rask (1787-1832), a Dane, is the first to have explicitly asserted that Icelandic,
Latin and Greek words displayed such recurrent phonetic properties that they should be
held to be cognates for that particular reason.
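The recurrence observed in the forms for mountain and tobacco can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a method described in the text: the language labels A, B and C are placeholders, since the three languages are not named here, and treating "sh" as a single initial sound is likewise an assumption.

```python
from collections import Counter

# Word forms quoted in the text. The labels "A", "B", "C" are hypothetical
# placeholders: the text does not name the three languages.
WORDS = {
    "mountain": {"A": "shapa", "B": "dapa", "C": "yapa"},
    "tobacco": {"A": "shema", "B": "dema", "C": "yema"},
}

# Assumed segmentation of initial sounds; "sh" is treated as one sound.
ONSETS = ("sh", "d", "y")


def initial(form):
    """Return the initial sound of a word form."""
    for onset in ONSETS:
        if form.startswith(onset):
            return onset
    return form[0]


def initial_correspondences(words, languages=("A", "B", "C")):
    """Count how often each set of initial sounds recurs across meanings."""
    counts = Counter()
    for forms in words.values():
        counts[tuple(initial(forms[lang]) for lang in languages)] += 1
    return counts


correspondences = initial_correspondences(WORDS)
print(correspondences)
```

The same triple of initial sounds (sh, d, y) is counted for both meanings. It is this recurrence across several words, rather than the resemblance of any single pair of words, that makes the forms candidates for cognates.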
The principle of phonetic correspondences was firmly reaffirmed around 1870 by the
generation of German comparatists nicknamed the Young Grammarians (Junggrammatiker, or
Neogrammarians).
This method has been challenged in recent years by Joseph Greenberg and his followers,
but it is quite clear that the alternative method advocated by these linguists, known as mass
(or multilateral) comparison, has little chance of convincing the majority of scientists.
For any suggestion or correction

e-mail : fournet.arnaud@wanadoo.fr
