New approaches to language and prehistory from typology, genetics, and quantitative linguistics
Søren Wichmann, MPI-EVA & Leiden University
Language and prehistory
"Linguistic reconstruction is one of our primary tools for learning about the prehistoric past. In many ways it is our best, and this is especially true at time depths where archaeology has trouble identifying the ethnicity of its subject matters." (Robert L. Rankin)
"Too many comparative historical linguists want to dig up Troy, linguistically speaking. They consider it more important that comparative linguistics shed light on prehistoric migrations than that it shed light on the nature of language change. I can only say that I do not share those views on the focus of comparative linguistics. I do not consider comparative linguistics a branch of prehistory..." (S. P. Harrison)
Traditional methods
Battery of tools in traditional historical linguistics:
– Wörter und Sachen
– Loanwords
Aspects of prehistory within reach of these methods:
– Reconstruction of cultural and environmental inventories, homelands
– Language contact situations
Wörter und Sachen
Reconstruct lexical items for a given proto-language and infer from them something about the culture and environment of its speakers. A time-honored method that has been applied to many of the world's language families.
Problems with Wörter und Sachen
The method assumes that a proto-language corresponds to a point in time, but proto-languages can be long-lived and widespread, so words for different things can belong to different temporal strata. Words for culturally important items are precisely the ones most likely to diffuse quickly, and early borrowing among related languages can be impossible to detect. E.g., 'whisky' can be reconstructed for proto-Algonkian, which should date to thousands of years before European conquest. Native words for 'glasses' and 'church' can be reconstructed for proto-Oaxaca Mixean (a subgroup of the Mixe-Zoquean languages of Mexico), which should date to at least a century before European conquest.
Problems with homelands
The lexical signature may date only to the latest expansion of the group. Inferring homelands from lexical reconstructions requires a large proto-vocabulary, and a large proto-vocabulary can only be reconstructed for a family with a relatively shallow time depth and many languages. Precisely this type of language family is likely to be one that has expanded quickly and recently. So the kind of language family that provides the best materials for inferring homelands is also the kind whose homeland is most difficult to narrow down!
Problem with any approach within the traditional framework of comparative linguistics
TIME! Beyond some 10,000 years (nobody knows exactly how many) the lexicon has changed so much that there is nothing left for the comparative method to compare.
So... We need to go beyond the comparative method. The time depth it reaches is limited, and even within its 'field of operation' there are problems when it comes to reconstructing aspects of culture and environment or determining homelands.
It turns out that the key to going beyond the traditional method is quantification. Comparative linguistics is an extremely qualitatively oriented field, and most historical linguists have been afraid of numbers.
Let's make a fresh start and go as far back in time as we can imagine...
Can we say anything about the linguistic situation in pre-neolithic times, i.e. some 10,000-20,000 years ago?
Yes, I believe we can say something about language family sizes, for instance.
But it will require an imaginative use of quantitative data and some detours before we get to say anything about this. So hang on...
The problem: were there many little language families of roughly the same size, or perhaps a few big ones, some middle-sized ones, and a lot of small ones? Dixon's model of punctuated equilibrium assumes the first scenario. Linguistic equilibrium: "Each political group would have a population comparable to those of other groups in the area. That is, one group could be, say, four times as big as another, but not a hundred times as big" (Dixon 1997: 69)
A ranking of the world's language families in terms of number of languages according to data from Ethnologue (number of languages in parentheses):
Niger-Congo (1489), Austronesian (1262), Trans-New Guinea (552), Indo-European (443), Afro-Asiatic (372), Sino-Tibetan (365), Australian (258), Nilo-Saharan (199), Oto-Manguean (172), Austro-Asiatic (168), Sepik-Ramu (104), Dravidian (75), Tupi (70), Tai-Kadai (70), Mayan (69), Altaic (65), Uto-Aztecan (62), Arawakan (60), Torricelli (48), Na-Dene (47), Quechuan (46), Algic (40), Uralic (38), East Papuan (36), North Caucasian (34), Geelvink Bay (33), Penutian (33), Macro-Ge (32), Hmong-Mien (32), Panoan (30), Carib (29), Khoisan (29), Hokan (28), Salishan (27), West Papuan (26), Tucanoan (25), Chibchan (22), Siouan (17), Mixe-Zoque (16), Andamanese (13), Japanese (12), Totonacan (11), Mataco-Guaicuru (11), Eskimo-Aleut (11), Choco (10), Iroquoian (10), Arauan (8), Chumash (7), Sko (7), Zaparoan (7), Barbacoan (7), Left May (7), Maku (6), Muskogean (6), Kwomtari-Baibai (6), Kiowa-Tanoan (6), Tacanan (6), Witotoan (6), Caddoan (5), Chukotko-Kamchatkan (5), Mascoian (5), Guahiban (5), South Caucasian (5), Wakashan (5), Nambiquaran (5), Chapacura-Wanham (5), Huavean (4), Gulf (4), Misumalpan (4), Subtiaba-Tlapanec (4), Jivaroan (4), Yanomam (4), Katukinan (3), Basque (3), Aymaran (3), East Bird's Head (3), Lower Mamberamo (2), Harakmbet (2), Peba-Yaguan (2), Yenisei Ostyak (2), Arutani-Sape (2), Amto-Musan (2), Zamucoan (2), Alacalufan (2), Araucanian (2), Yukaghir (2), Yuki (2), Uru-Chipaya (2), Keres (2), Cahuapanan (2), Salivan (2), Chon (2), Bayono-Awbono (2), Coahuiltecan (1), Paezan (1), Lule-Vilela (1), Chimakuan (1), Mura (1), Mosetenan (1), Cant (1), other language isolates (30)
Internet sites ranked by the number of unique AOL visitors they received on Dec. 1, 1997 (after Adamic and Huberman 2002: Fig. 2).
Different phenomena exhibiting power-law distributions
– Urban conglomerations (Auerbach 1913)
– Abundance of biological taxa (Yule 1924)
– Word frequencies (Zipf 1949)
– Distribution of personal income (Champernowne 1953)
– Earthquake sizes (Kanamori and Anderson 1975)
– Popularity of internet sites (Glassman 1994)
– Gene activity (Ueda et al. 2004)
'Zipf's law' applies when, for a set of quantities, each quantity $Q$ is inversely proportional to its rank $R$. Zipf's law is a special instance (with $a = 1$) of a power-law distribution, $Q = c\,R^{-a}$.
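As a rough check of the formula against the Ethnologue ranking quoted above, the following sketch fits $Q = c\,R^{-a}$ to the twenty largest families by ordinary least squares on $\log Q = \log c - a \log R$. This is a crude estimator chosen for brevity (maximum-likelihood methods are often preferred for power-law fitting), the cut-off at twenty families is an arbitrary assumption, and the function name `fit_power_law` is hypothetical.

```python
import math

# Number of languages per family for the twenty largest families in the
# Ethnologue ranking quoted in the slides (rank 1 = Niger-Congo, 1489).
FAMILY_SIZES = [1489, 1262, 552, 443, 372, 365, 258, 199, 172, 168,
                104, 75, 70, 70, 69, 65, 62, 60, 48, 47]


def fit_power_law(sizes):
    """Fit Q = c * R**(-a) by least squares on log Q = log c - a * log R.

    `sizes` must be sorted in decreasing order, so that position 0 has rank 1.
    Returns the pair (c, a).
    """
    xs = [math.log(rank) for rank in range(1, len(sizes) + 1)]
    ys = [math.log(q) for q in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the log-log line, equals -a
    intercept = mean_y - slope * mean_x  # equals log c
    return math.exp(intercept), -slope


c, a = fit_power_law(FAMILY_SIZES)
print(f"c = {c:.0f}, a = {a:.2f}")  # a = 1 would be Zipf's law proper
```

A slope alone does not establish a power law; the more telling question is whether the rank-size points actually lie on a straight line in log-log coordinates rather than on a curve.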
Poisson distribution: exponential network. Power-law distribution: scale-free network.
[Figure; source: Albert László Barabási et al.: The Architecture of Complexity: From the Diameter of the WWW to the Structure of the Cell (PowerPoint presentation) (http://www.nd.edu/~networks/papers.htm)]
[Figure: Science Citation Index; $P(k) \sim k^{-\gamma}$ ($\gamma = 3$). Source: Albert László Barabási et al., as above]
[Figure: 4,781 Swedes aged 18-74; 59% response rate (Liljeros et al., Nature 2001). Nodes: people (females, males); links: sexual relationships. Source: Albert László Barabási et al., as above]
Power-law distributions can be generated in networks by preferential attachment
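A minimal sketch of preferential attachment, assuming a Barabási-Albert-style growth rule in which every new node links to `m` distinct existing nodes chosen with probability proportional to their current degree. For contrast it builds a random graph with the same number of links placed uniformly at random, whose degree distribution is approximately Poisson. The function names and the values `n = 5000`, `m = 2` are illustrative assumptions.

```python
import random


def preferential_attachment(n_nodes, m, rng):
    """Grow a network in which each new node links to m distinct existing
    nodes, chosen with probability proportional to their current degree.
    Returns the degree of every node."""
    # Seed network: a complete graph on m + 1 nodes, so each has degree m.
    degree = [m] * (m + 1)
    # Each node appears in `targets` once per link it has, so a uniform
    # choice from `targets` is a degree-proportional choice of node.
    targets = [node for node in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n_nodes):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(targets))
        degree.append(m)
        for old in chosen:
            degree[old] += 1
            targets.extend([old, new])
    return degree


def random_graph(n_nodes, n_links, rng):
    """Place n_links links between uniformly chosen pairs of distinct nodes
    (repeated pairs allowed); degrees are then roughly Poisson."""
    degree = [0] * n_nodes
    for _ in range(n_links):
        a, b = rng.sample(range(n_nodes), 2)
        degree[a] += 1
        degree[b] += 1
    return degree


rng = random.Random(1)
n, m = 5000, 2
pa = preferential_attachment(n, m, rng)
er = random_graph(n, sum(pa) // 2, rng)
print("mean degree:", sum(pa) / n, "vs", sum(er) / n)
print("largest degree, preferential attachment:", max(pa))
print("largest degree, random graph:", max(er))
```

Both networks have the same mean degree, but under preferential attachment the best-connected nodes collect far more links than anything the uniform random graph produces; plotting the two degree histograms on log-log axes shows the contrast between a Poisson and a scale-free network.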
But how can power laws be explained in a branching model?
At any given taxonomic level an entity has the probability $P_0$ of producing no offspring, the probability $P_1$ of producing one offspring, the probability $P_2$ of producing two offspring, etc. The mean number of offspring, $m$, is the sum of the offspring numbers $i$ weighted by their probabilities $P_i$:
$$m = \sum_{i=0}^{\infty} i \cdot P_i$$
If $m > 1$ (as in the example) the family has a positive probability of growing indefinitely; if $m = 1$ it will converge towards extinction over infinite time; and if $m < 1$ the family is certain to become extinct eventually.
[Table: example of a probability set, columns $i$ and $P_i$; values not recoverable]
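The quantities on this slide can be computed directly. The sketch assumes a hypothetical probability set $P_0 = 0.25$, $P_1 = 0.25$, $P_2 = 0.5$ (the slide's own example values are not preserved), which gives $m = 1.25$. The extinction probability $q$ of a lineage started by one entity is the smallest root of $q = \sum_i P_i q^i$, found here by iterating from $q = 0$; for this set it is $0.5$, so even with $m > 1$ half of all lineages still die out. A small simulation cross-checks that figure; the function names and simulation limits are illustrative assumptions.

```python
import random


def mean_offspring(probs):
    """m = sum_i i * P_i, where probs[i] is the probability of i offspring."""
    return sum(i * p for i, p in enumerate(probs))


def extinction_probability(probs, iterations=1000):
    """Smallest root of q = sum_i P_i * q**i, by fixed-point iteration from 0."""
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q ** i for i, p in enumerate(probs))
    return q


def lineage_dies_out(probs, rng, generations=100, cap=500):
    """Simulate one lineage and report whether it reaches size 0.

    Lineages that exceed `cap` entities are treated as surviving, since the
    chance that all of them later die out (q ** cap) is negligible.
    """
    size = 1
    for _ in range(generations):
        if size == 0 or size > cap:
            break
        offspring = rng.choices(range(len(probs)), weights=probs, k=size)
        size = sum(offspring)
    return size == 0


probs = [0.25, 0.25, 0.5]  # hypothetical P_0, P_1, P_2
m = mean_offspring(probs)
q = extinction_probability(probs)
rng = random.Random(42)
trials = 1000
extinct_share = sum(lineage_dies_out(probs, rng) for _ in range(trials)) / trials
print(f"m = {m}, q = {q:.3f}, simulated extinction share = {extinct_share:.3f}")
```
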
Efforts to simulate power-law distributions based on the branching model did not succeed: the resulting distributions came out as straight lines on a semi-logarithmic (log-linear) chart, not on a log-log chart. However, success was achieved when a variable was added giving the probability that any given language starts a new language family (lineage).
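The slides do not specify that model, so the following only illustrates the general idea under stated assumptions, using a Yule-Simon-type process (a swapped-in technique, not necessarily the one used in the talk). Languages are added one at a time: with probability `p_new` the new language founds a new family; otherwise a randomly chosen existing language splits, so a family's chance of gaining a member is proportional to its current size. The names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_family_sizes(n_languages, p_new, rng):
    """Return family sizes (largest first) after n_languages steps.

    Each step adds one language. With probability p_new it founds a new
    family (a new lineage); otherwise a uniformly chosen existing language
    splits, so its family grows by one. Picking a language uniformly means
    larger families are proportionally more likely to grow.
    """
    family_of = [0]  # family index of every language so far
    sizes = [1]      # number of languages in each family
    for _ in range(n_languages - 1):
        if rng.random() < p_new:
            sizes.append(1)
            family_of.append(len(sizes) - 1)
        else:
            family = family_of[rng.randrange(len(family_of))]
            sizes[family] += 1
            family_of.append(family)
    return sorted(sizes, reverse=True)


rng = random.Random(7)
sizes = simulate_family_sizes(20_000, p_new=0.1, rng=rng)
print("families:", len(sizes))
print("ten largest:", sizes[:10])
print("median size:", statistics.median(sizes))
```

The result is a handful of very large families and a long tail of tiny ones; plotted as a rank-size chart on log-log axes, the sizes fall roughly on a straight line. Setting `p_new = 0` leaves a single family, so in this sketch it is the rate of founding new lineages that produces the spread of family sizes.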
Returning to the question of language family sizes: it is unlikely that language families some 10,000-20,000 years ago had roughly equal sizes, following a Poisson distribution. Given the universality of power laws, it is better to assume that language family sizes had this kind of distribution early on as well.
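To make the contrast with Dixon's "four times, but not a hundred times" concrete, this sketch compares how likely a family a hundred times the average size is under a Poisson distribution and under a power law with the same mean. The power law is modelled, as an assumption, by a continuous Pareto distribution with hypothetical parameters $\alpha = 1.5$ and $x_{\min} = 1$ (mean $\alpha x_{\min}/(\alpha - 1) = 3$); the Poisson tail is summed in log space to avoid underflow.

```python
import math


def log10_poisson_tail(lam, k, terms=200):
    """log10 P(X >= k) for X ~ Poisson(lam).

    Sums the first `terms` probabilities from k upward in log space; past
    the mean these terms shrink so fast that the remainder is negligible.
    """
    logs = [j * math.log(lam) - lam - math.lgamma(j + 1)
            for j in range(k, k + terms)]
    top = max(logs)
    return (top + math.log(sum(math.exp(x - top) for x in logs))) / math.log(10)


def log10_pareto_tail(alpha, x_min, x):
    """log10 P(X >= x) for a continuous Pareto (power-law) distribution."""
    return -alpha * math.log10(x / x_min)


alpha, x_min = 1.5, 1.0             # hypothetical power-law parameters
mean = alpha * x_min / (alpha - 1)  # 3.0; the Poisson gets the same mean
threshold = 100 * mean              # "a hundred times as big"
print("Poisson:   log10 P =", round(log10_poisson_tail(mean, math.ceil(threshold)), 1))
print("Power law: log10 P =", round(log10_pareto_tail(alpha, x_min, threshold), 1))
```

Under the Poisson model such a family is, for practical purposes, impossible (the probability is hundreds of orders of magnitude below one), while under this power law roughly one family in five thousand reaches that size.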
The large language families before neolithic times were likely not the same as today's large families. The following six language families are the largest in terms of number of languages, and all are spoken in areas where agriculture was practiced very early on: Niger-Congo, Austronesian, Trans-New Guinea, Indo-European, Afro-Asiatic, Sino-Tibetan.
Prior to the spread of agriculture there would have been other large language families. These may have included some families that are still relatively large yet seem to have been reduced by the more recent expansion of agriculturalists' families, for instance Eastern Sudanic (reduced by Afro-Asiatic and Niger-Congo) or West Papuan (reduced by Austronesian).