1 Stability of language features: a comparison of the WALS and JM typological databases Oleg Belyaev

Slides:



Advertisements
Similar presentations
You have been given a mission and a code. Use the code to complete the mission and you will save the world from obliteration…
Advertisements

Advanced Piloting Cruise Plot.
Steps in the Scientific Method
Copyright © 2003 Pearson Education, Inc. Slide 1 Computer Systems Organization & Architecture Chapters 8-12 John D. Carpinelli.
Introductory Mathematics & Statistics for Business
Chapter 1 The Study of Body Function Image PowerPoint
1 Copyright © 2013 Elsevier Inc. All rights reserved. Appendix 01.
Effective Change Detection Using Sampling Junghoo John Cho Alexandros Ntoulas UCLA.
Business Transaction Management Software for Application Coordination 1 Business Processes and Coordination.
Introduction to Algorithms 6.046J/18.401J
Statistical Significance and Population Controls Presented to the New Jersey SDC Annual Network Meeting June 6, 2007 Tony Tersine, U.S. Census Bureau.
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Jeopardy Q 1 Q 6 Q 11 Q 16 Q 21 Q 2 Q 7 Q 12 Q 17 Q 22 Q 3 Q 8 Q 13
Title Subtitle.
Process a Customer Chapter 2. Process a Customer 2-2 Objectives Understand what defines a Customer Learn how to check for an existing Customer Learn how.
Copyright © 2010 Pearson Education, Inc. Slide
Addition Facts
Year 6 mental test 5 second questions
Year 6 mental test 10 second questions
Around the World AdditionSubtraction MultiplicationDivision AdditionSubtraction MultiplicationDivision.
ZMQS ZMQS
Chapter 7 Sampling and Sampling Distributions
Chi Square Interpretation. Examples of Presentations The following are examples of presentations of chi-square tables and their interpretations. These.
BT Wholesale October Creating your own telephone network WHOLESALE CALLS LINE ASSOCIATED.
Break Time Remaining 10:00.
The basics for simulations
On Comparing Classifiers : Pitfalls to Avoid and Recommended Approach
(This presentation may be used for instructional purposes)
PP Test Review Sections 6-1 to 6-6
ABC Technology Project
Hash Tables.
Business and Economics 6th Edition
5-1 Chapter 5 Theory & Problems of Probability & Statistics Murray R. Spiegel Sampling Theory.
Copyright © 2013, 2009, 2005 Pearson Education, Inc.
指導教授:李錫堤 教授 學生:邱奕勛 報告日期:
VOORBLAD.
Chapter 6 The Mathematics of Diversification
Copyright © 2013, 2009, 2006 Pearson Education, Inc.
Squares and Square Root WALK. Solve each problem REVIEW:
Traditional IR models Jian-Yun Nie.
© 2012 National Heart Foundation of Australia. Slide 2.
Lets play bingo!!. Calculate: MEAN Calculate: MEDIAN
Science as a Process Chapter 1 Section 2.
Understanding Generalist Practice, 5e, Kirst-Ashman/Hull
GG Consulting, LLC I-SUITE. Source: TEA SHARS Frequently asked questions 2.
1 Using Bayesian Network for combining classifiers Leonardo Nogueira Matos Departamento de Computação Universidade Federal de Sergipe.
Applied Hydrology Regional Frequency Analysis - Example Prof. Ke-Sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University.
1 Multidimensional poverty in Senegal : a non-monetary approach using basic needs By Jean Bosco KI The many dimensions of poverty Brasilia, Brazil –
Addition 1’s to 20.
Model and Relationships 6 M 1 M M M M M M M M M M M M M M M M
25 seconds left…...
Test B, 100 Subtraction Facts
: 3 00.
5 minutes.
Januar MDMDFSSMDMDFSSS
Week 1.
Statistical Inferences Based on Two Samples
We will resume in: 25 Minutes.
©Brooks/Cole, 2001 Chapter 12 Derived Types-- Enumerated, Structure and Union.
©2006 Prentice Hall Business Publishing, Auditing 11/e, Arens/Beasley/Elder Audit Sampling for Tests of Controls and Substantive Tests of Transactions.
A SMALL TRUTH TO MAKE LIFE 100%
PSSA Preparation.
Chapter 11: The t Test for Two Related Samples
Testing Hypotheses About Proportions
January Structure of the book Section 1 (Ch 1 – 10) Basic concepts and techniques Section 2 (Ch 11 – 15): Inference for quantitative outcomes Section.
1 Decidability continued…. 2 Theorem: For a recursively enumerable language it is undecidable to determine whether is finite Proof: We will reduce the.
1 Volume measures and Rebasing of National Accounts Training Workshop on System of National Accounts for ECO Member Countries October 2012, Tehran,
Presentation transcript:

1 Stability of language features: a comparison of the WALS and JM typological databases Oleg Belyaev

2 Defining stability Explaining distributions: what is where why? Most distributions probably have a historical explanation.

3 Defining stability Languages involved in sprachbunds share a number of typological features. But what determines which features exactly they share?

4 Defining stability A concept of an inherent 'stability' of a linguistic feature can be used to explain the historic development of languages. Some features are more prone to change, while others are less so. Claims are often made that some parts of grammar are more 'stable' than others. E. g. «Morphology is more stable than syntax».

5 Defining stability Edward Sapir, «Language»: slowly modifiable features which are more peculiar to the core or 'genius' of a language

6 Defining stability Joseph H. Greenberg: If a particular phenomenon can arise very frequently and is highly stable once it occurs, it should be universal or near universal (...). If it tends to come into existence often and in various ways, but its stability is low, it should be found fairly often but distributed relatively evenly among genetic linguistic stocks (...). If a particular property rarely arises but is highly stable when it occurs, it should be fairly frequent on a global basis but be largely confined to a few linguistic stocks (...). If it occurs only rarely and is unstable when it occurs, it should be highly infrequent or non-existent and sporadic in its geographical and genetic distribution (...) (Greenberg 1978: 76)

7 Defining stability Johanna Nichols (Nichols 1992, Nichols 1994) For the first time, a set of precise metrics based on the frequencies of features among different families is proposed. Also cf. Maslova 2004, Dahl 2004 for similar metrics.

8 Defining stability Wichmann and Holman (pending) Uses a metric similar to Nichols (1992, 1994), but it is applied to WALS data, making it the first stability calculation to use a large typological database. Stability is defined as the probability that a given feature would remain unchanged during an arbitrary period of time.

9 Defining stability The values we can obtain from typology are not and cannot be true mathematic probabilities. What we can do is to use a metric which would serve as an approximation of which features are more stable than other features.

10 Stability in Jazyki Mira Jazyki Mira: a typological database similar to WALS, with 3821 binary features for 318 Eurasian languages. Stability as a means of comparison with WALS: apply the same metric to JM's data. Wichmann & Holman include stability values for WALS features in binary form.

11 The metric used The metric uses two frequency values, R and U. For R, one should count the proportion of language pairs for which the feature has the same value in each of the genera present (genera defined as per Dryer 2001). Then, one should calculate a weighted average of the values for all genera with the weight being the square root of the number of languages for a given genus:

12 The metric used The value for U is calculated in a similar way: one gets the proportion of all the pairs of unrelated languages where the value of the feature is different. The final value of stability is calculated via the formula (Albatineh et al cited as explanation):

13 Application to JM All features are considered to be attested for all languages. The classification is the same as the one used by Wichmann & Holman: genera for calculating R and WALS families for calculating U. Usually only features with U < 0.9 are considered.

14 The results See your handouts Four-way categorization by Wichmann & Holman: very stable: 51.8 – stable: 32.8 – 51.7 unstable: 19.2 – 32.7 very unstable: – 18.9 (for binary values of WALS features)

15 The results See your handouts. Of the 42 feature pairs, 23 (~55%) fall into the same categories, 13 (~31%) are in adjacent categories (blue), 6 (~14%) have no correlation at all (red). This means that the correlation is acceptable for 86% of feature pairs.

16 The results For the 13 statements in the literature analyzed, there seems to be no correlation in only 2 cases, and these have questionable interpretation for WALS.

17 The results Stabilities for features with U < 0.9

18 The results The distribution of JM stabilities for features where U < 0.9

19 The results The WALS distribution included in Wichmann & Holman's paper (for multi-value features!)

20 The results This seems to show that, while the two databases are different in structure and scope, the data contained in them is in fact comparable and can be used for conducting similar research. Which database is more suited for calculating stability remains, as of yet, unclear. Examples of the most stable features: agglutination, inflection, gender, ergativity (needs analysis)

21 Testing the results Conducting a simulation similar to the one done by Wichmann and Holman. Using the obtained values as weights when calculating typological similarity (for later constructing phylogenetic trees; cf., e. g., Wichmann & Saunders 2007 and Polyakov & Solovyev 2006). Double-checking the metric on two databases.

22 Problems JM only contains binary features, while counting stability would be useful for some multi-value features. E. g., the stability of «number of vowels» vs. the stability of «8 vowels». Binary features are sometimes too gradual to provide adequate statistics; for this reason, and for the sake of comparison with WALS, combining them into one would sometimes be useful. Stability can probably be decomposed into stability per se and borrowability

23 Possible solutions It is possible to extend JM with additional markup which would group binary features into multi-value ones. Unifying several features into one is a purely computational problem, but the sheer number of JM's features requires one to manually sort through them.

24 Conclusions The results seem to show that WALS and JM are comparable with regard to their usefulness in conducting quantitative research. Stability does not seem to be dependent on geography. To make comparison more fruitful it is crucial, however, to find better ways of establishing feature correspondences. Two databases allow double-checking data.

25 Acknowledgements Special thanks to Søren Wichmann, Eric W. Holman, Vladimir Polyakov, Valery Solovyev and Dmitry Egorov for their aid, suggestions and helpful discussion. The research is supported by RFBR grant ( а

26 References Albatineh, Ahmed N., Magdalena Niewiadomska-Bugaj, and Daniel Mihalko On similarity indices and correction for chance agreement. Journal of Classification Dahl, Östen The Growth and Maintenance of Linguistic Complexity. Amsterdam: John Benjamins. Dryer, Matthew S List of genera, available at Greenberg, Joseph H Diachrony, synchrony and language universals. In Universals of Human Language, Vol. III: Word Structure, ed. by Joseph H. Greenberg, Charles A. Ferguson, and Edith A. Moravcsik, pp. 47–82. Stanford: Stanford University Press. Maslova, Elena Динамика типологических распределений и стабильность языковых типов [Dynamics of typological distributions and stability of language types]. Voprosy jazykoznanija Nichols, Johanna Linguistic Diversity in Space and Time. Chicago: The University of Chicago Press. Nichols, Johanna Diachronically stable structural features. Historical Linguistics Selected Papers from the 11th International Conference on Hitorical Linguistics, Los Angeles August 1993, Amsterdam/Philadelphia: John Benjamins Publishing Company. Polyakov, Vladimir N. and Solovyev, Valery D Компьютерные модели и методы в типологии и компаративистике [Computational models and methods in typology and comparative linguistics]. Kazan. Wichmann, Søren, and Eric W. Holman (pending publication). Assessing temporal stability for linguistic typological features. Available online: Wichmann, Søren and Arpiar Saunders How to use typological databases in historical linguistic research. Diachronica 24.2: Available online: