Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World’s Languages Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy.

Presentation transcript:

Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World’s Languages
Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly
Department of Computer Science & Engg., Indian Institute of Technology, Kharagpur

Redundancy in Natural Systems
• Reduce the risk of information loss – fault tolerance
• Examples of redundancy:
  • Biological systems – codons, genes, proteins, etc.
  • Linguistic systems – synonymous words
  • Human brain – perhaps the biggest example of neuronal redundancy

Redundancy in Sound Systems
• Like any other natural system, human speech sound systems are expected to show redundancy in the information they encode
• In this work we attempt to
  • Mathematically formulate this redundancy, and
  • Unravel the interesting patterns (if any) that result from this formulation

Feature Economy: An Age-old Principle
• Sounds, especially consonants, tend to occur in pairs that are highly correlated in terms of their features
• Languages tend to maximize the combinatorial possibilities of a few features to produce many consonants
• Example (plosives):

  plosive      bilabial   dental
  voiced       /b/        /d/
  voiceless    /p/        /t/

  If a language has some of these consonants in its inventory, then it will also tend to have the others

Mathematical Formulation
• We use the concepts of information theory to quantify feature economy (assuming features are Boolean)
• The basic idea is to compute the number of bits required to pass the information of an inventory of size N over a transmission channel
• Ideal scenario (noiseless channel): the information about an inventory of size N arrives undistorted, so $\log_2 N$ bits are required for lossless transmission

Mathematical Formulation (continued)
• General scenario (noisy channel): the information about an inventory of size N arrives distorted, so more than $\log_2 N$ bits are required for lossless transmission

Feature Entropy
• The actual number of bits required can be estimated by calculating the binary entropy as follows
• $p_f$ – number of consonants in the inventory in which feature f is present
• $q_f$ – number of consonants in the inventory in which feature f is absent
• The probability that a consonant chosen at random from the inventory has f is $p_f/N$, and the probability that it does not have f is $q_f/N$ ($= 1 - p_f/N$)

Feature Entropy
• If F denotes the set of all features, the feature entropy is
  $$F_E = -\sum_{f \in F} \left( \frac{p_f}{N} \log_2 \frac{p_f}{N} + \frac{q_f}{N} \log_2 \frac{q_f}{N} \right)$$
• Redundancy Ratio (RR):
  $$RR = \frac{F_E}{\log_2 N}$$
• RR captures the excess number of bits required to represent the inventory
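The feature entropy and the redundancy ratio can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names, the representation of each consonant as a set of feature names, the toy four-plosive inventory, and the convention that a term with probability 0 contributes nothing to the sum are all assumptions made for this example.

```python
import math


def feature_entropy(inventory, features):
    """Sum of binary entropies over all features (F_E on the slide).

    inventory: list of sets, one per consonant, holding the features present.
    features:  iterable of all feature names (the set F).
    """
    n = len(inventory)
    total = 0.0
    for f in features:
        p = sum(1 for consonant in inventory if f in consonant) / n  # p_f / N
        q = 1.0 - p                                                   # q_f / N
        for x in (p, q):
            if x > 0:  # assumed convention: 0 * log2(0) counts as 0
                total -= x * math.log2(x)
    return total


def redundancy_ratio(inventory, features):
    """RR = F_E / log2(N), where N is the inventory size."""
    return feature_entropy(inventory, features) / math.log2(len(inventory))


# Hypothetical toy inventory: /p/, /b/, /t/, /d/ described by four features.
FEATURES = ["voiced", "bilabial", "dental", "plosive"]
TOY_INVENTORY = [
    {"bilabial", "plosive"},            # /p/
    {"voiced", "bilabial", "plosive"},  # /b/
    {"dental", "plosive"},              # /t/
    {"voiced", "dental", "plosive"},    # /d/
]

toy_rr = redundancy_ratio(TOY_INVENTORY, FEATURES)
print(toy_rr)  # 1.5
```

In this toy case three features each contribute one bit and "plosive", shared by every consonant, contributes none, so F_E = 3 while log2(4) = 2, giving RR = 1.5.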

Example

Experimentation
• Data source: UCLA Phonological Segment Inventory Database (UPSID)
  • Samples data uniformly from almost all linguistic families
  • Hosts phonological systems of 317 languages
  • Number of consonants: 541
  • Number of vowels: 151

RR: Consonant Inventories
• The slope of the line fit is [value missing]
• RR is almost invariant with respect to the inventory size
• The result means that consonant inventories are organized to have similar redundancy irrespective of their size
• This is important because no such explanation exists yet
Figure: redundancy ratio plotted against inventory size.

The Invariance is not “by chance”
• The invariance in the distribution of RRs for consonant inventories did not emerge by chance
• This can be validated by a standard test of hypothesis
• Null hypothesis: the invariance in the distribution of RRs observed across the real consonant inventories is also prevalent across randomly generated inventories

Generation of Random Inventories
• Model I – Purely random model
  • The distribution of the consonant inventory size is assumed to be known a priori
  • Conceive of 317 bins corresponding to the languages in UPSID
  • Pick a bin and fill it by randomly choosing consonants (without repetition) from the pool of 541 available consonants
  • Repeat the above step until all the bins are packed
Figure: bins 1 to 317, each filled randomly from the pool of phonemes (/p/, /b/, /d/, /t/, /k/, /n/, /m/, ...).
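The bin-filling procedure of Model I can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation: the function name `model_one`, the `seed` parameter, and the tiny pool and size list are hypothetical stand-ins for the 541-consonant pool and the 317 real inventory sizes.

```python
import random


def model_one(sizes, pool, seed=0):
    """Fill one bin per language with consonants drawn uniformly at random.

    sizes: known inventory sizes, one per bin (assumed given a priori).
    pool:  list of all available consonants.
    Sampling is without repetition inside a bin, so no bin holds duplicates.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    return [set(rng.sample(pool, size)) for size in sizes]


# Hypothetical miniature example: 3 bins instead of 317, 7 phonemes instead of 541.
toy_pool = ["/p/", "/b/", "/t/", "/d/", "/k/", "/n/", "/m/"]
toy_sizes = [4, 6, 2]
toy_bins = model_one(toy_sizes, toy_pool)
```

Each random inventory keeps the size of the real inventory it replaces, so any difference in RR comes from which consonants were chosen rather than from how many.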

Generation of Random Inventories
• Model II – Random model based on occurrence frequency
  • For each consonant c, let the frequency of occurrence in UPSID be denoted by $f_c$
  • Let there be 317 bins, each corresponding to a language in UPSID
  • $f_c$ bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition
Figure: pool of phonemes with occurrence frequencies, e.g. /t/ (25), /n/ (12), /p/ (100); choose 25 bins randomly and fill them with /t/.
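Model II can be sketched in the same style. Again this is an assumption-laden illustration rather than the authors' code: `model_two` and its parameters are hypothetical names, and the three frequencies are the illustrative values shown in the slide's figure, not the full UPSID counts.

```python
import random


def model_two(frequencies, n_bins, seed=0):
    """Place each consonant c into f_c distinct bins chosen uniformly at random.

    frequencies: dict mapping consonant -> f_c (number of languages containing it).
    n_bins:      number of bins (317 on the slide).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    bins = [set() for _ in range(n_bins)]
    for consonant, f_c in frequencies.items():
        # Sampling bin indices without replacement guarantees that a consonant
        # never lands in the same bin twice.
        for index in rng.sample(range(n_bins), f_c):
            bins[index].add(consonant)
    return bins


# Illustrative frequencies taken from the figure on the slide.
toy_frequencies = {"/t/": 25, "/n/": 12, "/p/": 100}
toy_bins_two = model_two(toy_frequencies, n_bins=317)
```

Unlike Model I, this procedure preserves how often each consonant occurs across languages but does not fix the size of any individual inventory.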

Results
• Model I – a t-test indicates that the null hypothesis can be rejected with (100 – [coefficient missing]e–15)% confidence
• Model II – once again, a t-test shows that the null hypothesis can be rejected, with (100 – 2.55e–3)% confidence
• Occurrence frequency governs the organization of the consonant inventories at least to some extent
Figure: average redundancy ratio plotted against inventory size for Model I, Model II, and the real inventories.
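The slides do not say which variant of the t-test was used. As one possible reading, the sketch below computes Welch's two-sample t statistic and its approximate degrees of freedom for two lists of RR values (for instance, real versus randomly generated inventories). The choice of Welch's test and the function name `welch_t` are assumptions; turning the statistic into a confidence level additionally needs the t-distribution CDF, which is left out here.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    se2_a = statistics.variance(sample_a) / len(sample_a)
    se2_b = statistics.variance(sample_b) / len(sample_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(sample_a) - 1) + se2_b ** 2 / (len(sample_b) - 1)
    )
    return t, df
```

A large absolute value of t relative to the t distribution with df degrees of freedom is what would justify rejecting the null hypothesis stated earlier.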

The Case of Vowel Inventories
• The slope of the line fit is [value missing]
• For small inventories RR is not invariant, while for larger ones (size > 12) it is
• Smaller inventories → perceptual contrast; larger inventories → feature economy
• A t-test shows that we can be 99.93% confident that the vowel and consonant inventories are different in terms of RR
Figure: redundancy ratio plotted against inventory size for vowels and consonants.

Error Correcting Capability
• For most of the consonant inventories the average Hamming distance between two consonants is 4 → 1-bit error correcting capability
• Vowel inventories do not indicate any such fixed error correcting capability
Figure: average Hamming distance plotted against inventory size for consonants and vowels.
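The step from a Hamming distance of 4 to a 1-bit error correcting capability follows the coding-theory rule that a code with distance d can correct floor((d - 1) / 2) errors. That rule is normally stated for the minimum distance of a code; the slide applies it to the average distance. The sketch below assumes each consonant is encoded as a tuple of Boolean feature values; the function names and the toy vectors are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def average_hamming(vectors):
    """Mean Hamming distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def correctable_errors(distance):
    """Errors correctable at a given distance: floor((d - 1) / 2)."""
    return (int(distance) - 1) // 2


# Hypothetical six-feature encodings in which every pair differs in 4 positions.
toy_vectors = [
    (0, 0, 0, 0, 0, 0),
    (1, 1, 1, 1, 0, 0),
    (0, 0, 1, 1, 1, 1),
    (1, 1, 0, 0, 1, 1),
]
toy_distance = average_hamming(toy_vectors)
toy_capability = correctable_errors(toy_distance)
```

With a distance of 4, a single flipped feature still leaves the distorted consonant closer to the intended one than to any other, which is the sense in which one bit error can be corrected.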

Conclusions
• Redundancy ratio is almost an invariant property of the consonant inventories with respect to the inventory size
• This invariance is a direct consequence of the fixed error correcting capabilities of the consonant inventories
• Unlike the consonant inventories, the vowel inventories are not indicative (at least not all of them) of such an invariance

Discussions
• Possible causes of redundancy in a linguistic system:
  • Fault tolerance: redundancy acts as a failsafe mechanism against random distortion
  • Evolutionary cause: redundancy allows a speaker to successfully communicate with speakers of neighboring dialects – “linguistic junk”, as pointed out by Lass (Lass, 1997)

Thank you