
Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World’s Languages
Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly


1 Redundancy Ratio: An Invariant Property of the Consonant Inventories of the World’s Languages Animesh Mukherjee, Monojit Choudhury, Anupam Basu and Niloy Ganguly Department of Computer Science & Engg. Indian Institute of Technology, Kharagpur

2 Redundancy in Natural Systems
• Redundancy reduces the risk of information loss (fault tolerance)
• Examples of redundancy:
  • Biological systems: codons, genes, proteins, etc.
  • Linguistic systems: synonymous words
  • Human brain: perhaps the biggest example of neuronal redundancy

3 Redundancy in Sound Systems
• Like any other natural system, human speech sound systems are expected to show redundancy in the information they encode
• In this work we attempt to
  • formulate this redundancy mathematically, and
  • unravel the interesting patterns (if any) that result from this formulation

4 Feature Economy: An Age-old Principle
• Sounds, especially consonants, tend to occur in pairs that are highly correlated in terms of their features
• Languages tend to maximize the combinatorial possibilities of a few features to produce many consonants
• Example (plosives): /b/ is voiced bilabial, /p/ is voiceless bilabial, /d/ is voiced dental and /t/ is voiceless dental. If a language has three of these in its inventory, then it will also tend to have the fourth.

5 Mathematical Formulation
• We use the concepts of information theory to quantify feature economy (assuming features are Boolean)
• The basic idea is to compute the number of bits required to pass the information of an inventory of size N over a transmission channel
• Ideal scenario: noiseless channel. The information about an inventory of size N arrives undistorted, so $\log_2 N$ bits are required for lossless transmission

6 Mathematical Formulation (continued)
• General scenario: noisy channel. The information about an inventory of size N arrives distorted, so more than $\log_2 N$ bits are required for lossless transmission
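As a quick illustration of the noiseless lower bound on the two slides above, the sketch below computes $\log_2 N$ for a given inventory size. The function names `min_bits` and `whole_bits` are hypothetical and are not taken from the presentation.

```python
import math


def min_bits(inventory_size: int) -> float:
    """Bits needed to single out one of `inventory_size` equally likely consonants."""
    if inventory_size < 1:
        raise ValueError("inventory size must be positive")
    return math.log2(inventory_size)


def whole_bits(inventory_size: int) -> int:
    """Smallest whole number of Boolean features that can tell all consonants apart."""
    return math.ceil(min_bits(inventory_size))


if __name__ == "__main__":
    for n in (8, 21, 64):
        print(n, round(min_bits(n), 3), whole_bits(n))
```

Any inventory that spends more Boolean features than `whole_bits(N)` is carrying extra bits, which is the redundancy the rest of the talk tries to measure.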

7 Feature Entropy
• The actual number of bits required can be estimated by calculating the binary entropy as follows
• $p_f$: number of consonants in the inventory in which feature f is present
• $q_f$: number of consonants in the inventory in which feature f is absent
• The probability that a consonant chosen at random from the inventory has f is $p_f / N$, and the probability that it does not have f is $q_f / N$ ($= 1 - p_f / N$)

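8 Feature Entropy
• If F denotes the set of all features, the feature entropy is
$$F_E = -\sum_{f \in F} \left[ \frac{p_f}{N} \log_2 \frac{p_f}{N} + \frac{q_f}{N} \log_2 \frac{q_f}{N} \right]$$
• Redundancy Ratio (RR):
$$RR = \frac{F_E}{\log_2 N}$$
• RR captures the excess number of bits required to represent the inventory, relative to the $\log_2 N$ minimum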
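The two formulas above can be sketched in a few lines of Python. The four-consonant inventory and its feature set below are a toy assumption made for illustration, not data from the presentation, and the names `feature_entropy` and `redundancy_ratio` are hypothetical.

```python
import math


def feature_entropy(inventory, features):
    """Sum of per-feature binary entropies, F_E, over a consonant inventory.

    `inventory` maps each consonant to the set of features it has.
    """
    n = len(inventory)
    total = 0.0
    for f in features:
        p = sum(1 for feats in inventory.values() if f in feats)  # p_f
        q = n - p                                                  # q_f
        for count in (p, q):
            if count > 0:  # the term 0 * log2(0) is taken to be 0
                total -= (count / n) * math.log2(count / n)
    return total


def redundancy_ratio(inventory, features):
    """RR = F_E / log2(N)."""
    return feature_entropy(inventory, features) / math.log2(len(inventory))


# Toy inventory (assumed): the four plosives from the feature-economy slide.
toy = {
    "p": {"bilabial", "plosive"},
    "b": {"voiced", "bilabial", "plosive"},
    "t": {"dental", "plosive"},
    "d": {"voiced", "dental", "plosive"},
}
toy_features = ["voiced", "bilabial", "dental", "plosive"]

if __name__ == "__main__":
    print(feature_entropy(toy, toy_features))   # 3.0
    print(redundancy_ratio(toy, toy_features))  # 1.5
```

In this toy case "voiced", "bilabial" and "dental" each split the inventory in half and contribute 1 bit, while "plosive" is shared by every consonant and contributes nothing, so $F_E = 3$ against a minimum of $\log_2 4 = 2$.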

9 Example

10 Experimentation
• Data source: UCLA Phonological Segment Inventory Database (UPSID)
  • Samples data uniformly from almost all linguistic families
  • Hosts the phonological systems of 317 languages
  • Number of consonants: 541
  • Number of vowels: 151

11 RR: Consonant Inventories
• The slope of the line fit is -0.0178
• RR is almost invariant with respect to the inventory size
• The result means that consonant inventories are organized to have similar redundancy irrespective of their size
• This is important because no such explanation has been available so far
[Figure: redundancy ratio plotted against inventory size]
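The slide reports only the slope of the fitted line. A minimal sketch of how such a slope could be obtained is given below, assuming an ordinary least-squares fit (the slide does not say which fitting method was used). The function name `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical (inventory size, RR) pairs, used only to exercise the function.
    sizes = [10, 20, 30, 40]
    ratios = [2.5, 2.5, 2.5, 2.5]
    print(fit_line(sizes, ratios))  # a flat line has slope 0.0
```

A slope close to zero, as on the slide, is what an RR that does not depend on inventory size would produce.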

12 The Invariance is not “by chance”
• The invariance in the distribution of RRs for consonant inventories did not emerge by chance
• This can be validated by a standard test of hypothesis
• Null hypothesis: the invariance in the distribution of RRs observed across the real consonant inventories is also prevalent across the randomly generated inventories

13 Generation of Random Inventories
• Model I: purely random model
  • The distribution of the consonant inventory size is assumed to be known a priori
  • Conceive of 317 bins corresponding to the languages in UPSID
  • Pick a bin and fill it by randomly choosing consonants (without repetition) from the pool of 541 available consonants
  • Repeat the above step until all the bins are packed
[Figure: bins 1 to 317, each with a fixed size (for example 4, 6 and 2), filled randomly from the pool of phonemes]
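The Model I procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `model_one` is hypothetical, and the eight-phoneme pool and the bin sizes 4, 6 and 2 are read off the slide's figure rather than the full UPSID data.

```python
import random


def model_one(pool, sizes, rng=None):
    """Fill one bin per inventory size with consonants drawn without repetition.

    `pool` is the list of available consonants and `sizes` holds the known
    inventory size of each language (one bin per language).
    """
    rng = rng or random.Random()
    # random.Random.sample draws distinct elements, so no bin repeats a consonant.
    return [set(rng.sample(pool, size)) for size in sizes]


if __name__ == "__main__":
    pool = ["p", "b", "d", "k", "g", "t", "n", "m"]
    bins = model_one(pool, [4, 6, 2], rng=random.Random(0))
    for number, contents in enumerate(bins, start=1):
        print(f"Bin {number}: {sorted(contents)}")
```

With the real data, `pool` would hold the 541 consonants and `sizes` the 317 observed inventory sizes.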

14 Generation of Random Inventories
• Model II: random model based on occurrence frequency
  • For each consonant c, let the frequency of occurrence in UPSID be denoted by $f_c$
  • Let there be 317 bins, each corresponding to a language in UPSID
  • $f_c$ bins are then chosen uniformly at random and the consonant c is packed into these bins without repetition
[Figure: pool of phonemes with occurrence frequencies, e.g. /t/ (25), /n/ (12), /p/ (100); choose 25 bins randomly and fill them with /t/]
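A minimal sketch of the Model II procedure follows. The function name `model_two` is hypothetical, and the three frequencies are the illustrative values shown in the slide's figure, not the full UPSID frequency table.

```python
import random


def model_two(frequencies, num_bins, rng=None):
    """Place each consonant into f_c distinct bins chosen uniformly at random.

    `frequencies` maps each consonant c to its occurrence frequency f_c,
    which must not exceed `num_bins`.
    """
    rng = rng or random.Random()
    bins = [set() for _ in range(num_bins)]
    for consonant, freq in frequencies.items():
        # Sampling bin indices without replacement keeps a consonant
        # from landing in the same bin twice.
        for index in rng.sample(range(num_bins), freq):
            bins[index].add(consonant)
    return bins


if __name__ == "__main__":
    freqs = {"t": 25, "n": 12, "p": 100}
    bins = model_two(freqs, 317, rng=random.Random(0))
    for consonant, freq in freqs.items():
        print(consonant, sum(consonant in b for b in bins), "of", freq)
```

Unlike Model I, this procedure preserves how often each consonant occurs across languages, but it does not fix the size of any individual inventory.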

15 Results
• Model I: a t-test indicates that the null hypothesis can be rejected with (100 - 9.29e-15)% confidence
• Model II: once again, a t-test shows that the null hypothesis can be rejected, here with (100 - 2.55e-3)% confidence
• The rejection is weaker for Model II than for Model I, so occurrence frequency governs the organization of the consonant inventories at least to some extent
[Figure: average redundancy ratio against inventory size for Model I, Model II and the real inventories]

16 The Case of Vowel Inventories
• The slope of the line fit is -0.125
• For small inventories RR is not invariant, while for larger ones (size > 12) it is
• Smaller inventories → perceptual contrast; larger inventories → feature economy
• A t-test shows that we can be 99.93% confident that the vowel and consonant inventories differ in terms of RR
[Figure: redundancy ratio against inventory size for vowels and consonants]

17 Error Correcting Capability
• For most of the consonant inventories the average Hamming distance between two consonants is 4 → 1-bit error-correcting capability
• Vowel inventories do not indicate any such fixed error-correcting capability
[Figure: average Hamming distance against inventory size for consonants and vowels]
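The step from a distance of 4 to 1-bit error correction can be illustrated with the standard coding-theory bound: a code with distance d corrects up to $\lfloor (d-1)/2 \rfloor$ errors. That bound is normally stated for the minimum distance, whereas the slide applies it to the average distance, so the sketch below is an assumption about the intended reasoning. The function names and the sample feature vectors are hypothetical.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length Boolean vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x != y for x, y in zip(a, b))


def average_hamming(vectors):
    """Mean Hamming distance over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


def correctable_errors(distance):
    """Errors correctable at a given distance: floor((d - 1) / 2)."""
    return (distance - 1) // 2


if __name__ == "__main__":
    # Hypothetical Boolean feature vectors for three consonants.
    vectors = [(0, 0, 0, 0), (1, 1, 1, 1), (1, 1, 0, 0)]
    print(average_hamming(vectors))
    print(correctable_errors(4))  # 1
```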

18 Conclusions
• Redundancy ratio is almost an invariant property of the consonant inventories with respect to the inventory size
• This invariance is a direct consequence of the fixed error-correcting capabilities of the consonant inventories
• Unlike the consonant inventories, the vowel inventories are not indicative (at least not all of them) of such an invariance

19 Discussions
• Cause of the origins of redundancy in a linguistic system
  • Fault tolerance: redundancy acts as a failsafe mechanism against random distortion
  • Evolutionary cause: redundancy allows a speaker to successfully communicate with speakers of neighboring dialects; “linguistic junk”, as pointed out by Lass (Lass, 1997)

20 Thank you

