Language Change as a Constrained Multi-Objective Optimization Monojit Choudhury Microsoft Research Lab, India A tale of the lazy.


1 Language Change as a Constrained Multi-Objective Optimization: A tale of the lazy tongue. Monojit Choudhury, Microsoft Research Lab, India, monojitc@microsoft.com. Indo-Australia Workshop on Optimization in Human Language Technology, 16th Dec 2012, IIT Patna

2 Language Change

3 Change in the syntactic/semantic/phonological features of a language. Perpetual, universal, directional (?) Phonological Change: –Affects the sounds –Structured, independent of syntax/semantics –Example: Loss of consonant clusters in Hindi: agni → aag, dugdha → dUdh, raatri → raat

4 Effects of the “Lazy Tongue”
Assimilation: in+apt = inapt, in+decent = indecent, in+polite = impolite, in+mature = immature, in+legal = illegal, in+regular = irregular
Deletion: cannot → can’t, do not → don’t, will not → won’t, are not → ain’t, information → info
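
The assimilation pattern on this slide can be written as a small rule. The sketch below is illustrative only: the function name `attach_in` is hypothetical, and the rule covers just the four stem-initial consonants that appear on the slide (p, m, l, r), not English morphology in general.

```python
def attach_in(stem: str) -> str:
    """Attach the negative prefix in- to a stem, assimilating n as on the slide."""
    first = stem[0].lower()
    if first in ("p", "m"):
        # n becomes m before a labial: in+polite -> impolite
        return "im" + stem
    if first in ("l", "r"):
        # n copies a following liquid: in+legal -> illegal
        return "i" + first + stem
    # elsewhere the prefix surfaces unchanged: in+apt -> inapt
    return "in" + stem


for stem in ["apt", "decent", "polite", "mature", "legal", "regular"]:
    print(f"in+{stem} = {attach_in(stem)}")
```

The assimilated forms take less articulatory effort than the unassimilated ones, which is the "lazy tongue" effect the slide illustrates.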

5 Explanations for Change Exogenous causes –Language contact –Socio-political factors –Communication medium Endogenous causes –Functional –Phonetic error-based –Frequency drifts –Evolutionary

6 Functional Explanation of Language Change There are three evolutionary forces on any linguistic system: –Minimization of effort (energy) –Maximization of perceptual distinctiveness (Minimization of ambiguity) –Maximization of learnability Language is a perpetually evolving system shaped by these three conflicting forces

7 Outline of the Talk Morpho-phonological change of Bangla Verb systems and emergence of dialect diversity –Approach: Multi-Objective Constrained Optimization –Technique: Multi-Objective Genetic Algorithm (MOGA) Understanding Computer Mediated Communication –Normalization of Texting language –Romanization of Indian Language text

8 Geography of Bangla Standard Colloquial Bengali (SCB) Agartala Colloquial Bengali (ACB) Sylhetti

9 History of Bangla 1200 AD 1800 AD

10 Bangla Verb Morphology করেছিলাম kar-echh-il-aam: kar = verb root (do), echh = aspect (perfect), il = tense (past), aam = person (first). Gloss: "I had done"

11 Cognates in the Dialects (root: kar, "to do")
Features      Classical      SCB         ACB
Non-finite    kariyA         kore        kairA
Ps,2, per.    kariyAChila    koreChilo   korsilo
Ps,1, cont.   kariteChilAm   korChilAm   kartAslAm

12 Atomic Phonological Operators
Forms in the diagram: kariteChila, kariChila, kairChila, korChila, karitChila, korChilo
Operators in the diagram: Del(e/t_Ch), Del(t/_Ch), Met(ri/_Ch), Asm(a → o/_i), Mut(a → o/_$)
Operator types: Deletion, Metathesis, Assimilation, Mutation

13 Hypothesis: A sequence of Atomic Phonological Operators is preferred if the verb forms obtained by applying this sequence to the classical forms have some functional benefit over the classical forms. Thus, all the modern dialects of Bangla have some functional advantage over the classical dialect.

14 A Formal Model of Functional Explanation [Figure: plot with axes f_1 (effort of articulation) and f_2 ([acoustic distinctiveness]^-1); regions labeled unstable languages, impossible languages, metastable languages]

15 Genetic Algorithm. Gene (a string of symbols). What the solution actually looks like. GA: search for good solutions by mimicking nature [recombination and mutation of genes]

16 Phenotype kori korChi : korte kori kartAsi : kartA Lexicon consisting of 28 forms for the verb kar

17 Genotype: A sequence of atomic phonological operators
Del t, Met ri, NOP, Del e, Asm a, Del i, NOP
Dsm e, NOP, Met ri, Asm a, Del e, NOP

18 Genotype → Phenotype
Genotype: Del t, Met ri, NOP, Del e, Asm a, Del i, NOP
Forms shown, top to bottom:
kari  kariteChi  karite
kari  karieChi   karie
kair  kaireChi   kaire
kor   korCh      kor
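
A minimal sketch of how a genotype could be expressed as a phenotype: each gene is applied in order to every form in the lexicon. Everything named here (`delete`, `metathesize`, `nop`, `express`) is hypothetical, and the operators are simplified to context-free string rewrites, whereas the operators on the earlier slide carry phonological contexts such as `_Ch`. Only the first three genes are shown, because the slides do not give the contexts needed to reproduce the later steps.

```python
def delete(segment):
    """Operator that deletes every occurrence of a segment (context ignored)."""
    return lambda word: word.replace(segment, "")


def metathesize(pair):
    """Operator that swaps the two symbols of a pair, e.g. 'ri' -> 'ir'."""
    return lambda word: word.replace(pair, pair[::-1])


def nop(word):
    """Null operator: leaves the form unchanged."""
    return word


def express(genotype, lexicon):
    """Apply each operator in turn; return the lexicon after every step."""
    stages = [list(lexicon)]
    for operator in genotype:
        stages.append([operator(word) for word in stages[-1]])
    return stages


classical = ["kari", "kariteChi", "karite"]
for stage in express([delete("t"), metathesize("ri"), nop], classical):
    print(stage)
```

Under these simplifications the first two steps match the intermediate rows on the slide (kari/karieChi/karie, then kair/kaireChi/kaire).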

19 Crossover

20 Mutation
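
The crossover and mutation slides survive only as titles, so the sketch below shows the generic textbook versions on a genotype represented as a list of operator names. One-point crossover and uniform point mutation are assumptions; the talk may have used other variants. The function names and the tiny repertoire are hypothetical.

```python
import random


def crossover(parent_a, parent_b, point):
    """One-point crossover: swap the tails of two equal-length genotypes."""
    child_1 = parent_a[:point] + parent_b[point:]
    child_2 = parent_b[:point] + parent_a[point:]
    return child_1, child_2


def mutate(genotype, repertoire, rate, rng):
    """Replace each gene with a random operator from the repertoire with probability `rate`."""
    return [rng.choice(repertoire) if rng.random() < rate else gene
            for gene in genotype]


rng = random.Random(0)  # seeded so that runs are repeatable
a = ["Del t", "Met ri", "NOP"]
b = ["Asm a", "Del e", "Del i"]
print(crossover(a, b, 1))
print(mutate(a, ["Del t", "Met ri", "Asm a", "NOP"], 0.3, rng))
```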

21 Multi-Objective GA

22 Multi-Objective GA: Apply constraints

23

24 Multi-Objective GA: Finding out good solutions

25 Multi-Objective GA: But also keep some not-so-good solutions

26

27 Multi-Objective GA: After several iterations
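
The "finding out good solutions" step in a multi-objective GA rests on Pareto dominance. The sketch below shows only that core idea, assuming every objective is minimized; the names `dominates` and `pareto_front` are hypothetical, and a full NSGA-II implementation additionally ranks solutions into successive fronts and uses a crowding measure to preserve diversity.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Each tuple is (effort, inverse distinctiveness); the values are made up.
candidates = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(pareto_front(candidates))
```

Here (3, 3) and (4, 4) are both dominated by (2, 2), so only the other three points remain on the front.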

28 Objective functions. Articulatory effort –f_e(Λ): weighted sum of number of syllables, letters and vowel height differences averaged over all words in the lexicon. Acoustic Distinctiveness –f_d(Λ): Inverse of mean edit distance between words. Learnability –f_r(Λ): correlation between feature match and edit distance

29 Experiments NSGA – II : a package for fast MOGA Gene length: 15 APOs A repertoire of 128 APOs Population: 1000, Generation: 500 6 Models with different combinations of constraints and objectives

30 Pareto-optimal front CB Sylhetti ACB SCB

31 Observations: The front has a vertical and a horizontal limb. Real dialects lie on the horizontal limb. Sound changes push the dialects from right to left (reduce effort) but never up the limb. Why?

32 Role of Constraints

33 For more information
Choudhury et al., Evolution, optimization, and language change: the case of Bengali verb inflections, in Proceedings of ACL SIGMORPHON9, Association for Computational Linguistics, 2007. http://research.microsoft.com/people/monojitc/
MOGA and NSGA II: Kanpur Genetic Algorithms Laboratory, http://www.iitk.ac.in/kangal/index.shtml

34 Food for Thought Evaluation: –Myriads of possible dialects, but only a few observed in nature Fixed set of pre-defined APOs – how to generalize for any change? MOGA is an optimization tool, which in no way simulates language change –How do languages optimize themselves?

35 Outline of the Talk Morpho-phonological change of Bangla Verb systems and emergence of dialect diversity –Approach: Multi-Objective Constrained Optimization –Technique: Multi-Objective Genetic Algorithm (MOGA) Understanding Computer Mediated Communication –Normalization of Texting language –Romanization of Indian Language text

36 Computer Mediated Communication Form

37 Texting Language: A new genre of English & also other languages used in chats, sms, emails, blogs, tweets, FB posts, comments etc. Example: “dis is n eg 4 txtin lang” = “This is an example for Texting language”

38 Texting Language: A new genre of English & also other languages used in chats, sms, emails, blogs, etc. Ungrammatical, unconventional spellings. Example: “dis is n eg 4 txtin lang” = “This is an example for Texting language”. The shorter → the faster. Constraint: understandability

39 Analysis of Social Media A hot topic in NLP –Normalization –Language identification –Sentiment/Polarity detection –Summarization/trend prediction Choudhury et al. (2007) Investigation and Modeling of the Structure of Texting Language. In IJCAI Workshop on Analytics of Noisy Data 2007

40 Tomorrow never dies!!! 2moro (9) tomoz (25) tomoro (12) tomrw (5) tom (2) tomra (2) tomorrow (24) tomora (4) tomm (1) tomo (3) tomorow (3) 2mro (2) morrow (1) tomor (2) tmorro (1) moro (1)

41 Patterns or Compression Operators. Phonetic substitution (phoneme) –psycho → syco, then → den. Phonetic substitution (syllable) –today → 2day, see → c. Deletion of vowels –message → mssg, about → abt. Deletion of repeated characters –tomorrow → tomorow

42 Patterns or Compression Operators. Truncation (deletion of tails) –introduction → intro, evaluation → eval. Common Abbreviations –Bangalore → blr, text back → tb. Informal pronunciation –going to → gonna, better → betta
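
Three of the compression operators listed on these two slides are purely mechanical and can be sketched as string functions. The function names are hypothetical, input is assumed to be lowercase, and keeping the first character in `delete_vowels` is an assumption chosen so that "about" maps to "abt" as on the slide. Real texting forms apply these operators selectively rather than everywhere.

```python
VOWELS = set("aeiou")


def delete_vowels(word):
    """Drop every vowel except a word-initial one: message -> mssg."""
    return word[0] + "".join(ch for ch in word[1:] if ch not in VOWELS)


def collapse_repeats(word):
    """Reduce each run of a repeated character to one: tomorrow -> tomorow."""
    out = []
    for ch in word:
        if not out or out[-1] != ch:
            out.append(ch)
    return "".join(out)


def truncate(word, keep):
    """Keep only the first `keep` characters: introduction -> intro."""
    return word[:keep]


print(delete_vowels("message"), delete_vowels("about"))
print(collapse_repeats("tomorrow"))
print(truncate("introduction", 5), truncate("evaluation", 4))
```

The phonetic substitutions and common abbreviations on the slides depend on pronunciation or convention, so they would need a lookup table rather than a rule of this kind.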

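43 HMMs for SMS Normalization [Figure: HMM for the word "today" with start state S0 and end state S6; grapheme states G1 ‘T’, G2 ‘O’, G3 ‘D’, G4 ‘A’, G5 ‘Y’; phoneme states P2 /AH/ and P4 /AY/; state S1 “2”; emission labels ε, @ and the letters T, O, D, A, Y]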

44 Bigram Examples
TL: would b gd 2 c u some time soon
Op: would be good to see you some time soon
TL: just wanted 2 say a big thanx 4 my bday card
Op: just wanted to say a big thanks for my today card
TL: me wel i fink bein at home makes me feel a lot more stressed den bein away from it
Op: me well i think being at home makes me feel a lot more stressed deny being away from it

45 Use of Indian Languages on Online Social Media: Code mixing, Transliteration, Spelling Change, Indian English

46 Concluding Remarks Languages are perpetually evolving and optimizing systems –Computational modeling of language change is still in its infancy –Lots of scope for research

47 Thank You! monojitc@microsoft.com Questions??

48 Why Computational Models?
FOR: Formalization, Virtual experimentation, Exploration
AGAINST: Intractable, Simplified assumptions, Toy languages
Can we model real world language change?

49 Objectives and Constraints - 1
Articulatory effort
f_e(w) = \alpha_1 f_{e1}(w) + \alpha_2 f_{e2}(w) + \alpha_3 f_{e3}(w)
f_{e1}(w) = |w|
f_{e2}(w) = \sum_i hr(\sigma_i)
f_{e3}(w) = \sum_i |ht(V_i) - ht(V_{i+1})|

50 Objectives and Constraints - 2
Acoustic distinctiveness
f_d(\Lambda) = \frac{1}{N} \sum ed(w_i, w_j)^{-1}
C_d(\Lambda) = -1 if ed(w_i, w_j) = 0 for more than 2 pairs
Phonotactic constraints
C_p(\Lambda) = -1 if any of the words violate the phonotactic constraints of the language
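
A rough sketch of the distinctiveness objective and its constraint, under several assumptions not stated on the slide: ed is plain character-level Levenshtein distance on the romanized strings (so "Ch" counts as two symbols), N is the number of unordered word pairs, identical pairs contribute nothing to the sum, and a satisfied constraint is reported as 0. All function names are hypothetical.

```python
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def distinctiveness(lexicon):
    """Mean inverse edit distance over word pairs; lower means more distinct."""
    pairs = list(combinations(lexicon, 2))
    distances = [edit_distance(a, b) for a, b in pairs]
    return sum(1 / d for d in distances if d > 0) / len(pairs)


def homophony_constraint(lexicon):
    """Return -1 if more than two word pairs are identical, else 0."""
    zero_pairs = sum(1 for a, b in combinations(lexicon, 2)
                     if edit_distance(a, b) == 0)
    return -1 if zero_pairs > 2 else 0


lexicon = ["kori", "korChi", "korte"]
print(distinctiveness(lexicon), homophony_constraint(lexicon))
```

For this three-word lexicon the pairwise distances are 2, 2 and 3, so the objective is (1/2 + 1/2 + 1/3) / 3 = 4/9.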

51 Objectives and Constraints - 3. Learnability as Regularity –f_r: The correlation coefficient between the edit distance and number of matching morphological attributes for every word pair –C_r = -1 if f_r > 0.8

52 Emergent dialects
Classical      D1        D2                  D3
kariteChilAm   kartA     karChi (korChi)     karteChi (kartAsi)
kariteChila    kartAa    karCha (korCha)     karteCha (kartAsa)
kariteChilen   kartAen   karChen (korChen)   karteChen (kartAsen)

