LING 696B: Gradient phonotactics and well-formedness


1 LING 696B: Gradient phonotactics and well-formedness

2 Vote on remaining topics
Topics that have been fixed:
- Morpho-phonological learning (Emily) + (LouAnn's lecture) + Bayesian learning
- Rule induction (Mans) + decision tree
- Learning and self-organization (Andy's lecture)

3 Voting on remaining topics
Select 2-3 from the following (need a ranking):
- OT and Stochastic OT
- Alternatives to OT: random fields / maximum entropy
- Minimal Description Length word chopping
- Feature-based lexical access

4 Well-formedness of words (following Mike's talk)
A word "sounds like English" if:
- It is a close neighbor of some words that sound really English, e.g. "pand" is a neighbor of sand, band, pad, pan, ...
- It agrees with what English grammar says an English word should look like, e.g. gradient phonotactics says blick > bnick

5 Well-formedness of words (following Mike's talk)
A word "sounds like English" if:
- It is a close neighbor of some words that sound really English, e.g. "pand" is a neighbor of sand, band, pad, pan, ...
- It agrees with what English grammar says an English word should look like, e.g. gradient phonotactics says blick > bnick
Today: relate these two ideas to the non-parametric and parametric perspectives

6 Many ways of calculating probability of a sequence
- Unigrams, bigrams, trigrams, syllable parts, transition probabilities ...
- No bound on the number of creative ways

7 Many ways of calculating probability of a sequence
- Unigrams, bigrams, trigrams, syllable parts, transition probabilities ...
- No bound on the number of creative ways
- What does it mean to say the "probability" of a phonological word?
- Objective/frequentist vs. subjective/Bayesian: philosophical (but important)

8 Many ways of calculating probability of a sequence
- Unigrams, bigrams, trigrams, syllable parts, transition probabilities ...
- No bound on the number of creative ways
- What does it mean to say the "probability" of a phonological word?
- Objective/frequentist vs. subjective/Bayesian: philosophical (but important)
- Thinking "parametrically" may clarify things: "likelihood" = "probability" calculated from a model

9 Parametric approach to phonotactics
Example: the "bag of sounds" assumption / exchangeable distributions
p(blik) = p(lbik) = p(kbli)

10 Parametric approach to phonotactics
Unigram models: N - 1 parameters
(Diagram: segments B L I K generated independently)
- What is θ?
- How to get θ-hat?
- How to assign a probability to "blick"?
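
A minimal sketch of the unigram idea in Python: θ-hat is just relative segment frequency in the lexicon, and a word's likelihood is the product of its segment probabilities, so order does not matter (the exchangeability of slide 9). The mini-lexicon is made up for illustration.

    from collections import Counter

    lexicon = ["bat", "lip", "kit", "bil"]   # hypothetical toy transcriptions

    counts = Counter(seg for word in lexicon for seg in word)
    total = sum(counts.values())
    theta = {seg: c / total for seg, c in counts.items()}   # theta-hat by maximum likelihood

    def unigram_prob(word):
        p = 1.0
        for seg in word:
            p *= theta.get(seg, 0.0)   # unseen segments get probability zero without smoothing
        return p

    print(unigram_prob("blik"), unigram_prob("lbik"))   # identical: a bag of sounds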

11 Parametric approach to phonotactics
Unigram model with overlapping observations: N^2 - 1 parameters
(Diagram: B L I K; note: the input is #B BL LI IK K#)
- What is θ?
- How to get θ-hat?
- How to assign a probability to "blick"?
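
A small sketch of the overlapping-observation input, assuming "#" marks the word boundary as on the slide:

    def overlapping_units(word):
        padded = "#" + word + "#"
        return [padded[i:i + 2] for i in range(len(padded) - 1)]

    print(overlapping_units("blik"))   # ['#b', 'bl', 'li', 'ik', 'k#']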

12 Parametric approach to phonotactics
Unigram with annotated observations (Coleman and Pierrehumbert)
(Diagram: BL labeled "osif" = onset of a strong initial/final syllable; IK labeled "rsif" = rhyme of a strong initial/final syllable)
Input: segments annotated with a syllable parse
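
One way to picture this, as a sketch only: each observation is a constituent paired with its positional annotation, and the model keeps a relative frequency per annotated unit. The parses and labels below are hypothetical.

    from collections import Counter

    parsed_lexicon = [
        [("bl", "osif"), ("ik", "rsif")],   # e.g. "blick": onset / rhyme of a strong initial/final syllable
        [("br", "osif"), ("ik", "rsif")],   # e.g. "brick"
    ]

    counts = Counter(obs for word in parsed_lexicon for obs in word)
    total = sum(counts.values())
    theta = {obs: c / total for obs, c in counts.items()}

    def annotated_prob(parsed_word):
        p = 1.0
        for obs in parsed_word:
            p *= theta.get(obs, 0.0)
        return p

    print(annotated_prob([("bl", "osif"), ("ik", "rsif")]))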

13 Parametric approach to phonotactics
Bigram model: N(N - 1) parameters {p(w_n | w_{n-1})} (how many for a trigram?)
(Diagram: B L I K as a chain)
Input: segment sequence
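
A sketch of the conditional bigram model, again over a made-up mini-lexicon and with "#" as the boundary symbol:

    from collections import Counter, defaultdict

    lexicon = ["bat", "lip", "kit", "bil"]   # hypothetical toy transcriptions

    pair_counts = defaultdict(Counter)
    for word in lexicon:
        padded = "#" + word + "#"
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[prev][cur] += 1

    def bigram_prob(word):
        p = 1.0
        padded = "#" + word + "#"
        for prev, cur in zip(padded, padded[1:]):
            total = sum(pair_counts[prev].values())
            p *= pair_counts[prev][cur] / total if total else 0.0
        return p

    print(bigram_prob("bit"))   # nonzero only if every transition was observed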

14 Ways that theory might help calculate probability
- Probability calculation must be based on an explicit model: we need a story about what sequences are
- How can phonology help with calculating sequence probability?
  - More delicate representations
  - More complex models

15 Ways that theory might help calculate probability
- Probability calculation must be based on an explicit model: we need a story about what sequences are
- How can phonology help with calculating sequence probability?
  - More delicate representations
  - More complex models
- But: phonology is not quite about what sequences are ...

16 More delicate representations
- Would CV phonology help? Auto-segmental tiers, features, gestures?
- The chains are no longer independent: more sophisticated models are needed
- Limit: a generative model of speech production (very hard)
(Diagram: B L I K I T)

17 More complex models
Mixture of unigrams (used in document classification)
(Diagram: lexical strata, each stratum with its own unigram over B L I K)
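
A sketch of the mixture-of-unigrams likelihood, with two hypothetical strata (say, native vs. loan vocabulary) and made-up parameter values: p(w) = Σ_z p(z) Π_i p(seg_i | z).

    strata = {
        "native": {"weight": 0.8, "theta": {"b": 0.3, "l": 0.3, "i": 0.2, "k": 0.2}},
        "loan":   {"weight": 0.2, "theta": {"b": 0.1, "l": 0.1, "i": 0.4, "k": 0.4}},
    }

    def mixture_prob(word):
        total = 0.0
        for z in strata.values():
            p = z["weight"]                     # p(z)
            for seg in word:
                p *= z["theta"].get(seg, 0.0)   # p(seg | z)
            total += p
        return total

    print(mixture_prob("blik"))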

18 More complex models
- More structure in the Markov chain
- Can also model the length distribution with so-called semi-Markov models
(Diagram: constituent states "onset", "rhyme V", "rhyme VC" generating BL and IK)
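
A sketch of a Markov chain over syllable constituents, with hypothetical transition probabilities; a semi-Markov version would additionally give each state an explicit length distribution instead of the implicit geometric one.

    transitions = {
        "#":        {"onset": 1.0},
        "onset":    {"rhyme V": 0.4, "rhyme VC": 0.6},
        "rhyme V":  {"#": 1.0},
        "rhyme VC": {"#": 1.0},
    }

    def path_prob(states):
        p = 1.0
        for prev, cur in zip(states, states[1:]):
            p *= transitions[prev].get(cur, 0.0)
        return p

    print(path_prob(["#", "onset", "rhyme VC", "#"]))   # 0.6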

19 More complex models
Probabilistic context-free grammar:
  Syllable --> C + VC (0.6)
  Syllable --> C + V (0.35)
  Syllable --> C + C (0.05)
  C --> _ (0.01)
  C --> b (0.05)
  ...
See 439/539
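
A sketch of how a derivation's probability is computed under a grammar like this: multiply the probabilities of the rules used. The Syllable and C rules are copied from the slide; the VC terminal rule and the derivation data structure are hypothetical fill-ins for the elided "..." rules.

    rules = {
        ("Syllable", ("C", "VC")): 0.60,
        ("Syllable", ("C", "V")):  0.35,
        ("Syllable", ("C", "C")):  0.05,
        ("C", ("b",)):             0.05,
        ("VC", ("ik",)):           0.02,   # hypothetical terminal rule
    }

    def derivation_prob(derivation):
        # a derivation is a sequence of rule applications
        p = 1.0
        for rule in derivation:
            p *= rules[rule]
        return p

    # "bik" as Syllable --> C VC, C --> b, VC --> ik
    print(derivation_prob([("Syllable", ("C", "VC")), ("C", ("b",)), ("VC", ("ik",))]))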

20 20 What’s the benefit for doing more sophisticated things? Recall: maximum likelihood need more data to produce a better estimate Data sparsity problem: training data often insufficient for estimating all the parameters, e.g. zero counts Lexicon size: we don’t have infinitely many words to estimate phonotactics Smoothing: properly done, has a Bayesian interpretation (often not)

21 Probability and well-formedness
- Generative modeling: characterize a distribution over strings
- Why should we care about this distribution? The hope: it may have something to do with grammaticality judgements
- But judgements are also affected by what other words "sound like" (the puzzle of mrupect/mrupation)
- It may be easier to model a function with input = string, output = judgements

22 Bailey and Hahn
- Tried all kinds of ways of calculating phonotactics and neighborhood density, and checked which combination "works the best"
- Typical reasoning: "metrics X and Y as factors explain 15% of the variance"

23 Bailey and Hahn
- Tried all kinds of ways of calculating phonotactics and neighborhood density, and checked which combination "works the best"
- Typical reasoning: "metrics X and Y as factors explain 15% of the variance"
- Methodology: ANOVA. Model (1-way): data = overall mean + effect + error
- What can ANOVA do for us? How do we check whether ANOVA makes sense? What is the "explained variance"?
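
A sketch of the 1-way decomposition and of one common reading of "explained variance" (SS_between / SS_total), using made-up wordlikeness ratings:

    import statistics

    groups = {                      # hypothetical ratings by condition
        "high_prob": [5.0, 4.5, 4.8],
        "low_prob":  [2.0, 2.5, 1.8],
    }

    all_scores = [x for xs in groups.values() for x in xs]
    grand_mean = statistics.mean(all_scores)

    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = sum(len(xs) * (statistics.mean(xs) - grand_mean) ** 2 for xs in groups.values())

    print("explained variance:", ss_between / ss_total)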

24 Non-parametric approach to similarity neighborhood
- A hint from B&H: their neighborhood model, in which d_ij is a weighted edit distance and A, B, C, D are estimated by polynomial regression
- Recall radial basis functions: F(x) = Σ_i a_i K(x, x_i), with K(x, x_i) = exp(-d(x, x_i))
- The quadratic weighting is ad hoc; one should just do general nonlinear regression with RBFs
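
A sketch of that nonlinear regression over strings: the weights a_i come from solving the linear system K a = y on the training items, with K(x, x_i) = exp(-edit(x, x_i)). The edit-distance routine, the training words and ratings, and the small ridge term are all assumptions added for illustration.

    import numpy as np

    def edit_distance(s, t):
        # standard Levenshtein distance by dynamic programming
        d = np.zeros((len(s) + 1, len(t) + 1), dtype=int)
        d[:, 0] = np.arange(len(s) + 1)
        d[0, :] = np.arange(len(t) + 1)
        for i in range(1, len(s) + 1):
            for j in range(1, len(t) + 1):
                cost = 0 if s[i - 1] == t[j - 1] else 1
                d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
        return d[len(s), len(t)]

    def kernel(x, y):
        return np.exp(-edit_distance(x, y))

    train_words = ["blik", "plik", "bnik", "mrup"]        # hypothetical stimuli
    ratings = np.array([6.0, 5.5, 3.0, 2.0])              # hypothetical wordlikeness ratings

    K = np.array([[kernel(x, y) for y in train_words] for x in train_words])
    a = np.linalg.solve(K + 1e-6 * np.eye(len(train_words)), ratings)   # tiny ridge for stability

    def predict(x):
        return sum(a_i * kernel(x, x_i) for a_i, x_i in zip(a, train_words))

    print(predict("blip"))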

25 Non-parametric approach to similarity neighborhood
- Recall: RBF as a "soft" neighborhood model
- Now think of strings also as data points, with neighborhoods defined by some string distance (e.g. edit distance)
- Same kind of regression with RBF

26 Non-parametric approach to similarity neighborhood
- Key technical point: choosing the right kernel
  - Edit-distance kernel: K(x, x_i) = exp(-edit(x, x_i))
  - Sub-string kernel: measuring the length of the common sub-sequence (mrupation)
- Key experimental data: controlled stimuli, split into training and test sets (equal phonotactic probability); no need to transform the rating scale
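
A sketch of one simple sub-string-style score: the length of the longest common sub-sequence, so that mrupect and mrupation still share their "mrup..." material even though their edit distance is large. (Real string kernels are usually more refined; this only shows the idea.)

    def lcs_length(s, t):
        # dynamic programming over prefixes
        dp = [[0] * (len(t) + 1) for _ in range(len(s) + 1)]
        for i in range(1, len(s) + 1):
            for j in range(1, len(t) + 1):
                if s[i - 1] == t[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        return dp[len(s)][len(t)]

    print(lcs_length("mrupect", "mrupation"))   # 5: m, r, u, p, t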

27 Non-parametric approach to similarity neighborhood
An enterprise of questions opens up with the non-parametric perspective:
- Would a yes/no task lead to word "anchors", like support vectors?
- Would the new words interact with each other, as in transductive inference?
- What type of metric is most appropriate for inferring well-formedness from neighborhoods?

28 Integration
- Hard to integrate with a probabilistic (parametric) model: neighborhood density has a strong non-parametric character (it grows with the data)
- Possible to integrate phonotactic probability into a non-parametric model via kernel algebra: a*K_1(x,y) + b*K_2(x,y) and K_1(x,y)*K_2(x,y) are also kernels
- p-kernel: K(x_1, x_2) = Σ_h p(x_1|h) p(x_2|h) p(h), where p comes from a parametric model
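
A sketch of the p-kernel built on top of the mixture-of-unigrams example from slide 17, with the same hypothetical strata; the hidden variable h ranges over strata.

    def unigram_likelihood(word, theta):
        p = 1.0
        for seg in word:
            p *= theta.get(seg, 0.0)
        return p

    strata = {   # hypothetical strata, as in the earlier mixture sketch
        "native": {"weight": 0.8, "theta": {"b": 0.3, "l": 0.3, "i": 0.2, "k": 0.2}},
        "loan":   {"weight": 0.2, "theta": {"b": 0.1, "l": 0.1, "i": 0.4, "k": 0.4}},
    }

    def p_kernel(x1, x2):
        # K(x1, x2) = sum_h p(x1 | h) p(x2 | h) p(h)
        return sum(z["weight"] * unigram_likelihood(x1, z["theta"]) * unigram_likelihood(x2, z["theta"])
                   for z in strata.values())

    print(p_kernel("blik", "lik"))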

