Words in puddles of sound


1 Words in puddles of sound
Padraic Monaghan University of York Morten Christiansen Cornell University

2 Words in a “sea of sound” (Saffran, 2001)
Discovering words from continuous speech with no reliable cues to word boundaries (Jones, 1918; Liberman et al., 1967) where words are realised variably (Pollack & Pickett, 1964)

3 Segmentation and sublexical cues
Final syllables of words are longer (Klatt, 1975): hamster v. ham (Saffran, Newport, & Aslin, 1996; Salverda & McQueen, 2004)
First syllables of words are stressed ~60% of the time in English (Crystal & House, 1990; Pierrehumbert, 1981); Johnson & Jusczyk (2001); Thiessen & Saffran (2003)
Certain diphones are more likely to occur across words than within words (Mattys et al., 2005)

4 Multiple cues in speech segmentation
Hierarchical model (Mattys, White, & Melhorn, 2005)

5 Puddles
whosalovelybabyyesyouareyourealovelybabyarentyouyesyouare
In 5.5M words of child-directed speech:
Utterance Length   Percentage of Corpus
1                  26.2
2                  13.7
3                  13.1
4                  11.8
5                  9.5
6                  7.5
7                  5.6
8                  3.9
> 8                8.6

6 Lexical approach to segmentation
Once you’ve got the words, segmentation is easy (Norris, 1994; 2007)
Assume each utterance is a word until you know differently:
if it’s repeated, you keep it
if it doesn’t occur again, you lose it

7 Aims of Modelling
Utterances can’t be used, as we don’t know when one is a single word and when it’s multiple words (Brent & Cartwright, 1996)
Utterance boundaries are sufficient to get started
Single-word utterances are useful anchors for segmentation
It is possible to distinguish (most) single-word from (most) multiple-word utterances
Proper nouns have a special role
Frequent multiple-word sequences will be “lexicalised” (Tomasello, 2001)

8 Lexical approach to segmentation
Familiar words used for segmentation by “Maggie” (Bortfeld et al., 2005):
“maggie’s bike had big, black wheels”
“hannah’s cup was bright and shiny”
infants familiarised to “bike” more quickly than “cup”
Proper nouns often occur as single utterances: 3.3% of utterances in “naomi” corpus in CHILDES
Very high-frequency words are useful for categorising content words (Monaghan, Chater, & Christiansen, 2005; Redington, Chater, & Finch, 1998)

9 Corpora
6 corpora from CHILDES:
child-directed speech to children aged < 2:6
Orthographic transcription run through Festival speech synthesiser (Black et al., 1990)
Corpus   Utterances   Words     MLU    Reference
Eve      18,280       62,734    3.43   Brown, 1973
Peter    21,311       74,185    3.48   Bloom et al., 1974
Nina     17,075       73,562    4.31   Suppes, 1974
Naomi    9,006        29,003    3.22   Sachs, 1983
Anne     28,250       96,008    3.49   Theakston et al., 2001
Aran     24,801       106,983

10 The model kitty thatsrightkitty sayitagain lookkitty LEXICON

11 The model kitty thatsrightkitty sayitagain lookkitty LEXICON

12 The model kitty thatsrightkitty sayitagain lookkitty LEXICON kitty 1.0

13 The model kitty thatsrightkitty sayitagain lookkitty LEXICON kitty 0.99

14 The model kitty thatsrightkitty sayitagain lookkitty LEXICON kitty 0.99

15 The model kitty thatsrightkitty sayitagain lookkitty LEXICON kitty 1.99 thatsright 1.00

16 The model kitty thatsrightkitty sayitagain lookkitty LEXICON kitty 1.98 thatsright 0.99

17 The model kitty thatsrightkitty sayitagain lookkitty LEXICON kitty 2.98 thatsright 0.99

18 The model kitty thatsrightkitty sayitagain lookkitty LEXICON kitty 3.96 thatsright 0.97 sayitagain 0.99 look 1.00
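
The walk-through on slides 10 to 18 can be sketched roughly in code. This is a minimal illustration, not the authors' implementation: it works on orthographic strings rather than the phoneme sequences used in the talk, and the function names (`segment`, `learn`), the greedy longest-first matching, the +1 activation per occurrence and the subtractive decay of 0.01 per utterance are all assumptions made for the example. It also omits the phonological glue and legal-boundary constraints.

```python
def segment(utterance, lexicon):
    """Split an utterance around known lexicon entries (greedy, left to right).

    Returns a list of chunks: known words plus the unknown stretches between them.
    Trying longer entries first is an assumption, not something the slides fix.
    """
    words = sorted(lexicon, key=len, reverse=True)
    chunks, unknown, i = [], "", 0
    while i < len(utterance):
        match = next((w for w in words if utterance.startswith(w, i)), None)
        if match:
            if unknown:               # close off the unknown stretch before the match
                chunks.append(unknown)
                unknown = ""
            chunks.append(match)
            i += len(match)
        else:
            unknown += utterance[i]
            i += 1
    if unknown:
        chunks.append(unknown)
    return chunks


def learn(utterances, decay=0.01):
    """Build a lexicon of chunk -> activation from a sequence of utterances."""
    lexicon = {}
    for utterance in utterances:
        # Every stored item decays a little; items that never recur are lost.
        for word in list(lexicon):
            lexicon[word] -= decay
            if lexicon[word] <= 0:
                del lexicon[word]
        # Each chunk of the current utterance is added or strengthened.
        for chunk in segment(utterance, lexicon):
            lexicon[chunk] = lexicon.get(chunk, 0.0) + 1.0
    return lexicon


lexicon = learn(["kitty", "thatsrightkitty", "sayitagain", "lookkitty"])
for word, activation in sorted(lexicon.items(), key=lambda kv: -kv[1]):
    print(f"{word} {activation:.2f}")
```

Under these assumptions the single-word utterance "kitty" acts as the anchor: once it is stored, "thatsrightkitty" and "lookkitty" are split around it, and "thatsright" stays in the lexicon as an unanalysed chunk until further evidence divides it. The exact activation values depend on the assumed decay schedule and need not match those on the slides.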

19 More constraints in the model: Phonological glue
oh okay noway nevertheless
LEXICON oh kay n way evertheless
Candidate words with recognised beginnings and endings admitted
Candidate words which divide a recognised word-internal diphone rejected

20 More constraints in the model: Phonological glue
oh okay no nevertheless GLUE Beg: oh End: oh Glue: oh LEXICON oh

21 More constraints in the model: Phonological glue
oh okay no nevertheless GLUE Beg: oh End: oh Glue: oh LEXICON oh

22 More constraints in the model: Phonological glue
oh okay no nevertheless GLUE Beg: oh End: oh Glue: oh LEXICON oh x ka? oh? ok?

23 More constraints in the model: Phonological glue
oh okay no nevertheless GLUE Beg: oh, ka End: oh, ay Glue: oh, ok, ka, ay LEXICON oh okay

24 More constraints in the model: Phonological glue
oh okay no nevertheless GLUE Beg: oh, ka End: oh, ay Glue: oh, ok, ka, ay LEXICON oh okay no nevertheless

25 Testing the model
Decisions: Internal diphone glue constraint, Legal beginnings/endings constraint, Decay-rate, Ordering of lexicon…
Accuracy: Proportion of words segmented that are words
Completeness: Proportion of words that are segmented
Baseline segmentation: correct number of words in utterance, randomly positioned boundaries (Brent & Cartwright, 1996)
Included By length
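
The two measures and the random baseline can be made concrete with a small sketch. The slides do not say how a segmented word is matched against the true segmentation, so this example assumes a token-level match: a proposed word counts as correct only if both of its boundaries coincide with those of a true word. The helper names (`spans`, `accuracy_completeness`, `random_baseline`) are hypothetical.

```python
import random


def spans(chunks):
    """Convert a list of chunks into a set of (start, end) character spans."""
    out, start = set(), 0
    for chunk in chunks:
        out.add((start, start + len(chunk)))
        start += len(chunk)
    return out


def accuracy_completeness(predicted, gold):
    """Accuracy: share of proposed words that are true words.
    Completeness: share of true words that were proposed."""
    p, g = spans(predicted), spans(gold)
    hits = len(p & g)
    return hits / len(p), hits / len(g)


def random_baseline(utterance, n_words, rng=random):
    """Place n_words - 1 boundaries at random positions inside the utterance."""
    cuts = sorted(rng.sample(range(1, len(utterance)), n_words - 1))
    edges = [0] + cuts + [len(utterance)]
    return [utterance[a:b] for a, b in zip(edges, edges[1:])]


gold = ["thats", "right", "kitty"]
predicted = ["thatsright", "kitty"]
print(accuracy_completeness(predicted, gold))          # one of two proposed words is correct
print(random_baseline("thatsrightkitty", len(gold)))   # baseline knows the word count only
```

Here the under-segmented parse gets an accuracy of 1/2 (only "kitty" is a true word) and a completeness of 1/3 (only one of the three true words is recovered). The baseline is given the correct number of words, so it differs from the model only in where the boundaries fall.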

26 Results: Accuracy t(5) = , p < .0001

27 Results: Completeness
t(5) = , p < .0001

28 Results: Naomi’s Lexicon
Top 10 after 1K utterances: Nomi, Say, No, Yes, The, Okay, Whatsthis, Blanket, Is, What

29 Results: Naomi’s Lexicon
Top 10 after 8K utterances: You, Nomi, The, It, To, What, I, That’s, No, Your

30 Results: Naomi’s Lexicon
0.05 decay Dashed line shows mean word length in corpus

31 Results: Naomi’s Lexicon
0.01 decay

32 Summary
Model based on puddles of sound: accurate, complete
Reliance on proper nouns
Frequent words “pop” out
Same words useful for grammatical categorisation
No mechanism for alternative, competing parses of speech
First, cognitively plausible step for how lexicon may be generated
Relative role of phonological glue, legal boundaries, sorting by length/frequency

