Puzzles and Patterns in 50 years of Research on Speech Perception Sarah Hawkins University of Cambridge



2 Puzzles and Patterns in 50 years of Research on Speech Perception Sarah Hawkins University of Cambridge sh110@cam.ac.uk

3 Three periods
1. 1950-1965: Broad-based exploration
2. 1965-1990s: Narrowed to focus on the search for invariance in the relationship between the speech signal and its percept: THEORY
3. 1995…: This focus is broadening again
–to include ‘discrepant’ data & new understanding
–which requires changes in conceptualization of task, goals, processes involved

4 The Main Message Speech perception is at an exciting stage: we are beginning to integrate areas of old research with the mainstream theoretical work of the last 30 years or so A paradigm shift?

5 Early work Glorious Discovery

6 Early work often looked at effects on the whole signal but as puzzles arose, and we looked more closely, then attention became focused on small domains in an effort both to simplify and to clarify

7 Early work: source separation Cocktail party effect / multi-talker perception Cherry (1953) continuous natural speech, with different types of content, presented in different ways a huge wealth of observations relevant to –memory –attention –transitional probabilities –speaker vs message Cherry (1953) JASA 25, 975-979

8 Early work: source separation Cocktail party effect / multi-talker perception Broadbent & Ladefoged (1957) separate synthetic formants fuse to sound like a single vowel when presented to the same or different ears, only if they have the same f0 compared ‘natural’ and ‘sustained’ formants extensions to theories of hearing (e.g. Licklider) Broadbent & Ladefoged (1957) JASA 29, 708-710 Darwin (1981) QJEP 33, 185-207 Bregman (1990) Auditory Scene Analysis ASA special session, 2004 Cooke & Ellis (2001) Sp. Comm. 35, 141–177

9 Early work: source integration Sumby & Pollack (1954) Especially in high levels of noise: audiovisual presentation increases intelligibility (visual contribution is relative to the available auditory contribution) Sumby & Pollack (1954) JASA 26: 212-215 Massaro (1998) Perceiving Talking Faces Widespread AV groups and applications

10 Early work: source integration Sumby & Pollack (1954) Especially in high levels of noise: audiovisual presentation increases intelligibility (visual contribution is relative to the available auditory contribution) in auditory-only presentations, polysyllables are more intelligible than monosyllables (overall shape... neighborhoods…cohorts…) Richard Warren, Paul Luce, Marslen-Wilson Sumby & Pollack (1954) JASA 26: 212-215 Massaro (1998) Perceiving Talking Faces Widespread AV groups and applications
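The idea that the visual contribution is measured relative to the available auditory contribution can be sketched as a normalised gain: the improvement from adding vision, divided by the room left for improvement. This is a minimal illustration under that assumption; the function name `relative_visual_gain` and the scores in the usage note are hypothetical, not data from the paper.

```python
def relative_visual_gain(auditory: float, audiovisual: float) -> float:
    """Visual gain as a fraction of the improvement still available.

    Both scores are proportions correct in [0, 1].
    Assumed form: (AV - A) / (1 - A).
    """
    if not 0.0 <= auditory < 1.0:
        raise ValueError("auditory score must be in [0, 1)")
    if not 0.0 <= audiovisual <= 1.0:
        raise ValueError("audiovisual score must be in [0, 1]")
    return (audiovisual - auditory) / (1.0 - auditory)
```

With made-up scores, a rise from 0.2 to 0.6 in heavy noise and a rise from 0.8 to 0.9 in light noise both give a relative gain of about 0.5, even though the absolute gains differ (0.4 versus 0.1).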

11 Early work: brain function Kimura (1961) speech is processed more efficiently by the ear that is contralateral to the language-dominant hemisphere independent of handedness and right/left focus of damage due to epilepsy → complexities of auditory pathways, cerebral dominance, and speech processing Kimura (1961) Canadian J. Psychol., 15, 166-171 The new ‘cognitive neuroscience/psychology’…

12 Early work: memory Miller (1956) short term memory span for unrelated items –The Magical Number Seven ± Two can increase this span by: –making relative rather than absolute judgments –increasing the number of dimensions –chunking into larger items recoding is a crucial process Miller (1956) Psychological Review 63, 81-97 Serial learning and recall (e.g. Underwood) Lashley (1951) Serial order in behavior Pisoni (1973) and later
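Recoding by chunking can be illustrated with the binary-to-octal case discussed in Miller (1956): three binary digits are recoded as one octal digit, so the same span of items carries three times as many binary digits. A minimal sketch; the function name `recode_binary_to_octal` is hypothetical.

```python
def recode_binary_to_octal(bits: str) -> str:
    """Recode a string of binary digits into octal digits, 3 bits per chunk."""
    # Left-pad with zeros so the length is a multiple of 3.
    padded_length = (len(bits) + 2) // 3 * 3
    padded = bits.zfill(padded_length)
    # Split into 3-bit chunks and name each chunk with one octal digit.
    chunks = [padded[i:i + 3] for i in range(0, len(padded), 3)]
    return "".join(str(int(chunk, 2)) for chunk in chunks)
```

For example, the 18 binary digits `101000100111001110` recode to the 6 octal digits `504716`, which fits within a span of seven or so items.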

13 Early work: intelligibility Context of Possible Responses Miller, Heise & Lichten (1951) monosyllables size of test vocabulary affects identification 2 … 256 … all monosyllables though presumably there are limits: –two vs six –five vs nine ! Miller, Heise & Lichten (1951) J. Exp. Psych. 41, 329-335

14 Early work: intelligibility Phonetic Context Pickett & Pollack (1963) excerpts from connected speech must be ≥ 800 ms long to be fully intelligible regardless of rate: –faster rates need more syllables to be understood (slowing the speech down does not help) → crucial role of coarticulation & style (‘connected speech processes’) Pickett & Pollack (1963) Language & Speech 6, 165-171

15 Early work: preceding context affects the interpretation of the current sound Ladefoged and Broadbent (1957) Carrier phrase "Please say what this word is:" followed by a test word (bit, bet, bat, but) F1 of CARRIER 200-380 Hz: test word heard as ‘bet’; F1 of CARRIER 380-660 Hz: test word heard as ‘bit’ Ladefoged and Broadbent (1957) JASA 29, 98-104

16 Early work: immediate context determines the interpretation of the current stimulus Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606 Synthesizing bursts and transitionless vowels

17 Early work: immediate context determines the interpretation of the current stimulus Cooper, Delattre, Liberman, Borst & Gerstman (1952) JASA 24, 597-606 Identification of bursts and transitionless vowels: the CV is identified as the minimal acoustic unit

18 Early work: immediate context determines the interpretation of the current stimulus Delattre, Liberman, & Cooper (1955) JASA 27, 769-773 Identification of burstless stops with different vowels: transitions are all you need!

19 Categorical Perception of obstruent consonants Equal acoustic changes → unequal auditory percepts place of articulation of stops: /b/ vs /d/ vs /g/ (figure labels: b, d, g) Liberman, Harris, Hoffman, and Griffith (1957) Journal of Experimental Psychology 54, 358-368

20 Categorical Perception of obstruent consonants together with a theoretical bias in favor of binary oppositions encouraged a focused search for simple transformations from the encoded signal to an unambiguous, formal linguistic mental representation

21 This narrower focus required clear conceptualisation of –identity of the important unit(s) of perception –process of abstraction On the whole, the units and levels of linguistic description were rather uncritically adopted

22 …units of linguistic description were rather uncritically adopted “we….had undertaken to find the ‘invariants’ of speech, a term which implies, at least in its simplest interpretation, a one-to-one correspondence between something half-hidden in the spectrogram and the successive phonemes of the message.” Cooper, Delattre, Liberman, Borst & Gerstman, Perception of synthetic speech sounds JASA (1952) 24, 604-5

23 …though not without some misgivings “…one should not expect always to be able to find acoustic invariants for the individual phonemes…we are trying to [compile] the code book, one in which there is one column for acoustic entries and another column for message units, whether these be phonemes, syllables, words, or whatever.” Cooper, Delattre, Liberman, Borst & Gerstman, Perception of synthetic speech sounds JASA (1952) 24, 604-5

24 Middle period The search for essence: ‘invariance’

25 Middle period: the search for essence Impose order on the chaos! Focus: non-linearity between variation in acoustic signal and perceptual response Categorical Perception (of consonants) Context becomes seen as variability, so we control for it ever more stringently

26 to discover the crucial—invariant—properties requires a view of what is fundamental The basic syllable! ba –CV –in isolation –stressed –possibly with only one V if we’re looking at Cs, and only one C if we’re looking at Vs

27 Imposing order on chaos The basic syllable: ba (context: silence) What was lost? –polysyllables –unstressed syllables –prosody –accounting for rate changes –connected speech –informativeness of variation esp. in connected speech –meaning –communication –(most things really)

28 Development of theory and the search for essence Two main approaches The Motor Theory Quantal Theory leading to Acoustic/Auditory Invariance

29 The Motor Theory of Speech Perception Liberman, Cooper, Shankweiler & Studdert-Kennedy (1967) Psychological Review 74, 431–461 Liberman & Mattingly (1985) Cognition 21, 1-36 Listeners interpret speech sounds in terms of –motoric gestures they would make them with (1967) –intended gestures of the speaker (1985) Gestural unit: ‘phonetic category’

30 Quantal Theory of Speech Perception (and production) Stevens (1972, 1989) Regions of stability in the acoustic signal, or auditory response, provide a basis for forming categories of sounds Unit: distinctive feature (Chomsky & Halle 1968) Stevens (1989) Journal of Phonetics 17, 3-45 Stevens (1972) In David & Denes Human Communication. 51-66

31 Quantal Theory becomes Acoustic/Auditory invariance theory Stevens & Blumstein (1978) … Stevens (2002) For each distinctive feature (DF) there is a binary response to an invariant acoustic or auditory property e.g. particular changes in spectral shape over short time periods at crucial parts of the signal –segment boundaries –vowel steady states (figure labels: +consonantal: change; -consonantal: little change) Stevens (2002) JASA 111, 1872-1891 Stevens & Blumstein (1978) JASA 64, 1358-1368

32 Acoustic/Auditory invariance theory Stevens (2002) landmarks: –islands of reliability –built-in local context connected speech… (figure labels: +strident, -strident) Stevens (2002) JASA 111, 1872-1891

33 Common properties Motor and Acoustic Invariance theories have much in common –dynamic –early abstraction –discrete units –phonological

34 Common properties Motor and Invariance theories have much in common –dynamic –early abstraction –discrete units –phonological allowed psycholinguistic theories to assume an input that is abstract and discrete: to ignore phonetic information

35 Psycholinguistic theories Focus on word segmentation & identification Top-down knowledge compensates for impoverished (phonemic) input –metrical stress, possible words, phonotactics…. Statistical, probabilistic Some names: –McClelland & Elman (TRACE) –Cutler, Norris, McQueen (Race, Shortlist, Merge) –Marslen-Wilson, Gaskell… (Cohort)

36 extensions, questions: is simplicity the best answer? Kewley-Port (1983) better identification with overall pattern (more detail?) Klatt (1979) Lexical Access From Spectra (LAFS) whole-word patterns? Kewley-Port (1983) JASA 73, 322-335 Klatt (1979) Journal of Phonetics 7, 279-312

37 extensions, questions: wider influences Ganong (1980) identification experiment: VOT continuum with a word at one end and a non-word at the other –nonword-word: dask-task –word-nonword: dash-tash perception is more forgiving when the sound means something! (figure: % /d/ responses, from 100 to 0, plotted against VOT from short (d) to long (t)) Ganong (1980) J. Exp. Psych: HPP 6, 110-125

38 Summary: ‘context’ and ‘signal’ ‘Units’ functionally inseparable from ‘context’ The context and the signal together determine whether the signal is coherent –and hence what each unit ‘is’

39 Recent developments (since early-to-mid 90s) systematic subtle variation as linguistically informative: classify the contexts in a more linguistically-sophisticated way

40 Combining old and new themes re-examination and extension of information provided by systematic phonetic variation new areas, e.g. –cross-linguistic work (Best, Beddor, Bradlow...) –memory & learning (Goldinger, Pisoni...) –functional brain imaging (Sophie Scott)

41 Listeners use fine phonetic detail Allen & Miller (2004) speaker identity: listeners generalize talker-specific VOT information to a novel word Smith (2004) lexical identity: slightly inappropriate allophones in a sentence disrupt word-spotting only when speaker is familiar to listener familiarization to speakers is fast Allen & Miller (2004) JASA 116, 3171-3183 Smith (2004) PhD Dissertation, Cambridge University

42 Valaki et al. (2004) Spoken word recognition test, which is used to establish cerebral dominance large groups of native speakers of Chinese/English/Spanish (figure panels: Chinese, English, Spanish; coronal MRI slices, data for 3 Ss, >200 ms post-stimulus onset) Lateralisation (% Ss): Spanish 100% left; English 80% left; Chinese 79% bilateral (tone lang.) Small, statistically non-significant changes in each of several formants can add up to large perceptual difference; conversely, some statistically significant differences may have no perceptual effect. Valaki et al. (2004) Neuropsychologia 42, 967–979

43 What sort of model? biologically plausible roles of attention, memory & learning focus on meaning (‘sound to sense’) multiple potential ‘units of perception’ no obligatory units? structure from incomplete information Adaptive Resonance Theory (ART) ? Grossberg 1986… Grossberg (2003) Journal of Phonetics 31, 423-445

44 A key issue what is a phonetic category? (Carol Fowler, May 2004: ‘never been sure’) mental representations of phonetic categories are dynamic, relational, & plastic –Repp, Lindblom, Studdert-Kennedy –Bradlow, Pisoni, Hawkins….. Hawkins (2003) Journal of Phonetics 31, 373-405

45 bottom-up vs top-down? phonetic variation that systematically indicates linguistic structure makes many ‘top-down’ processes unnecessary –e.g. allophonic detail vs Possible Word Constraint and blurs the traditional distinction between signal & knowledge

46 A Challenge to define and refine new questions in testable ways – i.e. to refocus, but to do it in ways that: –are rigorous yet focus on meaning and communication –avoid the ‘new understanding’ becoming doctrinaire –build on past contributions

47 Some topics I haven’t mentioned but should have… and could have, if I’d told the same story in a different way infants’ & animals’ perception (periods 2 & 3) vowel perception (dynamics; center of gravity) sine wave speech more theories (direct perception, auditory enhancement, FLMP) more on memory (incl. associations) & learning connections with psychoacoustics production-perception connections


49 Categorical Perception Run an identification experiment (figure: % /b/ responses for stimuli 1 … 3 … 5 … 7) Run a discrimination experiment (figure: % different responses, from 100 to 0, for pairs such as 1 versus 3; discrimination peak) Courtesy Chris Darwin’s web site
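The link between the identification and discrimination experiments can be sketched numerically. The example below assumes a logistic identification function over a 7-step continuum and the classic labelling-based prediction for ABX discrimination, P(correct) = 0.5 * (1 + (p1 - p2)^2), where p1 and p2 are the probabilities of a /b/ response to the two stimuli. The boundary and slope values and the function names are hypothetical choices for illustration, not fitted to any data.

```python
import math


def p_b(step: float, boundary: float = 4.0, slope: float = 2.0) -> float:
    """Assumed logistic identification function: probability of a /b/ response."""
    return 1.0 / (1.0 + math.exp(slope * (step - boundary)))


def predicted_abx(step_a: float, step_b: float) -> float:
    """Predicted proportion correct in ABX if listeners rely on labels alone."""
    diff = p_b(step_a) - p_b(step_b)
    return 0.5 * (1.0 + diff * diff)


if __name__ == "__main__":
    # Two-step pairs along the continuum, as in "1 versus 3".
    for a, b in [(1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]:
        print(f"{a} vs {b}: {predicted_abx(a, b):.2f}")
```

Under these assumptions the predicted score is near chance (0.5) for pairs within a category and peaks for the pair that straddles the assumed boundary (3 versus 5).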

50 Valaki et al. (2004) Neuropsychologia 42, 967–979 Monolingual/near monolingual native speakers, all right handed: –30 Mandarin-Chinese –20 Spanish –42 American English Whole-head MEG, auditory word recognition test, used clinically to establish hemispheric dominance for receptive language: 63 abstract words/language –33 target words, each in 3 lists, with 10 novel non-target words in each list –lift finger when you recognize a target word

51 Patterns of dominance (%) Laterality Index: (LH – RH) / (LH + RH)
Columns: LH, RH, bilateral
Spanish: 100 (LH)
English: 80 (LH), 20
Mandarin: 14 (LH), 7 (RH), 79 (bilateral)
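The Laterality Index on this slide is straightforward to compute. In the sketch below, `lh` and `rh` stand for whatever left- and right-hemisphere activation measure is used; the `classify` helper and its `threshold` value are hypothetical additions for illustration and are not taken from the study.

```python
def laterality_index(lh: float, rh: float) -> float:
    """Return (LH - RH) / (LH + RH): +1 is fully left, -1 is fully right."""
    total = lh + rh
    if total == 0:
        raise ValueError("LH + RH must be non-zero")
    return (lh - rh) / total


def classify(li: float, threshold: float = 0.1) -> str:
    """Label an index as left, right, or bilateral.

    The threshold is an assumed cut-off, chosen only for this example.
    """
    if li > threshold:
        return "left"
    if li < -threshold:
        return "right"
    return "bilateral"
```

For instance, `laterality_index(30, 10)` gives 0.5, which this sketch labels "left", while equal activation in both hemispheres gives 0 and is labelled "bilateral".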

52 Vowel-to-vowel coarticulation /  / vs /  / Naturally spoken Schwas exchanged /  / /  /

