Central Auditory Processing
Part II: Lent Term 2014 (2 of 4)
Roy Patterson
Centre for the Neural Basis of Hearing
Department of Physiology, Development and Neuroscience
University of Cambridge
email: rdp1@cam.ac.uk
Lecture slides on CamTools: https://camtools.cam.ac.uk/portal.html
Lecture slides, sounds and background papers: http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/
The Overture
Act I: the information in communication sounds (animal calls, speech, musical notes)
Act II: the perception of communication sounds and the robustness of perception to changes in acoustic scale
Act III: the processing of communication sounds in the auditory system (signal processing)
Act IV: the processing of communication sounds (anatomy, physiology, brain imaging)
Auditory perception is robust to changes in Ss and Sf
[Figure: axes are Increasing GPR (Ss) and Decreasing VTL (1/Sf)]
Kawahara and Irino (2004). Principles of speech manipulation system STRAIGHT. In Speech separation by humans and machines, P. Divenyi (Ed.), Kluwer Academic, 167-179.
Rana catesbeiana
[Figure: axes are Increasing GPR (Ss) and Decreasing VTL (1/Sf)]
[Patterson, Smith, van Dinther and Walters (2008)]
[Figure: panels plotted against Time; Pitch (Ss) runs from Low to High, VTL (Sf) runs from Long to Short]
Spectra on a linear frequency axis
[Figure: Pitch (Ss) runs from Low to High, VTL (Sf) runs from Long to Short]
Recognition of Scaled Vowels
Smith, Patterson, Turner, Kawahara and Irino, JASA (2005)
pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf
[Figure labels: /u/, mean]
Waveform and spectrum of a child’s /a/
[Figure: Sf and Ss marked; frequency on a logarithmic axis (octaves)]
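The reason for plotting frequency in octaves can be illustrated with a small calculation. The sketch below assumes that a change in Sf multiplies every resonance frequency by the same factor; the formant values and the scale factor are hypothetical and are not taken from the lecture. On a linear axis the components move by different amounts, whereas on a logarithmic axis they all move by the same distance, so the spectral shape is preserved and simply shifts.

```python
import math

def to_octaves(freqs_hz, ref_hz=100.0):
    """Convert frequencies in Hz to octaves relative to ref_hz."""
    return [math.log2(f / ref_hz) for f in freqs_hz]

# Hypothetical resonance frequencies (Hz); illustrative values only.
formants_hz = [700.0, 1200.0, 2600.0]
# Hypothetical scale factor: a shorter vocal tract scales all resonances up.
scale = 1.5

scaled_hz = [f * scale for f in formants_hz]

# Shift of each component on a linear axis (Hz) and on a log axis (octaves).
shift_linear = [s - f for f, s in zip(formants_hz, scaled_hz)]
shift_octaves = [s - f for f, s in zip(to_octaves(formants_hz), to_octaves(scaled_hz))]

print(shift_linear)   # the shift in Hz grows with frequency
print(shift_octaves)  # every component shifts by log2(scale) octaves
```

The same reasoning applies to Ss: a change in glottal pulse rate moves all harmonics by the same distance on the octave axis.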
The Perception of Communication Sounds: Summary
Psychophysical experiments confirm:
I: Humans can extract the content of the communication without being confused by size differences
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf
Speaker Size estimates for vowels varying in GPR and VTL
Smith and Patterson (2005) JASA
pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPjasa05.pdf
[Figure: axes are log(GPR) and log(VTL)]
The Perception of Communication Sounds: Summary
Psychophysical experiments also confirm:
I: Humans can extract the content of the communication without being confused by size differences
II: Humans can extract the size information without being confused by differences in the content
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPjasa05.pdf
Discrimination of Ss and Sf
Ss: the semitones on the keyboard differ by 5.9%. Experiments with vowels show that people can reliably discriminate a 2% difference in Ss.
Sf: the discrimination of Sf is trickier. Present two vowels and ask: Which vowel came from the larger speaker?
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf
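The 5.9% figure can be checked with a short calculation. The sketch below assumes an equal-tempered keyboard, in which twelve equal semitone steps make up one octave (a factor of 2); the variable names are illustrative. It also shows how many 2% steps fit into one semitone, which puts the discrimination result for Ss in musical terms.

```python
import math

# Equal temperament (assumed): 12 equal ratio steps per octave.
semitone_ratio = 2 ** (1 / 12)
semitone_percent = (semitone_ratio - 1) * 100
print(round(semitone_percent, 1))  # 5.9

# Number of just-discriminable 2% steps within one semitone.
jnd_ratio = 1.02
steps_per_semitone = math.log(semitone_ratio) / math.log(jnd_ratio)
print(round(steps_per_semitone, 1))  # 2.9
```

On this reckoning a listener can resolve a difference in Ss of roughly a third of a semitone.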
Waveform and spectrum of a child’s /a/
[Figure: Sf and Ss marked; frequency on a logarithmic axis (octaves)]
Syllable database
Ives, Smith and Patterson (2005) JASA

CV’s
Sonorants            Stops                Fricatives
ma na la ra wa ya    ba da ga pa ta ka    sa fa va za xa ha
me ne le re we ye    be de ge pe te ke    se fe ve ze xe he
mi ni li ri wi yi    bi di gi pi ti ki    si fi vi zi xi hi
mo no lo ro wo yo    bo do go po to ko    so fo vo zo xo ho
mu nu lu ru wu yu    bu du gu pu tu ku    su fu vu zu xu hu

VC’s
Sonorants            Stops                Fricatives
am an al ar aw ay    ab ad ag ap at ak    as af av az ax ah
em en el er ew ey    eb ed eg ep et ek    es ef ev ez ex eh
im in il ir iw iy    ib id ig ip it ik    is if iv iz ix ih
om on ol or ow oy    ob od og op ot ok    os of ov oz ox oh
um un ul ur uw uy    ub ud ug up ut uk    us uf uv uz ux uh

vowels
aa ee ii oo uu

Sound examples, large (voiced) and small (voiced): mi en ka it so us
Kawahara and Irino (2004). The vocoder STRAIGHT. Kluwer Academic
Speaker-size discrimination with syllables
Present two intervals of syllables and ask: Which is the larger speaker?
The syllables are randomly chosen for the intervals.
The overall level is varied randomly between the intervals.
The pitch contours are different.
The only consistent cue is a difference in VTL (Sf).
interval 1: VTL = x
interval 2: VTL = x + Δx
Example syllables: /wa/ /et/ /am/ /ku/ and /ma/ /te/ /om/ /se/
Ives, Smith and Patterson (2005) JASA
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf
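The logic of the two-interval task can be sketched in code. This is a minimal illustration, not the procedure used by Ives, Smith and Patterson: the function name, the number of syllables per interval, the size of the level variation and the contour labels are all assumptions made for the example. The point is that everything except VTL is randomised independently in the two intervals, so only the VTL difference can support a correct answer.

```python
import random

# Example syllables taken from the slide; the full database is much larger.
SYLLABLES = ["wa", "et", "am", "ku", "ma", "te", "om", "se"]

def make_trial(base_vtl_cm, delta_fraction, n_syllables=4,
               level_rove_db=6.0, n_contours=4, rng=random):
    """Build one hypothetical two-interval trial.

    One interval has VTL = x, the other VTL = x + dx, in random order.
    Syllables, overall level and pitch contour are drawn independently
    for each interval, so they carry no information about speaker size.
    """
    larger_first = rng.random() < 0.5
    longer_vtl = base_vtl_cm * (1 + delta_fraction)
    vtls = (longer_vtl, base_vtl_cm) if larger_first else (base_vtl_cm, longer_vtl)
    intervals = []
    for vtl in vtls:
        intervals.append({
            "vtl_cm": vtl,
            "syllables": [rng.choice(SYLLABLES) for _ in range(n_syllables)],
            # Assumed level variation, uniform over +/- half the range in dB.
            "level_offset_db": rng.uniform(-level_rove_db / 2, level_rove_db / 2),
            # Assumed: contours identified by an arbitrary index.
            "pitch_contour": rng.randrange(n_contours),
        })
    # The longer vocal tract corresponds to the larger speaker.
    correct_interval = 1 if larger_first else 2
    return intervals, correct_interval

intervals, correct_interval = make_trial(base_vtl_cm=14.0, delta_fraction=0.05)
print(correct_interval, intervals)
```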
Experiment: Sf discrimination thresholds for five different people
[Figure: five speaker types (DWARF, SMALL CHILD, SMALL MALE, LARGE MALE, CASTRATO) plotted by glottal pulse rate (Ss) in Hz, with ticks at 80, 160 and 320, and by VTL (Sf) in cm, with values 10, 14 and 19; SER values 0.92, 1.22 and 1.65]
Ives, Smith and Patterson, JASA (2005)
Results: all subjects, all syllables
[Figure: one panel per speaker type (DWARF, SMALL CHILD, SMALL MALE, LARGE MALE, CASTRATO); axis labels as extracted: "% reported larger" and "Trials test as smaller"]
Ives, Smith and Patterson (2005) JASA
Results: all subjects, all syllables
[Figure: one panel per speaker type (DWARF, SMALL CHILD, SMALL MALE, LARGE MALE, CASTRATO), each showing the average JND across syllable category for that speaker type]
The grand average JND for the experiment is approximately 5%, independent of acoustic scale.
http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf
The Perception of Communication Sounds: Summary
The acoustic scale values in communication sounds tell us which individual, within a population, is speaking or which instrument, within a family, is playing.
Psychophysical experiments confirm:
I: Humans can extract the content of the communication without being confused by the size information
II: Humans can extract the size information without being confused by the content of the communication
III: Auditory perception is amazingly robust to changes in acoustic scale (Ss and/or Sf) in communication sounds
End of Act II
Thank you

Smith, D. R. R., Patterson, R. D., Turner, R., Kawahara, H., and Irino, T. (2005). "The processing and perception of size information in speech sounds," J. Acoust. Soc. Am. 117, 305-318. http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPTKIjasa05.pdf
Smith, D. R. R. and Patterson, R. D. (2005). "The interaction of glottal-pulse rate and vocal-tract length in judgements of speaker size, sex and age," J. Acoust. Soc. Am. 118, 3177-3186. http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/SPjasa05.pdf
Ives, D. T., Smith, D. R. R. and Patterson, R. D. (2005). "Discrimination of speaker size from syllable phrases," J. Acoust. Soc. Am. 118 (6), 3816-3822. http://www.pdn.cam.ac.uk/groups/cnbh/teaching/lectures/ISPjasa05.pdf
Concurrent Speech and the cocktail party Colin Cherry (1952)
Syllable database

CV’s
Sonorants            Stops                Fricatives
ma na la ra wa ya    ba da ga pa ta ka    sa fa va za xa ha
me ne le re we ye    be de ge pe te ke    se fe ve ze xe he
mi ni li ri wi yi    bi di gi pi ti ki    si fi vi zi xi hi
mo no lo ro wo yo    bo do go po to ko    so fo vo zo xo ho
mu nu lu ru wu yu    bu du gu pu tu ku    su fu vu zu xu hu

VC’s
Sonorants            Stops                Fricatives
am an al ar aw ay    ab ad ag ap at ak    as af av az ax ah
em en el er ew ey    eb ed eg ep et ek    es ef ev ez ex eh
im in il ir iw iy    ib id ig ip it ik    is if iv iz ix ih
om on ol or ow oy    ob od og op ot ok    os of ov oz ox oh
um un ul ur uw uy    ub ud ug up ut uk    us uf uv uz ux uh

vowels
aa ee ii oo uu
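The structure of the database is regular enough to generate programmatically. The sketch below rebuilds the list from the three consonant classes and five vowels shown on the slide; the variable names are illustrative, and the spellings are the orthographic labels used on the slide rather than phonetic transcriptions.

```python
# Consonant classes and vowels as they appear on the slide.
SONORANTS = ["m", "n", "l", "r", "w", "y"]
STOPS = ["b", "d", "g", "p", "t", "k"]
FRICATIVES = ["s", "f", "v", "z", "x", "h"]
VOWELS = ["a", "e", "i", "o", "u"]

consonants = SONORANTS + STOPS + FRICATIVES

# Consonant-vowel and vowel-consonant syllables, one row per vowel.
cv = [c + v for v in VOWELS for c in consonants]
vc = [v + c for v in VOWELS for c in consonants]
# Isolated vowels are written doubled on the slide (aa, ee, ...).
isolated_vowels = [v * 2 for v in VOWELS]

database = cv + vc + isolated_vowels
print(len(cv), len(vc), len(isolated_vowels), len(database))  # 90 90 5 185
```

With 18 consonants and 5 vowels this gives 90 CV syllables, 90 VC syllables and 5 vowels, 185 items in total.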
Concurrent speech: experimental paradigm
Identify the syllable in the interval that stays lit
Example syllables: wu, osh
Vestergaard et al. (2009) JASA
Robustness of speech perception
Concurrent-speech paradigm
Syllable classes: Sonorants (semivowels), Stops (plosives), Fricatives
Target triplet: de mi osh
Masker triplet: ki lu ez
Presented concurrently at 0 dB SNR
Pre-cursor, 0 dB SNR
[Figure: Target and Distracter syllables on a time axis in ms, with ticks at 200, 400 and 600]
Vestergaard et al. (2009) JASA
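Presenting target and masker "at 0 dB SNR" means the two have equal level before they are added. The sketch below shows one common way to set this up, by scaling the masker so that the ratio of RMS levels matches the requested SNR. The slides do not say how SNR was defined in the experiment, so the RMS definition, the function names and the sinusoidal placeholder signals are all assumptions.

```python
import math

def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that 20*log10(rms(target)/rms(masker)) == snr_db,
    then add it to the target. Returns (mixture, scaled_masker)."""
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    scaled = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled)]
    return mixture, scaled

# Placeholder signals: two sinusoids stand in for target and masker syllables.
fs = 16000          # sample rate in Hz (assumed)
n = 1600            # 100 ms of signal
target = [math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]
masker = [0.3 * math.sin(2 * math.pi * 330 * i / fs) for i in range(n)]

mixture, scaled_masker = mix_at_snr(target, masker, snr_db=0.0)
achieved_snr_db = 20 * math.log10(rms(target) / rms(scaled_masker))
print(achieved_snr_db)  # close to 0 dB
```

Negative values of `snr_db` make the masker more intense than the target, which is how the difficulty of the task would be increased in a sketch like this.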
Vestergaard, Fyson and Patterson, JASA, 2009
[Figure: axes are Ss and Sf]