From speech signal acoustics to perception Louis C.W. Pols Institute of Phonetic Sciences (IFA) Amsterdam Center for Language and Communication (ACLC)


NATO-ASI “Dynamics of Speech Production and Perception”, Il Ciocco, Tuscany, Italy, July 4, 2002

Overview
How do we perceive (speech) dynamics? Topics: The Intelligent Ear: On the Nature of Sound Perception, by Reinier Plomp (2002); from psychoacoustics to speech perception; (lack of) context, robustness, and continuity; vowel (V) and consonant (C) reduction and coarticulation; perceptual compensation for articulatory undershoot?; speech efficiency; conclusions.

Various scientific preferences
Several biases have affected the history of (speech and) hearing research (Plomp, 2002): the dominance of sinusoidal tones as stimuli; a preference for a microscopic approach (e.g., phoneme discrimination rather than intelligibility); an emphasis on psychophysical rather than cognitive aspects of hearing; and clean stimuli in the lab rather than the acoustic reality of the outside world, with its disruptive sounds.

Psychoacoustics vs. speech perception
Psychoacoustic attributes: duration, pitch, loudness, timbre, direction; absolute and masked threshold, just-noticeable difference (jnd), discrimination; continuity; complexity (pure vs. complex tone, voicing). Speech perception adds the effect of context, meaning (intelligibility), and frequency of occurrence; the phoneme as a unit is more text-guided than perceived. Speech perceptual tasks range from phoneme to sentence identification, discrimination, and matching.

Detection thresholds and jnd
Figure: detection thresholds and jnd for multi-harmonic, simple, stationary signals and for single-formant-like periodic signals (labels shown: 3-5%, 1.5 Hz, % frequency, F2 bandwidth).

Perceiving speech-like transitions
Ph.D. thesis A. van Wieringen (1995), “Perceiving dynamic speechlike sounds. Psycho-acoustics and speech perception”; see also van Wieringen & Pols, Acustica 84 (1998). Stimulus characteristics (segmented and/or reversed): natural or synthetic; tone glide; single- or multi-formant transition; isolated transition; initial or final transition with steady state; convergent or divergent transitions (varying duration or slope). Tasks: jnd/difference limen (DL); matching; absolute identification; classification.
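As a rough illustration of the "tone glide" stimulus type and of what "duration or slope" means for a transition, the sketch below synthesizes a linear frequency glide. It is not the stimulus-generation procedure used in the thesis: the linear frequency trajectory, the 16 kHz sample rate, and the 1200 to 1800 Hz, 20 ms example values are all assumptions chosen for illustration.

```python
import math


def tone_glide(f_start_hz, f_end_hz, duration_s, sample_rate=16000):
    """Synthesize a linear tone glide (chirp) as a list of samples in [-1, 1].

    The instantaneous frequency changes linearly,
    f(t) = f_start + slope * t, with slope = (f_end - f_start) / duration.
    The phase is the running integral of 2*pi*f(t).
    """
    n_samples = int(round(duration_s * sample_rate))
    slope_hz_per_s = (f_end_hz - f_start_hz) / duration_s
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        phase = 2 * math.pi * (f_start_hz * t + 0.5 * slope_hz_per_s * t * t)
        samples.append(math.sin(phase))
    return samples, slope_hz_per_s


# Hypothetical 20 ms rising glide from 1200 Hz to 1800 Hz.
glide, slope = tone_glide(1200, 1800, 0.020)
print(len(glide), round(slope / 1000, 1))  # 320 30.0  (samples, Hz per ms)
```

Keeping the frequency extent fixed while shortening the duration raises the slope, which is one way duration and slope can be traded off in such stimuli; swapping start and end frequencies gives a falling glide with a negative slope.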

DL for short speech-like transitions
Adapted from van Wieringen & Pols (1998), Acta Acustica 84, “Discrimination of short and rapid speechlike transitions”. Figure panels: complex vs. simple; short vs. longer transitions; initial vs. final transitions.

Perceiving (speech) dynamics
Vowel perception: with or without transitions? Our claims (van Son, IFA Proc. 17 (1993)): there is evidence for compensatory processes, i.e. perceptual overshoot and dynamic specification, only when vowels are presented in an appropriate context; synthetic isolated dynamic formant tracks lead to perceptual undershoot (= averaging); silent-center studies are ambiguous. Conclusion: information in formant dynamics is only used when vowels are heard in an appropriate context.

Vowel identification
Compare vowel responses for dynamic stimuli with those for static stimuli, and calculate the net shift in vowel responses per onglide (CV), complete (CVC), or offglide (VC) stimulus. Result: responses average over the trailing part of the formant track.
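The averaging result can be made concrete with a toy model in which the perceived formant value is the mean of the trailing part of the track. This is a minimal sketch, not the analysis used in the study: the function name, the choice of "trailing part" as the last half of the samples (`fraction=0.5`), and the example F2 track are all hypothetical.

```python
def trailing_average(track_hz, fraction=0.5):
    """Mean of the last `fraction` of a sampled formant track, in Hz."""
    if not track_hz:
        raise ValueError("empty formant track")
    k = max(1, int(round(len(track_hz) * fraction)))
    tail = track_hz[-k:]
    return sum(tail) / len(tail)


# Hypothetical F2 onglide heading for a 1800 Hz target but stopping at 1700 Hz.
onglide = [1200, 1350, 1480, 1580, 1650, 1700]
percept = trailing_average(onglide)
print(round(percept, 1))  # 1643.3, below the 1700 Hz end point: undershoot
```

An overshoot (compensation) account would instead predict a percept beyond the final 1700 Hz value, toward the target; averaging predicts a value short of it.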

Perceptual undershoot
Figure: net shift in vowel responses to tokens with curved formant tracks vs. stationary tokens. All values are significant, except those marked by small open triangles.

Effect of local context
“Perisegmental speech improves consonant and vowel identification”, van Son & Pols, Speech Comm. 29, 1-22 (1999); also “Phoneme recognition as a function of task and context”, IFA Proc. 24 (2001) and Proc. SPRAAC (2001); also Pols & van Son (1993), “Acoustics and perception of dynamic vowel segments”, Speech Comm. 13,

V and C identification
Gated tokens from 120 CVC speech fragments taken from a long text reading: a 50 ms vowel kernel, plus vowel transitions, plus the consonant part (left/right). Stimuli were randomized; vowel identification by 17 subjects, initial and final consonant (Ci and Cf) identification by 15 subjects. Results: phoneme identification benefits from extra speech; left context is more beneficial than right context; identification is better when the other member of the pair was also identified correctly (context effect).

Figure: error rates of vowel identification for the individual stimulus token types. Long-short vowel errors (/ɑ-a:/, /ɔ-o:/) are ignored.

V and C in CV tokens were identified better when the other member of the pair was identified correctly
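The pairwise context effect amounts to comparing conditional identification rates. The sketch below computes vowel accuracy separately for tokens whose consonant was and was not identified correctly; the function name and the 12 response pairs are hypothetical and do not come from the experiment.

```python
def conditional_accuracy(trials):
    """Vowel accuracy given that the paired consonant was right vs. wrong.

    `trials` is a list of (vowel_correct, consonant_correct) booleans,
    one pair per CV token.
    """
    def rate(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    v_given_c_ok = [v for v, c in trials if c]
    v_given_c_bad = [v for v, c in trials if not c]
    return rate(v_given_c_ok), rate(v_given_c_bad)


# Hypothetical listener responses for 12 CV tokens.
trials = ([(True, True)] * 6 + [(False, True)] * 2
          + [(True, False)] * 1 + [(False, False)] * 3)
print(conditional_accuracy(trials))  # (0.75, 0.25)
```

A first rate clearly above the second, as in this made-up example, is the pattern the slide describes; a real analysis would also test whether the difference is significant.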

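Effect of (lack of) context
100 Dutch listeners identified vowel segments (“Vowel contrast reduction”, Koopmans-van Beinum, 1980). Three conditions, each with identification scores (%) and ASC for speakers M1, M2, F1, F2 and their average: isolated vowels, % (3); vowels from words, % (5); vowels from unstressed syllables in free conversation, % (10).
$$\mathrm{ASC} = \frac{1}{n}\sum_{i=1}^{n}\left\lVert LF_i - \overline{LF}\right\rVert^{2} \quad \text{(total variance)}, \qquad LF_i = \log F_i$$
where $F_i$ is the formant vector of vowel $i$ and $\overline{LF}$ is the mean of the $LF_i$ over all $n$ vowels.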
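The ASC formula can be computed directly as the mean squared distance of each vowel's log-formant vector from the centroid of all vowels. This is a minimal sketch: the base-10 logarithm and the (F1, F2) values below are assumptions for illustration only, not data from Koopmans-van Beinum (1980).

```python
import math


def asc(vowel_formants):
    """Total variance of log formant vectors around their centroid.

    `vowel_formants` holds one (F1, F2, ...) tuple in Hz per vowel;
    log10 is an assumed choice of logarithm base.
    """
    logs = [[math.log10(f) for f in vowel] for vowel in vowel_formants]
    n = len(logs)
    dims = len(logs[0])
    centroid = [sum(v[d] for v in logs) / n for d in range(dims)]
    return sum(
        sum((v[d] - centroid[d]) ** 2 for d in range(dims)) for v in logs
    ) / n


# Hypothetical (F1, F2) values in Hz for /i/, /a/, /u/: a spread system
# and the same vowels pulled toward the middle of the vowel space.
spread = [(280, 2250), (750, 1300), (300, 750)]
reduced = [(400, 1800), (600, 1400), (420, 1100)]
print(asc(spread) > asc(reduced))  # True
```

A smaller ASC means the vowel system occupies a smaller region of the log-formant space, which is the sense of "vowel contrast reduction" from isolated vowels to free conversation.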

Human word intelligibility vs. noise
Figure from the Ph.D. thesis of H. Steeneken (1992), ‘On measuring and predicting speech intelligibility’.

Robustness to degraded speech
Speech is a time-modulated signal in frequency bands and is relatively insensitive to (spectral) distortions, a prerequisite for digital hearing aids. Modulating the spectral slope: -5 to +5 dB/oct, Hz. Temporal smearing of the envelope modulation: the modulation spectrum has its maximum at ca. 4 Hz (≈ syllable rate); envelope filtering with LP > 4 Hz and HP < 8 Hz has little effect on intelligibility. Spectral envelope smearing: for BW > 1/3 oct the masked speech reception threshold (SRT) starts to degrade. (For references, see the keynote paper by Pols in Proc. ICPhS’99.)
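To show what "maximum at ca. 4 Hz in the modulation spectrum" means, the sketch below takes a temporal envelope, removes its mean, and finds the modulation frequency with the largest DFT magnitude. The envelope here is synthetic (a 4 Hz sinusoidal modulation sampled at an assumed 100 envelope samples per second); a real analysis would first extract the envelope from each frequency band of recorded speech, which is not shown.

```python
import cmath
import math


def modulation_spectrum(envelope, env_rate_hz):
    """(frequency, magnitude) pairs of a mean-removed envelope via a plain DFT."""
    n = len(envelope)
    mean = sum(envelope) / n
    x = [e - mean for e in envelope]
    spectrum = []
    for k in range(1, n // 2):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append((k * env_rate_hz / n, abs(acc) / n))
    return spectrum


def peak_modulation_hz(envelope, env_rate_hz):
    """Modulation frequency (Hz) with the largest spectral magnitude."""
    return max(modulation_spectrum(envelope, env_rate_hz), key=lambda p: p[1])[0]


rate = 100  # envelope samples per second (assumed)
env = [1 + 0.8 * math.sin(2 * math.pi * 4 * i / rate) for i in range(2 * rate)]
print(peak_modulation_hz(env, rate))  # 4.0
```

With 2 s of envelope the frequency resolution is 0.5 Hz, fine enough to resolve the syllable-rate region; a plain DFT is used for transparency rather than speed.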

Some examples
Partly reversed speech (Saberi & Perrott, Nature, 4/99): fixed-duration segments are time-reversed or shifted in time; sentence intelligibility remains perfect for segments up to 50 ms (demo: original, and every 50 ms reversed). Low-frequency modulation envelope (3-8 Hz) vs. acoustic spectrum; the syllable as information unit? (S. Greenberg); gap and click restoration (Warren); gating experiments.
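The local time-reversal manipulation is simple to state in code: cut the waveform into consecutive fixed-duration segments and reverse the samples inside each one. This is a minimal sketch operating on a plain list of samples; the function name is hypothetical, and details such as windowing at segment boundaries, which a real stimulus might use, are not included.

```python
def locally_reverse(signal, sample_rate, segment_ms=50):
    """Time-reverse every consecutive fixed-duration segment of `signal`."""
    seg_len = max(1, int(round(sample_rate * segment_ms / 1000)))
    out = []
    for start in range(0, len(signal), seg_len):
        out.extend(reversed(signal[start:start + seg_len]))
    return out


# Toy example: at 1000 samples/s, 3 ms segments are 3 samples long.
print(locally_reverse(list(range(10)), 1000, segment_ms=3))
# [2, 1, 0, 5, 4, 3, 8, 7, 6, 9]
```

At 16 kHz the 50 ms condition corresponds to 800-sample segments. Because reversal only reorders samples within each segment, the slow (syllable-rate) envelope is largely preserved while the fine spectro-temporal structure within a segment is scrambled.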

Continuity, especially while masked
Continuity effect (Miller & Licklider), auditory induction (Warren), pulsation threshold (Houtgast); also for gliding tones, for complex tones, and for pitch. Fission, fusion, segregation, streaming; phonemic restoration (figure axes: frequency in Hz vs. time).

V and C reduction, coarticulation
Spectral variability is not random but, at least partly, speaker-, style-, and context-specific: read vs. spontaneous; stressed vs. unstressed. This holds not just for vowels but also for consonants: duration; spectral balance; intervocalic sound energy difference; F2 slope difference; locus equation.
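A locus equation is a straight-line fit of F2 at the consonant-vowel boundary (F2 onset) against F2 in the vowel, F2_onset = k * F2_vowel + c. The sketch below fits it by ordinary least squares. The function name and the four data points are hypothetical; the reading of the slope (near 1 for strong coarticulation, near 0 for a fixed onset) is the common interpretation, not a result from this talk.

```python
def fit_locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares slope k and intercept c for F2_onset = k * F2_vowel + c."""
    n = len(f2_vowel_hz)
    mean_x = sum(f2_vowel_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel_hz)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel_hz, f2_onset_hz))
    k = sxy / sxx
    c = mean_y - k * mean_x
    return k, c


# Hypothetical tokens of one consonant before four different vowels.
f2_vowel = [1000, 1500, 2000, 2500]
f2_onset = [1100, 1350, 1600, 1850]
k, c = fit_locus_equation(f2_vowel, f2_onset)
print(k, c)  # 0.5 600.0
```

Comparing k between, for example, read and spontaneous tokens of the same consonant is one way to quantify style-specific differences in coarticulation.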

Figure: mean consonant duration and mean error rate for consonant identification. Adapted from van Son & Pols (Eurospeech’97). 791 VCV pairs (read and spontaneous; stressed and unstressed segments; one male speaker); consonant identification by 22 Dutch subjects.

Perception of acoustic vowel reduction
Ph.D. thesis Dick van Bergem (1995), “Acoustic and lexical vowel reduction”. Lexical vowel reduction: French /betõ/ vs. Dutch; acoustic vowel reduction: Dutch ‘miljoen’ as /mIljun/ or as a reduced form. Task: identify the unstressed vowels, by 20 listeners (8 M, 12 F), in 47 words (conditions W and S) or 20 words (condition P), such as ‘milJOEN’ or ‘biosCOOP’, spoken by 20 male speakers (2280 stimuli).

Figure: 4 reduction stages for 20 speakers; % schwa responses on /I/ by 20 listeners (values shown: 5%, 36%, 60%, 69%); model prediction for schwa in this m-l context. Adapted from van Bergem (1995). Conclusion: vowel reduction is not centralization but contextual assimilation.

Speech efficiency
Speech is most efficient if it contains only the information needed to understand it: “Speech is the missing information” (Lindblom, JASA ’96). Less information is needed for more predictable items: shorter duration and more spectral reduction for more frequent syllables and words. Consonant confusion correlates with acoustic factors (duration, center of gravity (CoG)) and with information content (syllable/word frequency), where information content is I(x) = -log2(Prob(x)) in bits (see van Son, Koopmans-van Beinum, and Pols (ICSLP’98)).
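The information measure I(x) = -log2(Prob(x)) can be estimated from corpus counts by using relative frequency as the probability. This is a minimal sketch with a made-up 16-token corpus; the actual study used syllable and word frequencies from its own material, and smoothing for unseen items is not handled here.

```python
import math
from collections import Counter


def information_bits(tokens):
    """Map each token type to I(x) = -log2(Prob(x)), Prob from relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {item: -math.log2(c / total) for item, c in counts.items()}


# Hypothetical 16-token corpus.
tokens = ["de"] * 8 + ["en"] * 4 + ["het"] * 2 + ["miljoen", "bioscoop"]
bits = information_bits(tokens)
print(bits["de"], bits["het"], bits["miljoen"])  # 1.0 3.0 4.0
```

Frequent items carry few bits, so under the efficiency view they can afford shorter durations and more spectral reduction than rare items.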

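Figure: correlation between consonant confusion and the 4 measures indicated. Adapted from van Son et al. (Proc. ICSLP’98). Material: one Dutch male speaker, 20 min. of read and spontaneous speech (12k syllables, 8k words); 791 VCV pairs (read/spontaneous; lexically stressed/unstressed); consonant identification by 22 subjects.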
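The slide does not say which correlation coefficient was used; as an assumption, the sketch below uses the Pearson correlation to relate one acoustic measure (consonant duration) to consonant error rate. All numbers are hypothetical, chosen only to show the direction of the expected effect (longer consonants, fewer errors).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-class means: consonant duration (ms) and error rate.
duration_ms = [40, 55, 70, 85, 100]
error_rate = [0.40, 0.32, 0.25, 0.18, 0.12]
print(round(pearson_r(duration_ms, error_rate), 3))
```

The same function could be applied to each of the other measures (e.g. CoG or information content in bits) to compare their correlations with confusion.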

Conclusions
Perceiving speech (segments) very much depends on speech quality and context; isolated segments are also a kind of context. Formant transitions are only ‘properly’ interpreted (perceptual compensation for spectro-temporal undershoot) when presented in an appropriate context. Reduced vowels are best perceived as schwa if transitions are contextually assimilated.