The Complexities of Understanding Speech in Background Noise. Stuart Rosen, UCL Speech, Hearing and Phonetic Sciences. First International Conference on Cognitive Hearing Science for Communication
A caveat about ‘cognition’: Important aspects of this problem are not ‘cognitive’. But cognitive processing: –relies on adequate sensory representations, and –can compensate for impoverished sensory representations.
Why is this interesting? Most speech is not heard in quiet, anechoic conditions. People vary a lot in how well they can understand speech in the presence of other sounds: –effects of hearing impairment –effects of age –auditory processing disorder (APD)?
Some determinants of performance, I: The nature of the target speech material –Predictability: context; number of alternative utterances; frequency of usage; size of lexical ‘neighbourhoods’
Some determinants of performance, II: The configuration of the environment –Open air or in a room? –How ‘dry’ is the room? (effects of reverberation) –Spatial separation between target and noise. Or, the transmission system (e.g. a mobile telephone): –distortion and noise added by the system
Some determinants of performance, III: Talker characteristics –Different talkers vary considerably in intrinsic intelligibility –Talkers vary their own speech depending on the demands of the situation (the hyper/hypo distinction of Lindblom, 1990) –Match between talker and listener accents
Some determinants of performance, IV: Listener characteristics –Linguistic development: vocabulary knowledge; ability to use context; the presence of language impairments; L1 vs L2 –Hearing sensitivity and any hearing prosthesis used –Neurodevelopmental disorders: language impairment; autism spectrum disorder; APD
Some determinants of performance, V: The nature of the background noises –level (SNR) –fluctuations in level –spectral characteristics –genuine ‘noise’: aperiodic or periodic? –and/or other talkers: how many there are; whether they speak your own language or a language you don’t know –How ‘attention-grabbing’ the background noises are
The simplest case: A steady-state background noise
Much is understood about what makes one steady noise more or less interfering than another: spectral shape and SNR.
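SNR is the first lever on how much a steady noise interferes, and experiments set it by scaling the masker relative to the target. The Python sketch below assumes both signals are equal-length lists of samples at the same rate and that SNR is defined on whole-signal RMS levels, which is one common convention. Some labs measure speech level over active portions only. The helper names are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def snr_db(target, masker):
    """SNR in dB computed from whole-signal RMS levels."""
    return 20 * math.log10(rms(target) / rms(masker))


def mix_at_snr(target, masker, snr):
    """Hypothetical helper: scale the masker so that snr_db(target, scaled) == snr,
    then add it sample by sample to the target. Returns (mixture, scaled_masker)."""
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")
    gain = rms(target) / (rms(masker) * 10 ** (snr / 20))
    scaled = [gain * m for m in masker]
    mixture = [t + m for t, m in zip(target, scaled)]
    return mixture, scaled


# Toy signals standing in for speech and noise (both hypothetical).
target = [math.sin(2 * math.pi * 5 * n / 100) for n in range(100)]
masker = [1.0 if n % 2 else -1.0 for n in range(100)]
mixture, scaled = mix_at_snr(target, masker, -6.0)
print(round(snr_db(target, scaled), 6))  # -6.0
```

Negative SNRs, like those in Miller's (1947) data, simply mean the masker is scaled above the target.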
‘Energetic’ masking: Noises interfere with speech to the extent that they have energy in the same frequency regions. This can be quantified with the ‘articulation index’. It reflects the direct interaction of masker and speech in the cochlea, which acts as a frequency analyser.
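The idea behind the articulation index can be sketched as an importance-weighted sum of per-band audibility. This is a simplified illustration in the spirit of the AI and its successor, the Speech Intelligibility Index (ANSI S3.5-1997). It assumes each band's audibility grows linearly from 0 at -15 dB SNR to 1 at +15 dB SNR. The band-importance weights are made-up values, not the standard's tables, and level-distortion and spread-of-masking corrections are omitted.

```python
def band_audibility(band_snr_db):
    """Map a band SNR (dB) to audibility in [0, 1], assuming the useful speech
    range spans 30 dB, from -15 dB (nothing audible) to +15 dB (fully audible)."""
    return min(1.0, max(0.0, (band_snr_db + 15.0) / 30.0))


def articulation_index(band_snrs_db, band_importance):
    """Importance-weighted mean of band audibilities (simplified AI/SII)."""
    if len(band_snrs_db) != len(band_importance):
        raise ValueError("need one importance weight per band")
    weighted = sum(w * band_audibility(s) for s, w in zip(band_snrs_db, band_importance))
    return weighted / sum(band_importance)


# Illustrative 4-band importance weights (made up, ordered low to high frequency).
importance = [0.1, 0.2, 0.3, 0.4]
low_noise = [-20, -5, 10, 20]   # masker energy concentrated in the low bands
high_noise = [20, 10, -5, -20]  # the same band SNRs, mirrored onto the high bands
print(round(articulation_index(low_noise, importance), 3))   # 0.717
print(round(articulation_index(high_noise, importance), 3))  # 0.367
```

Both maskers use the same set of band SNRs. The one whose energy falls in the more heavily weighted bands lowers the index more, which is one way spectral shape changes how interfering a steady noise is.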
Fluctuating maskers afford ‘glimpses’ of the target signal. [Figure labels: target, masker, glimpses]
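Glimpsing can be made concrete by counting the time-frequency cells in which the target dominates the masker. This minimal sketch assumes levels are already available per channel and per frame in dB. A cell counts as a glimpse when the local SNR exceeds a fixed criterion. The 3 dB default is a hypothetical choice, and glimpsing models differ on this value.

```python
def glimpse_proportion(target_db, masker_db, criterion_db=3.0):
    """Fraction of time-frequency cells where the target level exceeds the masker
    level by more than criterion_db. Both inputs are lists of rows (one row per
    frequency channel) of per-frame levels in dB, with identical shapes."""
    cells = glimpsed = 0
    for t_row, m_row in zip(target_db, masker_db):
        for t, m in zip(t_row, m_row):
            cells += 1
            if t - m > criterion_db:
                glimpsed += 1
    return glimpsed / cells


n_channels, n_frames = 4, 20
target = [[60.0] * n_frames for _ in range(n_channels)]
# Steady masker at 63 dB: the target never exceeds it by more than 3 dB.
steady = [[63.0] * n_frames for _ in range(n_channels)]
# On/off masker: 5 frames at 66 dB, then 5 frames at 20 dB (gated off), repeating;
# roughly the same average power as the steady masker.
gated = [[66.0 if (f // 5) % 2 == 0 else 20.0 for f in range(n_frames)]
         for _ in range(n_channels)]
print(glimpse_proportion(target, steady))  # 0.0
print(glimpse_proportion(target, gated))   # 0.5
```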
‘Dip listening’ or ‘glimpsing’: People with normal hearing can listen in the ‘dips’ of an amplitude-modulated masker. [Figure: SRT for VCVs in simple on/off fluctuations as a function of the duration of the fluctuation; Howard-Jones & Rosen (1993), Acustica]
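An SRT is the SNR at which a fixed proportion of items, commonly 50%, is identified correctly. The slides do not say how the SRTs were tracked. One common approach is a 1-down/1-up adaptive staircase on SNR, which converges on the 50% point. It is assumed here and is not necessarily the procedure Howard-Jones & Rosen used. The simulated listener and all parameter values are hypothetical.

```python
import math
import random


def simulated_listener(true_srt_db, slope_per_db=0.5):
    """Hypothetical listener whose probability of a correct response is a
    logistic function of SNR, passing through 50% at true_srt_db."""
    def respond(snr_db, rng):
        p_correct = 1.0 / (1.0 + math.exp(-slope_per_db * (snr_db - true_srt_db)))
        return rng.random() < p_correct
    return respond


def track_srt(respond, start_snr_db=10.0, step_db=2.0, n_reversals=12,
              discard=4, seed=0, max_trials=1000):
    """1-down/1-up staircase: lower the SNR after each correct response and raise
    it after each error, so the track hovers around 50% correct. Returns the mean
    SNR at the reversals that follow the first `discard` reversals."""
    rng = random.Random(seed)
    snr = start_snr_db
    last_direction = None
    reversals = []
    trials = 0
    while len(reversals) < n_reversals:
        if trials >= max_trials:
            raise RuntimeError("track did not reach the requested number of reversals")
        trials += 1
        direction = -1 if respond(snr, rng) else +1
        if last_direction is not None and direction != last_direction:
            reversals.append(snr)  # SNR of the trial on which the track turned
        last_direction = direction
        snr += direction * step_db
    kept = reversals[discard:]
    return sum(kept) / len(kept)


estimate = track_srt(simulated_listener(true_srt_db=-4.0))
print(f"estimated SRT: {estimate:.1f} dB")
```

Discarding the early reversals removes the descent from the easy starting SNR. Tracks that target other points on the psychometric function, such as 2-down/1-up for about 71% correct, change only the update rule.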
‘Dips’ can be limited in frequency (‘checkerboard noise’). [Figure: SRT for VCVs in 10 Hz modulations with different numbers of channels; Howard-Jones & Rosen (1993), JASA]
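A checkerboard masker can be described as a time-frequency gating pattern. This sketch assumes, as the name suggests, that each channel is gated on and off with a square wave and that alternate channels are gated in antiphase, so the dips alternate across frequency. The function name and parameters are hypothetical. The resulting gains would be applied to band-filtered noise in each channel.

```python
def checkerboard_gates(n_channels, mod_rate_hz, duration_s, frame_rate_hz=1000):
    """0/1 gain for each frequency channel (rows) and time frame (columns).
    Every channel is square-wave gated at mod_rate_hz with a 50% duty cycle;
    odd-numbered channels are gated in antiphase to even-numbered ones.
    mod_rate_hz and frame_rate_hz should be integers so the arithmetic is exact."""
    n_frames = round(duration_s * frame_rate_hz)
    gates = []
    for ch in range(n_channels):
        row = []
        for i in range(n_frames):
            half_cycle = (2 * mod_rate_hz * i) // frame_rate_hz  # index of current half-cycle
            row.append(1 if (half_cycle + ch) % 2 == 0 else 0)
        gates.append(row)
    return gates


# 4 channels, 10 Hz gating, 0.2 s, sampled coarsely at 40 frames/s for display.
for row in checkerboard_gates(4, 10, 0.2, frame_rate_hz=40):
    print("".join(str(g) for g in row))
# 11001100
# 00110011
# 11001100
# 00110011
```

With an even number of channels, exactly half of them are on in every frame. The overall masker level therefore stays steady over time even though each channel is fully modulated, so the dips exist only within restricted frequency regions.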
But maskers can be periodic too, most importantly when the background is itself speech.
Miller (1947), ‘The masking of speech’: ‘It has been said that the best place to hide a leaf is in the forest, and presumably the best place to hide a voice is among other voices.’
Listening to speech in ‘noise’ (Children’s Coordinate Response Measure): [Audio examples: ‘Bouncy’ in quiet; in steady noise; in modulated noise; against another talker]
A useful distinction: Energetic masking –maskers interfere with speech to the extent that they have energy in the same time/frequency regions –primarily reflects the direct interaction of masker and speech in the cochlea –relevance of glimpsing/dip listening: temporal and/or spectral ‘dips’ in the masker allow ‘glimpses’ of the target speech. Informational masking: everything else!
Informational masking: something to do with target/masker similarity? –signal and masker ‘are both audible but the listener is unable to disentangle the elements of the target speech from a similar-sounding distracter’ (Brungart, 2001)
Informational masking: a finer distinction (Shinn-Cunningham, 2008). Problems in ‘object formation’ –related to auditory scene analysis –similarities in auditory properties make segregation difficult: voice pitch, timbre, rate. Problems in ‘object selection’ –related to attention and distraction –the masker may distract attention from the target, e.g., more interference from a known as opposed to a foreign language. [Audio examples: 2 men; 1 woman, 1 man]
EM & IM appear to operate at different levels of the auditory pathway. Energetic masking at the periphery, in the cochlea: –early-developing abilities –increased EM from hearing impairment –unlikely to be a factor in APD. Informational masking at higher centres: –late-developing abilities? –increased IM in older listeners? –increased IM in developmental disorders? –but aspects of IM can be made difficult by peripheral factors, e.g., CI users’ difficulties with auditory scene analysis
Little glimpsing for CI users: Nelson et al. (2003). Speech-spectrum-shaped masking noise, square-wave modulated, added to IEEE sentences. [Figure panel: normal listeners]
CI users have not only poor frequency selectivity but also a lack of voice-pitch sensation (poor perception of temporal fine structure, TFS), which makes auditory scene analysis difficult: how do you tell the noise from the speech?
But IM can be excessive even with normal hearing …
Children find it hard to ignore another talker.
Slow development of abilities that minimise IM.
Increased IM in Specific Language Impairment (SLI): 9 children with SLI and 10 typically developing (TD) children aged 6-10 years; CCRM sentences. [Figure: steady noise; …ed speech] MSc work of Csaba Redey-Nagy.
Increased IM in some people with High Functioning Autism (HFA): CCRM sentences in various backgrounds (PhD work of Katharine Mair). There is evidence for a temporal processing deficit, but … it is not the crucial factor in excessive masking for speech. [Figure legend: control; better HFA; worse HFA]
Increased IM in some people with High Functioning Autism (HFA): CCRM sentences in various backgrounds (PhD work of Katharine Mair). Poor HFA performers (and younger children) are highly susceptible to informational masking … but which aspect? ASA? Attention? Linguistic aspects? [Figure legend: control; better HFA; worse HFA]
An ecologically valid test bed for evaluating the roles of EM and IM: speech in n-talker babble, for n = 1, 2, 3 … ∞ talkers
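An n-talker babble masker can be assembled by summing n talkers. This sketch assumes each talker is first equalised to the same RMS level and the babble is then rescaled to a fixed overall level, so changing n alters the masker's structure but not its level. The tones stand in for recorded talkers, and the names are hypothetical.

```python
import math


def rms(x):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(s * s for s in x) / len(x))


def make_babble(talkers, n, target_rms=0.1):
    """Sum the first n talker signals after equalising each to unit RMS, then
    rescale the sum to target_rms. All signals must have the same length."""
    if not 1 <= n <= len(talkers):
        raise ValueError("n must lie between 1 and the number of talkers")
    length = len(talkers[0])
    babble = [0.0] * length
    for signal in talkers[:n]:
        if len(signal) != length:
            raise ValueError("all talker signals must have the same length")
        gain = 1.0 / rms(signal)  # equalise this talker to unit RMS
        for i, s in enumerate(signal):
            babble[i] += gain * s
    gain = target_rms / rms(babble)  # fix the overall babble level
    return [gain * s for s in babble]


fs = 8000
# Hypothetical stand-ins for recorded talkers: tones at different frequencies and levels.
talkers = [[(k + 1) * math.sin(2 * math.pi * (100 + 37 * k) * i / fs) for i in range(fs)]
           for k in range(4)]
for n in (1, 2, 4):
    print(n, round(rms(make_babble(talkers, n)), 6))  # RMS is 0.1 whatever n is
```

The babble can then be mixed with the target at a chosen SNR.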
Miller (1947): Increasing the number of talkers in the masker. ‘It is relatively easy for a listener to distinguish between two voices, but as the number of rival voices is increased the desired speech is lost in the general jabber.’ [Figure: performance as a function of SNR (+12 to -18 dB); target words from multiple male talkers; babble with equal numbers of male and female talkers (the 1-voice masker is male)]
IEEE sentences in n-talker babble. What happens as n increases? –glimpsing opportunities, so EM –linguistic content, so IM (selection?) –number of F0 contours, so IM (ASA)
1-talker voice-pitch source with envelopes derived from n-talker babble. [Audio examples: 1-talker babble-modulated 1-talker F0 (plus one with an unmodulated envelope); 2-talker; 16-talker]
2-talker voice-pitch source with envelopes derived from n-talker babble. [Audio examples: 1-talker babble-modulated 2-talker F0 (plus one with an unmodulated envelope); 2-talker; 16-talker]
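The stimuli on these slides combine a periodic voice-pitch source with the envelope of n-talker babble. The slides do not describe how they were generated. This sketch assumes a unit impulse train that follows an F0 contour, with an envelope obtained by full-wave rectification and a 30 Hz one-pole low-pass filter. All names and parameter values are hypothetical.

```python
import math


def envelope(x, fs, cutoff_hz=30.0):
    """Amplitude envelope: full-wave rectification followed by a one-pole
    low-pass filter with the given cutoff."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)  # smoothing coefficient
    env, y = [], 0.0
    for s in x:
        y = (1 - a) * abs(s) + a * y
        env.append(y)
    return env


def pulse_train(f0_contour_hz, fs):
    """Unit impulses, one per pitch period, following a per-sample F0 contour
    in Hz (0 gives silence). Phase is accumulated so F0 can vary smoothly."""
    out, phase = [], 0.0
    for f0 in f0_contour_hz:
        phase += f0 / fs
        if phase >= 1.0:
            phase -= 1.0
            out.append(1.0)
        else:
            out.append(0.0)
    return out


def babble_modulated_source(babble, f0_contour_hz, fs, cutoff_hz=30.0):
    """Impose the babble's envelope on an F0-following impulse train."""
    env = envelope(babble, fs, cutoff_hz)
    pulses = pulse_train(f0_contour_hz, fs)
    return [e * p for e, p in zip(env, pulses)]


fs = 8000
# Hypothetical inputs: 1 s of "babble" that gets 3x louder halfway through,
# and a flat 120 Hz F0 contour.
babble = [0.2 * math.sin(2 * math.pi * 300 * i / fs) * (1 if i < fs // 2 else 3)
          for i in range(fs)]
f0 = [120.0] * fs
source = babble_modulated_source(babble, f0, fs)
```

For the unmodulated-envelope condition, the envelope would be replaced by a constant. For a 2-talker source, two impulse trains following different F0 contours would be summed before the envelope is applied.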
Unintelligible maskers on noise-vocoded IEEE sentences. [Figure legend: noise; 1 F0 contour; 2 F0 contours] Periodicity in the maskers leads to better performance, probably through better ASA. It’s easier to ignore a single F0 contour than two, but … why is performance better for steady-state than for 16-talker envelopes? Worse still, why glimpsing in noise?!
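Noise vocoding, applied here to the target IEEE sentences, can be sketched as follows. Split the signal into frequency channels, extract each channel's envelope, and use it to modulate noise filtered into the same channel. The design is assumed: logarithmically spaced channels from 100 to 5000 Hz, second-order band-pass filters (RBJ cookbook form) and 30 Hz envelope smoothing. The slides do not give the channel count, filters or cutoffs actually used, and the function names are hypothetical.

```python
import math
import random


def bandpass(x, fs, f_lo, f_hi):
    """Biquad band-pass (RBJ Audio EQ Cookbook, constant 0 dB peak gain form)
    centred on the geometric mean of the band edges."""
    fc = math.sqrt(f_lo * f_hi)
    q = fc / (f_hi - f_lo)
    w0 = 2 * math.pi * fc / fs
    alpha = math.sin(w0) / (2 * q)
    b0, b2 = alpha, -alpha  # b1 is zero for this filter type
    a0, a1, a2 = 1 + alpha, -2 * math.cos(w0), 1 - alpha
    out, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
    for s in x:
        y = (b0 * s + b2 * x2 - a1 * y1 - a2 * y2) / a0
        x2, x1 = x1, s
        y2, y1 = y1, y
        out.append(y)
    return out


def envelope(x, fs, cutoff_hz=30.0):
    """Full-wave rectification followed by a one-pole low-pass filter."""
    a = math.exp(-2 * math.pi * cutoff_hz / fs)
    env, y = [], 0.0
    for s in x:
        y = (1 - a) * abs(s) + a * y
        env.append(y)
    return env


def band_edges(n_channels, f_lo=100.0, f_hi=5000.0):
    """Logarithmically spaced channel edges between f_lo and f_hi."""
    ratio = (f_hi / f_lo) ** (1.0 / n_channels)
    return [f_lo * ratio ** k for k in range(n_channels + 1)]


def noise_vocode(x, fs, n_channels=4, f_lo=100.0, f_hi=5000.0, seed=0):
    """Replace the fine structure in each channel with band-limited noise that
    follows that channel's envelope, then sum the channels."""
    if f_hi >= fs / 2:
        raise ValueError("f_hi must be below the Nyquist frequency")
    rng = random.Random(seed)
    noise = [rng.uniform(-1.0, 1.0) for _ in x]
    edges = band_edges(n_channels, f_lo, f_hi)
    out = [0.0] * len(x)
    for lo, hi in zip(edges[:-1], edges[1:]):
        env = envelope(bandpass(x, fs, lo, hi), fs)
        carrier = bandpass(noise, fs, lo, hi)
        for i, (e, c) in enumerate(zip(env, carrier)):
            out[i] += e * c
    return out


fs = 16000
tone = [math.sin(2 * math.pi * 1000 * i / fs) for i in range(fs // 4)]
vocoded = noise_vocode(tone, fs, n_channels=4)
```

Vocoding removes the voice-pitch information in the target's temporal fine structure, much as a cochlear implant does. Only the channel envelopes survive.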
Final remarks: The balance of EM & IM effects presumably varies with the age and hearing status of the listener. The linguistic effects seen may represent a separate aspect of IM, apart from object formation and selection. Unravelling the contributions of various factors to the masking of speech by other sounds is very important … –but very complex!
Thank you very much! Work supported by: UCL Speech, Hearing and Phonetic Sciences; National Institutes of Health (DC006014); Bloedel Hearing Research Center. Thanks to my collaborators: Sophie Scott, Katharine Mair, Tim Green, Csaba Redey-Nagy, Jude Barwell, Zoe Lyall & Arooj Majeed of UCL; Pam Souza, Northwestern University.