Source Localization in Complex Listening Situations: Selection of Binaural Cues Based on Interaural Coherence Christof Faller Mobile Terminals Division, Agere Systems Juha Merimaa Institut für Kommunikationsakustik, Ruhr-Universität Bochum

Complex listening situations
(Illustration: a listener surrounded by concurrent sounds.) Speech source at -15º, music at 50º, and noise through an open door at -125º azimuth.

This work A model to extract binaural cues corresponding to human localization performance in several complex listening situations

Outline
1. Model description
2. Simulation results
   A) Independent sources in free-field
   B) Precedence effect
   C) Independent sources and reverberation
3. Comparison with earlier models
4. Summary

Model structure
(Block diagram.) Stimuli 1…N are convolved with HRTFs/BRIRs and summed to form the left- and right-ear inputs. Each ear signal passes through a gammatone filterbank and a model of neural transduction (Bernstein et al. 1999), internal noise is added, and the normalized cross-correlation and level difference are computed within an exponential 10 ms time window.

Extraction of binaural cues
Estimated at each time instant:
– Interaural Time Difference (ITD): time lag of the maximum of the normalized cross-correlation
– Interaural Level Difference (ILD): ratio of signal energies within the time window
– Interaural Coherence (IC): maximum of the normalized cross-correlation
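As a rough sketch of how these three quantities could be computed for one analysis window, assuming plain sample lists and a hypothetical helper `binaural_cues` (the actual model's gammatone filterbank, neural transduction stage, and exponential window are omitted here):

```python
import math

def binaural_cues(left, right, fs, max_lag_s=0.001):
    """Estimate ITD, ILD and IC for one analysis window.

    ITD: lag of the maximum of the normalized cross-correlation.
    ILD: ratio of the signal energies (expressed here in dB).
    IC : maximum of the normalized cross-correlation.
    """
    max_lag = int(round(max_lag_s * fs))
    e_l = sum(x * x for x in left)
    e_r = sum(x * x for x in right)
    norm = math.sqrt(e_l * e_r)
    best_corr, best_lag = -1.0, 0
    for lag in range(-max_lag, max_lag + 1):
        num = sum(left[n] * right[n + lag]
                  for n in range(len(left)) if 0 <= n + lag < len(right))
        corr = num / norm if norm > 0 else 0.0
        if corr > best_corr:
            best_corr, best_lag = corr, lag
    itd = best_lag / fs  # sign convention assumed: positive = right ear lags
    ild = 10.0 * math.log10(e_l / e_r) if e_l > 0 and e_r > 0 else 0.0
    return itd, ild, best_corr
```

For a right-ear signal that is simply a delayed copy of the left-ear signal, this returns the delay as the ITD, 0 dB ILD, and an IC of 1.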

Assumption for correct localization The auditory system needs to acquire ITD and ILD cues similar to those evoked by each source separately in an anechoic environment

Example: Two active sound sources
Superposition with different level and phase relations at the left and right ears. For independent or non-stationary source signals:
– Time-varying binaural cues
– Reduced IC
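A toy numerical check of this effect, with seeded noise sources and fixed sample delays standing in for HRTFs (all names and numbers here are illustrative, not from the talk):

```python
import math, random

def max_norm_xcorr(left, right, max_lag):
    """Maximum of the normalized cross-correlation, i.e. the IC estimate."""
    norm = math.sqrt(sum(x * x for x in left) * sum(x * x for x in right))
    best = -1.0
    for lag in range(-max_lag, max_lag + 1):
        s = sum(left[n] * right[n + lag]
                for n in range(len(left)) if 0 <= n + lag < len(right))
        best = max(best, s / norm)
    return best

rng = random.Random(1)
n_samp = 400
src_a = [rng.uniform(-1, 1) for _ in range(n_samp)]  # source A (noise-like)
src_b = [rng.uniform(-1, 1) for _ in range(n_samp)]  # independent source B

# One source, arriving 3 samples later at the right ear: IC stays near 1.
one_l = src_a
one_r = [0.0] * 3 + src_a[:-3]

# Two independent sources with different interaural delays: no single lag
# can align both, so the superposition lowers the IC.
mix_l = [a + b for a, b in zip(src_a, src_b)]
mix_r = [a + b for a, b in zip(one_r, [0.0] * 6 + src_b[:-6])]

ic_one = max_norm_xcorr(one_l, one_r, 10)
ic_mix = max_norm_xcorr(mix_l, mix_r, 10)
```

With one source the coherence is close to 1; with two concurrent sources it drops markedly, which is exactly the property the cue selection exploits.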

How to obtain correct localization cues?
Simply select ITDs and ILDs only when IC is above a set threshold.
– An adaptive threshold is assumed
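A minimal sketch of this selection rule, assuming per-instant cue lists and a hypothetical function name (note: the talk reports p0 as the percentage of selected signal power, whereas this sketch simply returns the fraction of selected time instants):

```python
def select_cues(itds, ilds, ics, c0=0.99):
    """Keep only those (ITD, ILD) samples whose interaural coherence
    exceeds the threshold c0; low-coherence instants are discarded."""
    kept = [(itd, ild) for itd, ild, ic in zip(itds, ilds, ics) if ic > c0]
    fraction = len(kept) / len(ics) if ics else 0.0
    return kept, fraction
```

For example, with three time instants of which two are highly coherent, only those two (ITD, ILD) pairs survive the selection.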

Simulation results

1. Effect of number of sources
Speech sources at the same overall level (Hawley et al. 1999; Drullman & Bronkhorst 2000):
– One or two distracters have little effect on localization performance
– Performance is still good for 5 competing sources
Simulations with different phonetically balanced sentences recorded by the same male speaker.

Two talkers, ±40º azimuth: 65 and 58 % selected signal power.

3 and 5 talkers
Simulated at the 500 Hz critical band.
– 3 talkers: 0º and ±40º azimuth
– 5 talkers: 0º, ±40º, and ±80º azimuth

(Figure: all cues vs. selected cues.) 3 talkers: c0 = 0.99, p0 = 54 %; 5 talkers: c0 = 0.99, p0 = 22 %.

2. Effect of target-to-distracter ratio
Click-train target in the presence of a white-noise distracter:
– Target is localizable down to a few dB above detection threshold (Good & Gilkey 1996; Good et al. 1997)
– High frequencies are more important for localization (Lorenzi et al. 1999)

Simulation
– 2 kHz critical band
– White noise at 0º azimuth
– 100 Hz click train at 30º azimuth
– -3, -9, and -21 dB absolute target-to-distracter ratios (T/D), corresponding to 8, 2, and -10 dB T/D relative to detection threshold, as defined by Good & Gilkey (1996)

(Figure: all cues vs. selected cues.) -3 dB T/D: c0 = 0.990, p0 = 3 %; -9 dB T/D: c0 = 0.992, p0 = 9 %; -21 dB T/D: c0 = 0.992, p0 = 99 %.

Precedence effect
Perception of successive sound events:
– Fusion
– Localization dominance by the first event
– Suppression of directional discrimination of later events
Depends on the interstimulus delay:
– Summing localization (approx. 0-1 ms)
– Localization dominance by the first event (stimulus dependent, up to 2-50 ms)
– Independent localization

1. Click pairs
Classical precedence-effect experiment: two consecutive clicks with the same level from different directions.

Lead: 40º, lag: -40º, ICI: 5 ms

Click pairs as a function of inter-click interval (ICI)
– Simulations for ICI between ms
– Click sources at ±40º azimuth, 500 Hz critical band
– A single threshold did not predict all cases correctly; the threshold was instead determined for each ICI such that the standard deviation of the ITD is 15 μs
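One way such a per-ICI threshold could be found is by raising c0 until the selected ITDs scatter by no more than 15 μs; a minimal sketch with made-up ITD/IC samples (the function name and step size are assumptions, not from the talk):

```python
def adapt_threshold(itds, ics, target_std=15e-6, step=0.001):
    """Raise the IC threshold c0 until the standard deviation of the
    selected ITDs falls to the target (15 microseconds in the talk)."""
    c0 = 0.0
    while c0 < 1.0:
        sel = [itd for itd, ic in zip(itds, ics) if ic > c0]
        if len(sel) < 2:
            return c0  # too few cues left to estimate a spread
        mean = sum(sel) / len(sel)
        std = (sum((t - mean) ** 2 for t in sel) / len(sel)) ** 0.5
        if std <= target_std:
            return c0
        c0 += step
    return 1.0
```

With a tight cluster of high-coherence ITDs plus some low-coherence outliers, the search stops as soon as c0 is high enough to exclude the outliers.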

Click pairs as a function of ICI

Note on cross-frequency processing
– At certain small ICIs the required IC threshold becomes very high; anomalies of the precedence effect have been reported for bandpass-filtered clicks (Blauert & Cobben 1978)
– Characteristic power peaks occur at different ICIs in different critical bands
– Processing across frequency bands would allow extraction of the correct cues

2. Sinusoidal tones and a reflection
Steady-state cues result from coherent summation of sound at the ears of a listener. Localization depends on the onset rate (Rakerd & Hartmann 1986):
– Correct localization with a fast onset
– Localization based on misleading steady-state cues for tones with a slow onset

Sinusoidal tones: Simulation
– 500 Hz sinusoidal tone
– Direct sound from 0º azimuth
– Reflection after 1.4 ms from 30º
– Linear onset ramp
– Steady-state level of 65 dB SPL

Sinusoidal tones: Results
– The model as such cannot explain the discounting of the steady-state cues
– The dependence on onset rate can be explained by considering the cues at the time when the signal level first rises sufficiently above the internal noise

Independent sources and reverberation
Final test for the model. Simulation at the 2 kHz critical band:
– One speech source at 30º azimuth
– Two speech sources at ±30º azimuth
BRIRs measured in a hall with RT = 1.4 s at the 2 kHz octave band.

(Figure: all cues vs. selected cues.) 1 talker: c0 = 0.99, p0 = 1 %; 2 talkers: c0 = 0.99, p0 = 1 %.

Comparison with earlier models

Weighting of localization cues with signal power
– Not applied outside the 10 ms analysis window; the contribution of each time instant to localization is defined by IC
– The model can neglect high-power information when it is due to concurrent activity of several sources
– Power still affects how often the ITDs and ILDs of individual sources are sampled

Lindemann (1986)
Based on contralateral inhibition with a fixed (10 ms) time constant; tends to hold cross-correlation peaks with high IC.
Differences:
– Operation of the cue selection method is not limited to the 10 ms time window
– When necessary (complex situations), the "memory" of past cues can last longer

Zurek (1987)
Localization inhibition controlled by onset detection. In precedence-effect conditions, the cue selection naturally derives most localization cues from onsets.
Difference:
– Cue selection is not limited to getting information from signal onsets

Summary
– A method was proposed for modeling auditory localization in the presence of concurrent sound
– ITD and ILD cues are selected only when they coincide with a high IC
– Operation of the model was verified against the results of several psychoacoustic studies from the literature

Thank you!