Auditory Scene Analysis

Auditory Neuroscience 6 Prof. Jan Schnupp

Understanding sound perception
Diagram labels: Psychophysics, Neurophysiology, Neurometrics
How the auditory system extracts the pitch, identity and spatial location of sounds is usually studied with only a single sound source present at any one time. However, most of the time the sound we receive is a mixture of sounds from multiple sources.
Traditionally, studies of sound perception have taken two forms. Psychophysics examines the relation between the physical attributes of a sound stimulus and our percepts of them. Neurophysiological approaches examine changes in the activity of a neural population as a function of a particular stimulus parameter. But what role these neural populations play in shaping an animal’s perception of that stimulus remains largely a matter of conjecture.
One approach designed to address this relation is “neurometric” analysis, a statistical approach that can be used to investigate whether stimulus-related changes in firing rate within a population of sensory neurons could provide the “psychophysical signal” on which sensory discriminations are based. This has been used with great success in the visual and somatosensory systems but, so far, to a much lesser extent in auditory research.
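The slide does not say which statistic a neurometric analysis uses. One common choice is ROC analysis, sketched below under that assumption: for each stimulus level, the area under the ROC curve estimates how well an observer could tell that level from a reference using spike counts alone. The simulated neuron, its mean counts and the trial numbers are all hypothetical.

```python
import random


def roc_area(reference_counts, test_counts):
    """Area under the ROC curve: the probability that a spike count drawn
    from the test condition exceeds one drawn from the reference condition
    (ties count as one half)."""
    wins = 0.0
    for ref in reference_counts:
        for test in test_counts:
            if test > ref:
                wins += 1.0
            elif test == ref:
                wins += 0.5
    return wins / (len(reference_counts) * len(test_counts))


def simulate_counts(mean_count, n_trials, rng):
    """Hypothetical neuron: roughly Poisson-like spike counts, approximated
    by a rounded Gaussian with variance equal to the mean, clipped at zero."""
    sd = mean_count ** 0.5
    return [max(0, round(rng.gauss(mean_count, sd))) for _ in range(n_trials)]


rng = random.Random(0)
reference = simulate_counts(20.0, 200, rng)

# One ROC area per stimulus level gives a neurometric function.
neurometric = {}
for mean_count in (20.0, 22.0, 25.0, 30.0):
    test = simulate_counts(mean_count, 200, rng)
    neurometric[mean_count] = roc_area(reference, test)

for mean_count, area in neurometric.items():
    print(f"mean count {mean_count:4.1f}: ROC area {area:.2f}")
```

An area near 0.5 means the two conditions cannot be told apart from this neuron's spike counts, while an area near 1 means they almost always can. The resulting curve can be compared with a psychometric function measured in the same task.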

Auditory grouping and segregation

Tonotopicity of the auditory pathway
Figure labels: 0.5 kHz, 6 kHz, 16 kHz

The Spectrogram
The spectrum is a complete description of a sound only if the frequency composition of the sound is constant over time. However, natural sounds usually vary with time. To deal with this, a sound can be divided into short time segments and a spectrum calculated for each segment in turn. The result of this analysis is called a spectrogram.
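The segment-by-segment analysis described above can be sketched in a few lines. This is a minimal illustration using a direct (slow) discrete Fourier transform from the Python standard library; the Hann window, the segment length, the non-overlapping hop and the test signal are all assumptions chosen for the example, not details from the lecture.

```python
import cmath
import math


def magnitude_spectrum(segment):
    """Magnitude spectrum of one Hann-windowed segment (bins 0 to n/2)."""
    n = len(segment)
    windowed = [s * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
                for i, s in enumerate(segment)]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(coeff))
    return spectrum


def spectrogram(signal, segment_len, hop):
    """One spectrum per time segment; rows are time, columns are frequency."""
    return [magnitude_spectrum(signal[start:start + segment_len])
            for start in range(0, len(signal) - segment_len + 1, hop)]


# Test signal: a tone that jumps from 500 Hz to 2000 Hz halfway through.
sample_rate = 8000
signal = [math.sin(2 * math.pi * (500 if t < 1024 else 2000) * t / sample_rate)
          for t in range(2048)]

segment_len = 256
frames = spectrogram(signal, segment_len, hop=256)

# Frequency of the strongest bin in each time segment.
peak_hz = [max(range(len(f)), key=f.__getitem__) * sample_rate / segment_len
           for f in frames]
print(peak_hz)
```

A single spectrum of the whole signal would show both frequencies but not when each occurred; the per-segment peaks recover the change over time. Shorter segments give finer time resolution at the cost of coarser frequency resolution.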

The Long Road from Spectrogram to Auditory Scene Analysis
The Neurogram idea is deceptively simple, and does not capture some of the fine-scale temporal encoding the auditory system is capable of. It is nevertheless clear that the job of the auditory system is to perform some sort of “spectro-temporal analysis” to identify and localise “auditory objects”. The auditory system is astonishingly good at this job. To appreciate how hard this is, try to guess what the sound shown to the left represents.

Masking a tone by noise
The auditory scene consists of separate tone and noise “objects”.

Masking Tone By Noise
To decide which of the time periods delineated by the vertical stripes has a tone in it, an “ideal observer” would only look at the amount of energy in the frequency band delineated by the vertical stripes. But is that how the brain analyses the sound to decide whether it “hears” a tone embedded in the noise?
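The energy-based decision rule attributed to the “ideal observer” can be sketched as a two-interval task: compute the energy in a narrow band around the tone frequency for each interval and pick the larger. The sampling rate, band edges, tone level and trial count below are hypothetical values chosen for illustration.

```python
import cmath
import math
import random


def band_energy(samples, sample_rate, f_lo, f_hi):
    """Summed squared DFT magnitude over the bins lying in [f_lo, f_hi] Hz."""
    n = len(samples)
    k_lo = math.ceil(f_lo * n / sample_rate)
    k_hi = math.floor(f_hi * n / sample_rate)
    energy = 0.0
    for k in range(k_lo, k_hi + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        energy += abs(coeff) ** 2
    return energy


def pick_tone_interval(intervals, sample_rate, f_lo, f_hi):
    """Return the index of the interval with the most energy in the band."""
    energies = [band_energy(seg, sample_rate, f_lo, f_hi) for seg in intervals]
    return max(range(len(energies)), key=energies.__getitem__)


def make_trial(rng, sample_rate=8000, n=400, tone_hz=1000.0, tone_amp=1.0):
    """Two Gaussian-noise intervals; a tone is added to one chosen at random."""
    tone_index = rng.randrange(2)
    intervals = []
    for index in range(2):
        seg = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if index == tone_index:
            seg = [x + tone_amp * math.sin(2 * math.pi * tone_hz * i / sample_rate)
                   for i, x in enumerate(seg)]
        intervals.append(seg)
    return intervals, tone_index


rng = random.Random(1)
n_trials = 20
correct = 0
for _ in range(n_trials):
    intervals, tone_index = make_trial(rng)
    if pick_tone_interval(intervals, 8000, 950.0, 1050.0) == tone_index:
        correct += 1
proportion_correct = correct / n_trials
print(f"proportion correct: {proportion_correct:.2f}")
```

Note that this observer ignores everything outside the band around the tone, so by construction it cannot benefit from noise at remote frequencies.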

Co-modulation masking release
If the noise fluctuates in amplitude, the tone becomes easier to detect: this is called “release from masking”. Paradoxically, this is observed when fluctuating noise is added away from the frequency of the tone (Moore 1999).

Co-modulation masking release has a neurophysiological correlate
Figure panels: noise stimulus; response to fluctuating noise; response to weak tone alone; response to tone plus fluctuating noise; response to loud tone alone. Tone onset is marked.
Intracellular recordings from a neuron in auditory cortex responding to a tone in modulated noise. The noise response is suppressed in the presence of the tone. Las et al. 2005

“Gestalt Psychology” Principles
Similarity, good continuation, proximity, closure (pattern completion), common fate

The continuity illusion
The red line is obviously broken in two, as you can see the gap. However, most people would see the blue line as continuous, assuming that it continues behind the green boxes. A similar effect occurs in hearing…

Visual objects can occlude each other, but rarely mix
“Grouping”: Whose foot is this? Which parts of the image belong to what?
Sound waves from different sources mix.
Grouping cues:
– Common onset
– Pitch & harmonic structure
– Interaural time differences

Common onset as a grouping cue
A: An artificial vowel is heard as /I/ or /e/ depending on where the first formant peak is.
B: Moving the first formant peak toward 500 Hz makes the vowel sound more like /I/. But if the 500 Hz harmonic is started 32 ms or 240 ms earlier, it is no longer part of the vowel and no longer contributes perceptually to the formant, so the perception shifts to /e/.
According to Bregman, common onset (onsets within 30 ms of each other) is one of the strongest grouping cues.
AN Fig 6.4. Based on Darwin and Sutherland 1984

Pitch and harmonicity structure as a grouping cue
A difference in pitch improves the identification of two vowels heard simultaneously.
Figure: % trials in which both vowels are correctly identified, plotted against the difference in fundamental frequency of the two simultaneously presented vowels, for level differences of 0 dB, 10 dB and 20 dB. de Cheveigné et al. 1997

“Unmixing” of Responses by Pitch
Responses of a “chopper” neuron from the cochlear nucleus to “I” at F0 = 88 Hz and “ae” at F0 = 112 Hz.
Panels: A, “I” alone; B, “I” and some “ae”; C, “ae” and some “I”; D, “ae” alone.
Note that the periodicity of the responses transitions quite rapidly as the ratio of I/ae changes. B and C aren’t mixtures of A and D. AN Fig 6.7. Based on Keilson et al. 1997

Sequential Grouping Cues
Proximity, similarity, rhythm

Auditory stream formation
Figure labels: 3-tone galloping melody; increased frequency separation; 2 streams of tones of different frequencies

https://auditoryneuroscience.com/scene-analysis/la-campanella

A possible neural correlate of auditory streaming
As the rate at which tones of alternating frequencies are presented is increased, the response to the tone closest to the neuron’s best frequency dominates. Fishman et al. 2001

The Galloping Rhythm Paradigm
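The slides do not spell out the stimulus layout. The sketch below assumes the commonly used repeating “ABA_” triplet, where A and B are tones of different frequency and “_” is a silent gap of one tone duration. The function name, base frequency, separation and tone duration are hypothetical.

```python
def aba_sequence(base_hz, separation_semitones, n_triplets, tone_s=0.1):
    """Return (onset time in s, frequency in Hz) for each tone of an
    ABA_ sequence. The fourth slot of each cycle is silent."""
    a_hz = base_hz
    b_hz = base_hz * 2 ** (separation_semitones / 12)
    events = []
    t = 0.0
    for _ in range(n_triplets):
        for freq in (a_hz, b_hz, a_hz, None):  # None marks the silent slot
            if freq is not None:
                events.append((round(t, 6), freq))
            t += tone_s
    return events


events = aba_sequence(base_hz=500.0, separation_semitones=12, n_triplets=3)

# Onsets of each tone type taken on its own.
a_onsets = [t for t, f in events if f == 500.0]
b_onsets = [t for t, f in events if f != 500.0]
a_gaps = [round(b - a, 6) for a, b in zip(a_onsets, a_onsets[1:])]
b_gaps = [round(b - a, 6) for a, b in zip(b_onsets, b_onsets[1:])]
print(a_gaps, b_gaps)
```

Heard as one stream, the triplets produce the galloping rhythm. Taken separately, the A tones and the B tones are each evenly spaced (the B tones at half the A rate), which is why the gallop disappears when the sequence splits into two streams.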

Buildup of Streaming
It often takes time for an alternating tone or galloping rhythm sequence to break into two streams. How long it takes depends on the frequency separation and speed. This suggests that the brain starts with the simplest assumption, that there is only one source or stream, and only introduces additional streams if enough evidence accumulates (“Occam’s razor”).

Build-up of streaming: behaviour and physiology
Figure panels: human behaviour (probability of 2 streams against time in s); monkey physiology (neuron firing rate against time in s, from first triplet to last triplet). Micheyl et al. 2005

Rhythm and Streaming
When one sequence of alternating sounds has a regular rhythm and the other an irregular (“jittered”) rhythm, the two sequences are more likely to be heard as separate streams (Rajendran and colleagues, 2013, JASA-EL).

Mismatch Negativity and Deviant Detection
Figure 6.15 Mismatch negativity (MMN) to frequency deviants. (Left) The potential between electrode Fz (an electrode on the midline, relatively frontal) and the mastoid in response to a 1,000-Hz tone (dashed line) that serves as the standard, and deviant tones at the indicated frequencies. (Right) The difference waveforms (deviant-standard), showing a clear peak around 150 ms after stimulus onset. Note that negative potentials are plotted upward in this figure. From figure 1 of Naatanen et al. (2007).
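The difference waveform in the caption (deviant minus standard) is straightforward to compute once epochs have been cut around each tone onset. The sketch below uses entirely synthetic epochs; the 10 ms sampling step, the noise level, the epoch counts and the size and width of the extra negativity are hypothetical and only chosen so that the peak is visible.

```python
import math
import random


def average_waveform(epochs):
    """Average equal-length epochs sample by sample to obtain an ERP."""
    return [sum(samples) / len(epochs) for samples in zip(*epochs)]


def difference_wave(deviant_epochs, standard_epochs):
    """Deviant ERP minus standard ERP."""
    deviant = average_waveform(deviant_epochs)
    standard = average_waveform(standard_epochs)
    return [d - s for d, s in zip(deviant, standard)]


# Synthetic data: one sample every 10 ms from 0 to 390 ms after onset.
times_ms = list(range(0, 400, 10))
rng = random.Random(2)


def fake_epoch(extra_negativity_uv):
    """Noise plus an optional negative bump centred on 150 ms."""
    bump = [-extra_negativity_uv * math.exp(-((t - 150) ** 2) / (2 * 15 ** 2))
            for t in times_ms]
    return [b + rng.gauss(0.0, 0.5) for b in bump]


standard_epochs = [fake_epoch(0.0) for _ in range(200)]
deviant_epochs = [fake_epoch(3.0) for _ in range(200)]

diff = difference_wave(deviant_epochs, standard_epochs)
peak_index = min(range(len(diff)), key=diff.__getitem__)  # most negative sample
peak_latency_ms = times_ms[peak_index]
print(f"most negative difference at {peak_latency_ms} ms")
```

Because the subtraction removes activity common to both tone types, what remains is the part of the response specific to the deviant.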

Sensing changes in the auditory scene
The ‘mismatch negativity’ – difference in human event-related potentials to unexpected sounds (‘deviant’) embedded in a stream of expected sounds (‘standard’). Winkler et al. 2003

Neural correlates of scene segregation
Figure labels: natural birdsong; main chirp; background noise. Bar-Yosef & Nelken 2007

Listening to 2 people talking simultaneously
Figure labels: Speaker 1 (male); Speaker 2 (female). Mesgarani & Chang, 2012, Nature

Sensing changes in the auditory scene
Stimulus-specific adaptation in the responses of auditory neurons: neurons tire of the same repetitive stimulus, but fire vigorously to a different, rare stimulus.
Figure labels: Probability D; response amplitude against time (ms). Ulanovsky et al. 2003
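The idea that a neuron “tires” of a repeated stimulus yet responds strongly to a rare one can be illustrated with a deliberately simple toy model. This is not the analysis or model of Ulanovsky et al.: it merely assumes two input channels, one per tone, each of which is depleted when its tone is played and recovers a little on every step. The depletion and recovery values are hypothetical.

```python
import random


def oddball_sequence(n_tones, deviant_prob, rng):
    """Random sequence in which the 'deviant' tone is rare."""
    return ["deviant" if rng.random() < deviant_prob else "standard"
            for _ in range(n_tones)]


def adapting_responses(sequence, depletion=0.5, recovery=0.1):
    """Toy model: the response to a tone equals the current strength of
    that tone's channel; using a channel depletes it, and both channels
    recover toward 1 after every tone."""
    strength = {"standard": 1.0, "deviant": 1.0}
    responses = []
    for tone in sequence:
        responses.append((tone, strength[tone]))
        strength[tone] *= (1.0 - depletion)
        for key in strength:
            strength[key] += recovery * (1.0 - strength[key])
    return responses


def mean_response(responses, label):
    values = [r for tone, r in responses if tone == label]
    return sum(values) / len(values)


rng = random.Random(3)
sequence = oddball_sequence(500, 0.1, rng)
responses = adapting_responses(sequence)
mean_standard = mean_response(responses, "standard")
mean_deviant = mean_response(responses, "deviant")
print(f"standard: {mean_standard:.2f}, deviant: {mean_deviant:.2f}")
```

In this toy the frequently played channel stays depleted while the rare channel has time to recover between presentations, so the same tone would evoke a larger response when it is the deviant than when it is the standard.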

Further Reading
Auditory Neuroscience – Chapter 6
Bregman AS (1990) Auditory Scene Analysis: The Perceptual Organization of Sound. MIT Press
